Rename additive_repetition_penalty to presence_penalty, add frequency_penalty (#4376)

2023-10-25 10:10:28 -05:00
commit 72f6fc6923
parent ef1489cd4d
14 changed files with 64 additions and 30 deletions
--- a/docs/03
+++ b/docs/03
@@ -35,7 +35,8 @@ For more information about the parameters, the [transformers documentation](http
 * **top_p**: If not set to 1, select tokens with probabilities adding up to less than this number. Higher value = higher range of possible random results.
 * **top_k**: Similar to top_p, but select instead only the top_k most likely tokens. Higher value = higher range of possible random results.
 * **repetition_penalty**: Penalty factor for repeating prior tokens. 1 means no penalty, higher value = less repetition, lower value = more repetition.
-* **additive_repetition_penalty**: Similar to repetition_penalty, but with an additive offset on the raw token scores instead of a multiplicative factor. It may generate better results. 0 means no penalty, higher value = less repetition, lower value = more repetition.
+* **presence_penalty**: Similar to repetition_penalty, but with an additive offset on the raw token scores instead of a multiplicative factor. It may generate better results. 0 means no penalty, higher value = less repetition, lower value = more repetition. Previously called "additive_repetition_penalty".
+* **frequency_penalty**: Repetition penalty that scales based on how many times the token has appeared in the context. Be careful with this; there's no limit to how much a token can be penalized.
 * **repetition_penalty_range**: The number of most recent tokens to consider for repetition penalty. 0 makes all tokens be used.
 * **typical_p**: If not set to 1, select only tokens that are at least this much more likely to appear than random tokens, given the prior text.
 * **tfs**: Tries to detect a tail of low-probability tokens in the distribution and removes those tokens. See [this blog post](https://www.trentonbricken.com/Tail-Free-Sampling/) for details. The closer to 0, the more discarded tokens.
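The repetition-related parameters in this hunk can be illustrated with a short sketch. It is not the project's actual logits processor. It assumes OpenAI-style semantics: `presence_penalty` subtracts a flat amount once for every token that appears in the window, and `frequency_penalty` subtracts an amount proportional to how many times the token appears. The function name `apply_repetition_penalties` is hypothetical, and plain lists stand in for tensors.

```python
from collections import Counter


def apply_repetition_penalties(
    logits: list[float],
    context_ids: list[int],
    repetition_penalty: float = 1.0,
    presence_penalty: float = 0.0,
    frequency_penalty: float = 0.0,
    repetition_penalty_range: int = 0,
) -> list[float]:
    """Return a penalized copy of `logits` (one raw score per vocabulary id).

    Hypothetical illustration only; assumes OpenAI-style presence/frequency semantics.
    """
    # A range of 0 means "use every prior token"; otherwise only the last N tokens count.
    if repetition_penalty_range <= 0:
        window = context_ids
    else:
        window = context_ids[-repetition_penalty_range:]
    counts = Counter(window)

    out = list(logits)
    for token_id, count in counts.items():
        score = out[token_id]
        # Multiplicative penalty: shrink positive scores, push negative scores further down.
        if repetition_penalty != 1.0:
            score = score / repetition_penalty if score > 0 else score * repetition_penalty
        # Additive flat offset, applied once to any token that appears at all.
        score -= presence_penalty
        # Additive offset that grows linearly with the count and has no upper bound.
        score -= frequency_penalty * count
        out[token_id] = score
    return out


if __name__ == "__main__":
    # Token 0 appeared twice and token 1 once; tokens 2 and 3 are untouched.
    print(apply_repetition_penalties(
        [2.0, -1.0, 0.5, 1.0], [0, 0, 1],
        repetition_penalty=2.0, presence_penalty=0.5, frequency_penalty=0.25,
    ))
```

Because the frequency term is `frequency_penalty * count`, a token that appears many times in the window keeps getting pushed down with no floor, which is why the documentation warns that there is no limit to how much a token can be penalized. The presence term, by contrast, is the same whether a token appeared once or a hundred times.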