Add Support for Static NTK RoPE scaling for exllama/exllama_hf (#2955)

This commit is contained in:
Panchovix 2023-07-04 00:13:16 -04:00 committed by GitHub
parent 1610d5ffb2
commit 10c8c197bf
7 changed files with 18 additions and 2 deletions

@@ -269,6 +269,7 @@ Optionally, you can use the following command-line flags:
|`--gpu-split` | Comma-separated list of VRAM (in GB) to use per GPU device for model layers, e.g. `20,7,7` |
|`--max_seq_len MAX_SEQ_LEN` | Maximum sequence length. |
|`--compress_pos_emb COMPRESS_POS_EMB` | Positional embeddings compression factor. Should typically be set to max_seq_len / 2048. |
|`--alpha_value ALPHA_VALUE` | Positional embeddings alpha factor for NTK RoPE scaling. Same as above. Use either this or compress_pos_emb, not both. |
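The relationship between the two flags can be sketched as follows. This is a minimal illustration, assuming the commonly used static NTK formula that scales the rotary base by `alpha ** (dim / (dim - 2))`; the helper name `ntk_scaled_base` is hypothetical and not part of the repository's code.

```python
import math

def ntk_scaled_base(base: float, alpha: float, dim: int) -> float:
    # Static NTK RoPE scaling: raise the rotary base so that longer
    # contexts are accommodated without linearly compressing positions
    # (which is what compress_pos_emb does instead).
    return base * alpha ** (dim / (dim - 2))

# compress_pos_emb is typically max_seq_len / 2048, e.g. for a 4096 context:
compress_pos_emb = 4096 / 2048  # → 2.0

# alpha_value adjusts the base instead; with alpha=2 and a 128-dim rotary head
# the default base of 10000 is scaled up rather than the positions squeezed:
scaled = ntk_scaled_base(10000.0, 2.0, 128)
```

Only one of the two mechanisms should be active at a time, which is why the flags are documented as mutually exclusive.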
#### GPTQ-for-LLaMa