Add Support for Static NTK RoPE scaling for exllama/exllama_hf (#2955)
This commit is contained in:
parent
1610d5ffb2
commit
10c8c197bf
7 changed files with 18 additions and 2 deletions
@@ -269,6 +269,7 @@ Optionally, you can use the following command-line flags:

| Flag | Description |
|------|-------------|
| `--gpu-split` | Comma-separated list of VRAM (in GB) to use per GPU device for model layers, e.g. `20,7,7` |
| `--max_seq_len MAX_SEQ_LEN` | Maximum sequence length. |
| `--compress_pos_emb COMPRESS_POS_EMB` | Positional embeddings compression factor. Should typically be set to max_seq_len / 2048. |
| `--alpha_value ALPHA_VALUE` | Positional embeddings alpha factor for NTK RoPE scaling. Same as above. Use either this or compress_pos_emb, not both. |
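The two scaling flags above take different routes to a longer context: `--compress_pos_emb` linearly compresses position indices, while `--alpha_value` applies static NTK scaling by raising the rotary embedding base. A minimal sketch of both calculations, assuming the commonly used NTK formula `base * alpha^(dim / (dim - 2))` and illustrative defaults (base 10000, head dimension 128, native context 2048) that may differ per model:

```python
import math

def ntk_scaled_rope_base(alpha: float, base: float = 10000.0, head_dim: int = 128) -> float:
    """Static NTK RoPE scaling: raise the rotary base so the positional
    frequencies stretch to cover a longer context window.
    The exponent head_dim / (head_dim - 2) is the widely used static NTK form;
    defaults here are illustrative assumptions, not read from any model."""
    return base * alpha ** (head_dim / (head_dim - 2))

def linear_compress_factor(max_seq_len: int, native_ctx: int = 2048) -> float:
    """compress_pos_emb is typically max_seq_len divided by the model's
    native context length (2048 assumed here)."""
    return max_seq_len / native_ctx

# With alpha = 1 the base is unchanged; larger alpha increases it.
print(ntk_scaled_rope_base(1.0))   # unchanged base
print(linear_compress_factor(4096))  # factor for a 4096-token window
```

As the table notes, the two flags are alternatives: pick either linear compression or NTK alpha scaling for a given model, not both.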
#### GPTQ-for-LLaMa