Add Support for Static NTK RoPE scaling for exllama/exllama_hf (#2955)

This commit is contained in:
Panchovix 2023-07-04 00:13:16 -04:00 committed by GitHub
parent 1610d5ffb2
commit 10c8c197bf
7 changed files with 18 additions and 2 deletions

@@ -269,6 +269,7 @@ Optionally, you can use the following command-line flags:
|`--gpu-split` | Comma-separated list of VRAM (in GB) to use per GPU device for model layers, e.g. `20,7,7` |
|`--max_seq_len MAX_SEQ_LEN` | Maximum sequence length. |
|`--compress_pos_emb COMPRESS_POS_EMB` | Positional embeddings compression factor. Should typically be set to max_seq_len / 2048. |
|`--alpha_value ALPHA_VALUE` | Positional embeddings alpha factor for NTK RoPE scaling. Same as above. Use either this or compress_pos_emb, not both. |
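The relationship between the two flags can be sketched as follows. This is a minimal illustration, assuming the commonly used static NTK formula that scales the rotary base by `alpha ** (dim / (dim - 2))`; the helper name `ntk_scaled_base` is hypothetical and not part of the repository's code.

```python
import math

def ntk_scaled_base(base: float, alpha: float, dim: int) -> float:
    # Static NTK RoPE scaling: raise the rotary base so that longer
    # contexts are accommodated without linearly compressing positions
    # (which is what compress_pos_emb does instead).
    return base * alpha ** (dim / (dim - 2))

# compress_pos_emb is typically max_seq_len / 2048, e.g. for a 4096 context:
compress_pos_emb = 4096 / 2048  # → 2.0

# alpha_value adjusts the base instead; with alpha=2 and a 128-dim rotary head
# the default base of 10000 is scaled up rather than the positions squeezed:
scaled = ntk_scaled_base(10000.0, 2.0, 128)
```

Only one of the two mechanisms should be active at a time, which is why the flags are documented as mutually exclusive.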
#### GPTQ-for-LLaMa