Add RoPE scaling support for transformers (including dynamic NTK)

https://github.com/huggingface/transformers/pull/24653
This commit is contained in:
oobabooga 2023-08-08 21:24:28 -07:00
parent f4caaf337a
commit d8fb506aff
5 changed files with 16 additions and 9 deletions

View file

@ -299,12 +299,12 @@ Optionally, you can use the following command-line flags:
| `--rwkv-strategy RWKV_STRATEGY` | RWKV: The strategy to use while loading the model. Examples: "cpu fp32", "cuda fp16", "cuda fp16i8". |
| `--rwkv-cuda-on` | RWKV: Compile the CUDA kernel for better performance. |
#### RoPE (for llama.cpp and ExLlama only)
#### RoPE (for llama.cpp, ExLlama, and transformers)
| Flag | Description |
|------------------|-------------|
|`--compress_pos_emb COMPRESS_POS_EMB` | Positional embeddings compression factor. Should typically be set to max_seq_len / 2048. |
|`--alpha_value ALPHA_VALUE` | Positional embeddings alpha factor for NTK RoPE scaling. Scaling is not identical to embedding compression. Use either this or compress_pos_emb, not both. |
|`--alpha_value ALPHA_VALUE` | Positional embeddings alpha factor for NTK RoPE scaling. Use either this or compress_pos_emb, not both. |
#### Gradio