Add RoPE scaling support for transformers (including dynamic NTK)

https://github.com/huggingface/transformers/pull/24653
2023-08-08 21:24:28 -07:00
commit d8fb506aff
parent f4caaf337a
5 changed files with 16 additions and 9 deletions
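
For orientation, the two flags documented in the README hunk correspond to two different ways of stretching rotary position embeddings. The sketch below is illustrative only and is not code from this commit: the function names, the default base of 10000, the head dimension of 128, and the alpha-to-base formula (a commonly used NTK formulation) are all assumptions. Only the `max_seq_len / 2048` rule comes from the README text.

```python
# Illustrative sketch, not the commit's implementation.
# All function names and defaults here are hypothetical.

def rope_inv_freq(head_dim, base=10000.0):
    """Inverse frequencies for each rotary pair (standard RoPE definition)."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def compress_factor(max_seq_len, trained_len=2048):
    """README rule of thumb: compress_pos_emb = max_seq_len / 2048."""
    return max_seq_len / trained_len


def linear_scaled_angles(position, head_dim, factor, base=10000.0):
    """Linear scaling (compress_pos_emb): divide the position by the factor."""
    return [(position / factor) * f for f in rope_inv_freq(head_dim, base)]


def ntk_scaled_base(alpha, head_dim, base=10000.0):
    """NTK-style scaling (alpha_value): enlarge the rotary base instead of
    compressing positions. Assumed formula: base * alpha ** (d / (d - 2))."""
    return base * alpha ** (head_dim / (head_dim - 2))


if __name__ == "__main__":
    factor = compress_factor(4096)
    print("compress_pos_emb for a 4096-token context:", factor)
    print("NTK base for alpha=2, head_dim=128:", ntk_scaled_base(2.0, 128))
```

With linear scaling, position 4096 at factor 2 produces the same rotation angles as position 2048 without scaling. The NTK variant leaves positions untouched and changes the frequencies instead, which is why the README says to use one flag or the other, not both.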
--- a/README.md
+++ b/README.md
@@ -299,12 +299,12 @@ Optionally, you can use the following command-line flags:
 | `--rwkv-strategy RWKV_STRATEGY` | RWKV: The strategy to use while loading the model. Examples: "cpu fp32", "cuda fp16", "cuda fp16i8". |
 | `--rwkv-cuda-on`                | RWKV: Compile the CUDA kernel for better performance. |

-#### RoPE (for llama.cpp and ExLlama only)
+#### RoPE (for llama.cpp, ExLlama, and transformers)

 | Flag             | Description |
 |------------------|-------------|
 |`--compress_pos_emb COMPRESS_POS_EMB` | Positional embeddings compression factor. Should typically be set to max_seq_len / 2048. |
-|`--alpha_value ALPHA_VALUE`           | Positional embeddings alpha factor for NTK RoPE scaling. Scaling is not identical to embedding compression. Use either this or compress_pos_emb, not both. |
+|`--alpha_value ALPHA_VALUE`           | Positional embeddings alpha factor for NTK RoPE scaling. Use either this or compress_pos_emb, not both. |

 #### Gradio