ExLlama with long context (#2875)

2023-06-25 22:49:26 -03:00 · c52290de50
commit c52290de50
parent 9290c6236f
14 changed files with 22 additions and 25 deletions
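
The README change in this commit documents two new flags and recommends setting `--compress_pos_emb` to `max_seq_len / 2048`. As a rough illustration of that arithmetic only (not code from this commit), the sketch below computes the suggested factor and assembles the corresponding flag string. The helper names `suggested_compress_pos_emb` and `long_context_flags` are hypothetical, and the default of 2048 is taken directly from the README wording.

```python
# Hypothetical helpers illustrating the README guidance; not part of the commit.

NATIVE_CONTEXT = 2048  # divisor quoted in the README: "max_seq_len / 2048"


def suggested_compress_pos_emb(max_seq_len: int,
                               native_context: int = NATIVE_CONTEXT) -> float:
    """Return max_seq_len / native_context, the factor the README suggests."""
    if max_seq_len <= 0 or native_context <= 0:
        raise ValueError("sequence lengths must be positive")
    return max_seq_len / native_context


def long_context_flags(max_seq_len: int) -> str:
    """Build the two flags added to the README for a given sequence length."""
    factor = suggested_compress_pos_emb(max_seq_len)
    # ':g' drops a trailing '.0', so 2.0 is rendered as '2'.
    return f"--max_seq_len {max_seq_len} --compress_pos_emb {factor:g}"


if __name__ == "__main__":
    for length in (2048, 4096, 8192):
        print(long_context_flags(length))
```

Whether the flag accepts non-integer factors is not stated in the README, so choosing a `max_seq_len` that is a multiple of 2048 is the safer assumption.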
--- a/README.md
+++ b/README.md
@@ -266,6 +266,8 @@ Optionally, you can use the following command-line flags:
 | Flag             | Description |
 |------------------|-------------|
 |`--gpu-split`     | Comma-separated list of VRAM (in GB) to use per GPU device for model layers, e.g. `20,7,7` |
+|`--max_seq_len MAX_SEQ_LEN`           | Maximum sequence length. |
+|`--compress_pos_emb COMPRESS_POS_EMB` | Positional embeddings compression factor. Should typically be set to max_seq_len / 2048. |

 #### GPTQ-for-LLaMa