ExLlama with long context (#2875)
parent 9290c6236f
commit c52290de50
14 changed files with 22 additions and 25 deletions
@@ -266,6 +266,8 @@ Optionally, you can use the following command-line flags:

| Flag | Description |
|------------------|-------------|
|`--gpu-split` | Comma-separated list of VRAM (in GB) to use per GPU device for model layers, e.g. `20,7,7` |
|`--max_seq_len MAX_SEQ_LEN` | Maximum sequence length. |
|`--compress_pos_emb COMPRESS_POS_EMB` | Positional embeddings compression factor. Should typically be set to max_seq_len / 2048. |

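As a rough illustration of the `max_seq_len / 2048` rule of thumb above, the following sketch computes a matching `--compress_pos_emb` value for a few sequence lengths. The helper `suggested_compress_pos_emb` and its `base_len` parameter are hypothetical names introduced here for illustration. They are not part of the project, and whether the flag accepts non-integer values is not stated in the description.

```python
def suggested_compress_pos_emb(max_seq_len: int, base_len: int = 2048) -> float:
    """Hypothetical helper: return max_seq_len / base_len.

    base_len defaults to 2048, the divisor named in the flag description.
    """
    if max_seq_len <= 0 or base_len <= 0:
        raise ValueError("max_seq_len and base_len must be positive")
    return max_seq_len / base_len


if __name__ == "__main__":
    # Print a flag pair for a few example sequence lengths.
    for seq_len in (2048, 4096, 8192):
        factor = suggested_compress_pos_emb(seq_len)
        print(f"--max_seq_len {seq_len} --compress_pos_emb {factor:g}")
```

Under this rule, a sequence length of 4096 pairs with a compression factor of 2, and 8192 with a factor of 4.
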
#### GPTQ-for-LLaMa