ExLlama with long context (#2875)

oobabooga 2023-06-25 22:49:26 -03:00 committed by GitHub
parent 9290c6236f
commit c52290de50
14 changed files with 22 additions and 25 deletions

@@ -266,6 +266,8 @@ Optionally, you can use the following command-line flags:
| Flag | Description |
|------------------|-------------|
|`--gpu-split` | Comma-separated list of VRAM (in GB) to use per GPU device for model layers, e.g. `20,7,7` |
|`--max_seq_len MAX_SEQ_LEN` | Maximum sequence length (context size, in tokens). |
|`--compress_pos_emb COMPRESS_POS_EMB` | Positional embedding compression factor. Should typically be set to `max_seq_len / 2048` (e.g. `2` for a `max_seq_len` of 4096). |
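The rule of thumb for `--compress_pos_emb` is a plain ratio, so the two long-context flags can be derived from a single target length. The sketch below is a hypothetical helper, not part of the project: it assumes 2048 is the model's native context length (the divisor the table uses), and the names `compress_pos_emb_for` and `exllama_flags` are illustrative only. It also assumes the loader accepts a non-integer factor such as `1.5`; check the actual argument type before relying on that.

```python
# Hypothetical helper (not part of the project): derives the long-context
# flags from a target sequence length using the README's rule of thumb
# compress_pos_emb = max_seq_len / 2048.

# Assumption: 2048 tokens is the model's native (pre-training) context length.
NATIVE_CTX = 2048


def compress_pos_emb_for(max_seq_len: int, native_ctx: int = NATIVE_CTX) -> float:
    """Return the positional-embedding compression factor for a target length."""
    if max_seq_len <= 0 or native_ctx <= 0:
        raise ValueError("lengths must be positive")
    return max_seq_len / native_ctx


def exllama_flags(max_seq_len: int) -> list[str]:
    """Build the --max_seq_len / --compress_pos_emb arguments as a list."""
    factor = compress_pos_emb_for(max_seq_len)
    # Render whole-number factors without a trailing ".0" (e.g. "4", not "4.0").
    factor_str = str(int(factor)) if factor.is_integer() else str(factor)
    return ["--max_seq_len", str(max_seq_len), "--compress_pos_emb", factor_str]


if __name__ == "__main__":
    # For an 8192-token context the factor is 8192 / 2048 = 4.
    print(exllama_flags(8192))
```

Keeping both values tied to one input avoids the common mistake of raising `--max_seq_len` while leaving `--compress_pos_emb` at its default.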

#### GPTQ-for-LLaMa