Implement CFG for ExLlama_HF (#3666)
This commit is contained in:
parent
2b675533f7
commit
d6934bc7bc
8 changed files with 122 additions and 26 deletions
@@ -304,6 +304,7 @@ Optionally, you can use the following command-line flags:
 |------------------|-------------|
 |`--gpu-split` | Comma-separated list of VRAM (in GB) to use per GPU device for model layers, e.g. `20,7,7` |
 |`--max_seq_len MAX_SEQ_LEN` | Maximum sequence length. |
+|`--cfg-cache` | ExLlama_HF: Create an additional cache for CFG negative prompts. Necessary to use CFG with that loader, but not necessary for CFG with base ExLlama. |
 
 #### GPTQ-for-LLaMa
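The `--cfg-cache` flag exists because classifier-free guidance needs a second forward pass (and thus a second KV cache) for the negative prompt. The guidance step itself just blends the two sets of next-token logits. Below is a minimal sketch of that blend; the function name and list-based logits are illustrative, not the actual ExLlama_HF code, which operates on tensors inside the sampling loop.

```python
def cfg_mix(cond_logits, uncond_logits, guidance_scale):
    """Classifier-free guidance over next-token logits.

    Combines the positive-prompt (cond) and negative-prompt (uncond)
    passes as: uncond + scale * (cond - uncond).
    A scale of 1.0 reproduces the positive-prompt logits exactly,
    i.e. the negative prompt has no effect.
    """
    return [u + guidance_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]


# Toy example with a 3-token vocabulary:
positive = [2.0, 0.5, -1.0]   # logits from the normal prompt
negative = [1.0, 1.0, 1.0]    # logits from the CFG negative prompt
guided = cfg_mix(positive, negative, guidance_scale=1.5)
```

A scale above 1.0 pushes the distribution further away from the negative prompt, which is why each prompt needs its own cache: both passes must advance token by token during generation.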
|
Loading…
Add table
Add a link
Reference in a new issue