Add cache_8bit option

oobabooga 2023-11-02 11:23:04 -07:00
parent 42f816312d
commit c0655475ae
7 changed files with 32 additions and 5 deletions

@@ -337,6 +337,7 @@ Optionally, you can use the following command-line flags:
|`--max_seq_len MAX_SEQ_LEN` | Maximum sequence length. |
|`--cfg-cache` | ExLlama_HF: Create an additional cache for CFG negative prompts. Necessary to use CFG with that loader, but not necessary for CFG with base ExLlama. |
|`--no_flash_attn` | Force flash-attention to not be used. |
|`--cache_8bit` | Use 8-bit cache to save VRAM. |
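
As a rough illustration of why an 8-bit cache saves VRAM, the sketch below estimates key/value cache size from the model shape. It assumes the default cache stores 16-bit values, uses a Llama-2-7B-like shape chosen only for illustration, and ignores any per-block scaling metadata a real 8-bit cache may add; `kv_cache_bytes` is a hypothetical helper, not part of this repository.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_element, batch_size=1):
    """Estimate KV cache size in bytes (hypothetical helper for illustration)."""
    # The leading factor 2 accounts for storing both keys and values per layer.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * bytes_per_element * batch_size)


# Assumed Llama-2-7B-like shape; real models and loaders may differ.
shape = dict(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)

fp16_bytes = kv_cache_bytes(**shape, bytes_per_element=2)  # assumed 16-bit default
int8_bytes = kv_cache_bytes(**shape, bytes_per_element=1)  # 8-bit cache

gib = 1024 ** 3
print(f"16-bit cache: {fp16_bytes / gib:.2f} GiB")
print(f" 8-bit cache: {int8_bytes / gib:.2f} GiB")
```

Under these assumptions the cache shrinks by half, and the saving grows linearly with `--max_seq_len`.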
#### AutoGPTQ