AutoGPTQ: Add --disable_exllamav2 flag (Mixtral CPU offloading needs this)
parent 7de10f4c8e
commit 3bbf6c601d
7 changed files with 16 additions and 4 deletions
@@ -285,6 +285,7 @@ List of command-line flags
 | `--no_use_cuda_fp16` | This can make models faster on some systems. |
 | `--desc_act` | For models that don't have a quantize_config.json, this parameter is used to define whether to set desc_act or not in BaseQuantizeConfig. |
 | `--disable_exllama` | Disable ExLlama kernel, which can improve inference speed on some systems. |
+| `--disable_exllamav2` | Disable ExLlamav2 kernel. |

 #### GPTQ-for-LLaMa
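Per the commit message, the new flag is needed when offloading a Mixtral GPTQ model partially to CPU. A hypothetical launch command might look like the following; the `server.py` entry point and the `--model`, `--loader`, and `--gpu-memory` flags are assumed from the web UI's usual interface, and the model name and memory split are placeholders, not from this commit:

```sh
# Sketch: load a GPTQ model with AutoGPTQ, with both ExLlama kernels
# disabled so layers beyond the GPU memory limit can be offloaded to CPU.
python server.py \
  --model your-mixtral-gptq-model \
  --loader autogptq \
  --disable_exllama \
  --disable_exllamav2 \
  --gpu-memory 20
```

Without `--disable_exllamav2`, the ExLlamav2 kernel would be selected by default and CPU offloading would not work for this setup.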
|