Add --num_experts_per_token parameter (ExLlamav2) (#4955)
Parent: 12690d3ffc
Commit: f1f2c4c3f4

7 changed files with 28 additions and 20 deletions
@@ -274,6 +274,7 @@ List of command-line flags

 |`--cfg-cache` | ExLlama_HF: Create an additional cache for CFG negative prompts. Necessary to use CFG with that loader, but not necessary for CFG with base ExLlama. |
 |`--no_flash_attn` | Force flash-attention to not be used. |
 |`--cache_8bit` | Use 8-bit cache to save VRAM. |
+|`--num_experts_per_token NUM_EXPERTS_PER_TOKEN` | Number of experts to use for generation. Applies to MoE models like Mixtral. |

#### AutoGPTQ
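The new flag takes an integer argument, as the `NUM_EXPERTS_PER_TOKEN` metavar in the table row suggests. A minimal sketch of how such a flag could be declared with `argparse` (the parser setup and default value here are assumptions for illustration, not the project's actual code):

```python
import argparse

# Hypothetical parser; only the flag name and help text come from the docs diff.
parser = argparse.ArgumentParser(description='launcher sketch')
parser.add_argument(
    '--num_experts_per_token',
    type=int,
    default=2,  # assumed default for illustration
    help='Number of experts to use for generation. Applies to MoE models like Mixtral.',
)

# Example invocation: override the number of active experts per token.
args = parser.parse_args(['--num_experts_per_token', '4'])
print(args.num_experts_per_token)  # 4
```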