Remove flexgen 2
parent 75c2dd38cf
commit 77d2e9f060
4 changed files with 1 additions and 16 deletions
README.md: 10 lines changed
@@ -178,7 +178,7 @@ Optionally, you can use the following command-line flags:
 
 | Flag | Description |
 |--------------------------------------------|-------------|
-| `--loader LOADER` | Choose the model loader manually, otherwise, it will get autodetected. Valid options: transformers, autogptq, gptq-for-llama, exllama, exllama_hf, llamacpp, rwkv, flexgen |
+| `--loader LOADER` | Choose the model loader manually, otherwise, it will get autodetected. Valid options: transformers, autogptq, gptq-for-llama, exllama, exllama_hf, llamacpp, rwkv |
 
 #### Accelerate/transformers
 
@@ -255,14 +255,6 @@ Optionally, you can use the following command-line flags:
 | `--warmup_autotune` | (triton) Enable warmup autotune. |
 | `--fused_mlp` | (triton) Enable fused mlp. |
 
-#### FlexGen
-
-| Flag | Description |
-|------------------|-------------|
-| `--percent PERCENT [PERCENT ...]` | FlexGen: allocation percentages. Must be 6 numbers separated by spaces (default: 0, 100, 100, 0, 100, 0). |
-| `--compress-weight` | FlexGen: Whether to compress weight (default: False).|
-| `--pin-weight [PIN_WEIGHT]` | FlexGen: whether to pin weights (setting this to False reduces CPU memory by 20%). |
-
 #### DeepSpeed
 
 | Flag | Description |