Reorganize model loading UI completely (#2720)

This commit is contained in:
oobabooga 2023-06-16 19:00:37 -03:00 committed by GitHub
parent 57be2eecdf
commit 7ef6a50e84
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
16 changed files with 365 additions and 243 deletions

View file

@ -211,6 +211,12 @@ Optionally, you can use the following command-line flags:
| `--extensions EXTENSIONS [EXTENSIONS ...]` | The list of extensions to load. If you want to load more than one extension, write the names separated by spaces. |
| `--verbose` | Print the prompts to the terminal. |
#### Model loader
| Flag | Description |
|--------------------------------------------|-------------|
| `--loader LOADER` | Choose the model loader manually, otherwise, it will get autodetected. Valid options: autogptq, gptq-for-llama, transformers, llamacpp, rwkv, flexgen |
#### Accelerate/transformers
| Flag | Description |
@ -265,7 +271,6 @@ Optionally, you can use the following command-line flags:
| Flag | Description |
|---------------------------|-------------|
| `--gptq-for-llama` | Use GPTQ-for-LLaMa to load the GPTQ model instead of AutoGPTQ. |
| `--wbits WBITS` | Load a pre-quantized model with specified precision in bits. 2, 3, 4 and 8 are supported. |
| `--model_type MODEL_TYPE` | Model type of pre-quantized model. Currently LLaMA, OPT, and GPT-J are supported. |
| `--groupsize GROUPSIZE` | Group size. |
@ -280,7 +285,6 @@ Optionally, you can use the following command-line flags:
| Flag | Description |
|------------------|-------------|
| `--flexgen` | Enable the use of FlexGen offloading. |
| `--percent PERCENT [PERCENT ...]` | FlexGen: allocation percentages. Must be 6 numbers separated by spaces (default: 0, 100, 100, 0, 100, 0). |
| `--compress-weight` | FlexGen: Whether to compress weight (default: False).|
| `--pin-weight [PIN_WEIGHT]` | FlexGen: whether to pin weights (setting this to False reduces CPU memory by 20%). |