Remove GGML support
parent cc7b7ba153
commit ed86878f02
15 changed files with 24 additions and 123 deletions
@@ -158,7 +158,7 @@ text-generation-webui
 │ │ └── tokenizer.model
 ```
 
-* GGML/GGUF models are a single file and should be placed directly into `models`. Example:
+* GGUF models are a single file and should be placed directly into `models`. Example:
 
 ```
 text-generation-webui
@@ -260,7 +260,7 @@ Optionally, you can use the following command-line flags:
 | `--quant_type QUANT_TYPE` | quant_type for 4-bit. Valid options: nf4, fp4. |
 | `--use_double_quant` | use_double_quant for 4-bit. |
 
-#### GGML/GGUF (for llama.cpp and ctransformers)
+#### GGUF (for llama.cpp and ctransformers)
 
 | Flag | Description |
 |-------------|-------------|
@@ -279,8 +279,6 @@ Optionally, you can use the following command-line flags:
 | `--cache-capacity CACHE_CAPACITY` | Maximum cache capacity. Examples: 2000MiB, 2GiB. When provided without units, bytes will be assumed. |
 | `--tensor_split TENSOR_SPLIT` | Split the model across multiple GPUs, comma-separated list of proportions, e.g. 18,17 |
 | `--llama_cpp_seed SEED` | Seed for llama-cpp models. Default 0 (random). |
-| `--n_gqa N_GQA` | GGML only (not used by GGUF): Grouped-Query Attention. Must be 8 for llama-2 70b. |
-| `--rms_norm_eps RMS_NORM_EPS` | GGML only (not used by GGUF): 5e-6 is a good value for llama-2 models. |
 | `--cpu` | Use the CPU version of llama-cpp-python instead of the GPU-accelerated version. |
 |`--cfg-cache` | llamacpp_HF: Create an additional cache for CFG negative prompts. |
 
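Note on the change above: with GGML loading removed, it may help to check which single-file models in `models` are still in the old format. The following is a minimal sketch and is not part of this commit. It relies only on the fact that GGUF files begin with the four ASCII bytes `GGUF`. The helper names `is_gguf` and `find_non_gguf`, and the choice to inspect only `.gguf` and `.bin` files, are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# GGUF files start with these four ASCII bytes.
GGUF_MAGIC = b"GGUF"


def is_gguf(path: Path) -> bool:
    """Return True if the file begins with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC


def find_non_gguf(models_dir: Path) -> List[Path]:
    """List single-file models in models_dir that do not look like GGUF.

    Assumption (hypothetical filter): only files ending in .gguf or .bin
    are treated as candidate llama.cpp models; everything else is skipped.
    """
    candidates = [
        p
        for p in sorted(models_dir.iterdir())
        if p.is_file() and p.suffix.lower() in {".gguf", ".bin"}
    ]
    return [p for p in candidates if not is_gguf(p)]


if __name__ == "__main__":
    models = Path("models")
    if models.is_dir():
        for old_file in find_non_gguf(models):
            print(f"Not a GGUF file (may need conversion): {old_file}")
```

Run from the `text-generation-webui` directory, this prints any candidate file whose header is not GGUF. It does not identify which older format a file uses.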