Bump llama-cpp-python
commit 6170b5ba31 (parent 3e7c624f8e)
7 changed files with 21 additions and 2 deletions
@@ -262,6 +262,7 @@ Optionally, you can use the following command-line flags:
| `--no-mmap` | Prevent mmap from being used. |
| `--mlock` | Force the system to keep the model in RAM. |
| `--cache-capacity CACHE_CAPACITY` | Maximum cache capacity. Examples: 2000MiB, 2GiB. When provided without units, bytes will be assumed. |
| `--tensor_split TENSOR_SPLIT` | Split the model across multiple GPUs, comma-separated list of proportions, e.g. 18,17 |
| `--llama_cpp_seed SEED` | Seed for llama-cpp models. Default 0 (random). |
| `--n_gqa N_GQA` | Grouped-query attention. Must be 8 for llama-2 70b. |
| `--rms_norm_eps RMS_NORM_EPS` | 5e-6 is a good value for llama-2 models. |
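Since this commit bumps llama-cpp-python, these flags presumably map onto arguments of the library's `Llama` constructor. A minimal sketch of that mapping, assuming the bumped llama-cpp-python version exposes `n_gqa` and `rms_norm_eps` as constructor arguments (the model path and values below are illustrative, not taken from this commit):

```python
from llama_cpp import Llama

# Illustrative only: the model path and parameter values are placeholders.
llm = Llama(
    model_path="models/llama-2-70b.ggmlv3.q4_K_M.bin",  # hypothetical GGML file
    use_mmap=False,         # --no-mmap
    use_mlock=True,         # --mlock
    tensor_split=[18, 17],  # --tensor_split 18,17
    seed=0,                 # --llama_cpp_seed 0
    n_gqa=8,                # --n_gqa 8 (llama-2 70b)
    rms_norm_eps=5e-6,      # --rms_norm_eps 5e-6
)

out = llm("Hello, my name is", max_tokens=16)
print(out["choices"][0]["text"])
```

`--cache-capacity` does not appear to be a constructor argument, so it is left out of this sketch.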