Bump llama-cpp-python to 0.2.18 (#4611)

oobabooga 2023-11-16 22:55:14 -03:00 committed by GitHub
parent 61f429563e
commit 923c8e25fb
17 changed files with 92 additions and 174 deletions

@@ -325,7 +325,6 @@ Optionally, you can use the following command-line flags:
| `--mlock` | Force the system to keep the model in RAM. |
| `--n-gpu-layers N_GPU_LAYERS` | Number of layers to offload to the GPU. |
| `--tensor_split TENSOR_SPLIT` | Split the model across multiple GPUs. Comma-separated list of proportions. Example: 18,17. |
| `--llama_cpp_seed SEED` | Seed for llama-cpp models. Default is 0 (random). |
| `--numa` | Activate NUMA task allocation for llama.cpp. |
| `--logits_all`| Needs to be set for perplexity evaluation to work. Otherwise, ignore it, as it makes prompt processing slower. |
| `--cache-capacity CACHE_CAPACITY` | Maximum cache capacity (llama-cpp-python). Examples: 2000MiB, 2GiB. When provided without units, bytes will be assumed. |
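The `--cache-capacity` row accepts either a unit-suffixed size (`2000MiB`, `2GiB`) or a bare number of bytes. The sketch below shows one way such a value could be converted to bytes. `parse_cache_capacity` is a hypothetical helper, not the web UI's actual parser, and it assumes that only binary units (KiB/MiB/GiB) need to be supported.

```python
import re

# Binary unit multipliers. Assumption: only KiB/MiB/GiB or a bare byte count are accepted.
_UNITS = {"": 1, "KIB": 1024, "MIB": 1024**2, "GIB": 1024**3}


def parse_cache_capacity(value: str) -> int:
    """Convert a cache-capacity string such as '2000MiB' or '2GiB' to bytes.

    A bare number is interpreted as bytes, as the flag description states.
    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]iB)?\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized cache capacity: {value!r}")
    number, unit = match.groups()
    # A missing unit (None) maps to the empty key, i.e. plain bytes.
    return int(number) * _UNITS[(unit or "").upper()]
```

For example, `parse_cache_capacity("2GiB")` yields `2147483648`, and `parse_cache_capacity("4096")` yields `4096` because no unit means bytes.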
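The `--tensor_split` row describes its value as a comma-separated list of proportions, such as `18,17`. Assuming these are relative weights (which is how the word "proportions" reads), the sketch below normalizes them to show what share of the model each GPU would receive. `parse_tensor_split` is a hypothetical helper and does not reproduce how llama.cpp distributes layers internally.

```python
from __future__ import annotations


def parse_tensor_split(value: str) -> list[float]:
    """Turn a tensor-split string like '18,17' into per-GPU fractions that sum to 1.

    Hypothetical helper: it treats each entry as a relative weight and
    normalizes by the total, purely to illustrate the split.
    """
    parts = [float(p) for p in value.split(",") if p.strip()]
    if not parts or any(p < 0 for p in parts):
        raise ValueError(f"invalid tensor split: {value!r}")
    total = sum(parts)
    if total == 0:
        raise ValueError("tensor split proportions must not all be zero")
    # Each GPU's share is its weight divided by the sum of all weights.
    return [p / total for p in parts]
```

With `18,17`, the first GPU would receive 18/35 (about 51%) of the model and the second 17/35 (about 49%).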