Revert "Bump llama-cpp-python to 0.2.18 (#4611)"

This reverts commit 923c8e25fb.
2023-11-17 05:14:25 -08:00 · 9d6f79db74
commit 9d6f79db74
parent e0a7cc5e0f
17 changed files with 174 additions and 92 deletions
--- a/README.md
+++ b/README.md
@@ -325,6 +325,7 @@ Optionally, you can use the following command-line flags:
 | `--mlock`     | Force the system to keep the model in RAM. |
 | `--n-gpu-layers N_GPU_LAYERS` | Number of layers to offload to the GPU. |
 | `--tensor_split TENSOR_SPLIT`       | Split the model across multiple GPUs. Comma-separated list of proportions. Example: 18,17. |
+| `--llama_cpp_seed SEED`             | Seed for llama-cpp models. Default is 0 (random). |
 | `--numa`      | Activate NUMA task allocation for llama.cpp. |
 | `--logits_all`| Needs to be set for perplexity evaluation to work. Otherwise, ignore it, as it makes prompt processing slower. |
 | `--cache-capacity CACHE_CAPACITY`   | Maximum cache capacity (llama-cpp-python). Examples: 2000MiB, 2GiB. When provided without units, bytes will be assumed. |