Add threads_batch parameter

2023-10-01 21:27:04 -07:00 · b6fe6acf88
commit b6fe6acf88
parent 41a2de96e5
7 changed files with 8 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -287,6 +287,7 @@ Optionally, you can use the following command-line flags:
 | Flag        | Description |
 |-------------|-------------|
 | `--threads` | Number of threads to use. |
+| `--threads-batch THREADS_BATCH` | Number of threads to use for batches/prompt processing. |
 | `--n_batch` | Maximum number of prompt tokens to batch together when calling llama_eval. |
 | `--n-gpu-layers N_GPU_LAYERS` | Number of layers to offload to the GPU. Only works if llama-cpp-python was compiled with BLAS. Set this to 1000000000 to offload all layers to the GPU. |
 | `--n_ctx N_CTX` | Size of the prompt context. |