added no_mmap & mlock parameters to llama.cpp and removed llamacpp_model_alternative (#1649)
---------
Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
parent 2f1a2846d1
commit fbcd32988e
5 changed files with 50 additions and 126 deletions
@@ -220,8 +220,10 @@ Optionally, you can use the following command-line flags:

 | Flag | Description |
 |-------------|-------------|
-| `--threads` | Number of threads to use in llama.cpp. |
-| `--n_batch` | Processing batch size for llama.cpp. |
+| `--threads` | Number of threads to use. |
+| `--n_batch` | Maximum number of prompt tokens to batch together when calling llama_eval. |
+| `--no-mmap` | Prevent mmap from being used. |
+| `--mlock` | Force the system to keep the model in RAM. |

 #### GPTQ
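
The four flags in the table above could plausibly be turned into loader settings along the following lines. This is a minimal sketch, not the code from this commit: the helper names `build_parser` and `llama_kwargs` are hypothetical, the defaults (`0` threads, batch size `512`) are assumptions, and the keyword names `n_threads`, `n_batch`, `use_mmap`, and `use_mlock` are assumed to match what the llama-cpp-python `Llama` constructor accepts. Note that `--no-mmap` is a negative flag, so it is inverted before being passed on.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the four llama.cpp flags from the table."""
    parser = argparse.ArgumentParser()
    # Defaults below are assumptions made for this sketch.
    parser.add_argument("--threads", type=int, default=0,
                        help="Number of threads to use.")
    parser.add_argument("--n_batch", type=int, default=512,
                        help="Maximum number of prompt tokens to batch together.")
    parser.add_argument("--no-mmap", action="store_true",
                        help="Prevent mmap from being used.")
    parser.add_argument("--mlock", action="store_true",
                        help="Force the system to keep the model in RAM.")
    return parser


def llama_kwargs(args: argparse.Namespace) -> dict:
    """Map parsed flags to keyword arguments (names assumed, see lead-in)."""
    return {
        # 0 means "not set"; pass None so the backend can pick its own default.
        "n_threads": args.threads or None,
        "n_batch": args.n_batch,
        # --no-mmap disables memory mapping, so the value is inverted.
        "use_mmap": not args.no_mmap,
        "use_mlock": args.mlock,
    }


if __name__ == "__main__":
    parsed = build_parser().parse_args(["--threads", "8", "--no-mmap"])
    print(llama_kwargs(parsed))
```

The resulting dictionary could then be unpacked into the model constructor together with a model path, assuming the installed backend accepts those keyword names.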