added no_mmap & mlock parameters to llama.cpp and removed llamacpp_model_alternative (#1649)

---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
commit fbcd32988e
parent 2f1a2846d1
Author: Ahmed Said
Date:   2023-05-03 00:25:28 +03:00 (committed by GitHub)
5 changed files with 50 additions and 126 deletions


@@ -220,8 +220,10 @@ Optionally, you can use the following command-line flags:
 | Flag | Description |
 |-------------|-------------|
-| `--threads` | Number of threads to use in llama.cpp. |
-| `--n_batch` | Processing batch size for llama.cpp. |
+| `--threads` | Number of threads to use. |
+| `--n_batch` | Maximum number of prompt tokens to batch together when calling llama_eval. |
+| `--no-mmap` | Prevent mmap from being used. |
+| `--mlock` | Force the system to keep the model in RAM. |
 
 #### GPTQ
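
As context for the new flags, here is a minimal sketch of the settings they control, written against llama-cpp-python's Llama constructor (the binding this project loads llama.cpp models through). The model path and concrete values are hypothetical, and the exact way the webui forwards its CLI flags to the constructor is assumed for illustration:

from llama_cpp import Llama

# Hedged sketch: these constructor arguments exist in llama-cpp-python,
# but the flag-to-argument mapping shown here is an assumption.
llm = Llama(
    model_path="models/ggml-model.bin",  # hypothetical model file
    n_threads=8,      # --threads: number of threads to use
    n_batch=512,      # --n_batch: max prompt tokens batched per llama_eval call
    use_mmap=False,   # --no-mmap: read the file up front instead of memory-mapping it
    use_mlock=True,   # --mlock: lock the loaded weights in RAM so they are not swapped out
)

The two new flags trade memory flexibility for predictability: mmap lets the OS page weights in lazily and evict them under memory pressure, while mlock pins them so inference never stalls on swapped-out pages, at the cost of permanently reserving that RAM.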