added no_mmap & mlock parameters to llama.cpp and removed llamacpp_model_alternative (#1649)

---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
commit fbcd32988e
parent 2f1a2846d1
Author: Ahmed Said
Date:   2023-05-03 00:25:28 +03:00 (committed by GitHub)
5 changed files with 50 additions and 126 deletions


@@ -220,8 +220,10 @@ Optionally, you can use the following command-line flags:
 | Flag | Description |
 |-------------|-------------|
-| `--threads` | Number of threads to use in llama.cpp. |
-| `--n_batch` | Processing batch size for llama.cpp. |
+| `--threads` | Number of threads to use. |
+| `--n_batch` | Maximum number of prompt tokens to batch together when calling llama_eval. |
+| `--no-mmap` | Prevent mmap from being used. |
+| `--mlock` | Force the system to keep the model in RAM. |
 
 #### GPTQ
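
As context for the new flags, here is a minimal sketch of the settings they control, written against llama-cpp-python's Llama constructor (the binding this project loads llama.cpp models through). The model path and concrete values are hypothetical, and the exact way the webui forwards its CLI flags to the constructor is assumed for illustration:

from llama_cpp import Llama

# Hedged sketch: these constructor arguments exist in llama-cpp-python,
# but the flag-to-argument mapping shown here is an assumption.
llm = Llama(
    model_path="models/ggml-model.bin",  # hypothetical model file
    n_threads=8,      # --threads: number of threads to use
    n_batch=512,      # --n_batch: max prompt tokens batched per llama_eval call
    use_mmap=False,   # --no-mmap: read the file up front instead of memory-mapping it
    use_mlock=True,   # --mlock: lock the loaded weights in RAM so they are not swapped out
)

The two new flags trade memory flexibility for predictability: mmap lets the OS page weights in lazily and evict them under memory pressure, while mlock pins them so inference never stalls on swapped-out pages, at the cost of permanently reserving that RAM.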