Revert "Remove GPTQ-for-LLaMa monkey patch support"

This reverts commit e3d3565b2a.
This commit is contained in:
oobabooga 2023-08-10 08:39:41 -07:00
parent 16e2b117b4
commit c7f52bbdc1
6 changed files with 103 additions and 0 deletions

View file

@ -157,4 +157,31 @@ Output generated in 123.79 seconds (1.61 tokens/s, 199 tokens)
You can also use multiple GPUs with `pre_layer` if using the oobabooga fork of GPTQ, eg `--pre_layer 30 60` will load a LLaMA-30B model half onto your first GPU and half onto your second, or `--pre_layer 20 40` will load 20 layers onto GPU-0, 20 layers onto GPU-1, and 20 layers offloaded to CPU.
### Using LoRAs with GPTQ-for-LLaMa
This requires using a monkey patch that is supported by this web UI: https://github.com/johnsmith0031/alpaca_lora_4bit
To use it:
1. Clone `johnsmith0031/alpaca_lora_4bit` into the repositories folder:
```
cd text-generation-webui/repositories
git clone https://github.com/johnsmith0031/alpaca_lora_4bit
```
⚠️ I have tested it with the following commit specifically: `2f704b93c961bf202937b10aac9322b092afdce0`
2. Install https://github.com/sterlind/GPTQ-for-LLaMa with this command:
```
pip install git+https://github.com/sterlind/GPTQ-for-LLaMa.git@lora_4bit
```
3. Start the UI with the `--monkey-patch` flag:
```
python server.py --model llama-7b-4bit-128g --listen --lora tloen_alpaca-lora-7b --monkey-patch
```