Remove GPTQ-for-LLaMa monkey patch support
AutoGPTQ will be the preferred GPTQ LoRA loader in the future.
parent bee73cedbd
commit e3d3565b2a
6 changed files with 0 additions and 103 deletions
@@ -198,31 +198,4 @@ Output generated in 123.79 seconds (1.61 tokens/s, 199 tokens)
You can also use multiple GPUs with `pre_layer` if using the oobabooga fork of GPTQ, e.g. `--pre_layer 30 60` will load a LLaMA-30B model half onto your first GPU and half onto your second, while `--pre_layer 20 40` will load 20 layers onto GPU-0, 20 layers onto GPU-1, and offload the remaining 20 layers to the CPU.
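For illustration, a minimal sketch of a two-GPU launch; the model folder name `llama-30b-4bit-128g` is a placeholder, so substitute the name of your own model directory:

```
python server.py --model llama-30b-4bit-128g --pre_layer 30 60
```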
### Using LoRAs with GPTQ-for-LLaMa
This requires using a monkey patch that is supported by this web UI: https://github.com/johnsmith0031/alpaca_lora_4bit
To use it:
1. Clone `johnsmith0031/alpaca_lora_4bit` into the repositories folder:
```
cd text-generation-webui/repositories
git clone https://github.com/johnsmith0031/alpaca_lora_4bit
```
⚠️ I have tested it with the following commit specifically: `2f704b93c961bf202937b10aac9322b092afdce0`
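If you want to pin your clone to that exact commit, a minimal sketch, run from the `repositories` folder used in step 1:

```
cd alpaca_lora_4bit
git checkout 2f704b93c961bf202937b10aac9322b092afdce0
```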
2. Install https://github.com/sterlind/GPTQ-for-LLaMa with this command:
```
pip install git+https://github.com/sterlind/GPTQ-for-LLaMa.git@lora_4bit
```
3. Start the UI with the `--monkey-patch` flag:
```
python server.py --model llama-7b-4bit-128g --listen --lora tloen_alpaca-lora-7b --monkey-patch
```