Remove GPTQ-for-LLaMa monkey patch support
AutoGPTQ will be the preferred GPTQ LoRA loader in the future.
parent bee73cedbd
commit e3d3565b2a
6 changed files with 0 additions and 103 deletions
@@ -198,31 +198,4 @@ Output generated in 123.79 seconds (1.61 tokens/s, 199 tokens)
You can also use multiple GPUs with `pre_layer` if using the oobabooga fork of GPTQ, e.g. `--pre_layer 30 60` will load a LLaMA-30B model half onto your first GPU and half onto your second, while `--pre_layer 20 40` will load 20 layers onto GPU-0, 20 layers onto GPU-1, and offload the remaining 20 layers to the CPU.
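For illustration, a minimal sketch of a two-GPU launch; the model folder name `llama-30b-4bit-128g` is a placeholder, so substitute the name of your own model directory:

```
python server.py --model llama-30b-4bit-128g --pre_layer 30 60
```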
### Using LoRAs with GPTQ-for-LLaMa
This requires using a monkey patch that is supported by this web UI: https://github.com/johnsmith0031/alpaca_lora_4bit
To use it:
1. Clone `johnsmith0031/alpaca_lora_4bit` into the repositories folder:
```
cd text-generation-webui/repositories
git clone https://github.com/johnsmith0031/alpaca_lora_4bit
```
⚠️ I have tested it with the following commit specifically: `2f704b93c961bf202937b10aac9322b092afdce0`
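If you want to pin your clone to that exact commit, a minimal sketch, run from the `repositories` folder used in step 1:

```
cd alpaca_lora_4bit
git checkout 2f704b93c961bf202937b10aac9322b092afdce0
```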
2. Install https://github.com/sterlind/GPTQ-for-LLaMa with this command:
```
pip install git+https://github.com/sterlind/GPTQ-for-LLaMa.git@lora_4bit
```
3. Start the UI with the `--monkey-patch` flag:
```
python server.py --model llama-7b-4bit-128g --listen --lora tloen_alpaca-lora-7b --monkey-patch
```