Add docs for performance optimizations

2023-06-09 00:45:49 -03:00 · c333e4c906
commit c333e4c906
parent c6552785af
3 changed files with 71 additions and 0 deletions
--- a/docs/Performance-optimizations.md
+++ b/docs/Performance-optimizations.md
@@ -0,0 +1,48 @@
+# Performance optimizations
+
+To get the best performance from your hardware, you can try compiling the following three backends manually instead of relying on the pre-compiled binaries that are installed through `requirements.txt`:
+
+* AutoGPTQ (the default GPTQ loader)
+* GPTQ-for-LLaMa (secondary GPTQ loader)
+* llama-cpp-python
+
+If you go this route, update the Python requirements for the webui in the future with
+
+```
+pip install -r requirements-minimal.txt --upgrade
+```
+
+and then install the up-to-date backends using the commands below. The file `requirements-minimal.txt` contains all the requirements except for the pre-compiled wheels for GPTQ and llama-cpp-python.
+
+## AutoGPTQ
+
+```
+conda activate textgen
+pip uninstall auto-gptq -y
+git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
+pip install .
+```
+
+## GPTQ-for-LLaMa
+
+```
+conda activate textgen
+pip uninstall quant-cuda -y
+cd text-generation-webui/repositories
+rm -r GPTQ-for-LLaMa
+git clone https://github.com/oobabooga/GPTQ-for-LLaMa
+cd GPTQ-for-LLaMa
+python setup_cuda.py install
+```
+
+## llama-cpp-python
+
+If you do not have a GPU:
+
+```
+conda activate textgen
+pip uninstall -y llama-cpp-python
+pip install llama-cpp-python
+```
+
+If you have a GPU, use the commands here instead: [llama.cpp-models.md#gpu-acceleration](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-acceleration)