ExLlama
About
ExLlama is an extremely optimized GPTQ backend ("loader") for LLaMA models. It features much lower VRAM usage and much higher speeds due to not relying on unoptimized transformers code.
Installation:
- Clone the ExLlama repository into your `text-generation-webui/repositories` folder:

```
mkdir repositories
cd repositories
git clone https://github.com/turboderp/exllama
```
- Follow the remaining setup instructions in the official README: https://github.com/turboderp/exllama#exllama
- Configure text-generation-webui to use ExLlama via the UI or the command line:
  - In the "Model" tab, set "Loader" to "exllama"
  - Specify `--loader exllama` on the command line
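Putting the steps above together, a typical session might look like the following sketch. It assumes the webui is installed in a `text-generation-webui` folder and that a GPTQ model has already been downloaded into `models/`; the model name is a placeholder:

```
cd text-generation-webui
python server.py --loader exllama --model your-model-GPTQ
```

Selecting "exllama" in the "Model" tab achieves the same thing without any flags.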