Remove --sdp-attention, --xformers flags (#5126)
parent: b7dd1f9542
commit: 8e397915c9
4 changed files with 1 addition and 180 deletions
@@ -231,8 +231,6 @@ List of command-line flags
 | `--load-in-8bit` | Load the model with 8-bit precision (using bitsandbytes). |
 | `--bf16` | Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU. |
 | `--no-cache` | Set `use_cache` to `False` while generating text. This reduces VRAM usage slightly, but it comes at a performance cost. |
-| `--xformers` | Use xformer's memory efficient attention. This is really old and probably doesn't do anything. |
-| `--sdp-attention` | Use PyTorch 2.0's SDP attention. Same as above. |
 | `--trust-remote-code` | Set `trust_remote_code=True` while loading the model. Necessary for some models. |
 | `--no_use_fast` | Set `use_fast=False` while loading the tokenizer (it's `True` by default). Use this if you have any problems related to `use_fast`. |
 | `--use_flash_attention_2` | Set `use_flash_attention_2=True` while loading the model. |
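The effect of the removal is that `--xformers` and `--sdp-attention` are no longer recognized switches, while the surrounding flags continue to parse as boolean store-true options. A minimal argparse sketch (hypothetical, not the project's actual parser) illustrates the post-commit behavior:

```python
import argparse

# Hypothetical sketch of the flags table after this commit: the surviving
# attention/precision flags are defined; --xformers and --sdp-attention
# are not, so passing them is an error.
parser = argparse.ArgumentParser(prog="server")
parser.add_argument("--load-in-8bit", action="store_true")
parser.add_argument("--bf16", action="store_true")
parser.add_argument("--no-cache", action="store_true")
parser.add_argument("--trust-remote-code", action="store_true")
parser.add_argument("--no_use_fast", action="store_true")
parser.add_argument("--use_flash_attention_2", action="store_true")

# Surviving flags still parse normally.
args = parser.parse_args(["--bf16", "--use_flash_attention_2"])
print(args.bf16, args.use_flash_attention_2)

# A removed flag is now rejected (argparse exits with an error).
try:
    parser.parse_args(["--xformers"])
    removed_flag_rejected = False
except SystemExit:
    removed_flag_rejected = True
print(removed_flag_rejected)
```

Note that argparse converts dashes in long options to underscores on the namespace, so `--load-in-8bit` becomes `args.load_in_8bit`.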