Remove --sdp-attention, --xformers flags (#5126)

This commit is contained in:
oobabooga 2023-12-31 01:36:51 -03:00 committed by GitHub
parent b7dd1f9542
commit 8e397915c9
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 1 addition and 180 deletions


@@ -231,8 +231,6 @@ List of command-line flags
 | `--load-in-8bit` | Load the model with 8-bit precision (using bitsandbytes). |
 | `--bf16` | Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU. |
 | `--no-cache` | Set `use_cache` to `False` while generating text. This reduces VRAM usage slightly, but it comes at a performance cost. |
-| `--xformers` | Use xformer's memory efficient attention. This is really old and probably doesn't do anything. |
-| `--sdp-attention` | Use PyTorch 2.0's SDP attention. Same as above. |
 | `--trust-remote-code` | Set `trust_remote_code=True` while loading the model. Necessary for some models. |
 | `--no_use_fast` | Set use_fast=False while loading the tokenizer (it's True by default). Use this if you have any problems related to use_fast. |
 | `--use_flash_attention_2` | Set use_flash_attention_2=True while loading the model. |
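After this commit, launch commands that still pass the removed flags will fail at argument parsing, since the webui's CLI (built on argparse) rejects unrecognized arguments. A minimal before/after sketch, assuming the standard `server.py` entry point and an illustrative `--model` value:

```shell
# Before this commit (hypothetical invocation; flags were accepted but inert):
#   python server.py --model my-model --xformers --sdp-attention
#
# After this commit, the same command errors with something like
# "unrecognized arguments: --xformers --sdp-attention".
# Migration: simply drop the removed flags.
python server.py --model my-model
```

No replacement flag is needed: the remaining attention-related option in the table above, `--use_flash_attention_2`, is independent of the removed ones.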