Remove --sdp-attention, --xformers flags (#5126)
parent: b7dd1f9542
commit: 8e397915c9
4 changed files with 1 addition and 180 deletions
@@ -231,8 +231,6 @@ List of command-line flags
 | `--load-in-8bit` | Load the model with 8-bit precision (using bitsandbytes). |
 | `--bf16` | Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU. |
 | `--no-cache` | Set `use_cache` to `False` while generating text. This reduces VRAM usage slightly, but it comes at a performance cost. |
-| `--xformers` | Use xformer's memory efficient attention. This is really old and probably doesn't do anything. |
-| `--sdp-attention` | Use PyTorch 2.0's SDP attention. Same as above. |
 | `--trust-remote-code` | Set `trust_remote_code=True` while loading the model. Necessary for some models. |
 | `--no_use_fast` | Set `use_fast=False` while loading the tokenizer (it's `True` by default). Use this if you have any problems related to `use_fast`. |
 | `--use_flash_attention_2` | Set `use_flash_attention_2=True` while loading the model. |
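The effect of the removal is that `--xformers` and `--sdp-attention` are no longer recognized switches, while the surrounding flags continue to parse as boolean store-true options. A minimal argparse sketch (hypothetical, not the project's actual parser) illustrates the post-commit behavior:

```python
import argparse

# Hypothetical sketch of the flags table after this commit: the surviving
# attention/precision flags are defined; --xformers and --sdp-attention
# are not, so passing them is an error.
parser = argparse.ArgumentParser(prog="server")
parser.add_argument("--load-in-8bit", action="store_true")
parser.add_argument("--bf16", action="store_true")
parser.add_argument("--no-cache", action="store_true")
parser.add_argument("--trust-remote-code", action="store_true")
parser.add_argument("--no_use_fast", action="store_true")
parser.add_argument("--use_flash_attention_2", action="store_true")

# Surviving flags still parse normally.
args = parser.parse_args(["--bf16", "--use_flash_attention_2"])
print(args.bf16, args.use_flash_attention_2)

# A removed flag is now rejected (argparse exits with an error).
try:
    parser.parse_args(["--xformers"])
    removed_flag_rejected = False
except SystemExit:
    removed_flag_rejected = True
print(removed_flag_rejected)
```

Note that argparse converts dashes in long options to underscores on the namespace, so `--load-in-8bit` becomes `args.load_in_8bit`.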