transformers: add use_flash_attention_2 option (#4373)

feng lui 2023-11-05 00:59:33 +08:00 committed by GitHub
parent add359379e
commit 4766a57352
6 changed files with 9 additions and 1 deletion


@@ -300,6 +300,7 @@ Optionally, you can use the following command-line flags:
| `--sdp-attention` | Use PyTorch 2.0's SDP attention. Same as above. |
| `--trust-remote-code` | Set `trust_remote_code=True` while loading the model. Necessary for some models. |
| `--use_fast` | Set `use_fast=True` while loading the tokenizer. |
| `--use_flash_attention_2` | Set `use_flash_attention_2=True` while loading the model. |
#### Accelerate 4-bit
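
For reference, a minimal sketch of what the new flag does under the hood: it forwards `use_flash_attention_2=True` to `from_pretrained()` when loading the model with the transformers loader. The `load_model` helper below is illustrative only and is not the project's actual loader code from this commit.

```python
import torch
from transformers import AutoModelForCausalLM

def load_model(model_name: str, use_flash_attention_2: bool = False):
    # Base keyword arguments passed to transformers' from_pretrained().
    kwargs = {"torch_dtype": torch.bfloat16}
    if use_flash_attention_2:
        # Requires the flash-attn package and a supported GPU.
        kwargs["use_flash_attention_2"] = True
    return AutoModelForCausalLM.from_pretrained(model_name, **kwargs)

# Roughly equivalent to launching the web UI with:
#   python server.py --model <model_name> --use_flash_attention_2
```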