transformers: add use_flash_attention_2 option (#4373)
This commit is contained in:
parent add359379e
commit 4766a57352
6 changed files with 9 additions and 1 deletion
@@ -300,6 +300,7 @@ Optionally, you can use the following command-line flags:
| `--sdp-attention` | Use PyTorch 2.0's SDP attention. Same as above. |
| `--trust-remote-code` | Set `trust_remote_code=True` while loading the model. Necessary for some models. |
| `--use_fast` | Set `use_fast=True` while loading the tokenizer. |
| `--use_flash_attention_2` | Set `use_flash_attention_2=True` while loading the model. |

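As a rough sketch of what the new flag does under the hood: the command-line option is parsed and, when set, forwarded as a keyword argument to the model loader (ultimately `AutoModelForCausalLM.from_pretrained(...)` in transformers). The argparse wiring below is an illustrative assumption, not the project's actual parser code.

```python
# Hedged sketch: plumbing a --use_flash_attention_2 CLI flag into the
# kwargs passed to transformers' from_pretrained(). The flag name mirrors
# the table above; the script structure itself is an assumption.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--use_flash_attention_2",
    action="store_true",
    help="Set use_flash_attention_2=True while loading the model.",
)

# Example invocation with the flag enabled.
args = parser.parse_args(["--use_flash_attention_2"])

load_kwargs = {}
if args.use_flash_attention_2:
    # Forwarded as-is to AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)
    load_kwargs["use_flash_attention_2"] = True

print(load_kwargs)
```

Note that Flash Attention 2 requires a compatible GPU and the `flash-attn` package installed; without them, loading with this kwarg will fail.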
#### Accelerate 4-bit