Training_PRO extension - added target selector (#3969)

This commit is contained in:
FartyPants 2023-09-17 16:00:00 -04:00 committed by GitHub
parent d71465708c
commit 230b562d53
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 40 additions and 8 deletions

View file

@ -9,3 +9,19 @@ This is an expanded Training tab
- adding EOS to each block or to hard cut only
- automatically lowers gradient accumulation if you go overboard and set gradient accumulation that will be higher than actual data - transformers would then throw error (or they used to, not sure if still true) but in any way, it will fix bad data
- turn BOS on and OFF
- target selector
###Notes:
This uses it's own chunking code for raw text based on sentence splitting. This will avoid weird cuts in the chunks and each chunk should now start with sentence and end on some sentence. It works hand in hand with Hard Cut.
A propper use is to structure your text into logical blocks (ideas) separated by three \n then use three \n in hard cut.
This way each chunk will contain only one flow of ideas and not derail in the thoughts.
And Overlapping code will create overlapped blocks on sentence basis too, but not cross hard cut, thus not cross different ideas either.
Does it make any sense? No? Hmmmm...
###Targets
Normal LORA is q, v and that's what you should use.
You can use (q k v o) or (q k v) and it will give you a lot more trainable parameters. The benefit is that you can keep rank lower and still attain the same coherency as q v with high rank. Guanaco has been trained with QLORA and q k v o for example and they swear by it.
I also added k-v-down which is lifted from IA3, which is very odd one to use for LORA, but it created adorable style craziness when training on raw structured text and bringing the loss all the way down to 1.1 . It didn't overfit (q-v would be just writing entire novels at loss 1.1) and it followed the instruction seeping from the previous fine-tuning. YMMW of course.
Using All will train all 7 targets q-k-v-o-up,down, gate - not sure if there is much benefit from attention only qkvo. It sure makes LORA huge. If that's what you like.