PEFT LoRA training on multiple datasets in parallel #1061
Replies: 1 comment
Just to make sure: you want to train the different LoRA adapters all independently of each other, so not LoRA 5 on top of LoRA 4 on top of ... (incremental training)? If so, it would seem to me that the "naive" approach 2 should be both the easiest (less custom code, less debugging) and the most efficient. I would only go for approach 1 if either (1) the model doesn't fit on one GPU or (2) you want some kind of incremental training. However, there is no perfect answer that always applies, as it can depend on many factors. I would suggest following the general advice for training neural nets; I don't think the fact that LoRA is being used here changes the general wisdom.
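As a concrete illustration of the "naive" approach: launch N independent single-GPU training jobs, one per dataset, each pinned to its own GPU. This is a minimal sketch only; `train_lora.py` and its `--dataset`/`--output_dir` flags are hypothetical placeholders for whatever single-GPU training entry point you already have.

```python
# Minimal sketch of the "naive" approach: one independent single-GPU training
# job per dataset. Assumes an existing single-GPU script, here hypothetically
# called train_lora.py with --dataset / --output_dir flags.
import os
import subprocess

datasets = ["dolly-15k", "task2", "task3", "task4", "task5"]  # placeholder names

procs = []
for gpu_id, dataset in enumerate(datasets):
    cmd = [
        "python", "train_lora.py",
        "--dataset", dataset,
        "--output_dir", f"adapters/{dataset}",
    ]
    # Pin each job to its own GPU so the N runs don't interfere with each other.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu_id)}
    procs.append(subprocess.Popen(cmd, env=env))

# Block until all N runs have finished.
for p in procs:
    p.wait()
```

Since each run trains on a single GPU exactly as before, no changes to Trainer are needed for this approach.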
Hello, I want to find the most efficient way to train N different LoRA weight adapters separately on N different datasets / tasks. I have access to up to 8 A100 GPUs with 40GB VRAM each and want to optimize for speed.
Right now, training on just 1 dataset (dolly-15k) using PEFT LoRA on one GPU is going well; we are using around 80% of one GPU's memory.
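For reference, here is a minimal sketch of the kind of single-GPU PEFT LoRA run described above. The base model name, LoRA hyperparameters, and tokenization are assumptions, not the actual configuration used in this setup.

```python
# Minimal single-dataset LoRA fine-tuning sketch with PEFT + Trainer.
# Base model, LoRA hyperparameters, and dataset handling are assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the base model with a LoRA adapter; only the adapter weights are trained.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def tokenize(example):
    text = f"{example['instruction']}\n{example['response']}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapters/dolly-15k",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("adapters/dolly-15k")  # saves only the LoRA weights
```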

To train N different sets of LoRA adapters on N different datasets as efficiently as possible, optimizing for speed, there are a few different approaches I was thinking about:
For this, I am assuming N=5.
For both of these approaches, would I need to edit Trainer directly? How would you approach this? Thank you!
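On the Trainer question, one hedged sketch of how the per-adapter training can be structured with a stock Trainer and no modifications: attach a fresh LoRA adapter to the base model for each dataset, train, and save just the adapter. To use all GPUs, a script like this can be run once per GPU (as in the launcher sketch earlier in the thread). The dataset names and `prepare_dataset()` are hypothetical placeholders for your own data-loading code.

```python
# Sketch (assumption-heavy): train N independent LoRA adapters one after the
# other with an unmodified Trainer. prepare_dataset() and the dataset names
# are hypothetical placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

dataset_names = ["dolly-15k", "task2", "task3", "task4", "task5"]

for name in dataset_names:
    # Reload the base model so every adapter starts from the same pretrained weights.
    base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
    model = get_peft_model(base_model, lora_config)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"adapters/{name}",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=prepare_dataset(name),  # hypothetical tokenized-dataset helper
    )
    trainer.train()
    model.save_pretrained(f"adapters/{name}")  # saves only the adapter weights
```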