Sell the Titans and one 3090, and buy a 4090. Use the 4090 as the main driver and the remaining 3090 as additional memory.
-
Hey everyone, I need some help. I got my hands on two Titan RTX 24GB cards and two RTX 3090 24GB cards. As far as I know, the biggest differentiating factor between them is that the RTX 3090 runs FP16 tensor math with FP32 accumulate at half rate, which leaves it at roughly half the Titan RTX's throughput in that one area. In every other metric it wipes the floor with the Titan RTX.
The RTX 3090 is also Ampere-based, so it supports FlashAttention-2 (and therefore sample packing) as well as BFloat16, while on the Titan RTX I had to run xformers with no sample packing.
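The Ampere gating described above can be sketched as a small lookup. This is a minimal illustration, not a real driver query: the `gpu_features` helper and the `(8, 0)` threshold reflect the publicly stated requirement that FlashAttention-2 and bfloat16 need Ampere (compute capability 8.0) or newer, while xformers attention also runs on Turing.

```python
# Minimal sketch: map a CUDA compute capability tuple to the features
# discussed above. Titan RTX is Turing (7.5); RTX 3090 is Ampere (8.6).

def gpu_features(compute_capability):
    """Return which of the discussed features a GPU generation supports."""
    ampere_or_newer = compute_capability >= (8, 0)
    return {
        "flash_attention_2": ampere_or_newer,  # prerequisite for sample packing here
        "bfloat16": ampere_or_newer,
        "xformers": True,  # works on Turing as well
    }

print(gpu_features((7, 5)))  # Titan RTX
print(gpu_features((8, 6)))  # RTX 3090
```

On a live system the capability tuple would come from `torch.cuda.get_device_capability()`, but the mapping itself is independent of any GPU being present.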
In my testing, with this yaml configuration:
This would result in a 24-step training.
This results in these training times:
Titan RTX: 248 seconds
RTX 3090: 325 seconds
But if I enable sample packing on the RTX 3090, the whole run collapses into one step:
RTX 3090 sample packing on: 28 seconds
I understand this happens because the dataset is tiny, so it can be packed and finished in one step instead of the original 24. But is the Titan RTX inherently faster without this optimization? Is there a way to turn on sample packing with the Titan RTX? I am debating which of the cards to keep and which to sell. Thanks!
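The step-count collapse can be illustrated with a toy first-fit packer. The sample lengths and `max_len` below are made up for illustration (the actual dataset and sequence length aren't shown in this thread); the point is only that many short samples share one max-length sequence instead of each padding out its own.

```python
# Toy illustration of why sample packing shrinks the step count: several
# short samples are concatenated into a single sequence up to max_len,
# so far fewer sequences (and hence steps) are needed per epoch.

def greedy_pack(lengths, max_len):
    """First-fit-decreasing packing: returns lists of sample lengths per sequence."""
    bins = []
    for n in sorted(lengths, reverse=True):
        for b in bins:
            if sum(b) + n <= max_len:  # sample still fits in this sequence
                b.append(n)
                break
        else:
            bins.append([n])  # start a new packed sequence
    return bins

lengths = [80, 120, 60, 200, 90, 150, 70, 110]  # 8 hypothetical short samples
packed = greedy_pack(lengths, max_len=1024)
print(f"unpacked: {len(lengths)} sequences -> packed: {len(packed)} sequence(s)")
# -> unpacked: 8 sequences -> packed: 1 sequence(s)
```

With real data the ratio depends on how short the samples are relative to the context length, which is why a tiny dataset like this one can go from 24 steps to 1.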