Peter Cordes gave some suggestions on how to speed up the implementation in the comments to this StackOverflow question:
http://stackoverflow.com/questions/15198011/how-to-load-a-sliding-diagonal-vector-from-data-stored-column-wise-withsse
I tried replacing some of the blend instructions with faster ones in commit 4f2b2c8.
Now the different approaches need some benchmarks.
I ran a quick test that showed a 29% speed improvement, but while changing the blend instructions I also changed the byte target from 2load_HFsave8th to 1load. That speedup seems a bit too good to be true, so I may have made a mistake.