Try out different blend instructions for possible speedup #4

eriksjolund · 2016-11-19T10:54:06Z

Peter Cordes suggested some ways to speed up the implementation in the comments
on this StackOverflow question:

http://stackoverflow.com/questions/15198011/how-to-load-a-sliding-diagonal-vector-from-data-stored-column-wise-withsse

I tried replacing some blend instructions with faster blend instructions in this commit:

4f2b2c8
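
The exact instructions changed in that commit are not listed here. As a minimal sketch of this kind of swap, assume the blend mask is known at compile time and selects whole 16-bit lanes. Under that assumption a variable blend (`_mm_blendv_epi8`, PBLENDVB) can be replaced by an immediate blend (`_mm_blend_epi16`, PBLENDW), which decodes to fewer uops on several Intel microarchitectures. The function names and the mask are hypothetical and not taken from the repository.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <smmintrin.h> /* SSE4.1: _mm_blendv_epi8, _mm_blend_epi16 */

/* GCC/Clang extension: enable SSE4.1 for the annotated functions only. */
#define SSE41 __attribute__((target("sse4.1")))

/* Variable blend: the top bit of each mask byte selects b over a. */
SSE41 static __m128i blend_variable(__m128i a, __m128i b)
{
    /* Hypothetical mask: take the upper four 16-bit lanes from b. */
    const __m128i mask = _mm_set_epi16(-1, -1, -1, -1, 0, 0, 0, 0);
    return _mm_blendv_epi8(a, b, mask);
}

/* Immediate blend: bit i of the constant selects 16-bit lane i from b. */
SSE41 static __m128i blend_immediate(__m128i a, __m128i b)
{
    return _mm_blend_epi16(a, b, 0xF0);
}

/* Returns 1 if both variants produce identical bytes on a sample input. */
SSE41 int blends_agree(void)
{
    uint8_t in_a[16], in_b[16], out_v[16], out_i[16];
    for (int i = 0; i < 16; i++) {
        in_a[i] = (uint8_t)i;
        in_b[i] = (uint8_t)(100 + i);
    }
    __m128i a = _mm_loadu_si128((const __m128i *)in_a);
    __m128i b = _mm_loadu_si128((const __m128i *)in_b);
    _mm_storeu_si128((__m128i *)out_v, blend_variable(a, b));
    _mm_storeu_si128((__m128i *)out_i, blend_immediate(a, b));
    /* Low 8 bytes should come from a, high 8 bytes from b. */
    return memcmp(out_v, out_i, 16) == 0 && out_i[0] == 0 && out_i[15] == 115;
}
```

Checking that the two variants agree before timing them helps separate a real speedup from an accidental change in behaviour. The `target("sse4.1")` attribute is only there so the sketch builds without `-msse4.1`; it needs an x86 CPU with SSE4.1 at run time.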

Now the different approaches need to be benchmarked.
A quick test showed a 29% speed improvement, but besides changing the blend instructions I also changed the byte target from 2load_HFsave8th to 1load, so that measurement mixes two changes. The speedup seems a bit too good to be true, so I might have made a mistake.


eriksjolund added enhancement performance labels Nov 19, 2016
