Try out different blend instructions for possible speedup #4

Open
eriksjolund opened this issue Nov 19, 2016 · 0 comments

Comments

@eriksjolund
Owner

Peter Cordes suggested some ways to speed up the implementation in the comments on this Stack Overflow question:

http://stackoverflow.com/questions/15198011/how-to-load-a-sliding-diagonal-vector-from-data-stored-column-wise-withsse

I tried replacing some of the blend instructions with faster ones in this commit:

4f2b2c8
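
The commit itself is not reproduced here, and this issue does not say which instructions were swapped. As a minimal sketch of this kind of substitution, assume a variable byte blend (`_mm_blendv_epi8`, i.e. `pblendvb`) whose mask is constant and uniform within each 16-bit lane. Such a blend can be written as an immediate word blend (`_mm_blend_epi16`, i.e. `pblendw`), which is often cheaper. The helper names `blend_variable`, `blend_immediate` and `blends_agree` are hypothetical and only serve the illustration.

```c
#include <stdint.h>
#include <string.h>
#include <smmintrin.h> /* SSE4.1: _mm_blendv_epi8, _mm_blend_epi16 */

/* On GCC/Clang, enable SSE4.1 per function so -msse4.1 is not required. */
#if defined(__GNUC__)
#define SSE41 __attribute__((target("sse4.1")))
#else
#define SSE41
#endif

/* Variable blend: each byte comes from b where the mask byte's top bit is set. */
static SSE41 __m128i blend_variable(__m128i a, __m128i b)
{
    /* Arguments run from lane 7 down to lane 0: lanes 1, 3, 5, 7 come from b. */
    const __m128i mask = _mm_set_epi16(-1, 0, -1, 0, -1, 0, -1, 0);
    return _mm_blendv_epi8(a, b, mask);
}

/* Immediate blend: bit i of the constant selects 16-bit lane i from b. */
static SSE41 __m128i blend_immediate(__m128i a, __m128i b)
{
    return _mm_blend_epi16(a, b, 0xAA); /* 0xAA = bits 1, 3, 5, 7 */
}

/* Returns 1 if both blends produce the same vector for a sample input. */
SSE41 int blends_agree(void)
{
    const __m128i a = _mm_set1_epi16(0x1111);
    const __m128i b = _mm_set1_epi16(0x2222);
    uint16_t v[8], w[8];

    _mm_storeu_si128((__m128i *)v, blend_variable(a, b));
    _mm_storeu_si128((__m128i *)w, blend_immediate(a, b));

    return memcmp(v, w, sizeof v) == 0 && v[0] == 0x1111 && v[1] == 0x2222;
}
```

A check like `blends_agree()` confirms only that a substitution preserves the result. Whether it is faster still has to be measured on the target CPU.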

The different approaches now need benchmarking.
A quick test showed a 29% speed improvement, but along with the blend instructions I also changed the byte target from 2load_HFsave8th to 1load, so the two changes cannot be told apart in that measurement. The speedup seems a bit too good to be true, so I may have made a mistake.
