New kernel for reciprocal #643
Signed-off-by: Magnus Lundmark <[email protected]>
This looks like an interesting kernel. I have a few comments.
My biggest worry would be to see this kernel fail randomly with some 1.0f / 1e-xx value.
I ran the code for some special values:
Here, div is the _mm256_div_ps instruction. So the proposed implementation fails for special values, more or less spectacularly. However, it is 20% faster for valid inputs and more accurate than _mm256_rcp_ps. _mm256_rcp_ps is 10% faster still, but not as accurate in the valid domain.
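The kernel itself is not shown in this thread. As a rough illustration of why a fast reciprocal can break on special values, here is a scalar sketch that assumes the common estimate-plus-Newton-Raphson scheme. The names rcp_estimate, rcp_refined and print_special_values are hypothetical, and exact division stands in for the hardware estimate so that the sketch runs without AVX.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical stand-in for a hardware reciprocal estimate such as
 * _mm256_rcp_ps. Exact division is used only so the sketch runs without
 * AVX; like the hardware estimate, it maps 0 to inf and inf to 0. */
static float rcp_estimate(float x)
{
    return 1.0f / x;
}

/* One Newton-Raphson step for f(y) = 1/y - x, i.e. y' = y * (2 - x * y).
 * Assumption: the kernel under review uses a refinement of this kind. */
static float rcp_refined(float x)
{
    float y = rcp_estimate(x);
    return y * (2.0f - x * y);
}

/* Prints plain division next to the refined estimate for a few inputs. */
static void print_special_values(void)
{
    const float inputs[] = { 2.0f, 0.0f, INFINITY, NAN };
    for (size_t i = 0; i < sizeof(inputs) / sizeof(inputs[0]); i++) {
        float x = inputs[i];
        printf("x=%g  div=%g  refined=%g\n", x, 1.0f / x, rcp_refined(x));
    }
}
```

For x = 0 the refinement step evaluates 0 * inf, and for x = inf it evaluates inf * 0. Both are NaN under IEEE 754, so the refined result is NaN where plain division returns inf and 0 respectively.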
I did some more experiments and tried to add checks for NaN, 0.0 and inf. It turns out it's not worth it; at that point it's better to just use div_ps(). So, 30% faster but
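To make the kind of check discussed here concrete, a guarded scalar variant might look like the sketch below. This is not the code that was benchmarked: rcp_fast and rcp_checked are hypothetical names, the refinement scheme is an assumption, and a SIMD version would use compare masks and blends rather than a branch.

```c
#include <math.h>

/* Assumed fast path: a reciprocal estimate followed by one Newton-Raphson
 * step. Exact division stands in for the hardware estimate here. */
static float rcp_fast(float x)
{
    float y = 1.0f / x;
    return y * (2.0f - x * y);
}

/* Guarded variant: special inputs are routed to plain division, which
 * already returns the IEEE 754 results (+-inf, +-0, NaN) for them. */
static float rcp_checked(float x)
{
    if (x == 0.0f || isinf(x) || isnan(x)) {
        return 1.0f / x;
    }
    return rcp_fast(x);
}
```

Every element pays for the extra comparisons, which is consistent with the observation above that the guarded version loses its advantage over plain division.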
How do we prevent this kernel from constantly causing random CI failures? Values close to [...] Also, the [...]
Values close to 0.f are no problem; there is only a very slight problem for subnormals, where accuracy is decreased.
I will investigate.
New kernel that calculates the reciprocal: