Optimize array cast #2234

etcimon · 2018-06-27T00:20:57Z

Optimization because, in most cases, the new array already has the right length, and a modulo is quite expensive...

dlang-bot · 2018-06-27T00:20:58Z

Thanks for your pull request and interest in making D better, @etcimon! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + druntime#2234"

n8sh · 2018-06-27T01:28:05Z

This seems like an example of D checking at runtime things that are known at compile time. No check is necessary when tsize evenly divides fsize, and this function shouldn't even be called when fsize equals tsize. A good change, but it indicates further work needs to be done upstream (after which this could be reverted, since that short-circuit branch should never be taken).
EDIT: See next message.

n8sh · 2018-06-27T03:14:25Z

Actually it looks like DMD is already smart enough not to do this unnecessarily. See dmd/e2ir.d#L4080-L4109:

// Convert from dynamic array to dynamic array
if (tty == Tarray && fty == Tarray)
{
    uint fsize = cast(uint)tfrom.nextOf().size();
    uint tsize = cast(uint)t.nextOf().size();

    if (fsize != tsize)
    {   // Array element sizes do not match, so we must adjust the dimensions
        if (fsize % tsize == 0)
        {
            // Set array dimension to (length * (fsize / tsize))
            // Generate pair(e.length * (fsize/tsize), es.ptr)

            elem *es = el_same(&e);

            elem *eptr = el_una(OPmsw, TYnptr, es);
            elem *elen = el_una(irs.params.is64bit ? OP128_64 : OP64_32, TYsize_t, e);
            elem *elen2 = el_bin(OPmul, TYsize_t, elen, el_long(TYsize_t, fsize / tsize));
            e = el_pair(totym(ce.type), elen2, eptr);
        }
        else
        {   // Runtime check needed in case arrays don't line up
            if (config.exe == EX_WIN64)
                e = addressElem(e, t, true);
            elem *ep = el_params(e, el_long(TYsize_t, fsize), el_long(TYsize_t, tsize), null);
            e = el_bin(OPcall, totym(ce.type), el_var(getRtlsym(RTLSYM_ARRAYCAST)), ep);
        }
    }
    goto Lret;
}

@etcimon Do you have an example where your branch would be taken?
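
As a rough illustration of the length adjustment in the compiler code above (a sketch for readers, not part of this PR): when the source element size is a multiple of the target element size, the new length is the old length multiplied by the constant fsize / tsize. In the other direction, the byte length has to be divisible by the target element size, which is the case that goes through the runtime check.

```d
// Sketch only: shows the resulting lengths of two dynamic array casts.
unittest
{
    // fsize = 4, tsize = 1: fsize % tsize == 0, so the length is
    // multiplied by the constant fsize / tsize = 4 and no check is needed.
    uint[] words = [1, 2, 3];
    ubyte[] bytes = cast(ubyte[]) words;
    assert(bytes.length == words.length * 4);

    // fsize = 1, tsize = 4: fsize % tsize != 0, so the cast has to verify
    // at runtime that the 16 bytes divide evenly into 4-byte elements.
    ubyte[] raw = new ubyte[16];
    uint[] view = cast(uint[]) raw;
    assert(view.length == 4);
}
```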

n8sh · 2018-06-27T03:20:41Z

Although there is still room for possible improvements. For instance, when tsize is a power of 2, both the remainder check and the division can be done much more quickly.
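
To make the power-of-2 point concrete, here is a minimal sketch (not code from this PR; isPowerOf2, remPow2 and divPow2 are hypothetical helper names). When tsize is a power of 2, the remainder reduces to a bit mask and the division to a right shift. bsf from core.bitop returns the index of the lowest set bit, which equals log2 of a power of 2.

```d
import core.bitop : bsf;

/// True when x is a nonzero power of two.
bool isPowerOf2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/// Same value as nbytes % tsize, assuming tsize is a power of two.
size_t remPow2(size_t nbytes, size_t tsize)
{
    return nbytes & (tsize - 1);
}

/// Same value as nbytes / tsize, assuming tsize is a power of two.
size_t divPow2(size_t nbytes, size_t tsize)
{
    return nbytes >> bsf(tsize);
}

unittest
{
    assert(isPowerOf2(4) && !isPowerOf2(12));
    assert(remPow2(16, 4) == 16 % 4);
    assert(remPow2(18, 4) == 18 % 4);
    assert(divPow2(16, 4) == 16 / 4);
}
```

Whether this beats a plain division in practice would depend on where the power-of-2 test happens; if tsize is known at compile time, the test itself costs nothing at runtime.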

etcimon · 2018-06-27T03:36:00Z

Sorry I'm trying to update my server's dmd and checking if every optimization made its way...

What I have is here:

https://github.com/etcimon/druntime/blob/2.070-custom/src/rt/arraycast.d#L26

It looks like that modulo wasn't even necessary maybe?

n8sh · 2018-06-27T06:41:39Z

It looks like that modulo wasn't even necessary maybe?

It's not logically necessary but D specifies it as a runtime error if the length in bytes isn't the same after the cast. Perhaps the check could be replaced with:

version (D_NoBoundsChecks) {}
else version (D_BetterC) { assert(nbytes % tsize == 0, "array cast misalignment"); }
else if (nbytes % tsize != 0) { throw new Error("array cast misalignment"); }

Although the betterC branch may currently be pointless since this function is part of the D runtime.

DmitryOlshansky

Also most sensible sizes (like 99% in arrays at least!) are pow2. I’m certain you can make a fast bypass for that and gain a lot.

Most common casts I’ve seen are:
uint[] <-> ubyte[]
size_t[] <-> ubyte[]

Plus some structs that are basically single integer or pairs. All of these are power of 2, you don’t need modulo for them.

etcimon · 2018-06-27T15:21:49Z

if ((nbytes & (nbytes - 1)) == 0 && (tsize & (tsize - 1)) == 0)
  // both are powers of 2 (or zero)

What then?

It seems to me like it shouldn't be that big of a problem to discard the last bytes after an array cast, considering that most of the time the size is kept in another allocation reference and it wouldn't cause a memory leak.

n8sh · 2018-07-11T04:24:47Z

src/rt/arraycast.d

    {
        throw new Error("array cast misalignment");
    }
+


When you squash, please remove the extra blank line.

And the CI fails due to the extra whitespace here.

n8sh

A small improvement but no reason not to accept it.

etcimon · 2018-07-16T18:25:55Z

I remember this being a huge bottleneck in crypto/math algorithms where ubyte[]<->int is done frequently

n8sh · 2018-07-16T19:52:20Z

In that case maybe it would make sense to add a second function optimized for the common case where division can be performed with a right shift. dmd.e2ir would need to be changed to take advantage of it.

n8sh · 2018-07-17T01:08:45Z

On x86 and x86_64, removing the remainder check may not avoid as much work as you hope. With optimization enabled, both DMD and LDC are smart enough to use a single DIV instruction to simultaneously calculate nbytes / tsize and nbytes % tsize. On other architectures there might be more benefit.

n8sh · 2018-07-31T09:52:06Z

I remember this being a huge bottleneck in crypto/math algorithms where ubyte[]<->int is done frequently

A problem is that calls to _d_arraycast can't be inlined. For proof look at the ASM for the below when compiled with ldc -O3 or dmd -O -inline:

https://run.dlang.io/is/H2zTKs

import std.stdio;
int main()
{
    align(8)
    ubyte[16] a;
    ubyte[] b = a[];
    uint[] c = cast(uint[]) b;
    return c[0] + c[$ - 1];
}

etcimon · 2018-08-02T01:14:40Z

A problem is that calls to _d_arraycast can't be inlined.

That's definitely an issue that will need to be addressed. I know the ~40 cycles for the modulo were quite intimidating (vs. 1 cycle for most other math), and removing it makes most algorithms perform better where you need to convert some ubyte[] to int[] for encryption/decryption/hashing operations.

JinShil · 2018-08-02T01:38:49Z

_d_arraycast should actually be implemented as a template, and the compiler should lower to that template similar to what was done for __equals, __cmp and others in object.d.

I've been working on it and basically have the fundamentals working, but I've run into a problem:

Currently, runtime hooks that are not @safe, @nogc, pure, or nothrow are being called from @safe, @nogc, pure, or nothrow contexts. This is because the lowering to the runtime hooks happens after the semantic pass, so the compiler never checks it. If I change the runtime hook to a template, then the implementation will be subject to the semantic pass and all of its compile-time checks and guarantees. That's good IMO, but that means I must make the implementation compatible with @safe, @nogc, pure, and nothrow, and there is no way to do that without breaking something.

For example, _d_arraycast currently throws a new Error. This is not compatible with @nogc.

If I use a static Error then it is no longer compatible with pure.
If I use an assert instead then that will break any user code that is currently catching the thrown Error.

I think if we can resolve that in some way, we'll have an implementation that can be inlined and better optimized. I'm awaiting a response from @WalterBright and @andralex about how to proceed.
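
For readers following along, a minimal sketch of what a templated hook could look like. This is an assumption for illustration only, not the implementation under development: arrayCastSketch is a hypothetical name. Because both element sizes are template parameters, the cases the compiler currently distinguishes in e2ir.d become static if branches, and only the last one keeps a runtime check.

```d
// Hypothetical sketch; arrayCastSketch is an illustrative name,
// not the actual druntime hook.
TTo[] arrayCastSketch(TFrom, TTo)(TFrom[] from) @trusted
{
    enum fsize = TFrom.sizeof;
    enum tsize = TTo.sizeof;

    static if (fsize == tsize)
    {
        // Same element size: the length is unchanged.
        return (cast(TTo*) from.ptr)[0 .. from.length];
    }
    else static if (fsize % tsize == 0)
    {
        // Source elements are a whole multiple of target elements:
        // multiply the length by a compile-time constant, no check needed.
        return (cast(TTo*) from.ptr)[0 .. from.length * (fsize / tsize)];
    }
    else
    {
        // Only this case needs a runtime divisibility check.
        const nbytes = from.length * fsize;
        if (nbytes % tsize != 0)
            throw new Error("array cast misalignment"); // allocates, so not @nogc
        return (cast(TTo*) from.ptr)[0 .. nbytes / tsize];
    }
}

unittest
{
    ubyte[] raw = new ubyte[16];
    assert(arrayCastSketch!(ubyte, uint)(raw).length == 4);

    uint[] words = [1, 2, 3];
    assert(arrayCastSketch!(uint, ubyte)(words).length == 12);
}
```

The throw in the last branch is exactly the part that conflicts with @nogc, which is the open question described above.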

etcimon · 2018-08-02T02:09:32Z

It definitely sounds like there should be a @trusted for templates!

JinShil · 2018-08-02T02:25:37Z

It definitely sounds like there should be a @trusted for templates!

Yes, that can be done.

wilzbach · 2018-08-02T02:38:19Z

If I use an assert instead then that will break any user code that is currently catching the thrown Error.

Catching Error is defined to result in undefined behavior. We don't need to cater for such code. So you can use assert?

etcimon · 2018-08-02T03:38:06Z

Catching Error is defined to result in undefined behavior. We don't need to cater for such code. So you can use assert?

Undefined behavior quickly becomes undocumented feature

Geod24 · 2018-08-02T04:03:59Z

Catching Error is defined to result in undefined behavior. We don't need to cater for such code. So you can use assert?

As mentioned in the other PR, I don't think so, because it would get removed in -release and thus break @safe code.
Although we could bind it to array bounds checking for those that really want to disable it.
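
One possible way to reconcile these concerns, sketched here as an assumption rather than as what either PR does: D never compiles out assert(false), even with -release, so guarding it with an explicit if keeps the check in release builds without allocating. checkCastAlignment is a hypothetical name.

```d
// Hypothetical helper, named here only for illustration.
void checkCastAlignment(size_t nbytes, size_t tsize) pure nothrow @nogc @safe
{
    // A plain `assert(nbytes % tsize == 0)` would be compiled out by -release.
    // `assert(false)` is never compiled out, so this check survives.
    if (nbytes % tsize != 0)
        assert(false, "array cast misalignment");
}

unittest
{
    checkCastAlignment(16, 4); // 16 bytes divide evenly into 4-byte elements
    checkCastAlignment(0, 8);  // an empty array is always fine
}
```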

jacob-carlborg · 2018-08-02T06:51:19Z

If I use a static Error then it is no longer compatible with pure

If it's an immutable object it is. This compiles and runs [1]:

immutable error = new Error("foo");

void foo() nothrow pure @nogc @safe
{
    throw error;
}

void main()
{
    foo();
}

Although the line number will be wrong for the actual exception. But it's correct in the backtrace, so it might be ok anyway.

[1] https://run.dlang.io/is/VEF4Uv

JinShil · 2018-08-02T08:05:33Z

#2264 has been submitted to enable more opportunity for optimizing array casts.

n8sh · 2018-08-11T22:48:55Z

Closing this because it is being made obsolete by #2264 and dlang/dmd#8531.

etcimon requested review from andralex and wilzbach as code owners June 27, 2018 00:20

JinShil approved these changes Jun 27, 2018

DmitryOlshansky suggested changes Jun 27, 2018

n8sh reviewed Jul 11, 2018

n8sh approved these changes Jul 11, 2018

Avoid costly misalignment checks during array cast when all testing is said and done and performance becomes critical (When no bounds checks are asked for)

181ba25

JinShil mentioned this pull request Aug 2, 2018

Convert _d_arraycast to template #2264

Merged

n8sh closed this Aug 11, 2018
