Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Optimize array cast #2234

Closed
wants to merge 1 commit into from
Closed

Optimize array cast #2234

wants to merge 1 commit into from

Conversation

etcimon
Copy link
Contributor

@etcimon etcimon commented Jun 27, 2018

Optimization because, in most cases, the new array already has the right length, and a modulo is quite expensive...

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @etcimon! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + druntime#2234"

@n8sh
Copy link
Member

n8sh commented Jun 27, 2018

This seems like an example of D checking at runtime things that are known at compile time. No check is necessary when fsize evenly divides tsize, and this function shouldn't even be called when fsize equals tsize. A good change but it indicates further work needs to be done upstream (after which this could be reverted since that short circuit branch should never be taken).
EDIT: See next message.

@n8sh
Copy link
Member

n8sh commented Jun 27, 2018

Actually it looks like DMD is already smart enough not to do this unnecessarily. See dmd/e2ir.d#L4080-L4109:

// Convert from dynamic array to dynamic array
if (tty == Tarray && fty == Tarray)
{
    uint fsize = cast(uint)tfrom.nextOf().size();
    uint tsize = cast(uint)t.nextOf().size();


    if (fsize != tsize)
    {   // Array element sizes do not match, so we must adjust the dimensions
        if (fsize % tsize == 0)
        {
            // Set array dimension to (length * (fsize / tsize))
            // Generate pair(e.length * (fsize/tsize), es.ptr)


            elem *es = el_same(&e);


            elem *eptr = el_una(OPmsw, TYnptr, es);
            elem *elen = el_una(irs.params.is64bit ? OP128_64 : OP64_32, TYsize_t, e);
            elem *elen2 = el_bin(OPmul, TYsize_t, elen, el_long(TYsize_t, fsize / tsize));
            e = el_pair(totym(ce.type), elen2, eptr);
        }
        else
        {   // Runtime check needed in case arrays don't line up
            if (config.exe == EX_WIN64)
                e = addressElem(e, t, true);
            elem *ep = el_params(e, el_long(TYsize_t, fsize), el_long(TYsize_t, tsize), null);
            e = el_bin(OPcall, totym(ce.type), el_var(getRtlsym(RTLSYM_ARRAYCAST)), ep);
        }
    }
    goto Lret;
}

@etcimon Do you have an example where your branch would be taken?

@n8sh
Copy link
Member

n8sh commented Jun 27, 2018

Although there is still room for possible improvements. For instance, when tsize is a power of 2, both the remainder check and the division can be done much more quickly.

@etcimon
Copy link
Contributor Author

etcimon commented Jun 27, 2018

Sorry I'm trying to update my server's dmd and checking if every optimization made its way...

What I have is here:

https://github.com/etcimon/druntime/blob/2.070-custom/src/rt/arraycast.d#L26

It looks like that modulo wasn't even necessary maybe?

@n8sh
Copy link
Member

n8sh commented Jun 27, 2018

It looks like that modulo wasn't even necessary maybe?

It's not logically necessary but D specifies it as a runtime error if the length in bytes isn't the same after the cast. Perhaps the check could be replaced with:

version (D_NoBoundsChecks) {}
else version (D_BetterC) { assert(nbytes % tsize == 0, "array cast misalignment"); }
else if (nbytes % tsize != 0) { throw new Error("array cast misalignment"); }

Although the betterC branch may currently be pointless since this function is part of the D runtime.

Copy link
Member

@DmitryOlshansky DmitryOlshansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also most sensible sizes (like 99% in arrays at least!) are pow2. I’m certain you can make a fast bypass for that and gain a lot.

Most common casts I’ve seen are:
uint[] <-> ubyte[]
size_t[] <-> ubyte[]

Plus some structs that are basically single integer or pairs. All of these are power of 2, you don’t need modulo for them.

@etcimon
Copy link
Contributor Author

etcimon commented Jun 27, 2018

if (nbytes & (nbytes - 1)) == 0 && (tsize & (tsize - 1)) == 0)
  // both are a multiple of 2
  

What then?

It seems to me like it shouldn't be that big of a problem to discard the last bytes after an array cast, considering that most of the time the size is kept in another allocation reference and it wouldn't cause a memory leak.

{
throw new Error("array cast misalignment");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you squash, please remove the extra blank line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the CI fail due to the extra whitespace here.

Copy link
Member

@n8sh n8sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small improvement but no reason not to accept it.

…s said and done and performance becomes critical (When no bounds checks are asked for)
@etcimon
Copy link
Contributor Author

etcimon commented Jul 16, 2018

I remember this being a huge bottleneck in crypto/math algorithms where ubyte[]<->int is done frequently

@n8sh
Copy link
Member

n8sh commented Jul 16, 2018

In that case maybe it would make sense to add a second function optimized for the common case where division can be performed with a right shift. dmd.e2ir would need to be changed to take advantage of it.

@n8sh
Copy link
Member

n8sh commented Jul 17, 2018

On x86 and x86_64 removing the remainder check may not avoid as much work as you hope. With optimization enabled both DMD and LDC are smart enough to use a single DIV instruction to simultaneously calculate nbytes / size and nbytes % tsize. On other architectures there might be more benefit.

@n8sh
Copy link
Member

n8sh commented Jul 31, 2018

I remember this being a huge bottleneck in crypto/math algorithms where ubyte[]<->int is done frequently

A problem is that calls to _d_arraycast can't be inlined. For proof look at the ASM for the below when compiled with ldc -O3 or dmd -O -inline:

https://run.dlang.io/is/H2zTKs

import std.stdio;
int main()
{
    align(8)
    ubyte[16] a;
    ubyte[] b = a[];
    uint[] c = cast(uint[]) b;
    return c[0] + c[$ - 1];
}

@etcimon
Copy link
Contributor Author

etcimon commented Aug 2, 2018

A problem is that calls to _d_arraycast can't be inlined.

That's definitely an issue that will need to be addressed. I know the ~40 cycles on the modulo were quite intimidating (vs 1 cycle for most other math) and makes the algorithms perform better in most algorithms where you need to convert some ubyte[] to int[] for the encryption/decryption/hashing operations.

@JinShil
Copy link
Contributor

JinShil commented Aug 2, 2018

_d_arraycast should actually be implemented as a template, and the compiler should lower to that template similar to what was done for __equals, __cmp and others in object.d.

I've been working on it and basically have the fundamentals working, but I've run into a problem:

Currently runtime hooks that are neither @safe, @nogc, pure, or nothrow are being called from @safe, @nogc, pure, or nothrow contexts. This is because the lowering to the runtime hooks happens after semantic , so the compiler never checks it. If I change the runtime hook to a template, then the implementation will be subject to the semantic pass and all of it's compile-time checks and guarantees. That's good IMO, but that means I must make the implementation compatible with @safe, @nogc, pure, and nothrow, and there is no way to do that without breaking something.

For example, _d_arraycast currently throws a new Error. This is not compatible with @nogc.

  1. If I use a static Error then it is no longer compatible with pure.
  2. If I use an assert instead then that will break any user code that is currently catching the thrown Error.

I think if we can resolve that in some way, we'll have and implementation that can be inlined and better optimized. I'm awaiting a response from @WalterBright and @andralex about how to proceed.

@etcimon
Copy link
Contributor Author

etcimon commented Aug 2, 2018

It definitely sounds like there should be a @trusted for templates!

@JinShil
Copy link
Contributor

JinShil commented Aug 2, 2018

It definitely sounds like there should be a @trusted for templates!

Yes, that can be done.

@wilzbach
Copy link
Member

wilzbach commented Aug 2, 2018

If I use an assert instead then that will break any user code that is currently catching the thrown Error.

Catching Error is defined to result in undefined behavior. We don't need to cater for such code. So you can use assert?

@etcimon
Copy link
Contributor Author

etcimon commented Aug 2, 2018

Catching Error is defined to result in undefined behavior. We don't need to cater for such code. So you can use assert?

Undefined behavior quickly becomes undocumented feature

@Geod24
Copy link
Member

Geod24 commented Aug 2, 2018

Catching Error is defined to result in undefined behavior. We don't need to cater for such code. So you can use assert?

As mentioned in the other PR, I don't think so, because it would get removed in -release and thus break @safe code.
Although we could bind it to array bounds checking for those that really want to disable it.

@jacob-carlborg
Copy link
Contributor

If I use a static Error then it is no longer compatible with pure

If it's an immutable object it is. This compiles an runs [1]:

immutable error = new Error("foo");

void foo() nothrow pure @nogc @safe
{
    throw error;
}

void main()
{
    foo();
}

Although the line number will be wrong for the actual exception. But it's correct in the backtrace, so it might be ok anyway.

[1] https://run.dlang.io/is/VEF4Uv

@JinShil
Copy link
Contributor

JinShil commented Aug 2, 2018

#2264 has been submitted to enable more opportunity for optimizing array casts.

@n8sh
Copy link
Member

n8sh commented Aug 11, 2018

Closing this because it is being made obsolete by #2264 and dlang/dmd#8531.

@n8sh n8sh closed this Aug 11, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants