make ceylon.language::sum fast #728

jvasileff · 2015-08-13T20:07:03Z

This implementation avoids using a boxed type for the sum accumulator variable when summing Integers and Floats.

If you want to merge this, I can also make the change for product.

Benchmark Script

shared void sumTest() {
    for (size in [1, 2, 4, 8, 32, 128, 256, 1024]) {
        value ints = (1..size).collect(identity);
        value floats = (1..size).collect(Integer.float);
        value repeat = 2^27 / size;

        benchmark {
            scale = repeat * size  / 10;
            warmupIter = 3;
            benchIter = 5;
            ["lang::sum(ints); size=``size``:", () => (0:repeat).each((_) => sum(ints))],
            ["math::sum(ints); size=``size``:", () => (0:repeat).each((_) => mathSum(ints))],
            ["optimized(ints); size=``size``:", () => (0:repeat).each((_) => sumNew4(ints))]
        };

        benchmark {
            scale = repeat*size  / 10;
            warmupIter = 3;
            benchIter = 5;
            ["lang::sum(floats); size=``size``:", () => (0:repeat).each((_) => sum(floats))],
            ["math::sum(floats); size=``size``:", () => (0:repeat).each((_) => mathSumF(floats))],
            ["optimized(floats); size=``size``:", () => (0:repeat).each((_) => sumNew4(floats))]
        };
    }
}

Integer Results

Summary min/max/avg/rstddev/pct
math::sum(ints); size=1: 116/124/118/3% (100%)
optimized(ints); size=1: 120/127/123/2% (103%)
lang::sum(ints); size=1: 139/141/140/0% (119%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=2: 65/69/66/1% (100%)
lang::sum(ints); size=2: 69/71/70/1% (105%)
math::sum(ints); size=2: 73/76/74/1% (112%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=4: 47/47/47/0% (100%)
math::sum(ints); size=4: 52/53/52/1% (109%)
lang::sum(ints); size=4: 57/59/58/1% (121%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=8: 37/38/37/1% (100%)
math::sum(ints); size=8: 41/42/41/0% (111%)
lang::sum(ints); size=8: 53/55/54/1% (143%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=32: 32/32/32/0% (100%)
math::sum(ints); size=32: 33/37/34/4% (103%)
lang::sum(ints); size=32: 55/56/56/0% (172%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=128: 29/33/30/5% (100%)
math::sum(ints); size=128: 30/31/30/0% (105%)
lang::sum(ints); size=128: 50/52/51/1% (173%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=256: 29/33/31/4% (99%)
math::sum(ints); size=256: 30/35/32/6% (102%)
lang::sum(ints); size=256: 51/57/55/4% (173%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=1024: 29/30/29/2% (100%)
math::sum(ints); size=1024: 29/30/30/0% (102%)
lang::sum(ints); size=1024: 51/52/51/1% (175%)

Float Results

Summary min/max/avg/rstddev/pct
lang::sum(floats); size=1: 92/97/95/1% (100%)
optimized(floats); size=1: 126/130/128/1% (135%)
math::sum(floats); size=1: 149/154/152/1% (160%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=2: 68/70/69/1% (100%)
lang::sum(floats); size=2: 68/72/70/2% (100%)
math::sum(floats); size=2: 92/94/93/0% (134%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=4: 50/52/51/2% (100%)
lang::sum(floats); size=4: 59/63/61/3% (117%)
math::sum(floats); size=4: 63/66/65/1% (127%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=8: 39/40/40/0% (100%)
math::sum(floats); size=8: 48/50/49/1% (122%)
lang::sum(floats); size=8: 55/57/56/1% (138%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=32: 32/33/33/1% (100%)
math::sum(floats); size=32: 37/37/37/0% (113%)
lang::sum(floats); size=32: 57/59/58/1% (174%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=128: 29/30/29/1% (100%)
math::sum(floats); size=128: 33/34/34/1% (113%)
lang::sum(floats); size=128: 53/55/54/1% (184%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=256: 28/30/29/2% (100%)
math::sum(floats); size=256: 32/33/33/1% (115%)
lang::sum(floats); size=256: 53/54/53/0% (188%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=1024: 28/29/28/1% (100%)
math::sum(floats); size=1024: 32/34/32/2% (115%)
lang::sum(floats); size=1024: 53/54/54/0% (190%)

This implementation avoids using a boxed type for the `sum` accumulator variable when summing Integers and Floats.

quintesse · 2015-08-13T21:06:00Z

Hmm I should test how my experimental boxing/unboxing code handles this. Because although it's a clever trick what you are doing here it depends completely on implementation details that could change at any time.

jvasileff · 2015-08-13T21:24:14Z

I beg to differ.

sum cannot possibly be fast without testing for Integer and Float; Value on its own will never be a primitive.
"Trick 1" is to test first rather than the Iterable or Iterator. This is solely to avoid the huge isReified cost
"Trick 2" is to explicitly assign values to explicit Integers. This should always result in primitives, unless boxing optimizations are totally broken.

For the last item, certainly a compiler improvement would be to treat Value&Integer as a primitive, which it can do since Integer is final. But that would not change the behavior of this code.

Again, for the last item, boxing logic could be improved such that only the accumulator would need to explicitly be an Integer (IOW, primitive + boxed always unboxes rather than sometimes boxing the primitive). But again, that improvement would not "break" this code.

jvasileff · 2015-08-13T21:32:10Z

Minor addition: I suppose pure Integers don't have to be primitives, with a "don't unbox" strategy. But then my other point above would have to be fixed for such a strategy to work, and in the end, this would result in this code performing as expected.

quintesse · 2015-08-13T21:36:48Z

I also beg to differ with your point 3, you don't get to decide what the compiler should always do, that's the compiler's business :)
It's not that this code will code will work or not, of course it works, and I agree with "Trick 1" (although hopefully even that we can improve in the future) but it's "Trick 2" that just looks nasty, even if it works.
That's why I'd be interested to know if my experimental code makes it unnecessary (because it already figured out by itself that the best thing to do would be to have an unboxed accumulator).

jvasileff · 2015-08-13T21:38:37Z

Ok, yeah, I agree that it looks ugly.

quintesse · 2015-08-13T21:38:55Z

Sorry, it's not the accumulator that "bothers" me, it's the unboxed value, that's the part that should be obvious to the compiler.

gavinking · 2015-08-13T22:04:37Z

@jvasileff I don't love this implementation, but the performance results are significant.

My feeling is that if we're going to do an optimization like this I would prefer to do it by making sum() native and writing the optimized version in Java. I think that would be a cleaner way to wring every last ounce of performance out. But perhaps I'm wrong about that...

gavinking · 2015-08-13T22:08:27Z

P.S. I agree that this hack:

Integer unboxed = val;

Should not be necessary and that the Java backend should recognize that val is unboxed. Please open an issue against ceylon-compiler.

jvasileff · 2015-08-13T22:36:16Z

this hack

See ceylon/ceylon-compiler#2270

quintesse · 2015-08-13T23:24:54Z

Just a question @gavinking , but why is this an issue for the compiler? Couldn't the typechecker just immediately simplify a Value&Integer to plain Integer?

gavinking · 2015-08-13T23:40:38Z

@quintesse No, of course not. Value is not a subtype of Integer!

gavinking · 2015-08-13T23:42:09Z

I mean Integer is not a subtype of Value.

jvasileff · 2015-08-14T01:27:26Z

AFAICT, a native implementation would:

Hide the ugliness
Optimize away assert (is Value result = sum);

But I'm not seeing further optimization opportunities. The second item, while significant for small streams, may also be eliminated with ceylon/ceylon-spec#1399 and ceylon/ceylon-compiler#2270 .

Some results:

Summary min/max/avg/rstddev/pct
lang::sum(ints); size=1: 85/95/91/3% (100%)
nativeSum(ints); size=1: 98/102/100/1% (114%)
math::sum(ints); size=1: 114/117/116/1% (133%)
optimized(ints); size=1: 167/175/169/1% (194%)

Summary min/max/avg/rstddev/pct
nativeSum(ints); size=2: 62/63/63/1% (100%)
optimized(ints); size=2: 64/66/65/1% (104%)
lang::sum(ints); size=2: 70/71/70/0% (113%)
math::sum(ints); size=2: 73/74/73/0% (117%)

Summary min/max/avg/rstddev/pct
nativeSum(ints); size=8: 35/37/36/1% (100%)
optimized(ints); size=8: 36/37/37/1% (100%)
math::sum(ints); size=8: 40/41/41/1% (112%)
lang::sum(ints); size=8: 52/53/53/1% (145%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=1024: 27/29/28/2% (100%)
nativeSum(ints); size=1024: 27/29/28/1% (100%)
math::sum(ints); size=1024: 27/30/28/3% (100%)
lang::sum(ints); size=1024: 48/52/50/2% (175%)

for:

package sandbox.ceylon.snap.base;

import com.redhat.ceylon.compiler.java.metadata.Ignore;
import com.redhat.ceylon.compiler.java.metadata.Name;
import com.redhat.ceylon.compiler.java.metadata.TypeInfo;
import com.redhat.ceylon.compiler.java.metadata.TypeParameter;
import com.redhat.ceylon.compiler.java.metadata.TypeParameters;
import com.redhat.ceylon.compiler.java.metadata.Variance;
import com.redhat.ceylon.compiler.java.runtime.model.TypeDescriptor;

import ceylon.language.Float;
import ceylon.language.Integer;
import ceylon.language.Iterable;
import ceylon.language.Iterator;
import ceylon.language.SharedAnnotation$annotation$;
import ceylon.language.Summable;

@com.redhat.ceylon.compiler.java.metadata.Ceylon(major = 8)
@com.redhat.ceylon.compiler.java.metadata.Method
public class nativeJavaSum_ {
    private nativeJavaSum_() {
    }

    @SharedAnnotation$annotation$
    @TypeInfo(
            value = "Value",
            erased = true)
    @TypeParameters({@TypeParameter(
            value = "Value",
            variance = Variance.NONE,
            satisfies = {"ceylon.language::Summable<Value>"},
            caseTypes = {})})
    public static <Value extends Summable<Value>>Value nativeJavaSum(@Ignore
    final TypeDescriptor $reified$Value, @Name("values")
    @TypeInfo(
            value = "{Value+}",
            erased = true)
    final Iterable<? extends Value, ? extends Object> values) {
        final Iterator<? extends Value> it = values.iterator();
        final Object first = it.next();
        if (first instanceof Integer) {
            long sum = ((Integer)first).longValue();
            while (true) {
                Object val$670;
                if ((val$670 = it.next()) instanceof Integer) {
                    sum += ((Integer)val$670).longValue();
                } else {
                    break;
                }
            }
            return (Value)Integer.instance(sum);
        } else {
            if (first instanceof Float) {
                double sum = ((Float)first).doubleValue();
                while (true) {
                    Object val$665;
                    if ((val$665 = it.next()) instanceof Float) {
                        sum += ((Float)val$665).doubleValue();
                    } else {
                        break;
                    }
                }
                return (Value) Float.instance(sum);
            } else {
                throw new UnsupportedOperationException();
            }
        }
    }
}

gavinking · 2015-08-14T01:46:54Z

Just to throw something extra out there, it seems to me that this (slightly ugly) implementation using each() is significantly faster than the version with a for loop, at least for streams with an optimized each() implementation:

shared Value sum2<Value>({Value+} values) 
        given Value satisfies Summable<Value> {
    variable value sum = values.first;
    variable value first = true;
    values.each(void (element) {
        if (first) {
            sum = element;
            first = false;
        }
        else {
            sum+=element;
        }
    });
    return sum;
}

gavinking · 2015-08-14T02:10:58Z

Scratch that: with a correct implementation it doesn't seem to make much difference.

However this implementation does seem to be faster:

shared Value sum4<Value>({Value+} values) 
        given Value satisfies Summable<Value> 
        => values.reduce(plus);

... at least in some cases.

gavinking · 2015-08-14T02:12:20Z

Yeah, that makes sense, since there is an optimized implementation of reduce() for collections backed by arrays.

jvasileff · 2015-08-14T03:41:37Z

each on a Sequence is much faster than the while loop for the Integer optimized case:

    if (is Integer first) {
        // unbox; don't infer type Value&Integer
        variable Integer sum = 0;
        values.each(void (item) {
            assert (is Integer item);
            Integer unboxed = item;
            sum += unboxed;
        });
        assert (is Value result = sum);
        return result;
    }

Summary min/max/avg/rstddev/pct
opti-Each(ints); size=1024: 20/20/20/1% (100%)
optimized(ints); size=1024: 39/41/40/2% (197%)

but the overhead kills performance for n<=8:

Summary min/max/avg/rstddev/pct
optimized(ints); size=1: 129/151/142/7% (100%)
opti-Each(ints); size=1: 369/435/399/7% (284%)

and of course this ugly impl reads the first element twice, so it isn't really an option anyway. But for ceylon.math.integer::sum, maybe?

I wasn't able to see an advantage with reduce over the standard while loop:

Summary min/max/avg/rstddev/pct
lang::sum(ints); size=1: 102/118/109/5% (100%)
   Reduce(ints); size=1: 125/134/131/2% (122%)

Summary min/max/avg/rstddev/pct
lang::sum(ints); size=8: 73/76/75/1% (100%)
   Reduce(ints); size=8: 86/99/93/5% (117%)

Summary min/max/avg/rstddev/pct
lang::sum(ints); size=1024: 70/79/73/5% (100%)
   Reduce(ints); size=1024: 72/75/74/2% (103%)

gavinking · 2015-08-14T04:06:30Z

reduce() is much faster in the case of an ArraySequence.

jvasileff · 2015-08-14T04:15:29Z

Ok, weird.

print(type(ints));
// ceylon.language::ArraySequence<ceylon.language::Integer>

But hotspot is a strange beast.

@jvasileff

this is based on code + ideas by @jvasileff in #728, but using native

gavinking · 2015-10-01T15:46:20Z

Apologies, @jvasileff, I have pushed a copy/paste of your code instead of merging this PR because I wanted to make it a native implementation. I know it's a bit rude. I try not to do that.

gavinking · 2015-10-01T15:46:45Z

P.S. @jvasileff thanks for the contribution!

jvasileff · 2015-10-01T21:35:58Z

No problem at all. Looks like the approach saved us both some time.

optimize sum() for Integers and Floats

5b2dc71

This implementation avoids using a boxed type for the `sum` accumulator variable when summing Integers and Floats.

quintesse added the optimization label Aug 13, 2015

quintesse added this to the 1.2 milestone Aug 13, 2015

gavinking self-assigned this Sep 18, 2015

gavinking added a commit that referenced this pull request Oct 1, 2015

optimized implementations of sum() and product()

c0d88c8

this is based on code + ideas by @jvasileff in #728, but using native

gavinking closed this Oct 1, 2015

gavinking mentioned this pull request Oct 1, 2015

experiment with sum()/product() optimization for min()/max() #773

Closed

CeylonMigrationBot mentioned this pull request Nov 14, 2015

make ceylon.language::sum fast eclipse-archived/ceylon#5606

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make ceylon.language::sum fast #728

make ceylon.language::sum fast #728

jvasileff commented Aug 13, 2015

quintesse commented Aug 13, 2015

jvasileff commented Aug 13, 2015

jvasileff commented Aug 13, 2015

quintesse commented Aug 13, 2015

jvasileff commented Aug 13, 2015

quintesse commented Aug 13, 2015

gavinking commented Aug 13, 2015

gavinking commented Aug 13, 2015

jvasileff commented Aug 13, 2015

quintesse commented Aug 13, 2015

gavinking commented Aug 13, 2015

gavinking commented Aug 13, 2015

jvasileff commented Aug 14, 2015

gavinking commented Aug 14, 2015

gavinking commented Aug 14, 2015

gavinking commented Aug 14, 2015

jvasileff commented Aug 14, 2015

gavinking commented Aug 14, 2015

jvasileff commented Aug 14, 2015

gavinking commented Oct 1, 2015

gavinking commented Oct 1, 2015

jvasileff commented Oct 1, 2015

make ceylon.language::sum fast #728

make ceylon.language::sum fast #728

Conversation

jvasileff commented Aug 13, 2015

quintesse commented Aug 13, 2015

jvasileff commented Aug 13, 2015

jvasileff commented Aug 13, 2015

quintesse commented Aug 13, 2015

jvasileff commented Aug 13, 2015

quintesse commented Aug 13, 2015

gavinking commented Aug 13, 2015

gavinking commented Aug 13, 2015

jvasileff commented Aug 13, 2015

quintesse commented Aug 13, 2015

gavinking commented Aug 13, 2015

gavinking commented Aug 13, 2015

jvasileff commented Aug 14, 2015

gavinking commented Aug 14, 2015

gavinking commented Aug 14, 2015

gavinking commented Aug 14, 2015

jvasileff commented Aug 14, 2015

gavinking commented Aug 14, 2015

jvasileff commented Aug 14, 2015

gavinking commented Oct 1, 2015

gavinking commented Oct 1, 2015

jvasileff commented Oct 1, 2015