make ceylon.language::sum fast #5606

CeylonMigrationBot · 2015-08-13T20:07:03Z

[@jvasileff] This implementation avoids using a boxed type for the sum accumulator variable when summing Integers and Floats.

If you want to merge this, I can also make the change for product.

Benchmark Script

shared void sumTest() {
    for (size in [1, 2, 4, 8, 32, 128, 256, 1024]) {
        value ints = (1..size).collect(identity);
        value floats = (1..size).collect(Integer.float);
        value repeat = 2^27 / size;

        benchmark {
            scale = repeat * size  / 10;
            warmupIter = 3;
            benchIter = 5;
            ["lang::sum(ints); size=``size``:", () => (0:repeat).each((_) => sum(ints))],
            ["math::sum(ints); size=``size``:", () => (0:repeat).each((_) => mathSum(ints))],
            ["optimized(ints); size=``size``:", () => (0:repeat).each((_) => sumNew4(ints))]
        };

        benchmark {
            scale = repeat*size  / 10;
            warmupIter = 3;
            benchIter = 5;
            ["lang::sum(floats); size=``size``:", () => (0:repeat).each((_) => sum(floats))],
            ["math::sum(floats); size=``size``:", () => (0:repeat).each((_) => mathSumF(floats))],
            ["optimized(floats); size=``size``:", () => (0:repeat).each((_) => sumNew4(floats))]
        };
    }
}

Integer Results

Summary min/max/avg/rstddev/pct
math::sum(ints); size=1: 116/124/118/3% (100%)
optimized(ints); size=1: 120/127/123/2% (103%)
lang::sum(ints); size=1: 139/141/140/0% (119%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=2: 65/69/66/1% (100%)
lang::sum(ints); size=2: 69/71/70/1% (105%)
math::sum(ints); size=2: 73/76/74/1% (112%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=4: 47/47/47/0% (100%)
math::sum(ints); size=4: 52/53/52/1% (109%)
lang::sum(ints); size=4: 57/59/58/1% (121%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=8: 37/38/37/1% (100%)
math::sum(ints); size=8: 41/42/41/0% (111%)
lang::sum(ints); size=8: 53/55/54/1% (143%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=32: 32/32/32/0% (100%)
math::sum(ints); size=32: 33/37/34/4% (103%)
lang::sum(ints); size=32: 55/56/56/0% (172%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=128: 29/33/30/5% (100%)
math::sum(ints); size=128: 30/31/30/0% (105%)
lang::sum(ints); size=128: 50/52/51/1% (173%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=256: 29/33/31/4% (99%)
math::sum(ints); size=256: 30/35/32/6% (102%)
lang::sum(ints); size=256: 51/57/55/4% (173%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=1024: 29/30/29/2% (100%)
math::sum(ints); size=1024: 29/30/30/0% (102%)
lang::sum(ints); size=1024: 51/52/51/1% (175%)

Float Results

Summary min/max/avg/rstddev/pct
lang::sum(floats); size=1: 92/97/95/1% (100%)
optimized(floats); size=1: 126/130/128/1% (135%)
math::sum(floats); size=1: 149/154/152/1% (160%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=2: 68/70/69/1% (100%)
lang::sum(floats); size=2: 68/72/70/2% (100%)
math::sum(floats); size=2: 92/94/93/0% (134%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=4: 50/52/51/2% (100%)
lang::sum(floats); size=4: 59/63/61/3% (117%)
math::sum(floats); size=4: 63/66/65/1% (127%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=8: 39/40/40/0% (100%)
math::sum(floats); size=8: 48/50/49/1% (122%)
lang::sum(floats); size=8: 55/57/56/1% (138%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=32: 32/33/33/1% (100%)
math::sum(floats); size=32: 37/37/37/0% (113%)
lang::sum(floats); size=32: 57/59/58/1% (174%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=128: 29/30/29/1% (100%)
math::sum(floats); size=128: 33/34/34/1% (113%)
lang::sum(floats); size=128: 53/55/54/1% (184%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=256: 28/30/29/2% (100%)
math::sum(floats); size=256: 32/33/33/1% (115%)
lang::sum(floats); size=256: 53/54/53/0% (188%)

Summary min/max/avg/rstddev/pct
optimized(floats); size=1024: 28/29/28/1% (100%)
math::sum(floats); size=1024: 32/34/32/2% (115%)
lang::sum(floats); size=1024: 53/54/54/0% (190%)

[Migrated from ceylon/ceylon.language#728]
[Closed at 2015-10-01 15:46:20]

The text was updated successfully, but these errors were encountered:

CeylonMigrationBot · 2015-08-13T21:06:00Z

[@quintesse] Hmm I should test how my experimental boxing/unboxing code handles this. Because although it's a clever trick what you are doing here it depends completely on implementation details that could change at any time.

CeylonMigrationBot · 2015-08-13T21:24:14Z

[@jvasileff] I beg to differ.

sum cannot possibly be fast without testing for Integer and Float; Value on its own will never be a primitive.
"Trick 1" is to test first rather than the Iterable or Iterator. This is solely to avoid the huge isReified cost
"Trick 2" is to explicitly assign values to explicit Integers. This should always result in primitives, unless boxing optimizations are totally broken.

For the last item, certainly a compiler improvement would be to treat Value&Integer as a primitive, which it can do since Integer is final. But that would not change the behavior of this code.

Again, for the last item, boxing logic could be improved such that only the accumulator would need to explicitly be an Integer (IOW, primitive + boxed always unboxes rather than sometimes boxing the primitive). But again, that improvement would not "break" this code.

CeylonMigrationBot · 2015-08-13T21:32:10Z

[@jvasileff] Minor addition: I suppose pure Integers don't have to be primitives, with a "don't unbox" strategy. But then my other point above would have to be fixed for such a strategy to work, and in the end, this would result in this code performing as expected.

CeylonMigrationBot · 2015-08-13T21:36:48Z

[@quintesse] I also beg to differ with your point 3, you don't get to decide what the compiler should always do, that's the compiler's business :)
It's not that this code will code will work or not, of course it works, and I agree with "Trick 1" (although hopefully even that we can improve in the future) but it's "Trick 2" that just looks nasty, even if it works.
That's why I'd be interested to know if my experimental code makes it unnecessary (because it already figured out by itself that the best thing to do would be to have an unboxed accumulator).

CeylonMigrationBot · 2015-08-13T21:38:37Z

[@jvasileff] Ok, yeah, I agree that it looks ugly.

CeylonMigrationBot · 2015-08-13T21:38:55Z

[@quintesse] Sorry, it's not the accumulator that "bothers" me, it's the unboxed value, that's the part that should be obvious to the compiler.

CeylonMigrationBot · 2015-08-13T22:04:37Z

[@gavinking]
@jvasileff I don't love this implementation, but the performance results are significant.

My feeling is that if we're going to do an optimization like this I would prefer to do it by making sum() native and writing the optimized version in Java. I think that would be a cleaner way to wring every last ounce of performance out. But perhaps I'm wrong about that...

CeylonMigrationBot · 2015-08-13T22:08:27Z

[@gavinking] P.S. I agree that this hack:

Integer unboxed = val;

Should not be necessary and that the Java backend should recognize that val is unboxed. Please open an issue against ceylon-compiler.

CeylonMigrationBot · 2015-08-13T22:36:16Z

[@jvasileff]

this hack

See #2270

CeylonMigrationBot · 2015-08-13T23:24:54Z

[@quintesse] Just a question @gavinking , but why is this an issue for the compiler? Couldn't the typechecker just immediately simplify a Value&Integer to plain Integer?

CeylonMigrationBot · 2015-08-13T23:40:38Z

[@gavinking] @quintesse No, of course not. Value is not a subtype of Integer!

CeylonMigrationBot · 2015-08-13T23:42:09Z

[@gavinking] I mean Integer is not a subtype of Value.

CeylonMigrationBot · 2015-08-14T01:27:26Z

[@jvasileff] AFAICT, a native implementation would:

Hide the ugliness
Optimize away assert (is Value result = sum);

But I'm not seeing further optimization opportunities. The second item, while significant for small streams, may also be eliminated with #4505 and #2270 .

Some results:

Summary min/max/avg/rstddev/pct
lang::sum(ints); size=1: 85/95/91/3% (100%)
nativeSum(ints); size=1: 98/102/100/1% (114%)
math::sum(ints); size=1: 114/117/116/1% (133%)
optimized(ints); size=1: 167/175/169/1% (194%)

Summary min/max/avg/rstddev/pct
nativeSum(ints); size=2: 62/63/63/1% (100%)
optimized(ints); size=2: 64/66/65/1% (104%)
lang::sum(ints); size=2: 70/71/70/0% (113%)
math::sum(ints); size=2: 73/74/73/0% (117%)

Summary min/max/avg/rstddev/pct
nativeSum(ints); size=8: 35/37/36/1% (100%)
optimized(ints); size=8: 36/37/37/1% (100%)
math::sum(ints); size=8: 40/41/41/1% (112%)
lang::sum(ints); size=8: 52/53/53/1% (145%)

Summary min/max/avg/rstddev/pct
optimized(ints); size=1024: 27/29/28/2% (100%)
nativeSum(ints); size=1024: 27/29/28/1% (100%)
math::sum(ints); size=1024: 27/30/28/3% (100%)
lang::sum(ints); size=1024: 48/52/50/2% (175%)

for:

package sandbox.ceylon.snap.base;

import com.redhat.ceylon.compiler.java.metadata.Ignore;
import com.redhat.ceylon.compiler.java.metadata.Name;
import com.redhat.ceylon.compiler.java.metadata.TypeInfo;
import com.redhat.ceylon.compiler.java.metadata.TypeParameter;
import com.redhat.ceylon.compiler.java.metadata.TypeParameters;
import com.redhat.ceylon.compiler.java.metadata.Variance;
import com.redhat.ceylon.compiler.java.runtime.model.TypeDescriptor;

import ceylon.language.Float;
import ceylon.language.Integer;
import ceylon.language.Iterable;
import ceylon.language.Iterator;
import ceylon.language.SharedAnnotation$annotation$;
import ceylon.language.Summable;

@com.redhat.ceylon.compiler.java.metadata.Ceylon(major = 8)
@com.redhat.ceylon.compiler.java.metadata.Method
public class nativeJavaSum_ {
    private nativeJavaSum_() {
    }

    @SharedAnnotation$annotation$
    @TypeInfo(
            value = "Value",
            erased = true)
    @TypeParameters({@TypeParameter(
            value = "Value",
            variance = Variance.NONE,
            satisfies = {"ceylon.language::Summable<Value>"},
            caseTypes = {})})
    public static <Value extends Summable<Value>>Value nativeJavaSum(@Ignore
    final TypeDescriptor $reified$Value, @Name("values")
    @TypeInfo(
            value = "{Value+}",
            erased = true)
    final Iterable<? extends Value, ? extends Object> values) {
        final Iterator<? extends Value> it = values.iterator();
        final Object first = it.next();
        if (first instanceof Integer) {
            long sum = ((Integer)first).longValue();
            while (true) {
                Object val$670;
                if ((val$670 = it.next()) instanceof Integer) {
                    sum += ((Integer)val$670).longValue();
                } else {
                    break;
                }
            }
            return (Value)Integer.instance(sum);
        } else {
            if (first instanceof Float) {
                double sum = ((Float)first).doubleValue();
                while (true) {
                    Object val$665;
                    if ((val$665 = it.next()) instanceof Float) {
                        sum += ((Float)val$665).doubleValue();
                    } else {
                        break;
                    }
                }
                return (Value) Float.instance(sum);
            } else {
                throw new UnsupportedOperationException();
            }
        }
    }
}

CeylonMigrationBot · 2015-08-14T01:46:54Z

[@gavinking] Just to throw something extra out there, it seems to me that this (slightly ugly) implementation using each() is significantly faster than the version with a for loop, at least for streams with an optimized each() implementation:

shared Value sum2<Value>({Value+} values) 
        given Value satisfies Summable<Value> {
    variable value sum = values.first;
    variable value first = true;
    values.each(void (element) {
        if (first) {
            sum = element;
            first = false;
        }
        else {
            sum+=element;
        }
    });
    return sum;
}

CeylonMigrationBot · 2015-08-14T02:10:58Z

[@gavinking] Scratch that: with a correct implementation it doesn't seem to make much difference.

However this implementation does seem to be faster:

shared Value sum4<Value>({Value+} values) 
        given Value satisfies Summable<Value> 
        => values.reduce(plus);

... at least in some cases.

CeylonMigrationBot · 2015-08-14T02:12:20Z

[@gavinking] Yeah, that makes sense, since there is an optimized implementation of reduce() for collections backed by arrays.

CeylonMigrationBot · 2015-08-14T03:41:37Z

[@jvasileff]
each on a Sequence is much faster than the while loop for the Integer optimized case:

    if (is Integer first) {
        // unbox; don't infer type Value&Integer
        variable Integer sum = 0;
        values.each(void (item) {
            assert (is Integer item);
            Integer unboxed = item;
            sum += unboxed;
        });
        assert (is Value result = sum);
        return result;
    }

Summary min/max/avg/rstddev/pct
opti-Each(ints); size=1024: 20/20/20/1% (100%)
optimized(ints); size=1024: 39/41/40/2% (197%)

but the overhead kills performance for n<=8:

Summary min/max/avg/rstddev/pct
optimized(ints); size=1: 129/151/142/7% (100%)
opti-Each(ints); size=1: 369/435/399/7% (284%)

and of course this ugly impl reads the first element twice, so it isn't really an option anyway. But for ceylon.math.integer::sum, maybe?

I wasn't able to see an advantage with reduce over the standard while loop:

Summary min/max/avg/rstddev/pct
lang::sum(ints); size=1: 102/118/109/5% (100%)
   Reduce(ints); size=1: 125/134/131/2% (122%)

Summary min/max/avg/rstddev/pct
lang::sum(ints); size=8: 73/76/75/1% (100%)
   Reduce(ints); size=8: 86/99/93/5% (117%)

Summary min/max/avg/rstddev/pct
lang::sum(ints); size=1024: 70/79/73/5% (100%)
   Reduce(ints); size=1024: 72/75/74/2% (103%)

CeylonMigrationBot · 2015-08-14T04:06:30Z

[@gavinking] reduce() is much faster in the case of an ArraySequence.

CeylonMigrationBot · 2015-08-14T04:15:29Z

[@jvasileff] Ok, weird.

print(type(ints));
// ceylon.language::ArraySequence<ceylon.language::Integer>

But hotspot is a strange beast.

CeylonMigrationBot · 2015-10-01T15:46:20Z

[@gavinking] Apologies, @jvasileff, I have pushed a copy/paste of your code instead of merging this PR because I wanted to make it a native implementation. I know it's a bit rude. I try not to do that.

CeylonMigrationBot · 2015-10-01T15:46:45Z

[@gavinking] P.S. @jvasileff thanks for the contribution!

CeylonMigrationBot · 2015-10-01T21:35:58Z

[@jvasileff] No problem at all. Looks like the approach saved us both some time.

@jvasileff

this is based on code + ideas by @jvasileff in #5606, but using native

CeylonMigrationBot added t-optimization m-language labels Nov 14, 2015

CeylonMigrationBot assigned gavinking Nov 14, 2015

CeylonMigrationBot added this to the 1.2 milestone Nov 14, 2015

CeylonMigrationBot closed this as completed Nov 14, 2015

CeylonMigrationBot mentioned this issue Nov 14, 2015

experiment with sum()/product() optimization for min()/max() #5651

Closed

gavinking added a commit that referenced this issue Nov 14, 2015

optimized implementations of sum() and product()

d174942

this is based on code + ideas by @jvasileff in #5606, but using native

eclipsewebmaster unassigned gavinking Dec 6, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make ceylon.language::sum fast #5606

make ceylon.language::sum fast #5606

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Oct 1, 2015

CeylonMigrationBot commented Oct 1, 2015

CeylonMigrationBot commented Oct 1, 2015

make ceylon.language::sum fast #5606

make ceylon.language::sum fast #5606

Comments

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 13, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Aug 14, 2015

CeylonMigrationBot commented Oct 1, 2015

CeylonMigrationBot commented Oct 1, 2015

CeylonMigrationBot commented Oct 1, 2015