Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected type promotion in reduce and mapreduce #21523

Closed
perrutquist opened this issue Apr 24, 2017 · 6 comments
Closed

Unexpected type promotion in reduce and mapreduce #21523

perrutquist opened this issue Apr 24, 2017 · 6 comments
Labels
types and dispatch Types, subtyping and method dispatch

Comments

@perrutquist
Copy link
Contributor

perrutquist commented Apr 24, 2017

The reduce function should, according to its docstring, "reduce the given collection with the given binary operator." I believe this is also what most programmers would expect a reduce function to do.

However, this is not always what it does. Some input types, e.g. Int32 are converted before the operator is applied. This can lead to unexpected results.

julia> my_op(a::Int32,b::Int32) = max(a,b)
my_op (generic function with 1 method)

julia> reduce(my_op, ones(Int32,42))
ERROR: MethodError: no method matching my_op(::Int64, ::Int64)
 in mapreduce_impl(::Base.#identity, ::#my_op, ::Array{Int32,1}, ::Int64, ::Int64, ::Int64) at ./reduce.jl:101
 in _mapreduce(::Base.#identity, ::#my_op, ::Base.LinearFast, ::Array{Int32,1}) at ./reduce.jl:168
 in mapreduce(::Function, ::Function, ::Array{Int32,1}) at ./reduce.jl:175
 in reduce(::Function, ::Array{Int32,1}) at ./reduce.jl:179

The exact same problem occurs with mapreduce(identity, my_op, ones(Int32,42)).

I realize that using an accumulator that is wider than the input could be beneficial in some special circumstances, such as if the sum function didn't exist and one wanted to create it using reduce(+, ...) without risking integer overflow. But in those cases one could supply a v0 argument of the desired return type (or there could be an optional accumulator_type argument).

@ararslan ararslan added the types and dispatch Types, subtyping and method dispatch label Apr 24, 2017
@stevengj
Copy link
Member

Duplicate of #20560

@stevengj
Copy link
Member

Hmm, I guess it is not quite a duplicate of that issue, but it is related. The resolution (#20607) should try to fix both problems.

@stevengj stevengj reopened this Apr 24, 2017
@perrutquist
Copy link
Contributor Author

perrutquist commented Apr 24, 2017

Issue #20560 is mainly about sum. Currently sum implemented as something like sum(a) = reduce(+, a) which makes the two issues related. But if we instead had something like sum(a) = reduce(+, zero(widen(eltype(a))), a) then the two issues would be disjoint.

To me, sum(a) means "the sum of the elements of a". This is a mathematical concept, not a programming one, and I would not necessarily expect the return type to be the same as the element type.

But reduce(+, a) means "apply + repeatedly". This is a programming concept, so for the return type to be something different than the return type of + makes no sense (to me).

It would actually be a nice feature if reduce(+,a) used a smaller accumulator than sum(a) and ran faster when SIMD instructions are available. My guess is that any programmer advanced enough to use mapreduce or reduce is also advanced enough to know about integer overflow. Less advanced programmers would use sum in any case, and the documentation should encourage them to do so.

@TotalVerb
Copy link
Contributor

Well stated. I too have never been a fan of reduce's auto-widening. In my opinion, it violates the principle of least surprise.

@StefanKarpinski
Copy link
Member

+100 @perrutquist

@perrutquist
Copy link
Contributor Author

Another benefit of making the distinction sum = "mathematical sum"; reduce= "iteration of + operator" is that it would be natural for sum to throw an OverflowError on integer overflow, sacrificing some speed for mathematical correctness.
If an M-bit accumulator is used to sum N-bit elements, then partial sums of 2^(M-N) elements can be computed without checking, so there would not be much of a performance penalty in common cases like M=64, N=32.
Advanced users could still use mapreduce(Int64, +, arr) to get the 64-bit sum of a 32-bit array without overflow checking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

No branches or pull requests

5 participants