median of an empty array is undefined #101
The difference here makes sense. The median is by definition either a data value (if your list is of odd length) or the arithmetic mean of the middle two values (if your list is of even length). This definition assumes that the length is either even or odd, so if you have a length of zero, there's no reasonable definition. Mean on the other hand is the sum of the elements in the list divided by the number of elements. If there are 0 elements, regardless of how you define the sum, the division will produce |
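To make the two definitions concrete, here is a minimal sketch in Julia. The names `naive_median` and `naive_mean` are hypothetical helpers written for this illustration, not the functions in Base or Statistics. The point is only that the median's odd/even case split has no branch for length zero, while the sum-over-count formula for the mean still evaluates for floating-point input.

```julia
# Hypothetical helpers for illustration only; not the library implementations.

# Median by the textbook definition: the middle value for odd length,
# the arithmetic mean of the two middle values for even length.
function naive_median(v::AbstractVector{<:Real})
    n = length(v)
    # Neither the odd nor the even case applies when n == 0, so we throw.
    n == 0 && throw(ArgumentError("median of an empty array is undefined"))
    s = sort(v)
    mid = div(n + 1, 2)
    return isodd(n) ? float(s[mid]) : (s[mid] + s[mid + 1]) / 2
end

# Mean as sum divided by count. For an empty Float64 vector this is 0.0 / 0,
# which is NaN under IEEE floating-point arithmetic.
naive_mean(v::AbstractVector{Float64}) = sum(v) / length(v)
```

With these definitions, `naive_mean(Float64[])` evaluates to `NaN`, while `naive_median(Float64[])` has no case to fall into and raises an error.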
Hello, I think
When the set is empty, there can be no way of preferring any particular location for the centre, so the answer is undefined (in both cases). For me, allowing a generalised definition for this boundary case, or restricting the function domain by returning an error, should be consistent.
We already have that inconsistency:
julia> mean(Float64[])
NaN
julia> mean(Rational{Int64}[])
ERROR: DivideError: integer division error
 in //(::Rational{Int64}, ::Rational{Int64}) at ./rational.jl:33
 in mean(::Array{Rational{Int64},1}) at ./statistics.jl:28
In my opinion this is a great use case for a Nullable (or a Result type). Adding to the list of things that don't work:
julia> var([])
ERROR: MethodError: no method matching zero(::Type{Any})
@bjarthur Do you have a use case where it would be convenient to handle the |
@johnmyleswhite, I don't like the idea of using Nullable here; there is a big difference between a number not being available (Nullable) versus the answer being undefined (an exception or NaN), and we don't use Nullable for this purpose anywhere else (e.g. |
I agree with those issues, but that still leaves a choice between a result type (which I believe several folks are seriously considering building into the language) and throwing an exception. The issue is IMO a design decision about whether you want control flow to happen automatically or whether you want people to get a wrapper object and then decide whether to throw based on the presence or absence of a valid result in the wrapper.
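The two designs can be contrasted in a short sketch. It assumes Julia 1.x with the Statistics standard library, where `median` throws an `ArgumentError` on empty input. `try_median`, `describe` and `describe_throwing` are hypothetical names, and `Union{Nothing, Float64}` only stands in for the Result type, which the thread discusses as a proposal rather than an existing feature.

```julia
using Statistics

# Hypothetical wrapper: returns `nothing` instead of throwing on empty input.
function try_median(v::AbstractVector{<:Real})
    return isempty(v) ? nothing : Float64(median(v))
end

# Wrapper style: the caller inspects the result and decides what to do.
function describe(v)
    m = try_median(v)
    return m === nothing ? "no median" : "median = $m"
end

# Exception style: control flow happens automatically unless the caller catches.
function describe_throwing(v)
    try
        return "median = $(Float64(median(v)))"
    catch err
        # Only handle the empty-input error; anything else propagates.
        err isa ArgumentError || rethrow()
        return "no median"
    end
end
```

Both `describe(Int[])` and `describe_throwing(Int[])` end up at the same fallback string; the difference is whether the caller must opt in to checking (wrapper) or opt in to catching (exception).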
Using a Nullable/Result type here would be a major PITA because it would force you to call |
i stumbled across this difference, btw. were a change made, i'd prefer having |
Throwing an error from |
A situation where throwing an error is really annoying is when computing summary statistics over groups in the presence of missing values. See this Discourse post for an illustration. That can also happen if there are empty groups. Now that we have got rid of We could also simply add an argument to specify the return value you want if the input is empty, just like for reductions. Anyway I think we should do something about this as it's a painful difference for users coming from other languages (R, Pandas...), which don't throw an error. |
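A minimal sketch of the "return value for empty input" idea, assuming Julia 1.x with the Statistics standard library. The function name `median_or` and its `default` keyword are hypothetical, chosen by analogy with the `init` keyword of reductions; neither is part of any library.

```julia
using Statistics

# Hypothetical helper: `default` is returned when no data remain
# after missing values are dropped.
function median_or(itr; default=NaN)
    vals = collect(skipmissing(itr))   # drop missing values first
    return isempty(vals) ? default : median(vals)
end

# Per-group summaries: group "b" holds only missing values, group "c" is empty.
groups = Dict(
    "a" => [1.0, missing, 3.0],
    "b" => [missing, missing],
    "c" => Float64[],
)
medians = Dict(k => median_or(v) for (k, v) in groups)
```

Here groups "b" and "c" yield `NaN` instead of aborting the whole summary, and a caller who prefers a different sentinel can pass, for example, `default=nothing`.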
this would be a breaking change and would have to wait until 2.0, no?
No, AFAIK turning an error into something else is allowed.
i see the discussion of treating median and mean identically re. NaN. has it also been discussed whether they should treat empty arrays identically? currently they do not.
really simple change to make median return NaN. all that has to change is the first line.