Make covariance and correlation work for iterators, skipmissing in particular. #34

pdeffebach · 2020-04-27T15:42:10Z

Currently cov and cor fail with iterators that are not vectors, e.g. skipmissing iterators or vectors with Iterators.Filter applied to them. This is part of the plan I described at JuliaLang/julia#35050 (comment) to improve quality of life when working with missing values.

Thanks to Missings.skipmissings (JuliaData/Missings.jl#111), this allows computing the correlation without missing values via cor(skipmissings(x, y)...).

Supersedes #30 because it is a more minimal implementation.

mschauer · 2020-04-27T15:47:04Z

Project.toml

@@ -1,5 +1,4 @@
 name = "Statistics"
-uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"


Intentional?

Yes. It's a hack to make sure julia knows to load this folder, it's described here for Pkg.

Normally the Travis script does that automatically, so you can revert this: https://github.com/JuliaLang/Statistics.jl/blob/master/.travis.yml#L24

Though you need it to run tests locally.

pdeffebach · 2020-04-27T16:13:13Z

I have added the functionality we want and added tests. What's left, assuming what I've written is okay, is to disallow some things that only kind of work at the moment.

X = rand(10, 2);
y = skipmissing(rand(10));

cov(X, y)

The above works, but we can't add any of the vardim arguments that one can for cov(X::Matrix, y::Vector). I can either add the full combinations of all these methods (cov(X::AbstractMatrix, y::Any; vardim) etc.) or we can disallow them for the present.

nalimilan · 2020-04-28T08:35:45Z

src/Statistics.jl

+
+Return the number one.
+"""
+cor(itr::Any) = one(real(eltype(collect(itr))))


Better first check whether Base.IteratorEltype(itr) isa Base.HasEltype && isconcretetype(eltype(itr)), and in that case avoid calling collect.

Also remove the docstring for AbstractVector below, which is just a special case of this one.
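
The eltype check suggested above could look roughly like the following. This is only a sketch: `_one_like` is a hypothetical helper name, not something taken from the PR, and the fallback branch assumes that `collect` narrows the element type (true for generators, not for an already-built `Vector{Any}`).

```julia
# `_one_like` is a hypothetical name used only for this sketch.
# It returns the multiplicative identity of the (real) element type,
# and only materializes the iterator when the eltype is not known up front.
function _one_like(itr)
    if Base.IteratorEltype(itr) isa Base.HasEltype && isconcretetype(eltype(itr))
        # Fast path: the iterator advertises a concrete eltype, no collect needed.
        return one(real(eltype(itr)))
    end
    # Slow path: collect so that the element type can be inferred from the values.
    return one(real(eltype(collect(itr))))
end

_one_like(skipmissing([1.0, missing, 3.0]))  # fast path, eltype is Float64
_one_like(x for x in [1, 2, 3])              # slow path, generators have unknown eltype
```

The fast path matters most for skipmissing, which reports a concrete eltype whenever the wrapped array does.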

nalimilan · 2020-04-28T08:37:36Z

src/Statistics.jl

@@ -630,7 +663,7 @@ function cov2cor!(C::AbstractMatrix, xsd::AbstractArray, ysd::AbstractArray)
 end

 # corzm (non-exported, with centered data)
-
+corzm(x::Any) = corzm(collect(x))


Same remark here and for corm as for cor about using the eltype when it's known.

nalimilan · 2020-04-28T08:40:38Z

src/Statistics.jl

+
+Compute the covariance between the iterators `x` and `y`. If `corrected` is `true` (the
+default), computes ``\\frac{1}{n-1}\\sum_{i=1}^n (x_i-\\bar x) (y_i-\\bar y)^*`` where
+``*`` denotes the complex conjugate and `n = length(collect(x)) = length(collect(y))`. If `corrected` is


Suggested change

``*`` denotes the complex conjugate and `n = length(collect(x)) = length(collect(y))`. If `corrected` is

``*`` denotes the complex conjugate and ``n`` the number of elements. If `corrected` is

nalimilan · 2020-04-28T08:41:49Z

src/Statistics.jl

+
+Compute the variance of the iterator `itr`. If `corrected` is `true` (the default) then the sum
+is scaled with `n-1`, whereas the sum is scaled with `n` if `corrected` is `false` where 
+`n = length(collect(itr))`.


Suggested change

`n = length(collect(itr))`.

``n`` is the number of elements.

nalimilan · 2020-04-28T08:44:44Z

src/Statistics.jl

+"""
+function cov(itr::Any; corrected::Bool=true)
+    x = collect(itr)
+    covm(x, mean(x); corrected=corrected)


Better call covzm directly to avoid an additional copy:

Suggested change

covm(x, mean(x); corrected=corrected)

covzm(map!(t -> t - xmean, x, x); corrected=corrected)

Same for the two-argument method.
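
To make the suggestion concrete, here is a minimal sketch of the "collect once, center in place" pattern for scalar elements. `cov_itr` is a hypothetical name, and the final division is written out by hand instead of calling the internal covzm, so this is an illustration of the idea and not the PR's code.

```julia
using Statistics

# Hypothetical helper: variance of a single iterator of numbers.
function cov_itr(itr; corrected::Bool=true)
    x = collect(itr)              # the only allocation; always a fresh copy
    xmean = mean(x)
    map!(t -> t - xmean, x, x)    # center in place, reusing the collected buffer
    # Zero-mean covariance of scalar data: sum of squared magnitudes over n or n - 1.
    return sum(abs2, x) / (length(x) - Int(corrected))
end

cov_itr(skipmissing([1.0, missing, 2.0, 4.0]))
```

Because `collect` always returns a new array, mutating `x` with `map!` never touches the caller's data, even when the caller passes a plain vector.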

nalimilan · 2020-04-28T08:45:10Z

src/Statistics.jl

@@ -518,16 +519,32 @@ end
 # covm (with provided mean)
 ## Use map(t -> t - xmean, x) instead of x .- xmean to allow for Vector{Vector}
 ## which can't be handled by broadcast
+covm(itr::Any, itrmean; corrected::Bool=true) = 
+    @show covm(collect(itr), itrmean; corrected=corrected)


Suggested change

@show covm(collect(itr), itrmean; corrected=corrected)

covm(collect(itr), itrmean; corrected=corrected)


nalimilan

Thanks!

If these methods give correct results, then it's OK to allow it and only add keyword arguments later if needed. But they should be tested.

Can you also add tests for cov and cor?

nalimilan · 2020-04-28T16:23:49Z

src/Statistics.jl

@@ -644,9 +671,10 @@ corzm(x::AbstractMatrix, y::AbstractMatrix, vardim::Int=1) =
    cov2cor!(unscaled_covzm(x, y, vardim), sqrt!(sum(abs2, x, dims=vardim)), sqrt!(sum(abs2, y, dims=vardim)))

 # corm
-
+corm(x::Any, xmean) = corm(collect(x), xmean)


Also apply the eltype check here.

pdeffebach · 2020-04-29T15:12:13Z

I have added many more tests. Everything is covered.

The rules for

cov(X::Matrix, y::itr)

are that the rows of X must be observations. You can't use dims = 1 in this scenario. I experimented a bit with methods to make it work, but I ended up with never-ending method ambiguities because of the implementation.

I'm not sure how best to put that into a docstring.

Future PRs should:

Allow for columns to be observations with cov(X::Matrix, y::itr)
Allow for iterators of iterators, collecting them into matrices

nalimilan · 2020-04-29T16:50:57Z

src/Statistics.jl

@@ -630,7 +653,13 @@ function cov2cor!(C::AbstractMatrix, xsd::AbstractArray, ysd::AbstractArray)
 end

 # corzm (non-exported, with centered data)
-
+function corzm(itr::Any) 


Can you put this code in an internal method which will be called by all functions that need it? It's repeated three times.

Also, remove the trailing whitespace:

Suggested change

function corzm(itr::Any)

function corzm(itr::Any)

bkamins · 2020-04-30T08:29:45Z

If you have two ::Any arguments you always seem to collect both. You should collect them only if they are not AbstractVector I think (the case is if you pass an AbstractVector and an iterator).
Maybe we should think about special casing of AbstractArray here. They are iterable, and for example I think that in many cases allowing the operation on them is problematic (it can lead to unintuitive results) - I guess you wanted to allow iterators that are not AbstractArray right?

nalimilan · 2020-04-30T08:33:27Z

If you have two ::Any arguments you always seem to collect both. You should collect them only if they are not AbstractVector I think (the case is if you pass an AbstractVector and an iterator).

The methods mutate the result to subtract the mean, so calling collect is correct here I think unless something is missing.

Maybe we should think about special casing of AbstractArray here. They are iterable, and for example I think that in many cases allowing the operation on them is problematic (it can lead to unintuitive results) - I guess you wanted to allow iterators that are not AbstractArray right?

Yes that's something that bothered me too. Probably only vectors should be allowed to be mixed with other iterators. For other cases it's not clear what should happen so better throw an error for now.

bkamins · 2020-04-30T08:37:51Z

The methods mutate the result to subtract the mean

I do not think they always do. Have a look at corm implementation as an example.

nalimilan · 2020-04-30T08:42:29Z

Ah right. So we need corm(x::AbstractVector, mx, y::Any, my) and corm(x::Any, mx, y::AbstractVector, my) to avoid a copy.

pdeffebach · 2020-04-30T13:50:42Z

I was under the impression that collect was a no-op for vectors. I will add those methods.

pdeffebach · 2020-04-30T14:19:39Z

@bkamins you are right about non-allocations. But adding methods results in tons of method ambiguity errors.

To resolve this without re-thinking the whole dispatch scheme, I implemented

_lazycollect(x::Any) = collect(x)
_lazycollect(x::AbstractVector) = x

just in places where we don't modify x. If we do modify it I use collect.

This result feels hacky, but it's better than method ambiguities. This is ready for review, by Milan and hopefully by Triage.
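
For readers following along, the difference between the two helpers and plain collect can be checked directly. The definitions are copied from the comment above; the example values are made up for illustration.

```julia
_lazycollect(x::Any) = collect(x)
_lazycollect(x::AbstractVector) = x

v = [1.0, 2.0, 3.0]

same_object = _lazycollect(v) === v   # vectors are passed through untouched
copied      = collect(v) !== v        # collect is not a no-op: it returns a copy

# Non-vector iterators are still materialized into a Vector.
materialized = _lazycollect(skipmissing([1.0, missing, 3.0]))
```

This is why `_lazycollect` is only safe where the result is read and never mutated: for a vector input, any in-place update would leak back to the caller.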

nalimilan · 2020-04-30T15:41:03Z

src/Statistics.jl

+
+    corm(cx, mean(cx), cy, mean(cy))
+end
+
 """
    cor(x::AbstractVector, y::AbstractVector)


Remove this docstring which is a special case of the previous one.

nalimilan · 2020-04-30T15:44:42Z

src/Statistics.jl

+_lazycollect(x::Any) = collect(x)
+_lazycollect(x::AbstractVector) = x
+
+function _matrix_error(x, y, fun)


Why not just throw an error from _lazycollect if passed a matrix? The error message will be less precise but that's not a big deal. And then you can also throw an error for any AbstractArray that isn't an AbstractVector, which is a case which isn't allowed currently and should probably remain an error.

Yes I would just add a special method to _lazycollect for other types. Actually I would do it for AbstractArray not only AbstractMatrix. The only thing to think about if we want to allow 0-dimensional AbstractArrays (they would produce NaN anyway).

I think that collecting 2 or more dimensional arrays in places where we expect vectors is not useful (but we can discuss this).

It doesn't work because we don't use it all the time, for instance when we collect and use map!.

I can add a _collect_if_itr_or_vec method for that scenario.
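
One possible shape for the "error on non-vector arrays" idea discussed in this thread is sketched below. The name `_lazycollect_strict` and the error message are hypothetical, and this sketch also rejects 0-dimensional arrays, which the thread leaves as an open question.

```julia
# Hypothetical variant of _lazycollect that rejects arrays which are not vectors.
_lazycollect_strict(x::Any) = collect(x)
_lazycollect_strict(x::AbstractVector) = x
_lazycollect_strict(x::AbstractArray) =
    throw(ArgumentError("expected a vector or a non-array iterator, " *
                        "got an array with $(ndims(x)) dimensions"))

_lazycollect_strict([1, 2, 3])                # returned as-is
_lazycollect_strict(x^2 for x in 1:3)         # collected into a Vector
# _lazycollect_strict(rand(2, 2))             # would throw an ArgumentError
```

Dispatch picks the AbstractVector method over the AbstractArray one because it is more specific, so no ambiguity arises between the three definitions.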

bkamins · 2020-04-30T17:48:28Z

Also many functions in Statistics allow passing any iterator, so it sounds consistent to allow them here too.

Yes, but these "many functions" have also a defined meaningful behaviour for AbstractArray, not only vectors and matrices - and I think this is an important distinction (so I feel it is safer to exclude AbstractArray for now other than vectors and matrices).

Actually I would have preferred a completely different design for cov/cor interface since we have eachcol/eachrow and eachslice functions to tell cov/cor along which dimensions the calculation should be made, but it is too disruptive so probably it would not be accepted anyway (but if you are interested I could write down a proposal).

Now regarding SkipMissings - I think it is needed anyway, as otherwise there is no way to tell cov/cor if we want to do e.g. pairwise or complete observations approach.

pdeffebach · 2020-04-30T18:53:15Z

(so I feel it is safer to exclude AbstractArray for now other than vectors and matrices).

Current implementation excludes cov(Any, Matrix) as well as other higher dimensional arrays.

Actually I would have preferred a completely different design for cov/cor interface since we have eachcol/eachrow and eachslice functions to tell cov/cor along which dimensions the calculation should be made,

Yes. The original attempt at #30 was essentially this, working only with iterators. I would prefer an implementation which doesn't know about matrix inputs and only cares about iterators. Taking advantage of BLAS etc. for X'X could be an implementation detail. But that is very breaking. Perhaps eachrow based workflows will dominate dims = 1 workflows in 2.0.

Now regarding SkipMissings - I think it is needed anyway, as otherwise there is no way to tell cov/cor if we want to do e.g. pairwise or complete observations approach.

This PR is motivated by the new skipmissings (note the s) in Missings.jl, which I think solves this problem.

bkamins · 2020-04-30T19:25:55Z

Ah - then I would also prefer iterators as you do 😄. Actually @nalimilan proposed pairwise(fun, iterator) that would do exactly this and fun can be anything (cor/cov in this case).

And I fully agree that having eachrow etc. based design rather than dims is my preference (then we could easily feed any tabular type to such functions, eg. eachcol(data_frame)).

Regarding skipmissings - yes, I have just noticed it was added but not released yet. But how do you plan to handle "pairwise complete observations" with this design cleanly?

pdeffebach · 2020-04-30T19:27:42Z

Regarding skipmissings - yes, I have just noticed it was added but not released yet. But how do you plan to handle "pairwise complete observations" with this design cleanly?

Say x contains missing values but y does not. Then

sx, sy = skipmissings(x, y)
cov(sx, sy)

bkamins · 2020-04-30T19:32:51Z

But the typical use case is the following setting:

m = [  missing   2          3
       5           missing  10
      11         12           missing
      91         22         15]

and you want to calculate correlation matrix of the columns using "pairwise complete observations".

pdeffebach · 2020-04-30T19:37:02Z

There is currently no skipmissing implementation that preserves the dimensionality of an array. So that is an open problem that will have to be solved by additions to skipmissing or an iteration focused cov implementation. (Or turning that matrix into a data frame).
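
Until such an API exists, "pairwise complete observations" for a matrix with missing values can be computed by hand. The following is a rough sketch under the assumption that all non-missing entries are real numbers; `pairwise_complete_cor` is a hypothetical name and not part of Statistics or Missings.

```julia
using Statistics

# Hypothetical helper: correlation matrix of the columns of `m`, where each
# entry (i, j) uses only the rows in which both column i and column j are present.
function pairwise_complete_cor(m::AbstractMatrix)
    p = size(m, 2)
    C = Matrix{Float64}(undef, p, p)
    for j in 1:p, i in 1:p
        # Rows that are complete for this particular pair of columns.
        keep = [!ismissing(m[r, i]) && !ismissing(m[r, j]) for r in axes(m, 1)]
        xi = Float64.(m[keep, i])
        xj = Float64.(m[keep, j])
        C[i, j] = cor(xi, xj)
    end
    return C
end

m = [1.0 2.0 missing
     2.0 missing 1.0
     3.0 6.0 5.0
     4.0 8.0 2.0
     missing 1.0 3.0]

pairwise_complete_cor(m)
```

Note that each entry may be based on a different subset of rows, so the resulting matrix is symmetric but not guaranteed to be positive semidefinite.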

Keno · 2020-04-30T19:37:22Z

We discussed this a bit on triage and while we don't feel super qualified to comment, we felt that having independent iterators for the two arguments was likely problematic, because iterators don't in general have a strong guarantee over their ordering. E.g. cov(skipmissing(x), skipmissing(y)) would obviously be wrong. We felt that a more sensible API would take a single iterator that iterates over pairs. A related issue here is that the skipmissings API is a bit odd since it returns coupled iterators, but returns them as a tuple. It seemed like a more general design would be to have it return one iterator of pairs and then, if you only need one of them, project it down to the appropriate pair.
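
The misalignment problem and the single-iterator-of-pairs alternative can be illustrated with a small sketch. Both `complete_pairs` and `cov_pairs` are hypothetical names invented for this example; they are not part of Statistics or Missings.

```julia
using Statistics

# Hypothetical: one iterator of (x, y) pairs, skipping rows where either is missing.
complete_pairs(x, y) =
    ((a, b) for (a, b) in zip(x, y) if !ismissing(a) && !ismissing(b))

# Hypothetical: covariance computed from a single iterator of pairs.
function cov_pairs(pairs; corrected::Bool=true)
    xs = Float64[]
    ys = Float64[]
    for (a, b) in pairs
        push!(xs, a)
        push!(ys, b)
    end
    return cov(xs, ys; corrected=corrected)
end

x = [1.0, missing, 3.0, 4.0]
y = [2.0, 5.0, missing, 8.0]

# Independent skipmissing iterators have equal length here but pair up the
# wrong observations: (1, 2), (3, 5), (4, 8) instead of (1, 2), (4, 8).
misaligned = collect(zip(skipmissing(x), skipmissing(y)))

aligned_cov = cov_pairs(complete_pairs(x, y))
```

With a single iterator of pairs the coupling between the two variables is carried by the data structure itself, so no ordering guarantee between two separate iterators is needed.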

bkamins · 2020-04-30T19:52:31Z

@Keno - this is exactly what needs to be done in an ideal world in my opinion (and then you can have several rules what you "pass down" to the cor function in this case).

But doing it right would require a significant API change (i.e. be breaking), so what should the practical approach be? Should a breaking PR be put on the table so that it can be judged for inclusion in a 2.0 release?

An alternative is to leave cor/cov "as is" (i.e. in particular unaware of missing) and add a more general pairwise design, as proposed by @nalimilan, that would be recommended for use in more complex cases.

What would be the preference here?

Keno · 2020-04-30T19:56:38Z

I'd say design the interface you want and then figure out whether it's possible to do most of it backwards compatible. If not, stdlibs can also go to 2.0 before base Julia (that's the whole reason why we introduced them).

bkamins · 2020-04-30T20:11:07Z

So let me put my proposal on the table (it is essentially what I think @pdeffebach had in mind in #30). I write it for cov; for cor it is the same:

cov(itr; corrected::Bool=true, skipmissing::Symbol=:all)

Here itr is understood to be iterator of iterators to calculate the cov for. The result will be a square Matrix with length(itr) rows and columns where at index position [i,j] we keep covariance of i-th and j-th element of itr. skipmissing kwarg decides how missings should be handled (no handling at all is the default, other options are "complete cases" and "pairwise complete observations", other can be also added if we find it useful).

The second form is:

cov(itr1, itr2; corrected::Bool=true, skipmissing::Symbol=:all)

That does the same but between itr1 and itr2. The result is length(itr1) x length(itr2) matrix.

In this design we can treat everything inside itr as an iterable.

To be less breaking we can keep methods:

cov(X::SparseArrays.SparseMatrixCSC; dims, corrected)
cov(x::AbstractArray{T,1} where T<:Number; corrected)
cov(X::AbstractArray{T,2} where T<:Number; dims, corrected)
cov(x::AbstractArray{T,1} where T<:Number, y::AbstractArray{T,1} where T<:Number; corrected) 
cov(X::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T, Y::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T<:Number; dims, corrected)

(note that I have added a <:Number restriction to T which is not present in Statistics.jl now). In this way we do the "sensible thing" in old cases (if a collection contains Number it does not make much sense to treat it as an iterable) and at the same time provide a general interface.

Alternatively we could allow <:Union{Missing, Number} instead of <:Number if we felt we want to allow missings in the old methods (though - as discussed here it is not super useful, as we will just produce missing in the output).

pdeffebach · 2020-04-30T20:18:38Z

Thank you for your comments. Here are my thoughts:

I think it's important to understand the purpose skipmissing serves. A researcher gets a data-set and writes functions on a subset of their data -- one without missing values.

function analyze(x, y)
    x .+ y .-  mean(y)
end

Now they move onto the rest of their data, which now has missing values. They have to go back and change their analyze function to make it work. With current behavior, they don't have to change their function analyze at all. They can call analyze(skipmissings(x, y)...) and be fine. With proposed functionality, they would have to call analyze(unzip(skipmissings(x, y))...), which is not that bad.

So skipmissings should emulate as closely as possible a workflow based off of Vectors without missing data.

That said, if the researcher wrote their analyze function with zips in mind -- iterators of tuples -- then the proposed behavior for skipmissing, which creates an iterator of Tuples, would be intuitive.

Therefore, I would want cov(itr, itr) to work unless we deprecate cov(Vector, Vector). Similarly, I would want skipmissings to return a tuple of iterators unless we feel that the dominant way of working with Vectors is by zipping them.

nalimilan · 2020-05-02T10:57:41Z

See previous discussion at JuliaStats/StatsBase.jl#343.

I think there are legitimate use cases for the current design of skipmissings as @pdeffebach noted above. However the question of how cor and cov should allow skipping missing values (this PR) is semi-independent from that of skipmissings: we can provide a more convenient API for cor and cov but still need skipmissings for other functions which do not offer a convenient way to skip missing values.

Also, allowing to pass any iterator to cov and cor could be useful for other cases than skipping missings. For example, cor((log(x) for x in X), (log(y) for y in Y)) could be used to compute the log correlation without allocating a temporary copy. Though maybe that's not a big need, and if we introduced an AbstractIndexable supertype or a trait for AbstractArray and Broadcasted (JuliaLang/julia#31020 (comment)) we could restrict the signature to it, and one would write instead e.g. @lazy cor(log.(X), log.(Y)).

Now, regarding the cor and cov API, @bkamins's proposal to add a skipmissing keyword argument is one solution. But it doesn't address the very basic case that this PR is about, which is to compute the correlation between just two vectors (as opposed to pairwise correlation between multiple variables). More generally, I don't think we can drop the current behavior of cor(::Vector, ::Vector), and having both this method and cor(itr1, itr2) would be confusing (if we keep it in the long term).

So an alternative I had in mind is to introduce pairwise(fun, itr1[, itr2]; skipmissing) to compute pairwise correlation between multiple variables, which would be called as e.g. pairwise(cor, eachcol(X), skipmissing=:obs) or pairwise(cor, eachcol(X), eachcol(Y), skipmissing=:complete). The advantage is that it would also work for Distances.jl, which has a compatible API. The drawback is that the former is relatively verbose for a very common operation, if you compare with R's cor(X, use="complete") or Stata's pwcorr, casewise. So we could allow cor(X, skipmissing=:complete) for convenience.
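
A bare-bones version of the pairwise idea, without any missing-value handling, might look like this. `pairwise_sketch` is a hypothetical name, and the skipmissing keyword from the proposal is deliberately left out because its exact semantics are not settled in this discussion.

```julia
using Statistics

# Hypothetical sketch: apply `f` to every pair of elements drawn from two
# iterators of variables and return the results as a matrix.
function pairwise_sketch(f, xs, ys=xs)
    cx = collect(xs)
    cy = collect(ys)
    return [f(a, b) for a in cx, b in cy]
end

X = [1.0 2.0
     2.0 1.0
     3.0 5.0
     4.0 3.0]

# Pairwise correlation between the columns of X.
P = pairwise_sketch(cor, eachcol(X))
```

Because `f` is just called on pairs of elements, the same function works with any two-argument statistic, which is the composability argument made above for Distances.jl.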

Probably this shouldn't be discussed here... :-) BTW, another design challenge is to allow combining this with weights to allow computing weighted pairwise correlation while skipping missing values. Composability would be great to have in that case.

bkamins · 2020-05-02T21:22:55Z

src/Statistics.jl

@@ -504,6 +516,10 @@ function covzm(x::AbstractMatrix, vardim::Int=1; corrected::Bool=true)
    A .= A .* b
    return A
 end
+function covzm(x::Any, y::Any; corrected::Bool = true)


In what case can covzm get Any? It is an internal method and I thought it could only get already processed data.

The same question applies to covm below.

bkamins · 2020-05-02T21:25:40Z

src/Statistics.jl

 """
+function cov(itr::Any; corrected::Bool=true)


do we want to allow 0 or more than 2 dimensional arrays here?

bkamins · 2020-05-02T21:30:28Z

src/Statistics.jl


 Return the number one.
 """
+cor(itr::Any) = _return_one(itr)


If we touch this part of code then I do not understand the following (this is in general a separate PR, but we implement this method here, so I would like to clarify what is intended):

julia> x = rand(10, 2)
10×2 Array{Float64,2}:
 0.281236  0.0338547
 0.691944  0.830649
 0.627939  0.62187
 0.251539  0.162161
 0.649065  0.627302
 0.67754   0.227709
 0.904292  0.481443
 0.768511  0.439196
 0.56268   0.885131
 0.520348  0.0185026

julia> y = collect(eachrow(x))
10-element Array{SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true},1}:
 [0.28123641854321346, 0.03385469973171129]
 [0.6919437304763723, 0.8306494675149141]
 [0.6279387501997036, 0.6218700109315964]
 [0.2515386883173856, 0.16216070470557398]
 [0.6490648176650291, 0.6273015737594985]
 [0.6775400549397448, 0.22770867432380815]
 [0.9042917317032804, 0.4814426050177454]
 [0.7685107010600403, 0.4391959677108144]
 [0.562680390776801, 0.8851305746997942]
 [0.5203479821836228, 0.018502575073925165]

julia> cov(x)
2×2 Array{Float64,2}:
 0.0409994  0.0321085
 0.0321085  0.0983311

julia> cov(y)
2×2 Array{Float64,2}:
 0.0409994  0.0321085
 0.0321085  0.0983311

julia> cor(x)
2×2 Array{Float64,2}:
 1.0       0.505691
 0.505691  1.0

julia> cor(y)
ERROR: MethodError: no method matching zero(::Type{SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true}})

and I do not understand why cov and cor behave differently in how they handle x and y.

bkamins · 2020-05-02T21:32:01Z

OK - if we want to go forward with the API proposal in this PR I have left some comments related only to it.

bkamins · 2020-05-02T21:49:32Z

Ah - and a general question, since cov accepts Vector{Vector} as a single argument and with new design "iterable of Vector" is accepted then do we want to accept "iterable of iterables"?

CameronBieganek · 2023-12-01T22:56:23Z

We discussed this a bit on triage [...] We felt that a more sensible API would take a single iterator that iterates over pairs.

I just became aware of this method of cor:

"""
cor(x::AbstractVector)

Return the number one.
"""

This basically torpedoes any idea of having a cor(itr) method where itr iterates individual observations. If we added that, then the following two calculations would return different numbers:

itr = ((1, 4), (3, 2), (5, 8), (7, 6))
cor(itr)

itr = [(1, 4), (3, 2), (5, 8), (7, 6)]
cor(itr)

Given this unfortunate situation, perhaps triage would reconsider allowing cor(itr1, itr2)? (I mean where itr1 and itr2 just iterate numbers.)
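
One way to see the clash, and to sidestep it, is to keep the pairs-based computation under a separate name. The sketch below uses `cor_pairs`, a hypothetical function that does not exist in Statistics, so it cannot collide with the existing one-argument cor method.

```julia
using Statistics

# Existing behaviour: the one-argument method on a numeric vector returns one.
existing = cor([1.0, 2.0, 3.0])

# Hypothetical: correlation from a single iterable of (x, y) observations.
function cor_pairs(itr)
    xs = Float64[]
    ys = Float64[]
    for (a, b) in itr
        push!(xs, a)
        push!(ys, b)
    end
    return cor(xs, ys)
end

obs_tuple  = ((1, 4), (3, 2), (5, 8), (7, 6))
obs_vector = [(1, 4), (3, 2), (5, 8), (7, 6)]

# Under a separate name, the tuple and the vector of pairs agree.
r_tuple  = cor_pairs(obs_tuple)
r_vector = cor_pairs(obs_vector)
```

Here both calls give the same correlation, whereas overloading cor itself would have to choose between this meaning and the existing "return one" method for vectors.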

aplavin · 2023-12-02T01:35:18Z

stdlibs can also go to 2.0 before base Julia (that's the whole reason why we introduced them)

So, maybe removing single-argument cor(x) could be done in Statistics 2.0 in a reasonable timeframe? Pretty sure there are other inconsistencies that can be fixed, or improvements can be made, that are breaking and would fit 2.0 nicely.

For functions where arguments are fundamentally coupled (like cor(x, y)), it does make most sense to accept an iterable/collection of pairs instead of a pair of collections. And luckily, Julia has both convenient and zero-cost ways to create one iterator/collection/array from two.

Initial commit, collects everywhere

1cdf046

mschauer reviewed Apr 27, 2020

Add tests

f3e9641

pdeffebach changed the title ~~Initial commit, collects everywhere~~ Make covariance and correlation work for iterators, second attempt. Apr 27, 2020

nalimilan reviewed Apr 28, 2020

Respond to comments

2f9c4f8

nalimilan reviewed Apr 28, 2020

pdeffebach and others added 4 commits April 28, 2020 13:30

Apply suggestions from code review

52c18ea

Co-Authored-By: Milan Bouchet-Valat <[email protected]>

more comments -- ready for review

4620247

fix deleted line

b86ddba

many more tests

0221557

nalimilan reviewed Apr 29, 2020

pdeffebach and others added 3 commits April 29, 2020 12:56

Apply suggestions from code review

e3bc3cc

Co-Authored-By: Milan Bouchet-Valat <[email protected]>

Polish up tests

3493ed2

Merge branch 'iterator_cor2' of https://github.com/pdeffebach/Statist…

b940ae1

…ics.jl into iterator_cor2

pdeffebach added 3 commits April 30, 2020 09:41

Errors with matrices

8b49745

Add _return_one method for DRY

2b28908

Put pack uuid

e42c0b0

_lazycollect solution

cb3020c

nalimilan reviewed Apr 30, 2020

Apply suggestions from code review

36734bf

Co-authored-by: Milan Bouchet-Valat <[email protected]>

nalimilan added the triage label Apr 30, 2020

pdeffebach changed the title ~~Make covariance and correlation work for iterators, second attempt.~~ Make covariance and correlation work for iterators, skipmissing in particular. Apr 30, 2020

pdeffebach added 3 commits April 30, 2020 12:33

simplify error, add back uuid

b9f8f96

Futher simplify error

14c5701

add back uuid

11bd8f5

bkamins reviewed May 2, 2020


pdeffebach mentioned this pull request May 15, 2020

Add a skipmissing kwarg to select/transform/combine JuliaData/DataFrames.jl#2258

Open

nalimilan mentioned this pull request Apr 20, 2021

Release v1.0 JuliaData/Missings.jl#115

Closed

PGS62 mentioned this pull request Apr 20, 2021

Add a skipmissing argument to corkendall JuliaStats/StatsBase.jl#683

Closed

nalimilan mentioned this pull request Sep 27, 2021

Missing values and weighting #88

Open

pdeffebach mentioned this pull request Feb 13, 2023

Add spreadmissings JuliaData/Missings.jl#122

Open

CameronBieganek mentioned this pull request Dec 13, 2023

Document iterators that iterate elements in a well-defined order JuliaLang/julia#52518

Open
