-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interface compatibility with Distributions.jl ? #8
Comments
Thanks for the great issue! I was actually starting to wonder about mixtures of copula (or copulas of mixtures, as you wrote). I address the type vs instance problem in Instance vs Type version section of the doc. As you noted: mixture models do not include the type of every component, so Based on these observations, using the instance version
I added this conversion line for function fit_mle(g::D, args...) where {D<:Distribution}
fit_mle(typeof(g), args...)
end So first when encountering a Note that I did not define the same for the In your (great) package, you so far used the type convention which, IMO, for Copulas (and other multivariate) cases is a bit heavy to write. The following works in your example! function Distributions.fit(D::SklarDist, x)
# The first thing to do is to fit the marginals :
@assert length(D) == size(x, 1)
# One could put fit but EM works for with fit_mle only
m = Tuple(Distributions.fit_mle(D.m[i], x[i, :]) for i in axes(x, 1))
u = pseudos(x)
# τ⁻¹(FrankCopula, τ) works but not τ⁻¹(FrankCopula{3, Float64}, τ)
# I found https://discourse.julialang.org/t/extract-type-name-only-from-parametric-type/14188/20
# to just extract the typename.
# Again defining fit(C::ArchimedeanCopula, u) = fit(::Type{CT},u) where {CT <: ArchimedeanCopula} would simplify the writting
C = Distributions.fit(Base.typename(typeof(D.C)).wrapper, u)
return SklarDist(C, m)
end
D̂mix = fit(D, simu) Note that I had to define the weighted using StatsBase
function Distributions.fit_mle(::Type{<:LogNormal}, x::AbstractArray{T}, w::AbstractArray) where T<:Real
lx = log.(x)
μ, σ = mean_and_std(lx, weights(w))
LogNormal(μ, σ)
end (Note that Hence, IMO, it seems that for compatibility between |
I apologize for not reading your documentation and posting without restraint. Thanks for the clear explanation and the link to the PR, I can appreciate that you already put a lot of efforts into this; I agree with you that, for mixtures, the I also agree with the PR reviewers that this adds different behaviors to the function. I see two outputs:
Plan
IssuesWhat if you want 100 Gaussians + a dirac ? Maybe via nesting as Also, this might be breaking... ImplicationsImplementations of A good argumentMy best argument for this change of parametrization on the
Although, the current type structure is kinda clunky (what are these 1 and 0 in the ProductDistribution type ?). Expanding ConclusionWhatever gets decided, I will adapt Post-scriptumI agree with you on the fact that abusing the meaning of |
Thanks for your comments. Since I just noticed (thanks to you) that I think now my main point against is that if one wants to use an algorithm that implies using
These are 3 ways to use the Coming back to your two scenarios:
|
About Categorical and type parameter conventions.Some naming conventions so that we understand each others :
I think in a dream world, I would like the parameter field to be clearly defined from the type parameters. This is in fact what is implicitly supposed by
Let's deal with the
Hence, the number of categories If we introduce The version About MixtureModelSame thing as for Categorical : The call PS: Currently, on this doc page, About Your interface and going forward.Yes, defining in your package
would probably be enough to 1) keep your existing code and 2) comply with the common interface, if and only if we make a strong case on If you agree with that, I could try making the PR myself. Or maybe an issue first to ask their opinion before, linking to this discussion ? |
Yes, thank you! About the initialization, other distributions could benefit from the notation |
Hey,
Thanks for this great addition to the ecosystem !
From Distribution.jl, it seems like the first argument to the fit_mle function should be the distributions type and not an instance of the type :
This is not really a problem for yo as you are free to overload this function as you want, and your interface actually makes a lot of sense since you exploit the guesses in your algorithm. But would it be possible to add methods following this convention, maybe with automatic guesses ? I have
fit_mle
bindings inCopulas.jl
that assume this convention, and thus do not work directly with your package :(Edit: I was trying to make a code example of what i would like, but I saw that mixures types do not include components types... More specifically, I would like to be able to type :
fit_mle(MixtureModel{Gamma,Gamma,Normal},data)
instead of
fit_mle(MixtureModel([Gamma(),Gamma(),Normal()],[1/3 1/3 1/3]),data)
Would that be possible ?
It would allow composability, as I am currently using :
which, under the hood, calls
fit_mle(Marginal_Type,marginal_data)
on each marignals.The text was updated successfully, but these errors were encountered: