[Feature req] Convolutions of dependent random vectors. #240
Looks indeed like a bug, or rather a missed special case somewhere. By the way:

```julia
julia> pdf(D1, big.([-30., -30.])) # works with bigfloats
9.800107565129330824766557211886842343262822051986735333987174617133175015382767e-5024

julia> pdf(C, [0, 0]) # here is where your NaN is coming from
NaN
```

so yes, it should definitely return 0. Let me take a look as soon as I have a moment.

EDIT: Scratch that, the bug is coming from Distributions.jl:

```julia
julia> pdf(MvNormal(C.Σ), [-Inf, -Inf])
NaN
```

---
I tried a few other copulas (Clayton, Gumbel, Joe) and am getting similar domain errors. Would this need to be addressed for …?

---
For Archimedeans, the wanted pdf value at [0,0] is not always 0; it depends on the Archimedean. But yes, if some of them give domain errors, that is an issue. Can you post your other examples? Let's make test cases out of them so that it stays fixed in the future.
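For instance (an illustration, not from the original thread), the independence copula already shows that the corner value need not be 0:

```julia
using Copulas
# The independence copula has constant density 1 on the unit square,
# so at the corner the mathematically correct value is 1, not 0.
pdf(IndependentCopula(2), [0.0, 0.0])  # expected: 1.0
```

---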
Hum... Taking another look at the Gaussian case, the NaN actually comes from …

---
Here is what I was using to check the Archimedean copulas: …

It is not clear to me at which point the NaN is being produced, since … gives … but …

---
Okay, so let's split up. For the Gaussian, I reported upstream: JuliaStats/Distributions.jl#1937. For the Archimedeans, I'll come back when I know more.

---
@rand5 The Archimedean ones should be fixed in the (currently being registered) v0.1.27 version of the Copulas package. There was indeed an issue with the computation of the pdf in the corners. On the new version, I have:

```julia
using Copulas, Distributions, QuadGK, Plots

X1 = Normal()
d = 2
θ = 1.
Cs = [
    ClaytonCopula(d, θ),
    FrankCopula(d, θ),
    GumbelCopula(d, θ),
    JoeCopula(d, θ),
] # Copulas to check. Make sure θ is appropriate for each family.

N = 100
Zs = collect(range(-10., 10., N))
ConvDs = zeros(length(Cs), length(Zs))
for k in 1:length(Cs)
    D1 = SklarDist(Cs[k], (X1, X1))
    # Density of the sum Z = X + Y by numerical integration of the joint pdf:
    ConvDs[k, :] = map(Zs) do z
        quadgk(x -> pdf(D1, [x, z - x]), -100., 100., rtol=1e-6)[1]
    end
end

p = plot()
for k in 1:length(Cs)
    plot!(p, Zs, ConvDs[k, :], label=string(k))
end
p
```

producing the following densities. However, the Gaussian one is not fixed yet because the issue comes from Distributions.jl (and potentially from PDMats.jl directly); it will take a bit more time as it is not up to me alone. :/ Apart from the Gaussian, is this fix enough to cover what you need, or do you have other questions?

---
@lrnv I'm using v0.1.27 and still getting domain errors: …

which throws: …

Edit: this error is thrown despite the fact that the test code above works fine.

---
@rand5 You can replace the `pdf` call by a more verbose version to see where things break:

```julia
function verbose_pdf(D, x)
    u = cdf.(D.m, x)  # push the marginal values through their cdfs
    @show u, pdf(D.C, u), pdf.(D.m, x)
    return pdf(D, x)
end

ConvD = map(Zs) do z
    quadgk(x -> verbose_pdf(D1, [x, z - x]), -100., 100., rtol=1e-6)[1]
end
```

With this, we see that in the Gumbel case the problem comes from values of the pdf on the boundary:

```julia
# These calls are currently NaN but should be 0:
pdf(GumbelCopula(2, 2.5), [0.1, 0.0])
pdf(GumbelCopula(2, 2.5), [0.0, 0.1])
# This one is truly undefined, but for simplicity we'll make it zero too:
pdf(GumbelCopula(2, 2.5), [0.0, 0.0])
```

This is fixed in #244 and now released in v0.1.28, if you have time to give it a shot?

---
@lrnv Thank you for fixing that; I can confirm I'm no longer getting the domain errors that were thrown before. I think this has opened the door to another issue, though. Take a look at these jumps: …

If the standard deviation of the Normal component of X2 increases, they disappear, and if the standard deviation decreases, they increase. Here is with …

I see similar behavior. Let me know if you'd like me to migrate this to its own issue.

---
I guess this is simply integration error and has nothing more to do with Copulas.jl in particular, right? One "well-converging" approximation would be to decompose your integral and approximate it by Monte Carlo on X1. Using your second, most problematic X2, we obtain:

```julia
using Copulas, Distributions, QuadGK, Plots, Statistics

X1 = Gumbel(15., 5.)
X2 = MixtureModel([Normal(-6., 0.05), X1], [0.9, 0.1])
θ = 2.5
d = 2
C = GumbelCopula(d, θ)
D1 = SklarDist(C, (X1, X2))

N = 1000; zmin = -10.; zmax = 60.
Zs = collect(range(zmin, zmax, N))
ConvD = similar(Zs)

x1 = rand(X1, 10000) # you need enough samples for Monte Carlo to kick in.
u1 = cdf.(X1, x1)
x2, u2 = similar(x1), similar(u1)
for i in eachindex(Zs)
    x2 .= Zs[i] .- x1
    u2 .= cdf.(X2, x2)
    p2 = pdf.(X2, x2)
    c = pdf(D1.C, hcat(u1, u2)') # copula density at each sampled point
    ConvD[i] = mean(c .* p2)
end
plot(Zs, ConvD)
```

You might find other, more efficient and reliable integration methods, though. If you do find something reliable and generic enough, I would be interested in providing the corresponding `convolve` functionality in the package.

Edit: You could also leverage the symmetry and exchange x1 and x2 to obtain another approximation:

```julia
# We could do the same by inverting 1 and 2:
ConvD_sym = similar(Zs)
x2 = rand(X2, 10000) # you need enough samples for Monte Carlo to kick in.
u2 = cdf.(X2, x2)
x1, u1 = similar(x2), similar(u2)
for i in eachindex(Zs)
    x1 .= Zs[i] .- x2
    u1 .= cdf.(X1, x1)
    p1 = pdf.(X1, x1)
    c = pdf(D1.C, hcat(u1, u2)')
    ConvD_sym[i] = mean(c .* p1)
end
plot!(Zs, ConvD_sym)
```

which is weirdly much more stable... Maybe we could understand why?

---
My last idea is to swap the mixture and the convolution, as these operations should commute nicely: X + (Y | Z) should be equal to (X+Y | X+Z), where (Y | Z) denotes the mixture of Y and Z. The second form will be much easier to compute if the supports of Y and Z are separated, as they are in our example. The presence of the copula does not change this.
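To see the identity in action, here is a quick Monte Carlo sanity check in the independent case (a sketch, not from the original thread; the parameters are made up, reusing the shapes from the example above):

```julia
# Sketch: check that X + (Y | Z) and (X+Y | X+Z) have the same distribution,
# here by comparing sample means in the independent case.
using Distributions, Statistics

X = Normal(0., 1.)
Y, Z = Normal(-6., 0.05), Gumbel(15., 5.)
w = [0.9, 0.1]                                        # mixing weights

n = 10^6
lhs = rand(X, n) .+ rand(MixtureModel([Y, Z], w), n)  # X + (Y | Z)

labels = rand(Categorical(w), n)                      # pick a component...
comps = (Y, Z)
rhs = rand(X, n) .+ [rand(comps[l]) for l in labels]  # ...then (X+Y | X+Z)

(mean(lhs), mean(rhs))  # should agree up to Monte Carlo error
```

---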
Looking at the Monte Carlo decomposition you described, it seems like the variability between the two approaches (going from X2 -> X1 versus X1 -> X2) comes down to whether we evaluate …

I agree that the fundamental issue is that the supports are very different from each other, and I like the idea of taking advantage of the commutation, but I'm having trouble translating this into a MWE. Let me know if you have something I can pressure test.

---
For the commutation, you might want to leverage multiple dispatch, since all the information you need is in the type of the object:

```julia
# Special mixture case (sketch): split the mixture in the first margin, convolve
# each component with the second margin, and re-mix with the same weights.
function convolve(D::SklarDist{CT, Tuple{T1, T2}}) where {CT, T1 <: MixtureModel, T2}
    return MixtureModel(
        [
            convolve(SklarDist(D.C, (D.m[1].components[1], D.m[2]))),
            convolve(SklarDist(D.C, (D.m[1].components[2], D.m[2]))),
        ],
        probs(D.m[1]), # carry over the mixing weights.
    )
end

# Generic version:
function convolve(D::SklarDist)
    # do the generic stuff.
end
```

But I think this requires defining a subtype of …

---
How about the following for a slightly revised implementation of the generic version: …

Testing it: …

I'm not that familiar with Copulas, and one thing I don't understand is why …
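(The snippet referenced above was lost in formatting; for reference, a minimal sketch of what a generic bivariate version could look like, computing only the density of the sum along the lines of the quadrature used earlier in the thread; `convolved_pdf` and its integration bounds are illustrative names, not the package API:)

```julia
# Sketch: density of Z = X1 + X2 for a bivariate SklarDist, obtained by
# integrating the joint density along the line x1 + x2 = z.
using Copulas, Distributions, QuadGK

function convolved_pdf(D::SklarDist, z; lo=-100., hi=100., rtol=1e-6)
    return quadgk(x -> pdf(D, [x, z - x]), lo, hi; rtol=rtol)[1]
end

# Usage:
D = SklarDist(GumbelCopula(2, 2.5), (Normal(), Normal()))
convolved_pdf(D, 0.5)
```

---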
Yes, they are equivalent; you can definitely replace those two lines with …

Furthermore, your code will only work for bivariate SklarDist, and not for d-variate ones with d > 2, so maybe you could restrict to the bivariate case only (the dimension is available in the type of the object). Anyway, it looks like this is working quite nicely on your example? Do you have anything else that you need here? May I ask why you ended up on this problem in the first place?

---
Correct, this is just for bivariates. I was imagining that if there were 3 components (A, B and C), then we could do some sort of nested evaluation like …

I think the next key item for the bivariate case, which you highlighted, is to take the function evaluations that …

The reason I'm on this topic is that I'm trying to model a decision process where you need to assemble a portfolio of "games" that maximizes returns. For each game, there are multiple opportunities to trigger small negative returns and an opportunity to trigger a much larger positive return, represented by something like (see the sketch below):

Game1: …
Game2: …
Game3: …

There is some dependence between Game1 and Game2 and between Game1 and Game3, which I'm modeling via copulas. The fundamental question is: is it better to play Game1 and Game2, or Game1 and Game3? The distribution of the total return of playing Game1 and Game2 (or Game1 and Game3) is given by the convolution of those distributions but, as you can see, in each case the support of the components associated with the negative returns is very different from the support of the positive returns, making it difficult to compute the convolutions.
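(A sketch of the setup being described, with made-up parameters since the original Game definitions were lost:)

```julia
# Made-up parameters: each game mixes frequent small negative returns
# with one rarer, much larger positive return.
using Copulas, Distributions

Game1 = MixtureModel([Normal(-2., 0.2), Normal(-1., 0.1), Gumbel(30., 5.)], [0.5, 0.4, 0.1])
Game2 = MixtureModel([Normal(-3., 0.3), Gumbel(40., 8.)], [0.85, 0.15])
Game3 = MixtureModel([Normal(-1., 0.1), Gumbel(25., 4.)], [0.9, 0.1])

# Dependence between the games of a portfolio, modeled by a copula:
Portfolio12 = SklarDist(GumbelCopula(2, 2.5), (Game1, Game2))
Portfolio13 = SklarDist(GumbelCopula(2, 2.5), (Game1, Game3))
```

---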
For more than 2 dimensions, you will be able to do nested evaluations like …

For the mixtures, on the other hand, I think you'd better use the "convolution of mixtures <=> mixture of convolutions" identity to recursively simplify your problem, by defining a few methods as I proposed in a previous message (if you have trouble doing it, please come back, I'd gladly help you).

Very cool problem indeed! Do not hesitate to forward the results / publications / output of the project, I'd love to read it in more depth.
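(For intuition about the nesting, in the fully independent case Distributions.jl's `convolve` already composes this way, copulas aside:)

```julia
using Distributions
# Independent case only: Distributions.convolve has closed forms for some pairs,
# and nesting it gives the three-component convolution.
A, B, C = Normal(0., 1.), Normal(1., 2.), Normal(-1., 1.)
Distributions.convolve(Distributions.convolve(A, B), C)  # Normal(μ=0.0, σ=√6)
```

---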
@lrnv Any interest in collaborating on a technical paper?

---

@rand5 Sure, if I can be of any help!

---
With regards to your suggestion to do something like this: …

Thinking through this: it boils down to taking one component from the first mixture, then convolving it with one component from the second mixture, with the joint distribution defined by the copula. The components being convolved are not necessarily of the same distribution type, so, even if they had been independent, there won't be a clear formula for the result in terms of the parameters of the input distributions. I guess that for what we are envisioning, the convolution isn't really the hardest part; it's then also finding the CDF, the quantile function, and the six other quantities that need to be defined to make the result compliant with …

One solution would be to have a general pipeline where the input is a function that produces the pdf of a random variable, and the output is a …
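(One way such a pipeline could look; a sketch, not from the thread: `NumericPdfDist` and its truncation bounds are made up, with the cdf obtained by quadrature and the quantile by bisection:)

```julia
# Sketch: wrap a pdf function into a Distributions.jl-style univariate
# distribution, computing the cdf numerically and the quantile by root-finding.
using Distributions, QuadGK, Roots

struct NumericPdfDist{F} <: ContinuousUnivariateDistribution
    pdf::F
    lo::Float64  # lower bound of the (truncated) support
    hi::Float64  # upper bound of the (truncated) support
end

Distributions.pdf(d::NumericPdfDist, x::Real) = d.lo <= x <= d.hi ? d.pdf(x) : 0.0
Distributions.cdf(d::NumericPdfDist, x::Real) =
    x <= d.lo ? 0.0 : x >= d.hi ? 1.0 : quadgk(d.pdf, d.lo, x; rtol=1e-8)[1]
Distributions.quantile(d::NumericPdfDist, q::Real) =
    find_zero(x -> cdf(d, x) - q, (d.lo, d.hi), Bisection())

# Usage: a numerically obtained density, wrapped as a distribution.
D = NumericPdfDist(x -> pdf(Normal(0, sqrt(2)), x), -20.0, 20.0)
quantile(D, 0.975)
```

---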
I'm migrating a problem discussed here that involves computing the convolution of two dependent random variables whose joint distribution is defined by a copula. I believe the issue may stem from the fact that evaluating the pdf of the copula sometimes returns NaN when it should return 0.

Here is an example with marginal distributions that should have support over all reals: …

The link above has an attempt at a brute-force workaround, but it doesn't seem to be sufficient (see the plot in the link).
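(The original snippet was lost in formatting; a reconstruction consistent with the rest of the thread, with an assumed correlation value:)

```julia
# Reconstruction: a Gaussian copula with Normal marginals; far in the tails
# the joint pdf returns NaN, which then pollutes the quadrature.
using Copulas, Distributions, QuadGK

C = GaussianCopula([1.0 0.5; 0.5 1.0])  # 0.5 is an assumed correlation
D1 = SklarDist(C, (Normal(), Normal()))

pdf(D1, [-30., -30.])  # NaN instead of ~0

z = 0.
quadgk(x -> pdf(D1, [x, z - x]), -100., 100., rtol=1e-6)[1]  # fails for the same reason
```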