
add low NHI HCDs #17

Open
londumas opened this issue Apr 6, 2019 · 40 comments

@londumas (Collaborator) commented Apr 6, 2019:

@fjaviersanchez, @TEtourneau tested adding the low NHI HCDs and he does not recover the proper number density of NHI~20 HCDs. Do you know what is happening? It is good at NHI~17 and ~22. Maybe you have a power law instead of the pyigm model.

using /global/cscratch1/sd/tetourne/DesiMocks/v4.3.0/mock_0/output/master_DLA_3.fits
[Figure: NHI_distribution]

londumas added the enhancement label on Apr 6, 2019
@jmarclegoff (Collaborator):

@londumas, this must be for the last version produced by Thomas with N_HI_min = 17.2. Thomas saw that setting N_HI_min = 17.2 reduces the number of DLAs with N_HI > 20 by a factor of 5. This seems compatible with having a correct distribution when N_HI_min = 20. If this is the case, we can focus on understanding what is happening when we reduce N_HI_min.
Maybe you could make the same plot with Thomas's previous version, which still had N_HI_min = 20.

@fjaviersanchez:

@londumas, in the code we directly call pyigm to get f(N) (see here; also note that fN.evaluate returns log(f(N)), see here for more details, which is why there is an exponential in the function that gets the probability of a given f(N,z)).

I am confused about what's going on 😕, though, because if we are using pyigm I would expect this to be automatically right. Could you please add a comparison with pyigm, integrating f(N,z)dz from z=2.1 to the edge of the box (is that z=4?)? I think that would be the fairest comparison. In any case, from that plot it looks like going as low as N=17.2 is too much, and we may want to dial it back to N=20, or oversample the number of DLAs in general and then apply some sort of selection function. Thoughts?
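A minimal sketch of this comparison (assuming pyigm's FNModel.default_model(), its calculate_lox method, and astropy's abs_distance_integrand, as used elsewhere in this thread; the z = 4 upper limit is a placeholder for the edge of the box) could look like:

import numpy as np
from pyigm.fN.fnmodel import FNModel

fN_default = FNModel.default_model()
fN_cosmo = fN_default.cosmo   # cosmology attached to the pyigm model (assumed, as in DLA_Saclay.py)

def expected_dlas_per_sightline(zmin=2.1, zmax=4.0, Nmin=17.2, Nmax=22.5, nz=50):
    """Integrate the incidence rate dN/dz over redshift to get the
    expected number of absorbers per line of sight."""
    zz = np.linspace(zmin, zmax, nz)
    dNdX = fN_default.calculate_lox(zz, Nmin, Nmax)   # incidence per absorption distance dX
    dXdz = fN_cosmo.abs_distance_integrand(zz)        # convert dX to dz
    return np.trapz(dNdX * dXdz, zz)

print(expected_dlas_per_sightline())

The result could then be compared with the mean number of HCDs per skewer actually drawn in the mock.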

@jmarclegoff (Collaborator):

@fjaviersanchez, it looks like if you just use the --nmin parameter for DLA_Saclay.py, it only goes into dNdz(). I believe it should also go into add_DLA_table_to_object_Saclay() in order to reach get_N().
So we need to know how @TEtourneau went down to N_HI_min = 17.2.

@londumas (Collaborator, Author) commented Apr 6, 2019:

Using the part of the catalog above NHI > 20, we get the same as before.
/global/cscratch1/sd/tetourne/DesiMocks/v4.3.0/mock_0/output/master_DLA_3.fits
[Figure: with_NHI_g20]

@jmarclegoff (Collaborator):

Ok, so it's likely that there was some problem in the way N_HI_min was reduced.
Let's wait for @TEtourneau to tell us what he did exactly.

@TEtourneau (Collaborator):

I did use the --nmin parameter in both the dNdz() and get_N() functions (I added the use of --nmin in add_DLA_table_to_object_Saclay()).

@jmarclegoff (Collaborator) commented Apr 11, 2019:

I did some tests of pyigm and DLA_Saclay.py

import numpy as np
import matplotlib.pyplot as plt
from pyigm.fN.fnmodel import FNModel
from DLA_Saclay import get_N   # get_N as used in DLA_Saclay.py (assumed importable)

fN_default = FNModel.default_model()

# Draw N_HI at fixed z with the mock code and histogram the result
z = 2.37 * np.ones(1000000)
nHi = get_N(z, Nmin=17.2)
plt.hist(nHi, bins=100)

# Evaluate the pyigm f(N) model on a grid at the same redshift
Nmin, Nmax, nsamp = 17.2, 22.5, 100
nn = np.linspace(Nmin, Nmax, nsamp)
z = 2.37 * np.ones(1)
auxfN = fN_default.evaluate(nn, z)
fN = np.exp(auxfN.reshape(nsamp))
plt.plot(nn, 5.64E12 * fN)   # ad hoc normalisation to match the histogram

This code gives the plot below and shows that get_N() produces the proper distribution of N_HI up to an ad hoc normalisation (the orange curve is 5.64E12 * exp(f(N_HI))).
[Figure: N_HI histogram from get_N() with 5.64E12 * exp(f(N_HI)) overlaid]
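As a side note, a scale factor of this kind can be read off as the ratio of the histogram area to the area under exp(f). A small sketch, reusing nn, fN and nHi from the snippet above (the 5.64E12 itself is the ad hoc value quoted above, not something derived here):

import numpy as np

# Reusing nn, fN (= np.exp of fN_default.evaluate) and nHi from the snippet above.
nbins = 100
bin_width = (nHi.max() - nHi.min()) / nbins    # plt.hist default binning over the data range
area_hist = len(nHi) * bin_width               # total area of the histogram
area_fN = np.trapz(fN, nn)                     # area under exp(f) over [Nmin, Nmax]
scale = area_hist / area_fN                    # factor needed to overlay exp(f) on the histogram
print(scale)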

Then I tested dNdz()

def dNdz(z, Nmin=20.0, Nmax=22.5):
    """ Get the column density distribution as a function of z,
    for a given range in N"""
    if use_pyigm:
        # get incidence rate per path length dX (in comoving coordinates)
        dNdX = fN_default.calculate_lox(z,Nmin,Nmax)
        # convert dX to dz
        dXdz = fN_cosmo.abs_distance_integrand(z)
        return dNdX * dXdz
    else:
        return dnHD_dz_cumlgN(z,Nmax)-dnHD_dz_cumlgN(z,Nmin)

In [42]: dNdz(z,Nmin=20)/ dNdz(z,Nmin=17.2)
Out[42]: array([0.21780969])

This does not seem consistent with the plot above. Indeed:

In [32]: 1.*len(nHi[nHi>20]) / len(nHi)
Out[32]: 0.039272

Is it that fN_default.calculate_lox() is not doing what we believe? I went to the pyigm GitHub repository and did not find much documentation, but maybe I did not look in the proper place.
On the other hand, this seems to be just a normalisation problem and should not explain the distribution of f(N_HI) obtained by @londumas for the mocks.
I do not know the details of DLA_Saclay.py. @fjaviersanchez, do you have any idea?
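To make the comparison easy to rerun, the two numbers can be computed side by side (a sketch, assuming get_N, dNdz and fN_default as quoted above from DLA_Saclay.py and pyigm; if calculate_lox were doing what we assume, the two fractions should agree):

import numpy as np
from DLA_Saclay import get_N, dNdz   # assumed importable; the functions quoted above

z = 2.37
# Monte-Carlo fraction of N_HI > 20 among draws with Nmin = 17.2
nHi = get_N(z * np.ones(1000000), Nmin=17.2)
frac_mc = 1. * np.count_nonzero(nHi > 20) / len(nHi)    # ~0.039 above

# Same fraction predicted from the integrated incidence rates
frac_lox = dNdz(np.array([z]), Nmin=20) / dNdz(np.array([z]), Nmin=17.2)   # ~0.218 above
print(frac_mc, frac_lox)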

@jmarclegoff (Collaborator):

By the way, @fjaviersanchez, from igmhub/LyaCoLoRe#32 I understand that you set Nmin = 17.2 in LyaCoLoRe. Are you still waiting for James to produce the DLA catalogue? It would be interesting to see whether this results in the same N_HI distribution as measured by @londumas for the Saclay mocks.

@fjaviersanchez:

@jmarclegoff thanks a lot for the plots and for following up on this. I think that you are right and fN_default.calculate_lox is not giving what we expected. I modified dNdz as follows:

def dNdz(z, Nmin=20.0, Nmax=22.5, nsamp=100):
    """ Get the column density distribution as a function of z,
    for a given range in N"""
    # integrate exp(f(N,z)) over the column-density range with a Riemann sum
    nn = np.linspace(Nmin,Nmax,nsamp)
    aux = fN_default.evaluate(nn, z)
    dNdz = np.sum(np.exp(aux)*(nn[1]-nn[0]))
    return dNdz

I tested this with the data that you provided above and it seems to be working. What I did is the following:

dNdz(2.7, Nmin=20.0)/dNdz(2.7, Nmin=17.2) ~ 0.038

1-np.count_nonzero((nHi>=17.2) & (nHi <=20))*1.0/len(nHi) ~ 0.039 (which is equivalent to np.count_nonzero(nHi>20)*1.0/len(nHi) as you did).

As for the London mocks: yes, I am waiting for James to produce the new mocks, but I don't know @andreufont's and @jfarr03's exact plans. In any case, I suspect that they'd need to change dNdz as well.

Please, feel free to test this fix and let me know if there are any other issues. Thanks again!

@TEtourneau (Collaborator):

@fjaviersanchez Thanks for the suggestion. I tested what you posted (I just copied/pasted the new dNdz function into the DLA_Saclay.py code).
I first did 2 runs, with rmin = 20 and rmin = 17.2.
For rmin = 20, I found only 1 DLA (for about 550,000 QSOs).
For rmin = 17.2, I found 14 DLAs.
These catalogs are here:
/global/cscratch1/sd/tetourne/DesiMocks/v4.3.0/mock_0/Chunk_ra0189.982649735_dec020/dla5.fits : rmin=20
/global/cscratch1/sd/tetourne/DesiMocks/v4.3.0/mock_0/Chunk_ra0189.982649735_dec020/dla6.fits : rmin=17.2

I then did 2 other runs, with a factor of 20000 in front of dNdz:
I found 11000 DLAs for rmin=20 (here: /global/cscratch1/sd/tetourne/DesiMocks/v4.3.0/mock_0/Chunk_ra0189.982649735_dec020/dla7.fits)
I found 285765 DLAs for rmin=17.2, of which 11158 have n_HI > 20 (here: /global/cscratch1/sd/tetourne/DesiMocks/v4.3.0/mock_0/Chunk_ra0189.982649735_dec020/dla8.fits)

Here are the histograms for these last 2 files:
[Figures: N_HI histograms for dla7.fits and dla8.fits]

@TEtourneau (Collaborator):

I forgot:
The number of QSOs I indicated is all the QSOs, including those at 1.8 < z < 2.1.
There are 359362 QSOs at z > 2.1,
and 11158 DLAs with n_HI > 20 (including the 20000 factor on dN/dz),
so we have ~0.03 DLA per QSO.
So, if I'm right, we are still missing a factor of 0.2 / 0.03 ~ 6.4 on dN/dz.
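Spelling out the arithmetic with the numbers above (the 0.2 target is the DLA-per-QSO rate quoted in this thread):

n_qso = 359362                        # QSOs at z > 2.1
n_dla = 11158                         # DLAs with N_HI > 20 (after the x20000 boost)
dla_per_qso = n_dla / float(n_qso)    # ~0.031
missing_factor = 0.2 / dla_per_qso    # ~6.4
print(dla_per_qso, missing_factor)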

@fjaviersanchez commented Apr 12, 2019:

Thanks for testing this so quickly, @TEtourneau. Maybe I understood it incorrectly, but it sounds like there's a normalization problem. Do you think that using a factor of 6.4*20000 will solve the discrepancies?

@jmarclegoff (Collaborator):

@fjaviersanchez, it will indeed produce DLAs with the proper N_HI distribution and about 0.2 DLA per spectrum. So in the short term it will solve the problem with an ad hoc factor. But I think it would be nice to understand where this factor comes from.

@fjaviersanchez:

My guess (maybe wrong) is that this factor is related to the volume of the simulation or the cell, but I still have to confirm it.

@jmarclegoff (Collaborator):

@fjaviersanchez, we have put in just a constant ad hoc factor, but this factor probably depends on z, so at some point we should try to understand it in order to get its z dependence.

@jmarclegoff (Collaborator):

The distribution of n_DLA / n_QSO vs z used to be flat around 0.2, in reasonable agreement with the data. With the constant ad hoc factor, @TEtourneau got a ratio that decreases steeply with z.
[Figure: n_DLA / n_QSO vs z with the constant ad hoc factor]
He replaced the constant factor with a factor that depends linearly on z. He now gets a ratio that is fairly constant around 0.2.
[Figure: n_DLA / n_QSO vs z with the linear-in-z factor]
This is probably fair enough for the moment, but in the longer term we should understand what is going on. Maybe we should contact Prochaska to get explanations of the pyigm functions.
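The exact correction is not spelled out in this thread; purely as an illustration, a linear-in-z rescaling of dNdz would look like the sketch below, where a and b are hypothetical coefficients to be calibrated so that n_DLA / n_QSO stays near 0.2 across redshift:

from DLA_Saclay import dNdz   # the modified dNdz discussed above (assumed importable)

def dNdz_corrected(z, a, b, Nmin=17.2, Nmax=22.5):
    """Ad hoc, z-dependent rescaling of dNdz; a and b are placeholders,
    to be fitted to the measured n_DLA / n_QSO(z) ratio."""
    return (a + b * z) * dNdz(z, Nmin=Nmin, Nmax=Nmax)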

@londumas (Collaborator, Author):

@jmarclegoff, looks very good. Indeed, @fjaviersanchez, I think it is linked to the simulation volume or something of the sort. It would be nice to figure that out at some point instead of using ad hoc corrections. But what @TEtourneau and @jmarclegoff did is very good for tests.

@londumas (Collaborator, Author):

@TEtourneau, can I have a look at your new catalog? Could you send me the path on NERSC?

@jmarclegoff (Collaborator):

@londumas, this is not a big catalogue; it was just for a test.
The only file I could find from today is
/global/projecta/projectdirs/desi/mocks/lya_forest/develop/debug/256-256-1536/mock_20/output/master_DLA.fits
but I am not 100% sure; it might be better to wait for @TEtourneau's confirmation.
There are 4 eBOSS realisations that are nearly finished. The last step is in the queue, where spectra are put into the healpix pixels and DLAs are drawn.

@TEtourneau (Collaborator):

Here is a catalog with the "new" dNdz:
/global/cscratch1/sd/tetourne/DesiMocks/v4.3.0/mock_0/Chunk_ra0189.982649735_dec020/dla9.fits

This one includes the new dNdz function, the extra factor 20000*6.4, and a linear redshift compensation.

@londumas (Collaborator, Author):

@TEtourneau, looks good. Could you tell me which is the master.fits catalog of quasars, so I can look at the ratio?

@TEtourneau (Collaborator):

The master.fits file is here:
/global/cscratch1/sd/tetourne/DesiMocks/v4.3.0/mock_0/output/master.fits

@londumas (Collaborator, Author):

@TEtourneau, looks very good. Do you have a full mock ready somewhere with all the improvements? I would be happy to do a full check of the DLAs once that is the case.

[Figure: nhI_ratio]

@TEtourneau (Collaborator):

Actually there are 4 realisations in .../saclay/v4.4/.
I didn't check the results yet, but the DLAs should be OK, produced with the "test" version.
I didn't run quickquasars yet, but you can still have a look at the catalogs.

@londumas (Collaborator, Author):

@TEtourneau, thanks. I see three in /project/projectdirs/desi/mocks/lya_forest/saclay/v4.4/. Can I use those?

@TEtourneau (Collaborator):

You can use the realisations v4.4.7, v4.4.8 and v4.4.9.
v4.4.6 is still being moved from scratch/ to projecta/.

@londumas (Collaborator, Author):

@TEtourneau, there is still a hard cut at NHI=20:

import fitsio
h = fitsio.FITS('/project/projectdirs/desi/mocks/lya_forest/saclay/v4.4/v4.4.7/master_DLA.fits')
h['DLACAT']['N_HI_DLA'][:].min()
h['DLACAT']['N_HI_DLA'][:].max()

gives

20.000000291529926
22.522444878416994

@londumas (Collaborator, Author):

Here is the distribution:
[Figure: NHI_distribution]

@TEtourneau (Collaborator):

I'll have a look. It is very likely that I forgot to set the new options (like nhi_min=17.2).

@londumas (Collaborator, Author):

Looks good now. The raw distribution looks good; however, there are still some discrepancies in the n(z) plot. But this plot is not from me but from @alxogm, and I don't know if the code still works at NHI < 20 or whether I am running it correctly.
Thanks for all the work, @TEtourneau.

[Figure: distrib_NHI]
[Figure: distrib_NHI_2]

Distribution above 20:
[Figure: distrib_higher_20]

@jmarclegoff (Collaborator):

@londumas, I guess the factor of 10 in amplitude between mock and data in the first plot is meaningless, as can be expected from the third plot.
The second plot does not exhibit the same shape for the mocks as plot 1, and it seems to be identical to your very first plot, which started this issue. I am confused.

@alxogm commented Apr 23, 2019:

Hi, I'm just coming back from holidays. I can check tomorrow, Wednesday, whether there is any issue with plot two on my side.

@alxogm commented May 3, 2019:

Hi all, sorry for the delay. I've tested it, and yes, the issue is in the function that plots f(NHI). f(NHI) is the number of DLAs within a bin of NHI, normalized by the total absorption distance of the sample. Since the NHI range is now much wider, this normalization was not OK: when the DLAs with NHI < 20 are added, the total absorption distance changes and affects the overall normalization. Computing f(NHI) in three regions looks better: the mocks and the expected distribution do match. I'll work on improving the f(NHI) function and update the QA so that this plot is improved. (Sorry if this caused a delay.)
[Figure: f(N_HI) computed in three N_HI regions]
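For concreteness, a sketch of the estimator being described, assuming the usual convention f(N_HI) = counts / (ΔN_HI · ΔX_total), with ΔX_total the total absorption distance of all sightlines in the sample (computed here with astropy; the inputs are placeholders, not the QA code):

import numpy as np
from astropy.cosmology import Planck15 as cosmo

def total_absorption_distance(z_qso, z_min=2.0, nz=500):
    """Sum of Delta X over all sightlines, with dX/dz = (1+z)^2 / E(z)."""
    dX = 0.0
    for zq in z_qso:
        zz = np.linspace(z_min, zq, nz)
        dX += np.trapz(cosmo.abs_distance_integrand(zz), zz)
    return dX

def f_NHI(log_NHI, z_qso, bins=20, z_min=2.0):
    """f(N_HI): DLA counts per linear N_HI bin width, per total absorption distance."""
    dX_tot = total_absorption_distance(z_qso, z_min)
    counts, edges = np.histogram(log_NHI, bins=bins)
    dN_lin = 10.0**edges[1:] - 10.0**edges[:-1]     # bin widths in linear N_HI
    return counts / (dN_lin * dX_tot), edges

In this convention, ΔX_total depends only on the QSO sample, not on which N_HI range is being histogrammed.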

@londumas (Collaborator, Author) commented May 3, 2019:

@alxogm, thank you for looking into that. Instead of updating the QA, can you send me the updated code and I will do it myself, in order not to lose all the other changes to other cells? Also, could you tell us why you had to split into three parts?

@andreufont:

Hi @alxogm - I don't follow the reasoning. Why would the distance change when you change the limit of N_HI considered?

@andreufont:

We should aim at having a good fit of f(z,N) at a fixed z, covering the full range of N, without having to tweak anything.

@alxogm commented May 3, 2019:

@andreufont yes, we shouldn't tweak anything. This shows that the normalization I was using was wrong; I'm fixing it. The plot above only shows, I think, that the mocks are OK, but it is not the final plot. @londumas yes, I'll send the updated code once I have tested that it works with no splitting into regions, just normalizing correctly.

@londumas (Collaborator, Author):

fixed

alxogm reopened this on Sep 11, 2019
@alxogm commented Sep 12, 2019:

Sorry that I'm reopening this issue after so long. I checked my small piece of code that computes f(NHI) against the London v9 mocks, and it works fine as is; there is no need to split into regions as I was doing above (and never really improved). It also gives a result very similar to an independent measurement from James on those London mocks.
[Figure: f(N_HI) for the London v9 mocks]

As seen from James's plots of dn/dz for different NHI ranges, the issue might be precisely here:
dndz_saclay_v4.4.0.pdf

For the computation of f(NHI) this matters because, if you have more DLAs in some NHI bin, their redshifts contribute to the total absorption length, which influences the full f(NHI) shape. When I was computing f(NHI) by regions, each region was only scaled by the absorption length in that NHI range rather than the total one, making it look more "similar" to the expected one, but it was incorrect.

@TEtourneau (Collaborator) commented Sep 27, 2019:

I managed to use the dn/dz and f(N_HI) distributions from pyigm (using the functions from LyaCoLoRe: https://github.com/igmhub/LyaCoLoRe/blob/master/py/DLA.py).
Here are plots showing the current distributions in the mocks (for realisation v4.7.1):
[Figures: current dn/dz and N_HI distributions in realisation v4.7.1 (6 screenshots)]

Everything looks good: we recover the input from pyigm, and the other distributions agree with the mocks from LyaCoLoRe.

The ratio dn_dla / dn_qso doesn't agree with the ratio measured in data. To do a proper comparison, we should look at this ratio for the DLAs reconstructed with the DLA finder algorithm.
