Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delta r_parallel is non zero #41

Open
londumas opened this issue Jan 21, 2019 · 34 comments
Open

Delta r_parallel is non zero #41

londumas opened this issue Jan 21, 2019 · 34 comments

Comments

@londumas
Copy link
Contributor

londumas commented Jan 21, 2019

Using london/v5.0.0/quick-0.0, ran without any systematic shift of the quasar redshift along the line-of-sight, we still get a non-zero drp = Delta r_parallel:
drp( z = 2.26) = 0.15 +/- 0.06 Mpc/h.
It can be because of five things:

  • In LyaCoLoRe or quickquasars, the definition of the Lya line is different from 1215.67. @jfarr03, could you check the this possibility? A simple grep 121 could do the trick, or something of the kind.
  • It is simply a random fluctuation of 3 sigma. I rand this mock here picca/london/v5.0.0/quick-0.0, with only 300,000 forests for memory issues. I could ran the analysis on a version of the mock with higher SNR, I will do that now.
  • It is a random fluctuation of the box, another box would give something different. Once we have O(10) mocks, we will be able to assess this.
  • It can be linked to the definition of the speed of light also. @jfarr03 are you using scipy.constants.speed_of_light? Are you using 299792458 or an approximation?
  • It can be linked to imperfections in the distortion matrix and in the model that do not allow to recover perfectly drp=0.
@londumas
Copy link
Contributor Author

Using the 10 realizations of v4.0.<*>/quick-2.0, I get:
<drp> = 0.052 +/- 0.012 Mpc/h
Here is a plot of the best fit values of drp as a function of mock index.

drp_vs_index_mock

@andreufont
Copy link
Collaborator

Hi @londumas , thank you very much for looking into this, and for opening this issues.

Even thought at this point it might be better to focus on getting the right bias/beta/DLA/metals, it is also important to keep this in mind and to have this list of future improvements.

@londumas
Copy link
Contributor Author

@andreufont, I totally agree with all you just said.

@londumas
Copy link
Contributor Author

Here is the status of the ticket as of v6.0/v6.0.<*>/eboss-0.0/:

xcf (cross): <drp_QSO> = 0.130 +/- 0.019 (<0.079>, 0.025)
cfxcf (combined): <drp_QSO> = 0.122 +/- 0.018 (<0.077>, 0.024)

where the value is the mean of the 10 realizations, and the error the std/sqrt(10).
In parenthesis, I give the mean error and the mean error divided by sqrt(10).
The first plot gives the best fit drp for each of the 10 realizations of the cross and the combined fit.
The second plot gives the slice along the line-of-sight of the cross-correlation. We can observe the asymmetry between the two points near drp=0. This asymmetry is also governed by the distortion matrix and the position on the grid so it is only an indication.

We had such issues in both autoDR12 and crossDR12 mocks and it came from:

  • not using c = 299792458 m/s
  • not using lambda_Lya = 1215.67 A

It can also come from:

  • spurious shift introduced when computing the velocity of each quasar, through some approximation
  • spurious shift introduced on the ressampling in picca
  • a systematic error in the fitter because of the grid center
  • ...

@jfarr03, what do you think can be causing that on your side?
@andreufont, what do you think can be causing that on the quickquasar side? I had a look and everything seem well in the script. But I might miss something or some dependency.
@londumas, I will send the cross-correlation (xcf), with the catalog of quasars, using the redshift without RSD and see if we recover drp = 0..

drp

slice_xcf

@andreufont
Copy link
Collaborator

Hi @londumas , if you wanted to avoid quickquasars, you could compute the cross-correlation of the transmission files with the quasar catalog. No need to touch actual spectra.

@londumas
Copy link
Contributor Author

@andreufont, This is a good point, anyway I need to compute the undistorted correlation function.

@londumas
Copy link
Contributor Author

Using the Lya absorption in the Lyb region, we get the same results, with more noise.

cross: <drp_QSO> = 0.108 +/- 0.087 (<0.25>, 0.079)
auto+cross: <drp_QSO> = 0.116 +/- 0.086 (<0.22>, 0.068)

drp_withLybforest

slice_xcf_withLybforest

@londumas
Copy link
Contributor Author

londumas commented Jun 26, 2019

@jfarr03, this is still there in v8.0. I bet it is real.
Here is a simple test you could look at: plot the large scale delta field and the transmission and on top of it the position of the quasar found. We should always see it on the peak of the delta_gaussian, if it is shifted then it would explain. I would do that for one, then for a stack of 10,000 and then for a stack of 10,000 with (zqso+rsd). How does that sound?

xcf: <drp_QSO> = 0.128 +/- 0.013 (<0.049>, 0.015)

shift-qso

@londumas
Copy link
Contributor Author

Here is the stack of the transmission, we can see in both cases that the maximum is not at lRF = 1215.67. Would be interesting to get the same lot for delta_gaussian.

  • first plot: with quasar RSD
  • second plot without quasar RSD

stack_transmission_zqwithrsd

stack_transmission_zqnorsd

@londumas
Copy link
Contributor Author

Similar plots for the Saclay mocks do not present a strong peak but a slope, does not mean either that it is ok their, but it is different: igmhub/SaclayMocks#11. @TEtourneau and @jmarclegoff, did you find any bugs that could cause such an issue? @jfarr03, examples of such an effect are: wrong definition of Lyman-alpha, grid center vs. grid edge, speed of light...

stack_transmission_zqwithrsd_saclay

stack_transmission_zqnorsd_saclay

@jmarclegoff
Copy link

jmarclegoff commented Jun 26, 2019

@jvstermer, has confirmed with 10 mocks the shift you observed with one mock.
We did not find any bug to explain the issue.
Some thinking or questions :

  • I would have expected that the RSD would smooth the curves of delta and T versus lambda, while if the first plot is with RSD and the second without, RSD seems to sharpen the curve.
  • We should not consider T above 1215.6, which is unity by construction in our mocks, and I understand that the dip is due to the proximity effect. Do you mean that drp != 0 corresponds to the dip not being at 1216.67?
  • the shift is about +0.13 Mpc/h for London, it is <0 and much larger for Saclay, on the order of -1 Mpc/h, to be checked with Julianna. However, the plots are similar.

@londumas
Copy link
Contributor Author

@jmarclegoff,

  • @jvstermer, could you update the results for the saclayMocks? It is in this ticket: drp is not zero SaclayMocks#11.
  • I think it is sharper because T(lRF>1215.67)=0 by definition. but I agree that it should be the contrary if we look at delta_gaussian.
  • I totally agree that we should normally look at lRF<=1215.67. I simply wanted to look at the overall shape.
  • It is possible that drp!=0 corresponds to the dip not being at 1215.67. But it could be another thing. This is one test, among many one can design.

I think that looking at delta_gaussian will be more useful, and easier to interpret.

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 26, 2019

@londumas thanks for looking into this, it's very interesting. Would you be able to produce the plots of stacked F/delta with a finer binning in lambda? The plot without RSDs shows F=1 for all points above 1215.67, and it's not clear that (lambda_rest=1215.67) is not 1. on the other hand, the plot with RSDs clearly shows that one point at lambda_rest>1215.67 has F<1.

I suspect that the issue could be to do with the QSOs being placed randomly in the CoLoRe cells, and how this is then handled within LyaCoLoRe. The value of drp found is ~half of the small cell size used in LyaCoLoRe, so that could be related. I'll take a look into it and will get back to you.

@londumas
Copy link
Contributor Author

@jfarr03, nice catch, indeed T is not 1 at lRF>=1215.67 A:

  • with RSD
  • no RSD

stack-withRSD

stack-noRSD

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 26, 2019

@londumas Great thanks for checking. I'm pretty sure that this is a bug to do with the placement of the QSO in each cell - I'm working on it now. Do you have a script to generate these plots handy somewhere so I can test for myself?

@londumas
Copy link
Contributor Author

londumas commented Jun 26, 2019

@jfarr03, you can do that with the jupyther notebook: https://github.com/igmhub/desi_QA/blob/master/Mocks/QA_plots.ipynb, section "Lya Transmission".

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 26, 2019

Perfect, thanks!

@londumas
Copy link
Contributor Author

@jfarr03, how do you think "the placement of the QSO in each cell" can get this non zero drp ? If you say it is random? Maybe it is flat random in z but not in flat random in comoving distance or something like that?

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 26, 2019

@londumas I think it is to do with the way LyaCoLoRe handles interpolating to the small cell size: when it does so, it needs to ignore the new, small cells that are "invalid" as they are behind the QSO, even though they came from a "valid" CoLoRe cell (the one that the QSO was in).

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 26, 2019

Actually I think the issue might be slightly different - there's several different interpolations that go on so I'm going to debug them one by one! I'll let you know when I have results.

@londumas
Copy link
Contributor Author

ok, thanks @jfarr03.

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 27, 2019

@londumas I fixed a couple of bugs last night, and have now looked into this a little more this morning. I believe the plot below helps to explain the situation. There's quite a lot of explanation needed for it, so please forgive the very long comment, but hopefully it's helpful!

The plot shows F vs lambda_rest for one spectrum with/without QSO RSDs, and with/without skewer RSDs. The points represent the cells used within LyaCoLoRe, which we then interpolate between to produce the transmission files. The black vertical line is Lya wavelength.

near_QSO_RSD_effects

The red line has no QSO RSDs and no skewer RSDs. All of the points to the right of Lya are at F=1, as should be the case as these are cells beyond the QSO.

Including skewer RSDs but not QSO RSDs gives us the green line. In this case, the cells near to the QSO have positive peculiar velocity, and so their absorption appears at lambda_rest>Lya. I think this is equivalent to your second plot, in which the stacked skewers have RSDs, but Z_QSO_NO_RSD is used. The smoothing in that plot is because the skewer cells are shifted by RSDs but the QSOs are not, so the skewers near the QSO move to higher/lower lambda_rest and the peak is smoothed.

Including QSO RSDs but not skewer RSDs gives us the orange line. The QSO has positive peculiar velocity, and so moves further along the skewer. Thus some of the F=1 cells that were to the right of Lya in the red line, now move to the left. The orange line is thus exactly the same as the red one, but shifted slightly to the left.

Including both QSO RSDs and skewer RSDs gives us the blue line. We can see a small number of points to the right of the Lya line have F<1. This is because they represent cells that are very close to the QSO, yet have peculiar velocity greater than the QSO. As such, when QSO and skewer RSDs are added, the cells "overtake" the QSO, and move to the right of the Lya line. I think this is equivalent to your first plot, in which the stacked skewers have RSDs, and Z_QSO_RSD is used. The slight smoothing here is because of the slight difference in RSD velocities between near-QSO cells and QSOs themselves.

This blue line is ultimately the one that matters, as we include both QSO and skewer RSDs in our final realisations. Physically, I'm not sure whether a situation like this is possible. Certainly, gas near to the QSO could have a higher peculiar velocity that then QSO itself, but whether it could be sufficiently higher to "overtake", I'm not sure.

The reason that this "overtaking" happens in the mocks is, I think, because we boost the velocities of the cells, but not those of the QSOs. When I include both QSO RSDs and skewer RSDs, but ignore the velocity boost I get the purple line. As you can see, F=1 for all points to the right of Lya as the near-QSO cells are unable to "overtake".

This boosting is important, though, for getting the right value of beta_Lya, and so we need to keep it. One fix would be to boost the velocities of the QSOs as well, but this could result in beta_QSO becoming too high. I guess we would need to measure the QSO-autocorrelation to check what the current value of beta_QSO is, and thus assess whether boosting QSO velocities would be an issue.

Alternatively, I can try and remove/reduce the boost only for the cells near the QSO. This would be a little fiddly, and the number of cells to do this for would be somewhat arbitrary, but it should work.

Let me know if you have any thoughts or questions!

@londumas
Copy link
Contributor Author

@jfarr03, Thanks for looking into this:

  • first of all, I really do not think that having T!=1 at lRF>1215.67 is an issue, it even happens in data. The thing I do not understand is why it does not peak at 1215.67.
  • second, we do not need to fix that for the 100 mocks, so do not worry too much, however it is nice to try to understand it
  • Could you redo these different plots, but doing a mean over at least 1000 los? It would better help to understand what happens.
  • I do not think we should change anything in the boost.

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 27, 2019

@londumas I've tried to simplify the problem as much as possible, and have stacked the Gaussian field from CoLoRe in terms of cell index (rather than interpolating to use lambda_rest). It seems as if the cell in front of each QSO is slightly more dense than the cell containing the QSO. Equally, the cell beyond the QSO does not have zero mean density:

near_QSO_delta_G_stack

I imagine this is to do with the way the skewers are interpolated from the grid in CoLoRe probably?

@londumas
Copy link
Contributor Author

@jfarr03, This is exactly the plot I had in mind, thanks.

  • are you using zqso with or without RSD?
  • why is the gaussian colore zero beyond the qso? I would have expected a symmetric curve bellow and beyond the quasar.

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 27, 2019

This is without QSO RSDs, to make things as simple as possible.

Gaussian CoLoRe is zero beyond the QSO by design - I guess it doesn't have to be but it seems pointless to store that information. It shouldn't cause any issues as long as things are working correctly (which they might not be)

@londumas
Copy link
Contributor Author

@jfarr03, could you add in the same plot zqso(with RSD)? Could you also do a finer grid?
Would it be hard to also stack the column of Gaussian field, that is neighbor to the line-of-sight, in the perpendicular direction, so we could see if this issue is also there in 3D? I do not know if it is hard.
So if I understand properly, it might be a feature of CoLoRe not of LyaCoLoRe? Would it be possible to have the CoLoRe developer have a look at that?

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 27, 2019

@londumas Sure, here it is:

near_QSO_delta_G_stack

A finer grid isn't really possible: the points in this plot represent the cells themselves as I didn't want to include any interpolation of the skewers. Stacking neighbours might be tricky - I can try in future but I don't have time right now I'm afraid.

It might be a feature of CoLoRe - I edited some of the CoLoRe interpolation methods about 18 months ago, so I can have a quick look at some point. If it is beyond me, I'll ask David Alonso, the main developer.

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 27, 2019

Note: the cells here don't include any RSDs, of course

@londumas
Copy link
Contributor Author

@jfarr03, Thanks for this plot, this grid makes totally sense then, if it is the natural binning of the mocks, no need to change it. It is possible that the issue indeed comes from the fact that delta(z>zqso) is not stored, then when rebinning, depending on the shape of the kernel, then you could be mixing delta=0 with non zero values. Interesting...

@jmarclegoff
Copy link

  • @jfarr03 said: I can try and remove/reduce the boost only for the cells near the QSO
    However, I believe that changing these cells will not have an effect on drp since these
    pixels are not included in the analysis (lambda_rf_max is still 1200 ?).
    More generally I think a local effect, due to F=1 above 1215.67, cannot be the reason for drp != 0.
  • drp has increased from 0,05 to 0,13 between v4 and v6. Maybe considering the differences between v4 and v6 could help to point the origin of drp != 0 ?
  • Can someone remind me of the definition of the sign of drp ?

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 27, 2019

@londumas I think the way CoLoRe works is by starting at the r=0 end of the skewer, then adding a value to each cell moving outwards by interpolating the 3D field. Once it gets to a cell for which the centre is beyond the QSO, it stops and leaves the rest of the cells with 0 value. That's why sometimes the cell beyond the QSO is 0, sometimes not. Beyond that, the cells are always 0.

I don't think these 0 values are mixed in at all: when we move to the smaller cell size, NGP interpolation is used, and all of the small cells that are beyond the cell housing the QSO are set to 0. At any rate, the peak-shift problem is present in the CoLoRe skewers, so I think the problem might lie there.

@jmarclegoff that's a good point with the value changing between v4 and v6. After a quick look, the main changes relate to the tuning parameters, DLA implementation and n(z) of QSOs. I don't know enough about drp to know which of these, if any, is likely to have an effect.

@jfarr03
Copy link
Collaborator

jfarr03 commented Jun 27, 2019

@londumas Re-evaluating the plot above using a full CoLoRe output file (before, I used one which had been restricted to a single HEALPix pixel), the problem seems to disappear:

near_QSO_delta_G_stack

This is equivalent to stacking over ~230,000 spectra rather than ~2,300. Either the previous plot was just a statistical fluke, or there was an error introduced when when generating the single HEALPix pixel file. I'll look into which of these it was tomorrow.

@londumas
Copy link
Contributor Author

Here is the update as of v9.0, with the current first 19 realizations. More are coming.
It is a bit lower than what was found on version v8.0 (#41 (comment)), but it could be a statistical fluctuation.
At least the different changes @jfarr03 made between these two versions did not make it worse, and seem to have made it better.

cross alone: <drp_QSO> = 0.095 +/- 0.015 (<0.060>, 0.014)
auto+cross: <drp_QSO> = 0.087 +/- 0.014 (<0.059>, 0.0148)

drp_v90

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants