TreeCorr is a package for efficiently computing 2-point and 3-point correlation functions.
- The code is hosted at https://github.com/rmjarvis/TreeCorr
- It can compute correlations of regular number counts, weak lensing shears, or scalar quantities such as convergence or CMB temperature fluctutations.
- 2-point correlations may be auto-correlations or cross-correlations. This includes shear-shear, count-shear, count-count, kappa-kappa, etc. (Any combination of shear, kappa, and counts.)
- 3-point correlations currently can only be auto-correlations. This includes shear-shear-shear, count-count-count, and kappa-kappa-kappa. The cross varieties are planned to be added in the near future.
- Both 2- and 3-point functions can be done with the correct curved-sky calculation using RA, Dec coordinates, on a Euclidean tangent plane, or in 3D using either (RA,Dec,r) or (x,y,z) positions.
- The front end is in Python, which can be used as a Python module or as a standalone executable using configuration files. (The executable is corr2 for 2-point and corr3 for 3-point.)
- The actual computation of the correlation functions is done in C++ using ball trees (similar to kd trees), which make the calculation extremely efficient.
- When available, OpenMP is used to run in parallel on multi-core machines.
- Approximate running time for 2-point shear-shear is ~30 sec * (N/10^6) / core for a bin size b=0.1 in log(r). It scales as b^(-2). This is the slowest of the various kinds of 2-point correlations, so others will be a bit faster, but with the same scaling with N and b.
- The running time for 3-point functions are highly variable depending on the range of triangle geometries you are calculating. They are significantly slower than the 2-point functions, but many orders of magnitude faster than brute force algorithms.
- If you use TreeCorr in published research, please reference: Jarvis, Bernstein, & Jain, 2004, MNRAS, 352, 338 (I'm working on new paper about TreeCorr, including some of the improvements I've made since then, but this will suffice as a reference for now.)
- If you use the three-point multipole functionality of TreeCorr, please also reference Porth et al, 2023, arXiv:2309.08601
- Record on the Astrophyics Source Code Library: http://ascl.net/1508.007
- Developed by Mike Jarvis. Fee free to contact me with questions or comments at mikejarvis17 at gmail. Or post an issue (see below) if you have any problems with the code.
The code is licensed under a FreeBSD license. Essentially, you can use the
code in any way you want, but if you distribute it, you need to include the
file TreeCorr_LICENSE
with the distribution. See that file for details.
The easiest ways to install TreeCorr are either with pip:
pip install treecorr
or with conda:
conda install -c conda-forge treecorr
If you have previously installed TreeCorr, and want to upgrade to a new released version, you should do:
pip install treecorr --upgrade
or:
conda update -c conda-forge treecorr
Depending on the write permissions of the python distribution for your specific system, you might need to use one of the following variants for pip installation:
sudo pip install treecorr pip install treecorr --user
The latter installs the Python module into ~/.local/lib/python3.X/site-packages
,
which is normally already in your PYTHONPATH, but it puts the executables
corr2
and corr3
into ~/.local/bin
which is probably not in your PATH.
To use these scripts, you should add this directory to your PATH. If you would
rather install into a different prefix rather than ~/.local, you can use:
pip install treecorr --install-option="--prefix=PREFIX"
This would install the executables into PREFIX/bin
and the Python module
into PREFIX/lib/python3.X/site-packages
.
If you would rather download the tarball and install TreeCorr yourself, that is also relatively straightforward:
You can download the latest tarball from:
https://github.com/rmjarvis/TreeCorr/releases/Or you can clone the repository using either of the following:
git clone [email protected]:rmjarvis/TreeCorr.git git clone https://github.com/rmjarvis/TreeCorr.gitwhich will start out in the current stable release branch.
Either way, cd into the TreeCorr directory.
All required dependencies should be installed automatically for you by pip or conda, so you should not need to worry about these. But if you are interested, the dependencies are:
- numpy
- pyyaml
- LSSTDESC.Coord
- pybind11
They can all be installed at once by running:
pip install -r requirements.txtor:
conda install -c conda-forge treecorr --only-depsNote
Several additional modules are not required for basic TreeCorr operations, but are potentially useful.
- fitsio is required for reading FITS catalogs or writing to FITS output files.
- pandas will signficantly speed up reading from ASCII catalogs.
- pandas and pyarrow are required for reading Parquet files.
- h5py is required for reading HDF5 catalogs.
- mpi4py is required for running TreeCorr across multiple machines using MPI.
These are all pip installable:
pip install fitsio pip install pandas pip install pyarrow pip install h5py pip install mpi4pyBut they are not installed with TreeCorr automatically.
Also, beware that many HPC machines (e.g. NERSC) have special instructions for installing mpi4py, so the above command may not work on such systems. If you have trouble, look for specialized instructions about how to install mpi4py properly on your system.
You can then install TreeCorr from the local distribution. Typically this would be the command:
pip install .If you don't have write permission in your python distribution, you might need to use:
pip install . --userIn addition to installing the Python module
treecorr
, this will install the executablescorr2
andcorr3
in abin
folder somewhere on your system. Look for a line like:Installing corr2 script to /anaconda3/binor similar in the output to see where the scripts are installed. If the directory is not in your path, you will also get a warning message at the end letting you know which directory you should add to your path if you want to run these scripts.
If you want to run the unit tests, you can do the following:
pip install -r test_requirements.txt cd tests pytest
This software is able to compute a variety of two-point correlations:
NN: | The normal two-point correlation function of number counts (typically galaxy counts). |
---|---|
GG: | Two-point shear-shear correlation function. |
KK: | Nominally the two-point kappa-kappa correlation function, although any scalar quantity can be used as "kappa". In lensing, kappa is the convergence, but this could be used for temperature, size, etc. |
NG: | Cross-correlation of counts with shear. This is what is often called galaxy-galaxy lensing. |
NK: | Cross-correlation of counts with kappa. Again, "kappa" here can be any scalar quantity. |
KG: | Cross-correlation of convergence with shear. Like the NG calculation, but weighting the pairs by the kappa values the foreground points. |
There are also additional combinations involving complex fields with different spin than 2 (shear is a spin-2 field). See Two-point Correlation Functions for more details.
This software is not yet able to compute three-point cross-correlations, so the only avaiable three-point correlations are:
NNN: | Three-point correlation function of number counts. |
---|---|
GGG: | Three-point shear correlation function. We use the "natural components" called Gamma, described by Schneider & Lombardi (2003) (Astron.Astrophys. 397, 809) using the triangle centroid as the reference point. |
KKK: | Three-point kappa correlation function. Again, "kappa" here can be any scalar quantity. |
See Three-point Correlation Functions for more details.
The executables corr2 and corr3 each take one required command-line argument, which is the name of a configuration file:
corr2 config_file corr3 config_file
A sample configuration file for corr2 is provided, called sample.params. See Configuration Parameters for the complete documentation about the allowed parameters.
You can also specify parameters on the command line after the name of the configuration file. e.g.:
corr2 config_file file_name=file1.dat gg_file_name=file1.out corr2 config_file file_name=file2.dat gg_file_name=file2.out ...
This can be useful when running the program from a script for lots of input files.
See Using configuration files for more details.
The typical usage in python is in three stages:
- Define one or more Catalogs with the input data to be correlated.
- Define the correlation function that you want to perform on those data.
- Run the correlation by calling
process
. - Maybe write the results to a file or use them in some way.
For instance, computing a shear-shear correlation from an input file stored in a fits file would look something like the following:
>>> import treecorr >>> cat = treecorr.Catalog('cat.fits', ra_col='RA', dec_col='DEC', ... ra_units='degrees', dec_units='degrees', ... g1_col='GAMMA1', g2_col='GAMMA2') >>> gg = treecorr.GGCorrelation(min_sep=1., max_sep=100., bin_size=0.1, ... sep_units='arcmin') >>> gg.process(cat) >>> xip = gg.xip # The xi_plus correlation function >>> xim = gg.xim # The xi_minus correlation function >>> gg.write('gg.out') # Write results to a file
For more details, see our slightly longer Getting Started Guide.
Or for a more involved worked example, see our Jupyter notebook tutorial.
And for the complete details about all aspects of the code, see the Sphinx-generated documentation.
If you find a bug running the code, please report it at:
https://github.com/rmjarvis/TreeCorr/issues
Click "New Issue", which will open up a form for you to fill in with the details of the problem you are having.
If you would like to request a new feature, do the same thing. Open a new issue and fill in the details of the feature you would like added to TreeCorr. Or if there is already an issue for your desired feature, please add to the discussion, describing your use case. The more people who say they want a feature, the more likely I am to get around to it sooner than later.