Thank you for your interest in contributing! We're eager to see your ideas and look forward to working with you.
This document describes the technical procedures we follow in this project. It should also be stressed that as members of the Scikit-HEP community, we are all obliged to maintain a welcoming, harassment-free environment. See the Code of Conduct for details.
The front page for the Awkward Array project is its GitHub README. This leads directly to tutorials and reference documentation that you may have already seen. It also includes instructions for compiling for development.
The first thing you should do if you want to fix something is to submit an issue through GitHub. That way, we can all see it and maybe one of us or a member of the community knows of a solution that could save you the time spent fixing it. If you "assign yourself" to the issue (top of right side-bar), you can signal your intent to fix it in the issue report.
Feel free to open pull requests in GitHub from your forked repo when you start working on the problem. We recommend opening the pull request early so that we can see your progress and communicate about it. (Note that you can run git commit --allow-empty to make an empty commit and start a pull request before you even have new code.)
Please make the pull request a draft to indicate that it is in an incomplete state and shouldn't be merged until you click "ready for review."
Currently, we have three regular reviewers of pull requests:
You can request a review from one of us or just comment in GitHub that you want a review and we'll see it. Only one review is required to be allowed to merge a pull request. We'll work with you to get it into shape.
If you're waiting for a response and haven't heard in a few days, it's possible that we forgot, got distracted, thought someone else was reviewing it, or thought we were waiting on you rather than you waiting on us. Just write another comment to remind us.
If you want to contribute frequently, we'll grant you write access to the scikit-hep/awkward repo itself. This is more convenient than pull requests from forked repos.
Unless you ask us not to, we might commit directly to your pull request as a way of communicating what needs to be changed. That said, most of the commits on a pull request are from a single author: corrections and suggestions are exceptions.
Therefore, we prefer git branches to be named with your GitHub userid, such as jpivarski/write-contributing-md.
The titles of pull requests (and therefore the merge commit messages) should follow these conventions. Mostly, this means prefixing the title with one of these words and a colon:
- feat: new feature
- fix: bug-fix
- perf: code change that improves performance
- refactor: code change that neither fixes a bug nor adds a feature
- style: changes that do not affect the meaning of the code
- test: adding missing tests or correcting existing tests
- build: changes that affect the build system or external dependencies
- docs: documentation only changes
- ci: changes to our CI configuration files and scripts
- chore: other changes that don't modify src or test files
- revert: reverts a previous commit
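As a rough illustration of the prefix convention listed above, a title check could look like the following sketch. The function name has_conventional_prefix is hypothetical and is not part of the project's tooling. The sketch also assumes a single space and a non-empty description after the colon, which the list above does not state explicitly.

```python
import re

# Prefixes copied from the list above.
PREFIXES = (
    "feat", "fix", "perf", "refactor", "style", "test",
    "build", "docs", "ci", "chore", "revert",
)

# Assumption: titles look like "<prefix>: <description>", with one space
# after the colon and at least one non-space character following it.
_TITLE_RE = re.compile(r"^(?:%s): \S" % "|".join(PREFIXES))


def has_conventional_prefix(title: str) -> bool:
    """Return True if the pull request title starts with a known prefix."""
    return _TITLE_RE.match(title) is not None


print(has_conventional_prefix("fix: handle empty input"))  # True
print(has_conventional_prefix("Handle empty input"))  # False
```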
Almost all pull requests are merged with the "squash and merge" feature, so details about commit history within a pull request are hidden from the main branch's history. Feel free, therefore, to commit with any frequency you're comfortable with.
It is unnecessary to manually edit (rebase) commit history within a pull request.
The installation procedure for developers is described in brief on the front page, and in more detail here.
Awkward Array is shipped as two packages: awkward and awkward-cpp. The awkward-cpp package contains the compiled C++ components required for performance, and awkward is only Python code. If you do not need to modify any C++ (the usual case), then awkward-cpp can simply be installed using pip or conda.
Subsequent steps require the generation of code and datafiles (kernel specification, header-only includes). This can be done with the prepare nox session:
nox -s prepare
The prepare session accepts flags to specify exact generation targets, e.g.
nox -s prepare -- --tests --docs
This can reduce the time taken to perform the preparation step in the event that only the package-building step is needed.
nox also lets us re-use the virtualenvs that it creates for each session with the -R flag, eliminating the dependency reinstall time:
nox -R -s prepare
The C++ components can be installed by building the awkward-cpp package:
python -m pip install ./awkward-cpp
If you are working on the C++ components of Awkward Array, it might be more convenient to skip the build isolation step, which involves creating an isolated build environment. First, you must install the build requirements:
python -m pip install "scikit-build-core[pyproject,color]" pybind11 ninja cmake
Then the installation can be performed without build isolation:
python -m pip install --no-build-isolation --check-build-dependencies ./awkward-cpp
With awkward-cpp installed, an editable installation of the pure-Python awkward package can be performed with
python -m pip install -e .
Finally, let's run the integration test suite to ensure that everything's working as expected:
python -m pytest -n auto tests
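The default (non-isolated-build) sequence described above can be sketched as a small Python driver. This is only an illustration, not a script shipped with the project: the names SETUP_COMMANDS and run_setup are hypothetical, and it assumes it is run from the repository root with nox and python on the PATH. It defaults to a dry run that only prints the commands.

```python
from __future__ import annotations

import shlex
import subprocess

# Commands copied from the steps above, in the order they are described.
SETUP_COMMANDS = [
    "nox -s prepare",
    "python -m pip install ./awkward-cpp",
    "python -m pip install -e .",
    "python -m pytest -n auto tests",
]


def run_setup(dry_run: bool = True) -> list[list[str]]:
    """Print each step and execute it only when dry_run is False."""
    steps = []
    for command in SETUP_COMMANDS:
        argv = shlex.split(command)
        print("$", command)
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(argv, check=True)
        steps.append(argv)
    return steps


if __name__ == "__main__":
    run_setup(dry_run=True)
```

Calling run_setup(dry_run=False) would actually execute the steps, stopping at the first one that fails.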
For more fine-grained testing, we also have tests of the low-level kernels, which can be invoked with
python -m pytest -n auto awkward-cpp/tests-spec
python -m pytest -n auto awkward-cpp/tests-cpu-kernels
This assumes that the nox -s prepare session ran the --tests target.
Furthermore, if you have an Nvidia GPU and CuPy installed, you can run the CUDA tests with
python -m pytest tests-cuda-kernels
python -m pytest tests-cuda
Sometimes it's convenient to build a wheel for the awkward-cpp package, so that subsequent re-installs do not require the package to be rebuilt. The build package can be used to do this, though care must be taken to specify the current Python interpreter in pipx:
pipx run --python=$(which python) build --wheel awkward-cpp
The built wheel will then be available in awkward-cpp/dist.
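When the wheel build is driven from Python rather than a shell, the same command can be assembled as in the sketch below. The function name wheel_build_command is hypothetical. Using sys.executable in place of $(which python) is an assumption that both point at the same environment, which is normally the case inside an activated virtualenv.

```python
from __future__ import annotations

import sys


def wheel_build_command(package_dir: str = "awkward-cpp") -> list[str]:
    """Return the pipx invocation described above as an argument list."""
    # sys.executable is the interpreter running this script; it stands in
    # for $(which python) from the shell command (an assumption, see above).
    return [
        "pipx", "run", f"--python={sys.executable}",
        "build", "--wheel", package_dir,
    ]


print(wheel_build_command())
```

The returned list can be passed to subprocess.run from the repository root.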
The Awkward Array project uses pre-commit to handle formatters and linters. This automatically checks (and may push corrections to) your pull request's git branch.
To respond more quickly to pre-commit's feedback, it can help to install it and run it locally. Once it is installed, run
pre-commit run -a
to test all of your files. If you leave off the -a, it will run only on currently staged changes.
As stated above, we use pytest to verify the correctness of the code, and GitHub will reject a pull request if either pre-commit or pytest fails (red "X"). All tests must pass for a pull request to be accepted.
Note that if a pull request doesn't modify code, only the documentation tests will run. That's okay: documentation-only pull requests only need the documentation tests to pass.
Unless you're refactoring code, such that your changes are fully tested by the existing test suite, new code should be accompanied by new tests. Our testing suite is organized by GitHub issue or pull request number: that is, test file names are
tests/test_XXXX-yyyy.py
where XXXX is either the number of the issue your pull request fixes or the number of the pull request, and yyyy is descriptive text, often the same as the git branch. This makes it easier to run your test in isolation:
python -m pytest tests/test_XXXX-yyyy.py
and it makes it easier to figure out why a particular test was added. The easiest way to make a new testing file is to copy an existing one and replace its test_zzzz functions with your own. The previous tests should also give you a sense of the way we test things and the kinds of things that are constrained in tests.
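To make the naming pattern concrete, the sketch below builds a test file path from a number and a description. The helper name new_test_path and the example number 123 are hypothetical. Zero-padding the number to four digits is an assumption based on the XXXX placeholder.

```python
import re


def new_test_path(number: int, description: str) -> str:
    """Build tests/test_XXXX-yyyy.py from an issue or PR number and a description."""
    # Lowercase the text and collapse runs of non-alphanumerics into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # Assumption: XXXX is zero-padded to at least four digits.
    return f"tests/test_{number:04d}-{slug}.py"


print(new_test_path(123, "Write CONTRIBUTING.md"))
# tests/test_0123-write-contributing-md.py
```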
Documentation is automatically built by each pull request. You usually won't need to build the documentation locally, but if you do, this section describes how.
We use Sphinx to generate documentation. You may need to install some additional packages:
To build documentation locally, first prepare the generated data files with
nox -s prepare
Only the --headers and --docs flags are actually required at the time of writing. These can be passed with:
nox -s prepare -- --docs --headers
Then, use nox to run the various documentation build steps:
nox -s docs
This command executes multiple custom Python scripts (some require a working internet connection), in addition to using Sphinx and Doxygen to generate the browser-viewable documentation.
To view the built documentation, open
docs/_build/html/index.html
from the root directory of the project in your preferred web browser, or serve it locally, e.g.
python -m http.server 8080 --directory docs/_build/html/
Before re-building documentation, you might want to delete the files that were generated to create viewable documentation. A simple command to remove all of them is
rm -rf docs/reference/generated docs/_build docs/_static/doxygen
There is also a cache in the docs/_build/.jupyter_cache directory for Jupyter Book, which can be removed.
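On platforms without rm, the same cleanup can be sketched in Python. The names GENERATED_DIRS and clean_docs are hypothetical and not part of the project. The directory list is copied from the command above, and removing docs/_build also removes the Jupyter cache nested inside it.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Paths copied from the rm -rf command above, relative to the project root.
GENERATED_DIRS = (
    "docs/reference/generated",
    "docs/_build",
    "docs/_static/doxygen",
)


def clean_docs(root: str = ".") -> list[str]:
    """Delete generated documentation directories and return those removed."""
    removed = []
    for relative in GENERATED_DIRS:
        target = Path(root) / relative
        # Skip directories that do not exist, like rm -f would.
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(relative)
    return removed
```

Calling clean_docs() from the root directory of the project removes whichever of the three directories are present.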
The Awkward Array main branch must be kept in an unbroken state. There are two reasons for this: so that developers can work independently on known-to-be-working states and so that users can test the latest changes (usually to see if the bug they've discovered is fixed by a potential correction).
The main branch is also never far from the latest released version. We usually deploy patch releases (z in a version number like x.y.z) within days of a bug-fix.
Committing directly to main is not allowed except for
- updating the pyproject.toml file to increase the version number, which should be independent of pull requests
- updating documentation or non-code files
- unprecedented emergencies
and only by the reviewing team.
The main-v1 branch was split from main just before Awkward 1.x code was removed, so it exists to make 1.10.x bug-fix releases. These commits must be drawn from main-v1, not main, and pull requests must target main-v1 (not the GitHub default). A single commit cannot be applied to both main and main-v1 because they have diverged too much. If a bug-fix needs to be applied to both (unlikely), it will have to be reimplemented on both.
Currently, only one person can deploy releases:
- Jim Pivarski (jpivarski)
There are two kinds of releases: (1) awkward-cpp updates, which only occur when the C++ is updated (rare) and involve compilation on many platforms (which takes hours), and (2) awkward updates, which can happen with any bug-fix. The releases listed in GitHub are awkward releases, not awkward-cpp.
If you need your merged pull request to be deployed in a release, just ask!
When making an awkward-cpp release (1), the following manual steps must also be taken:
- Creating a git tag awkward-cpp-{version} for the new version epoch.
When making an awkward release (2), the following manual steps must also be taken:
- Attaching the headers.zip from the deploy.yml workflow to the release artefacts.
- Adding a doc/switcher.json entry for new minor/major versions.