We want to make contributing to this project as easy and transparent as possible.
We actively welcome your pull requests.
- Fork the repo and create your branch from main.
- If you've added code that should be tested, add tests.
- If you've changed APIs, update the documentation.
- Ensure the test suite passes.
- If you haven't already, complete the Contributor License Agreement ("CLA").
torchtext enforces a fairly strict code format for Python, text, and configuration files through pre-commit. You can install it with

    pip install pre-commit

or

    conda install -c conda-forge pre-commit

To check and in most cases fix the code format, stage all your changes (git add) and execute pre-commit run. To perform the checks automatically before every git commit, you can install the checks as hooks with pre-commit install.
In addition, torchtext enforces a fairly strict code format for C++ files through a custom version of clang-format. You can download it from
- https://oss-clang-format.s3.us-east-2.amazonaws.com/mac/clang-format-mojave
- https://oss-clang-format.s3.us-east-2.amazonaws.com/linux64/clang-format-linux64

depending on your platform. To run the formatter, make the binary executable (chmod +x) and execute

    python run-clang-format.py \
        --recursive \
        --clang-format-executable=$CLANG_FORMAT \
        torchtext/csrc

where $CLANG_FORMAT denotes the path to the downloaded binary.
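The formatter invocation above can also be assembled from a script. The following is a minimal sketch, not part of the repository: the helper names clang_format_url and format_command are hypothetical, and the mapping from platform.system() values to the two download URLs is an assumption based on the "mac" and "linux64" path segments.

```python
import platform

# Download URLs copied from the list above; the platform keys are an assumption.
CLANG_FORMAT_URLS = {
    "Darwin": "https://oss-clang-format.s3.us-east-2.amazonaws.com/mac/clang-format-mojave",
    "Linux": "https://oss-clang-format.s3.us-east-2.amazonaws.com/linux64/clang-format-linux64",
}


def clang_format_url(system=None):
    """Return the download URL for a platform.system() value (hypothetical helper)."""
    system = system or platform.system()
    try:
        return CLANG_FORMAT_URLS[system]
    except KeyError:
        raise ValueError(f"no clang-format binary listed for {system!r}") from None


def format_command(clang_format_path, target="torchtext/csrc"):
    """Build the run-clang-format.py invocation as an argument list (hypothetical helper)."""
    return [
        "python",
        "run-clang-format.py",
        "--recursive",
        f"--clang-format-executable={clang_format_path}",
        target,
    ]
```

The list returned by format_command can be passed to subprocess.run(cmd, check=True) from the repository root once the binary has been downloaded and made executable.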
The following steps outline how to add third party libraries to torchtext. We assume that the third party library has correctly set up its CMakeLists.txt file for other libraries to take a dependency on.
- Add the third party library as a submodule. Here is a great tutorial on working with submodules in git.
  - Navigate to the third_party/ folder and run git submodule add <repo-URL>
  - Verify the newly added module is present in the .gitmodules file
- Update third_party/CMakeLists.txt to add the following line: add_subdirectory(<name-of-submodule-folder> EXCLUDE_FROM_ALL)
- (Optional) If any of the files within the csrc/ folder make use of the newly added third party library, then
  - Add the new submodule folder to LIBTORCHTEXT_INCLUDE_DIRS and to EXTENSION_INCLUDE_DIRS
  - Add the "targets" name defined by the third party library's CMakeLists.txt file to LIBTORCHTEXT_LINK_LIBRARIES
  - Note that the third party libraries are linked statically with torchtext
- Verify the torchtext build works by running python setup.py develop
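Two of the steps above, checking .gitmodules and writing the add_subdirectory line, can be scripted. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the submodule third_party/example_lib and its URL are made up for illustration, and submodule paths are assumed to contain no spaces.

```python
import re


def submodule_paths(gitmodules_text):
    """Return every submodule path declared in the text of a .gitmodules file."""
    # Each submodule entry carries a line of the form "path = <folder>".
    pattern = r"^[ \t]*path[ \t]*=[ \t]*(\S+)[ \t]*$"
    return re.findall(pattern, gitmodules_text, flags=re.MULTILINE)


def add_subdirectory_line(folder):
    """Build the line to add to third_party/CMakeLists.txt for a submodule folder."""
    return f"add_subdirectory({folder} EXCLUDE_FROM_ALL)"


# Hypothetical .gitmodules content for a submodule added under third_party/.
SAMPLE_GITMODULES = (
    '[submodule "third_party/example_lib"]\n'
    "\tpath = third_party/example_lib\n"
    "\turl = https://example.com/example_lib.git\n"
)

paths = submodule_paths(SAMPLE_GITMODULES)
cmake_line = add_subdirectory_line("example_lib")
```

In practice the text would be read from the .gitmodules file at the repository root instead of the sample string, and the check would be that the new submodule folder appears in the returned list.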
Custom C++ operators can be implemented and registered in torchtext for several reasons, including making an existing Python component more efficient and getting around the limitations of multithreading in Python (due to the Global Interpreter Lock). These custom kernels (or “ops”) can be embedded into a TorchScripted model and can be executed both in Python and in their serialized form directly in C++. You can learn more in this tutorial on writing custom C++ operators.
Steps to register an operator:
- Add the new custom operator to the torchtext/csrc folder. This entails writing the header and the source file for the custom op.
- Add the new source files to the LIBTORCHTEXT_SOURCES list.
- Register the operators with torchbind and pybind
  - Torchbind registration happens in the register_torchbindings.cpp file
  - Pybind registration happens in the register_pybindings.cpp file.
- Write a Python wrapper class that is responsible for exposing the torchbind/pybind registered operators via Python. You can find some examples of this in the torchtext/transforms.py file.
- Write a unit test that tests the functionality of the operator through the Python wrapper class. You can find some examples in the test/test_transforms.py file.
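The last two steps, the Python wrapper class and its unit test, might look roughly like the sketch below. It is an illustration of the pattern only and is not taken from torchtext: every name in it is hypothetical, and a pure-Python stand-in plays the role of the operator that would normally be registered through torchbind or pybind, so the snippet runs without building the C++ extension.

```python
import unittest


class _FakeWhitespaceSplitOp:
    """Stand-in for an operator bound from C++ (hypothetical, not a torchtext op)."""

    def forward(self, text):
        # str.split() with no arguments splits on any run of whitespace.
        return text.split()


class WhitespaceSplit:
    """Hypothetical Python wrapper that exposes the bound operator to users."""

    def __init__(self, op=None):
        # In a real build, the default would be the torchbind/pybind object.
        self._op = op if op is not None else _FakeWhitespaceSplitOp()

    def __call__(self, text):
        return self._op.forward(text)


class TestWhitespaceSplit(unittest.TestCase):
    """Exercises the operator only through the Python wrapper class."""

    def test_splits_on_whitespace(self):
        transform = WhitespaceSplit()
        self.assertEqual(
            transform("hello  torchtext world"),
            ["hello", "torchtext", "world"],
        )
```

Saved as a module, the test case can be run with python -m unittest. Accepting the operator as a constructor argument is a design choice of this sketch that lets the test substitute a different backend.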
In order to accept your pull request, we need you to submit a CLA. You only need to do this once to work on any of Facebook's open source projects.
Complete your CLA here: https://code.facebook.com/cla
We use GitHub issues to track public bugs. Please ensure your description is clear and has sufficient instructions to be able to reproduce the issue.
Facebook has a bounty program for the safe disclosure of security bugs. In those cases, please go through the process outlined on that page and do not file a public issue.
By contributing to text, you agree that your contributions will be licensed under the LICENSE file in the root directory of this source tree.