Since every local Git repository contains a copy of the entire project history, it is important to avoid adding large binary files directly to the repository. Large binary files added and removed throughout a project's history bloat the repository, consume excessive disk space, and make cloning slow and bandwidth-intensive.
The solution adopted by ITK is to store binary files, such as images, in a separate location outside the Git repository and to download them at build time with CMake.
A "content link" file contains an identifying SHA512 hash of the data. The content link is stored in the Git repository at the path where the file would exist, but with a .sha512 extension appended to the file name. At build time, CMake (through its ExternalData module) finds these content link files, downloads the referenced data from a list of server resources, and creates symlinks or copies of the original files at the corresponding location in the build tree.
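The content link format is simple enough to reproduce by hand. The following is a minimal bash sketch that assumes GNU coreutils' sha512sum and a content link holding only the hex digest. The make_content_link helper is hypothetical and writes a link next to a local file. In practice, the upload tools described below create these files for you.

```shell
# Hypothetical helper: write <file>.sha512 containing the file's SHA512 digest.
# Assumes GNU coreutils' sha512sum (macOS users could substitute `shasum -a 512`).
make_content_link() {
  local file="$1"
  local hash
  # Read from stdin so the output is "<hash>  -" regardless of odd file names.
  hash=$(sha512sum < "$file") || return 1
  # Keep only the 128-hex-digit digest (everything before the first space).
  printf '%s\n' "${hash%% *}" > "${file}.sha512"
}
```

For example, `make_content_link Input/cthead1.png` would create `Input/cthead1.png.sha512`.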
See also our Data guide for more information. If you just want to browse and download the ITK testing images, see the data.kitware.com ITK collection.
Note: for historical reasons, ITK used MD5 content link files before it adopted SHA512 content links.
ITK examples and ITK class tests (see Section 9.4 of the ITK Software Guide) rely on input and baseline images (or data in general) to demonstrate and check the features of a given class. Hence, when developing an ITK example or test, images (through their content links) will need to be added to the Git repository.
When adding images for an ITK example or test, the following principles need to be followed:
- Images should be small.
- The source tree is not an image database, but a source code repository.
- Adding an image larger than 50 KB should be justified by a discussion with the ITK community.
- Regression (baseline) images should not use the Analyze format unless the test is for AnalyzeImageIO and related classes.
- Images should use non-trivial metadata:
  - The origin should be different from zero.
  - The spacing should be different from one, and it should be anisotropic.
  - The direction should be different from identity.
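The size guideline is easy to check before adding an image. The following is a minimal sketch of a hypothetical check_image_size bash function. It treats 50 KB as 50 * 1024 bytes, which is an assumption, and it does not check the metadata principles.

```shell
# Hypothetical helper: return 1 (with a warning) if a file exceeds ~50 KB.
# The limit of 50 * 1024 bytes is an assumed reading of the 50 KB guideline.
check_image_size() {
  local file="$1"
  local limit=$((50 * 1024))
  local size
  size=$(wc -c < "$file") || return 2
  size=$((size))  # normalize: some wc implementations pad the count with spaces
  if [ "$size" -gt "$limit" ]; then
    echo "warning: $file is $size bytes (> $limit); discuss with the ITK community first" >&2
    return 1
  fi
  return 0
}
```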
The data.kitware.com server is an ITK community resource where any community member can upload binary data files. There are three methods available to upload data files:
- The UploadBinaryData.sh shell script.
- The Girder web interface.
- The girder-cli command line executable that comes with the girder-client Python package.
Before uploading data, please visit data.kitware.com and register for an account.
Once files have been uploaded to the Public folder of your account, they are publicly available. Because data is content addressed, they can also be retrieved by hash: the hashsum_download plugin in Girder searches all public data (or private data, if authenticated) for files with the given hash. Thus, as long as the file is publicly available somewhere on data.kitware.com, ITK will be able to retrieve it.
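To illustrate content addressing, the following bash sketch builds a download URL from a content link. The hashsum_download_url helper is hypothetical. The endpoint pattern (file/hashsum/sha512/<hash>/download under the data.kitware.com API root) is an assumption based on the hashsum_download plugin named above, so verify it against ITK's ExternalData configuration before relying on it.

```shell
# Assumed API root and endpoint pattern for Girder's hashsum_download plugin.
GIRDER_API_URL="https://data.kitware.com/api/v1"

# Hypothetical helper: print the by-hash download URL for a .sha512 content link.
hashsum_download_url() {
  local content_link="$1"
  local hash
  # Drop the trailing newline and any other whitespace around the digest.
  hash=$(tr -d '[:space:]' < "$content_link") || return 1
  printf '%s/file/hashsum/sha512/%s/download\n' "$GIRDER_API_URL" "$hash"
}

# Usage (requires network access):
#   curl -fL -o cthead1.png "$(hashsum_download_url cthead1.png.sha512)"
```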
At release time, the release manager will upload and archive repository data references in the ITK collection and other redundant storage locations.
The UploadBinaryData.sh script will authenticate to data.kitware.com, upload the file to your user account's Public folder, and create a *.sha512 CMake ExternalData content link file. After the content link has been created, you will need to add the *.sha512 file to your commit.
When ./Utilities/SetupForDevelopment.sh is executed, as described in CONTRIBUTING.md, authentication to Girder is configured in Git. If neither the girder.api-key Git config setting nor the GIRDER_API_KEY environment variable is set, a prompt will appear for your username and password. The API key can be created in the data.kitware.com user account web interface.
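For reference, the two key sources named above can be set from a shell inside an ITK checkout. They are shown as comments below, using the placeholder key from the girder-cli example. The hypothetical girder_api_key_configured bash function sketches the documented condition: it succeeds when either source is set, that is, when no prompt should be needed. It does not reproduce how the ITK scripts actually read the key.

```shell
# Replace the placeholder with the key created in the data.kitware.com web interface:
#   git config girder.api-key 12345ALongSetOfCharactersAndNumbers   # this repository's Git config
#   export GIRDER_API_KEY=12345ALongSetOfCharactersAndNumbers       # current shell session only

# Hypothetical helper: succeed if either documented key source is set.
girder_api_key_configured() {
  [ -n "${GIRDER_API_KEY:-}" ] && return 0
  # `git config --get` prints nothing (and fails) when the setting is absent.
  [ -n "$(git config --get girder.api-key 2>/dev/null)" ]
}
```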
To upload new binary testing data:
- Place the binary file at the desired location in the Git repository.
- Run the git data-upload alias, and pass in the binary file(s) as arguments. E.g.:
  $ cd ITK
  $ git data-upload ./Modules/Core/Common/test/Input/cthead1.png
- In the corresponding test/CMakeLists.txt file, use the itk_add_test macro and reference the relative file path with DATA and braces. E.g.: DATA{Input/cthead1.png}.
- Re-build ITK, specifically the ITKData target, and the testing data will be downloaded into the build tree. The path in the build tree is used in test execution.
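With the workflow above, each DATA{} reference should have a matching .sha512 content link in the source tree. As a rough pre-build check, a hypothetical missing_content_links bash function could scan a test/CMakeLists.txt and print simple references that have no .sha512 file. It resolves paths relative to that CMakeLists.txt, an assumption that matches the DATA{Input/cthead1.png} example. References containing CMake variables or ExternalData options are skipped, and legacy MD5 links are not considered.

```shell
# Hypothetical helper (bash): print DATA{...} references in a CMakeLists.txt
# whose <path>.sha512 content link is missing; return 1 if any are missing.
missing_content_links() {
  local cmakelists="$1"
  local dir ref
  local missing=0
  dir=$(dirname "$cmakelists")
  # Match only simple references: no '}', '$' (CMake variables) or ',' (options).
  while IFS= read -r ref; do
    ref=${ref:5}    # drop the leading "DATA{"
    ref=${ref%?}    # drop the trailing "}"
    if [ ! -f "$dir/$ref.sha512" ]; then
      printf '%s\n' "$ref"
      missing=1
    fi
  done < <(grep -o 'DATA{[^}$,]*}' "$cmakelists")
  return "$missing"
}
```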
To upload files with the Girder web interface:
- After logging in, you will be presented with the welcome page. Click on the personal data space link.
- Next, select the Public folder of your personal data space.
- Click the green upload button.
- Click Browse or drop files to select the files to upload.
- Click Start Upload to upload the file to the server.
- Next, download the content link using the steps described below.
A Python script to upload files from the command line, girder-cli, is available with the girder-client Python package. To install it, run:
$ python -m pip install girder-client
To upload files with the girder-cli script, we need to obtain an API key and a parent folder ID from the web interface:
- After logging in, select My account from the user drop down.
- Next, select the API keys tab.
- If no API key is available, create one by clicking Create new key.
- Click the show link to reveal the key, which can then be copied into the command line.
- Next, select My Folders from the user drop down.
- Next, select the Public folder of your personal data space.
- Click the i button for information about the folder.
- The Unique ID can be copied into the command line.
Use both the API key and the folder ID when calling girder-cli. For example:
$ girder-cli \
--api-key 12345ALongSetOfCharactersAndNumbers \
--api-url https://data.kitware.com/api/v1 \
upload \
58becaee8d777f0aefede556 \
/tmp/cthead1.png
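To avoid pasting the key into each command, the same upload can be wrapped in a small bash function that reads the key from the GIRDER_API_KEY environment variable. upload_to_girder is a hypothetical helper that only forwards the arguments shown in the example above. The key is still passed to girder-cli as a command-line argument.

```shell
# Hypothetical wrapper around the girder-cli upload shown above.
# $1 is the folder Unique ID; the remaining arguments are the files to upload.
upload_to_girder() {
  local folder_id="$1"; shift
  if [ -z "${GIRDER_API_KEY:-}" ]; then
    echo "error: GIRDER_API_KEY is not set" >&2
    return 1
  fi
  girder-cli \
    --api-key "$GIRDER_API_KEY" \
    --api-url https://data.kitware.com/api/v1 \
    upload "$folder_id" "$@"
}
```

For example: `upload_to_girder 58becaee8d777f0aefede556 /tmp/cthead1.png`.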
Next, download the content link using the steps described below.
To download the content link from the Girder web interface:
- Click on the file that has been uploaded.
- Click on the i button for further information.
- Finally, click on the Download key file icon to download the key file.
Move the content link file into the source tree, at the location where the actual file should appear in the build tree. Stage the new file for commit:
$ git add -- path/to/file.sha512
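Because data is content addressed, a content link can be checked against a local copy of the file before committing. The following is a minimal sketch of a hypothetical verify_content_link bash function. It assumes GNU coreutils' sha512sum and a content link containing only the hex digest.

```shell
# Hypothetical helper: succeed only if the content link's digest matches the file.
verify_content_link() {
  local data_file="$1" content_link="$2"
  local expected actual
  expected=$(tr -d '[:space:]' < "$content_link") || return 2
  actual=$(sha512sum < "$data_file") || return 2
  actual=${actual%% *}   # keep only the digest from "<hash>  -"
  [ "$expected" = "$actual" ]
}
```

For example: `verify_content_link /tmp/cthead1.png path/to/cthead1.png.sha512 && git add -- path/to/cthead1.png.sha512`.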