Thank you for considering a contribution to Archivematica! This document outlines the change submission process for Archivematica, along with our standards for new code contributions. Following these guidelines helps us assess your changes faster and makes it easier for us to merge you submission!
There are many ways to contribute: writing tutorials or blog posts about your experience, improving the documentation, submitting bug reports, answering questions on the mailing list, or writing code which can be incorporated into Archivematica itself.
Table of Contents
If you find a security vulnerability, do NOT open an issue. Email [email protected] instead.
Artefactual staff use the issue tracker for any work they do on the Archivematica project. Anyone is welcome to request an account on that system to file issues there. To register for an account, please email [email protected]. Please note that it may take up to two business days for your new account request to be approved.
Issues can also be filed using GitHub Issues in the Archivematica project or any of the supporting Github repositories in the Artefactual organization. You can also post in our technical or user mailing lists. A post to the mailing list is always welcome, especially if you're unsure if it's a bug or a local problem!
Useful questions to answer if you're having problems include:
- What version of Archivematica and the Storage Service are you using?
- How was Archivematica installed? Package install, Ansible, etc?
- Was this a fresh install or an upgrade?
- What did you do to cause this bug to happen?
- What did you expect to happen?
- What did you see instead?
- Can you reproduce this reliably?
- If a specific Job is failing, what output did it produce? This is available by clicking on the gear icon.
Every new feature and bugfix to a project is expected to go through code review before inclusion. This applies both to developers at Artefactual and to outside contributors.
Here's an outline of the code review process:
- Fork the Artefactual project on GitHub, and commit your changes to a branch in your fork.
- Open a pull request.
- Back and forth discussion with developers on the branch.
- Make any changes suggested by reviewers.
- Repeat 3 and 4 as necessary.
- Clean up commit history, if necessary.
- Sign a Contributor's Agreement, if you haven't already.
- Your branch will be merged!
So you have something to contribute to an Artefactual project. Great!
To install Archivematica, see our development installation instructions.
Artefactual uses GitHub's pull request feature for code review. Every change being submitted to an Artefactual project should be submitted as a pull request to the appropriate repository. A branch being submitted for code review should contain commits covering a related section of code. Try not to bundle unrelated changes together in one branch; it makes review harder.
If you're not familiar with forking repositories and creating branches in GitHub, consult their guide.
Your code doesn't have to be ready yet to submit for code review! You should submit a branch for code review as soon as you want to get feedback on it. Sometimes, that means submitting a feature-complete branch, but sometimes that means submitting an early WIP in order to get feedback on direction. Don't be shy about submitting early.
GitHub has an excellent guide on using the pull request feature.
Discussion on pull requests is usually a back and forth process. Don't feel like you have to make every change the reviewer suggests; the pull request is a great place to have in-depth conversation on the issue.
Do make use of GitHub's line comment feature!
By viewing the "Files changed", you can leave a comment on any line of the diff. This is a great way to scope discussion to a particular part of a diff. Any discussion you have in a specific part of the diff will also be automatically hidden once a change is pushed that addresses it, which helps keep the pull request page clear and readable.
Anyone can participate in code review discussions. Feel free to jump in if you have something to contribute on another pull request, even if you're not the one who opened it.
Once code review is finished or nearly finished, and no further development is planned on the branch, the branch's commit history should be cleaned up. You can alter the commit history of a branch using git's powerful interactive rebase feature.
In order for the Archivematica development team to accept any patches or code commits, contributors must first sign this Contributor's Agreement. The Archivematica contributor's agreement is based almost verbatim on the Apache Foundation's individual contributor license.
If you have any questions or concerns about the Contributor's Agreement, please email us at [email protected] to discuss them.
One of the key challenges for open source software is to support a collaborative development environment while protecting the rights of contributors and users over the long-term. Unifying Archivematica copyrights through contributor agreements is the best way to protect the availability and sustainability of Archivematica over the long-term as free and open-source software. In all cases, contributors who sign the Contributor's Agreement retain full rights to use their original contributions for any other purpose outside of Archivematica, while enabling Artefactual Systems, any successor organization which may eventually take over responsibility for Archivematica, and the wider Archivematica community to benefit from their collaboration and contributions in this open source project.
Artefactual Systems has made the decision and has a proven track record of making our intellectual property available to the community at large. By standardizing contributions on these agreements the Archivematica intellectual property position does not become too complicated. This ensures our resources are devoted to making our project the best they can be, rather than fighting legal battles over contributions.
Please print out, read, sign, and scan the contributor agreement and email it to [email protected] Alternatively, you may send an original signed agreement to:
Artefactual Systems Inc.
201 - 301 Sixth Street
New Westminster BC V3L 3A7
Canada
Archivematica uses the Python PEP8 community style guidelines. Newly-written code should conform to PEP-8 style. PEP8 is a daunting document, but there are very good linters available that you can run to check style in your code. The flake8 tool checks for style problems as well as errors and complexity. It can be used at the command line or as a plugin in your preferred text editor/IDE.
A few additional stylistic preferences might not get flagged by linters:
- Don't use variable or parameter names that shadow builtin functions and types. For example, don't name a variable "file". (Unfortunately, Python uses many useful names for its builtin types and functions.)
- Sort imports alphabetically within their grouping to reduce duplicate imports.
Archivematica ignores the PEP8 rule concerning line length; lines longer than 80 characters are acceptable. This is error "501", for the purposes of silencing it in flake8 or other linting tools.
Older portions of the Archivematica codebase don't always conform to PEP8 rules; the codebase is slowly transitioning to PEP8 style. Don't worry about rewriting existing code to conform to the guidelines when adding new code; however, if possible, it's great if your new code does conform.
When working in sections of code that don't conform to PEP8, it's okay to relax a few PEP8 rules in order to match existing code. In particular:
- When modifying an existing function which uses camelCase variables or parameters, it's okay to make your new variables/parameters camelCase to match.
- When adding new functions to a module or class that uses camelCase naming, it's okay to make your new function and its parameters camelCase to match. Try to use snake_case internally, however.
You should try to write Python 2 / Python 3 compatible code where possible.
Using from __future__ import print_function, division, absolute_import
will help with this.
Some libraries run in Python 2 and Python 3 already; this behaviour should be maintained.
New classes and functions should generally be documented using docstrings; these help in providing clarity, and can also be used to generate API documentation later. Generally any function which isn't obvious (any function longer than a line or two) should have a docstring. When in doubt: document! Python's PEP 257 document provides a useful guideline for docstring style. Generally, prefer using Sphinx-compatible docstrings. More examples and attributes to use can be found on the Sphinx website.
New code should also have unit tests.
Tests are written in unittest style and run with py.test.
For tests requiring the Django ORM, we use the Django-provided TestCase, which extends unittest.TestCase
.
Tests are found in the tests
directory, a sibling of the directory containing the code.
test_foo.py
contains tests for foo.py
.
For clarity, group tests for the same function and similar tests into the same class within that file.
This will also allow you to share setup and teardown code.
If you are testing code that makes HTTP requests, using VCR.py is highly recommended. It should already be a development dependency.
A clean commit history makes it easier to see what changes are related, and why there were made. It makes reviewing code easier, and spelunking in the codebase to find why a change was made more rewarding. The following few criteria help outline what makes for a clean commit history:
Any single commit should cover a specific change. That change might span multiple files, but it shouldn't introduce multiple distinct functional changes unless it's impossible to separate those changes from each other. It's okay (and normal!) for a branch that makes a lot of changes to consist of many small commits, each of which makes individual small changes. This makes it easier to revert individual commits from a branch, and also makes it easier to isolate the commit that caused a particular bug with git-blame.
In development, it's common to introduce a change with a commit and then introduce multiple other commits with small fixes for that change. When getting ready for merging, those commits should all be squashed together.
No individual commit should put the software in a broken state. It's fine for code to go unused (because the feature it's been introduced for isn't there yet), but no new functional issues should be introduced by a given commit. This ensures that reverting any individual commit from the branch is safe, and that git-bisect stays reliable for tracking down bugs.
The commit summary (the first line of a commit message) should be short and clear. Usually, the first line of the commit message should be 50 - 80 characters. Since the summary is just a summary of the change, it should be readable at a glance when looking through git history.
A good commit summary makes it clear a) what part of the code is changing, and b) what the change is. For example:
Clear commit summary:
normalize: fall back to default rule for unidentified files
Unclear commit summaries:
Fixed some normalization bugs
Bugfixes
The unclear messages make it hard to tell at a glance what changed, and that makes browsing the commit history harder.
The commit message is the rest of the commit past the first line. If a commit makes a small and obvious change, it's fine to not even have a commit message past the summary.
The commit message is your place to clarify the justification for a change. While there's no need to rehash anything that code comments already say, if there's more detail that helps a reader understand why a change was made, be as verbose as you need to! Remember: future-you (or another developer) will read this when going through the commit history to understand why a change was made. Make their life easier.