Please find the latest development environment under the main Archivematica repository: https://github.com/artefactual/archivematica/tree/qa/1.x/hack. Use this repository only if you are still working on Archivematica v1.12 or older. Thank you!
- Audience
- Requirements
- Installation
- Web UIs
- Upgrading to the latest version of Archivematica
- Source code auto-reloading
- Logs
- Scaling
- Ports
- Tests
- Cleaning up
- Instrumentation
- Troubleshooting
This Archivematica environment is based on Docker Compose and is specifically designed for developers. Compose can be used in a production environment, but that is beyond the scope of this recipe.
Artefactual developers use Docker Compose heavily so it's important that you're familiar with it. Please read the documentation.
System requirements. Memory usage when the environment is initialized (obtained using `docker stats`):
CONTAINER NAME | MEM USAGE (MiB) |
---|---|
archivematica-mcp-server | 60.6 |
archivematica-mcp-client | 32.5 |
archivematica-dashboard | 60.0 |
archivematica-storage-service | 74.6 |
clamavd | 545.6 |
gearmand | 1.6 |
mysql | 530.4 |
redis | 1.8 |
nginx | 2.5 |
elasticsearch | 229.3 |
fits | 70.7 |
Software dependencies: Docker, Docker Compose, git and make.
It is beyond the scope of this document to explain how these dependencies are installed on your computer. If you're using Ubuntu 16.04, the following commands may work:
$ sudo apt update
$ sudo apt install -y build-essential python-dev git
$ sudo pip install -U docker-compose
And install Docker CE following these instructions.
Install the `rng-tools` daemon if you want to set up GPG-encrypted spaces. The Storage Service container should have access to the `/dev/random` device.
Docker will provide instructions on how to use it as a non-root user. This may not be desirable for everyone.
If you would like to use Docker as a non-root user, you should now consider
adding your user to the "docker" group with something like:
sudo usermod -aG docker <user>
Remember that you will have to log out and back in for this to take effect!
WARNING: Adding a user to the "docker" group will grant the ability to run
containers which can be used to obtain root privileges on the
docker host.
Refer to https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
for more information.
The impact for those following this recipe is that any of the commands below that call Docker will need to be run as root using `sudo`.
Installation of Archivematica on machines running macOS using Docker is possible, but still in development and may require some extra steps. If you are new to Archivematica and/or Docker, or have an older machine, it may be better to instead use a Linux machine.
For the Elasticsearch container to run properly, you may need to increase the maximum number of virtual memory areas, `vm.max_map_count`, to at least 262144.
This is a configuration setting on the host machine running Docker, not the
container itself.
To make this change:
sudo sysctl -w vm.max_map_count=262144
To persist this setting, modify `/etc/sysctl.conf` and add:
vm.max_map_count=262144
For more information, please consult the Elasticsearch 6.x
documentation.
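To confirm that the setting took effect, you can read the value back on the host. This is a quick sanity check, assuming a Linux host:

```shell
# Read the current value; it should print 262144 (or higher) after the change
cat /proc/sys/vm/max_map_count
```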
If you haven't already, clone this repository with git and pull all the submodules:

$ git clone https://github.com/artefactual-labs/am.git
$ git -C am submodule update --init --recursive
Run the installation (and all Docker Compose) commands from within the compose directory:
$ cd ./am/compose
These are the commands you need to run when starting from scratch:
$ make create-volumes
$ docker-compose up -d --build
$ make bootstrap
$ make restart-am-services
`make create-volumes` creates two external volumes. They're heavily used by our containers, but they live on the host machine:

- `$HOME/.am/am-pipeline-data` - the shared directory.
- `$HOME/.am/ss-location-data` - the transfer source location.
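As a quick sanity check, you can confirm that the backing directories exist on the host. The paths below are the defaults; adjust them if you overrode the Makefile variables:

```shell
# Report whether each default host directory backing the external volumes exists
for d in "$HOME/.am/am-pipeline-data" "$HOME/.am/ss-location-data"; do
  [ -d "$d" ] && echo "present: $d" || echo "missing: $d"
done
```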
The make targets used above, and any referenced below, can be reviewed with the following command from the compose directory:
$ make help
The installation instructions above will install the submodules defined in https://github.com/artefactual-labs/am/tree/master/src, which are from the `qa/1.x` branch of Archivematica and the `qa/0.x` branch of Archivematica Storage Service.
To upgrade your installation to include the most recent changes in the submodules, use the following commands:
$ git pull --rebase
$ git submodule update --init --recursive
$ docker-compose up -d --force-recreate --build
$ make bootstrap
$ make restart-am-services
The submodules are not always up to date, i.e. they may not be pointing to the latest commits of their tracking branches. They can be updated manually using `git pull --rebase`:
$ cd ../src/archivematica && git pull --rebase
$ cd ../src/archivematica-storage-service && git pull --rebase
Once you're done, run:
$ docker-compose up -d --force-recreate --build
$ make bootstrap
$ make restart-am-services
Working with submodules can be a little confusing. GitHub's Working with submodules blog post is a good introduction.
- Archivematica Dashboard: http://127.0.0.1:62080/
- Archivematica Storage Service: http://127.0.0.1:62081/
The default credentials for the Archivematica Dashboard and the Storage Service are username: `test`, password: `test`.
The Dashboard and the Storage Service are both served by Gunicorn. We set up Gunicorn with the reload setting enabled, meaning that the Gunicorn workers will be restarted whenever the code changes.
Other components in the stack, like the `MCPServer`, don't offer this option and need to be restarted manually, e.g.:
$ docker-compose up -d --force-recreate --no-deps archivematica-mcp-server
If you've added new dependencies or changed the `Dockerfile`, you should also add the `--build` argument to the previous command to ensure that the container uses the newest image, e.g.:
$ docker-compose up -d --force-recreate --build --no-deps archivematica-mcp-server
In recent versions of Archivematica we've changed the logging configuration so that log events are sent to the standard streams. This is a common practice because it makes it much easier to aggregate the logs generated by all the replicas of our services that we may deploy across the cluster.
Docker Compose aggregates the logs for us so you can see everything from one place. Some examples:
docker-compose logs -f
docker-compose logs -f archivematica-storage-service
docker-compose logs -f nginx archivematica-dashboard
Docker keeps the logs in files using the JSON File logging driver. If you want to clear them, we provide a simple script that does it quickly, but it needs root privileges, e.g.:
$ sudo make flush-logs
With Docker Compose we can run as many containers as we want for a service. For example, by default we only provision a single replica of the `archivematica-mcp-client` service, but nothing stops you from running more:
$ docker-compose up -d --scale archivematica-mcp-client=3
We still have one service but three containers. Let's verify that the workers are connected to Gearman:
$ echo workers | socat - tcp:127.0.0.1:62004,shut-none | grep "_v0.0" | awk '{print $2}' - | sort -u
172.19.0.15
172.19.0.16
172.19.0.17
Service | Container port | Host port |
---|---|---|
mysql | tcp/3306 | tcp/62001 |
elasticsearch | tcp/9200 | tcp/62002 |
redis | tcp/6379 | tcp/62003 |
gearman | tcp/4730 | tcp/62004 |
fits | tcp/2113 | tcp/62005 |
clamavd | tcp/3310 | tcp/62006 |
nginx » archivematica-dashboard | tcp/80 | tcp/62080 |
nginx » archivematica-storage-service | tcp/8000 | tcp/62081 |
selenium-hub | tcp/4444 | tcp/62100 |
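The HTTP-speaking ports in the table can be probed quickly from the host. The `check` helper below is a hypothetical convenience for this sketch, not part of the repository:

```shell
# Tiny helper: print OK/DOWN depending on whether a URL answers over HTTP
check() { curl -sf --max-time 5 "$1" >/dev/null && echo "OK   $1" || echo "DOWN $1"; }

check http://127.0.0.1:62002/    # elasticsearch
check http://127.0.0.1:62080/    # dashboard (via nginx)
check http://127.0.0.1:62081/    # storage service (via nginx)
```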
The `Makefile` includes many useful targets for testing. List them all with:
$ make 2>&1 | grep test-
The sources of the acceptance tests have been made available inside Docker using volumes so you can edit them and the changes will apply immediately.
In many cases, as a tester or a developer, you want to restart all the containers at once and make sure the latest versions of the images are built. At the same time, you don't want to lose your data, like the search index or the database. If this is the case, run the following command:
$ docker-compose up -d --force-recreate --build
Additionally, you may want to delete all the data, including the contents of the external volumes:
$ make flush
Both snippets can be combined or used separately.
You may need to update the codebase, and for that you can run this command:
$ git submodule update --init --recursive
The most effective way to clean up is:
$ docker-compose down --volumes
This doesn't delete the external volumes described in the Installation section of this document. You have to delete those volumes manually with:
$ docker volume rm am-pipeline-data
$ docker volume rm ss-location-data
Optionally you may also want to delete the directories:
$ rm -rf $HOME/.am/am-pipeline-data $HOME/.am/ss-location-data
To use different settings on the MySQL container, please edit the `etc/mysql/tunning.conf` file and rebuild the container with:

$ docker-compose down
$ docker-compose up -d
Prometheus and Grafana can be used to monitor Archivematica processes.
To run them, reference the `docker-compose.instrumentation.yml` file:
$ docker-compose -f docker-compose.yml -f docker-compose.instrumentation.yml up -d
Prometheus will start on localhost:9090; Grafana on localhost:3000.
Extending the default environment, you can deploy an instance of Percona Monitoring and Management, configured by default to collect metrics and query analytics data from the `mysql` service. To set up the PMM server and client services alongside all the others, you'll need to specify two Docker Compose files:
$ docker-compose -f docker-compose.yml -f docker-compose.pmm.yml up -d
To access the PMM server interface, visit http://localhost:62007:
- Username: `pmm`
- Password: `pmm`
We're using Nginx as a proxy. Likely the underlying issue is that either the Dashboard or the Storage Service died. Run `docker-compose ps` to confirm it:
Name State
-------------------------------------------------
compose_archivematica-storage-service_1 Exit 3
Check the logs to see what happened, e.g.:
$ docker-compose logs -f archivematica-storage-service
ImportError: No module named storage_service.wsgi
[2017-10-26 19:28:24 +0000] [11] [INFO] Worker exiting (pid: 11)
[2017-10-26 19:28:24 +0000] [7] [INFO] Shutting down: Master
[2017-10-26 19:28:24 +0000] [7] [INFO] Reason: Worker failed to boot.
Now we know why: I had deleted the `wsgi` module. The worker crashed and Gunicorn gave up. This could happen, for example, when you're rebasing a branch and git is not atomically moving things around. Once it's fixed and you want to give it another shot, run `docker-compose up -d` to ensure that all the services are up again. Next, run `docker-compose ps` to verify that it's all up.
The MCPClient Docker image bundles a number of tools that are used by Archivematica. They're listed in the osdeps files, but the MCPClient.Dockerfile is not making use of them yet. If you need to introduce dependency changes in MCPClient, you will need to update both.
In #931 we started publishing an MCPClient base Docker image to speed up development workflows. This means that the dependencies listed in the osdeps files are now included in MCPClient-base.Dockerfile. Once the file is updated, you would publish the corresponding image as follows:
$ docker build -f MCPClient-base.Dockerfile -t artefactual/archivematica-mcp-client-base:20180219.01.52dc9959 .
$ docker push artefactual/archivematica-mcp-client-base:20180219.01.52dc9959
Where the tag `20180219.01.52dc9959` is a combination of the date, the build number for that day, and the commit that updates the Dockerfile.
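A sketch of how such a tag could be assembled; the build number and commit hash below are illustrative values, not real ones:

```shell
# Compose a tag of the form YYYYMMDD.<build>.<short commit>
DATE_PART="$(date +%Y%m%d)"
BUILD_NUM="01"        # first build of the day; bump for subsequent builds
COMMIT="52dc9959"     # short hash of the commit that updates the Dockerfile,
                      # e.g. obtained with: git rev-parse --short=8 HEAD
TAG="${DATE_PART}.${BUILD_NUM}.${COMMIT}"
echo "artefactual/archivematica-mcp-client-base:${TAG}"
```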
As a developer you may want to build the image and test it before publishing it, e.g.:
- Edit `MCPClient-base.Dockerfile`.
- Build the image with a new tag.
- Update the `FROM` instruction in `MCPClient.Dockerfile` to use it.
- Build the image of the `archivematica-mcp-client` service.
- Test it.
- Publish the image.
Our Docker named volumes are stored under `/tmp`, which means that it is possible they will be recycled at some point by the operating system. This frequently happens when you restart your machine.
Under this scenario, if you try to bring up the services again you will likely see one or more errors like the following:
ERROR: for compose_archivematica-mcp-server_1 Cannot create container for
service archivematica-mcp-server: error while mounting volume with options:
type='none' device='/home/user/.am/am-pipeline-data' o='bind': no such file
or directory
The solution is simple. You need to create the volumes again:
$ make create-volumes
And now you're ready to continue as usual:
$ docker-compose up -d --build
Optionally, you can define new persistent locations for the external volumes. The defaults are defined in the `Makefile`:
# Paths for Docker named volumes
AM_PIPELINE_DATA ?= $(HOME)/.am/am-pipeline-data
SS_LOCATION_DATA ?= $(HOME)/.am/ss-location-data
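Because the Makefile uses `?=`, both variables can be overridden from the environment. A minimal sketch, with hypothetical paths:

```shell
# Point the Makefile at custom host directories (hypothetical paths)
export AM_PIPELINE_DATA=/srv/am/am-pipeline-data
export SS_LOCATION_DATA=/srv/am/ss-location-data
# then: make create-volumes   (picks up the overrides because the Makefile uses ?=)
```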
Running tests with `make test-mcp-client` and the like can be very slow because the database is re-created on each run. When the tests are done, the database is removed unless you use `--reuse-db`. For example, you can use the following command to run the MCPClient tests:
docker-compose run --no-deps --user=root --workdir /src/MCPClient --rm --entrypoint=py.test archivematica-mcp-client --reuse-db --exitfirst
The difference is noticeable.
In the event that `make bootstrap` fails to run while installing, the Bootstrap components may need to be installed individually inside the application. This error is more likely if you are attempting to install Archivematica on a Mac.
First, go into the Makefile and comment out everything in the `bootstrap-dashboard-frontend` section of the script, then run `make bootstrap` again and continue with the install process.
To install frontend Bootstrap dependencies manually:
yarn --cwd src/archivematica/src/dashboard/frontend install
`npm` should also work fine.
If, after running the bootstrap process, `docker-compose ps` still shows that the dashboard and elasticsearch are down, then check the elasticsearch logs using:
$ docker-compose logs -f elasticsearch
You may see entries as follows:
[2019-01-25T16:36:43,535][INFO ][o.e.n.Node ] [am-node] starting ...
[2019-01-25T16:36:43,671][INFO ][o.e.t.TransportService ] [am-node] publish_address {172.18.0.7:9300}, bound_addresses {0.0.0.0:9300}
[2019-01-25T16:36:43,681][INFO ][o.e.b.BootstrapChecks ] [am-node] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2019-01-25T16:36:43,688][INFO ][o.e.n.Node ] [am-node] stopping ...
[2019-01-25T16:36:43,720][INFO ][o.e.n.Node ] [am-node] stopped
[2019-01-25T16:36:43,720][INFO ][o.e.n.Node ] [am-node] closing ...
[2019-01-25T16:36:43,730][INFO ][o.e.n.Node
This indicates that you may need to increase the virtual memory available to Elasticsearch, as discussed in the section Elasticsearch container above.
You've read this far but you haven't yet figured out why your development environment is not working? Here are some tips:
- Does your system meet the requirements? Some services like Elasticsearch or ClamAV need a lot of memory!
- Make sure that you've checked out the latest commit of this repository.
- Make sure that your repositories under `/src` (submodules) are up to date. If you are working off your own branches, make sure they are not outdated. Rebase often!
- This repo has dedicated branches to support released versions of Archivematica, e.g. stable/1.7.x is the recommended branch if you're doing work that targets a potential v1.7.x patch release.
- Look for open/closed issues that may relate to your problem! A few repos where you may find them: artefactual/archivematica, artefactual-labs/am and archivematica/issues.
- Get support.
In some cases the `pmm_client` service fails to start, reporting the following error:
[main] app already is running, exiting
You'll need to fully recreate the container to make it work:
docker-compose -f docker-compose.yml -f docker-compose.pmm.yml rm pmm_client
docker-compose -f docker-compose.yml -f docker-compose.pmm.yml up -d