JADE (JAnelia Data Environment) is a distributed storage system that manages a set of configured volumes, each either local to the machine or shared (NFS-mounted from a remote host). Shared volumes are typically read-only, with the exception of the special OVERFLOW_VOLUME, which is selected when a node runs out of available space on its local volumes. The system consists of a master node (service) that manages resource allocation and one or more workers, or agents, that are responsible for reading data from and writing data to the managed volumes.
Running the integration tests requires access to our development MongoDB instance on dev-mongodb. Inside the Janelia network you have access automatically; otherwise you may need to connect through the VPN. Alternatively, you can use any other MongoDB instance and write a Java properties file that overrides the MongoDB settings. Then point the environment variables JACSSTORAGE_CONFIG (for production) and JACSSTORAGE_CONFIG_TEST (for the build) to that file, e.g.:
mkdir local
cat <<EOF > local/myConfig.properties
MongoDB.ConnectionURL=mongodb://myusername:mypassword@localhost:27017/?authSource=db1
EOF
export JACSSTORAGE_CONFIG=$PWD/local/myConfig.properties
export JACSSTORAGE_CONFIG_TEST=$PWD/local/myConfig.properties
Note that if you only want to compile and create the distribution, you don't need the MongoDB setup: the integration tests run only as part of the build task, whereas installDist only compiles and creates the zip and tar distributions.
To install MongoDB on macOS:
With Homebrew:
brew install mongodb
With MacPorts:
sudo port install mongodb
On CentOS-based Linux distributions (CentOS, Scientific Linux) you can use:
yum install mongodb-org-server
On Debian-based Linux distributions (Debian, Ubuntu) you can use:
sudo apt-get install mongodb-org
Once MongoDB is installed on your machine, nothing else is required: the tests or the application will create the needed databases and collections, as long as the configured MongoDB user has privileges to do so.
To fully build the application, which includes running all unit and integration tests, and create the distribution, simply run:
./gradlew clean build installDist
To only compile the application and create the distribution run (this will not run any tests and therefore it will not require any Mongo database setup):
./gradlew clean installDist
To run only the integration tests:
./gradlew integrationTest
If you want to use a test database other than the development MongoDB instance, create a configuration file, as explained above, that overrides the database connection settings, and then point the JACSSTORAGE_CONFIG_TEST environment variable to it, e.g.:
JACSSTORAGE_CONFIG_TEST=/my/prefered/location/for/dev/my-config-test.properties ./gradlew clean build installDist
Note:
When using the environment variable to reference the configuration use the full path in order to guarantee that the right properties are being used.
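A small pre-flight check can catch a relative or missing configuration path before Gradle starts. The sketch below is only an illustration in Python; the helper name `check_config_path` is hypothetical and not part of JADE, and only the environment variable name comes from this README.

```python
import os


def check_config_path(path):
    """Return a list of problems found with a configuration file path."""
    problems = []
    if not path:
        problems.append("path is empty")
        return problems
    if not os.path.isabs(path):
        problems.append("path is not absolute: " + path)
    if not os.path.isfile(path):
        problems.append("file does not exist: " + path)
    return problems


if __name__ == "__main__":
    # JACSSTORAGE_CONFIG_TEST is the variable named in this README.
    configured = os.environ.get("JACSSTORAGE_CONFIG_TEST", "")
    for problem in check_config_path(configured):
        print("warning:", problem)
```

An empty list means the path is absolute and points to an existing file; it says nothing about whether the properties inside are valid.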
To generate an RPM package, create a gradle.properties file and run the ospackage task:
cat > local/gradle.properties <<EOF
jacs.runtime.env.apiKey=JacsStorageAuthorizedAPI.Dev
jacs.runtime.env.jwtSecret=<put the secret key here>
jacs.runtime.env.agentHttpPort=9881
jacs.runtime.env.masterHttpPort=9880
jacs.runtime.env.logsRootDir=/data/jacsstorage/prod-logs
EOF
./gradlew --gradle-user-home=local ospackage
Then on CentOS use yum to install the generated packages:
sudo yum install jacsstorage-masterweb/build/distributions/jacsstorage-masterweb-${jadeVersion}-1.i386.rpm
sudo yum install jacsstorage-agentweb/build/distributions/jacsstorage-agentweb-${jadeVersion}-1.i386.rpm
where jadeVersion is the version from the main build.gradle file.
Note that the 'ospackage' task, just like 'installDist', does not run any unit or integration tests, so you don't need access to any MongoDB instance.
docker build jacsstorage-masterweb --build-arg SSH_PRIVATE_KEY="`cat ~/.ssh/id_rsa`" -t jacsstorage-masterweb
docker build jacsstorage-agentweb --build-arg SSH_PRIVATE_KEY="`cat ~/.ssh/id_rsa`" -t jacsstorage-agentweb
docker-compose build --build-arg SSH_PRIVATE_KEY="`cat ~/.ssh/id_rsa`"
The above command builds the master branch by default. To containerize a different branch (such as dev in the example below), use:
docker-compose build --build-arg SSH_PRIVATE_KEY="`cat ~/.ssh/id_rsa`" --build-arg BUILD_TAG=dev
To run the async services with the default settings, which assume a MongoDB instance running on the same machine as the web server:
jacsstorage-web/build/install/jacsstorage-web/bin/jacsstorage-web
If you want to debug the application you can start the application with the debug agent as below:
JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005" jacsstorage-web/build/install/jacsstorage-web/bin/jacsstorage-web
The default production settings can be overridden with your own settings in a Java properties file, just like the test settings; the only difference is the name of the environment variable. For production settings use the JACSSTORAGE_CONFIG environment variable:
JACSSTORAGE_CONFIG=/usr/local/etc/myjacsstorage-config.properties jacsstorage-web/build/install/jacsstorage-web/bin/jacsstorage-web
If the master and the agent are installed as system services:
sudo systemctl daemon-reload
sudo systemctl start jacsstorage-masterweb
sudo systemctl start jacsstorage-agentweb
The storage service persists data in "data bundles". A data bundle is a group of files that are all persisted on the same data node under the same GUID based on the locality of reference, i.e., data files that often need to be accessed together by some other service or application. These could be for example data files associated with a particular sample, or they can be data files that resulted from a certain processing pipeline. It is up to the user of the storage service to group the files that need to be persisted "together". The storage service may also associate certain properties with the data bundle that could be used later for searching the persisted bundles. The data files that are part of a bundle can also be organized in a directory hierarchy and the user can control whether these data files should be persisted in an expanded directory structure or in a TAR archive.
Most storage service invocations require authenticated access. Authentication is verified using a JSON Web Token (JWT) passed with every request in the 'Authorization' header as a bearer token. This is very similar to the SCP command, where the user is prompted for a username and password on each invocation.
One can obtain a JWT from the authorization service:
development environment: 'https://jacs-dev.int.janelia.org/SCSW/AuthenticationService/v1/authenticate'
or
production environment: 'http://api.int.janelia.org:8030/authenticate'
cat > local/auth.sh <<'EOF'
#!/bin/sh
# Request a JWT from the authentication service.
# Usage: sh auth.sh <username> <password>
AUTH_ENDPOINT="https://jacs-dev.int.janelia.org/SCSW/AuthenticationService/v1/authenticate"
username=$1
password=$2
curl -X POST "${AUTH_ENDPOINT}" \
-H "accept: application/json" \
-H "content-type: application/json" \
-d "{ \"username\": \"${username}\", \"password\": \"${password}\"}"
EOF
sh local/auth.sh myusername mypassword
The above call returns a JSON blob like the one below. The value of the 'token' attribute must be passed as a bearer token in the 'Authorization' header of every invocation that requires authentication: 'Authorization: Bearer <token>'.
{"token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTg2MjY5NDAsInVzZXJfbmFtZSI6ImphY3MifQ.Yoku2rQfRn4GzoLYCfFc4Sag0jjrnYI_-A5W1W4I-o4","user_name":"jacs"}
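The payload of the token can be inspected locally, for example to check when it expires before reusing it. The following Python sketch decodes the payload without verifying the signature, so it is useful for inspection only. The function names are hypothetical; the 'exp' and 'user_name' claims are the ones carried by the sample token above.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Decode the (unverified) payload segment of a JWT into a dict."""
    _header, payload, _signature = token.split(".")
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(token, now=None):
    """True if the token's 'exp' claim (seconds since epoch) is in the past."""
    now = time.time() if now is None else now
    return decode_jwt_payload(token)["exp"] <= now


def bearer_header(token):
    """Build the Authorization header expected by the storage service."""
    return {"Authorization": "Bearer " + token}


if __name__ == "__main__":
    sample = ("eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9."
              "eyJleHAiOjE1MTg2MjY5NDAsInVzZXJfbmFtZSI6ImphY3MifQ."
              "Yoku2rQfRn4GzoLYCfFc4Sag0jjrnYI_-A5W1W4I-o4")
    print(decode_jwt_payload(sample))  # {'exp': 1518626940, 'user_name': 'jacs'}
    print(is_expired(sample))
```

If `is_expired` returns True, request a new token from the authentication service before calling the storage API.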
To set up the storage volumes managed by an agent, the configuration must have a property StorageAgent.BootstrappedVolumes that contains a comma-delimited list of volumes to be bootstrapped. For each volume in the list there must be a set of properties that define the corresponding root path, whether the volume is local to the host or shared, and the volume tags. The format of these properties is:
StorageVolume.<volumeName>.<volumeProperty>
The currently supported properties are:
RootDir - defines the volume's root directory
Shared - specifies whether this volume is on a shared mount point
PathPrefix - virtual root directory
Tags - list of features or labels attached to the volume
Once the configuration is prepared for bootstrapping, you only need to start the agent with the -bootstrapStorageVolumes flag.
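To make the property naming scheme concrete, here is a minimal Python sketch that renders the bootstrap properties for a set of volumes. The volume name, paths, and value formats (for example 'false' for Shared and a comma-separated Tags value) are illustrative assumptions; only the property names and the StorageVolume.<volumeName>.<volumeProperty> pattern come from this README.

```python
def render_volume_properties(volumes):
    """Render bootstrap properties for a dict of {volume_name: {property: value}}."""
    # Joining a dict iterates over its keys, in insertion order.
    lines = ["StorageAgent.BootstrappedVolumes=" + ",".join(volumes)]
    for name, props in volumes.items():
        for key, value in props.items():
            lines.append("StorageVolume.{}.{}={}".format(name, key, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical volume definition; adjust names and values to your setup.
    print(render_volume_properties({
        "jade1": {
            "RootDir": "/data/jade1",
            "Shared": "false",
            "PathPrefix": "/jade1",
            "Tags": "local,jade",
        },
    }), end="")
```

The output can be appended to the properties file referenced by JACSSTORAGE_CONFIG before starting the agent.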
The method documentation is available here
Allocating a data bundle is similar to creating a subdirectory in the user's home directory:
mkdir /users/home/myusername/aWorkingSubdirForProject1
The equivalent storage service curl invocation is:
curl -i -X POST 'http://localhost:8880/jacsstorage/master_api/v1/storage' \
-H 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTgyMDE0MjUsInVzZXJfbmFtZSI6ImphY3MifQ.El8GcDhswj-mNmBK2uMaAXHqBPDN_AGgNm_oyU3McQs' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d "{ \"name\": \"aWorkingSubdirForProject1\", \"storageFormat\": \"DATA_DIRECTORY\"}"
The method returns a JSON block that contains information about the newly created data bundle:
{
id=2503313663696306217,
name=workspace1,
ownerKey=user:jacs,
path=306/217/2503313663696306217,
readersKeys=[],
writersKeys=[],
storageRootPrefixDir=/localhost/d1,
storageRootRealDir=/var/tmp/d1,
storageHost=localhost,
storageTags=[jade, d1],
connectionURL=http://localhost:8881/jacsstorage/agent_api/v1,
storageFormat=DATA_DIRECTORY,
requestedSpaceInBytes=null,
checksum=null,
metadata={}}
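The same allocation can be issued from a script. The Python sketch below only builds the request, mirroring the curl invocation above, so it can be inspected before anything is sent. The function name is hypothetical, and the master URL and token must be replaced with your own values.

```python
import json
import urllib.request


def build_allocate_request(master_url, token, name, storage_format="DATA_DIRECTORY"):
    """Build (but do not send) the POST request that allocates a data bundle."""
    body = json.dumps({"name": name, "storageFormat": storage_format}).encode("utf-8")
    return urllib.request.Request(
        master_url.rstrip("/") + "/storage",
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_allocate_request(
        "http://localhost:8880/jacsstorage/master_api/v1",
        "<token>",  # placeholder, use the value returned by the authentication service
        "aWorkingSubdirForProject1",
    )
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` sends it; the 'connectionURL' field of the response is the base URL needed for the agent calls.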
The method documentation is available here
curl -i 'http://localhost:8880/jacsstorage/master_api/v1/storage/2503313663696306217' \
-H 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTgyMDE0MjUsInVzZXJfbmFtZSI6ImphY3MifQ.El8GcDhswj-mNmBK2uMaAXHqBPDN_AGgNm_oyU3McQs' \
-H 'accept: application/json'
The method returns a JSON block identical to the one returned by the allocate operation.
The method documentation is available here
curl -i 'http://localhost:8880/jacsstorage/master_api/v1/storage?name=workspace1' \
-H 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTgyMDE0MjUsInVzZXJfbmFtZSI6ImphY3MifQ.El8GcDhswj-mNmBK2uMaAXHqBPDN_AGgNm_oyU3McQs' \
-H 'accept: application/json'
If the user does not have admin privileges, the search automatically uses the current user as the owner and only returns entries created by that user.
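When these lookups are scripted, it helps to build the URLs with proper escaping. The sketch below, in Python, covers the get-info and search calls shown above. The function names are hypothetical, and 'name' is the only search parameter this README confirms; any other criteria passed to the search helper are assumptions.

```python
from urllib.parse import quote, urlencode


def bundle_info_url(master_url, bundle_id):
    """URL that returns the information for a single data bundle."""
    return "{}/storage/{}".format(master_url.rstrip("/"), quote(str(bundle_id)))


def bundle_search_url(master_url, **criteria):
    """URL that searches data bundles; criteria become query parameters."""
    return "{}/storage?{}".format(master_url.rstrip("/"), urlencode(criteria))


if __name__ == "__main__":
    master = "http://localhost:8880/jacsstorage/master_api/v1"
    print(bundle_info_url(master, 2503313663696306217))
    print(bundle_search_url(master, name="workspace1"))
```

Both URLs still need the 'Authorization: Bearer' header when requested.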
The method documentation is available here
This command creates a subdirectory in the data bundle's workspace. The corresponding shell commands change directory to the workspace and then create a subdirectory in the current directory, i.e.,
cd /users/home/myusername/aWorkingSubdirForProject1
mkdir myDir1
The equivalent storage service curl invocation is shown below. The command must use the base URL returned in the 'connectionURL' field of the allocate result or of the get storage info result:
curl -X POST "http://localhost:8881/jacsstorage/agent_api/v1/agent_storage/2501203311319875608/directory/myDir1" \
-H "accept: application/json" \
-H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTgyMDE0MjUsInVzZXJfbmFtZSI6ImphY3MifQ.El8GcDhswj-mNmBK2uMaAXHqBPDN_AGgNm_oyU3McQs" \
-H "Content-Type: application/octet-stream"
The directory entry can be a path hierarchy, but all parent directories must already exist in the storage bundle. For example, if the command is:
curl -X POST "http://localhost:8881/jacsstorage/agent_api/v1/agent_storage/2501203311319875608/directory/myDir1/myDir1.1/myDir1.1.3" \
-H "accept: application/json" \
-H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTgyMDE0MjUsInVzZXJfbmFtZSI6ImphY3MifQ.El8GcDhswj-mNmBK2uMaAXHqBPDN_AGgNm_oyU3McQs" \
-H "Content-Type: application/octet-stream"
then the entries 'myDir1' and 'myDir1/myDir1.1' must already exist in the bundle '2501203311319875608' and must be directory entries.
The method returns a JSON block for the new entry as well as the access URL in the 'location' response header.
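Because parents must exist before their children, a nested path has to be created one level at a time, parents first. The Python sketch below computes the sequence of POST URLs for a nested directory entry. The function name is hypothetical, and this README does not state how the service responds when a directory already exists, so a caller should be prepared to handle that case.

```python
from urllib.parse import quote


def directory_creation_urls(connection_url, bundle_id, entry_path):
    """Return the POST URLs for entry_path and all its parents, parents first."""
    base = "{}/agent_storage/{}/directory/".format(connection_url.rstrip("/"), bundle_id)
    # Drop empty segments caused by leading, trailing, or doubled slashes.
    parts = [part for part in entry_path.split("/") if part]
    urls = []
    for depth in range(1, len(parts) + 1):
        urls.append(base + "/".join(quote(part) for part in parts[:depth]))
    return urls


if __name__ == "__main__":
    for url in directory_creation_urls(
            "http://localhost:8881/jacsstorage/agent_api/v1",
            2501203311319875608,
            "myDir1/myDir1.1/myDir1.1.3"):
        print(url)
```

Each URL is then POSTed in order with the same headers as the curl example above.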
The method documentation is available here
curl -X POST "http://localhost:8881/jacsstorage/agent_api/v1/agent_storage/2501203311319875608/file/myDir1/myFile1.1" \
-H "accept: application/json" \
-H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTgyMDE0MjUsInVzZXJfbmFtZSI6ImphY3MifQ.El8GcDhswj-mNmBK2uMaAXHqBPDN_AGgNm_oyU3McQs" \
-H "Content-Type: application/octet-stream" \
--data-binary @"aLocalFile"
As with the 'create directory' call, the base URL must be the actual storage URL returned in the 'connectionURL' field by the allocate or get info methods. If the file entry name denotes a hierarchical structure, all its parents must exist and be directory entries.
The method returns a JSON block for the new entry as well as the access URL in the 'location' response header.
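A scripted upload follows the same pattern as the curl call. The Python sketch below builds the request for a file entry from raw bytes without sending it. The function name is hypothetical; the URL layout and headers mirror the curl example above.

```python
import urllib.request
from urllib.parse import quote


def build_upload_request(connection_url, bundle_id, entry_path, content, token):
    """Build (but do not send) the POST request that stores bytes as a file entry."""
    url = "{}/agent_storage/{}/file/{}".format(
        connection_url.rstrip("/"), bundle_id, quote(entry_path.strip("/")))
    return urllib.request.Request(
        url,
        data=content,  # raw bytes, passed through unmodified
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Accept": "application/json",
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    request = build_upload_request(
        "http://localhost:8881/jacsstorage/agent_api/v1",
        2501203311319875608,
        "myDir1/myFile1.1",
        b"example content",
        "<token>",  # placeholder
    )
    print(request.get_method(), request.full_url)
```

Read the local file in binary mode (`open(path, "rb")`) so that the stored bytes match the source exactly, then send the request with `urllib.request.urlopen`.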
The method documentation is available here
The shell equivalent commands would be:
cd /users/home/myusername/aWorkingSubdirForProject1
ls -l
The curl command must use the actual storage URL returned in the 'connectionURL' field.
curl "http://localhost:8881/jacsstorage/agent_api/v1/agent_storage/2501203311319875608/list?entry=myDir1" \
-H "accept: application/json" \
-H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTgyMDE0MjUsInVzZXJfbmFtZSI6ImphY3MifQ.El8GcDhswj-mNmBK2uMaAXHqBPDN_AGgNm_oyU3McQs"
The method documentation is available here
The curl command must use the actual storage URL returned in the 'connectionURL' field.
curl "http://localhost:8881/jacsstorage/agent_api/v1/agent_storage/2501203311319875608/data_content/myDir1/myDir1.1" \
-H "accept: application/json" \
-H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTgyMDE0MjUsInVzZXJfbmFtZSI6ImphY3MifQ.El8GcDhswj-mNmBK2uMaAXHqBPDN_AGgNm_oyU3McQs"
If the entry name (the path after 'data_content') denotes a folder, the method returns all the sub-entries of the specified entry packaged in a tar archive.
If no entry is specified, the method returns the content of the entire bundle as a tar archive.
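When the response is a tar archive, it can be inspected in memory without writing it to disk first. This Python sketch assumes the response body has already been downloaded into a bytes object; the helper names are hypothetical, and the mode "r:*" lets the standard tarfile module detect compression on its own, since this README does not say whether the archive is compressed.

```python
import io
import tarfile


def list_tar_entries(data):
    """Return the member names of a tar archive held in memory."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        return archive.getnames()


def read_tar_member(data, name):
    """Return the bytes of one regular file in the archive, or None for non-files."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        member = archive.extractfile(name)  # raises KeyError if name is absent
        return None if member is None else member.read()
```

For example, `list_tar_entries(body)` shows which sub-entries came back for a folder request, and `read_tar_member(body, name)` pulls out a single file.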
The storage service allows users to download content that resides on shared storage such as dm11, nrs, or nearline. The retrieval can be done by going directly to an agent node and using its "storage_content/storage_path" endpoint, or by using a more reliable two-step mechanism: first ask the master for an agent that can serve the content, and then use that agent's "storage_content/storage_path" endpoint to retrieve the content. For example, to retrieve the data file '/nrs/jacs/jacsData/flylight/pipelineResult/data1.png' one can use the following sequence:
curl -i -X PROPFIND http://localhost:8880/jacsstorage/master_api/v1/webdav/data_storage_path/nrs/jacs/jacsData/flylight/pipelineResult/data1.png \
-H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTgyMDE0MjUsInVzZXJfbmFtZSI6ImphY3MifQ.El8GcDhswj-mNmBK2uMaAXHqBPDN_AGgNm_oyU3McQs" \
-H "accept: application/xml"
Then get the agent URL from the response's HREF field and (assuming the HREF field is http://localhost:8881/jacsstorage/agent_api/v1/agent_storage) use:
curl http://localhost:8881/jacsstorage/agent_api/v1/agent_storage/storage_content/storage_path/nrs/jacs/jacsData/flylight/pipelineResult/data1.png \
-H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJleHAiOjE1MTgyMDE0MjUsInVzZXJfbmFtZSI6ImphY3MifQ.El8GcDhswj-mNmBK2uMaAXHqBPDN_AGgNm_oyU3McQs" \
-H "accept: application/octet-stream"
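The two steps can be chained in a script by extracting the HREF value from the PROPFIND response and appending the data path. The Python sketch below assumes the response is a WebDAV-style XML document containing one or more 'href' elements; the exact response layout is not documented in this README, so the sample XML in the usage section is hypothetical, as are the function names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def hrefs_from_propfind(xml_text):
    """Collect the text of every 'href' element, ignoring XML namespaces."""
    hrefs = []
    for element in ET.fromstring(xml_text).iter():
        # Namespaced tags look like '{DAV:}href'; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name.lower() == "href" and element.text:
            hrefs.append(element.text.strip())
    return hrefs


def content_url(agent_storage_url, data_path):
    """Build the agent URL that serves the content stored at data_path."""
    return "{}/storage_content/storage_path/{}".format(
        agent_storage_url.rstrip("/"), quote(data_path.lstrip("/")))


if __name__ == "__main__":
    # Hypothetical response body, shaped like a WebDAV multistatus document.
    sample = (
        '<D:multistatus xmlns:D="DAV:"><D:response>'
        "<D:href>http://localhost:8881/jacsstorage/agent_api/v1/agent_storage</D:href>"
        "</D:response></D:multistatus>"
    )
    agent = hrefs_from_propfind(sample)[0]
    print(content_url(agent, "/nrs/jacs/jacsData/flylight/pipelineResult/data1.png"))
```

The resulting URL is requested with the same 'Authorization' header and 'accept: application/octet-stream' as in the curl example.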
See the jacsstorage-clients module for more information about the CLI.
See the jacsstorage-clients module for more information about the Java API.
For Java examples please take a look at the 'examples/java' directory.
To access the storage service from Python, please take a look at the 'examples/python' directory.