diff --git a/README.md b/README.md
index 7a4abbfa..e42bddca 100644
--- a/README.md
+++ b/README.md
@@ -1,20 +1,25 @@
 # TRAINS Server
+
 ## Magic Version Control & Experiment Manager for AI

 ## Introduction

-The **trains-server** is the infrastructure behind [trains](https://github.com/allegroai/trains).
+The **trains-server** is the infrastructure for [trains](https://github.com/allegroai/trains).
+It allows multiple users to collaborate and manage their experiments.
+
+The **trains-server** contains the following components:
+
+* the Web-App, which is a single-page UI for experiment management and browsing
+* a REST interface for:
+    * documenting and logging experiment information, statistics and results
+    * querying experiment history, logs and results
+* a locally-hosted file server for storing images and models, making them easily accessible using the Web-App

-The server provides:
+You can quickly set up your **trains-server** using a pre-built Docker image (see [Installation](#installation)).

- * UI (single-page webapp) for experiment management and browsing
- * REST interface for documenting and logging experiment information, statistics and results
- * REST interface for querying experiments history, logs and results
- * Locally-hosted fileserver, for storing images and models to be easily accessible from the UI
+When new releases are available, you can upgrade your pre-built Docker image (see [Upgrade](#upgrade)).

-The server is designed to allow multiple users to collaborate and manage their experiments.
-The server’s code is freely available [here](https://github.com/allegroai/trains-server).
-We've also pre-built a docker image to allow **trains** users to quickly set up their own server.
+The **trains-server's** code is freely available [here](https://github.com/allegroai/trains-server).

 ## System diagram

@@ -57,57 +62,94 @@ We've also pre-built a docker image to allow **trains** users to quickly set up

 ## Installation

-In order to install and run the pre-built **trains-server**, you must be logged in as a user with sudo privileges.
+This section contains the instructions to set up and launch a pre-built Docker image for the **trains-server**.
+
+**Note**: This Docker image was tested with Linux only. For Windows users, we recommend running the server
+on a Linux virtual machine.

+### Prerequisites
+
+You must be logged in as a user with sudo privileges.
+
 ### Setup

-In order to run the pre-packaged **trains-server**, you'll need to install **docker**.
+#### Step 1. Install Docker CE
+
+You must install Docker to run the pre-packaged **trains-server**.

-#### Install docker
+* For [Ubuntu](https://docs.docker.com/install/linux/docker-ce/ubuntu/) / Mint (x86_64/amd64):

 ```bash
-sudo apt-get install docker
+sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
+curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
+. /etc/os-release
+sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $UBUNTU_CODENAME stable"
+sudo apt-get update
+sudo apt-get install -y docker-ce
 ```

-#### Setup docker daemon
-In order to run the ElasticSearch docker container, you'll need to change some of the default values in the Docker configuration file.
+* For other operating systems, see [Supported platforms](https://docs.docker.com/install//#support) in the Docker documentation for instructions.
+
+#### Step 2. Set up the Docker daemon
+
+To run the ElasticSearch Docker container, you must set up the Docker daemon by modifying the default
+values in your Docker configuration file to those required by Elastic
+and used by the **trains-server**. We provide instructions for the most common Docker configuration files.
+
+You must edit or create a Docker configuration file:
+
+* If your Docker configuration file is `/etc/sysconfig/docker`, edit it.

-For systems with an `/etc/sysconfig/docker` file, add the options in quotes to the available arguments in `OPTIONS`:
+    Add the options in quotes to the available arguments in the `OPTIONS` section:

 ```bash
 OPTIONS="--default-ulimit nofile=1024:65536 --default-ulimit memlock=-1:-1"
 ```

-For systems with an `/etc/docker/daemon.json` file, add the section in curly brackets to `default-ulimits`:
+* Otherwise, edit `/etc/docker/daemon.json` (if it exists) or create it (if it does not exist).
+
+    Add or modify the `default-ulimits` section as shown below. Be sure your configuration file contains the `nofile` and `memlock` sub-sections and values shown.
+
+    **Note**: Your configuration file may contain other sections. If so, confirm that the sections are separated by commas. For more information about Docker configuration files, see [Daemon configuration file](https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-configuration-file) in the Docker documentation.
+
+    The default values required by the **trains-server** are:

 ```json
-"default-ulimits": {
-    "nofile": {
+{
+    "default-ulimits": {
+        "nofile": {
             "name": "nofile",
             "hard": 65536,
             "soft": 1024
-    },
-    "memlock":
-    {
+        },
+        "memlock":
+        {
             "name": "memlock",
             "soft": -1,
             "hard": -1
+        }
     }
 }
 ```

-Following this configuration change, you will have to restart the docker daemon:
+#### Step 3. Restart the Docker daemon
+
+You must restart the Docker daemon after modifying the configuration file:

 ```bash
 sudo service docker stop
 sudo service docker start
 ```

-#### vm.max_map_count
+#### Step 4. Set the Maximum Number of Memory Map Areas

-The `vm.max_map_count` kernel setting must be at least 262144.
+The maximum number of memory map areas a process can use is defined
+using the `vm.max_map_count` kernel setting.

-The following example was tested with CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19:
+Elastic requires `vm.max_map_count` to be at least 262144.
+
+* For CentOS 7, Ubuntu 16.04, Mint 18.3, Ubuntu 18.04 and Mint 19 users, we tested the following commands to set
+`vm.max_map_count`:

 ```bash
 sudo echo "vm.max_map_count=262144" > /tmp/99-trains.conf
@@ -115,25 +157,23 @@ sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
 sudo sysctl -w vm.max_map_count=262144
 ```

-For additional information about setting this parameter on other systems, see the [elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation.
-
-#### Choose a data folder
+* For information about setting this parameter on other systems, see the [elastic](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode) documentation.

-You will need to choose a directory on your system in which all data maintained by **trains-server** will be stored (among others, this includes database, uploaded files and logs).
+#### Step 5. Choose a Data Directory

-The following instructions assume the directory is `/opt/trains`.
+You must choose a directory on your system in which all data maintained by the **trains-server** is stored,
+create that directory, and set its permissions. The data stored in that directory includes the database, uploaded files and logs.

-Issue the following commands:
+For example, if your data directory is `/opt/trains`, then use the following command:

 ```bash
 sudo mkdir -p /opt/trains/data/elastic && sudo chown -R 1000:1000 /opt/trains
 ```

-### Launching docker images
-
-
-To launch the docker images, issue the following commands:
+### Launching Docker Containers

+Launch the Docker containers. For example, if your data directory is `/opt/trains`,
+then use the following commands:

 ```bash
 sudo docker run -d --restart="always" --name="trains-elastic" -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -e "bootstrap.memory_lock=true" -e "cluster.name=trains" -e "discovery.zen.minimum_master_nodes=1" -e "node.name=trains" -e "script.inline=true" -e "script.update=true" -e "thread_pool.bulk.queue_size=2000" -e "thread_pool.search.queue_size=10000" -e "xpack.security.enabled=false" -e "xpack.monitoring.enabled=false" -e "cluster.routing.allocation.node_initial_primaries_recoveries=500" -e "node.ingest=true" -e "http.compression_level=7" -e "reindex.remote.whitelist=*.*" -e "script.painless.regex.enabled=true" --network="host" -v /opt/trains/data/elastic:/usr/share/elasticsearch/data docker.elastic.co/elasticsearch/elasticsearch:5.6.16
@@ -155,7 +195,7 @@ sudo docker run -d --restart="always" --name="trains-apiserver" --network="host"
 sudo docker run -d --restart="always" --name="trains-webserver" --network="host" -v /opt/trains/logs:/var/log/trains allegroai/trains:latest webserver
 ```

-Once the **trains-server** dockers are up, the following are available:
+After the **trains-server** Docker containers are up, the following are available:

 * API server on port `8008`
 * Web server on port `8080`
@@ -163,32 +203,37 @@ Once the **trains-server** dockers are up, the following are available:

 ## Upgrade

-We are constantly updating and adding stuff.
-When we release a new version, we’ll include a new pre-built docker image.
-Once a new release is out, you can simply:
-
-1. Shut down and remove your docker instances. Each instance can be shut down and removed using the following commands:
-   ```bash
-   sudo docker stop
-   sudo docker rm -v
-   ```
-   The docker names are (see [Launching docker images](#Launching-docker-images)):
-   * `trains-elastic`
-   * `trains-mongo`
-   * `trains-fileserver`
-   * `trains-apiserver`
-   * `trains-webserver`
-
-2. Back up your data folder (recommended!). A simple way to do that is using this command:
-   ```bash
-   sudo tar czvf ~/trains_backup.tgz /opt/trains/data
-   ```
-   Which will back up all data to an archive in your home folder. Restoring such a backup can be done using these commands:
-   ```bash
-   sudo rm -R /opt/trains/data
-   sudo tar -xzf ~/trains_backup.tgz -C /opt/trains/data
-   ```
-3. Launch the newly released docker image (see [Launching docker images](#Launching-docker-images))
+We are constantly updating, improving and adding to the **trains-server**.
+New releases will include new pre-built Docker images.
+When we release a new version and include a new pre-built Docker image for it, upgrade as follows:
+
+1. Shut down and remove each of your Docker instances using the following commands:
+
+        sudo docker stop
+        sudo docker rm -v
+
+    The Docker names are (see [Launching Docker Containers](#launching-docker-containers)):
+
+    * `trains-elastic`
+    * `trains-mongo`
+    * `trains-fileserver`
+    * `trains-apiserver`
+    * `trains-webserver`
+
+2. We highly recommend backing up your data directory! A simple way to do that is using `tar`:
+
+    For example, if your data directory is `/opt/trains`, use the following command:
+
+        sudo tar czvf ~/trains_backup.tgz /opt/trains/data
+
+    This backs up all data to an archive in your home directory.
+
+    To restore this example backup, use the following commands (the archive stores paths relative to `/`, so extract there):
+
+        sudo rm -R /opt/trains/data
+        sudo tar -xzf ~/trains_backup.tgz -C /
+
+3. Launch the newly released Docker image (see [Launching Docker Containers](#launching-docker-containers)).

 ## License

@@ -196,6 +241,6 @@ Once a new release is out, you can simply:

 **trains-server** relies *heavily* on both [MongoDB](https://github.com/mongodb/mongo) and [ElasticSearch](https://github.com/elastic/elasticsearch).
 With the recent changes in both MongoDB's and ElasticSearch's OSS license, we feel it is our job as a community to support the projects we love and cherish.
-We feel the cause for the license change in both cases is more than just, and chose [SSPL](https://www.mongodb.com/licensing/server-side-public-license) because it is the more restrictive of the two.
+We feel the cause for the license change in both cases is more than just, and chose [SSPL](https://www.mongodb.com/licensing/server-side-public-license) because it is the more general and flexible of the two.

 This is our way to say - we support you guys!
diff --git a/server/config/basic.py b/server/config/basic.py
index c6cb2e55..b9fab322 100644
--- a/server/config/basic.py
+++ b/server/config/basic.py
@@ -57,7 +57,7 @@ def _read_recursive(self, conf_root, verbose=True):
             return conf

         if verbose:
-            print("Loading config from {conf_root}")
+            print(f"Loading config from {conf_root}")

         for file in conf_root.rglob("*.conf"):
             key = ".".join(file.relative_to(conf_root).with_suffix("").parts)
diff --git a/webserver/config/basic.py b/webserver/config/basic.py
index c6cb2e55..b9fab322 100644
--- a/webserver/config/basic.py
+++ b/webserver/config/basic.py
@@ -57,7 +57,7 @@ def _read_recursive(self, conf_root, verbose=True):
             return conf

         if verbose:
-            print("Loading config from {conf_root}")
+            print(f"Loading config from {conf_root}")

         for file in conf_root.rglob("*.conf"):
             key = ".".join(file.relative_to(conf_root).with_suffix("").parts)
diff --git a/webserver/config/default/webserver.conf b/webserver/config/default/webserver.conf
index 86a7b963..1c67c9fe 100644
--- a/webserver/config/default/webserver.conf
+++ b/webserver/config/default/webserver.conf
@@ -1,49 +1,47 @@
-{
-    # requested token expiration in seconds (one month)
-    apiserver_token_expiration: 2592000
+# requested token expiration in seconds (one month)
+apiserver_token_expiration: 2592000

-    debug: false
+debug: false

-    flask {
-        # Uncomment next line to disable login requirement while testing (or unit-testing)
-        TESTING: False
+flask {
+    # Uncomment next line to disable login requirement while testing (or unit-testing)
+    TESTING: False

-        # Uncomment to allow reloading of templates if the caches version differs from the latest version
-        TEMPLATES_AUTO_RELOAD: True
+    # Uncomment to allow reloading of templates if the cached version differs from the latest version
+    TEMPLATES_AUTO_RELOAD: True

-        # Flask-Login session protection ('basic', 'strong' or null)
-        SESSION_PROTECTION: basic
+    # Flask-Login session protection ('basic', 'strong' or null)
+    SESSION_PROTECTION: basic

-        SESSION_COOKIE_HTTPONLY: True
-        REMEMBER_COOKIE_HTTPONLY: True
-        SESSION_COOKIE_SECURE: False
-        REMEMBER_COOKIE_SECURE: False
-    }
+    SESSION_COOKIE_HTTPONLY: True
+    REMEMBER_COOKIE_HTTPONLY: True
+    SESSION_COOKIE_SECURE: False
+    REMEMBER_COOKIE_SECURE: False
+}
+
+listen {
+    ip : "0.0.0.0"
+    port: 8080
+}

-    listen {
-        ip : "0.0.0.0"
-        port: 8080
+auth {
+    cookies {
+        httponly: true  # allow only http to access the cookies (no JS etc)
+        secure: false   # not using HTTPS
+        domain: null    # Limit to localhost is not supported
     }

-    auth {
-        cookies {
-            httponly: true  # allow only http to access the cookies (no JS etc)
-            secure: false   # not using HTTPS
-            domain: null    # Limit to localhost is not supported
-        }
+    session_auth_cookie_name: "trains_token_basic"

-        session_auth_cookie_name: "trains_token_basic"
+    user_token_expiration_sec: 3600
+}

-        user_token_expiration_sec: 3600
-    }
-
-    docs {
-        # Default filename used when file not found error is reported when serving docs.
-        # This usually happans when the path is to a folder and not a file.
-        default_filename: "index.html"
-    }
+docs {
+    # Default filename used when file not found error is reported when serving docs.
+    # This usually happens when the path is to a folder and not a file.
+    default_filename: "index.html"
+}

-    default_company: "d1bd92a3b039400cbafc60a7a5b1e52b"
+default_company: "d1bd92a3b039400cbafc60a7a5b1e52b"

-    redirect_to_https: false
-}
\ No newline at end of file
+redirect_to_https: false
diff --git a/webserver/webserver.py b/webserver/webserver.py
index a92d3b51..22198a8a 100644
--- a/webserver/webserver.py
+++ b/webserver/webserver.py
@@ -210,6 +210,11 @@ def _serve_webapp(path=None):
     return response


+@app.route("/favicon.ico")
+def favicon():
+    return send_from_directory("static", "favicon.ico")
+
+
 @app.route("/")
 def index():
     if not current_user.is_authenticated:
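
Review note on the two `config/basic.py` hunks: the sketch below illustrates why the `f` prefix matters and how the unchanged `key = ...` context line appears to map a `.conf` path to a dotted key. It is a minimal illustration only, not the project's code. The helper names `conf_key` and `loading_message` and the root path `/opt/trains/config` are hypothetical, and `PurePosixPath` stands in for the real `Path` objects so that nothing touches the filesystem.

```python
from pathlib import PurePosixPath


def conf_key(conf_root: PurePosixPath, file: PurePosixPath) -> str:
    """Same expression as the context line in the hunk: take the path
    relative to the root, drop the extension, join the parts with dots."""
    return ".".join(file.relative_to(conf_root).with_suffix("").parts)


def loading_message(conf_root: PurePosixPath) -> str:
    # The f prefix is what the patch adds; without it the text
    # "{conf_root}" would be printed literally instead of the path.
    return f"Loading config from {conf_root}"


# Hypothetical root and file names, chosen only for illustration.
root = PurePosixPath("/opt/trains/config")
print(loading_message(root))                                # Loading config from /opt/trains/config
print(conf_key(root, root / "default" / "webserver.conf"))  # default.webserver
print(conf_key(root, root / "logging.conf"))                # logging
```

Under these assumptions, a file nested one directory below the root yields a two-part key, which matches the layout of `webserver/config/default/webserver.conf` touched by this change.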