Splatoon 3 is the latest entry in Nintendo's Splatoon series. It has garnered a massive following, including a competitive scene. The purpose of this project is to provide a machine learning solution that predicts match winners from past battle data. Such a tool could be used in tournaments to predict the winners of games being played in real time.
- Google Cloud to upload data to a Google Cloud Storage bucket and use BigQuery as our data warehouse. We also set up a VM environment to host our Prefect deployment.
- Terraform to manage our infrastructure as code.
- Prefect will be used to orchestrate and monitor our pipeline.
- Pandas to import and transform our dataset.
- stat.ink to access Splatoon 3 battle data. You can learn more about the columns of the dataset here.
- Weights and Biases to track model performance and keep a record of the models and datasets used.
- Evidently to monitor dataset drift.
- Postgres to save dataset drift metrics
- Grafana to monitor dataset drift
- Docker to containerize deployed model and monitoring architecture
- Docker Compose for managing multiple docker containers used in this project
- Pre-Commit Hooks to identify simple issues in code before review
- Github Actions to run tests and catch issues before merging into the main branch. Once merged into main, the deployment process is initiated.
- Terraform is used to set up the environment that runs our pipeline. When run, the script creates our BigQuery dataset and bucket, deploys a Docker-containerized production model to Google Cloud Run, and provisions the VM that runs our Prefect deployment.
- A Prefect agent runs on our VM compute environment and picks up any pending deployments. The pipeline is meant to run every N months. It first extracts Splatoon 3 battle data from stat.ink, adds the raw data to a GCS bucket, cleans the data and performs feature engineering and extraction, and then loads the resulting data into a BigQuery dataset. The data from BigQuery is then used to train different models until an optimal one is selected, which is registered in Weights and Biases and saved to a production folder in the bucket. A reference dataset is queried from BigQuery and compared against the training dataset to check whether data drift exists. This comparison is calculated by Evidently, which triggers a notification if drift occurs (a sketch of the flow structure follows this list).
- Evidently records any drift metrics to a Postgres database. This database is queried by a Grafana dashboard to monitor drift. This monitoring infrastructure is orchestrated with Docker Compose.
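For orientation, here is a minimal sketch of what the orchestrating Prefect flow could look like, assuming Prefect 2.x. The task names and bodies are illustrative stubs rather than the repository's actual code; the real flow lives in `flows/main_flow.py` and takes the parameters shown later in this README.

```python
from prefect import flow, task

# Illustrative stubs only; the real implementation lives in flows/main_flow.py.

@task
def extract_battle_data(num_months: int):
    """Download raw Splatoon 3 battle data from stat.ink."""

@task
def load_raw_to_gcs(raw, data_path: str):
    """Archive the raw files in the GCS bucket."""

@task
def transform(raw):
    """Clean the data and perform feature engineering/extraction."""

@task
def load_to_bigquery(features, gcp_project_id: str, bigquery_dataset: str, bigquery_table: str):
    """Write the prepared features to a BigQuery table."""

@task
def train_and_register(features, wandb_project: str, wandb_entity: str):
    """Train candidate models, keep the best one, and register it in Weights & Biases."""

@task
def check_drift(features, gcp_project_id: str, bigquery_dataset: str, bigquery_table: str):
    """Query a reference dataset from BigQuery, compare it to the training data
    with Evidently, and send an email alert if drift is detected."""

@flow
def run_pipeline(data_path: str, wandb_project: str, wandb_entity: str,
                 num_months: int, gcp_project_id: str,
                 bigquery_dataset: str, bigquery_table: str):
    raw = extract_battle_data(num_months)
    load_raw_to_gcs(raw, data_path)
    features = transform(raw)
    load_to_bigquery(features, gcp_project_id, bigquery_dataset, bigquery_table)
    train_and_register(features, wandb_project, wandb_entity)
    check_drift(features, gcp_project_id, bigquery_dataset, bigquery_table)
```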
Access the deployed model here. You can use it by uploading a CSV file containing raw Splatoon 3 battle data from stat.ink. After uploading, a link to a file with your results should appear. Click the link to download the resulting file. Results are in the `prediction` column.
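For example, once you have downloaded the results file (assumed here to be saved as `results.csv`), you can inspect the predictions with pandas:

```python
import pandas as pd

# Load the downloaded results and summarize the predicted winners.
results = pd.read_csv("results.csv")
print(results["prediction"].value_counts())
```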
- Create a Google Cloud account
- Set up a new Google Cloud project.
- Create a new service account. Give the service account the `Compute Admin`, `Service Account User`, `Storage Admin`, `Storage Object Admin`, `Cloud Run Admin`, and `BigQuery Admin` roles.
- After the service account has been created, click on `Manage Keys` under the `Actions` menu. Click on the `Add Key` dropdown and click on `Create new key`. A prompt should pop up asking whether to download it as a JSON or P12 file. Choose the JSON format and click `Create`. Save your key file.
- Install the Google Cloud CLI. Assuming you have an Ubuntu Linux distro or similar as your environment, follow the directions for `Debian/Ubuntu`. Make sure you log in by running `gcloud init`. Choose the cloud project you created.
- Set the environment variable to point to your downloaded service account key JSON file:
export GOOGLE_APPLICATION_CREDENTIALS=<path/to/your/service-account-authkeys>.json
- Refresh your token/session and verify authentication by running `gcloud auth application-default login` (a quick Python credential check is sketched after the API list below).
- Make sure these APIs are enabled for your project:
- https://console.cloud.google.com/apis/library/iam.googleapis.com
- https://console.cloud.google.com/apis/library/iamcredentials.googleapis.com
- https://console.cloud.google.com/apis/library/compute.googleapis.com
- https://console.cloud.google.com/apis/library/run.googleapis.com
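With the key file exported and the APIs enabled, you can optionally sanity-check the setup from Python. This snippet is not part of the repo and assumes the `google-cloud-storage` package is installed:

```python
import google.auth
from google.cloud import storage

# Application Default Credentials pick up GOOGLE_APPLICATION_CREDENTIALS automatically.
credentials, project_id = google.auth.default()
print(f"Authenticated against project: {project_id}")

# Listing buckets exercises the Storage roles granted to the service account.
client = storage.Client(project=project_id, credentials=credentials)
for bucket in client.list_buckets():
    print(bucket.name)
```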
- Clone the repo and `cd` into the `Splatoon_Battle_Prediction` folder.
- Make any necessary changes and push to a new repo, using `git add .`, `git commit -m "my commit message"`, and `git push`. Before you do this, make sure you have a `main` and a `dev` branch. Push your changes to the `dev` branch, which you can create and switch to with `git checkout -b dev`.
- Pre-commit hooks should run, making changes to files and formatting code appropriately. You may need to disable either the black or isort hook, as they tend to conflict with one another and prevent a successful push.
- After the code has been pushed to the `dev` branch, create a pull request and wait until the `CI Test` Github Action step completes successfully. You can then merge into the `main` branch, which should trigger a deployment using Terraform. You may need to adjust the following variables in the `infrastructure/vars/vars.tfvars` file appropriately:
| Variable | Description |
|---|---|
| GOOGLE_CLOUD_PROJECT_ID | ID of the Google Cloud project |
| SERVICE_ACCOUNT_EMAIL | Email of the service account you used to generate the key file |
| CLOUD_RUN_SERVICE_NAME | Name of your Google Cloud Run service |
| DOCKER_IMAGE_URL | URL of your deployed containerized model |
| COMPUTE_VM_NAME | Name of your VM environment |
- You can use the default Docker container URL for the deployment of the model, or you can build your own and push it to Docker Hub; the Dockerfile in the `deployment` directory can be used for this. Make sure you change the URL in `infrastructure/vars/vars.tfvars` if this is the case.
- Make sure you also add the following repository secret variables in order to successfully pass the `CD Deploy` Github Action:
| Variable | Description |
|---|---|
| GOOGLE_APPLICATION_CREDENTIALS | JSON file containing Google Cloud credentials |
| SSH_PUBLIC_KEY | SSH public key that will be used to access the VM environment |
- Log in to your newly created VM environment using the following command: `ssh -i /path/to/private/ssh/key username@vm_external_ip_address`. As an alternative, follow this video to help set up SSH in a VS Code environment, which allows for port forwarding from your cloud VM to your local machine. Run `cd /Splatoon_Battle_Prediction` to move into the `/Splatoon_Battle_Prediction` directory. Log in as the super user with the command `sudo su` in order to edit files.
- Install `make` using the command `sudo apt install make`.
- Create and activate the Python pipenv environment using the command `make setup_pipenv`. You can then run the code quality checks, unit tests, and integration tests using `make integration_testing`.
- You should now install Docker. Use the following commands, in order:
```bash
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update -y
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
sudo apt install docker-compose -y
```
- Move into the `monitoring` directory using the `cd monitoring` command. Then run the docker containers using the command `sudo docker-compose -f docker_compose.yaml up -d`.
- Make sure you have created a Weights and Biases account (https://wandb.ai/login) and log in to your account on the command line using the command `wandb login`. The command line will ask you to input the API key associated with your account.
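Once you are logged in, the training step can communicate with Weights and Biases. Purely as an illustration (the project, entity, metric values, and file path below are assumptions, not the pipeline's real names), experiment tracking and model registration look roughly like this:

```python
import wandb

# Start a run under your project/entity (assumed names).
run = wandb.init(project="splatoon-battle-prediction", entity="my-team")

# Log evaluation metrics for a candidate model (replace the dummy value with real metrics).
run.log({"accuracy": 0.0})

# Register the serialized model file as an artifact (path assumed).
artifact = wandb.Artifact("splatoon-model", type="model")
artifact.add_file("artifacts/model.pkl")
run.log_artifact(artifact)
run.finish()
```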
- You should now have Prefect installed. First run the command `prefect block register -m prefect_email` to add the blocks needed for the emailing functionality. Start the Prefect server locally with the `prefect server start` command so you can monitor flows. In another terminal, `cd` into the `flows` directory and run the command `prefect deployment build main_flow.py:run_pipeline -n "splatoon-pipeline-deployment" --cron "0 0 1 * *" -a` to build the Prefect deployment, which runs every month on the first. Make sure you set up the following Prefect blocks before running:
| Block Name | Description | Block Type |
|---|---|---|
| gcp-creds | Block pertaining to your Google Cloud credentials. You need the JSON key file you downloaded earlier to set it up | GCP Credentials |
| splatoon-battle-data | Block pertaining to the bucket you wish to load the data into | GCS Bucket |
| db-username | Block pertaining to the Postgres database username you will use to record drift metrics | Secret |
| db-password | Block pertaining to the Postgres database password you will use to record drift metrics | Secret |
| email-server-credentials | Email credentials needed to send an alert to a specified email in the event data drift occurs | Email Server Credentials |
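For context on the `db-username` and `db-password` blocks, the drift step works roughly as sketched below. This uses Evidently's `Report`/`DataDriftPreset` interface; the table name, column layout, and connection details are assumptions rather than the repository's exact code:

```python
import pandas as pd
import psycopg2
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

def record_drift(reference: pd.DataFrame, current: pd.DataFrame,
                 db_user: str, db_password: str) -> None:
    # Compare the current training data to the BigQuery reference dataset.
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)

    # Key layout depends on your Evidently version; here we assume the
    # dataset-level drift metric is the first entry of the report.
    result = report.as_dict()["metrics"][0]["result"]

    # Write one row per run into the table Grafana reads from (schema assumed).
    conn = psycopg2.connect(host="localhost", port=5432, dbname="monitoring",
                            user=db_user, password=db_password)
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO drift_metrics (dataset_drift, share_drifted_columns) VALUES (%s, %s)",
            (result["dataset_drift"], result["share_of_drifted_columns"]),
        )
    conn.close()
```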
- You can then run the deployment with a command such as `prefect deployment run run-pipeline/splatoon-pipeline-deployment --params '{"data_path":"../data", "wandb_project":<wandb_project>, "wandb_entity":<wandb_entity>, "artifact_path":"./artifacts", "num_months":1, "gcp_project_id":<gcp_project_id>, "bigquery_dataset":<bigquery_dataset>, "bigquery_table":<bigquery_table>}'`. The deployment should now be scheduled.
- Your newly scheduled deployment runs once you start a Prefect agent. Run the command `prefect agent start -q "default"` to run your deployment.
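For context on the `gcp_project_id`, `bigquery_dataset`, and `bigquery_table` parameters, inside the flow the training and reference data pulls from BigQuery might look roughly like this (the function name and query are assumptions, not the repository's code):

```python
from google.cloud import bigquery

def read_table(gcp_project_id: str, bigquery_dataset: str, bigquery_table: str):
    """Pull a BigQuery table into a pandas DataFrame (illustrative only)."""
    client = bigquery.Client(project=gcp_project_id)
    query = f"SELECT * FROM `{gcp_project_id}.{bigquery_dataset}.{bigquery_table}`"
    return client.query(query).to_dataframe()
```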
- Take advantage of systemd to run the agent when the VM starts up
- Add Docker containers in the VM to aid with reproducibility of the presented pipeline.