
seacevedo/Splatoon_Battle_Prediction


Purpose

Splatoon 3 is the latest entry in Nintendo's Splatoon series. It has garnered a massive following, including a foothold in competitive gaming. The purpose of this project is to provide a machine learning solution that predicts match victors from past game data. Such a tool could be useful in tournaments to predict the winners of games being played in real time.

Technology Stack

  • Google Cloud to upload data to a Google Cloud Storage bucket and use BigQuery as our data warehouse. We also set up a VM environment to host our Prefect deployment.
  • Terraform to define and provision our infrastructure as code.
  • Prefect to orchestrate and monitor our pipeline.
  • Pandas to import and transform our dataset.
  • stat.ink to access Splatoon 3 battle data. You can learn more about the columns of the dataset here.
  • Weights and Biases to track model performance and keep track of the models and datasets used.
  • Evidently to monitor dataset drift.
  • Postgres to save dataset drift metrics.
  • Grafana to visualize dataset drift.
  • Docker to containerize the deployed model and the monitoring architecture.
  • Docker Compose to manage the multiple Docker containers used in this project.
  • Pre-commit hooks to identify simple issues in code before review.
  • GitHub Actions to run tests and checks before merging into the main branch. Once merged into main, the deployment process is initiated.

Pipeline Architecture

(Pipeline architecture diagram)

  • Terraform is used to set up the environment that runs our pipeline. When applied, the configuration creates our BigQuery dataset and GCS bucket, deploys a Docker-containerized production model to Google Cloud Run, and provisions the VM that hosts our Prefect deployment.
  • A Prefect agent runs on the VM and picks up any pending deployments. The pipeline is meant to run every N months. It first extracts Splatoon 3 battle data from stat.ink and adds the raw data to a GCS bucket, then cleans the data, performs feature engineering and extraction, and loads the result into a BigQuery dataset. The data from BigQuery is used to train different models until an optimal one is selected, which is registered in Weights and Biases and stored in a production folder in the bucket. A reference dataset is queried from BigQuery and compared against the training dataset to check for data drift. This is calculated by Evidently, which triggers a notification if drift occurs (see the sketch after this list).
  • Evidently records any drift metrics to a Postgres database, which a Grafana dashboard queries to monitor drift. This infrastructure is orchestrated by Docker Compose.
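As a rough illustration of this drift check, here is a minimal sketch assuming Evidently's Report API and a hypothetical Postgres table named drift_metrics; the repo's actual monitoring code may differ:

```python
# Sketch of the drift-check step: compare current data against a reference
# dataset with Evidently, then persist headline metrics for Grafana.
import pandas as pd
import psycopg2
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

def check_drift(reference: pd.DataFrame, current: pd.DataFrame) -> bool:
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)
    result = report.as_dict()["metrics"][0]["result"]  # DatasetDriftMetric

    # Write the headline numbers to Postgres so Grafana can chart them.
    # Connection details and table schema are placeholders.
    conn = psycopg2.connect(host="localhost", dbname="monitoring",
                            user="postgres", password="example")
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO drift_metrics (share_drifted, dataset_drift) "
            "VALUES (%s, %s)",
            (result["share_of_drifted_columns"], result["dataset_drift"]),
        )
    conn.close()
    return result["dataset_drift"]  # True would trigger the email alert
```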

Deployment Preview

Access the deployed model here. You can use it by uploading a CSV file containing raw Splatoon 3 battle data from stat.ink. After uploading, a link to a file with your results should pop up. Click the link to download the resulting file. Results are listed under the prediction column. A scripted alternative is sketched below the screenshot.

(Screenshot of the deployed prediction app)
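If you would rather script the upload, a minimal sketch follows; the service URL, route, and form field name are assumptions rather than values taken from the deployed service:

```python
# Illustrative only: replace the URL with the actual Cloud Run service URL,
# and check the app for the real route and form field name.
import requests

with open("battles.csv", "rb") as f:
    resp = requests.post(
        "https://<your-cloud-run-service>.run.app/predict",
        files={"file": f},
        timeout=120,
    )
resp.raise_for_status()

# Save the returned results; predictions are under the "prediction" column.
with open("results.csv", "wb") as out:
    out.write(resp.content)
```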

Replication Steps

Setup Google Cloud

  1. Create a Google Cloud account.
  2. Set up a new Google Cloud project.
  3. Create a new service account. Give the service account the Compute Admin, Service Account User, Storage Admin, Storage Object Admin, Cloud Run Admin, and BigQuery Admin roles.
  4. After the service account has been created, click on Manage Keys under the Actions menu. Click on the Add Key dropdown and click on Create new key. A prompt should pop up asking whether to download the key as a JSON or P12 file. Choose the JSON format and click Create. Save your key file.
  5. Install the Google Cloud CLI. Assuming you have an Ubuntu Linux distro or similar as your environment, follow the directions for Debian/Ubuntu. Make sure you log in by running gcloud init and choose the cloud project you created.
  6. Set the environment variable to point to your downloaded service account key JSON file:

export GOOGLE_APPLICATION_CREDENTIALS=<path/to/your/service-account-authkeys>.json

  7. Refresh your token/session and verify authentication: gcloud auth application-default login

  8. Make sure the APIs this project relies on (e.g., Compute Engine, Cloud Storage, BigQuery, and Cloud Run) are enabled for your project.
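To confirm that the credentials are being picked up, here is a quick sanity check (assuming the google-cloud-storage package is installed):

```python
# Lists the buckets in your project using Application Default Credentials.
# If this raises an auth error, re-check GOOGLE_APPLICATION_CREDENTIALS.
from google.cloud import storage

client = storage.Client()
print([bucket.name for bucket in client.list_buckets()])
```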

Setup VM Environment

  1. Clone the repo and cd into the Splatoon_Battle_Prediction folder.
  2. Make any necessary changes and push them to a new repo using git add ., git commit -m "my commit message", and git push. Before you do this, make sure you have a main and a dev branch; create and switch to the dev branch with git checkout -b dev and push your changes there.
  3. Pre-commit hooks should run, making changes to files and formatting code appropriately. You may need to disable either the black or the isort hook, as they tend to conflict with one another and prevent a successful push.
  4. After the code has been pushed to the dev branch, create a pull request and wait until the CI Test GitHub Action completes successfully. You can then merge with the main branch, which should trigger a deployment using Terraform. You may need to adjust the following variables in the infrastructure/vars/vars.tfvars file appropriately:
| Variable | Description |
| --- | --- |
| GOOGLE_CLOUD_PROJECT_ID | ID of the Google Cloud project |
| SERVICE_ACCOUNT_EMAIL | Email of the service account you used to generate the key file |
| CLOUD_RUN_SERVICE_NAME | Name of your Google Cloud Run service |
| DOCKER_IMAGE_URL | URL of your deployed containerized model |
| COMPUTE_VM_NAME | Name of your VM environment |
  5. You can use the default Docker container URL for the deployment of the model, or you can build your own and push it to Docker Hub; the Dockerfile in the deployment directory can be used for this. Make sure you change the URL in infrastructure/vars/vars.tfvars if so.
  6. Make sure you also add the following repository secrets in order to successfully pass the CD Deploy GitHub Action:
| Variable | Description |
| --- | --- |
| GOOGLE_APPLICATION_CREDENTIALS | JSON file containing your Google Cloud credentials |
| SSH_PUBLIC_KEY | SSH public key that will be used to access the VM environment |
  7. Log in to your newly created VM environment using the following command: ssh -i /path/to/private/ssh/key username@vm_external_ip_address. As an alternative, follow this video to help set up SSH in a VS Code environment, which allows for port forwarding from your cloud VM to your local machine. cd into the cloned repository directory (e.g., cd Splatoon_Battle_Prediction). Log in as super user with the command sudo su in order to edit files.
  8. Install make using the command sudo apt install make.
  9. Create and activate the Python pipenv environment using the command make setup_pipenv. You can then run the code quality checks, unit tests, and integration tests using make integration_testing.
  10. Install Docker. Use the following commands, in order:
    • sudo apt-get install ca-certificates curl gnupg
    • sudo install -m 0755 -d /etc/apt/keyrings
    • curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    • sudo chmod a+r /etc/apt/keyrings/docker.gpg
    • echo "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    • sudo apt-get update -y
    • sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
    • sudo apt install docker-compose -y
  11. Move into the monitoring directory using the cd monitoring command. Then run the Docker containers using the command sudo docker-compose -f docker_compose.yaml up -d.
  12. Make sure you have created a Weights and Biases account (https://wandb.ai/login) and log in to your account on the command line using the command wandb login. The command line will ask you to input the API key associated with your account. (A sketch of how a model might be logged to W&B appears after this list.)
  13. Prefect should now be installed. First run the command prefect block register -m prefect_email to add the blocks needed for the emailing functionality. Run the Prefect server locally using the prefect server start command to monitor flows. In another terminal, cd into the flows directory and run the command prefect deployment build main_flow.py:run_pipeline -n "splatoon-pipeline-deployment" --cron "0 0 1 * *" -a to build the Prefect deployment, which runs monthly on the first. Make sure you set up the following Prefect blocks before running (a sketch for creating them follows the table):
| Block Name | Description | Block Type |
| --- | --- | --- |
| gcp-creds | Block pertaining to your Google Cloud credentials. You need the JSON key file you downloaded earlier to set it up | GCP Credentials |
| splatoon-battle-data | Block pertaining to the bucket you wish to load the data into | GCS Bucket |
| db-username | Block pertaining to the Postgres database username you will use to record drift metrics | Secret |
| db-password | Block pertaining to the Postgres database password you will use to record drift metrics | Secret |
| email-server-credentials | Email credentials needed to send an alert to a specified email in the event data drift occurs | Email Server Credentials |
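A minimal sketch for registering these blocks from Python (Prefect 2.x); the file paths and credential values are placeholders:

```python
# Creates the Prefect blocks listed above. Run once, inside the pipenv
# environment, after `prefect block register -m prefect_email`.
from prefect.blocks.system import Secret
from prefect_email import EmailServerCredentials
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import GcsBucket

gcp_creds = GcpCredentials(service_account_file="service-account-authkeys.json")
gcp_creds.save("gcp-creds")

GcsBucket(bucket="your-bucket-name", gcp_credentials=gcp_creds).save("splatoon-battle-data")

Secret(value="postgres").save("db-username")
Secret(value="example-password").save("db-password")

EmailServerCredentials(
    username="you@example.com",
    password="app-password",  # e.g., a Gmail app password
).save("email-server-credentials")
```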
  14. You can then run the deployment with, for example: prefect deployment run run-pipeline/splatoon-pipeline-deployment --params '{"data_path":"../data", "wandb_project":<wandb_project>, "wandb_entity":<wandb_entity>, "artifact_path":"./artifacts", "num_months":1, "gcp_project_id":<gcp_project_id>, "bigquery_dataset":<bigquery_dataset>, "bigquery_table":<bigquery_table>}'. The deployment should now be scheduled.
  15. Your newly scheduled deployment runs once a Prefect agent picks it up. Run the command prefect agent start -q "default" to start one.
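For reference, here is a minimal sketch of how a trained model might be registered with Weights and Biases; the project, entity, artifact name, and file path are illustrative, not the repo's actual code:

```python
# Illustrative: logs a trained model file as a W&B artifact so it can be
# versioned alongside the datasets used to train it.
import wandb

run = wandb.init(project="splatoon-battle-prediction", entity="your-entity")
artifact = wandb.Artifact("battle-winner-model", type="model")
artifact.add_file("artifacts/model.pkl")  # placeholder path to the trained model
run.log_artifact(artifact)
run.finish()
```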

Next Steps

  • Take advantage of systemd to start the agent when the VM boots.
  • Add Docker containers in the VM to aid the reproducibility of the presented pipeline.
