Merge pull request #10 from UCLAIS/update-challenge

Update challenge 2 with doxa
UCLAIS · Nov 15, 2023 · f7ff7c6
2 parents 5abe466 + 1343ab3
Showing 1 changed file with 125 additions and 28 deletions.
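The pre-processing TODO added in this commit suggests normalising the features. Below is a minimal sketch of one common approach, per-feature standardisation. It assumes the features are NumPy arrays like the `X_train`/`X_test` produced by the notebook's split cell; the `standardise` helper and the toy data are hypothetical and not part of the notebook. The key point is that the mean and standard deviation are computed on the training set only and then reused for the test set, so the predictions submitted to DOXA see the same scaling the model was trained on.

```python
import numpy as np


def standardise(X_train: np.ndarray, X_test: np.ndarray, eps: float = 1e-8):
    """Scale each feature to zero mean and unit variance using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Guard against constant features (std close to 0) to avoid division by zero
    std = np.where(std < eps, 1.0, std)
    # Apply the *same* training-set statistics to both splits
    return (X_train - mean) / std, (X_test - mean) / std, mean, std


# Hypothetical toy data standing in for the wine features
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
X_test = rng.normal(loc=5.0, scale=3.0, size=(20, 4))

X_train_s, X_test_s, mean, std = standardise(X_train, X_test)
print(X_train_s.mean(axis=0))
print(X_train_s.std(axis=0))
```

scikit-learn's `StandardScaler` (calling `fit_transform` on the training set and `transform` on the test set) does the same job if you prefer not to hand-roll it.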
diff --git a/doxa-challenges/challenge-2/starter.ipynb b/doxa-challenges/challenge-2/starter.ipynb
@@ -5,17 +5,39 @@
    "metadata": {},
    "source": [
     "# UCLAIS Tutorial Series Challenge 2\n",
-    "\n",
-    "<!-- We are proud to present you with the second challenge of the 2022-23 UCLAIS tutorial series: the CIFAR-10 image classification problem. You will be introduced to a variety of core concepts in **computer vision** and specifically the implementation of convolutional neural network (CNN) architectures using the popular machine learning package, [TensorFlow](https://www.tensorflow.org/).\n",
-    "\n",
-    "This Jupyter notebook will guide you through the various general stages involved in end-to-end machine learning projects, including data visualisation, data preprocessing, model selection, model training and model evaluation. Finally, you will have the opportunity to submit the model you build to [DOXA](https://doxaai.com/) for evaluation on an unseen test set.\n",
-    "\n",
-    "This notebook contains blank code blocks for you to experiment with your own ideas in! See the `starter-SOLUTION.ipynb` notebook if you need more guidance.\n",
-    "\n",
-    "If you do not already have a DOXA account, you will want to [sign up](https://doxaai.com/sign-up) first before proceeding. -->\n",
     "\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this challenge you will explore training neural networks and using them for inference. You've already seen how to use classifiers in the previous challenge on Premier League prediction. Now we will look at a regression task: simply put, instead of predicting one of a set of discrete classes, we predict a continuous value. In this case, we will predict the alcohol content of wine from a set of other chemical attributes.\n",
+    "\n",
+    "If you do not already have a DOXA account, you will want to [sign up](https://doxaai.com/sign-up) first before proceeding and then make sure you are enrolled on the [DOXA challenge page](https://doxaai.com/competitions)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Machine Learning Workflow Reminder"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![title](https://miro.medium.com/max/1400/0*V0GyOt3LoDVfY7y5.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The overall machine learning process involves a long sequence of steps, so as you go through this notebook, try to keep in mind which stage we are dealing with and what we are trying to achieve. There are a lot of helpful resources online you can use, such as the excellent [scikit-learn documentation](https://scikit-learn.org/stable/getting_started.html). You are also more than welcome to ask questions in the [DOXA Community Discord server](https://discord.gg/MUvbQ3UYcf)!"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -25,6 +47,15 @@
     "To get started, we will install a number of common machine learning packages."
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install numpy pandas matplotlib seaborn scikit-learn doxa-cli ipympl"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -110,7 +141,7 @@
    "outputs": [],
    "source": [
     "# Load the data\n",
-    "# !pip install ucimlrepo\n",
+    "# !pip install ucimlrepo # uncomment this line if ucimlrepo is not installed\n",
     "from ucimlrepo import fetch_ucirepo \n"
    ]
   },
@@ -132,7 +163,7 @@
    "metadata": {},
    "source": [
     "## Data Understanding\n",
-    "Before we start to train our Machine Learning model, it is important to have a look and understand first the dataset that we will be using. This will provide some insights onto which model, model hyperparameter, and loss function are suitable for the problem we are dealing with. "
+    "Before we start training our machine learning model, it is important to look at and understand the dataset we will be using. This gives some insight into which models, hyperparameters, and loss functions are suitable for the problem we are dealing with. The [first DOXA challenge](https://doxaai.com/competition/uclais-2023-1) has good content on data understanding; check it out if you want to explore further."
    ]
   },
   {
@@ -179,7 +210,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Here we preprocess the data to make the data suitable for training. We will first split the data into training and validation sets. Feel free to add new cells as you see fit. (hint: it might be worth looking at normalizing the data to make training easier)."
+    "Here we preprocess the data to make it suitable for training. We first split the data into training and test sets. Feel free to add new cells as you see fit."
    ]
   },
   {
@@ -188,22 +219,19 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Important: do not change this cell!\n",
+    "\n",
     "# We split the data into X and y variables: X holds the features and y is the target variable we want to predict.\n",
     "# We are trying to predict the alcohol content given the other variables. \n",
-    "\n",
     "X = df.drop('alcohol', axis=1)\n",
-    "y = df['alcohol']"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
+    "y = df['alcohol']\n",
+    "\n",
     "# Convert the matrix X and the vector y to NumPy arrays.\n",
     "X = X.to_numpy()\n",
-    "y = y.to_numpy()"
+    "y = y.to_numpy()\n",
+    "\n",
+    "# Finally, split into training and test sets. The test labels are discarded (_): DOXA scores your predictions on X_test.\n",
+    "X_train, X_test, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=42)"
    ]
   },
   {
@@ -212,9 +240,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# TODO: split our features and output labels into separate training and test sets\n",
-    "\n",
-    "# HINT: Use train_test_split function from scikit-learn\n"
+    "# TODO: add your own data pre-processing steps here, and apply the same transformation to both X_train and X_test.\n",
+    "# (hint: it might be worth looking at normalizing the data to make training easier)"
    ]
   },
   {
@@ -388,16 +414,87 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Preparing our DOXA Submission\n",
+    "## Preparing your DOXA Submission"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Pass the test data through our neural network\n",
+    "model.eval()\n",
+    "with torch.no_grad():\n",
+    "    # Predict on the test data, moving the output back to the CPU before converting to NumPy\n",
+    "    predictions = model(torch.from_numpy(X_test).float().to(device)).cpu().numpy().squeeze()\n",
+    "\n",
+    "assert predictions.shape == (1300,)\n",
+    "\n",
+    "# Take a look at the first 20 predictions\n",
+    "predictions[:20]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "# Prepare our submission package\n",
+    "os.makedirs(\"submission\", exist_ok=True)\n",
+    "\n",
+    "with open(\"submission/y.txt\", \"w\") as f:\n",
+    "    f.writelines([f\"{prediction}\\n\" for prediction in predictions])\n",
+    "\n",
+    "with open(\"submission/doxa.yaml\", \"w\") as f:\n",
+    "    f.write(\"competition: epl\\nenvironment: cpu\\nlanguage: python\\nentrypoint: run.py\")\n",
+    "\n",
+    "with open(\"submission/run.py\", \"w\") as f:\n",
+    "    f.write(\"with open('y.txt', 'r') as f: print(f.read().strip())\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Submitting to DOXA\n",
     "\n",
-    "Once we are content with the performance of our model, we can submit the model to DOXA for evaluation on an unseen test set! "
+    "Before you can submit to DOXA, you must first ensure that you are enrolled for the challenge on the DOXA website. Visit [the challenge page](https://doxaai.com/competition/uclais-1) and click \"Enrol\" in the top-right corner if you have not done so already.\n",
+    "\n",
+    "You can then log in using the DOXA CLI by running the following command:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!doxa login"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, you can submit your results to DOXA by running the following command:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!doxa upload submission"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "TODO"
+    "Wooo! 🥳 You have (probably) just uploaded your predictions to DOXA &ndash; well done! Take a moment to see how you have done on the [scoreboard](https://doxaai.com/competition/epl)."
    ]
   }
  ],