adityanarayan-rai/Machine-Learning-Assignment

Course Assignments - Machine Learning

This repository contains the Python assignments from the Machine Learning course I completed as part of my MSc Data Science for Public Policy at the Hertie School, Berlin.

Assignment 1:

  • Calculate summary statistics for the label and five features
  • What is the formula for the closed-form estimate of the coefficient vector in ordinary least squares regression? Estimate the coefficients using numpy in Python by performing the matrix operations from the closed-form solution (see the NumPy sketch after this list)
  • Estimate the coefficients using the statsmodels package and compare them
  • Estimate the variance of the coefficients using the matrix formula
  • Write a function with three arguments: beta, a 1D numpy array representing a particular value of your coefficients; label, a 1D numpy array of the labels in your dataset; and features, a 2D numpy array representing the features in your dataset
  • Using the SciPy library, minimize the objective function for logistic regression (see the SciPy sketch after this list)
  • Construct your predictions by taking the dot-product between beta_logistic and your feature matrix and then passing that dot-product through the sigmoid function
  • Construct class estimates for your OLS predictions as well by calculating an indicator function on the fitted values (e.g. 1(ŷ > 0.5))
  • Calculate the full confusion matrix for the logistic regression and the OLS model.
  • Plot the relationship between the predictions from the linear regression in Question 1 (on the x-axis) and the predictions from the logistic regression (on the y-axis). What do you see?
  • Comment on supervised learning, unsupervised learning and reinforcement learning
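
As a quick reference for the closed-form and variance items above, here is a minimal NumPy sketch, assuming a hypothetical design matrix X (with an intercept column) and label vector y in place of the course data; the closed-form estimate is beta_hat = (X'X)^(-1) X'y and its variance is estimated as sigma2_hat * (X'X)^(-1):

```python
import numpy as np

# Hypothetical data: n observations, an intercept column plus five features.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
y = X @ rng.normal(size=6) + rng.normal(size=n)

# Closed-form OLS estimate: beta_hat = (X'X)^(-1) X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Variance of the coefficients: sigma2_hat * (X'X)^(-1),
# where sigma2_hat = RSS / (n - number of parameters).
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - X.shape[1])
beta_var = sigma2_hat * XtX_inv
beta_se = np.sqrt(np.diag(beta_var))
```

For the comparison item, the same quantities are available from statsmodels: sm.OLS(y, X).fit().params and .bse return the coefficients and their standard errors.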
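
The logistic-regression items (objective function, SciPy minimization, sigmoid predictions, and confusion matrix) could look roughly like the sketch below. The negative log-likelihood objective, the simulated data, the BFGS solver, and the 0.5 classification threshold are assumptions, not the assignment's exact specification.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical binary-outcome data (intercept column included in X).
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
true_beta = rng.normal(size=6)
y_binary = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(int)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(beta, label, features):
    """Logistic regression objective (negative log-likelihood).

    beta: 1D numpy array of coefficients
    label: 1D numpy array of 0/1 labels
    features: 2D numpy array of features
    """
    p = sigmoid(features @ beta)
    eps = 1e-12  # guard against log(0)
    return -np.sum(label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps))

# Minimize the objective with SciPy (BFGS is one reasonable solver choice).
res = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1]),
               args=(y_binary, X), method="BFGS")
beta_logistic = res.x

# Predictions: dot product with the feature matrix, passed through the sigmoid.
p_hat = sigmoid(X @ beta_logistic)
y_pred = (p_hat > 0.5).astype(int)  # class estimates at a 0.5 threshold

# Confusion matrix: rows = actual class, columns = predicted class.
conf = np.zeros((2, 2), dtype=int)
for actual, pred in zip(y_binary, y_pred):
    conf[actual, pred] += 1
print(conf)
```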

Assignment 2:

  • Exploratory Data Analysis - What do the variables look like? Is there missingness? What does the distribution of the outcome look like?
  • Create the Training set and Test set
  • Define how you are going to (1) impute missing data, (2) standardize the data and (3) fit the model.
  • Using the pipeline you created, fit a ridge regression where the regularization parameter is alpha = 0.1 on the (entire) training set (see the pipeline sketch after this list).
  • Use the KFold class to construct a set of 10 folds to be used for cross-validation
  • Using the training data, run cross-validation to choose a good value of the regularization parameter (see the cross-validation sketch after this list)
  • Point out which value of the regularization parameter (and therefore which model) you would choose based on these results, using the one-standard-error rule
  • Create a new version of your function from Question 3 that replaces ridge regression with KernelRidge
  • Find the cross-validated MSE for a range of values of gamma (see the KernelRidge sketch after this list)
  • Calculate the error in the test set for both your best Ridge and KernelRidge model. Which is better?
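
One possible way to set up the impute / standardize / fit pipeline is sketched below with scikit-learn. The dataset, column names, missingness pattern, and 80/20 split are hypothetical placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Hypothetical dataset with some missing values.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 5)), columns=[f"x{i}" for i in range(5)])
X[X > 2.0] = np.nan  # inject some missingness
y = rng.normal(size=500)

# Training set / test set split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# (1) impute missing data, (2) standardize, (3) fit ridge regression with alpha = 0.1.
ridge_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("model", Ridge(alpha=0.1)),
])
ridge_pipe.fit(X_train, y_train)
```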
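
Continuing from the pipeline sketch above, the cross-validation items could be approached roughly as follows; the alpha grid and the use of cross_val_score are assumptions rather than the assignment's prescribed approach.

```python
from sklearn.model_selection import KFold, cross_val_score

# Ten folds, reused for every candidate value of alpha.
kf = KFold(n_splits=10, shuffle=True, random_state=42)

alphas = np.logspace(-3, 3, 25)
mean_mse, se_mse = [], []
for alpha in alphas:
    ridge_pipe.set_params(model__alpha=alpha)
    scores = cross_val_score(ridge_pipe, X_train, y_train, cv=kf,
                             scoring="neg_mean_squared_error")
    fold_mse = -scores
    mean_mse.append(fold_mse.mean())
    se_mse.append(fold_mse.std(ddof=1) / np.sqrt(len(fold_mse)))
mean_mse, se_mse = np.array(mean_mse), np.array(se_mse)

# One-standard-error rule: among models whose CV error is within one standard
# error of the minimum, pick the most regularized one (the largest alpha).
best = mean_mse.argmin()
threshold = mean_mse[best] + se_mse[best]
alpha_1se = alphas[mean_mse <= threshold].max()
```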
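
Finally, a sketch of the KernelRidge variant and the test-set comparison, reusing the objects defined in the sketches above; the RBF kernel and the gamma grid are assumptions.

```python
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error

# Same preprocessing, but with kernel ridge regression (RBF kernel).
kr_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("model", KernelRidge(kernel="rbf", alpha=0.1)),
])

# Cross-validated MSE over a grid of gamma values.
gammas = np.logspace(-3, 1, 15)
kr_mse = []
for gamma in gammas:
    kr_pipe.set_params(model__gamma=gamma)
    scores = cross_val_score(kr_pipe, X_train, y_train, cv=kf,
                             scoring="neg_mean_squared_error")
    kr_mse.append(-scores.mean())
best_gamma = gammas[int(np.argmin(kr_mse))]

# Test-set error for the chosen Ridge and KernelRidge models.
ridge_pipe.set_params(model__alpha=alpha_1se).fit(X_train, y_train)
kr_pipe.set_params(model__gamma=best_gamma).fit(X_train, y_train)
print("Ridge test MSE:      ", mean_squared_error(y_test, ridge_pipe.predict(X_test)))
print("KernelRidge test MSE:", mean_squared_error(y_test, kr_pipe.predict(X_test)))
```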

Assignment 3: This assignment was designed to give students practical experience with a complete machine learning workflow: applying 10-fold cross-validation and running classification models on a prepared dataset.
