Linear Classification - Red wine and Breast Cancer datasets

Contributors: Shantanil Bagchi, Nikhil Podila, Surya

Mini-Project 1 - COMP 551 Applied Machine Learning - McGill University

Abstract

We investigate the performance of two linear classification techniques, Logistic Regression and Linear Discriminant Analysis (LDA), on the red wine quality and breast cancer datasets. We preprocess the data and analyse the features before implementing the linear models and comparing their performance.
We found that LDA is computationally more intensive than logistic regression, but that logistic regression needs careful selection of hyperparameters such as the learning rate and stopping criteria. We also inferred that selecting appropriate features and transformations during data preprocessing is crucial for linear classification techniques. Additionally, we tested different learning rates for logistic regression, plotted individual feature histograms, performed correlation analysis on the features, and compared the accuracies of the models.
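To illustrate why the learning rate and stopping criterion matter for logistic regression, here is a minimal sketch of batch gradient descent in plain Python. It is not the code from the notebooks: the function names (`fit_logistic`, `predict`), the gradient-norm stopping rule, the default values, and the toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, learning_rate=0.1, tol=1e-6, max_iters=10000):
    """Batch gradient descent on the mean cross-entropy loss.

    X is a list of feature rows and y a list of 0/1 labels.
    Returns (weights, iterations_used); weights[0] is the bias term.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    iters = 0
    for iters in range(1, max_iters + 1):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            row = [1.0] + list(xi)  # prepend 1.0 for the bias
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - yi
            for j in range(d + 1):
                grad[j] += err * row[j] / n
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad)]
        # Assumed stopping criterion: stop once the gradient norm is small.
        if math.sqrt(sum(g * g for g in grad)) < tol:
            break
    return w, iters


def predict(w, X):
    # Classify as 1 when the predicted probability is at least 0.5.
    return [
        1 if sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) >= 0.5 else 0
        for xi in X
    ]


# Hypothetical one-feature toy data, not taken from either dataset.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]

for lr in (0.01, 0.1, 0.5):
    w, iters = fit_logistic(X, y, learning_rate=lr, max_iters=2000)
    accuracy = sum(p == t for p, t in zip(predict(w, X), y)) / len(y)
    print(f"lr={lr}: iterations={iters}, training accuracy={accuracy:.2f}")
```

Running the loop with several learning rates is one simple way to see how the step size and the tolerance together determine how long training takes and how well the model fits.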

Repository Structure

The repository contains 7 files:

2 Jupyter notebook files - breast-cancer-dataset-analysis.ipynb and wine-dataset-analysis.ipynb
2 Dataset files - breast-cancer-wisconsin.data and winequality-red.csv
1 ReadMe file - ReadMe.md
1 Project writeup - writeup.pdf
1 Libraries file - requirements.txt

Code Usage - (Python 3.6.2, conda 4.3.23)

Install the required Python libraries from requirements.txt
(refer to https://medium.com/python-pandemonium/better-python-dependency-and-package-management-b5d8ea29dff1 for steps to install libraries using requirements.txt)
Download all Jupyter notebook and dataset files into one directory.
Launch Jupyter Notebook in that directory.
Open the required notebook (.ipynb file) and select "Run All" inside it.
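The steps above can also be scripted. The sketch below builds the equivalent commands: `pip install -r requirements.txt` for the dependencies, and `jupyter nbconvert --execute` as a non-interactive stand-in for "Run All". The helper names `build_commands` and `run_all` are hypothetical, and the sketch assumes that `jupyter` and `nbconvert` are available on the PATH, which the repository does not state.

```python
import subprocess
import sys

# Notebook names as listed under Repository Structure.
NOTEBOOKS = [
    "breast-cancer-dataset-analysis.ipynb",
    "wine-dataset-analysis.ipynb",
]


def build_commands(notebooks=NOTEBOOKS):
    """Return the install command followed by one execute command per notebook."""
    commands = [[sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]]
    for name in notebooks:
        stem = name[: -len(".ipynb")]
        # Write results to a new file so the original notebook is left untouched.
        commands.append([
            "jupyter", "nbconvert", "--to", "notebook", "--execute",
            "--output", "executed-" + stem, name,
        ])
    return commands


def run_all(directory="."):
    """Run every command inside the directory that holds the notebooks and datasets."""
    for cmd in build_commands():
        subprocess.run(cmd, cwd=directory, check=True)
```

Calling `run_all("path/to/download/directory")` would install the libraries and then execute both notebooks; nothing runs until that function is called.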

nikhilpodila/Classification-UCI-Datasets
