A machine learning and data processing library for Python

GiovanniPasserello/StatsML

StatsML

StatsML is a machine learning and data processing library for Python. I developed it to consolidate my machine learning education and to gain experience by implementing established machine learning methods myself. The library contains many features common to most machine learning libraries, but does not implement any hardware-level optimizations. These features range from simple statistical methods to neural networks, decision classifiers, regressors, clustering models, and data preprocessors.

Table of Contents

  1. Getting Started
  2. Project Structure
  3. Features
  4. Technologies
  5. License

Getting Started

The following instructions will help you get StatsML up and running on your local machine for development and testing. I encourage you to try it out, experiment with it, and send me any feedback you have.

The command-line instructions below are written for Linux.

Prerequisites

You can run the project either in your machine's local environment or in a virtual environment that you set up and install all packages into.

Install the packages listed in requirements.txt by running the following from the root of the cloned repository (see Installation):

$ pip3 install -r requirements.txt 

Installation

To create a local copy of the codebase, navigate to your directory of choice and clone this repository:

$ git clone https://github.com/GiovanniPasserello/StatsML.git

Project Structure

StatsML is split into several distinct packages, found as directories within 'statsml', each implementing a different machine learning algorithm. Each directory contains the classes that implement that algorithm, along with an example showing how to use the package on a synthetic dataset.

Features

  • Neural Network - a backpropagating artificial neural network implementation
    • Multi-Layer Network
    • Layers
      • Linear
      • Sigmoid
      • ReLU
      • MSE Loss
      • Cross Entropy Loss
    • Automated Training Suite
  • Decision Classifier - a decision tree classifier implementation
    • Decision Tree Classifier
    • Random Forest Classifier
    • Decision Tree Pruning
    • Cross Validation Suite
  • Regression - a suite of scripts used to regress multi-dimensional data
    • Linear Regression
    • Logistic Regression
  • Clustering - a suite of scripts used to cluster multi-dimensional data
    • K-Means
    • Gaussian Mixture Model
  • Metrics Evaluation - a set of metrics computed from confusion matrices
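
As an illustration of the kind of metrics that can be read off a confusion matrix, here is a minimal NumPy sketch. It is not StatsML's own API: the function names `confusion_matrix` and `metrics_from_confusion` are hypothetical, and the sketch assumes integer class labels with rows indexing the true class and columns the predicted class.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Build a matrix whose rows index the true class and columns the predicted class.

    Hypothetical helper for illustration; labels are assumed to be ints in [0, num_classes).
    """
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix


def metrics_from_confusion(matrix):
    """Return overall accuracy plus per-class precision, recall and F1."""
    matrix = np.asarray(matrix, dtype=float)
    true_pos = np.diag(matrix)
    predicted = matrix.sum(axis=0)  # column sums: how often each class was predicted
    actual = matrix.sum(axis=1)     # row sums: how often each class actually occurred

    # Classes that were never predicted (or never seen) get a score of 0
    # instead of triggering a division by zero.
    precision = np.divide(true_pos, predicted,
                          out=np.zeros_like(true_pos), where=predicted > 0)
    recall = np.divide(true_pos, actual,
                       out=np.zeros_like(true_pos), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(true_pos), where=denom > 0)

    accuracy = true_pos.sum() / matrix.sum()
    return accuracy, precision, recall, f1
```

Returning per-class arrays rather than a single averaged score leaves the choice of averaging (macro, weighted, and so on) to the caller.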

Technologies

StatsML is built entirely from scratch. The only external package it relies on is NumPy, which is used for performance and data handling.

  • Python 3 - the language StatsML is implemented in
  • NumPy - Python library adding efficient support for large, multi-dimensional arrays and matrices
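
To give a feel for what "from scratch with only NumPy" can look like, here is a minimal K-Means sketch. It is an assumption-laden illustration rather than the code in this repository: the function name `k_means`, its parameters, and the choice to initialise centroids from randomly chosen data points are all hypothetical.

```python
import numpy as np


def k_means(points, k, num_iters=100, seed=0):
    """Cluster `points` (shape: n_points x n_features) into k groups.

    Hypothetical sketch; assumes num_iters >= 1 and k <= n_points.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    for _ in range(num_iters):
        # Squared distance from every point to every centroid via
        # broadcasting; the result has shape (n_points, k).
        diffs = points[:, None, :] - centroids[None, :, :]
        distances = (diffs ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)

        # Move each centroid to the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilised
        centroids = new_centroids

    return centroids, labels
```

The distance computation is the part where NumPy matters most: broadcasting replaces a nested Python loop over points and centroids with a single array expression.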

License

This project is licensed under the MIT License - see LICENSE for details.
