This is a beginner-friendly repository for GSSoC participants. New contributors are given priority, unlike the first-come, first-served (FCFS) issue assignment used in other repos.
Repeatedly creating issues to earn more points will be flagged.
If this is discovered later, the points will be deducted. You cannot earn more than 60 points from this repo. Any technical feature addition is excluded.
------------------------------------------------------------------------------
Machine Learning (ML) is a subfield of Artificial Intelligence (AI) that empowers systems to learn from data without explicit programming. ML algorithms analyze vast datasets to identify patterns, extract insights, and make predictions or decisions based on the derived knowledge. Unlike traditional programming, which relies on predefined rules, ML leverages statistical techniques and algorithms to enable systems to adapt and improve their performance over time. This adaptability allows ML to tackle complex problems in various domains, including image recognition, natural language processing, and predictive analytics.
Machine Learning (ML) is a subset of artificial intelligence (AI) that enables computers to learn from data and improve their performance over time without being explicitly programmed. It involves developing algorithms that can identify patterns, make decisions, and predict future outcomes based on historical data.
- Data: The foundation of machine learning. Data can be structured (like databases) or unstructured (like text and images).
- Features: Individual measurable properties or characteristics of the data.
- Model: A mathematical representation that maps inputs (features) to outputs (predictions).
- Training: The process of teaching a model using data.
- Validation: Assessing a model's performance on a separate dataset to ensure it generalizes well.
- Testing: Evaluating a model's performance on a new, unseen dataset.
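To make the difference between training, validation, and testing concrete, here is a minimal sketch that cuts one dataset into three disjoint subsets. The helper name `train_val_test_split`, the 60/20/20 proportions, and the toy data are assumptions chosen for illustration, not something this repository prescribes.

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the rows once, then cut them into three disjoint subsets."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffled row indices
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (
        (X[train_idx], y[train_idx]),      # used to fit the model
        (X[val_idx], y[val_idx]),          # used to tune and compare models
        (X[test_idx], y[test_idx]),        # used once, for the final estimate
    )

# Toy data (hypothetical): 10 samples with 2 features each and a binary label.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

train, val, test = train_val_test_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 6 2 2
```

Keeping the test subset untouched until the very end is what makes its score a fair estimate of performance on unseen data.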
- Supervised Learning: The model is trained on labeled data, meaning each training example is paired with an output label. Examples include:
  - Classification: Predicting categorical labels (e.g., spam detection).
  - Regression: Predicting continuous values (e.g., house prices).
- Unsupervised Learning: The model is trained on unlabeled data, and it must find hidden patterns or intrinsic structures in the input data. Examples include:
  - Clustering: Grouping similar data points together (e.g., customer segmentation).
  - Dimensionality Reduction: Reducing the number of random variables under consideration (e.g., PCA).
- Reinforcement Learning: The model learns by interacting with an environment, receiving rewards or penalties based on its actions. This is often used in robotics and gaming.
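The contrast between supervised and unsupervised learning can be sketched with scikit-learn. This example assumes scikit-learn is installed and uses its bundled Iris dataset; the choice of logistic regression, k-means with three clusters, and a two-component PCA is illustrative only. Reinforcement learning needs an environment to interact with and is left out of this sketch.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Supervised (classification): the labels y are used during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"classification accuracy: {accuracy:.2f}")

# Unsupervised (clustering): only X is given, the labels are never shown.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_

# Unsupervised (dimensionality reduction): 4 features projected down to 2.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)
```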
- Linear Regression: For predicting a continuous output based on input features.
- Logistic Regression: For binary classification problems.
- Decision Trees: A tree-like model used for classification and regression.
- Support Vector Machines (SVM): Used for classification by finding the hyperplane that best separates the classes.
- K-Nearest Neighbors (KNN): A simple, instance-based learning algorithm for classification and regression.
- Naive Bayes: A probabilistic classifier based on Bayes' theorem.
- Neural Networks: Models inspired by the human brain, used for complex pattern recognition.
- Random Forests: An ensemble method using multiple decision trees for improved accuracy.
- Gradient Boosting Machines: Another ensemble technique that builds models sequentially to correct errors of the previous ones.
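Most of the algorithms listed above are available in scikit-learn behind the same `fit` / `score` interface, so trying several of them on one problem takes only a few lines. The sketch below assumes scikit-learn is installed and uses its bundled breast cancer dataset as a stand-in binary classification task; the hyperparameters are defaults or arbitrary picks, not tuned values. Linear regression and neural networks are omitted to keep the example short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale-sensitive models are wrapped in a pipeline that standardizes features first.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)                        # learn from the training split
    test_accuracy[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: {test_accuracy[name]:.3f}")
```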
- Healthcare: Disease diagnosis, personalized treatment plans, and drug discovery.
- Finance: Fraud detection, algorithmic trading, and risk management.
- Retail: Customer segmentation, recommendation systems, and inventory management.
- Marketing: Predictive analytics, customer churn prediction, and sentiment analysis.
- Transportation: Autonomous vehicles, route optimization, and traffic prediction.
- Natural Language Processing (NLP): Language translation, chatbots, and text summarization.
- Computer Vision: Image and video recognition, facial recognition, and object detection.
- Explainable AI (XAI): Developing models that are interpretable and transparent to build trust and ensure ethical use.
- Federated Learning: Training models across decentralized devices or servers while keeping data localized.
- Automated Machine Learning (AutoML): Tools and techniques that automate the end-to-end process of applying machine learning to real-world problems.
- Integration with IoT: Enhancing Internet of Things (IoT) applications with intelligent decision-making capabilities.
- Quantum Machine Learning: Leveraging quantum computing to solve complex problems faster and more efficiently.
Machine learning continues to evolve, offering innovative solutions across various domains and transforming the way we interact with technology.
- Roadmap
- Tutorials or Courses
- Kaggle Competition Source Code
- Books
- Datasets
- GitHub Repositories
- YouTube Channels
- Machine learning forums
- Courses
- Projects
- Interview
- Others
- Conclusion
This is a roadmap you can refer to when starting with machine learning.
Resource Name | Description |
---|---|
Machine Learning Roadmap | This roadmap, provided by Scaler, gives you a clear-cut path for learning machine learning. |
ML Engineer Roadmap | This roadmap gives you a clear-cut path to becoming ready for the ML Engineer job profile. |
Discover a collection of tutorials and courses covering the mathematics, fundamentals, algorithms, and other topics required for machine learning.
Resource Name | Description |
---|---|
Linear Algebra | These video tutorials from Khan Academy cover the fundamentals of linear algebra, including vectors, matrices, transformations, and more. |
Calculus 1 (single variable) | This MIT course gives a comprehensive introduction to the calculus of functions of one variable. It covers the fundamental principles and applications of single-variable calculus, which is essential for advanced studies in mathematics, science, and engineering. |
Calculus 2 (multi variable) | This MIT course focuses on calculus involving multiple variables, an essential area for understanding more complex mathematical models. Topics include vectors and matrices, partial derivatives, multiple integrals, and vector calculus. |
Probability and statistics | This course is provided by MIT and covers the fundamentals of probability and statistics, including random variables, probability distributions, expectation, and inference. It includes lecture notes, assignments, exams, and video lectures. |
Resource Name | Description |
---|---|
Python Fundamentals | This GeeksforGeeks course suits both beginners and coding enthusiasts and covers essential Python fundamentals, including object-oriented programming (OOP), data structures, and Python libraries. |
Python for Data Science | This 12-hour video from freeCodeCamp gives you the fundamental knowledge required for data science with Python, including an introduction to pandas, NumPy, and Matplotlib. |
Data Visualization using Python | This video by Intellipaat gives you a clear understanding of data visualization with Python. It is suitable for both beginners and intermediate-level programmers. |
SQL Fundamentals | This video by freeCodeCamp is a good introduction to SQL (Structured Query Language), covering essential concepts and commands used in database management. It explains the basics of creating, reading, updating, and deleting data within a database. |
SQL for Data Analysis | This resource covers using SQL for data analysis. |
Jupyter Notebook | The Real Python article on Jupyter Notebooks provides an in-depth introduction to using Jupyter Notebooks for data science, Python programming, and interactive computing. The tutorial covers the basics of setting up and running Jupyter Notebooks, including how to install Jupyter via Anaconda or pip, and how to launch and navigate the notebook interface. |
Google Colab | The Google Colab introductory notebook provides a comprehensive guide on how to use Google Colab for interactive Python programming. It covers the basics of creating and running code cells, integrating with Google Drive for storage, and using Colab's powerful computing resources. |
Resource Name | Description |
---|---|
NumPy | This resource introduces NumPy, a fundamental package for scientific computing with Python. |
Pandas | The W3Schools Pandas tutorial offers a good introduction to the Pandas library, a powerful tool for data analysis and manipulation in Python. The tutorial covers a wide range of topics, including how to install Pandas and basic operations such as creating and manipulating DataFrames and Series. |
Matplotlib | The Matplotlib documentation site provides a comprehensive guide to using the pyplot module, which is a part of the Matplotlib library used for creating static, animated, and interactive visualizations in Python. |
TensorFlow | The TensorFlow Tutorials page offers a variety of tutorials designed to help users learn and apply machine learning with TensorFlow. It includes beginner-friendly guides using the Keras API, advanced tutorials on custom training, distributed training, and specialized applications such as computer vision, natural language processing, and reinforcement learning. |
PyTorch | The PyTorch tutorials website provides a comprehensive set of resources for learning and using PyTorch, a popular open-source machine learning library. The tutorials are designed for users at various skill levels, from beginners to advanced practitioners, and cover a wide range of topics. |
Keras | This documentation is a great resource for anyone looking to get started with Keras, a popular deep learning framework. Keras provides a user-friendly interface for building and training deep learning models. Whether you're a beginner or an experienced practitioner, Keras offers a lot of flexibility and ease of use. |
Scikit-learn | The official documentation is a good place to learn Scikit-learn, a library primarily used for machine learning tasks such as classification, regression, clustering, and more. Its simple and efficient tools make it accessible to both beginners and experts in the field. |
Seaborn | Seaborn is an amazing visualization library for statistical graphics plotting in Python. It provides beautiful default styles and color palettes to make statistical plots more attractive. |
Resource Name | Description |
---|---|
Introduction to Machine Learning | This video by Edureka on "Introduction To Machine Learning" will help you understand the basics of machine learning: what it is, when to use it, and how it can be used to solve real-world problems. |
Resource Name | Description |
---|---|
Supervised Learning | The GeeksforGeeks article on supervised machine learning is a good starting point. Their tutorials often break down complex topics into understandable explanations and provide code examples to illustrate concepts. Supervised learning is a fundamental concept in machine learning, where models are trained on labeled data to make predictions or decisions. |
Unsupervised Learning | In this article on GeeksforGeeks, they delve deeper into different types of machine learning, expanding beyond supervised learning to cover unsupervised learning, semi-supervised learning, reinforcement learning, and more. Understanding the various types of machine learning is essential for choosing the right approach for different tasks and problems. |
Reinforcement learning | This GeeksforGeeks article is a good introduction to reinforcement learning (RL). RL has applications in various domains, such as robotics, game playing, recommendation systems, and autonomous vehicle control, among others. |
Resource Name | Description |
---|---|
Data collection - guide | This guide covers data collection for machine learning projects, which is a crucial aspect of building effective machine learning models. Data collection involves gathering, cleaning, and preparing data that will be used to train and evaluate machine learning algorithms. |
Introduction to Data collection | This video by codebasics helps you understand the data collection process by collecting data in real time and gaining some hands-on experience. |
Data collection - video | This video explains where to collect data for machine learning and covers Kaggle, the UCI Machine Learning Repository, and Google Dataset Search. |
Resource Name | Description |
---|---|
Introduction to Data Preparation | This video breaks down the crucial steps and best practices that get your datasets ready for machine learning, from handling missing values and outliers to feature scaling and encoding categorical variables. |
Data Preparation - article | This article from Machine Learning Mastery provides a comprehensive guide on preparing data for machine learning, which includes cleaning, transforming, and organizing data to make it suitable for training machine learning models. |
Data Preparation by Google developers | Google's Machine Learning Data Preparation guide is a valuable resource for understanding best practices and techniques for preparing data for machine learning projects. Effective data preparation is crucial for building accurate and reliable machine learning models. |
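As a small illustration of the preparation steps these resources describe (handling missing values, feature scaling, encoding categorical variables), here is a sketch using pandas. The column names, the toy values, and the `prepare` helper are hypothetical, and median imputation with standardization is only one of several reasonable choices. Outlier handling is not shown.

```python
import pandas as pd

# Hypothetical raw data with one missing value in every column.
raw = pd.DataFrame({
    "age": [25.0, None, 40.0, 31.0],
    "income": [30000.0, 52000.0, None, 45000.0],
    "city": ["Delhi", "Pune", "Delhi", None],
})

def prepare(df):
    """Impute, scale, and encode a copy of the input frame."""
    df = df.copy()
    numeric = ["age", "income"]
    # 1. Missing numeric values: fill with the column median.
    df[numeric] = df[numeric].fillna(df[numeric].median())
    # 2. Missing categories: fill with an explicit placeholder value.
    df["city"] = df["city"].fillna("unknown")
    # 3. Feature scaling: standardize to zero mean and unit variance.
    df[numeric] = (df[numeric] - df[numeric].mean()) / df[numeric].std(ddof=0)
    # 4. Categorical encoding: one 0/1 column per category.
    return pd.get_dummies(df, columns=["city"], dtype=float)

prepared = prepare(raw)
print(prepared.columns.tolist())
```

In a real project the medians, means, and standard deviations should be computed on the training split only and then reused on validation and test data, so that no information leaks from the held-out sets.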
Resource Name | Description |
---|---|
Introduction to Model selection | "A Gentle Introduction to Model Selection for Machine Learning" from Machine Learning Mastery is a useful resource for anyone looking to understand how to choose the right model for their machine learning task. |
Model selection process | This Edureka video on Model Selection and Boosting gives you a step-by-step guide to selecting and boosting your models in machine learning, including the need for model evaluation, resampling techniques, and more. |
Model selection - video | This video explains how to choose the right machine learning model and covers cross-validation, which is used for model selection. |
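Cross-validation, mentioned in the resources above, can be sketched in a few lines with scikit-learn. This example assumes scikit-learn is installed; the Iris dataset, the two candidate models, and the choice of 5 folds are arbitrary picks for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate models to choose between (hypothetical shortlist).
candidates = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "tree": DecisionTreeClassifier(random_state=0),
}

# 5-fold cross-validation: each model is trained on 4 folds and scored on
# the remaining one, rotating until every fold has been held out once.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

best_name = max(cv_scores, key=cv_scores.get)
print(cv_scores, "->", best_name)
```

Averaging over folds gives a steadier estimate than a single train/validation split, which matters most on small datasets.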
Resource Name | Description |
---|---|
Introduction to Model training | The article "Training a Machine Learning Model" from ProjectPro is a useful guide for anyone looking to understand the process of training machine learning models. Training a machine learning model involves feeding it with labeled data to learn patterns and make predictions or decisions. |
Model training - Video | This Edureka video on 'Data Modeling - Feature Engineering' gives a brief introduction to how the model is trained using Machine learning algorithms. |
Model training - Video | This video by Microsoft Azure helps you to understand how to utilize the right compute on Microsoft Azure to scale your training of the model efficiently. |
Resource Name | Description |
---|---|
Introduction to Model Evaluation | This GeeksforGeeks article offers a clear guide on machine learning model evaluation, a crucial step in the machine learning workflow to ensure that models perform well on unseen data. |
Model Evaluation - Article | This Medium article discusses various model evaluation metrics in machine learning, which are crucial for understanding model performance and making informed decisions about model selection and deployment. |
Model Evaluation - Video | This video by AssemblyAI helps you understand the most commonly used evaluation metrics for classification and regression tasks, and more. |
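To show what common evaluation metrics actually compute, here is a dependency-free sketch of accuracy, precision, recall, and F1 for binary classification, plus mean squared error for regression. The function names and the toy labels are made up for illustration; libraries such as scikit-learn provide tested equivalents.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy classification labels (hypothetical).
metrics = classification_metrics(
    y_true=[1, 0, 1, 1, 0, 1, 0, 0],
    y_pred=[1, 0, 0, 1, 0, 1, 1, 0],
)
print(metrics)  # accuracy, precision, recall and f1 are all 0.75 here

# Toy regression values (hypothetical).
mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0])
print(round(mse, 4))  # 0.4167
```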
Resource Name | Description |
---|---|
Introduction to Model Optimization | This article on Aporia's website discusses the basics of machine learning optimization and seven essential techniques used in the process. Understanding these techniques is essential for improving model performance. |
Model Optimization - Article | This article from Towards Data Science is a comprehensive guide on understanding optimization algorithms in machine learning. Optimization algorithms play a crucial role in training machine learning models by iteratively adjusting model parameters to minimize a loss function. |
Model Optimization - Video | This beginner-friendly video by Brandon Rohrer gives you a brief understanding of how optimization for machine learning works, and more. |
Resource Name | Description |
---|---|
Introduction to Model Deployment - Article | This article on Built In discusses model deployment in the context of machine learning. Model deployment is a crucial step in the machine learning lifecycle, where the trained model is deployed into production to make predictions or decisions on new data. |
Model Deployment Strategies | This article from Towards Data Science focuses on machine learning model deployment strategies, which are crucial for ensuring that trained models can be effectively deployed and used in real-world applications. |
Model Deployment | This video by Microsoft Azure helps you to understand the various deployment options and optimizations for large-scale model inferencing. |
These are some machine learning algorithms you can learn.
Resource Name | Description |
---|---|
Linear Regression-1, Linear Regression-2 | These two videos from the Tech With Tim channel give a clear explanation of the linear regression model, one of the most basic models in machine learning. |
Logistic Regression | This video by codebasics gives you a brief understanding of logistic regression and shows how to use the sklearn logistic regression class. It ends with an exercise for you to solve. |
Gradient Descent | This video teaches a few important machine learning concepts, such as the cost function, gradient descent, learning rate, and mean squared error. It also shows how to write Python code that implements gradient descent for linear regression. |
Support Vector Machines | This video gives a comprehensive overview of the support vector classifier (SVC). It covers parameters such as gamma and regularization, and shows how to fine-tune an SVM classifier using them. |
Naive Bayes-1, Naive Bayes-2 | These two videos by codebasics give a brief introduction to Naive Bayes and show how to build this beginner-level model with Python and the sklearn library. |
K Nearest Neighbors | This video explains how the K nearest neighbors algorithm works and shows how to write Python code with the sklearn library to build a KNN model for hands-on experience. |
Decision Trees | This video walks through an employee salary prediction problem using a decision tree and teaches you how to use the sklearn class to apply the decision tree model in Python. |
Random Forest | This video teaches you about random forest, a popular regression and classification algorithm, and shows how to solve a problem using sklearn's RandomForestClassifier in Python. |
KMeans Clustering | This video gives a comprehensive overview of the K-means clustering algorithm, an unsupervised machine learning technique used to cluster data points, and shows how to solve a clustering problem using sklearn's KMeans in Python. |
Neural Network | This video provides a comprehensive introduction to neural networks, covering fundamental concepts, training processes, and practical applications. It explains forward and backward propagation, deep learning techniques, and the use of convolutional neural networks (CNNs) for image processing. Additionally, it demonstrates implementing neural networks using Python, TensorFlow, and other libraries, including examples such as stock price prediction and image classification. |
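The gradient descent entry in the table above mentions the cost function, learning rate, and mean squared error. The sketch below ties those terms together by fitting a line `y = m * x + b` with NumPy. It is an illustrative implementation under assumed values (learning rate 0.05, 2000 epochs, noise-free toy data), not the code from the linked video.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.05, epochs=2000):
    """Fit y = m * x + b by minimizing the mean squared error (the cost)."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        y_pred = m * x + b
        # Partial derivatives of cost = mean((y - y_pred) ** 2) w.r.t. m and b.
        dm = -(2 / n) * np.sum(x * (y - y_pred))
        db = -(2 / n) * np.sum(y - y_pred)
        # Step against the gradient; the learning rate sets the step size.
        m -= learning_rate * dm
        b -= learning_rate * db
    cost = np.mean((y - (m * x + b)) ** 2)
    return m, b, cost

# Toy data generated from y = 2x + 3, so the fit should recover m ≈ 2 and b ≈ 3.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 3

m, b, cost = gradient_descent(x, y)
print(f"m={m:.3f}, b={b:.3f}, cost={cost:.2e}")
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small needs many more epochs to converge; the value here was picked to suit this particular toy data.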
Machine learning libraries and tools for Python that you can learn.
Resource Name | Description |
---|---|
XAD | Fast and easy-to-use backpropagation tool. |
Aim | An easy-to-use & supercharged open-source AI metadata tracker. |
RexMex | A general-purpose recommender metrics library for fair evaluation. |
ChemicalX | A PyTorch based deep learning library for drug pair scoring. |
Microsoft ML for Apache Spark | A distributed machine learning framework for Apache Spark. |
Shapley | A data-driven framework to quantify the value of classifiers in a machine learning ensemble. |
igel | A delightful machine learning tool that allows you to train/fit, test and use models without writing code. |
ML Model building | A repository containing Classification, Clustering, Regression, and Recommender Notebooks with illustrations. |
ML/DL project template | A template for deep learning projects using PyTorch Lightning. |
PyTorch Frame | A Modular Framework for Multi-Modal Tabular Learning. |
PyTorch Geometric | Graph Neural Network Library for PyTorch. |
PyTorch Geometric Temporal | A temporal extension of PyTorch Geometric for dynamic graph representation learning. |
Little Ball of Fur | A graph sampling extension library for NetworkX with a Scikit-Learn like API. |
Karate Club | An unsupervised machine learning extension library for NetworkX with a Scikit-Learn like API. |
Auto_ViML | Automatically Build Variant Interpretable ML models fast! Comprehensive Python AutoML toolkit. |
PyOD | Python Outlier Detection toolkit for detecting outlying objects in multivariate data. |
steppy | Lightweight Python library for fast and reproducible machine learning experimentation. |
steppy-toolkit | Curated collection of neural networks, transformers, and models for efficient machine learning. |
CNTK | Microsoft Cognitive Toolkit (CNTK), an open-source deep-learning toolkit. |
Couler | Unified interface for constructing and managing machine learning workflows on different engines. |
auto_ml | Automated machine learning for production and analytics. |
dtaidistance | High performance library for time series distances (DTW) and clustering. |
einops | Deep learning operations reinvented for pytorch, tensorflow, jax, and others. |
machine learning | Automated build consisting of a web-interface and programmatic-interface API for support vector machines. |
XGBoost | Python bindings for eXtreme Gradient Boosting (Tree) Library. |
ChefBoost | A lightweight decision tree framework for Python with categorical feature support and advanced techniques. |
Apache SINGA | An Apache Incubating project for developing an open source machine learning library. |
Resource Name | Description |
---|---|
DataComPy | A library to compare Pandas, Polars, and Spark data frames with stats and match accuracy adjustment. |
DataVisualization | A GitHub repository to learn data visualization basics to intermediate levels. |
Cartopy | A Python package for geospatial data processing and map production. |
SciPy | A Python-based ecosystem for mathematics, science, and engineering. |
NumPy | A fundamental package for scientific computing with Python. |
AutoViz | Automatic visualization of any dataset with a single line of Python code. |
Numba | Python JIT (just in time) compiler to LLVM aimed at scientific Python. |
Mars | A tensor-based framework for large-scale data computation. |
NetworkX | A high-productivity software for complex networks. |
igraph | Binding to igraph library - General purpose graph library. |
Pandas | High-performance, easy-to-use data structures and data analysis tools for Python. |
ParaMonte | Python library for Bayesian data analysis and visualization via Monte Carlo and MCMC simulations. |
Vaex | High performance Python library for lazy Out-of-Core DataFrames, suitable for big tabular datasets. |
PyTables (tables) | Manage hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data. |
PyTorch Geometric | Library for deep learning on irregular input data such as graphs, point clouds, and manifolds. |
bqplot | An API for plotting in Jupyter (IPython). |
bokeh | Interactive Web Plotting for Python. |
plotly | Collaborative web plotting for Python and matplotlib. |
altair | A Python to Vega translator for visualization. |
d3py | A plotting library for Python based on D3.js. |
PyDexter | Simple plotting for Python; wrapper for D3xterjs to render charts in-browser. |
ggplot | Same API as ggplot2 for R (Deprecated). |
ggfortify | Unified interface to ggplot2 popular R packages. |
Kartograph.py | Rendering beautiful SVG maps in Python. |
pygal | A Python SVG Charts Creator. |
PyQtGraph | A pure-python graphics and GUI library built on PyQt4 / PySide and NumPy. |
Resource Name | Description |
---|---|
Scikit-Image | A collection of algorithms for image processing in Python. |
Scikit-Opt | Swarm Intelligence in Python, including Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm. |
SimpleCV | An open-source computer vision framework that gives access to several high-powered computer vision libraries, such as OpenCV. Written in Python and runs on Mac, Windows, and Ubuntu Linux. |
Vigranumpy | Python bindings for the VIGRA C++ computer vision library. |
OpenFace | Free and open-source face recognition with deep neural networks. |
PCV | Open-source Python module for computer vision. [Deprecated] |
face_recognition | Face recognition library that recognizes and manipulates faces from Python or from the command line. |
deepface | A lightweight face recognition and facial attribute analysis (age, gender, emotion, and race) framework for Python, covering cutting-edge models such as VGG-Face, FaceNet, OpenFace, DeepFace, DeepID, Dlib, and ArcFace. |
Resource Name | Description |
---|---|
pkuseg-python | A better version of Jieba, developed by Peking University for Chinese word segmentation. |
NLTK | A leading platform for building Python programs to work with human language data. |
Pattern | A web mining module for the Python programming language. It has tools for natural language processing, machine learning, and more. |
Quepy | A Python framework to transform natural language questions into database queries. |
TextBlob | Provides a consistent API for diving into common natural language processing (NLP) tasks. Built on top of NLTK and Pattern. |
YAlign | A sentence aligner tool for extracting parallel sentences from comparable corpora. [Deprecated] |
jieba | Chinese word segmentation utility. |
SnowNLP | A library for processing Chinese text. |
spammy | A library for email spam filtering built on top of NLTK. |
Resource Name | Description |
---|---|
Kinho | Simple API for Neural Network, better for image processing with CPU/GPU + Transfer Learning. |
nn_builder | A Python package that lets you build neural networks in one line. |
NeuralTalk | A Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences. |
NeuralTalk2 | A Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences. [Deprecated] |
Neuron | A simple class for time series predictions utilizing various neural networks learned with Gradient descent or Levenberg–Marquardt algorithm. [Deprecated] |
Data Driven Code | A simple implementation of neural networks for dummies in Python without using any libraries, with detailed comments. |
Machine Learning, Data Science and Deep Learning with Python | LiveVideo course that covers machine learning, TensorFlow, artificial intelligence, and neural networks. |
TResNet | TResNet models designed and optimized to give the best speed-accuracy tradeoff on GPUs. |
Neurolab | A simple and powerful neural network library for Python with a variety of supported types of Artificial Neural Network and learning algorithms. |
Jina AI | An easier way to build neural search in the cloud, compatible with Jupyter Notebooks. |
sequitur | PyTorch library for creating and training sequence autoencoders in just two lines of code. |
Machine learning using R.
Resource Name | Description |
---|---|
Clever Algorithms For Machine Learning | Collection of machine learning algorithms implemented in various languages, including R. |
CORElearn | Package for classification, regression, feature evaluation, and ordinal evaluation. |
Cubist | Rule- and instance-based regression modeling. |
e1071 | Miscellaneous functions of the Department of Statistics (e1071), TU Wien. |
earth | Multivariate adaptive regression spline models. |
elasticnet | Elastic-net for sparse estimation and sparse PCA. |
ElemStatLearn | Data sets, functions, and examples from "The Elements of Statistical Learning". |
evtree | Evolutionary learning of globally optimal trees. |
forecast | Time series forecasting using various models including ARIMA, ETS, TBATS. |
forecastHybrid | Automatic ensemble and cross validation of time series models. |
fpc | Flexible procedures for clustering. |
frbs | Fuzzy rule-based systems for classification and regression tasks. [Deprecated] |
XGBoost.R | R binding for eXtreme Gradient Boosting (Tree) Library. |
Optunity | A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly to R. |
igraph | Binding to igraph library - General purpose graph library. |
MXNet | Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, JavaScript, and more. |
TDSP-Utilities | Two data science utilities in R from Microsoft: 1) Interactive Data Exploration, Analysis, and Reporting (IDEAR); 2) Automated Modelling and Reporting (AMR). |
GAMBoost | Generalized linear and additive models by likelihood-based boosting. [Deprecated] |
gamboostLSS | Boosting methods for generalized additive models for location, scale, and shape. |
gbm | Generalized boosted regression models. |
glmnet | Lasso and elastic-net regularized generalized linear models. |
glmpath | L1 regularization path for generalized linear models and Cox proportional hazards model. |
GMMBoost | Likelihood-based boosting for generalized mixed models. [Deprecated] |
grplasso | Fitting user-specified models with group Lasso penalty. |
grpreg | Regularization paths for regression models with grouped covariates. |
h2o | Framework for fast, parallel, and distributed machine learning algorithms at scale. |
hda | Heteroscedastic discriminant analysis. [Deprecated] |
Introduction to Statistical Learning | Book covering statistical learning methods, useful for practical applications. |
ipred | Improved predictors for classification and regression tasks. |
kernlab | Kernel-based machine learning lab for support vector machines and kernel methods. |
klaR | Classification and visualization techniques. |
L0Learn | Fast algorithms for best subset selection in regression models. |
Resource Name | Description |
---|---|
dplyr | A data manipulation package that helps solve common data manipulation problems. |
ggplot2 | A data visualization package based on the grammar of graphics. |
tmap and leaflet | tmap for visualizing geospatial data with static maps and leaflet for interactive maps. |
tm and quanteda | Main packages for managing, analyzing, and visualizing textual data. |
shiny | Basis for interactive displays and dashboards in R. |
htmlwidgets, including plotly, dygraphs, highcharter, etc. | Brings JavaScript libraries for interactive visualizations to R. |
Kaggle competition source code and experiment results.
Repository | Description |
---|---|
open-solution-home-credit | Source code and experiment results for Home Credit Default Risk competition. |
open-solution-googleai-object-detection | Source code and experiment results for Google AI Open Images - Object Detection Track competition. |
open-solution-salt-identification | Source code and experiment results for TGS Salt Identification Challenge. |
open-solution-ship-detection | Source code and experiment results for Airbus Ship Detection Challenge. |
open-solution-data-science-bowl-2018 | Source code and experiment results for 2018 Data Science Bowl. |
open-solution-value-prediction | Source code and experiment results for Santander Value Prediction Challenge. |
open-solution-toxic-comments | Source code for Toxic Comment Classification Challenge. |
wiki challenge | Implementation of Dell Zhang's solution to Wikipedia's Participation Challenge. |
kaggle insults | Kaggle Submission for "Detecting Insults in Social Commentary". |
kaggle_acquire-valued-shoppers-challenge | Code for the Kaggle acquire valued shoppers challenge. |
kaggle-cifar | Code for the CIFAR-10 competition at Kaggle using cuda-convnet. |
kaggle-blackbox | Deep learning made easy for Kaggle competitions. |
kaggle-accelerometer | Code for Accelerometer Biometric Competition at Kaggle. |
kaggle-advertised-salaries | Predicting job salaries from ads - a Kaggle competition. |
kaggle-amazon | Amazon access control challenge at Kaggle. |
kaggle-bestbuy_big | Code for the Best Buy competition at Kaggle. |
kaggle-bestbuy_small | Code for the Best Buy competition at Kaggle (small version). |
Kaggle Dogs vs. Cats | Code for Kaggle Dogs vs. Cats competition. |
Kaggle Galaxy Challenge | Winning solution for the Galaxy Challenge on Kaggle. |
Kaggle Gender | A Kaggle competition: discriminate gender based on handwriting. |
Kaggle Merck | Merck challenge at Kaggle. |
Kaggle Stackoverflow | Predicting closed questions on Stack Overflow. |
Discover a diverse collection of valuable books for Machine Learning.
Resource Name | Description | Cost |
---|---|---|
Hands-On Machine Learning with Scikit-Learn and TensorFlow | A popular book by Aurélien Géron that covers machine learning concepts and practical implementations using Scikit-Learn and TensorFlow. | Free |
The Hundred-Page Machine Learning Book | Authored by Andriy Burkov, this book provides a concise yet comprehensive overview of machine learning concepts and techniques. It is highly regarded for its accessibility and clarity, making it a valuable resource for both beginners and experienced practitioners. | Free |
Data Mining: Practical Machine Learning Tools and Techniques | Authored by Ian H. Witten, Eibe Frank, and Mark A. Hall, this book provides a comprehensive overview of data mining and machine learning. It is widely regarded as an essential resource for students, researchers, and practitioners in the field. | Free |
Distributed Machine Learning Patterns | This book teaches you how to take machine learning models from your personal laptop to large distributed clusters. You’ll explore key concepts and patterns behind successful distributed machine learning systems, and learn technologies like TensorFlow, Kubernetes, Kubeflow, and Argo Workflows directly from a key maintainer and contributor, with real-world scenarios and hands-on projects. | Paid |
Grokking Machine Learning | Grokking Machine Learning teaches you how to apply ML to your projects using only standard Python code and high school-level math. | Paid |
Machine Learning Bookcamp | Learn the essentials of machine learning by completing a carefully designed set of real-world projects. | Paid |
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow | Through a recent series of breakthroughs, deep learning has boosted the entire field of machine learning. Now, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This bestselling book uses concrete examples, minimal theory, and production-ready Python frameworks (Scikit-Learn, Keras, and TensorFlow) to help you gain an intuitive understanding of the concepts and tools for building intelligent systems. | Paid |
Machine Learning in Action | A comprehensive guide to implementing machine learning algorithms with real-world examples. | Paid |
Machine Learning Engineering in Action | Practical guide to machine learning engineering practices and deployment. | Paid |
Machine Learning in Action: A Primer for the Layman, Step by Step Guide for Newbies | An introductory guide for beginners to understand and apply machine learning concepts. | Paid |
Real-World Machine Learning | Focuses on applying machine learning techniques to real-world problems. | Paid |
Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers | Discusses theories, concepts, and practical applications of machine learning for engineers. | Free |
Bayesian Optimization in Action | Guide to applying Bayesian optimization techniques in real-world scenarios. | Free |
An Introduction to Statistical Learning: With Applications in R | Introductory text on statistical learning with practical applications in R. | Free |
Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies | Comprehensive overview of machine learning algorithms with worked examples and case studies. | Free |
Machine Learning For Dummies | Beginner-friendly introduction to machine learning concepts and applications. | Free |
Quantum Machine Learning: What Quantum Computing Means to Data Mining | Explores the intersection of quantum computing and machine learning. | Paid |
These are some datasets that can help you practice machine learning.
Resource Name | Description |
---|---|
Kaggle Datasets | Kaggle Datasets is a platform where users can explore, access, and share datasets for a wide range of topics and purposes. Kaggle is a popular community-driven platform for data science and machine learning competitions, and its Datasets section extends its offerings to provide access to a diverse collection of datasets contributed by users worldwide. |
Microsoft Datasets & Tools | Microsoft Research Tools is a platform offering a diverse range of tools, datasets, and resources for researchers and developers. These tools are designed to facilitate various aspects of research, including data analysis, machine learning, natural language processing, computer vision, and more. |
Google Datasets | Google Dataset Search is a tool provided by Google that allows users to search for datasets across a wide range of topics and domains. It helps researchers, data scientists, journalists, and other users discover datasets that are relevant to their interests or research needs. |
Awesome Data Repo | This GitHub repo is a curated list of publicly available datasets covering a wide range of topics and domains. This repository serves as a valuable resource for researchers, data scientists, developers, and anyone else interested in accessing and working with real-world datasets. |
UCI Datasets | The UCI Machine Learning Repository is a collection of datasets for machine learning research and experimentation. It is maintained by the Center for Machine Learning and Intelligent Systems at the University of California, Irvine (UCI). |
Data.gov | Data.gov, a US government website, is invaluable for machine learning enthusiasts with its vast collection of nearly 300,000 datasets. It provides high-quality, reliable training data from various sectors, enabling innovative applications in public health, economics, and environmental science. The open data is freely available, eliminating licensing costs and allowing unrestricted use. Its authoritative sources ensure improved accuracy and reliability in machine learning models. |
Knoema – Home | Comprehensive data platform offering a variety of datasets for different industries and research purposes. |
Public Data Sets : Amazon Web Services | Collection of public datasets hosted on AWS, covering various domains such as genomics, climate, and more. |
Socrata | Platform providing access to open data from government and public sector organizations. |
Data Publica – Les données pour votre business | Resource offering datasets for business analytics and insights. |
Archive-It – Web Archiving Services for Libraries and Archives | Web archiving service that provides access to a wide range of archived data from various sources. |
Freebase | Community-curated database of structured data across various topics. |
Google Public Data Explorer | Tool by Google that allows users to explore, visualize, and share datasets from a variety of sources. |
Zanran Numerical Data Search | Search engine focused on finding numerical data and statistics from the web. |
Quandl – Intelligent Search for Numerical Data | Platform offering financial, economic, and alternative datasets for analysis and research. |
IMF Data and Statistics | International Monetary Fund's data and statistics on global economic indicators. |
Data – The World Bank | World Bank's open data platform providing access to global development data. |
OECD.Stat | OECD's database with a wide range of economic, social, and environmental statistics. |
mldata :: Welcome | Repository of machine learning datasets for research and development. |
UCI Machine Learning Repository: Data Sets | Popular repository of machine learning datasets used for empirical research. |
Google Dataset Search | Google's tool to help researchers locate online datasets for various topics. |
Registry of Open Data on AWS | Amazon-hosted datasets covering areas such as genomics, satellite imagery, and population statistics. |
A list of over 1,000 datasets available in R packages | Curated list of datasets available in R packages for statistical analysis and research. |
curran/data | Collection of public datasets, primarily in text format, hosted on GitHub. |
Tidy Tuesday | Weekly social data project in R with curated datasets for analysis and visualization. |
dsbox | Datasets for data science practice, homework, and projects, provided by Data Science in the Box. |
dslabs | Datasets and functions for data analysis practice in data science courses and workshops. |
These are some GitHub repositories you can refer to.
Resource Name | Description |
---|---|
ML-for-Beginners by Microsoft | The GitHub repository "ML-For-Beginners" is an educational resource provided by Microsoft, aimed at beginners who are interested in learning about machine learning (ML) concepts and techniques. |
Machine Learning Tutorial | The GitHub repository "Machine-Learning-Tutorials" by ujjwalkarn is a comprehensive collection of tutorials, resources, and educational materials for individuals interested in learning about machine learning (ML). |
ML by Zoomcamp | This GitHub repository by DataTalksClub is a collection of materials and resources associated with the Machine Learning Zoomcamp, an educational initiative aimed at teaching machine learning concepts and techniques through live Zoom sessions. |
ML YouTube Courses | This GitHub repository is a collection of resources related to machine learning (ML) courses available on YouTube, and provides links to the YouTube videos or playlists for each course, making it easy for learners to access the course content directly from YouTube. |
Explore amazing YouTubers specializing in machine learning.
Resource Name | Description |
---|---|
Deep Learning AI | The YouTube channel "Deeplearning.ai" hosts a variety of educational content related to artificial intelligence (AI) and machine learning (ML) created by Andrew Ng and his team at Deeplearning.ai. |
Machine Learning with Phil | The YouTube channel "Machine Learning with Phil" offers tutorials on machine learning and deep learning, with a focus on reinforcement learning. |
sentdex | The YouTube channel "sentdex," hosted by Harrison Kinsley, offers a diverse range of educational content primarily focused on Python programming, machine learning, game development, hardware projects, robotics, and more. |
Abhishek Thakur | The YouTube channel "Abhishek Thakur (Abhi)" is hosted by Abhishek Thakur, a well-known figure in the machine learning and data science community. The channel is primarily about machine learning. |
Dataschool | The YouTube channel "Data School," hosted by Kevin Markham, offers tutorials and resources on data science, machine learning, and Python programming, covering topics such as data manipulation with pandas and data visualization with Matplotlib and Seaborn. |
codebasics | The YouTube channel "codebasics" offers a variety of tutorials and resources focused on programming, data science, machine learning, and artificial intelligence. |
Here are some communities and forums where you can discuss machine learning, ask questions, and share what you learn.
Resource Name | Description |
---|---|
Machine learning - reddit | The subreddit r/MachineLearning is a popular online community on Reddit dedicated to discussions, news, research, and resources related to machine learning and artificial intelligence. |
Machine learning discussions - kaggle | The Kaggle Discussions forum is a community-driven platform where data scientists, machine learning practitioners, and enthusiasts engage in discussions, seek help, share insights, and collaborate on projects related to data science and machine learning. |
Machine learning Q/A - stack overflow | The "machine-learning" tag on Stack Overflow is a popular destination for developers, data scientists, and machine learning practitioners seeking assistance, sharing insights, and discussing topics related to machine learning. |
Machine learning organisations - DEV community | Articles related to "machine learning" from organizations on DEV Community, a community-driven platform where developers share their knowledge, experiences, and insights through articles, discussions, and tutorials. |
Machine learning communities - IBM | The IBM Community for AI and Data Science provides a valuable platform for professionals and enthusiasts to learn, collaborate, and stay informed about the latest developments in artificial intelligence, data science, and related fields. |
These are some valuable resources for learning machine learning.
Resource Name | Description |
---|---|
Machine learning by Edureka | This YouTube playlist by Edureka covers machine learning from beginner to advanced level, free of charge. |
Machine learning with python by Freecodecamp | The "Machine Learning with Python" course on freeCodeCamp is a valuable resource for anyone interested in machine learning using Python. It offers a structured path to learn machine learning concepts and develop practical skills through hands-on projects and exercises. |
Machine learning by University of Washington | This course on Coursera provides a high-quality learning experience for individuals who want to dive deep into the field of machine learning and acquire practical skills that are in high demand in today's job market. |
Post Graduate Programme in Machine Learning & AI by upGrad | This ML program offered by upGrad in collaboration with IIIT Bangalore is designed to provide students with a comprehensive education in machine learning and artificial intelligence, preparing them for careers in this rapidly growing field. |
Machine learning with python by MIT | This edX course, "Machine Learning with Python: from Linear Models to Deep Learning," is offered by the Massachusetts Institute of Technology (MIT). |
These projects help you gain hands-on experience in building machine learning models.
Resource Name | Description |
---|---|
Disease Prediction Using Machine Learning | Project on predicting diseases using machine learning techniques. |
ML – Heart Disease Prediction Using Logistic Regression | Implementation of heart disease prediction using logistic regression. |
Prediction of Wine Type using Deep Learning | Project on predicting the type of wine using deep learning techniques. |
Parkinson’s Disease Prediction using Machine Learning in Python | Project on predicting Parkinson's disease using machine learning in Python. |
ML – Kaggle Breast Cancer Wisconsin Diagnosis using Logistic Regression | Breast cancer diagnosis project using logistic regression on Kaggle dataset. |
ML – Cancer cell classification using Scikit-learn | Cancer cell classification project using Scikit-learn. |
ML – Kaggle Breast Cancer Wisconsin Diagnosis using KNN and Cross-Validation | Breast cancer diagnosis project using KNN and cross-validation on Kaggle dataset. |
Autism Prediction using Machine Learning | Project on predicting autism using machine learning techniques. |
Credit Card Fraud Detection | Project on detecting credit card fraud using machine learning. |
Dogecoin Price Prediction with Machine Learning | Project on predicting Dogecoin price using machine learning techniques. |
Zillow Home Value (Zestimate) Prediction in ML | Project on predicting Zillow home values using machine learning. |
Bitcoin Price Prediction using Machine Learning in Python | Project on predicting Bitcoin price using machine learning in Python. |
Sales Forecast Prediction – Python | Project on predicting sales forecasts using Python. |
Customer Segmentation using Unsupervised Machine Learning in Python | Project on customer segmentation using unsupervised machine learning in Python. |
Analyzing Selling Price of Used Cars using Python | Project on analyzing the selling price of used cars using Python. |
Resource Name | Description |
---|---|
Movie Recommender System | Project on building a movie recommender system using various methods and algorithms in Python. |
House Pricing Prediction | Project on predicting house prices using different machine learning models. |
Sentiment Analysis | Project on analyzing sentiment in e-commerce product reviews and ranking them accordingly. |
Interest Rate Prediction | Project on predicting interest rates for rental listings using machine learning techniques. |
Resource Name | Description |
---|---|
Multiclass Image Classification using Transfer Learning | Advanced project on multiclass image classification using transfer learning techniques. |
Image Caption Generator using Deep Learning on Flickr8K Dataset | Project on generating image captions using deep learning models on the Flickr8K dataset. |
FaceMask Detection using TensorFlow in Python | Project on detecting face masks using TensorFlow in Python. |
Coupon Purchase Prediction | Project on predicting coupon purchases using machine learning techniques. |
Loan Eligibility Prediction | Project on predicting loan eligibility using advanced analytics and machine learning. |
Inventory Demand Forecasting | Project on forecasting inventory demand using machine learning models. |
Passenger Survival Prediction | Project on predicting passenger survival using machine learning techniques. |
These are some interview preparation resources.
Resource Name | Description |
---|---|
Machine Learning Interview questions by GeeksforGeeks | This GeeksforGeeks article covers machine learning interview questions for both freshers and experienced candidates. It is also useful for anyone who wants a quick revision of machine learning concepts. |
How to crack Machine Learning Interviews at FAANG! - Medium | In this article, Bharathi Priya shares her machine learning interview experience, lists the questions she was asked, and offers tips and tricks for cracking machine learning interviews. |
These are some other resources you can refer to.
Resource Name | Description |
---|---|
O'Reilly Data Show podcast | The O'Reilly Data Show Podcast, hosted on the O'Reilly Radar platform, is a podcast series dedicated to exploring various topics in data science, machine learning, artificial intelligence, and related fields. |
TWIML AI podcast | The TWIML AI Podcast, hosted on the TWIML AI platform, is a podcast series focused on exploring the latest developments, trends, and innovations in the fields of machine learning and artificial intelligence. |
Talk Python | "Talk Python to Me" provides a valuable platform for Python enthusiasts, developers, and learners to stay informed, inspired, and connected within the vibrant and growing Python community. |
Practical AI | The Practical AI podcast covers practical applications of AI and ML technologies, with informative and engaging content to help you stay informed and inspired in this rapidly evolving field. |
The Talking Machines | The "Talking Machines" podcast provides informative and engaging content for anyone who wants to stay informed, inspired, and engaged in the dynamic field of machine learning. |
Machine Hack | MachineHack is an online platform that offers data science and machine learning competitions. It provides a collaborative environment for data scientists, machine learning practitioners, and enthusiasts to solve real-world business problems through predictive modeling and data analysis. |
Machine learning is an exciting and rapidly evolving field that offers endless opportunities for innovation and discovery. Its ability to analyze vast amounts of data and uncover patterns makes it indispensable for various applications, from predictive analytics and natural language processing to computer vision and autonomous systems. The wealth of libraries and frameworks available, such as TensorFlow, PyTorch, and scikit-learn, empowers developers and data scientists to build sophisticated models with relative ease. A strong community provides extensive resources, including tutorials, forums, and documentation, to support learners and professionals alike. To truly excel in machine learning, consistent practice is essential—engage in coding challenges, contribute to open-source projects, and apply your knowledge to real-world problems. This hands-on experience not only hones your skills but also opens doors to numerous career opportunities in tech, research, and beyond.
Never stop learning!