First, check this article on Medium; it will provide you with a lot of information: here
- Variables
- Numbers
- Strings
- Lists
- Dictionaries
- Sets
- Tuples
- Control Structures
- If conditions
- For loops
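A minimal sketch that touches each topic in the list above. All names and values (`monthly_income`, `expenses`, and so on) are invented for illustration and are not taken from the linked article.

```python
# Variables and numbers
monthly_income = 3000            # int
savings_rate = 0.2               # float
savings = monthly_income * savings_rate

# Strings
name = "Ada"
greeting = f"Hello, {name}!"

# Lists are ordered and mutable
expenses = [1200, 300, 150]
expenses.append(50)

# Dictionaries map keys to values
budget = {"rent": 1200, "food": 300}
budget["transport"] = 150

# Sets keep only unique items, so the repeated "rent" is stored once
categories = {"rent", "food", "rent"}

# Tuples are ordered and immutable
point = (3, 4)

# If conditions and for loops: collect every expense of at least 300
large_expenses = []
for amount in expenses:
    if amount >= 300:
        large_expenses.append(amount)

print(greeting, savings, large_expenses, sorted(categories), point)
```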
Resource: Exploring the Power of Python: Part 1
- Functions
- Lambda Functions
- Module Management (pip install)
- File Operations (read/write)
- Object-Oriented Programming
- Classes
- Objects
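The topics above can be combined in one short sketch. The `Account` class, the `total` function, and the file name are hypothetical and only loosely inspired by the finance-tracker project; this is not the project's solution.

```python
import os
import tempfile


def total(amounts):
    """Return the sum of an iterable of amounts."""
    return sum(amounts)


# A lambda is a small anonymous function, handy as a sort key.
by_amount = lambda item: item[1]


class Account:
    """A tiny class that stores (label, amount) transactions."""

    def __init__(self, owner):
        self.owner = owner
        self.transactions = []

    def add(self, label, amount):
        self.transactions.append((label, amount))

    def balance(self):
        return total(amount for _, amount in self.transactions)


account = Account("Ada")           # an object is an instance of a class
account.add("salary", 3000)
account.add("rent", -1200)
account.add("food", -300)

largest_first = sorted(account.transactions, key=by_amount, reverse=True)

# File operations: write the transactions to a temporary file, then read them back.
path = os.path.join(tempfile.mkdtemp(), "transactions.txt")
with open(path, "w", encoding="utf-8") as f:
    for label, amount in account.transactions:
        f.write(f"{label},{amount}\n")

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(account.balance(), largest_first[0], lines)
```

Third-party modules are installed from the command line with `pip install <package>`; the sketch above needs only the standard library.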
Resource: Exploring the Power of Python: Part 2
Note: Complete the project on your own first, then compare your work with the given solution.
Project: Personal Finance Tracker and Budget Planner
Solution: Solution of the Project
Additional resource: all the built-in functions
- NumPy Array Basics
- Array Inspection
- Array Operations
- Working with NumPy Arrays
- NumPy for Data Cleaning
- NumPy for Statistical Analysis
- Advanced NumPy Techniques
Resource: NumPy on Medium
Project: MovieLens Project
Solution: Solution of the Project
Additional resource: summary about NumPy
- What is Pandas
- Installation and Setup
- How to install Pandas
- Setting up the environment
- Pandas Data Structures
- Series: Basics and creation
- DataFrame: Basics, creation, and operations
- Data Importing and Exporting
- Reading data from different sources (CSV, Excel, etc.)
- Writing data to files
- Basic Data Operations
- Data selection and filtering
- Data sorting
- Handling missing values
- Data Aggregation and Grouping
- Group by operations
- Aggregate functions (sum, mean, etc.)
Resource: Pandas Part 1
- Advanced Data Selection
- Data Transformation
- Time Series Analysis in Pandas
- Performance Enhancement Techniques
Resource: Pandas Part 2
Project: Description of the project
Solution: Kaggle
Additional resource: time and categorical data
- Basic Plotting
- Plot Types
- Multiple Subplots
- 1.1 Creating Multiple Plots in a Single Figure
- 1.2 Combining Different Types of Plots
- Advanced Features
Additional resource: Article
Additional resource: Jupyter Notebook. This file contains a wide range of techniques for better visualization.
- scatter plots
- line plots
- bar plots
- histograms
- density plots
- box plots
- violin plots
- heatmaps
Resource: DataCamp article
- pair plots
- joint plots
- facet grids
- Customizing Seaborn plots
- Changing Color Palettes
- Adjusting Figure Size
- Adding Annotations
Resource: DataCamp article
Today, we will explore various probability distributions and their visualizations using the Seaborn library.
- Normal Distribution
- Binomial Distribution
- Poisson Distribution
- Uniform Distribution
- Logistic Distribution
- Multinomial Distribution
- Exponential Distribution
- Chi Square Distribution
- Rayleigh Distribution
- Pareto Distribution
- Zipf Distribution
Resource: W3Schools article
Description here
Project: Jupyter Notebook
- Data Inspection.
- Handling missing values.
- Data Imputation
We will cover all of this throughout this project:
- Unit 1: Analyzing Categorical Data
  - Topics include analyzing one categorical variable, two-way tables, and distributions in two-way tables.
- Unit 2: Displaying and Comparing Quantitative Data
  - Covers displaying quantitative data with graphs, describing and comparing distributions, and more on data displays.
- Unit 3: Summarizing Quantitative Data
  - Focuses on measuring center in quantitative data, interquartile range, variance, and standard deviation.
- Unit 4: Modeling Data Distributions
  - Includes topics like percentiles, z-scores, density curves, and normal distributions.
- Unit 5: Exploring Bivariate Numerical Data
  - Discusses scatterplots, correlation coefficients, trend lines, and regression.
- Unit 6: Study Design
  - Covers statistical questions, sampling methods, types of studies, and experiments.
- Unit 7: Probability
  - Topics include theoretical probability, set operations, experimental probability, and rules of probability.
- Unit 8: Counting, Permutations, and Combinations
  - Focuses on the counting principle, permutations, combinations, and combinatorics.
- Unit 9: Random Variables
  - Discusses discrete and continuous random variables, transforming and combining random variables, binomial and geometric distributions, and more.
- Unit 10: Sampling Distributions
  - Covers the concept of sampling distributions, including distributions of sample proportions and means.
- Unit 11: Confidence Intervals
  - Introduces confidence intervals and covers how to estimate population proportions and means.
- Unit 12: Significance Tests (Hypothesis Testing)
  - Explores the idea of significance tests, error probabilities, tests about population proportions and means, and more.
- Unit 13: Two-Sample Inference for the Difference Between Groups
  - Focuses on comparing two proportions and two means, among other related topics.
- Unit 14: Inference for Categorical Data (Chi-Square Tests)
  - Discusses chi-square goodness-of-fit tests and chi-square tests for relationships.
- Unit 15: Advanced Regression (Inference and Transforming)
  - Covers inference about slope, nonlinear regression, and other advanced regression topics.
- Unit 16: Analysis of Variance (ANOVA)
  - Focuses on the analysis of variance (ANOVA) methodology.
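As a small illustration of the ideas in Units 3 and 4 (center, spread, interquartile range, z-scores), here is a sketch using Python's standard `statistics` module. The `scores` list is made-up data, and `z_score` is a helper defined only for this example.

```python
import statistics

# Made-up exam scores, used only for illustration.
scores = [62, 70, 71, 75, 78, 80, 84, 90]

# Measures of center
mean = statistics.mean(scores)
median = statistics.median(scores)

# Measures of spread: sample standard deviation and interquartile range
stdev = statistics.stdev(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)   # the three quartile cut points
iqr = q3 - q1


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


z_top = z_score(max(scores), mean, stdev)

print(f"mean={mean}, median={median}, stdev={stdev:.2f}, IQR={iqr}, z(top)={z_top:.2f}")
```

Note that `statistics.quantiles` uses the "exclusive" method by default, so its quartiles can differ slightly from those computed by hand with other conventions.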
The main purpose of EDA is to help look at data before making any assumptions. It can help identify obvious errors, better understand patterns within the data, detect outliers or anomalous events, and find interesting relations among the variables.
- Examine the data distribution
- Handling missing values in the dataset (one of the most common issues with any dataset)
- Handling the outliers
- Removing duplicate data
- Encoding the categorical variables
- Normalizing and Scaling
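The EDA steps above can be sketched in plain Python on a toy dataset. In practice these steps are usually done with pandas; the version below uses only the standard library, and the rows, the median imputation, the 1.5 × IQR outlier rule, and the label encoding are illustrative assumptions rather than the one correct choice.

```python
import statistics

# Toy dataset, invented for illustration.
rows = [
    {"age": 25, "city": "Paris"},
    {"age": 31, "city": "Rabat"},
    {"age": None, "city": "Paris"},   # missing value
    {"age": 29, "city": "Tunis"},
    {"age": 31, "city": "Rabat"},     # exact duplicate
    {"age": 27, "city": "Tunis"},
    {"age": 120, "city": "Paris"},    # suspicious outlier
]

# 1. Remove duplicate rows, keeping the first occurrence.
seen, unique_rows = set(), []
for row in rows:
    key = (row["age"], row["city"])
    if key not in seen:
        seen.add(key)
        unique_rows.append(dict(row))

# 2. Handle missing values: fill missing ages with the median of the known ages.
known_ages = [r["age"] for r in unique_rows if r["age"] is not None]
median_age = statistics.median(known_ages)
for r in unique_rows:
    if r["age"] is None:
        r["age"] = median_age

# 3. Handle outliers: drop rows more than 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles([r["age"] for r in unique_rows], n=4)
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
clean = [r for r in unique_rows if low <= r["age"] <= high]

# 4. Encode the categorical variable as integer codes (label encoding).
codes = {city: i for i, city in enumerate(sorted({r["city"] for r in clean}))}
for r in clean:
    r["city_code"] = codes[r["city"]]

# 5. Min-max scale age to the range [0, 1].
ages = [r["age"] for r in clean]
min_age, max_age = min(ages), max(ages)
for r in clean:
    r["age_scaled"] = (r["age"] - min_age) / (max_age - min_age)

print(clean)
```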
- Creating Database: Learn how to create your own database.
- Creating Tables and Adding Data: Understand how to create tables and insert data into them.
- SELECT Clause: Learn to retrieve or fetch data from a database.
- FROM Clause: Understand from which table in the database you need to select data.
- WHERE Clause: Learn to form conditions based on which data have to be queried.
- DELETE Statement: Understand how to perform deletion tasks.
- INSERT INTO: Learn about insertion tasks.
- AND and OR Operator: Know how to select data based on AND or OR conditions.
- Drop and Truncate: Learn to drop a table entirely or truncate it to remove all of its rows.
- NOT Operator: Understand how to select data that does not meet a given condition.
- WITH Clause: Understanding the concept of the WITH clause and using it to name a sub-query block.
- FETCH Clause: Learn to fetch the filtered data based on conditions, like fetching only the top 3 rows.
- Arithmetic Operators: Use arithmetic operators for precise data filtering.
- Wildcard Operators: Select exact data intelligently, like names starting or ending with 'T'.
- UPDATE Statement: Learn updating data entries based on conditions.
- ALTER Table: Know how to add, drop, or modify tables.
- LIKE Clause: Understand pattern-based search.
- BETWEEN and IN Operator: Learn to select data within a specified range (BETWEEN) or from a given set of values (IN).
- CASE Statement: Understand conditional queries.
- EXISTS: Learn to form nested queries for filtering data that exists in another query.
- DISTINCT Clause: Select only distinct, non-repetitive data.
- Count Function: Learn to return the total count of filtered data.
- Sum Function: Understand how to calculate the sum of queried data.
- Average Function: Calculate the average of queried data.
- Minimum Function: Learn to find the minimum value in queried data.
- Maximum Function: Learn to find the maximum value in queried data.
- ORDER BY: Order queried data in ascending or descending order.
- GROUP BY: Group queried data by a specified column.
- ALL and ANY Clause: Understand these logical operators and their boolean results.
- TOP Clause: Learn to fetch a limited number of rows from a database.
- Union Clause: Understand the union of tables.
- Intersection Clause: Learn to return only the rows that two queries have in common.
- Aliases: Assign aliases to tables for later reference.
- Cartesian Join and Self Join: Learn to pair every row of one table with every row of another (Cartesian join) and to join a table to itself (self join).
- Inner, Left, Right, and Full Joins: Understand these four types of joins.
- Division Clause: Find entities interacting with all entities of a set of different types.
- Using Clause: Modify NATURAL JOIN with the USING clause for columns with the same names but different datatypes.
- Combining Values: Combine aggregate and non-aggregate values using Joins and Over clause.
- MINUS Operator: Understand how to use the MINUS operator for exclusion.
- Joining 3 or More Tables: Learn to join and query from three or more tables.
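Several of the statements above can be tried without installing a database server, using Python's built-in `sqlite3` module. This is a sketch under assumptions: the `customers` and `orders` tables and their rows are invented, and SQLite's dialect differs from other systems in places (for example, it uses `LIMIT` rather than `TOP` or `FETCH`, and `EXCEPT` rather than `MINUS`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# Creating tables and adding data
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Tom", "Paris"), (2, "Sara", "Rabat"), (3, "Tina", "Paris"),
])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0),
])

# SELECT / FROM / WHERE with a LIKE wildcard and ORDER BY
t_names = [row[0] for row in cur.execute(
    "SELECT name FROM customers WHERE name LIKE 'T%' ORDER BY name")]

# DISTINCT
cities = [row[0] for row in cur.execute(
    "SELECT DISTINCT city FROM customers ORDER BY city")]

# Aliases, LEFT JOIN, GROUP BY, and the COUNT and SUM aggregate functions
totals = cur.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC
""").fetchall()

# UPDATE and DELETE
cur.execute("UPDATE customers SET city = 'Tunis' WHERE name = 'Sara'")
cur.execute("DELETE FROM orders WHERE amount < 30")
remaining_orders = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
sara_city = cur.execute("SELECT city FROM customers WHERE name = 'Sara'").fetchone()[0]

conn.close()
print(t_names, cities, totals, remaining_orders, sara_city)
```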
Project: To practice, first download the dataset here, then try all the SQL queries in the README file. After that, you can check the solution here.
Don't focus on the Tableau section; we will delve into it in later lessons.
- Machine Learning Definition
- Examples and Use Cases
- Recommendation engines (Amazon, Spotify, Netflix).
- Speech recognition software.
- Fraud detection services in banks.
- Self-driving cars and driver assistance features.
- How Does Machine Learning Work?
- Machine Learning vs. Deep Learning
- Types of Machine Learning
- Supervised Machine Learning: Trained on labeled data sets.
- Unsupervised Machine Learning: Uses unlabeled data sets to uncover patterns.
- Semi-supervised Machine Learning: Combines labeled and unlabeled data sets.
- Reinforcement Learning: Uses trial and error in specific environments.
- Machine Learning Benefits and Risks
- Define the Problem: Identify the problem you want to solve and determine if machine learning can be used to solve it.
- Collect Data: Gather and clean the data that you will use to train your model. The quality of your model will depend on the quality of your data.
- Explore the Data: Use data visualization and statistical methods to understand the structure and relationships within your data.
- Pre-process the Data: Prepare the data for modeling by normalizing, transforming, and cleaning it as necessary.
- Split the Data: Divide the data into training and test datasets to validate your model.
- Choose a Model: Select a machine learning model that is appropriate for your problem and the data you have collected.
- Train the Model: Use the training data to train the model, adjusting its parameters to fit the data as accurately as possible.
- Evaluate the Model: Use the test data to evaluate the performance of the model and determine its accuracy.
- Fine-tune the Model: Based on the results of the evaluation, fine-tune the model by adjusting its parameters and repeating the training process until the desired level of accuracy is achieved.
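The steps above can be sketched end to end on synthetic data. This is a minimal illustration, not a recommended production workflow: the data follows an assumed line y = 2x + 1 plus noise, the model is a hand-written least-squares fit, and `fit_line` and `mse` are helper names invented for this example.

```python
import random

random.seed(0)   # fixed seed so the run is reproducible

# Collect data: synthetic points scattered around y = 2x + 1 (an assumption).
data = [(x, 2 * x + 1 + random.gauss(0, 0.5)) for x in range(50)]

# Split the data: 80% for training, 20% held out for testing.
random.shuffle(data)
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]


def fit_line(points):
    """Train the model: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(points, slope, intercept):
    """Evaluate the model: mean squared error of its predictions."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


slope, intercept = fit_line(train)
train_mse = mse(train, slope, intercept)
test_mse = mse(test, slope, intercept)

print(f"slope={slope:.3f} intercept={intercept:.3f} "
      f"train MSE={train_mse:.3f} test MSE={test_mse:.3f}")
```

Comparing the training error with the test error is what the split is for: a test error far above the training error suggests the model has overfit and needs fine-tuning.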
Practical project: This project is quite popular. It walks through all of these steps. Take your time, because these steps appear in every project.
First, if you don't know which algorithm you are going to use, don't worry about it. Scikit-learn tells you what to do. We will cover:
- Basic Example: A simple example using Scikit-Learn for machine learning.
- Data Loading: Guidelines on data requirements and loading techniques.
- Model Fitting: Instructions for fitting both supervised and unsupervised learning models.
- Prediction: Methods to make predictions using different estimators.
- Data Preprocessing: Techniques for standardization, normalization, binarization, and encoding categorical features.
- Model Creation: Steps to create supervised and unsupervised learning estimators.
- Model Evaluation: Various metrics for assessing the performance of models.
- Model Tuning: Strategies for tuning models using grid search and randomized parameter optimization.
Resource: This is a PDF file that covers all of the above.
- Introduction to Scikit-Learn: Basic concepts and workflow.
- Data Preprocessing: Techniques for preparing data for modeling.
- Supervised Learning Models: Instructions for creating and using models like regression and classification.
- Unsupervised Learning Models: Guides on clustering and dimensionality reduction.
- Model Tuning and Evaluation: Tips on improving model performance and measuring accuracy.
- Pipeline and Model Complexity: Insights into streamlining workflows and handling complex data scenarios.
Resource: This is a PDF file that covers all of the above.
- Handling Missing values
- 1.1 Problems of Having Missing values
- 1.2 Understanding Types of Missing Values
- 1.3 Handling Missing Values Using SimpleImputer
- 1.4 Handling Missing Values Using KNNImputer
- Handling Categorical Values
- 2.1 One Hot Encoding
- 2.2 Label Encoding
- 2.3 Ordinal Encoding
- 2.4 Multi Label Binarizer
- 2.5 Count/Frequency Encoding
- 2.6 Target Guided Ordinal Encoding
- Feature Scaling
- 3.1 Standardization/Standard Scaler
- 3.2 Normalization/MinMax Scaler
- 3.3 Max Abs Scaler
- 3.4 Robust Scaler
- Why Feature Selection Matters
- Types of Feature Selection
- Filter Methods
- Variance Threshold
- SelectKBest
- SelectPercentile
- GenericUnivariateSelect
- Wrapper Methods
- RFE
- RFECV
- SelectFromModel
- SequentialFeatureSelector
- Feature Transformation
- Understanding QQPlot and PP-Plot
- Logarithmic transformation
- Reciprocal transformation
- Square root transformation
- Exponential transformation
- Boxcox transformation
- Using Pipelines to Automate Feature Engineering
- What are Pipelines
- Accessing individual steps in pipeline
- Accessing Parameters in Pipeline
- Performing Grid Search with Pipeline
- Combining Transformers and Pipeline
- Visualizing the Pipeline
- Fundamentals of Linear Regression
- Exploring the Assumptions of Linear Regression
- Gradient Descent and Loss Function
- Evaluation Metrics for Linear Regression
- Applications of Linear Regression
- Multiple Linear Regression
- Multicollinearity
- Regularization Techniques
- Ridge, Lasso and Elastic Net
- Polynomial Regression
- How does Logistic Regression work
- What is a sigmoid curve
- Assumptions of Logistic Regression
- Cost Function of Logistic Regression
- Why do we need Decision Trees
- How do Decision Trees work
- How do we select a root node
- Understanding Entropy, Information Gain
- Solving an Example on Entropy
- Understanding Gini Impurity
- Solving an Example on Gini Impurity
- Decision Trees for Regression
- Why Decision Trees are a Greedy Approach
- Understanding Pruning
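To make the entropy, information gain, and Gini impurity topics above concrete, here is a small worked example in plain Python. The label lists are invented, and the function names are chosen for this sketch only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the class proportions."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 - sum(p^2) over the class proportions."""
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# A perfectly mixed parent node and one candidate split of it.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print(f"parent entropy = {entropy(parent):.3f}")
print(f"parent gini    = {gini(parent):.3f}")
print(f"info gain      = {information_gain(parent, split):.3f}")
```

A tree builder evaluates candidate splits like this one at each node and picks the one with the highest gain, which is why the procedure is called greedy: it never revisits an earlier choice.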
- What are Ensemble Techniques
- Understanding Bagging
- Understanding Boosting
- Understanding Stacking
- Decision Trees Aggregation
- Bagging and Variance Reduction
- Feature Subspace sampling
- Handling Overfitting
- Out of bag error
- Concept of Boosting
- Understanding AdaBoost
- Solving an Example on AdaBoost
- Understanding Gradient Boosting
- Solving an Example on Gradient Boosting
- AdaBoost vs Gradient Boosting
- Concept of XGBoost Algorithm
- Boosting Mechanism
- Feature Importance Interpretation
- Regularization Techniques
- Flexibility and Scalability
- How does K-Nearest Neighbours work
- How is Distance Calculated
- Euclidean Distance
- Hamming Distance
- Manhattan Distance
- Why is KNN a Lazy Learner
- Effects of Choosing the value of K
- Different ways to perform KNN
- Understanding KD-Tree
- Solving an Example of KD Tree
- Understanding Ball Tree
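The three distance measures and the basic voting idea behind KNN can be sketched in a few lines of plain Python. This is the brute-force variant (no KD-tree or ball tree), and the training points and the `knn_predict` helper are invented for illustration.

```python
from collections import Counter
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two points."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3, distance=euclidean):
    """Brute-force KNN: rank all training points by distance, then take a majority vote."""
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Two small, well-separated groups of labelled points.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]

print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)), hamming("karolin", "kathrin"))
print(knn_predict(train, (2, 2)), knn_predict(train, (6, 5)))
```

Notice that there is no training step: all the work happens at prediction time, when the query is compared against every stored point. That is what makes KNN a lazy learner, and it is the cost that KD-trees and ball trees aim to reduce.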
- Understanding Concept of SVC
- What are Support Vectors
- What is Margin
- Hard Margin and Soft Margin
- Kernelized SVC
- Types of Kernels
- Understanding SVR
- Why do we need Naive Bayes
- Concept of how it works
- Mathematical Intuition of Naive Bayes
- Solving an Example on Naive Bayes
- Other Bayes Classifiers
- Gaussian Naive Bayes Classifier
- Multinomial Naive Bayes Classifier
- Bernoulli Naive Bayes Classifier
- How clustering is different from classification
- Applications of Clustering
- What are density based methods
- What are Hierarchical based methods
- What are partitioning methods
- What are Grid Based methods
- Main Requirements for Clustering Algorithms
- Concept of K-Means Clustering
- Math Intuition Behind K-Means
- Cluster Building Process
- Edge Case Scenarios of K-Means
- Challenges and Improvements in K-Means
- Concept of Hierarchical Clustering
- Understanding Algorithm
- Understanding Linkage Methods
- Concept of DBSCAN
- Key words in understanding DBSCAN
- Algorithm of DBSCAN
- Understanding External Measures
- Rand Index
- Jaccard Coefficient
- Understanding Internal Measures
- Cohesion
- Separation
- Computational Complexity
- Data Visualization Challenges
- Idea Behind PCA
- What are Principal Components
- Eigen Decomposition Approach
- Singular Value Decomposition Approach
- Why do we maximize Variance
- What is Explained Variance Ratio
- How to select optimal number of Principal Components
- Understanding Scree plot
- Issues with PCA
- Understanding Kernel PCA
- Regression Algorithms
- Linear Regression
- Polynomial Regression
- Ridge Regression
- Lasso Regression
- Classification Algorithms
- K-Nearest Neighbours
- Logistic Regression
- Both Classification and Regression
- Decision Trees
- Random Forest
- Gradient Boosting
- AdaBoost
- Clustering Algorithms
- K-Means
- DBSCAN
- HDBSCAN
- Hierarchical
- Dimensionality Reduction Techniques
- PCA
- t-SNE
- ICA
- Association Rules
- Apriori
- FP-growth
- FP-Max
- Understanding the Data
- Dealing with Null Values
- Data Visualization of the Numeric Columns
- Feature Engineering of the Numeric Columns
- Data Visualization of the Categorical Columns
- Feature Engineering of the Categorical Columns
- Model Selection: Choosing the right model for the problem (classification, regression, etc.).
- Training and Testing: Splitting data into training and testing sets to evaluate model performance.
- Evaluation Metrics: Using metrics like accuracy, precision, recall, and MSE for performance assessment.
- Cross-Validation: Implementing cross-validation techniques for more reliable model evaluation.
- Model Interpretability: Understanding and explaining model decisions.
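The evaluation metrics and the cross-validation split named above can be written out by hand to see what they compute. This is a plain-Python sketch with invented labels and predictions; libraries such as scikit-learn provide tested equivalents, and the helper names here are chosen for this example only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)   # assumes at least one positive prediction


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)   # assumes at least one positive label


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Invented labels and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

print(confusion_counts(y_true, y_pred))
print(accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))
print([len(test_idx) for _, test_idx in k_fold_indices(10, 3)])
```

In cross-validation the model is trained and scored once per fold, and the scores are averaged, so every sample is used for testing exactly once.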
- Hyperparameter Basics: Understanding what hyperparameters are in machine learning models.
- Tuning Techniques: Introducing Grid Search, Random Search, and Bayesian Optimization.
- Practical Implementation: Applying hyperparameter tuning on a sample model.
- Performance Impact: Assessing how hyperparameters influence model outcomes.
- Best Practices: Discussing balance in model complexity and overfitting.
- Early Developments: Tracing the origins and initial concepts of neural networks.
- Key Milestones: Highlighting major breakthroughs and influential models in deep learning.
- Deep Learning Resurgence: Understanding the factors contributing to the modern rise of deep learning.
- Influential Models: Overview of landmark models in deep learning history.
- Future Trends: Discussing current trends and potential future developments in deep learning.
- Introduction to TensorFlow
- Introduction to PyTorch
- Comparison of Frameworks
- Setting Up a Simple Neural Network in Both Frameworks
- Neural Network Structure: Exploring the basic architecture including neurons and layers.
- Forward Propagation: Understanding how data is processed in a neural network.
- Backpropagation and Training: Learning the mechanism of training neural networks.
- Activation Functions: Introduction to different activation functions and their purposes.
- Simple Implementation: Hands-on example of creating a basic neural network.
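As a hands-on illustration of forward propagation, an activation function, and gradient-based training, the sketch below trains the smallest possible network: a single sigmoid neuron that learns the logical OR function. The task, learning rate, and epoch count are assumptions chosen to keep the example short. A real network stacks many such neurons into layers and applies the same chain rule through each layer.

```python
from math import exp


def sigmoid(z):
    """Activation function that squashes any real number into (0, 1)."""
    return 1 / (1 + exp(-z))


# Training data for logical OR.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5


def forward(x):
    """Forward propagation: weighted sum of the inputs, then the activation."""
    z = weights[0] * x[0] + weights[1] * x[1] + bias
    return sigmoid(z)


for epoch in range(2000):
    for x, t in zip(inputs, targets):
        y = forward(x)
        # Backward pass: with a sigmoid output and cross-entropy loss,
        # the gradient of the loss with respect to z simplifies to (y - t).
        grad_z = y - t
        weights[0] -= learning_rate * grad_z * x[0]
        weights[1] -= learning_rate * grad_z * x[1]
        bias -= learning_rate * grad_z

predictions = [1 if forward(x) >= 0.5 else 0 for x in inputs]
print(predictions, [round(w, 2) for w in weights], round(bias, 2))
```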
- Understanding the Architecture of CNNs
- Applications in Image Recognition
- Implementing a Basic CNN
- Architecture of RNNs
- Long Short-Term Memory (LSTM) Networks
- Applications in Time Series and Text
- Concepts of Transfer Learning
- Pre-trained Models
- Fine-Tuning Techniques
- Introduction to GANs
- Understanding Generator and Discriminator
- Simple GAN Implementation
- Understanding the Reinforcement Learning Framework
- Markov Decision Processes
- Basic Algorithms in Reinforcement Learning
- Deep Q-Networks (DQN)
- Policy Gradient Methods
- Real-world Applications
- Introduction to Model Deployment
- Flask for Python
- Docker Basics
- Understanding Bias in Data
- Ethical Considerations in AI
- Privacy and Data Security
- Text Preprocessing Techniques
- Introduction to NLTK and spaCy
- Basic Text Analysis
- Sentiment Analysis
- Topic Modeling
- Named Entity Recognition
- Basic Image Processing Techniques
- OpenCV Basics
- Simple Applications in Image Analysis
- Object Detection
- Face Recognition
- Optical Character Recognition (OCR)
- Introduction to AWS SageMaker
- Introduction to Google Cloud ML Engine
- Deploying Models on Cloud Platforms
- Introduction to Big Data Ecosystem
- Basics of Apache Hadoop and Spark
- Big Data Processing and Analysis
- Getting Started with Tableau
- Basic Data Visualization Concepts
- Creating Dashboards and Stories in Tableau
- Advanced Data Visualization Techniques
- Working with Different Data Sources in Tableau
- Interactive Dashboards and Data Exploration
- Advanced Calculations in Tableau (Table Calculations, LOD Expressions)
- Working with Geospatial Data in Tableau
- Performance Optimization in Tableau
- Applying Tableau Skills in a Real-World Scenario
- Creating an End-to-End Data Visualization Project
- Storytelling with Data using Tableau
- Getting Started with Power BI
- Basic Data Modeling and Visualization
- Creating Reports and Dashboards in Power BI
- DAX Basics: Data Analysis Expressions
- Advanced Data Modeling in Power BI
- Interactive Reports and Dashboard Design
- Advanced DAX for Complex Calculations
- Working with Power BI Service (Cloud)
- Data Refresh, Security, and Administration in Power BI
- Applying Power BI Skills in a Business Context
- End-to-End Business Intelligence Project
- Sharing Insights and Dashboards with Stakeholders
- Leveraging Machine Learning Models in Tableau and Power BI
- Visualizing Machine Learning Outputs
- Advanced Analytics with Tableau and Power BI
- Finalizing the Capstone Project
- Presenting the Project to Peers or Stakeholders
- Receiving Feedback and Reflecting on the Learning Journey