After this course you will be able to process, summarize and visualize tabular data efficiently using the pandas library.
Analysts, researchers and engineers who would like to handle larger data sets more efficiently.
Basic knowledge of Python
The pandas Python library is a practical everyday tool for the analysis of tabular data.
This course improves your skillset for working with datasets ranging from a few dozen to several million entries in Python. The course uses hands-on examples to cover exploratory data analysis, extracting relevant summaries and creating attractive diagrams.
The integration of pandas with interactive environments like IPython and Jupyter lets you quickly back up answers to many questions with data.
14 hours
Day 1 | Day 2 |
---|---|
Introduction to pandas | Aggregation |
Data Wrangling | Analyzing Time Series |
Summarizing Data | Geographical Data |
Data Visualization | pandas Best Practices |
- Your environment for interactive data analysis
- overview of the pandas library
- Series
- DataFrames
- Improvements in Python 3
- Jupyter Notebooks
- reading CSV and Excel files into pandas
- sorting data
- transposing tables
- selecting rows and columns
- saving pandas tables
- extracting statistical metrics
- merging tables
- hierarchical indexing
- cross-tabulations
- pivot tables
- creating diagrams with matplotlib
- using matplotlib from within pandas
- visualizing data in Jupyter notebooks
- heatmaps
- multi-panel diagrams
- creating high-quality figures
- other libraries for visualizing data
- iterating over rows and columns
- grouping
- aggregation functions
- transformation functions
- applying your own functions
- series of timestamps
- rescaling time series
- changing timezones
- handling data with gaps
- rolling means
- simple predictions
- storing coordinates in pandas
- drawing maps with Basemap
- myths and facts
- NumPy
- machine learning models in scikit-learn
- alternative libraries and modeling strategies
- handling huge datasets
- do's and don'ts
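
To give a flavour of the topics listed above, here is a minimal sketch that touches a few of them: reading CSV data, sorting and selecting, grouping with aggregation functions, pivot tables and rolling means. The data set and the column names (`date`, `city`, `sales`) are made up for illustration and are not taken from the course material.

```python
import io

import pandas as pd

# Hypothetical CSV data, inlined so the example runs without external files.
csv_text = """date,city,sales
2021-01-01,Berlin,10
2021-01-02,Berlin,20
2021-01-03,Berlin,30
2021-01-01,Hamburg,5
2021-01-02,Hamburg,15
2021-01-03,Hamburg,25
"""

# Read the CSV data into a DataFrame and parse the date column as timestamps.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Sort by sales (largest first) and select two columns of the top two rows.
top = df.sort_values("sales", ascending=False).loc[:, ["city", "sales"]].head(2)

# Group by city and apply two aggregation functions to the sales column.
summary = df.groupby("city")["sales"].agg(["mean", "sum"])

# Pivot table: one row per date, one column per city.
pivot = df.pivot_table(index="date", columns="city", values="sales")

# Rolling mean over a window of two consecutive dates for each city.
rolling = pivot.rolling(window=2).mean()

print(summary)
print(rolling)
```

In a Jupyter notebook, `pivot.plot()` would draw the same table as a line chart through matplotlib, provided matplotlib is installed.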