After this course, you will be able to process, summarize and visualize tabular data efficiently using the pandas library.
Target audience: analysts, researchers and engineers who would like to handle larger datasets more efficiently.
Prerequisites: basic knowledge of Python.
The pandas Python library is a practical everyday tool for analyzing tabular data. This course improves your skills for working in Python with datasets ranging from a few dozen to several million entries. Hands-on examples cover exploratory data analysis, extracting relevant summaries and creating attractive diagrams. Because pandas integrates with interactive environments like IPython and Jupyter, you can quickly back up answers to many questions with data.
Duration: 14 hours
Day 1 | Day 2 |
---|---|
Introduction to pandas | Aggregation |
Data Wrangling | Analyzing Time Series |
Summarizing Data | Geographical Data |
Data Visualization | pandas Best Practices |
- your environment for interactive data analysis
- overview of the pandas library
- Series
- DataFrames
- improvements in Python 3
- Jupyter notebooks
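To give a flavour of the introductory topics, here is a minimal sketch of the two core pandas data structures, a Series and a DataFrame. The fruit names and numbers are made-up sample values, not course material.

```python
import pandas as pd

# A Series is a one-dimensional array with index labels
prices = pd.Series([2.5, 3.0, 1.75], index=["apple", "banana", "cherry"])

# A DataFrame is a table whose columns are Series sharing one index
fruit = pd.DataFrame(
    {"price": [2.5, 3.0, 1.75], "stock": [10, 0, 25]},
    index=["apple", "banana", "cherry"],
)

print(prices["banana"])      # label-based lookup in a Series
print(fruit["stock"].sum())  # selecting a column returns a Series
```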
- reading CSV and Excel files into pandas
- sorting data
- transposing tables
- selecting rows and columns
- saving pandas tables
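The data-wrangling steps listed above map roughly onto a handful of pandas calls. The sketch below assumes a small made-up CSV held in memory so it runs without any files; the population figures are approximate values in millions, used only for illustration. `pd.read_excel` is only mentioned in a comment because it needs an extra engine such as openpyxl.

```python
import io
import pandas as pd

# Made-up CSV content; in practice pass a file path to pd.read_csv.
# pd.read_excel reads .xlsx files similarly but requires an engine like openpyxl.
csv_text = """city,country,population_m
Berlin,DE,3.7
Hamburg,DE,1.9
Vienna,AT,2.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Sort rows by a column, largest first
by_size = df.sort_values("population_m", ascending=False)

# Transpose: rows become columns and vice versa
transposed = df.T

# Select rows with a boolean mask and columns by name
german = df.loc[df["country"] == "DE", ["city", "population_m"]]

# Save the result; without a path argument, to_csv returns the CSV text
out = german.to_csv(index=False)
print(out)
```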
- extracting statistical metrics
- merging tables
- hierarchical indexing
- cross tables
- pivot tables
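The summarizing topics above (statistical metrics, merging, hierarchical indexing, cross tables and pivot tables) can be sketched on one tiny table. The sales records and manager names are made up for illustration.

```python
import pandas as pd

# Made-up sales records
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "product": ["tea", "coffee", "tea", "tea", "coffee"],
    "units": [10, 4, 7, 3, 8],
})
regions = pd.DataFrame({"region": ["north", "south"], "manager": ["Ana", "Ben"]})

# Statistical metrics: count, mean, std, min, quartiles, max
stats = sales["units"].describe()

# Merge two tables on a shared key column (left join keeps all sales rows)
merged = sales.merge(regions, on="region", how="left")

# Hierarchical index: two columns become a sorted MultiIndex
indexed = sales.set_index(["region", "product"]).sort_index()

# Cross table: number of rows per region/product combination
counts = pd.crosstab(sales["region"], sales["product"])

# Pivot table: sum of units per region/product combination
pivot = sales.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum")
print(pivot)
```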
- creating diagrams with matplotlib
- using matplotlib from within pandas
- visualizing data in Jupyter notebooks
- heatmaps
- multi-panel diagrams
- creating high-quality figures
- other libraries for visualizing data
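As an illustration of the visualization topics, the following sketch draws a line plot through pandas and a heatmap with matplotlib in one multi-panel figure, then saves it at high resolution. It assumes the non-interactive "Agg" backend so it runs without a display (in a Jupyter notebook the figure would simply appear inline), and it saves to an in-memory buffer; pass a filename such as "summary.png" to write a file instead. The monthly values are made up.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly values
df = pd.DataFrame(
    {"sales": [3, 5, 4, 6], "costs": [2, 3, 3, 4]},
    index=["Jan", "Feb", "Mar", "Apr"],
)

# Multi-panel figure: one row with two axes
fig, (ax_line, ax_heat) = plt.subplots(1, 2, figsize=(8, 3))

# pandas plotting uses matplotlib underneath; pass the target axes
df.plot(ax=ax_line, marker="o", title="Line plot via pandas")

# Heatmap of the table values
im = ax_heat.imshow(df.to_numpy(), cmap="viridis", aspect="auto")
ax_heat.set_xticks(range(len(df.columns)))
ax_heat.set_xticklabels(df.columns)
ax_heat.set_yticks(range(len(df.index)))
ax_heat.set_yticklabels(df.index)
fig.colorbar(im, ax=ax_heat)  # adds a third axes for the color scale

# High-resolution output for print
fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=300)
```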
- iterating rows and columns
- grouping
- aggregation functions
- transformation functions
- applying your own functions
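The aggregation topics above follow pandas' group-by pattern. Here is a minimal sketch contrasting aggregation (one value per group), transformation (one value per original row) and applying a custom function; the team scores and the helper `value_range` are hypothetical, introduced only for this example.

```python
import pandas as pd

# Made-up scores
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [3, 5, 2, 4, 9],
})

grouped = df.groupby("team")["score"]

# Aggregation: one row per group
summary = grouped.agg(["mean", "max"])

# Transformation: result has the same length as df and aligns with its rows
df["score_minus_team_mean"] = df["score"] - grouped.transform("mean")

# Applying your own function to each group (hypothetical helper)
def value_range(s):
    """Difference between the largest and smallest value in a group."""
    return s.max() - s.min()

ranges = grouped.apply(value_range)

# Iterating over rows works, but is slow on large frames
for row in df.itertuples(index=False):
    print(row.team, row.score)
```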
- series of timestamps
- resampling time series to a different frequency
- changing timezones
- handling data with gaps
- rolling means
- simple predictions
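The time series topics can be sketched on six hourly readings: filling a gap, resampling, converting time zones and computing a rolling mean. The readings are made up, and the "forecast" at the end is a deliberately naive placeholder (the last rolling mean), not the prediction method taught in the course.

```python
import numpy as np
import pandas as pd

# Hourly timestamps in UTC with one missing reading
idx = pd.date_range("2024-01-01", periods=6, freq="h", tz="UTC")
ts = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0], index=idx)

# Fill the gap by linear interpolation
filled = ts.interpolate()

# Resample to a coarser frequency: mean of each 3-hour bin
three_hourly = filled.resample("3h").mean()

# Convert the index to another time zone
berlin = filled.tz_convert("Europe/Berlin")

# Rolling mean over a window of 3 observations
smooth = filled.rolling(window=3).mean()

# Naive placeholder prediction: the next value equals the last rolling mean
forecast = smooth.iloc[-1]
print(three_hourly)
```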
- storing coordinates in pandas
- drawing maps with Basemap
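For the geographical topics, coordinates can live in ordinary float columns. The sketch below leaves out Basemap, which is a separate add-on toolkit, and instead computes great-circle distances with a vectorized haversine formula; the formula, the helper `haversine_km` and the rounded city coordinates are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Approximate latitude/longitude in degrees, stored as float columns
cities = pd.DataFrame(
    {"lat": [52.52, 48.14, 53.55], "lon": [13.40, 11.58, 9.99]},
    index=["Berlin", "Munich", "Hamburg"],
)

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; works element-wise on Series and arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Distance from Berlin to every city in a single vectorized call
berlin = cities.loc["Berlin"]
cities["km_from_berlin"] = haversine_km(berlin["lat"], berlin["lon"],
                                        cities["lat"], cities["lon"])
print(cities.round(0))
```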
- myths and facts
- NumPy
- machine learning models in scikit-learn
- alternative libraries and modeling strategies
- handling huge datasets
- do's and don'ts
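A few of the best-practice themes can be illustrated in a short sketch: vectorized arithmetic instead of a Python loop over rows, reading a large CSV in chunks, and the category dtype for repetitive strings. The data are made up, and the small in-memory CSV merely stands in for a file too large to load at once.

```python
import io
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": np.arange(1.0, 6.0), "qty": [2, 1, 3, 5, 4]})

# Don't: compute row by row in Python (slow on large frames)
totals_loop = [row.price * row.qty for row in df.itertuples()]

# Do: vectorized column arithmetic, executed by NumPy
df["total"] = df["price"] * df["qty"]

# Huge files: read in chunks and aggregate incrementally
csv_text = df.to_csv(index=False)
running_sum = 0.0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    running_sum += chunk["total"].sum()

# Repetitive strings: the category dtype stores small integer codes
labels = pd.Series(["low", "high", "low", "low"] * 1000, dtype="category")
print(labels.memory_usage(deep=True), labels.astype(object).memory_usage(deep=True))
```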