Data Analysis in Python

Outcome

After this course you will be able to process, summarize and visualize tabular data efficiently using the pandas library.

Target Audience

Analysts, researchers and engineers who would like to handle larger data sets more efficiently.

Prerequisites

Basic knowledge of Python

Course Description

The pandas Python library is a practical everyday tool for analyzing tabular data. This course improves your skills for working in Python with datasets ranging from a few dozen to several million entries. Using hands-on examples, it covers exploratory data analysis, extracting relevant summaries and creating attractive diagrams. Because pandas integrates with interactive environments like IPython and Jupyter, you will be able to answer many questions quickly and back your answers with data.

Course Duration

14 hours

Course Outline

Day 1                    | Day 2
Introduction to pandas   | Aggregation
Data Wrangling           | Analyzing Time Series
Summarizing Data         | Geographical Data
Data Visualization       | pandas Best Practices

Day 1

Introduction to pandas

  • Your environment for interactive data analysis
  • Overview of the pandas library
  • Series
  • DataFrames
  • Improvements in Python 3
  • Jupyter Notebooks
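
As a rough illustration of the two core data structures listed above, the following minimal sketch builds a Series and a DataFrame. The fruit names and numbers are invented for the example and are not course material.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
prices = pd.Series([0.5, 0.3, 1.2], index=["apple", "banana", "cherry"], name="price")

# A Series is a labeled one-dimensional array: values are looked up by label.
apple_price = prices["apple"]
most_expensive = prices.idxmax()

# A DataFrame is a table whose columns are Series sharing one index.
fruits = pd.DataFrame(
    {"price": [0.5, 0.3, 1.2], "stock": [10, 25, 4]},
    index=["apple", "banana", "cherry"],
)

# Column arithmetic is vectorized: no explicit loop is needed.
fruits["value"] = fruits["price"] * fruits["stock"]
print(fruits)
```

In a Jupyter notebook, ending a cell with `fruits` instead of `print(fruits)` renders the table as HTML.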

Data Wrangling

  • reading CSV and Excel files into pandas
  • sorting data
  • transposing tables
  • selecting rows and columns
  • saving pandas tables
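
The wrangling steps listed above might look roughly like this. The sketch reads CSV text from an in-memory buffer so that it runs without any files. The column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path.
csv_text = """name,department,salary
Ada,Engineering,5200
Ben,Sales,4100
Cleo,Engineering,4800
"""

# Reading: pd.read_csv accepts a path or any file-like object.
# (pd.read_excel works similarly but needs an Excel engine installed.)
df = pd.read_csv(io.StringIO(csv_text))

# Sorting by a column, highest salary first.
by_salary = df.sort_values("salary", ascending=False)

# Selecting rows with a boolean mask and columns by name.
engineers = df.loc[df["department"] == "Engineering", ["name", "salary"]]

# Transposing: rows become columns and vice versa.
transposed = df.set_index("name").T

# Saving: write back to CSV (here into a buffer instead of a file).
buffer = io.StringIO()
by_salary.to_csv(buffer, index=False)
```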

Summarizing Data

  • extracting statistical metrics
  • merging tables
  • hierarchical indexing
  • cross-tabulations
  • pivot tables
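
To give a feel for the summarizing topics above, here is a small sketch on two invented tables. The names `orders` and `customers` and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical tables, invented for this sketch.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "product": ["apple", "pear", "pear", "apple"],
    "amount": [10.0, 5.0, 7.5, 12.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})

# Statistical metrics: count, mean, std, min, quartiles, max.
stats = orders["amount"].describe()

# Merging: attach each customer's region to their orders.
merged = orders.merge(customers, on="customer_id", how="left")

# Hierarchical indexing: a two-level index of region and product.
indexed = merged.set_index(["region", "product"]).sort_index()

# Pivot table: total amount per region and product.
pivot = merged.pivot_table(
    index="region", columns="product", values="amount", aggfunc="sum"
)

# Cross-tabulation: number of orders per region and product.
counts = pd.crosstab(merged["region"], merged["product"])
```

Combinations that never occur, such as apples in the south here, show up as `NaN` in the pivot table and as `0` in the cross-tabulation.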

Data Visualization

  • creating diagrams with matplotlib
  • using matplotlib from within pandas
  • visualizing data in Jupyter notebooks
  • heatmaps
  • multi-panel diagrams
  • creating high-quality figures
  • other libraries for visualizing data

Day 2

Aggregation

  • iterating over rows and columns
  • grouping
  • aggregation functions
  • transformation functions
  • applying your own functions
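
The aggregation topics above could be sketched as follows. The `sales` table and the helper `value_range` are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sales data, invented for this sketch.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "revenue": [100.0, 150.0, 80.0, 120.0, 100.0],
})

# Iterating over rows works, but is usually slower than vectorized code.
total = sum(row.revenue for row in sales.itertuples(index=False))

# Grouping: split the revenue column by store.
grouped = sales.groupby("store")["revenue"]

# Aggregation functions: one result row per group.
summary = grouped.agg(["mean", "max"])

# Transformation functions: the result keeps the shape of the input,
# so each row can be compared with its group total.
sales["share_of_store"] = sales["revenue"] / grouped.transform("sum")


def value_range(values):
    """Hypothetical custom aggregation: spread between max and min."""
    return values.max() - values.min()


# Applying your own function to each group.
ranges = grouped.agg(value_range)
```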

Analyzing Time Series

  • series of timestamps
  • resampling time series
  • changing timezones
  • handling data with gaps
  • rolling means
  • simple predictions
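
A minimal sketch of several time series operations named above, using six invented daily values with one gap. The dates, values and the choice of linear interpolation are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Series of timestamps: six consecutive days (hypothetical data).
index = pd.date_range("2024-01-01", periods=6, freq="D")
values = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0], index=index)

# Handling gaps: fill the missing value by linear interpolation.
filled = values.interpolate()

# Rolling mean over a window of three observations.
rolling = filled.rolling(window=3).mean()

# Changing the time resolution: sum the values in two-day bins.
two_day = filled.resample("2D").sum()

# Time zones: mark the timestamps as UTC, then convert.
utc = filled.tz_localize("UTC")
berlin = utc.tz_convert("Europe/Berlin")
```

The first two entries of `rolling` are `NaN` because a full window of three values is not yet available there.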

Geographical Data

  • storing coordinates in pandas
  • drawing maps with Basemap

pandas Best Practices

  • myths and facts
  • NumPy
  • machine learning models in scikit-learn
  • alternative libraries and modeling strategies
  • handling huge datasets
  • do's and don'ts