diff --git a/01-run-quit.md b/01-run-quit.md
new file mode 100644
index 000000000..2c0b1ae08
--- /dev/null
+++ b/01-run-quit.md
@@ -0,0 +1,667 @@
---
title: Running and Quitting
teaching: 15
exercises: 0
---

::::::::::::::::::::::::::::::::::::::: objectives

- Launch the JupyterLab server.
- Create a new Python script.
- Create a Jupyter notebook.
- Shut down the JupyterLab server.
- Understand the difference between a Python script and a Jupyter notebook.
- Create Markdown cells in a notebook.
- Create and run Python cells in a notebook.

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: questions

- How can I run Python programs?

::::::::::::::::::::::::::::::::::::::::::::::::::

To run Python, we are going to use [Jupyter Notebooks][jupyter] via [JupyterLab][jupyterlab] for the remainder of this workshop. Jupyter notebooks are common in data science and visualization, and they provide a convenient, widely shared environment for running Python code interactively, where we can easily view and share the results.

There are other ways of editing, managing, and running code. Software developers often use an integrated development environment (IDE) like [PyCharm](https://www.jetbrains.com/pycharm/) or [Visual Studio Code](https://code.visualstudio.com/), or text editors like Vim or Emacs, to create and edit their Python programs. After editing and saving your Python programs, you can execute them within the IDE itself or directly on the command line. In contrast, Jupyter notebooks let us execute and view the results of our Python code immediately within the notebook.

JupyterLab has several other handy features:

- You can easily type, edit, and copy and paste blocks of code.
- Tab completion allows you to easily access the names of things you are using
  and learn more about them.
- It allows you to annotate your code with links, different sized text, bullets, etc.
  to make it more accessible to you and your collaborators.
- It allows you to display figures next to the code that produces them
  to tell a complete story of the analysis.

Each notebook contains one or more cells that contain code, text, or images.

## Getting Started with JupyterLab

JupyterLab is an application server with a web user interface from [Project Jupyter][jupyter] that
lets you work with documents and activities such as Jupyter notebooks, text editors, terminals,
and even custom components in a flexible, integrated, and extensible manner. JupyterLab requires a
reasonably up-to-date browser (ideally a current version of Chrome, Safari, or Firefox); Internet
Explorer versions 9 and below are *not* supported.

JupyterLab is included as part of the Anaconda Python distribution. If you have not already
installed the Anaconda Python distribution, see [the setup instructions](../learners/setup.md).

In this lesson we will run JupyterLab locally on our own machines, so it will not require an internet connection beyond
the initial connection to download and install Anaconda and JupyterLab. The workflow is:

- Start the JupyterLab server on your machine.
- Use a web browser to open a special localhost URL that connects to your JupyterLab server.
- The JupyterLab server does the work and the web browser renders the result.
- Type code into the browser and see the results after your JupyterLab server has finished executing your code.

::::::::::::::::::::::::::::::::::::::::: callout

## JupyterLab? What about Jupyter notebooks?

JupyterLab is the [next stage in the evolution of the Jupyter Notebook](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html#overview).
If you have prior experience working with Jupyter notebooks, then you will have a good idea of what to expect from JupyterLab.

Experienced users of Jupyter notebooks interested in a more detailed discussion of the similarities and differences
between the JupyterLab and Jupyter notebook user interfaces can find more information in the
[JupyterLab user interface documentation][jupyterlab-ui].


::::::::::::::::::::::::::::::::::::::::::::::::::

## Starting JupyterLab

You can start the JupyterLab server through the command line or through an application called
`Anaconda Navigator`. Anaconda Navigator is included as part of the Anaconda Python distribution.

### macOS - Command Line

To start the JupyterLab server you will need to access the command line through the Terminal.
There are two ways to open Terminal on a Mac.

1. In your Applications folder, open Utilities and double-click on Terminal.
2. Press Command + spacebar to launch Spotlight. Type `Terminal` and then
   double-click the search result or hit Enter.

After you have launched Terminal, type the command to launch the JupyterLab server.

```bash
$ jupyter lab
```

### Windows Users - Command Line

To start the JupyterLab server you will need to access the Anaconda Prompt.

Press the Windows logo key and search for `Anaconda Prompt`, then click the result or press Enter.

After you have launched the Anaconda Prompt, type the command:

```bash
$ jupyter lab
```

### Anaconda Navigator

To start a JupyterLab server from Anaconda Navigator you must first [start Anaconda Navigator (click for detailed instructions on macOS, Windows, and Linux)](https://docs.anaconda.com/free/navigator/getting-started/#navigator-starting-navigator). You can search for Anaconda Navigator via Spotlight on macOS (Command + spacebar), the Windows search function (Windows logo key), or by opening a terminal shell and running the `anaconda-navigator` executable from the command line.

After you have launched Anaconda Navigator, click the `Launch` button under JupyterLab. You may need
to scroll down to find it.

Here is a screenshot of an Anaconda Navigator page similar to the one that should open on either macOS
or Windows.

[Figure: Anaconda Navigator landing page]

And here is a screenshot of a JupyterLab landing page that should be similar to the one that opens in your
default web browser after starting the JupyterLab server on either macOS or Windows.

[Figure: JupyterLab landing page]

## The JupyterLab Interface

JupyterLab has many features found in traditional integrated development environments (IDEs) but
is focused on providing flexible building blocks for interactive, exploratory computing.

The [JupyterLab Interface][jupyterlab-ui]
consists of the Menu Bar, a collapsible Left Side Bar, and the Main Work Area, which contains tabs
of documents and activities.

### Menu Bar

The Menu Bar at the top of JupyterLab has the top-level menus that expose various actions
available in JupyterLab along with their keyboard shortcuts (where applicable). The following
menus are included by default.

- **File:** Actions related to files and directories such as *New*, *Open*, *Close*, *Save*, etc. The *File* menu also includes the *Shut Down* action used to shut down the JupyterLab server.
- **Edit:** Actions related to editing documents and other activities such as *Undo*, *Cut*, *Copy*, *Paste*, etc.
- **View:** Actions that alter the appearance of JupyterLab.
- **Run:** Actions for running code in different activities such as notebooks and code consoles (discussed below).
- **Kernel:** Actions for managing kernels. Kernels in Jupyter will be explained in more detail below.
- **Tabs:** A list of the open documents and activities in the main work area.
- **Settings:** Common JupyterLab settings can be configured using this menu. There is also an *Advanced Settings Editor* option in the dropdown menu that provides more fine-grained control of JupyterLab settings and configuration options.
- **Help:** A list of JupyterLab and kernel help links.

::::::::::::::::::::::::::::::::::::::::: callout

## Kernels

The JupyterLab [docs](https://jupyterlab.readthedocs.io/en/stable/user/documents_kernels.html)
define kernels as "separate processes started by the server that runs your code in different programming languages and environments."
Opening a Jupyter notebook starts a kernel, a process that runs the code in that notebook.
In this lesson, we'll be using the IPython kernel, which lets us run Python 3 code interactively.

Using other Jupyter [kernels for other programming languages](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels) would let us
write and execute code in other programming languages, such as R, Java, Julia, Ruby, JavaScript, and Fortran, in the same JupyterLab interface.

::::::::::::::::::::::::::::::::::::::::::::::::::

A screenshot of the default Menu Bar is provided below.

[Figure: JupyterLab Menu Bar]
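To see which interpreter the kernel described above is using, you could run a cell like the one below. This is a small sketch that uses only the standard-library modules `sys` and `platform`. The version number and path it prints depend on your own Anaconda installation, so no expected output is shown.

```python
import platform
import sys

# These values describe the kernel process that executes this cell.
print("Python version:", platform.python_version())
print("Interpreter:", sys.executable)

# The IPython kernel used in this lesson runs Python 3.
is_python3 = sys.version_info.major == 3
print("Running Python 3:", is_python3)
```

If the last line prints `False`, the notebook is probably attached to a different kernel than the one this lesson assumes.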

### Left Sidebar

The left sidebar contains a number of commonly used tabs, such as a file browser (showing the
contents of the directory where the JupyterLab server was launched), a list of running kernels
and terminals, the command palette, and a list of open tabs in the main work area. A screenshot of
the default Left Side Bar is provided below.

[Figure: JupyterLab Left Side Bar]

The left sidebar can be collapsed or expanded by selecting "Show Left Sidebar" in the View menu or
by clicking on the active sidebar tab.

### Main Work Area

The main work area in JupyterLab enables you to arrange documents (notebooks, text files, etc.)
and other activities (terminals, code consoles, etc.) into panels of tabs that can be resized or
subdivided. A screenshot of the default Main Work Area is provided below.

If you do not see the Launcher tab, click the blue plus sign under the "File" and "Edit" menus and it will appear.

[Figure: JupyterLab Main Work Area]

Drag a tab to the center of a tab panel to move the tab to the panel. Subdivide a tab panel by
dragging a tab to the left, right, top, or bottom of the panel. The work area has a single current
activity. The tab for the current activity is marked with a colored top border (blue by default).

## Creating a Python script

- To start writing a new Python program, click the Text File icon under the *Other* header in the Launcher tab of the Main Work Area.
  - You can also create a new plain text file by selecting *New -> Text File* from the *File* menu in the Menu Bar.
- To convert this plain text file to a Python program, select the *Save File As* action from the *File* menu in the Menu Bar and give your new text file a name that ends with the `.py` extension.
  - The `.py` extension lets everyone (including the operating system) know that this text file is a Python program.
  - This is a convention, not a requirement.

## Creating a Jupyter Notebook

To open a new notebook, click the Python 3 icon under the *Notebook* header in the Launcher tab in
the main work area. You can also create a new notebook by selecting *New -> Notebook* from the *File* menu in the Menu Bar.

Additional notes on Jupyter notebooks:

- Notebook files have the extension `.ipynb` to distinguish them from plain-text Python programs.
- Notebooks can be exported as Python scripts that can be run from the command line.

Below is a screenshot of a Jupyter notebook running inside JupyterLab. If you are interested in
more details, then see the [official notebook documentation][jupyterlab-notebook-docs].

[Figure: Example Jupyter Notebook]
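An `.ipynb` file is structured text, so exporting a notebook as a script comes down to pulling out the source of its code cells. The sketch below illustrates that idea on a tiny, hand-written notebook. It assumes the nbformat version 4 layout, which has a top-level `cells` list whose entries carry `cell_type` and `source` keys. This is a simplified illustration and not how JupyterLab's own export feature is implemented. The helper name `code_cells_to_script` is hypothetical.

```python
import json

# A tiny notebook written by hand in the nbformat 4 layout (illustrative data only).
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# My analysis\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 6 * 7 + 12\n", "print(x)"]},
    ],
})


def code_cells_to_script(text):
    """Hypothetical helper: join the source of every code cell into one script."""
    notebook = json.loads(text)
    chunks = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # Markdown (text) cells are left out of the script
        source = cell["source"]
        # A cell's source may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks) + "\n"


script = code_cells_to_script(notebook_text)
print(script)
```

The Markdown heading is dropped and only the two lines of Python remain. Saving `script` to a file whose name ends in `.py` would give a plain-text Python program that can be run from the command line.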

::::::::::::::::::::::::::::::::::::::::: callout

## How It's Stored

- The notebook file is stored in a format called JSON.
- Just like a webpage, what's saved looks different from what you see in your browser.
- But this format allows Jupyter to mix source code, text, and images, all in one file.


::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::: challenge

## Arranging Documents into Panels of Tabs

In the JupyterLab Main Work Area you can arrange documents into panels of tabs. Here is an
example from the [official documentation][jupyterlab].

[Figure: Multi-panel JupyterLab]

First, create a text file, Python console, and terminal window and arrange them into three
panels in the main work area. Next, create a notebook, terminal window, and text file and
arrange them into three panels in the main work area. Finally, create your own combination of
panels and tabs. What combination of panels and tabs do you think will be most useful for your
workflow?

::::::::::::::: solution

## Solution

After creating the necessary tabs, you can drag one of the tabs to the center of a panel to
move the tab to the panel; next you can subdivide a tab panel by dragging a tab to the left,
right, top, or bottom of the panel.



:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::: callout

## Code vs. Text

Jupyter mixes code and text in different types of blocks, called cells. We often use the term
"code" to mean "the source code of software written in a language such as Python".
A "code cell" in a notebook is a cell that contains software;
a "text cell" is one that contains ordinary prose written for human beings.


::::::::::::::::::::::::::::::::::::::::::::::::::

## The Notebook has Command and Edit modes.

- If you press Esc and Return alternately, the outer border of your code cell will change from gray to blue.
- These are the **Command** (gray) and **Edit** (blue) modes of your notebook.
- Command mode allows you to edit notebook-level features, and Edit mode changes the content of cells.
- When in Command mode (Esc/gray),
  - The b key will make a new cell below the currently selected cell.
  - The a key will make one above.
  - The x key will delete the current cell.
  - The z key will undo your last cell operation (which could be a deletion, creation, etc.).
- All actions can be done using the menus, but there are lots of keyboard shortcuts to speed things up.

::::::::::::::::::::::::::::::::::::::: challenge

## Command vs. Edit

In the Jupyter notebook page, are you currently in Command or Edit mode?
Switch between the modes.
Use the shortcuts to generate a new cell.
Use the shortcuts to delete a cell.
Use the shortcuts to undo the last cell operation you performed.

::::::::::::::: solution

## Solution

Command mode has a gray border and Edit mode has a blue border.
Use Esc and Return to switch between modes.
You need to be in Command mode (press Esc if your cell is blue). Type b or a.
You need to be in Command mode (press Esc if your cell is blue). Type x.
You need to be in Command mode (press Esc if your cell is blue). Type z.



:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

### Use the keyboard and mouse to select and edit cells.

- Pressing the Return key turns the border blue and engages Edit mode, which allows
  you to type within the cell.
- Because we want to be able to write many lines of code in a single cell,
  pressing the Return key when in Edit mode (blue) moves the cursor to the next line
  in the cell just like in a text editor.
- We need some other way to tell the Notebook we want to run what's in the cell.
- Pressing Shift\+Return together will execute the contents of the cell.
- Notice that the Return and Shift keys on the right of the keyboard are
  right next to each other.

### The Notebook will turn Markdown into pretty-printed documentation.

- Notebooks can also render [Markdown][markdown].
  - A simple plain-text format for writing lists, links,
    and other things that might go into a web page.
  - Equivalently, a subset of HTML that looks like what you'd send in an old-fashioned email.
- Turn the current cell into a Markdown cell by entering Command mode (Esc/gray)
  and pressing the M key.
- `In [ ]:` will disappear to show it is no longer a code cell and you will be able to write in
  Markdown.
- Turn the current cell into a Code cell by entering Command mode (Esc/gray) and
  pressing the y key.

### Markdown does most of what HTML does.

Each example below shows the Markdown source in a code block, followed by the result it renders to.

```
* Use asterisks
* to create
* bullet lists.
```

- Use asterisks
- to create
- bullet lists.

```
1. Use numbers
1. to create
1. numbered lists.
```

1. Use numbers
2. to create
3. numbered lists.

```
* You can use indents
  * To create sublists
  * of the same type
* Or sublists
  1. Of different
  1. types
```

- You can use indents
  - To create sublists
  - of the same type
- Or sublists
  1. Of different
  2. types

```
# A Level-1 Heading
```

## A Level-1 Heading

```
## A Level-2 Heading (etc.)
```

## A Level-2 Heading (etc.)

```
Line breaks
don't matter.

But blank lines
create new paragraphs.
```

Line breaks
don't matter.

But blank lines
create new paragraphs.

```
[Create links](http://software-carpentry.org) with `[...](...)`.
Or use [named links][data_carpentry].

[data_carpentry]: http://datacarpentry.org
```

[Create links](https://software-carpentry.org) with `[...](...)`.
Or use [named links][data_carpentry].

::::::::::::::::::::::::::::::::::::::: challenge

## Creating Lists in Markdown

Create a nested list in a Markdown cell in a notebook that looks like this:

1. Get funding.
2. Do work.
   - Design experiment.
   - Collect data.
   - Analyze.
3. Write up.
4. Publish.

::::::::::::::: solution

## Solution

This challenge integrates both the numbered list and bullet list.
Note that the bullet list is indented 3 spaces so that it lines up with the text of the numbered list items.

```
1. Get funding.
2. Do work.
   * Design experiment.
   * Collect data.
   * Analyze.
3. Write up.
4. Publish.
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::: challenge

## More Math

What is displayed when a Python cell in a notebook
that contains several calculations is executed?
For example, what happens when this cell is executed?

```python
7 * 3
2 + 1
```

::::::::::::::: solution

## Solution

Only the result of the last calculation is displayed.

```output
3
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::: challenge

## Change an Existing Cell from Code to Markdown

What happens if you write some Python in a code cell
and then you switch it to a Markdown cell?
For example,
put the following in a code cell:

```python
x = 6 * 7 + 12
print(x)
```

And then run it with Shift\+Return to be sure that it works as a code cell.
Now go back to the cell and use Esc then m to switch the cell to Markdown
and "run" it with Shift\+Return.
What happened and how might this be useful?

::::::::::::::: solution

## Solution

The Python code gets treated like Markdown text.
The lines appear as if they are part of one contiguous paragraph.
This could be useful to temporarily turn cells on and off in notebooks that get used for multiple purposes.

```output
x = 6 * 7 + 12 print(x)
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::: challenge

## Equations

Standard Markdown (such as we're using for these notes) won't render equations,
but the Notebook will.
Create a new Markdown cell
and enter the following:

```
$\sum_{i=1}^{N} 2^{-i} \approx 1$
```

(It's probably easier to copy and paste.)
What does it display?
What do you think the underscore, `_`, circumflex, `^`, and dollar sign, `$`, do?

::::::::::::::: solution

## Solution

The notebook shows the equation as it would be rendered from LaTeX equation syntax.
The dollar sign, `$`, is used to tell Markdown that the text in between is a LaTeX equation.
If you're not familiar with LaTeX, underscore, `_`, is used for subscripts and circumflex, `^`, is used for superscripts.
A pair of curly braces, `{` and `}`, is used to group text together so that the statement `i=1` becomes the subscript and `N` becomes the superscript.
Similarly, `-i` is in curly braces to make the whole statement the superscript for `2`.
`\sum` and `\approx` are LaTeX commands for "sum over" and "approximate" symbols.



:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

## Closing JupyterLab

- From the Menu Bar select the "File" menu and then choose "Shut Down" at the bottom of the dropdown menu. You will be prompted to confirm that you wish to shut down the JupyterLab server (don't forget to save your work!). Click "Shut Down" to shut down the JupyterLab server.
- To restart the JupyterLab server you will need to re-run the following command from a shell.

```bash
$ jupyter lab
```

::::::::::::::::::::::::::::::::::::::: challenge

## Closing JupyterLab

Practice closing and restarting the JupyterLab server.


::::::::::::::::::::::::::::::::::::::::::::::::::



[jupyter]: https://jupyter.org/
[jupyterlab]: https://jupyterlab.readthedocs.io/en/stable/
[jupyterlab-ui]: https://jupyterlab.readthedocs.io/en/stable/user/interface.html
[jupyterlab-notebook-docs]: https://jupyterlab.readthedocs.io/en/stable/user/notebook.html
[markdown]: https://en.wikipedia.org/wiki/Markdown
[data_carpentry]: https://datacarpentry.org


:::::::::::::::::::::::::::::::::::::::: keypoints

- Python scripts are plain text files.
- Use the Jupyter Notebook for editing and running Python.
- The Notebook has Command and Edit modes.
- Use the keyboard and mouse to select and edit cells.
- The Notebook will turn Markdown into pretty-printed documentation.
- Markdown does most of what HTML does.

::::::::::::::::::::::::::::::::::::::::::::::::::


diff --git a/02-variables.md b/02-variables.md
new file mode 100644
index 000000000..561406f8a
--- /dev/null
+++ b/02-variables.md
@@ -0,0 +1,420 @@
---
title: Variables and Assignment
teaching: 10
exercises: 10
---

::::::::::::::::::::::::::::::::::::::: objectives

- Write programs that assign scalar values to variables and perform calculations with those values.
- Correctly trace value changes in programs that use scalar assignment.

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: questions

- How can I store data in programs?

::::::::::::::::::::::::::::::::::::::::::::::::::

## Use variables to store values.

- **Variables** are names for values.

- Variable names

  - can **only** contain letters, digits, and underscore `_` (typically used to separate words in long variable names)
  - cannot start with a digit
  - are **case sensitive** (age, Age and AGE are three different variables)

- The name should also be meaningful so that you or another programmer knows what it holds.

- Variable names that start with underscores like `__alistairs_real_age` have a special meaning
  so we won't do that until we understand the convention.

- In Python the `=` symbol assigns the value on the right to the name on the left.

- The variable is created when a value is assigned to it.

- Here, Python assigns an age to a variable `age`
  and a name in quotes to a variable `first_name`.

  ```python
  age = 42
  first_name = 'Ahmed'
  ```

## Use `print` to display values.

- Python has a built-in function called `print` that prints things as text.
- Call the function (i.e., tell Python to run it) by using its name.
- Provide values to the function (i.e., the things to print) in parentheses.
- To add a string to the printout, wrap the string in single or double quotes.
- The values passed to the function are called **arguments**.

```python
print(first_name, 'is', age, 'years old')
```

```output
Ahmed is 42 years old
```

- `print` automatically puts a single space between items to separate them.
- It also moves to a new line at the end.

## Variables must be created before they are used.

- If a variable doesn't exist yet, or if the name has been misspelled,
  Python reports an error. (Unlike some languages, which "guess" a default value.)

```python
print(last_name)
```

```error
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
 in ()
----> 1 print(last_name)

NameError: name 'last_name' is not defined
```

- The last line of an error message is usually the most informative.
- We will look at error messages in detail [later](17-scope.md#reading-error-messages).

::::::::::::::::::::::::::::::::::::::::: callout

## Variables Persist Between Cells

Be aware that it is the *order* of execution of cells that is important in a Jupyter notebook, not the order
in which they appear. Python will remember *all* the code that was run previously, including any variables you have
defined, irrespective of the order in the notebook. Therefore, if you define variables lower down the notebook and then
(re)run cells further up, those defined further down will still be present. As an example, create two cells with the
following content, in this order:

```python
print(myval)
```

```python
myval = 1
```

If you execute this in order, the first cell will give an error. However, if you run the first cell *after* the second
cell, it will print out `1`. To prevent confusion, it can be helpful to use the `Kernel` -> `Restart & Run All` option, which
clears the interpreter and runs everything from a clean slate going top to bottom.


::::::::::::::::::::::::::::::::::::::::::::::::::

## Variables can be used in calculations.

- We can use variables in calculations just as if they were values.
  - Remember, we assigned the value `42` to `age` a few lines ago.

```python
age = age + 3
print('Age in three years:', age)
```

```output
Age in three years: 45
```

## Use an index to get a single character from a string.

- The characters (individual letters, numbers, and so on) in a string are
  ordered. For example, the string `'AB'` is not the same as `'BA'`. Because of
  this ordering, we can treat the string as a list of characters.
- Each position in the string (first, second, etc.) is given a number. This
  number is called an **index** or sometimes a subscript.
- Indices are numbered from 0.
- Use the position's index in square brackets to get the character at that
  position.

![A line of Python code, print(atom\_name[0]), demonstrates that using the zero index will output just the initial letter, in this case 'h' for helium.](fig/2_indexing.svg)

```python
atom_name = 'helium'
print(atom_name[0])
```

```output
h
```

## Use a slice to get a substring.

- A part of a string is called a **substring**. A substring can be as short as a
  single character.
- An item in a list is called an element. Whenever we treat a string as if it
  were a list, the string's elements are its individual characters.
- A slice is a part of a string (or, more generally, a part of any list-like thing).
- We take a slice with the notation `[start:stop]`, where `start` is the integer
  index of the first element we want and `stop` is the integer index of
  the element *just after* the last element we want.
- The difference between `stop` and `start` is the slice's length.
- Taking a slice does not change the contents of the original string. Instead,
  taking a slice returns a copy of part of the original string.

```python
atom_name = 'sodium'
print(atom_name[0:3])
```

```output
sod
```

## Use the built-in function `len` to find the length of a string.

```python
print(len('helium'))
```

```output
6
```

- Nested function calls, such as `print(len('helium'))`, are evaluated from the inside out,
  like in mathematics.

## Python is case-sensitive.

- Python thinks that upper- and lower-case letters are different,
  so `Name` and `name` are different variables.
- There are conventions for using upper-case letters at the start of variable names, so we will use lower-case letters for now.

## Use meaningful variable names.

- Python doesn't care what you call variables as long as they obey the rules
  (alphanumeric characters and the underscore).

```python
flabadab = 42
ewr_422_yY = 'Ahmed'
print(ewr_422_yY, 'is', flabadab, 'years old')
```

- Use meaningful variable names to help other people understand what the program does.
- The most important "other person" is your future self.

::::::::::::::::::::::::::::::::::::::: challenge

## Swapping Values

Fill in the table showing the values of the variables in this program
*after* each statement is executed.

```python
# Command  # Value of x   # Value of y   # Value of swap #
x = 1.0    #              #              #               #
y = 3.0    #              #              #               #
swap = x   #              #              #               #
x = y      #              #              #               #
y = swap   #              #              #               #
```

::::::::::::::: solution

## Solution

```output
# Command  # Value of x   # Value of y   # Value of swap #
x = 1.0    # 1.0          # not defined  # not defined   #
y = 3.0    # 1.0          # 3.0          # not defined   #
swap = x   # 1.0          # 3.0          # 1.0           #
x = y      # 3.0          # 3.0          # 1.0           #
y = swap   # 3.0          # 1.0          # 1.0           #
```

The last three lines exchange the values in `x` and `y` using the `swap`
variable for temporary storage. This is a fairly common programming idiom.



:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::: challenge

## Predicting Values

What is the final value of `position` in the program below?
(Try to predict the value without running the program,
then check your prediction.)

```python
initial = 'left'
position = initial
initial = 'right'
```

::::::::::::::: solution

## Solution

```python
print(position)
```

```output
left
```

The `initial` variable is assigned the value `'left'`.
In the second line, the `position` variable also receives
the string value `'left'`. In the third line, the `initial` variable is given the
value `'right'`, but the `position` variable retains its string value
of `'left'`.



:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::: challenge

## Challenge

If you assign `a = 123`,
what happens if you try to get the second digit of `a` via `a[1]`?
+ +::::::::::::::: solution + +## Solution + +Numbers are not strings or sequences and Python will raise an error if you try to perform an index operation on a +number. In the [next lesson on types and type conversion](03-types-conversion.md) +we will learn more about types and how to convert between different types. If you want the Nth digit of a number you +can convert it into a string using the `str` built-in function and then perform an index operation on that string. + +```python +a = 123 +print(a[1]) +``` + +```error +TypeError: 'int' object is not subscriptable +``` + +```python +a = str(123) +print(a[1]) +``` + +```output +2 +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Choosing a Name + +Which is a better variable name, `m`, `min`, or `minutes`? +Why? +Hint: think about which code you would rather inherit +from someone who is leaving the lab: + +1. `ts = m * 60 + s` +2. `tot_sec = min * 60 + sec` +3. `total_seconds = minutes * 60 + seconds` + +::::::::::::::: solution + +## Solution + +`minutes` is better because `min` might mean something like "minimum" +(and actually is an existing built-in function in Python that we will cover later). + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Slicing practice + +What does the following program print? + +```python +atom_name = 'carbon' +print('atom_name[1:3] is:', atom_name[1:3]) +``` + +::::::::::::::: solution + +## Solution + +```output +atom_name[1:3] is: ar +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Slicing concepts + +Given the following string: + +```python +species_name = "Acacia buxifolia" +``` + +What would these expressions return? + +1. `species_name[2:8]` +2. 
`species_name[11:]` (without a value after the colon) +3. `species_name[:4]` (without a value before the colon) +4. `species_name[:]` (just a colon) +5. `species_name[11:-3]` +6. `species_name[-5:-3]` +7. What happens when you choose a `stop` value which is out of range? (e.g., try `species_name[0:20]` or `species_name[:103]`) + +::::::::::::::: solution + +## Solutions + +1. `species_name[2:8]` returns the substring `'acia b'` +2. `species_name[11:]` returns the substring `'folia'`, from position 11 until the end +3. `species_name[:4]` returns the substring `'Acac'`, from the start up to but not including position 4 +4. `species_name[:]` returns the entire string `'Acacia buxifolia'` +5. `species_name[11:-3]` returns the substring `'fo'`, from position 11 up to but not including the third-last position +6. `species_name[-5:-3]` also returns the substring `'fo'`, from the fifth-last position up to but not including the third-last position +7. If a part of the slice is out of range, the operation does not fail. `species_name[0:20]` gives the same result as `species_name[0:]`, and `species_name[:103]` gives the same result as `species_name[:]` + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Use variables to store values. +- Use `print` to display values. +- Variables persist between cells. +- Variables must be created before they are used. +- Variables can be used in calculations. +- Use an index to get a single character from a string. +- Use a slice to get a substring. +- Use the built-in function `len` to find the length of a string. +- Python is case-sensitive. +- Use meaningful variable names.
+ +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/03-types-conversion.md b/03-types-conversion.md new file mode 100644 index 000000000..f20125577 --- /dev/null +++ b/03-types-conversion.md @@ -0,0 +1,499 @@ +--- +title: Data Types and Type Conversion +teaching: 10 +exercises: 10 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Explain key differences between integers and floating point numbers. +- Explain key differences between numbers and character strings. +- Use built-in functions to convert between integers, floating point numbers, and strings. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- What kinds of data do programs store? +- How can I convert one type to another? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Every value has a type. + +- Every value in a program has a specific type. +- Integer (`int`): represents positive or negative whole numbers like 3 or -512. +- Floating point number (`float`): represents real numbers like 3.14159 or -2.5. +- Character string (usually called "string", `str`): text. + - Written in either single quotes or double quotes (as long as they match). + - The quote marks aren't printed when the string is displayed. + +## Use the built-in function `type` to find the type of a value. + +- Use the built-in function `type` to find out what type a value has. +- Works on variables as well. + - But remember: the *value* has the type --- the *variable* is just a label. + +```python +print(type(52)) +``` + +```output +<class 'int'> +``` + +```python +fitness = 'average' +print(type(fitness)) +``` + +```output +<class 'str'> +``` + +## Types control what operations (or methods) can be performed on a given value. + +- A value's type determines what the program can do to it.
+ +```python +print(5 - 3) +``` + +```output +2 +``` + +```python +print('hello' - 'h') +``` + +```error +--------------------------------------------------------------------------- +TypeError Traceback (most recent call last) + in () +----> 1 print('hello' - 'h') + +TypeError: unsupported operand type(s) for -: 'str' and 'str' +``` + +## You can use the "+" and "\*" operators on strings. + +- "Adding" character strings concatenates them. + +```python +full_name = 'Ahmed' + ' ' + 'Walsh' +print(full_name) +``` + +```output +Ahmed Walsh +``` + +- Multiplying a character string by an integer *N* creates a new string that consists of that character string repeated *N* times. + - Since multiplication is repeated addition. + +```python +separator = '=' * 10 +print(separator) +``` + +```output +========== +``` + +## Strings have a length (but numbers don't). + +- The built-in function `len` counts the number of characters in a string. + +```python +print(len(full_name)) +``` + +```output +11 +``` + +- But numbers don't have a length (not even zero). + +```python +print(len(52)) +``` + +```error +--------------------------------------------------------------------------- +TypeError Traceback (most recent call last) + in () +----> 1 print(len(52)) + +TypeError: object of type 'int' has no len() +``` + +## Must convert numbers to strings or vice versa when operating on them. {#convert-numbers-and-strings} + +- Cannot add numbers and strings. + +```python +print(1 + '2') +``` + +```error +--------------------------------------------------------------------------- +TypeError Traceback (most recent call last) + in () +----> 1 print(1 + '2') + +TypeError: unsupported operand type(s) for +: 'int' and 'str' +``` + +- Not allowed because it's ambiguous: should `1 + '2'` be `3` or `'12'`? +- Some types can be converted to other types by using the type name as a function. 
+ +```python +print(1 + int('2')) +print(str(1) + '2') +``` + +```output +3 +12 +``` + +## Can mix integers and floats freely in operations. + +- Integers and floating-point numbers can be mixed in arithmetic. + - Python 3 automatically converts integers to floats as needed. + +```python +print('half is', 1 / 2.0) +print('three squared is', 3.0 ** 2) +``` + +```output +half is 0.5 +three squared is 9.0 +``` + +## Variables only change value when something is assigned to them. + +- If we make one cell in a spreadsheet depend on another, + and update the latter, + the former updates automatically. +- This does **not** happen in Python. + +```python +variable_one = 1 +variable_two = 5 * variable_one +variable_one = 2 +print('first is', variable_one, 'and second is', variable_two) +``` + +```output +first is 2 and second is 5 +``` + +- The computer reads the value of `variable_one` when doing the multiplication, + creates a new value, and assigns it to `variable_two`. +- Afterwards, the value of `variable_two` is set to the new value and *not dependent on `variable_one`*, so its value + does not automatically change when `variable_one` changes. + +::::::::::::::::::::::::::::::::::::::: challenge + +## Fractions + +What type of value is 3.4? +How can you find out? + +::::::::::::::: solution + +## Solution + +It is a floating-point number (often abbreviated "float"). +It is possible to find out by using the built-in function `type()`. + +```python +print(type(3.4)) +``` + +```output +<class 'float'> +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Automatic Type Conversion + +What type of value is 3.25 + 4? + +::::::::::::::: solution + +## Solution + +It is a float: +integers are automatically converted to floats as necessary.
+ +```python +result = 3.25 + 4 +print(result, 'is', type(result)) +``` + +```output +7.25 is <class 'float'> +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Choose a Type + +What type of value (integer, floating point number, or character string) +would you use to represent each of the following? Try to come up with more than one good answer for each problem. For example, in #1, when would counting days with a floating point variable make more sense than using an integer? + +1. Number of days since the start of the year. +2. Time elapsed from the start of the year until now in days. +3. Serial number of a piece of lab equipment. +4. A lab specimen's age. +5. Current population of a city. +6. Average population of a city over time. + +::::::::::::::: solution + +## Solution + +The answers to the questions are: + +1. Integer, since the number of days is a whole number (between 1 and 366, allowing for leap years). +2. Floating point, since fractional days are required. +3. Character string if the serial number contains letters and numbers, otherwise integer if the serial number consists only of numerals. +4. This will vary! How do you define a specimen's age? Whole days since collection (integer)? Date and time (string)? +5. Choose floating point to represent population as large aggregates (e.g., millions), or integer to represent population in units of individuals. +6. Floating point number, since an average is likely to have a fractional part.
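
For items 1 and 2, a minimal sketch may help show how the same span of time can reasonably be stored as either an integer or a float. It assumes the standard-library `datetime` module, which is not otherwise covered in this episode. It also uses fixed, made-up dates for the start of the year and for "now", so the result does not depend on when you run it:

```python
from datetime import datetime

# Hypothetical, fixed dates so the output is reproducible.
start_of_year = datetime(2024, 1, 1)
now = datetime(2024, 3, 15, 18, 0)  # 15 March 2024 at 18:00

elapsed = now - start_of_year  # subtracting datetimes gives a timedelta object

# Completed whole days since the start of the year: an integer.
days_whole = elapsed.days

# Time elapsed in days, including the fraction of the current day: a float.
days_fractional = elapsed.total_seconds() / (24 * 60 * 60)

print(days_whole, type(days_whole))
print(days_fractional, type(days_fractional))
```

With these dates, the first line should show `74` as an `int`. The second should show `74.75` as a `float`, because 18:00 is three quarters of the way through the day.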
+ + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Division Types + +In Python 3, the `//` operator performs integer (whole-number) floor division, the `/` operator performs floating-point +division, and the `%` (or *modulo*) operator calculates and returns the remainder from integer division: + +```python +print('5 // 3:', 5 // 3) +print('5 / 3:', 5 / 3) +print('5 % 3:', 5 % 3) +``` + +```output +5 // 3: 1 +5 / 3: 1.6666666666666667 +5 % 3: 2 +``` + +If `num_subjects` is the number of subjects taking part in a study, +and `num_per_survey` is the number that can take part in a single survey, +write an expression that calculates the number of surveys needed +to reach everyone once. + +::::::::::::::: solution + +## Solution + +We want the minimum number of surveys that reaches everyone once, which is +the rounded-up value of `num_subjects / num_per_survey`. This is +equivalent to performing a floor division with `//` and adding 1. Before +the division, we need to subtract 1 from the number of subjects to deal with +the case where `num_subjects` is evenly divisible by `num_per_survey`. + +```python +num_subjects = 600 +num_per_survey = 42 +num_surveys = (num_subjects - 1) // num_per_survey + 1 + +print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys) +``` + +```output +600 subjects, 42 per survey: 15 +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Strings to Numbers + +Where reasonable, `float()` will convert a string to a floating point number, +and `int()` will convert a floating point number to an integer by discarding its fractional part: + +```python +print("string to float:", float("3.4")) +print("float to int:", int(3.4)) +``` + +```output +string to float: 3.4 +float to int: 3 +``` + +If the conversion doesn't make sense, however, Python will report an error.
+ +```python +print("string to float:", float("Hello world!")) +``` + +```error +--------------------------------------------------------------------------- +ValueError Traceback (most recent call last) + in +----> 1 print("string to float:", float("Hello world!")) + +ValueError: could not convert string to float: 'Hello world!' +``` + +Given this information, what do you expect the following program to do? + +What does it actually do? + +Why do you think it does that? + +```python +print("fractional string to int:", int("3.4")) +``` + +::::::::::::::: solution + +## Solution + +What do you expect this program to do? It would not be so unreasonable to expect the Python 3 `int` function to +convert the string "3.4" to 3.4 and then perform an additional type conversion to 3. After all, Python 3 performs a lot of other +magic; isn't that part of its charm? + +```python +int("3.4") +``` + +```error +--------------------------------------------------------------------------- +ValueError Traceback (most recent call last) + in +----> 1 int("3.4") +ValueError: invalid literal for int() with base 10: '3.4' +``` + +However, Python 3 throws an error. Why? To be consistent, possibly. If you want Python to perform two consecutive +type conversions, you must write both of them explicitly in code. + +```python +int(float("3.4")) +``` + +```output +3 +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Arithmetic with Different Types + +Which of the following will return the floating point number `2.0`? +Note: there may be more than one right answer. + +```python +first = 1.0 +second = "1" +third = "1.1" +``` + +1. `first + float(second)` +2. `float(second) + float(third)` +3. `first + int(third)` +4. `first + int(float(third))` +5. `int(first) + int(float(third))` +6.
+`2.0 * second` + +::::::::::::::: solution + +## Solution + +Answer: 1 and 4 + +Option 2 gives `2.1`, option 5 gives the integer `2` rather than the float `2.0`, +and options 3 and 6 raise errors. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Complex Numbers + +Python provides complex numbers, +which are written as `1.0+2.0j`. +If `val` is a complex number, +its real and imaginary parts can be accessed using *dot notation* +as `val.real` and `val.imag`. + +```python +a_complex_number = 6 + 2j +print(a_complex_number.real) +print(a_complex_number.imag) +``` + +```output +6.0 +2.0 +``` + +1. Why do you think Python uses `j` instead of `i` for the imaginary part? +2. What do you expect `1 + 2j + 3` to produce? +3. What do you expect `4j` to be? What about `4 j` or `4 + j`? + +::::::::::::::: solution + +## Solution + +1. Standard mathematical treatments typically use `i` to denote the imaginary unit. However, Python follows an early + convention from electrical engineering, where `j` is used instead, and changing it now would be technically expensive. + [Stack Overflow provides additional explanation and + discussion.](https://stackoverflow.com/questions/24812444/why-are-complex-numbers-in-python-denoted-with-j-instead-of-i) +2. `(4+2j)` +3. `4j` is a complex number with a real part of `0` and an imaginary part of `4`. `4 j` raises a `SyntaxError`. + In `4 + j`, `j` is treated as a variable name, so the result depends on whether `j` is defined and, if so, + on its assigned value. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Every value has a type. +- Use the built-in function `type` to find the type of a value. +- Types control what operations can be done on values. +- Strings can be added and multiplied. +- Strings have a length (but numbers don't). +- Must convert numbers to strings or vice versa when operating on them. +- Can mix integers and floats freely in operations. +- Variables only change value when something is assigned to them.
+ +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/04-built-in.md b/04-built-in.md new file mode 100644 index 000000000..e11685be7 --- /dev/null +++ b/04-built-in.md @@ -0,0 +1,424 @@ +--- +title: Built-in Functions and Help +teaching: 15 +exercises: 10 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Explain the purpose of functions. +- Correctly call built-in Python functions. +- Correctly nest calls to built-in functions. +- Use help to display documentation for built-in functions. +- Correctly describe situations in which SyntaxError and NameError occur. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How can I use built-in functions? +- How can I find out what they do? +- What kind of errors can occur in programs? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Use comments to add documentation to programs. + +```python +# This sentence isn't executed by Python. +adjustment = 0.5 # Neither is this - anything after '#' is ignored. +``` + +## A function may take zero or more arguments. + +- We have seen some functions already --- now let's take a closer look. +- An *argument* is a value passed into a function. +- `len` takes exactly one. +- `int`, `str`, and `float` create a new value from an existing one. +- `print` takes zero or more. +- `print` with no arguments prints a blank line. + - Must always use parentheses, even if they're empty, + so that Python knows a function is being called. + +```python +print('before') +print() +print('after') +``` + +```output +before + +after +``` + +## Every function returns something. + +- Every function call produces some result. +- If the function doesn't have a useful result to return, + it usually returns the special value `None`. `None` is a Python + object that stands in anytime there is no value. 
+ +```python +result = print('example') +print('result of print is', result) +``` + +```output +example +result of print is None +``` + +## Commonly-used built-in functions include `max`, `min`, and `round`. + +- Use `max` to find the largest of one or more values. +- Use `min` to find the smallest. +- Both work on character strings as well as numbers. + - "Larger" and "smaller" use the order 0-9 < A-Z < a-z to compare characters. + +```python +print(max(1, 2, 3)) +print(min('a', 'A', '0')) +``` + +```output +3 +0 +``` + +## Functions may only work for certain (combinations of) arguments. + +- `max` and `min` must be given at least one argument. + - "Largest of the empty set" is a meaningless question. +- And they must be given things that can meaningfully be compared. + +```python +print(max(1, 'a')) +``` + +```error +TypeError Traceback (most recent call last) + in +----> 1 print(max(1, 'a')) + +TypeError: '>' not supported between instances of 'str' and 'int' +``` + +## Functions may have default values for some arguments. + +- `round` will round off a floating-point number. +- By default, rounds to zero decimal places. + +```python +round(3.712) +``` + +```output +4 +``` + +- We can specify the number of decimal places we want. + +```python +round(3.712, 1) +``` + +```output +3.7 +``` + +## Functions attached to objects are called methods + +- Functions take another form that will be common in the pandas episodes. +- Methods have parentheses like functions, but are called on a value using a dot after the variable name. +- Some methods are used for internal Python operations, and their names start and end with double underscores. + +```python +my_string = 'Hello world!'
# creation of a string object + +print(len(my_string)) # the len function takes a string as an argument and returns the length of the string + +print(my_string.swapcase()) # calling the swapcase method on the my_string object + +print(my_string.__len__()) # calling the internal __len__ method on the my_string object, used by len(my_string) + +``` + +```output +12 +hELLO WORLD! +12 +``` + +- You might even see them chained together. They are applied from left to right. + +```python +print(my_string.isupper()) # Not all the letters are uppercase +print(my_string.upper()) # This converts all the letters to uppercase + +print(my_string.upper().isupper()) # Now all the letters are uppercase +``` + +```output +False +HELLO WORLD! +True +``` + +## Use the built-in function `help` to get help for a function. + +- Every built-in function has online documentation. + +```python +help(round) +``` + +```output +Help on built-in function round in module builtins: + +round(number, ndigits=None) + Round a number to a given precision in decimal digits. + + The return value is an integer if ndigits is omitted or None. Otherwise + the return value has the same type as the number. ndigits may be negative. +``` + +## The Jupyter Notebook has two ways to get help. + +- Option 1: Place the cursor near where the function is invoked in a cell + (i.e., the function name or its parameters), + - Hold down Shift and press Tab. + - Do this several times to expand the information returned. +- Option 2: Type the function name in a cell with a question mark after it. Then run the cell. + +## Python reports a syntax error when it can't understand the source of a program. + +- Won't even try to run the program if it can't be parsed. + +```python +# Forgot to close the quote marks around the string. +name = 'Feng +``` + +```error + File "", line 2 + name = 'Feng + ^ +SyntaxError: EOL while scanning string literal +``` + +```python +# An extra '=' in the assignment.
+age = = 52 +``` + +```error + File "", line 2 + age = = 52 + ^ +SyntaxError: invalid syntax +``` + +- Look more closely at the error message: + +```python +print("hello world" +``` + +```error + File "", line 1 + print("hello world" + ^ +SyntaxError: unexpected EOF while parsing +``` + +- The message indicates a problem on the first line of the input ("line 1"). + - In this case the "ipython-input" section of the file name tells us that + we are working with input into IPython, + the Python interpreter used by the Jupyter Notebook. +- The `-6-` part of the filename indicates that + the error occurred in cell 6 of our Notebook. +- Next is the problematic line of code, + indicating the problem with a `^` pointer. + +## Python reports a runtime error when something goes wrong while a program is executing. {#runtime-error} + +```python +age = 53 +remaining = 100 - aege # mis-spelled 'age' +``` + +```error +NameError Traceback (most recent call last) + in + 1 age = 53 +----> 2 remaining = 100 - aege # mis-spelled 'age' + +NameError: name 'aege' is not defined +``` + +- Fix syntax errors by reading the source and runtime errors by tracing execution. + +::::::::::::::::::::::::::::::::::::::: challenge + +## What Happens When + +1. Explain in simple terms the order of operations in the following program: + when does the addition happen, when does the subtraction happen, + when is each function called, etc. +2. What is the final value of `radiance`? + +```python +radiance = 1.0 +radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5)) +``` + +::::::::::::::: solution + +## Solution + +1. Order of operations: + 1. `1.1 * radiance = 1.1` + 2. `1.1 - 0.5 = 0.6` + 3. `min(radiance, 0.6) = 0.6` + 4. `2.0 + 0.6 = 2.6` + 5. `max(2.1, 2.6) = 2.6` +2. At the end, `radiance = 2.6` + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Spot the Difference + +1.
Predict what each of the `print` statements in the program below will print. +2. Does `max(len(rich), poor)` run or produce an error message? + If it runs, does its result make any sense? + +```python +easy_string = "abc" +print(max(easy_string)) +rich = "gold" +poor = "tin" +print(max(rich, poor)) +print(max(len(rich), len(poor))) +``` + +::::::::::::::: solution + +## Solution + +```python +print(max(easy_string)) +``` + +```output +c +``` + +```python +print(max(rich, poor)) +``` + +```output +tin +``` + +```python +print(max(len(rich), len(poor))) +``` + +```output +4 +``` + +`max(len(rich), poor)` raises a `TypeError`. It becomes `max(4, 'tin')`, and, +as we discussed earlier, a string and an integer cannot meaningfully be compared. + +```error +TypeError Traceback (most recent call last) + in +----> 1 max(len(rich), poor) + +TypeError: '>' not supported between instances of 'str' and 'int' +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Why Not? + +Why is it that `max` and `min` do not return `None` when they are called with no arguments? + +::::::::::::::: solution + +## Solution + +`max` and `min` raise a `TypeError` in this case because the correct number of arguments +was not supplied. If they just returned `None`, the error would be much harder to trace, as the value +would likely be stored in a variable and used later in the program, only to cause a runtime error +there. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Last Character of a String + +If Python starts counting from zero, +and `len` returns the number of characters in a string, +what index expression will get the last character in the string `name`? +(Note: we will see a simpler way to do this in a later episode.)
+ +::::::::::::::: solution + +## Solution + +`name[len(name) - 1]` + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::::: callout + +## Explore the Python docs! + +The [official Python documentation](https://docs.python.org/3/) is arguably the most complete +source of information about the language. It is available in different languages and contains a lot of useful +resources. The [Built-in Functions page](https://docs.python.org/3/library/functions.html) contains a catalogue of +all of these functions, including the ones that we've covered in this lesson. Some of these are more advanced and +unnecessary at the moment, but others are very simple and useful. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Use comments to add documentation to programs. +- A function may take zero or more arguments. +- Commonly-used built-in functions include `max`, `min`, and `round`. +- Functions may only work for certain (combinations of) arguments. +- Functions may have default values for some arguments. +- Use the built-in function `help` to get help for a function. +- The Jupyter Notebook has two ways to get help. +- Every function returns something. +- Python reports a syntax error when it can't understand the source of a program. +- Python reports a runtime error when something goes wrong while a program is executing. +- Fix syntax errors by reading the source code, and runtime errors by tracing the program's execution. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/05-coffee.md b/05-coffee.md new file mode 100644 index 000000000..366017b40 --- /dev/null +++ b/05-coffee.md @@ -0,0 +1,16 @@ +--- +title: Morning Coffee +teaching: 0 +exercises: 0 +break: 15 +--- + +## Reflection exercise + +Over coffee, reflect on and discuss the following: + +- What are the different kinds of errors Python will report? 
+- Did the code always produce the results you expected? If not, why? +- Is there something we can do to prevent errors when we write code? + + diff --git a/06-libraries.md b/06-libraries.md new file mode 100644 index 000000000..b6d23d4b4 --- /dev/null +++ b/06-libraries.md @@ -0,0 +1,470 @@ +--- +title: Libraries +teaching: 10 +exercises: 10 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Explain what software libraries are and why programmers create and use them. +- Write programs that import and use modules from Python's standard library. +- Find and read documentation for the standard library interactively (in the interpreter) and online. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How can I use software that other people have written? +- How can I find out what that software does? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Most of the power of a programming language is in its libraries. + +- A *library* is a collection of files (called *modules*) that contains + functions for use by other programs. + - May also contain data values (e.g., numerical constants) and other things. + - A library's contents are supposed to be related, but there's no way to enforce that. +- The Python [standard library][stdlib] is an extensive suite of modules that comes + with Python itself. +- Many additional libraries are available from [PyPI][pypi] (the Python Package Index). +- We will see later how to write new libraries. + +::::::::::::::::::::::::::::::::::::::::: callout + +## Libraries and modules + +A library is a collection of modules, but the terms are often used +interchangeably, especially since many libraries only consist of a single +module, so don't worry if you mix them. + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## A program must import a library module before using it. + +- Use `import` to load a library module into a program's memory.
+- Then refer to things from the module as `module_name.thing_name`. + - Python uses `.` to mean "part of". +- Using `math`, one of the modules in the standard library: + +```python +import math + +print('pi is', math.pi) +print('cos(pi) is', math.cos(math.pi)) +``` + +```output +pi is 3.141592653589793 +cos(pi) is -1.0 +``` + +- Have to refer to each item with the module's name. + - `math.cos(pi)` won't work: the reference to `pi` + doesn't somehow "inherit" the function's reference to `math`. + +## Use `help` to learn about the contents of a library module. + +- Works just like help for a function. + +```python +help(math) +``` + +```output +Help on module math: + +NAME + math + +MODULE REFERENCE + http://docs.python.org/3/library/math + + The following documentation is automatically generated from the Python + source files. It may be incomplete, incorrect or include features that + are considered implementation detail and may vary between Python + implementations. When in doubt, consult the module reference at the + location listed above. + +DESCRIPTION + This module is always available. It provides access to the + mathematical functions defined by the C standard. + +FUNCTIONS + acos(x, /) + Return the arc cosine (measured in radians) of x. +⋮ ⋮ ⋮ +``` + +## Import specific items from a library module to shorten programs. + +- Use `from ... import ...` to load only specific items from a library module. +- Then refer to them directly without library name as prefix. + +```python +from math import cos, pi + +print('cos(pi) is', cos(pi)) +``` + +```output +cos(pi) is -1.0 +``` + +## Create an alias for a library module when importing it to shorten programs. + +- Use `import ... as ...` to give a library a short *alias* while importing it. +- Then refer to items in the library using that shortened name. 
+ +```python +import math as m + +print('cos(pi) is', m.cos(m.pi)) +``` + +```output +cos(pi) is -1.0 +``` + +- Commonly used for libraries that are frequently used or have long names. + - E.g., the `matplotlib` plotting library is often aliased as `mpl`. +- But can make programs harder to understand, + since readers must learn your program's aliases. + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exploring the Math Module + +1. What function from the `math` module can you use to calculate a square root + *without* using `sqrt`? +2. Since the library contains this function, why does `sqrt` exist? + +::::::::::::::: solution + +## Solution + +1. Using `help(math)` we see that we've got `pow(x,y)` in addition to `sqrt(x)`, + so we could use `pow(x, 0.5)` to find a square root. + +2. The `sqrt(x)` function is arguably more readable than `pow(x, 0.5)` when + implementing equations. Readability is a cornerstone of good programming, so it + makes sense to provide a special function for this specific common case. + + Also, the design of Python's `math` library has its origin in the C standard, + which includes both `sqrt(x)` and `pow(x,y)`, so a little bit of the history + of programming is showing in Python's function names. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Locating the Right Module + +You want to select a random character from a string: + +```python +bases = 'ACTTGCTTGAC' +``` + +1. Which [standard library][stdlib] module could help you? +2. Which function would you select from that module? Are there alternatives? +3. Try to write a program that uses the function. + +::::::::::::::: solution + +## Solution + +The [random module][randommod] seems like it could help. + +The string has 11 characters, each having a positional index from 0 to 10. 
+You could use the [`random.randrange`](https://docs.python.org/3/library/random.html#random.randrange) +or [`random.randint`](https://docs.python.org/3/library/random.html#random.randint) functions +to get a random integer between 0 and 10, and then select the `bases` character at that index: + +```python +from random import randrange + +random_index = randrange(len(bases)) +print(bases[random_index]) +``` + +or more compactly: + +```python +from random import randrange + +print(bases[randrange(len(bases))]) +``` + +Perhaps you found the [`random.sample`](https://docs.python.org/3/library/random.html#random.sample) function? +It allows for slightly less typing but might be a bit harder to understand just by reading: + +```python +from random import sample + +print(sample(bases, 1)[0]) +``` + +Note that this function returns a list of values. We will learn about +lists in [episode 11](11-lists.md). + +The simplest and shortest solution is the [`random.choice`](https://docs.python.org/3/library/random.html#random.choice) +function that does exactly what we want: + +```python +from random import choice + +print(choice(bases)) +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Jigsaw Puzzle (Parsons Problem) Programming Example + +Rearrange the following statements so that a random +DNA base and its index in the string are printed. +You may not need all of the statements. Feel free to use/add +intermediate variables.
+ +```python +bases="ACTTGCTTGAC" +import math +import random +___ = random.randrange(n_bases) +___ = len(bases) +print("random base ", bases[___], "base index", ___) +``` + +::::::::::::::: solution + +## Solution + +```python +import math +import random +bases = "ACTTGCTTGAC" +n_bases = len(bases) +idx = random.randrange(n_bases) +print("random base", bases[idx], "base index", idx) +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## When Is Help Available? + +When a colleague of yours types `help(math)`, +Python reports an error: + +```error +NameError: name 'math' is not defined +``` + +What has your colleague forgotten to do? + +::::::::::::::: solution + +## Solution + +Importing the math module (`import math`) + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Importing With Aliases + +1. Fill in the blanks so that the program below prints `90.0`. +2. Rewrite the program so that it uses `import` *without* `as`. +3. Which form do you find easier to read? + +```python +import math as m +angle = ____.degrees(____.pi / 2) +print(____) +``` + +::::::::::::::: solution + +## Solution + +```python +import math as m +angle = m.degrees(m.pi / 2) +print(angle) +``` + +can be written as + +```python +import math +angle = math.degrees(math.pi / 2) +print(angle) +``` + +Since you just wrote the code and are familiar with it, you might actually +find the first version easier to read. But when trying to read a huge piece +of code written by someone else, or when getting back to your own huge piece +of code after several months, non-abbreviated names are often easier, except +where there are clear abbreviation conventions. 
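One way to check that the two versions really are interchangeable is to confirm that the alias and the full name point to the same module object. A minimal sketch, not part of the original exercise:

```python
import math
import math as m

# An alias is just a second name bound to the same module object,
# so `m is math` evaluates to True.
print(m is math)

# Both spellings therefore compute exactly the same value.
angle_alias = m.degrees(m.pi / 2)
angle_full = math.degrees(math.pi / 2)
print(angle_alias == angle_full)
```

Both lines should print `True`, so choosing between `m` and `math` is purely a question of readability.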
+ + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## There Are Many Ways To Import Libraries! + +Match the following print statements with the appropriate library calls. + +Print commands: + +1. `print("sin(pi/2) =", sin(pi/2))` +2. `print("sin(pi/2) =", m.sin(m.pi/2))` +3. `print("sin(pi/2) =", math.sin(math.pi/2))` + +Library calls: + +1. `from math import sin, pi` +2. `import math` +3. `import math as m` +4. `from math import *` + +::::::::::::::: solution + +## Solution + +1. Library calls 1 and 4. In order to directly refer to `sin` and `pi` without +   the library name as prefix, you need to use the `from ... import ...` +   statement. Whereas library call 1 specifically imports the function `sin` and +   the constant `pi`, library call 4 imports all public names in the `math` module. +2. Library call 3. Here `sin` and `pi` are referred to with a shortened library +   name `m` instead of `math`. Library call 3 does exactly that using the +   `import ... as ...` syntax: it creates an alias for `math` in the form of +   the shortened name `m`. +3. Library call 2. Here `sin` and `pi` are referred to with the regular library +   name `math`, so the regular `import ...` call suffices. + +**Note:** although library call 4 works, importing all names from a module using a wildcard +import is [not recommended][pep8-imports] as it makes it unclear which names from the module +are used in the code. In general it is best to make your imports as specific as possible and to +only import what your code uses. In library call 1, the `import` statement explicitly tells us +that the `sin` function is imported from the `math` module, but library call 4 does not +convey this information. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Importing Specific Items + +1.
Fill in the blanks so that the program below prints `90.0`. +2. Do you find this version easier to read than preceding ones? +3. Why *wouldn't* programmers always use this form of `import`? + +```python +____ math import ____, ____ +angle = degrees(pi / 2) +print(angle) +``` + +::::::::::::::: solution + +## Solution + +```python +from math import degrees, pi +angle = degrees(pi / 2) +print(angle) +``` + +Most likely you find this version easier to read since it's less dense. +The main reason not to use this form of import is to avoid name clashes. +For instance, you wouldn't import `degrees` this way if you also wanted to +use the name `degrees` for a variable or function of your own. Or if you +were to also import a function named `degrees` from another library. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Reading Error Messages + +1. Read the code below and try to identify what the errors are without running it. +2. Run the code, and read the error message. What type of error is it? + +```python +from math import log +log(0) +``` + +::::::::::::::: solution + +## Solution + +```output +--------------------------------------------------------------------------- +ValueError Traceback (most recent call last) + in + 1 from math import log +----> 2 log(0) + +ValueError: math domain error +``` + +1. The logarithm of `x` is only defined for `x > 0`, so 0 is outside the + domain of the function. +2. You get an error of type `ValueError`, indicating that the function + received an inappropriate argument value. The additional message + "math domain error" makes it clearer what the problem is. 
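You can see that the problem lies in the argument value rather than in `log` itself by comparing a valid call with the failing one. This sketch goes slightly beyond the lesson by using a `try`/`except` block, which catches the `ValueError` so the program keeps running instead of stopping:

```python
from math import log

# 1 lies inside the domain (x > 0), so this works and prints 0.0.
print(log(1))

# 0 lies outside the domain, so log raises a ValueError;
# the except clause catches it and prints its message instead.
try:
    log(0)
    error_message = None
except ValueError as error:
    error_message = str(error)
    print('ValueError:', error_message)
```

The exact wording of the message may vary between Python versions, but the error type is `ValueError`.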
+ + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +[stdlib]: https://docs.python.org/3/library/ +[pypi]: https://pypi.python.org/pypi/ +[randommod]: https://docs.python.org/3/library/random.html +[pep8-imports]: https://pep8.org/#imports + + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Most of the power of a programming language is in its libraries. +- A program must import a library module in order to use it. +- Use `help` to learn about the contents of a library module. +- Import specific items from a library to shorten programs. +- Create an alias for a library when importing it to shorten programs. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/07-reading-tabular.md b/07-reading-tabular.md new file mode 100644 index 000000000..d4cc4c37e --- /dev/null +++ b/07-reading-tabular.md @@ -0,0 +1,442 @@ +--- +title: Reading Tabular Data into DataFrames +teaching: 10 +exercises: 10 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Import the Pandas library. +- Use Pandas to load a simple CSV data set. +- Get some basic information about a Pandas DataFrame. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How can I read tabular data? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Use the Pandas library to do statistics on tabular data. + +- [Pandas](https://pandas.pydata.org/) is a widely-used Python library for statistics, particularly on tabular data. +- Borrows many features from R's dataframes. + - A 2-dimensional table whose columns have names + and potentially have different data types. +- Load Pandas with `import pandas as pd`. The alias `pd` is commonly used to refer to the Pandas library in code. +- Read a Comma Separated Values (CSV) data file with `pd.read_csv`. + - Argument is the name of the file to be read. 
+ - Returns a dataframe that you can assign to a variable + +```python +import pandas as pd + +data_oceania = pd.read_csv('data/gapminder_gdp_oceania.csv') +print(data_oceania) +``` + +```output + country gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 \ +0 Australia 10039.59564 10949.64959 12217.22686 +1 New Zealand 10556.57566 12247.39532 13175.67800 + + gdpPercap_1967 gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 \ +0 14526.12465 16788.62948 18334.19751 19477.00928 +1 14463.91893 16046.03728 16233.71770 17632.41040 + + gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 \ +0 21888.88903 23424.76683 26997.93657 30687.75473 +1 19007.19129 18363.32494 21050.41377 23189.80135 + + gdpPercap_2007 +0 34435.36744 +1 25185.00911 +``` + +- The columns in a dataframe are the observed variables, and the rows are the observations. +- Pandas uses backslash `\` to show wrapped lines when output is too wide to fit the screen. +- Using descriptive dataframe names helps us distinguish between multiple dataframes so we won't accidentally overwrite a dataframe or read from the wrong one. + +::::::::::::::::::::::::::::::::::::::::: callout + +## File Not Found + +Our lessons store their data files in a `data` sub-directory, +which is why the path to the file is `data/gapminder_gdp_oceania.csv`. +If you forget to include `data/`, +or if you include it but your copy of the file is somewhere else, +you will get a [runtime error](04-built-in.md) +that ends with a line like this: + +```error +FileNotFoundError: [Errno 2] No such file or directory: 'data/gapminder_gdp_oceania.csv' +``` + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Use `index_col` to specify that a column's values should be used as row headings. + +- Row headings are numbers (0 and 1 in this case). +- Really want to index by country. +- Pass the name of the column to `read_csv` as its `index_col` parameter to do this. 
+- Naming the dataframe `data_oceania_country` tells us which region the data includes (`oceania`) and how it is indexed (`country`). + +```python +data_oceania_country = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country') +print(data_oceania_country) +``` + +```output +             gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967  \ +country +Australia       10039.59564     10949.64959     12217.22686     14526.12465 +New Zealand     10556.57566     12247.39532     13175.67800     14463.91893 + +             gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  gdpPercap_1987  \ +country +Australia       16788.62948     18334.19751     19477.00928     21888.88903 +New Zealand     16046.03728     16233.71770     17632.41040     19007.19129 + +             gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  gdpPercap_2007 +country +Australia       23424.76683     26997.93657     30687.75473     34435.36744 +New Zealand     18363.32494     21050.41377     23189.80135     25185.00911 +``` + +## Use the `DataFrame.info()` method to find out more about a dataframe. + +```python +data_oceania_country.info() +``` + +```output +<class 'pandas.core.frame.DataFrame'> +Index: 2 entries, Australia to New Zealand +Data columns (total 12 columns): +gdpPercap_1952    2 non-null float64 +gdpPercap_1957    2 non-null float64 +gdpPercap_1962    2 non-null float64 +gdpPercap_1967    2 non-null float64 +gdpPercap_1972    2 non-null float64 +gdpPercap_1977    2 non-null float64 +gdpPercap_1982    2 non-null float64 +gdpPercap_1987    2 non-null float64 +gdpPercap_1992    2 non-null float64 +gdpPercap_1997    2 non-null float64 +gdpPercap_2002    2 non-null float64 +gdpPercap_2007    2 non-null float64 +dtypes: float64(12) +memory usage: 208.0+ bytes +``` + +- This is a `DataFrame`. +- Two rows named `'Australia'` and `'New Zealand'`. +- Twelve columns, each of which has two actual 64-bit floating point values. +  - We will talk later about null values, which are used to represent missing observations. +- Uses 208 bytes of memory. + +## The `DataFrame.columns` variable stores information about the dataframe's columns. + +- Note that this is data, *not* a method.
(It doesn't have parentheses.) + - Like `math.pi`. + - So do not use `()` to try to call it. +- Called a *member variable*, or just *member*. + +```python +print(data_oceania_country.columns) +``` + +```output +Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967', + 'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987', + 'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'], + dtype='object') +``` + +## Use `DataFrame.T` to transpose a dataframe. + +- Sometimes want to treat columns as rows and vice versa. +- Transpose (written `.T`) doesn't copy the data, just changes the program's view of it. +- Like `columns`, it is a member variable. + +```python +print(data_oceania_country.T) +``` + +```output +country Australia New Zealand +gdpPercap_1952 10039.59564 10556.57566 +gdpPercap_1957 10949.64959 12247.39532 +gdpPercap_1962 12217.22686 13175.67800 +gdpPercap_1967 14526.12465 14463.91893 +gdpPercap_1972 16788.62948 16046.03728 +gdpPercap_1977 18334.19751 16233.71770 +gdpPercap_1982 19477.00928 17632.41040 +gdpPercap_1987 21888.88903 19007.19129 +gdpPercap_1992 23424.76683 18363.32494 +gdpPercap_1997 26997.93657 21050.41377 +gdpPercap_2002 30687.75473 23189.80135 +gdpPercap_2007 34435.36744 25185.00911 +``` + +## Use `DataFrame.describe()` to get summary statistics about data. + +`DataFrame.describe()` gets the summary statistics of only the columns that have numerical data. +All other columns are ignored, unless you use the argument `include='all'`. 
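To see what `include='all'` changes, here is a minimal sketch on a small dataframe built by hand. The two GDP values are taken from the Oceania data shown above; the `continent` text column is a hypothetical addition for illustration, since the Oceania file does not contain one:

```python
import pandas as pd

# Two GDP values taken from the Oceania data, plus a hypothetical
# text column ('continent') that the Oceania file does not contain.
df = pd.DataFrame(
    {
        'continent': ['Oceania', 'Oceania'],
        'gdpPercap_2007': [34435.36744, 25185.00911],
    },
    index=['Australia', 'New Zealand'],
)

# By default, only the numeric column is summarised.
numeric_summary = df.describe()
print(numeric_summary)

# include='all' also summarises the text column, adding rows such as
# 'unique', 'top' and 'freq' (NaN where a statistic does not apply).
full_summary = df.describe(include='all')
print(full_summary)
```

In `numeric_summary` the `continent` column is absent. In `full_summary` it appears, with `unique` equal to 1 and `top` equal to `'Oceania'`.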
+ +```python +print(data_oceania_country.describe()) +``` + +```output + gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 \ +count 2.000000 2.000000 2.000000 2.000000 +mean 10298.085650 11598.522455 12696.452430 14495.021790 +std 365.560078 917.644806 677.727301 43.986086 +min 10039.595640 10949.649590 12217.226860 14463.918930 +25% 10168.840645 11274.086022 12456.839645 14479.470360 +50% 10298.085650 11598.522455 12696.452430 14495.021790 +75% 10427.330655 11922.958888 12936.065215 14510.573220 +max 10556.575660 12247.395320 13175.678000 14526.124650 + + gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 \ +count 2.00000 2.000000 2.000000 2.000000 +mean 16417.33338 17283.957605 18554.709840 20448.040160 +std 525.09198 1485.263517 1304.328377 2037.668013 +min 16046.03728 16233.717700 17632.410400 19007.191290 +25% 16231.68533 16758.837652 18093.560120 19727.615725 +50% 16417.33338 17283.957605 18554.709840 20448.040160 +75% 16602.98143 17809.077557 19015.859560 21168.464595 +max 16788.62948 18334.197510 19477.009280 21888.889030 + + gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007 +count 2.000000 2.000000 2.000000 2.000000 +mean 20894.045885 24024.175170 26938.778040 29810.188275 +std 3578.979883 4205.533703 5301.853680 6540.991104 +min 18363.324940 21050.413770 23189.801350 25185.009110 +25% 19628.685413 22537.294470 25064.289695 27497.598692 +50% 20894.045885 24024.175170 26938.778040 29810.188275 +75% 22159.406358 25511.055870 28813.266385 32122.777857 +max 23424.766830 26997.936570 30687.754730 34435.367440 +``` + +- Not particularly useful with just two records, + but very helpful when there are thousands. + +::::::::::::::::::::::::::::::::::::::: challenge + +## Reading Other Data + +Read the data in `gapminder_gdp_americas.csv` +(which should be in the same directory as `gapminder_gdp_oceania.csv`) +into a variable called `data_americas` +and display its summary statistics. 
+ +::::::::::::::: solution + +## Solution + +To read in a CSV, we use `pd.read_csv` and pass the filename `'data/gapminder_gdp_americas.csv'` to it. +We also once again pass the column name `'country'` to the parameter `index_col` in order to index by country. +The summary statistics can be displayed with the `DataFrame.describe()` method. + +```python +data_americas = pd.read_csv('data/gapminder_gdp_americas.csv', index_col='country') +data_americas.describe() +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Inspecting Data + +After reading the data for the Americas, +use `help(data_americas.head)` and `help(data_americas.tail)` +to find out what `DataFrame.head` and `DataFrame.tail` do. + +1. What method call will display the first three rows of this data? +2. What method call will display the last three columns of this data? + (Hint: you may need to change your view of the data.) + +::::::::::::::: solution + +## Solution + +1. We can check out the first five rows of `data_americas` by executing `data_americas.head()` + which lets us view the beginning of the DataFrame. We can specify the number of rows we wish + to see by specifying the parameter `n` in our call to `data_americas.head()`. 
+   To view the first three rows, execute: + +   ```python +   data_americas.head(n=3) +   ``` + +   ```output +              continent  gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  \ +   country +   Argentina   Americas     5911.315053     6856.856212     7133.166023 +   Bolivia     Americas     2677.326347     2127.686326     2180.972546 +   Brazil      Americas     2108.944355     2487.365989     3336.585802 + +              gdpPercap_1967  gdpPercap_1972  gdpPercap_1977  gdpPercap_1982  \ +   country +   Argentina     8052.953021     9443.038526    10079.026740     8997.897412 +   Bolivia       2586.886053     2980.331339     3548.097832     3156.510452 +   Brazil        3429.864357     4985.711467     6660.118654     7030.835878 + +              gdpPercap_1987  gdpPercap_1992  gdpPercap_1997  gdpPercap_2002  \ +   country +   Argentina     9139.671389     9308.418710    10967.281950     8797.640716 +   Bolivia       2753.691490     2961.699694     3326.143191     3413.262690 +   Brazil        7807.095818     6950.283021     7957.980824     8131.212843 + +              gdpPercap_2007 +   country +   Argentina    12779.379640 +   Bolivia       3822.137084 +   Brazil        9065.800825 +   ``` + +2. To check out the last three rows of `data_americas`, we would use the command +   `data_americas.tail(n=3)`, analogous to `head()` used above. However, here we want to look at +   the last three columns, so we need to change our view and then use `tail()`. To do so, we +   create a new DataFrame in which rows and columns are switched: + +   ```python +   americas_flipped = data_americas.T +   ``` + +   We can then view the last three columns of `data_americas` by viewing the last three rows +   of `americas_flipped`: + +   ```python +   americas_flipped.tail(n=3) +   ``` + +   ```output +   country         Argentina  Bolivia   Brazil   Canada    Chile  Colombia  \ +   gdpPercap_1997    10967.3  3326.14  7957.98  28954.9  10118.1   6117.36 +   gdpPercap_2002    8797.64  3413.26  8131.21    33329  10778.8   5755.26 +   gdpPercap_2007    12779.4  3822.14   9065.8  36319.2  13171.6   7006.58 + +   country         Costa Rica     Cuba  Dominican Republic  Ecuador  ...  \ +   gdpPercap_1997     6677.05  5431.99              3614.1  7429.46  ... +   gdpPercap_2002     7723.45  6340.65             4563.81  5773.04  ... +   gdpPercap_2007     9645.06   8948.1             6025.37  6873.26  ...
+ +   country          Mexico  Nicaragua   Panama  Paraguay     Peru  Puerto Rico  \ +   gdpPercap_1997   9767.3    2253.02  7113.69    4247.4  5838.35      16999.4 +   gdpPercap_2002  10742.4    2474.55  7356.03   3783.67  5909.02      18855.6 +   gdpPercap_2007  11977.6    2749.32  9809.19   4172.84  7408.91      19328.7 + +   country         Trinidad and Tobago  United States  Uruguay  Venezuela +   gdpPercap_1997              8792.57        35767.4  9230.24    10165.5 +   gdpPercap_2002              11460.6        39097.1     7727    8605.05 +   gdpPercap_2007              18008.5        42951.7  10611.5    11415.8 +   ``` + +   This shows the data that we want, but we may prefer to display three columns instead of three rows, +   so we can flip it back: + +   ```python +   americas_flipped.tail(n=3).T +   ``` + +   **Note:** we could have done the above in a single line of code by 'chaining' the commands: + +   ```python +   data_americas.T.tail(n=3).T +   ``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Reading Files in Other Directories + +The data for your current project is stored in a file called `microbes.csv`, +which is located in a folder called `field_data`. +You are doing analysis in a notebook called `analysis.ipynb` +in a sibling folder called `thesis`: + +```output +your_home_directory ++-- field_data/ +|   +-- microbes.csv ++-- thesis/ +    +-- analysis.ipynb +``` + +What value(s) should you pass to `read_csv` to read `microbes.csv` in `analysis.ipynb`? + +::::::::::::::: solution + +## Solution + +We need to specify the path to the file of interest in the call to `pd.read_csv`. We first need to 'jump' out of +the folder `thesis` using `../` and then into the folder `field_data` using `field_data/`. Then we can specify the filename `microbes.csv`.
+The result is as follows: + +```python +data_microbes = pd.read_csv('../field_data/microbes.csv') +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Writing Data + +As well as the `read_csv` function for reading data from a file, +Pandas provides a `to_csv` function to write dataframes to files. +Applying what you've learned about reading from files, +write one of your dataframes to a file called `processed.csv`. +You can use `help` to get information on how to use `to_csv`. + +::::::::::::::: solution + +## Solution + +In order to write the DataFrame `data_americas` to a file called `processed.csv`, execute the following command: + +```python +data_americas.to_csv('processed.csv') +``` + +For help on `read_csv` or `to_csv`, you could execute, for example: + +```python +help(data_americas.to_csv) +help(pd.read_csv) +``` + +Note that `help(to_csv)` or `help(pd.to_csv)` raises an error! This is because `to_csv` is not a global Pandas function but +a member function (method) of DataFrames. This means you can only call it on an instance of a DataFrame, +e.g., `data_americas.to_csv` or `data_oceania.to_csv`. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Use the Pandas library to get basic statistics out of tabular data. +- Use `index_col` to specify that a column's values should be used as row headings. +- Use `DataFrame.info` to find out more about a dataframe. +- The `DataFrame.columns` variable stores information about the dataframe's columns. +- Use `DataFrame.T` to transpose a dataframe. +- Use `DataFrame.describe` to get summary statistics about data.
+ +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/08-data-frames.md b/08-data-frames.md new file mode 100644 index 000000000..b57e4ff6d --- /dev/null +++ b/08-data-frames.md @@ -0,0 +1,783 @@ +--- +title: Pandas DataFrames +teaching: 15 +exercises: 15 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Select individual values from a Pandas dataframe. +- Select entire rows or entire columns from a dataframe. +- Select a subset of both rows and columns from a dataframe in a single operation. +- Select a subset of a dataframe by a single Boolean criterion. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How can I do statistical analysis of tabular data? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Note about Pandas DataFrames/Series + +A [DataFrame][pandas-dataframe] is a collection of [Series][pandas-series]: +the DataFrame is the way Pandas represents a table, and Series is the data structure +Pandas uses to represent a column. + +Pandas is built on top of the [NumPy][numpy] library, which in practice means that +many of the methods defined for NumPy arrays also work on Pandas Series/DataFrames. + +What makes Pandas so attractive is the powerful interface to access individual records +of the table, proper handling of missing values, and relational-database operations +between DataFrames. + +## Selecting values + +To access a value at the position `[i,j]` of a DataFrame, we have two options, depending on +whether `i` refers to a row's position or to its label. +Remember that a DataFrame provides an *index* as a way to identify the rows of the table; +a row, then, has a *position* inside the table as well as a *label*, which +usually uniquely identifies its *entry* in the DataFrame. + +## Use `DataFrame.iloc[..., ...]` to select values by their (entry) position. + +- Can specify location by numerical index analogously to 2D version of character selection in strings.
+ +```python +import pandas as pd +data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country') +print(data.iloc[0, 0]) +``` + +```output +1601.056136 +``` + +## Use `DataFrame.loc[..., ...]` to select values by their (entry) label. + +- Can specify location by row and/or column name. + +```python +print(data.loc["Albania", "gdpPercap_1952"]) +``` + +```output +1601.056136 +``` + +## Use `:` on its own to mean all columns or all rows. + +- Just like Python's usual slicing notation. + +```python +print(data.loc["Albania", :]) +``` + +```output +gdpPercap_1952 1601.056136 +gdpPercap_1957 1942.284244 +gdpPercap_1962 2312.888958 +gdpPercap_1967 2760.196931 +gdpPercap_1972 3313.422188 +gdpPercap_1977 3533.003910 +gdpPercap_1982 3630.880722 +gdpPercap_1987 3738.932735 +gdpPercap_1992 2497.437901 +gdpPercap_1997 3193.054604 +gdpPercap_2002 4604.211737 +gdpPercap_2007 5937.029526 +Name: Albania, dtype: float64 +``` + +- Would get the same result printing `data.loc["Albania"]` (without a second index). + +```python +print(data.loc[:, "gdpPercap_1952"]) +``` + +```output +country +Albania 1601.056136 +Austria 6137.076492 +Belgium 8343.105127 +⋮ ⋮ ⋮ +Switzerland 14734.232750 +Turkey 1969.100980 +United Kingdom 9979.508487 +Name: gdpPercap_1952, dtype: float64 +``` + +- Would get the same result printing `data["gdpPercap_1952"]` +- Also get the same result printing `data.gdpPercap_1952` (not recommended, because easily confused with `.` notation for methods) + +## Select multiple columns or rows using `DataFrame.loc` and a named slice. 
+ +```python +print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']) +``` + +```output + gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 +country +Italy 8243.582340 10022.401310 12269.273780 +Montenegro 4649.593785 5907.850937 7778.414017 +Netherlands 12790.849560 15363.251360 18794.745670 +Norway 13450.401510 16361.876470 18965.055510 +Poland 5338.752143 6557.152776 8006.506993 +``` + +In the above code, we discover that **slicing using `loc` is inclusive at both +ends**, which differs from **slicing using `iloc`**, where slicing indicates +everything up to but not including the final index. + +## Result of slicing can be used in further operations. + +- Usually don't just print a slice. +- All the statistical operators that work on entire dataframes + work the same way on slices. +- E.g., calculate max of a slice. + +```python +print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].max()) +``` + +```output +gdpPercap_1962 13450.40151 +gdpPercap_1967 16361.87647 +gdpPercap_1972 18965.05551 +dtype: float64 +``` + +```python +print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].min()) +``` + +```output +gdpPercap_1962 4649.593785 +gdpPercap_1967 5907.850937 +gdpPercap_1972 7778.414017 +dtype: float64 +``` + +## Use comparisons to select data based on value. + +- Comparison is applied element by element. +- Returns a similarly-shaped dataframe of `True` and `False`. + +```python +# Use a subset of data to keep output readable. +subset = data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'] +print('Subset of data:\n', subset) + +# Which values were greater than 10000 ? 
+print('\nWhere are values large?\n', subset > 10000) +``` + +```output +Subset of data: + gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 +country +Italy 8243.582340 10022.401310 12269.273780 +Montenegro 4649.593785 5907.850937 7778.414017 +Netherlands 12790.849560 15363.251360 18794.745670 +Norway 13450.401510 16361.876470 18965.055510 +Poland 5338.752143 6557.152776 8006.506993 + +Where are values large? + gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 +country +Italy False True True +Montenegro False False False +Netherlands True True True +Norway True True True +Poland False False False +``` + +## Select values or NaN using a Boolean mask. + +- A frame full of Booleans is sometimes called a *mask* because of how it can be used. + +```python +mask = subset > 10000 +print(subset[mask]) +``` + +```output + gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 +country +Italy NaN 10022.40131 12269.27378 +Montenegro NaN NaN NaN +Netherlands 12790.84956 15363.25136 18794.74567 +Norway 13450.40151 16361.87647 18965.05551 +Poland NaN NaN NaN +``` + +- Get the value where the mask is true, and NaN (Not a Number) where it is false. +- Useful because NaNs are ignored by operations like max, min, average, etc. + +```python +print(subset[subset > 10000].describe()) +``` + +```output + gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 +count 2.000000 3.000000 3.000000 +mean 13120.625535 13915.843047 16676.358320 +std 466.373656 3408.589070 3817.597015 +min 12790.849560 10022.401310 12269.273780 +25% 12955.737547 12692.826335 15532.009725 +50% 13120.625535 15363.251360 18794.745670 +75% 13285.513523 15862.563915 18879.900590 +max 13450.401510 16361.876470 18965.055510 +``` + +## Group By: split-apply-combine + +::::::::::::::::::::::::::::::::::::: instructor +Learners often struggle here, many may not work with financial data and concepts so they +find the example concepts difficult to get their head around. 
The biggest problem, +though, is the line generating the `wealth_score`; this step needs to be talked through +thoroughly: +* It uses implicit conversion between Boolean and float values, which +has not been covered in the course so far. +* The `axis=1` argument needs to be explained clearly. +::::::::::::::::::::::::::::::::::::::::::::::::: + +Pandas' vectorized methods and grouping operations are features that give users +a lot of flexibility to analyse their data. + +For instance, let's say we want a clearer view of how the European countries +split according to their GDP. + +1. For each year surveyed, we split the countries into two groups: +   those with a GDP *higher* than the European average and those with a *lower* GDP. +2. We then estimate a *wealth score* based on the historical (from 1952 to 2007) values, +   counting the fraction of those years in which a country was in the *higher* GDP group. + +```python +mask_higher = data > data.mean() +wealth_score = mask_higher.aggregate('sum', axis=1) / len(data.columns) +print(wealth_score) +``` + +```output +country +Albania                   0.000000 +Austria                   1.000000 +Belgium                   1.000000 +Bosnia and Herzegovina    0.000000 +Bulgaria                  0.000000 +Croatia                   0.000000 +Czech Republic            0.500000 +Denmark                   1.000000 +Finland                   1.000000 +France                    1.000000 +Germany                   1.000000 +Greece                    0.333333 +Hungary                   0.000000 +Iceland                   1.000000 +Ireland                   0.333333 +Italy                     0.500000 +Montenegro                0.000000 +Netherlands               1.000000 +Norway                    1.000000 +Poland                    0.000000 +Portugal                  0.000000 +Romania                   0.000000 +Serbia                    0.000000 +Slovak Republic           0.000000 +Slovenia                  0.333333 +Spain                     0.333333 +Sweden                    1.000000 +Switzerland               1.000000 +Turkey                    0.000000 +United Kingdom            1.000000 +dtype: float64 +``` + +Finally, for each group in the `wealth_score` table, we sum their (financial) contribution +across the years surveyed using chained methods: + +```python +print(data.groupby(wealth_score).sum()) +``` + +```output +          gdpPercap_1952  gdpPercap_1957
gdpPercap_1962 gdpPercap_1967 \ +0.000000 36916.854200 46110.918793 56850.065437 71324.848786 +0.333333 16790.046878 20942.456800 25744.935321 33567.667670 +0.500000 11807.544405 14505.000150 18380.449470 21421.846200 +1.000000 104317.277560 127332.008735 149989.154201 178000.350040 + + gdpPercap_1972 gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 \ +0.000000 88569.346898 104459.358438 113553.768507 119649.599409 +0.333333 45277.839976 53860.456750 59679.634020 64436.912960 +0.500000 25377.727380 29056.145370 31914.712050 35517.678220 +1.000000 215162.343140 241143.412730 263388.781960 296825.131210 + + gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007 +0.000000 92380.047256 103772.937598 118590.929863 149577.357928 +0.333333 67918.093220 80876.051580 102086.795210 122803.729520 +0.500000 36310.666080 40723.538700 45564.308390 51403.028210 +1.000000   315238.235970   346930.926170   385109.939210   427850.333420 +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Selection of Individual Values + +Assume Pandas has been imported into your notebook +and the Gapminder GDP data for Europe has been loaded: + +```python +import pandas as pd + +data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country') +``` + +Write an expression to find the Per Capita GDP of Serbia in 2007. + +::::::::::::::: solution + +## Solution + +The selection can be done by using the labels for both the row ("Serbia") and the column ("gdpPercap\_2007"): + +```python +print(data_europe.loc['Serbia', 'gdpPercap_2007']) +``` + +The output is + +```output +9786.534714 +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Extent of Slicing + +1. Do the two statements below produce the same output? +2. Based on this, + what rule governs what is included (or not) in numerical slices and named slices in Pandas? 
+ +```python +print(data_europe.iloc[0:2, 0:2]) +print(data_europe.loc['Albania':'Belgium', 'gdpPercap_1952':'gdpPercap_1962']) +``` + +::::::::::::::: solution + +## Solution + +No, they do not produce the same output! The output of the first statement is: + +```output + gdpPercap_1952 gdpPercap_1957 +country +Albania 1601.056136 1942.284244 +Austria 6137.076492 8842.598030 +``` + +The second statement gives: + +```output + gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 +country +Albania 1601.056136 1942.284244 2312.888958 +Austria 6137.076492 8842.598030 10750.721110 +Belgium 8343.105127 9714.960623 10991.206760 +``` + +Clearly, the second statement produces an additional column and an additional row compared to the first statement. +What conclusion can we draw? We see that a numerical slice, 0:2, *omits* the final index (i.e. index 2) +in the range provided, +while a named slice, 'gdpPercap\_1952':'gdpPercap\_1962', *includes* the final element. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Reconstructing Data + +Explain what each line in the following short program does: +what is in `first`, `second`, etc.? + +```python +first = pd.read_csv('data/gapminder_all.csv', index_col='country') +second = first[first['continent'] == 'Americas'] +third = second.drop('Puerto Rico') +fourth = third.drop('continent', axis = 1) +fourth.to_csv('result.csv') +``` + +::::::::::::::: solution + +## Solution + +Let's go through this piece of code line by line. + +```python +first = pd.read_csv('data/gapminder_all.csv', index_col='country') +``` + +This line loads the dataset containing the GDP data from all countries into a dataframe called +`first`. The `index_col='country'` parameter selects which column to use as the +row labels in the dataframe. 
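To see what `index_col` changes, here is a minimal sketch that reads a tiny CSV from a string (via `io.StringIO`) instead of from a file, once without and once with `index_col`. The country names and values are invented for illustration; they are not Gapminder data.

```python
import io

import pandas as pd

# A tiny made-up CSV laid out like the Gapminder files (fictional countries, illustrative values)
csv_text = """country,continent,gdpPercap_1952
Freedonia,Europe,1000.0
Ruritania,Americas,2000.0
"""

# Without index_col, pandas adds a default integer index and 'country' stays an ordinary column
default_index = pd.read_csv(io.StringIO(csv_text))
print(default_index.index.tolist())

# With index_col='country', the country names become the row labels
by_country = pd.read_csv(io.StringIO(csv_text), index_col='country')
print(by_country.index.tolist())
print(by_country.loc['Ruritania', 'continent'])
```

Without `index_col`, the rows are labelled `0, 1, ...`; with it, the country names are the labels that `.loc[...]` and `drop(...)` work with.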
+ +```python +second = first[first['continent'] == 'Americas'] +``` + +This line makes a selection: only those rows of `first` for which the 'continent' column matches +'Americas' are extracted. Notice how the Boolean expression inside the brackets, +`first['continent'] == 'Americas'`, is used to select only those rows where the expression is true. +Try printing this expression! Can you also print its individual True/False elements? +(hint: first assign the expression to a variable) + +```python +third = second.drop('Puerto Rico') +``` + +As the syntax suggests, this line drops the row from `second` where the label is 'Puerto Rico'. The +resulting dataframe `third` has one row fewer than the original dataframe `second`. + +```python +fourth = third.drop('continent', axis = 1) +``` + +Again we apply the drop method, but in this case we are dropping not a row but a whole column. +To accomplish this, we also need to specify the `axis` parameter: `axis = 1` tells `drop` to look for +the label 'continent' among the column labels rather than among the row labels (`axis = 0`, the default). + +```python +fourth.to_csv('result.csv') +``` + +The final step is to write the data that we have been working on to a csv file. Pandas makes this easy +with the `to_csv()` method. Here the only argument we need to pass is the filename. Note that the +file will be written to the current working directory: usually the directory from which you started a Python session or, for a notebook, the folder containing the notebook. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Selecting Indices + +Explain in simple terms what `idxmin` and `idxmax` do in the short program below. +When would you use these methods?
+ +```python +data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country') +print(data.idxmin()) +print(data.idxmax()) +``` + +::::::::::::::: solution + +## Solution + +For each column in `data`, `idxmin` will return the index value corresponding to each column's minimum; +`idxmax` does the same for each column's maximum value. + +You can use these methods whenever you want the row label of the minimum/maximum value rather than the minimum/maximum value itself. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Practice with Selection + +Assume Pandas has been imported and the Gapminder GDP data for Europe has been loaded. +Write an expression to select each of the following: + +1. GDP per capita for all countries in 1982. +2. GDP per capita for Denmark for all years. +3. GDP per capita for all countries for years *after* 1985. +4. GDP per capita for each country in 2007 as a multiple of + GDP per capita for that country in 1952. + +::::::::::::::: solution + +## Solution + +1: + +```python +data['gdpPercap_1982'] +``` + +2: + +```python +data.loc['Denmark',:] +``` + +3: + +```python +data.loc[:,'gdpPercap_1985':] +``` + +This works even though no column named `gdpPercap_1985` exists. Pandas does not interpret the number in the label: because the column labels are sorted (as strings, `'gdpPercap_1952'` < `'gdpPercap_1957'` < ... < `'gdpPercap_2007'`), it can locate where `'gdpPercap_1985'` would fall among them and start the slice there. With unsorted labels, the same slice would raise a `KeyError`. Because the slice is open-ended, it would also pick up columns for later years if they were added to the CSV file. + +4: + +```python +data['gdpPercap_2007']/data['gdpPercap_1952'] +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Many Ways of Access + +There are at least two ways of accessing a value or slice of a DataFrame: by name or index. +However, there are many others. For example, a single column or row can be accessed either as a `DataFrame` +or a `Series` object.
+ +Suggest different ways of doing the following operations on a DataFrame: + +1. Access a single column +2. Access a single row +3. Access an individual DataFrame element +4. Access several columns +5. Access several rows +6. Access a subset of specific rows and columns +7. Access a subset of row and column ranges + +::::::::::::::: solution + +## Solution + +1\. Access a single column: + +```python +# by name +data["col_name"] # as a Series +data[["col_name"]] # as a DataFrame + +# by name using .loc +data.T.loc["col_name"] # as a Series +data.T.loc[["col_name"]].T # as a DataFrame + +# Dot notation (Series) +data.col_name + +# by index (iloc) +data.iloc[:, col_index] # as a Series +data.iloc[:, [col_index]] # as a DataFrame + +# using a mask +data.T[data.T.index == "col_name"].T +``` + +2\. Access a single row: + +```python +# by name using .loc +data.loc["row_name"] # as a Series +data.loc[["row_name"]] # as a DataFrame + +# by name +data.T["row_name"] # as a Series +data.T[["row_name"]].T # as a DataFrame + +# by index +data.iloc[row_index] # as a Series +data.iloc[[row_index]] # as a DataFrame + +# using mask +data[data.index == "row_name"] +``` + +3\. Access an individual DataFrame element: + +```python +# by column/row names +data["col_name"]["row_name"] # as a value + +data[["col_name"]].loc["row_name"] # as a Series +data[["col_name"]].loc[["row_name"]] # as a DataFrame + +data.loc["row_name"]["col_name"] # as a value +data.loc[["row_name"]]["col_name"] # as a Series +data.loc[["row_name"]][["col_name"]] # as a DataFrame + +data.loc["row_name", "col_name"] # as a value +data.loc[["row_name"], "col_name"] # as a Series. Preserves index. Column name is moved to `.name`. +data.loc["row_name", ["col_name"]] # as a Series. Index is moved to `.name`. Sets index to column name.
+data.loc[["row_name"], ["col_name"]] # as a DataFrame (preserves original index and column name) + +# by column/row names: Dot notation +data.col_name.row_name + +# by column/row indices +data.iloc[row_index, col_index] # as a value +data.iloc[[row_index], col_index] # as a Series. Preserves index. Column name is moved to `.name`. +data.iloc[row_index, [col_index]] # as a Series. Index is moved to `.name`. Sets index to column name. +data.iloc[[row_index], [col_index]] # as a DataFrame (preserves original index and column name) + +# column name + row index +data["col_name"][row_index] # deprecated in recent pandas versions; prefer .iloc +data.col_name[row_index] # deprecated in recent pandas versions; prefer .iloc +data["col_name"].iloc[row_index] + +# column index + row name +data.iloc[:, [col_index]].loc["row_name"] # as a Series +data.iloc[:, [col_index]].loc[["row_name"]] # as a DataFrame + +# using masks +data[data.index == "row_name"].T[data.T.index == "col_name"].T +``` + +4\. Access several columns: + +```python +# by name +data[["col1", "col2", "col3"]] +data.loc[:, ["col1", "col2", "col3"]] + +# by index +data.iloc[:, [col1_index, col2_index, col3_index]] +``` + +5\. Access several rows: + +```python +# by name +data.loc[["row1", "row2", "row3"]] + +# by index +data.iloc[[row1_index, row2_index, row3_index]] +``` + +6\. Access a subset of specific rows and columns: + +```python +# by names +data.loc[["row1", "row2", "row3"], ["col1", "col2", "col3"]] + +# by indices +data.iloc[[row1_index, row2_index, row3_index], [col1_index, col2_index, col3_index]] + +# column names + row indices +data[["col1", "col2", "col3"]].iloc[[row1_index, row2_index, row3_index]] + +# column indices + row names +data.iloc[:, [col1_index, col2_index, col3_index]].loc[["row1", "row2", "row3"]] +``` + +7\.
Access a subset of row and column ranges: + +```python +# by name +data.loc["row1":"row2", "col1":"col2"] + +# by index +data.iloc[row1_index:row2_index, col1_index:col2_index] + +# column names + row indices +data.loc[:, "col1":"col2"].iloc[row1_index:row2_index] + +# column indices + row names +data.iloc[:, col1_index:col2_index].loc["row1":"row2"] +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Exploring available methods using the `dir()` function + +Python includes a `dir()` function that can be used to display all of the available methods (functions) that are built into a data object. In Episode 4, we used some methods with a string. But we can see many more are available by using `dir()`: + +```python +my_string = 'Hello world!' # creation of a string object +dir(my_string) +``` + +This command returns: + +```output +['__add__', +... +'__subclasshook__', +'capitalize', +'casefold', +'center', +... +'upper', +'zfill'] +``` + +You can use `help()` or Shift\+Tab to get more information about what these methods do. + +Assume Pandas has been imported and the Gapminder GDP data for Europe has been loaded as `data`. Then, use `dir()` +to find the method that computes the median per-capita GDP across all European countries for each year that information is available. + +::::::::::::::: solution + +## Solution + +Among many choices, `dir()` lists the `median()` method as a possibility. Thus, + +```python +data.median() +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Interpretation + +Poland's borders have been stable since 1945, +but changed several times in the years before then. +How would you handle this if you were creating a table of GDP per capita for Poland +for the entire twentieth century?
+ + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +[pandas-dataframe]: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html +[pandas-series]: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html +[numpy]: https://www.numpy.org/ + + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Use `DataFrame.iloc[..., ...]` to select values by integer location. +- Use `:` on its own to mean all columns or all rows. +- Select multiple columns or rows using `DataFrame.loc` and a named slice. +- Result of slicing can be used in further operations. +- Use comparisons to select data based on value. +- Select values or NaN using a Boolean mask. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/09-plotting.md b/09-plotting.md new file mode 100644 index 000000000..e9b9b11ed --- /dev/null +++ b/09-plotting.md @@ -0,0 +1,386 @@ +--- +title: Plotting +teaching: 15 +exercises: 15 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Create a time series plot showing a single data set. +- Create a scatter plot showing relationship between two data sets. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How can I plot my data? +- How can I save my plot for publishing? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## [`matplotlib`](https://matplotlib.org/) is the most widely used scientific plotting library in Python. + +- Commonly use a sub-library called [`matplotlib.pyplot`](https://matplotlib.org/stable/tutorials/introductory/pyplot.html). +- The Jupyter Notebook will render plots inline by default. + +```python +import matplotlib.pyplot as plt +``` + +- Simple plots are then (fairly) simple to create. 
+ +```python +time = [0, 1, 2, 3] +position = [0, 100, 200, 300] + +plt.plot(time, position) +plt.xlabel('Time (hr)') +plt.ylabel('Position (km)') +``` + +![](fig/9_simple_position_time_plot.svg){alt='A line chart showing time (hr) relative to position (km), using the values provided in the code block above. By default, the plotted line is blue against a white background, and the axes have been scaled automatically to fit the range of the input data.'} + +::::::::::::::::::::::::::::::::::::::::: callout + +## Display All Open Figures + +In our Jupyter Notebook example, running the cell should generate the figure directly below the code. +The figure is also included in the Notebook document for future viewing. +However, other Python environments like an interactive Python session started from a terminal +or a Python script executed via the command line require an additional command to display the figure. + +Instruct `matplotlib` to show a figure: + +```python +plt.show() +``` + +This command can also be used within a Notebook, for instance, to display multiple figures +if several are created by a single cell. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Plot data directly from a [`Pandas dataframe`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html). + +- We can also plot [Pandas dataframes](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html). +- Before plotting, we convert the column headings from a `string` to `integer` data type, since they represent numerical values, + using [str.replace()](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html) to remove the `gdpPercap_` + prefix and then [astype(int)](https://pandas.pydata.org/docs/reference/api/pandas.Series.astype.html) + to convert the series of string values (`['1952', '1957', ..., '2007']`) to a series of integers: `[1952, 1957, ..., 2007]`.
+ +```python +import pandas as pd + +data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country') + +# Extract the year from each column name +# The current column names are structured as 'gdpPercap_(year)', +# so we want to keep the (year) part only for clarity when plotting GDP vs. years +# To do this we use replace() to substitute an empty string for the 'gdpPercap_' prefix +# Column labels are strings, so we use replace() from the Pandas .str vectorized string methods + +years = data.columns.str.replace('gdpPercap_', '') + +# Convert year values to integers, saving results back to dataframe + +data.columns = years.astype(int) + +data.loc['Australia'].plot() +``` + +![](fig/9_gdp_australia.svg){alt='GDP plot for Australia'} + +## Select and transform data, then plot it. + +- By default, [`DataFrame.plot`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html#pandas.DataFrame.plot) plots the index (the row labels) along the x axis and draws one line per column. +- We can transpose the data so that each country becomes a column, and therefore a separate line. + +```python +data.T.plot() +plt.ylabel('GDP per capita') +``` + +![](fig/9_gdp_australia_nz.svg){alt='GDP plot for Australia and New Zealand'} + +## Many styles of plot are available. + +- For example, do a bar plot using a fancier style. + +```python +plt.style.use('ggplot') +data.T.plot(kind='bar') +plt.ylabel('GDP per capita') +``` + +![](fig/9_gdp_bar.svg){alt='GDP barplot for Australia'} + +## Data can also be plotted by calling the `matplotlib` `plot` function directly. + +- The command is `plt.plot(x, y)` +- The color and style of lines and markers can also be specified as an optional third argument, e.g., `b-` is a blue solid line, `g--` is a green dashed line.
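As a sketch of how a format string combines a color letter with a line or marker style, the snippet below plots made-up data three times. The `'ro'` format (red circle markers with no connecting line) is an extra example that this episode does not otherwise use.

```python
import matplotlib.pyplot as plt

# Made-up values, used only to compare the format strings
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Color letter followed by a line style ('-', '--') or a marker ('o')
plt.plot(x, y, 'b-', label="'b-' (blue solid line)")
plt.plot(x, [v + 5 for v in y], 'g--', label="'g--' (green dashed line)")
plt.plot(x, [v + 10 for v in y], 'ro', label="'ro' (red circles, no line)")
plt.legend(loc='upper left')
```

Giving a marker but no line style (as in `'ro'`) draws only the points, which suits data where joining the values with a line would be misleading.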
+ +## Get Australia data from dataframe + +```python +years = data.columns +gdp_australia = data.loc['Australia'] + +plt.plot(years, gdp_australia, 'g--') +``` + +![](fig/9_gdp_australia_formatted.svg){alt='GDP formatted plot for Australia'} + +## Can plot many sets of data together. + +```python +# Select two countries' worth of data. +gdp_australia = data.loc['Australia'] +gdp_nz = data.loc['New Zealand'] + +# Plot with differently-colored lines. +plt.plot(years, gdp_australia, 'b-', label='Australia') +plt.plot(years, gdp_nz, 'g-', label='New Zealand') + +# Create legend. +plt.legend(loc='upper left') +plt.xlabel('Year') +plt.ylabel('GDP per capita ($)') +``` + +::::::::::::::::::::::::::::::::::::::::: callout + +## Adding a Legend + +Often when plotting multiple datasets on the same figure it is desirable to have +a legend describing the data. + +This can be done in `matplotlib` in two stages: + +- Provide a label for each dataset in the figure: + +```python +plt.plot(years, gdp_australia, label='Australia') +plt.plot(years, gdp_nz, label='New Zealand') +``` + +- Instruct `matplotlib` to create the legend. + +```python +plt.legend() +``` + +By default matplotlib will attempt to place the legend in a suitable position.
If you +would rather specify a position, you can use the `loc=` argument; e.g., to place +the legend in the upper left corner of the plot, specify `loc='upper left'`. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +![](fig/9_gdp_australia_nz_formatted.svg){alt='GDP formatted plot for Australia and New Zealand'} + +- Plot a scatter plot correlating the GDP of Australia and New Zealand +- Use either `plt.scatter` or `DataFrame.plot.scatter` + +```python +plt.scatter(gdp_australia, gdp_nz) +``` + +![](fig/9_gdp_correlation_plt.svg){alt='GDP correlation using plt.scatter'} + +```python +data.T.plot.scatter(x = 'Australia', y = 'New Zealand') +``` + +![](fig/9_gdp_correlation_data.svg){alt='GDP correlation using data.T.plot.scatter'} + +::::::::::::::::::::::::::::::::::::::: challenge + +## Minima and Maxima + +Fill in the blanks below to plot the minimum GDP per capita over time +for all the countries in Europe. +Modify it again to plot the maximum GDP per capita over time for Europe. + +```python +data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country') +data_europe.____.plot(label='min') +data_europe.____ +plt.legend(loc='best') +plt.xticks(rotation=90) +``` + +::::::::::::::: solution + +## Solution + +```python +data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country') +data_europe.min().plot(label='min') +data_europe.max().plot(label='max') +plt.legend(loc='best') +plt.xticks(rotation=90) +``` + +![](fig/9_minima_maxima_solution.png){alt='Minima Maxima Solution'} + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Correlations + +Modify the example in the notes to create a scatter plot showing +the relationship between the minimum and maximum GDP per capita +among the countries in Asia for each year in the data set. +What relationship do you see (if any)?
+ +::::::::::::::: solution + +## Solution + +```python +data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country') +data_asia.describe().T.plot(kind='scatter', x='min', y='max') +``` + +![](fig/9_correlations_solution1.svg){alt='Correlations Solution 1'} + +No particular correlation can be seen between the minimum and maximum GDP values +year on year. It seems the fortunes of Asian countries do not rise and fall together. + + +::::::::::::::::::::::::: + +You might note that the variability in the maximum is much higher than +that of the minimum. Take a look at the maximum and at which countries hold the maximum and minimum values: + +```python +data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col='country') +data_asia.max().plot() +print(data_asia.idxmax()) +print(data_asia.idxmin()) +``` + +::::::::::::::: solution + +## Solution + +![](fig/9_correlations_solution2.png){alt='Correlations Solution 2'} + +It seems the variability in this value is due to a sharp drop after 1972. +Some geopolitics at play perhaps? Given the dominance of oil-producing countries, +maybe the Brent crude index would make an interesting comparison? +Whilst Myanmar consistently has the lowest GDP, the highest-GDP nation has varied +more notably. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## More Correlations + +This short program creates a plot showing +the correlation between GDP and life expectancy for 2007, +normalizing marker size by population: + +```python +data_all = pd.read_csv('data/gapminder_all.csv', index_col='country') +data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007', + s=data_all['pop_2007']/1e6) +``` + +Using online help and other resources, +explain what each argument to `plot` does.
+ +::::::::::::::: solution + +## Solution + +![](fig/9_more_correlations_solution.svg){alt='More Correlations Solution'} + +A good place to look is the documentation for the plot method: +`help(data_all.plot)`. + +kind - As seen already, this determines the kind of plot to be drawn. + +x and y - A column name or index that determines what data will be +placed on the x and y axes of the plot. + +s - Details for this can be found in the documentation of plt.scatter. +A single number or one value for each data point. Determines the size +of the plotted points. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::::: callout + +## Saving your plot to a file + +If you are satisfied with the plot you see, you may want to save it to a file, +perhaps to include it in a publication. There is a function in the +matplotlib.pyplot module that accomplishes this: +[savefig](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html). +Calling this function, e.g. with + +```python +plt.savefig('my_figure.png') +``` + +will save the current figure to the file `my_figure.png`. The file format +will automatically be deduced from the file name extension (other supported formats +include pdf, ps, eps and svg). + +Note that functions in `plt` refer to a global figure variable, +and after a figure has been displayed to the screen (e.g. with `plt.show`) +matplotlib will make this variable refer to a new empty figure. +Therefore, make sure you call `plt.savefig` before the plot is displayed to +the screen, otherwise you may find a file with an empty plot. + +When using dataframes, data is often generated and plotted to screen in one line. +In addition to using `plt.savefig`, we can save a reference to the current figure +in a local variable (with `plt.gcf`) and call that figure's `savefig` method +to save it to file.
+ +```python +data.plot(kind='bar') +fig = plt.gcf() # get current figure +fig.savefig('my_figure.png') +``` + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::::: callout + +## Making your plots accessible + +Whenever you are generating plots to go into a paper or a presentation, there are a few things you can do to make sure that everyone can understand your plots. + +- Always make sure your text is large enough to read. Use the `fontsize` parameter in `xlabel`, `ylabel`, `title`, and `legend`, and [`tick_params` with `labelsize`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.tick_params.html) to increase the text size of the numbers on your axes. +- Similarly, you should make your graph elements easy to see. Use `s` to increase the size of your scatterplot markers and `linewidth` to increase the sizes of your plot lines. +- Using color (and nothing else) to distinguish between different plot elements will make your plots unreadable to anyone who is colorblind, or who happens to have a black-and-white office printer. For lines, the `linestyle` parameter lets you use different types of lines. For scatterplots, `marker` lets you change the shape of your points. If you're unsure about your colors, you can use [Coblis](https://www.color-blindness.com/coblis-color-blindness-simulator/) or [Color Oracle](https://colororacle.org/) to simulate what your plots would look like to those with colorblindness. + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- [`matplotlib`](https://matplotlib.org/) is the most widely used scientific plotting library in Python. +- Plot data directly from a Pandas dataframe. +- Select and transform data, then plot it. +- Many styles of plot are available: see the [Python Graph Gallery](https://python-graph-gallery.com/matplotlib/) for more options. +- Can plot many sets of data together. 
+ +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/10-lunch.md b/10-lunch.md new file mode 100644 index 000000000..f786db36a --- /dev/null +++ b/10-lunch.md @@ -0,0 +1,14 @@ +--- +title: Lunch +teaching: 0 +exercises: 0 +break: 45 +--- + +Over lunch, reflect on and discuss the following: + +- What sort of packages might you use in Python and why would you use them? +- How would data need to be formatted to be used in Pandas data frames? Would the data you have meet these requirements? +- What limitations or problems might you run into when thinking about how to apply what we've learned to your own projects or data? + + diff --git a/11-lists.md b/11-lists.md new file mode 100644 index 000000000..a7b07e5a0 --- /dev/null +++ b/11-lists.md @@ -0,0 +1,506 @@ +--- +title: Lists +teaching: 10 +exercises: 10 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Explain why programs need collections of values. +- Write programs that create flat lists, index them, slice them, and modify them through assignment and method calls. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How can I store multiple values? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## A list stores many values in a single structure. + +- Doing calculations with a hundred variables called `pressure_001`, `pressure_002`, etc., + would be at least as slow as doing them by hand. +- Use a *list* to store many values together. + - Contained within square brackets `[...]`. + - Values separated by commas `,`. +- Use `len` to find out how many values are in a list. + +```python +pressures = [0.273, 0.275, 0.277, 0.275, 0.276] +print('pressures:', pressures) +print('length:', len(pressures)) +``` + +```output +pressures: [0.273, 0.275, 0.277, 0.275, 0.276] +length: 5 +``` + +## Use an item's index to fetch it from a list. + +- Just like strings. 
+ +```python +print('zeroth item of pressures:', pressures[0]) +print('fourth item of pressures:', pressures[4]) +``` + +```output +zeroth item of pressures: 0.273 +fourth item of pressures: 0.276 +``` + +## Lists' values can be replaced by assigning to them. + +- Use an index expression on the left of assignment to replace a value. + +```python +pressures[0] = 0.265 +print('pressures is now:', pressures) +``` + +```output +pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276] +``` + +## Appending items to a list lengthens it. + +- Use `list_name.append` to add items to the end of a list. + +```python +primes = [2, 3, 5] +print('primes is initially:', primes) +primes.append(7) +print('primes has become:', primes) +``` + +```output +primes is initially: [2, 3, 5] +primes has become: [2, 3, 5, 7] +``` + +- `append` is a *method* of lists. + - Like a function, but tied to a particular object. +- Use `object_name.method_name` to call methods. + - Deliberately resembles the way we refer to things in a library. +- We will meet other methods of lists as we go along. + - Use `help(list)` for a preview. +- `extend` is similar to `append`, but it allows you to combine two lists. For example: + +```python +teen_primes = [11, 13, 17, 19] +middle_aged_primes = [37, 41, 43, 47] +print('primes is currently:', primes) +primes.extend(teen_primes) +print('primes has now become:', primes) +primes.append(middle_aged_primes) +print('primes has finally become:', primes) +``` + +```output +primes is currently: [2, 3, 5, 7] +primes has now become: [2, 3, 5, 7, 11, 13, 17, 19] +primes has finally become: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]] +``` + +Note that while `extend` maintains the "flat" structure of the list, appending a list to a list means +the last element in `primes` will itself be a list, not an integer. Lists can contain values of any +type; therefore, lists of lists are possible. + +## Use `del` to remove items from a list entirely. 
+ +- We use `del list_name[index]` to remove an element from a list (in the example, 9 is not a prime number) and thus shorten it. +- `del` is not a function or a method, but a statement in the language. + +```python +primes = [2, 3, 5, 7, 9] +print('primes before removing last item:', primes) +del primes[4] +print('primes after removing last item:', primes) +``` + +```output +primes before removing last item: [2, 3, 5, 7, 9] +primes after removing last item: [2, 3, 5, 7] +``` + +## The empty list contains no values. + +- Use `[]` on its own to represent a list that doesn't contain any values. + - "The zero of lists." +- Helpful as a starting point for collecting values + (which we will see in the [next episode](12-for-loops.md)). + +## Lists may contain values of different types. + +- A single list may contain numbers, strings, and anything else. + +```python +goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.'] +``` + +## Character strings can be indexed like lists. + +- Get single characters from a character string using indexes in square brackets. + +```python +element = 'carbon' +print('zeroth character:', element[0]) +print('third character:', element[3]) +``` + +```output +zeroth character: c +third character: b +``` + +## Character strings are immutable. + +- Cannot change the characters in a string after it has been created. + - *Immutable*: can't be changed after creation. + - In contrast, lists are *mutable*: they can be modified in place. +- Python considers the string to be a single value with parts, + not a collection of values. + +```python +element[0] = 'C' +``` + +```error +TypeError: 'str' object does not support item assignment +``` + +- Lists and character strings are both *collections*. + +## Indexing beyond the end of the collection is an error. + +- Python reports an `IndexError` if we attempt to access a value that doesn't exist. + - This is a kind of [runtime error](04-built-in.md). 
+ - Cannot be detected as the code is parsed + because the index might be calculated based on data. + +```python +print('99th element of element is:', element[99]) +``` + +```output +IndexError: string index out of range +``` + +::::::::::::::::::::::::::::::::::::::: challenge + +## Fill in the Blanks + +Fill in the blanks so that the program below produces the output shown. + +```python +values = ____ +values.____(1) +values.____(3) +values.____(5) +print('first time:', values) +values = values[____] +print('second time:', values) +``` + +```output +first time: [1, 3, 5] +second time: [3, 5] +``` + +::::::::::::::: solution + +## Solution + +```python +values = [] +values.append(1) +values.append(3) +values.append(5) +print('first time:', values) +values = values[1:] +print('second time:', values) +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## How Large is a Slice? + +If `start` and `stop` are both non-negative integers, +how long is the list `values[start:stop]`? + +::::::::::::::: solution + +## Solution + +The list `values[start:stop]` has up to `stop - start` elements. For example, +`values[1:4]` has the 3 elements `values[1]`, `values[2]`, and `values[3]`. +Why 'up to'? As we saw in [episode 2](02-variables.md), +if `stop` is greater than the total length of the list `values`, +we will still get a list back but it will be shorter than expected. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## From Strings to Lists and Back + +Given this: + +```python +print('string to list:', list('tin')) +print('list to string:', ''.join(['g', 'o', 'l', 'd'])) +``` + +```output +string to list: ['t', 'i', 'n'] +list to string: gold +``` + +1. What does `list('some string')` do? +2. What does `'-'.join(['x', 'y', 'z'])` generate? + +::::::::::::::: solution + +## Solution + +1. 
[`list('some string')`](https://docs.python.org/3/library/stdtypes.html#list) converts a string into a list containing all of its characters.
+2. [`join`](https://docs.python.org/3/library/stdtypes.html#str.join) returns a string that is the *concatenation*
+   of the string elements in the list, with the separator inserted between adjacent elements. This results in
+   `x-y-z`. The separator is the string on which `join` is called (here `'-'`).
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Working With the End
+
+What does the following program print?
+
+```python
+element = 'helium'
+print(element[-1])
+```
+
+1. How does Python interpret a negative index?
+2. If a list or string has N elements,
+   what is the most negative index that can safely be used with it,
+   and what location does that index represent?
+3. If `values` is a list, what does `del values[-1]` do?
+4. How can you display all elements but the last one without changing `values`?
+   (Hint: you will need to combine slicing and negative indexing.)
+
+::::::::::::::: solution
+
+## Solution
+
+The program prints `m`.
+
+1. Python interprets a negative index as counting from the end (as opposed to
+   from the beginning). The last element is at index `-1`.
+2. The most negative index that can safely be used with a list of N elements is
+   `-N`, which represents the first element.
+3. `del values[-1]` removes the last element from the list.
+4. `values[:-1]`
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Stepping Through a List
+
+What does the following program print?
+
+```python
+element = 'fluorine'
+print(element[::2])
+print(element[::-1])
+```
+
+1. If we write a slice as `low:high:stride`, what does `stride` do?
+2. What expression would select all of the even-numbered items from a collection?
+
+::::::::::::::: solution
+
+## Solution
+
+The program prints
+
+```output
+furn
+eniroulf
+```
+
+1. `stride` is the step size of the slice.
+2. The slice `1::2` selects all even-numbered items from a collection: it starts
+   with element `1` (which is the second element, since indexing starts at `0`),
+   goes on until the end (since no `high` is given), and uses a step size of `2`
+   (i.e., selects every second element).
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Slice Bounds
+
+What does the following program print?
+
+```python
+element = 'lithium'
+print(element[0:20])
+print(element[-1:3])
+```
+
+::::::::::::::: solution
+
+## Solution
+
+```output
+lithium
+
+```
+
+The first statement prints the whole string: a slice whose stop index lies beyond the end
+of the string simply stops at the end, without raising an error.
+The second statement prints an empty string: `-1` refers to the last character (index 6 of `'lithium'`),
+which comes after the stop index `3`, and with the default step of `1` there is nothing between them to select.
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Sort and Sorted
+
+What do these two programs print?
+In simple terms, explain the difference between `sorted(letters)` and `letters.sort()`.
+
+```python
+# Program A
+letters = list('gold')
+result = sorted(letters)
+print('letters is', letters, 'and result is', result)
+```
+
+```python
+# Program B
+letters = list('gold')
+result = letters.sort()
+print('letters is', letters, 'and result is', result)
+```
+
+::::::::::::::: solution
+
+## Solution
+
+Program A prints
+
+```output
+letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']
+```
+
+Program B prints
+
+```output
+letters is ['d', 'g', 'l', 'o'] and result is None
+```
+
+`sorted(letters)` returns a sorted copy of the list `letters` (the original
+list `letters` remains unchanged), while `letters.sort()` sorts the list
+`letters` in-place and returns `None`, which is why `result` is `None` in Program B.
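
A short sketch that checks both behaviors directly (the variable names are only illustrative). It also shows that `sorted` accepts any iterable, such as a string, and that both `sorted` and `list.sort` accept `reverse=True`:

```python
# sorted() takes any iterable (here a string) and returns a new, sorted list.
word = 'gold'
sorted_copy = sorted(word)
print(sorted_copy)                 # ['d', 'g', 'l', 'o']

# list.sort() reorders the list itself and returns None.
letters = list(word)
returned = letters.sort()
print(letters, returned)           # ['d', 'g', 'l', 'o'] None

# Both accept reverse=True to sort in descending order.
print(sorted(word, reverse=True))  # ['o', 'l', 'g', 'd']
```

Because `list.sort` returns `None`, writing `letters = letters.sort()` is a common mistake that throws the sorted list away.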
+ + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Copying (or Not) + +What do these two programs print? +In simple terms, explain the difference between `new = old` and `new = old[:]`. + +```python +# Program A +old = list('gold') +new = old # simple assignment +new[0] = 'D' +print('new is', new, 'and old is', old) +``` + +```python +# Program B +old = list('gold') +new = old[:] # assigning a slice +new[0] = 'D' +print('new is', new, 'and old is', old) +``` + +::::::::::::::: solution + +## Solution + +Program A prints + +```output +new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd'] +``` + +Program B prints + +```output +new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd'] +``` + +`new = old` makes `new` a reference to the list `old`; `new` and `old` point +towards the same object. + +`new = old[:]` however creates a new list object `new` containing all elements +from the list `old`; `new` and `old` are different objects. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- A list stores many values in a single structure. +- Use an item's index to fetch it from a list. +- Lists' values can be replaced by assigning to them. +- Appending items to a list lengthens it. +- Use `del` to remove items from a list entirely. +- The empty list contains no values. +- Lists may contain values of different types. +- Character strings can be indexed like lists. +- Character strings are immutable. +- Indexing beyond the end of the collection is an error. 
+ +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/12-for-loops.md b/12-for-loops.md new file mode 100644 index 000000000..c73f9ced2 --- /dev/null +++ b/12-for-loops.md @@ -0,0 +1,467 @@ +--- +title: For Loops +teaching: 10 +exercises: 15 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Explain what for loops are normally used for. +- Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration. +- Write for loops that use the Accumulator pattern to aggregate values. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How can I make a program do many things? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## A *for loop* executes commands once for each value in a collection. + +- Doing calculations on the values in a list one by one + is as painful as working with `pressure_001`, `pressure_002`, etc. +- A *for loop* tells Python to execute some statements once for each value in a list, + a character string, + or some other collection. +- "for each thing in this group, do these operations" + +```python +for number in [2, 3, 5]: + print(number) +``` + +- This `for` loop is equivalent to: + +```python +print(2) +print(3) +print(5) +``` + +- And the `for` loop's output is: + +```output +2 +3 +5 +``` + +## A `for` loop is made up of a collection, a loop variable, and a body. + +```python +for number in [2, 3, 5]: + print(number) +``` + +- The collection, `[2, 3, 5]`, is what the loop is being run on. +- The body, `print(number)`, specifies what to do for each value in the collection. +- The loop variable, `number`, is what changes for each *iteration* of the loop. + - The "current thing". + +## The first line of the `for` loop must end with a colon, and the body must be indented. + +- The colon at the end of the first line signals the start of a *block* of statements. 
+- Python uses indentation rather than `{}` or `begin`/`end` to show *nesting*.
+  - Any consistent indentation is legal, but almost everyone uses four spaces.
+
+```python
+for number in [2, 3, 5]:
+print(number)
+```
+
+```error
+IndentationError: expected an indented block
+```
+
+- Indentation is always meaningful in Python.
+
+```python
+firstName = "Jon"
+  lastName = "Smith"
+```
+
+```error
+  File "", line 2
+    lastName = "Smith"
+    ^
+IndentationError: unexpected indent
+```
+
+- This error can be fixed by removing the extra spaces
+  at the beginning of the second line.
+
+## Loop variables can be called anything.
+
+- As with all variables, loop variables are:
+  - Created on demand.
+  - Meaningless to Python: their names can be anything at all.
+
+```python
+for kitten in [2, 3, 5]:
+    print(kitten)
+```
+
+## The body of a loop can contain many statements.
+
+- But no loop should be more than a few lines long.
+- It is hard for people to keep larger chunks of code in mind.
+
+```python
+primes = [2, 3, 5]
+for p in primes:
+    squared = p ** 2
+    cubed = p ** 3
+    print(p, squared, cubed)
+```
+
+```output
+2 4 8
+3 9 27
+5 25 125
+```
+
+## Use `range` to iterate over a sequence of numbers.
+
+- The built-in function [`range`](https://docs.python.org/3/library/stdtypes.html#range) produces a sequence of numbers.
+  - *Not* a list: the numbers are produced on demand
+    to make looping over large ranges more efficient.
+- `range(N)` produces the numbers 0..N-1
+  - Exactly the legal indices of a list or character string of length N
+
+```python
+print('a range is not a list:', range(0, 3))
+for number in range(0, 3):
+    print(number)
+```
+
+```output
+a range is not a list: range(0, 3)
+0
+1
+2
+```
+
+## The Accumulator pattern turns many values into one.
+
+- A common pattern in programs is to:
+  1. Initialize an *accumulator* variable to zero, the empty string, or the empty list.
+  2. Update the variable with values from a collection.
+
+```python
+# Sum the first 10 integers.
+total = 0 +for number in range(10): + total = total + (number + 1) +print(total) +``` + +```output +55 +``` + +- Read `total = total + (number + 1)` as: + - Add 1 to the current value of the loop variable `number`. + - Add that to the current value of the accumulator variable `total`. + - Assign that to `total`, replacing the current value. +- We have to add `number + 1` because `range` produces 0..9, not 1..10. + +::::::::::::::::::::::::::::::::::::::: challenge + +## Classifying Errors + +Is an indentation error a syntax error or a runtime error? + +::::::::::::::: solution + +## Solution + +An IndentationError is a syntax error. Programs with syntax errors cannot be started. +A program with a runtime error will start but an error will be thrown under certain conditions. + + + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Tracing Execution + +Create a table showing the numbers of the lines that are executed when this program runs, +and the values of the variables after each line is executed. + +```python +total = 0 +for char in "tin": + total = total + 1 +``` + +::::::::::::::: solution + +## Solution + +| Line no | Variables | +| ------- | -------------------- | +| 1 | total = 0 | +| 2 | total = 0 char = 't' | +| 3 | total = 1 char = 't' | +| 2 | total = 1 char = 'i' | +| 3 | total = 2 char = 'i' | +| 2 | total = 2 char = 'n' | +| 3 | total = 3 char = 'n' | + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Reversing a String + +Fill in the blanks in the program below so that it prints "nit" +(the reverse of the original character string "tin"). 
+ +```python +original = "tin" +result = ____ +for char in original: + result = ____ +print(result) +``` + +::::::::::::::: solution + +## Solution + +```python +original = "tin" +result = "" +for char in original: + result = char + result +print(result) +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Practice Accumulating + +Fill in the blanks in each of the programs below +to produce the indicated result. + +```python +# Total length of the strings in the list: ["red", "green", "blue"] => 12 +total = 0 +for word in ["red", "green", "blue"]: + ____ = ____ + len(word) +print(total) +``` + +::::::::::::::: solution + +## Solution + +```python +total = 0 +for word in ["red", "green", "blue"]: + total = total + len(word) +print(total) +``` + +::::::::::::::::::::::::: + +```python +# List of word lengths: ["red", "green", "blue"] => [3, 5, 4] +lengths = ____ +for word in ["red", "green", "blue"]: + lengths.____(____) +print(lengths) +``` + +::::::::::::::: solution + +## Solution + +```python +lengths = [] +for word in ["red", "green", "blue"]: + lengths.append(len(word)) +print(lengths) +``` + +::::::::::::::::::::::::: + +```python +# Concatenate all words: ["red", "green", "blue"] => "redgreenblue" +words = ["red", "green", "blue"] +result = ____ +for ____ in ____: + ____ +print(result) +``` + +::::::::::::::: solution + +## Solution + +```python +words = ["red", "green", "blue"] +result = "" +for word in words: + result = result + word +print(result) +``` + +::::::::::::::::::::::::: + +**Create an acronym:** Starting from the list `["red", "green", "blue"]`, create the acronym `"RGB"` using +a for loop. + +**Hint:** You may need to use a string method to properly format the acronym. 
+ +::::::::::::::: solution + +## Solution + +```python +acronym = "" +for word in ["red", "green", "blue"]: + acronym = acronym + word[0].upper() +print(acronym) +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Cumulative Sum + +Reorder and properly indent the lines of code below +so that they print a list with the cumulative sum of data. +The result should be `[1, 3, 5, 10]`. + +```python +cumulative.append(total) +for number in data: +cumulative = [] +total = total + number +total = 0 +print(cumulative) +data = [1,2,2,5] +``` + +::::::::::::::: solution + +## Solution + +```python +total = 0 +data = [1,2,2,5] +cumulative = [] +for number in data: + total = total + number + cumulative.append(total) +print(cumulative) +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Identifying Variable Name Errors + +1. Read the code below and try to identify what the errors are + *without* running it. +2. Run the code and read the error message. + What type of `NameError` do you think this is? + Is it a string with no quotes, a misspelled variable, or a + variable that should have been defined but was not? +3. Fix the error. +4. Repeat steps 2 and 3, until you have fixed all the errors. + +```python +for number in range(10): + # use a if the number is a multiple of 3, otherwise use b + if (Number % 3) == 0: + message = message + a + else: + message = message + "b" +print(message) +``` + +::::::::::::::: solution + +## Solution + +- Python variable names are case sensitive: `number` and `Number` refer to different variables. +- The variable `message` needs to be initialized as an empty string. +- We want to add the string `"a"` to `message`, not the undefined variable `a`. 
+
+```python
+message = ""
+for number in range(10):
+    # use a if the number is a multiple of 3, otherwise use b
+    if (number % 3) == 0:
+        message = message + "a"
+    else:
+        message = message + "b"
+print(message)
+```
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Identifying Item Errors
+
+1. Read the code below and try to identify what the errors are
+   *without* running it.
+2. Run the code, and read the error message. What type of error is it?
+3. Fix the error.
+
+```python
+seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is', seasons[4])
+```
+
+::::::::::::::: solution
+
+## Solution
+
+This is an `IndexError`: the list has 4 elements, so its valid indices are `0` to `3`,
+and the last element is at index `3`.
+
+```python
+seasons = ['Spring', 'Summer', 'Fall', 'Winter']
+print('My favorite season is', seasons[3])
+```
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: keypoints
+
+- A *for loop* executes commands once for each value in a collection.
+- A `for` loop is made up of a collection, a loop variable, and a body.
+- The first line of the `for` loop must end with a colon, and the body must be indented.
+- Indentation is always meaningful in Python.
+- Loop variables can be called anything (but it is strongly advised to give the loop variable a meaningful name).
+- The body of a loop can contain many statements.
+- Use `range` to iterate over a sequence of numbers.
+- The Accumulator pattern turns many values into one.
+ +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/13-conditionals.md b/13-conditionals.md new file mode 100644 index 000000000..217eca25b --- /dev/null +++ b/13-conditionals.md @@ -0,0 +1,411 @@ +--- +title: Conditionals +teaching: 10 +exercises: 15 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Correctly write programs that use if and else statements and simple Boolean expressions (without logical operators). +- Trace the execution of unnested conditionals and conditionals inside loops. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How can programs do different things for different data? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Use `if` statements to control whether or not a block of code is executed. + +- An `if` statement (more properly called a *conditional* statement) + controls whether some block of code is executed or not. +- Structure is similar to a `for` statement: + - First line opens with `if` and ends with a colon + - Body containing one or more statements is indented (usually by 4 spaces) + +```python +mass = 3.54 +if mass > 3.0: + print(mass, 'is large') + +mass = 2.07 +if mass > 3.0: + print (mass, 'is large') +``` + +```output +3.54 is large +``` + +## Conditionals are often used inside loops. + +- Not much point using a conditional when we know the value (as above). +- But useful when we have a collection to process. + +```python +masses = [3.54, 2.07, 9.22, 1.86, 1.71] +for m in masses: + if m > 3.0: + print(m, 'is large') +``` + +```output +3.54 is large +9.22 is large +``` + +## Use `else` to execute a block of code when an `if` condition is *not* true. + +- `else` can be used following an `if`. +- Allows us to specify an alternative to execute when the `if` *branch* isn't taken. 
+ +```python +masses = [3.54, 2.07, 9.22, 1.86, 1.71] +for m in masses: + if m > 3.0: + print(m, 'is large') + else: + print(m, 'is small') +``` + +```output +3.54 is large +2.07 is small +9.22 is large +1.86 is small +1.71 is small +``` + +## Use `elif` to specify additional tests. + +- May want to provide several alternative choices, each with its own test. +- Use `elif` (short for "else if") and a condition to specify these. +- Always associated with an `if`. +- Must come before the `else` (which is the "catch all"). + +```python +masses = [3.54, 2.07, 9.22, 1.86, 1.71] +for m in masses: + if m > 9.0: + print(m, 'is HUGE') + elif m > 3.0: + print(m, 'is large') + else: + print(m, 'is small') +``` + +```output +3.54 is large +2.07 is small +9.22 is HUGE +1.86 is small +1.71 is small +``` + +## Conditions are tested once, in order. + +- Python steps through the branches of the conditional in order, testing each in turn. +- So ordering matters. + +```python +grade = 85 +if grade >= 90: + print('grade is A') +elif grade >= 80: + print('grade is B') +elif grade >= 70: + print('grade is C') +``` + +```output +grade is B +``` + +- Does *not* automatically go back and re-evaluate if values change. + +```python +velocity = 10.0 +if velocity > 20.0: + print('moving too fast') +else: + print('adjusting velocity') + velocity = 50.0 +``` + +```output +adjusting velocity +``` + +- Often use conditionals in a loop to "evolve" the values of variables. 
+ +```python +velocity = 10.0 +for i in range(5): # execute the loop 5 times + print(i, ':', velocity) + if velocity > 20.0: + print('moving too fast') + velocity = velocity - 5.0 + else: + print('moving too slow') + velocity = velocity + 10.0 +print('final velocity:', velocity) +``` + +```output +0 : 10.0 +moving too slow +1 : 20.0 +moving too slow +2 : 30.0 +moving too fast +3 : 25.0 +moving too fast +4 : 20.0 +moving too slow +final velocity: 30.0 +``` + +## Create a table showing variables' values to trace a program's execution. + + + + + + +
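+| i        | 0    | .    | 1    | .    | 2    | .    | 3    | .    | 4    | .    |
+| -------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
+| velocity | 10.0 | 20.0 | .    | 30.0 | .    | 25.0 | .    | 20.0 | .    | 30.0 |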
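
The program can also record this trace itself. The sketch below reuses the velocity loop from above and adds a hypothetical `trace` list (not part of the lesson's code) that stores `velocity` at the start of each iteration. Its output should match the table:

```python
velocity = 10.0
trace = []  # hypothetical helper: (i, velocity at the start of iteration i)
for i in range(5):
    trace.append((i, velocity))
    if velocity > 20.0:
        velocity = velocity - 5.0
    else:
        velocity = velocity + 10.0

for i, v in trace:
    print(i, v)
print('final velocity:', velocity)
```

```output
0 10.0
1 20.0
2 30.0
3 25.0
4 20.0
final velocity: 30.0
```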
+
+- The program must have a `print` statement *outside* the body of the loop
+  to show the final value of `velocity`,
+  since its value is updated by the last iteration of the loop.
+
+::::::::::::::::::::::::::::::::::::::::: callout
+
+## Compound Relations Using `and`, `or`, and Parentheses
+
+Often, you want some combination of things to be true. You can combine
+relations within a conditional using `and` and `or`. Continuing the example
+above, suppose you have:
+
+```python
+mass = [ 3.54, 2.07, 9.22, 1.86, 1.71]
+velocity = [10.00, 20.00, 30.00, 25.00, 20.00]
+
+for i in range(5):
+    if mass[i] > 5 and velocity[i] > 20:
+        print("Fast heavy object. Duck!")
+    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
+        print("Normal traffic")
+    elif mass[i] <= 2 and velocity[i] <= 20:
+        print("Slow light object. Ignore it")
+    else:
+        print("Whoa! Something is up with the data. Check it")
+```
+
+Just like with arithmetic, you can and should use parentheses whenever there
+is possible ambiguity. A good general rule is to *always* use parentheses
+when mixing `and` and `or` in the same condition. That is, instead of:
+
+```python
+if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:
+```
+
+write one of these:
+
+```python
+if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
+if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):
+```
+
+Python itself reads the first version unambiguously: `and` binds more tightly than `or`,
+so it means the same as the second of these two lines. The parentheses make it
+perfectly clear to a reader what you really mean.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Tracing Execution
+
+What does this program print?
+ +```python +pressure = 71.9 +if pressure > 50.0: + pressure = 25.0 +elif pressure <= 50.0: + pressure = 0.0 +print(pressure) +``` + +::::::::::::::: solution + +## Solution + +```output +25.0 +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Trimming Values + +Fill in the blanks so that this program creates a new list +containing zeroes where the original list's values were negative +and ones where the original list's values were positive. + +```python +original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4] +result = ____ +for value in original: + if ____: + result.append(0) + else: + ____ +print(result) +``` + +```output +[0, 1, 1, 1, 0, 1] +``` + +::::::::::::::: solution + +## Solution + +```python +original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4] +result = [] +for value in original: + if value < 0.0: + result.append(0) + else: + result.append(1) +print(result) +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Processing Small Files + +Modify this program so that it only processes files with fewer than 50 records. + +```python +import glob +import pandas as pd +for filename in glob.glob('data/*.csv'): + contents = pd.read_csv(filename) + ____: + print(filename, len(contents)) +``` + +::::::::::::::: solution + +## Solution + +```python +import glob +import pandas as pd +for filename in glob.glob('data/*.csv'): + contents = pd.read_csv(filename) + if len(contents) < 50: + print(filename, len(contents)) +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Initializing + +Modify this program so that it finds the largest and smallest values in the list +no matter what the range of values originally is. + +```python +values = [...some test data...] 
+smallest, largest = None, None +for v in values: + if ____: + smallest, largest = v, v + ____: + smallest = min(____, v) + largest = max(____, v) +print(smallest, largest) +``` + +What are the advantages and disadvantages of using this method +to find the range of the data? + +::::::::::::::: solution + +## Solution + +```python +values = [-2,1,65,78,-54,-24,100] +smallest, largest = None, None +for v in values: + if smallest is None and largest is None: + smallest, largest = v, v + else: + smallest = min(smallest, v) + largest = max(largest, v) +print(smallest, largest) +``` + +If you wrote `== None` instead of `is None`, that works too, but Python programmers always +write `is None` because of the special way `None` works in the language. + +It can be argued that an advantage of using this method would be to make the code more readable. +However, a disadvantage is that this code is not efficient because within each iteration of the +`for` loop statement, there are two more loops that run over two numbers each (the `min` and +`max` functions). It would be more efficient to iterate over each number just once: + +```python +values = [-2,1,65,78,-54,-24,100] +smallest, largest = None, None +for v in values: + if smallest is None or v < smallest: + smallest = v + if largest is None or v > largest: + largest = v +print(smallest, largest) +``` + +Now we have one loop, but four comparison tests. There are two ways we could improve it further: +either use fewer comparisons in each iteration, or use two loops that each contain only one +comparison test. The simplest solution is often the best: + +```python +values = [-2,1,65,78,-54,-24,100] +smallest = min(values) +largest = max(values) +print(smallest, largest) +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Use `if` statements to control whether or not a block of code is executed. 
+- Conditionals are often used inside loops. +- Use `else` to execute a block of code when an `if` condition is *not* true. +- Use `elif` to specify additional tests. +- Conditions are tested once, in order. +- Create a table showing variables' values to trace a program's execution. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/14-looping-data-sets.md b/14-looping-data-sets.md new file mode 100644 index 000000000..94d7ddbe4 --- /dev/null +++ b/14-looping-data-sets.md @@ -0,0 +1,268 @@ +--- +title: Looping Over Data Sets +teaching: 5 +exercises: 10 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Be able to read and write globbing expressions that match sets of files. +- Use glob to create lists of files. +- Write for loops to perform operations on files given their names in a list. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How can I process many data sets with a single command? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Use a `for` loop to process files given a list of their names. + +- A filename is a character string. +- And lists can contain character strings. + +```python +import pandas as pd +for filename in ['data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']: + data = pd.read_csv(filename, index_col='country') + print(filename, data.min()) +``` + +```output +data/gapminder_gdp_africa.csv gdpPercap_1952 298.846212 +gdpPercap_1957 335.997115 +gdpPercap_1962 355.203227 +gdpPercap_1967 412.977514 +⋮ ⋮ ⋮ +gdpPercap_1997 312.188423 +gdpPercap_2002 241.165877 +gdpPercap_2007 277.551859 +dtype: float64 +data/gapminder_gdp_asia.csv gdpPercap_1952 331 +gdpPercap_1957 350 +gdpPercap_1962 388 +gdpPercap_1967 349 +⋮ ⋮ ⋮ +gdpPercap_1997 415 +gdpPercap_2002 611 +gdpPercap_2007 944 +dtype: float64 +``` + +## Use [`glob.glob`](https://docs.python.org/3/library/glob.html#glob.glob) to find sets of files whose names match a pattern. 
+ +- In Unix, the term "globbing" means "matching a set of files with a pattern". +- The most common patterns are: + - `*` meaning "match zero or more characters" + - `?` meaning "match exactly one character" +- Python's standard library contains the [`glob`](https://docs.python.org/3/library/glob.html) module to provide pattern matching functionality +- The [`glob`](https://docs.python.org/3/library/glob.html) module contains a function also called `glob` to match file patterns +- E.g., `glob.glob('*.txt')` matches all files in the current directory + whose names end with `.txt`. +- Result is a (possibly empty) list of character strings. + +```python +import glob +print('all csv files in data directory:', glob.glob('data/*.csv')) +``` + +```output +all csv files in data directory: ['data/gapminder_all.csv', 'data/gapminder_gdp_africa.csv', \ +'data/gapminder_gdp_americas.csv', 'data/gapminder_gdp_asia.csv', 'data/gapminder_gdp_europe.csv', \ +'data/gapminder_gdp_oceania.csv'] +``` + +```python +print('all PDB files:', glob.glob('*.pdb')) +``` + +```output +all PDB files: [] +``` + +## Use `glob` and `for` to process batches of files. + +- Helps a lot if the files are named and stored systematically and consistently + so that simple patterns will find the right data. + +```python +for filename in glob.glob('data/gapminder_*.csv'): + data = pd.read_csv(filename) + print(filename, data['gdpPercap_1952'].min()) +``` + +```output +data/gapminder_all.csv 298.8462121 +data/gapminder_gdp_africa.csv 298.8462121 +data/gapminder_gdp_americas.csv 1397.717137 +data/gapminder_gdp_asia.csv 331.0 +data/gapminder_gdp_europe.csv 973.5331948 +data/gapminder_gdp_oceania.csv 10039.59564 +``` + +- This includes all data, as well as per-region data. +- Use a more specific pattern in the exercises to exclude the whole data set. +- But note that the minimum of the entire data set is also the minimum of one of the data sets, + which is a nice check on correctness. 
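
The order in which `glob.glob` returns names is not guaranteed (the Python documentation notes that whether the results are sorted depends on the file system), so wrapping it in `sorted` is a cheap way to process files in a predictable order. A self-contained sketch that uses a temporary directory with made-up file names instead of the workshop's `data/` folder:

```python
import glob
import os
import tempfile

# Create a throwaway directory holding a few hypothetical, empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['b_data.csv', 'a_data.csv', 'c_notes.txt']:
        open(os.path.join(tmp, name), 'w').close()

    # '*.csv' matches only the CSV files; sorted() fixes the processing order.
    pattern = os.path.join(tmp, '*.csv')
    matches = sorted(glob.glob(pattern))
    names = [os.path.basename(path) for path in matches]
    print(names)  # ['a_data.csv', 'b_data.csv']
```

The same idea applies to the loop above: `for filename in sorted(glob.glob('data/gapminder_*.csv')):` visits the files in alphabetical order on every machine.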
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Determining Matches
+
+Which of these files is *not* matched by the expression `glob.glob('data/*as*.csv')`?
+
+1. `data/gapminder_gdp_africa.csv`
+2. `data/gapminder_gdp_americas.csv`
+3. `data/gapminder_gdp_asia.csv`
+
+::::::::::::::: solution
+
+## Solution
+
+1 is not matched by the glob: the name `gapminder_gdp_africa.csv` does not contain the letters `as`,
+while `americas` and `asia` both do.
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Minimum File Size
+
+Modify this program so that it prints the number of records in
+the file that has the fewest records.
+
+```python
+import glob
+import pandas as pd
+fewest = ____
+for filename in glob.glob('data/*.csv'):
+    dataframe = pd.____(filename)
+    fewest = min(____, dataframe.shape[0])
+print('smallest file has', fewest, 'records')
+```
+
+Note that the [`DataFrame.shape` attribute][shape-method] (an attribute, not a method, so it is used
+without parentheses) is a tuple with the number of rows and columns of the data frame.
+
+::::::::::::::: solution
+
+## Solution
+
+```python
+import glob
+import pandas as pd
+fewest = float('Inf')
+for filename in glob.glob('data/*.csv'):
+    dataframe = pd.read_csv(filename)
+    fewest = min(fewest, dataframe.shape[0])
+print('smallest file has', fewest, 'records')
+```
+
+You might have chosen to initialize the `fewest` variable with a number greater than the numbers
+you're dealing with, but that could lead to trouble if you reuse the code with bigger numbers.
+Python lets you use positive infinity, which will work no matter how big your numbers are.
+What other special strings does the [`float` function][float-function] recognize?
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Comparing Data
+
+Write a program that reads in the regional data sets
+and plots the average GDP per capita for each region over time
+in a single chart.
+Pandas will raise an error if it encounters
+non-numeric columns in a dataframe computation, so you may need
+to either filter out those columns or tell pandas to ignore them.
+
+
+::::::::::::::: solution
+
+## Solution
+
+This solution builds a useful legend by using the [string `split` method][split-method] to
+extract the `region` from the path 'data/gapminder\_gdp\_a\_specific\_region.csv'.
+
+```python
+import glob
+import pandas as pd
+import matplotlib.pyplot as plt
+fig, ax = plt.subplots(1,1)
+for filename in glob.glob('data/gapminder_gdp*.csv'):
+    dataframe = pd.read_csv(filename)
+    # extract <region> from the filename, expected to be in the format 'data/gapminder_gdp_<region>.csv'.
+    # we will split the string using the split method and `_` as our separator,
+    # retrieve the last string in the list that split returns (`<region>.csv`),
+    # and then remove the `.csv` extension from that string.
+    # NOTE: the pathlib module covered in the next callout also offers
+    # convenient abstractions for working with filesystem paths and could solve this as well:
+    # from pathlib import Path
+    # region = Path(filename).stem.split('_')[-1]
+    region = filename.split('_')[-1][:-4]
+    # pandas raises errors when it encounters non-numeric columns in a dataframe computation,
+    # but we can tell pandas to ignore them with the `numeric_only` parameter
+    dataframe.mean(numeric_only=True).plot(ax=ax, label=region)
+    # NOTE: another way of doing this selects just the columns with gdp in their name using the filter method
+    # dataframe.filter(like="gdp").mean().plot(ax=ax, label=region)
+
+plt.legend()
+plt.show()
+```
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::: callout
+
+## Dealing with File Paths
+
+The [`pathlib` module][pathlib-module] provides useful abstractions for file and path manipulation, like
+returning the name of a file without the file extension. This is very useful when looping over files and
+directories.
+In the example below, we create a `Path` object and inspect its attributes.
+
+```python
+from pathlib import Path
+
+p = Path("data/gapminder_gdp_africa.csv")
+print(p.parent)
+print(p.stem)
+print(p.suffix)
+```
+
+```output
+data
+gapminder_gdp_africa
+.csv
+```
+
+**Hint:** Check all available attributes and methods on the `Path` object with the `dir()`
+function.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+[shape-method]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shape.html
+[float-function]: https://docs.python.org/3/library/functions.html#float
+[split-method]: https://docs.python.org/3/library/stdtypes.html#str.split
+[pathlib-module]: https://docs.python.org/3/library/pathlib.html
+
+
+:::::::::::::::::::::::::::::::::::::::: keypoints
+
+- Use a `for` loop to process files given a list of their names.
+- Use `glob.glob` to find sets of files whose names match a pattern.
+- Use `glob` and `for` to process batches of files.
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+
diff --git a/15-coffee.md b/15-coffee.md
new file mode 100644
index 000000000..2cb1ddc11
--- /dev/null
+++ b/15-coffee.md
@@ -0,0 +1,18 @@
+---
+title: Afternoon Coffee
+teaching: 0
+exercises: 0
+break: 15
+---
+
+## Reflection exercise
+
+Over break, reflect on and discuss the following:
+
+- A common refrain in software engineering is "Don't Repeat Yourself". How do the techniques we've learned in the last
+  lessons help us avoid repeating ourselves? *Note that in practice there is some nuance to this, and the principle should be balanced
+  with doing the simplest thing that could possibly work.*
+- What are the pros / cons of making a variable global or local to a function?
+- When would you consider turning a block of code into a function definition?
+
+
diff --git a/16-writing-functions.md b/16-writing-functions.md
new file mode 100644
index 000000000..613850f9b
--- /dev/null
+++ b/16-writing-functions.md
@@ -0,0 +1,678 @@
+---
+title: Writing Functions
+teaching: 10
+exercises: 15
+---
+
+::::::::::::::::::::::::::::::::::::::: objectives
+
+- Explain and identify the difference between function definition and function call.
+- Write a function that takes a small, fixed number of arguments and produces a single result.
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: questions
+
+- How can I create my own functions?
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Break programs down into functions to make them easier to understand.
+
+- Human beings can only keep a few items in working memory at a time.
+- Understand larger/more complicated ideas by understanding and combining pieces.
+  - Components in a machine.
+  - Lemmas when proving theorems.
+- Functions serve the same purpose in programs.
+  - *Encapsulate* complexity so that we can treat it as a single "thing".
+- Also enables *re-use*.
+  - Write one time, use many times.
+
+## Define a function using `def` with a name, parameters, and a block of code.
+
+- Begin the definition of a new function with `def`.
+- Followed by the name of the function.
+  - Must obey the same rules as variable names.
+- Then *parameters* in parentheses.
+  - Empty parentheses if the function doesn't take any inputs.
+  - We will discuss this in detail in a moment.
+- Then a colon.
+- Then an indented block of code.
+
+```python
+def print_greeting():
+    print('Hello!')
+    print('The weather is nice today.')
+    print('Right?')
+```
+
+## Defining a function does not run it.
+
+- Defining a function does not run it.
+  - Like assigning a value to a variable.
+- Must call the function to execute the code it contains.
+
+```python
+print_greeting()
+```
+
+```output
+Hello!
+The weather is nice today.
+Right?
+```
+
+## Arguments in a function call are matched to its defined parameters.
+
+- Functions are most useful when they can operate on different data.
+- Specify *parameters* when defining a function.
+  - These become variables when the function is executed.
+  - Are assigned the arguments in the call (i.e., the values passed to the function).
+  - If you don't name the arguments when using them in the call, the arguments will be matched to
+    parameters in the order the parameters are defined in the function.
+
+```python
+def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+
+print_date(1871, 3, 19)
+```
+
+```output
+1871/3/19
+```
+
+Or, we can name the arguments when we call the function, which allows us to
+specify them in any order and adds clarity to the call site. Otherwise, someone
+reading the code might forget whether the second argument is the month
+or the day, for example.
+
+```python
+print_date(month=3, day=19, year=1871)
+```
+
+```output
+1871/3/19
+```
+
+- Via [Twitter](https://twitter.com/minisciencegirl/status/693486088963272705):
+  `()` contains the ingredients for the function
+  while the body contains the recipe.
+
+## Functions may return a result to their caller using `return`.
+
+- Use `return ...` to give a value back to the caller.
+- May occur anywhere in the function.
+- But functions are easier to understand if `return` occurs:
+  - At the start to handle special cases.
+  - At the very end, with a final result.
+
+```python
+def average(values):
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+```
+
+```python
+a = average([1, 3, 4])
+print('average of actual values:', a)
+```
+
+```output
+average of actual values: 2.6666666666666665
+```
+
+```python
+print('average of empty list:', average([]))
+```
+
+```output
+average of empty list: None
+```
+
+- Remember: [every function returns something](04-built-in.md).
+- A function that doesn't explicitly `return` a value automatically returns `None`.
+
+```python
+result = print_date(1871, 3, 19)
+print('result of call is:', result)
+```
+
+```output
+1871/3/19
+result of call is: None
+```
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Identifying Syntax Errors
+
+1. Read the code below and try to identify what the errors are
+   *without* running it.
+2. Run the code and read the error message.
+   Is it a `SyntaxError` or an `IndentationError`?
+3. Fix the error.
+4. Repeat steps 2 and 3 until you have fixed all the errors.
+
+```python
+def another_function
+  print("Syntax errors are annoying.")
+   print("But at least python tells us about them!")
+  print("So they are usually not too hard to fix.")
+```
+
+::::::::::::::: solution
+
+## Solution
+
+```python
+def another_function():
+    print("Syntax errors are annoying.")
+    print("But at least Python tells us about them!")
+    print("So they are usually not too hard to fix.")
+```
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Definition and Use
+
+What does the following program print?
+
+```python
+def report(pressure):
+    print('pressure is', pressure)
+
+print('calling', report, 22.5)
+```
+
+::::::::::::::: solution
+
+## Solution
+
+```output
+calling <function report at 0x7fd128ff1bf8> 22.5
+```
+
+A function call always needs parentheses. Without them, `report` refers to the function object itself, so `print` displays that object, including its memory address (the address will differ on your machine). So, if we wanted to call the function named `report` and give it the value 22.5 to report on, we could write our function call as follows:
+
+```python
+print("calling")
+report(22.5)
+```
+
+```output
+calling
+pressure is 22.5
+```
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Order of Operations
+
+1. What's wrong in this example?
+
+   ```python
+   result = print_time(11, 37, 59)
+
+   def print_time(hour, minute, second):
+       time_string = str(hour) + ':' + str(minute) + ':' + str(second)
+       print(time_string)
+   ```
+
+2. After fixing the problem above, explain why running this example code:
+
+   ```python
+   result = print_time(11, 37, 59)
+   print('result of call is:', result)
+   ```
+
+   gives this output:
+
+   ```output
+   11:37:59
+   result of call is: None
+   ```
+
+3. Why is the result of the call `None`?
+
+::::::::::::::: solution
+
+## Solution
+
+1. The problem with the example is that the function `print_time()` is defined *after* the call to the function is made. Python
+   doesn't know how to resolve the name `print_time` since it hasn't been defined yet, so it raises a `NameError`, e.g.,
+   `NameError: name 'print_time' is not defined`.
+
+2. The first line of output, `11:37:59`, is printed inside `print_time` during the call on the first line of code,
+   `result = print_time(11, 37, 59)`, which binds the value returned by `print_time` to the variable `result`. The second
+   line of output comes from the second `print` call, which prints the contents of the `result` variable.
+
+3. `print_time()` does not explicitly `return` a value, so it automatically returns `None`.
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Encapsulation
+
+Fill in the blanks to create a function that takes a single filename as an argument,
+loads the data in the file named by the argument,
+and returns the minimum value in that data.
+
+```python
+import pandas as pd
+
+def min_in_data(____):
+    data = ____
+    return ____
+```
+
+::::::::::::::: solution
+
+## Solution
+
+```python
+import pandas as pd
+
+def min_in_data(filename):
+    data = pd.read_csv(filename)
+    return data.min()
+```
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Find the First
+
+Fill in the blanks to create a function that takes a list of numbers as an argument
+and returns the first negative value in the list.
+What does your function do if the list is empty? What if the list has no negative numbers?
+
+```python
+def first_negative(values):
+    for v in ____:
+        if ____:
+            return ____
+```
+
+::::::::::::::: solution
+
+## Solution
+
+```python
+def first_negative(values):
+    for v in values:
+        if v < 0:
+            return v
+```
+
+If an empty list or a list with no negative values is passed to this function, the loop finishes without reaching `return`, so the function returns `None`:
+
+```python
+my_list = []
+print(first_negative(my_list))
+```
+
+```output
+None
+```
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Calling by Name
+
+Earlier we saw this function:
+
+```python
+def print_date(year, month, day):
+    joined = str(year) + '/' + str(month) + '/' + str(day)
+    print(joined)
+```
+
+We saw that we can call the function using *named arguments*, like this:
+
+```python
+print_date(day=1, month=2, year=2003)
+```
+
+1. What does `print_date(day=1, month=2, year=2003)` print?
+2. When have you seen a function call like this before?
+3. When and why is it useful to call functions this way?
+
+::::::::::::::: solution
+
+## Solution
+
+1. `2003/2/1`
+2. We saw examples of using *named arguments* when working with the pandas library.
+   For example, when reading in a dataset
+   using `data = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')`, the last argument `index_col` is a
+   named argument.
+3. Using named arguments can make code more readable since one can see from the function call what name the different arguments
+   have inside the function. It can also reduce the chances of passing arguments in the wrong order, since by using named arguments
+   the order doesn't matter.
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Encapsulation of an If/Print Block
+
+The code below will run on a label-printer for chicken eggs. A digital scale will report a chicken egg mass (in grams)
+to the computer and then the computer will print a label.
+
+```python
+import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass)
+
+    # egg sizing machinery prints a label
+    if mass >= 85:
+        print("jumbo")
+    elif mass >= 70:
+        print("large")
+    elif mass < 70 and mass >= 55:
+        print("medium")
+    else:
+        print("small")
+```
+
+The if-block that classifies the eggs might be useful in other situations,
+so to avoid repeating it, we could fold it into a function, `get_egg_label()`.
+Revising the program to use the function would give us this:
+
+```python
+# revised version
+import random
+for i in range(10):
+
+    # simulating the mass of a chicken egg
+    # the (random) mass will be 70 +/- 20 grams
+    mass = 70 + 20.0 * (2.0 * random.random() - 1.0)
+
+    print(mass, get_egg_label(mass))
+
+```
+
+1. Create a function definition for `get_egg_label()` that will work with the revised program above. Note that the `get_egg_label()` function's return value will be important. Sample output from the above program would be `71.23 large`.
+2.
+   A dirty egg might have a mass of more than 90 grams, and a spoiled or broken egg will probably have a mass that's less than 50 grams. Modify your `get_egg_label()` function to account for these error conditions. Sample output could be `25 too light, probably spoiled`.
+
+::::::::::::::: solution
+
+## Solution
+
+```python
+def get_egg_label(mass):
+    # classify the egg by its mass and return the matching label
+    egg_label = "Unlabelled"
+    if mass >= 90:
+        egg_label = "warning: egg might be dirty"
+    elif mass >= 85:
+        egg_label = "jumbo"
+    elif mass >= 70:
+        egg_label = "large"
+    elif mass < 70 and mass >= 55:
+        egg_label = "medium"
+    elif mass < 50:
+        egg_label = "too light, probably spoiled"
+    else:
+        egg_label = "small"
+    return egg_label
+```
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Encapsulating Data Analysis
+
+Assume that the following code has been executed:
+
+```python
+import pandas as pd
+
+data_asia = pd.read_csv('data/gapminder_gdp_asia.csv', index_col=0)
+japan = data_asia.loc['Japan']
+```
+
+1. Complete the statements below to obtain the average GDP for Japan
+   across the years reported for the 1980s.
+
+   ```python
+   year = 1983
+   gdp_decade = 'gdpPercap_' + str(year // ____)
+   avg = (japan.loc[gdp_decade + ___] + japan.loc[gdp_decade + ___]) / 2
+   ```
+
+2. Abstract the code above into a single function.
+
+   ```python
+   def avg_gdp_in_decade(country, continent, year):
+       data_countries = pd.read_csv('data/gapminder_gdp_'+___+'.csv',delimiter=',',index_col=0)
+       ____
+       ____
+       ____
+       return avg
+   ```
+
+3. How would you generalize this function
+   if you did not know beforehand which specific years occurred as columns in the data?
+   For instance, what if we also had data from years ending in 1 and 9 for each decade?
+   (Hint: use the columns to filter out the ones that correspond to the decade,
+   instead of enumerating them in the code.)
+
+::::::::::::::: solution
+
+## Solution
+
+1. The average GDP for Japan across the years reported for the 1980s is computed with:
+
+   ```python
+   year = 1983
+   gdp_decade = 'gdpPercap_' + str(year // 10)
+   avg = (japan.loc[gdp_decade + '2'] + japan.loc[gdp_decade + '7']) / 2
+   ```
+
+2. That code as a function is:
+
+   ```python
+   def avg_gdp_in_decade(country, continent, year):
+       data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+       c = data_countries.loc[country]
+       gdp_decade = 'gdpPercap_' + str(year // 10)
+       avg = (c.loc[gdp_decade + '2'] + c.loc[gdp_decade + '7'])/2
+       return avg
+   ```
+
+3. To obtain the average for the relevant years, we need to loop over them:
+
+   ```python
+   def avg_gdp_in_decade(country, continent, year):
+       data_countries = pd.read_csv('data/gapminder_gdp_' + continent + '.csv', index_col=0)
+       c = data_countries.loc[country]
+       gdp_decade = 'gdpPercap_' + str(year // 10)
+       total = 0.0
+       num_years = 0
+       for yr_header in c.index: # c's index contains reported years
+           if yr_header.startswith(gdp_decade):
+               total = total + c.loc[yr_header]
+               num_years = num_years + 1
+       return total/num_years
+   ```
+
+The function can now be called by:
+
+```python
+avg_gdp_in_decade('Japan','asia',1983)
+```
+
+```output
+20880.023800000003
+```
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Simulating a dynamical system
+
+In mathematics, a [dynamical system](https://en.wikipedia.org/wiki/Dynamical_system) is a system
+in which a function describes the time dependence of a point in a geometrical space. A canonical
+example of a dynamical system is the [logistic map](https://en.wikipedia.org/wiki/Logistic_map),
+a growth model that computes a new population density (between 0 and 1) based on the current
+density. In the model, time takes discrete values 0, 1, 2, ...
+
+1.
+   Define a function called `logistic_map` that takes two inputs: `x`, representing the current
+   population (at time `t`), and a parameter `r = 1`. This function should return a value
+   representing the state of the system (population) at time `t + 1`, using the mapping function:
+
+   `f(t+1) = r * f(t) * [1 - f(t)]`
+
+2. Using a `for` or `while` loop, iterate the `logistic_map` function defined in part 1, starting
+   from an initial population of 0.5, for a period of time `t_final = 10`. Store the intermediate
+   results in a list so that after the loop terminates you have accumulated a sequence of values
+   representing the state of the logistic map at times `t = [0,1,...,t_final]` (11 values in total).
+   Print this list to see the evolution of the population.
+
+3. Encapsulate the logic of your loop into a function called `iterate` that takes the initial
+   population as its first input, the parameter `t_final` as its second input and the parameter
+   `r` as its third input. The function should return the list of values representing the state of
+   the logistic map at times `t = [0,1,...,t_final]`. Run this function for periods `t_final = 100`
+   and `1000` and print some of the values. Is the population trending toward a steady state?
+
+::::::::::::::: solution
+
+## Solution
+
+1. ```python
+   def logistic_map(x, r):
+       return r * x * (1 - x)
+   ```
+
+2. ```python
+   initial_population = 0.5
+   t_final = 10
+   r = 1.0
+   population = [initial_population]
+   for t in range(t_final):
+       population.append(logistic_map(population[t], r))
+   print(population)
+   ```
+
+3. ```python
+   def iterate(initial_population, t_final, r):
+       population = [initial_population]
+       for t in range(t_final):
+           population.append(logistic_map(population[t], r))
+       return population
+
+   for period in (10, 100, 1000):
+       population = iterate(0.5, period, 1)
+       print(population[-1])
+   ```
+
+   ```output
+   0.06945089389714401
+   0.009395779870614648
+   0.0009913908614406382
+   ```
+
+   The population seems to be approaching zero.
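As a quick sanity check on this solution, the first few iterates with `r = 1` can be worked out by hand: 0.5 * 0.5 = 0.25, then 0.25 * 0.75 = 0.1875, then 0.1875 * 0.8125 = 0.15234375. The sketch below repeats the two functions from the solution so that it runs on its own. It compares them against those hand-computed values. It also checks that every value is smaller than the one before it. For 0 < x < 1, multiplying by `1 - x` always shrinks `x`, so this is consistent with the population approaching zero.

```python
def logistic_map(x, r):
    return r * x * (1 - x)

def iterate(initial_population, t_final, r):
    population = [initial_population]
    for t in range(t_final):
        population.append(logistic_map(population[t], r))
    return population

# Compare the first three steps with the values worked out by hand.
print(iterate(0.5, 3, 1))  # [0.5, 0.25, 0.1875, 0.15234375]

# Over 1000 steps, check that each value is smaller than the previous one.
values = iterate(0.5, 1000, 1)
print(len(values))  # 1001
print(all(later < earlier for earlier, later in zip(values, values[1:])))  # True
```

A steadily shrinking sequence does not by itself prove the limit is zero. However, it rules out the population settling at any value above its current level.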
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::: callout
+
+## Using Functions With Conditionals in Pandas
+
+Functions will often contain conditionals. Here is a short example that
+will indicate which quartile the argument is in based on hand-coded values
+for the quartile cut points.
+
+```python
+def calculate_life_quartile(exp):
+    if exp < 58.41:
+        # This observation is in the first quartile
+        return 1
+    elif exp >= 58.41 and exp < 67.05:
+        # This observation is in the second quartile
+        return 2
+    elif exp >= 67.05 and exp < 71.70:
+        # This observation is in the third quartile
+        return 3
+    elif exp >= 71.70:
+        # This observation is in the fourth quartile
+        return 4
+    else:
+        # This observation has bad data
+        return None
+
+calculate_life_quartile(62.5)
+```
+
+```output
+2
+```
+
+That function would typically be used within a `for` loop, but Pandas has
+a different, more concise way of doing the same thing, and that is by
+*applying* a function to a dataframe or a portion of a dataframe. Here
+is an example, using the definition above.
+
+```python
+import pandas as pd
+
+data = pd.read_csv('data/gapminder_all.csv')
+data['life_qrtl'] = data['lifeExp_1952'].apply(calculate_life_quartile)
+```
+
+There is a lot in that last line, so let's take it piece by piece.
+On the right side of the `=` we start with `data['lifeExp_1952']`, which is the
+column labeled `lifeExp_1952` in the dataframe called `data`. We use the
+`apply()` method to do what it says: apply `calculate_life_quartile` to the
+value of this column in every row of the dataframe. On the left side of the `=`,
+the results are stored in a new column called `life_qrtl`.
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: keypoints
+
+- Break programs down into functions to make them easier to understand.
+- Define a function using `def` with a name, parameters, and a block of code.
+- Defining a function does not run it.
+- Arguments in a function call are matched to its defined parameters.
+- Functions may return a result to their caller using `return`.
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+
diff --git a/17-scope.md b/17-scope.md
new file mode 100644
index 000000000..afa533b98
--- /dev/null
+++ b/17-scope.md
@@ -0,0 +1,143 @@
+---
+title: Variable Scope
+teaching: 10
+exercises: 10
+---
+
+::::::::::::::::::::::::::::::::::::::: objectives
+
+- Identify local and global variables.
+- Identify parameters as local variables.
+- Read a traceback and determine the file, function, and line number on which the error occurred, the type of error, and the error message.
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: questions
+
+- How do function calls actually work?
+- How can I determine where errors occurred?
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+## The scope of a variable is the part of a program that can 'see' that variable.
+
+- There are only so many sensible names for variables.
+- People using functions shouldn't have to worry about
+  what variable names the author of the function used.
+- People writing functions shouldn't have to worry about
+  what variable names the function's caller uses.
+- The part of a program in which a variable is visible is called its *scope*.
+
+```python
+pressure = 103.9
+
+def adjust(t):
+    temperature = t * 1.43 / pressure
+    return temperature
+```
+
+- `pressure` is a *global variable*.
+  - Defined outside any particular function.
+  - Visible everywhere.
+- `t` and `temperature` are *local variables* in `adjust`.
+  - Defined in the function.
+  - Not visible in the main program.
+  - Remember: a function parameter is a variable
+    that is automatically assigned a value when the function is called.
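A parameter really is a local variable, even when it has the same name as a global variable. In this minimal sketch, the names `counter` and `increment` are made up for illustration. Assigning to the parameter inside the function leaves the global variable with the same name untouched:

```python
counter = 0  # global variable: defined outside any function

def increment(counter):
    # This `counter` is the parameter, i.e. a local variable of `increment`.
    # It shadows the global `counter`, so reassigning it here leaves the global alone.
    counter = counter + 1
    return counter

result = increment(counter)
print('returned:', result)         # returned: 1
print('global counter:', counter)  # global counter: 0
```

To get the new value back into the main program, the caller has to use what the function returns, for example `counter = increment(counter)`.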
+
+```python
+print('adjusted:', adjust(0.9))
+print('temperature after call:', temperature)
+```
+
+```output
+adjusted: 0.01238691049085659
+```
+
+```error
+Traceback (most recent call last):
+  File "/Users/swcarpentry/foo.py", line 8, in <module>
+    print('temperature after call:', temperature)
+NameError: name 'temperature' is not defined
+```
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Local and Global Variable Use
+
+Trace the values of all variables in this program as it is executed.
+(Use '---' as the value of variables before and after they exist.)
+
+```python
+limit = 100
+
+def clip(value):
+    return min(max(0.0, value), limit)
+
+value = -22.5
+print(clip(value))
+```
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## Reading Error Messages
+
+Read the traceback below, and identify the following:
+
+1. How many levels does the traceback have?
+2. What is the file name where the error occurred?
+3. What is the function name where the error occurred?
+4. On which line number in this function did the error occur?
+5. What is the type of error?
+6. What is the error message?
+
+```error
+---------------------------------------------------------------------------
+KeyError                                  Traceback (most recent call last)
+ in ()
+      1 import errors_02
+----> 2 errors_02.print_friday_message()
+
+/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
+     13
+     14 def print_friday_message():
+---> 15     print_message("Friday")
+
+/Users/ghopper/thesis/code/errors_02.py in print_message(day)
+      9         "sunday": "Aw, the weekend is almost over."
+     10     }
+---> 11     print(messages[day])
+     12
+     13
+
+KeyError: 'Friday'
+```
+
+::::::::::::::: solution
+
+## Solution
+
+1. Three levels.
+2. `errors_02.py`
+3. `print_message`
+4. Line 11
+5. `KeyError`. These errors occur when we are trying to look up a key that does not exist (usually in a data
+   structure such as a dictionary).
+   We can find more information about the `KeyError` and other built-in exceptions
+   in the [Python docs](https://docs.python.org/3/library/exceptions.html#KeyError).
+6. `KeyError: 'Friday'`
+
+
+
+:::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: keypoints
+
+- The scope of a variable is the part of a program that can 'see' that variable.
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+
diff --git a/18-style.md b/18-style.md
new file mode 100644
index 000000000..886093885
--- /dev/null
+++ b/18-style.md
@@ -0,0 +1,257 @@
+---
+title: Programming Style
+teaching: 15
+exercises: 15
+---
+
+::::::::::::::::::::::::::::::::::::::: objectives
+
+- Provide sound justifications for basic rules of coding style.
+- Refactor one-page programs to make them more readable and justify the changes.
+- Use Python community coding standards (PEP-8).
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::: questions
+
+- How can I make my programs more readable?
+- How do most programmers format their code?
+- How can programs check their own operation?
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Coding style
+
+A consistent coding style helps others (including our future selves) read and understand code more easily. Code is read much more often than it is written, and as the [Zen of Python](https://www.python.org/dev/peps/pep-0020) states, "Readability counts".
+Python proposed a standard style through one of its first Python Enhancement Proposals (PEP), [PEP8](https://www.python.org/dev/peps/pep-0008).
+
+Some points worth highlighting:
+
+- document your code and ensure that assumptions, internal algorithms, expected inputs, expected outputs, etc., are clear
+- use clear, semantically meaningful variable names
+- use spaces, *not* tabs, to indent lines (tabs can cause problems across different text editors, operating systems, and version control systems)
+
+## Follow standard Python style in your code.
+
+- [PEP8](https://www.python.org/dev/peps/pep-0008):
+  a style guide for Python that discusses topics such as how to name variables,
+  how to indent your code,
+  how to structure your `import` statements,
+  etc.
+  Adhering to PEP8 makes it easier for other Python developers to read and understand your code, and to understand what their contributions should look like.
+- To check your code for compliance with PEP8, you can use the [pycodestyle application](https://pypi.org/project/pycodestyle/). Tools like the [black code formatter](https://github.com/psf/black) can automatically format your code to conform to PEP8 and pycodestyle (a Jupyter notebook formatter, [nb\_black](https://github.com/dnanhkhoa/nb_black), also exists).
+- Some groups and organizations follow different style guidelines besides PEP8. For example, the [Google style guide on Python](https://google.github.io/styleguide/pyguide.html) makes slightly different recommendations. Google wrote an application called [yapf](https://github.com/google/yapf/) that can help you format your code in either their style or PEP8.
+- With respect to coding style, the key is *consistency*. Choose a style for your project, be it PEP8, the Google style, or something else, and do your best to ensure that you and anyone else you are collaborating with stick to it. Consistency within a project is often more impactful than the particular style used. A consistent style will make your software easier to read and understand for others and for your future self.
+
+## Use assertions to check for internal errors.
+
+Assertions are a simple but powerful method for making sure that the context in which your code is executing is as you expect.
+
+```python
+def calc_bulk_density(mass, volume):
+    '''Return dry bulk density = powder mass / powder volume.'''
+    assert volume > 0
+    return mass / volume
+```
+
+If the assertion is `False`, the Python interpreter raises an `AssertionError` runtime exception. The source code of the failed assertion will be displayed as part of the traceback. To ignore assertions in your code, run the interpreter with the '-O' (optimize) switch. Assertions should contain only simple checks and never change the state of the program. For example, an assertion should never contain an assignment.
+
+## Use docstrings to provide builtin help.
+
+If the first thing in a function is a character string that is not assigned directly to a variable, Python attaches it to the function, where it is accessible via the builtin `help` function. This string that provides documentation is also known as a *docstring*.
+
+```python
+def average(values):
+    "Return average of values, or None if no values are supplied."
+
+    if len(values) == 0:
+        return None
+    return sum(values) / len(values)
+
+help(average)
+```
+
+```output
+Help on function average in module __main__:
+
+average(values)
+    Return average of values, or None if no values are supplied.
+```
+
+::::::::::::::::::::::::::::::::::::::::: callout
+
+## Multiline Strings
+
+We often use *multiline strings* for documentation.
+These start with three quote characters (either single or double)
+and end with three matching characters.
+
+```python
+"""This string spans
+multiple lines.
+
+Blank lines are allowed."""
+```
+
+::::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::: challenge
+
+## What Will Be Shown?
+
+Highlight the lines in the code below that will be available as online help.
+Are there lines that should be made available, but won't be?
+Will any lines produce a syntax error or a runtime error? + +```python +"Find maximum edit distance between multiple sequences." +# This finds the maximum distance between all sequences. + +def overall_max(sequences): + '''Determine overall maximum edit distance.''' + + highest = 0 + for left in sequences: + for right in sequences: + '''Avoid checking sequence against itself.''' + if left != right: + this = edit_distance(left, right) + highest = max(highest, this) + + # Report. + return highest +``` + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Document This + +Use comments to describe and help others understand potentially unintuitive +sections or individual lines of code. They are especially useful to whoever +may need to understand and edit your code in the future, including yourself. + +Use docstrings to document the acceptable inputs and expected outputs of a method +or class, its purpose, assumptions and intended behavior. Docstrings are displayed +when a user invokes the builtin `help` function on your method or class. + +Turn the comment in the following function into a docstring +and check that `help` displays it properly. + +```python +def middle(a, b, c): + # Return the middle value of three. + # Assumes the values can actually be compared. + values = [a, b, c] + values.sort() + return values[1] +``` + +::::::::::::::: solution + +## Solution + +```python +def middle(a, b, c): + '''Return the middle value of three. + Assumes the values can actually be compared.''' + values = [a, b, c] + values.sort() + return values[1] +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::::: challenge + +## Clean Up This Code + +1. Read this short program and try to predict what it does. +2. Run it: how accurate was your prediction? +3. Refactor the program to make it more readable.
+ Remember to run it after each change to ensure its behavior hasn't changed. +4. Compare your rewrite with your neighbor's. + What did you do the same? + What did you do differently, and why? + +```python +n = 10 +s = 'et cetera' +print(s) +i = 0 +while i < n: + # print('at', j) + new = '' + for j in range(len(s)): + left = j-1 + right = (j+1)%len(s) + if s[left]==s[right]: new = new + '-' + else: new = new + '*' + s=''.join(new) + print(s) + i += 1 +``` + +::::::::::::::: solution + +## Solution + +Here's one solution. + +```python +def string_machine(input_string, iterations): + """ + Takes input_string and generates a new string with -'s and *'s + corresponding to characters that have identical adjacent characters + or not, respectively. Iterates through this procedure with the resultant + strings for the supplied number of iterations. + """ + print(input_string) + input_string_length = len(input_string) + old = input_string + for i in range(iterations): + new = '' + # iterate through characters in previous string + for j in range(input_string_length): + left = j-1 + right = (j+1) % input_string_length # ensure right index wraps around + if old[left] == old[right]: + new = new + '-' + else: + new = new + '*' + print(new) + # store new string as old + old = new + +string_machine('et cetera', 10) +``` + +```output +et cetera +*****-*** +----*-*-- +---*---*- +--*-*-*-* +**------- +***-----* +--**---** +*****-*** +----*-*-- +---*---*- +``` + +::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Follow standard Python style in your code. +- Use docstrings to provide builtin help. 
+ +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/19-wrap.md b/19-wrap.md new file mode 100644 index 000000000..e6cfdf074 --- /dev/null +++ b/19-wrap.md @@ -0,0 +1,51 @@ +--- +title: Wrap-Up +teaching: 20 +exercises: 0 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Name and locate scientific Python community sites for software, workshops, and help. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- What have we learned? +- What else is out there and where do I find it? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +Leslie Lamport once said, "Writing is nature's way of showing you how sloppy your thinking is." +The same is true of programming: +many things that seem obvious when we're thinking about them +turn out to be anything but when we have to explain them precisely. + +## Python supports a large and diverse community across academia and industry. + +- The [Python 3 documentation](https://docs.python.org/3/) covers the core language + and the standard library. + +- [PyCon](https://pycon.org/) is the largest annual conference for the Python community. + +- [SciPy](https://scipy.org) is a rich collection of scientific utilities. + It is also the name of [a series of annual conferences](https://conference.scipy.org/). + +- [Jupyter](https://jupyter.org) is the home of Project Jupyter. + +- [Pandas](https://pandas.pydata.org) is the home of the Pandas data library. + +- Stack Overflow's [general Python section](https://stackoverflow.com/questions/tagged/python?tab=Votes) + can be helpful, + as well as the sections on [NumPy](https://stackoverflow.com/questions/tagged/numpy?tab=Votes), + [SciPy](https://stackoverflow.com/questions/tagged/scipy?tab=Votes), and + [Pandas](https://stackoverflow.com/questions/tagged/pandas?tab=Votes). 
+ +:::::::::::::::::::::::::::::::::::::::: keypoints + +- Python supports a large and diverse community across academia and industry. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/20-feedback.md b/20-feedback.md new file mode 100644 index 000000000..37e3060dd --- /dev/null +++ b/20-feedback.md @@ -0,0 +1,27 @@ +--- +title: Feedback +teaching: 0 +exercises: 15 +--- + +::::::::::::::::::::::::::::::::::::::: objectives + +- Gather feedback on the class + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::: questions + +- How did the class go? + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +Gather feedback from participants. + +:::::::::::::::::::::::::::::::::::::::: keypoints + +- We are constantly seeking to improve this course. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 000000000..f19b80495 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,13 @@ +--- +title: "Contributor Code of Conduct" +--- + +As contributors and maintainers of this project, +we pledge to follow the [The Carpentries Code of Conduct][coc]. + +Instances of abusive, harassing, or otherwise unacceptable behavior +may be reported by following our [reporting guidelines][coc-reporting]. + + +[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html +[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html diff --git a/LICENSE.md b/LICENSE.md new file mode 100644 index 000000000..7632871ff --- /dev/null +++ b/LICENSE.md @@ -0,0 +1,79 @@ +--- +title: "Licenses" +--- + +## Instructional Material + +All Carpentries (Software Carpentry, Data Carpentry, and Library Carpentry) +instructional material is made available under the [Creative Commons +Attribution license][cc-by-human]. 
The following is a human-readable summary of +(and not a substitute for) the [full legal text of the CC BY 4.0 +license][cc-by-legal]. + +You are free: + +- to **Share**---copy and redistribute the material in any medium or format +- to **Adapt**---remix, transform, and build upon the material + +for any purpose, even commercially. + +The licensor cannot revoke these freedoms as long as you follow the license +terms. + +Under the following terms: + +- **Attribution**---You must give appropriate credit (mentioning that your work + is derived from work that is Copyright (c) The Carpentries and, where + practical, linking to ), provide a [link to the + license][cc-by-human], and indicate if changes were made. You may do so in + any reasonable manner, but not in any way that suggests the licensor endorses + you or your use. + +- **No additional restrictions**---You may not apply legal terms or + technological measures that legally restrict others from doing anything the + license permits. With the understanding that: + +Notices: + +* You do not have to comply with the license for elements of the material in + the public domain or where your use is permitted by an applicable exception + or limitation. +* No warranties are given. The license may not give you all of the permissions + necessary for your intended use. For example, other rights such as publicity, + privacy, or moral rights may limit how you use the material. + +## Software + +Except where otherwise noted, the example programs and other software provided +by The Carpentries are made available under the [OSI][osi]-approved [MIT +license][mit-license]. 
+ +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal in +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + +## Trademark + +"The Carpentries", "Software Carpentry", "Data Carpentry", and "Library +Carpentry" and their respective logos are registered trademarks of [Community +Initiatives][ci]. + +[cc-by-human]: https://creativecommons.org/licenses/by/4.0/ +[cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode +[mit-license]: https://opensource.org/licenses/mit-license.html +[ci]: https://communityin.org/ +[osi]: https://opensource.org diff --git a/config.yaml b/config.yaml new file mode 100644 index 000000000..212cb4b7c --- /dev/null +++ b/config.yaml @@ -0,0 +1,100 @@ +#------------------------------------------------------------ +# Values for this lesson. +#------------------------------------------------------------ + +# Which carpentry is this (swc, dc, lc, or cp)? 
+# swc: Software Carpentry +# dc: Data Carpentry +# lc: Library Carpentry +# cp: Carpentries (to use for instructor training for instance) +# incubator: The Carpentries Incubator +carpentry: 'swc' + +# Overall title for pages. +title: 'Plotting and Programming in Python' + +# Date the lesson was created (YYYY-MM-DD, this is empty by default) +created: '2016-01-07' + +# Comma-separated list of keywords for the lesson +keywords: 'software, data, lesson, The Carpentries' + +# Life cycle stage of the lesson +# possible values: pre-alpha, alpha, beta, stable +life_cycle: 'stable' + +# License of the lesson materials (recommended CC-BY 4.0) +license: 'CC-BY 4.0' + +# Link to the source repository for this lesson +source: 'https://github.com/swcarpentry/python-novice-gapminder' + +# Default branch of your lesson +branch: 'main' + +# Who to contact if there are any issues +contact: 'team@carpentries.org' + +# Navigation ------------------------------------------------ +# +# Use the following menu items to specify the order of +# individual pages in each dropdown section. Leave blank to +# include all pages in the folder. 
+# +# Example ------------- +# +# episodes: +# - introduction.md +# - first-steps.md +# +# learners: +# - setup.md +# +# instructors: +# - instructor-notes.md +# +# profiles: +# - one-learner.md +# - another-learner.md + +# Order of episodes in your lesson +episodes: +- 01-run-quit.md +- 02-variables.md +- 03-types-conversion.md +- 04-built-in.md +- 05-coffee.md +- 06-libraries.md +- 07-reading-tabular.md +- 08-data-frames.md +- 09-plotting.md +- 10-lunch.md +- 11-lists.md +- 12-for-loops.md +- 13-conditionals.md +- 14-looping-data-sets.md +- 15-coffee.md +- 16-writing-functions.md +- 17-scope.md +- 18-style.md +- 19-wrap.md +- 20-feedback.md + +# Information for Learners +learners: + +# Information for Instructors +instructors: + +# Learner Profiles +profiles: + +# Customisation --------------------------------------------- +# +# This space below is where custom yaml items (e.g. pinning +# sandpaper and varnish versions) should live + + +url: 'https://swcarpentry.github.io/python-novice-gapminder' +analytics: carpentries +lang: en diff --git a/data/asia_gdp_per_capita.csv b/data/asia_gdp_per_capita.csv new file mode 100644 index 000000000..6fba7d500 --- /dev/null +++ b/data/asia_gdp_per_capita.csv @@ -0,0 +1,13 @@ +'year','Afghanistan','Bahrain','Bangladesh','Cambodia','China','Hong Kong China','India','Indonesia','Iran','Iraq','Israel','Japan','Jordan','Korea Dem. Rep.','Korea Rep.','Kuwait','Lebanon','Malaysia','Mongolia','Myanmar','Nepal','Oman','Pakistan','Philippines','Saudi Arabia','Singapore','Sri Lanka','Syria','Taiwan','Thailand','Vietnam','West Bank and Gaza','Yemen Rep.' 
+1952,779.4453145,9867.084765,684.2441716,368.4692856,400.4486107,3054.421209,546.5657493,749.6816546,3035.326002,4129.766056,4086.522128,3216.956347,1546.907807,1088.277758,1030.592226,108382.3529,4834.804067,1831.132894,786.5668575,331,545.8657229,1828.230307,684.5971438,1272.880995,6459.554823,2315.138227,1083.53203,1643.485354,1206.947913,757.7974177,605.0664917,1515.592329,781.7175761 +1957,820.8530296,11635.79945,661.6374577,434.0383364,575.9870009,3629.076457,590.061996,858.9002707,3290.257643,6229.333562,5385.278451,4317.694365,1886.080591,1571.134655,1487.593537,113523.1329,6089.786934,1810.066992,912.6626085,350,597.9363558,2242.746551,747.0835292,1547.944844,8157.591248,2843.104409,1072.546602,2117.234893,1507.86129,793.5774148,676.2854478,1827.067742,804.8304547 +1962,853.10071,12753.27514,686.3415538,496.9136476,487.6740183,4692.648272,658.3471509,849.2897701,4187.329802,8341.737815,7105.630706,6576.649461,2348.009158,1621.693598,1536.344387,95458.11176,5714.560611,2036.884944,1056.353958,388,652.3968593,2924.638113,803.3427418,1649.552153,11626.41975,3674.735572,1074.47196,2193.037133,1822.879028,1002.199172,772.0491602,2198.956312,825.6232006 +1967,836.1971382,14804.6727,721.1860862,523.4323142,612.7056934,6197.962814,700.7706107,762.4317721,5906.731805,8931.459811,8393.741404,9847.788607,2741.796252,2143.540609,2029.228142,80894.88326,6006.983042,2277.742396,1226.04113,349,676.4422254,4720.942687,942.4082588,1814.12743,16903.04886,4977.41854,1135.514326,1881.923632,2643.858681,1295.46066,637.1232887,2649.715007,862.4421463 +1972,739.9811058,18268.65839,630.2336265,421.6240257,676.9000921,8315.928145,724.032527,1111.107907,9613.818607,9576.037596,12786.93223,14778.78636,2110.856309,3701.621503,3030.87665,109347.867,7486.384341,2849.09478,1421.741975,357,674.7881296,10618.03855,1049.938981,1989.37407,24837.42865,8597.756202,1213.39553,2571.423014,4062.523897,1524.358936,699.5016441,3133.409277,1265.047031 
+1977,786.11336,19340.10196,659.8772322,524.9721832,741.2374699,11186.14125,813.337323,1382.702056,11888.59508,14688.23507,13306.61921,16610.37701,2852.351568,4106.301249,4657.22102,59265.47714,8659.696836,3827.921571,1647.511665,371,694.1124398,11848.34392,1175.921193,2373.204287,34167.7626,11210.08948,1348.775651,3195.484582,5596.519826,1961.224635,713.5371196,3682.831494,1829.765177 +1982,978.0114388,19211.14731,676.9818656,624.4754784,962.4213805,14560.53051,855.7235377,1516.872988,7608.334602,14517.90711,15367.0292,19384.10571,4161.415959,4106.525293,5622.942464,31354.03573,7640.519521,4920.355951,2000.603139,424,718.3730947,12954.79101,1443.429832,2603.273765,33693.17525,15169.16112,1648.079789,3761.837715,7426.354774,2393.219781,707.2357863,4336.032082,1977.55701 +1987,852.3959448,18524.02406,751.9794035,683.8955732,1378.904018,20038.47269,976.5126756,1748.356961,6642.881371,11643.57268,17122.47986,22375.94189,4448.679912,4106.492315,8533.088805,28118.42998,5377.091329,5249.802653,2338.008304,385,775.6324501,18115.22313,1704.686583,2189.634995,21198.26136,18861.53081,1876.766827,3116.774285,11054.56175,2982.653773,820.7994449,5107.197384,1971.741538 +1992,649.3413952,19035.57917,837.8101643,682.3031755,1655.784158,24757.60301,1164.406809,2383.140898,7235.653188,3745.640687,18051.52254,26824.89511,3431.593647,3726.063507,12104.27872,34932.91959,6890.806854,7277.912802,1785.402016,347,897.7403604,18616.70691,1971.829464,2279.324017,24841.61777,24769.8912,2153.739222,3340.542768,15215.6579,4616.896545,989.0231487,6017.654756,1879.496673 +1997,635.341351,20292.01679,972.7700352,734.28517,2289.234136,28377.63219,1458.817442,3119.335603,8263.590301,3076.239795,20896.60924,28816.58499,3645.379572,1690.756814,15993.52796,40300.61996,8754.96385,10132.90964,1902.2521,415,1010.892138,19702.05581,2049.350521,2536.534925,20586.69019,33519.4766,2664.477257,4014.238972,20206.82098,5852.625497,1385.896769,7110.667619,2117.484526 
+2002,726.7340548,23403.55927,1136.39043,896.2260153,3119.280896,30209.01516,1746.769454,2873.91287,9240.761975,4390.717312,21905.59514,28604.5919,3844.917194,1646.758151,19233.98818,35110.10566,9313.93883,10206.97794,2140.739323,611,1057.206311,19774.83687,2092.712441,2650.921068,19014.54118,36023.1054,3015.378833,4090.925331,23235.42329,5913.187529,1764.456677,4515.487575,2234.820827 +2007,974.5803384,29796.04834,1391.253792,1713.778686,4959.114854,39724.97867,2452.210407,3540.651564,11605.71449,4471.061906,25523.2771,31656.06806,4519.461171,1593.06548,23348.13973,47306.98978,10461.05868,12451.6558,3095.772271,944,1091.359778,22316.19287,2605.94758,3190.481016,21654.83194,47143.17964,3970.095407,4184.548089,28718.27684,7458.396327,2441.576404,3025.349798,2280.769906 diff --git a/data/gapminder_all.csv b/data/gapminder_all.csv new file mode 100644 index 000000000..6824ead2d --- /dev/null +++ b/data/gapminder_all.csv @@ -0,0 +1,143 @@ +"continent","country","gdpPercap_1952","gdpPercap_1957","gdpPercap_1962","gdpPercap_1967","gdpPercap_1972","gdpPercap_1977","gdpPercap_1982","gdpPercap_1987","gdpPercap_1992","gdpPercap_1997","gdpPercap_2002","gdpPercap_2007","lifeExp_1952","lifeExp_1957","lifeExp_1962","lifeExp_1967","lifeExp_1972","lifeExp_1977","lifeExp_1982","lifeExp_1987","lifeExp_1992","lifeExp_1997","lifeExp_2002","lifeExp_2007","pop_1952","pop_1957","pop_1962","pop_1967","pop_1972","pop_1977","pop_1982","pop_1987","pop_1992","pop_1997","pop_2002","pop_2007" +"Africa","Algeria",2449.008185,3013.976023,2550.81688,3246.991771,4182.663766,4910.416756,5745.160213,5681.358539,5023.216647,4797.295051,5288.040382,6223.367465,43.077,45.685,48.303,51.407,54.518,58.014,61.368,65.799,67.744,69.152,70.994,72.301,9279525,10270856,11000948,12760499,14760787,17152804,20033753,23254956,26298373,29072015,31287142,33333216 
+"Africa","Angola",3520.610273,3827.940465,4269.276742,5522.776375,5473.288005,3008.647355,2756.953672,2430.208311,2627.845685,2277.140884,2773.287312,4797.231267,30.015,31.999,34,35.985,37.928,39.483,39.942,39.906,40.647,40.963,41.003,42.731,4232095,4561361,4826015,5247469,5894858,6162675,7016384,7874230,8735988,9875024,10866106,12420476 +"Africa","Benin",1062.7522,959.6010805,949.4990641,1035.831411,1085.796879,1029.161251,1277.897616,1225.85601,1191.207681,1232.975292,1372.877931,1441.284873,38.223,40.358,42.618,44.885,47.014,49.19,50.904,52.337,53.919,54.777,54.406,56.728,1738315,1925173,2151895,2427334,2761407,3168267,3641603,4243788,4981671,6066080,7026113,8078314 +"Africa","Botswana",851.2411407,918.2325349,983.6539764,1214.709294,2263.611114,3214.857818,4551.14215,6205.88385,7954.111645,8647.142313,11003.60508,12569.85177,47.622,49.618,51.52,53.298,56.024,59.319,61.484,63.622,62.745,52.556,46.634,50.728,442308,474639,512764,553541,619351,781472,970347,1151184,1342614,1536536,1630347,1639131 +"Africa","Burkina Faso",543.2552413,617.1834648,722.5120206,794.8265597,854.7359763,743.3870368,807.1985855,912.0631417,931.7527731,946.2949618,1037.645221,1217.032994,31.975,34.906,37.814,40.697,43.591,46.137,48.122,49.557,50.26,50.324,50.65,52.295,4469979,4713416,4919632,5127935,5433886,5889574,6634596,7586551,8878303,10352843,12251209,14326203 +"Africa","Burundi",339.2964587,379.5646281,355.2032273,412.9775136,464.0995039,556.1032651,559.603231,621.8188189,631.6998778,463.1151478,446.4035126,430.0706916,39.031,40.533,42.045,43.548,44.057,45.91,47.471,48.211,44.736,45.326,47.36,49.58,2445618,2667518,2961915,3330989,3529983,3834415,4580410,5126023,5809236,6121610,7021078,8390505 
+"Africa","Cameroon",1172.667655,1313.048099,1399.607441,1508.453148,1684.146528,1783.432873,2367.983282,2602.664206,1793.163278,1694.337469,1934.011449,2042.09524,38.523,40.428,42.643,44.799,47.049,49.355,52.961,54.985,54.314,52.199,49.856,50.43,5009067,5359923,5793633,6335506,7021028,7959865,9250831,10780667,12467171,14195809,15929988,17696293 +"Africa","Central African Republic",1071.310713,1190.844328,1193.068753,1136.056615,1070.013275,1109.374338,956.7529907,844.8763504,747.9055252,740.5063317,738.6906068,706.016537,35.463,37.464,39.475,41.478,43.457,46.775,48.295,50.485,49.396,46.066,43.308,44.741,1291695,1392284,1523478,1733638,1927260,2167533,2476971,2840009,3265124,3696513,4048013,4369038 +"Africa","Chad",1178.665927,1308.495577,1389.817618,1196.810565,1104.103987,1133.98495,797.9081006,952.386129,1058.0643,1004.961353,1156.18186,1704.063724,38.092,39.881,41.716,43.601,45.569,47.383,49.517,51.051,51.724,51.573,50.525,50.651,2682462,2894855,3150417,3495967,3899068,4388260,4875118,5498955,6429417,7562011,8835739,10238807 +"Africa","Comoros",1102.990936,1211.148548,1406.648278,1876.029643,1937.577675,1172.603047,1267.100083,1315.980812,1246.90737,1173.618235,1075.811558,986.1478792,40.715,42.46,44.467,46.472,48.944,50.939,52.933,54.926,57.939,60.66,62.974,65.152,153936,170928,191689,217378,250027,304739,348643,395114,454429,527982,614382,710960 +"Africa","Congo Dem. 
Rep.",780.5423257,905.8602303,896.3146335,861.5932424,904.8960685,795.757282,673.7478181,672.774812,457.7191807,312.188423,241.1658765,277.5518587,39.143,40.652,42.122,44.056,45.989,47.804,47.784,47.412,45.548,42.587,44.966,46.462,14100005,15577932,17486434,19941073,23007669,26480870,30646495,35481645,41672143,47798986,55379852,64606759 +"Africa","Congo Rep.",2125.621418,2315.056572,2464.783157,2677.939642,3213.152683,3259.178978,4879.507522,4201.194937,4016.239529,3484.164376,3484.06197,3632.557798,42.111,45.053,48.435,52.04,54.907,55.625,56.695,57.47,56.433,52.962,52.97,55.322,854885,940458,1047924,1179760,1340458,1536769,1774735,2064095,2409073,2800947,3328795,3800610 +"Africa","Cote d'Ivoire",1388.594732,1500.895925,1728.869428,2052.050473,2378.201111,2517.736547,2602.710169,2156.956069,1648.073791,1786.265407,1648.800823,1544.750112,40.477,42.469,44.93,47.35,49.801,52.374,53.983,54.655,52.044,47.991,46.832,48.328,2977019,3300000,3832408,4744870,6071696,7459574,9025951,10761098,12772596,14625967,16252726,18013409 +"Africa","Djibouti",2669.529475,2864.969076,3020.989263,3020.050513,3694.212352,3081.761022,2879.468067,2880.102568,2377.156192,1895.016984,1908.260867,2082.481567,34.812,37.328,39.693,42.074,44.366,46.519,48.812,50.04,51.604,53.157,53.373,54.791,63149,71851,89898,127617,178848,228694,305991,311025,384156,417908,447416,496374 +"Africa","Egypt",1418.822445,1458.915272,1693.335853,1814.880728,2024.008147,2785.493582,3503.729636,3885.46071,3794.755195,4173.181797,4754.604414,5581.180998,41.893,44.444,46.992,49.293,51.137,53.319,56.006,59.797,63.674,67.217,69.806,71.338,22223309,25009741,28173309,31681188,34807417,38783863,45681811,52799062,59402198,66134291,73312559,80264543 +"Africa","Equatorial 
Guinea",375.6431231,426.0964081,582.8419714,915.5960025,672.4122571,958.5668124,927.8253427,966.8968149,1132.055034,2814.480755,7703.4959,12154.08975,34.482,35.983,37.485,38.987,40.516,42.024,43.662,45.664,47.545,48.245,49.348,51.579,216964,232922,249220,259864,277603,192675,285483,341244,387838,439971,495627,551201 +"Africa","Eritrea",328.9405571,344.1618859,380.9958433,468.7949699,514.3242082,505.7538077,524.8758493,521.1341333,582.8585102,913.47079,765.3500015,641.3695236,35.928,38.047,40.158,42.189,44.142,44.535,43.89,46.453,49.991,53.378,55.24,58.04,1438760,1542611,1666618,1820319,2260187,2512642,2637297,2915959,3668440,4058319,4414865,4906585 +"Africa","Ethiopia",362.1462796,378.9041632,419.4564161,516.1186438,566.2439442,556.8083834,577.8607471,573.7413142,421.3534653,515.8894013,530.0535319,690.8055759,34.078,36.667,40.059,42.115,43.515,44.51,44.916,46.684,48.091,49.402,50.725,52.947,20860941,22815614,25145372,27860297,30770372,34617799,38111756,42999530,52088559,59861301,67946797,76511887 +"Africa","Gabon",4293.476475,4976.198099,6631.459222,8358.761987,11401.94841,21745.57328,15113.36194,11864.40844,13522.15752,14722.84188,12521.71392,13206.48452,37.003,38.999,40.489,44.598,48.69,52.79,56.564,60.19,61.366,60.461,56.761,56.735,420702,434904,455661,489004,537977,706367,753874,880397,985739,1126189,1299304,1454867 +"Africa","Gambia",485.2306591,520.9267111,599.650276,734.7829124,756.0868363,884.7552507,835.8096108,611.6588611,665.6244126,653.7301704,660.5855997,752.7497265,30,32.065,33.896,35.857,38.308,41.842,45.58,49.265,52.644,55.861,58.041,59.448,284320,323150,374020,439593,517101,608274,715523,848406,1025384,1235767,1457766,1688359 
+"Africa","Ghana",911.2989371,1043.561537,1190.041118,1125.69716,1178.223708,993.2239571,876.032569,847.0061135,925.060154,1005.245812,1111.984578,1327.60891,43.149,44.779,46.452,48.072,49.875,51.756,53.744,55.729,57.501,58.556,58.453,60.022,5581001,6391288,7355248,8490213,9354120,10538093,11400338,14168101,16278738,18418288,20550751,22873338 +"Africa","Guinea",510.1964923,576.2670245,686.3736739,708.7595409,741.6662307,874.6858643,857.2503577,805.5724718,794.3484384,869.4497668,945.5835837,942.6542111,33.609,34.558,35.753,37.197,38.842,40.762,42.891,45.552,48.576,51.455,53.676,56.007,2664249,2876726,3140003,3451418,3811387,4227026,4710497,5650262,6990574,8048834,8807818,9947814 +"Africa","Guinea-Bissau",299.850319,431.7904566,522.0343725,715.5806402,820.2245876,764.7259628,838.1239671,736.4153921,745.5398706,796.6644681,575.7047176,579.231743,32.5,33.489,34.488,35.492,36.486,37.465,39.327,41.245,43.266,44.873,45.504,46.388,580653,601095,627820,601287,625361,745228,825987,927524,1050938,1193708,1332459,1472041 +"Africa","Kenya",853.540919,944.4383152,896.9663732,1056.736457,1222.359968,1267.613204,1348.225791,1361.936856,1341.921721,1360.485021,1287.514732,1463.249282,42.27,44.686,47.949,50.654,53.559,56.155,58.766,59.339,59.285,54.407,50.992,54.11,6464046,7454779,8678557,10191512,12044785,14500404,17661452,21198082,25020539,28263827,31386842,35610177 +"Africa","Lesotho",298.8462121,335.9971151,411.8006266,498.6390265,496.5815922,745.3695408,797.2631074,773.9932141,977.4862725,1186.147994,1275.184575,1569.331442,42.138,45.047,47.747,48.492,49.767,52.208,55.078,57.18,59.685,55.558,44.593,42.592,748747,813338,893143,996380,1116779,1251524,1411807,1599200,1803195,1982823,2046772,2012649 
+"Africa","Liberia",575.5729961,620.9699901,634.1951625,713.6036483,803.0054535,640.3224383,572.1995694,506.1138573,636.6229191,609.1739508,531.4823679,414.5073415,38.48,39.486,40.502,41.536,42.614,43.764,44.852,46.027,40.802,42.221,43.753,45.678,863308,975950,1112796,1279406,1482628,1703617,1956875,2269414,1912974,2200725,2814651,3193942 +"Africa","Libya",2387.54806,3448.284395,6757.030816,18772.75169,21011.49721,21951.21176,17364.27538,11770.5898,9640.138501,9467.446056,9534.677467,12057.49928,42.723,45.289,47.808,50.227,52.773,57.442,62.155,66.234,68.755,71.555,72.737,73.952,1019729,1201578,1441863,1759224,2183877,2721783,3344074,3799845,4364501,4759670,5368585,6036914 +"Africa","Madagascar",1443.011715,1589.20275,1643.38711,1634.047282,1748.562982,1544.228586,1302.878658,1155.441948,1040.67619,986.2958956,894.6370822,1044.770126,36.681,38.865,40.848,42.881,44.851,46.881,48.969,49.35,52.214,54.978,57.286,59.443,4762912,5181679,5703324,6334556,7082430,8007166,9171477,10568642,12210395,14165114,16473477,19167654 +"Africa","Malawi",369.1650802,416.3698064,427.9010856,495.5147806,584.6219709,663.2236766,632.8039209,635.5173634,563.2000145,692.2758103,665.4231186,759.3499101,36.256,37.207,38.41,39.487,41.766,43.767,45.642,47.457,49.42,47.495,45.009,48.303,2917802,3221238,3628608,4147252,4730997,5637246,6502825,7824747,10014249,10419991,11824495,13327079 +"Africa","Mali",452.3369807,490.3821867,496.1743428,545.0098873,581.3688761,686.3952693,618.0140641,684.1715576,739.014375,790.2579846,951.4097518,1042.581557,33.685,35.307,36.936,38.487,39.977,41.714,43.916,46.364,48.388,49.903,51.818,54.467,3838168,4241884,4690372,5212416,5828158,6491649,6998256,7634008,8416215,9384984,10580176,12031795 
+"Africa","Mauritania",743.1159097,846.1202613,1055.896036,1421.145193,1586.851781,1497.492223,1481.150189,1421.603576,1361.369784,1483.136136,1579.019543,1803.151496,40.543,42.338,44.248,46.289,48.437,50.852,53.599,56.145,58.333,60.43,62.247,64.164,1022556,1076852,1146757,1230542,1332786,1456688,1622136,1841240,2119465,2444741,2828858,3270065 +"Africa","Mauritius",1967.955707,2034.037981,2529.067487,2475.387562,2575.484158,3710.982963,3688.037739,4783.586903,6058.253846,7425.705295,9021.815894,10956.99112,50.986,58.089,60.246,61.557,62.944,64.93,66.711,68.74,69.745,70.736,71.954,72.801,516556,609816,701016,789309,851334,913025,992040,1042663,1096202,1149818,1200206,1250882 +"Africa","Morocco",1688.20357,1642.002314,1566.353493,1711.04477,1930.194975,2370.619976,2702.620356,2755.046991,2948.047252,2982.101858,3258.495584,3820.17523,42.873,45.423,47.924,50.335,52.862,55.73,59.65,62.677,65.393,67.66,69.615,71.164,9939217,11406350,13056604,14770296,16660670,18396941,20198730,22987397,25798239,28529501,31167783,33757175 +"Africa","Mozambique",468.5260381,495.5868333,556.6863539,566.6691539,724.9178037,502.3197334,462.2114149,389.8761846,410.8968239,472.3460771,633.6179466,823.6856205,31.286,33.779,36.161,38.113,40.328,42.495,42.795,42.861,44.284,46.344,44.026,42.082,6446316,7038035,7788944,8680909,9809596,11127868,12587223,12891952,13160731,16603334,18473780,19951656 +"Africa","Namibia",2423.780443,2621.448058,3173.215595,3793.694753,3746.080948,3876.485958,4191.100511,3693.731337,3804.537999,3899.52426,4072.324751,4811.060429,41.725,45.226,48.386,51.159,53.867,56.437,58.968,60.835,61.999,58.909,51.479,52.906,485831,548080,621392,706640,821782,977026,1099010,1278184,1554253,1774766,1972153,2055080 
+"Africa","Niger",761.879376,835.5234025,997.7661127,1054.384891,954.2092363,808.8970728,909.7221354,668.3000228,581.182725,580.3052092,601.0745012,619.6768924,37.444,38.598,39.487,40.118,40.546,41.291,42.598,44.555,47.391,51.313,54.496,56.867,3379468,3692184,4076008,4534062,5060262,5682086,6437188,7332638,8392818,9666252,11140655,12894865 +"Africa","Nigeria",1077.281856,1100.592563,1150.927478,1014.514104,1698.388838,1981.951806,1576.97375,1385.029563,1619.848217,1624.941275,1615.286395,2013.977305,36.324,37.802,39.36,41.04,42.821,44.514,45.826,46.886,47.472,47.464,46.608,46.859,33119096,37173340,41871351,47287752,53740085,62209173,73039376,81551520,93364244,106207839,119901274,135031164 +"Africa","Reunion",2718.885295,2769.451844,3173.72334,4021.175739,5047.658563,4319.804067,5267.219353,5303.377488,6101.255823,6071.941411,6316.1652,7670.122558,52.724,55.09,57.666,60.542,64.274,67.064,69.885,71.913,73.615,74.772,75.744,76.442,257700,308700,358900,414024,461633,492095,517810,562035,622191,684810,743981,798094 +"Africa","Rwanda",493.3238752,540.2893983,597.4730727,510.9637142,590.5806638,670.0806011,881.5706467,847.991217,737.0685949,589.9445051,785.6537648,863.0884639,40,41.5,43,44.1,44.6,45,46.218,44.02,23.599,36.087,43.413,46.242,2534927,2822082,3051242,3451079,3992121,4657072,5507565,6349365,7290203,7212583,7852401,8860588 +"Africa","Sao Tome and Principe",879.5835855,860.7369026,1071.551119,1384.840593,1532.985254,1737.561657,1890.218117,1516.525457,1428.777814,1339.076036,1353.09239,1598.435089,46.471,48.945,51.893,54.425,56.48,58.55,60.351,61.728,62.742,63.306,64.337,65.528,60011,61325,65345,70787,76595,86796,98593,110812,125911,145608,170372,199579 
+"Africa","Senegal",1450.356983,1567.653006,1654.988723,1612.404632,1597.712056,1561.769116,1518.479984,1441.72072,1367.899369,1392.368347,1519.635262,1712.472136,37.278,39.329,41.454,43.563,45.815,48.879,52.379,55.769,58.196,60.187,61.6,63.062,2755589,3054547,3430243,3965841,4588696,5260855,6147783,7171347,8307920,9535314,10870037,12267493 +"Africa","Sierra Leone",879.7877358,1004.484437,1116.639877,1206.043465,1353.759762,1348.285159,1465.010784,1294.447788,1068.696278,574.6481576,699.489713,862.5407561,30.331,31.57,32.767,34.113,35.4,36.788,38.445,40.006,38.333,39.897,41.012,42.568,2143249,2295678,2467895,2662190,2879013,3140897,3464522,3868905,4260884,4578212,5359092,6144562 +"Africa","Somalia",1135.749842,1258.147413,1369.488336,1284.73318,1254.576127,1450.992513,1176.807031,1093.244963,926.9602964,930.5964284,882.0818218,926.1410683,32.978,34.977,36.981,38.977,40.973,41.974,42.955,44.501,39.658,43.795,45.936,48.159,2526994,2780415,3080153,3428839,3840161,4353666,5828892,6921858,6099799,6633514,7753310,9118773 +"Africa","South Africa",4725.295531,5487.104219,5768.729717,7114.477971,7765.962636,8028.651439,8568.266228,7825.823398,7225.069258,7479.188244,7710.946444,9269.657808,45.009,47.985,49.951,51.927,53.696,55.527,58.161,60.834,61.888,60.236,53.365,49.339,14264935,16151549,18356657,20997321,23935810,27129932,31140029,35933379,39964159,42835005,44433622,43997828 +"Africa","Sudan",1615.991129,1770.337074,1959.593767,1687.997641,1659.652775,2202.988423,1895.544073,1507.819159,1492.197043,1632.210764,1993.398314,2602.394995,38.635,39.624,40.87,42.858,45.083,47.8,50.338,51.744,53.556,55.373,56.369,58.556,8504667,9753392,11183227,12716129,14597019,17104986,20367053,24725960,28227588,32160729,37090298,42292929 
+"Africa","Swaziland",1148.376626,1244.708364,1856.182125,2613.101665,3364.836625,3781.410618,3895.384018,3984.839812,3553.0224,3876.76846,4128.116943,4513.480643,41.407,43.424,44.992,46.633,49.552,52.537,55.561,57.678,58.474,54.289,43.869,39.613,290243,326741,370006,420690,480105,551425,649901,779348,962344,1054486,1130269,1133066 +"Africa","Tanzania",716.6500721,698.5356073,722.0038073,848.2186575,915.9850592,962.4922932,874.2426069,831.8220794,825.682454,789.1862231,899.0742111,1107.482182,41.215,42.974,44.246,45.757,47.62,49.919,50.608,51.535,50.44,48.466,49.651,52.517,8322925,9452826,10863958,12607312,14706593,17129565,19844382,23040630,26605473,30686889,34593779,38139640 +"Africa","Togo",859.8086567,925.9083202,1067.53481,1477.59676,1649.660188,1532.776998,1344.577953,1202.201361,1034.298904,982.2869243,886.2205765,882.9699438,38.596,41.208,43.922,46.769,49.759,52.887,55.471,56.941,58.061,58.39,57.561,58.42,1219113,1357445,1528098,1735550,2056351,2308582,2644765,3154264,3747553,4320890,4977378,5701579 +"Africa","Tunisia",1468.475631,1395.232468,1660.30321,1932.360167,2753.285994,3120.876811,3560.233174,3810.419296,4332.720164,4876.798614,5722.895655,7092.923025,44.6,47.1,49.579,52.053,55.602,59.837,64.048,66.894,70.001,71.973,73.042,73.923,3647735,3950849,4286552,4786986,5303507,6005061,6734098,7724976,8523077,9231669,9770575,10276158 +"Africa","Uganda",734.753484,774.3710692,767.2717398,908.9185217,950.735869,843.7331372,682.2662268,617.7244065,644.1707969,816.559081,927.7210018,1056.380121,39.978,42.571,45.344,48.051,51.016,50.35,49.849,51.509,48.825,44.578,47.813,51.542,5824797,6675501,7688797,8900294,10190285,11457758,12939400,15283050,18252190,21210254,24739869,29170398 
+"Africa","Zambia",1147.388831,1311.956766,1452.725766,1777.077318,1773.498265,1588.688299,1408.678565,1213.315116,1210.884633,1071.353818,1071.613938,1271.211593,42.038,44.077,46.023,47.768,50.107,51.386,51.821,50.821,46.1,40.238,39.193,42.384,2672000,3016000,3421000,3900000,4506497,5216550,6100407,7272406,8381163,9417789,10595811,11746035 +"Africa","Zimbabwe",406.8841148,518.7642681,527.2721818,569.7950712,799.3621758,685.5876821,788.8550411,706.1573059,693.4207856,792.4499603,672.0386227,469.7092981,48.451,50.469,52.358,53.995,55.635,57.674,60.363,62.351,60.377,46.809,39.989,43.487,3080907,3646340,4277736,4995432,5861135,6642107,7636524,9216418,10704340,11404948,11926563,12311143 +"Americas","Argentina",5911.315053,6856.856212,7133.166023,8052.953021,9443.038526,10079.02674,8997.897412,9139.671389,9308.41871,10967.28195,8797.640716,12779.37964,62.485,64.399,65.142,65.634,67.065,68.481,69.942,70.774,71.868,73.275,74.34,75.32,17876956,19610538,21283783,22934225,24779799,26983828,29341374,31620918,33958947,36203463,38331121,40301927 +"Americas","Bolivia",2677.326347,2127.686326,2180.972546,2586.886053,2980.331339,3548.097832,3156.510452,2753.69149,2961.699694,3326.143191,3413.26269,3822.137084,40.414,41.89,43.428,45.032,46.714,50.023,53.859,57.251,59.957,62.05,63.883,65.554,2883315,3211738,3593918,4040665,4565872,5079716,5642224,6156369,6893451,7693188,8445134,9119152 +"Americas","Brazil",2108.944355,2487.365989,3336.585802,3429.864357,4985.711467,6660.118654,7030.835878,7807.095818,6950.283021,7957.980824,8131.212843,9065.800825,50.917,53.285,55.665,57.632,59.504,61.489,63.336,65.205,67.057,69.388,71.006,72.39,56602560,65551171,76039390,88049823,100840058,114313951,128962939,142938076,155975974,168546719,179914212,190010647 
+"Americas","Canada",11367.16112,12489.95006,13462.48555,16076.58803,18970.57086,22090.88306,22898.79214,26626.51503,26342.88426,28954.92589,33328.96507,36319.23501,68.75,69.96,71.3,72.13,72.88,74.21,75.76,76.86,77.95,78.61,79.77,80.653,14785584,17010154,18985849,20819767,22284500,23796400,25201900,26549700,28523502,30305843,31902268,33390141 +"Americas","Chile",3939.978789,4315.622723,4519.094331,5106.654313,5494.024437,4756.763836,5095.665738,5547.063754,7596.125964,10118.05318,10778.78385,13171.63885,54.745,56.074,57.924,60.523,63.441,67.052,70.565,72.492,74.126,75.816,77.86,78.553,6377619,7048426,7961258,8858908,9717524,10599793,11487112,12463354,13572994,14599929,15497046,16284741 +"Americas","Colombia",2144.115096,2323.805581,2492.351109,2678.729839,3264.660041,3815.80787,4397.575659,4903.2191,5444.648617,6117.361746,5755.259962,7006.580419,50.643,55.118,57.863,59.963,61.623,63.837,66.653,67.768,68.421,70.313,71.682,72.889,12350771,14485993,17009885,19764027,22542890,25094412,27764644,30964245,34202721,37657830,41008227,44227550 +"Americas","Costa Rica",2627.009471,2990.010802,3460.937025,4161.727834,5118.146939,5926.876967,5262.734751,5629.915318,6160.416317,6677.045314,7723.447195,9645.06142,57.206,60.026,62.842,65.424,67.849,70.75,73.45,74.752,75.713,77.26,78.123,78.782,926317,1112300,1345187,1588717,1834796,2108457,2424367,2799811,3173216,3518107,3834934,4133884 +"Americas","Cuba",5586.53878,6092.174359,5180.75591,5690.268015,5305.445256,6380.494966,7316.918107,7532.924763,5592.843963,5431.990415,6340.646683,8948.102923,59.421,62.325,65.246,68.29,70.723,72.649,73.717,74.174,74.414,76.151,77.158,78.273,6007797,6640752,7254373,8139332,8831348,9537988,9789224,10239839,10723260,10983007,11226999,11416987 +"Americas","Dominican 
Republic",1397.717137,1544.402995,1662.137359,1653.723003,2189.874499,2681.9889,2861.092386,2899.842175,3044.214214,3614.101285,4563.808154,6025.374752,45.928,49.828,53.459,56.751,59.631,61.788,63.727,66.046,68.457,69.957,70.847,72.235,2491346,2923186,3453434,4049146,4671329,5302800,5968349,6655297,7351181,7992357,8650322,9319622 +"Americas","Ecuador",3522.110717,3780.546651,4086.114078,4579.074215,5280.99471,6679.62326,7213.791267,6481.776993,7103.702595,7429.455877,5773.044512,6873.262326,48.357,51.356,54.64,56.678,58.796,61.31,64.342,67.231,69.613,72.312,74.173,74.994,3548753,4058385,4681707,5432424,6298651,7278866,8365850,9545158,10748394,11911819,12921234,13755680 +"Americas","El Salvador",3048.3029,3421.523218,3776.803627,4358.595393,4520.246008,5138.922374,4098.344175,4140.442097,4444.2317,5154.825496,5351.568666,5728.353514,45.262,48.57,52.307,55.855,58.207,56.696,56.604,63.154,66.798,69.535,70.734,71.878,2042865,2355805,2747687,3232927,3790903,4282586,4474873,4842194,5274649,5783439,6353681,6939688 +"Americas","Guatemala",2428.237769,2617.155967,2750.364446,3242.531147,4031.408271,4879.992748,4820.49479,4246.485974,4439.45084,4684.313807,4858.347495,5186.050003,42.023,44.142,46.954,50.016,53.738,56.029,58.137,60.782,63.373,66.322,68.978,70.259,3146381,3640876,4208858,4690773,5149581,5703430,6395630,7326406,8486949,9803875,11178650,12572928 +"Americas","Haiti",1840.366939,1726.887882,1796.589032,1452.057666,1654.456946,1874.298931,2011.159549,1823.015995,1456.309517,1341.726931,1270.364932,1201.637154,37.579,40.696,43.59,46.243,48.042,49.923,51.461,53.636,55.089,56.671,58.137,60.916,3201488,3507701,3880130,4318137,4698301,4908554,5198399,5756203,6326682,6913545,7607651,8502814 
+"Americas","Honduras",2194.926204,2220.487682,2291.156835,2538.269358,2529.842345,3203.208066,3121.760794,3023.096699,3081.694603,3160.454906,3099.72866,3548.330846,41.912,44.665,48.041,50.924,53.884,57.402,60.909,64.492,66.399,67.659,68.565,70.198,1517453,1770390,2090162,2500689,2965146,3055235,3669448,4372203,5077347,5867957,6677328,7483763 +"Americas","Jamaica",2898.530881,4756.525781,5246.107524,6124.703451,7433.889293,6650.195573,6068.05135,6351.237495,7404.923685,7121.924704,6994.774861,7320.880262,58.53,62.61,65.61,67.51,69,70.11,71.21,71.77,71.766,72.262,72.047,72.567,1426095,1535090,1665128,1861096,1997616,2156814,2298309,2326606,2378618,2531311,2664659,2780132 +"Americas","Mexico",3478.125529,4131.546641,4581.609385,5754.733883,6809.40669,7674.929108,9611.147541,8688.156003,9472.384295,9767.29753,10742.44053,11977.57496,50.789,55.19,58.299,60.11,62.361,65.032,67.405,69.498,71.455,73.67,74.902,76.195,30144317,35015548,41121485,47995559,55984294,63759976,71640904,80122492,88111030,95895146,102479927,108700891 +"Americas","Nicaragua",3112.363948,3457.415947,3634.364406,4643.393534,4688.593267,5486.371089,3470.338156,2955.984375,2170.151724,2253.023004,2474.548819,2749.320965,42.314,45.432,48.632,51.884,55.151,57.47,59.298,62.008,65.843,68.426,70.836,72.899,1165790,1358828,1590597,1865490,2182908,2554598,2979423,3344353,4017939,4609572,5146848,5675356 +"Americas","Panama",2480.380334,2961.800905,3536.540301,4421.009084,5364.249663,5351.912144,7009.601598,7034.779161,6618.74305,7113.692252,7356.031934,9809.185636,55.191,59.201,61.817,64.071,66.216,68.681,70.472,71.523,72.462,73.738,74.712,75.537,940080,1063506,1215725,1405486,1616384,1839782,2036305,2253639,2484997,2734531,2990875,3242173 
+"Americas","Paraguay",1952.308701,2046.154706,2148.027146,2299.376311,2523.337977,3248.373311,4258.503604,3998.875695,4196.411078,4247.400261,3783.674243,4172.838464,62.649,63.196,64.361,64.951,65.815,66.353,66.874,67.378,68.225,69.4,70.755,71.752,1555876,1770902,2009813,2287985,2614104,2984494,3366439,3886512,4483945,5154123,5884491,6667147 +"Americas","Peru",3758.523437,4245.256698,4957.037982,5788.09333,5937.827283,6281.290855,6434.501797,6360.943444,4446.380924,5838.347657,5909.020073,7408.905561,43.902,46.263,49.096,51.445,55.448,58.447,61.406,64.134,66.458,68.386,69.906,71.421,8025700,9146100,10516500,12132200,13954700,15990099,18125129,20195924,22430449,24748122,26769436,28674757 +"Americas","Puerto Rico",3081.959785,3907.156189,5108.34463,6929.277714,9123.041742,9770.524921,10330.98915,12281.34191,14641.58711,16999.4333,18855.60618,19328.70901,64.28,68.54,69.62,71.1,72.16,73.44,73.75,74.63,73.911,74.917,77.778,78.746,2227000,2260000,2448046,2648961,2847132,3080828,3279001,3444468,3585176,3759430,3859606,3942491 +"Americas","Trinidad and Tobago",3023.271928,4100.3934,4997.523971,5621.368472,6619.551419,7899.554209,9119.528607,7388.597823,7370.990932,8792.573126,11460.60023,18008.50924,59.1,61.8,64.9,65.4,65.9,68.3,68.832,69.582,69.862,69.465,68.976,69.819,662850,764900,887498,960155,975199,1039009,1116479,1191336,1183669,1138101,1101832,1056608 +"Americas","United States",13990.48208,14847.12712,16173.14586,19530.36557,21806.03594,24072.63213,25009.55914,29884.35041,32003.93224,35767.43303,39097.09955,42951.65309,68.44,69.49,70.21,70.76,71.34,73.38,74.65,75.02,76.09,76.81,77.31,78.242,157553000,171984000,186538000,198712000,209896000,220239000,232187835,242803533,256894189,272911760,287675526,301139947 
+"Americas","Uruguay",5716.766744,6150.772969,5603.357717,5444.61962,5703.408898,6504.339663,6920.223051,7452.398969,8137.004775,9230.240708,7727.002004,10611.46299,66.071,67.044,68.253,68.468,68.673,69.481,70.805,71.918,72.752,74.223,75.307,76.384,2252965,2424959,2598466,2748579,2829526,2873520,2953997,3045153,3149262,3262838,3363085,3447496 +"Americas","Venezuela",7689.799761,9802.466526,8422.974165,9541.474188,10505.25966,13143.95095,11152.41011,9883.584648,10733.92631,10165.49518,8605.047831,11415.80569,55.088,57.907,60.77,63.479,65.712,67.456,68.557,70.19,71.15,72.146,72.766,73.747,5439568,6702668,8143375,9709552,11515649,13503563,15620766,17910182,20265563,22374398,24287670,26084662 +"Asia","Afghanistan",779.4453145,820.8530296,853.10071,836.1971382,739.9811058,786.11336,978.0114388,852.3959448,649.3413952,635.341351,726.7340548,974.5803384,28.801,30.332,31.997,34.02,36.088,38.438,39.854,40.822,41.674,41.763,42.129,43.828,8425333,9240934,10267083,11537966,13079460,14880372,12881816,13867957,16317921,22227415,25268405,31889923 +"Asia","Bahrain",9867.084765,11635.79945,12753.27514,14804.6727,18268.65839,19340.10196,19211.14731,18524.02406,19035.57917,20292.01679,23403.55927,29796.04834,50.939,53.832,56.923,59.923,63.3,65.593,69.052,70.75,72.601,73.925,74.795,75.635,120447,138655,171863,202182,230800,297410,377967,454612,529491,598561,656397,708573 +"Asia","Bangladesh",684.2441716,661.6374577,686.3415538,721.1860862,630.2336265,659.8772322,676.9818656,751.9794035,837.8101643,972.7700352,1136.39043,1391.253792,37.484,39.348,41.216,43.453,45.252,46.923,50.009,52.819,56.018,59.412,62.013,64.062,46886859,51365468,56839289,62821884,70759295,80428306,93074406,103764241,113704579,123315288,135656790,150448339 
+"Asia","Cambodia",368.4692856,434.0383364,496.9136476,523.4323142,421.6240257,524.9721832,624.4754784,683.8955732,682.3031755,734.28517,896.2260153,1713.778686,39.417,41.366,43.415,45.415,40.317,31.22,50.957,53.914,55.803,56.534,56.752,59.723,4693836,5322536,6083619,6960067,7450606,6978607,7272485,8371791,10150094,11782962,12926707,14131858 +"Asia","China",400.448610699994,575.9870009,487.6740183,612.7056934,676.9000921,741.2374699,962.4213805,1378.904018,1655.784158,2289.234136,3119.280896,4959.114854,44,50.54896,44.50136,58.38112,63.11888,63.96736,65.525,67.274,68.69,70.426,72.028,72.961,556263527.999989,637408000,665770000,754550000,862030000,943455000,1000281000,1084035000,1164970000,1230075000,1280400000,1318683096 +"Asia","Hong Kong China",3054.421209,3629.076457,4692.648272,6197.962814,8315.928145,11186.14125,14560.53051,20038.47269,24757.60301,28377.63219,30209.01516,39724.97867,60.96,64.75,67.65,70,72,73.6,75.45,76.2,77.601,80,81.495,82.208,2125900,2736300,3305200,3722800,4115700,4583700,5264500,5584510,5829696,6495918,6762476,6980412 +"Asia","India",546.5657493,590.061996,658.3471509,700.7706107,724.032527,813.337323,855.7235377,976.5126756,1164.406809,1458.817442,1746.769454,2452.210407,37.373,40.249,43.605,47.193,50.651,54.208,56.596,58.553,60.223,61.765,62.879,64.698,3.72e+08,4.09e+08,4.54e+08,5.06e+08,5.67e+08,6.34e+08,7.08e+08,7.88e+08,8.72e+08,9.59e+08,1034172547,1110396331 +"Asia","Indonesia",749.6816546,858.9002707,849.2897701,762.4317721,1111.107907,1382.702056,1516.872988,1748.356961,2383.140898,3119.335603,2873.91287,3540.651564,37.468,39.918,42.518,45.964,49.203,52.702,56.159,60.137,62.681,66.041,68.588,70.65,82052000,90124000,99028000,109343000,121282000,136725000,153343000,169276000,184816000,199278000,211060000,223547000 
+"Asia","Iran",3035.326002,3290.257643,4187.329802,5906.731805,9613.818607,11888.59508,7608.334602,6642.881371,7235.653188,8263.590301,9240.761975,11605.71449,44.869,47.181,49.325,52.469,55.234,57.702,59.62,63.04,65.742,68.042,69.451,70.964,17272000,19792000,22874000,26538000,30614000,35480679,43072751,51889696,60397973,63327987,66907826,69453570 +"Asia","Iraq",4129.766056,6229.333562,8341.737815,8931.459811,9576.037596,14688.23507,14517.90711,11643.57268,3745.640687,3076.239795,4390.717312,4471.061906,45.32,48.437,51.457,54.459,56.95,60.413,62.038,65.044,59.461,58.811,57.046,59.545,5441766,6248643,7240260,8519282,10061506,11882916,14173318,16543189,17861905,20775703,24001816,27499638 +"Asia","Israel",4086.522128,5385.278451,7105.630706,8393.741404,12786.93223,13306.61921,15367.0292,17122.47986,18051.52254,20896.60924,21905.59514,25523.2771,65.39,67.84,69.39,70.75,71.63,73.06,74.45,75.6,76.93,78.269,79.696,80.745,1620914,1944401,2310904,2693585,3095893,3495918,3858421,4203148,4936550,5531387,6029529,6426679 +"Asia","Japan",3216.956347,4317.694365,6576.649461,9847.788607,14778.78636,16610.37701,19384.10571,22375.94189,26824.89511,28816.58499,28604.5919,31656.06806,63.03,65.5,68.73,71.43,73.42,75.38,77.11,78.67,79.36,80.69,82,82.603,86459025,91563009,95831757,100825279,107188273,113872473,118454974,122091325,124329269,125956499,127065841,127467972 +"Asia","Jordan",1546.907807,1886.080591,2348.009158,2741.796252,2110.856309,2852.351568,4161.415959,4448.679912,3431.593647,3645.379572,3844.917194,4519.461171,43.158,45.669,48.126,51.629,56.528,61.134,63.739,65.869,68.015,69.772,71.263,72.535,607914,746559,933559,1255058,1613551,1937652,2347031,2820042,3867409,4526235,5307470,6053193 +"Asia","Korea Dem. 
Rep.",1088.277758,1571.134655,1621.693598,2143.540609,3701.621503,4106.301249,4106.525293,4106.492315,3726.063507,1690.756814,1646.758151,1593.06548,50.056,54.081,56.656,59.942,63.983,67.159,69.1,70.647,69.978,67.727,66.662,67.297,8865488,9411381,10917494,12617009,14781241,16325320,17647518,19067554,20711375,21585105,22215365,23301725 +"Asia","Korea Rep.",1030.592226,1487.593537,1536.344387,2029.228142,3030.87665,4657.22102,5622.942464,8533.088805,12104.27872,15993.52796,19233.98818,23348.13973,47.453,52.681,55.292,57.716,62.612,64.766,67.123,69.81,72.244,74.647,77.045,78.623,20947571,22611552,26420307,30131000,33505000,36436000,39326000,41622000,43805450,46173816,47969150,49044790 +"Asia","Kuwait",108382.3529,113523.1329,95458.11176,80894.88326,109347.867,59265.47714,31354.03573,28118.42998,34932.91959,40300.61996,35110.10566,47306.98978,55.565,58.033,60.47,64.624,67.712,69.343,71.309,74.174,75.19,76.156,76.904,77.588,160000,212846,358266,575003,841934,1140357,1497494,1891487,1418095,1765345,2111561,2505559 +"Asia","Lebanon",4834.804067,6089.786934,5714.560611,6006.983042,7486.384341,8659.696836,7640.519521,5377.091329,6890.806854,8754.96385,9313.93883,10461.05868,55.928,59.489,62.094,63.87,65.421,66.099,66.983,67.926,69.292,70.265,71.028,71.993,1439529,1647412,1886848,2186894,2680018,3115787,3086876,3089353,3219994,3430388,3677780,3921278 +"Asia","Malaysia",1831.132894,1810.066992,2036.884944,2277.742396,2849.09478,3827.921571,4920.355951,5249.802653,7277.912802,10132.90964,10206.97794,12451.6558,48.463,52.102,55.737,59.371,63.01,65.256,68,69.5,70.693,71.938,73.044,74.241,6748378,7739235,8906385,10154878,11441462,12845381,14441916,16331785,18319502,20476091,22662365,24821286 
+"Asia","Mongolia",786.5668575,912.6626085,1056.353958,1226.04113,1421.741975,1647.511665,2000.603139,2338.008304,1785.402016,1902.2521,2140.739323,3095.772271,42.244,45.248,48.251,51.253,53.754,55.491,57.489,60.222,61.271,63.625,65.033,66.803,800663,882134,1010280,1149500,1320500,1528000,1756032,2015133,2312802,2494803,2674234,2874127 +"Asia","Myanmar",331,350,388,349,357,371,424,385,347,415,611,944,36.319,41.905,45.108,49.379,53.07,56.059,58.056,58.339,59.32,60.328,59.908,62.069,20092996,21731844,23634436,25870271,28466390,31528087,34680442,38028578,40546538,43247867,45598081,47761980 +"Asia","Nepal",545.8657229,597.9363558,652.3968593,676.4422254,674.7881296,694.1124398,718.3730947,775.6324501,897.7403604,1010.892138,1057.206311,1091.359778,36.157,37.686,39.393,41.472,43.971,46.748,49.594,52.537,55.727,59.426,61.34,63.785,9182536,9682338,10332057,11261690,12412593,13933198,15796314,17917180,20326209,23001113,25873917,28901790 +"Asia","Oman",1828.230307,2242.746551,2924.638113,4720.942687,10618.03855,11848.34392,12954.79101,18115.22313,18616.70691,19702.05581,19774.83687,22316.19287,37.578,40.08,43.165,46.988,52.143,57.367,62.728,67.734,71.197,72.499,74.193,75.64,507833,561977,628164,714775,829050,1004533,1301048,1593882,1915208,2283635,2713462,3204897 +"Asia","Pakistan",684.5971438,747.0835292,803.3427418,942.4082588,1049.938981,1175.921193,1443.429832,1704.686583,1971.829464,2049.350521,2092.712441,2605.94758,43.436,45.557,47.67,49.8,51.929,54.043,56.158,58.245,60.838,61.818,63.61,65.483,41346560,46679944,53100671,60641899,69325921,78152686,91462088,105186881,120065004,135564834,153403524,169270617 +"Asia","Philippines",1272.880995,1547.944844,1649.552153,1814.12743,1989.37407,2373.204287,2603.273765,2189.634995,2279.324017,2536.534925,2650.921068,3190.481016,47.752,51.334,54.757,56.393,58.065,60.06,62.082,64.151,66.458,68.564,70.303,71.688,22438691,26072194,30325264,35356600,40850141,46850962,53456774,60017788,67185766,75012988,82995088,91077287 +"Asia","Saudi 
Arabia",6459.554823,8157.591248,11626.41975,16903.04886,24837.42865,34167.7626,33693.17525,21198.26136,24841.61777,20586.69019,19014.54118,21654.83194,39.875,42.868,45.914,49.901,53.886,58.69,63.012,66.295,68.768,70.533,71.626,72.777,4005677,4419650,4943029,5618198,6472756,8128505,11254672,14619745,16945857,21229759,24501530,27601038 +"Asia","Singapore",2315.138227,2843.104409,3674.735572,4977.41854,8597.756202,11210.08948,15169.16112,18861.53081,24769.8912,33519.4766,36023.1054,47143.17964,60.396,63.179,65.798,67.946,69.521,70.795,71.76,73.56,75.788,77.158,78.77,79.972,1127000,1445929,1750200,1977600,2152400,2325300,2651869,2794552,3235865,3802309,4197776,4553009 +"Asia","Sri Lanka",1083.53203,1072.546602,1074.47196,1135.514326,1213.39553,1348.775651,1648.079789,1876.766827,2153.739222,2664.477257,3015.378833,3970.095407,57.593,61.456,62.192,64.266,65.042,65.949,68.757,69.011,70.379,70.457,70.815,72.396,7982342,9128546,10421936,11737396,13016733,14116836,15410151,16495304,17587060,18698655,19576783,20378239 +"Asia","Syria",1643.485354,2117.234893,2193.037133,1881.923632,2571.423014,3195.484582,3761.837715,3116.774285,3340.542768,4014.238972,4090.925331,4184.548089,45.883,48.284,50.305,53.655,57.296,61.195,64.59,66.974,69.249,71.527,73.053,74.143,3661549,4149908,4834621,5680812,6701172,7932503,9410494,11242847,13219062,15081016,17155814,19314747 +"Asia","Taiwan",1206.947913,1507.86129,1822.879028,2643.858681,4062.523897,5596.519826,7426.354774,11054.56175,15215.6579,20206.82098,23235.42329,28718.27684,58.5,62.4,65.2,67.5,69.39,70.59,72.16,73.4,74.26,75.25,76.99,78.4,8550362,10164215,11918938,13648692,15226039,16785196,18501390,19757799,20686918,21628605,22454239,23174294 
+"Asia","Thailand",757.7974177,793.5774148,1002.199172,1295.46066,1524.358936,1961.224635,2393.219781,2982.653773,4616.896545,5852.625497,5913.187529,7458.396327,50.848,53.63,56.061,58.285,60.405,62.494,64.597,66.084,67.298,67.521,68.564,70.616,21289402,25041917,29263397,34024249,39276153,44148285,48827160,52910342,56667095,60216677,62806748,65068149 +"Asia","Vietnam",605.0664917,676.2854478,772.0491602,637.1232887,699.5016441,713.5371196,707.2357863,820.7994449,989.0231487,1385.896769,1764.456677,2441.576404,40.412,42.887,45.363,47.838,50.254,55.764,58.816,62.82,67.662,70.672,73.017,74.249,26246839,28998543,33796140,39463910,44655014,50533506,56142181,62826491,69940728,76048996,80908147,85262356 +"Asia","West Bank and Gaza",1515.592329,1827.067742,2198.956312,2649.715007,3133.409277,3682.831494,4336.032082,5107.197384,6017.654756,7110.667619,4515.487575,3025.349798,43.16,45.671,48.127,51.631,56.532,60.765,64.406,67.046,69.718,71.096,72.37,73.422,1030585,1070439,1133134,1142636,1089572,1261091,1425876,1691210,2104779,2826046,3389578,4018332 +"Asia","Yemen Rep.",781.7175761,804.8304547,825.6232006,862.4421463,1265.047031,1829.765177,1977.55701,1971.741538,1879.496673,2117.484526,2234.820827,2280.769906,32.548,33.97,35.18,36.984,39.848,44.175,49.113,52.922,55.599,58.02,60.308,62.698,4963829,5498090,6120081,6740785,7407075,8403990,9657618,11219340,13367997,15826497,18701257,22211743 +"Europe","Albania",1601.056136,1942.284244,2312.888958,2760.196931,3313.422188,3533.00391,3630.880722,3738.932735,2497.437901,3193.054604,4604.211737,5937.029526,55.23,59.28,64.82,66.22,67.69,68.93,70.42,72,71.581,72.95,75.651,76.423,1282697,1476505,1728137,1984060,2263554,2509048,2780097,3075321,3326498,3428038,3508512,3600523 
+"Europe","Austria",6137.076492,8842.59803,10750.72111,12834.6024,16661.6256,19749.4223,21597.08362,23687.82607,27042.01868,29095.92066,32417.60769,36126.4927,66.8,67.48,69.54,70.14,70.63,72.17,73.18,74.94,76.04,77.51,78.98,79.829,6927772,6965860,7129864,7376998,7544201,7568430,7574613,7578903,7914969,8069876,8148312,8199783 +"Europe","Belgium",8343.105127,9714.960623,10991.20676,13149.04119,16672.14356,19117.97448,20979.84589,22525.56308,25575.57069,27561.19663,30485.88375,33692.60508,68,69.24,70.25,70.94,71.44,72.8,73.93,75.35,76.46,77.53,78.32,79.441,8730405,8989111,9218400,9556500,9709100,9821800,9856303,9870200,10045622,10199787,10311970,10392226 +"Europe","Bosnia and Herzegovina",973.5331948,1353.989176,1709.683679,2172.352423,2860.16975,3528.481305,4126.613157,4314.114757,2546.781445,4766.355904,6018.975239,7446.298803,53.82,58.45,61.93,64.79,67.45,69.86,70.69,71.14,72.178,73.244,74.09,74.852,2791000,3076000,3349000,3585000,3819000,4086000,4172693,4338977,4256013,3607000,4165416,4552198 +"Europe","Bulgaria",2444.286648,3008.670727,4254.337839,5577.0028,6597.494398,7612.240438,8224.191647,8239.854824,6302.623438,5970.38876,7696.777725,10680.79282,59.6,66.61,69.51,70.42,70.9,70.81,71.08,71.34,71.19,70.32,72.14,73.005,7274900,7651254,8012946,8310226,8576200,8797022,8892098,8971958,8658506,8066057,7661799,7322858 +"Europe","Croatia",3119.23652,4338.231617,5477.890018,6960.297861,9164.090127,11305.38517,13221.82184,13822.58394,8447.794873,9875.604515,11628.38895,14619.22272,61.21,64.77,67.13,68.5,69.61,70.64,70.46,71.52,72.527,73.68,74.876,75.748,3882229,3991242,4076557,4174366,4225310,4318673,4413368,4484310,4494013,4444595,4481020,4493312 +"Europe","Czech 
Republic",6876.14025,8256.343918,10136.86713,11399.44489,13108.4536,14800.16062,15377.22855,16310.4434,14297.02122,16048.51424,17596.21022,22833.30851,66.87,69.03,69.9,70.38,70.29,70.71,70.96,71.58,72.4,74.01,75.51,76.486,9125183,9513758,9620282,9835109,9862158,10161915,10303704,10311597,10315702,10300707,10256295,10228744 +"Europe","Denmark",9692.385245,11099.65935,13583.31351,15937.21123,18866.20721,20422.9015,21688.04048,25116.17581,26406.73985,29804.34567,32166.50006,35278.41874,70.78,71.81,72.35,72.96,73.47,74.69,74.63,74.8,75.33,76.11,77.18,78.332,4334000,4487831,4646899,4838800,4991596,5088419,5117810,5127024,5171393,5283663,5374693,5468120 +"Europe","Finland",6424.519071,7545.415386,9371.842561,10921.63626,14358.8759,15605.42283,18533.15761,21141.01223,20647.16499,23723.9502,28204.59057,33207.0844,66.55,67.49,68.75,69.83,70.87,72.52,74.55,74.83,75.7,77.13,78.37,79.313,4090500,4324000,4491443,4605744,4639657,4738902,4826933,4931729,5041039,5134406,5193039,5238460 +"Europe","France",7029.809327,8662.834898,10560.48553,12999.91766,16107.19171,18292.63514,20293.89746,22066.44214,24703.79615,25889.78487,28926.03234,30470.0167,67.41,68.93,70.51,71.55,72.38,73.83,74.89,76.34,77.46,78.64,79.59,80.657,42459667,44310863,47124000,49569000,51732000,53165019,54433565,55630100,57374179,58623428,59925035,61083916 +"Europe","Germany",7144.114393,10187.82665,12902.46291,14745.62561,18016.18027,20512.92123,22031.53274,24639.18566,26505.30317,27788.88416,30035.80198,32170.37442,67.5,69.1,70.3,70.8,71,72.5,73.8,74.847,76.07,77.34,78.67,79.406,69145952,71019069,73739117,76368453,78717088,78160773,78335266,77718298,80597764,82011073,82350671,82400996 +"Europe","Greece",3530.690067,4916.299889,6017.190733,8513.097016,12724.82957,14195.52428,15268.42089,16120.52839,17541.49634,18747.69814,22514.2548,27538.41188,65.86,67.86,69.51,71,72.34,73.68,75.24,76.67,77.03,77.869,78.256,79.483,7733250,8096218,8448233,8716441,8888628,9308479,9786480,9974490,10325429,10502372,10603863,10706290 
+"Europe","Hungary",5263.673816,6040.180011,7550.359877,9326.64467,10168.65611,11674.83737,12545.99066,12986.47998,10535.62855,11712.7768,14843.93556,18008.94444,64.03,66.41,67.96,69.5,69.76,69.95,69.39,69.58,69.17,71.04,72.59,73.338,9504000,9839000,10063000,10223422,10394091,10637171,10705535,10612740,10348684,10244684,10083313,9956108 +"Europe","Iceland",7267.688428,9244.001412,10350.15906,13319.89568,15798.06362,19654.96247,23269.6075,26923.20628,25144.39201,28061.09966,31163.20196,36180.78919,72.49,73.47,73.68,73.73,74.46,76.11,76.99,77.23,78.77,78.95,80.5,81.757,147962,165110,182053,198676,209275,221823,233997,244676,259012,271192,288030,301931 +"Europe","Ireland",5210.280328,5599.077872,6631.597314,7655.568963,9530.772896,11150.98113,12618.32141,13872.86652,17558.81555,24521.94713,34077.04939,40675.99635,66.91,68.9,70.29,71.08,71.28,72.03,73.1,74.36,75.467,76.122,77.783,78.885,2952156,2878220,2830000,2900100,3024400,3271900,3480000,3539900,3557761,3667233,3879155,4109086 +"Europe","Italy",4931.404155,6248.656232,8243.58234,10022.40131,12269.27378,14255.98475,16537.4835,19207.23482,22013.64486,24675.02446,27968.09817,28569.7197,65.94,67.81,69.24,71.06,72.19,73.48,74.98,76.42,77.44,78.82,80.24,80.546,47666000,49182000,50843200,52667100,54365564,56059245,56535636,56729703,56840847,57479469,57926999,58147733 +"Europe","Montenegro",2647.585601,3682.259903,4649.593785,5907.850937,7778.414017,9595.929905,11222.58762,11732.51017,7003.339037,6465.613349,6557.194282,9253.896111,59.164,61.448,63.728,67.178,70.636,73.066,74.101,74.865,75.435,75.445,73.981,74.543,413834,442829,474528,501035,527678,560073,562548,569473,621621,692651,720230,684736 
+"Europe","Netherlands",8941.571858,11276.19344,12790.84956,15363.25136,18794.74567,21209.0592,21399.46046,23651.32361,26790.94961,30246.13063,33724.75778,36797.93332,72.13,72.99,73.23,73.82,73.75,75.24,76.05,76.83,77.42,78.03,78.53,79.762,10381988,11026383,11805689,12596822,13329874,13852989,14310401,14665278,15174244,15604464,16122830,16570613 +"Europe","Norway",10095.42172,11653.97304,13450.40151,16361.87647,18965.05551,23311.34939,26298.63531,31540.9748,33965.66115,41283.16433,44683.97525,49357.19017,72.67,73.44,73.47,74.08,74.34,75.37,75.97,75.89,77.32,78.32,79.05,80.196,3327728,3491938,3638919,3786019,3933004,4043205,4114787,4186147,4286357,4405672,4535591,4627926 +"Europe","Poland",4029.329699,4734.253019,5338.752143,6557.152776,8006.506993,9508.141454,8451.531004,9082.351172,7738.881247,10159.58368,12002.23908,15389.92468,61.31,65.77,67.64,69.61,70.85,70.67,71.32,70.98,70.99,72.75,74.67,75.563,25730551,28235346,30329617,31785378,33039545,34621254,36227381,37740710,38370697,38654957,38625976,38518241 +"Europe","Portugal",3068.319867,3774.571743,4727.954889,6361.517993,9022.247417,10172.48572,11753.84291,13039.30876,16207.26663,17641.03156,19970.90787,20509.64777,59.82,61.51,64.39,66.6,69.26,70.41,72.77,74.06,74.86,75.97,77.29,78.098,8526050,8817650,9019800,9103000,8970450,9662600,9859650,9915289,9927680,10156415,10433867,10642836 +"Europe","Romania",3144.613186,3943.370225,4734.997586,6470.866545,8011.414402,9356.39724,9605.314053,9696.273295,6598.409903,7346.547557,7885.360081,10808.47561,61.05,64.1,66.8,66.8,69.21,69.46,69.66,69.53,69.36,69.72,71.322,72.476,16630000,17829327,18680721,19284814,20662648,21658597,22356726,22686371,22797027,22562458,22404337,22276056 
+"Europe","Serbia",3581.459448,4981.090891,6289.629157,7991.707066,10522.06749,12980.66956,15181.0927,15870.87851,9325.068238,7914.320304,7236.075251,9786.534714,57.996,61.685,64.531,66.914,68.7,70.3,70.162,71.218,71.659,72.232,73.213,74.002,6860147,7271135,7616060,7971222,8313288,8686367,9032824,9230783,9826397,10336594,10111559,10150265 +"Europe","Slovak Republic",5074.659104,6093.26298,7481.107598,8412.902397,9674.167626,10922.66404,11348.54585,12037.26758,9498.467723,12126.23065,13638.77837,18678.31435,64.36,67.45,70.33,70.98,70.35,70.45,70.8,71.08,71.38,72.71,73.8,74.663,3558137,3844277,4237384,4442238,4593433,4827803,5048043,5199318,5302888,5383010,5410052,5447502 +"Europe","Slovenia",4215.041741,5862.276629,7402.303395,9405.489397,12383.4862,15277.03017,17866.72175,18678.53492,14214.71681,17161.10735,20660.01936,25768.25759,65.57,67.85,69.15,69.18,69.82,70.97,71.063,72.25,73.64,75.13,76.66,77.926,1489518,1533070,1582962,1646912,1694510,1746919,1861252,1945870,1999210,2011612,2011497,2009245 +"Europe","Spain",3834.034742,4564.80241,5693.843879,7993.512294,10638.75131,13236.92117,13926.16997,15764.98313,18603.06452,20445.29896,24835.47166,28821.0637,64.94,66.66,69.69,71.44,73.06,74.39,76.3,76.9,77.57,78.77,79.78,80.941,28549870,29841614,31158061,32850275,34513161,36439000,37983310,38880702,39549438,39855442,40152517,40448191 +"Europe","Sweden",8527.844662,9911.878226,12329.44192,15258.29697,17832.02464,18855.72521,20667.38125,23586.92927,23880.01683,25266.59499,29341.63093,33859.74835,71.86,72.49,73.37,74.16,74.72,75.44,76.42,77.19,78.16,79.39,80.04,80.884,7124673,7363802,7561588,7867931,8122293,8251648,8325260,8421403,8718867,8897619,8954175,9031088 
+"Europe","Switzerland",14734.23275,17909.48973,20431.0927,22966.14432,27195.11304,26982.29052,28397.71512,30281.70459,31871.5303,32135.32301,34480.95771,37506.41907,69.62,70.56,71.32,72.77,73.78,75.39,76.21,77.41,78.03,79.37,80.62,81.701,4815000,5126000,5666000,6063000,6401400,6316424,6468126,6649942,6995447,7193761,7361757,7554661 +"Europe","Turkey",1969.10098,2218.754257,2322.869908,2826.356387,3450.69638,4269.122326,4241.356344,5089.043686,5678.348271,6601.429915,6508.085718,8458.276384,43.585,48.079,52.098,54.336,57.005,59.507,61.036,63.108,66.146,68.835,70.845,71.777,22235677,25670939,29788695,33411317,37492953,42404033,47328791,52881328,58179144,63047647,67308928,71158647 +"Europe","United Kingdom",9979.508487,11283.17795,12477.17707,14142.85089,15895.11641,17428.74846,18232.42452,21664.78767,22705.09254,26074.53136,29478.99919,33203.26128,69.18,70.42,70.76,71.36,72.01,72.76,74.04,75.007,76.42,77.218,78.471,79.425,50430000,51430000,53292000,54959000,56079000,56179000,56339704,56981620,57866349,58808266,59912431,60776238 +"Oceania","Australia",10039.59564,10949.64959,12217.22686,14526.12465,16788.62948,18334.19751,19477.00928,21888.88903,23424.76683,26997.93657,30687.75473,34435.36744,69.12,70.33,70.93,71.1,71.93,73.49,74.74,76.32,77.56,78.83,80.37,81.235,8691212,9712569,10794968,11872264,13177000,14074100,15184200,16257249,17481977,18565243,19546792,20434176 +"Oceania","New Zealand",10556.57566,12247.39532,13175.678,14463.91893,16046.03728,16233.7177,17632.4104,19007.19129,18363.32494,21050.41377,23189.80135,25185.00911,69.39,70.26,71.24,71.52,71.89,72.22,73.84,74.32,76.33,77.55,79.11,80.204,1994794,2229407,2488550,2728150,2929100,3164900,3210650,3317166,3437674,3676187,3908037,4115771 diff --git a/data/gapminder_gdp_africa.csv b/data/gapminder_gdp_africa.csv new file mode 100644 index 000000000..1609c405d --- /dev/null +++ b/data/gapminder_gdp_africa.csv @@ -0,0 +1,53 @@ 
+country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007 +Algeria,2449.008185,3013.976023,2550.81688,3246.991771,4182.663766,4910.416756,5745.160213,5681.358539,5023.216647,4797.295051,5288.040382,6223.367465 +Angola,3520.610273,3827.940465,4269.276742,5522.776375,5473.288005,3008.647355,2756.953672,2430.208311,2627.845685,2277.140884,2773.287312,4797.231267 +Benin,1062.7522,959.6010805,949.4990641,1035.831411,1085.796879,1029.161251,1277.897616,1225.85601,1191.207681,1232.975292,1372.877931,1441.284873 +Botswana,851.2411407,918.2325349,983.6539764,1214.709294,2263.611114,3214.857818,4551.14215,6205.88385,7954.111645,8647.142313,11003.60508,12569.85177 +Burkina Faso,543.2552413,617.1834648,722.5120206,794.8265597,854.7359763,743.3870368,807.1985855,912.0631417,931.7527731,946.2949618,1037.645221,1217.032994 +Burundi,339.2964587,379.5646281,355.2032273,412.9775136,464.0995039,556.1032651,559.603231,621.8188189,631.6998778,463.1151478,446.4035126,430.0706916 +Cameroon,1172.667655,1313.048099,1399.607441,1508.453148,1684.146528,1783.432873,2367.983282,2602.664206,1793.163278,1694.337469,1934.011449,2042.09524 +Central African Republic,1071.310713,1190.844328,1193.068753,1136.056615,1070.013275,1109.374338,956.7529907,844.8763504,747.9055252,740.5063317,738.6906068,706.016537 +Chad,1178.665927,1308.495577,1389.817618,1196.810565,1104.103987,1133.98495,797.9081006,952.386129,1058.0643,1004.961353,1156.18186,1704.063724 +Comoros,1102.990936,1211.148548,1406.648278,1876.029643,1937.577675,1172.603047,1267.100083,1315.980812,1246.90737,1173.618235,1075.811558,986.1478792 +Congo Dem. 
Rep.,780.5423257,905.8602303,896.3146335,861.5932424,904.8960685,795.757282,673.7478181,672.774812,457.7191807,312.188423,241.1658765,277.5518587 +Congo Rep.,2125.621418,2315.056572,2464.783157,2677.939642,3213.152683,3259.178978,4879.507522,4201.194937,4016.239529,3484.164376,3484.06197,3632.557798 +Cote d'Ivoire,1388.594732,1500.895925,1728.869428,2052.050473,2378.201111,2517.736547,2602.710169,2156.956069,1648.073791,1786.265407,1648.800823,1544.750112 +Djibouti,2669.529475,2864.969076,3020.989263,3020.050513,3694.212352,3081.761022,2879.468067,2880.102568,2377.156192,1895.016984,1908.260867,2082.481567 +Egypt,1418.822445,1458.915272,1693.335853,1814.880728,2024.008147,2785.493582,3503.729636,3885.46071,3794.755195,4173.181797,4754.604414,5581.180998 +Equatorial Guinea,375.6431231,426.0964081,582.8419714,915.5960025,672.4122571,958.5668124,927.8253427,966.8968149,1132.055034,2814.480755,7703.4959,12154.08975 +Eritrea,328.9405571,344.1618859,380.9958433,468.7949699,514.3242082,505.7538077,524.8758493,521.1341333,582.8585102,913.47079,765.3500015,641.3695236 +Ethiopia,362.1462796,378.9041632,419.4564161,516.1186438,566.2439442,556.8083834,577.8607471,573.7413142,421.3534653,515.8894013,530.0535319,690.8055759 +Gabon,4293.476475,4976.198099,6631.459222,8358.761987,11401.94841,21745.57328,15113.36194,11864.40844,13522.15752,14722.84188,12521.71392,13206.48452 +Gambia,485.2306591,520.9267111,599.650276,734.7829124,756.0868363,884.7552507,835.8096108,611.6588611,665.6244126,653.7301704,660.5855997,752.7497265 +Ghana,911.2989371,1043.561537,1190.041118,1125.69716,1178.223708,993.2239571,876.032569,847.0061135,925.060154,1005.245812,1111.984578,1327.60891 +Guinea,510.1964923,576.2670245,686.3736739,708.7595409,741.6662307,874.6858643,857.2503577,805.5724718,794.3484384,869.4497668,945.5835837,942.6542111 +Guinea-Bissau,299.850319,431.7904566,522.0343725,715.5806402,820.2245876,764.7259628,838.1239671,736.4153921,745.5398706,796.6644681,575.7047176,579.231743 
+Kenya,853.540919,944.4383152,896.9663732,1056.736457,1222.359968,1267.613204,1348.225791,1361.936856,1341.921721,1360.485021,1287.514732,1463.249282 +Lesotho,298.8462121,335.9971151,411.8006266,498.6390265,496.5815922,745.3695408,797.2631074,773.9932141,977.4862725,1186.147994,1275.184575,1569.331442 +Liberia,575.5729961,620.9699901,634.1951625,713.6036483,803.0054535,640.3224383,572.1995694,506.1138573,636.6229191,609.1739508,531.4823679,414.5073415 +Libya,2387.54806,3448.284395,6757.030816,18772.75169,21011.49721,21951.21176,17364.27538,11770.5898,9640.138501,9467.446056,9534.677467,12057.49928 +Madagascar,1443.011715,1589.20275,1643.38711,1634.047282,1748.562982,1544.228586,1302.878658,1155.441948,1040.67619,986.2958956,894.6370822,1044.770126 +Malawi,369.1650802,416.3698064,427.9010856,495.5147806,584.6219709,663.2236766,632.8039209,635.5173634,563.2000145,692.2758103,665.4231186,759.3499101 +Mali,452.3369807,490.3821867,496.1743428,545.0098873,581.3688761,686.3952693,618.0140641,684.1715576,739.014375,790.2579846,951.4097518,1042.581557 +Mauritania,743.1159097,846.1202613,1055.896036,1421.145193,1586.851781,1497.492223,1481.150189,1421.603576,1361.369784,1483.136136,1579.019543,1803.151496 +Mauritius,1967.955707,2034.037981,2529.067487,2475.387562,2575.484158,3710.982963,3688.037739,4783.586903,6058.253846,7425.705295,9021.815894,10956.99112 +Morocco,1688.20357,1642.002314,1566.353493,1711.04477,1930.194975,2370.619976,2702.620356,2755.046991,2948.047252,2982.101858,3258.495584,3820.17523 +Mozambique,468.5260381,495.5868333,556.6863539,566.6691539,724.9178037,502.3197334,462.2114149,389.8761846,410.8968239,472.3460771,633.6179466,823.6856205 +Namibia,2423.780443,2621.448058,3173.215595,3793.694753,3746.080948,3876.485958,4191.100511,3693.731337,3804.537999,3899.52426,4072.324751,4811.060429 +Niger,761.879376,835.5234025,997.7661127,1054.384891,954.2092363,808.8970728,909.7221354,668.3000228,581.182725,580.3052092,601.0745012,619.6768924 
+Nigeria,1077.281856,1100.592563,1150.927478,1014.514104,1698.388838,1981.951806,1576.97375,1385.029563,1619.848217,1624.941275,1615.286395,2013.977305 +Reunion,2718.885295,2769.451844,3173.72334,4021.175739,5047.658563,4319.804067,5267.219353,5303.377488,6101.255823,6071.941411,6316.1652,7670.122558 +Rwanda,493.3238752,540.2893983,597.4730727,510.9637142,590.5806638,670.0806011,881.5706467,847.991217,737.0685949,589.9445051,785.6537648,863.0884639 +Sao Tome and Principe,879.5835855,860.7369026,1071.551119,1384.840593,1532.985254,1737.561657,1890.218117,1516.525457,1428.777814,1339.076036,1353.09239,1598.435089 +Senegal,1450.356983,1567.653006,1654.988723,1612.404632,1597.712056,1561.769116,1518.479984,1441.72072,1367.899369,1392.368347,1519.635262,1712.472136 +Sierra Leone,879.7877358,1004.484437,1116.639877,1206.043465,1353.759762,1348.285159,1465.010784,1294.447788,1068.696278,574.6481576,699.489713,862.5407561 +Somalia,1135.749842,1258.147413,1369.488336,1284.73318,1254.576127,1450.992513,1176.807031,1093.244963,926.9602964,930.5964284,882.0818218,926.1410683 +South Africa,4725.295531,5487.104219,5768.729717,7114.477971,7765.962636,8028.651439,8568.266228,7825.823398,7225.069258,7479.188244,7710.946444,9269.657808 +Sudan,1615.991129,1770.337074,1959.593767,1687.997641,1659.652775,2202.988423,1895.544073,1507.819159,1492.197043,1632.210764,1993.398314,2602.394995 +Swaziland,1148.376626,1244.708364,1856.182125,2613.101665,3364.836625,3781.410618,3895.384018,3984.839812,3553.0224,3876.76846,4128.116943,4513.480643 +Tanzania,716.6500721,698.5356073,722.0038073,848.2186575,915.9850592,962.4922932,874.2426069,831.8220794,825.682454,789.1862231,899.0742111,1107.482182 +Togo,859.8086567,925.9083202,1067.53481,1477.59676,1649.660188,1532.776998,1344.577953,1202.201361,1034.298904,982.2869243,886.2205765,882.9699438 +Tunisia,1468.475631,1395.232468,1660.30321,1932.360167,2753.285994,3120.876811,3560.233174,3810.419296,4332.720164,4876.798614,5722.895655,7092.923025 
+Uganda,734.753484,774.3710692,767.2717398,908.9185217,950.735869,843.7331372,682.2662268,617.7244065,644.1707969,816.559081,927.7210018,1056.380121 +Zambia,1147.388831,1311.956766,1452.725766,1777.077318,1773.498265,1588.688299,1408.678565,1213.315116,1210.884633,1071.353818,1071.613938,1271.211593 +Zimbabwe,406.8841148,518.7642681,527.2721818,569.7950712,799.3621758,685.5876821,788.8550411,706.1573059,693.4207856,792.4499603,672.0386227,469.7092981 diff --git a/data/gapminder_gdp_americas.csv b/data/gapminder_gdp_americas.csv new file mode 100644 index 000000000..e93fb979a --- /dev/null +++ b/data/gapminder_gdp_americas.csv @@ -0,0 +1,26 @@ +continent,country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007 +Americas,Argentina,5911.315053,6856.856212,7133.166023,8052.953021,9443.038526,10079.02674,8997.897412,9139.671389,9308.41871,10967.28195,8797.640716,12779.37964 +Americas,Bolivia,2677.326347,2127.686326,2180.972546,2586.886053,2980.331339,3548.097832,3156.510452,2753.69149,2961.699694,3326.143191,3413.26269,3822.137084 +Americas,Brazil,2108.944355,2487.365989,3336.585802,3429.864357,4985.711467,6660.118654,7030.835878,7807.095818,6950.283021,7957.980824,8131.212843,9065.800825 +Americas,Canada,11367.16112,12489.95006,13462.48555,16076.58803,18970.57086,22090.88306,22898.79214,26626.51503,26342.88426,28954.92589,33328.96507,36319.23501 +Americas,Chile,3939.978789,4315.622723,4519.094331,5106.654313,5494.024437,4756.763836,5095.665738,5547.063754,7596.125964,10118.05318,10778.78385,13171.63885 +Americas,Colombia,2144.115096,2323.805581,2492.351109,2678.729839,3264.660041,3815.80787,4397.575659,4903.2191,5444.648617,6117.361746,5755.259962,7006.580419 +Americas,Costa Rica,2627.009471,2990.010802,3460.937025,4161.727834,5118.146939,5926.876967,5262.734751,5629.915318,6160.416317,6677.045314,7723.447195,9645.06142 
+Americas,Cuba,5586.53878,6092.174359,5180.75591,5690.268015,5305.445256,6380.494966,7316.918107,7532.924763,5592.843963,5431.990415,6340.646683,8948.102923 +Americas,Dominican Republic,1397.717137,1544.402995,1662.137359,1653.723003,2189.874499,2681.9889,2861.092386,2899.842175,3044.214214,3614.101285,4563.808154,6025.374752 +Americas,Ecuador,3522.110717,3780.546651,4086.114078,4579.074215,5280.99471,6679.62326,7213.791267,6481.776993,7103.702595,7429.455877,5773.044512,6873.262326 +Americas,El Salvador,3048.3029,3421.523218,3776.803627,4358.595393,4520.246008,5138.922374,4098.344175,4140.442097,4444.2317,5154.825496,5351.568666,5728.353514 +Americas,Guatemala,2428.237769,2617.155967,2750.364446,3242.531147,4031.408271,4879.992748,4820.49479,4246.485974,4439.45084,4684.313807,4858.347495,5186.050003 +Americas,Haiti,1840.366939,1726.887882,1796.589032,1452.057666,1654.456946,1874.298931,2011.159549,1823.015995,1456.309517,1341.726931,1270.364932,1201.637154 +Americas,Honduras,2194.926204,2220.487682,2291.156835,2538.269358,2529.842345,3203.208066,3121.760794,3023.096699,3081.694603,3160.454906,3099.72866,3548.330846 +Americas,Jamaica,2898.530881,4756.525781,5246.107524,6124.703451,7433.889293,6650.195573,6068.05135,6351.237495,7404.923685,7121.924704,6994.774861,7320.880262 +Americas,Mexico,3478.125529,4131.546641,4581.609385,5754.733883,6809.40669,7674.929108,9611.147541,8688.156003,9472.384295,9767.29753,10742.44053,11977.57496 +Americas,Nicaragua,3112.363948,3457.415947,3634.364406,4643.393534,4688.593267,5486.371089,3470.338156,2955.984375,2170.151724,2253.023004,2474.548819,2749.320965 +Americas,Panama,2480.380334,2961.800905,3536.540301,4421.009084,5364.249663,5351.912144,7009.601598,7034.779161,6618.74305,7113.692252,7356.031934,9809.185636 +Americas,Paraguay,1952.308701,2046.154706,2148.027146,2299.376311,2523.337977,3248.373311,4258.503604,3998.875695,4196.411078,4247.400261,3783.674243,4172.838464 
+Americas,Peru,3758.523437,4245.256698,4957.037982,5788.09333,5937.827283,6281.290855,6434.501797,6360.943444,4446.380924,5838.347657,5909.020073,7408.905561 +Americas,Puerto Rico,3081.959785,3907.156189,5108.34463,6929.277714,9123.041742,9770.524921,10330.98915,12281.34191,14641.58711,16999.4333,18855.60618,19328.70901 +Americas,Trinidad and Tobago,3023.271928,4100.3934,4997.523971,5621.368472,6619.551419,7899.554209,9119.528607,7388.597823,7370.990932,8792.573126,11460.60023,18008.50924 +Americas,United States,13990.48208,14847.12712,16173.14586,19530.36557,21806.03594,24072.63213,25009.55914,29884.35041,32003.93224,35767.43303,39097.09955,42951.65309 +Americas,Uruguay,5716.766744,6150.772969,5603.357717,5444.61962,5703.408898,6504.339663,6920.223051,7452.398969,8137.004775,9230.240708,7727.002004,10611.46299 +Americas,Venezuela,7689.799761,9802.466526,8422.974165,9541.474188,10505.25966,13143.95095,11152.41011,9883.584648,10733.92631,10165.49518,8605.047831,11415.80569 diff --git a/data/gapminder_gdp_asia.csv b/data/gapminder_gdp_asia.csv new file mode 100644 index 000000000..7b026f549 --- /dev/null +++ b/data/gapminder_gdp_asia.csv @@ -0,0 +1,34 @@ +country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007 +Afghanistan,779.4453145,820.8530296,853.10071,836.1971382,739.9811058,786.11336,978.0114388,852.3959448,649.3413952,635.341351,726.7340548,974.5803384 +Bahrain,9867.084765,11635.79945,12753.27514,14804.6727,18268.65839,19340.10196,19211.14731,18524.02406,19035.57917,20292.01679,23403.55927,29796.04834 +Bangladesh,684.2441716,661.6374577,686.3415538,721.1860862,630.2336265,659.8772322,676.9818656,751.9794035,837.8101643,972.7700352,1136.39043,1391.253792 +Cambodia,368.4692856,434.0383364,496.9136476,523.4323142,421.6240257,524.9721832,624.4754784,683.8955732,682.3031755,734.28517,896.2260153,1713.778686 
+China,400.4486107,575.9870009,487.6740183,612.7056934,676.9000921,741.2374699,962.4213805,1378.904018,1655.784158,2289.234136,3119.280896,4959.114854 +Hong Kong China,3054.421209,3629.076457,4692.648272,6197.962814,8315.928145,11186.14125,14560.53051,20038.47269,24757.60301,28377.63219,30209.01516,39724.97867 +India,546.5657493,590.061996,658.3471509,700.7706107,724.032527,813.337323,855.7235377,976.5126756,1164.406809,1458.817442,1746.769454,2452.210407 +Indonesia,749.6816546,858.9002707,849.2897701,762.4317721,1111.107907,1382.702056,1516.872988,1748.356961,2383.140898,3119.335603,2873.91287,3540.651564 +Iran,3035.326002,3290.257643,4187.329802,5906.731805,9613.818607,11888.59508,7608.334602,6642.881371,7235.653188,8263.590301,9240.761975,11605.71449 +Iraq,4129.766056,6229.333562,8341.737815,8931.459811,9576.037596,14688.23507,14517.90711,11643.57268,3745.640687,3076.239795,4390.717312,4471.061906 +Israel,4086.522128,5385.278451,7105.630706,8393.741404,12786.93223,13306.61921,15367.0292,17122.47986,18051.52254,20896.60924,21905.59514,25523.2771 +Japan,3216.956347,4317.694365,6576.649461,9847.788607,14778.78636,16610.37701,19384.10571,22375.94189,26824.89511,28816.58499,28604.5919,31656.06806 +Jordan,1546.907807,1886.080591,2348.009158,2741.796252,2110.856309,2852.351568,4161.415959,4448.679912,3431.593647,3645.379572,3844.917194,4519.461171 +Korea Dem. 
Rep.,1088.277758,1571.134655,1621.693598,2143.540609,3701.621503,4106.301249,4106.525293,4106.492315,3726.063507,1690.756814,1646.758151,1593.06548 +Korea Rep.,1030.592226,1487.593537,1536.344387,2029.228142,3030.87665,4657.22102,5622.942464,8533.088805,12104.27872,15993.52796,19233.98818,23348.13973 +Kuwait,108382.3529,113523.1329,95458.11176,80894.88326,109347.867,59265.47714,31354.03573,28118.42998,34932.91959,40300.61996,35110.10566,47306.98978 +Lebanon,4834.804067,6089.786934,5714.560611,6006.983042,7486.384341,8659.696836,7640.519521,5377.091329,6890.806854,8754.96385,9313.93883,10461.05868 +Malaysia,1831.132894,1810.066992,2036.884944,2277.742396,2849.09478,3827.921571,4920.355951,5249.802653,7277.912802,10132.90964,10206.97794,12451.6558 +Mongolia,786.5668575,912.6626085,1056.353958,1226.04113,1421.741975,1647.511665,2000.603139,2338.008304,1785.402016,1902.2521,2140.739323,3095.772271 +Myanmar,331,350,388,349,357,371,424,385,347,415,611,944 +Nepal,545.8657229,597.9363558,652.3968593,676.4422254,674.7881296,694.1124398,718.3730947,775.6324501,897.7403604,1010.892138,1057.206311,1091.359778 +Oman,1828.230307,2242.746551,2924.638113,4720.942687,10618.03855,11848.34392,12954.79101,18115.22313,18616.70691,19702.05581,19774.83687,22316.19287 +Pakistan,684.5971438,747.0835292,803.3427418,942.4082588,1049.938981,1175.921193,1443.429832,1704.686583,1971.829464,2049.350521,2092.712441,2605.94758 +Philippines,1272.880995,1547.944844,1649.552153,1814.12743,1989.37407,2373.204287,2603.273765,2189.634995,2279.324017,2536.534925,2650.921068,3190.481016 +Saudi Arabia,6459.554823,8157.591248,11626.41975,16903.04886,24837.42865,34167.7626,33693.17525,21198.26136,24841.61777,20586.69019,19014.54118,21654.83194 +Singapore,2315.138227,2843.104409,3674.735572,4977.41854,8597.756202,11210.08948,15169.16112,18861.53081,24769.8912,33519.4766,36023.1054,47143.17964 +Sri 
Lanka,1083.53203,1072.546602,1074.47196,1135.514326,1213.39553,1348.775651,1648.079789,1876.766827,2153.739222,2664.477257,3015.378833,3970.095407 +Syria,1643.485354,2117.234893,2193.037133,1881.923632,2571.423014,3195.484582,3761.837715,3116.774285,3340.542768,4014.238972,4090.925331,4184.548089 +Taiwan,1206.947913,1507.86129,1822.879028,2643.858681,4062.523897,5596.519826,7426.354774,11054.56175,15215.6579,20206.82098,23235.42329,28718.27684 +Thailand,757.7974177,793.5774148,1002.199172,1295.46066,1524.358936,1961.224635,2393.219781,2982.653773,4616.896545,5852.625497,5913.187529,7458.396327 +Vietnam,605.0664917,676.2854478,772.0491602,637.1232887,699.5016441,713.5371196,707.2357863,820.7994449,989.0231487,1385.896769,1764.456677,2441.576404 +West Bank and Gaza,1515.592329,1827.067742,2198.956312,2649.715007,3133.409277,3682.831494,4336.032082,5107.197384,6017.654756,7110.667619,4515.487575,3025.349798 +Yemen Rep.,781.7175761,804.8304547,825.6232006,862.4421463,1265.047031,1829.765177,1977.55701,1971.741538,1879.496673,2117.484526,2234.820827,2280.769906 diff --git a/data/gapminder_gdp_europe.csv b/data/gapminder_gdp_europe.csv new file mode 100644 index 000000000..1792319f6 --- /dev/null +++ b/data/gapminder_gdp_europe.csv @@ -0,0 +1,31 @@ +country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007 +Albania,1601.056136,1942.284244,2312.888958,2760.196931,3313.422188,3533.00391,3630.880722,3738.932735,2497.437901,3193.054604,4604.211737,5937.029526 +Austria,6137.076492,8842.59803,10750.72111,12834.6024,16661.6256,19749.4223,21597.08362,23687.82607,27042.01868,29095.92066,32417.60769,36126.4927 +Belgium,8343.105127,9714.960623,10991.20676,13149.04119,16672.14356,19117.97448,20979.84589,22525.56308,25575.57069,27561.19663,30485.88375,33692.60508 +Bosnia and 
Herzegovina,973.5331948,1353.989176,1709.683679,2172.352423,2860.16975,3528.481305,4126.613157,4314.114757,2546.781445,4766.355904,6018.975239,7446.298803 +Bulgaria,2444.286648,3008.670727,4254.337839,5577.0028,6597.494398,7612.240438,8224.191647,8239.854824,6302.623438,5970.38876,7696.777725,10680.79282 +Croatia,3119.23652,4338.231617,5477.890018,6960.297861,9164.090127,11305.38517,13221.82184,13822.58394,8447.794873,9875.604515,11628.38895,14619.22272 +Czech Republic,6876.14025,8256.343918,10136.86713,11399.44489,13108.4536,14800.16062,15377.22855,16310.4434,14297.02122,16048.51424,17596.21022,22833.30851 +Denmark,9692.385245,11099.65935,13583.31351,15937.21123,18866.20721,20422.9015,21688.04048,25116.17581,26406.73985,29804.34567,32166.50006,35278.41874 +Finland,6424.519071,7545.415386,9371.842561,10921.63626,14358.8759,15605.42283,18533.15761,21141.01223,20647.16499,23723.9502,28204.59057,33207.0844 +France,7029.809327,8662.834898,10560.48553,12999.91766,16107.19171,18292.63514,20293.89746,22066.44214,24703.79615,25889.78487,28926.03234,30470.0167 +Germany,7144.114393,10187.82665,12902.46291,14745.62561,18016.18027,20512.92123,22031.53274,24639.18566,26505.30317,27788.88416,30035.80198,32170.37442 +Greece,3530.690067,4916.299889,6017.190733,8513.097016,12724.82957,14195.52428,15268.42089,16120.52839,17541.49634,18747.69814,22514.2548,27538.41188 +Hungary,5263.673816,6040.180011,7550.359877,9326.64467,10168.65611,11674.83737,12545.99066,12986.47998,10535.62855,11712.7768,14843.93556,18008.94444 +Iceland,7267.688428,9244.001412,10350.15906,13319.89568,15798.06362,19654.96247,23269.6075,26923.20628,25144.39201,28061.09966,31163.20196,36180.78919 +Ireland,5210.280328,5599.077872,6631.597314,7655.568963,9530.772896,11150.98113,12618.32141,13872.86652,17558.81555,24521.94713,34077.04939,40675.99635 +Italy,4931.404155,6248.656232,8243.58234,10022.40131,12269.27378,14255.98475,16537.4835,19207.23482,22013.64486,24675.02446,27968.09817,28569.7197 
+Montenegro,2647.585601,3682.259903,4649.593785,5907.850937,7778.414017,9595.929905,11222.58762,11732.51017,7003.339037,6465.613349,6557.194282,9253.896111 +Netherlands,8941.571858,11276.19344,12790.84956,15363.25136,18794.74567,21209.0592,21399.46046,23651.32361,26790.94961,30246.13063,33724.75778,36797.93332 +Norway,10095.42172,11653.97304,13450.40151,16361.87647,18965.05551,23311.34939,26298.63531,31540.9748,33965.66115,41283.16433,44683.97525,49357.19017 +Poland,4029.329699,4734.253019,5338.752143,6557.152776,8006.506993,9508.141454,8451.531004,9082.351172,7738.881247,10159.58368,12002.23908,15389.92468 +Portugal,3068.319867,3774.571743,4727.954889,6361.517993,9022.247417,10172.48572,11753.84291,13039.30876,16207.26663,17641.03156,19970.90787,20509.64777 +Romania,3144.613186,3943.370225,4734.997586,6470.866545,8011.414402,9356.39724,9605.314053,9696.273295,6598.409903,7346.547557,7885.360081,10808.47561 +Serbia,3581.459448,4981.090891,6289.629157,7991.707066,10522.06749,12980.66956,15181.0927,15870.87851,9325.068238,7914.320304,7236.075251,9786.534714 +Slovak Republic,5074.659104,6093.26298,7481.107598,8412.902397,9674.167626,10922.66404,11348.54585,12037.26758,9498.467723,12126.23065,13638.77837,18678.31435 +Slovenia,4215.041741,5862.276629,7402.303395,9405.489397,12383.4862,15277.03017,17866.72175,18678.53492,14214.71681,17161.10735,20660.01936,25768.25759 +Spain,3834.034742,4564.80241,5693.843879,7993.512294,10638.75131,13236.92117,13926.16997,15764.98313,18603.06452,20445.29896,24835.47166,28821.0637 +Sweden,8527.844662,9911.878226,12329.44192,15258.29697,17832.02464,18855.72521,20667.38125,23586.92927,23880.01683,25266.59499,29341.63093,33859.74835 +Switzerland,14734.23275,17909.48973,20431.0927,22966.14432,27195.11304,26982.29052,28397.71512,30281.70459,31871.5303,32135.32301,34480.95771,37506.41907 +Turkey,1969.10098,2218.754257,2322.869908,2826.356387,3450.69638,4269.122326,4241.356344,5089.043686,5678.348271,6601.429915,6508.085718,8458.276384 +United 
Kingdom,9979.508487,11283.17795,12477.17707,14142.85089,15895.11641,17428.74846,18232.42452,21664.78767,22705.09254,26074.53136,29478.99919,33203.26128 diff --git a/data/gapminder_gdp_oceania.csv b/data/gapminder_gdp_oceania.csv new file mode 100644 index 000000000..e15b16056 --- /dev/null +++ b/data/gapminder_gdp_oceania.csv @@ -0,0 +1,3 @@ +country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007 +Australia,10039.59564,10949.64959,12217.22686,14526.12465,16788.62948,18334.19751,19477.00928,21888.88903,23424.76683,26997.93657,30687.75473,34435.36744 +New Zealand,10556.57566,12247.39532,13175.678,14463.91893,16046.03728,16233.7177,17632.4104,19007.19129,18363.32494,21050.41377,23189.80135,25185.00911 diff --git a/design.md b/design.md new file mode 100644 index 000000000..059a6f7da --- /dev/null +++ b/design.md @@ -0,0 +1,360 @@ +--- +title: Lesson Design +--- + +::::::::::::::::::::::::::::::::::::::::: callout + +## Help Wanted + +**We are filling in the exercises [below](#stage-3-learning-plan) +in order to make the lesson plan more concrete. +Contributions (both in the form of pull requests with filled-in exercises, +and comments on specific exercises, ordering, and timings) are greatly appreciated.** + + +:::::::::::::::::::::::::::::::::::::::::::::::::: + +## Process Used + +> Michael Pollan's advice if he taught R or Python programming: +> +> 1. Write code. +> 2. Not too much. +> 3. Mostly plots. +> +> — [Michael Koontz](https://twitter.com/_mikoontz/status/758021742078025728) +> {: .quotation} + +This lesson was developed using a slimmed-down variant of the "Understanding by Design" process. +The main sections are: + +1. Assumptions about audience, time, etc. + (The current draft also includes some conclusions and decisions in this + section - that should be refactored.) + +2. 
Desired results: + overall goals, summative assessments at half-day granularity, what learners + will be able to do, what learners will know. + +3. Learning plan: + each episode has a heading that summarizes what will be covered, + then estimates time that will be spent on teaching and on exercises, + while the exercises are given as bullet points. + +## Stage 1: Assumptions + +- Audience + - Graduate students in numerate disciplines from cosmology to archaeology + - Who have manipulated data in spreadsheets and with interactive tools like SAS + - But have *not* programmed beyond CPD (copy-paste-despair) +- Constraints + - One full day 09:00-16:30 + - 06:15 class time + - 0:45 lunch + - 0:30 total for two coffee breaks + - Learners use native installs on their own machines + - May use VMs or cloud resources at instructor's discretion + - But must keep native local install as an option + - No dependence on other Carpentry modules + - In particular, does not require knowledge of shell or version control + - Use the Jupyter Notebook + - Authentic tool used by many instructors + - There isn't really an alternative + - And means that even people who have seen a bit of Python before + will probably learn something +- Motivating Example + - Creating 2D plots suitable for inclusion in papers + - Appeals to almost everyone + - Makes lesson usable by both Carpentries + - And means that even people who have seen a bit of Python before + will probably learn something +- Data + - Use the gapminder data throughout + - But break into multiple files by continent + - To make display of output from examples tidier + (e.g., use Australia/New Zealand, which is only two lines) + - And allow examples showing use of multiple data sets +- Focus on Pandas instead of NumPy + - Makes lesson usable by both Data Carpentry and Software Carpentry + - Genuine novices are likely to want data analysis + - And people with some prior experience: + - will accept data analysis as an authentic task, + - 
and are unlikely to have encountered Pandas, + so they'll still get something useful out of the lesson +- Challenges will mostly *not* be "write this code from scratch" + - Want lots of short exercises that can reliably be finished in allotted time + - So use MCQs, fill-in-the-blanks, Parsons Problems, "tweak this code", etc. + +## Stage 2: Desired Results + +### Questions + +How do I... + +- ...read tabular data? +- ...plot a single vector of values? +- ...create a time series plot? +- ...create one plot for each of several data sets? +- ...get extra data from a single data set for plotting? +- ...write programs I can read and re-use in future? + +### Skills + +I can... + +- ...write short scripts using loops and conditionals. +- ...write functions with a fixed number of parameters that return a single result. +- ...import libraries using aliases and refer to those libraries' contents. +- ...do simple data extraction and formatting using Pandas. + +### Concepts + +I know... + +- ...that a program is a piece of lab equipment that implements an analysis + - Needs to be validated/calibrated before/during use + - Makes analysis reproducible, reviewable, shareable +- ...that programs are written for people, not for computers + - Meaningful variable names + - Modularity for readability as well as re-use + - No duplication + - Document purpose and use +- ...that there is no magic: the programs they use are no different + in principle from those they build +- ...how to assign values to variables +- ...what integers, floats, strings, NumPy arrays, and Pandas dataframes are +- ...how to trace the execution of a `for` loop +- ...how to trace the execution of `if`/`else` statements +- ...how to create and index lists +- ...how to create and index NumPy arrays +- ...how to create and index Pandas dataframes +- ...how to create time series plots +- ...the difference between defining and calling a function +- ...where to find documentation on standard libraries +- ...how to find 
out what else scientific Python offers + +## Stage 3: Learning Plan + +### Summative Assessment + +- Midpoint: create a time-series plot for each file in a directory. +- Final: extract data from a Pandas dataframe + and create a comparative multi-line time-series plot. + +### [Running and Quitting Interactively](../episodes/01-run-quit.md) (09:00) + +- Teaching: 15 min (because of setup issues) + - Launch the Jupyter Notebook, create new notebooks, and exit the Notebook. + - Create Markdown cells in a notebook. + - Create and run Python cells in a notebook. +- Challenges: 0 min (accounted for in teaching time - no separate exercise) + - Creating lists in Markdown + - What is displayed when several expressions are put in a single cell? + - Change an existing cell from code to Markdown + - Rendering LaTeX-style equations + +### [Variables and Assignment](../episodes/02-variables.md) (09:15) + +- Teaching: 10 min + - Write programs that assign scalar values to variables and perform calculations with those values. + - Correctly trace value changes in programs that use scalar assignment. +- Challenges: 10 min + - Trace execution of code swapping two values using an intermediate variable. + - Predict final values of variables after several assignments. + - What happens if you try to index a number? + - Which is a better variable name, `m`, `min`, or `minutes`? + - What do the following slice expressions produce? + +### [Data Types and Type Conversion](../episodes/03-types-conversion.md) (09:35) + +- Teaching: 10 min + - Explain key differences between integers and floating point numbers. + - Explain key differences between numbers and character strings. + - Use built-in functions to convert between integers, floating point numbers, and strings. +- Challenges: 10 min + - What type of value is 3.4? + - What type of value is 3.25 + 4? + - What type of value would you use to represent: + - Number of days since the start of the year. + - Time elapsed since the start of the year.
+ - Etc. + - How can you use `//` (integer division) and `%` (modulo)? + - What does `int("3.4")` do? + - Given these float, int, and string values, which expressions will print a particular result? + - What do you expect `1+2j + 3` to produce? + +### [Built-in Functions and Help](../episodes/04-built-in.md) (09:55) + +- Teaching: 15 min + - Explain the purpose of functions. + - Correctly call built-in Python functions. + - Correctly nest calls to built-in functions. + - Use help to display documentation for built-in functions. + - Correctly describe situations in which SyntaxError and NameError occur. +- Challenges: 10 min + - Explain the order of operations in the following complex expression. + - What will each nested combination of `min` and `max` calls produce? + - Why don't `max` and `min` return `None` when given no arguments? + - Given what we have seen so far, + what index expression will get the last character in a string? + +### [Coffee](../episodes/05-coffee.md): 15 min (10:20) + +### [Libraries](../episodes/06-libraries.md) (10:35) + +- Teaching: 10 min + - Explain what software libraries are and why programmers create and use them. + - Write programs that import and use libraries from Python's standard library. + - Find and read documentation for standard libraries interactively (in the interpreter) and online. +- Challenges: 10 min + - Which function from the standard math library could you use to calculate a square root? + - What library would you use to select a random value from data? + - If `help(math)` produces an error, what have you forgotten to do? + - Fill in the blanks in code below so that the import statement and program run. + +### [Reading Tabular Data](../episodes/07-reading-tabular.md) (10:55) + +- Teaching: 10 min + - Import the Pandas library. + - Use Pandas to load a simple CSV data set. + - Get some basic information about a Pandas DataFrame. 
+- Challenges: 10 min
+  - Read the data for the Americas and display its summary statistics.
+  - What do `.head` and `.tail` do?
+  - What string(s) should you pass to `read_csv` to read files from other directories?
+  - How can you *write* CSV data?
+
+### [DataFrames](../episodes/08-data-frames.md) (11:15)
+
+- Teaching: 15 min
+  - Select individual values from a Pandas dataframe.
+  - Select entire rows or entire columns from a dataframe.
+  - Select a subset of both rows and columns from a dataframe in a single operation.
+  - Select a subset of a dataframe by a single Boolean criterion.
+- Challenges: 15 min
+  - Write an expression to find the GDP per capita of Serbia in 2007.
+  - What rule governs what is (or isn't) included in numerical and named slices in Pandas?
+  - What does each line in the following short program do?
+  - What do `idxmin` and `idxmax` do?
+  - Write expressions to get the GDP per capita for all countries in 1982,
+    for all countries *after* 1985,
+    etc.
+  - Given the way its borders have changed since 1900,
+    what would you do if asked to create a table of GDP per capita for Poland
+    for the Twentieth Century?
+
+### [Plotting](../episodes/09-plotting.md) (11:45)
+
+- Teaching: 15 min
+  - Create a time series plot showing a single data set.
+  - Create a scatter plot showing the relationship between two data sets.
+- Challenges: 15 min
+  - Fill in the blanks to plot the minimum GDP per capita over time for European countries.
+  - Modify the example to create a scatter plot of GDP per capita in Asian countries.
+  - Explain what each argument to `plot` does in the following example.
+
+### [Lunch](../episodes/10-lunch.md) (12:15): 45 min
+
+### [Lists](../episodes/11-lists.md) (13:00)
+
+- Teaching: 10 min
+  - Explain why programs need collections of values.
+  - Write programs that create flat lists, index them, slice them, and modify them through assignment and method calls.
+- Challenges: 10 min
+  - Fill in the blanks so that the program produces the output shown.
+  - How large are the following slices?
+  - What do negative index expressions print?
+  - What does a "stride" in a slice do?
+  - How do slices treat out-of-range bounds?
+  - What are the differences between sorting these two ways?
+  - What is the difference between `new = old` and `new = old[:]`?
+
+### [Loops](../episodes/12-for-loops.md) (13:20)
+
+- Teaching: 10 min
+  - Explain what for loops are normally used for.
+  - Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.
+  - Write for loops that use the Accumulator pattern to aggregate values.
+- Challenges: 15 min
+  - Is an indentation error a syntax error or a runtime error?
+  - Trace which lines of this program are executed in what order.
+  - Fill in the blanks in this program so that it reverses a string.
+  - Fill in the blanks in this series of examples to get practice accumulating values.
+  - Reorder and indent these lines to calculate the cumulative sum of the list values.
+
+### [Looping Over Data Sets](../episodes/14-looping-data-sets.md) (13:45)
+
+- Teaching: 5 min
+  - Read and write globbing expressions that match sets of files.
+  - Use glob to create lists of files.
+  - Write for loops to perform operations on files given their names in a list.
+- Challenges: 10 min
+  - Which filenames are *not* matched by this glob expression?
+  - Modify this program so that it prints the number of records in the shortest file.
+  - Write a program that reads and plots all of the regional data sets.
+
+### [Writing Functions](../episodes/16-writing-functions.md) (14:00)
+
+- Teaching: 10 min
+  - Explain and identify the difference between function definition and function call.
+  - Write a function that takes a small, fixed number of arguments and produces a single result.
+- Challenges: 15 min
+  - This code defines and calls a function. What does it print when run?
+  - Explain why this short program prints things in the order it does.
+  - Fill in the blanks to create a function that finds the minimum value in a data file.
+  - Fill in the blanks to create a function that finds the first negative value in a list.
+    What does your function do if the list is empty?
+  - Why is it sometimes useful to pass arguments by naming the corresponding parameters?
+  - Fill in the blanks and turn this short piece of code into a function.
+
+### [Variable Scope](../episodes/17-scope.md) (14:25)
+
+- Teaching: 10 min
+  - Identify local and global variables.
+  - Identify parameters as local variables.
+  - Read a traceback and determine the file, function, and line number on which the error occurred.
+- Challenges: 10 min
+  - Trace the changes to the values in this program,
+    being careful to distinguish local from global values.
+
+### [Coffee](../episodes/15-coffee.md) (14:45): 15 min
+
+### [Conditionals](../episodes/13-conditionals.md) (15:00)
+
+- Teaching: 10 min
+  - Correctly write programs that use if and else statements and simple Boolean expressions (without logical operators).
+  - Trace the execution of unnested conditionals and conditionals inside loops.
+- Challenges: 15 min
+  - Trace the execution of this conditional statement.
+  - Fill in the blanks so that this function replaces negative values with zeroes.
+  - Modify this program so that it only processes files with fewer than 50 records.
+  - Modify this program so that it always finds the largest and smallest values in a list
+    no matter what the list's values are.
+
+### [Programming Style](../episodes/18-style.md) (15:25)
+
+- Teaching: 15 min
+  - How can I make my programs more readable?
+  - How do most programmers format their code?
+  - How can programs check their own operation?
+- Challenges: 15 min
+  - Which lines in this code will be available as online help?
+  - Turn the comments in this program into docstrings.
+  - Rewrite this short program to be more readable.
+ +### [Wrap-Up](../episodes/19-wrap.md) (15:55) + +- Teaching: 20 min + - Name and locate scientific Python community sites for software, workshops, and help. +- Challenges: 0 min + - None. + +### [Feedback](../episodes/20-feedback.md) (16:15) + +- Teaching: 0 min +- Challenges: 15 min + - Collect feedback + +### Finish (16:30) + + diff --git a/discuss.md b/discuss.md new file mode 100644 index 000000000..2882ce8ae --- /dev/null +++ b/discuss.md @@ -0,0 +1,7 @@ +--- +title: Discussion +--- + +FIXME: general discussion and further reading for learners. + + diff --git a/exercises.md b/exercises.md new file mode 100644 index 000000000..e717ad504 --- /dev/null +++ b/exercises.md @@ -0,0 +1,7 @@ +--- +title: Further Exercises +--- + +FIXME: exercises that don't fit into the regular schedule. + + diff --git a/fig/0_anaconda_navigator_landing_page.png b/fig/0_anaconda_navigator_landing_page.png new file mode 100644 index 000000000..a5953fa5e Binary files /dev/null and b/fig/0_anaconda_navigator_landing_page.png differ diff --git a/fig/0_jupyterlab_landing_page.png b/fig/0_jupyterlab_landing_page.png new file mode 100644 index 000000000..6eb836b7c Binary files /dev/null and b/fig/0_jupyterlab_landing_page.png differ diff --git a/fig/0_jupyterlab_left_side_bar.png b/fig/0_jupyterlab_left_side_bar.png new file mode 100644 index 000000000..2469d4078 Binary files /dev/null and b/fig/0_jupyterlab_left_side_bar.png differ diff --git a/fig/0_jupyterlab_main_work_area.png b/fig/0_jupyterlab_main_work_area.png new file mode 100644 index 000000000..1dc30618d Binary files /dev/null and b/fig/0_jupyterlab_main_work_area.png differ diff --git a/fig/0_jupyterlab_menu_bar.png b/fig/0_jupyterlab_menu_bar.png new file mode 100644 index 000000000..dbbc585ca Binary files /dev/null and b/fig/0_jupyterlab_menu_bar.png differ diff --git a/fig/0_jupyterlab_notebook_screenshot.png b/fig/0_jupyterlab_notebook_screenshot.png new file mode 100644 index 000000000..928982480 Binary files /dev/null 
and b/fig/0_jupyterlab_notebook_screenshot.png differ
diff --git a/fig/0_multipanel_jupyterlab_screenshot.png b/fig/0_multipanel_jupyterlab_screenshot.png
new file mode 100644
index 000000000..adbca585b
Binary files /dev/null and b/fig/0_multipanel_jupyterlab_screenshot.png differ
diff --git a/fig/2_indexing.svg b/fig/2_indexing.svg
new file mode 100644
index 000000000..0493e5941
--- /dev/null
+++ b/fig/2_indexing.svg
@@ -0,0 +1,24 @@
[SVG figure: `print(atom_name[0])`, with the characters h, e, l, i, u, m labelled by indices 0 to 5]
diff --git a/fig/9_correlations_solution1.svg b/fig/9_correlations_solution1.svg
new file mode 100644
index 000000000..7b5c06a0c
--- /dev/null
+++ b/fig/9_correlations_solution1.svg
@@ -0,0 +1,67 @@
[SVG figure: plot with legend entries "min" and "max"]
diff --git a/fig/9_gdp_australia.svg b/fig/9_gdp_australia.svg
new file mode 100644
index 000000000..7f0504e5c
--- /dev/null
+++ b/fig/9_gdp_australia.svg
@@ -0,0 +1,46 @@
[SVG figure: line plot, years 1950 to 2000 on the x axis, values 10000 to 35000 on the y axis]
diff --git a/fig/9_gdp_australia_formatted.svg b/fig/9_gdp_australia_formatted.svg
new file mode 100644
index 000000000..cb1fe6cc7
--- /dev/null
+++ b/fig/9_gdp_australia_formatted.svg
@@ -0,0 +1,58 @@
[SVG figure: line plot, years 1950 to 2000 on the x axis, values 10000 to 35000 on the y axis]
diff --git a/fig/9_gdp_australia_nz.svg b/fig/9_gdp_australia_nz.svg
new file mode 100644
index 000000000..d8ae1142d
--- /dev/null
+++ b/fig/9_gdp_australia_nz.svg
@@ -0,0 +1,54 @@
[SVG figure: line plot; y axis "GDP per capita"; legend "country" with Australia and New Zealand]
diff --git a/fig/9_gdp_australia_nz_formatted.svg b/fig/9_gdp_australia_nz_formatted.svg
new file mode 100644
index 000000000..d300207c8
--- /dev/null
+++ b/fig/9_gdp_australia_nz_formatted.svg
@@ -0,0 +1,68 @@
[SVG figure: line plot; x axis "Year"; y axis "GDP per capita ($)"; legend Australia and New Zealand]
diff --git a/fig/9_gdp_bar.svg b/fig/9_gdp_bar.svg
new file mode 100644
index 000000000..55b907b33
--- /dev/null
+++ b/fig/9_gdp_bar.svg
@@ -0,0 +1,119 @@
[SVG figure: bar chart, years 1952 to 2007; y axis "GDP per capita"; legend "country" with Australia and New Zealand]
diff --git a/fig/9_gdp_correlation_data.svg b/fig/9_gdp_correlation_data.svg
new file mode 100644
index 000000000..5d1cb7448
--- /dev/null
+++ b/fig/9_gdp_correlation_data.svg
@@ -0,0 +1,80 @@
[SVG figure: scatter plot; x axis "Australia"; y axis "New Zealand"]
diff --git a/fig/9_gdp_correlation_plt.svg b/fig/9_gdp_correlation_plt.svg
new file mode 100644
index 000000000..23cd95a24
--- /dev/null
+++ b/fig/9_gdp_correlation_plt.svg
@@ -0,0 +1,80 @@
[SVG figure: scatter plot with unlabelled axes]
diff --git a/fig/9_minima_maxima_solution.png b/fig/9_minima_maxima_solution.png
new file mode 100644
index 000000000..b7146fb36
Binary files /dev/null and b/fig/9_minima_maxima_solution.png differ
diff --git a/fig/9_more_correlations_solution.svg b/fig/9_more_correlations_solution.svg
new file mode 100644
index 000000000..1337d2e90
--- /dev/null
+++ b/fig/9_more_correlations_solution.svg
@@ -0,0 +1,187 @@
[SVG figure: scatter plot; x axis "gdpPercap_2007"; y axis "lifeExp_2007"]
diff --git a/fig/9_simple_position_time_plot.svg b/fig/9_simple_position_time_plot.svg
new file mode 100644
index 000000000..394d5f3c8
--- /dev/null
+++ b/fig/9_simple_position_time_plot.svg
@@ -0,0 +1,52 @@
[SVG figure: line plot; x axis "Time (hr)"; y axis "Position (km)"]
diff --git a/files/python-novice-gapminder-data.zip b/files/python-novice-gapminder-data.zip
new file mode 100644
index 000000000..5988d3c5e
Binary files /dev/null and b/files/python-novice-gapminder-data.zip differ
diff --git a/index.md b/index.md
new file mode 100644
index 000000000..24e518279
--- /dev/null
+++ b/index.md
@@ -0,0 +1,31 @@
+---
+permalink: index.html
+site: sandpaper::sandpaper_site
+---
+
+This lesson is an introduction to programming in Python 3 for people with little or no previous
+programming experience.
It uses plotting as its motivating example and is designed to be used in +both [Data Carpentry][dc-lessons] and [Software Carpentry][swc-lessons] workshops. +This lesson references [JupyterLab][jupyterlab] but can be taught using alternative Python 3 interpreters +as well (e.g., repl.it, Anaconda). + +:::::::::::::::::::::::::::::::::::::::::: prereq + +## Prerequisites + +1. Learners need to understand what files and directories are, + what a working directory is, + and how to start a Python interpreter. + +2. Learners must install Python 3 before the class starts. + +3. Learners must get the gapminder data before class starts: + please download and unzip the file + [python-novice-gapminder-data.zip](episodes/files/python-novice-gapminder-data.zip). + +Please see [the setup instructions](learners/setup.md) +for more details. + +:::::::::::::::::::::::::::::::::::::::::::::::::: + + diff --git a/instructor-notes.md b/instructor-notes.md new file mode 100644 index 000000000..ac2cc6d1e --- /dev/null +++ b/instructor-notes.md @@ -0,0 +1,29 @@ +--- +title: Instructors' Guide +--- + +## General Notes + +It's all right not to get through the whole lesson. +: This lesson is designed for people who have never programmed before, +but any given class may include people with a wide range of prior experience. +We have therefore included enough material to fill a full day if need be, +but expect that many offerings will only get as far as the introduction to Pandas. + +Don't tell people to Google things. +: One of the goals of this lesson is +to help novices build a workable mental model of how programming works. +Until they have that model, +they will not know what to search for or how to recognize a helpful answer. +Telling them to Google can also give the impression that we think their problem is trivial. +(That said, if learners have done enough programming before to be past these issues, +having them search for solutions online can help them solidify their understanding.) 
+It's also worth quoting
+[Trevor King](https://github.com/swcarpentry/python-novice-gapminder/issues/22#issuecomment-182573516)'s
+comment about online search:
+"If you find anything,
+other folks were confused enough to bother with a blog or Stack Overflow post,
+so it's probably not trivial."
+
+
diff --git a/learner-profiles.md b/learner-profiles.md
new file mode 100644
index 000000000..434e335aa
--- /dev/null
+++ b/learner-profiles.md
@@ -0,0 +1,5 @@
+---
+title: FIXME
+---
+
+This is a placeholder file. Please add content here.
diff --git a/links.md b/links.md
new file mode 100644
index 000000000..f8e1e4a05
--- /dev/null
+++ b/links.md
@@ -0,0 +1,41 @@
+[cc-by-human]: https://creativecommons.org/licenses/by/4.0/
+[cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode
+[ci]: https://communityin.org/
+[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html
+[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
+[concept-maps]: https://carpentries.github.io/instructor-training/05-memory/
+[contrib-covenant]: https://contributor-covenant.org/
+[cran-checkpoint]: https://cran.r-project.org/package=checkpoint
+[cran-knitr]: https://cran.r-project.org/package=knitr
+[cran-stringr]: https://cran.r-project.org/package=stringr
+[dc-lessons]: https://www.datacarpentry.org/lessons/
+[email]: mailto:team@carpentries.org
+[github-importer]: https://import2.github.com/
+[importer]: https://github.com/new/import
+[jekyll-collection]: https://jekyllrb.com/docs/collections/
+[jekyll-install]: https://jekyllrb.com/docs/installation/
+[jekyll-windows]: https://jekyll-windows.juthilo.com/
+[jekyll]: https://jekyllrb.com/
+[jupyter]: https://jupyter.org/
+[jupyterlab]: https://jupyterlab.readthedocs.io/en/stable/
+[kramdown]: https://kramdown.gettalong.org/
+[lc-lessons]: https://librarycarpentry.org/lessons/
+[lesson-example]:
https://carpentries.github.io/lesson-example/ +[mit-license]: https://opensource.org/licenses/mit-license.html +[morea]: https://morea-framework.github.io/ +[numfocus]: https://numfocus.org/ +[osi]: https://opensource.org +[pandoc]: https://pandoc.org/ +[paper-now]: https://github.com/PeerJ/paper-now +[python-gapminder]: https://swcarpentry.github.io/python-novice-gapminder/ +[pyyaml]: https://pypi.org/project/PyYAML/ +[r-markdown]: https://rmarkdown.rstudio.com/ +[rstudio]: https://www.rstudio.com/ +[ruby-install-guide]: https://www.ruby-lang.org/en/downloads/ +[ruby-installer]: https://rubyinstaller.org/ +[rubygems]: https://rubygems.org/pages/download/ +[styles]: https://github.com/carpentries/styles/ +[swc-lessons]: https://software-carpentry.org/lessons/ +[swc-releases]: https://github.com/swcarpentry/swc-releases +[training]: https://carpentries.github.io/instructor-training/ +[yaml]: https://yaml.org/ diff --git a/md5sum.txt b/md5sum.txt new file mode 100644 index 000000000..dea8d627c --- /dev/null +++ b/md5sum.txt @@ -0,0 +1,33 @@ +"file" "checksum" "built" "date" +"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2023-05-02" +"LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2023-05-02" +"config.yaml" "4c8c3b66083d754c51eae2c277d24ca0" "site/built/config.yaml" "2023-05-02" +"index.md" "f019634aead94a6e24c7b0a414239caa" "site/built/index.md" "2023-05-03" +"links.md" "fd719a41381bb145880b9220d70edec3" "site/built/links.md" "2023-05-03" +"episodes/01-run-quit.md" "8503041b7590fbc507249282d847cba6" "site/built/01-run-quit.md" "2023-06-07" +"episodes/02-variables.md" "9dacd8cd9968b5d0185f0c244a55eb84" "site/built/02-variables.md" "2023-05-02" +"episodes/03-types-conversion.md" "9e3a08116a2124cd8e23d1d7c5dba432" "site/built/03-types-conversion.md" "2023-05-02" +"episodes/04-built-in.md" "d3ea4aa2a49667b61cb21a62034c88c6" "site/built/04-built-in.md" "2023-05-02" +"episodes/05-coffee.md" 
"c7616ec40b9e611c47b2bac1e11c47d2" "site/built/05-coffee.md" "2023-05-02" +"episodes/06-libraries.md" "96899c58843e51f10eb84a8ac20ebb90" "site/built/06-libraries.md" "2023-05-02" +"episodes/07-reading-tabular.md" "b5b65e50037a583dfc5a3a879e4404b0" "site/built/07-reading-tabular.md" "2023-05-02" +"episodes/08-data-frames.md" "af0057242e5f63f0c049f58ad66f1cbb" "site/built/08-data-frames.md" "2023-08-29" +"episodes/09-plotting.md" "d701a7c8d39329d1786b48b32063ffc8" "site/built/09-plotting.md" "2024-03-17" +"episodes/10-lunch.md" "0624bfa89e628df443070e8c44271b33" "site/built/10-lunch.md" "2023-05-02" +"episodes/11-lists.md" "1257daeb542377a3b04c6bec0d0ffee1" "site/built/11-lists.md" "2023-07-24" +"episodes/12-for-loops.md" "1da6e4e57a25f8d4fd64802c2eb682c4" "site/built/12-for-loops.md" "2023-05-02" +"episodes/13-conditionals.md" "2739086f688f386c32ce56400c6b27e2" "site/built/13-conditionals.md" "2024-02-16" +"episodes/14-looping-data-sets.md" "fb2992c34b244b375302ffb15bd25b8d" "site/built/14-looping-data-sets.md" "2024-03-05" +"episodes/15-coffee.md" "062bae79eb17ee57f183b21658a8d813" "site/built/15-coffee.md" "2023-05-02" +"episodes/16-writing-functions.md" "0f162f45b0072659b0113baf01ade027" "site/built/16-writing-functions.md" "2023-07-24" +"episodes/17-scope.md" "8109afb18f278a482083d867ad80da6e" "site/built/17-scope.md" "2023-05-02" +"episodes/18-style.md" "67f9594a062909ef15132811d02ee6a0" "site/built/18-style.md" "2023-07-29" +"episodes/19-wrap.md" "8863b58685fecbc89a6f5058bde50307" "site/built/19-wrap.md" "2023-05-02" +"episodes/20-feedback.md" "942925c3013831350ae64f2cb75f2171" "site/built/20-feedback.md" "2023-05-02" +"instructors/design.md" "84d5da2a0671a8a719c26f6636695873" "site/built/design.md" "2023-05-02" +"instructors/instructor-notes.md" "2ea8589d855779b73fe1526c1552b330" "site/built/instructor-notes.md" "2023-05-02" +"learners/discuss.md" "012b885b35283c528857acd0fde06604" "site/built/discuss.md" "2023-05-02" +"learners/exercises.md" 
"8f305efe9f670305e9d23140d43ca651" "site/built/exercises.md" "2023-05-02" +"learners/reference.md" "f83e0f36168cb869210dd190ef81227b" "site/built/reference.md" "2023-08-19" +"learners/setup.md" "40258d2c8777bac1e9ee081f6c12010c" "site/built/setup.md" "2023-07-26" +"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2023-05-02" diff --git a/reference.md b/reference.md new file mode 100644 index 000000000..0b56a7301 --- /dev/null +++ b/reference.md @@ -0,0 +1,260 @@ +--- +title: 'Reference' +--- + +## Reference + +## [Running and Quitting](episodes/01-run-quit.md) + +- Python files have the `.py` extension. +- Can be written in a text file or a [Jupyter Notebook][jupyter]. + - Jupyter notebooks have the extension `.ipynb` + - Jupyter notebooks can be opened from [Anaconda](https://docs.continuum.io/anaconda/install) or through the command line by entering `$ jupyter notebook` + - Markdown and HTML are allowed in markdown cells for documenting code. + +## [Variables and Assignment](episodes/02-variables.md) + +- Variables are stored using `=`. + - Strings are defined in quotations `'...'`. + - Integers and floating point numbers are defined without quotations. +- Variables can contain letters, digits, and underscores `_`. + - Cannot start with a digit. + - Variables that start with underscores should be avoided. +- Use `print(...)` to display values as text. +- Can use indexing on strings. + - Indexing starts at 0. + - Position is given in square brackets `[position]` following the variable name. + - Take a slice using `[start:stop]`. This makes a copy of part of the original string. + - `start` is the index of the first element. + - `stop` is the index of the element after the last desired element. +- Use `len(...)` to find the length of a variable or string. + +## [Data Types and Type Conversion](episodes/03-types-conversion.md) + +- Each value has a type. This controls what can be done with it. 
+  - `int` represents an integer.
+  - `float` represents a floating point number.
+  - `str` represents a string.
+- To find a variable's type, use the built-in function `type(...)`, passing the variable inside the parentheses.
+- Working with strings:
+  - Use `+` to concatenate strings.
+  - Use `*` to repeat a string.
+  - Numbers and strings cannot be added to one another.
+  - Convert a string to an integer: `int(...)`.
+  - Convert an integer to a string: `str(...)`.
+
+## [Built-in Functions and Help](episodes/04-built-in.md)
+
+- To add a comment, place `#` before the text you do not want to be executed.
+- Commonly used built-in functions:
+  - `min()` finds the smallest value.
+  - `max()` finds the largest value.
+  - `round()` rounds a floating point number.
+  - `help()` displays documentation for the function named in the parentheses.
+  - In Jupyter notebooks, you can also get help by holding down `Shift` and pressing `Tab`.
+
+## [Libraries](episodes/06-libraries.md)
+
+- Importing a library:
+  - Use `import ...` to load a library.
+  - Refer to things in the library using `module_name.thing_name`.
+  - `.` indicates 'part of'.
+- To import a specific item from a library: `from ... import ...`
+- To import a library using an alias: `import ... as ...`
+- Importing the math library: `import math`
+  - Example of referring to an item with the module's name: `math.cos(math.pi)`.
+- Importing the plotting library using an alias: `import matplotlib as mpl`
+
+## [Reading Tabular Data into DataFrames](episodes/07-reading-tabular.md)
+
+- Use the Pandas library to do statistics on tabular data. Load it with `import pandas as pd`.
+  - To read in a CSV file: `pd.read_csv()`, passing the path of the file in the parentheses.
+  - To use a column's values as row labels: `pd.read_csv('path', index_col='column name')`, replacing `path` and `column name` with the relevant values.
+- To get more information about a DataFrame, use `DataFrame.info()`, replacing `DataFrame` with the variable name of your DataFrame.
+- Use `DataFrame.columns` to view the column names.
+- Use `DataFrame.T` to transpose a DataFrame.
+- Use `DataFrame.describe()` to get summary statistics about your data.
+
+## [Pandas DataFrames](episodes/08-data-frames.md)
+
+- Select data by row and column, giving the row first: `[row, column]`.
+  - To select by position: `DataFrame.iloc[..., ...]`
+    - Slices include the start index but exclude the end index.
+  - To select by label: `DataFrame.loc[..., ...]`
+    - Select multiple rows or columns by listing their labels.
+    - Slices include both ends.
+  - Use `:` to select all rows or all columns.
+- Data can also be selected based on values using `True` and `False`. This is a Boolean mask.
+  - For example, `mask = subset > 10000`.
+  - The mask can then be used to select values: `subset[mask]`.
+- For a select-apply-combine operation, use `data.apply(lambda x: x > x.mean())`, where `x > x.mean()` can be replaced by any operation you want applied to each column `x`.
+
+## [Plotting](episodes/09-plotting.md)
+
+- The most widely used plotting library is `matplotlib`.
+  - It is usually imported using `import matplotlib.pyplot as plt`.
+  - To plot, use `plt.plot(time, position)`.
+  - To create a legend, use `plt.legend(['label1', 'label2'], loc='upper left')`.
+  - Labels can also be defined within the plot statements, e.g. `plt.plot(time, position, label='label')`. To make the legend appear, call `plt.legend()`.
+  - To label the x and y axes, use `plt.xlabel('label')` and `plt.ylabel('label')`.
+- Pandas DataFrames can be plotted using `DataFrame.plot()`. Any operations that can be used on a DataFrame can be applied while plotting.
+  - To make a bar plot: `data.plot(kind='bar')`
+
+```python
+import matplotlib.pyplot as plt
+
+# Example data: time in hours and position in kilometres
+time = [0, 1, 2, 3]
+position = [0, 100, 200, 300]
+
+plt.plot(time, position, label='label')
+plt.xlabel('x axis label')
+plt.ylabel('y axis label')
+plt.legend()
+```
+
+## [Lists](episodes/11-lists.md)
+
+- Lists are defined within `[...]`, with values separated by `,`.
+  - An empty list can be created using `[]`.
+- Use `len(...)` to determine how many values are in a list.
+- Lists can be indexed and sliced in the same way as strings.
+  - Unlike strings, lists can be modified by assigning to an index: `list_name[0] = new_value`.
+- To add an item to the end of a list, use `list_name.append()`, with the item to append in the parentheses.
+- To combine two lists, use `list_name_1.extend(list_name_2)`.
+- To remove an item from a list, use `del list_name[index]`.
+
+## [For Loops](episodes/12-for-loops.md)
+
+- Start a for loop with `for number in [1, 2, 3]:`, with the following lines indented.
+  - `[1, 2, 3]` is the collection.
+  - `number` is the loop variable.
+  - The indented lines after the `for` line are the body.
+- To iterate over a sequence of numbers, use `range(start, end)`; `end` itself is not included.
+
+```python
+# Prints the numbers 0 to 4, one per line
+for number in range(0, 5):
+    print(number)
+```
+
+## [Conditionals](episodes/13-conditionals.md)
+
+- Written like a loop header: `if`, then a condition, then a colon, with the body indented.
+  - For example, `if variable > 5:`.
+- Use `elif condition:` for additional tests.
+- Use `else:` for code that runs when none of the preceding conditions are true.
+- Combine conditions by using `and` or `or`.
+- Often used in combination with for loops.
+- Comparison operators that can be used in conditions:
+  - `==` equal to.
+  - `>=` greater than or equal to.
+  - `<=` less than or equal to.
+  - `>` greater than.
+  - `<` less than.
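The list above says conditions can be combined with `and` or `or`, but the example that follows uses only single comparisons. A minimal sketch of both operators, reusing the same numbers; the range 3 to 7 and the `labels` list are illustrative choices, not part of the lesson:

```python
# Classify each number as inside or outside the range 3 to 7
# (the range and the `labels` list are illustrative only).
labels = []
for m in [3, 6, 7, 2, 8]:
    if m >= 3 and m <= 7:
        # `and`: both comparisons must be True
        labels.append('inside')
    elif m < 3 or m > 7:
        # `or`: at least one comparison must be True
        labels.append('outside')
    print(m, 'is', labels[-1], 'the range 3 to 7')
```

Because every number is either inside or outside the range, the `elif` test could be a plain `else:`; it is spelled out here only to show `or` in use. Python also accepts the chained form `3 <= m <= 7`.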
+
+```python
+for m in [3, 6, 7, 2, 8]:
+    if m > 5:
+        print(m, 'is large')
+    elif m == 5:
+        print(m, 'is 5')
+    else:
+        print(m, 'is small')
+```
+
+## [Looping Over Data Sets](episodes/14-looping-data-sets.md)
+
+- Use a for loop over a list of file names: `for filename in ['file1.csv', 'file2.csv']:`
+- To find a set of files using a pattern, use `glob.glob`.
+  - Import it first using `import glob`.
+  - `*` means "match zero or more characters".
+  - `?` means "match exactly one character".
+  - For example, `glob.glob('*.txt')` finds all files in the current directory whose names end with `.txt`.
+- Combine these by writing a loop: `for filename in glob.glob('*.txt'):`
+
+```python
+import glob
+import pandas as pd
+
+for filename in glob.glob('*.txt'):
+    data = pd.read_csv(filename)  # read each matching file into a DataFrame
+```
+
+## [Writing Functions](episodes/16-writing-functions.md)
+
+- Define a function using `def function_name(parameters):`. Replace `parameters` with the names of the values the function needs.
+- Call the function using `function_name(arguments)`.
+- To return a result to the caller, use `return ...` in the function.
+
+```python
+def add_numbers(a, b):
+    result = a + b
+    return result
+
+add_numbers(1, 4)  # returns 5
+```
+
+## [Variable Scope](episodes/17-scope.md)
+
+- A local variable is defined in a function and can only be seen and used within that function.
+- A global variable is defined outside of a function and can be seen or used anywhere after definition.
+
+## [Programming Style](episodes/18-style.md)
+
+- Document your code.
+- Use clear and meaningful variable names.
+- Follow [the PEP 8 style guide](https://www.python.org/dev/peps/pep-0008) when writing your code.
+- Use assertions to check for internal errors.
+- Use docstrings to provide help.
+
+## Glossary
+
+Arguments
+: Values passed to functions.
+
+Array
+: A container holding elements of the same type.
+
+Boolean
+: A value that is either `True` or `False`.
+
+DataFrame
+: The way Pandas represents a table; a collection of Series.
+
+Element
+: An item in a list or an array.
For a string, these are the individual characters.
+
+Function
+: A block of code that can be called and re-used elsewhere.
+
+Global variable
+: A variable defined outside of a function that can be used anywhere.
+
+Index
+: The position of a given element.
+
+Jupyter Notebook
+: Interactive coding environment allowing a combination of code and Markdown.
+
+Library
+: A collection of files containing functions used by other programs.
+
+Local Variable
+: A variable defined inside of a function that can only be used inside of that function.
+
+Mask
+: A Boolean object used for selecting data from another object.
+
+Method
+: An action tied to a particular object. Called using `object.method()`.
+
+Modules
+: The files within a library containing functions used by other programs.
+
+Parameters
+: Variables named in a function definition that receive the values passed in when the function is called.
+
+Series
+: A Pandas data structure to represent a column.
+
+Substring
+: A part of a string.
+
+Variables
+: Names for values.
+
+
+
+
diff --git a/setup.md b/setup.md
new file mode 100644
index 000000000..3f53a3e92
--- /dev/null
+++ b/setup.md
@@ -0,0 +1,22 @@
+---
+title: Setup
+---
+
+## Getting the Data
+
+The data we will be using is taken from the [gapminder] dataset.
+To obtain it, download and unzip the file
+[python-novice-gapminder-data.zip](files/python-novice-gapminder-data.zip).
+In order to follow the presented material, you should launch the JupyterLab
+server in the root directory (see [Starting JupyterLab](episodes/01-run-quit.md#starting-jupyterlab)).
+
+## Installing Python Using Anaconda
+
+Please refer to the [Python section of the workshop website](https://carpentries.github.io/workshop-template/install_instructions/#python) for installation instructions.
+
+
+
+[gapminder]: https://en.wikipedia.org/wiki/Gapminder_Foundation
+
+