From e5362d341a8723686bde2cf791bf4e459eb287cb Mon Sep 17 00:00:00 2001 From: Hilmar Lapp Date: Tue, 22 Oct 2024 13:05:14 -0400 Subject: [PATCH] Adds Python concepts review notebook --- Python-ML/Python-for-ML_01.ipynb | 1459 ++++++++++++++++++++++++++++++ 1 file changed, 1459 insertions(+) create mode 100644 Python-ML/Python-for-ML_01.ipynb diff --git a/Python-ML/Python-for-ML_01.ipynb b/Python-ML/Python-for-ML_01.ipynb new file mode 100644 index 0000000..a9a943a --- /dev/null +++ b/Python-ML/Python-for-ML_01.ipynb @@ -0,0 +1,1459 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Python for AI/ML: Review of concepts\n", + "\n", + "Mainly to point out useful aspects of Python you may have glossed over. Assumes you already know Python fairly well.\n", + "\n", + "## Acknowledgments \\& Credits\n", + "\n", + "This lesson is adapted largely from the excellent curriculum materials by Cliburn Chan (2021) at under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Python as a language" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Why Python? \n", + "\n", + "Modules, executable notebooks, and tutorials exist in several programming languages widely used (or increasingly popular) in the life sciences (e.g., R, Julia, Rust). Why did we choose to focus on Python here? \n", + "- Huge community - especially in data science and ML \n", + "- Easy to learn \n", + "- Batteries included \n", + "- Extensive 3rd party libraries \n", + "- Widely used in both industry and academia \n", + "- Most important “glue” language bridging multiple communities" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Versions \n", + "\n", + "- Only use **Python 3**. ML frameworks (Tensorflow, PyTorch) at recent versions require at least Python v3.8. 
(Tensorflow 1.x can use up to Python v3.7; Tensorflow v2.2+ can use Python 3.8.)\n", + "- Do not use Python 2.\n", + "- Container has Python 3.11:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sys\n", + "sys.version" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Multi-paradigm " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Procedural" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = []\n", + "for i in range(5):\n", + "    x.append(i*i)\n", + "x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Functional" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(map(lambda x: x*x, range(5)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Object-oriented " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class Robot:\n", + "    def __init__(self, name, function):\n", + "        self.name = name\n", + "        self.function = function\n", + "    \n", + "    def greet(self):\n", + "        return f\"I am {self.name}, a {self.function} robot!\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fido = Robot('roomba', 'vacuum cleaner')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fido.name" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fido.function" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fido.greet()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Dynamic typing " + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Complexity of a + b " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "1 + 2.3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "type(1), type(1.0), type(2.3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'hello' + ' world'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[1,2,3] + [4,5,6]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "np.arange(3) + 10" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.array([1,2,3]) + np.array([4,5,6])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Coding in Python" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Coding conventions \n", + "\n", + "- PEP 8 \n", + "- Avoid magic numbers \n", + "- Avoid copy and paste \n", + "- extract common functionality into functions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data types \n", + "\n", + "#### Integers \n", + "- Arbitrary precision \n", + "- Integer division operator \n", + "- Base conversion \n", + "- Check if integer " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import math" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "n = math.factorial(100)" + ] + }, + { + "cell_type": "code", + "execution_count": 
null, + "metadata": {}, + "outputs": [], + "source": [ + "n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "f'{n:,}'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sys.getsizeof(64), sys.getsizeof(2**31), sys.getsizeof(2**60)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sys.getsizeof(n)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Floats\n", + "- Checking for equality \n", + "- Roundoff error accumulation " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "h = math.sqrt(3**2 + 4**2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "h" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "h.is_integer()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "h == 5" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.arange(9).reshape(3,3)\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x.sum(axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = x / x.sum(axis=0)\n", + "z = np.linalg.eigvals(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z[0] == 1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z[0], z[1], 
z[2]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "math.isclose(z[0], 1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Sample variance:\n", + "$$var(X)=\\frac{\\sum_i^N(x_i-\\mu)^2}{N-1}; \\mu=\\frac{1}{N}\\sum_i^Nx_i$$\n", + "\n", + "Short-cut formula to avoid having to first calculate sample mean $\\mu$:\n", + "$$var(X)=\\frac{\\sum_i^Nx_i^2 - \\frac{1}{N}(\\sum_i^Nx_i)^2}{N-1}$$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def var(xs):\n", + "    \"\"\"Returns variance of sample data.\"\"\"\n", + "    \n", + "    n = 0\n", + "    s = 0\n", + "    ss = 0\n", + "\n", + "    for x in xs:\n", + "        n += 1\n", + "        s += x\n", + "        ss += x*x\n", + "\n", + "    v = (ss - (s*s)/n)/(n-1)\n", + "    return v" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "xs = np.random.normal(1e9, 1, int(1e6))\n", + "xs, type(xs[0])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "var(xs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.var(xs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Boolean \n", + "- What evaluates as False? 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "stuff = [[], [1], {},'', 'hello', 0, 1, 1==1, 1==2]\n", + "for s in stuff:\n", + "    if s:\n", + "        print(f'{s} evaluates as True')\n", + "    else:\n", + "        print(f'{s} evaluates as False')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### String \n", + "- Unicode by default \n", + "- b, r, f strings" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "u'\\u732b'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'ACGT'\n", + "type(s), list(s)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = b'ACGT'\n", + "type(s), list(s)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "r\"C:\\Users\\Name\\Documents\\file.txt\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"String with newline\\nThis is after the newline\")\n", + "print(r\"String with newline\\nThis is after the newline\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "String formatting\n", + "\n", + "- Learn to use the f-string." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import string" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "char = 'e'\n", + "pos = string.ascii_lowercase.index(char) + 1\n", + "f\"The letter {char} has position {pos} in the alphabet\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "n = int(1e9)\n", + "f\"{n:,}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = math.pi" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "f\"{x:8.2f}\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import datetime\n", + "now = datetime.datetime.now()\n", + "now" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "f\"{now:%Y-%m-%d %H:%M}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data structures \n", + "\n", + "- Immutable - string, tuple \n", + "- Mutable - list, set, dictionary \n", + "- Collections module \n", + "- heapq " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "\n", + "[x for x in dir(collections) if not x.startswith('_')]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = {'a': 1, 'b': 'foo', 'c': 1e-6}\n", + "d" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "kw = {'b': 'bar', 'z': 5}\n", + "{**d, **kw}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "kw2 = kw\n", + "kw2['z'] = 
10\n", + "kw" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "l = [1,2,3]\n", + "l" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[*l]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Functions \n", + "\n", + "- \\*args, \\*\\*kwargs \n", + "- Care with mutable default values \n", + "- First class objects \n", + "- Anonymous functions \n", + "- Decorators" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def f(*args, c=1, **kwargs):\n", + " print(f\"f() c = {c}\")\n", + " print(f\"f() args = {args}\")\n", + " print(f\"f() kwargs = {kwargs}\")\n", + " return g(*args, **kwargs)\n", + "\n", + "def g(*args, **kwargs):\n", + " print(f\"g() args = {args}\")\n", + " print(f\"g() kwargs = {kwargs}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "f(1,2,3,a=4,b=5,c=6)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "l = [4,5,6]\n", + "k = {'c':10}\n", + "f(l)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "f(*l, **k)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "kwargs = {'a': 1, 'b': 2, 'c': 3}\n", + "newargs = kwargs\n", + "newargs['a'] = 0\n", + "kwargs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def g(a, xs=[]):\n", + " xs.append(a)\n", + " return xs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "g(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "g(2)" + ] + }, + { 
+ "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "h = lambda x, y, z: x**2 + y**2 + z**2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "h(1,2,3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from functools import lru_cache" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def fib(n):\n", + "    print(n, end=', ')\n", + "    if n <= 1:\n", + "        return n\n", + "    else:\n", + "        return fib(n-2) + fib(n-1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fib(10)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@lru_cache(maxsize=100)\n", + "def fib_cache(n):\n", + "    print(n, end=', ')\n", + "    if n <= 1:\n", + "        return n\n", + "    else:\n", + "        return fib_cache(n-2) + fib_cache(n-1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fib_cache(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Classes \n", + "\n", + "- Key idea is encapsulation into objects \n", + "- Everything in Python is an object \n", + "- Attributes and methods \n", + "- What is self? 
\n", + "- Special methods - double underscore methods \n", + "- Avoid complex inheritance schemes - prefer composition \n", + "- Learn “design patterns” if interested in OOP" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(3.0).is_integer()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'hello world'.title()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class Student:\n", + "    def __init__(self, first, last):\n", + "        self.first = first\n", + "        self.last = last\n", + "    \n", + "    @property\n", + "    def name(self):\n", + "        return f'{self.first} {self.last}' " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = Student('Santa', 'Claus')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s.name" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Enums\n", + "\n", + "Use enums for readability when you have a discrete set of CONSTANTS." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from enum import Enum" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class Day(Enum):\n", + "    MON = 1\n", + "    TUE = 2\n", + "    WED = 3\n", + "    THU = 4\n", + "    FRI = 5\n", + "    SAT = 6\n", + "    SUN = 7" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for day in Day:\n", + "    print(day)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### NamedTuple" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from collections import namedtuple" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Student = namedtuple('Student', ['name', 'email', 'age', 'gpa', 'species'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', 23, 3.4, 'Human')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "abe.species" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "abe[1:4]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data Classes\n", + "\n", + "Simplifies creation and use of classes for data records. \n", + "\n", + "Note: NamedTuple serves a similar function but is immutable." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from dataclasses import dataclass" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@dataclass\n", + "class Student:\n", + " name: str\n", + " email: str\n", + " age: int\n", + " gpa: float\n", + " species: str = 'Human'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "abe = Student('Abraham Lincoln', 'abe.lincoln@gmail.com', age=23, gpa=3.4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "abe" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "abe.email" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "abe.species" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**\n", + "\n", + "The type annotations are informative only. Python does *not* enforce them." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Student(*'abcde')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Imports, modules and namespaces \n", + "\n", + "- A namespace is basically just a dictionary \n", + "- LEGB \n", + "- Avoid polluting the global namespace" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[x for x in dir(__builtin__) if x[0].islower()][:8]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x1 = 23\n", + "\n", + "def f1(x2):\n", + "    print(locals())\n", + "    # x1 is global (G), x2 is enclosing (E), x3 is local (L)\n", + "    def g(x3):\n", + "        print(locals())\n", + "        return x3 + x2 + x1 \n", + "    return g" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x = 23\n", + "\n", + "def f2(x):\n", + "    print(locals())\n", + "    def g(x):\n", + "        print(locals())\n", + "        return x \n", + "    return g" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "g1 = f1(3)\n", + "g1(2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "g2 = f2(3)\n", + "g2(2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Loops \n", + "\n", + "- Prefer vectorization unless using numba \n", + "- Difference between continue and break \n", + "- Avoid infinite loops \n", + "- Comprehensions and generator expressions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import string" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "{char: ord(char) for char in string.ascii_lowercase}" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "### Iterations and generators \n", + "\n", + "- The iterator protocol\n", + "    - `__iter__` and `__next__`\n", + "    - iter()\n", + "    - next()\n", + "- What happens in a for loop\n", + "- Generators with `yield` and `yield from`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class Iterator:\n", + "    \"\"\"A silly class that implements the Iterator protocol and Strategy pattern.\n", + "    \n", + "    start = start of range to apply func to\n", + "    stop = end of range to apply func to\n", + "    \"\"\"\n", + "    def __init__(self, start, stop, func):\n", + "        self.start = start\n", + "        self.stop = stop\n", + "        self.func = func\n", + "    \n", + "    def __iter__(self):\n", + "        self.n = self.start\n", + "        return self\n", + "    \n", + "    def __next__(self):\n", + "        if self.n >= self.stop:\n", + "            raise StopIteration\n", + "        else:\n", + "            x = self.func(self.n)\n", + "            self.n += 1\n", + "            return x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sq = Iterator(0, 5, lambda x: x*x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(sq)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Generators\n", + "\n", + "Like functions, but lazy." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def cycle1(xs, n):\n", + "    \"\"\"Cycles through values in xs n times.\"\"\"\n", + "    \n", + "    for i in range(n):\n", + "        for x in xs:\n", + "            yield x" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(cycle1([1,2,3], 4))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for x in cycle1(['ann', 'bob', 'stop', 'charles'], 1000):\n", + "    if x == 'stop':\n", + "        break\n", + "    else:\n", + "        print(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def cycle2(xs, n):\n", + "    \"\"\"Cycles through values in xs n times.\"\"\"\n", + "    \n", + "    for i in range(n):\n", + "        yield from xs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(cycle2([1,2,3], 4))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because they are lazy, generators can be used for infinite streams." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def fib():\n", + "    a, b = 1, 1\n", + "    while True:\n", + "        yield a\n", + "        a, b = b, a + b" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for n in fib():\n", + "    if n > 100:\n", + "        break\n", + "    print(n, end=', ')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can even slice infinite generators. More when we cover functional programming." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import itertools as it" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(it.islice(fib(), 5, 10))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}