diff --git a/Python-ML/Python-for-ML_01.ipynb b/Python-ML/Python-for-ML_01.ipynb index 2085b3a..d5b8ed6 100644 --- a/Python-ML/Python-for-ML_01.ipynb +++ b/Python-ML/Python-for-ML_01.ipynb @@ -4,9 +4,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Python review of concepts\n", + "# Python for AI/ML: Review of concepts\n", "\n", - "Mainly to point out useful aspects of Python you may have glossed over. Assumes you already know Python fairly well." + "Mainly to point out useful aspects of Python you may have glossed over. Assumes you already know Python fairly well.\n", + "\n", + "## Acknowledgments \\& Credits\n", + "\n", + "This lesson is adapted largely from the excellent curriculum materials by Cliburn Chan (2021) at under the MIT License." ] }, { @@ -22,6 +26,7 @@ "source": [ "### Why Python? \n", "\n", + "Modules, executable notebooks, and tutorials exist in several programming languages widely used (or increasingly popular) in the life sciences (e.g., R, Julia, Rust). Why did we choose to focus on Python here? \n", "- Huge community - especially in data science and ML \n", "- Easy to learn \n", "- Batteries included \n", @@ -30,32 +35,15 @@ "- Most important “glue” language bridging multiple communities" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import __hello__" - ] - }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Versions \n", "\n", - "- Only use Python 3 (current release version is 3.8, container is 3.7) \n", - "- Do not use Python 2" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import sys" + "- Only use **Python 3**. ML frameworks (Tensorflow, PyTorch) at recent versions require at least Python v3.8. 
(Tensorflow 1.x can use up to Python v3.7; Tensorflow v2.2+ can use Python 3.8.)\n", + "- Do not use Python 2.\n", + "- Container has Python 3.11:" ] }, { @@ -64,6 +52,7 @@ "metadata": {}, "outputs": [], "source": [ + "import sys\n", "sys.version" ] }, @@ -196,7 +185,7 @@ "metadata": {}, "outputs": [], "source": [ - "type(1), type(2.3)" + "type(1), type(1.0), type(2.3)" ] }, { @@ -228,53 +217,52 @@ "np.arange(3) + 10" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.array([1,2,3]) + np.array([4,5,6])" + ] + }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Several Python implementations! \n", - "\n", - "- CPtyhon \n", - "- Pypy \n", - "- IronPython \n", - "- Jython" + "## Coding in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Global interpreter lock (GIL) \n", + "### Coding conventions \n", "\n", - "- Only applies to CPython\n", - "- Threads vs processes \n", - "- Avoid threads in general \n", - "- Performance not predictable" + "- PEP 8 \n", + "- Avoid magic numbers \n", + "- Avoid copy and paste \n", + "- extract common functionality into functions" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "from concurrent.futures import ThreadPoolExecutor" + "[Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "def f(n):\n", - " x = np.random.uniform(0,1,n)\n", - " y = np.random.uniform(0,1,n)\n", - " count = 0\n", - " for i in range(n):\n", - " if x[i]**2 + y[i]**2 < 1:\n", - " count += 1\n", - " return count*4/n" + "### Data types \n", + "\n", + "#### Integers \n", + "- Arbitrary precision \n", + "- Integer division operator \n", + "- Base conversion \n", + "- Check if integer " ] }, { @@ -283,8 +271,7 @@ "metadata": {}, 
"outputs": [], "source": [ - "n = 100000\n", - "niter = 4" + "import math" ] }, { @@ -293,9 +280,7 @@ "metadata": {}, "outputs": [], "source": [ - "%%time\n", - "\n", - "[f(n) for i in range(niter)]" + "n = math.factorial(100)" ] }, { @@ -304,18 +289,16 @@ "metadata": {}, "outputs": [], "source": [ - "%%time\n", - "\n", - "with ThreadPoolExecutor(4) as pool:\n", - " xs = list(pool.map(f, [n]*niter))\n", - "xs" + "n" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "## Coding in Python" + "f'{n:,}'" ] }, { @@ -324,39 +307,25 @@ "metadata": {}, "outputs": [], "source": [ - "import this" + "sys.getsizeof(64), sys.getsizeof(2**31), sys.getsizeof(2**60)" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "### Coding conventions \n", - "\n", - "- PEP 8 \n", - "- Avoid magic numbers \n", - "- Avoid copy and paste \n", - "- extract common functionality into functions" + "sys.getsizeof(n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "[Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Data types \n", - "\n", - "- Integers \n", - " - Arbitrary precision \n", - " - Integer division operator \n", - " - Base conversion \n", - " - Check if integer " + "#### Floats\n", + "- Checking for equality \n", + "- Roundoff error accumulation " ] }, { @@ -365,7 +334,7 @@ "metadata": {}, "outputs": [], "source": [ - "import math" + "h = math.sqrt(3**2 + 4**2)" ] }, { @@ -374,7 +343,7 @@ "metadata": {}, "outputs": [], "source": [ - "n = math.factorial(100)" + "h" ] }, { @@ -383,7 +352,7 @@ "metadata": {}, "outputs": [], "source": [ - "n" + "h.is_integer()" ] }, { @@ -392,7 +361,7 @@ "metadata": {}, "outputs": [], "source": [ - "f'{n:,}'" + "h == 5" ] }, { @@ -401,7 +370,8 @@ "metadata": {}, "outputs": [], "source": 
[ - "h = math.sqrt(3**2 + 4**2)" + "x = np.arange(9).reshape(3,3)\n", + "x" ] }, { @@ -410,7 +380,7 @@ "metadata": {}, "outputs": [], "source": [ - "h" + "x.sum(axis=0)" ] }, { @@ -419,17 +389,17 @@ "metadata": {}, "outputs": [], "source": [ - "h.is_integer()" + "x = x / x.sum(axis=0)\n", + "z = np.linalg.eigvals(x)" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "- Floats \n", - " - Checking for equality \n", - " - Catastrophic cancellation \n", - "- Complex" + "z" ] }, { @@ -438,9 +408,7 @@ "metadata": {}, "outputs": [], "source": [ - "x = np.arange(9).reshape(3,3)\n", - "x = x / x.sum(axis=0)\n", - "λ = np.linalg.eigvals(x)" + "z[0] == 1" ] }, { @@ -449,7 +417,7 @@ "metadata": {}, "outputs": [], "source": [ - "λ[0]" + "z[0], z[1], z[2]" ] }, { @@ -458,16 +426,18 @@ "metadata": {}, "outputs": [], "source": [ - "λ[0] == 1" + "math.isclose(z[0], 1)" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "math.isclose(λ[0], 1)" + "Sample variance:\n", + "$$var(X)=\\frac{\\sum_i^N(x_i-\\mu)^2}{N-1}; \\mu=\\frac{1}{N}\\sum_i^Nx_i$$\n", + "\n", + "Short-cut formula to avoid having to first calculate sample mean $\\mu$:\n", + "$$var(X)=\\frac{\\sum_i^Nx_i^2 - \\frac{1}{N}(\\sum_i^Nx_i)^2}{N-1}$$" ] }, { @@ -498,7 +468,8 @@ "metadata": {}, "outputs": [], "source": [ - "xs = np.random.normal(1e9, 1, int(1e6))" + "xs = np.random.normal(1e9, 1, int(1e6))\n", + "xs, type(xs[0])" ] }, { @@ -523,8 +494,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "- Boolean \n", - " - What evaluates as False? " + "#### Boolean \n", + "- What evaluates as False? 
" ] }, { @@ -545,9 +516,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "- String \n", - " - Unicode by default \n", - " - b, r, f strings" + "#### String \n", + "- Unicode by default \n", + "- b, r, f strings" ] }, { @@ -559,6 +530,45 @@ "u'\\u732b'" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = 'ACGT'\n", + "type(s), list(s)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s = b'ACGT'\n", + "type(s), list(s)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "r\"C:\\Users\\Name\\Documents\\file.txt\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"String with newline\\nThis is after the newline\")\n", + "print(r\"String with newline\\nThis is after the newline\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -659,6 +669,56 @@ "[x for x in dir(collections) if not x.startswith('_')]" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = {'a': 1, 'b': 'foo', 'c': 1e-6}\n", + "d" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "kw = {'b': 'bar', 'z': 5}\n", + "{**d, **kw}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "kw2 = kw\n", + "kw2['z'] = 10\n", + "kw" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "l = [1,2,3]\n", + "l" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[*l]" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -678,9 +738,15 @@ "metadata": {}, "outputs": [], "source": [ - "def f(*args, **kwargs):\n", - " print(f\"args = 
{args}\") # in Python 3.8, you can just write f'{args = }'\n", - " print(f\"kwargs = {kwargs}\")" + "def f(*args, c=1, **kwargs):\n", + " print(f\"f() c = {c}\")\n", + " print(f\"f() args = {args}\")\n", + " print(f\"f() kwargs = {kwargs}\")\n", + " return g(*args, **kwargs)\n", + "\n", + "def g(*args, **kwargs):\n", + " print(f\"g() args = {args}\")\n", + " print(f\"g() kwargs = {kwargs}\")" ] }, { @@ -698,9 +764,9 @@ "metadata": {}, "outputs": [], "source": [ - "def g(a, xs=[]):\n", - " xs.append(a)\n", - " return xs" + "l = [4,5,6]\n", + "k = {'c':10, 'a': 4, 'b': 5}\n", + "f(l, k)" ] }, { @@ -709,7 +775,7 @@ "metadata": {}, "outputs": [], "source": [ - "g(1)" + "f(*l, **k)" ] }, { @@ -718,16 +784,9 @@ "metadata": {}, "outputs": [], "source": [ - "g(2)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "h = lambda x, y, z: x**2 + y**2 + z**2" + "def g(a, xs=[]):\n", + " xs.append(a)\n", + " return xs" ] }, { @@ -736,7 +795,7 @@ "metadata": {}, "outputs": [], "source": [ - "h(1,2,3)" + "g(1)" ] }, { @@ -745,7 +804,7 @@ "metadata": {}, "outputs": [], "source": [ - "from functools import lru_cache" + "g(2)" ] }, { @@ -754,12 +813,7 @@ "metadata": {}, "outputs": [], "source": [ - "def fib(n):\n", - " print(n, end=', ')\n", - " if n <= 1:\n", - " return n\n", - " else:\n", - " return fib(n-2) + fib(n-1)" + "h = lambda x, y, z: x**2 + y**2 + z**2" ] }, { @@ -768,7 +822,7 @@ "metadata": {}, "outputs": [], "source": [ - "fib(10)" + "h(1,2,3)" ] }, { @@ -777,13 +831,13 @@ "metadata": {}, "outputs": [], "source": [ - "@lru_cache(maxsize=100)\n", - "def fib_cache(n):\n", - " print(n, end=', ')\n", - " if n <= 1:\n", - " return n\n", - " else:\n", - " return fib_cache(n-2) + fib_cache(n-1)" + "class VerboseFunc:\n", + " def __init__(self, func):\n", + " self.func = func\n", + "\n", + " def __call__(self, *args):\n", + " print(f'called with args = {args} for function {str(self.func)}')\n", + " return 
self.func(*args)\n" ] }, { @@ -792,7 +846,9 @@ "metadata": {}, "outputs": [], "source": [ - "fib_cache(10)" + "import math\n", + "verbose_prod = VerboseFunc(lambda x, y: x * y)\n", + "verbose_prod(4,5)" ] }, { @@ -838,10 +894,21 @@ " def __init__(self, first, last):\n", " self.first = first\n", " self.last = last\n", - " \n", + "\n", " @property\n", " def name(self):\n", - " return f'{self.first} {self.last}' " + " return f'{self.first} {self.last}'\n", + "\n", + " @staticmethod\n", + " def is_student(obj):\n", + " return isinstance(obj, Student)\n", + "\n", + " @classmethod\n", + " def fromlist(cls, l):\n", + " print(cls)\n", + " while len(l) < 2:\n", + " l = l + ['Nameless']\n", + " return cls(*l)" ] }, { @@ -862,6 +929,77 @@ "s.name" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Student.is_student(s)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s2 = Student.fromlist(['Santa','Claus'])\n", + "type(s2), s2.name" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class GraduateStudent(Student):\n", + " def __init__(self, first, last, program='Masters'):\n", + " super().__init__(first,last)\n", + " self.program = program\n", + "\n", + " @property\n", + " def name(self):\n", + " return super().name + f' ({self.program} program)'\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s3 = GraduateStudent.fromlist(['Santa','Claus','PhD'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s3.name" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Student.is_student(s3), GraduateStudent.is_student(s3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + 
"outputs": [], + "source": [ + "GraduateStudent.fromlist(['John']).name" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -896,6 +1034,16 @@ " SUN = 7" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from collections.abc import Iterable\n", + "type(Day), isinstance(Day, Iterable)" + ] + }, { "cell_type": "code", "execution_count": null, @@ -906,6 +1054,15 @@ " print(day)" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "[str(day) for day in Day]" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -1054,7 +1211,7 @@ "### Imports, modules and namespaces \n", "\n", "- A namespace is basically just a dictionary \n", - "- LEGB \n", + "- **L**(ocal)**E**(nclosing)**G**(lobal)**B**(uiltin) \n", "- Avoid polluting the global namespace" ] }, @@ -1093,9 +1250,9 @@ "x = 23\n", "\n", "def f2(x):\n", - " print(locals())\n", + " print('enclosing locals:', locals())\n", " def g(x):\n", - " print(locals())\n", + " print('enclosed locals:', locals())\n", " return x \n", " return g" ] @@ -1126,7 +1283,7 @@ "source": [ "### Loops \n", "\n", - "- Prefer vectorization unless using numba \n", + "- Prefer vectorization unless using [Numba](https://numba.pydata.org) \n", "- Difference between continue and break \n", "- Avoid infinite loops \n", "- Comprehensions and generator expressions" @@ -1228,7 +1385,7 @@ "outputs": [], "source": [ "def cycle1(xs, n):\n", - " \"\"\"Cuycles through values in xs n times.\"\"\"\n", + " \"\"\"Cycles through values in xs n times.\"\"\"\n", " \n", " for i in range(n):\n", " for x in xs:\n", @@ -1264,7 +1421,7 @@ "outputs": [], "source": [ "def cycle2(xs, n):\n", - " \"\"\"Cuycles through values in xs n times.\"\"\"\n", + " \"\"\"Cycles through values in xs n times.\"\"\"\n", " \n", " for i in range(n):\n", " yield from xs" @@ -1335,18 +1492,11 @@ "source": [ "list(it.islice(fib(), 5, 10))" ] - }, - { - "cell_type": "code", 
- "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -1360,7 +1510,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.11.10" } }, "nbformat": 4,