Add asynchronous processing libraries. (#331)
The first couple of async tools. I don't have tonnes of experience with
async stuff so I'm probably not the best person to decide the 🚦s.

We use `asyncio` for [INFORMus](https://github.com/inform-us/INFORMus/)
(and probably a fair few other web APIs??) and @paddyroddy likes
`futures`.

## Continues

- #323 
- #178

---------

Co-authored-by: Matt Graham <[email protected]>
samcunliffe and matt-graham authored Mar 25, 2024
1 parent 06fd1b2 commit 01dd7b2
Showing 1 changed file with 25 additions and 9 deletions.
34 changes: 25 additions & 9 deletions docs/pages/parallel-async.md
@@ -5,8 +5,8 @@ layout: default

# Parallel and asynchronous processing

Python has a good ecosystem of libraries for parallelising the processing of tasks,
as well as asynchronous processing.
Python has a good ecosystem of libraries for parallelising the processing of
tasks, as well as asynchronous processing.

Parallelisation in Python is typically _process-based_ with code parallelised
across multiple Python processes each with their own interpreter or makes use of
@@ -21,13 +21,14 @@ simply due to pre-existing code using a library like [pandas].

## Process-based (and thread-based) parallelism

| Name | Short description | 🚦 |
| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-: |
| [multiprocess] | A fork of [multiprocessing] which uses `dill` instead of `pickle` to allow serializing wider range of object types including nested / anonymous functions. We've found this easier to use than `multiprocessing`. | 🟢 |
| [dask] | Aims to make scaling existing code in familiar libraries (`numpy`, [pandas], `scikit-learn`, ...) easy. | 🟠 |
| [multiprocessing] | The standard library module for distributing tasks across multiple processes. | 🟠 |
| [mpi4py] | Support for MPI based parallelism. | 🟠 |
| [threading] | The standard library module for multi-threading. Due to the _global interpreter lock_ [currently][PEP703] only one thread can execute Python code at a time. | 🔴 |
| Name | Short description | 🚦 |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-: |
| [multiprocess] | A fork of [multiprocessing] which uses `dill` instead of `pickle` to allow serializing wider range of object types including nested / anonymous functions. We've found this easier to use than `multiprocessing`. | 🟢 |
| [concurrent.futures] | [See the table below](#asynchronous-processing). | 🟠 |
| [dask] | Aims to make scaling existing code in familiar libraries (`numpy`, [pandas], `scikit-learn`, ...) easy. | 🟠 |
| [multiprocessing] | The standard library module for distributing tasks across multiple processes. | 🟠 |
| [mpi4py] | Support for MPI based parallelism. | 🟠 |
| [threading] | The standard library module for multi-threading. Due to the _global interpreter lock_ [currently][PEP703] only one thread can execute Python code at a time. | 🔴 |
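
As a rough illustration of the common interface that `concurrent.futures` offers, the sketch below maps a toy function over a thread pool. The names `square` and `map_with` are made up for this example. Passing `ProcessPoolExecutor` instead should work the same way, provided the function is picklable and, on platforms that spawn worker processes, the call sits under an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def square(x: int) -> int:
    """Toy workload (hypothetical) standing in for a real task."""
    return x * x


def map_with(executor_cls: type[Executor], values: list[int]) -> list[int]:
    """Apply `square` to each value using the given executor class."""
    # Both ThreadPoolExecutor and ProcessPoolExecutor accept max_workers
    # and expose the same map / submit methods.
    with executor_cls(max_workers=2) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(square, values))


squares = map_with(ThreadPoolExecutor, [1, 2, 3])
print(squares)
```

Because of the global interpreter lock, the thread-based version mainly helps with I/O-bound work; CPU-bound work is usually better served by a process pool.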

## Compiler-based parallelism

@@ -37,6 +38,19 @@ simply due to pre-existing code using a library like [pandas].
| [numba] | [Support for parallelism via `jit(parallel=True)`](https://numba.pydata.org/numba-doc/latest/user/parallel.html). | 🟠 |
| [jax] | [Support for parallelising NumPy / scientific computing like operations using functional transforms](https://jax.readthedocs.io/en/latest/jax-101/06-parallelism.html). | 🟠 |

## Asynchronous processing

| Name | Short description | 🚦 |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-: |
| [asyncio] | Python standard library for asynchronous programming with tasks run in a single-threaded event loop. Used for [cooperative multitasking](https://en.wikipedia.org/wiki/Cooperative_multitasking). | 🟠 |
| [concurrent.futures] | Another Python standard library for asynchronous processing. Provides a common interface for thread and process based concurrency as an alternative to using `multiprocess(ing)` or `threading` directly.  | 🟠 |
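
A minimal sketch of cooperative multitasking with `asyncio`, assuming two toy coroutines that only sleep. The names `fetch` and `main` and the delays are illustrative and not taken from any project mentioned here. Each `await` hands control back to the single-threaded event loop so the other task can make progress.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    """Hypothetical task that pretends to wait on I/O."""
    # Awaiting yields control to the event loop instead of blocking the thread.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list[str]:
    # gather runs the coroutines concurrently and returns their results
    # in the order the awaitables were passed, not in completion order.
    return await asyncio.gather(
        fetch("a", 0.02),
        fetch("b", 0.01),
    )


results = asyncio.run(main())
print(results)
```

Since both sleeps overlap, the total run time is roughly that of the longest delay rather than the sum of the two.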

## See also

- This [Stack Overflow post](https://stackoverflow.com/a/61360215) is a nice
summary of what each of [threading], [multiprocessing], [asyncio] and
[concurrent.futures] do.

<!-- URLs for more readable tables and text above 👆 -->

[multiprocess]: https://multiprocess.readthedocs.io/en/stable/
@@ -49,3 +63,5 @@ simply due to pre-existing code using a library like [pandas].
[dask]: https://docs.dask.org/
[numba]: https://numba.pydata.org/
[jax]: https://jax.readthedocs.io/
[asyncio]: https://docs.python.org/3/library/asyncio.html
[concurrent.futures]: https://docs.python.org/3/library/concurrent.futures.html
