From 43302fe279ead9ad62510677759598ec6c346a02 Mon Sep 17 00:00:00 2001
From: Kyle Barron
Date: Sun, 8 Oct 2023 21:24:04 -0400
Subject: [PATCH] dev and perf docs (#62)

---
 DEVELOP.md          | 63 +++++++++++++++++++++++++++++++++++++++++++++
 README.md           |  2 ++
 docs/index.md       |  1 +
 docs/performance.md | 53 ++++++++++++++++++++++++++++++++++++++
 mkdocs.yml          |  1 +
 5 files changed, 120 insertions(+)
 create mode 100644 DEVELOP.md
 create mode 100644 docs/performance.md

diff --git a/DEVELOP.md b/DEVELOP.md
new file mode 100644
index 00000000..c6fbb508
--- /dev/null
+++ b/DEVELOP.md
@@ -0,0 +1,63 @@
+# Developer Documentation
+
+## Python
+
+This project uses [Poetry](https://python-poetry.org/) to manage Python dependencies.
+
+After installing Poetry, run
+
+```
+poetry install
+```
+
+to install all dependencies.
+
+To register the current Poetry-managed Python environment with JupyterLab, run
+
+```
+poetry run python -m ipykernel install --user --name "lonboard"
+```
+
+JupyterLab is an included dev dependency, so to start JupyterLab you can run
+
+```
+poetry run jupyter lab
+```
+
+Then you should see a tile on the home screen that lets you open a Jupyter Notebook in the `lonboard` environment. You should also be able to open up an example notebook from the `examples/` folder.
+
+## JavaScript
+
+The JavaScript dependencies are managed in `package.json` and tracked with Yarn or NPM (I haven't been consistent at using one or the other :sweat_smile:).
+
+ESBuild is used for bundling into an ES Module that the Jupyter Widget loads at runtime. The ESBuild configuration is in `build.mjs`. You can run the script with
+
+```
+yarn build
+```
+
+I often run
+
+```
+fswatch -o src | xargs -n1 -I{} yarn build
+```
+
+to watch the `src` directory and run `yarn build` anytime it changes.
+
+Currently, each Python model (the `ScatterplotLayer`, `PathLayer`, and `SolidPolygonLayer` classes) uses _its own individual JS entry point_. You can inspect this with the `_esm` key on each class, which is used by anywidget to load in the widget. The ESBuild script converts `scatterplot-layer.tsx`, `path-layer.tsx`, and `solid-polygon-layer.tsx` into bundles used by each class, respectively.
+
+Anywidget and its dependency ipywidgets handle the serialization from Python into JS, automatically keeping each side in sync.
+
+## Documentation website
+
+The documentation website is generated with `mkdocs` and [`mkdocs-material`](https://squidfunk.github.io/mkdocs-material). After `poetry install`, you can serve the docs website locally with
+
+```
+poetry run mkdocs serve
+```
+
+and you can publish the docs to GitHub Pages with
+
+```
+poetry run mkdocs gh-deploy
+```
diff --git a/README.md b/README.md
index da452bf4..5470b3c5 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,7 @@
 # lonboard
 
+Python library for extremely fast geospatial data visualization in Jupyter.
+
 ![](docs/img/scatterplot-layer-network-speeds.jpg)
 
 ## Install
diff --git a/docs/index.md b/docs/index.md
index e69de29b..cba8ec97 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -0,0 +1 @@
+# lonboard
diff --git a/docs/performance.md b/docs/performance.md
new file mode 100644
index 00000000..bbc6432d
--- /dev/null
+++ b/docs/performance.md
@@ -0,0 +1,53 @@
+# Performance
+
+Performance is a critical goal of lonboard. Below are a couple of pieces of information you should know to understand lonboard's performance characteristics, as well as some advice for how to get the best performance.
+
+## Performance Characteristics
+
+There are two distinct parts to the performance of **lonboard**: one is the performance of transferring data to the browser and the other is the performance of rendering the data once it's there.
+
+In general, these parts are completely distinct. Even if it takes a while to load the data in your browser, the map might be snappy once it loads, and vice versa.
+
+### Data Transfer
+
+Lonboard creates an interactive visualization of your data in your browser. In order to do this, your GeoDataFrame needs to be transferred from your Python environment to your browser.
+
+In the case where your Python session is running locally (on the same machine as your browser), this data transfer is extremely fast: less than a second in most cases.
+
+However, in the case where your Python session is running on a remote server (such as [Google Colab](https://colab.research.google.com/), [Binder](https://mybinder.readthedocs.io/en/latest/introduction.html), or a JupyterHub instance), this data transfer means **downloading the data to your local browser**. Therefore, when running lonboard from a remote server, your internet speed and the quantity of data you pass into a layer will have large impacts on the data transfer speed.
+
+Under the hood, lonboard uses efficient compression (in the form of [GeoParquet](https://geoparquet.org/)) to transfer data to the browser, but compression can only do so much; the data still needs to be downloaded.
+
+### Rendering Performance
+
+Once the data has been transferred from your Python session to your browser, it needs to be rendered.
+
+The biggest thing to note is that, in contrast to projects like [datashader](https://datashader.org/), lonboard **does not minimize the amount of data being rendered**. If you pass a GeoDataFrame with 10 million coordinates, lonboard will attempt to render all 10 million coordinates at once.
+
+The primary determinant of the maximum amount of data you can render with lonboard is your computer's hardware. Via the underlying [deck.gl](https://deck.gl/) library, lonboard ultimately renders geometries using your computer's Graphics Processing Unit (GPU). If you have a more powerful GPU, you'll be able to visualize more data.
+
+Lonboard is more efficient at rendering than previous libraries, but there will always be _some quantity of data_ beyond which your browser tab is likely to crash while attempting to render. Testing on a recent MacBook Pro M2 computer, lonboard has been able to render a few million points with minimal lag.
+
+## Performance Advice
+
+### Use a local Python session
+
+Moving from a remote Python environment to a local Python environment is often impractical, but this change will make it much faster to visualize data, especially over slow internet connections.
+
+### Remove columns before rendering
+
+All columns included in the `GeoDataFrame` will be transferred to the browser for visualization. (In the future, these other columns will be used to display a tooltip when hovering over/clicking on a geometry.)
+
+Especially in the case of a remote Python session, excluding unnecessary attribute columns will make data transfer to the browser faster.
+
+### Use Arrow-based data types in Pandas
+
+As of Pandas 2.0, Pandas supports two backends for data types: either the original numpy-based data types or new data types based on Arrow and the pyarrow library.
+
+The first thing that lonboard does when visualizing data is convert from Pandas to an Arrow representation. Any non-geometry attribute columns will be converted to Arrow, so if you're using Arrow-based data types in Pandas already, this step will be "free" as no conversion is needed.
+
+See the pandas [guide on data types](https://pandas.pydata.org/docs/user_guide/pyarrow.html) and the [`pandas.ArrowDtype` class](https://pandas.pydata.org/docs/reference/api/pandas.ArrowDtype.html).
+
+### Simplify geometries before rendering
+
+Simplifying geometries before rendering reduces the total number of coordinates and can make a visualization snappier. At this point, lonboard does not offer built-in geometry simplification. This is something you would need to do before passing data to lonboard.
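
To illustrate the "Simplify geometries before rendering" advice in `docs/performance.md`, here is a minimal sketch that sits outside the patch hunks. It assumes a line is a plain list of planar `(x, y)` tuples and uses a hand-written Douglas-Peucker routine. The names `simplify_line` and `_segment_distance` are hypothetical and are not part of lonboard. In practice you would more likely call an existing routine, such as GeoPandas' `GeoSeries.simplify(tolerance)`, on the geometry column before passing the GeoDataFrame to a layer.

```python
import math


def _segment_distance(point, start, end):
    """Distance from `point` to the segment start-end, in planar coordinates."""
    (px, py), (sx, sy), (ex, ey) = point, start, end
    dx, dy = ex - sx, ey - sy
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - sx, py - sy)
    # Project the point onto the segment and clamp to its endpoints.
    t = ((px - sx) * dx + (py - sy) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (sx + t * dx), py - (sy + t * dy))


def simplify_line(coords, tolerance):
    """Douglas-Peucker sketch: drop vertices within `tolerance` of the kept line."""
    if len(coords) < 3:
        return list(coords)
    # Find the vertex farthest from the chord joining the two endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(coords) - 1):
        d = _segment_distance(coords[i], coords[0], coords[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Every interior vertex is close enough: keep only the endpoints.
        return [coords[0], coords[-1]]
    # Otherwise keep the farthest vertex and simplify both halves.
    left = simplify_line(coords[: index + 1], tolerance)
    right = simplify_line(coords[index:], tolerance)
    return left[:-1] + right


line = [(0.0, 0.0), (1.0, 0.05), (2.0, -0.04), (3.0, 0.0),
        (4.0, 2.0), (5.0, 2.02), (6.0, 2.0)]
simplified = simplify_line(line, tolerance=0.1)
print(len(line), "->", len(simplified))  # 7 -> 4
```

Fewer coordinates mean less data to transfer and fewer vertices for the GPU to draw. The tolerance is in the same units as the coordinates, so a value suited to projected meters would be far too coarse for longitude/latitude degrees.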
diff --git a/mkdocs.yml b/mkdocs.yml
index f4c8eb9b..da678321 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -21,6 +21,7 @@ nav:
 - ScatterplotLayer: layers/scatterplot-layer.md
 - PathLayer: layers/path-layer.md
 - SolidPolygonLayer: layers/solid-polygon-layer.md
+ - Performance: performance.md
 - "How it works?": how-it-works.md
 
 theme: