
NYC_buildings: Modernize notebook #386

Open: 21 commits to merge into base: main

Conversation

Azaya89
Collaborator

@Azaya89 Azaya89 commented May 17, 2024

Modernizing an example checklist

Preliminary checks

  • Look for open PRs and issues that reference the project you are updating. Previous unmerged work in a PR might be reusable to modernize the project. Comment on these PRs and issues when appropriate; hopefully we can close some of them after your modernizing work.

Change ‘anaconda-project.yml’ to use the latest workable version of packages

  • Pin python=3.11
  • Remove the upper pin (e.g. hvplot<0.9 to hvplot, panel>=0.12,<1.0 to panel>=0.12) of all other dependencies. Removing the upper pins of dependencies could necessitate code revisions in the notebooks to address any errors encountered in the updated environment. Should complexities or extensive time requirements arise, document issues for team discussion on whether to re-pin specific packages or explore other solutions.
  • Add/update the lower pin of all other dependencies (e.g. hvplot to hvplot>=0.9.2, hvplot>=0.8 to hvplot>=0.9.2). Usually, the new/updated lower pin of a dependency will be the version resolved after anaconda-project prepare has been run. Execute !conda list in a notebook, or anaconda-project run conda list in the terminal, to display the version of each dependency installed in the environment. Adjusting the lower pin helps ensure that the locks produced for each platform (linux-64, win-64, osx-64, osx-arm64) rely on the tested dependencies and not on some older versions.
  • If the channels include conda-forge or pyviz, ask Maxime if they can be removed
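
The two pin rules above (drop upper pins, then set the lower pin to the version resolved in the environment) can be sketched in plain Python. This is only an illustration: `strip_upper_pins` and `set_lower_pin` are hypothetical helper names, not part of anaconda-project, and the spec strings are the examples given in the checklist.

```python
import re

# Hypothetical helpers that mirror the checklist's pin rules; not an anaconda-project API.
_NAME = re.compile(r"^[A-Za-z0-9_.\-]+")


def strip_upper_pins(spec: str) -> str:
    """Drop '<' and '<=' constraints from a spec such as 'panel>=0.12,<1.0'."""
    name = _NAME.match(spec).group(0)
    constraints = [c.strip() for c in spec[len(name):].split(",") if c.strip()]
    kept = [c for c in constraints if not c.startswith("<")]
    return name + ",".join(kept)


def set_lower_pin(spec: str, installed_version: str) -> str:
    """Replace any existing constraints with a single lower pin."""
    name = _NAME.match(spec).group(0)
    return f"{name}>={installed_version}"


print(strip_upper_pins("hvplot<0.9"))
print(strip_upper_pins("panel>=0.12,<1.0"))
print(set_lower_pin("hvplot>=0.8", "0.9.2"))
```

The installed version passed to `set_lower_pin` would come from the `conda list` output mentioned above.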

Plot API updates (discussed on a per-example basis)

  • Generally, try to replace HoloViews usage with hvPlot. At a certain point of complexity, such as with the use of ‘.select’, it might be better to stick with HoloViews. Additional examples of ‘complexity boundaries’ should be documented in this document.
  • Almost always, try to replace the use of datashade with rasterize (read this page). Essentially, rasterize allows Bokeh to handle the colormapping instead of Datashader.

Interactivity API updates (discussed on a per-example basis)

  • Remove all pn.interact usage
  • Avoid .param.watch() usage. This is a pretty low-level and verbose approach and should not be used in Examples unless required, or unless an Example is specifically trying to demo its usage in an advanced workflow.
  • Prefer using pn.bind(). Read this page for explanation.
  • For apps built using a class approach, when they create a view() method and call it directly, update the class by inheriting from pn.viewable.Viewer and replace view() by __panel__(). Here is an example.

Panel App updates (discussed on a per-example basis)

  • If the project doesn’t at any point create a Panel app at all, consider creating one. It can be as simple as wrapping a plot in pn.Column, or more complicated to incorporate widgets, etc. Make the final app .servable().
  • If the project creates an app in a notebook but doesn’t deploy it (i.e. there is no command: dashboard declaration in the anaconda-project.yml file), try adding it.
  • If the project already deploys an app but doesn’t wrap it in a nice template, consider wrapping it in a template.
  • If the project deploys an app wrapped in a template, customize the template a little so all the apps don’t look similar (e.g. change the header background color). This doesn’t need to be discussed.
  • If you are building the application in a single cell, you can construct a template explicitly, like template = pn.template.BootstrapTemplate, but if building up an app across multiple cells, it is probably cleaner to declare the template at the top with pn.extension(template='bootstrap'). See the how-to guide on setting a template.

General code quality updates

  • If the notebook disables warnings (e.g. with warnings.simplefilter('ignore')) somewhere at the start of the notebook, remove this line. Try to update the code to remove the warnings, if any. If updating the code to remove the warnings takes a significant amount of time and effort, bring it up for discussion and we may decide to disable warnings again.
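
One way to find the warnings that need fixing, once the `simplefilter('ignore')` line is gone, is to record them with the standard `warnings` module. A minimal sketch: `noisy` is a made-up stand-in for a notebook cell that triggers a deprecation warning.

```python
import warnings


def noisy():
    # Hypothetical stand-in for a notebook cell that calls a deprecated API.
    warnings.warn("old_api() is deprecated, use new_api()", DeprecationWarning, stacklevel=2)
    return 42


# Record warnings rather than ignoring them, so each one can be fixed at its source.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # surface every warning, even repeated ones
    result = noisy()

messages = [f"{w.category.__name__}: {w.message}" for w in caught]
print(messages)
```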

Text content

  • Edit the text content anywhere and everywhere that it can be improved for clarity.
  • Check the links are valid, and update old links (e.g. http -> https, xyz.pyviz.org -> xyz.holoviz.org)
  • Remove instructions to install packages inside an example
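
The link updates listed above (http -> https, xyz.pyviz.org -> xyz.holoviz.org) could be applied with a small regex helper. This is a sketch only: `modernize_links` is a hypothetical name, and the rewritten links should still be checked by hand, since not every host necessarily serves https.

```python
import re


def modernize_links(text: str) -> str:
    """Rewrite <sub>.pyviz.org hosts to <sub>.holoviz.org and upgrade http:// to https://."""
    # xyz.pyviz.org -> xyz.holoviz.org
    text = re.sub(r"\b([A-Za-z0-9-]+)\.pyviz\.org\b", r"\1.holoviz.org", text)
    # http:// -> https:// (already-https links are left untouched)
    text = re.sub(r"\bhttp://", "https://", text)
    return text


print(modernize_links("See http://datashader.pyviz.org/user_guide for details."))
```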

Visual appearance - Example

  • Check that the titles/headings make sense and are succinct.
  • Check that the text content blocks are easily readable; revise into additional paragraphs if needed.
  • Check that the code blocks are easily readable; revise as needed. (e.g. add spaces after commas in a list if there are none, wrap long lines, etc.)
  • Check image and plot sizes. If possible, making them responsive is highly recommended.
  • Check the appearance on a smartphone (check Google to see how to adapt the appearance of your browser to display pages as if they were seen from a smartphone, this is usually done via the web developer tools). This is not a top priority for all examples, but if there are a few easy and straightforward changes to make that can improve the experience, let’s do it.
  • Compare the updated notebook with the original notebook

Visual appearance - Gallery

  • Check the thumbnail is visually appealing
  • Check the project title is well formatted (e.g. Ml Annotators to ML Annotators), if not, add/update the examples_config.title field in anaconda-project.yml
  • Check the project description is appropriate, if not, update the description field in anaconda-project.yml

Workflow (after you have made the changes above)

  • Run successfully doit validate:<projectname>
  • Run successfully doit test:<projectname>
  • Run successfully doit doc_one --name <projectname>. It’s better if the project notebook(s) is saved with its outputs (but be sure to clear outputs before committing to the examples repo!) when building the docs. Then open this file in your browser ./builtdocs/index.html and check how the site looks.
  • If you’re happy with all the above, open a PR. Reminder, clear notebook outputs before pushing to the PR.
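
The reminder to clear notebook outputs before pushing can be illustrated with the standard library alone, assuming the usual nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). `clear_outputs` and `clear_notebook_file` are hypothetical helper names.

```python
import copy
import json


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs and counters removed."""
    cleared = copy.deepcopy(notebook)
    for cell in cleared.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleared


def clear_notebook_file(path: str) -> None:
    """Clear outputs of the .ipynb file at `path` in place (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(clear_outputs(notebook), f, indent=1)
        f.write("\n")
```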

@Azaya89 Azaya89 self-assigned this May 17, 2024
@Azaya89 Azaya89 requested review from maximlt and philippjfr May 17, 2024 15:10
@Azaya89 Azaya89 marked this pull request as draft May 17, 2024 15:12
@Azaya89 Azaya89 added the NF SDG NumFocus Software Development Grant 2024 label May 17, 2024
Contributor

Your changes were successfully integrated in the dev site, make sure to review
the pages of the projects you touched before merging this PR: https://holoviz-dev.github.io/examples/.
You can also download an archive of the site from the workflow summary page which comes in handy
when your dev site build was overridden by another PR (we have a single dev site!).

@Azaya89 Azaya89 mentioned this pull request May 23, 2024
2 tasks
@Azaya89 Azaya89 marked this pull request as ready for review June 7, 2024 14:28
@Azaya89 Azaya89 force-pushed the modernize_nyc_building branch from 17121aa to 9b6c32c Compare June 7, 2024 14:32
@Azaya89
Collaborator Author

Azaya89 commented Jun 7, 2024

In this PR, pinning notebook<7 prevents geopandas from being imported, so I skipped that step. This is also causing one of the CI failures.
@maximlt

@maximlt
Contributor

maximlt commented Jun 7, 2024

In this PR, pinning notebook<7 prevents geopandas from being imported, so I skipped that step.

Can you please add more details on this?

@Azaya89
Collaborator Author

Azaya89 commented Jun 7, 2024

Can you please add more details on this?

Screenshot 2024-06-07 at 6 24 33 PM

When I run cell2, here's what I get:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[2], line 3
      1 import hvplot.dask # noqa
      2 import hvplot.pandas # noqa
----> 3 import geopandas as gpd
      4 import colorcet as cc
      5 from holoviews import opts

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/geopandas/__init__.py:3
      1 from geopandas._config import options
----> 3 from geopandas.geoseries import GeoSeries
      4 from geopandas.geodataframe import GeoDataFrame
      5 from geopandas.array import points_from_xy

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/geopandas/geoseries.py:13
     10 from pandas import Series, MultiIndex
     11 from pandas.core.internals import SingleBlockManager
---> 13 from pyproj import CRS
     14 import shapely
     15 from shapely.geometry.base import BaseGeometry

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/pyproj/__init__.py:33
      1 """
      2 Python interface to PROJ (https://proj.org),
      3 cartographic projections and coordinate transformations library.
   (...)
     29 SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
     30 """
     31 import warnings
---> 33 import pyproj.network
     34 from pyproj._datadir import (  # noqa: F401 pylint: disable=unused-import
     35     _pyproj_global_context_initialize,
     36     set_use_global_context,
     37 )
     38 from pyproj._show_versions import (  # noqa: F401 pylint: disable=unused-import
     39     show_versions,
     40 )

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/pyproj/network.py:10
      6 from typing import Union
      8 import certifi
---> 10 from pyproj._network import (  # noqa: F401 pylint: disable=unused-import
     11     _set_ca_bundle_path,
     12     is_network_enabled,
     13     set_network_enabled,
     14 )
     17 def set_ca_bundle_path(ca_bundle_path: Union[Path, str, bool, None] = None) -> None:
     18     """
     19     .. versionadded:: 3.0.0
     20 
   (...)
     40         variables.
     41     """

ImportError: dlopen(/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/pyproj/_network.cpython-311-darwin.so, 0x0002): Library not loaded: @rpath/libtiff.5.dylib
  Referenced from: <1BF0DA3A-18BF-3035-BAF9-9B25E936A309> /Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/libproj.25.9.3.1.dylib
  Reason: tried: '/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/libtiff.5.dylib' (no such file), '/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/pyproj/../../../libtiff.5.dylib' (no such file), '/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/pyproj/../../../libtiff.5.dylib' (no such file), '/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/bin/../lib/libtiff.5.dylib' (no such file), '/Users/mac/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/bin/../lib/libtiff.5.dylib' (no such file), '/usr/local/lib/libtiff.5.dylib' (no such file), '/usr/lib/libtiff.5.dylib' (no such file, not in dyld cache)
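
The error above says that libproj was linked against libtiff.5.dylib, which is not present in the environment. A quick way to see which libtiff files the environment does contain is to list them from Python. This is a diagnostic sketch only: `find_shared_libs` is a hypothetical helper, and it assumes the conda-style layout `<prefix>/lib` seen in the traceback paths.

```python
import sys
from pathlib import Path


def find_shared_libs(prefix, stem="libtiff"):
    """List file names under <prefix>/lib that start with `stem`."""
    lib_dir = Path(prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob(f"{stem}*"))


# sys.prefix is the root of the environment running this interpreter.
print(find_shared_libs(sys.prefix))
```

If only a different major version of libtiff shows up, that would be consistent with a packaging mismatch between the installed proj and libtiff builds.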

@maximlt
Contributor

maximlt commented Jun 8, 2024

Ok it looks like a packaging issue. I don't understand why notebook<7 would influence geopandas though. Can you try to create an environment (conda create -n reproissue ...) with just pyproj, geopandas and python and the versions you have in the current lock, and see if you can reproduce the error? I'm mentioning these packages only as they are the ones that show up in the traceback you shared.

@Azaya89
Collaborator Author

Azaya89 commented Jun 10, 2024

Ok it looks like a packaging issue. I don't understand why notebook<7 would influence geopandas though. Can you try to create an environment (conda create -n reproissue ...) with just pyproj, geopandas and python and the versions you have in the current lock, and see if you can reproduce the error? I'm mentioning these packages only as they are the ones that show up in the traceback you shared.

OK, so I created a new environment with pyproj, geopandas, python, and pyarrow and it worked well, although the geopandas import took some time (about 20 secs):

Screenshot 2024-06-10 at 12 15 58 PM

So, I'm thinking the issue may be another dependency?

@maximlt
Contributor

maximlt commented Jun 10, 2024

@Azaya89 can you maybe try to pin again notebook<7 in the project file, re-lock, and push the changes to Github? It'd be interesting to see whether the issue you reported shows up on the CI or not.

@Azaya89
Collaborator Author

Azaya89 commented Jun 10, 2024

@Azaya89 can you maybe try to pin again notebook<7 in the project file, re-lock, and push the changes to Github? It'd be interesting to see whether the issue you reported shows up on the CI or not.

This doesn't work because doit test:... fails due to the same import errors.

@maximlt
Contributor

maximlt commented Jun 10, 2024

This doesn't work because doit test:... fails due to the same import errors.

I would like to see it failing on the CI to see if it reports the same error that you get.

@@ -15,20 +15,26 @@ user_fields: [examples_config]

channels:
- defaults
Contributor

@Azaya89 ah I just noticed something. It's good practice not to mix the defaults channel with conda-forge. So when we use conda-forge we should replace defaults with nodefaults, to prevent the defaults channel from being added by default 🙃 Can you try that on your machine and re-lock?
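
The channel rule suggested in this comment can be written down as a tiny function over the project's channel list. A sketch under assumptions: `avoid_mixing_defaults` is a hypothetical name, and the input mirrors the `channels:` entries of the project file as a plain Python list.

```python
def avoid_mixing_defaults(channels):
    """Swap 'defaults' for 'nodefaults' when conda-forge is among the channels."""
    if "conda-forge" not in channels:
        # Nothing to do when conda-forge is not used.
        return list(channels)
    return ["nodefaults" if c == "defaults" else c for c in channels]


print(avoid_mixing_defaults(["defaults", "conda-forge"]))
```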

Collaborator Author

OK, but there was no failure in the CI here. This is diabolical!

Collaborator Author

Done!


@Azaya89 Azaya89 requested a review from maximlt June 11, 2024 10:15
@Azaya89
Collaborator Author

Azaya89 commented Jun 14, 2024

I think #199 is ready to be closed now @maximlt


@Azaya89 Azaya89 force-pushed the modernize_nyc_building branch from fe3b8fb to b2030b9 Compare July 4, 2024 14:44

@Azaya89
Collaborator Author

Azaya89 commented Jul 4, 2024

@maximlt I think this PR is ready for another review with the following notes:

  1. There are still some performance issues with rendering the plots and dashboard, as you are already aware. It's faster than before but still slower than expected. This also affects the time it takes the tests to run (2:38 to run 10 cells via doit test:...).
  2. The new_nyc_buildings.parq file is now the dataset used in the notebook, so it needs to be moved to S3 to replace the old one there and then be deleted from the repo.
  3. The narrative about inspect_polygons was completely deleted from the notebook, as it only works with spatialpandas, not geopandas.

@maximlt
Contributor

maximlt commented Jul 4, 2024

Ok thanks for the report. Depending on the performance issues, it might be that we end up not updating the code in this example.

@Azaya89
Collaborator Author

Azaya89 commented Jul 4, 2024

Ok thanks for the report. Depending on the performance issues, it might be that we end up not updating the code in this example.

:(

@droumis
Contributor

droumis commented Aug 7, 2024

Isaiah reports that it takes about 30 seconds to run a cell with the full visualization with geopandas and that now (reverting back to spatialpandas on his local machine) it's taking even longer.

@Azaya89 Azaya89 force-pushed the modernize_nyc_building branch from b2030b9 to 3983383 Compare August 21, 2024 14:35
@Azaya89 Azaya89 force-pushed the modernize_nyc_building branch from 3983383 to 11c8dcf Compare November 27, 2024 11:27
@Azaya89
Collaborator Author

Azaya89 commented Nov 27, 2024

I'd like to make sure of that :) Let's compare timings before these changes (old environment) and after (this environment), using %time display(hv_obj) like we did on another example.

First full plot

image

Last full plot

image

Last plot Old (On website)

image
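
Outside a notebook, the %time display(hv_obj) comparison mentioned in this comment could be approximated with a small timing helper from the standard library. A minimal sketch: `timed` and `results` are names invented here, and the summed range is only a placeholder for the real rendering call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


results = {}
with timed("render", results):
    sum(range(100_000))  # placeholder for display(hv_obj)

print(f"render took {results['render']:.3f}s")
```

Running the same labelled block in the old and the new environment gives two numbers that can be compared directly.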

@Azaya89
Collaborator Author

Azaya89 commented Nov 27, 2024

The test is failing because of this fix that is not merged yet.

@Azaya89 Azaya89 requested a review from maximlt November 27, 2024 11:34
},
"outputs": [],
"source": [
"ddf.hvplot.polygons(rasterize=True, tiles='CartoLight', groupby='type', aggregator='any')"
Contributor

Before, in Jim's example, the categories in the widget were alphabetically sorted. It's no longer the case:

image

Jim's code used an intermediate HoloMap, this object has a sort keyword that is True by default:
image

In hvPlot, setting groupby leads internally to applying the groupby method of hv.Dataset. It's called with dynamic=True (the default), which means a DynamicMap is returned and not a HoloMap, losing that default sorting behavior.

We need to open an issue on hvPlot I think.

Collaborator Author

What would the Issue address? A way to alphabetize a hvPlot groupby operation?

Contributor

Yes, but more generally a way to declare that the values of the grouped dimension(s) should be sorted. It should be possible to come up with a small example to reproduce this.

In our case, we could ignore this, or preprocess the dataset in the notebook to sort it by category.

Collaborator Author

OK. I can find a way to create the Issue.

In this particular example however, I think we can ignore it. I noticed that the final plot that didn't use a groupby operation had the categories sorted:

image

Contributor

To be clear, I've contributed to this example, but it's from @philippjfr , not me! :-)

"hover = inspect_polygons(shaded).opts(fill_color='red', tools=['hover'])\n",
"\n",
"tiles * shaded * legend * hover"
"plot = ddf.hvplot.polygons(tiles='CartoLight', data_aspect=1, datashade=True,\n",
Contributor

Can you also reproduce this issue with the plot being distorted on zoom?

Monosnap screencast 2024-11-28 11-40-38

Collaborator Author

Yes I can. I think it had something to do with setting data_aspect=1

Here's the same plot without setting the data aspect:

nyc_b_gif720.mov

Contributor

Ok yes that's better without it. In fact, I'm not sure you can modify the display when tiles are used.

Collaborator Author

In that case, I will remove the data_aspect parameter then.

Collaborator Author

Done

Contributor

Right. Mixing data_aspect=1 and tiles is at best redundant, and at worst sets them up for a death match with each other.

@Azaya89 Azaya89 requested a review from maximlt December 2, 2024 12:14
@jbednar
Contributor

jbednar commented Dec 2, 2024

Have the duplicated legend entries visible in #386 (comment) been addressed? Previously that's been due to having both manual and automatic legends in the same plot.

@Azaya89
Collaborator Author

Azaya89 commented Dec 2, 2024

Have the duplicated legend entries visible in #386 (comment) been addressed? Previously that's been due to having both manual and automatic legends in the same plot.

Yes, this PR does not have duplicate legend entries.

@maximlt
Contributor

maximlt commented Dec 8, 2024

Blocked by holoviz/holoviews#6470. I found a pretty bad performance regression with the modernized code when creating the groupby rasterized plot, specifically while creating the HoloViews object itself; see the PR for more details.

For @Azaya89, I created an environment from Jim's previous work you ported to this PR (Closed). On this branch, the lock file is not recent and so not ready for osx-arm64. A while back I installed https://orbstack.dev/ on my machine, it makes it very easy to spin up a Linux machine on Mac, and you can still navigate through your normal files, very handy. This allowed me to prepare the project from Jim's branch and run the notebook, with some modifications to time it (%%time display(...)) and make the comparison more valid (e.g. remove inspect polygons).

Comparing old vs new with holoviz/holoviews#6470:

  1. All buildings rasterized with tiles: 5.1s vs 5.1s
  2. Rasterized groupby with tiles, first category is unknown: 1.7s vs 2.4s
  3. Datashaded: 2.7s vs 2.8s

Only plot 2 takes longer to display; the difference lies in the HoloViews object creation (0.6s in the new code):

  • The old code computed the categories explicitly (cats), which were then used to construct a HoloMap. They were computed efficiently with cats = ['unknown'] + list(ddf.type.value_counts().compute().iloc[:10].index.values)
  • The new code using hvPlot doesn't provide that explicit list of categories, they are internally computed (by HoloViews) within a groupby operation. Computing the unique values in the type column with a Dask object means this computation is distributed. Since the dataset isn't that big (1.2 million rows), I guess this comes with some overhead which explains why it's not so quick.
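
The explicit category list from the old code (the ten most frequent values of the type column, with 'unknown' put first) can be mirrored in plain Python to show what that expression computes. This is an in-memory analogue only, not the Dask code: `top_categories` is a hypothetical name, and missing values are modeled as None, which is skipped in the same way value_counts skips missing values by default.

```python
from collections import Counter


def top_categories(values, n=10, first="unknown"):
    """Return [first] followed by the n most frequent non-missing values."""
    counts = Counter(v for v in values if v is not None)
    return [first] + [category for category, _ in counts.most_common(n)]


# Made-up sample values standing in for the 'type' column.
sample = ["house", "garage", "house", None, "school", "house", "garage"]
print(top_categories(sample, n=2))
```

Wrapping the resulting list in sorted() before building the widget is one way to get alphabetical ordering when the categories are computed explicitly like this.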

So I think in terms of performance, at least for the first render, we're all good with the changes.

@maximlt
Contributor

maximlt commented Dec 8, 2024

I've opened holoviz/holoviews#6471 and holoviz/hvplot#1463 to follow-up on #386 (comment), i.e. allow sorting the values of the groupby dimensions. I don't consider it a blocker and think it needs some discussion at the hvplot and holoviews levels, so I'd say we can move on.

@Azaya89 Azaya89 removed the request for review from philippjfr December 17, 2024 11:54
@maximlt
Contributor

maximlt commented Dec 17, 2024

Thanks @Azaya89 for re-locking, but:

No need for anyone to review this example until these two points above are resolved, and the CI is green :)

Labels
NF SDG NumFocus Software Development Grant 2024

5 participants