Skip to content

Notes and experiments in Jupyter dashboarding

License

Notifications You must be signed in to change notification settings

somedave/dashboards

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

55 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

nbgallery/dashboards

Project Status: Concept – Minimal or no implementation has been done yet, or the repository is only intended to be a limited example, demo, or proof-of-concept.

What's in this repo?

This repository contains various notes and experiments in Jupyter dashboarding. Any code you find here may or may not be suitable for production. We are definitely open to comments and suggestions, especially if we're doing something crazy, so please feel free to open an issue to give us feedback. A quick list of what's here:

Our use case for dashboards

Our use case for Jupyter widgets and dashboards is a little different from what we tend to see in other organizations in the Jupyter community. Our organization deals with a highly-regulated big data environment with two major constraints:

  • Each user has a different view of the data due to row- or even cell-level access permissions.
  • It is not always desirable (or feasible) to analyze all the data at once.

As an example, consider a large hospital, subject to HIPAA privacy regulations, where patient data may not be visible to all employees. Suppose doctors can only see data about their own patients, while department heads can see records for all patients in their department. Suppose administrators can view records for all patients, but only billing data; similarly, maybe the lab can view only test results. Lastly, suppose the data science team can't view any data about real patients -- perhaps they only have access to notional or anonymized data. So, when a data scientist writes a Jupyter notebook to look at, say, prescription trends, doctors who want to run that analysis must run the notebook themselves so that database access controls can limit the input data to their own patients.

The data access restrictions of our environment have led to two effects that we've found are somewhat unique:

  • We have significantly more Jupyter notebook users than notebook authors. In our organization, we have thousands of users, but fewer than 20% have written a notebook themselves. This has motivated a number of features in our notebook collaboration platform nbgallery: recommenders, code health monitoring, and peer review.
  • We tend to use Jupyter widgets less to manipulate output visualizations and more to create input forms for database queries. These input forms prompt the user for parameters (say, patient gender and age), and the database returns only records that match the parameters and are viewable by the user running the notebook. The consequence of this model is that notebook execution needs to block until the user submits the form. Since ipywidgets do not block by default, this means we must structure our notebooks to use widget callback functions or something like ipython_blocking.

Our interest in dashboards is primarily motivated by two requirements:

  • Since many of our Jupyter users are non-coders, dashboards can lower the barrier to entry to the Jupyter platform. The growing interest in dashboards across the Jupyter community is a reflection of the fact that this is not unique to our organization.
  • Several dashboard solutions provide immutable execution, meaning the user cannot modify the code. This may be important in regulated environments where the analysis needs to be approved before running in production. In our hospital example, perhaps notebooks need to be vetted for HIPAA compliance before the data science team can release them for use by the doctors.

More generally on the immutable execution front, we're also interested in what we're calling "Notebook as a Service" (NaaS). The idea here is that once the data scientist has finalized the algorithms and analysis in a notebook, you no longer need the interactivity of the native Jupyter interface, and the notebook can be treated as an immutable code artifact. That enables various applications:

  • Creating user-friendly dashboards
  • Parameterizing notebooks as back-ends for existing tools and UIs
  • Parameterizing notebooks for use in cron jobs and pipelines
  • Using notebooks as a web service API

One of the questions we get when we talk about NaaS is if you don't need the interactive Jupyter interface, why use the notebook format at all? If you need a cron job, why not a plain python script? If you need a web app, why not Flask or Django? The answer is that if the analysis was initially developed in a notebook, then it may just be easier and cheaper to keep it that way, especially if the notebook's author already added a simple widget UI. You need to have a compelling argument for re-engineering the solution in order to justify the time and labor costs. Our experience in most cases has been similar to that of Zillow, who found that it wasn't worth re-writing R analytics in C++ or Java.

Here are some NaaS-related projects that we've been tracking:

About

Notes and experiments in Jupyter dashboarding

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 91.1%
  • Python 7.9%
  • Dockerfile 1.0%