diff --git a/zz-cell-metadata/allthekernels.png b/zz-cell-metadata/allthekernels.png
new file mode 100644
index 00000000..75f475ef
Binary files /dev/null and b/zz-cell-metadata/allthekernels.png differ
diff --git a/zz-cell-metadata/pick-2.png b/zz-cell-metadata/pick-2.png
new file mode 100644
index 00000000..4eee85e8
Binary files /dev/null and b/zz-cell-metadata/pick-2.png differ
diff --git a/zz-cell-metadata/pick.png b/zz-cell-metadata/pick.png
new file mode 100644
index 00000000..eca2cca6
Binary files /dev/null and b/zz-cell-metadata/pick.png differ
diff --git a/zz-cell-metadata/sos-1.png b/zz-cell-metadata/sos-1.png
new file mode 100644
index 00000000..5ba3e828
Binary files /dev/null and b/zz-cell-metadata/sos-1.png differ
diff --git a/zz-cell-metadata/sos-2.png b/zz-cell-metadata/sos-2.png
new file mode 100644
index 00000000..cef59134
Binary files /dev/null and b/zz-cell-metadata/sos-2.png differ
diff --git a/zz-cell-metadata/transmit-cell-metadata.md b/zz-cell-metadata/transmit-cell-metadata.md
new file mode 100644
index 00000000..7ffd19fd
--- /dev/null
+++ b/zz-cell-metadata/transmit-cell-metadata.md
@@ -0,0 +1,348 @@
+---
+title: Transmitting Cell Metadata in Jupyter Execute Requests
+authors: John Lam (jflam@microsoft.com), Matthew Seal (matt@noteable.io), Carol Willing (willingc@gmail.com)
+issue-number:
+pr-number:
+date-started: 2021-02-10
+---
+
+# Summary
+
+This proposal discusses the **transmission of cell metadata** with execute
+message requests and would modify the Jupyter Messaging Protocol. Individual
+kernels would interpret or ignore this metadata. This enables flexibility in
+different usage scenarios implemented in various front-end clients.
+
+# Motivation
+
+By transmitting cell metadata inline with the `execute_request`,
+`inspect_request`, and `complete_request` messages, Jupyter implementations
+will have a reliable channel to transmit additional metadata to the kernel in
+a standard way.
+
+Notebook extensions can also use this channel to transmit additional
+information that has often been transmitted using magic commands.
+
+Some use cases which motivated this proposal are:
+
+- Route requests automatically to an appropriate kernel via libraries like
+  [allthekernels](https://github.com/minrk/allthekernels) without the need for
+  additional metadata within the cell source itself
+- Create or find a conda environment without needing to use magics, like
+  [pick](https://github.com/nteract/pick)
+- Support polyglot (more than one language/kernel within a single notebook)
+  scenarios, like [sos](https://vatlab.github.io/sos-docs/)
+- Provide hints to the kernel for localization purposes, like how the
+  HTTP `Accept-Language` header works
+- Provide hints to the kernel about client capabilities, similar to how
+  hints about a web browser client's capabilities work
+
+# Guide-level explanation
+
+Transmitting cell metadata enables many scenarios, as described briefly in the
+Motivation section. In this section, we consider one scenario in more detail:
+running a code cell using a specific kernel.
+
+Today a typical approach is for the user to include a magic command in the
+cell that identifies the kernel. This approach interferes with other
+extensions that may want to use the contents of the cell, e.g., autocomplete
+providers would need to be aware of and ignore the syntax of magics.
+
+## Simple example
+
+For example, in the [allthekernels](https://github.com/minrk/allthekernels)
+project, users select the kernel using a `>` command:
+
+```R
+>python3
+1+1
+```
+
+But in our example, let's imagine that we use cell metadata to specify the
+kernel instead. Now, let's consider a minimal JSON fragment for the above
+cell:
+
+```json
+{
+    "cell_type" : "code",
+    "execution_count": 1,
+    "metadata" : {
+        "kernel": "python3"
+    },
+    "source" : "1+1"
+}
+```
+
+The cell metadata dict contains an entry that specifies that the `kernel` is
+`python3`.
+But where did the `"kernel": "python3"` metadata come from? What
+wrote it into the cell metadata in the first place?
+
+Elaborating a bit more on the user experience here, you could imagine a client
+extension providing some additional UI elements such as a cell drop-down that
+lets the user pick from a list of installed kernels on the user's machine. The
+user picks one, and the kernelspec or its identifier is written to that cell's
+metadata.
+
+In this example, there is also a corresponding `allthekernels` kernel
+installed on the user's machine that knows how to multiplex between the
+different kernel processes running on that machine. When the user runs
+the cell, the Jupyter implementation will send an
+[execute](https://jupyter-client.readthedocs.io/en/stable/messaging.html#execute)
+message to the kernel.
+
+Here's a minimal representation of the execute message for the above cell:
+
+```js
+{
+    "header" : {
+        "msg_id": "...",
+        "msg_type": "...",
+        //...
+    },
+    "parent_header": {},
+    "metadata": {
+        "kernel": "python3",
+    },
+    "content": {
+        "code": "1+1",
+    },
+    "buffers": [],
+}
+```
+
+In this case, the `allthekernels` kernel sees the `"kernel": "python3"` entry
+in the message metadata, locates and activates a child kernel to handle the
+request, and passes the message on to the child kernel for processing.
+
+There could be other cell metadata that was transmitted from the client as
+well. Some of that metadata could have been put there by client extensions,
+like in the case of `allthekernels`. Other metadata could be put there by the
+Jupyter implementation itself, e.g., language or client capabilities like
+screen size.
+
+## Metadata Key Conflicts and Namespacing
+
+The potential for conflicts exists across extensions that want to add their
+own cell metadata to a notebook file. We recommend that extensions namespace
+their metadata keys to minimize the possibility of conflicts between
+extensions.
+For example, in the `allthekernels` case it could look like:
+
+```json
+{
+    "cell_type" : "code",
+    "execution_count": 1,
+    "metadata" : {
+        "allthekernels:kernel": "python3"
+    },
+    "source" : "1+1"
+}
+```
+
+## Kernels declaring the need for Cell Metadata
+
+Kernels should have a way to declare that they require metadata to be sent.
+A kernel like `allthekernels` *needs* cell metadata that specifies which of
+the available options to use. On receipt of the metadata, the kernel can
+take the appropriate action or warn that it requires additional
+information.
+
+# Reference-level Explanation
+
+Cell metadata will be transmitted to the kernel in messages that are
+associated with the cell. Some examples of messages include:
+[execute](https://jupyter-client.readthedocs.io/en/stable/messaging.html#execute),
+[inspect_request](https://jupyter-client.readthedocs.io/en/stable/messaging.html#introspection),
+and
+[complete_request](https://jupyter-client.readthedocs.io/en/stable/messaging.html#completion)
+messages.
+
+The general form of a message is:
+
+```js
+{
+    "header" : {
+        "msg_id": "...",
+        "msg_type": "...",
+        //...
+    },
+    "parent_header": {},
+    "metadata": {},
+    "content": {},
+    "buffers": [],
+}
+```
+
+We propose adding cell metadata to the existing `message.metadata` dict
+(see the [Jupyter client
+docs](https://jupyter-client.readthedocs.io/en/stable/messaging.html#metadata)).
+This will be used to transmit the cell metadata for the executed cell.
+
+In cases where Jupyter extensions generate their own metadata, the keys for
+the metadata should be namespaced using an extension-specific prefix. The
+prefix is ideally human-readable and identifies the extension that wrote the
+metadata. There is currently no provision to guarantee global uniqueness for
+these prefixes in the way that other technologies, e.g., XML Namespaces, do
+using URIs.
+
+Below is a nominal example of these proposals, cell metadata and execute
+requests, in action.
+This fragment of a notebook contains a cell to be
+executed. Note that the `kernel` attribute is namespaced using `allthekernels`
+and the existing Jupyter attributes `collapsed` and `scrolled` are not
+namespaced.
+
+```js
+{
+    "cell_type" : "code",
+    "execution_count": 1,
+    "metadata" : {
+        "allthekernels:kernel" : "python3",
+        "collapsed" : true,
+        "scrolled": false,
+    },
+    "source" : "1+1",
+    "outputs": [{
+        "output_type": "stream",
+        //...
+    }],
+}
+```
+
+Below is the corresponding EXECUTE message:
+
+```js
+{
+    "header" : {
+        "msg_id": "...",
+        "msg_type": "...",
+        //...
+    },
+    "parent_header": {},
+    "metadata": {
+        "allthekernels:kernel": "python3",
+        "collapsed": true,
+        "scrolled": false,
+    },
+    "content": {
+        "code": "1+1",
+    },
+    "buffers": [],
+}
+```
+
+# Rationale and Alternatives
+
+## Rejected alternative: Metadata in content
+
+We considered another approach, content-level cell metadata, before we arrived
+at this JEP's proposed recommendation.
+
+Transmitting the metadata as a dict in the content of an EXECUTE message is
+illustrated here:
+
+```js
+{
+    "header" : {
+        "msg_id": "...",
+        "msg_type": "...",
+        //...
+    },
+    "parent_header": {},
+    "content": {
+        "code": "1+1",
+        "metadata": {
+            "allthekernels:kernel": "python3",
+            "collapsed": true,
+            "scrolled": false,
+        },
+    },
+    "buffers": [],
+}
+```
+
+We decided against this pattern because some types of metadata that could be
+transmitted to the kernel are not logically associated with the content;
+the examples below describe capabilities of the client:
+
+- Provide hints to the kernel for localization purposes, like how the
+  HTTP `Accept-Language` header works
+- Provide hints to the kernel about client capabilities, similar to how
+  hints about a web browser client's capabilities work
+
+## Rejected approach: Allow-List Pattern
+
+In looking at metadata that should or shouldn't be sent, we investigated
+whether the fields to be passed should be filtered with an allow-list or
+block-list pattern, e.g., allowing only `allthekernels:kernel` metadata. The
+issue is that this greatly complicates existing applications compared with the
+current proposal, as kernels would need to indicate the metadata fields they
+accept, and clients would then need to track that and filter the fields sent
+during execution. The attributes within the metadata today are (a) small in
+size and (b) not harmful to send across the wire, so keeping the solution
+simpler was the preferred pattern in the proposal.
+
+## Impact
+
+This proposal will add a new foundational capability to the Jupyter Messaging
+Protocol: the ability to transmit additional information to the kernel which
+the kernel can use to make better decisions about execution of user code. This
+makes it much more straightforward to have independent collaboration on
+polyglot notebooks (notebooks that contain code in more than one programming
+language).
+
+If the proposal is accepted, we benefit from an opportunity to improve the
+ability to send out-of-band information to the kernel with the EXECUTE
+message.
+Scenarios like polyglot notebooks, or adaptive rendering based on
+changes to the user's browser window size or graphics settings, would become
+possible.
+
+# Prior Art
+
+## allthekernels
+
+`allthekernels` uses a special syntax ("> __kernelspec__") within the cell to
+specify the kernel to use to run the code in the cell. This would be replaced
+by writing the kernelspec as cell metadata and transmitting it to the kernel
+as described earlier in this document.
+
+![allthekernels screenshot](./allthekernels.png)
+
+[GitHub](https://github.com/minrk/allthekernels)
+
+## Script of Scripts (SoS)
+
+`SoS` is a combination of a meta-kernel (authors call it a "super kernel")
+that controls a set of child kernels and magic commands to identify the kernel
+to target in a cell. It also provides a shared context in the "super kernel"
+to share variables and data between different kernels. It requires an
+extension to manage language metadata (see screenshot below).
+
+![sos architecture](./sos-1.png)
+![sos screenshot](./sos-2.png)
+
+[GitHub](https://github.com/vatlab/sos-notebook)
+[JupyterCon Presentation](https://www.youtube.com/watch?v=U75eKosFbp8)
+[Documentation](https://vatlab.github.io/sos-docs/notebook.html#content)
+
+## nteract pick
+
+`pick` is a kernel proxy that uses magics to specify an existing conda
+environment to use or an environment to create to run code in the notebook.
+
+![pick architecture](pick.png)
+![pick screenshot](pick-2.png)
+
+[GitHub](https://github.com/nteract/pick)
+
+# Open Questions
+
+We have opinions on some decision points but are open to suggestions on:
+
+- Whether the cell metadata is transmitted as a new dict in the EXECUTE
+  message, or whether it is transmitted as a new dict in the content field of
+  the EXECUTE message.
+- Whether kernels need to explicitly declare the metadata that they
+  need, and if so, the mechanism for communicating that declaration to the
+  Jupyter implementation.
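
As a rough illustration of the routing scenario from the guide-level explanation, the Python sketch below shows how a proxy kernel might pick a child kernel from the top-level `metadata` dict of an incoming message. The function name `select_child_kernel`, the `installed` mapping, and the `default` fallback are hypothetical and are not part of `allthekernels` or the Jupyter protocol; only the message layout and the `allthekernels:kernel` key are taken from the examples in the proposal.

```python
from typing import Any, Dict, Optional

# Namespaced key from the proposal's examples; how a real proxy kernel
# names or reads it is an assumption made for this sketch.
KERNEL_KEY = "allthekernels:kernel"


def select_child_kernel(
    message: Dict[str, Any],
    installed: Dict[str, str],
    default: Optional[str] = None,
) -> str:
    """Return the name of the child kernel that should handle `message`.

    `installed` maps kernel names to display names and stands in for
    whatever kernelspec lookup a real proxy kernel would perform.
    """
    # Cell metadata travels in the top-level `metadata` dict of the message.
    metadata = message.get("metadata") or {}
    requested = metadata.get(KERNEL_KEY, default)
    if requested is None:
        raise ValueError(f"message metadata is missing required key {KERNEL_KEY!r}")
    if requested not in installed:
        raise LookupError(f"no installed kernel named {requested!r}")
    return requested


execute_request = {
    "header": {"msg_id": "...", "msg_type": "execute_request"},
    "parent_header": {},
    "metadata": {"allthekernels:kernel": "python3", "collapsed": True},
    "content": {"code": "1+1"},
    "buffers": [],
}

installed_kernels = {"python3": "Python 3", "ir": "R"}
print(select_child_kernel(execute_request, installed_kernels))  # prints: python3
```

Raising on a missing key mirrors the idea that such a kernel can warn when it requires additional information; a real implementation would more likely report the problem back to the client than raise inside the kernel.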