On the MyST Markdown V3 AST

Steve Purves

Over the last few months, Angus (@agoose77)^[1] and I have been working as a tag team to deliver a significant change to the MyST Markdown stack: an upgrade from V2 to V3 of the Abstract Syntax Tree (AST) specification (recently released in mystmd@1.7.0+), along with migration capabilities.

This upgrade changes how Jupyter notebook outputs are represented and rendered, bringing them into the AST as first-class citizens. Here are some technical details of this transformation and why it matters.

The Problem: Single Node Data Blobs

In the V2 AST, Jupyter notebook cell outputs were represented as a single node containing a monolithic data blob, which was essentially the entire Jupyter cell output array. This structure looked something like:

{
  "type": "output",
  "id": "cell-output-1",
  "data": [
    // Array of all output data from the cell
    { "output_type": "display_data", "data": {...}, ... },
    { "output_type": "stream", "text": "...", ... },
    // ... more outputs
  ]
}

While this approach worked and was a good “starter for 10”, it had significant limitations:

No granular control: All outputs from a cell were bundled together, making it impossible to handle individual outputs differently
Limited rendering flexibility: Front-end web themes couldn’t apply specific styling or behavior to different output types in a MyST-native way, i.e. using NodeRenderers
A lacking AST representation: From an AST point of view, the contents of Jupyter mime-bundles depend on what are essentially external rendering capabilities, which brings additional dependencies and implications for content longevity.

The Solution: First-Class Output Representations

The V3 AST introduces a new structure where outputs are represented as first-class nodes in the AST. Instead of a single node containing all output data, we now have:

An outputs container node: Groups multiple individual output nodes
Individual output nodes: Each node representing a single output from a Jupyter cell

{
  "type": "outputs",
  "id": "cell-outputs-1",
  "children": [
    {
      "type": "output",
      "id": "output-1",
      "jupyter_data": { "output_type": "display_data", "data": {...} },
      "children": []
    },
    {
      "type": "output",
      "id": "output-2",
      "jupyter_data": { "output_type": "stream", "text": "..." },
      "children": []
    }
  ]
}

This structural change enables a much more flexible and powerful rendering system, while keeping the Jupyter mime-bundle data intact^[2] for when it is needed for certain content or for initializing “live” rendering.
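To make the shape change concrete, here is a minimal sketch of splitting a V2 blob node into the V3 structure. The type names, the `upgradeOutput` function and the `<parent id>-<index>` id scheme are all assumptions made for illustration; the real upgrade lives in the myst-migrate package and may well differ.

```typescript
// Hypothetical types: field names follow the JSON examples in this post.
type JupyterOutput = { output_type: string; [key: string]: unknown };
type AstNode = { type: string; [key: string]: unknown };

type OutputV2 = { type: 'output'; id: string; data: JupyterOutput[] };

type OutputV3 = {
  type: 'output';
  id: string;
  jupyter_data: JupyterOutput;
  children: AstNode[];
};

type OutputsV3 = { type: 'outputs'; id: string; children: OutputV3[] };

// Split one V2 blob node into an `outputs` container with one child per output.
// The child id scheme is an assumption of this sketch, not the real migration.
function upgradeOutput(node: OutputV2): OutputsV3 {
  return {
    type: 'outputs',
    id: node.id,
    children: node.data.map(
      (jupyter_data, index): OutputV3 => ({
        type: 'output',
        id: `${node.id}-${index}`,
        jupyter_data, // original mime-bundle data is carried over untouched
        children: [], // empty for now, to be populated with MyST AST nodes later
      }),
    ),
  };
}

const upgraded = upgradeOutput({
  type: 'output',
  id: 'cell-output-1',
  data: [
    { output_type: 'display_data', data: { 'text/plain': 'hello' } },
    { output_type: 'stream', text: 'world' },
  ],
});
console.log(upgraded.children.map((child) => child.id));
```

Each Jupyter output ends up as its own addressable node, which is what lets renderers treat outputs individually.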

The Future: making use of output.children

So this change is foundational. Does this mean that more change is coming? Well, yes, but the changes should now be nicely contained within the new outputs structure and be well represented by the current MyST Markdown spec.

The MyST Markdown engine can now start processing certain outputs differently, representing them with MyST AST nodes so that standard NodeRenderers can pick up and render them. This means that MyST themes and renderers can take full control of how those outputs are displayed and styled, and can add custom behaviour that departs from what Jupyter might do alone. We can also start to provide additional support for things that are difficult at the moment, like rendering generated Markdown or even generated MyST Markdown.

Future changes will then populate the outputs nodes differently and build out new NodeRenderers where needed. The outputs support in MyST themes will need to change a little to ensure that these outputs are rendered faithfully, but this should be easy to handle.

Because outputs > output nodes always ship the original jupyter_data, which is what the current themes render, ensuring backwards compatibility should be easy, with themes able to opt in to rendering output.children when they are ready.
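The opt-in idea might look something like the sketch below. The `planOutputRender` function, the `RenderPlan` type and the `themeSupportsChildren` flag are hypothetical names invented for this example; they are not part of any myst-theme package.

```typescript
// Hypothetical types: field names follow the V3 JSON example in this post.
type AstNode = { type: string; [key: string]: unknown };
type JupyterOutput = { output_type: string; [key: string]: unknown };
type OutputNode = {
  type: 'output';
  id: string;
  jupyter_data: JupyterOutput;
  children: AstNode[];
};

// Either hand the children to standard NodeRenderers, or fall back to the
// original Jupyter data, which is what current themes already render.
type RenderPlan =
  | { kind: 'ast'; nodes: AstNode[] }
  | { kind: 'jupyter'; data: JupyterOutput };

function planOutputRender(node: OutputNode, themeSupportsChildren: boolean): RenderPlan {
  // Only opt in when the theme is ready AND there is something to render.
  if (themeSupportsChildren && node.children.length > 0) {
    return { kind: 'ast', nodes: node.children };
  }
  return { kind: 'jupyter', data: node.jupyter_data };
}

const legacyPlan = planOutputRender(
  {
    type: 'output',
    id: 'output-1',
    jupyter_data: { output_type: 'stream', text: '...' },
    children: [],
  },
  true,
);
console.log(legacyPlan.kind);
```

Because the fallback always exists, a theme that never opts in keeps rendering exactly as it does today.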

Technical Details

Because we like details, here is some info on today’s technical implementation (maybe not tomorrow’s!).

Context-Based Rendering

The new architecture uses React Context to provide output metadata to child components. The OutputsContextProvider wraps output containers, allowing individual output nodes to access their parent’s execution context:

<OutputsContextProvider outputsId={outputsId}>
  <MyST ast={children} />
</OutputsContextProvider>

This context enables:

Shared execution state: All outputs in a cell can access the same execution context
Coordinated rendering: Outputs can coordinate their rendering behavior
Proper scoping: Each cell’s outputs are properly isolated

However, this render-time behavior could, and arguably should, be deprecated by decorating child output nodes with the parent outputs’ id.
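As a rough sketch of what that decoration could look like, the function below copies the parent id onto each child as a plain transform. The `outputsId` field name and the `decorateWithOutputsId` function are assumptions made for illustration; nothing like this exists in the spec today.

```typescript
// Hypothetical, simplified node types for this sketch.
type OutputNode = {
  type: 'output';
  id: string;
  outputsId?: string; // assumed field name, not part of the current spec
  children: unknown[];
};
type OutputsNode = { type: 'outputs'; id: string; children: OutputNode[] };

// Return a copy of the container whose children each carry the parent's id,
// so a renderer would no longer need a React Context lookup to find it.
function decorateWithOutputsId(outputs: OutputsNode): OutputsNode {
  return {
    ...outputs,
    children: outputs.children.map(
      (child): OutputNode => ({ ...child, outputsId: outputs.id }),
    ),
  };
}

const decorated = decorateWithOutputsId({
  type: 'outputs',
  id: 'cell-outputs-1',
  children: [
    { type: 'output', id: 'output-1', children: [] },
    { type: 'output', id: 'output-2', children: [] },
  ],
});
console.log(decorated.children.map((child) => child.outputsId));
```

Doing this once at build or load time would move the parent/child link out of render-time state and into the AST itself.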

Per-Output Rendering Methods

Previously, a single non-simple output (one that requires Jupyter services / Thebe to render) meant that all outputs in the cell were rendered via Thebe, and handling of simpler outputs (named “safe” in the codebase) was carried out by the myst-theme.

Now, each output is assessed individually:

export function isOutputSafe(
  output: MinifiedOutput,
  directOutputTypes: Set<string>,
  directMimeTypes: Set<string>,
) {
  if (directOutputTypes.has(output.output_type)) return true;
  // ... check mime types
}

Each output is then rendered as safe or via Jupyter accordingly.
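For a feel of how the elided mime type check might work, here is a self-contained sketch. It uses a simplified local `OutputLike` type instead of `MinifiedOutput`, and it assumes an output is safe only when every mime type in its bundle is in the direct set. Both that rule and the example set contents are my assumptions here, so check the codebase for the actual behaviour.

```typescript
// Simplified stand-in for MinifiedOutput, for illustration only.
type OutputLike = { output_type: string; data?: Record<string, unknown> };

function isOutputSafeSketch(
  output: OutputLike,
  directOutputTypes: Set<string>,
  directMimeTypes: Set<string>,
): boolean {
  // Some output types can always be rendered directly.
  if (directOutputTypes.has(output.output_type)) return true;
  // Assumed rule: the bundle must be non-empty and contain only direct mime types.
  const mimeTypes = Object.keys(output.data ?? {});
  return mimeTypes.length > 0 && mimeTypes.every((mime) => directMimeTypes.has(mime));
}

// Example sets: the entries are assumptions, not the theme's real configuration.
const exampleOutputTypes = new Set(['stream', 'error']);
const exampleMimeTypes = new Set(['text/plain', 'image/png']);

const widgetOutput: OutputLike = {
  output_type: 'display_data',
  data: { 'text/plain': 'Widget()', 'application/vnd.jupyter.widget-view+json': {} },
};
console.log(isOutputSafeSketch(widgetOutput, exampleOutputTypes, exampleMimeTypes));
```

With a per-output check like this, one widget in a cell no longer forces its sibling outputs through Thebe.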

New Separate Active and Passive Renderers

The new structure maintains two completely separate rendering branches:

Passive rendering (PassiveOutputsRenderer): Used when rendering a static page, to render each individual output node. So if a notebook cell has 5 outputs, 5 of these will be instantiated, rather than the 1 used in previous versions of the theme.
Active rendering (ActiveJupyterCellOutputs): Used when a kernel is attached via live compute / Thebe; it takes over rendering for the whole outputs node, replacing any individual output that was passively rendered.

If you look at the code in @myst-theme/jupyter you will see that this is a simplification, but I wanted to capture the important point: there is a shift between static/passive rendering of outputs on the page and what is rendered when “live compute” mode is enabled. This means that in future, and in some scenarios, depending on the content, significant visual changes could occur when switching to live compute mode. This is something we’ll need to understand on a case-by-case basis once we get there, and it may prompt people to think differently about how they present interactive outputs to their readers.
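The passive/active split can be boiled down to a tiny sketch like the one below. The `planRenderers` function and `RenderUnit` type are hypothetical, and only the two renderer names come from the theme; the real components are React components with more going on.

```typescript
// Hypothetical, simplified node and plan types for this sketch.
type OutputsNode = {
  type: 'outputs';
  id: string;
  children: { type: 'output'; id: string }[];
};
type RenderUnit = {
  renderer: 'ActiveJupyterCellOutputs' | 'PassiveOutputsRenderer';
  nodeId: string;
};

function planRenderers(outputs: OutputsNode, kernelAttached: boolean): RenderUnit[] {
  if (kernelAttached) {
    // Live compute: one active renderer takes over the whole outputs node.
    return [{ renderer: 'ActiveJupyterCellOutputs', nodeId: outputs.id }];
  }
  // Static page: one passive renderer per individual output node.
  return outputs.children.map(
    (child): RenderUnit => ({ renderer: 'PassiveOutputsRenderer', nodeId: child.id }),
  );
}

const cellOutputs: OutputsNode = {
  type: 'outputs',
  id: 'cell-outputs-1',
  children: [
    { type: 'output', id: 'output-1' },
    { type: 'output', id: 'output-2' },
  ],
};
console.log(planRenderers(cellOutputs, false).length, planRenderers(cellOutputs, true).length);
```

The jump from several passive units to a single active one is exactly where the visual changes mentioned above can creep in.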

Migration and Compatibility

This is a breaking change that requires content to be migrated to the V3 AST structure.

For content authors, the migration is handled automatically by the MyST toolchain (myst-migrate package) by targeting version 3 in the content loading process.

Theme developers need to:

Update to the latest @myst-theme/jupyter package to support the new outputs and output node types (you’ll need all the latest packages in reality)
Use the new context providers for output rendering, if needed
Adopt the myst-migrate package to enable AST upgrades/downgrades when loading static content for pages and .json endpoints (to enable cross-reference lookups)

Curvenote themes

Over at Curvenote, we’ve already moved the Curvenote themes, and our customized Jupyter live compute packages, onto the new AST, and we are rolling these changes out now to ensure a smooth transition for Curvenote-hosted sites. Curvenote hosts a significant amount of MyST Markdown content, which has proven to be a valuable test bed for the new spec and the accompanying myst-migrate functionality.

Even more technical details

For even more technical details, see the myst-theme changes, the related mystmd AST changes, and the MEP: Per-Output AST Representation for Code Cell Outputs.

Shout-outs!

A big shout-out to Angus (@agoose77), my tag team development partner, who envisioned the change, kicked it off, and chewed through a big chunk of the initial work.

Footnotes

[1] Mainly, but there have been other contributors of course.
[2] We see the potential for some lossy edge cases in future, but expect these to be rare.
