AST for the content #159

Andrew-Morozko · 2024-04-19T11:18:20Z

Right now each content block produces markdown strings, which requires a lot of string manipulation. For example, the title block replaces newlines in its value, otherwise, the title would be broken:

# Tiltle with
newline is half-broken

We should use an AST structure closely resembling markdown and use it instead:

func Title(size int, text string) ContentNode
func Text(text string) ContentNode
func List(kind ListKind, items [] ContentNode) ContentNode
// And so on

And use it to avoid repeated validation:

List(Unordered, []ContentNode{Text("list item, but it can contain\n * and not be split into two")})

This also decouples us from markdown, making it possible for data sinks to render a document in a custom format (html, latex, or, of course, markdown ...) from our ContentNode structure. Also, this makes the content more easily inspectable and traversable for meta-content providers such as toc.

I considered using goldmark's AST instead of creating our own, but their AST types are assumed to be valid markdown (since they usually come out of their markdown parser), while ours need to be validated/adjusted before we can pass them as a valid markdown.

Since our AST would be based on markdown it would be straightforward to implement ContentNode -> goldmark/ast.Node converter, after which we can use goldmark-markdown to render goldmark AST back into markdown, built-in renderer to produce HTML or goldmark-latex to produce LaTeX code.

The text was updated successfully, but these errors were encountered:

traut · 2024-04-22T08:58:42Z

+1 on using goldmark's AST, seems like a nice data model (better than plain Markdown strings) we can build upon

dobarx · 2024-04-22T09:44:02Z

@traut
-1 on using goldmark's AST. Like @Andrew-Morozko mentioned their AST is strictly based on markdown. If we used their AST directly it's even more complicated than using markdown strings all together and would be harder to extend with nodes that are specific for other outputs, like html or pdf.

traut · 2024-04-22T09:47:55Z

Hmm, okay. Do you have any suggestions on what data model to use here for the content trees? I would love to use an existing one instead of building one, and if it comes from the lib we'll be using -- it's a bonus.

Markdown strings on their own are quite versatile, being a serialized version of a data model -- whatever the plugin returns, we can parse and adjust.

dobarx · 2024-04-22T10:51:06Z

Yes, markdown strings are versatile but it creates a problem that we must do a a lot of string manipulation and sanitization at plugin level and this creates bugs.

Using existing library AST for this might be also tricky since we would need to encode/decode this tree using protobufs and it would be pretty hard to extend it since the focus for that tree is just one output - Markdown. But that doesn't work well since we also have a concept of section which doesn't render into a specific markdown element but can effect markdown content inside.

If we want to extend it then it's best just to make a copy of their AST tree nodes that we could convert to goldmark's AST when rendering markdown. We don't really need all functionality of goldmark AST we just need similar data structure for core content elements.

traut · 2024-04-22T11:04:58Z

we must do a a lot of string manipulation and sanitization at plugin level and this creates bugs.

I like that this is a plugin's problem — Fabric expects a valid Markdown string, so the plugin can do whatever it wants to produce it. It does limit us, though, and is a ticking trouble :) It's too flexible, so the validation of the content is difficult (what if the plugin returns a huge, complex Markdown that will mess up the whole document?) but I'm not too worried about that just yet.

We don't really need all functionality of goldmark AST we just need similar data structure for core content elements.

I agree, I think you are right. Our model can be more straightforward and should not be tied to Markdown explicitly -- we're going to support other serializations anyway. On that note, we can use our template data model as an inspiration for our content data model, adopting a concept of block with a content type. Even if it's just to wrap a markdown strings into nodes with metadata - it will still be very useful, I think

traut · 2024-10-23T13:37:08Z

I guess this can be closed @Andrew-Morozko? We'll continue working on the migration to the new structure

Andrew-Morozko · 2024-10-24T12:59:50Z

I guess this can be closed

@traut I'd rather close it after part two of AST is merged in, but it's up to you!

Andrew-Morozko self-assigned this Apr 19, 2024

traut added this to the v0.6 milestone Apr 22, 2024

traut added the enhancement New feature or request label Apr 22, 2024

traut modified the milestones: v0.6, v0.5 May 12, 2024

This was referenced May 12, 2024

Support content selection using meta tags #99

Closed

Use unique anchors in ToC links #164

Open

traut mentioned this issue Jul 30, 2024

Notion Publisher #221

Open

traut mentioned this issue Sep 8, 2024

Content formatting, 2nd iteration #240

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AST for the content #159

AST for the content #159

Andrew-Morozko commented Apr 19, 2024

traut commented Apr 22, 2024

dobarx commented Apr 22, 2024 •

edited

Loading

traut commented Apr 22, 2024 •

edited

Loading

dobarx commented Apr 22, 2024 •

edited

Loading

traut commented Apr 22, 2024

traut commented Oct 23, 2024

Andrew-Morozko commented Oct 24, 2024

AST for the content #159

AST for the content #159

Comments

Andrew-Morozko commented Apr 19, 2024

traut commented Apr 22, 2024

dobarx commented Apr 22, 2024 • edited Loading

traut commented Apr 22, 2024 • edited Loading

dobarx commented Apr 22, 2024 • edited Loading

traut commented Apr 22, 2024

traut commented Oct 23, 2024

Andrew-Morozko commented Oct 24, 2024

dobarx commented Apr 22, 2024 •

edited

Loading

traut commented Apr 22, 2024 •

edited

Loading

dobarx commented Apr 22, 2024 •

edited

Loading