Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AST for the content #159

Open
Andrew-Morozko opened this issue Apr 19, 2024 · 7 comments
Open

AST for the content #159

Andrew-Morozko opened this issue Apr 19, 2024 · 7 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@Andrew-Morozko
Copy link
Contributor

Right now each content block produces markdown strings, which requires a lot of string manipulation. For example, the title block replaces newlines in its value, otherwise, the title would be broken:

# Tiltle with
newline is half-broken

We should use an AST structure closely resembling markdown and use it instead:

func Title(size int, text string) ContentNode
func Text(text string) ContentNode
func List(kind ListKind, items [] ContentNode) ContentNode
// And so on

And use it to avoid repeated validation:

List(Unordered, []ContentNode{Text("list item, but it can contain\n * and not be split into two")})

This also decouples us from markdown, making it possible for data sinks to render a document in a custom format (html, latex, or, of course, markdown ...) from our ContentNode structure. Also, this makes the content more easily inspectable and traversable for meta-content providers such as toc.

I considered using goldmark's AST instead of creating our own, but their AST types are assumed to be valid markdown (since they usually come out of their markdown parser), while ours need to be validated/adjusted before we can pass them as a valid markdown.

Since our AST would be based on markdown it would be straightforward to implement ContentNode -> goldmark/ast.Node converter, after which we can use goldmark-markdown to render goldmark AST back into markdown, built-in renderer to produce HTML or goldmark-latex to produce LaTeX code.

@Andrew-Morozko Andrew-Morozko self-assigned this Apr 19, 2024
@traut
Copy link
Member

traut commented Apr 22, 2024

+1 on using goldmark's AST, seems like a nice data model (better than plain Markdown strings) we can build upon

@traut traut added this to the v0.6 milestone Apr 22, 2024
@traut traut added the enhancement New feature or request label Apr 22, 2024
@dobarx
Copy link
Contributor

dobarx commented Apr 22, 2024

@traut
-1 on using goldmark's AST. Like @Andrew-Morozko mentioned their AST is strictly based on markdown. If we used their AST directly it's even more complicated than using markdown strings all together and would be harder to extend with nodes that are specific for other outputs, like html or pdf.

@traut
Copy link
Member

traut commented Apr 22, 2024

Hmm, okay. Do you have any suggestions on what data model to use here for the content trees? I would love to use an existing one instead of building one, and if it comes from the lib we'll be using -- it's a bonus.

Markdown strings on their own are quite versatile, being a serialized version of a data model -- whatever the plugin returns, we can parse and adjust.

@dobarx
Copy link
Contributor

dobarx commented Apr 22, 2024

Yes, markdown strings are versatile but it creates a problem that we must do a a lot of string manipulation and sanitization at plugin level and this creates bugs.

Using existing library AST for this might be also tricky since we would need to encode/decode this tree using protobufs and it would be pretty hard to extend it since the focus for that tree is just one output - Markdown. But that doesn't work well since we also have a concept of section which doesn't render into a specific markdown element but can effect markdown content inside.

If we want to extend it then it's best just to make a copy of their AST tree nodes that we could convert to goldmark's AST when rendering markdown. We don't really need all functionality of goldmark AST we just need similar data structure for core content elements.

@traut
Copy link
Member

traut commented Apr 22, 2024

we must do a a lot of string manipulation and sanitization at plugin level and this creates bugs.

I like that this is a plugin's problem — Fabric expects a valid Markdown string, so the plugin can do whatever it wants to produce it. It does limit us, though, and is a ticking trouble :) It's too flexible, so the validation of the content is difficult (what if the plugin returns a huge, complex Markdown that will mess up the whole document?) but I'm not too worried about that just yet.

We don't really need all functionality of goldmark AST we just need similar data structure for core content elements.

I agree, I think you are right. Our model can be more straightforward and should not be tied to Markdown explicitly -- we're going to support other serializations anyway. On that note, we can use our template data model as an inspiration for our content data model, adopting a concept of block with a content type. Even if it's just to wrap a markdown strings into nodes with metadata - it will still be very useful, I think

@traut
Copy link
Member

traut commented Oct 23, 2024

I guess this can be closed @Andrew-Morozko? We'll continue working on the migration to the new structure

@Andrew-Morozko
Copy link
Contributor Author

I guess this can be closed

@traut I'd rather close it after part two of AST is merged in, but it's up to you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants