Render Gutenberg markup before indexing #118

Open
kevinfodness opened this issue Jan 22, 2020 · 3 comments

@kevinfodness
Member

Currently, SearchPress indexes the raw contents of post_content, which historically wasn't much of an issue because the only thing that wasn't indexed was the rendered output of shortcodes. However, now that Gutenberg is used more heavily, the fact that Gutenberg markup (comment tags, for example) is stored in the Elasticsearch index is problematic for a few reasons:

  1. It bloats the index by adding data that doesn't need to be in there;
  2. It interferes with highlighting, because highlights can land inside comment tags, which breaks them on render;
  3. Any dynamically rendered blocks (that is, blocks that render via PHP rather than saving markup in post_content, which is the case for a lot of custom blocks) will not have their actual content saved in the index, and will therefore not be searchable.

I recommend that we run the the_content filter against post_content before sending it up to the index.

@kevinfodness
Member Author

As it turns out, this is possible already by setting the 'sp_post_index_filtered_data' filter to return true, but it should be default behavior now that Gutenberg has been merged to core.
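
For anyone who wants to opt in today, the filter mentioned above can be switched on with a one-liner. This is a minimal sketch: the filter name is taken from this comment, `add_filter()` and `__return_true()` are WordPress core functions, and where the line lives (a theme's functions.php or a small site plugin) is an assumption.

```php
<?php
// Ask SearchPress to index filtered (rendered) post data instead of raw post_content.
// Assumption: this runs early enough (e.g. on plugin load) to be registered before indexing.
add_filter( 'sp_post_index_filtered_data', '__return_true' );
```

Posts indexed before the filter was added would presumably need to be reindexed to pick up the rendered content.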

@jakewrfoster
Member

@kevinfodness I have a use case for integrating block content into a custom field (perhaps this is the route forward rather than simply filtering the content). I see the comment that @mboynes left on #119. OOC, where was this left off? Have you done any more thinking or written anything else as a result?

@kevinfodness
Member Author

I ended up using the sp_post_index_filtered_data filter in my case to ensure that it rendered out the Gutenberg blocks. Personally, I would be in favor of doing this by default whenever Gutenberg is active for the post in question, since there is a pretty significant difference in how data is stored in the post content field in Gutenberg vs. the classic editor. To Boynes' point, we could run the filter inside a loop (that is, set up a loop just so the filter has one to run in). It may not be foolproof - we have similar issues with running the the_content filter in the Publish to Apple News plugin, where certain plugins don't register filters if the request isn't running on the front-end - but it should be an improvement over the unfiltered behavior.
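
A rough sketch of what that could look like, as one possible reading of the "run it in a loop" idea rather than anything SearchPress does today. The function name `myprefix_render_content_for_index` is hypothetical; `has_blocks()`, `setup_postdata()`, and `apply_filters()` are WordPress core functions. It assumes that pointing the global `$post` at the post being indexed is enough loop context for most `the_content` callbacks, which, as noted above, won't hold for every plugin.

```php
<?php
/**
 * Hypothetical helper: render block content for indexing.
 *
 * @param WP_Post $post_to_index The post about to be sent to the index.
 * @return string Rendered content for block posts, raw post_content otherwise.
 */
function myprefix_render_content_for_index( WP_Post $post_to_index ) {
	// Leave classic editor content untouched; has_blocks() checks for block markup.
	if ( ! has_blocks( $post_to_index ) ) {
		return $post_to_index->post_content;
	}

	global $post;
	$previous_post = $post; // Remember whatever global post was set before.

	// Simulate the loop so callbacks that read the global post see the right one.
	$post = $post_to_index;
	setup_postdata( $post );

	$rendered = apply_filters( 'the_content', $post_to_index->post_content );

	// Restore the previous global state.
	$post = $previous_post;
	if ( $previous_post instanceof WP_Post ) {
		setup_postdata( $previous_post );
	}

	return $rendered;
}
```

Saving and restoring the global by hand, rather than calling `wp_reset_postdata()`, is a deliberate choice here: indexing may run from cron or WP-CLI, where there may be no meaningful main query to reset to.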
