
Prototype Typesense search support #245

Open
ormsbee opened this issue Dec 17, 2024 · 9 comments

Comments

@ormsbee

ormsbee commented Dec 17, 2024

Acceptance Criteria

  • Prototype a search abstraction layer for content libraries.
  • Prototype a Typesense backend.
  • Refactor Meilisearch support to be a backend.
  • Create rough estimates and development stories for full support.
  • (Question: Should a Tutor plugin be part of this story?)

The scope of this abstraction layer would be to work on browser-oriented search engines like Meilisearch. This would not try to stretch to cover more traditional search engines like Elasticsearch, since doing so would be much more work and present performance concerns.
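To make the intended scope concrete, here is one possible shape for such an abstraction layer. All names below are illustrative only; nothing here corresponds to an existing Open edX module, and the in-memory backend is just a toy reference implementation to show the contract.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical interface for the proposed abstraction layer.

    Names are illustrative; concrete backends (Meilisearch, Typesense,
    possibly Algolia) would each implement this contract.
    """

    @abstractmethod
    def index_documents(self, index: str, docs: list[dict]) -> None:
        """Add or update documents in the named index."""

    @abstractmethod
    def search(self, index: str, query: str, limit: int = 20) -> list[dict]:
        """Return matching documents, most relevant first."""


class InMemoryBackend(SearchBackend):
    """Toy reference implementation, handy as a test double."""

    def __init__(self):
        self._indexes: dict[str, dict] = {}

    def index_documents(self, index: str, docs: list[dict]) -> None:
        self._indexes.setdefault(index, {}).update({d["id"]: d for d in docs})

    def search(self, index: str, query: str, limit: int = 20) -> list[dict]:
        needle = query.lower()
        hits = [
            doc for doc in self._indexes.get(index, {}).values()
            if needle in doc.get("title", "").lower()
        ]
        return hits[:limit]
```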

The biggest challenge is likely related to tagging (quote from @bradenmacdonald):

Most current usages of Meilisearch are pretty straightforward search/filter that would be relatively easy to map onto TypeSense APIs, or Algolia for that matter. But when it comes to filtering using hierarchical tags, which can themselves be filtered by a keyword, the implementation is quite complex and Meilisearch-specific - so that's the part that would be trickiest to abstract and/or reimplement. https://github.com/openedx/frontend-app-authoring/blob/b110b6bdc9216759d509c660c025bfb8c6b973d8/src/search-manager/data/api.ts#L304-L506
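For readers unfamiliar with that code: the general idea is that each tag level is indexed with its full path, so a filter on the deepest selected level implies all of its ancestors. The sketch below shows the flavor of such a filter expression; the field names (`tags.level0`, `tags.level1`, ...) and the `"Parent > Child"` value encoding are assumptions for illustration, not the real schema, which lives in the linked api.ts.

```python
def hierarchical_tag_filter(levels: list[str], field: str = "tags") -> str:
    """Build a Meilisearch-style filter clause for one hierarchical tag path.

    Hypothetical field naming: tags.level0, tags.level1, ... where each
    deeper level stores the full "Parent > Child" path, so matching the
    deepest selected level also implies all of its ancestors.
    """
    depth = len(levels) - 1
    path = " > ".join(levels)
    return f'{field}.level{depth} = "{path}"'
```

Abstracting this means either finding equivalent filter syntax in each engine or re-modeling how tags are indexed.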

Background

MIT has expressed concerns about Meilisearch's lack of failover/high-availability. While this feature is on the Meilisearch roadmap, it does not look like it will be prioritized in the near future.

At the same time, Algolia is an extremely popular commercial search engine that Meilisearch modeled its API on top of. While nobody has expressed interest in using Algolia yet, it is a strong long term possibility.

Testing

@blarghmatey, @pdpinch: Before this work kicks off, could you please verify that Typesense will be an acceptable backend? If there's early validation you need to do at the prototype step, I'd like to get a sense of what that would look like on your side.

@ormsbee ormsbee converted this from a draft issue Dec 17, 2024
@ormsbee ormsbee changed the title Proof-of-concept TypeSense search backend implementation Prototype TypeSense search backend implementation Dec 17, 2024
@ormsbee ormsbee changed the title Prototype TypeSense search backend implementation Prototype Typesense search backend implementation Dec 17, 2024
@ormsbee ormsbee changed the title Prototype Typesense search backend implementation Prototype Typesense search support Dec 17, 2024
@ormsbee
Author

ormsbee commented Dec 17, 2024

FYI: @jmakowski1123, @kdmccormick

@bradenmacdonald
Contributor

I'd also like to know more about the root concern here: is it concern that learners and authors won't have access to search features (or e.g. the content libraries UI) during occasional downtime of a non-replicated search engine? And/or is it concern that writes (updates to the search index) will be lost during such downtime?

In other words, if we had a feature for queueing writes so they weren't ever lost, even if the search engine was occasionally unavailable, would that be "good enough"?
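The write-queueing idea could be as simple as buffering failed index updates and replaying them once the engine recovers. A minimal sketch, assuming a `client` object with an `update_documents(docs)` method that raises `ConnectionError` during downtime (both the method name and error type are illustrative, not a real client API):

```python
from collections import deque


class QueuedWriter:
    """Sketch: buffer index updates while the search engine is unreachable,
    then flush them once it recovers. Not a real Open edX component."""

    def __init__(self, client):
        self.client = client
        self.pending = deque()  # FIFO so replay preserves write order

    def write(self, docs):
        try:
            self.client.update_documents(docs)
        except ConnectionError:
            self.pending.append(docs)  # engine down: keep for later

    def flush(self):
        """Retry buffered writes, e.g. from a periodic task."""
        while self.pending:
            try:
                self.client.update_documents(self.pending[0])
            except ConnectionError:
                break  # still down; keep the queue intact for the next run
            self.pending.popleft()
```

In practice the buffer would need to be durable (e.g. a database table or celery queue) rather than in-process memory, but the contract is the same.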

@pdpinch

pdpinch commented Jan 10, 2025

This isn't just an issue for MIT. For any system expected to handle a high number of concurrent users, it's essential to have more than one node available to service requests. This is crucial for horizontal scaling and fault tolerance. While vertical scaling can be an option, horizontal scaling is more manageable and flexible, especially with variable loads.

There is no reason to complicate this with abstractions that allow for customizable search backends. We should choose a search backend that satisfies most requirements.

The platform could tolerate using Meilisearch if it's used only for this particular use case -- content library search -- but if it eventually replaces search across Open edX, Meilisearch is not a solution we would go with.

@bradenmacdonald
Contributor

My assumption has always been that search is neither an essential nor commonly used feature in Open edX. i.e. as a learner, you can log in, access your course, learn, complete exams, use the forum, view your grades, etc. whether or not the search engine is up. Search downtime degrades the experience somewhat but isn't on the critical path. (Content libraries is actually the exception, as the entire UI depends on Meilisearch but it can easily be put on its own Meilisearch instance, and author traffic is orders of magnitude less than learner traffic in most cases.)

Of course, we'd like to change that and see search improved and made more reliable. What I can tell you is as a developer the feature side of things is much easier to improve using Meilisearch than any other open source product we've looked at. And from a reliability perspective we know there is a lot of community interest in getting Meilisearch HA soon. Meilisearch is one of the most quickly-developed open source search engines in terms of regularly releasing new versions with new features and better performance.


In fact, apparently Meilisearch can be run in a cluster, but it doesn't yet have a "nice" way to do this out of the box. You have to turn on the MEILI_EXPERIMENTAL_REPLICATION_PARAMETERS setting, and send all index update tasks to each member of the cluster. Then you can use a regular load balancer to distribute read requests among the cluster members. A few options for implementing this are described in meilisearch/meilisearch#3494 and I think the simplest options could be implemented for Open edX with a relatively small lift.
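The fan-out scheme described there could be wrapped in a thin proxy: every index update goes to every cluster member, while reads go to any one of them (a real deployment would put a load balancer in front instead of picking randomly in code). This is a sketch only; the `index(...).add_documents(...)` / `search(...)` call shape follows the meilisearch-python SDK, but everything else is a simplification.

```python
import random


class MeiliClusterProxy:
    """Sketch of minimal Meilisearch 'HA': replicate writes, balance reads.

    `clients` stand in for per-node Meilisearch client objects.
    """

    def __init__(self, clients):
        self.clients = clients

    def add_documents(self, index_name, docs):
        # With MEILI_EXPERIMENTAL_REPLICATION_PARAMETERS enabled, each node
        # must receive the same update tasks in the same order.
        for client in self.clients:
            client.index(index_name).add_documents(docs)

    def search(self, index_name, query):
        # Any single replica can serve reads.
        return random.choice(self.clients).index(index_name).search(query)
```

The hard parts this glosses over are exactly the ones discussed in meilisearch/meilisearch#3494: handling a node that misses writes while down, and detecting divergence between replicas.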

I believe their cloud offering has something like this in place.

Personally, my preferred actions would be:
(1) We need people to actually test Meilisearch + Open edX at scale and let us know what its actual performance and reliability characteristics are. The easiest way to do this is simply to turn on the new features on a large instance, pointing to Meilisearch Cloud. The more complex way requires a full staging environment and load testing scripts.
(2) If self-hosted HA is required, then I'd prefer to start with attempting a lightweight "minimal HA" solution based on the approach above (since we don't need geo-replication and we can always re-create the entire search index if needed, and we can queue the writes as well, our write requirements are much less than others requesting some of the more advanced HA features).
(3) We can also evaluate TypeSense as an alternative (I have used it before), but I suspect it's going to be a lot of work to evaluate, and that it will have other shortcomings that are blockers. And we can't fully evaluate it unless we can test it at scale, which means we need the same thing as (1).

Would MIT be willing to help with (1), and if so what would be required? (Ability to toggle back and forth between Elasticsearch and Meilisearch? Some clustering/HA functionality before you're even willing to test?)

@blarghmatey

I think that your first statement is rather telling and points to a symptom of the Open edX project ecosystem that I have seen over the years; namely a tendency to build a point solution to a problem that would be more effectively addressed in a broader and more holistic manner. One of the contributing factors to that tendency is the fact that the Open edX suite of software has been developed over several years by a large and constantly changing set of developers with competing priorities. The fact that you are unsure of whether and how search is essential to the edX functionality is a problem that we, as a community, need to address first before we decide on any solutions. Choosing a technology (whether Meilisearch, Typesense, or any other option) to "do search" without fully knowing what problems and capabilities we are trying to solve for is a recipe for long-term pain.

In terms of Meilisearch specifically, the fact of the matter is that the issue asking about HA functionality has been open for almost 2 years now with no meaningful progress. The fact that there is an experimental flag with no associated documentation or best practices doesn't change that. From what I can determine based on the scant information available, the operator is expected to build their own replication and consensus implementation. That is an extremely non-trivial undertaking and would be an unreasonable expectation of any Open edX operator. This is effectively the same as suggesting that we use SQLite for the relational database. While it's possible to build a replication and fail-over mechanism, it's not part of the design of the engine. And while SQLite is a great technology with many useful applications, it is not something that is suited to the use case of a project like Open edX.

TL;DR is that we need to determine what are the applications for search across the Open edX suite of services and how can we best address those needs with a single core technology that can be integrated across the various processes that comprise a running Open edX system.

@bradenmacdonald
Contributor

bradenmacdonald commented Jan 14, 2025

I don't disagree with anything you're saying.

As you know, our more or less official plan was to roll out Meilisearch in Redwood and get feedback:

the Studio Course Search [BETA], which is disabled by default as it depends on a new search engine, Meilisearch. We encourage operators to install Meilisearch, test out this feature, and give us feedback on the viability of using Meilisearch as a replacement for Elasticsearch in future releases of Open edX.

However we received very little feedback, other than from developers who loved working with it, several enthusiastic community members who wanted to start applying Meilisearch everywhere, one major hosting provider who said they're "not that concerned about the HA problem, We don’t actually deploy that many ES clusters, even for large instances, mostly because search isn’t really a critical path in the overall experience so downtime there isn’t as terrible as it could be for Redis for example", and you who said the lack of HA was at least a significant concern if not a showstopper.

For Sumac we developed one additional new feature (content libraries v2 UI) that depends on Meilisearch, and we again hoped to get feedback from operators about this.

Now for context, it's important to note at this juncture that for the overwhelming majority of Open edX installations, the high memory usage of ElasticSearch is a very big, persistent, and real pain point, and HA is not a major concern. In addition, Elasticsearch and Opensearch are continuing to diverge, and their API differences and licensing issues can be a problem. This is why the Tutor maintainers and many others became very excited about Meilisearch. In particular, shortly before the Sumac release, the Tutor maintainers decided to just pull the trigger on something they'd said they wanted to do for a while: they implemented Meilisearch support everywhere ES is used in the core distribution and removed ES support from Tutor. I think this is a good decision in terms of benefiting most Tutor users, but it was definitely "jumping the gun" relative to our plan for a more considered and incremental rollout informed by feedback from production use.


We're still at a point where both ES and Meilisearch are officially supported by most of the search features in the platform (other than the two new ones I mentioned), and now is still a good time to evaluate TypeSense as an alternative and do a holistic evaluation of "search" requirements etc. I know @blarghmatey you've offered to help with this in the past and I'd really appreciate any work you're able to invest in this, because as you've seen most of the community seems either indifferent or happy with Meilisearch, so I don't expect a big line of other volunteers ready to work on such evaluations.

TL;DR if we want to do a holistic evaluation of search use cases and/or test out Meilisearch/TypeSense/Algolia? on actual large instance data sets, we're already starting to go against a bit of a headwind so we'll need one or two big players like MIT, Axim, 2U, etc. to make it a priority and put some resources into making it happen ASAP. I'm supportive of such an effort and willing to help.

@ormsbee
Author

ormsbee commented Jan 14, 2025

Hi folks! Thank you for pushing this conversation forward. 😄

I'd like to level set on a few things:

The importance of search functionality.

We should consider search to be critical infrastructure. It's already an important part of the student experience and a critical part of the content library experience. Per @jmakowski1123, we're only going to see more critical usage of it going forward in the student experience, so we should position our technical choices accordingly.

Point solutions vs. more holistic ones.

I agree that the project has historically had a lot of point solutions that have been generalized after the fact. That's one of the reasons I tried to engage folks in things like the Discourse thread on this topic, to try to get input from others. We had a tentative direction in March to use Meilisearch, with renewed discussions around HA concerns and Typesense as an alternative starting in August. I stated in that thread that it was likely too late to change things for Sumac, but that we would evaluate after Sumac was released.

Sumac was released. We're now re-evaluating and working towards that more holistic solution. If that means we eventually land on Typesense, then so be it. We just need to start planning for this work.

Funding

Axim can fund the development work to evaluate Typesense and do necessary development work after that. What we can't easily do is test at scale, but it seems like MIT is in a good place to be able to provide that. I think we have the right people and resources to start planning what that will look like in the Teak timeframe.

@ormsbee
Author

ormsbee commented Jan 14, 2025

In order to move this forward, I would like to request the following from you folks:

@bradenmacdonald: Much of the description for this ticket was based on stuff that you said or wrote in the past. Please add any more details and line items that you think are relevant to this work.

@blarghmatey: I know that you've previously mentioned that you planned to put aside some time to help with this evaluation. Could you please provide a description of what your evaluation will require? In particular, do you want to focus on the forums backend first, as it's the one that likely comes under the most load? Or both forums and course content indexing? Something else? Synthetic load testing, or trialing it with some live traffic?

We have two sets of work here that are both important to capture, but I don't want to conflate the two:

  1. The set of work needed to do the scale/reliability/operational evaluation.
  2. The set of work needed to fully support Typesense as a backend for the platform as a whole.

@blarghmatey: Your input here is going to be critical with item (1). We're going to want to implement enough to do the Typesense evaluation as quickly as possible, so that we can decide whether or not we need to queue up the rest of the work in the Teak timeframe.

Thank you!

@bradenmacdonald
Contributor

Please add any more details and line items that you think are relevant to this work.

I believe that forum search and learner courseware search are by far the use cases that will place the most load on the search engine, take the longest to index, and also require the most uptime. Conveniently, they are also already abstracted somewhat (you can use Elasticsearch or Meilisearch). And they also make only rudimentary use of each search engine's features (unlike, say, the complicated hierarchical tag search in Studio). Indeed, we saw how quickly the Meilisearch backend for them was developed just prior to the Sumac launch.

So my recommendation would be not to develop any new abstraction layer just yet nor worry about the more complex use cases, but to implement a minimal TypeSense backend for these two use cases, alongside the existing ElasticSearch+Meilisearch backends, and then rigorously test all three with production data: indexing speed[1], search performance, resource usage, and (if at all possible) uptime under real world conditions.
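For concreteness, a minimal TypeSense backend for these use cases would mostly amount to defining collections and pointing the existing index/search calls at them. Below is a hypothetical collection schema sketch for forum posts; the field names are mine (not the real forum search mapping), but the schema shape (`name` / `fields` / `facet` / `default_sorting_field`) follows Typesense's collections API.

```python
# Hypothetical Typesense collection schema for forum-post search.
# Field names are illustrative only.
forum_posts_schema = {
    "name": "forum_posts",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "body", "type": "string"},
        {"name": "course_id", "type": "string", "facet": True},
        {"name": "votes", "type": "int32"},
    ],
    # In Typesense, default_sorting_field must be an int32 or float field.
    "default_sorting_field": "votes",
}


def facet_fields(schema: dict) -> list[str]:
    """List the fields available for faceted filtering (e.g. per-course)."""
    return [f["name"] for f in schema["fields"] if f.get("facet")]
```

The evaluation work would then be comparing this style of setup against the equivalent ES and Meilisearch index definitions under identical load.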

Keep in mind that others have done some of these tests; for example, here you can see that for the HackerNews dataset with 1.1M documents, Meilisearch easily outperforms TypeSense despite TypeSense being an "in-memory" database. And the TypeSense test only runs on a machine with 100GB of RAM[2], whereas Meilisearch can run the same test on a 1GB RAM machine. That test used quite an old version of Meilisearch, though; I suspect the newer versions are even better.

So I'm mostly interested in learning about the indexing performance (/errors), and uptime.

It would also be great if someone could repeat the same tests with Meilisearch cloud and TypeSense cloud. After all, we are used to paying AWS for ElasticSearch, Atlas for MongoDB, and Algolia for Algolia for large instances that need HA clusters, so it makes sense to try the equivalent cloud offering for Meilisearch and TypeSense.

At the same time, and as a separate GitHub ticket from this one, I support @blarghmatey's proposal for doing an updated, comprehensive look at all the search use cases in the platform; something like this one from 7 years ago. We can also do a feature matrix of the different search engines that are under consideration but I can already tell you there is no search engine that will meet all the requirements we've already identified. This page provides a pretty fair comparison of Meilisearch and TypeSense at a feature level, though it downplays both the importance of HA and the problems people have had trying to scale TypeSense to large datasets.

Footnotes

  1. Note that the recent 1.12 version of Meilisearch, which isn't yet the default for Open edX, more or less doubled the indexing speed, so it would be best to test that version.

  2. The actual RAM required may be significantly less, but it's unclear from their test: they only ran on 100GB and 1GB machines, and TypeSense evidently can't fit that dataset within the smaller memory footprint.
