Search relevancy sandbox #392
Comments
Update 2023-04-12: Done | Next | Blockers |
Update 2023-05-02: Done | Next |
Both the rapid iteration IP (#1985) and the staging database recreation DAG (#1989) are underway. Most of the implementation for the DAG is complete, with #2207 and #2211 being the final two pieces before that can be shipped. We'll also need to complete #1990 before we turn the DAG on for regular use. |
The staging database restore DAG is complete! 🥳 I will enable it for the first time once I'm back from WCEU next week. |
The implementation plan for the rapid iteration (#2133) has been merged and the associated issues (#2370, #2371, #2372) have been created. This work must be performed serially, but it can begin immediately! The initial implementation plan for the third portion of this project has also been published by @krysal in #2358. |
Hi @AetherUnbound, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information. |
The staging database restore DAG work was completed and was enabled for the first time this week! While it completed successfully, @sarayourfriend encountered some trouble with the Terraform state which was referencing the RDS instance (in that the instance appeared to Terraform to be removed, even though a new instance was spun up). Sara corrected the state file and we believe this issue is resolved, but we'll need to monitor the next run to see if we encounter the same issue and adjust either Terraform or the DAG accordingly. @krysal opened the final IP for this work in #2358, and we've begun discussing it there. |
After the subsequent runs, the changes in Terraform work as expected; no further work is needed here. |
Per the priorities meeting discussion that happened around this project, we'll be putting this project on hold for now. We'll continue the review process to merge #2358, but hold off on any ongoing development efforts in favor of other ongoing projects. |
Hi @AetherUnbound, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information. |
We've just pulled this project off of on-hold - the Elasticsearch rapid iteration milestone issues can be worked on once we address the API stability! |
Hi @AetherUnbound, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information. |
The work on this project has been slower due to our current effort to focus on #3197. With that work slowing down, we should be able to start working on this project again this cycle. |
Hi @AetherUnbound, this project has not received an update comment in 14 days. Please leave an update comment as soon as you can. See the documentation on project updates for more information. |
The only update is that we now have a PR open for creating the staging indices: #3232 |
While working on #3232, I went back and read the implementation plans for this project and came away with a lot of questions. It's possible some of this may be answered in comment threads on the IP PRs that didn't make it into the final text, so apologies if I'm rehashing anything! Here's a really brief summary of my understanding of the three IPs involved in this project, and the DAGs they describe:
So here are my questions:
|
I'll try my best to answer the above, @krysal may also be able to provide some additional context!
Part of that move was predicated on the API response time investigation project, and the motivation was, as you mention, to reduce pressure on the live indices while the data refresh is happening. Since we were able to reduce response times through a combination of related media query simplification and the API ASGI implementation, I think we should be safe to target live indices for this DAG in the future. With the frequency of the data refresh (and the ease with which we could change the order of operations for it!), it made sense to make that change. For the more on-the-fly index generation that this new DAG was intended to enable, I'm not sure how we'd get around using the live indices (unless we decided to create a whole new index to use as the source, in which case we should just create the index from the database with the new settings). All that to say: given how infrequently this particular DAG is likely to run and how much our Elasticsearch response time has improved, I think this should be a safe thing to do now 🙂
That's a great point, I like the idea of adding those new configuration options!
As I understand it,
@sarayourfriend also mentioned this in the third IP. I also support having separate DAGs for these but as we're developing them we should keep reuse in mind!
That could be useful! Especially for testing changes to the data refresh process. I think one thing we'd want to add on an infrastructure level is an official DNS name for the staging data refresh server, because currently we only reference it by IP, which would change every deployment. Hopefully that helps! I'd love some other folks to weigh in, but I think it might make sense to take some of these changes back to the IPs and update them appropriately once we've decided 😄 |
@stacimc You summarized the parts of these plans very well. The questions are quite valid, and I pretty much agree with the answers that @AetherUnbound has already given; I'll just add a few comments.
Exactly! Given the complexity of this project at the beginning, it was decided to split the different ways of creating the indexes (by type, environment, and size with custom configurations) into different IPs, and therefore into different DAGs; it was more understandable and manageable this way. Now that the requirements are clearer and there is more familiarity with ES, we can see opportunities to merge the DAGs, but I would wait to build at least some of them before doing so, because the many moving parts are still there and it will be easier to test them separately. That's just my opinion, though! Maybe others see it differently. I also believe the same DAGs should probably apply to both environments at the end of the day. |
Thanks @krysal and @AetherUnbound, that additional context makes a ton of sense :)
That's a great summary, totally clicked for me. I think the underlying IPs are great, but I was stuck on a higher-level picture of how these DAGs are practically going to be used and relate to one another. That's actually sort of split out into a separate, future project 😅 For now, it makes a lot of sense to just be really flexible in the implementation, as already described :) |
Update: I've opened a PR for the proportional index DAG which is almost ready, mostly needing a final test and then description/testing instructions. This PR also happens to tackle much of the work needed for this issue to add a DAG for pointing aliases; once it's merged that issue can be resolved very quickly. While implementing the proportional DAG I did notice a problem for which I filed a bug (#3761). In looking at that now I actually think it might've been easier to implement the fixed version than the original idea, so I may update the proportional index DAG PR to include that fix as well. Summary: there are only a few issues left in the milestone, the largest of which is almost fully implemented. The remaining issues should be tackled in the next week or so. |
The proportional index DAG was merged after a while in code review, including the fix for the #3761 bug that was filed while working on it! All that remains is the point alias DAG, which is in progress and a much smaller issue. |
A PR is open for the point alias DAG. While working on it I noticed that a few of the concurrency dependencies between the related Elasticsearch DAGs had not been set up properly, and this was not caught in review. That is a huge pain point of having all these DAGs, and it is very easy to miss. I prototyped an idea for handling that in a more automated way and created an issue (#3891). At minimum the dependencies need to be fixed; ideally, that issue is implemented and the problem is solved going forward. My plan is to timebox the larger issue and fall back to simply fixing the dependencies manually if there are any issues with the more complex approach. |
The final PR for this milestone has been merged! Moving this project to |
Per the project's stated success criteria, which is:
I think this could be considered a success. The process docs note that a project-specific retro could be held at this stage as well, but feedback related to this project has already been discussed at the two most recent retros. I would like to propose that this project be moved to |
I would be comfortable moving this to |
Excellent! Moving to |
Description
Modify the API staging environment to include a proportional subset of media from each provider. Increase the frequency of data refreshes.
This project does not address metrics for measuring and tracking relevancy.
To implement this project, we want to read the production /stats/ endpoints to get the media totals for each provider, then scale these numbers down to set the ingestion limit per provider in staging.
This project could also include updating the Elasticsearch cluster (or setting up a new cluster and moving staging there).
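As a rough illustration of the scaling step described above, here is a minimal Python sketch. It is not the project's implementation: the function name `staging_limits`, the `source_name` and `media_count` keys, and the sample numbers are all assumptions made for this example, and the input is a plain list standing in for whatever a production /stats/ endpoint returns.

```python
def staging_limits(stats, staging_total):
    """Scale production per-provider totals down to a staging budget.

    `stats` is a list of dicts standing in for a /stats/ response; the
    `source_name` and `media_count` keys are assumptions for this sketch.
    `staging_total` is the approximate number of records wanted in staging.
    """
    production_total = sum(entry["media_count"] for entry in stats)
    limits = {}
    for entry in stats:
        count = entry["media_count"]
        if count == 0:
            # A provider with no media in production gets nothing in staging.
            limits[entry["source_name"]] = 0
            continue
        # Keep each provider's share of the total, rounded to a whole record.
        scaled = round(count * staging_total / production_total)
        # Guarantee that small providers are still represented in staging.
        limits[entry["source_name"]] = max(1, scaled)
    return limits


if __name__ == "__main__":
    # Hypothetical totals, not real production numbers.
    sample_stats = [
        {"source_name": "flickr", "media_count": 600},
        {"source_name": "wikimedia", "media_count": 300},
        {"source_name": "smithsonian", "media_count": 100},
    ]
    print(staging_limits(sample_stats, staging_total=100))
```

Because of rounding and the minimum of one record per non-empty provider, the limits may not add up to exactly `staging_total`; whether that matters depends on how strictly the staging size needs to be controlled.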
Documents
Project homepage
Issues
Milestones
Prior Art