Move filtered index creation totally to Airflow #3240

Open
krysal opened this issue Oct 23, 2023 · 3 comments
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: ingestion server Related to the ingestion/data refresh server ⛔ status: blocked Blocked & therefore, not ready for work 🔧 tech: airflow Involves Apache Airflow 🐍 tech: python Involves Python

Comments

@krysal
Member

krysal commented Oct 23, 2023

Problem

Currently, the DAGs that create the filtered indexes (for image and audio) depend on the Ingestion Server. There is no reason we cannot leave all of that work to Airflow. Having fewer moving parts would also make it easier to debug when things go wrong.

Description

Move the create_and_populate_filtered_index function out of the Ingestion Server to the create filtered index DAG in the Catalog.

def create_and_populate_filtered_index(
    self,
    model_name: str,
    origin_index_suffix: str | None = None,
    destination_index_suffix: str | None = None,
    **_,
):

Additional context

This will be required down the line for other DAGs in the Search relevancy sandbox project.

@krysal krysal added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository 🧱 stack: ingestion server Related to the ingestion/data refresh server 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Oct 23, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Oct 23, 2023
@krysal krysal added 🐍 tech: python Involves Python 🔧 tech: airflow Involves Apache Airflow ⛔ status: blocked Blocked & therefore, not ready for work labels Oct 25, 2023
@AetherUnbound AetherUnbound removed the ⛔ status: blocked Blocked & therefore, not ready for work label Nov 21, 2023
@AetherUnbound AetherUnbound moved this from 📋 Backlog to 📅 To Do in Openverse Backlog Nov 21, 2023
@AetherUnbound
Collaborator

Just noting for this that we'll want to make these values configurable:

requests_per_second=15_000,
# Temporary workaround to allow the action to complete.
request_timeout=48 * 3600,
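
For illustration only, here is a minimal sketch of one way these two values could be made configurable while keeping the current numbers as defaults. It assumes the overrides come from environment variables. The `ReindexSettings` class and both variable names are hypothetical and do not exist in the repository, and an Airflow Variable or DAG param could fill the same role.

```python
import os
from dataclasses import dataclass

# Defaults copied from the hard-coded values quoted above.
DEFAULT_REQUESTS_PER_SECOND = 15_000
DEFAULT_REQUEST_TIMEOUT = 48 * 3600  # seconds

# Hypothetical variable names, chosen only for this sketch.
RPS_VAR = "FILTERED_INDEX_REQUESTS_PER_SECOND"
TIMEOUT_VAR = "FILTERED_INDEX_REQUEST_TIMEOUT"


@dataclass(frozen=True)
class ReindexSettings:
    """Hypothetical container for the throttle and timeout values."""

    requests_per_second: int = DEFAULT_REQUESTS_PER_SECOND
    request_timeout: int = DEFAULT_REQUEST_TIMEOUT

    @classmethod
    def from_env(cls, env=None):
        """Read overrides from a mapping (os.environ by default)."""
        env = os.environ if env is None else env
        return cls(
            requests_per_second=int(env.get(RPS_VAR, DEFAULT_REQUESTS_PER_SECOND)),
            request_timeout=int(env.get(TIMEOUT_VAR, DEFAULT_REQUEST_TIMEOUT)),
        )

    def as_kwargs(self):
        """Return the values as keyword arguments for the reindex call."""
        return {
            "requests_per_second": self.requests_per_second,
            "request_timeout": self.request_timeout,
        }
```

The task that performs the reindex could then pass `**ReindexSettings.from_env().as_kwargs()` instead of the literals, so an environment can lower the throttle without a code change.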

@sarayourfriend
Collaborator

Linking this to #3336 as they are relevant to each other.

@stacimc stacimc added the ⛔ status: blocked Blocked & therefore, not ready for work label Feb 21, 2024
@openverse-bot openverse-bot moved this from 📅 To Do to ⛔ Blocked in Openverse Backlog Feb 21, 2024
@stacimc
Collaborator

stacimc commented Feb 21, 2024

This is blocked on #3336. If that work goes forward, we will remove the filtered index entirely and this work will not be necessary.

I'm also going to remove it from the search relevancy milestone as it should not be a requirement for that project to be resolved.

Projects
Status: ⛔ Blocked
Development

No branches or pull requests

4 participants