[81] - add worker for media clustering #379

Merged 14 commits on Sep 9, 2024

Snehil-Shah
Contributor

@Snehil-Shah Snehil-Shah commented Sep 1, 2024

Resolves #378

Summary: [WIP]

  • add Dockerfile
  • add config
  • add worker logic
  • add payload writer for testing

Comment on lines 152 to 157
# init all operators
audio_vec_embedding_clap.initialize(param={})
vid_vec_rep_clip.initialize(param={})
classify_video_zero_shot.initialize(param={})
cluster_embeddings.initialize(param={})
# dimension_reduction.setup_reduction(model_type='tsne', params={})
Contributor Author

@Chaithanya512 I have commented out the dimension reduction operator because it does not follow Feluda's operator interface conventions, so it threw an error when Feluda was initialized with a config file. Every operator must expose only the initialize and run methods, so the dimension_reduction operator needs some refactoring.
That aside, the worker now clusters correctly, as tested with the payload writer.
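
For readers unfamiliar with the convention, here is a minimal sketch of what "only initialize and run" means in practice. It is not Feluda's actual code: `follows_operator_interface`, `reduction_operator`, and `legacy_operator` are hypothetical names, and the `run` body is a placeholder rather than a real reduction algorithm.

```python
from types import SimpleNamespace


def follows_operator_interface(operator) -> bool:
    """Return True if `operator` exposes callable `initialize` and `run`."""
    return all(
        callable(getattr(operator, name, None)) for name in ("initialize", "run")
    )


# Hypothetical stand-in for a refactored operator module.
_state = {}


def initialize(param):
    # Store settings once, before any call to run().
    _state["model_type"] = param.get("model_type", "tsne")


def run(embeddings):
    # Placeholder only: keep the first two components of each vector.
    return [list(vector[:2]) for vector in embeddings]


reduction_operator = SimpleNamespace(initialize=initialize, run=run)

# Hypothetical stand-in for the pre-refactor shape, which has a differently
# named setup function (as in the commented-out setup_reduction call).
legacy_operator = SimpleNamespace(
    setup_reduction=lambda model_type, params: None, run=run
)
```

With this shape, a loader that calls `initialize(param={})` on every configured operator works for `reduction_operator` but fails for `legacy_operator`, which matches the kind of error described above.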

@Chaithanya512
Contributor

  • Refactored dimension reduction operator to align with Feluda interface.
  • Migrated CLAP module to Hugging Face Transformers to reduce unnecessary dependencies.
  • Enabled support for dimension reduction operator in clustering_media worker.

@aatmanvaidya Both the dimension reduction operator and CLAP operator have passed all unit tests, and the clustering_media worker is functioning correctly for clustering and reduction, as tested with the payload writer.

@aatmanvaidya aatmanvaidya self-requested a review September 9, 2024 06:36
@aatmanvaidya aatmanvaidya marked this pull request as ready for review September 9, 2024 06:36
@aatmanvaidya
Collaborator

@Snehil-Shah @Chaithanya512
I have reviewed the PR and tested the worker locally; everything works as expected!

Just one small thing: we should add a try/except block around the code where clustering and reduction happen. I am attaching the lines below.
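
The lines attached in the review are not reproduced in this thread, so the following is only a hypothetical sketch of how the clustering and reduction steps might be guarded. `cluster_and_reduce`, `cluster_fn`, and `reduce_fn` are assumed names; the worker's real call signatures may differ.

```python
import logging

logger = logging.getLogger("clustering_media_worker")  # hypothetical logger name


def cluster_and_reduce(cluster_fn, reduce_fn, embeddings):
    """Run both steps; return (clusters, reduced), or None if either step fails."""
    try:
        clustering_results = cluster_fn(embeddings)
        reduction_results = reduce_fn(embeddings)
    except Exception:
        # Log the full traceback and let the caller decide how to report failure.
        logger.exception("Clustering or dimension reduction failed")
        return None
    return clustering_results, reduction_results
```

Returning `None` on failure is one design choice among several; a worker could instead publish a failure report to the queue so that the message is not silently dropped.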

Also, I tried printing the report like this:

report = make_report_indexed(clustering_results_json, dim_reduction_results_json, "indexed")
print(report)

Nothing was printed in my terminal, which is a bit strange, but I was able to verify the JSON structure of the report using the RabbitMQ UI.
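
The thread does not establish why nothing was printed. One possible cause, offered only as an assumption, is buffered standard output (common when a worker runs inside a container). A minimal sketch that serializes the report and flushes explicitly is shown here; `show_report` is a hypothetical helper, not part of the worker.

```python
import json
import sys


def show_report(report, stream=sys.stdout):
    """Serialize `report` as JSON, write it to `stream`, and flush immediately."""
    # default=str keeps non-JSON-serializable values from raising an error.
    text = json.dumps(report, indent=2, default=str)
    print(text, file=stream, flush=True)
    return text
```

If the output still does not appear, the print call is probably not being reached at all, which would point to a different cause than buffering.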

@aatmanvaidya aatmanvaidya merged commit 4abae4d into tattle-made:development Sep 9, 2024
3 of 4 checks passed
@aatmanvaidya aatmanvaidya linked an issue Sep 9, 2024 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Create a Worker for Clustering Media
3 participants