Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trigger path concatenator from within the matcher/merger subsystem #2527

Open
paul-butcher opened this issue Jan 24, 2024 · 4 comments
Open

Comments

@paul-butcher
Copy link
Contributor

paul-butcher commented Jan 24, 2024

Path concatenator is currently in the relation_embedder subsystem, but this is the wrong place for it as it writes to works-merged, so it belongs in the matcher/merger subsystem.

It could be triggered as part of the sendWorkOrImage function in the merger.

@paul-butcher
Copy link
Contributor Author

This is a non-trivial refactor, so I'll put in some of my thinking now so I don't have to relearn it later.

  1. move the path_concatenator code - it should be in matcher_merger
  2. move the triggering code, it is currently in router, it should be in sendWorkOrImage
  3. change the downstream notification to send work ids to merger output.

Perhaps this can be better expressed without being a separate stage, and instead being run as part of the merger?

@paul-butcher
Copy link
Contributor Author

It needs to be a (sub)stage after merger, as it relies on the right data being in the merged database.

@paul-butcher
Copy link
Contributor Author

paul-butcher commented May 22, 2024

There is a minor conflict between efficiency and purity here. The current incarnation knows that there is a collectionPath in each record it messes with, so it can notify the relationEmbedder via the pathSender.

However, I want us to be able to describe each full stage (either individual standalone apps, or a subsystem like relation_embedder or matcher_merger) as accepting and sending work ids.

As a result, the concatenator, as the final stage within the matcher_merger subsystem, should notify downstream with work ids, which will then be used by the merger to retrieve the work and notify the next stage in its own subsystem with a path.

This inefficiency is pretty minor, and it's better to be clear about boundaries.

@paul-butcher
Copy link
Contributor Author

I think it also needs to accept a work id. That way it knows what to notify downstream about if nothing changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant