This repository has been archived by the owner on May 15, 2024. It is now read-only.

Manual flush command for Motion #225

Open
xinaxu opened this issue Nov 13, 2023 · 4 comments

@xinaxu
Collaborator

xinaxu commented Nov 13, 2023

Context

Motion automatically flushes data into sectors once the current pending size exceeds MOTION_SINGULARITY_PACK_THRESHOLD.
Otherwise, it waits for 24 hours (hardcoded) before it flushes that data into deals.

The impact of the current approach is that anyone trying out Motion with small files won't see any deals being made or proposed until 24 hours later.

Solution Choices

Make default flush interval configurable

Instead of hardcoding 24 hours, make the interval configurable via a flag or environment variable.

Manual flush API for motion

Add a new API in Motion so that new users can call it to flush all pending items into deals.

Use low-level Singularity API

Document the PrepareSource API in Singularity so that new users can call it to flush all pending items into deals.

@xinaxu
Collaborator Author

xinaxu commented Nov 13, 2023

Make default flush interval configurable

This should happen regardless of how we decide to provide a flush API.

Manual flush API for motion

This somewhat contradicts Motion's goal of encapsulating complexity and offering a few very simple-to-use APIs. In fact, if we allow this granular operation, we open the gates to all other granular operations, e.g. changing the deal schedule interval, adding another SP, or flushing all deals to an SP at once instead of one per minute.
Hence I'm leaning more toward documenting an already existing Singularity API: https://data-programs.gitbook.io/singularity/web-api-reference/job#preparation-id-source-name-finalize

@masih
Member

masih commented Nov 13, 2023

My vote is to do one and two: make the flush time configurable and introduce manual flush in Motion under administrative APIs.

Rationale: fixed configurations are inevitably less than ideal for some environments. We need manual flush anyway during shutdown. An admin might want the data to be flushed out on service shutdown, for example due to maintenance of a node.

@elijaharita
Contributor

elijaharita commented Nov 14, 2023

To note, the duration is already configurable; 24h is just the default. Use --singularityForcePackAfter= or the env var MOTION_SINGULARITY_FORCE_PACK_AFTER=

@xinaxu
Collaborator Author

xinaxu commented Nov 14, 2023

Let me step back and align on our understanding of what flush means:

My current thinking is that this operation is just a change of state in the database and will not push data or deals to SPs. If that's the expectation, then I'd prefer a different word for this operation, e.g. pack or packAll. I am also concerned that this operation is very specific to the Singularity implementation and that its meaning is not clear for other implementations like jiffy.

If we are talking about finishing all deal packing and deal making with the flush API, then we have more work to do. This is also closer to what users expect from the word "flush", and we can split it into two: FlushPacking and FlushDealMaking. The first completes all data prep and CommP, and the second completes all deal proposals to all SPs in one go.
