
feat: add table eligible_deals #485

Merged · 5 commits · Jan 9, 2025

Conversation

@bajtos (Member) commented Jan 8, 2025

We need to enhance the list of deals eligible for retrieval testing with new columns (piece_cid, piece_size). This change involves several components, so it would be tricky to deploy incrementally without breaking spark-api and fil-deal-ingester.

Since the current table retrievable_deals has a confusing name, I decided to create a new table eligible_deals that we can incrementally populate without breaking the existing operations, and switch over when everything is ready.

This pull request adds a DB migration script to set up the new table.

Links:

@bajtos bajtos requested a review from juliangruber January 8, 2025 08:58
Signed-off-by: Miroslav Bajtoš <[email protected]>
@bajtos bajtos force-pushed the feat-eligible-deals-with-piece-info branch from fed4add to 41d8d50 on January 8, 2025 09:00
@bajtos bajtos requested a review from NikolasHaimerl January 8, 2025 09:01
@bajtos bajtos marked this pull request as ready for review January 8, 2025 09:03
@juliangruber (Member)
> This pull request adds a DB migration script to set up the new table.

What is the motivation for not including the logic for populating the table in this PR as well?

@bajtos (Member, Author) commented Jan 8, 2025

> This pull request adds a DB migration script to set up the new table.
>
> What is the motivation for not including the logic for populating the table in this PR as well?

Great question.

First of all, the table is populated by fil-deal-ingester, see filecoin-station/fil-deal-ingester#30

In the past, I always included ~10k deals to seed the table. Eventually, we ended up with more than 5 old schema migration scripts populating 10k deals that were deleted by subsequent migration scripts. Because DB schema migrations are immutable, we have to keep them around forever. I find that rather inefficient.

So this time, I want to try something else:

  • Keep the table empty,
  • If tests need some deals in the DB, populate the deals from a test data builder script.

These test data builders can be updated as we evolve the schema.
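To illustrate the test-data-builder idea, here is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for the real Postgres one. All column names except piece_cid and piece_size are assumptions, and given_eligible_deal is a hypothetical helper, not code from this repo:

```python
import sqlite3

# Hypothetical defaults; only piece_cid and piece_size are named in the PR.
DEFAULTS = {"miner_id": "f0123", "piece_cid": "baga...", "piece_size": 34359738368}

def given_eligible_deal(conn, **overrides):
    """Insert one eligible deal, filling unspecified columns with defaults."""
    row = {**DEFAULTS, **overrides}
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO eligible_deals ({columns}) VALUES ({placeholders})",
        tuple(row.values()),
    )

# Usage in a test setup:
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eligible_deals (miner_id TEXT, piece_cid TEXT, piece_size INTEGER)"
)
given_eligible_deal(conn, piece_size=8589934592)
count = conn.execute("SELECT count(*) FROM eligible_deals").fetchone()[0]
print(count)  # prints 1
```

Because every test goes through the builder, a schema change only requires updating the builder's defaults, not a frozen migration script.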

Now that I've written this, I realised a potential flaw:

  • In the current setup with 10k seed deals in the DB, any new migration modifying the table of eligible deals is automatically exercised against real rows just by running the full migration chain from an empty DB to the latest schema version.
  • In the new setup, migrations modifying eligible_deals run against an empty table, so we won't catch certain classes of errors, such as adding a new column with a NOT NULL constraint and no default value.

In that light, I am going to add 10k rows to eligible_deals to seed the new table.
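The failure mode described above can be reproduced in miniature. The sketch below uses Python with SQLite (the real setup is Postgres; column names are assumptions, and the "add a column" step is written as the rebuild-and-copy pattern). The same broken migration passes silently against an empty table, while seeded rows surface the missing default as a constraint violation:

```python
import sqlite3

def run_broken_migration(seed_rows):
    """Rebuild eligible_deals with a new NOT NULL column and no default."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE eligible_deals (miner_id TEXT, piece_cid TEXT)")
    conn.executemany("INSERT INTO eligible_deals VALUES (?, ?)", seed_rows)
    # The buggy migration: the new piece_size column is NOT NULL,
    # but existing rows have no value to put into it.
    conn.execute(
        """CREATE TABLE eligible_deals_v2 (
               miner_id TEXT,
               piece_cid TEXT,
               piece_size INTEGER NOT NULL
           )"""
    )
    conn.execute(
        "INSERT INTO eligible_deals_v2 SELECT miner_id, piece_cid, NULL "
        "FROM eligible_deals"
    )
    return "ok"

print(run_broken_migration([]))  # empty table: prints "ok", bug goes unnoticed
try:
    run_broken_migration([("f0123", "baga...")])
except sqlite3.IntegrityError as err:
    print("seeded table caught the bug:", err)
```

This is exactly the class of error that running the migration chain against ~10k seed rows catches for free.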

@bajtos bajtos force-pushed the feat-eligible-deals-with-piece-info branch from 48a9653 to 7c365eb on January 8, 2025 13:23
@bajtos
Copy link
Member Author

bajtos commented Jan 8, 2025

@juliangruber PTAL again

@juliangruber (Member)
Got it. Again, it was hard to reason about the changes because another repository is responsible for managing the table's data. Can you please add a comment to either migration file, so that a reader can understand where the production data will come from?

@juliangruber (Member) commented Jan 9, 2025

> Got it. Again, it was hard to reason about the changes because another repository is responsible for managing the table's data. Can you please add a comment to either migration file, so that a reader can understand where the production data will come from?

For clarity, I would prefer if the script managing the table's data and the DB schema lived in the same repo. This could, for example, be achieved by having a deals service that takes care of ingestion and exposes the data via a REST API. Or the deal ingester could be another submodule of spark-api.

This comment doesn't affect this PR.

@bajtos (Member, Author) commented Jan 9, 2025

> Got it. Again, it was hard to reason about the changes because another repository is responsible for managing the table's data. Can you please add a comment to either migration file, so that a reader can understand where the production data will come from?
>
> For clarity, I would prefer if the script managing the table's data and the DB schema lived in the same repo. This could, for example, be achieved by having a deals service that takes care of ingestion and exposes the data via a REST API. Or the deal ingester could be another submodule of spark-api.

I agree this is not optimal.

For Spark v2, I envision a new component that will manage the list of eligible deals and provide a REST API to create a sample of active deals. This will allow us to simplify spark-api - the round tracker will call that new REST API to obtain a list of tasks for the newly started round and we won't need to manage the table with eligible deals in spark-api at all.

We will start working on that new component very soon - see Deal Observer: initial implementation (deal activation events).

For now (Spark v1 and v1.5), we need to keep the existing table and fil-deal-ingester service. Having written that, I can imagine bringing fil-deal-ingester into this repository, so that we can make changes to the database schema and the code writing to the tables together in one PR. On the other hand, there were only 6 migration scripts touching the list of eligible deals in the last year, the last one in August 2024.

@juliangruber WDYT? Is it worth investing our time into improving this area?

Signed-off-by: Miroslav Bajtoš <[email protected]>
@bajtos bajtos requested a review from juliangruber January 9, 2025 12:18
@juliangruber (Member)
Sounds great! I think it's fine to keep things separated for now, as Spark v2 will revisit this anyway.

@bajtos bajtos merged commit aaa852b into main Jan 9, 2025
8 checks passed
@bajtos bajtos deleted the feat-eligible-deals-with-piece-info branch January 9, 2025 12:49
Labels: none yet
Projects: Status: ✅ done
2 participants