Commit
Adjust asset job backfills to respect backfill policies (dagster-io#21259)

## Summary & Motivation

Addresses issues discussed [here](dagster-io/internal#7417).

This PR adds a `backfill_policy` to `JobDefinition`, which is then accessed in the backfill daemon via `ExternalPartitionSet`. This allows the backfill daemon to execute a job backfill (distinct from an asset backfill) as a ranged backfill, if that is what the policy specifies.

Details:

- Only asset jobs can have a non-null backfill policy. Non-asset jobs have a `backfill_policy` of `None`, which preserves the status quo behavior.
- All assets in an asset job must have the same backfill policy. This matches the existing constraint that they must all have the same partitions definition. An error is thrown at asset job resolution if the backfill policies differ. As a result, all selected assets in a job can always be executed together in a single run. This is a tighter constraint than asset backfills have, but it is consistent with how jobs work in general.
- Passing a `PartitionedConfig` as the config for an asset job causes a resolution error if the job's backfill policy specifies ranged backfills, because `PartitionedConfig` behavior is currently undefined for ranges (this can be addressed in a future PR). Passing a static config still works.

Other notes:

- I first explored adding support for backfill policies on asset jobs by modifying the asset backfill pathway to accept a job name. This ended up being over-complicated and brittle. The approach in this PR does not add any new dependencies on the "job" concept, which is good because we want to move away from the centrality of jobs.

## How I Tested These Changes

Manually and with new unit tests. One snapshot was updated to account for the new backfill policy property on `ExternalPartitionSet`.
Manual details:

Load these definitions:

```
from dagster import (
    AssetExecutionContext,
    BackfillPolicy,
    Config,
    Definitions,
    IOManager,
    StaticPartitionsDefinition,
    asset,
    define_asset_job,
)

parts = StaticPartitionsDefinition(["a", "b"])


class DummyIOManager(IOManager):
    def __init__(self):
        super().__init__()
        self._db = {}

    def handle_output(self, context, obj) -> None:
        pass

    def load_input(self, context) -> int:
        return 1


class FooConfig(Config):
    name: str


@asset(partitions_def=parts, backfill_policy=BackfillPolicy.single_run())
def foo(context: AssetExecutionContext, config: FooConfig):
    context.log.info(config.name)
    return {"a": 1, "b": 2}


@asset(partitions_def=parts, backfill_policy=BackfillPolicy.single_run())
def bar(context: AssetExecutionContext):
    return {"a": 1, "b": 2}


asset_job = define_asset_job(
    "asset_job",
    [foo, bar],
    tags={"alpha": "beta"},
    config={"ops": {"foo": {"config": {"name": "harry"}}}},
)

defs = Definitions(
    assets=[foo, bar],
    jobs=[asset_job],
    resources={"io_manager": DummyIOManager()},
)
```

Select the `asset_job` in the UI:

<img width="1456" alt="image" src="https://github.com/dagster-io/dagster/assets/1531373/b0f7a47e-b46e-43f5-bdfc-41dc9460cb46">

Click "Materialize all" to launch a backfill. Note that it advises that backfill policies will be respected. This is false (before this PR):

<img width="797" alt="image" src="https://github.com/dagster-io/dagster/assets/1531373/2575738b-01ae-4c9f-93d9-7acfa1993b8d">

Result before this PR (2 runs launched!):

<img width="1328" alt="image" src="https://github.com/dagster-io/dagster/assets/1531373/347fff85-1fd8-4cb8-a35c-9806113f8b4e">

Result after this PR (only 1 run launched):

<img width="1328" alt="image" src="https://github.com/dagster-io/dagster/assets/1531373/941f1a24-5948-4347-99a8-8f43887b28fd">
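For readers who want the before/after run counts in concrete terms, here is a minimal, self-contained sketch of the two rules described in this PR: every asset in the job must share one backfill policy, and that policy decides how partitions are grouped into runs. It does not use or reproduce Dagster's code. `SketchPolicy`, `resolve_job_policy`, and `plan_runs` are hypothetical names invented for illustration, and chunking an ordered key list is a simplification of how ranged runs are actually formed.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


# Hypothetical stand-in for a backfill policy (NOT Dagster's BackfillPolicy).
@dataclass(frozen=True)
class SketchPolicy:
    # None means "put every requested partition into a single run".
    max_partitions_per_run: Optional[int]


def resolve_job_policy(
    asset_policies: Sequence[Optional[SketchPolicy]],
) -> Optional[SketchPolicy]:
    """Return the one policy shared by all assets, or raise if they differ."""
    distinct = set(asset_policies)
    if len(distinct) > 1:
        raise ValueError("all assets in an asset job must share one backfill policy")
    # An empty selection resolves to no policy at all.
    return next(iter(distinct), None)


def plan_runs(
    partition_keys: Sequence[str], policy: Optional[SketchPolicy]
) -> List[List[str]]:
    """Group partition keys into runs according to the resolved policy."""
    keys = list(partition_keys)
    if policy is None:
        # No policy (for example a non-asset job): one run per partition.
        return [[key] for key in keys]
    # Fall back to "everything in one run" when no per-run limit is set.
    size = policy.max_partitions_per_run or max(len(keys), 1)
    return [keys[i : i + size] for i in range(0, len(keys), size)]


# Mirrors the manual test: two assets, both with a single-run policy.
single_run = SketchPolicy(max_partitions_per_run=None)
job_policy = resolve_job_policy([single_run, single_run])
print(plan_runs(["a", "b"], job_policy))  # one run covering both partitions
print(plan_runs(["a", "b"], None))        # one run per partition
```

With a single-run policy, partitions `a` and `b` land in one run, matching the "after" screenshot. With no policy, each partition gets its own run, matching the "before" behavior.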