Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1/n] Add with_source_code_references fn to add source location metadata to assets #21465

Merged
merged 8 commits into from
May 8, 2024

Conversation

benpankow
Copy link
Member

@benpankow benpankow commented Apr 26, 2024

Summary

Rework of #12096. Adds a with_source_code_references wrapper which attaches source code location metadata to the assets which it wraps.

Stacked PRs handle displaying it in the UI.

Right now, the code source metadata is structured as a list of file references, each includes a path to the source file and and line number, along with an optional label to be shown in the UI. Users can choose to manually add these to their asset definitions, either via decorator or as part of an asset factory.

@asset(
    metadata={
        **CodeReferencesMetadataSet(
            source_data=CodeReferencesMetadataValue(
                code_references=[
                    LocalFileSource(
                        file_path=..., line_number=...
                    ),
                    LocalFileSource(
                        label="SQL file", file_path=..., line_number=...
                    )
                ]
            )
        )
    }
)
def my_multiple_source_asset() -> pd.DataFrame: ...

The with_source_code_references wrapper can be used to automatically attach local file paths and line numbers to assets using inspect. This behavior may later become default-on, but for now is explicitly opt-in.

@asset 
def my_asset() -> pd.DataFrame:
  ...

@asset 
def my_other_asset() -> pd.DataFrame:
  ...

defs = Definitions(
  assets=with_source_code_references([my_asset, my_other_asset])
)

Test Plan

New unit tests.

@benpankow benpankow marked this pull request as draft April 26, 2024 21:30
@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from ebafe76 to e032f1d Compare April 26, 2024 23:27
@benpankow benpankow marked this pull request as ready for review April 26, 2024 23:28
@benpankow benpankow requested a review from sryza April 29, 2024 15:47
@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from e032f1d to d0ff45e Compare April 29, 2024 15:50
@benpankow benpankow requested a review from rexledesma April 29, 2024 16:36
@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from 04406c7 to 859f329 Compare April 29, 2024 22:20
@@ -257,6 +257,8 @@ export const MetadataEntry = ({
</IconButton>
</Group>
);
case 'SourceMetadataEntry':
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minimal change to placate bk, actual tsx changes are stacked

Copy link

github-actions bot commented Apr 29, 2024

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-doucdpro0-elementl.vercel.app
https://benpankow-add-code-source-metadata-wrapper.core-storybook.dagster-docs.io

Built with commit 8d6aefa.
This pull request is being automatically deployed with vercel-action

@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from 0b354bc to d46bc1d Compare April 30, 2024 17:29
@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from d46bc1d to bda13a9 Compare May 2, 2024 00:08
@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from bda13a9 to 9fb26a5 Compare May 2, 2024 00:23
@benpankow benpankow changed the base branch from master to benpankow/restructure-metadata-py May 2, 2024 00:23
@benpankow benpankow requested a review from sryza May 2, 2024 00:24
@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from 0a18e77 to f7932d1 Compare May 2, 2024 00:54
Base automatically changed from benpankow/restructure-metadata-py to master May 2, 2024 16:00
benpankow added a commit that referenced this pull request May 2, 2024
## Summary

Some restructuring/cleanup to make #21465 a bit easier
@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from f7932d1 to 4afa85d Compare May 2, 2024 16:01
@benpankow benpankow requested a review from sryza May 2, 2024 18:02
@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from 0a9304d to c9a6c28 Compare May 2, 2024 18:03
@schrockn schrockn requested review from schrockn and removed request for schrockn May 2, 2024 19:28
@benpankow
Copy link
Member Author

PR is updated to move the label to the LocalFileCodeReference class & just use a list instead. Updated description to match.

have it default to the last component of the file path (e.g. if the file is /at/some/folder/foo.py the label would default to foo.py), which I think will be a suitable default in 99% of cases

This is a good idea, will incorporate in downstream UI PRs.

While I think with_source_code_references is a good approach to experiment with and opt-in into this feature, I believe our goal should be to have this applied by default.

Agreed - right now the feature is explicitly opt-in because of the potential cost of inspect calls (& because the output filepaths/line numbers may be misleading for more advanced cases like asset factories), but once we stabilize the impl I think it makes sense to roll out by default.

@benpankow benpankow requested review from schrockn and sryza May 6, 2024 17:51
@benpankow benpankow dismissed stale reviews from schrockn and sryza May 6, 2024 17:52

queue

@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch 2 times, most recently from 8e20595 to fceb4be Compare May 6, 2024 23:22
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lovely. Thanks for working through this.

@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from fceb4be to 33d4e57 Compare May 7, 2024 17:15
Copy link

github-actions bot commented May 7, 2024

Deploy preview for dagster-university ready!

✅ Preview
https://dagster-university-912my3521-elementl.vercel.app
https://benpankow-add-code-source-metadata-wrapper.dagster-university.dagster-docs.io

Built with commit 33d4e57.
This pull request is being automatically deployed with vercel-action

@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from 33d4e57 to 180d3e7 Compare May 7, 2024 23:23
@benpankow
Copy link
Member Author

benpankow commented May 8, 2024

Merge activity

  • May 8, 8:36 AM PDT: @benpankow started a stack merge that includes this pull request via Graphite.
  • May 8, 8:38 AM PDT: Graphite rebased this pull request as part of a merge.
  • May 8, 8:39 AM PDT: @benpankow merged this pull request with Graphite.

@benpankow benpankow force-pushed the benpankow/add-code-source-metadata-wrapper branch from 180d3e7 to 8d6aefa Compare May 8, 2024 15:37
@benpankow benpankow merged commit f8ba248 into master May 8, 2024
1 of 2 checks passed
@benpankow benpankow deleted the benpankow/add-code-source-metadata-wrapper branch May 8, 2024 15:39
benpankow added a commit that referenced this pull request May 8, 2024
## Summary & Motivation

Displays a button link to view an asset in the user's editor, for code link metadata specified in #21465.
Also hides this metadata from the table.



```python

@asset
def my_asset() -> pd.DataFrame: ...


with_code_source([my_asset])
```

<img width="955" alt="Screenshot 2024-04-26 at 3 41 53 PM" src="https://github.com/dagster-io/dagster/assets/10215173/8d846b4d-377b-4a4e-a9bd-69c30eb9bee4">




If the user specifies multiple files, they are displayed with a dropdown:


```python

@asset(
    metadata={
        **SourceDataMetadataSet(
            source_paths={
                DEFAULT_SOURCE_FILE_KEY: SourcePathMetadataSet(
                    path_to_module=..., path_from_module=..., line_number=...
                ),
                "SQL model": SourcePathMetadataSet(
                    path_to_module=..., path_from_module=..., line_number=...
                ),
            }
        )
    }
)
def my_two_source_asset() -> pd.DataFrame: ...
```

<img width="248" alt="Screenshot 2024-04-26 at 4 23 55 PM" src="https://github.com/dagster-io/dagster/assets/10215173/16ec33ef-8ade-4b52-97ad-6b0a491b426e">
danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jun 18, 2024
## Summary

Some restructuring/cleanup to make dagster-io#21465 a bit easier
danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jun 18, 2024
…adata to assets (dagster-io#21465)

## Summary

Rework of dagster-io#12096. Adds a `with_source_code_references` wrapper which attaches source code location metadata to the assets which it wraps. 

Stacked PRs handle displaying it in the UI.

Right now, the code source metadata is structured as a list of file references, each includes a path to the source file and and line number, along with an optional label to be shown in the UI. Users can choose to manually add these to their asset definitions, either via decorator or as part of an asset factory.

```python
@asset(
    metadata={
        **CodeReferencesMetadataSet(
            source_data=CodeReferencesMetadataValue(
                code_references=[
                    LocalFileSource(
                        file_path=..., line_number=...
                    ),
                    LocalFileSource(
                        label="SQL file", file_path=..., line_number=...
                    )
                ]
            )
        )
    }
)
def my_multiple_source_asset() -> pd.DataFrame: ...
```

The `with_source_code_references` wrapper can be used to automatically attach local file paths and line numbers to assets using `inspect`. This behavior may later become default-on, but for now is explicitly opt-in.

```python
@asset 
def my_asset() -> pd.DataFrame:
  ...

@asset 
def my_other_asset() -> pd.DataFrame:
  ...

defs = Definitions(
  assets=with_source_code_references([my_asset, my_other_asset])
)
```



## Test Plan

New unit tests.
danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jun 18, 2024
## Summary & Motivation

Displays a button link to view an asset in the user's editor, for code link metadata specified in dagster-io#21465.
Also hides this metadata from the table.



```python

@asset
def my_asset() -> pd.DataFrame: ...


with_code_source([my_asset])
```

<img width="955" alt="Screenshot 2024-04-26 at 3 41 53 PM" src="https://github.com/dagster-io/dagster/assets/10215173/8d846b4d-377b-4a4e-a9bd-69c30eb9bee4">




If the user specifies multiple files, they are displayed with a dropdown:


```python

@asset(
    metadata={
        **SourceDataMetadataSet(
            source_paths={
                DEFAULT_SOURCE_FILE_KEY: SourcePathMetadataSet(
                    path_to_module=..., path_from_module=..., line_number=...
                ),
                "SQL model": SourcePathMetadataSet(
                    path_to_module=..., path_from_module=..., line_number=...
                ),
            }
        )
    }
)
def my_two_source_asset() -> pd.DataFrame: ...
```

<img width="248" alt="Screenshot 2024-04-26 at 4 23 55 PM" src="https://github.com/dagster-io/dagster/assets/10215173/16ec33ef-8ade-4b52-97ad-6b0a491b426e">
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants