Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(datasets): Remove tracking datasets which are used in Kedro Viz Experiment Tracking #969

Merged
merged 5 commits into from
Jan 8, 2025

Conversation

ravi-kumar-pilla
Copy link
Contributor

@ravi-kumar-pilla ravi-kumar-pilla commented Dec 17, 2024

Description

Resolves kedro-org/kedro#4370

NOTE: This should be merged after kedro-datasets 6.0.0 release

Development notes

  • Remove docs references
  • Remove json_dataset and metrics_dataset
  • Remove pytests related to tracking datasets

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Updated jsonschema/kedro-catalog-X.XX.json if necessary
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes
  • Received approvals from at least half of the TSC (required for adding a new, non-experimental dataset)

@ravi-kumar-pilla ravi-kumar-pilla changed the title Remove tracking datasets which are used in Kedro Viz Experiment Tracking chore(datasets): Remove tracking datasets which are used in Kedro Viz Experiment Tracking Dec 17, 2024
@ravi-kumar-pilla ravi-kumar-pilla marked this pull request as ready for review December 18, 2024 18:28
Copy link
Contributor

@ankatiyar ankatiyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋🏾

Copy link
Member

@DimedS DimedS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @ravi-kumar-pilla !

@ravi-kumar-pilla
Copy link
Contributor Author

Hi @ankatiyar and @DimedS ,

The docs build fails as dask deprecated the usage of dask.dataframe and shifted to dask-expr. As a temporary fix in this PR, I have updated the docstring to stable doc link. I will create another ticket if we decide on migrating to dask-expr. Let me know your thoughts.

Thank you

@ankatiyar
Copy link
Contributor

@ravi-kumar-pilla The docs fix seems fine for now, if the dataset itself is working fine. We can create an issue for the migration!

@DimedS
Copy link
Member

DimedS commented Jan 7, 2025

@ravi-kumar-pilla The docs fix seems fine for now, if the dataset itself is working fine. We can create an issue for the migration!

I agree, thanks @ravi-kumar-pilla and @ankatiyar !

@ravi-kumar-pilla
Copy link
Contributor Author

Hi @ankatiyar and @DimedS ,

Thanks for the response. After further reading, I found that Dask-Expr is a new backend for Dask DataFrame that provides query optimization and other performance improvements. Starting with Dask version 2024.3, it became the default backend.

I don't think there is a need for migration from our side unless we plan to opt out of the optimization by setting the below config -

import dask
dask.config.set({'dataframe.query-planning': False})

Note

The above config is also set to be removed as it had some issues mentioned here

As per the docs, they actually updated the docs link as mentioned here. So the fix in this PR is suffice.

Thank you

@ravi-kumar-pilla ravi-kumar-pilla merged commit 159e0a3 into main Jan 8, 2025
12 of 13 checks passed
@ravi-kumar-pilla ravi-kumar-pilla deleted the chore/remove-et-ds branch January 8, 2025 00:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kedro-datasets: remove MetricsTrackingDataset and JSONTrackingDataset
3 participants