feat(chart): download CSV chart contents from S3 for Athena charts #31485

Open: wants to merge 5 commits into base: master

@william-fundrecs commented Dec 16, 2024

SUMMARY

Adds a right-click context menu option, shown when the chart is backed by an Athena table, that downloads the full chart contents directly from the S3 bucket instead of through Superset.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

image

TESTING INSTRUCTIONS

Set required env vars
SUPERSET_REGION=<my_aws_region>
SUPERSET_WORKGROUP=<my_superset-workgroup>
SUPERSET_ATHENA_DB=<my_superset_db>

Enable feature flags
'DOWNLOAD_CSV_FROM_S3'
'SHOW_DEFAULT_CSV_OPTIONS'

Ensure Athena is configured to automatically persist query results to an S3 bucket in CSV format.

Test downloading a chart through the right-click option.
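
The setup steps above could be sketched in a `superset_config.py` roughly as follows. This is a minimal sketch, not the PR's code: `FEATURE_FLAGS` is Superset's usual config dictionary, the two flag names and three variable names are taken from the instructions above, and the `missing_env_vars` helper is hypothetical, added only to show a quick sanity check before testing.

```python
# superset_config.py (illustrative sketch, not part of this PR)
import os

# Environment variables named in the testing instructions.
REQUIRED_ENV_VARS = ("SUPERSET_REGION", "SUPERSET_WORKGROUP", "SUPERSET_ATHENA_DB")


def missing_env_vars(environ=os.environ):
    """Hypothetical helper: list required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


# Both flags are enabled, as the testing instructions ask.
# The flag names come from the PR description; everything else is assumed.
FEATURE_FLAGS = {
    "DOWNLOAD_CSV_FROM_S3": True,
    "SHOW_DEFAULT_CSV_OPTIONS": True,
}
```

Calling `missing_env_vars()` before starting Superset returns an empty list when all three variables are set.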

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

No DB changes

resolves #31482

@github-actions bot added the api, doc, dependencies:npm and packages labels Dec 16, 2024
@dosubot bot added the data:connect:athena and data:csv labels Dec 16, 2024
Contributor

@github-actions github-actions bot left a comment

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

@villebro
Member

@william-fundrecs I assume this PR was mistakenly opened against upstream Superset, not your fork?

@william-fundrecs
Author

@villebro No, this is functionality I thought would be useful to donate back to Superset. We already make use of it in our Athena (Presto) backed Superset instance.

@rusackas
Member

Can we do it without adding two feature flags, perhaps leaning on config? And do we need to single out S3? Could there be numerous other options for where people get/store their CSV?

@rusackas rusackas requested a review from villebro December 17, 2024 18:22
@william-fundrecs
Author

Potentially yes. It made sense to me to add a feature flag to enable the S3 feature, but it is perhaps less sensible to use a feature flag to hide the default CSV/XLSX right-click options. If there is a better way to hide these, I can change to it.

This is coloured by our own needs: we wanted to hide the default options in favour of the S3 download because it is a lot faster. Our users only ever use the full export, and retrieving the file handle from S3 is many times faster than creating the dataframe and processing the file.

S3 was specified because this is existing Athena functionality: Athena can be configured to automatically persist query results to a CSV inside a configured bucket. There may be use cases for other DB engines, but we only use Athena Presto and PostgreSQL in-house.
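
For readers unfamiliar with that Athena behaviour, the idea can be sketched with boto3 as below. This is only an illustration under assumptions, not the code in this PR: the function names are hypothetical, and handing the browser a presigned URL is one possible way to serve the file, which the PR may or may not do. The boto3 calls used (`get_query_execution` and `generate_presigned_url`) are standard client methods.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/file.csv' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an S3 object URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def presigned_csv_url(query_execution_id, region, expires_in=300):
    """Hypothetical helper: time-limited URL for an Athena query's result CSV."""
    import boto3  # imported lazily so split_s3_uri works without boto3

    # Athena records where it persisted the results of each query execution.
    athena = boto3.client("athena", region_name=region)
    execution = athena.get_query_execution(QueryExecutionId=query_execution_id)
    location = execution["QueryExecution"]["ResultConfiguration"]["OutputLocation"]
    bucket, key = split_s3_uri(location)

    # No dataframe is built; the client fetches the existing object from S3.
    s3 = boto3.client("s3", region_name=region)
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

The speed-up described above comes from skipping the query re-run and file generation entirely: the CSV already exists in the bucket, so only a lookup and a signed link are needed.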

If someone wants to use this PR as a foundation for supporting other file output locations, that is fine with me.

Labels
api, data:connect:athena, data:csv, dependencies:npm, doc, packages, size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SIP-150] Use existing Athena Presto functionality for large downloads from S3
3 participants