Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update datasets to use API #6126

Open
wants to merge 32 commits into
base: master
Choose a base branch
from

Conversation

anthayes92
Copy link
Contributor

@anthayes92 anthayes92 commented Aug 21, 2024

Context:
Datasets are currently downloaded by hitting the S3 bucket (via CloudFront) where the .h5 files are stored. This PR updates the way datasets are downloaded by directing download requests to the Software Cloud managed Datasets Service.

Description of the Change:
The qml.data.load function now queries the Datasets API in order to download datasets.

Benefits:

  • Removes the dependency on the foldermap and data_struct files, alleviating the need to manually manage them.
  • Facilitates tracking dataset downloads for analytics.

Possible Drawbacks:

  • Introduces network dependency for accessing the external API.

Related GitHub Issues:

@anthayes92 anthayes92 marked this pull request as ready for review August 26, 2024 19:20
@DSGuala DSGuala self-requested a review August 26, 2024 19:28
Copy link

codecov bot commented Aug 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.69%. Comparing base (b78565c) to head (10d2903).
Report is 18 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6126      +/-   ##
==========================================
+ Coverage   99.58%   99.69%   +0.11%     
==========================================
  Files         443      444       +1     
  Lines       42273    42141     -132     
==========================================
- Hits        42096    42012      -84     
+ Misses        177      129      -48     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@anthayes92 anthayes92 marked this pull request as draft August 26, 2024 20:49
@anthayes92 anthayes92 marked this pull request as ready for review September 13, 2024 21:44
Comment on lines 35 to +36

GRAPHQL_URL = "https://cloud.pennylane.ai/graphql"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GRAPHQL_URL = "https://cloud.pennylane.ai/graphql"
import os
GRAPHQL_URL = os.getenv("DATASETS_ENDPOINT_URL", "https://cloud.pennylane.ai/graphql")

It'll be useful to be able to override this for testing new datasets

branch = parameter_tree
choices = []
for param in parameters:
print(f"Available options for {param}:")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We should move the leaf check to the top of the loop, this is helpful because it guarantees that branch is always a non-leaf node in the rest of the loop
  2. A leaf node is a str, not a dict
Suggested change
print(f"Available options for {param}:")
if isinstance(branch, str):
break
if len(branch["next"]) == 1:
branch = next(iter(branch["next"].values()))
continue
print(f"Available options for {param}:")

branch = branch["next"][user_value]
except KeyError as e:
raise ValueError(f"Must enter a valid {param}:") from e
choices.append(user_value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
choices.append(user_value)

I don't think we need choices since we're not returning it

Comment on lines +549 to +553
if "next" in branch:
if len(branch["next"]) == 1:
branch = next(iter(branch["next"].values()))
continue
branch = branch["value"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if "next" in branch:
if len(branch["next"]) == 1:
branch = next(iter(branch["next"].values()))
continue
branch = branch["value"]

Since we moved this logic to the top of the loop

"""Prompts the user to select parameters for datasets one at a time."""

branch = parameter_tree
choices = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
choices = []

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants