
search_data & search_datasets documentation parameter list is incomplete #345

mfisher87 opened this issue Nov 9, 2023 · 10 comments

@mfisher87
Collaborator

mfisher87 commented Nov 9, 2023

Currently, kwargs are passed from search_data to a DataGranules object's parameters method, which also accepts kwargs. The possible keys are fixed by the methods on DataGranules, and populated dynamically by matching keys to method names.

if earthaccess.__auth__.authenticated:
    query = DataGranules(earthaccess.__auth__).parameters(**kwargs)
else:
    query = DataGranules().parameters(**kwargs)

The same technique is used for both data & dataset search functions.

However, the interface is untyped and the documentation is generated from a hand-maintained docstring. I feel we should move towards fully typing the interface and generating the docs from the annotations. Python's type system feels like a toy at times like this. How do we ensure that a DataGranules method exists for each parameter on the search_data interface? Maybe we should instead collect the parameter methods in a dict? Just thinking out loud.
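For what it's worth, a minimal sketch of the dict idea (hypothetical names, not the current earthaccess code) might look like this:

from typing import Any, Callable, Dict

# Hypothetical registry: collect the parameter-setting methods explicitly so
# the supported kwargs are a single, introspectable list instead of
# "whatever methods happen to exist on the query class".
PARAMETER_METHODS: Dict[str, Callable[..., Any]] = {}


def parameter(func: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator registering a query-builder method as a searchable parameter."""
    PARAMETER_METHODS[func.__name__] = func
    return func


def apply_parameters(query: Any, **kwargs: Any) -> Any:
    """Apply kwargs by looking them up in the registry rather than matching
    attribute names dynamically, so unknown parameters fail loudly."""
    for name, value in kwargs.items():
        if name not in PARAMETER_METHODS:
            raise ValueError(f"Unsupported search parameter: {name!r}")
        method = getattr(query, name)
        if isinstance(value, tuple):
            method(*value)
        else:
            method(value)
    return query

A registry like this would also give the docs and type annotations one place to enumerate the supported parameters.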

@mfisher87 mfisher87 changed the title search_data documentation includes cloud_hosted parameter in example, but not in parameter list search_data documentation parameter list is incomplete Nov 9, 2023
@mfisher87 mfisher87 added the documentation Improvements or additions to documentation label Nov 9, 2023
@andypbarrett
Collaborator

I am currently working on updating the earthaccess docs. My focus is on the "Getting Started", "User Guide" and "How Tos". I can think about how we can do this as part of my doc work, so I'll assign it to myself. But it seems like this API documentation might be automated; I'll need to learn more about that.

@andypbarrett andypbarrett self-assigned this Nov 9, 2023
@paolodep36

Hi, I have the same difficulties. I wanted to download AST L1T data with a filter based on cloud coverage. Is it possible to print a dataframe with the dates and specifications such as day/night and cloud coverage? Thanks in advance

@MattF-NSIDC

MattF-NSIDC commented Nov 15, 2023

Hey @paolodep36, thanks for posting your question! Is there a specific parameter you're expecting to see in the docs for search_data()? If not, let's move your question to another place so we can focus on it better:

  • If you think you have a bug on your hands (or would like to request a new feature), can you please open a new issue?

  • If you're looking for more general support (i.e. you aren't sure whether you found a bug or need help), can you please post on our Q&A discussion board?

@mfisher87
Collaborator Author

mfisher87 commented Mar 1, 2024

More confusion that I think comes from this docs problem: #478

In this case the undocumented parameter was point.

@mfisher87
Collaborator Author

More pain for our users: #504 (comment)

I'm going to pin this.

@mfisher87 mfisher87 pinned this issue Mar 30, 2024
@mfisher87 mfisher87 changed the title search_data documentation parameter list is incomplete search_data & search_datasets documentation parameter list is incomplete Mar 31, 2024
@andypbarrett
Collaborator

I'm listing the parameters for search_datasets that are not documented in the docstring. Probably need to look at python-cmr and CMR API documentation for definitions for some of these.

Parameters missing from the search_datasets docstring (from DataCollections)

  • concept_id
  • short_name
  • instrument
  • project
  • parameters??
  • fields??
  • cloud_hosted
  • data_center
  • temporal
  • bounding_box

Parameters in CollectionQuery

  • archive_center
  • native_id
  • tool_concept_id
  • service_concept_id

Parameters in GranuleCollectionQuery

  • online_only
  • version
  • point
  • circle
  • polygon
  • line
  • downloadable
  • entry_title
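For reference, several of the parameters listed above already work when passed as keyword arguments; a minimal sketch (parameter names taken from the lists above, value formats assumed rather than confirmed):

import earthaccess

# Sketch only: cloud_hosted, data_center, and temporal come from the
# undocumented-parameter lists above; the example values and formats are
# assumptions, not verified documentation.
datasets = earthaccess.search_datasets(
    keyword="sea surface temperature",
    cloud_hosted=True,
    data_center="PODAAC",  # example value, not verified
    temporal=("2020-01-01", "2020-12-31"),
)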

@andypbarrett
Collaborator

Parameters for search_data not in the docstring

From DataGranules

  • data_center
  • orbit_number
  • cloud_hosted
  • granule_name
  • online_only
  • day_night_flag
  • instrument
  • platform
  • cloud_cover
  • point
  • polygon
  • line
  • downloadable

From GranuleQuery

  • granule_ur - (is this the same as granule_name?)
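As an illustration tied to the earlier AST L1T question, a hedged usage sketch (value formats are assumptions based on the underlying DataGranules methods, not verified documentation):

import earthaccess

# Sketch only: cloud_cover and day_night_flag come from the list above;
# the (min, max) percentage tuple for cloud_cover is an assumption.
granules = earthaccess.search_data(
    short_name="AST_L1T",
    temporal=("2023-01-01", "2023-01-31"),
    cloud_cover=(0, 20),
    day_night_flag="day",
    count=10,
)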

@mfisher87
Collaborator Author

We shouldn't have to (or rather, I don't want us to ;) ) maintain parameter lists in the docstring. I'd love to hack on how we can get our autodoc tool to pick them up, but it may require some architectural changes!

@andypbarrett
Collaborator

Having this automated would be great. I do wonder how information about what a keyword is/does gets into this workflow.

@chuckwondo
Collaborator

Automating this might be quite tricky, but I'm not sure it's the route we should take. I believe we should avoid any attempt to exhaustively list supported keyword arguments within the docstring.

In fact, I'm going to make a potentially controversial suggestion: we should consider deprecating search_datasets and search_data, precisely because of both the difficulty in maintaining "dynamic" documentation for them and the difficulty they give users in trying to figure out exactly what keywords are supported. Both problems are rather frustrating.

Instead, we should encourage direct use of DataCollections and DataGranules, so that we avoid both such difficulties. We may also want to consider renaming them, as I believe there may be a discussion elsewhere as to their arguably not-so-intuitive names. Even further afield, we should really look to upstreaming all of this to python_cmr, but I digress.

Short of such a potentially controversial change, something that might at least help users with the existing approach would be to provide type hints. In this case, my proposal would be to define a TypedDict of supported keyword arguments that correspond to query parameter methods, coupled with typing.Unpack. We would do this for both search_datasets and search_data, since they don't support all of the same keyword arguments.

For example, something along the following lines:

from typing import List

from typing_extensions import TypedDict, Unpack

from earthaccess import DataCollection, DataGranule  # import path assumed for this sketch


class SearchDatasetsKwargs(TypedDict, total=False):
    """Keyword arguments for finding collections via `earthaccess.search_datasets`.

    Attributes:
        short_name: value to pass to [earthaccess.DataCollections.short_name][]
        ...
    """
    short_name: str
    cloud_hosted: bool
    provider: str
    # ... and so on (one for each method that sets a query parameter)


class SearchDataKwargs(TypedDict, total=False):
    """Keyword arguments for finding granules via `earthaccess.search_data`.'

    Attributes:
        short_name: value to pass to [earthaccess.DataGranules.short_name][]
        ...
        temporal: tuple of arguments to pass to [earthaccess.DataGranules.temporal][]
        ...
    """
    short_name: str
    cloud_hosted: bool
    provider: str
    temporal: tuple  # tuple of arguments to pass to DataGranules.temporal
    # ... and so on (one for each method that sets a query parameter)


def search_datasets(**kwargs: Unpack[SearchDatasetsKwargs]) -> List[DataCollection]:
    ...


def search_data(**kwargs: Unpack[SearchDataKwargs]) -> List[DataGranule]:
    ...

This would at least give users of IDEs some help with auto-completion.
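For example, with the Unpack-typed signatures above, a type checker such as mypy or pyright could flag misspelled or unsupported keywords (illustrative only):

search_data(short_name="ATL06", cloud_hosted=True)  # accepted
search_data(shortname="ATL06")  # flagged: unexpected keyword argument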

Of course, this still requires a bit of maintenance effort, because whenever a new method for setting a query parameter is added, an attribute of the same name should be added to the corresponding *Kwargs class, but at least we should be able to readily provide docstring links between *Kwargs class attributes and their corresponding methods.
