Assess (meta)data quality #16

Open
PGijsbers opened this issue Jul 5, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@PGijsbers
Member

PGijsbers commented Jul 5, 2024

For a given dataset, assess its (meta)data quality.
This may include things like:

  • Does it have a clear title?
  • Is a clear description of the data provided (note that this is not specific to the description field)?
    • Do the features have clear names and are they described?
    • Does the description mention where the data is sourced from?
    • Does the description mention the intended use-case for the dataset?
    • Is there author information and a way to cite it?
  • Is the data used by others?
    • Does it have runs?
    • Is it included in benchmarks?
  • If multiple versions of the dataset exist, is it clear how this version differs from the others?

A good source of inspiration can be the "datasheets for datasets" paper.
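
As a starting point, the checklist above could be sketched as a set of presence checks. This is only a minimal sketch: the `DatasetMetadata` record, its field names, and the check names are hypothetical and do not correspond to an existing schema or API. Presence checks also cannot tell whether a title or description is actually *clear*, which would need human review or a more involved heuristic.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; all field names are illustrative only."""
    title: str = ""
    description: str = ""
    feature_descriptions: dict[str, str] = field(default_factory=dict)
    source: str = ""
    intended_use: str = ""
    authors: list[str] = field(default_factory=list)
    citation: str = ""
    run_count: int = 0
    benchmark_count: int = 0
    version_count: int = 1
    version_notes: str = ""


def assess_quality(meta: DatasetMetadata) -> dict[str, bool]:
    """Return one pass/fail flag per checklist item (presence checks only)."""
    features = meta.feature_descriptions
    return {
        "has_title": bool(meta.title.strip()),
        "has_description": bool(meta.description.strip()),
        # Every feature must exist and carry a non-empty description.
        "features_described": bool(features)
        and all(text.strip() for text in features.values()),
        "has_source": bool(meta.source.strip()),
        "has_intended_use": bool(meta.intended_use.strip()),
        "has_author_and_citation": bool(meta.authors)
        and bool(meta.citation.strip()),
        "has_runs": meta.run_count > 0,
        "in_benchmarks": meta.benchmark_count > 0,
        # Only relevant when more than one version exists.
        "versions_explained": meta.version_count <= 1
        or bool(meta.version_notes.strip()),
    }


def missing_items(meta: DatasetMetadata) -> list[str]:
    """Names of the checklist items that the dataset fails."""
    return [name for name, ok in assess_quality(meta).items() if not ok]
```

Calling `missing_items(meta)` on a record gives the list of failed checks, which could feed a prompt on the dataset page or an upload form.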

We can use this for multiple purposes, e.g.:

  • Identify what metadata is frequently missing, and better prompt the users to provide this information on upload.
  • Add prompts to the dataset page that make users aware that important information is missing, and allow them to add it.
  • Influence dataset search results based on dataset quality.
  • Set up projects to improve dataset quality w.r.t. specific aspects which may be easier to crowd-source/automate.
  • Deactivate datasets of exceptionally poor quality.
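
For the first purpose, finding out which metadata is frequently missing, one possible sketch is to aggregate per-dataset check results. It assumes each dataset has already been reduced to a mapping from a check name to a pass/fail flag; the function name `missing_frequency` and the check names in the usage example are hypothetical.

```python
from collections import Counter


def missing_frequency(reports: list[dict[str, bool]]) -> list[tuple[str, float]]:
    """Fraction of datasets failing each check, most frequently failed first."""
    if not reports:
        return []
    # Count, per check name, how many datasets failed it.
    failures = Counter(
        name for report in reports for name, ok in report.items() if not ok
    )
    total = len(reports)
    return [(name, count / total) for name, count in failures.most_common()]


# Usage with made-up check results for two datasets.
example_reports = [
    {"has_title": True, "has_source": False},
    {"has_title": False, "has_source": False},
]
print(missing_frequency(example_reports))
```

Checks that every dataset passes do not appear in the result, so the output doubles as a prioritized list of what to ask for on upload.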
@PGijsbers PGijsbers changed the title Assess metadata quality Assess (meta)data quality Jul 5, 2024
@PGijsbers PGijsbers added the enhancement New feature or request label Jul 5, 2024
@PGijsbers PGijsbers added this to the Metadata Quality milestone Jul 5, 2024
@Taniya-Das Taniya-Das self-assigned this Jul 17, 2024
@PGijsbers
Member Author

Unassigning Taniya, as she is currently also busy with the filter extraction (and AIoD). Taniya, you are free to rejoin this issue later. I am sure this general topic is not easily solved. That said, in the meantime I want to make clear that it is also open for contributions from other people.

@Taniya-Das
Collaborator

Sounds good, I will attach the code repo so others can contribute.
