[Feature request]: Refactor MaterialsProjectDataset to not serialize pymatgen Structures in LMDB #267

Open
2 of 6 tasks
laserkelvin opened this issue Aug 6, 2024 · 0 comments
Assignees
Labels
code maintenance Issue/PR for refactors, code clean up, etc. data Issues related to data loading, pipelining, etc. enhancement New feature or request

Comments

@laserkelvin
Collaborator

Feature/behavior summary

Currently, the workflow implemented for MaterialsProjectDataset saves and reloads serialized pymatgen.Structure objects. This ties the stored data tightly to the pymatgen version that wrote it: small API changes can make the dataset difficult to reload with later versions.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?
  • Does this proposal include a new model?
  • Does this proposal include a new dataset?
  • Does this proposal include a new task/workflow?

Related issues

No response

Solution description

If we can refactor it so that Structures are created at load time, in line with other dataset implementations, the stored data would no longer depend on the pymatgen version, and upgrades would stop breaking reloads.

We would have to re-process the LMDBs currently being distributed and make sure the data is stored as plain coordinates, atomic species, and lattice parameters.
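As a rough illustration of the idea, here is a minimal sketch of what a version-independent record could look like. The function names, the record keys, and the choice of JSON as the byte encoding are all hypothetical and not taken from the existing codebase; the only pymatgen calls used are `Lattice.from_parameters` and the `Structure` constructor, and they are imported lazily so that only the load path depends on pymatgen.

```python
import json


def structure_to_record(species, frac_coords, lattice_params) -> dict:
    """Pack a structure into plain Python types (hypothetical record layout)."""
    a, b, c, alpha, beta, gamma = (float(x) for x in lattice_params)
    return {
        "species": [str(s) for s in species],
        # Fractional coordinates stay valid even though lattice parameters
        # alone do not fix the orientation of the cell in space.
        "frac_coords": [[float(x) for x in row] for row in frac_coords],
        "lattice_params": [a, b, c, alpha, beta, gamma],
    }


def encode_record(record: dict) -> bytes:
    """Serialize to bytes suitable for use as an LMDB value (JSON is an assumption)."""
    return json.dumps(record).encode("utf-8")


def decode_record(blob: bytes) -> dict:
    """Inverse of encode_record; needs no pymatgen at all."""
    return json.loads(blob.decode("utf-8"))


def record_to_structure(record: dict):
    """Build a pymatgen Structure at load time from the plain record."""
    # Lazy import: writing and decoding records never touches pymatgen.
    from pymatgen.core import Lattice, Structure

    lattice = Lattice.from_parameters(*record["lattice_params"])
    # Coordinates are interpreted as fractional by default.
    return Structure(lattice, record["species"], record["frac_coords"])


# Round trip through bytes, as would happen when writing to and reading from LMDB.
record = structure_to_record(
    species=["Na", "Cl"],
    frac_coords=[[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
    lattice_params=(4.0, 4.0, 4.0, 90.0, 90.0, 90.0),
)
blob = encode_record(record)
restored = decode_record(blob)
```

With a layout like this, the stored bytes contain only lists of strings and floats, so a pymatgen upgrade can only affect `record_to_structure`, not the ability to read the LMDB itself.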

Additional notes

No response

@laserkelvin laserkelvin added enhancement New feature or request data Issues related to data loading, pipelining, etc. code maintenance Issue/PR for refactors, code clean up, etc. labels Aug 6, 2024
@laserkelvin laserkelvin self-assigned this Aug 6, 2024