Load OQMD into PyG #186

kshitij-v-mehta · 2024-04-18T18:09:10Z

Hello,
I have downloaded the pre-packaged LMDB-based datasets and have a couple of questions:

Does the lmdb dataset for OQMD contain all molecules, or is it a subset of the OQMD dataset?
How do I read the lmdb dataset for OQMD into PyG?

Thanks for your help.
Kshitij Mehta
ORNL

melo-gonzo · 2024-04-18T18:35:30Z

melo-gonzo
Apr 18, 2024
Maintainer

Hello Kshitij,

Thanks for reaching out with your questions!

Our LMDB dataset for OQMD has a total of 1,022,595 samples, which was the number available when we created the dataset.

To load the lmdb dataset with the PyG backend, you can use something like this:

from matsciml.datasets import OQMDDataset
from matsciml.datasets.transforms import PointCloudToGraphTransform

oqmd_data = OQMDDataset("/path/to/oqmd/data/",transforms=[PointCloudToGraphTransform(backend="pyg")])
sample = oqmd_data.__getitem__(0)
sample_graph = sample['graph']

If you'd like to get started quickly without downloading the full dataset, you can load the "devset" provided in our repo, which loads 200 samples:

oqmd_data = OQMDDataset.from_devset(transforms=[PointCloudToGraphTransform(backend="pyg")])
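
As a quick sanity check before training, it can help to pull a handful of samples and confirm that the transform actually attached a graph. The sketch below is not part of matsciml; `samples_missing_key` is a hypothetical helper that only assumes the dataset supports `len()` and integer indexing and returns dict-like samples, as in the snippets above.

```python
from typing import Any, List, Sequence


def samples_missing_key(
    dataset: Sequence[Any], key: str = "graph", limit: int = 5
) -> List[int]:
    """Return indices (among the first `limit` samples) whose dict lacks `key`."""
    n = min(limit, len(dataset))
    missing = []
    for idx in range(n):
        sample = dataset[idx]  # equivalent to dataset.__getitem__(idx)
        if key not in sample:
            missing.append(idx)
    return missing


# Hypothetical usage with the devset loaded above:
# print(samples_missing_key(oqmd_data))
```

An empty list would mean every checked sample carries a `graph` entry. Any exception raised while indexing is deliberately left to propagate, since it usually points at a problem with the data itself.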

Have a look at our oqmd example for how to get started with training a model using this data.

Let us know if there are any more questions.

2 replies

kshitij-v-mehta Apr 19, 2024
Author

Thanks. I have installed matsciml on a local Linux machine, after editing conda.yml to remove the Intel-specific libraries. My conda.yml is:

channels:
  - conda-forge
  - defaults
  - dglteam
dependencies:
  - pytorch=2.1.0
  - scipy
  - numpy
  - numba
  - mpi
  - dglteam::dgl=2.0.0
  - setuptools
  - pybind11
  - pip
  - pip:
    - "-e './[all]'"

I ran the test code you provided above:

from matsciml.datasets import OQMDDataset
from matsciml.datasets.transforms import PointCloudToGraphTransform

oqmd_lmdb = "/home/kmehta/vshare/applications/aisd/oqmd/oqmd/all/"
oqmd_data = OQMDDataset(oqmd_lmdb,transforms=[PointCloudToGraphTransform(backend="pyg")])
sample = oqmd_data.__getitem__(0)
sample_graph = sample['graph']

The path /home/kmehta/vshare/applications/aisd/oqmd/oqmd/all/ contains two files: data.lmdb and data.lmdb-lock. data.lmdb is 1.4 GB in size.

However, I get the following error:

$ python3 reader.py 
Traceback (most recent call last):
  File "/media/psf/vshare/applications/aisd/oqmd/oqmd/reader.py", line 8, in <module>
    sample = oqmd_data.__getitem__(0)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kmehta/vshare/applications/aisd/oqmd/matsciml/matsciml/datasets/base.py", line 174, in __getitem__
    data = self.data_from_key(*keys)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kmehta/vshare/applications/aisd/oqmd/matsciml/matsciml/datasets/oqmd/dataset.py", line 79, in data_from_key
    cell = data["cell"]
           ~~~~^^^^^^^^
KeyError: 'cell'

Could you let me know if I am doing something wrong?
Thanks for your help.

melo-gonzo Apr 19, 2024
Maintainer

It seems like you may have an old version of the dataset. Please re-download it from the latest Zenodo release (v2 from 3/4/2024) and try the same code snippet again.
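
One way to tell whether a local copy predates the current format is to look at a raw record and check for the `cell` key named in the traceback. The following is a rough diagnostic sketch, not matsciml code: it assumes the file can be opened with the `lmdb` Python package and that each value is a pickled dict, which is an assumption about the on-disk layout rather than something confirmed in this thread. `missing_keys` and `peek_first_record` are hypothetical helper names.

```python
import pickle
from typing import Any, Dict, Iterable, List, Optional


def missing_keys(
    record: Dict[str, Any], required: Iterable[str] = ("cell",)
) -> List[str]:
    """List the required keys that are absent from a decoded record."""
    return [key for key in required if key not in record]


def peek_first_record(lmdb_file: str) -> Optional[Dict[str, Any]]:
    """Return the first value in the LMDB file that unpickles to a dict, if any.

    Assumes pickled values; only unpickle files from a source you trust.
    """
    import lmdb  # third-party package, imported lazily so missing_keys works without it

    # subdir=False because data.lmdb is a single file with data.lmdb-lock beside it
    env = lmdb.open(lmdb_file, subdir=False, readonly=True, lock=False)
    try:
        with env.begin() as txn:
            for _key, value in txn.cursor():
                try:
                    record = pickle.loads(value)
                except Exception:
                    continue  # skip entries that are not pickled objects
                if isinstance(record, dict):
                    return record
    finally:
        env.close()
    return None


# Hypothetical usage:
# record = peek_first_record("/path/to/oqmd/data/data.lmdb")
# if record is not None:
#     print(sorted(record), missing_keys(record))
```

If `missing_keys` reports `cell`, that would be consistent with an older download. Note that if the store also holds non-sample entries, the first dict found may not be a sample, so the result is only a hint.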

Answer selected by melo-gonzo
