Batching Graphs in PygPCQM4MDataset #149

edwardelson · 2021-04-09T21:50:16Z

Hi, thanks for preparing the processing code!

I was wondering whether splitting the SMILES graphs into separate torch files would be a feasible way to reduce the memory requirement. I noticed that in the process() function of class PygPCQM4MDataset(InMemoryDataset), the graphs obtained from the SMILES strings are all combined into a single dataset and then torch.save'd into one file (only to be split again later across different dataloaders during training and testing).

Since all of the graphs are independent of each other, would it be possible to save them into several torch files, each holding a batch of graphs, to reduce the RAM requirement?

Thanks!
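
To make the question concrete, here is a minimal sketch of the sharding idea, not what PygPCQM4MDataset actually does. The names `save_in_shards`, `iter_shards`, `shard_size`, and the `shard_00000.pkl` file naming are hypothetical. It uses pickle from the standard library so that it runs without extra dependencies; since `torch.save(obj, path)` and `torch.load(path)` take the same arguments, they could likely be passed in as `save_fn` and `load_fn` instead (possibly with extra arguments depending on the PyTorch version).

```python
import pickle
from pathlib import Path


def _pickle_save(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def _pickle_load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def _flush(buffer, out_dir, index, save_fn):
    # Hypothetical naming scheme: shard_00000.pkl, shard_00001.pkl, ...
    path = out_dir / f"shard_{index:05d}.pkl"
    save_fn(buffer, path)
    return path


def save_in_shards(graphs, out_dir, shard_size, save_fn=_pickle_save):
    """Consume an iterable of graphs and write them out shard by shard.

    At most `shard_size` graphs are buffered in memory at any time.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths, buffer = [], []
    for graph in graphs:
        buffer.append(graph)
        if len(buffer) == shard_size:
            paths.append(_flush(buffer, out_dir, len(paths), save_fn))
            buffer = []
    if buffer:  # write the final, possibly smaller, shard
        paths.append(_flush(buffer, out_dir, len(paths), save_fn))
    return paths


def iter_shards(paths, load_fn=_pickle_load):
    """Yield graphs in order while holding only one shard in memory."""
    for path in paths:
        for graph in load_fn(path):
            yield graph
```

Because the graphs are independent, the shards can be written and read in any order. The trade-off is that random access across the whole dataset (for example, shuffling) needs extra bookkeeping compared with a single in-memory file.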

weihua916 · 2021-04-09T21:59:55Z

Maintainer

Hi! Yes, we considered that option, but we think 8GB should be manageable given the RAM available on most servers.

2 replies

weihua916 Apr 9, 2021
Maintainer

If RAM is an issue, you can use the SMILES dataset and transform the SMILES strings into graph objects on the fly (using a transform function applied as the PyTorch DataLoader fetches each item). It should give you decent speed if you set num_workers to be large enough.

from ogb.lsc import PCQM4MDataset
ROOT = 'dataset/'  # placeholder root directory; adjust to your setup
dataset = PCQM4MDataset(root = ROOT, only_smiles = True)  # loads SMILES strings and targets only
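
One way to do the on-the-fly conversion is a thin wrapper around the SMILES dataset, sketched below. The class name `OnTheFlyGraphDataset` is hypothetical, and the sketch assumes that indexing the SMILES-only dataset returns a `(smiles, target)` pair. The converter is passed in as a parameter; `smiles2graph` from `ogb.utils` would be the natural choice, but any callable works.

```python
class OnTheFlyGraphDataset:
    """Map-style dataset that converts a SMILES string to a graph on access.

    Only SMILES strings stay in memory; each graph is built when its
    index is requested (for example, by a DataLoader worker).
    """

    def __init__(self, smiles_dataset, smiles_to_graph):
        # smiles_dataset: indexable, assumed to yield (smiles, target) pairs
        # smiles_to_graph: callable mapping a SMILES string to a graph object
        self.smiles_dataset = smiles_dataset
        self.smiles_to_graph = smiles_to_graph

    def __len__(self):
        return len(self.smiles_dataset)

    def __getitem__(self, idx):
        smiles, target = self.smiles_dataset[idx]
        return self.smiles_to_graph(smiles), target


# Possible usage with the `dataset` object above (not executed here;
# requires ogb, rdkit, and torch; `my_collate` is a hypothetical
# collate function that merges variable-size graphs into one batch):
#
#   from ogb.utils import smiles2graph
#   from torch.utils.data import DataLoader
#   graphs = OnTheFlyGraphDataset(dataset, smiles2graph)
#   loader = DataLoader(graphs, batch_size=256, num_workers=8,
#                       collate_fn=my_collate)
```

Since the class defines `__len__` and `__getitem__`, a PyTorch DataLoader should be able to consume it as a map-style dataset, and a larger `num_workers` spreads the conversion cost across worker processes.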

edwardelson Apr 9, 2021
Author

Brilliant, thanks!