
DataLoader num_workers > 0 causes CPU memory from parent process to be replicated in all worker processes #165

sethiay commented Mar 11, 2024
Hey Team,

We are using DLIO to simulate the Unet3d workload, e.g.

mpirun -np 8 python3 dlio_benchmark/main.py workload=unet3d ++workload.workflow.generate_data=False ++workload.workflow.train=True ++workload.dataset.num_files_train=500000 ++workload.workflow.checkpoint=False ++workload.dataset.data_folder=/mnt/disks/100KB-50GB ++workload.dataset.record_length=100000 ++workload.reader.batch_size=1500 ++workload.reader.read_threads=12 ++workload.reader.file_shuffle=seed ++workload.reader.sample_shuffle=seed ++workload.train.epochs=5

While the above DLIO command was running, we monitored the memory profile of our machine and found that:

  1. Total memory utilization of user-space processes is around ~40 GiB.
  2. Total memory reported by the OS is up to 370-380 GiB, of which up to 330-340 GiB is taken by shared memory (/dev/shm). We confirmed these numbers using cat /proc/meminfo and df -h (see the sketch below).
  3. Dropping the page cache by running echo 3 | sudo tee /proc/sys/vm/drop_caches also doesn't clear this shared memory, which is expected since tmpfs (/dev/shm) pages are not reclaimed by drop_caches.
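For reference, a minimal Python sketch of the same check (the helper names shmem_kib and dev_shm_used_bytes are made up for illustration; this is not part of the benchmark):

```python
# Read the Shmem figure from /proc/meminfo and the usage of the /dev/shm
# tmpfs mount, which together show how much memory is held as shared
# memory rather than by user-space processes.
import os


def shmem_kib():
    """Return the Shmem value from /proc/meminfo, in KiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("Shmem:"):
                return int(line.split()[1])
    return 0


def dev_shm_used_bytes():
    """Approximate bytes used on the /dev/shm tmpfs (like `df -h /dev/shm`)."""
    st = os.statvfs("/dev/shm")
    return (st.f_blocks - st.f_bfree) * st.f_frsize


print(f"Shmem: {shmem_kib() / 1024 / 1024:.1f} GiB")
print(f"/dev/shm used: {dev_shm_used_bytes() / 1024**3:.1f} GiB")
```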

Given the above, this looks like the same issue as pytorch/pytorch#13246 (comment), i.e. it may be necessary to use multiprocessing.Array (or similar shared, refcount-free structures) to avoid the data being duplicated across the different worker processes.
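To illustrate the workaround discussed in that issue, here is a minimal, hypothetical sketch (not DLIO's actual reader code): per-sample metadata such as the file list is stored in one contiguous NumPy array instead of a Python list, so that DataLoader workers forked from the parent do not dirty copy-on-write pages through Python refcount updates. The FileListDataset name and the paths below are made up for illustration.

```python
# Hypothetical sketch of the workaround from pytorch/pytorch#13246:
# keep large per-sample metadata in one contiguous NumPy array rather than
# a Python list, so DataLoader workers (forked from the parent) do not
# dirty copy-on-write pages via Python refcount updates.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class FileListDataset(Dataset):
    """Illustrative dataset whose file index is a fixed-width NumPy array."""

    def __init__(self, file_paths):
        # dtype 'S' array: one contiguous buffer with no per-element refcounts.
        self._paths = np.array([p.encode("utf-8") for p in file_paths])

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, idx):
        path = self._paths[idx].decode("utf-8")
        # Placeholder read: a real reader would parse the sample format here.
        sample = np.fromfile(path, dtype=np.uint8)
        return torch.from_numpy(sample)


if __name__ == "__main__":
    # Paths are hypothetical; the loader is constructed only for illustration.
    paths = [f"/mnt/disks/100KB-50GB/sample_{i}.bin" for i in range(500_000)]
    loader = DataLoader(FileListDataset(paths), batch_size=1500, num_workers=12)
```

The same idea applies to multiprocessing.Array or pre-allocated torch tensors; the key point is avoiding large collections of individual Python objects held by the parent process.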

Could you please look into this?

Thanks,
Ayush
