ENH: Reduce number of processes spawned by `ProcessPoolExecutor` in `blocked_io.py` #426

amitpendharkar · 2024-08-21T17:31:32Z

Issue:
ProcessPoolExecutor spawns the processes based on max_workers argument. Total number of logical cpus are currently used as default value for max_workers

mdio-python/src/mdio/segy/blocked_io.py

Line 124 in 1475793

executor = ProcessPoolExecutor(max_workers=num_workers, mp_context=context)

This spawns too many processes. Based of the size of the segy file, resource contention and context switching bogs the machine down and eventually the run get's terminated.

Suggested solution:
Make default value of max_workers as either number of cores or limit it to 80% of available logical cpus.(discuss)

The text was updated successfully, but these errors were encountered:

tasansal · 2024-08-21T19:06:38Z

Hi, thanks for looking into it. The max cpus is a good approximate for normal sized machines. If we are using a large machine with 48+ cores I can see where this could start trashing the machine.

MDIO takes the following environment variable to control number of CPUs used in ingestion: MDIO__IMPORT__CPU_COUNT.
I propose, instead of changing the default, you should set this value based on your machine.

amitpendharkar · 2024-08-21T19:18:00Z

We noticed this issue only after we saw some random ingestion failures. Out of two similar files, one fails and other other succeeds. The failed one would sometimes succeed if re-submitted multiple times. This causes loss of time in production.

I believe setting the value to 80% of available logical cpus would take care of the issue. It would also leave some room to other processes (like opentelemetry) to run.

amitpendharkar · 2024-08-21T20:52:11Z

I believe, setting MDIO_IMPORT_CPUT_COUNT when pod get's launched makes things really unclear. One would not know why it's only using few resources. Coming up with right value for MDIO_IMPORT_CPUT_COUNT is also difficult.

Also, setting MDIO_IMPORT_CPUT_COUNT should not be the norm. It should be a one off thing.

We can make
default_cpus = cpu_count(logical=False) below and other such places.

mdio-python/src/mdio/segy/blocked_io.py

Line 40 in 1475793

default_cpus = cpu_count(logical=True)

FYI: We routinely use 48+ core machines in production

I will update here with some benchmarking results to validate the change.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Reduce number of processes spawned by `ProcessPoolExecutor` in `blocked_io.py` #426

ENH: Reduce number of processes spawned by `ProcessPoolExecutor` in `blocked_io.py` #426

amitpendharkar commented Aug 21, 2024

tasansal commented Aug 21, 2024

amitpendharkar commented Aug 21, 2024

amitpendharkar commented Aug 21, 2024

ENH: Reduce number of processes spawned by ProcessPoolExecutor in blocked_io.py #426

ENH: Reduce number of processes spawned by ProcessPoolExecutor in blocked_io.py #426

Comments

amitpendharkar commented Aug 21, 2024

tasansal commented Aug 21, 2024

amitpendharkar commented Aug 21, 2024

amitpendharkar commented Aug 21, 2024

ENH: Reduce number of processes spawned by `ProcessPoolExecutor` in `blocked_io.py` #426

ENH: Reduce number of processes spawned by `ProcessPoolExecutor` in `blocked_io.py` #426