Filesystem / DB contention with multiple readers #1005

Open
tskisner opened this issue Oct 17, 2024 · 2 comments

@tskisner (Member)

This issue is just for keeping track of an investigation into observed "slowdowns" when multiple processes call get_meta() / get_obs() on different wafers (i.e. different framefiles), both within a single book and from separate books. This is from within the LoadContext operator, so each process creates a context, does the operation (either get_meta or get_obs) and then closes the context.
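For reference, the per-process access pattern looks roughly like this (a minimal sketch assuming the sotodlib Context API; the context file path, obs_id, and wafer slot selection are placeholders):

```python
from sotodlib.core import Context

def load_one_wafer(context_file, obs_id, wafer_slot):
    # Each process builds its own Context, loads metadata and then data
    # for a single wafer, and discards the Context afterwards.
    ctx = Context(context_file)
    meta = ctx.get_meta(obs_id, dets={"wafer_slot": wafer_slot})
    obs = ctx.get_obs(obs_id, meta=meta)
    del ctx
    return obs
```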

Mostly this is just anecdotal so far. For example running a single process that loads 7 wafers in sequence from one observation takes about 60 seconds per wafer (perlmutter compute node, reading data from CFS) to call get_meta + get_obs. Running with 8 processes, each reading 7 wafers in sequence from different observations, seems to take considerably longer.

A more systematic test is needed. The changes in #845 should also be tested to see if they help.
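One possible shape for a more systematic test, sketched here assuming mpi4py and the same Context API (the context file, observation IDs, and wafer slots are placeholders): each process takes one observation, times get_meta and get_obs separately for each wafer, and the timings are gathered on rank 0.

```python
import time
from mpi4py import MPI
from sotodlib.core import Context

def run_timing_test(context_file, obs_ids, wafer_slots):
    comm = MPI.COMM_WORLD
    # One observation per process; wafers are read in sequence within it.
    obs_id = obs_ids[comm.rank % len(obs_ids)]
    timings = []
    for ws in wafer_slots:
        ctx = Context(context_file)
        t0 = time.perf_counter()
        meta = ctx.get_meta(obs_id, dets={"wafer_slot": ws})
        t1 = time.perf_counter()
        obs = ctx.get_obs(obs_id, meta=meta)
        t2 = time.perf_counter()
        timings.append((obs_id, ws, t1 - t0, t2 - t1))
        del ctx, meta, obs
    comm.barrier()
    # Gather (obs, wafer, get_meta seconds, get_obs seconds) tuples on rank 0.
    return comm.gather(timings, root=0)
```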

@tskisner tskisner self-assigned this Oct 17, 2024
@tskisner (Member Author)

Just adding some more numbers to this: loading one observation (7 wafers) on one node takes about 450s. Loading 2 observations on 2 nodes takes about the same. Loading 8 observations on 8 nodes takes about 800s. I have copied a small set of data (100 observations plus metadata) to scratch to see whether running from there helps.

@tskisner (Member Author)

Adding some more details. I copied the data in question to scratch and compared several cases: 8 nodes each processing one of 8 observations, and 64 nodes each processing one of 64 observations.

8 observations of 7 wafers on 8 nodes
=================================================

One process / one thread reading each wafer
-------------------------------------------------

Data on CFS, metadata on CFS:  418s
	get_meta = 3-6s
	get_obs = 40-60s

Data on CFS, metadata on scratch:  414s
	get_meta = 3-6s
	get_obs = 40-60s

Data on scratch, metadata on scratch: 95s
	get_meta ~= 1-10s
	get_obs = 8-10s

Data on scratch, metadata on scratch, "better sqlite" (PR #845): 95s
        (no change; most of the benefits in that branch
         would only be seen with multiple writers)
	get_meta ~= 1-10s
	get_obs = 8-10s

One process / four threads reading each wafer
-------------------------------------------------

Data on scratch, metadata on scratch: 75s
	get_meta ~= 1-10s
	get_obs = 4-6s

64 observations of 7 wafers on 64 nodes
=================================================

One process / one thread reading each wafer.
-------------------------------------------------

Data on scratch, metadata on scratch: 155s
	get_meta ~= 1-10s
	get_obs = 6-20s

One process / four threads reading each wafer.
-------------------------------------------------

Data on scratch, metadata on scratch: 126s
	get_meta ~= 1-10s
	get_obs = 3-10s

For reading, the DB access does not seem to hit any filesystem contention up to 64 x 7 = 448 readers, regardless of whether the metadata is on CFS or scratch (it was slightly faster on CFS).

Data access also seems to scale well and is much faster on scratch. Since the total time is measured after a barrier, the increase in time going from 8 nodes to 64 nodes may just be due to including some longer observations. A better profiling exercise should break out the time spent on I/O and FLAC decompression separately, and present those as "samples per second" or similar to account for the differing lengths of observations.
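A minimal sketch of that normalization, assuming the per-phase times (I/O vs. FLAC decompression) can be instrumented inside the loader; the numbers in the example are made up:

```python
def throughput(n_dets, n_samps, io_seconds, decompress_seconds):
    # Express per-phase and overall read speed in samples per second, so
    # observations of different lengths can be compared directly.
    total = n_dets * n_samps
    return {
        "io_samps_per_sec": total / io_seconds,
        "decompress_samps_per_sec": total / decompress_seconds,
        "overall_samps_per_sec": total / (io_seconds + decompress_seconds),
    }

# Example with made-up numbers: 900 detectors x 200000 samples,
# 30 s of I/O and 15 s of decompression.
print(throughput(900, 200_000, 30.0, 15.0))
```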

I will leave this issue open until we have tested data reading at higher concurrency, but for now the solution seems to be caching data to scratch and leaving metadata on CFS.
