
Binary output proposal #237

Closed · wants to merge 7 commits

Conversation

donnaaboise
Contributor

I've included an option in the 2d/3d valout.f90 files that strips ghost cells from the binary output. It seems to be significantly faster than writing with ghost cells. (A rough sketch of the slicing write follows the list below.)

  • In 2d, the ghost cells are included in the binary output, since Python/visclaw depends on this. It is easy to strip them, though (see the end of valout.f90).
  • In 3d, ghost cells are stripped.
  • The Matlab code makes the same assumptions for the 2d/3d code.
  • It is easy to change the options (although this requires recompilation).
  • I haven't done anything with aux array output.
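
For reference, here is a minimal sketch of the kind of array-section write being discussed. This is not the actual valout.f90: the subroutine name, unit number, and file handling below are illustrative assumptions, while meqn, mbc, mx, my, and q follow the usual Clawpack conventions.

subroutine write_patch(q, meqn, mbc, mx, my, strip_ghost)
    implicit none
    integer, intent(in) :: meqn, mbc, mx, my
    logical, intent(in) :: strip_ghost
    real(kind=8), intent(in) :: q(meqn, 1-mbc:mx+mbc, 1-mbc:my+mbc)
    integer :: iunit

    iunit = 50
    open(unit=iunit, file='fort.b0000', access='stream', &
         status='unknown', position='append')

    if (strip_ghost) then
        ! Write only the interior cells; the compiler packs this array
        ! section into a contiguous temporary before the unformatted write.
        write(iunit) q(:, 1:mx, 1:my)
    else
        ! Write the whole patch, ghost cells included (current behavior).
        write(iunit) q
    endif

    close(iunit)
end subroutine write_patch

The main trade-off is the temporary copy the compiler makes for the array section versus the extra I/O of the ghost cells; the timings later in this thread look at exactly that.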

@rjleveque
Member

@donnaaboise: Since you reverted to writing out the ghost cells, why not just revert to the original versions of valout.f90 in 2d, rather than this new version with an additional write_slice subroutine call?

But more generally we should resolve the issue of how to handle ghost cells following the discussion in #236 and be consistent in 2d and 3d.

If we decide not to write ghost cells in 3d, or allow the user to specify how many ghost cells to include, then there has to be a corresponding change made to $CLAW/pyclaw/src/pyclaw/fileio/binary.py since it currently assumes ghost cells are included (in any number of dimensions).

Finally, I would avoid using the term "slice" when talking about the interior grid cells without ghost cells. This doesn't correspond to what I think of as a slice of the data, and conflicts with the way @cjvogl and I are using the term in another branch that I've started using again for 3d output (and that we should try to clean up and merge in soon). This version has a slices_module that prints out only 2d planar slices of the 3d solution (coordinate-aligned, with a fixed value of x, y, or z on each slice). These are output in the format of a 2d amrclaw solution (from whatever grid patches intersect the slice) so that they can be plotted using the 2d Python plotting routines. For some purposes this is sufficient and gives much less output than the full 3d solution.

@donnaaboise
Contributor Author

I could have left the original 2d valout.f90 code; I just included the write_slice routine in case there was a consensus not to include ghost cells. If ghost cells should be included, then we should revert to the original code.

The main reason I see for not including the ghost cells is that they take more storage and take longer to output. This is particularly noticeable in 3d. In 2d it makes less of a difference, and in any case it sounds like the 2d Python graphics expect ghost cells, so it may not be worth stripping them before outputting the data.

I used the term "slice" only because that is the term I use in the Matlab graphics. I have no particular attachment to that term, and can change it.

In any case, the Matlab graphics can read the 2d/3d binary data with or without ghost cells. Currently I have hardwired the selection, but can easily make this a user choice, depending on what amrclaw does.

That said, I would think it is okay if 2d writes out the ghost cells (since, at the very least, the Python graphics depend on them) but 3d doesn't. In 3d there is a big performance hit, and there are no corresponding plotting routines that depend on ghost cells.

@rjleveque
Member

Just a point of clarification about reading binary files and plotting with visclaw:

The reading is done by pyclaw.fileio.binary, which can read in 1d, 2d, or 3d data. It currently always assumes ghost cells are included (though this could be changed pretty easily). Ghost cells are stripped off in creating the pyclaw.solution.Solution object that is returned.

The 2d plotting routines in visclaw use this stripped down version.

So it is not the plotting that depends on the ghost cells being in the binary files, but the fileio.binary routines, which already handle 3d (you can read in the solution and do your own thing with it) even though we don't have Python plotting routines in visclaw for 3d yet.
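
To make the layout concrete: when the file includes ghost cells, the reader has to pull in the full (mx + 2*mbc) by (my + 2*mbc) block for each patch and then keep only the interior, which is essentially what fileio.binary does before the Solution object is built. Here is a rough sketch of that bookkeeping, written in Fortran purely for illustration (the names, unit number, and single-patch file layout are assumptions, not the actual binary.py logic):

subroutine read_patch_interior(qint, meqn, mbc, mx, my)
    implicit none
    integer, intent(in) :: meqn, mbc, mx, my
    real(kind=8), intent(out) :: qint(meqn, mx, my)
    real(kind=8), allocatable :: qfull(:,:,:)
    integer :: iunit

    iunit = 50
    allocate(qfull(meqn, 1-mbc:mx+mbc, 1-mbc:my+mbc))

    ! Read the full patch block, ghost cells included, in the order it
    ! was written to the stream file.
    open(unit=iunit, file='fort.b0000', access='stream', status='old')
    read(iunit) qfull
    close(iunit)

    ! Keep only the interior cells; this is the stripped-down version the
    ! plotting routines actually see.
    qint = qfull(:, 1:mx, 1:my)
    deallocate(qfull)
end subroutine read_patch_interior

If the writer stops including ghost cells, the reader simply drops the padding (mbc = 0 in the sketch), which is why the writing and reading sides have to agree.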

So I suggest that whatever we do should be consistent between 2d and 3d. (And 1d, which also supports binary in the same way.)

@donnaaboise
Contributor Author

donnaaboise commented Jan 4, 2019

Thanks @rjleveque for the clarification; I had been assuming that visclaw didn't read 3d files at all.

I still think it might be worth considering the option of not printing out ghost cells, especially in 3d. In the one example I have looked at, the ratio of timing/storage without versus with ghost cells is about 60%.

Timing (without ghost cells)

============================== Timing Data ==============================

Integration Time (stepgrid + BC + overhead)
Level           Wall Time (seconds)    CPU Time (seconds)   Total Cell Updates
  1                     8.251                  8.241            0.288E+07
  2                    84.831                 84.642            0.300E+08
total                  93.082                 92.884            0.328E+08

All levels:
stepgrid               91.240                 91.045    
BC/ghost cells          1.378                  1.375
Regridding              3.963                  3.957  
Output (valout)         0.091                  0.090  

Total time:            97.494                 97.280  
Using  1 thread(s)

Note: The CPU times are summed over all threads.
      Total time includes more than the subroutines listed above

=========================================================================

Storage (without ghost cells)

(bash) ~/.../amrclaw/examples/advection_3d_swirl (donna_valout) % du -hsc _output/fort.b*
2.3M	_output/fort.b0000
3.2M	_output/fort.b0001
3.7M	_output/fort.b0002
4.3M	_output/fort.b0003
5.0M	_output/fort.b0004
4.4M	_output/fort.b0005
23M	total

Timing (with ghost cells)

============================== Timing Data ==============================

Integration Time (stepgrid + BC + overhead)
Level           Wall Time (seconds)    CPU Time (seconds)   Total Cell Updates
  1                     8.019                  8.008            0.288E+07
  2                    82.827                 82.699            0.300E+08
total                  90.846                 90.707            0.328E+08

All levels:
stepgrid               89.117                 88.978    
BC/ghost cells          1.306                  1.303
Regridding              3.858                  3.843  
Output (valout)         0.153                  0.150  

Total time:            95.200                 95.048  
Using  1 thread(s)

Note: The CPU times are summed over all threads.
      Total time includes more than the subroutines listed above

=========================================================================

Storage (with ghost cells)

(bash) ~/.../amrclaw/examples/advection_3d_swirl (donna_valout) % du -hsc _output/fort.b*
3.3M	_output/fort.b0000
5.1M	_output/fort.b0001
5.9M	_output/fort.b0002
6.4M	_output/fort.b0003
8.0M	_output/fort.b0004
7.6M	_output/fort.b0005
36M	total

So while the timing difference for the output isn't that much (0.091s vs. 0.153s, or about 59%), the storage savings are significant (23M vs. 36M, or about 63%).

Regardless of what AMRClaw decides, I'll go ahead and make an option in Matlab so the user can decide whether or not to read in ghost cells.

@donnaaboise
Contributor Author

The timing results for the 2d swirl example show that the output time without ghost cells is about 55% of the time with them (0.052s vs. 0.096s), and the storage is about 80% (5.0MB vs. 6.2MB).

@rjleveque
Member

It seems misleading to say it is faster by such a huge margin when ghost cells aren't included since you're only looking at the valout time, which is a tiny fraction of the total run time. In fact for the 3d example you show, the total time without ghost cells was greater than the total time with ghost cells by a couple seconds, so the uncertainty in the timings is much greater than the total time spent in valout.

But I agree the storage saving is considerable and as long as it doesn't significantly slow down valout to strip out the ghost cells, I think it's great to include this as an option.

But we do need to modify pyclaw.fileio.binary to allow this option before people can use it, if they want to read the resulting binary files into Python.

@donnaaboise
Contributor Author

donnaaboise commented Jan 6, 2019

In this example (the 3d advection example), the time in valout is negligible, but the several runs I did were surprisingly consistent in these timings, so I didn't see much uncertainty. What I was originally checking was whether using F90 slicing to strip the ghost cells would slow down the code significantly. I was surprised to see that it didn't, so I proposed it as a way to avoid printing out ghost cells. I had assumed that the only reason for printing out the ghost cells was to output contiguous memory; as @rjleveque pointed out, though, there are several other reasons why the ghost cells might be useful.

As a second data point, I did a run with outstyle=3, nout=200, nstep=1. Here the valout times are longer, but they still show that stripping the ghost cells is faster than printing them out (4.58s vs. 7.405s). It is also the case, though, that the overall time is slower when the ghost cells are stripped. I can't really explain this; is the printing somehow asynchronous? The other times (regridding, BC) were essentially the same between the two runs.

Another consideration is that I am running this on my i7 laptop, with a fast SSD hard drive. On other machines, I/O might be slower.

The storage shows about the same savings as in the first set of simulations: 1.016GB vs. 1.6GB, or about 63%.

Timing (without ghost)

============================== Timing Data ==============================

Integration Time (stepgrid + BC + overhead)
Level           Wall Time (seconds)    CPU Time (seconds)   Total Cell Updates
  1                    38.518                 38.419            0.128E+08
  2                   528.565                527.249            0.189E+09
total                 567.083                565.668            0.201E+09

All levels:
stepgrid              560.387                558.992    
BC/ghost cells          3.747                  3.724
Regridding             21.589                 21.522  
Output (valout)         4.581                  4.399  

Total time:           595.360                593.680  
Using  1 thread(s)

Note: The CPU times are summed over all threads.
      Total time includes more than the subroutines listed above

=========================================================================

Storage (without ghost)

......
5.4M    _output/fort.b0193
5.4M    _output/fort.b0194
5.4M    _output/fort.b0195
5.4M    _output/fort.b0196
5.4M    _output/fort.b0197
5.4M    _output/fort.b0198
5.4M    _output/fort.b0199
4.4M    _output/fort.b0200
1016M   total

Timing (with ghost cells)

============================== Timing Data ==============================

Integration Time (stepgrid + BC + overhead)
Level           Wall Time (seconds)    CPU Time (seconds)   Total Cell Updates
  1                    37.756                 37.681            0.128E+08
  2                   518.820                517.725            0.189E+09
total                 556.576                555.406            0.201E+09

All levels:
stepgrid              549.949                548.787    
BC/ghost cells          3.697                  3.690
Regridding             21.682                 21.599  
Output (valout)         7.405                  7.297  

Total time:           587.754                586.401  
Using  1 thread(s)

Note: The CPU times are summed over all threads.
      Total time includes more than the subroutines listed above

=========================================================================

Storage (with ghost cells)

.....
8.6M    _output/fort.b0192
8.6M    _output/fort.b0193
8.6M    _output/fort.b0194
8.6M    _output/fort.b0195
8.6M    _output/fort.b0196
8.6M    _output/fort.b0197
8.6M    _output/fort.b0198
8.6M    _output/fort.b0199
7.6M    _output/fort.b0200
1.6G    total

@donnaaboise
Contributor Author

donnaaboise commented Jan 6, 2019

And as a final data point, here are the timing results using the ASCII format (without ghost cells, I assume) for the 3d advection example, with outstyle=3, nout=200, nstep=1.

The binary output is about 30x faster than the ASCII output, and there is about a 3:1 compression ratio using binary (2.8GB vs. 1.016GB).

Timing (without ghost)

============================== Timing Data ==============================

Integration Time (stepgrid + BC + overhead)
Level           Wall Time (seconds)    CPU Time (seconds)   Total Cell Updates
  1                    38.053                 37.970            0.128E+08
  2                   531.170                530.122            0.189E+09
total                 569.223                568.091            0.201E+09

All levels:
stepgrid              562.409                561.294    
BC/ghost cells          3.830                  3.809
Regridding             21.721                 21.672  
Output (valout)       122.280                120.765  

Total time:           715.239                712.529  
Using  1 thread(s)

Note: The CPU times are summed over all threads.
      Total time includes more than the subroutines listed above

=========================================================================

Storage (without ghost cells)

15M	_output/fort.q0192
15M	_output/fort.q0193
15M	_output/fort.q0194
15M	_output/fort.q0195
15M	_output/fort.q0196
15M	_output/fort.q0197
15M	_output/fort.q0198
15M	_output/fort.q0199
15M	_output/fort.q0200
2.8G	total

@donnaaboise
Contributor Author

Leave valout.f90 as is for now; the Matlab code will strip the ghost cells from the binary output.

@rjleveque
Member

Let's discuss further at SIAM CSE and figure out how best to handle this.
