get_blt_slices is slow #858

Open
steven-murray opened this issue Jan 12, 2023 · 0 comments

@steven-murray
Contributor

While profiling the LST-binner over 40 days of H6C data, I found that of the ~20k seconds of total runtime, almost 5k (about 1.3 hours!) were spent in the get_blt_slices function. This seems somewhat unnecessary. For reference, here's the line-profiler output for the function:

Total time: 4757.35 s
File: /lustre/aoc/projects/hera/heramgr/anaconda3/envs/h6c/lib/python3.10/site-packages/hera_cal/io.py
Function: get_blt_slices at line 422

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   422                                           def get_blt_slices(uvo, tried_to_reorder=False):
   423                                               '''For a pyuvdata-style UV object, get the mapping from antenna pair to blt slice.
   424                                               If the UV object does not have regular spacing of baselines in its baseline-times,
   425                                               this function will try to reorder it using UVData.reorder_blts() to see if that helps.
   426                                           
   427                                               Arguments:
   428                                                   uvo: a "UV-Object" like UVData or baseline-type UVFlag. Blts may get re-ordered internally.
   429                                                   tried_to_reorder: used internally to prevent infinite recursion
   430                                           
   431                                               Returns:
   432                                                   blt_slices: dictionary mapping antenna pair tuples to baseline-time slice objects
   433                                               '''
   434      4799      10367.0      2.2      0.0      blt_slices = {}
   435  42100895  230656109.0      5.5      4.8      for ant1, ant2 in uvo.get_antpairs():
   436  42096096 3186135020.0     75.7     67.0          indices = uvo.antpair2ind(ant1, ant2)
   437  42096096   77943582.0      1.9      1.6          if len(indices) == 1:  # only one blt matches
   438    617976    3585403.0      5.8      0.1              blt_slices[(ant1, ant2)] = slice(indices[0], indices[0] + 1, uvo.Nblts)
   439  41478120  986563753.0     23.8     20.7          elif not (len(set(np.ediff1d(indices))) == 1):  # checks if the consecutive differences are all the same
   440                                                       if not tried_to_reorder:
   441                                                           uvo.reorder_blts(order='time')
   442                                                           return get_blt_slices(uvo, tried_to_reorder=True)
   443                                                       else:
   444                                                           raise NotImplementedError('UVData objects with non-regular spacing of '
   445                                                                                     'baselines in its baseline-times are not supported.')
   446                                                   else:
   447  41478120  271685967.0      6.6      5.7              blt_slices[(ant1, ant2)] = slice(indices[0], indices[-1] + 1, indices[1] - indices[0])
   448      4799     768416.0    160.1      0.0      return blt_slices
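For comparison, here is a minimal sketch (not the hera_cal implementation) that builds the same mapping with one stable sort over the antenna arrays, instead of one full `antpair2ind` search per antenna pair. The name `blt_slices_one_pass` is hypothetical. It takes plain `ant_1_array`/`ant_2_array` arrays rather than a UV object and assumes non-negative antenna numbers. Instead of calling `reorder_blts` on irregular data, it returns `None` and leaves the reorder-and-retry step to the caller. It keeps the slice conventions of the listing above, including `slice(i, i + 1, Nblts)` for a pair with a single blt.

```python
import numpy as np


def blt_slices_one_pass(ant_1_array, ant_2_array):
    """Hypothetical one-pass variant of get_blt_slices working on raw arrays.

    Groups all baseline-time (blt) indices by antenna pair with one stable
    sort instead of searching the whole blt axis once per pair. Returns None
    if any pair's blt indices are not evenly spaced (the case where
    get_blt_slices reorders the blts or raises NotImplementedError).
    """
    ant_1_array = np.asarray(ant_1_array, dtype=np.int64)
    ant_2_array = np.asarray(ant_2_array, dtype=np.int64)
    nblts = ant_1_array.size
    if nblts == 0:
        return {}

    # Encode each (ant1, ant2) pair as a single integer key
    # (assumes non-negative antenna numbers).
    base = max(ant_1_array.max(), ant_2_array.max()) + 1
    keys = ant_1_array * base + ant_2_array

    # A stable sort groups equal keys while keeping each pair's
    # blt indices in increasing order.
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    # Positions in the sorted array where a new antenna pair begins.
    starts = np.flatnonzero(np.r_[True, sorted_keys[1:] != sorted_keys[:-1]])

    blt_slices = {}
    for idx in np.split(order, starts[1:]):
        pair = (int(ant_1_array[idx[0]]), int(ant_2_array[idx[0]]))
        first = int(idx[0])
        if idx.size == 1:
            # Same convention as get_blt_slices for a single matching blt.
            blt_slices[pair] = slice(first, first + 1, nblts)
            continue
        steps = np.diff(idx)
        if not np.all(steps == steps[0]):
            return None  # irregular spacing: caller may reorder and retry
        blt_slices[pair] = slice(first, int(idx[-1]) + 1, int(steps[0]))
    return blt_slices
```

The Python loop still runs once per antenna pair, but each iteration only touches that pair's own indices rather than scanning all Nblts entries.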

Most of the time (67%) is spent in uvo.antpair2ind, finding the indices for each antpair, and another ~21% goes to checking that those indices are evenly spaced. I get that this is sometimes necessary, because in general a UVData object can have its blts in any order. But for HERA data it is unnecessary, because the blts are always ordered time-first, antenna-second. If we can find a way to quickly determine this ordering (or let the caller assume it), we could skip the per-antpair lookups entirely, which would be a significant speedup.
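One way to allow "assuming" the HERA layout is a fast path that verifies it with a couple of vectorized checks and then writes the slices down directly. The sketch below is hypothetical: the name `blt_slices_if_time_ordered` and its array-based signature are assumptions, not part of hera_cal, and it assumes non-negative antenna numbers. It treats "time-first" as meaning the blt axis is one fixed block of Nbls antenna pairs repeated Ntimes times, so pair j sits at blts j, j + Nbls, j + 2*Nbls, and so on. If the check fails it returns `None`, so a caller could fall back to the existing get_blt_slices.

```python
import numpy as np


def blt_slices_if_time_ordered(ant_1_array, ant_2_array):
    """Hypothetical fast path for data whose blts are ordered time-first.

    Checks that the blt axis is one block of antenna pairs repeated once per
    time (Ntimes x Nbls). If so, the pair in column j maps to
    slice(j, j + (Ntimes - 1) * Nbls + 1, Nbls) without searching for its
    indices; otherwise returns None so the caller can use the general path.
    """
    ant_1_array = np.asarray(ant_1_array, dtype=np.int64)
    ant_2_array = np.asarray(ant_2_array, dtype=np.int64)
    nblts = ant_1_array.size
    if nblts == 0:
        return {}

    # Encode each (ant1, ant2) pair as a single integer key
    # (assumes non-negative antenna numbers).
    base = max(ant_1_array.max(), ant_2_array.max()) + 1
    keys = ant_1_array * base + ant_2_array
    nbls = np.unique(keys).size
    if nblts % nbls != 0:
        return None
    ntimes = nblts // nbls

    # Every time block must list the same pairs in the same order. Since the
    # identical blocks together contain nbls distinct pairs, the first block
    # then holds each pair exactly once.
    blocks = keys.reshape(ntimes, nbls)
    if not np.all(blocks == blocks[0]):
        return None

    blt_slices = {}
    for j in range(nbls):
        pair = (int(ant_1_array[j]), int(ant_2_array[j]))
        if ntimes == 1:
            # Same convention as get_blt_slices for a single matching blt.
            blt_slices[pair] = slice(j, j + 1, nblts)
        else:
            blt_slices[pair] = slice(j, j + (ntimes - 1) * nbls + 1, nbls)
    return blt_slices
```

With a pyuvdata `UVData` object `uvd`, the arrays to pass would presumably be `uvd.ant_1_array` and `uvd.ant_2_array`. The check costs one `np.unique` and one reshaped comparison over the Nblts entries plus a Python loop over Nbls pairs, rather than Nbls searches over the full blt axis.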
