Throttle late packet messages #4

Open
wants to merge 477 commits into
base: master
from


david-macmahon
Owner

This PR is intended to prevent the disk from filling up when a "late packet storm" occurs. See the commit log message for more details.

jack and others added 30 commits June 13, 2019 01:35
... in the block. The disk thread uses the same bcnt to map the
BDA baseline-times to the hdf5 file.
Otherwise it gets too noisy
And don't allow starting a recording by simply increasing NFILES
without a trigger
… valid

Because each packet contains 3 ants and they may not all be real
Not clear how these can be different in a working system at present
2 workers, and one parent
For testing, don't bother to throttle the output packets.
NB: hardcoded N_ANTS still exists in the template hdf5 header generation script
... to use the 16 sec of integration time for sending data, instead
of the 2 sec it takes to fill the buffer, separating the BDA averaging
and data output to the catcher machine.
... catcher writes 16 sec integration files. Changed packet reception
and data writing to accommodate BDA parameters.
... scripts to check output of X-engs and generate fake
input to the catcher.
... thread to terminate a hashpipeline for testing. Takes the
last databuf available and sets it free for the pipeline to
continue.
dgorthi and others added 29 commits July 15, 2020 17:44
Minor variable declaration that didn't get added to pull request #42
Update hera_catcher_net_thread_bda.c
Changed affinity of NET CPU in px* machines in paper_init.sh
Fix nsamples array in data files
Tweaks made by Aaron Parsons to absorb a higher data rate into the da…
Previously, each late packet was logged because late packets were assumed
to be a rare occurrence.  Unfortunately, due to not-yet-understood
circumstances, the HERA X engines sometimes receive "late" packets from
some of the SNAPs, sometimes many, many late packets from some of the
SNAPs.  Logging all of these late packets causes the log files to grow
until eventually the file system is filled up, which further impacts
observing.

These messages are now logged in a still generous but much more
constrained quantity.  Up to 5,000 of these "late packet" messages will
be logged within the hour following the first such log message.  After
that hour has elapsed, the next such log message will start a new
hour-long window.  This will result in a maximum of 120,000 such log
messages in a 24-hour period (per X engine), which is far fewer than
the 3+ million such log messages per X engine that led to a full file
system.

The threshold of 5,000 was chosen because there seemed to be some
periodicity in the neighborhood of 1024 occurrences.  The limit of 5,000
will allow for the capture of several such cycles per "burst".
The reduced limit of 5,000 messages per burst should still be plenty
for diagnostic purposes.
AaronParsons deleted the throttle-late-packet-messages branch October 26, 2022 15:49
8 participants