
[GSoC 2023] Student Blog for FASER Real Time Compression Project - SumalyoDatta #1458

Merged
merged 8 commits into HSF:main
Feb 5, 2024

Conversation

Sumalyo
Contributor

@Sumalyo Sumalyo commented Oct 3, 2023

My student blog for GSoC 2023
Project - Real-time lossless data compression for the FASER experiment
Mentors: Claire Antel and Brian Petersen

@cantel, could you please review?

FASER has a trigger-based data acquisition model, and the DAQ software is developed using the DAQling framework.
>> DAQling is an open-source lightweight C++ software framework that can be used as the core of data acquisition systems of small and medium-sized experiments. It provides a modular system in which custom applications can be plugged in. It also provides the communication layer based on the widespread ZeroMQ messaging library, error logging using ERS, configuration management based on the JSON format, control of distributed applications and extendable operational monitoring with web-based visualization.
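The quoted description mentions JSON-based configuration of pluggable modules. As a purely illustrative sketch of what such a configuration could look like (the keys, module name, and port below are invented for this example and are not DAQling's actual schema):

```json
{
  "modules": [
    {
      "name": "eventbuilder01",
      "type": "EventBuilder",
      "settings": { "maxPendingEvents": 1000 },
      "connections": {
        "receivers": [
          { "transport": "tcp", "host": "localhost", "port": 5555 }
        ]
      }
    }
  ]
}
```

In a DAQling-style system, the control layer reads such a file to launch the distributed module processes and wire up their communication channels.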

You can read more about the strategy in the [paper published by the FASER collaboration](https://arxiv.org/abs/2110.15186).
Contributor

I would replace "strategy" with "FASER TDAQ design".

The [FASER experiment](https://faser.web.cern.ch/index.php/) is an LHC experiment located in a tunnel parallel to the LHC ring. It is considerably smaller and lower-budget than the giant LHC experiments such as ATLAS and CMS. The experiment seeks to detect new long-lived particles that have travelled half a kilometre from the LHC proton-proton collision site at the centre of the ATLAS experiment, escaping the ATLAS detector undetected. During proton collisions at the LHC, the FASER experiment records up to 1500 events per second using an open-source data acquisition (DAQ) software framework developed at CERN. The DAQ software receives dedicated data fragments from subcomponents on the detector, packs them into a single event and writes completed events to a file. However, the experiment's assigned storage quota was quickly exhausted and has already forced a doubling of the requested storage space. The challenge was to develop a compression engine that would compress data in real time, __introducing minimal latency and performance overheads__. The compression engine would also have to decompress events transparently for data reconstruction and physics analysis.
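As a minimal sketch of the real-time requirement described above, the round trip can be illustrated with Python's standard `zlib` as a stand-in compressor (the event payload, size, and compression level here are illustrative assumptions, not the project's actual choices):

```python
import os
import time
import zlib

# Synthetic ~32 KiB "event": a repeated random block, so it is compressible.
event = os.urandom(512) * 64

t0 = time.perf_counter()
packed = zlib.compress(event, 1)    # low level: favour speed over ratio
t1 = time.perf_counter()
unpacked = zlib.decompress(packed)  # transparent round trip for analysis
t2 = time.perf_counter()

assert unpacked == event            # lossless: the bytes are identical
ratio = len(event) / len(packed)
print(f"ratio={ratio:.2f} "
      f"compress={1e3 * (t1 - t0):.2f} ms "
      f"decompress={1e3 * (t2 - t1):.2f} ms")
```

At up to 1500 events per second, the average per-event compression budget is roughly 0.7 ms, which is why the speed/ratio trade-off discussed below matters.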

FASER has a trigger-based data acquisition model, and the DAQ software is developed using the DAQling framework.
>> DAQling is an open-source lightweight C++ software framework that can be used as the core of data acquisition systems of small and medium-sized experiments. It provides a modular system in which custom applications can be plugged in. It also provides the communication layer based on the widespread ZeroMQ messaging library, error logging using ERS, configuration management based on the JSON format, control of distributed applications and extendable operational monitoring with web-based visualization.
Contributor

If this is quoted from somewhere, maybe add a link to the document as reference?

The approach for finding the best compressor is described below
- Events were divided into a set of 10 classes based on event size (Class 0 having events of smallest size)
- For each event class, the average compression speed and compression ratio were calculated (for each compressor)
- The resulting points were plotted on a graph, and this helped to visualize the tradeoff. The compressor configuration offering the highest average compression ratio at the highest compression speed was considered optimal.
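The selection procedure quoted above can be sketched as follows. This is a hedged illustration, not the project's benchmark code: it uses `zlib` compression levels as stand-in "compressor configurations" and synthetic events, with acceptance thresholds (compression ratio of about 2, speed above 40 MB/s) taken from the figures quoted later in this thread:

```python
import os
import time
import zlib

# Synthetic events of varying size, sorted and binned into 10 size classes
# (class 0 holds the smallest events).
events = sorted((os.urandom(200) * (1 + i % 40) for i in range(200)), key=len)
n = len(events)
classes = [events[i * n // 10:(i + 1) * n // 10] for i in range(10)]

def benchmark(level, evts):
    """Average compression ratio and speed (MB/s) of one zlib level."""
    elapsed, raw, packed = 0.0, 0, 0
    for e in evts:
        t0 = time.perf_counter()
        c = zlib.compress(e, level)
        elapsed += time.perf_counter() - t0
        raw += len(e)
        packed += len(c)
    return raw / packed, raw / elapsed / 1e6

results = {}
for level in (1, 6, 9):  # stand-in "compressor configurations"
    per_class = [benchmark(level, cls) for cls in classes]
    avg_ratio = sum(r for r, _ in per_class) / len(per_class)
    avg_speed = sum(s for _, s in per_class) / len(per_class)
    results[level] = (avg_ratio, avg_speed)

# Acceptance region: ratio >= 2 and speed > 40 MB/s; among acceptable
# configurations, pick the fastest.
acceptable = {k: v for k, v in results.items() if v[0] >= 2 and v[1] > 40}
best = max(acceptable, key=lambda k: acceptable[k][1]) if acceptable else None
print("per-level (ratio, MB/s):", results, "best:", best)
```

In the real study each (ratio, speed) point per class and candidate compressor would be plotted to visualize the trade-off, as described in the bullet points.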
Contributor

A bit nitpicky since I know what you mean: Since there is a trade-off between compression ratio and speed, we will not get an algorithm that will achieve both highest compression and highest speed. I would rephrase by referring to your "acceptance region" in fig 2 and something like "The compressor configuration offering the highest compression speed at an acceptable compression ratio" (or vice versa).

@cantel
Contributor

cantel commented Oct 5, 2023

My student blog for GSoC 2023 Project - Real-time lossless data compression for the FASER experiment Mentors: Claire Antel and Brian Petersen

@cantel, Could you please review?

very nice, thanks :)

@vvolkl
Contributor

vvolkl commented Feb 5, 2024

@cantel Seems like your comments were not followed up on, should we merge anyway or close?

@@ -67,7 +73,7 @@ The logs and the python-based analysis notebooks can be found at [this repo](htt
The approach for finding the best compressor is described below
- Events were divided into a set of 10 classes based on event size (Class 0 having events of smallest size)
- For each event class, the average compression speed and compression ratio were calculated (for each compressor)
- The resulting points were plotted on a graph, and this helped to visualize the tradeoff. The compressor configuration offering the highest average compression ratio at the highest compression speed was considered optimal.
- The resulting points were plotted on a graph, and this helped to visualize the tradeoff. A compression ration of about 2 (or 50% compression ) and a speed more that 40 MB/s was considered as acceptable performance (as denoted in the diagram above).The compressor configuration offering the highest average compression ratio at the highest compression speed (in the acceptable performance region of the graph) was considered optimal.
Contributor

Many thanks, @Sumalyo . Since it's such a nice blog entry, I'm adding one last comment on your last commit as it includes some grammar mistakes: "compression ration" ->"compression ratio", "and a speed more that 40 MB/s" -> "and a compression speed of more than 40 MB/s"

Contributor Author

Hi Claire,
I have made the changes you suggested.

@cantel
Contributor

cantel commented Feb 5, 2024

@cantel Seems like your comments were not followed up on, should we merge anyway or close?

Hi @vvolkl, thanks for the nudge and thanks @Sumalyo for the quick response. Sorry, I should have followed up much earlier as well.

@Sumalyo
Contributor Author

Sumalyo commented Feb 5, 2024

Thanks @cantel for your feedback; thank you @vvolkl for reminding me, I am sorry again for the delay.

@vvolkl
Contributor

vvolkl commented Feb 5, 2024

No problem, thanks for the quick reaction and the nice blog post!

@vvolkl vvolkl merged commit 51bc2b5 into HSF:main Feb 5, 2024
2 checks passed