Meeting Summary: Short- to Medium-Term Roadmap towards the official GNU Radio 4.0 (GR4) Beta release @ FAIR
participants: @wirew0rm, @drslebedev, @RalphSteinhagen
overarching goal: getting GR4 into shape by the end of May (Beta 0) and for the EU GNU Radio Days Workshop, 27–31 Aug 2024 (Beta N | official release)
CALL#5 -- see also Etherpad
Important: GR4/services do not need to authenticate (i.e. handle personal data <-> EU GDPR) but just validate the RBAC token (i.e. signed role name, token expiration time).
Integration of a UI testing framework using ImGui's test engine, with focus spanning unit level (individual widget components) to system integration (main UI views and GitHub-bot diff-screenshot integration)
Follow up on and continue development of backlog items that had to be de-prioritised during CALL#4.
Time-Series DB integration
N.B. This will be a separate EU-wide tender process outside of the framework agreement due to different government funding sources.
CIT kindly provided a VM test setup on a (very) high-performance machine
should verify (largely a formality) as a proof-of-concept solution that it is possible to:
a) sustainably write > 100 MB/s of streaming data into the DB (e.g. 8 digitizer channels @ 20 MS/s, one sample: 16-bit integer)
b) achieve >1-4 GB/s read-performance (multiple users, faster than real-time processing, ...)
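These throughput figures follow directly from channel count, sample rate, and sample size; a minimal back-of-the-envelope sketch (assuming 16-bit samples, i.e. 2 bytes each):

```cpp
#include <cstdint>

// Sustained write rate for N digitizer channels at a given sample rate.
constexpr double write_rate_MBps(std::uint32_t channels, double samples_per_s, std::uint32_t bytes_per_sample) {
    return channels * samples_per_s * bytes_per_sample / 1e6;
}

// 8 channels @ 20 MS/s with 2-byte (16-bit) samples -> 320 MB/s, i.e. above the 100 MB/s write target
static_assert(write_rate_MBps(8, 20e6, 2) == 320.0);
```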
short-term: polling of LSA DB may need service-local caching -> singleton (for further details see below)
long-term: LSA drives settings to services/blocks (for further details, see below)
Towards official GNU Radio 4.0 'Beta 0' Version [to be finished by mid-May]
Smaller half- to one-day items to be followed up:
rename 'graph-prototype' to 'gnuradio4' & create/update landing README.md (logo, intro, text, ...) => ACTION: Alex (GH actions) & Bailey + keep Josh & Jeff in the loop
fair-acc repo will be mirrored/forked by GR's GitHub organisation and kept synchronised (but with disabled issues)
initially, GR4 issues will be tracked in the fair-acc repo until contributions and GR4 usage pick up, also to avoid GR3-vs-GR4 confusion
some general code clean-up/hygiene tasks:
move from gcc13->gcc14 & clang17->clang18 (std::print(..), std::format(..), modules, ... support), mostly CI/CD => ACTION: ALEX
fix of gcc14 related warnings and errors => ACTION: Ralph
revise and minimise/eliminate ToDos in the code base, either:
fix simple ToDos on the spot
fix deprecated work(..)-implementing blocks
remove larger ToDos and move their content to proper issues
[ ] eliminate {fmt} and move to systematically using std::print(..), std::format(..) (may need additional unit-tests and helpers, especially for pmt/range formatting) N.B. need to wait for P2216R3 being available for gcc/clang, since some format-strings need to be constexpr evaluated.
put binary code-sizes on an optional diet
generated binaries mostly contain block documentation and meta-information -> OK and needed for desktop and, notably UI users
add optional EMBEDDED compile flag/target that minimises the binary size (& eliminates documentation)
goal: make simple flow-graphs fit in < 1 MB flash (e.g. RP Pico (RP2040, 2 MB) or Arduino Nano (RP2040, 16 MB))
excellent real-world demonstrator for industry, embedded users, and educators that GR4 is efficient, fast, and can even be used on a $5 micro-controller
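The doc-stripping idea above could be sketched as follows; the flag name GR_EMBEDDED and the doc-string hook are illustrative assumptions, not the actual GR4 API:

```cpp
#include <string_view>

// Hypothetical sketch: an EMBEDDED compile flag strips block documentation
// from the binary so that simple flow-graphs fit into small flash images.
#ifdef GR_EMBEDDED
inline constexpr std::string_view block_doc = ""; // no doc text in the flash image
#else
inline constexpr std::string_view block_doc =
    "MultiplyConst<T>: multiplies every input sample by a constant factor.";
#endif
```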
improved clang-format definition, needs a draft for further discussion and evaluation => ACTION: Semen
focus on keeping intentional line breaks/long-lines & being vertically compact otherwise
N.B. this is not intended as early adoption merely because it is new, but it eliminates the refl-cpp MACRO annotation that is hard to teach and a common source of programming errors for new users
to check in particular: handling of templated classes and inheritance
Medium-Term Plans and Open Modelling Questions [to be finalised until GR Workshop]
Ensure that the following conceptual dimensions are handled adequately by the existing GR4 design:
implement some of the logic/visual-scripting blocks from Reactive and Unity to check/validate GR4 architecture/assumptions (see also issue #161) -- not all, just critical corner cases (N.B. focus on unit-testing, documentation, etc.), e.g. (tbd. which ones could be dropped because of triviality):
Control Nodes notably: 'if', 'while', 'for', 'break' and feedback/closed-loops in graphs
Runtime Expression Evaluation
Rationale: while many expressions can be composed using graphs of basic blocks, some topologies can become rather large and clumsy (example: f(x, y, mu) = 1/(sqrt(2*pi)*|y|)*exp(-0.5*pow((x-mu)/|y|, 2))), and are not always known at compile-time. For more complex problems, we'd opt for a proper scripting language such as Python, Cling (C++), SYCL, or another JIT-compiled solution. However, these are not necessarily fast and carry runtime and build dependencies that are not compatible with all applications (e.g. embedded platforms, security/safety aspects, ...). Still, we need to allow the user to express basic math and notably filter expressions that are often only known at runtime (for example, in the chunking block). We could write our own expression-evaluation engine, but this quickly becomes unwieldy beyond basic math and bracket operations and may affect medium- to long-term maintenance.
Evaluate whether we/GR4 onboard ExprTk as an external dependency, to which level (i.e., 'basic math + brackets', 'functions', 'if-else', ...), and how to integrate it => ACTION: Alex, Semen, Ralph, and others interested (John?).
model the filter syntax for the event-matching use-case, which needs to allow users to specify
Compile-Time Performance & Optimisations
GR4 is fine w.r.t. runtime performance (see benchmarks, SIMD, planned SYCL integration), but recent CI/CD experience showed that the whole project requires ~1 h on a single core to compile. A recent PR optimised this quite a bit, but when targeting thousands of blocks this needs to be improved. Which path should we pursue:
C++ modules? N.B. support in recent gcc14 and clang18 improved a lot
Optimise templating structure (repeated instantiation overhead, code-optimisation on the back-end, ...)
pmt-optimisations (one of the major compile-time hogs due to the std::visit patterns being used)
use of 'ccache' - doesn't improve cold-build compilation performance, but caching helps when changing isolated block implementations or switching code back-and-forth
Target: < 20 seconds per block to be compiled
Modelling of Timing
invest in integrating WR, GPS, Net, and simulated timing sources
goal: same external block interface for White Rabbit (WR), GPS, Net, Simulated, ... (i.e. similar to the existing Clock block interface)
The WR src block needs to be modelled/narrowed/simplified regarding IO capabilities and triggering models (WR IO can be used as inputs and outputs, optional: reference clock generation, etc.).
LSA & UI integration, i.e. displaying a BPC cycle with its chain-sequence-process sub-structure + LSA-defined timing events, and selecting the required start-stop event combinations
Standardise event structure and filter definitions, i.e., all events must be defined, and a validation function should enforce this for debugging/testing purposes => ACTION:??
trigger_name - std::string
trigger_time - uint64_t [ns] (UTC or similar)
trigger_offset - float [s] (offset/delay of triggering edge w.r.t. generating trigger)
trigger_meta_info - pmt-map carrying (for GR4 optional) meta info, default for FAIR:
WR_RAW_PAYLOAD - 256 bits WR raw timing byte data
context - std::string e.g. "FAIR.SELECTOR.C=:S=:P=:T="
LSA_context - std::string corresponding LSA context for that event (needs to be injected a posteriori, not part of the WR pay-load)
C - int8_t chain ID
S - int8_t sequence ID
P - int8_t beam process ID
T - int8_t timing-group ID (often optional)
BPCTS - uint64_t [UTC ns] beam-production-chain-time-stamp (encodes unique beam ID when it was created)
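A minimal sketch (type name and field layout illustrative, not the actual GR4 definition) of the event structure above, together with the kind of validation helper mentioned earlier for debugging/testing purposes:

```cpp
#include <cstdint>
#include <string>

// Illustrative sketch of the standardised event structure listed above.
struct TimingEvent {
    std::string   trigger_name;
    std::uint64_t trigger_time   = 0;   // [ns] UTC or similar
    float         trigger_offset = 0.f; // [s] offset/delay of triggering edge
    std::string   context;              // e.g. "FAIR.SELECTOR.C=:S=:P=:T="
    std::int8_t   C = -1, S = -1, P = -1; // chain/sequence/beam-process IDs
    std::uint64_t BPCTS = 0;            // [UTC ns] beam-production-chain time-stamp
};

// all mandatory fields must be defined -- a validation function can enforce
// this in debug/test builds, as proposed above
bool is_valid(const TimingEvent& e) {
    return !e.trigger_name.empty() && e.trigger_time != 0
        && e.C >= 0 && e.S >= 0 && e.P >= 0 && e.BPCTS != 0;
}
```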
Modelling of LSA (FAIR's Settings-Supply Management System) Integration
Long-term: LSA will push settings and configurations to the service and GR4 flow-graph blocks. However, this will likely not be in place before 2025. As an intermediate solution, we may thus need to poll (periodic and/or via SSE) LSA and mimic that behaviour.
read settings interface for scalars and functions -> needs good abstraction
basic trim-interface PoC -> needs good abstraction and RBAC (!!!)
open question on multiplexing modelling (both settings and state):
within the block
PRO: Existing Transactions.hpp setting implementation -> should we make this the default??
CON: Does not handle block state (e.g. history), which should probably be treated differently to settings
within and creating new (Sub)-Graph, i.e. each block in the sub-graph contains settings for a given context
PRO: would handle block state more easily/intuitively and potentially allow for different per-multiplexing-context logic
CON: needs UI/service integration for creating new sub-graphs
Do both?
polling may need some local caching to detect whether settings changed and to emit only new scalar or DataSet values on change
the blocks would define the parameters (e.g. 'SIS18BEAM/Energy') to be monitored plus optional FAIR selector context and emit them as unpacked data streams (ints, floats, DataSet)
re-packing should be done with separate blocks to generate <key, value> pmt pairs that could be used to drive the settings of other blocks.
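The polling-with-local-caching idea can be sketched as follows (class and method names are illustrative, not GR4/LSA API): the cache remembers the last seen value per parameter and reports only actual changes, so downstream blocks receive new values on change only.

```cpp
#include <map>
#include <optional>
#include <string>

// Illustrative sketch of a service-local cache for polled LSA settings:
// update() stores the freshly polled value and returns it only if it
// differs from the previously cached one (i.e. emit-on-change).
class SettingsCache {
    std::map<std::string, double> _last;

public:
    std::optional<double> update(const std::string& key, double polled) {
        auto it = _last.find(key);
        if (it != _last.end() && it->second == polled) {
            return std::nullopt; // unchanged -> nothing to emit
        }
        _last[key] = polled;
        return polled; // new or changed -> emit downstream
    }
};
```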
Modelling of SYCL Integration
We eventually need to integrate GPU and FPGA support into GR4 since some algorithms cannot be efficiently handled on the CPU alone for throughput (->GPU) or latency (-> FPGA) reasons. There are already existing attempts to use CUDA or vendor-specific FPGA integration ... all have in common that they require quite a bit of boiler-plate, specific non-Python/C++ programming expertise, and are often quite volatile (on time scales of 3-5 years) and vendor-specific. We cannot afford to integrate and reliably maintain such a zoo of solutions long-term with the available manpower and commitment. However: There is SYCL, a vendor-neutral abstraction for integrating heterogeneous accelerator platforms such as CPUs (SIMD, OpenMP, ...), GPUs, FPGAs, and TPUs using high-level C++ standard-driven abstractions.
SYCL substantially simplifies portability, development, and the learning curve for all. Can we onboard more support/developers/help with this?
We should evaluate whether we/the GR radio community are willing to invest in this stack as an optional dependency (i.e., disabled on embedded platforms) => ACTION: Alex, Josh, Jeff, Semen, Ralph + GR architecture group.
contacted Vincent Heuveline and Aksel Alpay (EMCL, Uni Heidelberg) for advice and invited them to collaborate w.r.t. SYCL integration into GR4 (both are core contributors/developers of SYCL/AdaptiveCpp)
for info: SYCL via AdaptiveCpp (formerly hipSYCL) intro material:
While digging into the examples, the three things we should focus on:
learn, play, and have fun
see how this could be smartly integrated and used as an optional dependency in Block<T>, merged sub-graphs (N.B. SYCL uses a JIT compiler), or elsewhere to support heterogeneous computing on the CPU (+ distributed machines), GPU, and eventually FPGA
see how this could be integrated and unit-tested as an optional GR4 CI/CD dependency (i.e. keeping C++'s 'don't pay for what you don't use' mantra). N.B. (AdaptiveCPP: ~65 MB + compiler, CUDA (full): ~7 GB)
check which dependencies are pulled in and linked with the user code (Alex: watch out for 'boost'-related deps)