Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite reflection and merging #426

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

mattkretz
Copy link
Collaborator

@mattkretz mattkretz commented Sep 27, 2024

Improvements in this PR

  • Minimize the number of symbols resulting from reflection. If a symbol is emitted now, it's because a string needs to be used at runtime. This reduces binary size.
  • Split Port into
    • Port: the type used as Block data member
    • PortDescriptor: the type used when reflecting over port members in Blocks. This type is never meant to be used for objects. It's a plain compile-time facility.
  • Rewrite MergedGraph ports: Before MergedGraph reflected over the existing ports in the Left and Right blocks, removed the internally connected ports and constructed the remaining ports as data members (via std::tuple inheritance). Additionally it created data members of the Left and Right blocks, which themselves store Port data members. The latter were unused and potentially a source for error since multiple Port objects seemed to describe the same port. Now, PortDescriptor has a simple static method getPortObject(Block&) which allows to go from a given Block object and corresponding port reflection to the Port data member. (If that Port data member is a std::tuple or std::array then this will lead to the individual Port object, not the tuple/array. Thus tuple/array Ports appear as if they were directly declared as data members — except that their name is disambiguated.) This getPortObject method is the customization point for MergedGraph to abstract how the port object is found. MergedGraph replaces the standard PortDescriptor with a slightly modified type that knows to look into Left or Right for the port object. This recurses through an arbitrarily merged set of blocks into the one-and-only Port object. Thus no data member bloat is needed for MergedGraph anymore, and any potential for confusing Port objects is gone.
  • The reflection macro now goes directly into the class definition (improved locality of related names in the source file).

own PR? (last commit): Refactor gr::registerBlock

(Please excuse my opinionated boy-scouting, but this was easier to propose in code than in words 😜)

  1. IMHO a simple registerBlock overload for a list of complete block types was missing. If the given type is a specialized template, then the type_name is transformed into a template argument string and passed to addBlockType.
  2. Generalize the registerBlock function for a block template with type template arguments to an arbitrary number of parameters/arguments.
  3. Generalize the registerBlockTT function to an arbitrary number of type lists (and corresponding number of parameters for the template template parameter). The relatively expensive use of std::apply is replaced by an outer_product over all given typelists.

Open questions, TODOs, FIXMEs

  • The MergedGraph specialization for port reflection could be generalized into a customization point. But we probably want to think of a better name than AllPorts for triggering that customization.
  • PortDescription traits: They are currently called is_…. Better name describes_…?
  • Port names in general are simply the data member name in the Block class definition. In the case of std::tuple/std::array of Ports, the index is directly appended to the string. I.e. std::array<PortIn<T>, 3> in; becomes "in0", "in1", "in2". Should name and number be separated by a # or (space)? Also should connect<"in", 0UZ> allow connection to "in0"?
  • Where should the PortReflectable concept go. Any idea for a better name? Should this be merged with the BlockLike concept? The same_as<T, T::derived_t> bit has led me to find two of the bug reports I triggered over the weekend. IOW it seems quite valueable.
  • When and how should port names of MergedGraph be adjusted? Should port names be concatenated? e.g. "out0-scaled" and "out1-scaled" instead of two ports with the name "scaled"? The rule would be:
    • If an output port belongs to a Right block, prepend the name of the Left output that is internally connected.
    • If an input port belongs to a Left block, append the name of the Right input that is internally connected.

Questions raised while reading, unrelated to this PR

  • TODO: Add a comment why a unique ID is necessary for merged blocks but not for all other blocks. (I.e. unique_id already is a member of the Block base class, this is shadowing that member with a different value. No other block does this.)
  • TODO: Document why RTLD_LOCAL and not RTLD_GLOBAL is used here. (RTLD_LOCAL breaks RTTI/dynamic_cast across plugin boundaries. Note that RTTI can be very helpful in the debugger.)
  • FIXME: Casting a void* to function-pointer is UB in C++. Yes "… 'dlsym' is not C++ and therefore we can do whateever …". But we don't need to. Simply have a single 'extern "C"' symbol in the plugin which is an object storing two function pointers. Then we need a single cast from the 'dlsym' result to an aggregate type and can then extract the two function pointers from it. That's simpler and more likely to be conforming C++.
  • enum class PortType FIXME: can we still rename this to e.g. PortFlavor? "type" has a very specific meaning in C++ already. And since we're doing a lot of reflection on ports there's ambiguity all over the place.
  • template<bool UseIoThread = true> struct BlockingIO TODO: replace bool by an enum or tag type: BlockingIO<false> is very misleading
  • FIXME: make operators of UncertainValue hidden friends or members to reduce compile time (simplifies overload resolution)

Results

Binary size when compiled with -O2 -g0 -march=native -D_GLIBCXX_HARDEN=3 is down:

object file relative size size delta notes
example_ImChart.cpp.o 100.0% 0 kiB
qa_algorithm_fourier.cpp.o 100.0% 0 kiB
qa_FilterTool.cpp.o 100.0% 0 kiB
qa_ImChart.cpp.o 100.0% 0 kiB
bm_example.cpp.o 100.0% 0 kiB
qa_BasicKnownBlocks.cpp.o 78.9% -2211 kiB
qa_Converter.cpp.o 80.3% -2719 kiB
qa_DataSink.cpp.o 85.2% -1246 kiB
qa_Selector.cpp.o 84.6% -980 kiB
qa_sources.cpp.o 80.9% -1704 kiB
qa_StreamToDataSet.cpp.o 76.4% -4879 kiB
qa_FileIo.cpp.o 80.7% -7974 kiB
qa_filter.cpp.o 86.0% -528 kiB
qa_fourier.cpp.o 81.0% -1666 kiB
qa_HttpBlock.cpp.o 95.0% -223 kiB
qa_Math.cpp.o 83.9% -1850 kiB
qa_UI_Integration.cpp.o 81.0% -1783 kiB
bm_Buffer.cpp.o 100.0% 0 kiB
bm_fft.cpp.o 87.2% -504 kiB
bm_HistoryBuffer.cpp.o 100.0% 0 kiB
bm-nosonar_node_api.cpp.o 98.6% -282 kiB
bm-nosonar_node_api.cpp.o 75.2% -5502 kiB
bm_Profiler.cpp.o 100.0% 0 kiB
bm_Scheduler.cpp.o 88.9% -4138 kiB
plugin.cpp.o 100.0% 0 kiB
main.cpp.o 102.1% 62 kiB I added code to main.cpp
qa_AtomicBitset.cpp.o 100.0% 0 kiB
qa_Block.cpp.o 89.9% -958 kiB
qa_buffer.cpp.o 100.0% 0 kiB
qa_DynamicBlock.cpp.o 83.9% -1049 kiB
qa_DynamicPort.cpp.o 94.6% -164 kiB
qa_GraphMessages.cpp.o 80.5% -7194 kiB
qa_grc.cpp.o 99.3% -26 kiB
qa_HierBlock.cpp.o 96.2% -100 kiB
qa_LifeCycle.cpp.o 100.0% 0 kiB
qa_Messages.cpp.o 85.0% -1309 kiB
qa_PerformanceMonitor.cpp.o 83.9% -970 kiB
qa_plugins_test.cpp.o 99.8% -3 kiB
qa_Port.cpp.o 99.7% -2 kiB
qa_reader_writer_lock.cpp.o 100.0% 0 kiB
qa_Scheduler.cpp.o 82.3% -5953 kiB
qa_Settings.cpp.o 86.8% -1246 kiB
qa_Tags.cpp.o 85.0% -879 kiB
qa_thread_affinity.cpp.o 100.0% 0 kiB
qa_thread_pool.cpp.o 100.0% 0 kiB
qa_TriggerMatcher.cpp.o 100.0% 0 kiB
bad_plugin.cpp.o 100.0% 0 kiB
good_base_plugin.cpp.o 89.1% -232 kiB
good_conversion_plugin.cpp.o 94.2% -97 kiB
good_math_plugin.cpp.o 91.3% -184 kiB
qa_formatter.cpp.o 100.0% 0 kiB
qa_traits.cpp.o 100.0% 0 kiB
qa_UncertainValue.cpp.o 100.0% 0 kiB

Compile time and space of qa_Converter

GCC 15 -O2 -g0

without block registration

time space
before 2:47.34 7089M
after 2:27.02 6370M

with block registration

time space
before 13:03.05 30413M
after 11:31.50 26862M

Clang 18 Release

without block registration

time space
before 1:50.68 2502M
after 1:39.51 2254M

with block registration

time space
before 8:10.26 10811M
after 7:52.08 9900M

GCC 14 Release

without block registration

time space
before 2:20.37 3925M
after 5:58.27 3610M

with block registration

time space
before 12:03.28 15248M
after 34:02.59 13758M

GCC 13 Release

without block registration

time space
before 2:09.52 3605M
after 2:00.32 3112M

with block registration

time space
before 11:40.46 14862M
after 9:49.10 12119M

@mattkretz
Copy link
Collaborator Author

@RalphSteinhagen Is there any chance that you'd want to merge this? The compile-time numbers on GitHub Actions seem like a minor improvement, but nothing substantial. The other important measure would be memory usage by the compiler. I could set up an experiment for that here and add those results (but I'd be happy if someone beats me to it).

If you are interested, there's still another pass of code cleanup and then clang-format-back-to-long-lines to do. But I don't want to waste my time if this PR has no chance of getting merged.

@mattkretz mattkretz force-pushed the rewrite-port-reflection-and-merging branch 2 times, most recently from 7ec0b22 to 4bb332d Compare September 30, 2024 19:57
@RalphSteinhagen
Copy link
Member

@mattkretz I ran the difference with Clang's -ftime-trace option to analyze the compile times for the qa_Converter unit tests (with registry and plugins enabled to emphasise the compile-time impact).

TL;DR:

  • Before the PR: total compilation time was 972 seconds (16.2 minutes).
  • After the PR: total compilation time is 948 seconds (15.8 minutes), a reduction of about 3%.

Before (full_analysis):

Analyzing build trace from 'qa_Converter_before.bin'...
**** Time summary:
Compilation (2 times):
  Parsing (frontend):          194.5 s
  Codegen & opts (backend):    777.5 s

**** Files that took longest to parse (compiler frontend):
141459 ms: ./blocks/basic/test/CMakeFiles/qa_Converter.dir//qa_Converter.cpp.o
 53041 ms: ./blocks/basic/test/CMakeFiles/qa_Converter.dir//qa_Converter.cpp.o

**** Files that took longest to codegen (compiler backend):
777491 ms: ./blocks/basic/test/CMakeFiles/qa_Converter.dir//qa_Converter.cpp.o

**** Templates that took longest to instantiate:
  2097 ms: std::apply<(lambda at /home/steinhagen/git/gnuradio4/core/include/gnuradio-4.0/Block.hpp:2109:9), std::tuple<unsigned char, unsigned short, unsigned int, unsigned long, signed char, short, int, long, float, double>> (2 times, avg 1048 ms)
  2096 ms: std::__apply_tuple_impl<(lambda at /home/steinhagen/git/gnuradio4/core/include/gnuradio-4.0/Block.hpp:2109:9), std::tuple<unsigned char, unsigned short, unsigned int, unsigned long, signed char, short, int, long, float, double>, 0UL, 1UL, 2UL, 3UL, 4UL, 5UL, 6UL, 7UL, 8UL, 9UL> (2 times, avg 1048 ms)
  [..]

After (full analysis):

Analyzing build trace from 'qa_Converter_after.bin'...
**** Time summary:
Compilation (2 times):
  Parsing (frontend):          193.2 s
  Codegen & opts (backend):    755.1 s

**** Files that took longest to parse (compiler frontend):
145068 ms: ./blocks/basic/test/CMakeFiles/qa_Converter.dir//qa_Converter.cpp.o
 48090 ms: ./blocks/basic/test/CMakeFiles/qa_Converter.dir//qa_Converter.cpp.o

**** Files that took longest to codegen (compiler backend):
755062 ms: ./blocks/basic/test/CMakeFiles/qa_Converter.dir//qa_Converter.cpp.o

**** Templates that took longest to instantiate:
  7969 ms: vir::refl::for_each_data_member_index((lambda at /home/steinhagen/git/gnuradio4/core/include/gnuradio-4.0/Settings.hpp:330:59) &&)::(anonymous class)::operator()<0UL, 1UL, 2UL, 3UL, 4UL, 5UL, 6UL, 7UL, 8UL> (144 times, avg 55 ms)
  6443 ms: vir::refl::for_each_data_member_index((lambda at /home/steinhagen/git/gnuradio4/core/include/gnuradio-4.0/Settings.hpp:330:59) &&)::(anonymous class)::operator()<0UL, 1UL, 2UL, 3UL, 4UL, 5UL, 6UL, 7UL, 8UL, 9UL> (108 times, avg 59 ms)
 [..]

Tentative Summary:

  • Approximately 80% of the compilation time is spent in the backend code generation and optimisation phases.
  • The data indicates that template instantiations are not the dominant factor in compile time.
  • The compile-time overhead seems to have shifted towards the new reflection library and SIMD code.

While there is a slight improvement in compile time and a more significant (~20%) reduction in binary size, the compile-time gains are modest. Introducing the new reflection library and SIMD code doesn't substantially reduce compile times but does affect where the time is spent during compilation.

I like the more expressive/intuitive new compile-time macro but this doesn't solve the initial problem of excessive compile-times. Maybe others could have a look/express their thoughts and ideas? @ivan-cukic any ideas?

Use GR_MAKE_REFLECTABLE macro instead of ENABLE_REFLECTION...

Signed-off-by: Matthias Kretz <[email protected]>
@mattkretz mattkretz force-pushed the rewrite-port-reflection-and-merging branch 3 times, most recently from 94ebbd7 to 1c4aefc Compare October 3, 2024 19:36
- remove ports defined as template arguments (MergedGraph)
  => drop Block inheritance from std::tuple

- split Port into Port and PortDescriptor, where the latter is used in
  the type lists returned from block reflection

- remove fixed_string name template parameter from Port

- remove ability to programmatically set a Port name; only the
  reflected member name is ever used — and then as string_view instead
  of string (because the name can never be set by users / points to
  compile-time strings).

- flatten tuples of Ports and arrays of Ports when reflecting on block
  members, as if they were direct members of the block.

- reflect vectors of Ports as dynamically sized Ports instead of
  `vector<PortDescriptor>`

- `reflFirstTypeName` is not needed with the way `vir::refl::type_name`
  presents the names. If it would be, then it's a bug to fix in
  vir-reflect-light. Therefore, replace `reflFirstTypeName` calls with
  `vir::refl::type_name`.

Signed-off-by: Matthias Kretz <[email protected]>
1. A simple `registerBlock` overload for a list of complete block types
was missing. If the given type is a specialized template, then the
`type_name` is transformed into a template argument string and passed to
`addBlockType`.

2. Generalize the `registerBlock` function for a block template with
type template arguments to an arbitrary number of parameters/arguments.

3. Generalize the `registerBlockTT` function to an arbitrary number of
type lists (and corresponding number of parameters for the template
template parameter). The relatively expensive use of std::apply is
replaced by an `outer_product` over all given typelists.

4. Add a comment on how to improve the remaining `registerBlock`
overloads.

5. Add meta::outer_product (lifted from mattkretz/virtest).

Signed-off-by: Matthias Kretz <[email protected]>
Signed-off-by: Matthias Kretz <[email protected]>
@mattkretz mattkretz force-pushed the rewrite-port-reflection-and-merging branch from 1c4aefc to d8eda0c Compare October 3, 2024 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants