Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring CMakeLists #22

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

blacktea
Copy link

@blacktea blacktea commented Aug 8, 2018

There are useful configure files a client project can include the runtime project using find_package funcion.
There is module to find iconv and include naturely.

@GreyCat
Copy link
Member

GreyCat commented Aug 10, 2018

The situation with CMakeLists in this repo is pretty ambiguous, to put it mildly. This is not recommended way to use the library. It's not supported in any way, i.e. we don't test it, it's not documented, we can't be sure that it works as intended / at all / etc.

@blacktea, you propose here a relatively large change that I can't even test manually, and, to be frank, many modifications here warrant a thoughtful explanation. I see at least a few minor syntactic problems (EOF newlines, broken indents, Kaitai namespace which does not seem to match anything in official defaults), but, again, I'm not the one who should or should not accept it.

Given that nobody of the people who worked on these CMakeLists before reviewed it so far, @blacktea, could you explain in more detail, what is your idea behind using CMakeLists, i.e.:

  • Why it should be used? Any particular use case that warrants (or at least makes acceptable) usage of .so here?
  • Any ideas how to test it?
  • Would you mind supporting it in future?

@blacktea
Copy link
Author

I'd start with an explanation of my using the project.

Firstly, I had to build kaitai_struct_cpp_stl_runtime to get a library(lib/so), then to generate cpp files from .ksy files by the compiler.
Finally, I had to write a CMakeLists file to use these things. The CMakeLists finds the built library and includes the generated files. To find the library all people should use find_library declaring paths and write mark_as_advanced. It is not convinient.

The my proposal intend to simplify that.

I have the following project struct:
Demo:
-CMakeList.txt
-demo.cpp
-GeneratedCppFromKsy
--pcap.cpp
--...
--CMakeLists.txt

Demo/GeneratedCppFromKsy/CMakeLists.txt looks like:

project(pcap_parser)
find_package(KaitaiRuntime REQUIRED)

FILE(GLOB HDRS "*.h")
FILE(GLOB SRC "*cpp")

add_library(${PROJECT_NAME} ${HDRS} ${SRC})

Demo/CMakeLists.txt contains:

project(demo LANGUAGES CXX)

add_subdirectory(GeneratedCppFromKsy)
add_executable(${PROJECT_NAME} demo.cpp)
target_link_libraries(${PROJECT_NAME} pcap_parser)

...

I suppose it's easy and natural way to use the project.

If it's supposed to include kaitaistruct.* kaitaistream.* and generated files to a target project. I think it's not convinient and nobody to do that.

Moreover, I think the project compiler would be able to create CMakeLists for generated files(similar with Demo/GeneratedCppFromKsy/CMakeLists.txt). Because, it seems as a pattern.

I attached demo project to check it out (kaitai_tiny_example.zip). However, before it's needed to build kaitai_struct_cpp_stl_runtime.

  1. mkdir build && cd build
  2. cmake ..
  3. make && sudo make install

"EOF newlines, broken indents, Kaitai namespace which does not seem to match anything in official defaults"

Please, tell me more details I'm ready to fix it and ready to support it.

@GreyCat
Copy link
Member

GreyCat commented Aug 12, 2018

Firstly, I had to build kaitai_struct_cpp_stl_runtime to get a library(lib/so)
If it's supposed to include kaitaistruct.* kaitaistream.* and generated files to a target project. I think it's not convinient and nobody to do that.

Actually, this is my first big question, what is your rationale to not just include kaitaistruct.* and kaitaistream.* in your project, as this is indeed the recommended way to do it.

The main problem with any dynamically linked library is that it impedes inlining of the simple methods like read_u1, read_s1, etc, which are used very often. Lack of inlining might lead to serious performance problems — we don't have good reproducible benchmarks, but in some tests I've done earlier manually, I've seen 30-50% performance drop.

Performance aside, I'm also pretty surprised about "covenience". C++, unfortunately, never got any good single package management system like most modern languages have, so, introduction of any extra libraries is always a big mess. Even if you can solve the problem for, say, Linux (by packaging binary shared library for major distributions), it would be still a problem for OS X (yet another homebrew package?.. unlikely a popular solution), and for Windows (vcpkg and/or nugets? both have its own cons).

Even if we'll package the runtime at some point as a shared library, I don't think that CMake compatibility layer would be the answer for everyone (or, even for everyone on *nix-like systems). People still use other build systems (QMake, scons, bazel, Gradle, Ninja, etc), people would want us to support some standards like pkgconfig, etc.

then to generate cpp files from .ksy files by the compiler.
Finally, I had to write a CMakeLists file to use these things.

Ok, why can't we just use CMakeLists in your project to run kaitai-struct-compiler on .ksy files, i.e. using the very same principle as with C compiler that compiles .cpp → .cpp.o (and these never appear in source dir, only in build dir)?

As far as I understand your current approach, you're not only building them manually, but you also compile them using GeneratedCppFromKsy/CMakeLists.txt into yet another shared library, which, again, impedes aggressive inlining on this level.

@blacktea
Copy link
Author

Actually, this is my first big question, what is your rationale to not just include kaitaistruct.* and kaitaistream.* in your project, as this is indeed the recommended way to do it.

So, you don't add source code of ZLIB library instead you find the already built library.
Secondly, it's easy to manage a single copy of a source code in case of a new releases or bug fixes appear. Otherwise, you have to update source code manually.

The main problem with any dynamically linked library is that it impedes inlining of the simple methods

Some links related to your concerns (In short dynamic library does not have any impact on run-time performance):

  1. https://stackoverflow.com/questions/2247465/does-using-large-libraries-inherently-make-slower-code
  2. https://stackoverflow.com/questions/25008801/will-there-be-a-performance-hit-on-including-unused-header-files-in-c-c
  3. https://stackoverflow.com/questions/36983905/what-is-the-runtime-performance-cost-of-including-a-library

There is an example: everybody uses OpenCV that does a lot of reading and calculating as a dynamic library.
Moreover, read_u1, read_s1 are not marked as an inline function (https://en.cppreference.com/w/cpp/language/inline). Anyway, they are not called from a client side.
And, I would try to avoid so much new calls.

Even if we'll package the runtime at some point as a shared library, I don't think that CMake compatibility layer would be the answer for everyone (or, even for everyone on *nix-like systems). People still use other build systems (QMake, scons, bazel, Gradle, Ninja, etc), people would want us to support some standards like pkgconfig, etc.

In any case, they will build the library and include it using their favourite build system.

@GreyCat
Copy link
Member

GreyCat commented Aug 13, 2018

So, you don't add source code of ZLIB library instead you find the already built library.

Sorry, I don't quite understand what you mean here. Is ZLIB just an example by itself, or you mean that KS runtime requiring zlib complicates things and dynamic library would help here in some way?

Secondly, it's easy to manage a single copy of a source code in case of a new releases or bug fixes appear. Otherwise, you have to update source code manually.

The problem with C++ is there is no really a clear way to "manage". In Linux and, to some extent, BSD, you have distribution packages which you can (in theory) rely on, but even here upgrading to a new verson of KS is not that trivial as just running apt-get install libkaitaistruct: most likely ABI would change => soname would change => you'll have to both re-run ksc and re-build your binary.

In most other systems (OS X, Windows), it's much worse: there's no such magic as apt-get install, there's no system-wide dependency information available, there's DLL hell, etc. In these systems, typically whole idea of shared libraries is largerly compromised and many applications opt to just bring their shared libraries with them, thus effectively nullyfing the benefits of "sharing".

https://stackoverflow.com/questions/2247465/does-using-large-libraries-inherently-make-slower-code

This is irrelevant, this question is about whether loading large libraries with tons of unused functions would negatively affect code performance.

https://stackoverflow.com/questions/25008801/will-there-be-a-performance-hit-on-including-unused-header-files-in-c-c

Irrelevant as well, this has nothing to do with shared libraries.

https://stackoverflow.com/questions/36983905/what-is-the-runtime-performance-cost-of-including-a-library

Slightly relevant, but majority of answers are far from reality. For example, no-one even mentions inlining, no-one even mentions LTO, etc.

There is an example: everybody uses OpenCV that does a lot of reading and calculating as a dynamic library.

Using OpenCV is typically a matter of calling a function that implements certain algorithm once. Using KS, little things like read_u1 could be called billions of times.

So, bottom line, before comitting to maintaining CMakeLists (and dynamic build at all) for this library, I would start with a solid benchmark, as suggested in kaitai-io/kaitai_struct#347.

Moreover, read_u1, read_s1 are not marked as an inline function

inline keyword has a long and painful history, and, to make a long story short, it does not affect even compiler's inlining decisions much in modern world. Given that functions like read_u1 are not even declared in .h, they obviously can't be inlined by a compiler, but rather that optimization might or might not be done at link time, in LTO stage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants