compile a static crengine library #1679

Merged: 5 commits, Nov 1, 2023

Contributor

@benoit-pierre benoit-pierre commented Oct 26, 2023

Take 2 of #1676: includes #1677 and #1678, and uses a custom crsetup.h header to avoid having to move the flags around.



@benoit-pierre benoit-pierre force-pushed the pr/static_crengine_lib branch from d6d9f8c to 2be832e Compare October 26, 2023 01:48
@benoit-pierre
Contributor Author

Oops, I forgot about the ugly duckling… (macOS)

@benoit-pierre benoit-pierre force-pushed the pr/static_crengine_lib branch from 2be832e to 9660111 Compare October 26, 2023 02:15
@poire-z
Contributor

poire-z commented Oct 26, 2023

Any noticeable change to the (re)build time when touching a single crengine source file?

Member

@Frenzie Frenzie left a comment


lgtm (besides @poire-z's comments which I agree with)

You already checked that it runs properly on Android?

@benoit-pierre
Contributor Author

Any noticeable change to the (re)build time when touching a single crengine source file?

No.

@benoit-pierre benoit-pierre force-pushed the pr/static_crengine_lib branch from 9660111 to 9e5b9c2 Compare October 26, 2023 21:22
@benoit-pierre
Contributor Author

You already checked that it runs properly on Android?

Yes.

Contributor

@poire-z poire-z left a comment


Fine with the idea and the bump.
(But no idea about reviewing the build stuff changes :))

Member

@NiLuJe NiLuJe left a comment


No complaints besides that one stat64 query ;).

@benoit-pierre benoit-pierre force-pushed the pr/static_crengine_lib branch from 9e5b9c2 to 0d0947a Compare October 27, 2023 17:35
Update `CMakeLists.txt` tests for `KODEBUG` to match the makefiles:
check for a non-empty `KODEBUG`, not for definition (`kodev` exports
an empty `KODEBUG` on release builds).

- using `$@` for some of those multiple-targets rules is a mistake
  (it's not going to be the library name when triggered by a dependency
  on the include directory)
- add missing targets (include directory, library names)

Generate a custom `crsetup.h`, so users of the library will be sure to use the correct (same) flags.
There's only one user: `libkoreader-cre.so`.

Set it to `inlineshidden`, since we compile a static library and don't
need to export anything (all accesses go through `libkoreader-cre.so`).
@benoit-pierre benoit-pierre force-pushed the pr/static_crengine_lib branch from 0d0947a to f97de61 Compare November 1, 2023 01:23
@Frenzie Frenzie merged commit 7ef7abb into koreader:master Nov 1, 2023
2 checks passed
@benoit-pierre benoit-pierre deleted the pr/static_crengine_lib branch November 1, 2023 13:25
@poire-z
Contributor

poire-z commented Nov 4, 2023

(Nearly random issue to post this;)
Since recently, for each thumbnail in PageBrowser on a CRE document (each one generated by a background process), I get this in my /var/log/messages:

Nov  4 09:52:53 kernel: reader.lua[8606]: segfault at 88 ip 00007f6ac3f127d9 sp 00007fffe7fee210 error 4 in libkoreader-cre.so[7f6ac3e76000+141000] likely on CPU 0 (core 0, socket 0)
Nov  4 09:52:53 kernel: Code: 4c 89 ef e8 b9 6a f6 ff 44 89 e0 44 8d 60 ff 85 c0 74 33 49 63 c4 48 c1 e0 03 48 03 43 08 48 8b 00 48 8b 40 40 48 85 c0 74 06 <83> 78 08 01 75 9b 48 8d 7b 18 44 89 e6 e8 fd 9e ff ff 49 89 c5 48

A quick attempt at launching gdb via a script I use for the main process failed to give me more info on these background processes, and I don't have much time at the moment to dig into it.
Does anybody else observe this? Or is it specific to my setup/settings?

ps: Everything works: it's probably happening at teardown, and these subprocesses are explicitly not terminated cleanly, so as not to waste time.

@Frenzie
Member

Frenzie commented Nov 4, 2023

On a Kobo or the emulator? (Or both?) I don't recall seeing this otoh but I'll take a look later.

@poire-z
Contributor

poire-z commented Nov 4, 2023

Emulator, I haven't checked on my Kobo (where I don't even know if/where's the /var/log/messages equivalent :)

@Frenzie
Member

Frenzie commented Nov 4, 2023

Indeed, same here.

4/11/2023 10:55	kernel	reader.lua[277127]: segfault at 7f9afddfe990 ip 00007f9b0b89950b sp 00007ffcbf5f2ea0 error 4 in libc.so.6[7f9b0b826000+17f000] likely on CPU 0 (core 0, socket 0)
4/11/2023 10:55	systemd-coredump	Process 277127 (reader.lua) of user 1000 dumped core.

Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/luajit from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libtesseract.so.3 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libk2pdfopt.so.2 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/liblept.so.5 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libkoreader-cre.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libcrypto.so.1.1 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/rocks/lib/lua/5.1/rapidjson.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libssl.so.1.1 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/common/ssl.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libmupdf.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/common/socket/score.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libsqlite3.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libharfbuzz.so.0 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libpng16.so.16 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/liblunasvg.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libunibreak.so.5 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libfribidi.so.0 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libkoreader-xtext.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libwebp.so.7 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libfreetype.so.6 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libjpeg.so.8 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libkoreader-nnsvg.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/common/mime/mcore.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/rocks/lib/lua/5.1/lpeg.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libwrap-mupdf.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libsharpyuv.so.0 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/liblodepng.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libwebpdemux.so.2 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libz.so.1 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libzstd.so.1 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libblitbuffer.so from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libutf8proc.so.3 from deb systemd-253.5-1ubuntu6.amd64
Module /home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/libs/libkoreader-lfs.so from deb systemd-253.5-1ubuntu6.amd64
Module libudev.so.1 from deb systemd-253.5-1ubuntu6.amd64
Module libsystemd.so.0 from deb systemd-253.5-1ubuntu6.amd64
Stack trace of thread 277127:
#0  0x00007f9b0b89950b __pthread_clockjoin_ex (libc.so.6 + 0x9950b)
#1  0x00007f9b07f1eef4 n/a (iris_dri.so + 0x11eef4)
#2  0x00007f9b07ecef56 n/a (iris_dri.so + 0xcef56)
#3  0x00007f9b07ecefe4 n/a (iris_dri.so + 0xcefe4)
#4  0x00007f9b0b845126 __run_exit_handlers (libc.so.6 + 0x45126)
#5  0x00007f9b0b845260 __GI_exit (libc.so.6 + 0x45260)
#6  0x000055afac20104a n/a (/home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/luajit + 0x5504a)
#7  0x000055afac1d1116 n/a (/home/frans/src/koreader/base/build/x86_64-linux-gnu-debug/luajit + 0x25116)
ELF object binary architecture: AMD x86-64

@poire-z
Contributor

poire-z commented Nov 4, 2023

Thanks.
(We both get a crash, but I'm not sure it has the same cause: yours is in libc.so, mine in libkoreader-cre.so.)

@benoit-pierre
Contributor Author

benoit-pierre commented Nov 4, 2023

Since recently […]

How recent are we talking?

I can't reproduce.

I don't think the static crengine library is to blame. I'd be more inclined to think it's an issue with one of the leak-on-exit fixes combined with the (horrible?) use of fork to extract metadata in the background or do partial rendering.

From atexit man page:

When a child process is created via fork(2), it inherits copies of its parent's registrations. Upon a successful call to one of the exec(3) functions, all registrations are removed.
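
To make the quoted man page behaviour concrete, here is a minimal C++ sketch (not KOReader code; `report_on_exit` and `inherited_handler_runs_in_child` are hypothetical names made up for the illustration). A handler is registered with `atexit` in the parent, and a forked child that merely calls `exit(0)` still runs it, which the parent observes through a pipe:

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Pipe end the handler reports to; -1 means "stay silent" (parent side).
static int g_report_fd = -1;

// Hypothetical stand-in for a finalizer registered by the main process.
static void report_on_exit() {
    if (g_report_fd >= 0) {
        const char mark = 'x';
        ssize_t n = write(g_report_fd, &mark, 1);
        (void)n;
    }
}

// Returns true if a handler registered before fork() also ran in the child.
bool inherited_handler_runs_in_child() {
    static bool registered = false;
    if (!registered) {
        if (std::atexit(report_on_exit) != 0) return false;
        registered = true;
    }

    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        // Child: never called atexit() itself, but inherited the registration.
        close(fds[0]);
        g_report_fd = fds[1];
        std::exit(0);  // runs the inherited handler
    }

    // Parent: a byte on the pipe proves the handler ran in the child.
    close(fds[1]);
    char buf = 0;
    ssize_t n = read(fds[0], &buf, 1);
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return n == 1 && buf == 'x';
}
```

Calling `assert(inherited_handler_runs_in_child());` from a small test program should pass on a POSIX system. The same mechanism would apply to any exit-time finalizer the forked subprocess inherits.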

Particularly for the crash in the __run_exit_handlers on desktop, does it happen without b6eb2fb?

@poire-z: Can you run a git bisect search to find out which commit is to blame?

@poire-z
Contributor

poire-z commented Nov 4, 2023

How recent are we talking?

Looking at my /var/log/messages, I didn't get them before Oct 30 21:16:53 , and at the time it logged:
Oct 30 21:16:53 bubonic kernel: reader.lua[6473]: segfault at 58 ip 00007f8366176a61 sp 00007fff6549f6f0 error 4 in libcrengine.so

I think it was the day before I pulled stuff and got your big base changes, which required me to kodev clean (and lose all my settings :/) and recompile everything.

I've been a bit busy and not focused on KOReader these last weeks, so I haven't really followed what I picked and when I pulled - but I'm not lagging more than 2 or 3 days, so your commit from 3 weeks ago is probably not the culprit. Just tried reverting line 144 S.renderer = ffi.gc(SDL.SDL_CreateRenderer(S.screen, -1, 0), SDL.SDL_DestroyRenderer) and it did not change anything.

with the (horrible?)

:( it was the cleverest idea I've ever had related to KOReader :/

use of fork to extract metadata in the background or do partial rendering.

I get no such segfault with 1) refreshing cached book information on a book in history, 2) partial rerendering, or 3) Wikipedia lookup and save as EPUB.
I only get one segfault for each thumbnail generated by the page browser.

Can you run a git bisect search to find out which commit is to blame?

No time right now to get involved in that and rebuild the whole thing :/
But if you have other ideas about small commits that I can revert by hand, I can quickly check.

@benoit-pierre
Contributor Author

By the way, to debug with gdb, use:

> set detach-on-fork off
> set schedule-multiple

And use inferior / info inferiors to switch processes.

@benoit-pierre
Contributor Author

Thread 5.1 "luajit" received signal SIGSEGV, Segmentation fault.
0x00007ffff7096f04 in LVRefCounter::getRefCount (this=0x58) at ../../subprojects/crengine/crengine/src/../include/lvref.h:71
71          int getRefCount() const { return refCount; }
(gdb) up
#1  0x00007ffff70a8a2d in LVProtectedFastRef<LVFont>::getRefCount (this=0x7ffff213c540)
    at ../../subprojects/crengine/crengine/src/../include/lvref.h:336
336         int getRefCount() const { return _ptr->getRefCount(); }
(gdb) up
#2  0x00007ffff70984aa in LVFontCache::clear (this=0x7ffff213c430) at ../../subprojects/crengine/crengine/src/lvfntman.cpp:732
732                 assert(fnt.isNull() || fnt.getRefCount() == 1);
(gdb) l
727                 delete _registered_list.remove(i);
728             }
729             for (i = _instance_list.length(); i--; ) {
730     #ifndef NDEBUG
731                 LVFontRef &fnt = _registered_list[i]->_fnt;
732                 assert(fnt.isNull() || fnt.getRefCount() == 1);
733     #endif
734                 delete _instance_list.remove(i);
735             }
736         }
(gdb) print fnt.isNull()
11/04/23-21:48:25 WARN  PageBrowserWidget thumbnail deserialize() failed: malformed serialized data (unexpected end of buffer)
$1 = false
(gdb) print fnt
$2 = (LVFontRef &) @0x7ffff213c540: {_ptr = 0x50}

:/

@benoit-pierre
Contributor Author

OK, tentative patch:

 base/cre.cpp                                     | 7 +++++++
 frontend/apps/reader/modules/readerthumbnail.lua | 3 +++
 frontend/document/credocument.lua                | 4 ++++
 base/thirdparty/kpvcrlib/crengine/crengine/src/lvfntman.cpp   | 2 +-
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git i/base/cre.cpp w/base/cre.cpp
--- i/base/cre.cpp
+++ w/base/cre.cpp
@@ -764,6 +764,12 @@ static int isCacheFileStale(lua_State *L) {
     return 1;
 }
 
+static int setCacheFileStale(lua_State *L) {
+    CreDocument *doc = (CreDocument*) luaL_checkudata(L, 1, "credocument");
+    doc->dom_doc->setCacheFileStale(lua_toboolean(L, 2));
+    return 0;
+}
+
 static int invalidateCacheFile(lua_State *L) {
     CreDocument *doc = (CreDocument*) luaL_checkudata(L, 1, "credocument");
     doc->dom_doc->invalidateCacheFile();
@@ -4023,6 +4029,7 @@ static const struct luaL_Reg credocument_meth[] = {
     {"isBuiltDomStale", isBuiltDomStale},
     {"hasCacheFile", hasCacheFile},
     {"isCacheFileStale", isCacheFileStale},
+    {"setCacheFileStale", setCacheFileStale},
     {"invalidateCacheFile", invalidateCacheFile},
     {"getCacheFilePath", getCacheFilePath},
     {"updateTocAndPageMap", updateTocAndPageMap},
diff --git i/frontend/apps/reader/modules/readerthumbnail.lua w/frontend/apps/reader/modules/readerthumbnail.lua
--- i/frontend/apps/reader/modules/readerthumbnail.lua
+++ w/frontend/apps/reader/modules/readerthumbnail.lua
@@ -319,6 +319,8 @@ end
 
 function ReaderThumbnail:startTileGeneration(request)
     local pid, parent_read_fd = ffiutil.runInSubProcess(function(pid, child_write_fd)
+        self.ui.document:setCallback()
+        self.ui.document:setCacheFileStale(false)
         -- Get page image as if drawn on the screen
         local bb = self:_getPageImage(request.page)
         -- Scale it to fit in the requested size
@@ -337,6 +339,7 @@ function ReaderThumbnail:startTileGeneration(request)
         -- bb:free() -- no need to spend time freeing, we're dying soon anyway!
 
         ffiutil.writeToFD(child_write_fd, self.codec.serialize(tile:totable()), true)
+        self.ui.document:close()
     end, true) -- with_pipe = true
     if pid then
         -- Store these in the request object itself
diff --git i/frontend/document/credocument.lua w/frontend/document/credocument.lua
--- i/frontend/document/credocument.lua
+++ w/frontend/document/credocument.lua
@@ -1447,6 +1447,10 @@ function CreDocument:isCacheFileStale()
     return self._document:isCacheFileStale()
 end
 
+function CreDocument:setCacheFileStale(stale)
+    return self._document:setCacheFileStale(stale)
+end
+
 function CreDocument:invalidateCacheFile()
     self._document:invalidateCacheFile()
 end
diff --git i/base/thirdparty/kpvcrlib/crengine/crengine/src/lvfntman.cpp w/base/thirdparty/kpvcrlib/crengine/crengine/src/lvfntman.cpp
--- i/base/thirdparty/kpvcrlib/crengine/crengine/src/lvfntman.cpp
+++ w/base/thirdparty/kpvcrlib/crengine/crengine/src/lvfntman.cpp
@@ -728,7 +728,7 @@ public:
         }
         for (i = _instance_list.length(); i--; ) {
 #ifndef NDEBUG
-            LVFontRef &fnt = _registered_list[i]->_fnt;
+            LVFontRef &fnt = _instance_list[i]->_fnt;
             assert(fnt.isNull() || fnt.getRefCount() == 1);
 #endif
             delete _instance_list.remove(i);

@poire-z
Contributor

poire-z commented Nov 5, 2023

Your (logical, fixing a coding error that was affecting only debug builds) change to lvfntman.cpp solves all my segfault cases :) 👍 (I'm still on koreader/crengine@a32b9ed)

(Didn't try the other changes, I guess you're just being super clean, but my idea with fork & die fast is that we shouldn't have to waste time on any cleanup.)

@poire-z
Contributor

poire-z commented Nov 9, 2023

^ Looks like I didn't really test much :/
I found the segfaults in yesterday's /var/log/messages, and I still get them.
I do indeed need your full patch to not get them.
I guess what you added doesn't cost much (dunno how much work :close() does, even when no cache file is saved), but why can't it just die bravely? Because of the finalizer stuff you added? Is there no way to suicide -9 and not worry?

@benoit-pierre
Contributor Author

Yes, after fixing the stupid mistake in lvfntman.cpp, the assert just below gets triggered on exit.

@poire-z
Contributor

poire-z commented Nov 14, 2023

No problem with adding #1696. But can you say why it helps with the segfaults (which were happening only in some cases with Harfbuzz or images around)?

Also, is it because ffi/util.runInSubProcess() ends the child with os.exit(0) that the fancy finalizers are called?
Is there no other way (like os._exit() vs os.exit() in Python) to die abruptly, without final farewells?
(I'm not saying we should keep being dirty, although I don't see why we should bother spending time for nothing, but if we know there's nothing worth finalizing, or if dying cleanly is complicated to get right, it's good to have a solution...)

@benoit-pierre
Contributor Author

I don't think there's a way to prevent the finalizers from being called. The assert is triggered because there's actually still a document registered. #1696 is so self.ui.document:close() can be called without fear of changing the cache.

@poire-z
Contributor

poire-z commented Nov 14, 2023

The assert is triggered because there's actually still a document registered.

It still bugs me to have to think about what happens after we are done.
Why shouldn't we now also go about adding setCacheFileStale in the subprocesses for dict lookup or bookinfo extraction?

#1696 is so self.ui.document:close() can be called without fear of changing the cache.

Why do we call it then? :) Is that C code or Lua code? It may be hard to keep C destructors from being called, but it feels like we should be able to commit suicide more easily in Lua :)
Is it really the finalizers (I guess that means all the destructor code) that deal with that? I thought it was explicit (and if it is not, it feels like it should be, unless setCacheFileStale is the way to avoid that... Maybe another, more generic flag asking everything to not clean up, which would itself call setCacheFileStale(), would feel cleaner, more explicit and less "have to think".)

@poire-z
Contributor

poire-z commented Nov 14, 2023

Btw, this also prevents the segfaults:

--- a/ffi/util.lua
+++ b/ffi/util.lua
@@ -376,6 +376,9 @@ function util.runInSubProcess(func, with_pipe, double_fork)
         if not ok then
             print("error in subprocess:", err)
         end
+        print("killing -9 me")
+        C.kill(C.getpid(), 9)
+        print("killed -9 me")
         os.exit(0)
     end
     -- parent/main process

and it is in the spirit of this subprocess stuff, and what the parent process does when asked to terminate a subprocess:

koreader-base/ffi/util.lua

Lines 418 to 431 in 01efb98

function util.terminateSubProcess(pid)
    local done = util.isSubProcessDone(pid)
    if not done then
        -- We kill with signal 9/SIGKILL, which may be violent, but ensures
        -- that it is terminated (a process may catch or ignore SIGTERM)
        -- If we used setpgid(0,0) above, we can kill the process group
        -- instead, by just using -pid
        -- C.kill(pid, 9)
        C.kill(-pid, 9)
        -- Process will still have to be collected with calls to
        -- util.isSubProcessDone(), which may still return false for
        -- some small amount of time after our kill()
    end
end

So, if the parent does this awful thing to its child, I guess the child can be allowed to do it to itself :)
(Which doesn't prevent the child function from doing any explicit cleanup of its own.)
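
As a rough illustration of the self-kill idea, here is a minimal C++ sketch (not the Lua/FFI code from the patch above; `ChildResult` and `run_child_that_kills_itself` are hypothetical names). It assumes the child sends its result with a direct `write(2)` on the pipe, so nothing sits in a userspace buffer when it sends itself `SIGKILL`:

```cpp
#include <cassert>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct ChildResult {
    bool killed_by_sigkill;  // child ended through SIGKILL rather than exit()
    std::string data;        // bytes the parent read from the pipe
};

ChildResult run_child_that_kills_itself() {
    ChildResult result{false, ""};
    int fds[2];
    if (pipe(fds) != 0) return result;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return result;
    }
    if (pid == 0) {
        close(fds[0]);
        // Unbuffered write: the bytes are in the kernel pipe before the kill.
        const char msg[] = "tile";
        ssize_t n = write(fds[1], msg, sizeof(msg) - 1);
        (void)n;
        // No atexit handlers, destructors or stdio flushing happen after this.
        kill(getpid(), SIGKILL);
        _exit(1);  // not expected to be reached
    }

    close(fds[1]);
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
        result.data.append(buf, static_cast<size_t>(n));
    }
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) == pid) {
        result.killed_by_sigkill =
            WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
    }
    return result;
}
```

In this sketch the parent still receives `"tile"` and sees the child reported as killed by `SIGKILL`. Data written through a buffered stream would be a different matter, since whatever is still in the buffer at kill time is lost.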

Dunno if it should also be done in the intermediate process (which, I guess, should also call finalizers?) when double forking ?

koreader-base/ffi/util.lua

Lines 331 to 345 in 01efb98

    if pid == 0 then -- child process
        if double_fork then
            pid = C.fork()
            if pid ~= 0 then
                -- Parent side of the outer fork, we don't need it anymore, so just exit.
                -- NOTE: Technically ought to be _exit, not exit.
                os.exit((pid < 0) and 1 or 0)
            end
            -- pid == 0 -> inner child :)
        end
        -- We need to wrap it with pcall: otherwise, if we were in a
        -- subroutine, the error would just abort the coroutine, bypassing
        -- our os.exit(0), and this subprocess would be a working 2nd instance
        -- of KOReader (with libraries or drivers probably getting messed up).
        local ok, err = xpcall(function()

Btw, @NiLuJe , what did you have in mind (Python?) when writing there: NOTE: Technically ought to be _exit, not exit. ? There is no os._exit() in my Lua.

@Frenzie
Member

Frenzie commented Nov 14, 2023

There is no os._exit() in my Lua.

If there were he wouldn't have said it ought to exist I'd say, but simply used it instead. ^_^

@poire-z
Contributor

poire-z commented Nov 14, 2023

I guess so :) but I was asking what he had in mind, when saying it, but didn't say :)
i.e. if there is an _exit() in C, we could just wrap it via ffi. If not, we could code what Python does, if it does some additional nice stuff.
My worry with self-kill-9 is that if the Lua func run in the subprocess outputs stuff to the pipe (most do), killing -9 may prevent all that stuff from being flushed to the pipe. So maybe we should take care of flushing/closing at least that pipe ourselves before suicide.

@NiLuJe
Member

NiLuJe commented Nov 14, 2023

Relevant to our interests is that _exit doesn't run the atexit handlers. Also relevant to our interests is that it doesn't flush stdio streams either, so that might be problematic.

I'm also not sure if the glibc doesn't do something funky with that symbol, so it may not be easy to get at via FFI ;).
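
The stdio caveat can be seen with a small C++ sketch (illustrative only; `bytes_received` is a hypothetical helper, not KOReader code). The child writes through a buffered `FILE*` wrapped around a pipe and then calls `_exit(0)`. Without an explicit `fflush`, the data never leaves the child's stdio buffer:

```cpp
#include <cassert>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Number of bytes the parent receives from a child that writes through
// stdio and then leaves with _exit(). Returns -1 on setup failure.
long bytes_received(bool flush_before_exit) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        FILE *out = fdopen(fds[1], "w");  // buffered stream over the pipe
        if (out != nullptr) {
            fputs("tile-data", out);      // 9 bytes, kept in the stdio buffer
            if (flush_before_exit) {
                fflush(out);              // push them to the pipe ourselves
            }
        }
        _exit(0);  // no atexit handlers, no stdio flushing
    }

    close(fds[1]);
    long total = 0;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
        total += n;
    }
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return total;
}
```

Here `bytes_received(false)` yields 0 and `bytes_received(true)` yields 9. So if an `_exit`-style path were used, any buffered output would presumably need to be flushed by hand first, or written with unbuffered `write(2)` calls.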

@poire-z
Contributor

poire-z commented Nov 15, 2023

I think all the destructors we may be talking about are not implicitly called on os.exit().
They are explicitly called by us:

koreader-base/cre.cpp

Lines 4123 to 4133 in 01efb98

// Library finalizer (c.f., dlopen(3)). This serves no real purpose except making Valgrind's output slightly more useful.
__attribute__((destructor)) static void cre_teardown(void) {
    if (cre_callback_forwarder) {
        delete cre_callback_forwarder;
        cre_callback_forwarder = NULL;
    }
    HyphMan::uninit();
    ShutdownFontManager();
    CRLog::setLogger( NULL );
    ldomDocCache::close();
}

So, if we don't want to go yet with a C.kill(C.getpid(), 9) (although, who knows what may happen in other libraries/engines like MuPDF/Djvu), the gentler solution would be to just prevent the above from happening when in a subprocess (i.e. by setting a static cre.cpp variable, and exiting early from cre_teardown() if it is set)?
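
A minimal C++ sketch of that "skip the finalizer in the subprocess" idea, assuming GCC or Clang for `__attribute__((destructor))`. Everything here is hypothetical: `g_skip_teardown`, `demo_teardown` and `teardown_ran_in_child` are illustration names, and the finalizer only writes a marker byte where the real one would tear down fonts and caches:

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical flag: set only in a forked child that is about to die.
static bool g_skip_teardown = false;
// Demo only: pipe end where the finalizer reports that it really ran.
static int g_trace_fd = -1;

// Stand-in for a library finalizer such as cre_teardown().
__attribute__((destructor)) static void demo_teardown(void) {
    if (g_skip_teardown) {
        return;  // subprocess: leave everything to the kernel
    }
    if (g_trace_fd >= 0) {
        const char mark = 'T';
        ssize_t n = write(g_trace_fd, &mark, 1);
        (void)n;
    }
}

// Returns true if the finalizer body ran in a child that called exit(0).
bool teardown_ran_in_child(bool skip) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        close(fds[0]);
        g_trace_fd = fds[1];
        g_skip_teardown = skip;  // the only change made after the fork
        std::exit(0);            // still a normal exit: finalizers are invoked
    }

    close(fds[1]);
    char buf = 0;
    ssize_t n = read(fds[0], &buf, 1);
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return n == 1 && buf == 'T';
}
```

With `skip` set to `false` the child runs the finalizer body, and with `true` it returns early. The parent process is untouched, so its own finalizer (and any assert inside it) would still run at its exit.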

@NiLuJe
Member

NiLuJe commented Nov 15, 2023

I really don't see the interest in that when those are essentially free?

@poire-z
Contributor

poire-z commented Nov 15, 2023

What are "that" and "those"?

@NiLuJe
Member

NiLuJe commented Nov 15, 2023

I mean trying to avoid destructors and sane cleanup (anywhere, anywhen, context is irrelevant ^^).

@poire-z
Contributor

poire-z commented Nov 15, 2023

I believe context is relevant...
It's in the contract of our subprocess stuff to not touch anything related to the main process, and exiting as soon as the job is done, without cleanup, is the best way to achieve that.
The destructors' stuff might be costly (but ok, that's probably peanuts), but what @benoit-pierre was doing with these fixes is adding more kludges to the code, unrelated to the job we ask of it, just so that the existing destructors do not crash and cause no harm.
My point is that we should neither have to nor need to think about that.
(Moreover, when a subprocess is interrupted, which is the benefit of having them handled via fork, we just kill -9 it, so no destructor is called. We shouldn't need to have them called either when the process has done its work normally.)

@NiLuJe
Member

NiLuJe commented Nov 16, 2023

I fairly strongly disagree with that (on paper, at least), it seems to be a perfectly fun and interesting way to introduce weird and random issues due to the different behavior and potential codepaths taken.

to not touch anything related to the main process

On that front, you've already lost that fight by basically just having spawned a Lua fork at all ;).

@poire-z
Contributor

poire-z commented Nov 16, 2023

I fairly strongly disagree with that (on paper, at least),

Bureaucrats, theoreticians... but we're talking about Life in the jungle here :)

it seems to be a perfectly fun and interesting way to introduce weird and random issues due to the different behavior and potential codepaths taken.

The random issues and different behaviour we had above (really, the 4 of us didn't manage to reproduce the same bug!) happened because we took these same codepaths (that we didn't know/think/remember about, and didn't need to take)...
And the exit/suicide idea is to actually just not take any codepath.
We had lived for years without these finalizers. They were added recently to limit valgrind reports about non-freed memory. They serve no other purpose.
I'd rather have to explicitly call any useful cleanup functions in this subprocess code than have any useful and useless finalization stuff, which a dev rightfully added for the main, normal context, called in this different/hacky context.

On that front, you've already lost that fight by basically just having spawned a Lua fork at all ;).

I don't think so. We just left the fighting cocks peaceful, avoiding any fight by disappearing early - with fork&exec for the dict lookup, by exiting as soon as done for the other stuff.
The fact that we're single threaded, and that no code runs other than the code we explicitly take in the subprocess until we explicitly die, helps with that.

I dunno if some Lua gc can happen during that, touching other objects - but it hadn't hit us all these years. Also dunno if this can more easily happen in a subprocess where we do network stuff (ie. Wikipedia lookup) and are indeed idle and waiting.

Frenzie added a commit to Frenzie/koreader-base that referenced this pull request Nov 17, 2023
This is one potential solution to fixing the regression introduced by koreader#1679. I don't support this solution, nor its friend of adding a `touch $(FRIBIDI_DIR)/include` at the bottom.

Including (hah!) most of these `/include`s seems to be a mistake. They are to be generated by their relevant targets. Either they fail or they don't, simple, no weirdness. They're targets for the target!
Frenzie added a commit to Frenzie/koreader-base that referenced this pull request Nov 17, 2023
Partially reverts 8c133b0 (from koreader#1679).

The logic behind including them doesn't check out. The targets in question depend on the files that are to be written in those very /include directories. If they fail to be created properly the target fails. Therefore including them in Makefile.third in this manner doesn't make much sense.

I originally thought it was a bit weird but harmless, but as it turns out it's the exact opposite. So yeah, it has to go away ASAP.
@poire-z
Contributor

poire-z commented Dec 3, 2023

So, how are we going about that issue?
I don't get segfaults logs in my /var/log/messages anymore, I just get luajit: koreader/base/thirdparty/kpvcrlib/crengine/crengine/src/lvfntman.cpp:732: void LVFontCache::clear(): Assertion 'fnt.isNull() || fnt.getRefCount() == 1' failed. in the emulator output.

There is the proposed solution above at #1679 (comment) that solves it for this page thumbnail generations.
But I just tested, and I get the assertion failed line also when:

  • Wikipedia lookup
  • Refresh cached book information.
  • (but not dictionary lookup - which I guess does not fork but uses io.popen())

So, it still feels to me that a more generic suicide solution - or a "do not finalize" flag - is a lot better than having to plug many ad-hoc self.ui.document:setCacheFileStale(false) and self.ui.document:close() in places where this stuff has no real relevance.

@poire-z
Contributor

poire-z commented Apr 6, 2024

Zero feedback :(
@benoit-pierre: I can't believe, given how you care about tidiness everywhere, that you're just happy with your benoit-pierre/koreader@928da4e, not getting any assert failure with PageBrowser, but still getting them with Wikipedia or Refresh cached book info like I mentioned above...

Adding the same kludge in these other run-in-subprocess functions would be ugly.
I see 2 solutions:

  • either explicitly cancel/make-no-op any cre finalizer when launching stuff in the various subprocess-executing code paths (as was the case before you added these finalizers, and we were just fine)
  • or allow modules (for now, cre only I think) to globally register pre-fork, after-fork and before-dying callbacks that these various subprocess executors would run (and in them, you would put your various self.ui.document:setCallback(), self.ui.document:setCacheFileStale(false) and self.ui.document:close() calls), which would do this (useless) work, but in all present and future cases.

My preference is still for the first option.
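
To show the shape the second option could take, here is a rough C++ sketch of a global hook registry around a fork-based runner. It is only an assumption about a possible design, not existing KOReader code: `SubprocessHooks`, `g_hooks` and `run_in_subprocess` are hypothetical names, pre-fork hooks are left out for brevity, and the hooks append to the output string purely so that their ordering is observable:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hooks and jobs receive the buffer that is sent back to the parent.
using Step = std::function<void(std::string &)>;

struct SubprocessHooks {
    std::vector<Step> after_fork;   // run in the child, before the job
    std::vector<Step> before_exit;  // run in the child, after the job
};

// Hypothetical global registry that modules would add their hooks to.
static SubprocessHooks g_hooks;

// Forks, runs hooks and job in the child, returns what the child sent back.
std::string run_in_subprocess(const Step &job) {
    int fds[2];
    if (pipe(fds) != 0) return "";

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }
    if (pid == 0) {
        close(fds[0]);
        std::string out;
        for (const Step &hook : g_hooks.after_fork) hook(out);
        job(out);
        for (const Step &hook : g_hooks.before_exit) hook(out);
        ssize_t n = write(fds[1], out.data(), out.size());
        (void)n;
        close(fds[1]);
        _exit(0);  // sketch only: leave without running exit handlers
    }

    close(fds[1]);
    std::string received;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
        received.append(buf, static_cast<size_t>(n));
    }
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return received;
}
```

A module would register its hooks once, for instance `g_hooks.before_exit.push_back(...)` with whatever cleanup it needs, and every caller of `run_in_subprocess` would then get the same treatment without repeating it at each fork site.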

@benoit-pierre
Contributor Author

I don't like it, no.

I thought my position on the matter was clear: the mistake is using fork in the first place. And I believe disabling CRE finalizers so the asserts are not triggered is the equivalent of putting on a blindfold: just because you don't see it does not mean the problem is not still there.

With that being said those forks are not going away any time soon, so… Pick a kludge: 1 is less work, 2 is cleaner (and there are already some other instances that could benefit from a way to do some pre-fork cleaning: see calls to BookInfoManager:closeDbConnection()).

@poire-z
Contributor

poire-z commented Apr 6, 2024

And I believe disabling CRE finalizers so the asserts are not triggered is the equivalent of putting on a blindfold: just because you don't see it does not mean the problem is not still there.

I meant: disabling them in the forked process, after the fork: we know the subprocess is going to die soon, we don't care about any problem in it.
We would not disable them in the main process, we'd still have the assert in the main process, which is where you may want to notice any problem.

some other instances that could benefit from a way to do some pre-fork cleaning

I mentioned "pre-fork" for completeness in the set of callbacks, but I don't see why we would need to do anything pre-fork, i.e. in the main process, which will continue to live on and whose state does not need to change.
Any db closing (if we cared) would be done "after-fork" in the subprocess.
