Updating Gravity after restoring a v5 Teleport into a v6 instance crashes FTL #2035

Open

robertobado opened this issue Aug 17, 2024 · 20 comments

@robertobado

Versions

Core
Version is 9564a6e (Latest: null)
Branch is development-v6
Hash is 9564a6e9 (Latest: 4972cc6f)
Web
Version is a6807d1 (Latest: null)
Branch is development-v6
Hash is a6807d1a (Latest: 046b5629)
FTL
Version is vDev-0c36f47 (Latest: null)
Branch is development-v6
Hash is 0c36f47 (Latest: 0c36f47)

Platform

  • OS and version: Armbian 24.8.0-trunk.523 bookworm
  • Platform: Docker @ BananaPi

Expected behavior

  1. Updating Gravity after importing config should succeed
  2. FTL should restart if crashed

Actual behavior / bug

  1. Updating Gravity causes FTL to crash
  2. Once crashed, FTL doesn't restart automatically, stopping all DNS resolution

Steps to reproduce

Steps to reproduce the behavior:

  1. Create a fresh installation of Pihole on docker
  2. Login to Dashboard
  3. Browse to Teleport
  4. Import a Teleport backup. Import is successful
  5. Browse to Update Gravity. Start update
  6. See error below

pihole | 2024-08-17 12:58:18.204 EDT [237/T293] INFO: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
pihole | 2024-08-17 12:58:18.204 EDT [237/T293] INFO: ----------------------------> FTL crashed! <----------------------------
pihole | 2024-08-17 12:58:18.204 EDT [237/T293] INFO: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
pihole | 2024-08-17 12:58:18.205 EDT [237/T293] INFO: Please report a bug at https://github.com/pi-hole/FTL/issues
pihole | 2024-08-17 12:58:18.205 EDT [237/T293] INFO: and include in your report already the following details:
pihole | 2024-08-17 12:58:18.205 EDT [237/T293] INFO: FTL has been running for 101 seconds
pihole | 2024-08-17 12:58:18.206 EDT [237/T293] INFO: FTL branch: development-v6
pihole | 2024-08-17 12:58:18.206 EDT [237/T293] INFO: FTL version: vDev-0c36f47
pihole | 2024-08-17 12:58:18.206 EDT [237/T293] INFO: FTL commit: 0c36f47
pihole | 2024-08-17 12:58:18.206 EDT [237/T293] INFO: FTL date: 2024-08-14 06:07:59 +0200
pihole | 2024-08-17 12:58:18.206 EDT [237/T293] INFO: FTL user: started as pihole, ended as pihole
pihole | 2024-08-17 12:58:18.207 EDT [237/T293] INFO: Compiled for linux/arm/v7 (compiled on CI) using cc (Alpine 13.2.1_git20240309) 13.2.1 20240309
pihole | 2024-08-17 12:58:18.207 EDT [237/T293] INFO: Process details: MID: 237
pihole | 2024-08-17 12:58:18.207 EDT [237/T293] INFO: PID: 237
pihole | 2024-08-17 12:58:18.207 EDT [237/T293] INFO: TID: 293
pihole | 2024-08-17 12:58:18.207 EDT [237/T293] INFO: Name: housekeeper
pihole | 2024-08-17 12:58:18.208 EDT [237/T293] INFO: Received signal: Segmentation fault
pihole | 2024-08-17 12:58:18.208 EDT [237/T293] INFO: at address: 0x69647508
pihole | 2024-08-17 12:58:18.208 EDT [237/T293] INFO: with code: SEGV_MAPERR (Address not mapped to object)
pihole | 2024-08-17 12:58:18.208 EDT [237/T293] INFO: !!! INFO: pihole-FTL has not been compiled with glibc/backtrace support, not generating one !!!
pihole | 2024-08-17 12:58:18.208 EDT [237/T293] INFO: ------ Listing content of directory /dev/shm ------
pihole | 2024-08-17 12:58:18.208 EDT [237/T293] INFO: File Mode User:Group Size Filename
pihole | 2024-08-17 12:58:18.209 EDT [237/T293] INFO: rwxrwxrwx root:root 280 .
pihole | 2024-08-17 12:58:18.209 EDT [237/T293] INFO: rwxr-xr-x root:root 320 ..
pihole | 2024-08-17 12:58:18.210 EDT [237/T293] INFO: rw------- pihole:pihole 56 FTL-lock
pihole | 2024-08-17 12:58:18.211 EDT [237/T293] INFO: rw------- pihole:pihole 292 FTL-counters
pihole | 2024-08-17 12:58:18.211 EDT [237/T293] INFO: rw------- pihole:pihole 16 FTL-settings
pihole | 2024-08-17 12:58:18.212 EDT [237/T293] INFO: rw------- pihole:pihole 82K FTL-strings
pihole | 2024-08-17 12:58:18.212 EDT [237/T293] INFO: rw------- pihole:pihole 4K FTL-domains
pihole | 2024-08-17 12:58:18.213 EDT [237/T293] INFO: rw------- pihole:pihole 168K FTL-clients
pihole | 2024-08-17 12:58:18.214 EDT [237/T293] INFO: rw------- pihole:pihole 12K FTL-upstreams
pihole | 2024-08-17 12:58:18.214 EDT [237/T293] INFO: rw------- pihole:pihole 295K FTL-queries
pihole | 2024-08-17 12:58:18.215 EDT [237/T293] INFO: rw------- pihole:pihole 8K FTL-overTime
pihole | 2024-08-17 12:58:18.216 EDT [237/T293] INFO: rw------- pihole:pihole 20K FTL-dns-cache
pihole | 2024-08-17 12:58:18.216 EDT [237/T293] INFO: rw------- pihole:pihole 4K FTL-per-client-regex
pihole | 2024-08-17 12:58:18.217 EDT [237/T293] INFO: rw------- pihole:pihole 560K FTL-fifo-log
pihole | 2024-08-17 12:58:18.217 EDT [237/T293] INFO: ---------------------------------------------------
pihole | 2024-08-17 12:58:18.217 EDT [237/T293] INFO: Please also include some lines from above the !!!!!!!!! header.
pihole | 2024-08-17 12:58:18.218 EDT [237/T293] INFO: Thank you for helping us to improve our FTL engine!
pihole | 2024-08-17 12:58:18.218 EDT [237/T293] INFO: Waiting for threads to join
pihole | 2024-08-17 12:58:18.218 EDT [237/T293] INFO: Thread database (0) is idle, terminating it.
pihole | 2024-08-17 12:58:18.281 EDT [237/T294] INFO: Terminating resolver thread
pihole | 2024-08-17 12:58:18.312 EDT [237/T295] INFO: Terminating timer thread
pihole | 2024-08-17 12:58:20.218 EDT [237/T293] INFO: Thread housekeeper (1) is still busy, cancelling it.

Debug Token

Additional context

  1. To everyone contributing to Pihole project, thank you! You rock!
  2. This issue happens when restoring a v5 Teleport backup in a clean v6 instance. Restoring the same v5 Teleport backup in another v5 instance does not cause the described failure (as of today, based on the latest v5 at pihole/pihole:latest)
  3. Before posting this, I reproduced the issue multiple times; the failure is consistent
  4. Updating Gravity before import (with only the default list) works fine
  5. After restarting the container, updating Gravity works fine as well (including the imported lists)
@DL6ER
Member

DL6ER commented Aug 18, 2024

Sorry to hear you are experiencing an issue and many thanks for the very good ticket on this. Unfortunately, I am unable to reproduce this following your particular description, so it seems to depend on the content of the particular file you are trying to restore.

As you can reproduce this reliably, may I ask you to try to reproduce it again with a debugger attached to pihole-FTL?

As you said, you're running in a Docker container, so please first (temporarily!) add the required changes to your compose file:

services:
  pihole:
    ...

    cap_add:
      ...
      - SYS_PTRACE
    ....

    security_opt:
      - seccomp:unconfined

and then follow the description on the v6 documentation draft:
https://deploy-preview-338--pihole-docs.netlify.app/ftldns/debugging/

The final step (output of the backtrace command) should tell us exactly where it happened.
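
As a rough sketch of what the session is expected to look like once gdb is attached (assuming the standard gdb workflow; the actual output will differ):

(gdb) continue
Continuing.
... reproduce the crash by running the gravity update ...
Thread N "housekeeper" received signal SIGSEGV, Segmentation fault.
(gdb) backtrace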

@robertobado
Author

Thanks DL6ER for your prompt reply!

I prepared the debugging information. Here it is:

Attaching to process 14709
[New LWP 14710]
[New LWP 14711]
[New LWP 14712]
[New LWP 14713]
[New LWP 14714]
[New LWP 14715]
[New LWP 14716]
[New LWP 14717]
[New LWP 14718]
[New LWP 14719]
[New LWP 14720]
[New LWP 14721]
warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable. Connect to gdbserver inside the container.
__cp_begin () at src/thread/arm/syscall_cp.s:23
23 src/thread/arm/syscall_cp.s: No such file or directory.
(gdb) continue
Continuing.
[Detaching after vfork from child process 14967]
[Detaching after vfork from child process 14968]
Thread 1 "pihole-FTL" received signal SIGTERM, Terminated.
Thread 1 "pihole-FTL" received signal SIG41, Real-time event 41.
[LWP 14712 exited]
[LWP 14715 exited]
[LWP 14714 exited]
[LWP 14713 exited]
[LWP 14716 exited]
[LWP 14719 exited]
[LWP 14720 exited]
[LWP 14721 exited]
[LWP 14718 exited]
[LWP 14717 exited]
[LWP 14711 exited]
[LWP 14710 exited]
process 14709 is executing new program: /usr/bin/pihole-FTL
[New LWP 15081]
[New LWP 15082]
[New LWP 15083]
[New LWP 15084]
[New LWP 15085]
[New LWP 15086]
[New LWP 15087]
[New LWP 15088]
[New LWP 15089]
[New LWP 15090]
[New LWP 15091]
[New LWP 15092]
<------UPDATING GRAVITY STARTED AROUND HERE ----->
[Detaching after fork from child process 15147]
Thread 17 "housekeeper" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 15084]
get_meta (p=p@entry=0xb6b4a460 "$BALLOON-SHA256$v=1$s=1024,t=32$1JDArQljXS7zgBW9/u0rNw==$NRj1s+ITOkdmDyQ/X3lw7oPQHldsIWgNP+lESCd73YI=") at src/malloc/mallocng/meta.h:141
141 src/malloc/mallocng/meta.h: No such file or directory.
(gdb) backtrace
#0 get_meta (p=p@entry=0xb6b4a460 "$BALLOON-SHA256$v=1$s=1024,t=32$1JDArQljXS7zgBW9/u0rNw==$NRj1s+ITOkdmDyQ/X3lw7oPQHldsIWgNP+lESCd73YI=") at src/malloc/mallocng/meta.h:141
#1 0x002b9f4a in __libc_free (p=0xb6b4a460) at src/malloc/mallocng/free.c:105
#2 0x002b98a8 in free (p=) at src/malloc/free.c:5
#3 0x001f3aca in FTLfree (ptr=ptr@entry=0xb6cfc760, file=file@entry=0x3150c8 "/app/src/config/config.c", func=func@entry=0x49fee4 <FUNCTION.6> "free_config", line=line@entry=371) at /app/src/syscalls/free.c:34
#4 0x001fcea6 in free_config (conf=0xb6cfb260) at /app/src/config/config.c:371
#5 replace_config (newconf=newconf@entry=0xb6cfd780) at /app/src/config/config.c:1774
#6 0x001fd092 in reread_config () at /app/src/config/config.c:1827
#7 0x00021620 in GC_thread (val=) at /app/src/gc.c:559
#8 0x002c89dc in start (p=0xb6cffd70) at src/thread/pthread_create.c:207
#9 0x002d005c in __clone () at src/thread/arm/clone.s:23
#10 0x002d005c in __clone () at src/thread/arm/clone.s:23
#11 0x002d005c in __clone () at src/thread/arm/clone.s:23

The log above was generated with GDB running directly from the host while the process was running inside the container. Please let me know if this is helpful; I noticed the following warning:

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable. Connect to gdbserver inside the container.

But I could not find a way to install gdb inside the container (neither apt nor yum is available :/).

If there is any other information I can provide to help, don't hesitate to ask!

@DL6ER
Member

DL6ER commented Aug 20, 2024

Somehow, this doesn't really fit together.

It can be either a bad thing (double-free) or actually a lot worse (mangling memory on a low level), both rather hard to debug.

We'd need many step-by-step ping-pong messages until we can get this resolved. I'm also afraid we may need to utilize memcheck64 to monitor memory access violations to exclude this as a possible cause. So the better alternative seems to be for you to send me your particular file, which should let me trigger the exact same crash here. If there is stuff in there you aren't comfortable sharing with us, you can surely modify it, but please test that the modified version still crashes FTL before sending. The best way is to send it via e-mail to my username here .at. pi-hole.net
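
To illustrate the failure class only (a minimal sketch, not FTL's actual code): a double free typically arises when two config structs end up sharing the same heap pointer, e.g. via a shallow copy, and both are later cleaned up:

/* illustrative sketch, NOT FTL source code */
#include <stdlib.h>
#include <string.h>

struct config { char *pwhash; };

static void free_config(struct config *c) {
    free(c->pwhash);
    c->pwhash = NULL;   /* protects this struct, but not shallow copies of it */
}

int main(void) {
    struct config oldconf = { .pwhash = strdup("$BALLOON-SHA256$...") };
    struct config newconf = oldconf;   /* shallow copy: same heap pointer */

    free_config(&oldconf);   /* first free is fine */
    free_config(&newconf);   /* second free of the same address: heap corruption,
                                typically caught inside musl's mallocng get_meta() */
    return 0;
}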

@robertobado
Author

Hi @DL6ER, I just sent you the Teleport pack by email.

I made some more tests and found some more information that may be useful for you:

1 - Restarting from scratch, as I thought it was related to the lists, I tried importing just the lists in the Teleport restore dialog and then triggered a gravity update. Instead of failing with FTL crashing like before, it still failed, but more gracefully, apparently related to the database.

Dashboard log:

[i] Upgrading gravity database from version 18 to 19
[i] Neutrino emissions detected...
[✓] Pulling blocklist source list into range
[✓] Preparing new gravity database
[i] Creating new gravity databases...
[✗] Unable to copy data from /etc/pihole/gravity.db to /etc/pihole/gravity.db_temp
Runtime error near line 20: FOREIGN KEY constraint failed (19)
[✗] Unable to create gravity database. Please try again later. If the problem persists, please contact support.

Container log:

2024-08-21 20:34:40.307 EDT [238/T278] WARNING: API: Gravity failed

FTL kept running fine so I was able to keep browsing the dashboard

2 - Restarting from scratch, if in the Teleport restore dialog I restore something not related to gravity, for example, if I restore just "Clients" and nothing else, I still get the same FOREIGN KEY failure described in (1) above

3 - Restarting from scratch, if I don't restore anything and simply try to update gravity on a brand new setup, I don't get any failure in the dashboard log, but looking at the container log I saw the following line

2024-08-21 21:14:12.067 EDT [238/T241] ERROR: SQLite3: database is locked in "SELECT value FROM info WHERE property = 'updated';" (5)

Not sure if this is expected; I guess not, so I decided to report it here

4 - Restarting from scratch, after some trial and error I could trace the FTL crash down to the "Configuration" restore in Teleport. If I restore just that and nothing else, updating gravity will cause the FTL crash exactly as originally reported in this issue

@DL6ER
Member

DL6ER commented Aug 22, 2024

I have not been able to confirm/reproduce the crash and need some more details like your particular docker-compose.yaml file. Follow-up sent via email. I will summarize what we've found for the public discussion in the end.

@DL6ER
Member

DL6ER commented Aug 23, 2024

Even though I am not able to reproduce it, I might have found what is causing this by looking again at your first backtrace. Actually, I fixed this a few days ago, but I have now become more confident that it might be the real fix we need here.

Could you please build a local docker image with my proposed fix so you can verify it works for you? Following the instructions

git clone https://github.com/pi-hole/docker-pi-hole
cd docker-pi-hole
git checkout development-v6 # NOTE: This step is only needed until V6 is released
./build.sh -f fix/config_crash2

and then using the generated image should be enough.

@robertobado
Author

Hi DL6ER,

there is no development-v6 branch in https://github.com/pi-hole/docker-pi-hole

@robertobado
Author

I just tried running a test on an amd64 architecture instead of arm.

Based on the latest development-v6 tag, I could reproduce the issue in exactly the same way.

@DL6ER
Member

DL6ER commented Sep 5, 2024

development-v6 has been replaced by development in preparation for the v6 release. fix/config_crash2 has already been integrated so this branch is gone, too - if it is still there, it will be outdated by now.

Please follow these instructions to attach the debugger to the container's pihole-FTL. The backtrace it will generate should tell us exactly where we need to be looking.

Thanks for your continued support in getting this fixed - once we know exactly where it happens, we should also learn what the required boundary conditions are, and it should become obvious why I failed to reproduce the bug myself.

@robertobado
Author

I cannot follow the debugger instructions because there is no package manager available inside the docker image:

$ docker exec -it pihole-test bash
pihole:/# sudo apt-get install screen gdb
sudo: apt-get: command not found
pihole:/# apt
bash: apt: command not found
pihole:/# yum
bash: yum: command not found
pihole:/#

Could you please advise how to install gdb in this setup?

Thank you!

@DL6ER
Member

DL6ER commented Sep 6, 2024

apk add gdb should do it
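
A rough sketch of the resulting in-container flow, assuming the container is named pihole-test as in the session above (the base image is Alpine, hence apk; pidof is provided by busybox):

docker exec -it pihole-test sh
apk add gdb
gdb -p "$(pidof pihole-FTL)"
(gdb) continue      # repeat if gdb stops on benign real-time signals
# reproduce the crash, then when "received signal SIGSEGV" appears:
(gdb) backtrace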

@robertobado
Author

Here is the gdb log.

First, gdb stopped on signals a few times while importing the Teleport backup. I had to continue a couple of times to resume processing:

Thread 1 "pihole-FTL" received signal SIGTERM, Terminated.
Thread 1 "pihole-FTL" received signal SIG41, Real-time event 41.
[LWP 65 exited]
[LWP 62 exited]
Thread 6 "dns-client" received signal SIG33, Real-time event 33.
[Switching to LWP 64]
__cp_begin () at src/thread/arm/syscall_cp.s:23
23 in src/thread/arm/syscall_cp.s
(gdb) continue
Continuing.
[LWP 64 exited]
Thread 5 "housekeeper" received signal SIG33, Real-time event 33.
[Switching to LWP 63]
__cp_begin () at src/thread/arm/syscall_cp.s:23
23 in src/thread/arm/syscall_cp.s
(gdb) continue
Continuing.
[LWP 63 exited]
[LWP 8521 exited]
[LWP 8520 exited]
[LWP 8518 exited]
[LWP 8519 exited]
[LWP 68 exited]
[LWP 66 exited]
[LWP 61 exited]
[LWP 60 exited]
process 48 is executing new program: /usr/bin/pihole-FTL
[New LWP 8559]
[New LWP 8560]
[New LWP 8561]
[New LWP 8562]
[New LWP 8563]
[New LWP 8564]
[New LWP 8565]
[New LWP 8566]
[Detaching after vfork from child process 8567]
[Detaching after vfork from child process 8568]
[New LWP 8569]
[New LWP 8570]
[New LWP 8571]
[New LWP 8572]

Second, the following is the log while updating gravity:

[Detaching after fork from child process 8582]
Thread 17 "housekeeper" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 8562]
get_meta (p=p@entry=0xb6b27460 "$BALLOON-SHA256$v=1$s=1024,t=32$PCRAXx1MZF9zYms0+V5vcg==$g2tlDk91ScRTnif2zU2JTQRKZYpg2UODuqO3JVcii8k=") at src/malloc/mallocng/meta.h:141
warning: 141 src/malloc/mallocng/meta.h: No such file or directory
(gdb)
Continuing.
Thread 16 "database" received signal SIG33, Real-time event 33.
[Switching to LWP 8561]
__cp_begin () at src/thread/arm/syscall_cp.s:23
warning: 23 src/thread/arm/syscall_cp.s: No such file or directory
(gdb)
Continuing.
[LWP 8561 exited]
[LWP 8564 exited]
[LWP 8563 exited]
[LWP 8562 exited]
[LWP 8572 exited]
[LWP 8570 exited]
[LWP 8569 exited]
[LWP 8565 exited]
[LWP 8571 exited]
[LWP 8566 exited]
[LWP 8559 exited]
[LWP 48 exited]
[LWP 8560 exited]
[New process 48]
[Inferior 1 (process 48) exited normally]
(gdb)
The program is not being run.

I hope this helps! Let me know if you need more info and thanks for the support!

@DL6ER
Member

DL6ER commented Sep 7, 2024

Thank you for the backtrace. This confirms it is a double-free corruption issue - I thought I had already fixed it in #2043 but there seems to be another one. More to come later.

@DL6ER
Member

DL6ER commented Sep 9, 2024

I just had some time to look at your output and realized that there isn't actually any backtrace in it. When you see a real crash indicated by signal SIGSEGV, like

Thread 17 "housekeeper" received signal SIGSEGV, Segmentation fault.

the next step would have to be backtrace instead of continue.

Could you maybe try this once again for us?

Either way, I also prepared a new FTL branch fix/free which should circumvent such issues entirely - even when we don't exactly know (yet!) what is causing the one we are currently discussing. Unfortunately, running FTL from a different branch requires building the docker image locally, see

https://github.com/pi-hole/docker-pi-hole/tree/development?tab=readme-ov-file#usage

In this case, it'd be

./build.sh -f fix/free

which will create a new docker image with tag pihole:local which you can use as usual in your compose file.
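
For example, in the compose file it would look roughly like this (all other settings stay as in your existing service definition):

services:
  pihole:
    image: pihole:local   # the locally built image from ./build.sh
    ...                   # the rest, including the SYS_PTRACE/seccomp additions from earlier, is unchanged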

@robertobado
Author

I built the docker image from fix/free but unfortunately the issue still happens. Not sure if by circumvent you meant some graceful error handling to hide the crash from the end user, but in my tests, FTL is still crashing and redirecting the browser to a defunct login screen.

I used the same fix/free branch to get the backtrace you requested:

pihole-test | 2024-09-10 09:52:19.509 EDT [56M] INFO: FTL branch: fix/free
pihole-test | 2024-09-10 09:52:19.509 EDT [56M] INFO: FTL version: vDev-f29749d
pihole-test | 2024-09-10 09:52:19.509 EDT [56M] INFO: FTL commit: f29749d
pihole-test | 2024-09-10 09:52:19.510 EDT [56M] INFO: FTL date: 2024-09-09 17:51:54 +0200
pihole-test | 2024-09-10 09:52:19.510 EDT [56M] INFO: FTL user: pihole

First, a backtrace from the SIG33 received while importing teleport:

Thread 5 "housekeeper" received signal SIG33, Real-time event 33.
[Switching to LWP 74]
__cp_begin () at src/thread/arm/syscall_cp.s:23
23 in src/thread/arm/syscall_cp.s
(gdb) backtrace
#0 __cp_begin () at src/thread/arm/syscall_cp.s:23
#1 0x002d6a7e in __syscall_cp_c (nr=0, u=, v=, w=0, x=0, y=0, z=0) at src/thread/pthread_cancel.c:33
#2 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

And finally the backtrace for update gravity SIGSEGV:

Thread 17 "housekeeper" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 203]
get_meta (p=p@entry=0xb6b3f460 "$BALLOON-SHA256$v=1$s=1024,t=32$o1owt/OAMJJA+7f0rg8MGA==$byWk6jupLZZwA7TJ1EzFqwqiQQe8Z3pvD13V2+wUv7E=") at src/malloc/mallocng/meta.h:141
warning: 141 src/malloc/mallocng/meta.h: No such file or directory
(gdb) backtrace
#0 get_meta (p=p@entry=0xb6b3f460 "$BALLOON-SHA256$v=1$s=1024,t=32$o1owt/OAMJJA+7f0rg8MGA==$byWk6jupLZZwA7TJ1EzFqwqiQQe8Z3pvD13V2+wUv7E=") at src/malloc/mallocng/meta.h:141
#1 0x002c8b42 in __libc_free (p=0xb6b3f460) at src/malloc/mallocng/free.c:105
#2 0x002c84a0 in free (p=) at src/malloc/free.c:5
#3 0x001f3d80 in FTLfree (ptr=, file=file@entry=0x323f48 "/app/src/config/config.c", func=func@entry=0x4b19b4 <FUNCTION.6> "free_config", line=line@entry=376) at /app/src/syscalls/free.c:29
#4 0x001fd0cc in free_config (conf=0xb6c4c1e0) at /app/src/config/config.c:376
#5 replace_config (newconf=newconf@entry=0xb6c4e740) at /app/src/config/config.c:1795
#6 0x001fd2c0 in reread_config () at /app/src/config/config.c:1848
#7 0x000217c4 in GC_thread (val=) at /app/src/gc.c:559
#8 0x002d7684 in start (p=0xb6c50d70) at src/thread/pthread_create.c:207
#9 0x002ded04 in __clone () at src/thread/arm/clone.s:23
#10 0x002ded04 in __clone () at src/thread/arm/clone.s:23
#11 0x002ded04 in __clone () at src/thread/arm/clone.s:23
(continues repeating src/thread/arm/clone.s:23 indefinitely)

I hope this helps. I also have easy access to test under x86 instead of arm. Let me know if it would give you any insightful perspective!

Thanks!

@DL6ER
Member

DL6ER commented Sep 10, 2024

Okay ... two things: Please build a new container on the latest version of fix/free using the same command as before. It should report commit c0aaad3a during FTL startup. I found a small glitch where memory is indeed free'd too early, circumventing my newly added graceful handling.

If it does not work, please try again with FTL from branch fix/free_ (extra underscore at the end); it should log much more verbosely - we are interested in basically everything starting in INFO: XXX. If anything contains a very long password hash, feel free to remove or obscure it. I still hope step one above fixes it already.

@robertobado
Author

I built c0aaad3a as requested. This is the log when importing teleport:

pihole-test | 2024-09-11 00:29:44.985 EDT [157M] INFO: ########## FTL terminated after 1m 16s (code 22)! ##########
pihole-test | 2024-09-11 00:29:45.002 EDT [157M] INFO: Config file /etc/pihole/pihole.toml not available (r): No such file or directory
pihole-test | 2024-09-11 00:29:45.004 EDT [157M] INFO: ########## FTL started on pihole! ##########
pihole-test | 2024-09-11 00:29:45.005 EDT [157M] INFO: FTL branch: fix/free
pihole-test | 2024-09-11 00:29:45.005 EDT [157M] INFO: FTL version: vDev-c0aaad3
pihole-test | 2024-09-11 00:29:45.005 EDT [157M] INFO: FTL commit: c0aaad3
pihole-test | 2024-09-11 00:29:45.006 EDT [157M] INFO: FTL date: 2024-09-10 20:29:54 +0200
pihole-test | 2024-09-11 00:29:45.006 EDT [157M] INFO: FTL user: pihole

After finishing the import, I tried updating gravity. I got kicked to the login screen just like before when FTL was crashing; however, according to the logs, there was no crash this time (still, the update process was interrupted right at the beginning, exactly at the point where a crash would happen). I could confirm that FTL did not crash because I could log back in right away (as opposed to the login becoming defunct when FTL crashed). I assume this was your graceful handling in action?

Once I logged back in, I triggered another gravity update. This second run did get to finish (naturally taking longer, since I have 36 lists), but I don't believe we can consider it successful. All lists were processed successfully except one; here are some relevant browser logs:

(several lists processed successfully)
...
[i] Target: https://raw.githubusercontent.com/bigdargon/hostsVN/master/hosts
[✓] Status: Retrieval successful
rm: cannot remove '/tmp/tmp.phgpb': No such file or directory
[✗] Unable to insert domain into database file /etc/pihole/gravity.db_temp
...
(several lists processed successfully)
...
[✓] Building tree
Error: in prepare, database is locked (5)
[i] Number of gravity domains: (869017 unique domains)
Error: in prepare, database is locked (5)
Error: in prepare, database is locked (5)
[i] Number of exact denied domains:
Error: in prepare, database is locked (5)
[i] Number of regex denied filters:
Error: in prepare, database is locked (5)
[i] Number of exact allowed domains:
Error: in prepare, database is locked (5)
[i] Number of regex allowed filters:
[✓] Swapping databases
[✓] The old database remains available
[✓] Cleaning up stray matter
[✓] Done.

After this, the dashboard started to report the number of blocked domains as "-2". Apparently, this also broke the whole gravity db. Trying to update gravity again would always fail with the following browser log:

Error: in prepare, no such table: info
[✓] DNS resolution is available
[i] Neutrino emissions detected...
[✓] Pulling blocklist source list into range
[i] No source list found, or it is empty
[✓] Preparing new gravity database
[i] Creating new gravity databases...
[✗] Unable to copy data from /etc/pihole/gravity.db to /etc/pihole/gravity.db_temp
Parse error near line 11: no such table: OLD.group
Parse error near line 12: no such table: OLD.domain_audit
Parse error near line 14: no such table: OLD.domainlist
Parse error near line 15: no such table: OLD.domainlist_by_group
Parse error near line 16: no such table: OLD.domainlist_by_group
Parse error near line 18: no such table: OLD.adlist
Parse error near line 19: no such table: OLD.adlist_by_group
Parse error near line 20: no such table: OLD.adlist_by_group
Parse error near line 22: no such table: OLD.client
Parse error near line 23: no such table: OLD.client_by_group
Parse error near line 24: no such table: OLD.client_by_group
[✗] Unable to create gravity database. Please try again later. If the problem persists, please contact support.

There was no recovery beyond that. I deleted the container and restarted from scratch and I was able to reproduce the errors.

I was going to try to get you some backtraces using the fix/free_ branch; however, the container fails right upon start, before even being able to initialize:

pihole-test | [i] Setting up user & group for the pihole user
pihole-test | [i] PIHOLE_UID not set in environment, using default (100)
pihole-test | [i] PIHOLE_GID not set in environment, using default (101)
pihole-test |
pihole-test | [i] Starting FTL configuration
pihole-test | /usr/bin/pihole-FTL: line 1: html: No such file or directory
pihole-test | /usr/bin/pihole-FTL: line 2: syntax error near unexpected token `<'
pihole-test | /usr/bin/pihole-FTL: line 2: `<title>404 Not Found</title>'
pihole-test exited with code 0

Once you fix it, I can get you the detailed logs you want.

Thanks!

@DL6ER
Member

DL6ER commented Sep 11, 2024

This is the log when importing teleport

That's all there was? If so, it looks fine.

I got kicked to the login screen just like before when FTL was crashing.

Yeah, importing the password from the Teleporter archive is expected to cause a termination of all web sessions as the password might have changed.

I assume this was your graceful handling in action?

No, I think this means the issue was actually fixed. Whatever is causing the following issue seems to be something new - or rather - something we have not been able to see before.

however, according to the logs, there was no crash this time (still, the update process was interrupted right at the beginning, exactly at the same time a crash would happen).

Could you share the involved logs?


I was going to try to get you some backtrace using fix/free_ branch, however, the container is failing right upon start, before even being able to initialize

I don't think we will need this.

@PromoFaux Seems we need some error handling in the custom-docker-image sh script. On a non-existing branch, it stored the 404 error page as the FTL binary in the container, which then obviously cannot start.
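
A hedged sketch of what such a guard could look like (the variable names and download URL are placeholders, not the real script):

# hypothetical guard: abort instead of installing an HTML 404 page as the FTL binary
if ! curl -sSfL -o /usr/bin/pihole-FTL "${FTL_DOWNLOAD_URL}/${FTL_BRANCH}/pihole-FTL-${ARCH}"; then
    echo "FTL branch '${FTL_BRANCH}' not found upstream, aborting build" >&2
    exit 1
fi
chmod +x /usr/bin/pihole-FTL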

@robertobado
Author

That's all there was? If so, it looks fine.

Most of it looks fine, I was just confused because apparently FTL is finishing with exit code 22 instead of 0. Is that expected? If you want, I can share the full log, but I don't see anything suspicious other than this.

Yeah, importing the password from the Teleporter archive is expected to cause a termination of all web sessions as the password might have changed.

This would be fine if it happened during the Teleport import; however, this is happening during the gravity update.

Could you share the involved logs?

Sure. Here are the logs until I get kicked back to the login screen:

pihole-test | 2024-09-11 00:30:25.368 EDT [157/T289] INFO: Reloading config due to pihole.toml change
pihole-test | 2024-09-11 00:30:27.058 EDT [157/T288] ERROR: SQLite3: database is locked in "SELECT value FROM info WHERE property = 'updated';" (5)
pihole-test | 2024-09-11 00:30:27.431 EDT [157/T289] INFO: Upgrading password from SHA256^2 to BALLOON-SHA256
pihole-test | 2024-09-11 00:30:27.443 EDT [157/T289] INFO: Wrote config file:
pihole-test | 2024-09-11 00:30:27.443 EDT [157/T289] INFO: - 149 total entries
pihole-test | 2024-09-11 00:30:27.443 EDT [157/T289] INFO: - 139 entries are default
pihole-test | 2024-09-11 00:30:27.443 EDT [157/T289] INFO: - 10 entries are modified
pihole-test | 2024-09-11 00:30:27.443 EDT [157/T289] INFO: - 0 entries are forced through environment
pihole-test | 2024-09-11 00:30:27.450 EDT [157/T289] INFO: Config file written to /etc/pihole/pihole.toml
pihole-test | 2024-09-11 00:30:27.453 EDT [157/T289] INFO: 1 FTLCONF environment variable found (1 used, 0 invalid, 0 ignored)
pihole-test | 2024-09-11 00:30:27.453 EDT [157/T289] INFO: [✓] FTLCONF_webserver_api_password is used
pihole-test | 2024-09-11 00:30:27.465 EDT [157/T289] INFO: Wrote config file:
pihole-test | 2024-09-11 00:30:27.466 EDT [157/T289] INFO: - 149 total entries
pihole-test | 2024-09-11 00:30:27.466 EDT [157/T289] INFO: - 140 entries are default
pihole-test | 2024-09-11 00:30:27.466 EDT [157/T289] INFO: - 9 entries are modified
pihole-test | 2024-09-11 00:30:27.466 EDT [157/T289] INFO: - 0 entries are forced through environment
pihole-test | 2024-09-11 00:30:27.488 EDT [157/T289] INFO: Config file written to /etc/pihole/pihole.toml

And here are the logs after I log back in and try updating gravity again:

pihole-test | 2024-09-11 00:31:21.005 EDT [157/T288] ERROR: SQLite3: database is locked in "SELECT value FROM info WHERE property = 'updated';" (5)
pihole-test | 2024-09-11 00:37:11.043 EDT [157/T288] INFO: Gravity database has been updated, reloading now
pihole-test | 2024-09-11 00:37:11.058 EDT [157/T288] ERROR: gravityDB_count(SELECT value FROM info WHERE property = 'gravity_count';) - SQL error step no more rows available
pihole-test | 2024-09-11 00:37:11.059 EDT [157/T288] WARNING: Count of gravity domains not available. Please run pihole -g
pihole-test | 2024-09-11 00:37:11.079 EDT [157/T288] INFO: Compiled 3 allow and 0 deny regex for 0 client in 5.4 msec
pihole-test | 2024-09-11 00:37:11.080 EDT [157/T288] WARNING: List with ID 16 (https://raw.githubusercontent.com/PolishFiltersTeam/KADhosts/master/KADhosts.txt) was inaccessible during last gravity run
pihole-test | 2024-09-11 00:37:11.113 EDT [157/T288] WARNING: List with ID 31 (https://raw.githubusercontent.com/FadeMind/hosts.extras/master/add.2o7Net/hosts) was inaccessible during last gravity run
pihole-test | 2024-09-11 00:37:11.121 EDT [157/T288] WARNING: List with ID 32 (https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/spy.txt) was inaccessible during last gravity run
pihole-test | 2024-09-11 00:37:11.129 EDT [157/T288] WARNING: List with ID 41 (https://malware-filter.gitlab.io/malware-filter/phishing-filter-hosts.txt) was inaccessible during last gravity run
pihole-test | 2024-09-11 00:37:12.043 EDT [157/T288] ERROR: SQLite3: no such table: info in "SELECT value FROM info WHERE property = 'updated';" (1)
pihole-test | 2024-09-11 00:37:12.044 EDT [157/T288] WARNING: gravity_updated(): SELECT value FROM info WHERE property = 'updated'; - SQL error prepare: SQL logic error
pihole-test | 2024-09-11 00:37:13.048 EDT [157/T288] ERROR: SQLite3: no such table: info in "SELECT value FROM info WHERE property = 'updated';" (1)
pihole-test | 2024-09-11 00:37:13.049 EDT [157/T288] WARNING: gravity_updated(): SELECT value FROM info WHERE property = 'updated'; - SQL error prepare: SQL logic error
pihole-test | 2024-09-11 00:37:14.053 EDT [157/T288] ERROR: SQLite3: no such table: info in "SELECT value FROM info WHERE property = 'updated';" (1)
pihole-test | 2024-09-11 00:37:14.053 EDT [157/T288] WARNING: gravity_updated(): SELECT value FROM info WHERE property = 'updated'; - SQL error prepare: SQL logic error
...
(SQL log messages repeat indefinitely)
...

And for

Whatever is causing the following issue seems to be something new - or rather - something we have not been able to see before.

Would you like me to create a new issue?

@DL6ER
Member

DL6ER commented Sep 12, 2024

Most of it looks fine, I was just confused because apparently FTL is finishing with exit code 22 instead of 0. Is it expected?

Yes, code 22 means an internally triggered restart. I made this clearer in 1f264eb

Sure. Here are the logs until I get kicked back to the login screen:
...

pihole-test | 2024-09-11 00:30:27.488 EDT [157/T289] INFO: Config file written to /etc/pihole/pihole.toml

...

And here are the logs after I log back in and try updating gravity again:

pihole-test | 2024-09-11 00:31:21.005 EDT [157/T288] ERROR: SQLite3: database is locked in "SELECT value FROM info WHERE property = 'updated';" (5)

...

It would be interesting to see the log in between, from /var/log/pihole/FTL.log.


And for

Whatever is causing the following issue seems to be something new - or rather - something we have not been able to see before.

Would you like me to create a new issue?

Yeah, it's probably better, as this issue ticket is already complex enough that we risk losing one or the other bit. Thanks for your continued support in helping make Pi-hole better for us all!
