Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[defect]: Crash of v3.2.x with TLS-PSK enabled home_server #5343

Closed
pauldekkers opened this issue Jun 19, 2024 · 5 comments
Closed

[defect]: Crash of v3.2.x with TLS-PSK enabled home_server #5343

pauldekkers opened this issue Jun 19, 2024 · 5 comments
Labels
defect category: a defect or misbehaviour

Comments

@pauldekkers
Copy link

What type of defect/bug is this?

Crash or memory corruption (segv, abort, etc...)

How can the issue be reproduced?

Minimal configuration in which the crash happens is:

home_server tls {
	ipaddr = 127.0.0.1
	port = 2084

	type = auth
	proto = tcp
	status_check = none

	nonblock = yes

	tls {
		psk_identity = "myidentity"
		psk_hexphrase = "00000000000000000000000000000000"

                tls_min_version = "1.3"
                tls_max_version = "1.3"

		fragment_size = 8192
		cipher_list = "DEFAULT"
		connect_timeout = 30
	}
}

home_server_pool tls {
		 type = fail-over
		 home_server = tls
}

realm tls.test {
      auth_pool = tls
}

When I send a request (any auth, regardless of it coming in via RADIUS or RadSec or TLS-PSK) for a @tls.test user it gets authenticated when everything aligns (TLS-PSK works, TLS version, identity and psk). My peer is radsecproxy. If the psk_identity is wrong (just any other string), or the psk_hexphrase, or I enable tls_min_version = "1.2" (which radsecproxy doesn't support if I'm not mistaken) or the radsecproxy peer is configured with tls context, FreeRADIUS crashes. Backtrace provided.

Log output from the FreeRADIUS daemon

(0) suffix: Looking up realm "tls.test" for User-Name = "[email protected]"
(0) suffix: Found realm "tls.test"
Waking up in 0.3 seconds.
(0) suffix: Adding Stripped-User-Name = "test"
(0) suffix: Adding Realm = "tls.test"
(0) suffix: Proxying request from user test to realm tls.test
(0) suffix: Preparing to proxy authentication request to realm "tls.test"
(0)     [suffix] = updated
(0) eap: No EAP-Message, not doing EAP
(0)     [eap] = noop
(0)     [files] = noop
(0)     [expiration] = noop
(0)     [logintime] = noop
(0)     [pap] = noop
(0)   } # authorize = updated
(0) Starting proxy to home server 127.0.0.1 port 2084
(0) server default {
(0) }
(TLS) Trying new outgoing proxy connection to proxy (0.0.0.0, 0) -> home_server (127.0.0.1, 2084)
Requiring Server certificate
(0) (TLS) RADIUS/TLS - Handshake state - before SSL initialization
(0) (TLS) RADIUS/TLS - Handshake state - Client before SSL initialization
(0) (TLS) RADIUS/TLS - send TLS 1.3 Handshake, ClientHello
(0) (TLS) RADIUS/TLS - Handshake state - Client SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - recv TLS 1.3 Alert, fatal decode_error
(0) (TLS) RADIUS/TLS - The client is informing us that there is a failure inside the TLS protocol exchange.
(0) ERROR: (TLS) RADIUS/TLS - Alert read:fatal:decode error
(0) (TLS) RADIUS/TLS - Client : Need to read more data: error
tls: (TLS) Failed in connecting TLS session.: error:0A00041A:SSL routines::tlsv1 alert decode error
(TLS) Failed opening connection on proxy socket 'proxy (0.0.0.0, 0) -> home_server (127.0.0.1, 2084)'
(0) Failed to insert request into the proxy list
(0) There was no response configured: rejecting request
(0) Using Post-Auth-Type Reject
(0) # Executing group from file /opt/freeradius/etc/raddb/sites-enabled/default
(0)   Post-Auth-Type REJECT {
(0) attr_filter.access_reject: EXPAND %{User-Name}
(0) attr_filter.access_reject:    --> [email protected]
(0) attr_filter.access_reject: Matched entry DEFAULT at line 11
(0)     [attr_filter.access_reject] = updated
(0)     [eap] = noop
(0)     policy remove_reply_message_if_eap {
(0)       if (&reply:EAP-Message && &reply:Reply-Message) {
(0)       if (&reply:EAP-Message && &reply:Reply-Message)  -> FALSE
(0)       else {
(0)         [noop] = noop
(0)       } # else = noop
(0)     } # policy remove_reply_message_if_eap = noop
(0)   } # Post-Auth-Type REJECT = updated
(0) Sent Access-Reject Id 0 from 0.0.0.0:2083 to 145.100.181.186:63560 length 20
(0) (TLS) RADIUS/TLS - send TLS 1.3 Handshake, Finished
(0) Finished request
Thread 2 waiting to be assigned a request
Waking up in 4.6 seconds.
Marking home server 127.0.0.1 port 2084 as dead.
Bad talloc magic value - unknown value

talloc abort: Bad talloc magic value - unknown value

Backtrace of last 16 frames:
/opt/freeradius/lib/libfreeradius-radius.so(+0xd04c)[0x7ffff7f4204c]
/lib/x86_64-linux-gnu/libtalloc.so.2(+0x3b08)[0x7ffff79f6b08]
/opt/freeradius/sbin/radiusd(+0x4f21b)[0x5555555a321b]
/opt/freeradius/lib/libfreeradius-radius.so(+0x2b0e1)[0x7ffff7f600e1]
/opt/freeradius/lib/libfreeradius-radius.so(rbtree_walk+0xfa)[0x7ffff7f602b5]
/opt/freeradius/sbin/radiusd(mark_home_server_dead+0x114)[0x5555555a33a8]
/opt/freeradius/sbin/radiusd(+0x47407)[0x55555559b407]
/opt/freeradius/sbin/radiusd(+0x48520)[0x55555559c520]
/opt/freeradius/sbin/radiusd(+0x46e51)[0x55555559ae51]
/opt/freeradius/lib/libfreeradius-radius.so(fr_event_run+0x147)[0x7ffff7f69f78]
/opt/freeradius/lib/libfreeradius-radius.so(fr_event_loop+0x452)[0x7ffff7f6aa1a]
/opt/freeradius/sbin/radiusd(radius_event_process+0x26)[0x5555555a8148]
/opt/freeradius/sbin/radiusd(main+0xd04)[0x55555558fa30]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7ffff77fd24a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7ffff77fd305]
/opt/freeradius/sbin/radiusd(_start+0x21)[0x5555555696f1]

Thread 1 "radiusd" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.

Relevant log output from client utilities

No response

Backtrace from LLDB or GDB

(gdb) bt full
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
        tid = <optimized out>
        ret = 0
        pd = <optimized out>
        old_mask = {__val = {0}}
        ret = <optimized out>
#1  0x00007ffff7860e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
No locals.
#2  0x00007ffff7811fb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
        ret = <optimized out>
#3  0x00007ffff77fc472 in __GI_abort () at ./stdlib/abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {0 <repeats 13 times>, 140737488345824, 16230630494067845376, 140737488348392}}, sa_flags = -140525456,
          sa_restorer = 0x7fffffffdef0}
#4  0x00007ffff7f4208f in _fr_talloc_fault_simple (reason=0x7ffff79fc070 "Bad talloc magic value - unknown value") at src/lib/debug.c:850
No locals.
#5  0x00007ffff79f6b08 in ?? () from /lib/x86_64-linux-gnu/libtalloc.so.2
No symbol table info available.
#6  0x00005555555a321b in eol_home_listener (ctx=0x0, data=0x7fffec00c640) at src/main/process.c:4188
        this = 0x7ffff786ee8f <__GI___libc_free+111>
#7  0x00007ffff7f600e1 in walk_delete_order (tree=0x5555558eeff0, compare=0x5555555a31ee <eol_home_listener>, context=0x0) at src/lib/rbtree.c:657
        solid = 0x7ffff7f802e0 <sentinel>
        x = 0x7fffec00c9f0
        rcode = 0
#8  0x00007ffff7f602b5 in rbtree_walk (tree=0x5555558eeff0, order=RBTREE_DELETE_ORDER, compare=0x5555555a31ee <eol_home_listener>, context=0x0) at src/lib/rbtree.c:717
        rcode = 0
#9  0x00005555555a33a8 in mark_home_server_dead (home=0x5555557edf60, when=0x7fffffffe0e0, down=false) at src/main/process.c:4239
        previous_state = 5
        buffer = "127.0.0.1\000\372\367\377\177\000\000\000\340\377\377\377\177\000\000b\246\373\367\377\177\000\000\240\246\233UUU\000\000\260\227\233UUU\000\000\000\000\000\000\256\002\000\000BP]UUU\000\000\260\227\233UUU\000\000\260\227\233UUU\000\000\000\000\000\000\000\000\000\000\200K}\367\377\177\000\000\260\340\377\377\377\177\000\000\337y\372\367\377\177\000\000`\340\377\377\377\177\000\000\260\227\233UUU\000"
#10 0x000055555559b407 in request_done (request=0x5555559b97b0, original=10) at src/main/process.c:740
        home = 0x5555557edf60
        i = 0
        now = {tv_sec = 1718806577, tv_usec = 757183}
        when = {tv_sec = 140737488347408, tv_usec = 140737353775583}
        action = 10
#11 0x000055555559c520 in request_cleanup_delay (request=0x5555559b97b0, action=4) at src/main/process.c:1265
        when = {tv_sec = 1718806577, tv_usec = 752120}
        now = {tv_sec = 1718806577, tv_usec = 757183}
        __FUNCTION__ = "request_cleanup_delay"
#12 0x000055555559ae51 in request_timer (ctx=0x5555559b97b0) at src/main/process.c:585
        request = 0x5555559b97b0
        action = 4
#13 0x00007ffff7f69f78 in fr_event_run (el=0x5555558f6ee0, when=0x7fffffffe2f0) at src/lib/event.c:319
        callback = 0x55555559adfe <request_timer>
        ctx = 0x5555559b97b0
        ev = 0x5555559d6810
#14 0x00007ffff7f6aa1a in fr_event_loop (el=0x5555558f6ee0) at src/lib/event.c:689
        i = 10
        rcode = 0
        when = {tv_sec = 1718806577, tv_usec = 757183}
--Type <RET> for more, q to quit, c to continue without paging--
        wake = 0x7fffffffe2f0
        read_fds = {fds_bits = {0 <repeats 16 times>}}
        write_fds = {fds_bits = {0 <repeats 16 times>}}
#15 0x00005555555a8148 in radius_event_process () at src/main/process.c:6494
No locals.
#16 0x000055555558fa30 in main (argc=3, argv=0x7fffffffe4f8) at src/main/radiusd.c:641
        rcode = 0
        status = 0
        argval = -1
        spawn_flag = true
        display_version = false
        flag = 0
        from_child = {-1, -1}
        p = 0x7fffffffe75c "/radiusd"
        state = 0x5555555ef2a0 <global_state>
        autofree = 0x5555555f0300
@pauldekkers pauldekkers added the defect category: a defect or misbehaviour label Jun 19, 2024
@ndptech
Copy link
Member

ndptech commented Jul 20, 2024

This should be resolved by the latest commit on the 3.2 branch.

@pauldekkers
Copy link
Author

pauldekkers commented Jul 21, 2024

Thanks; I later realized this was the same crash as I found with RADIUS/TLS certificate-based home_servers (without TLS-PSK).

The crash seem to be resolved for both scenarios, but there's still something odd.

Before your fixes, I found that b9af38a with the !this->nonblock on line 3596 replaced by false also gave me a stable result for RADIUS/TLS, happily proxying millions of requests per hour:

!this->nonblock

... and in fact there is a peer where this commit+patch does connect, but your fixes up to fb9693f still don't connect. I don't control the peer so I cannot see what's on the other side,

This one does work:

(TLS) Trying new outgoing proxy connection to proxy (0.0.0.0, 0) -> home_server (192.168.1.78, 2083)
Requiring Server certificate
(0) (TLS) RADIUS/TLS - Handshake state - before SSL initialization
(0) (TLS) RADIUS/TLS - Handshake state - Client before SSL initialization
(0) (TLS) RADIUS/TLS - send TLS 1.2 Handshake, ClientHello
(0) (TLS) RADIUS/TLS - Handshake state - Client SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - Client : Need to read more data: SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - tls_new_client_session WANT_READ
Listening on proxy (10.0.0.34, 35993) -> home_server (192.168.1.78, 2083)
(0) Proxying request to home server 192.168.1.78 port 2083 (TLS) timeout 4.000000
Waking up in 0.3 seconds.
(0) Sent Access-Request Id 241 from 10.0.0.34:35993 to 192.168.1.78:2083 length 139
...
(0)   Proxy-State = 0x30
(0) (TLS) RADIUS/TLS - Client : Need to read more data: SSLv3/TLS write client hello
Thread 1 waiting to be assigned a request
(0) (TLS) RADIUS/TLS - Handshake state - Client SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - recv TLS 1.2 Handshake, ServerHello
(0) (TLS) RADIUS/TLS - Client : Need to read more data: SSLv3/TLS read server hello
Waking up in 0.3 seconds.
(0) (TLS) RADIUS/TLS - Handshake state - Client SSLv3/TLS read server hello
(0) (TLS) RADIUS/TLS - recv TLS 1.2 Handshake, Certificate
(0) (TLS) RADIUS/TLS - Creating attributes from 2 certificate in chain
(0)   TLS-Cert-Serial := "01"
...

and so on, but this (on fb9693f) doesn't connect:

(TLS) Trying new outgoing proxy connection to proxy (0.0.0.0, 0) -> home_server (192.168.1.78, 2083)
Requiring Server certificate
(0) (TLS) RADIUS/TLS - Handshake state - before SSL initialization
(0) (TLS) RADIUS/TLS - Handshake state - Client before SSL initialization
(0) (TLS) RADIUS/TLS - send TLS 1.2 Handshake, ClientHello
(0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - tls_new_client_session WANT_WRITE
(0) Proxying request to home server 192.168.1.78 port 2083 (TLS) timeout 4.000000
(0) Sent Access-Request Id 184 from 110.0.0.34:41391 to 192.168.1.78:2083 length 139
...
(0) (TLS) RADIUS/TLS - Client : Need to write more data: SSLv3/TLS write client hello
...
(0) ERROR: Failing proxied request for user "[email protected]", due to lack of any response from home server 192.168.1.78 port 2083
...
(0) Sent Access-Reject Id 0 from 10.0.0.34:1645 to 172.0.7.129:62565 length 38
(0) Finished request
Thread 3 waiting to be assigned a request
Waking up in 4.6 seconds.
(0) Cleaning up request packet ID 0 with timestamp +4 due to cleanup_delay was reached
Waking up in 591.4 seconds.

... after a litle while

(0) (TLS) RADIUS/TLS - Handshake state - Client SSLv3/TLS write client hello
(0) (TLS) RADIUS/TLS - Client : Need to read more data: SSLv3/TLS write client hello
tls: (TLS) System call (I/O) error (-1)
(TLS) Closing connection
 ... shutting down socket proxy (10.0.0.34, 41391) -> home_server (192.168.1.78, 2083) (1 of 128)
... cleaning up socket proxy (10.0.0.34, 41391) -> home_server (192.168.1.78, 2083)
Closing TLS socket to home server
Ready to process requests

So, maybe this is caused by one of the other commits between b9af38a and fb9693f?

@pauldekkers
Copy link
Author

Hi, it seems to fail already when I cherry-pick f440863 on top of b9af38a, also when adding fb9693f

To be clear; it connects to a lot of other peers. I happen to know this peer is FreeRADIUS, but I don't know what version. And it's annoying, that b9af38a still works but only if I change the !this->nonblock to false (that made it stable overall)

@pauldekkers
Copy link
Author

Yep, this in fact makes it connect happily again - but exactly how I got b9af38a already stable.

diff --git a/src/main/listen.c b/src/main/listen.c
index 7456f2cf3c..1de27b5233 100644
--- a/src/main/listen.c
+++ b/src/main/listen.c
@@ -3597,7 +3597,7 @@ rad_listen_t *proxy_new_listener(TALLOC_CTX *ctx, home_server_t *home, uint16_t
                this->fd = fr_socket_client_tcp(&home->src_ipaddr,
                                                &home->ipaddr, home->port,
 #ifdef WITH_TLS
-                                               this->nonblock
+                                               false
 #else
                                                false
 #endif

@alandekok
Copy link
Member

I think this issue is fixed now. We're spent a lot of time in 2024 cleaning up all of the TLS things :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect category: a defect or misbehaviour
Projects
None yet
Development

No branches or pull requests

3 participants