test: binary/binary-boundary.test.py test gets stuck from time to time #73
Comments
How many times out of 50?
Sorry, I didn't take such measurements.
Need more details about the environment. With a script that runs the test in a loop (
tarantool-dev 2.10.0~beta1-1 amd64
@ylobankov, please assist with the repro.
I reproduced this bug on Ubuntu 20.04 quite often. If you still have any issues with that, ping me and I will provide the server.
How to reproduce the problem:
--- test/binary/binary-boundary.test.py 2021-12-02 19:09:10.852429691 +0300
+++ test/binary/binary-boundary.py 2021-12-11 10:45:11.327712154 +0300
@@ -4,19 +4,19 @@
 import inspect
 import traceback
-saved_path = sys.path[:]
-sys.path.append(os.path.dirname(os.path.abspath(inspect.getsourcefile(lambda:0))))
+root = os.getcwd()
+sys.path.append(os.path.join(root, 'test-run'))
 from internal.memcached_connection import MemcachedBinaryConnection
 from internal.memcached_connection import STATUS, COMMANDS
-mc = MemcachedBinaryConnection("127.0.0.1", iproto.py_con.port)
+mc = MemcachedBinaryConnection("127.0.0.1", 8080)
 def iequal(left, right, level = 1):
     if (left != right):
         tb = traceback.extract_stack()[-(level + 1)]
-        print "Error on line %s:%d: %s not equal %s" % (tb[0], tb[1],
-                repr(left), repr(right))
+        print("Error on line %s:%d: %s not equal %s" % (tb[0], tb[1],
+                repr(left), repr(right)))
         if (isinstance(left, basestring)):
             if (len(left) != len(right)):
                 print("length is different")
@@ -28,28 +28,29 @@
     iequal(res.get('val', val), val, level + 1)
 def check(key, flags, val, level = 0):
+    print("Get", key)
     res = mc.get(key)
-    __check(res[0], flags, val, level + 1)
+    #__check(res[0], flags, val, level + 1)
 print("""#---------------------# test protocol boundary overruns #---------------------#""")
-for i in range(1900, 2100):
+for i in range(1950, 2000):
     print ("iteration %d" % i)
     key = "test_key_%d" % i
     val = "x" * i
     mc.setq(key, val, flags=82, nosend=True)
     mc.setq("alt_%s" % key, "blah", flags=82, nosend=True)
-    data = "".join(mc.commands)
+    data = b"".join(mc.commands)
     mc.commands = []
     if (len(data) > 2024):
         for j in range(2024, min(2096, len(data))):
+            print("send", data[:j])
             mc.socket.sendall(data[:j])
             time.sleep(0.00001)
+            print("send", data[j:])
             mc.socket.sendall(data[j:])
     else:
         mc.socket.sendall(data)
     check(key, 82, val)
-    check("alt_%s" % key, 82, "blah")
-
-sys.path = saved_path
+    #check("alt_%s" % key, 82, "blah")
box.cfg{
    log_level = 7,
    feedback_enabled = false,
    --log = 'tarantool.log',
    --background = true,
    --pid_file = '/home/sergeyb/sources/MRG/memcached/pid',
}
mc = require('memcached')
m = mc.create('instance_1', '8080', {
    expire_enabled = false,
    protocol = 'binary',
    verbosity = 3
})
require('console'):start()
What's going on
mc.setq(key, val, flags=82, nosend=True)
mc.setq("alt_%s" % key, "blah", flags=82, nosend=True)
data = b"".join(mc.commands)
mc.commands = []
if (len(data) > 2024):
    for j in range(2024, min(2096, len(data))):
        print("send", data[:j])
        mc.socket.sendall(data[:j])
        time.sleep(0.00001)
        print("send", data[j:])
        mc.socket.sendall(data[j:])

} else if (rc == 0 && ibuf_used(con->in) > 0 &&
           batch_count < con->cfg->batch_count) {
    batch_count++;
    goto next;
}
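For a rough idea of the sizes involved, here is a minimal sketch that builds SETQ requests following the standard memcached binary protocol layout (a 24-byte header plus 8 bytes of extras for set-type commands) and adds up the two pipelined requests from the Python fragment above. The helper names `setq_packet` and `pipelined_size` are made up for illustration, and the test's own `MemcachedBinaryConnection` may assemble packets differently, so treat the result as an estimate.

```python
import struct

MAGIC_REQUEST = 0x80  # request magic byte in the binary protocol
OPCODE_SETQ = 0x11    # "set quietly": no reply on success


def setq_packet(key, value, flags=0, expire=0, opaque=0):
    """Build one SETQ request: header, extras (flags + expiration), key, value."""
    extras = struct.pack("!II", flags, expire)
    body_len = len(extras) + len(key) + len(value)
    header = struct.pack(
        "!BBHBBHIIQ",
        MAGIC_REQUEST,  # magic
        OPCODE_SETQ,    # opcode
        len(key),       # key length
        len(extras),    # extras length
        0,              # data type
        0,              # vbucket id
        body_len,       # total body length
        opaque,         # opaque, echoed back by the server
        0,              # CAS
    )
    return header + extras + key + value


def pipelined_size(i):
    """Size in bytes of the two requests the test queues for iteration i."""
    key = b"test_key_%d" % i
    first = setq_packet(key, b"x" * i, flags=82)
    second = setq_packet(b"alt_" + key, b"blah", flags=82)
    return len(first) + len(second)
```

Under these assumptions `pipelined_size(1950)` is 2048 bytes, which falls inside the 2024..2095 range of split offsets used by the test.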
Closed accidentally.
memcached creates an endless read-parse-process-flush loop for each network connection.
diff --git a/memcached/internal/memcached.c b/memcached/internal/memcached.c
index 3f760c7..33b667b 100644
--- a/memcached/internal/memcached.c
+++ b/memcached/internal/memcached.c
@@ -154,6 +154,7 @@ memcached_loop(struct memcached_connection *con)
int batch_count = 0;
for (;;) {
+ say_info("%s, ibuf_used %d, batch_count %d", __func__, ibuf_used(con->in), batch_count);
rc = memcached_loop_read(con, to_read);
if (rc == -1) {
/**
@@ -166,6 +167,7 @@ memcached_loop(struct memcached_connection *con)
next:
con->noreply = false;
con->noprocess = false;
+ say_info("%s, ibuf_used %d, batch_count %d", __func__, ibuf_used(con->in), batch_count);
rc = con->cb.parse_request(con);
if (rc == -1) {
memcached_loop_error(con);
@@ -177,6 +179,7 @@ next:
} else {
memcached_skip_request(con);
}
+ say_info("%s, ibuf_used %d, batch_count %d", __func__, ibuf_used(con->in), batch_count);
memcached_flush(con);
batch_count = 0;
continue;
@@ -188,6 +191,7 @@ next:
assert(!con->close_connection);
rc = 0;
if (!con->noprocess) {
+ say_info("%s, ibuf_used %d, batch_count %d", __func__, ibuf_used(con->in), batch_count);
rc = con->cb.process_request(con);
memcached_connection_gc(con);
}
@@ -205,11 +209,13 @@ next:
}
/* Write back answer */
if (!con->noreply)
+ say_info("%s, ibuf_used %d", __func__, ibuf_used(con->in));
memcached_flush(con);
fiber_reschedule();
batch_count = 0;
continue;
}
+ say_info("%s, ibuf_used %d", __func__, ibuf_used(con->in));
memcached_flush(con);
}
One can apply the patch and start Tarantool with memcached listening on port 8080 and accepting connections over the memcached text protocol:
We can then send text commands to memcached. For example, the command below sets the key tarantool to the value database and reads it back:
$ printf "set tarantool 0 0 8\r\ndatabase\r\nget tarantool\r\n" | nc 127.0.0.1 8080
STORED
VALUE tarantool 0 8
database
END
Tarantool will print the debug output:
The bug with the stalled request is reproduced as follows:
$ printf "set foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nset foo 0 0 1\r\n1\r\nget foo\r\n" | nc 127.0.0.1 8080
<'STORED' repeats 20 times>
STORED
^C
In the Tarantool session:
In Tarantool's log we see that memcached processed all set commands. According to the description of the memcached protocol, there are a number of retrieval commands, and all of them are variations of get.
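The `printf | nc` reproducer can also be scripted so that it finishes on its own instead of waiting for ^C. The sketch below is an assumption-laden equivalent, not part of the test suite: `build_payload` and `send_and_collect` are hypothetical helper names, the host and port are the ones used in this thread, and the 2-second timeout is arbitrary. It sends 21 `set` commands (matching the 21 `STORED` replies shown above) followed by one `get` in a single write.

```python
import socket


def build_payload(n_sets=21, key="foo", value="1"):
    """Return n_sets text-protocol 'set' commands followed by one 'get'."""
    set_cmd = "set %s 0 0 %d\r\n%s\r\n" % (key, len(value), value)
    return (set_cmd * n_sets + "get %s\r\n" % key).encode("ascii")


def send_and_collect(payload, host="127.0.0.1", port=8080, timeout=2.0):
    """Send payload at once and read replies until 'END' arrives or the read times out."""
    received = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            while not received.endswith(b"END\r\n"):
                chunk = sock.recv(4096)
                if not chunk:
                    break  # server closed the connection
                received += chunk
        except socket.timeout:
            pass  # nothing more arrived in time: the 'get' was not answered
    return received


# Usage against a running instance (not executed here):
#   reply = send_and_collect(build_payload())
#   print(reply.count(b"STORED"), reply.endswith(b"END\r\n"))
```

If the server stalls as described, the reply would contain the `STORED` lines but no trailing `END`. Note that a single `sendall` does not guarantee that the server sees all commands in one read, so the outcome may vary between runs.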
Summary: memcached gets stuck processing requests when the number of pipelined commands exceeds (batch_count + 1) and the buffer still holds a request with a retrieval command ("get", "gets", "gat", or "gats").

For each incoming connection memcached starts an endless loop in memcached_loop(), where it reads network data into a buffer (memcached_loop_read), parses commands from the buffer one by one (parse_request), processes them (process_request), and writes the answers to the network socket (memcached_flush).

Imagine a sequence of (batch_count + 1) SETQ commands (SETQ is the same as SET, but it does not require a confirmation) followed by a GET command that should return a value set by one of the previous SETQ commands. The SETQ commands are processed successfully. After the last SETQ request is processed, the GET command is still in the buffer, but batch_count has reached its limit of 20. In that case control flow returns to the start of the read-parse-process-write loop, where we get stuck in memcached_loop_read() waiting for further requests instead of replying to the client with the value of the requested key.

    rc = con->cb.process_request(con);
    ...
    if (rc == -1)
        memcached_loop_error(con);
    if (con->close_connection) {
        break;
    } else if (rc == 0 && ibuf_used(con->in) > 0 &&
               batch_count < con->cfg->batch_count) {
        goto next;
    }
    ...
    continue;

Fixes #73
Summary: memcached gets stuck processing requests when the number of pipelined commands exceeds (batch_count + 1) and the buffer still holds a request with a retrieval command ("get", "gets", "gat", or "gats").

For each incoming connection memcached starts an endless loop in memcached_loop(), where it reads network data into a buffer (memcached_loop_read), parses commands from the buffer one by one (parse_request), processes them (process_request), and writes the answers to the network socket (memcached_flush).

Imagine a sequence of con->cfg->batch_count + 1 requests with SETQ commands (con->cfg->batch_count is a constant equal to 20; SETQ is the same as SET, but it does not require a confirmation) followed by a single GET command that should return a value set by one of the previous SETQ commands. The SETQ commands are processed successfully. After the last SETQ request is processed, the GET command is still in the buffer and the batch_count counter has become equal to 20 (its initial value is 0). In that case control flow returns to the start of the read-parse-process-write loop, where we get stuck in memcached_loop_read() waiting for further requests instead of replying to the client with the value of the requested key.

    rc = con->cb.process_request(con);
    ...
    if (rc == -1)
        memcached_loop_error(con);
    if (con->close_connection) {
        break;
    } else if (rc == 0 && ibuf_used(con->in) > 0 &&
               batch_count < con->cfg->batch_count) {
        goto next;
    }
    ...
    continue;

Fixes #73
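The control flow described in the summary can be modelled in a few lines. The sketch below is a simplified simulation, not the C code: `simulate` and its `skip_read_if_buffered` switch are hypothetical, it assumes that one network read delivers every request and that nothing else ever arrives, and the switch illustrates one conceivable way out rather than the change that was actually made in the repository.

```python
from collections import deque


def simulate(requests, batch_limit=20, skip_read_if_buffered=False):
    """Return (processed, stuck) for requests delivered by a single network read.

    'stuck' lists the requests still sitting in the input buffer at the moment
    the loop goes back to a read that will never return.
    """
    incoming = [list(requests)]  # data still "on the wire"
    pending = deque()            # parsed-but-unprocessed input buffer
    processed = []
    batch_count = 0
    while True:
        # Top of the loop: the blocking read (memcached_loop_read in the summary).
        if not (skip_read_if_buffered and pending):
            if not incoming:
                return processed, list(pending)  # would block here forever
            pending.extend(incoming.pop(0))
        # The "next:" label: parse and process buffered requests one by one.
        while pending:
            processed.append(pending.popleft())
            if pending and batch_count < batch_limit:
                batch_count += 1
                continue  # goto next
            break
        # Flush, reset the counter and go back to the top of the loop.
        batch_count = 0


done, stuck = simulate(["set"] * 21 + ["get"])
print(len(done), stuck)
```

In this model 21 `set` requests followed by a `get` leave the `get` in the buffer when the loop returns to the read, while 20 `set` requests followed by a `get` are all processed. Passing `skip_read_if_buffered=True` lets the loop drain the buffer before reading again.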
From time to time the binary/binary-boundary.test.py test gets stuck:
Looks like we are hanging at some iteration because the test/var/binary-boundary.result file shows the following:
Unfortunately, I don't have exact repro steps to make this situation happen. But running the test in a loop will reproduce this issue with high probability:
I did some debugging and found that we hang in this loop and, to be more precise, here. If we dig deeper, it turns out that we actually get stuck here.
Please note that #51 may be caused by the same problem.
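To put a number on "from time to time" (for example, how many runs out of 50 hang), the test can be run in a loop with a per-run timeout. This is a minimal sketch: `count_hangs` is a hypothetical helper, the command that launches the test is left to the caller because it is not given in this report, and the 60-second default timeout is an arbitrary choice.

```python
import subprocess


def count_hangs(cmd, runs=50, timeout=60.0):
    """Run cmd `runs` times and count how many runs exceed `timeout` seconds."""
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            hangs += 1  # the child is killed by subprocess.run on timeout
    return hangs
```

Call it with whatever command runs binary/binary-boundary.test.py in your checkout, passed as a list of arguments. Only the direct child process is killed on timeout, so a server started by the test runner may need separate cleanup between runs.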