utf8tok and utf8tok_r #89
Comments
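I stumbled across this and had a look: The problem is, that while

/* Advance str by up to len codepoints, stopping early at the terminating NUL. */
void *utf8incr(void *utf8_restrict str, size_t len) {
    char* s = (char*) str;
    while(*s && len--) {
        /* Step over every byte of the current codepoint. */
        size_t l = utf8codepointcalcsize(s);
        while(*s && l--) ++s;
    }
    return s;
}

void *utf8tok_r(void *utf8_restrict str, const void *utf8_restrict sep, void **utf8_restrict ptr) {
    char* s = (char*) str;
    char** p = (char**) ptr;
    /* A NULL str means "continue from the saved position". */
    if (!s && !(s = *p)) {
        return NULL;
    }
    /* Skip leading separators. */
    s = utf8incr(s, utf8spn(s, sep));
    if (!*s) {
        /* Only separators were left: no more tokens. */
        return *p = 0;
    }
    /* Find the end of the token. */
    *p = utf8incr(s, utf8cspn(s, sep));
    if (**p) {
        /* Terminate the token and resume after the separator next time. */
        *(*p)++ = 0;
    } else {
        *p = 0;
    }
    return s;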
}

And as a small change to the test:

UTEST(utf8tok_r, token_walking) {
    char* string = utf8dup("this|aäáé|föőf|that|");
    char* ptr = NULL;
    ASSERT_EQ(0, utf8ncmp(utf8tok_r(string, "|", &ptr), "this", 4));
    /* Pass NULL to continue; string keeps the allocation so free() releases it. */
    ASSERT_EQ(0, utf8ncmp(utf8tok_r(NULL, "|", &ptr), "aäáé", 4));
    ASSERT_EQ(0, utf8ncmp(utf8tok_r(NULL, "|", &ptr), "föőf", 4));
    ASSERT_EQ(0, utf8ncmp(utf8tok_r(NULL, "|", &ptr), "that", 4));
    free(string);
}
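The codepoint-wise advance that utf8incr performs can be tried without the library. The following is a minimal sketch only: cp_size, cp_advance and cp_advance_check are hypothetical names, not part of utf8.h, and cp_size is a simplified stand-in for utf8codepointcalcsize that looks only at the lead byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 sequence that starts
   with lead byte c (simplified, no validation of continuation bytes). */
static size_t cp_size(unsigned char c) {
    if (c < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;                          /* invalid lead byte: step one byte */
}

/* Hypothetical helper: advance s by up to n codepoints,
   never moving past the terminating NUL. */
static const char *cp_advance(const char *s, size_t n) {
    while (*s && n--) {
        size_t len = cp_size((unsigned char)*s);
        while (*s && len--) ++s;
    }
    return s;
}

/* Self-check: the 4 codepoints of "aäáé" occupy 7 bytes,
   so advancing by 4 codepoints must land on the '|'. */
static void cp_advance_check(void) {
    const char *s = "a\xC3\xA4\xC3\xA1\xC3\xA9|";
    assert(cp_advance(s, 4) == s + 7);
    assert(*cp_advance(s, 4) == '|');
}
```

Calling cp_advance_check() from a test or from main exercises the helper. The inner `*s` check is what keeps a truncated sequence at the end of the string from walking past the NUL.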
@sheredom this is a pretty interesting find
I've been playing with adding utf8tok, but the problem with the original implementation is that it is not re-entrant. I've been looking at musl to see how utf8tok_r could be implemented, and it's relatively simple.

The following is the implemented test (it fails at the assert for "föőf").

After playing with this for a bit, I am kind of at a loss for what to do.
Anyway, leaving this here in case someone else wants to pick it up and go on.
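One plausible reason for the failing assert, offered as a guess: it assumes the port adds the result of utf8spn/utf8cspn directly to a char pointer, and that those functions report lengths in codepoints rather than bytes. Under those assumptions the pointer lands inside a multi-byte sequence as soon as a token contains non-ASCII characters. The sketch below illustrates the mismatch with the standard library only; cp_count, is_continuation and mixed_units_check are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if c is a UTF-8 continuation byte (10xxxxxx). */
static int is_continuation(char c) {
    return ((unsigned char)c & 0xC0) == 0x80;
}

/* Hypothetical helper: number of codepoints in the first `bytes` bytes
   of s, counted as the bytes that are not continuation bytes. */
static size_t cp_count(const char *s, size_t bytes) {
    size_t n = 0;
    for (size_t i = 0; i < bytes && s[i]; ++i) {
        if (!is_continuation(s[i])) ++n;
    }
    return n;
}

/* Self-check on the token "aäáé" followed by a separator. */
static void mixed_units_check(void) {
    const char *s = "a\xC3\xA4\xC3\xA1\xC3\xA9|f";
    size_t bytes = strcspn(s, "|");       /* token length in bytes */
    size_t cps = cp_count(s, bytes);      /* token length in codepoints */
    assert(bytes == 7);
    assert(cps == 4);
    assert(s[bytes] == '|');              /* byte offset finds the separator */
    assert(is_continuation(s[cps]));      /* codepoint count used as a byte
                                             offset lands inside "á" */
}
```

If this is what happens, the second token is cut in the middle of "á", the saved pointer resumes at "é", and the call that should return "föőf" returns a fragment instead.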