"P" format for ::createFromFormat swallows string literals #17159

Open
PrinsFrank opened this issue Dec 14, 2024 · 1 comment

@PrinsFrank

Description

PDF date strings use the following format: (D:20241122222357+08:00). To convert these strings to DateTimeImmutable objects, dates without a timezone can be parsed with the following format:

\DateTimeImmutable::createFromFormat('\(\D\:YmdHis\)', '(D:20241122222357)'); // object(DateTimeImmutable)

When modifying the format to include a timezone modifier (P), this breaks:

\DateTimeImmutable::createFromFormat('\(\D\:YmdHisP\)', '(D:20241122222357+08:00)'); // false

To make parsing succeed with the timezone modifier present, the trailing \) escape for the closing bracket has to be removed from the format:

\DateTimeImmutable::createFromFormat('\(\D\:YmdHisP', '(D:20241122222357+08:00)'); // object(DateTimeImmutable)

It looks like the "P" modifier swallows any closing brackets that follow the timezone. This makes the format less readable: the opening bracket has to be escaped in the format (\(), while the closing bracket (\)) has to be left out.

PHP Version

8.3.14

Operating System

No response

@nielsdos
Member

nielsdos commented Dec 14, 2024

Yeah, timelib_parse_zone eats as many ( or horizontal whitespace characters as possible at the start, and then eats as many ) as possible at the end. One would expect the number of ( and ) eaten to match; enforcing that would fix this issue. However, that may also break BC.

while (**ptr == ')') {
	++*ptr;
}
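To make the effect of that loop concrete, here is a minimal standalone sketch. It is not the real timelib code: `parse_zone_greedy` and `greedy_example` are hypothetical names, the parser only understands `+HH:MM` / `-HH:MM` offsets, and the sign convention of the return value is an assumption made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for timelib_parse_zone.
 * Only "+HH:MM" / "-HH:MM" is handled; returns the offset in seconds. */
static long parse_zone_greedy(const char **ptr)
{
    long sign = 1, hours, minutes;
    char *end;

    /* Skip leading blanks and any number of opening brackets. */
    while (**ptr == ' ' || **ptr == '\t' || **ptr == '(') {
        ++*ptr;
    }
    if (**ptr == '+' || **ptr == '-') {
        sign = (**ptr == '-') ? -1 : 1;
        ++*ptr;
    }
    hours = strtol(*ptr, &end, 10);
    *ptr = end;
    if (**ptr == ':') {
        ++*ptr;
    }
    minutes = strtol(*ptr, &end, 10);
    *ptr = end;

    /* Same shape as the loop quoted above: eat every ')' that follows. */
    while (**ptr == ')') {
        ++*ptr;
    }
    return sign * (hours * 3600 + minutes * 60);
}

/* Call this from main() to check the behaviour described in the issue. */
static void greedy_example(void)
{
    const char *cursor = "+08:00)";
    long offset = parse_zone_greedy(&cursor);

    assert(offset == 8 * 3600);
    /* The ')' has already been consumed, so a literal "\)" that comes
     * next in the format has no input left to match. */
    assert(*cursor == '\0');
}
```

With the input cursor already at the end of the string, the escaped closing bracket in the format has nothing to consume, which is consistent with createFromFormat returning false in the report.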

So a patch like this works, and your sample code now returns the right object instead of false:

diff --git a/ext/date/lib/parse_date.c b/ext/date/lib/parse_date.c
index ea1602ef13b..fa71e8210ad 100644
--- a/ext/date/lib/parse_date.c
+++ b/ext/date/lib/parse_date.c
@@ -944,7 +944,9 @@ timelib_long timelib_parse_zone(const char **ptr, int *dst, timelib_time *t, int
 
 	*tz_not_found = 0;
 
+	size_t paren_count = 0;
 	while (**ptr == ' ' || **ptr == '\t' || **ptr == '(') {
+		paren_count += **ptr == '(';
 		++*ptr;
 	}
 	if ((*ptr)[0] == 'G' && (*ptr)[1] == 'M' && (*ptr)[2] == 'T' && ((*ptr)[3] == '+' || (*ptr)[3] == '-')) {
@@ -993,8 +995,9 @@ timelib_long timelib_parse_zone(const char **ptr, int *dst, timelib_time *t, int
 		*tz_not_found = (found == 0);
 		retval = offset;
 	}
-	while (**ptr == ')') {
+	while (**ptr == ')' && paren_count > 0) {
 		++*ptr;
+		paren_count--;
 	}
 	return retval;
 }
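The same idea as the patch, as a standalone sketch that can be compiled outside of php-src. Again this is not the real function: `parse_zone_balanced` and `balanced_examples` are hypothetical names, only `+HH:MM` / `-HH:MM` offsets are parsed, and the return value's sign convention is an assumption. Only the `paren_count` bookkeeping mirrors the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the patched timelib_parse_zone. */
static long parse_zone_balanced(const char **ptr)
{
    size_t paren_count = 0;
    long sign = 1, hours, minutes;
    char *end;

    /* Count the opening brackets while skipping them. */
    while (**ptr == ' ' || **ptr == '\t' || **ptr == '(') {
        paren_count += (**ptr == '(');
        ++*ptr;
    }
    if (**ptr == '+' || **ptr == '-') {
        sign = (**ptr == '-') ? -1 : 1;
        ++*ptr;
    }
    hours = strtol(*ptr, &end, 10);
    *ptr = end;
    if (**ptr == ':') {
        ++*ptr;
    }
    minutes = strtol(*ptr, &end, 10);
    *ptr = end;

    /* Eat at most as many ')' as '(' were skipped at the start. */
    while (**ptr == ')' && paren_count > 0) {
        ++*ptr;
        paren_count--;
    }
    return sign * (hours * 3600 + minutes * 60);
}

/* Call this from main() to check the three cases. */
static void balanced_examples(void)
{
    /* No opening bracket was eaten, so the ')' stays for the caller. */
    const char *a = "+08:00)";
    assert(parse_zone_balanced(&a) == 8 * 3600);
    assert(strcmp(a, ")") == 0);

    /* A matching pair is still consumed as a whole. */
    const char *b = "(+08:00)";
    assert(parse_zone_balanced(&b) == 8 * 3600);
    assert(*b == '\0');

    /* A surplus ')' is left in the input. */
    const char *c = "(+08:00))";
    assert(parse_zone_balanced(&c) == 8 * 3600);
    assert(strcmp(c, ")") == 0);
}
```

The first case is the one from the report: the closing bracket survives, so a trailing \) in the format can match it. The third case shows the kind of input whose handling changes, which is where the BC concern comes from.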

cc @derickr for opinions on the matter
