#9 contradicts #6 somewhat #5
Comments
You are right, those two clauses are a little contradictory, but they're meant to build on each other based on their order. Maybe clause 6 could mention that it does not apply to quoted fields. As for your suggestion, I don't think it will work: clauses 7 and 8 specify that any field containing a double quote (or other special characters) must be enclosed in double quotes, and that a double quote within a quoted field must be escaped by preceding it with another double quote. Sadly, I have seen CSV files that have spaces around the field separators while using quoted fields. That's why clause 9 exists :) All that said, it's been years since I last looked at this, and I don't recall how Excel, Google Sheets, and others deal with double quotes in these kinds of situations. I'll find some time in the next couple of days to refresh my memory and double-check that things still make sense.
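For illustration, the quoting rule described for clauses 7 and 8 could be sketched roughly as follows. The function name `quote_field` is hypothetical, and the exact set of "other special characters" is an assumption here (the delimiter and line breaks); the clauses themselves are the authority on that list.

```python
def quote_field(value: str, delimiter: str = ",") -> str:
    """Quote a field only when it contains a special character.

    Assumption: "special" means a double quote, the delimiter, or a
    line break. The spec's clauses 7 and 8 may list more or fewer.
    """
    special = ('"', delimiter, "\n", "\r")
    if any(ch in value for ch in special):
        # Escape each embedded double quote by doubling it, then wrap
        # the whole field in double quotes.
        return '"' + value.replace('"', '""') + '"'
    # Fields without special characters are written unchanged,
    # including any leading or trailing spaces.
    return value


print(quote_field('say "hi"'))
print(quote_field("a,b"))
print(quote_field(" padded "))
```

Note that a field with only surrounding spaces stays unquoted in this sketch, so a writer following it never emits a space next to a quoted field.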
I have done a quick comparison of Excel (Mac), Numbers (Mac) and Google Sheets. None of them puts spaces before or after the field separator. They all do textbook exports when quotes or field separators are in the fields: a single double quote at the start and end, and doubled quotes to escape. When I try to mess up those exported files by adding a space between the field separators and quoted fields I get these results: When I replace that space with a all three treat the field as unquoted, like Excel and Numbers above.
+1 for this. I wrote my .csv parser with the same logic, because a sane list-of-lists format allows a parser with fixed lookahead. That lets it output results while it is still reading data from the stream. Not like
#9 reads "When a field enclosed in double quotes has spaces before and/or after the double quotes, the spaces MUST be ignored"
But #6 says "Spaces are considered part of a field and MUST NOT be ignored."
In my view this overcomplicates the parsing. In practice it means the file cannot easily be parsed in a single forward pass. I'd have to remember whether the field has consisted only of spaces so far, and then, if I find a quote, go back to where the first non-space character was. Space is not a special character other than for this purpose.
My suggestion:
If the first character after the comma is a " (double quote), the field should be considered quoted - it spans until the next single ".
If the first character after the comma is not a double quote, the field should be interpreted as not quoted, so it spans until the next newline or the next comma, whichever comes first. If there are double quotes inside the field, treat them as regular characters.
This way I'd never have to go backwards.
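A minimal sketch of that suggestion, to show it can be done in one forward pass with a single character of lookahead. The name `parse_record` is hypothetical. Two things here are assumptions rather than part of the suggestion: a doubled "" inside a quoted field is read as one literal quote (the escaping rule from clause 8), and any text between a closing quote and the next comma is kept as-is.

```python
def parse_record(text: str, delimiter: str = ",") -> list[str]:
    """Parse one CSV record, moving strictly forward through `text`."""
    fields = []
    i, n = 0, len(text)
    while True:
        buf = []
        if i < n and text[i] == '"':
            # Quoted field: the very first character is a double quote.
            i += 1
            while i < n:
                if text[i] == '"':
                    # One character of lookahead: "" is an escaped quote.
                    if i + 1 < n and text[i + 1] == '"':
                        buf.append('"')
                        i += 2
                        continue
                    i += 1  # a lone quote closes the field
                    break
                buf.append(text[i])
                i += 1
        # Unquoted field, or (assumption) leftover text after a closing
        # quote: read up to the next delimiter or newline. Double quotes
        # and spaces here are ordinary characters.
        while i < n and text[i] != delimiter and text[i] != "\n":
            buf.append(text[i])
            i += 1
        fields.append("".join(buf))
        if i >= n or text[i] == "\n":
            return fields
        i += 1  # step over the delimiter


print(parse_record('a,"b,c",d'))
print(parse_record('a, "b",c'))
```

In the second call the space after the comma means the field is not quoted, so it comes back as ` "b"` with its space and quotes intact. That is consistent with #6 and needs no backtracking.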
Are there many csv exporters that use ", " as field delimiter for quoted fields?