#9 contradicts #6, somewhat #5

Open
Mdeuschl opened this issue Dec 25, 2021 · 3 comments

Comments

@Mdeuschl

#9 reads "When a field enclosed in double quotes has spaces before and/or after the double quotes, the spaces MUST be ignored"
But #6 says "Spaces are considered part of a field and MUST NOT be ignored."

In my view this overcomplicates the parsing. In practice it means the file cannot easily be parsed in a single forward pass. I'd have to remember whether the field started with spaces only and then, if I find a quote, go back to where the first non-space character was. Space is not a special character other than for this purpose.

My suggestion:
If the first character after the comma is a " (double quote) the field should be considered quoted - it spans until the next single ".
If the first character after the comma is not a double quote the field should be interpreted as not quoted thus it spans until the next new line or next comma whichever comes first. If there are double quotes inside the field then treat them as regular characters.
This way I'd never have to go backwards.
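
A minimal sketch of the rule suggested above, to show that it needs no backtracking. The function name `parse_forward_only` is hypothetical, and several details are assumptions the suggestion does not spell out: line endings are plain `\n`, a doubled quote inside a quoted field is read as one literal quote, and anything between a closing quote and the next delimiter is dropped.

```python
from typing import List


def parse_forward_only(text: str) -> List[List[str]]:
    """Parse CSV text, deciding quoted vs. unquoted from the first character only."""
    rows: List[List[str]] = []
    row: List[str] = []
    i, n = 0, len(text)
    while i < n:
        field: List[str] = []
        if text[i] == '"':
            # Quoted field: runs until the next quote that is not doubled.
            i += 1
            while i < n:
                if text[i] == '"':
                    if i + 1 < n and text[i + 1] == '"':
                        field.append('"')  # doubled quote becomes one literal quote
                        i += 2
                        continue
                    i += 1  # step over the closing quote
                    break
                field.append(text[i])
                i += 1
            # Assumption: ignore anything between the closing quote and the delimiter.
            while i < n and text[i] not in ",\n":
                i += 1
        else:
            # Unquoted field: runs to the next comma or newline; quotes are literal.
            while i < n and text[i] not in ",\n":
                field.append(text[i])
                i += 1
        row.append("".join(field))
        if i < n and text[i] == ",":
            i += 1
            if i < n:
                continue
            row.append("")  # input ended right after a comma
        else:
            i += 1  # step over the newline (or past the end of input)
        rows.append(row)
        row = []
    return rows
```

The index `i` only ever moves forward. A field such as ` "y"` (space before the quote) is kept verbatim as an unquoted field, which is exactly the point where this rule and clause 9 disagree.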

Are there many csv exporters that use ", " as field delimiter for quoted fields?

@jimeh
Member

jimeh commented Dec 28, 2021

You are right, those two clauses are a little contradictory, but they're meant to build on each other based on their order. Maybe clause 6 could mention that it does not apply to quoted fields.

As for your suggestion, I don't think it will work, as clauses 7 and 8 specify that any field containing a double quote (and other special characters) must be enclosed in double quotes, and a double quote within a quoted field must be escaped by preceding it with another double quote.
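
As a rough illustration of what clauses 7 and 8 ask of a writer, here is a small sketch. The helper name `quote_field` is hypothetical, and the set of "special characters" used here (comma, double quote, CR, LF) is an assumption rather than a quote from the spec.

```python
def quote_field(value: str) -> str:
    """Return the field as it would be written to a CSV line."""
    # Assumed set of characters that force quoting.
    special = (",", '"', "\n", "\r")
    if any(ch in value for ch in special):
        # Escape each double quote by doubling it, then wrap the field in quotes.
        return '"' + value.replace('"', '""') + '"'
    return value
```

For example, `quote_field('say "hi"')` produces `"say ""hi"""`, while a plain value such as `plain` is written unchanged.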

I have sadly seen CSV files which have spaces around the field separators while using quoted fields. That's why clause 9 exists :)

All that said, it's been years since I last looked at this, and I don't recall how Excel, Google Sheets, and others deal with double quotes in these kinds of situations. I'll find some time to refresh my memory in the next couple of days to double-check that things still make sense.

@Mdeuschl
Author

I have done a quick comparison of Excel (Mac), Numbers (Mac) and Google Sheets. None of them puts spaces before or after the field separator. They all produce textbook exports when quotes or field separators appear in a field: a single quote at the start and end, and doubled quotes to escape.
Excel.csv
Google Sheets.csv
Numbers.csv

When I try to mess up those exported files by adding a space between the field separators and quoted fields I get these results:
Excel and Numbers treat the field as unquoted, don't un-escape doubled quotes and treat a field separator inside the quoted field as a regular field separator.
Google Sheets ignores the space.

When I replace that space with a all three treat the field as unquoted like Excel and Numbers above.
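
Both behaviours for the space case can be reproduced with Python's standard `csv` module, which may help when experimenting. This is only an analogy and says nothing about how Excel, Numbers or Google Sheets are implemented. The sample line is made up for illustration.

```python
import csv

# A quoted field with one space between the separator and the opening quote.
line = 'a, "b,c",d'

# Default reader: the space makes the field unquoted, so the quotes stay
# literal and the comma inside them splits the field.
as_unquoted = next(csv.reader([line]))

# skipinitialspace=True: spaces right after the separator are ignored,
# so the field is recognised as quoted.
space_ignored = next(csv.reader([line], skipinitialspace=True))

print(as_unquoted)    # ['a', ' "b', 'c"', 'd']
print(space_ignored)  # ['a', 'b,c', 'd']
```

Note that `skipinitialspace` only covers spaces after the separator, not spaces between a closing quote and the next separator.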

@martin-eden

> If the first character after the comma is a " (double quote) the field should be considered quoted - it spans until the next single ".
> If the first character after the comma is not a double quote the field should be interpreted as not quoted thus it spans until the next new line or next comma whichever comes first.

+1 for this. I wrote my .csv parser with the same logic.

A sane list-of-lists format allows a parser with fixed lookahead, which lets it emit results while it is still reading data from the stream.

Compare that with a comma followed by a terabyte of spaces: we still cannot tell whether it is a padded quoted field or an unquoted field, and we will probably never find out, because the parser will die from memory exhaustion first.
