
auto()/autoDetectionForDataString() cannot determine delimiter with a single row file #206

Open
atbigelow opened this issue Aug 18, 2021 · 3 comments · Fixed by #208

Comments

@atbigelow

tab_test.csv

Running auto() or autoDetectionForDataString() on the contents of this file does not properly determine the delimiter.

For example:

```php
$csv = new \ParseCsv\Csv();
$csv->auto('tab_test.csv');
```

This should set `$csv->delimiter` to a tab (`\t`), but it is actually `false`.

My current workaround is to first check whether the file has more than one row. If it doesn't, I duplicate the row in $file_data so that autoDetectionForDataString() can detect the delimiter, and then call parse() on the original string. There doesn't seem to be a way to work around a plain auto() call.

@jimeh
Member

jimeh commented Aug 24, 2021

Out of curiosity, what happens if you remove the line break (\r\n) at the very end of the line?

The delimiter detection works by elimination: it looks for characters in each record (line) that are likely to be a delimiter and that appear the exact same number of times outside quoted fields on every line. So I suspect the detection gets confused when there's only a single record followed by a line break. It then thinks there are actually two records, and TAB (\t) definitely does not occur the same number of times in both of them.
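A minimal PHP sketch of the elimination process described above, assuming a simple line split and `"` as the enclosure (multi-line quoted fields are not handled). The names `countOutsideQuotes()` and `surviveElimination()`, the `$dropEmpty` flag, and the candidate lists are hypothetical and not part of parsecsv-for-php. It illustrates the suspected failure: a trailing line break produces an empty second record with zero tabs, which eliminates the tab.

```php
<?php
// Hypothetical sketch of delimiter detection by elimination.
// NOT the parsecsv-for-php implementation; names are illustrative.

// Count how often $char appears outside double-quoted fields in $line.
function countOutsideQuotes(string $line, string $char): int
{
    $count = 0;
    $inQuotes = false;
    $len = strlen($line);
    for ($i = 0; $i < $len; $i++) {
        if ($line[$i] === '"') {
            // An escaped quote ("") toggles twice, leaving the state unchanged.
            $inQuotes = !$inQuotes;
        } elseif ($line[$i] === $char && !$inQuotes) {
            $count++;
        }
    }
    return $count;
}

// Return every candidate that occurs the same non-zero number of times
// (outside quotes) in each record. Multi-line quoted fields are not handled.
function surviveElimination(string $data, array $candidates, bool $dropEmpty): array
{
    $records = preg_split('/\r\n|\n|\r/', $data);
    if ($dropEmpty) {
        // Ignore blank records, e.g. the "" produced by a trailing line break.
        $records = array_values(array_filter($records, fn (string $r): bool => $r !== ''));
    }
    $survivors = [];
    foreach ($candidates as $char) {
        $counts = array_map(fn (string $r): int => countOutsideQuotes($r, $char), $records);
        // Survive only if every record has the same, non-zero count.
        if ($counts !== [] && min($counts) > 0 && count(array_unique($counts)) === 1) {
            $survivors[] = $char;
        }
    }
    return $survivors;
}
```

For example, `surviveElimination("a\tb\tc\r\n", [",", "\t"], false)` returns an empty array, while passing `true` for `$dropEmpty` returns `["\t"]`.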

That said, it was around 15 years ago that I wrote the original delimiter detection implementation, and I'm not that familiar with the codebase anymore... lol

@gogowitsch
Member

@jimeh The line break at the end of the file was not the issue.

@atbigelow The function _guess_delimiter() only worked for input with two or more lines. I will open a pull request for code review by the community (including you!), and once two weeks have passed I will create a release.
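A hypothetical PHP sketch of a delimiter guess that also accepts a single record. The function name `guessDelimiter()`, the candidate order, and the "most occurrences per record wins" tie-break are assumptions for illustration, not the actual change made in #208; quote handling is omitted for brevity. The point it shows is that the consistency check only needs one record, since a single line trivially has "the same count on every line".

```php
<?php
// Hypothetical sketch: pick a delimiter even when there is only one record.
// Not the fix from #208; name, candidates and tie-break are assumptions.
// Quote handling is omitted for brevity.

function guessDelimiter(string $data, array $candidates = [",", ";", "\t", "|"]): ?string
{
    // Split into records and drop blanks (e.g. from a trailing line break).
    $records = array_values(array_filter(
        preg_split('/\r\n|\n|\r/', $data),
        fn (string $r): bool => $r !== ''
    ));
    if ($records === []) {
        return null; // nothing to inspect
    }

    $best = null;
    $bestCount = 0;
    foreach ($candidates as $char) {
        $counts = array_map(fn (string $r): int => substr_count($r, $char), $records);
        // One record always passes the consistency test, so a one-line
        // file is handled like any other input.
        if (min($counts) === 0 || count(array_unique($counts)) !== 1) {
            continue;
        }
        // Tie-break: more occurrences per record wins; on equal counts,
        // the earlier candidate in the list is kept.
        if ($counts[0] > $bestCount) {
            $best = $char;
            $bestCount = $counts[0];
        }
    }
    return $best;
}
```

For example, `guessDelimiter("name\tage\tcity\r\n")` returns `"\t"` even though the input holds just one record.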

Thanks to both of you for your input on this issue!

@jimeh
Member

jimeh commented Oct 16, 2021

@fonata ah, that makes sense. No clue why I wouldn't have thought to cater for that back in the day with the original implementation 🤦‍♂️
