FR: Allow dec == sep in fread to be applied to quoted data #6604

Open
iagogv3 opened this issue Nov 5, 2024 · 3 comments

Comments

@iagogv3
Contributor

iagogv3 commented Nov 5, 2024

Found a csv file with data formatted as follows:

13800,10864,"27,03","3,2","9,8"

If I pass neither dec nor sep, the columns are split correctly, but the quoted numeric data is read as character. If I specify only dec = ",", the columns are split wrongly (on \t or blanks instead of commas). If I specify both dec = "," and sep = ",", I get

 sep == dec (',') is not allowed

I'm aware that this is ambiguous, since 13800,10864 could be a single decimal number, but I would assume that dec = "," applies only to data inside quotes.

@ecoRoland2

Please show the output you get. I see this:

> fread(text = '13800,10864,"27,03","3,2","9,8"')
      V1    V2     V3     V4     V5
   <int> <int> <char> <char> <char>
1: 13800 10864  27,03    3,2    9,8

This is as expected, so I'm unsure where you see a bug. Quoted fields should be imported as character strings.

@iagogv3
Contributor Author

iagogv3 commented Nov 5, 2024

Indeed, I meant quoted numeric data in V3 to V5, so I would like to get

> fread(text = '13800,10864,"27,03","3,2","9,8"')
      V1    V2     V3     V4     V5
   <int> <int>  <num>  <num>  <num>
1: 13800 10864  27.03    3.2    9.8

@iagogv3 changed the title from "FR: Allow dec == sep in fread" to "FR: Allow dec == sep in fread to be applied to quoted data" on Nov 5, 2024
@ecoRoland2

Well, then I think you are asking too much here. You have quoted strings but don't want to parse them as strings, which is against common convention. Ignoring that issue, you would have commas both as column separator and as decimal separator, which simply isn't a valid file format. I suggest you post-process after importing, e.g., as.numeric(sub(",", ".", "27,03", fixed = TRUE)).
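
The post-processing suggested above could be applied to the whole table along these lines. This is only a minimal sketch: the helper name comma_to_numeric is made up for illustration, and it assumes that every column fread returned as character holds decimal-comma numbers (a genuine text column would become NA with a warning).

```r
library(data.table)

# Read with the defaults: sep is detected as ",", quoted fields stay character
dt <- fread(text = '13800,10864,"27,03","3,2","9,8"')

# Hypothetical helper (not part of data.table): swap the decimal comma
# for a dot, then convert the strings to numeric
comma_to_numeric <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))

# Assumption: all character columns are decimal-comma numbers (V3 to V5 here)
chr_cols <- names(dt)[vapply(dt, is.character, logical(1))]

# Convert those columns in place, by reference
dt[, (chr_cols) := lapply(.SD, comma_to_numeric), .SDcols = chr_cols]
```

If only some of the character columns are numeric, passing their names explicitly in chr_cols instead of detecting them by type would be the safer choice.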
