Improve and document CSV reading options #787

Open
koperagen opened this issue Jul 17, 2024 · 3 comments
Labels: csv, documentation, enhancement

Comments

@koperagen
Collaborator

koperagen commented Jul 17, 2024

https://youtrack.jetbrains.com/issue/KT-69798
Unlike readCSV, readDelim exposes the underlying Apache Commons CSV CSVFormat, which offers many more configuration options. Judging by the linked issue, the arguments that readCSV provides are not enough to read every kind of file.
So at the very least, readDelim needs to be documented here:
https://kotlin.github.io/dataframe/read.html
Alternatively, readCSV should get more options, but I'd avoid exposing CSVFormat in readCSV.

@koperagen koperagen added documentation Improvements or additions to documentation (not KDocs) enhancement New feature or request labels Jul 17, 2024
@zaleslaw zaleslaw added this to the 0.14.0 milestone Jul 19, 2024
@zaleslaw
Collaborator

zaleslaw commented Aug 7, 2024

We agreed to split the issue into two parts:

  • answer the use-case
  • provide a default

@Jolanrensen
Collaborator

Jolanrensen commented Aug 8, 2024

For CSV, I suggest the following:

  • Split readDelim and readCsv into different files, possibly even a different module.
  • Deprecate the .read() overload; it should only exist in guess.kt.
  • Add KDocs for each overload.
  • Check which CSVFormat options (such as the quote character) we can expose as plain parameters of readDelim and readCsv, so that there are overloads that don't need CSVFormat.
  • Keep one overload each of readDelim and readCsv that takes a CSVFormat, so people can still configure advanced settings.
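
To make the last two points concrete, here is a rough, self-contained sketch of how a convenience overload could delegate to an advanced one. Everything in it is hypothetical: `DelimFormat` is only a stand-in for CSVFormat, `readRows` is not a DataFrame function, and the options shown (delimiter, trimming, skipping empty lines) were picked to keep the sketch short. Quote handling is deliberately left out.

```kotlin
// Hypothetical stand-in for an advanced format object such as CSVFormat.
data class DelimFormat(
    val delimiter: Char = ',',
    val trim: Boolean = false,
    val ignoreEmptyLines: Boolean = true,
)

// Advanced overload: the caller passes a fully configured format.
fun readRows(text: String, format: DelimFormat): List<List<String>> =
    text.lines()
        // Drop blank lines only when the format asks for it.
        .filter { !(format.ignoreEmptyLines && it.isBlank()) }
        .map { line ->
            line.split(format.delimiter).map { if (format.trim) it.trim() else it }
        }

// Convenience overload: common options are plain parameters,
// so the format type never shows up in the signature.
fun readRows(text: String, delimiter: Char = ',', trim: Boolean = false): List<List<String>> =
    readRows(text, DelimFormat(delimiter = delimiter, trim = trim))

fun main() {
    // Simple call through the convenience overload.
    println(readRows("a; b\n\nc;d", delimiter = ';', trim = true))
    // Advanced call that keeps empty lines.
    println(readRows("a,b\n\nc", DelimFormat(ignoreEmptyLines = false)))
}
```

With this layering, only the advanced overload has to understand the format object; the convenience overload just maps its parameters onto it.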

@unrec

unrec commented Aug 9, 2024

Thanks everyone for working on this issue.

I agree that exposing CSVFormat in the readCSV function is not a good idea, but there should be some flexibility in how CSV parsing is configured.
I ran into this with a .csv file that has a quote character in some rows, so as I understand it there are two possible options:

  • being able to set a specific quote character instead of "
  • being able to disable the quote character completely

This should be added as an optional parameter to the readCsv function instead of readDelim.
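
For illustration, the two options can be shown with a minimal line splitter. This is only a sketch under stated assumptions: `splitLine` and its `quote` parameter are hypothetical names, not the DataFrame or Apache Commons CSV API, and the splitter handles a single line with no multi-line fields. A nullable `quote` covers both cases: a different character changes the quote mark, and `null` disables quoting.

```kotlin
// Hypothetical helper, not part of Kotlin DataFrame or Apache Commons CSV.
// Splits one line into fields. Passing quote = null disables quoting entirely.
fun splitLine(line: String, delimiter: Char = ',', quote: Char? = '"'): List<String> {
    val fields = mutableListOf<String>()
    val current = StringBuilder()
    var inQuotes = false
    var i = 0
    while (i < line.length) {
        val c = line[i]
        when {
            quote != null && c == quote -> {
                if (inQuotes && i + 1 < line.length && line[i + 1] == quote) {
                    // A doubled quote inside a quoted field is a literal quote.
                    current.append(quote)
                    i++
                } else {
                    // Otherwise the quote opens or closes a quoted section.
                    inQuotes = !inQuotes
                }
            }
            c == delimiter && !inQuotes -> {
                // An unquoted delimiter ends the current field.
                fields += current.toString()
                current.setLength(0)
            }
            else -> current.append(c)
        }
        i++
    }
    fields += current.toString()
    return fields
}

fun main() {
    val line = "5\" pipe,steel"
    // Default quote: the stray " opens a quoted section that swallows the comma.
    println(splitLine(line).size)               // 1
    // Quoting disabled: the " is ordinary data and the comma splits the line.
    println(splitLine(line, quote = null).size) // 2
}
```

A stray quote in the data is exactly the case where the default behaves badly: the quoted section never closes, so the delimiter is ignored until the end of the line.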

@Jolanrensen Jolanrensen added the csv CSV / delim related issues label Aug 20, 2024
@Jolanrensen Jolanrensen mentioned this issue Aug 20, 2024
19 tasks
@Jolanrensen Jolanrensen modified the milestones: 0.14.0, 0.15.0 Aug 20, 2024

4 participants