I need to use this feature when working with very large csv files, which I usually keep compressed with gzip or zstd (which supports significantly faster decompression speed). For the moment, I use xsv from https://github.com/BurntSushi/xsv which does exactly what has been asked above. However, it outputs uncompressed csv chunks only. I haven't figured out a way to output chunks compressed with gzip or zstd. This feature would be a very useful addition to csvtk.
You can change the chunk size by replacing 1 and 2 in '($row+1)/2' with other positive integers: for chunks of n rows use '($row+n-1)/n', e.g., '($row+2)/3' for chunks of 3 rows.
The idea is to:
csvtk grep -p '.*' -r -n -N : add a column row with row numbers
csvtk mutate2 -n chunk0 -e '($row+1)/2' : obtain the chunk as a float number
csvtk mutate -f chunk0 -n chunk -p '([0-9]+).*$' : keep only the integer part of chunk0, i.e. floor($chunk0), which equals ceiling($row/2).
csvtk cut -f -row,-chunk0 : remove the extra row and chunk0 columns.
csvtk split -f chunk -o split : split the input file.
As all split files have an extra column chunk, we remove this column using sed -i 's/,\(chunk\|[0-9]\+\)$//' split/stdin*.csv
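The chunk arithmetic used in the steps above can be checked with a short Python sketch. The helper chunk_index is hypothetical (it is not part of csvtk); it assumes rows are numbered from 1 and takes the integer part of (row + n - 1) / n, which is what the mutate2 and mutate steps compute together.

```python
def chunk_index(row: int, n: int) -> int:
    """Integer part of (row + n - 1) / n, i.e. ceiling(row / n) for row >= 1.

    Hypothetical helper for illustration only; not a csvtk function.
    """
    if row < 1 or n < 1:
        raise ValueError("row and n must be positive integers")
    return (row + n - 1) // n


rows = range(1, 6)  # the 5 data rows of the example file

# '($row+1)/2' followed by truncation: chunks of 2 rows
print([chunk_index(r, 2) for r in rows])  # [1, 1, 2, 2, 3]

# '($row+2)/3' followed by truncation: chunks of 3 rows
print([chunk_index(r, 3) for r in rows])  # [1, 1, 1, 2, 2]
```

Rows that share the same chunk value end up in the same output file after csvtk split -f chunk.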
This is a feature request for the csvtk split command to have an additional --nlines option, so that it behaves similarly to GNU coreutils split --lines (https://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html) but handles the header row properly, repeating it in every chunk.
E.g. we have a file with 5 entries:
a,b,c,d
1,2,3,4
2,3,4,5
3,4,5,6
4,5,6,7
5,6,7,8
We run csvtk split --nlines 2, which produces chunks of 2 entries per file:
##file1
a,b,c,d
1,2,3,4
2,3,4,5
##file2
a,b,c,d
3,4,5,6
4,5,6,7
##file3
a,b,c,d
5,6,7,8
Thanks in advance
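Until such an option exists, the requested behaviour can be approximated outside csvtk. The following is a minimal Python sketch, not csvtk's implementation: the function name split_csv, the fileN.csv naming, and the compress parameter are all assumptions made for illustration. It repeats the header in every chunk and can optionally write gzip-compressed chunks (zstd is not covered here).

```python
import csv
import gzip
import itertools
import tempfile
from pathlib import Path


def split_csv(src, out_dir, nlines, compress=False):
    """Write chunks of at most `nlines` data rows, each starting with the header.

    Hypothetical helper; file names (file1.csv, file2.csv, ...) are an assumption.
    Returns the list of paths written.
    """
    if nlines < 1:
        raise ValueError("nlines must be a positive integer")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    opener = gzip.open if compress else open
    suffix = ".csv.gz" if compress else ".csv"
    written = []
    with open(src, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to split
        for i in itertools.count(1):
            # Take the next `nlines` data rows; stop when none are left.
            rows = list(itertools.islice(reader, nlines))
            if not rows:
                break
            path = out_dir / f"file{i}{suffix}"
            with opener(path, "wt", newline="") as out:
                writer = csv.writer(out, lineterminator="\n")
                writer.writerow(header)  # header repeated in every chunk
                writer.writerows(rows)
            written.append(path)
    return written


# Usage on the example file from this request, in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.csv"
    src.write_text("a,b,c,d\n1,2,3,4\n2,3,4,5\n3,4,5,6\n4,5,6,7\n5,6,7,8\n")
    for part in split_csv(src, Path(tmp) / "split", nlines=2):
        print(f"##{part.name}")
        print(part.read_text(), end="")
```

Passing compress=True writes file1.csv.gz, file2.csv.gz, and so on instead of plain CSV files.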