Recent changes: csvtk filter2/mutate2/mutate3, csvtk csv2json, and csvtk pretty (-w/--min-width and -W/--max-width now accept multiple values for setting column-specific thresholds; a new style round for rounded corners). See the release history below for details.

Similar to the FASTA/Q formats in the field of bioinformatics, CSV/TSV are basic and ubiquitous file formats in both bioinformatics and data science.
People usually use spreadsheet software like MS Excel to process tabular data, but this is all done by clicking and typing, which is not automated and is time-consuming to repeat, especially when similar operations need to be applied to different datasets or for different purposes.

You can also accomplish some CSV/TSV manipulations with shell commands, but more code is needed to handle the header line, and shell commands do not support selecting columns by column name either.

csvtk is convenient for rapid data investigation and easy to integrate into analysis pipelines. It could save you lots of time in (not) writing Python/R scripts.
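For example, a typical pipeline simply chains several subcommands together. A minimal sketch, using the testdata/names.csv example file that appears later in this document (output omitted; every flag used here is described in the sections below):

$ cat testdata/names.csv \
    | csvtk grep -f first_name -r -p Rob \
    | csvtk cut -f first_name,last_name,username \
    | csvtk pretty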
Features include unselecting fields (-f "-id,-name" for all fields except "id" and "name"), fuzzy fields (-F -f "a.*" for all fields with the prefix "a."), and support for the separator declaration line (sep=,) used by MS Excel. See also the comparison with csvkit below.

There are 54 subcommands in total, grouped into the following categories.
Information

- headers: print headers
- dim: dimensions of CSV file
- nrow: print number of records
- ncol: print number of columns
- summary: summary statistics of selected numeric or text fields (groupby group fields)
- watch: online monitoring and histogram of selected field
- corr: calculate Pearson correlation between numeric columns

Format conversion

- pretty: convert CSV to a readable aligned table
- csv2tab: convert CSV to tabular format
- tab2csv: convert tabular format to CSV
- space2tab: convert space-delimited format to TSV
- csv2md: convert CSV to markdown format
- csv2rst: convert CSV to reStructuredText format
- csv2json: convert CSV to JSON format
- csv2xlsx: convert CSV/TSV files to XLSX file
- xlsx2csv: convert XLSX to CSV format

Set operations

- head: print first N records
- concat: concatenate CSV/TSV files by rows
- sample: sampling by proportion
- cut: select and arrange fields
- grep: grep data by selected fields with patterns/regular expressions
- uniq: unique data without sorting
- freq: frequencies of selected fields
- inter: intersection of multiple files
- filter: filter rows by values of selected fields with arithmetic expression
- filter2: filter rows by awk-like arithmetic/string expressions
- join: join files by selected fields (inner, left and outer join)
- split: split CSV/TSV into multiple files according to column values
- splitxlsx: split XLSX sheet into multiple sheets according to column values
- comb: compute combinations of items at every row

Edit

- fix: fix CSV/TSV with different numbers of columns in rows
- fix-quotes: fix malformed CSV/TSV caused by double-quotes
- del-quotes: remove extra double-quotes added by fix-quotes
- add-header: add column names
- del-header: delete column names
- rename: rename column names with new names
- rename2: rename column names by regular expression
- replace: replace data of selected fields by regular expression
- round: round float to n decimal places
- mutate: create new columns from selected fields by regular expression
- mutate2: create a new column from selected fields by awk-like arithmetic/string expressions
- mutate3: create a new column from selected fields with Go-like expressions
- fmtdate: format date of selected fields

Transform

- transpose: transpose CSV data
- sep: separate column into multiple columns
- gather: gather columns into key-value pairs, like tidyr::gather/pivot_longer
- spread: spread a key-value pair across multiple columns, like tidyr::spread/pivot_wider
- unfold: unfold multiple values in cells of a field
- fold: fold multiple values of a field into cells of groups

Ordering

- sort: sort by selected fields

Plotting

- plot: see usage
  - plot hist: histogram
  - plot box: boxplot
  - plot line: line plot and scatter plot

Misc

- cat: stream file and report progress
- version: print version information and check for update
- genautocomplete: generate shell autocompletion script (bash|zsh|fish|powershell)

Download Page
csvtk is implemented in the Go programming language; executable binary files for most popular operating systems are freely available on the release page. Just download the compressed executable file for your operating system and decompress it with the tar -zxvf *.tar.gz command or another tool. Then:
For Linux-like systems

If you have root privilege, simply copy it to /usr/local/bin:

sudo cp csvtk /usr/local/bin/

Or copy it to any directory in the environment variable PATH:

mkdir -p $HOME/bin/; cp csvtk $HOME/bin/

For Windows, just copy csvtk.exe to C:\WINDOWS\system32.
Method 2: Install via conda

# >= v0.31.0
conda install -c conda-forge csvtk

# <= v0.31.0
conda install -c bioconda csvtk
"},{"location":"#method-3-install-via-homebrew","title":"Method 3: Install via homebrew","text":"brew install csvtk\n
"},{"location":"#method-4-for-go-developer-latest-stabledev-version","title":"Method 4: For Go developer (latest stable/dev version)","text":"go get -u github.com/shenwei356/csvtk/csvtk\n
"},{"location":"#method-5-for-archlinux-aur-users-may-be-not-the-latest","title":"Method 5: For ArchLinux AUR users (may be not the latest)","text":"yaourt -S csvtk\n
"},{"location":"#command-line-completion","title":"Command-line completion","text":"Bash:
# generate completion shell\ncsvtk genautocomplete --shell bash\n\n# configure if never did.\n# install bash-completion if the \"complete\" command is not found.\necho \"for bcfile in ~/.bash_completion.d/* ; do source \\$bcfile; done\" >> ~/.bash_completion\necho \"source ~/.bash_completion\" >> ~/.bashrc\n
Zsh:
# generate completion shell\ncsvtk genautocomplete --shell zsh --file ~/.zfunc/_csvtk\n\n# configure if never did\necho 'fpath=( ~/.zfunc \"${fpath[@]}\" )' >> ~/.zshrc\necho \"autoload -U compinit; compinit\" >> ~/.zshrc\n
fish:
csvtk genautocomplete --shell fish --file ~/.config/fish/completions/csvtk.fish\n
"},{"location":"#compared-to-csvkit","title":"Compared to csvkit
","text":"csvkit, attention: this table wasn't updated for many years.
Features csvtk csvkit Note Read Gzip Yes Yes read gzip files Fields ranges Yes Yes e.g.-f 1-4,6
Unselect fields Yes -- e.g. -1
for excluding first column Fuzzy fields Yes -- e.g. ab*
for columns with name prefix \"ab\" Reorder fields Yes Yes it means -f 1,2
is different from -f 2,1
Rename columns Yes -- rename with new name(s) or from existed names Sort by multiple keys Yes Yes bash sort like operations Sort by number Yes -- e.g. -k 1:n
Multiple sort Yes -- e.g. -k 2:r -k 1:nr
Pretty output Yes Yes convert CSV to readable aligned table Unique data Yes -- unique data of selected fields frequency Yes -- frequencies of selected fields Sampling Yes -- sampling by proportion Mutate fields Yes -- create new columns from selected fields Replace Yes -- replace data of selected fields Similar tools:
More examples and tutorial.
Attention

- By default, csvtk assumes input files have a header row; if not, switch flag -H on.
- By default, csvtk handles CSV files; use flag -t for tab-delimited files.
- Lines starting with # will be ignored; if the header row starts with #, please assign flag -C another rare symbol, e.g. $.
- Use -I/--ignore-illegal-row to skip illegal lines if necessary. You can also use "csvtk fix" to fix files with different numbers of columns in rows.

If double-quotes exist in fields not enclosed with double-quotes, e.g.,

x,a "b" c,1

it would report an error:

bare `"` in non-quoted-field.

Please switch on the flag -l or use csvtk fix-quotes to fix it.

If some fields have only a double-quote either at the beginning or at the end, e.g.,

x,d "e","a" b c,1

it would report an error:

extraneous or missing " in quoted-field

Please use csvtk fix-quotes to fix it, and use csvtk del-quotes to reset to the original format as needed.
Examples
Pretty result
$ csvtk pretty names.csv\nid first_name last_name username\n-- ---------- --------- --------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n\n$ csvtk pretty names.csv -S 3line\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n id first_name last_name username\n----------------------------------------\n 11 Rob Pike rob\n 2 Ken Thompson ken\n 4 Robert Griesemer gri\n 1 Robert Thompson abc\n NA Robert Abel 123\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n\n$ csvtk pretty names.csv -S bold -w 5 -m 1-\n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 id \u2503 first_name \u2503 last_name \u2503 username \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 11 \u2503 Rob \u2503 Pike \u2503 rob \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 2 \u2503 Ken \u2503 Thompson \u2503 ken \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 4 \u2503 Robert \u2503 Griesemer \u2503 gri \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 1 \u2503 Robert \u2503 Thompson \u2503 abc \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 NA \u2503 Robert \u2503 Abel \u2503 123 \u2503\n\u2517\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u251b\n
Summary of selected numeric fields, supporting \"group-by\"
$ cat testdata/digitals2.csv \\\n | csvtk summary -i -f f4:sum,f5:sum -g f1,f2 \\\n | csvtk pretty\nf1 f2 f4:sum f5:sum\nbar xyz 7.00 106.00\nbar xyz2 4.00 4.00\nfoo bar 6.00 3.00\nfoo bar2 4.50 5.00\n
Select fields/columns (cut)

- By indexes: csvtk cut -f 1,2
- By names: csvtk cut -f first_name,username
- Unselect fields: csvtk cut -f -1,-2 or csvtk cut -f -first_name
- Fuzzy fields: csvtk cut -F -f "*_name,username"
- Field ranges: csvtk cut -f 2-4 for columns 2, 3 and 4, or csvtk cut -f -3--1 for discarding columns 1, 2 and 3
- All fields: csvtk cut -f 1- or csvtk cut -F -f "*" (a worked example follows below)
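A minimal sketch of cut, applied to the names.csv example file (testdata/names.csv, shown later in this document); the two selected columns are returned in the requested order, and the exact quoting of the output may differ:

$ csvtk cut -f last_name,id testdata/names.csv
last_name,id
Pike,11
Thompson,2
Griesemer,4
Thompson,1
Abel,NA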
Search by selected fields (grep); matched parts will be highlighted in red

- csvtk grep -f first_name -p Robert -p Rob
- Regular expression: csvtk grep -f first_name -r -p Rob
- Patterns from a file: csvtk grep -f first_name -P name_list.txt
- Invert the match across all fields, e.g. to remove rows with empty cells: csvtk grep -F -f "*" -r -p "^$" -v (see the example below)
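A minimal worked example of grep on testdata/names.csv (the same results are shown with csvtk pretty in the Chinese introduction below):

# exact match on the id column
$ cat testdata/names.csv | csvtk grep -f id -p 1
id,first_name,last_name,username
1,Robert,Thompson,abc

# with -r, the pattern 1 is treated as a regular expression and also matches id 11
$ cat testdata/names.csv | csvtk grep -f id -p 1 -r
id,first_name,last_name,username
11,Rob,Pike,rob
1,Robert,Thompson,abc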
Rename column names (rename and rename2)

- Rename with new names: csvtk rename -f A,B -n a,b or csvtk rename -f 1-3 -n a,b,c
- Rename by regular expression: csvtk rename2 -f 1- -p "(.*)" -r 'prefix_$1' adds a prefix to all column names, as illustrated below.
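A minimal sketch of the prefix example above, applied to testdata/names.csv; only the header row changes, so just the first line of the output is shown (via head):

$ csvtk rename2 -f 1- -p "(.*)" -r 'prefix_$1' testdata/names.csv | head -n 1
prefix_id,prefix_first_name,prefix_last_name,prefix_username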
Edit data with regular expression (replace)

- Remove Han (Chinese) characters from the *_name columns: csvtk replace -F -f "*_name" -p "\p{Han}+" -r ""
Create new column from selected fields by regular expression (mutate)

- csvtk mutate -f id
- csvtk mutate -f sample -n group -p "^(.+?)\." --after sample
Sort by multiple keys (sort)

- By a single column: csvtk sort -k 1 or csvtk sort -k last_name
- By multiple columns: csvtk sort -k 1,2, or csvtk sort -k 1 -k 2, or csvtk sort -k last_name,age
- Sort by number: csvtk sort -k 1:n, or csvtk sort -k 1:nr for reversed numeric order
- Mixed keys and types: csvtk sort -k region -k age:n -k id:nr
- Sort in natural order: csvtk sort -k chr:N
Join multiple files by keys (join)

- csvtk join -f id file1.csv file2.csv
- Keys with different names in different files, keeping unmatched records: csvtk join -f "username;username;name" names.csv phone.csv adress.csv -k (a worked example follows below)
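A minimal sketch of an inner join on the username column, using testdata/names.csv and testdata/phones.csv (both shown later in this document). Rows without a match are dropped unless -k/--keep-unmatched is given, so the expected output would be roughly:

$ csvtk join -f username testdata/names.csv testdata/phones.csv
id,first_name,last_name,username,phone
11,Rob,Pike,rob,12345
2,Ken,Thompson,ken,22222
4,Robert,Griesemer,gri,11111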
Filter by numbers (filter)

- csvtk filter -f "id>0"
- csvtk filter -f "1-3>0"
- Use --any to print a record if any of the fields satisfies the condition: csvtk filter -f "1-3>0" --any
- Fuzzy fields: csvtk filter -F -f "A*!=0" (see the commented sketch below)
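A commented sketch of the filter flags above, run on a hypothetical file data.csv with numeric columns (output omitted):

# keep rows where all of columns 1-3 are greater than 0
$ csvtk filter -f "1-3>0" data.csv

# keep rows where any of columns 1-3 is greater than 0
$ csvtk filter -f "1-3>0" --any data.csv

# fuzzy fields: all columns whose names start with "A" must be non-zero
$ csvtk filter -F -f "A*!=0" data.csv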
Filter rows by awk-like arithmetic/string expressions (filter2)

- csvtk filter2 -f '$3>0'
- csvtk filter2 -f '$id > 0'
- csvtk filter2 -f '$id > 3 || $username=="ken"'
- csvtk filter2 -H -t -f '$1 > 2 && $2 % 2 == 0'
Plotting

csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 | display

csvtk -t plot box testdata/grouped_data.tsv.gz -g "Group" \
    -f "GC Content" --width 3 --title "Box plot" | display

csvtk -t plot box testdata/grouped_data.tsv.gz -g "Group" -f "Length" \
    --height 3 --width 5 --horiz --title "Horiz box plot" | display

csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group | display

csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group --scatter | display
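The commands above pipe the figure to the display program; presumably the image can instead be written to a file with the global -o/--out-file flag documented in the usage section. A minimal, unverified sketch, assuming the image format is chosen from the file extension:

$ csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 -o hist.png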
We are grateful to Zhiluo Deng and Li Peng for suggesting features and reporting bugs. Thanks to Albert Vilella for feature suggestions, which make csvtk feature-rich.
"},{"location":"#contact","title":"Contact","text":"Create an issue to report bugs, propose new functions or ask for help.
Or leave a comment.
"},{"location":"#license","title":"License","text":"MIT License
"},{"location":"#starchart","title":"Starchart","text":""},{"location":"bioinf/","title":"Bioinf","text":""},{"location":"chinese/","title":"\u4e2d\u6587\u4ecb\u7ecd","text":"\u5982\u540c\u751f\u7269\u4fe1\u606f\u9886\u57df\u4e2d\u7684FASTA/Q\u683c\u5f0f\u4e00\u6837\uff0cCSV/TSV\u4f5c\u4e3a\u8ba1\u7b97\u673a\u3001\u6570\u636e\u79d1\u5b66\u548c\u751f\u7269\u4fe1\u606f\u7684\u57fa\u672c\u683c\u5f0f\uff0c\u5e94\u7528\u975e\u5e38\u5e7f\u6cdb\u3002\u5e38\u7528\u7684\u5904\u7406\u8f6f\u4ef6\u5305\u62ec\uff1a
\u7136\u800c\uff0c\u7535\u5b50\u8868\u683c\u8f6f\u4ef6\u548c\u6587\u672c\u7f16\u8f91\u5668\u56fa\u7136\u5f3a\u5927\uff0c\u4f46\u4f9d\u8d56\u9f20\u6807\u64cd\u4f5c\uff0c\u4e0d\u9002\u5408\u6279\u91cf\u5904\u7406\uff1bsed/awk/cut\u7b49Shell\u547d\u4ee4\u4e3b\u8981\u7528\u4e8e\u901a\u7528\u7684\u8868\u683c\u6570\u636e\uff0c\u4e0d\u9002\u5408\u542b\u6709\u6807\u9898\u884c\u7684CSV\u683c\u5f0f\uff1b\u4e3a\u4e86\u4e00\u4e2a\u5c0f\u64cd\u4f5c\u5199Python/R\u811a\u672c\u4e5f\u6709\u70b9\u5c0f\u9898\u5927\u4f5c\uff0c\u4e14\u96be\u4ee5\u590d\u7528\u3002
\u5f00\u53d1csvtk\u524d\u73b0\u6709\u7684\u5de5\u5177\u4e3b\u8981\u662fPython\u5199\u7684csvkit\uff0cRust\u5199\u7684xsv\uff0cC\u8bed\u8a00\u5199\u7684miller\uff0c\u90fd\u5404\u6709\u4f18\u52a3\u3002\u5f53\u65f6\u6211\u521a\u5f00\u53d1\u5b8cseqkit\uff0c\u6295\u6587\u7ae0\u8fc7\u7a0b\u4e2d\u65f6\u95f4\u5145\u8db3\uff0c\u4fbf\u60f3\u8d81\u70ed\u518d\u9020\u4e00\u4e2a\u8f6e\u5b50\u3002
\u6240\u4ee5\u6211\u51b3\u5b9a\u5199\u4e00\u4e2a\u547d\u4ee4\u884c\u5de5\u5177\u6765\u6ee1\u8db3CSV/TSV\u683c\u5f0f\u7684\u5e38\u89c1\u64cd\u4f5c\uff0c\u8fd9\u5c31\u662fcsvtk\u4e86\u3002
"},{"location":"chinese/#_1","title":"\u4ecb\u7ecd","text":"\u57fa\u672c\u4fe1\u606f
\u7279\u6027
\u5728\u5f00\u53d1csvtk\u4e4b\u524d\u7684\u4e24\u4e09\u5e74\u95f4\uff0c\u6211\u5df2\u7ecf\u5199\u4e86\u51e0\u4e2a\u53ef\u4ee5\u590d\u7528\u7684Python/Perl\u811a\u672c\uff08https://github.com/shenwei356/datakit\uff09 \uff0c\u5305\u62eccsv2tab\u3001csvtk_grep\u3001csv_join\u3001csv_melt\uff0cintersection\uff0cunique\u3002\u6240\u4ee5\u6211\u7684\u8ba1\u5212\u662f\u9996\u5148\u96c6\u6210\u8fd9\u4e9b\u5df2\u6709\u7684\u529f\u80fd\uff0c\u968f\u540e\u6839\u636e\u9700\u6c42\u8fdb\u884c\u6269\u5c55\u3002
\u5230\u76ee\u524d\u4e3a\u6b62\uff0ccsvtk\u5df2\u670927\u4e2a\u5b50\u547d\u4ee4\uff0c\u5206\u4e3a\u4ee5\u4e0b\u51e0\u5927\u7c7b\uff1a
- headers: print the header row intuitively (best used before working on a CSV with many columns)
- stats: basic statistics
- stats2: basic statistics of selected numeric columns
- pretty: convert to a pretty, readable format (one of the most used commands)
- csv2tab: convert CSV to tab-delimited format (TSV)
- tab2csv: convert TSV to CSV
- space2tab: convert space-delimited format to TSV
- transpose: transpose CSV/TSV
- csv2md: convert CSV/TSV to markdown format (handy for writing documents)
- head: print the first N records
- sample: random sampling by proportion
- cut: select specific columns; supports basic selection, range selection, fuzzy selection and negative selection by column index or name (one of the most used commands, very powerful)
- uniq: without sorting, return records that are unique by the specified column(s) as the key (a bit of a mouthful)
- freq: count by the specified column(s) (commonly used)
- inter: intersection of multiple files
- grep: search with the specified column(s) as the key (one of the most used commands; search by specified columns)
- filter: filter by the numeric values of the specified column(s)
- filter2: filter by the values of the specified column(s) with awk-like numeric/string expressions
- join: join multiple files (commonly used)
- rename: directly rename the specified column name(s) (simple and practical)
- rename2: rename the specified column name(s) with regular expressions (simple and practical)
- replace: edit the specified column(s) by regular-expression replacement (one of the most used commands; edit by specified columns)
- mutate: create a new column from an existing column with a regular expression (often used to generate multi-column test data)
- mutate2: create a new column from existing column(s) with awk-like numeric/string expressions (commonly used)
- gather: similar to the gather function of the tidyr package in R
- sort: sort by the specified column(s)
- plot: basic plotting
  - plot hist: histogram
  - plot box: boxplot
  - plot line: line plot and scatter plot
- version: version information and checking for new versions
- genautocomplete: generate a configuration file for Bash auto-completion; restart the terminal for it to take effect

Notes:

- Switch on -H if there is no header row.
- Use -t for tab-delimited files.
- If fields contain double quotes (""), enable the global flag -l.
- Lines starting with # are comment lines; if the header row contains #, assign another uncommon character (e.g. $) to the global flag -C.

Only a few examples are provided here; for more, see the usage manual: http://bioinf.shenwei.me/csvtk/usage/ .
Example data
$ cat names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n
Improving readability
$ cat names.csv | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
Converting to markdown
$ cat names.csv | csvtk csv2md\nid |first_name|last_name|username\n:--|:---------|:--------|:-------\n11 |Rob |Pike |rob\n2 |Ken |Thompson |ken\n4 |Robert |Griesemer|gri\n1 |Robert |Thompson |abc\nNA |Robert |Abel |123\n
Rendered result:
id   first_name   last_name   username
11   Rob          Pike        rob
2    Ken          Thompson    ken
4    Robert       Griesemer   gri
1    Robert       Thompson    abc
NA   Robert       Abel        123

Selecting specific columns by index or name (the column order can be changed)
$ cat names.csv | csvtk cut -f 3,1 | csvtk pretty\n$ cat names.csv | csvtk cut -f last_name,id | csvtk pretty\nlast_name id\nPike 11\nThompson 2\nGriesemer 4\nThompson 1\nAbel NA\n
Selecting multiple columns with wildcards
$ cat names.csv | csvtk cut -F -f '*name,id' | csvtk pretty\nfirst_name last_name username id\nRob Pike rob 11\nKen Thompson ken 2\nRobert Griesemer gri 4\nRobert Thompson abc 1\nRobert Abel 123 NA\n
Deleting columns 2 and 3 (the second method below selects a range, with -3 first and -2 second)
$ cat names.csv | csvtk cut -f -2,-3 | csvtk pretty\n$ cat names.csv | csvtk cut -f -3--2 | csvtk pretty\n$ cat names.csv | csvtk cut -f -first_name,-last_name | csvtk pretty\nid username\n11 rob\n2 ken\n4 gri\n1 abc\nNA 123\n
Searching by specified columns (exact match by default)
$ cat names.csv | csvtk grep -f id -p 1 | csvtk pretty\nid first_name last_name username\n1 Robert Thompson abc\n
Fuzzy search (regular expressions)
$ cat names.csv | csvtk grep -f id -p 1 -r | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n1 Robert Thompson abc\n
Using a file as the source of patterns
$ cat names.csv | csvtk grep -f id -P id-files.txt\n
Simple replacement on specified columns
$ cat names.csv | csvtk replace -f id -p '(\\d+)' -r 'ID: $1' \\\n | csvtk pretty\nid first_name last_name username\nID: 11 Rob Pike rob\nID: 2 Ken Thompson ken\nID: 4 Robert Griesemer gri\nID: 1 Robert Thompson abc\nNA Robert Abel 123\n
Replacing with a key-value file (seqkit and brename support similar operations)
$ cat data.tsv\nname id\nA ID001\nB ID002\nC ID004\n\n$ cat alias.tsv\n001 Tom\n002 Bob\n003 Jim\n\n$ csvtk replace -t -f 2 -p \"ID(.+)\" -r \"N: {nr}, alias: {kv}\" -k \\\n alias.tsv data.tsv\nname id\nA N: 1, alias: Tom\nB N: 2, alias: Bob\nC N: 3, alias: 004\n
Joining tables: the key column of each file must be specified. By default the first column is used for every file; if the column (names) are the same, provide just one; if they differ, separate them with semicolons.
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ csvtk join -f 'username;username' --keep-unmatched names.csv phones.csv \\\n | csvtk pretty\nid first_name last_name username phone\n11 Rob Pike rob 12345\n2 Ken Thompson ken 22222\n4 Robert Griesemer gri 11111\n1 Robert Thompson abc\nNA Robert Abel 123\n
csvtk is implemented in the Go programming language; executable binary files for most popular operating systems are freely available on the release page.

Latest version: changes to csvtk filter2/mutate2/mutate3 and csvtk csv2json (see the release history below).

Notes

- Please run csvtk version to check for updates !!!
- Please run csvtk genautocomplete to update the Bash completion script !!!

Download Page
csvtk is implemented in the Go programming language; executable binary files for most popular operating systems are freely available on the release page. Just download the compressed executable file for your operating system and decompress it with the tar -zxvf *.tar.gz command or another tool. Then:
For Linux-like systems

If you have root privilege, simply copy it to /usr/local/bin:

sudo cp csvtk /usr/local/bin/

Or copy it to any directory in the environment variable PATH:

mkdir -p $HOME/bin/; cp csvtk $HOME/bin/

For Windows, just copy csvtk.exe to C:\WINDOWS\system32.
Method 2: Install via conda

# >= v0.31.0
conda install -c conda-forge csvtk

# <= v0.31.0
conda install -c bioconda csvtk
"},{"location":"download/#method-3-install-via-homebrew-may-be-not-the-latest","title":"Method 3: Install via homebrew (may be not the latest)","text":"brew install csvtk\n
"},{"location":"download/#method-4-for-go-developer-latest-stabledev-version","title":"Method 4: For Go developer (latest stable/dev version)","text":"go get -u github.com/shenwei356/csvtk/csvtk\n
"},{"location":"download/#method-5-for-archlinux-aur-users-may-be-not-the-latest","title":"Method 5: For ArchLinux AUR users (may be not the latest)","text":"yaourt -S csvtk\n
"},{"location":"download/#method-6-compiling-from-source-latest-stabledev-version","title":"Method 6: Compiling from source (latest stable/dev version)","text":"# ------------------- install golang -----------------\n\n# download Go from https://go.dev/dl\nwget https://go.dev/dl/go1.17.12.linux-amd64.tar.gz\n\ntar -zxf go1.17.12.linux-amd64.tar.gz -C $HOME/\n\n# or \n# echo \"export PATH=$PATH:$HOME/go/bin\" >> ~/.bashrc\n# source ~/.bashrc\nexport PATH=$PATH:$HOME/go/bin\n\n\n# ------------- the latest stable version -------------\n\ngo get -v -u github.com/shenwei356/csvtk/csvtk\n\n# The executable binary file is located in:\n# ~/go/bin/csvtk\n# You can also move it to anywhere in the $PATH\nmkdir -p $HOME/bin\ncp ~/go/bin/csvtk $HOME/bin/\n\n# --------------- the development version --------------\n\ngit clone https://github.com/shenwei356/csvtk\ncd csvtk/csvtk/\ngo build\n\n# The executable binary file is located in:\n# ./csvtk\n# You can also move it to anywhere in the $PATH\nmkdir -p $HOME/bin\ncp ./csvtk $HOME/bin/\n
"},{"location":"download/#shell-completion","title":"Shell-completion","text":"Bash:
# generate completion shell\ncsvtk genautocomplete --shell bash\n\n# configure if never did.\n# install bash-completion if the \"complete\" command is not found.\necho \"for bcfile in ~/.bash_completion.d/* ; do source \\$bcfile; done\" >> ~/.bash_completion\necho \"source ~/.bash_completion\" >> ~/.bashrc\n
Zsh:
# generate completion shell\ncsvtk genautocomplete --shell zsh --file ~/.zfunc/_csvtk\n\n# configure if never did\necho 'fpath=( ~/.zfunc \"${fpath[@]}\" )' >> ~/.zshrc\necho \"autoload -U compinit; compinit\" >> ~/.zshrc\n
fish:
csvtk genautocomplete --shell fish --file ~/.config/fish/completions/csvtk.fish\n
"},{"location":"download/#release-history","title":"Release history","text":"csvtk mutate3
: create a new column from selected fields with Go-like expressions. Contributed by @moorereason 172csvtk sort/join
:csvtk sort
:csvtk summary
:csvtk rename2
:-n/--start-num
. #286--nr-width
.csvtk replace
:{nr}
. #286csvtk csv2json
:csvtk split
:csvtk spread
:csvtk grep
:csvtk fix-quotes
:-b, --buffer-size
.csvtk plot
:--scale
for scaling the image width/height, tick, axes, line/point and font sizes proportionally, adviced by @tseemann.csvtk plot line
:csvtk hist
:--line-width
.csvtk box
:--line-width
, --point-size
, and color-index
.csvtk
:--quiet
. #261-U, --delete-header
for disable outputing the header row. Supported commands: concat, csv2tab/tab2csv, csv2xlsx/xlsx2csv, cut, filter, filter2, freq, fold/unfold, gather, fmtdate, grep, head, join, mutate, mutate2, replace, round, sample. #258-Z/--show-row-number
: head.csvtk dim
:csvtk concat
:csvtk spread
:-k
and -v
.csvtk sort
:csvtk filter/filter2
:-Z
.csvtk xls2csv
:csvtk pretty
:-n/--buf-rows
from 128 to 1024, and 0 for loading all data.csvtk join
:-s/--suffix
for adding suffixes to colnames from each file. #263fix-quotes
: fix malformed CSV/TSV caused by double-quotes. #260del-quotes
: remove extra double-quotes added by fix-quotes
.csvtk del-header
:csvtk concat
:csvtk sort
:csvtk filter2
:in
keyword. #195csvtk plot
:--tick-label-size
.csvtk pretty
:csvtk
:-X
for the flag --infile-list
. #249csvtk pretty
:-m/--align-center
and -r/--align-right
. #244csvtk spread
:csvtk join
:-P/--prefix-duplicates
: add filenames as colname prefixes only for duplicated colnames. #246csvtk mutate2
:csvtk xlsx2csv
:open /tmp/excelize-: no such file or directory
error for big .xlsx
files. #251csvtk comb
:csvtk pretty
:-H/--no-header-row
, introduced in v0.27.0.3line
for three-line table.csvtk csv2xlsx
:csvtk splitxlsx
:invalid worksheet index
. #1617csvtk filter2/mutate2
:csvtk
:csvtk grep -f 2-
. #120-Z/--show-row-number
, supported commands: cut, csv2tab, csv2xlsx, tab2csv, pretty.csvtk spread
: spread a key-value pair across multiple columns, like tidyr::spread/pivot_wider. #91, #236, #239csvtk mutate/mutate2
:--at
, --before
, --after
for specifying the position of the new column. #193csvtk cut
:-i/--ignore-case
.csvtk pretty
:csvtk round
:7.1E-1
.csvtk summary
:csvtk corr/watch
:csvtk
:--infile-list
accepts stdin \"-\". #210csvtk fix
: fix CSV/TSV with different numbers of columns in rows. #226csvtk pretty
: rewrite to support wrapping cells. #206 #209 #228csvtk cut/fmtdate/freq/grep/rename/rename2/replace/round
: allow duplicated column names.csvtk csv2xlsx
: optionally stores numbers as float. #217csvtk xlsx2csv
: fix bug where xlsx2csv
treats small number (padj < 1e-25) as 0. It's solved by updating the excelize package. #261csvtk join
: a new flag for adding filename as column name prefix. by @tetedange13 #202csvtk mutate2
: fix wrongly treating strings like E10
as numbers in scientific notation. #219csvtk sep
: fix the logic. #218csvtk space2tab
: fix \"bufio.Scanner: token too long\". #231csvtk
: report empty files.csvtk join
: fix loading file with no records.csvtk filter2/muate2
:${var}
with special charactors including commas, spaces, and parentheses, e.g., ${a,b}
, ${a b}
, or ${a (b)}
. #186csvtk sort
: fix checking non-existed fileds.csvtk plot box/hist/line
: new flag --skip-na
for skipping missing data. #188csvtk csv2xlsx
: stores number as float. #192csvtk summary
: new functions argmin
and argmax
. #181csvtk mutate2/summary
:mutate2
: remove the option -L/--digits
.-w/--decimal-width
to limit floats to N decimal points.csvtk fmtdate
: format date of selected fields. #159csvtk grep
: fix bug for searching with -r -p .
.csvtk csv2rst
: fix bug for data containing unicode. #137csvtk filter2
: fix bug for date expression. #146csvtk mutate2/filter2
: len()
. #153csvtk cut
: new flags -m/--allow-missing-col
and -b/--blank-missing-col
. #156csvtk pretty
: still add header row for empty column.csvtk csv2md
: better format.csvtk join
: new flag -n/--ignore-null
. #163csvtk csv2rst
for converting CSV to reStructuredText format. #137csvtk pretty
: add header separator line. #123csvtk mutate2/summary
: fix message and doc. Thanks @VladimirAlexiev #127csvtk mutate2
: fix null coalescence: ??. #129csvtk genautocomplete
: supports bash|zsh|fish|powershell. #126csvtk cat
: fix progress bar. #130csvtk grep
: new flag immediate-output
.csvtk csv2xlsx
: fix bug for table with > 26 columns. 138csvtk
:-t
does not overide -D
anymore. #114tsvtk
the -t/--tabs
option for tab input is set. Thanks @bsipos. #117csvtk csv2xlsx
for converting CSV/TSV file(s) to a single .xlsx
file.csvtk unfold
for unfolding multiple values in cells of a field. #103csvtk collapse
to csvtk fold
, for folding multiple values of a field into cells of groups.csvtk cut
: support range format 2-
to choose 2nd column to the end. #106csvtk round
: fix bug of failing to round scientific notation with value small than one, e.g., 7.1E-1
.csvtk nrow/ncol
for printing number of rows or columns.round
to round float to n decimal places. #112csvtk headers
: file name and column index is optional outputted with new flag -v/--verbose
.csvtk dim
: new flags --tabluar
, --cols
, --rows
, -n/--no-files
.csvtk dim/ncol/nrow
: can handle empty files now. #108csvtk csv2json
#104:-b/--blank
: do not convert \"\", \"na\", \"n/a\", \"none\", \"null\", \".\" to null-n/--parse-num
: parse numeric values for nth column(s), multiple values are supported and \"a\"/\"all\" for all columns.csvtk xlsx2csv
: fix output for ragged table. #110csvtk join
: fix bug for joining >2 files.csvtk uniq
: new flag -n/--keep-n
for keeping first N records of every key.csvtk cut
: support repeatedly selecting columns. #106csvtk comb
: compute combinations of items at every row.csvtk sep
: separate column into multiple columns. #96csvtk
:-I
) and empty (-E
) rows. #97--infile-list
for giving file of input files list (one file per line), if given, they are appended to files from cli argumentscsvtk join
:-i/--ignore-case
. #99-L/--left-join
: left join, equals to -k/--keep-unmatched, exclusive with --outer-join
-O/--outer-join
: outer join, exclusive with --left-join--fill
to --na
.csvtk filter2
: fix bug when column names start with digits, e.g., 1000g2015aug
. Thank @VorontsovIE (#44)csvtk concat
: allow one input file. #98csvtk mutate
: new flag -R/--remove
for removing input column.csvtk
:csvtk cut -f a, b
.csvtk summary
: fix err of q1 and q3. #90csvtk version
: making checking update optional.watch
: online monitoring and histogram of selected field.corr
: calculate Pearson correlation between numeric columns.cat
: stream file and report progress.csvtk split
: fix bug of repeatedly output header line when number of output files exceed value of --buf-groups
. #83csvtk plot hist
: new option --percentiles
to add percentiles to histogram x label. #88csvtk replace/rename2/splitxlsx
: fix flag conflicts with global flag -I
since v0.18.0.csvtk replace/rename2
: removing shorthand flag -I
for --key-capt-idx
.csvtk splitxlsx
: changing shorthand flag of --sheet-index
from -I
to -N
.csvtk sort
: fix mutiple-key-sort containing natural order sorting. #79csvtk xlsx2csv
: reacts to global flags -t
, -T
, -D
and -E
. #78csvtk
: add new flag --ignore-illegal-row
to skip illegal rows. #72csvtk summary
: add more textual/numeric operations. #64csvtk sort
: fix bug for sorting by columns with empty values. #70csvtk grep
: add new flag --delete-matched
to delete a pattern right after being matched, this keeps the firstly matched data and speedups when using regular expressions. #77csvtk add-header
and csvtk del-header
for adding/deleting column names. [#62]csvtk csv2json
: convert CSV to JSON format.csvtk stats2
.csvtk summary
: summary statistics of selected digital fields (groupby group fields), usage and examples. #59csvtk replace
: add flag --nr-width
: minimum width for {nr} in flag -r/--replacement. e.g., formating \"1\" to \"001\" by --nr-width 3
(default 1)csvtk rename2/replace
: add flag -A, --kv-file-all-left-columns-as-value
, for treating all columns except 1th one as value for kv-file with more than 2 columns. #56csvtk
: add global flag -E/--ignore-empty-row
to skip empty row. #50csvtk mutate2
: add flag -s/--digits-as-string
for not converting big digits into scientific notation. #46csvtk sort
: add support for sorting in natural order. #49csvtk
: supporting multi-line fields by replacing multicorecsv with standard library encoding/csv, while losing support for metaline which was supported since v0.7.0. It also gain a little speedup.csvtk sample
: add flag -n/--line-number
to print line number as the first column (\"n\")csvtk filter2
: fix bug when column names start with digits, e.g., 1000g2015aug
(#44)csvtk rename2
: add support for similar repalecement symbols {kv} and {nr}
in csvtk replace
concat
for concatenating CSV/TSV files by rows #38csvtk
: add support for environment variables for frequently used global flags #39CSVTK_T
for flag -t/--tabs
CSVTK_H
for flag -H/--no-header-row
mutate2
: add support for eval expression WITHOUT column index symbol, so we can add some string constants #37pretty
: better support for files with duplicated column namescollapse
: collapsing one field with selected fields as keysfreq
: keeping orignal order of keys by defaultsplit
:-G/--out-gzip
for forcing output gzipped filesplit
to split CSV/TSV into multiple files according to column valuessplitxlxs
to split XLSX sheet into multiple sheets according to column valuescsvtk
, automatically check BOM (byte-order mark) and discard itxlsx2csv
to convert XLSX to CSV formatgrep
, filter
, filter2
: add flag -n/--line-number
to print line-number as the first columncut
: add flag -i/--ignore-case
to ignore case of column namecsvtk replace
: fix bug when replacing with key-value pairs brought in v0.8.0csvtk mutate2
: create new column from selected fields by awk-like arithmetic/string expressionsgenautocomplete
to generate shell autocompletion script!csvtk gather
for gathering columns into key-value pairs.csvtk sort
: support sorting by user-defined order.cut
, filter
, fitler2
, freq
, grep
, inter
, mutate
, rename
, rename2
, replace
, stats2
, uniq
.-F/--fuzzy-fields
.-t
, which overrides both -d
and -D
. If you want other delimiter for tabular input, use -t $'\\t' -D \"delimiter\"
.csvtk plot box
and csvtk plot line
: fix bugs for special cases of input-F/--fuzzy-fields
csvtk pretty
and csvtk csv2md
: add attention that these commands treat the first row as header line and require them to be unique.csvtk stat
renamed to csvtk stats
, old name is still available as an alias.csvtk stat2
renamed to csvtk stats2
, old name is still available as an alias.csvtk cut
: minor bug: panic when no fields given. i.e., csvtk cut
. All relevant commands have been fixed.csvtk grep
: large performance improvement by discarding goroutine (multiple threads), and keeping output in order of input.cut
, filter
, freq
, grep
, inter
, mutate
, rename
, rename2
, replace
, stat2
, and uniq
.csvtk filter2
, filtering rows by arithmetic/string expressions like awk
.csvtk cut
: delete flag -n/--names
, move it to a new command csvtk headers
csvtk headers
csvtk head
csvtk sample
csvtk grep
: fix result highlight when flag -v
is on.csvtk join
: support the 2nd or later files with entries with same ID.csvtk freq
: frequencies of selected fields-n
is not required anymore when flag -H
in csvtk mutate
csvtk grep
: if the pattern matches multiple parts, the text will be wrongly edited.csvtk replace
: -K
(--keep-key
) keep the key as value when no value found for the key. This is open in default in previous versions.csvtk sort
resultcsvtk grep -r -p
, when value of -p
contain \"[\" and \"]\" at the beginning or end, they are wrongly parsed.csvtk cut
supports ordered fields output. e.g., csvtk cut -f 2,1
outputs the 2nd column in front of 1th column.csvtk plot
can plot three types of plots by subcommands:csvtk plot hist
: histogramcsvtk plot box
: boxplotcsvtk plot line
: line plot and scatter plot-f \"-id\"
csvtk replace
support replacement symbols {nr}
(record number) and {kv}
(corresponding value of the key ($1) by key-value file)--fill
for csvtk join
, so we can fill the unmatched data\\r\\n
from a dependency packagecsv2md
version
which could check for updatecsvtk replace
that head row should not be edited.csvtk grep -t -P
inter
grep
csv2md
pretty
csvtk cut -n
filter
pretty
-- convert CSV to readable aligned tablegrep
grep
stat
that failed to considerate files with header rowstat2
- summary of selected number fieldsstat
prettier--colnames
to cut
-f
(--fields
) of join
supports single value now--keep-unmathed
to join
mutate
FAQ

The CSV parser used by csvtk follows the RFC4180 specification.

bare " in non-quoted-field

 5. Each field may or may not be enclosed in double quotes (however
 some programs, such as Microsoft Excel, do not use double quotes
 at all). If fields are not enclosed with double quotes, then
 double quotes may not appear inside the fields. For example:

 "aaa","bbb","ccc" CRLF
 zzz,yyy,xxx

 6. Fields containing line breaks (CRLF), double quotes, and commas
 should be enclosed in double-quotes. For example:

 "aaa","b CRLF
 bb","ccc" CRLF
 zzz,yyy,xxx

 7. If double-quotes are used to enclose fields, then a double-quote
 appearing inside a field must be escaped by preceding it with
 another double quote. For example:

 "aaa","b""bb","ccc"
If a single double-quote exists in a non-quoted field, an error will be reported, e.g.,
$ echo 'a,abc\" xyz,d'\na,abc\" xyz,d\n\n$ echo 'a,abc\" xyz,d' | csvtk cut -f 1-\n[ERRO] parse error on line 1, column 6: bare \" in non-quoted-field\n
You can add the flag -l/--lazy-quotes
to fix this.
$ echo 'a,abc\" xyz,d' | csvtk cut -f 1- -l\na,\"abc\"\" xyz\",d\n
"},{"location":"faq/#extraneous-or-missing-in-quoted-field","title":"extraneous or missing \" in quoted-field","text":"But for the situation below, -l/--lazy-quotes
won't help:
$ echo 'a,\"abc\" xyz,d'\na,\"abc\" xyz,d\n\n$ echo 'a,\"abc\" xyz,d' | csvtk cut -f 1-\n[ERRO] parse error on line 1, column 7: extraneous or missing \" in quoted-field\n\n$ echo 'a,\"abc\" xyz,d' | csvtk cut -f 1- -l\na,\"abc\"\" xyz,d\n\"\n\n$ echo 'a,\"abc\" xyz,d' | csvtk cut -f 1- -l | csvtk dim\nfile num_cols num_rows\n- 2 0\n
You need to use csvtk fix-quotes (available in v0.29.0 or later versions):
$ echo 'a,\"abc\" xyz,d' | csvtk fix-quotes\na,\"\"\"abc\"\" xyz\",d\n\n$ echo 'a,\"abc\" xyz,d' | csvtk fix-quotes | csvtk cut -f 1-\na,\"\"\"abc\"\" xyz\",d\n\n$ echo 'a,\"abc\" xyz,d' | csvtk fix-quotes | csvtk cut -f 1- | csvtk dim\nfile num_cols num_rows\n- 3 0\n
Use del-quotes if you need the original format after some operations.
$ echo 'a,\"abc\" xyz,d' | csvtk fix-quotes | csvtk cut -f 1- | csvtk del-quotes\na,\"abc\" xyz,d\n
"},{"location":"tutorial/","title":"Tutorial","text":""},{"location":"tutorial/#analyzing-otu-table","title":"Analyzing OTU table","text":""},{"location":"tutorial/#data","title":"Data","text":"Here is a mock OTU table from 16S rRNA sequencing result. Columns are sample IDs in format of \"GROUP.ID\"
$ cat otu_table.csv\nTaxonomy,A.1,A.2,A.3,B.1,B.2,B.3,C.1,C.2\nProteobacteria,.13,.29,.13,.16,.13,.22,.30,.23\nFirmicutes,.42,.06,.49,.41,.55,.41,.32,.38\nBacteroidetes,.19,.62,.12,.33,.16,.29,.34,.35\nDeferribacteres,.17,.00,.24,.01,.01,.01,.01,.01\n
What a mess! Let's make it prettier!
$ csvtk pretty otu_table.csv\nTaxonomy A.1 A.2 A.3 B.1 B.2 B.3 C.1 C.2\nProteobacteria .13 .29 .13 .16 .13 .22 .30 .23\nFirmicutes .42 .06 .49 .41 .55 .41 .32 .38\nBacteroidetes .19 .62 .12 .33 .16 .29 .34 .35\nDeferribacteres .17 .00 .24 .01 .01 .01 .01 .01\n
"},{"location":"tutorial/#steps","title":"Steps","text":"Counting
$ csvtk stat otu_table.csv\nfile num_cols num_rows\notu_table.csv 9 4\n
Column names
$ csvtk headers otu_table.csv\n# otu_table.csv\n1 Taxonomy\n2 A.1\n3 A.2\n4 A.3\n5 B.1\n6 B.2\n7 B.3\n8 C.1\n9 C.2\n
Convert to tab-delimited table
$ csvtk csv2tab otu_table.csv\nTaxonomy A.1 A.2 A.3 B.1 B.2 B.3 C.1 C.2\nProteobacteria .13 .29 .13 .16 .13 .22 .30 .23\nFirmicutes .42 .06 .49 .41 .55 .41 .32 .38\nBacteroidetes .19 .62 .12 .33 .16 .29 .34 .35\nDeferribacteres .17 .00 .24 .01 .01 .01 .01 .01\n
Extract data of groups A and B and save it to a file with -o otu_table.gAB.csv
$ csvtk cut -F -f \"Taxonomy,A.*,B.*\" otu_table.csv -o otu_table.gAB.csv\n\n$ csvtk pretty otu_table.gAB.csv\nTaxonomy A.1 A.2 A.3 B.1 B.2 B.3\nProteobacteria .13 .29 .13 .16 .13 .22\nFirmicutes .42 .06 .49 .41 .55 .41\nBacteroidetes .19 .62 .12 .33 .16 .29\nDeferribacteres .17 .00 .24 .01 .01 .01\n
Search some rows by fields. Matched parts will be highlighted as red
$ csvtk grep -f Taxonomy -r -p \"tes\" otu_table.gAB.csv -T\n
Result:
Transpose
$ csvtk transpose otu_table.gAB.csv -o otu_table.gAB.t.csv\n\n$ csvtk pretty otu_table.gAB.t.csv\nTaxonomy Proteobacteria Firmicutes Bacteroidetes Deferribacteres\nA.1 .13 .42 .19 .17\nA.2 .29 .06 .62 .00\nA.3 .13 .49 .12 .24\nB.1 .16 .41 .33 .01\nB.2 .13 .55 .16 .01\nB.3 .22 .41 .29 .01\n
Rename name of the first column
$ csvtk rename -f 1 -n \"sample\" otu_table.gAB.t.csv -o otu_table.gAB.t.r.csv\n\n$ csvtk pretty otu_table.gAB.t.r.csv\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres\nA.1 .13 .42 .19 .17\nA.2 .29 .06 .62 .00\nA.3 .13 .49 .12 .24\nB.1 .16 .41 .33 .01\nB.2 .13 .55 .16 .01\nB.3 .22 .41 .29 .01\n
Add group column
$ csvtk mutate -p \"(.+?)\\.\" -n group otu_table.gAB.t.r.csv -o otu_table2.csv\n\n$ csvtk pretty otu_table2.csv\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.1 .13 .42 .19 .17 A\nA.2 .29 .06 .62 .00 A\nA.3 .13 .49 .12 .24 A\nB.1 .16 .41 .33 .01 B\nB.2 .13 .55 .16 .01 B\nB.3 .22 .41 .29 .01 B\n
Rename groups:
$ csvtk replace -f group -p \"A\" -r \"Ctrl\" otu_table2.csv \\\n | csvtk replace -f group -p \"B\" -r \"Treatment\" \\\n > otu_table3.csv\n\n$ csvtk pretty -s \" \" otu_table3.csv\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.1 .13 .42 .19 .17 Ctrl\nA.2 .29 .06 .62 .00 Ctrl\nA.3 .13 .49 .12 .24 Ctrl\nB.1 .16 .41 .33 .01 Treatment\nB.2 .13 .55 .16 .01 Treatment\nB.3 .22 .41 .29 .01 Treatment\n
Sort by abundance of Proteobacteria in descending order.
$ csvtk sort -k Proteobacteria:nr otu_table3.csv \\\n | csvtk pretty -s \" \"\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.2 .29 .06 .62 .00 Ctrl\nB.3 .22 .41 .29 .01 Treatment\nB.1 .16 .41 .33 .01 Treatment\nB.2 .13 .55 .16 .01 Treatment\nA.3 .13 .49 .12 .24 Ctrl\nA.1 .13 .42 .19 .17 Ctrl\n
Sort by abundance of Proteobacteria in descending order and Firmicutes in ascending order
$ csvtk sort -k Proteobacteria:nr -k Firmicutes:n otu_table3.csv \\\n | csvtk pretty -s \" \"\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.2 .29 .06 .62 .00 Ctrl\nB.3 .22 .41 .29 .01 Treatment\nB.1 .16 .41 .33 .01 Treatment\nA.1 .13 .42 .19 .17 Ctrl\nA.3 .13 .49 .12 .24 Ctrl\nB.2 .13 .55 .16 .01 Treatment\n
Filter samples with abundance greater than 0 in all taxa (columns except sample and group; you can also use -f "2-5>0").
$ cat otu_table3.csv \\\n | csvtk filter -f \"2-5>0\" \\\n | csvtk pretty -s \" \" \nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.1 .13 .42 .19 .17 Ctrl\nA.3 .13 .49 .12 .24 Ctrl\nB.1 .16 .41 .33 .01 Treatment\nB.2 .13 .55 .16 .01 Treatment\nB.3 .22 .41 .29 .01 Treatment\n
Most of the time, we may want to remove samples with an abundance of 0 in all taxa.
$ cat otu_table3.csv \\\n | csvtk filter -f \"2-5>0\" --any \\\n | csvtk pretty -s \" \"\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.1 .13 .42 .19 .17 Ctrl\nA.2 .29 .06 .62 .00 Ctrl\nA.3 .13 .49 .12 .24 Ctrl\nB.1 .16 .41 .33 .01 Treatment\nB.2 .13 .55 .16 .01 Treatment\nB.3 .22 .41 .29 .01 Treatment\n
csvtk -- a cross-platform, efficient and practical CSV/TSV toolkit\n\nVersion: 0.31.1\n\nAuthor: Wei Shen <shenwei356@gmail.com>\n\nDocuments : http://shenwei356.github.io/csvtk\nSource code: https://github.com/shenwei356/csvtk\n\nAttention:\n\n 1. By default, csvtk assumes input files have header row, if not, switch flag \"-H\" on.\n 2. By default, csvtk handles CSV files, use flag \"-t\" for tab-delimited files.\n 3. Column names should be unique.\n 4. By default, lines starting with \"#\" will be ignored, if the header row\n starts with \"#\", please assign flag \"-C\" another rare symbol, e.g. '$'.\n 5. Do not mix use field (column) numbers and names to specify columns to operate.\n 6. The CSV parser requires all the lines have same numbers of fields/columns.\n Even lines with spaces will cause error.\n Use '-I/--ignore-illegal-row' to skip these lines if neccessary.\n You can also use \"csvtk fix\" to fix files with different numbers of columns in rows.\n 7. If double-quotes exist in fields not enclosed with double-quotes, e.g.,\n x,a \"b\" c,1\n It would report error:\n bare \" in non-quoted-field.\n Please switch on the flag \"-l\" or use \"csvtk fix-quotes\" to fix it.\n 8. If somes fields have only a double-quote eighter in the beginning or in the end, e.g.,\n x,d \"e\",\"a\" b c,1\n It would report error:\n extraneous or missing \" in quoted-field\n Please use \"csvtk fix-quotes\" to fix it, and use \"csvtk del-quotes\" to reset to the\n original format as needed.\n\nEnvironment variables for frequently used global flags:\n\n - \"CSVTK_T\" for flag \"-t/--tabs\"\n - \"CSVTK_H\" for flag \"-H/--no-header-row\"\n - \"CSVTK_QUIET\" for flag \"--quiet\"\n\nYou can also create a soft link named \"tsvtk\" for \"csvtk\",\nwhich sets \"-t/--tabs\" by default.\n\nUsage:\n csvtk [command]\n\nCommands for Information:\n corr calculate Pearson correlation between two columns\n dim dimensions of CSV file\n headers print headers\n ncol print number of columns\n nrow print number of records\n summary summary statistics of selected numeric or text fields (groupby group fields)\n watch monitor the specified fields\n\nFormat Conversion:\n csv2json convert CSV to JSON format\n csv2md convert CSV to markdown format\n csv2rst convert CSV to reStructuredText format\n csv2tab convert CSV to tabular format\n csv2xlsx convert CSV/TSV files to XLSX file\n pretty convert CSV to a readable aligned table\n space2tab convert space delimited format to TSV\n splitxlsx split XLSX sheet into multiple sheets according to column values\n tab2csv convert tabular format to CSV\n xlsx2csv convert XLSX to CSV format\n\nCommands for Set Operation:\n comb compute combinations of items at every row\n concat concatenate CSV/TSV files by rows\n cut select and arrange fields\n filter filter rows by values of selected fields with arithmetic expression\n filter2 filter rows by awk-like arithmetic/string expressions\n freq frequencies of selected fields\n grep grep data by selected fields with patterns/regular expressions\n head print first N records\n inter intersection of multiple files\n join join files by selected fields (inner, left and outer join)\n sample sampling by proportion\n split split CSV/TSV into multiple files according to column values\n uniq unique data without sorting\n\nCommands for Edit:\n add-header add column names\n del-header delete column names\n del-quotes remove extra double quotes added by 'fix-quotes'\n fix fix CSV/TSV with different numbers of columns in rows\n fix-quotes fix malformed CSV/TSV 
caused by double-quotes\n fmtdate format date of selected fields\n mutate create new column from selected fields by regular expression\n mutate2 create a new column from selected fields by awk-like arithmetic/string expressions\n mutate3 create a new column from selected fields with Go-like expressions\n rename rename column names with new names\n rename2 rename column names by regular expression\n replace replace data of selected fields by regular expression\n round round float to n decimal places\n\nCommands for Data Transformation:\n fold fold multiple values of a field into cells of groups\n gather gather columns into key-value pairs, like tidyr::gather/pivot_longer\n sep separate column into multiple columns\n spread spread a key-value pair across multiple columns, like tidyr::spread/pivot_wider\n transpose transpose CSV data\n unfold unfold multiple values in cells of a field\n\nCommands for Ordering:\n sort sort by selected fields\n\nCommands for Ploting:\n plot plot common figures\n\nCommands for Miscellaneous Functions:\n cat stream file to stdout and report progress on stderr\n\nAdditional Commands:\n genautocomplete generate shell autocompletion script (bash|zsh|fish|powershell)\n version print version information and check for update\n\nFlags:\n -C, --comment-char string lines starting with commment-character will be ignored. if your header\n row starts with '#', please assign \"-C\" another rare symbol, e.g. '$'\n (default \"#\")\n -U, --delete-header do not output header row\n -d, --delimiter string delimiting character of the input CSV file (default \",\")\n -h, --help help for csvtk\n -E, --ignore-empty-row ignore empty rows\n -I, --ignore-illegal-row ignore illegal rows. You can also use 'csvtk fix' to fix files with\n different numbers of columns in rows\n -X, --infile-list string file of input files list (one file per line), if given, they are appended\n to files from cli arguments\n -l, --lazy-quotes if given, a quote may appear in an unquoted field and a non-doubled quote\n may appear in a quoted field\n -H, --no-header-row specifies that the input CSV file does not have header row\n -j, --num-cpus int number of CPUs to use (default 4)\n -D, --out-delimiter string delimiting character of the output CSV file, e.g., -D $'\\t' for tab\n (default \",\")\n -o, --out-file string out file (\"-\" for stdout, suffix .gz for gzipped out) (default \"-\")\n -T, --out-tabs specifies that the output is delimited with tabs. Overrides \"-D\"\n --quiet be quiet and do not show extra information and warnings\n -Z, --show-row-number show row number as the first column, with header row skipped\n -t, --tabs specifies that the input CSV file is delimited with tabs. Overrides \"-d\"\n\nUse \"csvtk [command] --help\" for more information about a command.\n
"},{"location":"usage/#headers","title":"headers","text":"Usage
print headers\n\nUsage:\n csvtk headers [flags]\n\nFlags:\n -h, --help help for headers\n -v, --verbose print verbose information\n\n
Examples
$ csvtk headers testdata/[12].csv\nname\nattr\nname\nmajor\n\n$ csvtk headers testdata/[12].csv -v\n# testdata/1.csv\n1 name\n2 attr\n# testdata/2.csv\n1 name\n2 major\n
"},{"location":"usage/#dimnrowncol","title":"dim/nrow/ncol","text":"Usage
dim:
dimensions of CSV file\n\nUsage:\n csvtk dim [flags]\n\nAliases:\n dim, size, stats, stat\n\nFlags:\n --cols only print number of columns\n -h, --help help for dim\n -n, --no-files do not print file names\n --rows only print number of rows\n --tabular output in machine-friendly tabular format\n\n
nrow:
print number of records\n\nUsage:\n csvtk nrow [flags]\n\nAliases:\n nrow, nrows\n\nFlags:\n -n, --file-name print file names\n -h, --help help for nrow\n\n
ncol:
print number of columns\n\nUsage:\n csvtk ncol [flags]\n\nAliases:\n ncol, ncols\n\nFlags:\n -n, --file-name print file names\n -h, --help help for ncol\n\n
Examples
with header row
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n\n$ cat testdata/names.csv | csvtk size\nfile num_cols num_rows\n- 4 5\n\n$ cat testdata/names.csv | csvtk nrow\n5\n\n$ cat testdata/names.csv | csvtk ncol\n4\n\n$ csvtk nrow testdata/names.csv testdata/phones.csv -n\n5 testdata/names.csv\n4 testdata/phones.csv\n
no header row
$ cat testdata/digitals.tsv\n4 5 6\n1 2 3\n7 8 0\n8 1,000 4\n\n$ cat testdata/digitals.tsv \\\n | csvtk size -t -H\nfile num_cols num_rows\n- 3 4\n\n$ cat testdata/names.csv | csvtk nrow -H\n3\n\n$ cat testdata/names.csv | csvtk ncol -H\n4\n
Usage
summary statistics of selected numeric or text fields (groupby group fields)\n\nAttention:\n\n 1. Do not mix use field (column) numbers and names.\n\nAvailable operations:\n\n # numeric/statistical operations\n # provided by github.com/gonum/stat and github.com/gonum/floats\n countn (count numeric values), min, max, sum, argmin, argmax,\n mean, stdev, variance, median, q1, q2, q3,\n entropy (Shannon entropy),\n prod (product of the elements)\n\n # textual/numeric operations\n count, first, last, rand, unique/uniq, collapse, countunique\n\nUsage:\n csvtk summary [flags]\n\nFlags:\n -w, --decimal-width int limit floats to N decimal points (default 2)\n -f, --fields strings operations on these fields. e.g -f 1:count,1:sum or -f colA:mean. available\n operations: argmax, argmin, collapse, count, countn, countuniq,\n countunique, entropy, first, last, max, mean, median, min, prod, q1, q2,\n q3, rand, stdev, sum, uniq, unique, variance\n -g, --groups string group via fields. e.g -f 1,2 or -f columnA,columnB\n -h, --help help for summary\n -i, --ignore-non-numbers ignore non-numeric values like \"NA\" or \"N/A\"\n -S, --rand-seed int rand seed for operation \"rand\" (default 11)\n -s, --separater string separater for collapsed data (default \"; \")\n\n
Examples
data
$ cat testdata/digitals2.csv \nf1,f2,f3,f4,f5\nfoo,bar,xyz,1,0\nfoo,bar2,xyz,1.5,-1\nfoo,bar2,xyz,3,2\nfoo,bar,xyz,5,3\nfoo,bar2,xyz,N/A,4\nbar,xyz,abc,NA,2\nbar,xyz,abc2,1,-1\nbar,xyz,abc,2,0\nbar,xyz,abc,1,5\nbar,xyz,abc,3,100\nbar,xyz2,abc3,2,3\nbar,xyz2,abc3,2,1\n
use flag -i/--ignore-non-numbers
$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:sum\n[ERRO] column 4 has non-digital data: N/A, you can use flag -i/--ignore-non-numbers to skip these data\n\n$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:sum -i\nf4:sum\n21.50\n
multiple fields supported
$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:sum,f5:sum -i\nf4:sum,f5:sum\n21.50,118.00\n
using field numbers instead of column names is still supported
$ cat testdata/digitals2.csv \\\n | csvtk summary -f 4:sum,5:sum -i\nf4:sum,f5:sum\n21.50,118.00\n
but remember not to mix column numbers and names
$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:sum,5:sum -i\n[ERRO] column \"5\" not existed in file: -\n\n$ cat testdata/digitals2.csv \\\n | csvtk summary -f 4:sum,f5:sum -i\n[ERRO] failed to parse f5 as a field number, you may mix the use of field numbers and column names\n
groupby
$ cat testdata/digitals2.csv \\\n | csvtk summary -i -f f4:sum,f5:sum -g f1,f2 \\\n | csvtk pretty\nf1 f2 f4:sum f5:sum\n--- ---- ------ ------\nbar xyz 7.00 106.00\nbar xyz2 4.00 4.00\nfoo bar 6.00 3.00\nfoo bar2 4.50 5.00\n
for data without header line
$ cat testdata/digitals2.csv | sed 1d \\\n | csvtk summary -H -i -f 4:sum,5:sum -g 1,2 \\\n | csvtk pretty -H\nbar xyz 7.00 106.00\nbar xyz2 4.00 4.00\nfoo bar 6.00 3.00\nfoo bar2 4.50 5.00\n
numeric/statistical operations
$ cat testdata/digitals2.csv \\\n | csvtk summary -i -g f1 -f f4:countn,f4:mean,f4:stdev,f4:q1,f4:q2,f4:mean,f4:q3,f4:min,f4:max \\\n | csvtk pretty\nf1 f4:countn f4:mean f4:stdev f4:q1 f4:q2 f4:mean f4:q3 f4:min f4:max\n--- --------- ------- -------- ----- ----- ------- ----- ------ ------\nbar 6 1.83 0.75 1.25 2.00 1.83 2.00 1.00 3.00\nfoo 4 2.62 1.80 1.38 2.25 2.62 3.50 1.00 5.00\n
textual/numeric operations
$ cat testdata/digitals2.csv \\\n | csvtk summary -i -g f1 -f f2:count,f2:first,f2:last,f2:rand,f2:collapse,f2:uniq,f2:countunique \\\n | csvtk pretty\nf1 f2:count f2:first f2:last f2:rand f2:collapse f2:uniq f2:countunique\n--- -------- -------- ------- ------- ----------------------------------- --------- --------------\nbar 7 xyz xyz2 xyz2 xyz; xyz; xyz; xyz; xyz; xyz2; xyz2 xyz; xyz2 2\nfoo 5 bar bar2 bar2 bar; bar2; bar2; bar; bar2 bar2; bar\n
mixed operations
$ cat testdata/digitals2.csv \\\n | csvtk summary -i -g f1 -f f4:collapse,f4:max \\\n | csvtk pretty\nf1 f4:collapse f4:max\n--- -------------------- ------\nbar NA; 1; 2; 1; 3; 2; 2 3.00\nfoo 1; 1.5; 3; 5; N/A 5.00\n
count and countn (count of numeric values)
$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:count,f4:countn -i \\\n | csvtk pretty\nf4:count f4:countn\n-------- ---------\n12 10\n\n# details:\n$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:count,f4:countn,f4:collapse -i -g f1 \\\n | csvtk pretty\nf1 f4:count f4:countn f4:collapse\n--- -------- --------- --------------------\nbar 7 6 NA; 1; 2; 1; 3; 2; 2\nfoo 5 4 1; 1.5; 3; 5; N/A\n
watch

Usage
monitor the specified fields\n\nUsage:\n csvtk watch [flags]\n\nFlags:\n -B, --bins int number of histogram bins (default -1)\n -W, --delay int sleep this many seconds after plotting (default 1)\n -y, --dump print histogram data to stderr instead of plotting\n -f, --field string field to watch\n -h, --help help for watch\n -O, --image string save histogram to this PDF/image file\n -L, --log log10(x+1) transform numeric values\n -x, --pass passthrough mode (forward input to output)\n -p, --print-freq int print/report after this many records (-1 for print after EOF) (default -1)\n -Q, --quiet supress all plotting to stderr\n -R, --reset reset histogram after every report\n
Examples
Read whole file, plot histogram of field on the terminal and PDF
csvtk -t watch -O hist.pdf -f MyField input.tsv\n
Monitor a TSV stream, print histogram every 1000 records
cat input.tsv | csvtk -t watch -f MyField -p 1000 -\n
Monitor a TSV stream, print histogram every 1000 records, hang forever for updates
tail -f +0 input.tsv | csvtk -t watch -f MyField -p 1000 -\n
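An illustrative, untested sketch of the -x/--pass flag listed above: the histogram is reported on stderr while the records are forwarded to stdout, so watch can sit inside a pipeline (input.tsv, MyField and passthrough.tsv are placeholder names):

# report the histogram on stderr every 1000 records while forwarding the data (placeholder file names)
$ cat input.tsv | csvtk -t watch -f MyField -p 1000 -x - > passthrough.tsv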
corr

Usage
calculate Pearson correlation between two columns\n\nUsage:\n csvtk corr [flags]\n\nFlags:\n -f, --fields string comma separated fields\n -h, --help help for corr\n -i, --ignore_nan Ignore non-numeric fields to avoid returning NaN\n -L, --log Calcute correlations on Log10 transformed data\n -x, --pass passthrough mode (forward input to output)\n
Examples
csvtk -t corr -i -f Foo,Bar input.tsv\n
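An illustrative, untested invocation reusing the testdata/digitals2.csv file shown in the summary section; -i skips the NA/N/A cells (output omitted here):

$ cat testdata/digitals2.csv | csvtk corr -i -f f4,f5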
pretty

Usage
convert CSV to a readable aligned table\n\nHow to:\n 1. First -n/--buf-rows rows are read to check the minimum and maximum widths\n of each columns. You can also set the global thresholds -w/--min-width and\n -W/--max-width.\n 1a. Cells longer than the maximum width will be wrapped (default) or\n clipped (--clip).\n Usually, the text is wrapped in space (-x/--wrap-delimiter). But if one\n word is longer than the -W/--max-width, it will be force split.\n 1b. Texts are aligned left (default), center (-m/--align-center)\n or right (-r/--align-right). Users can specify columns with column names,\n field indexes or ranges.\n Examples:\n -m A,B # column A and B\n -m 1,2 # 1st and 2nd column \n -m -1 # the last column (it's not unselecting in other commands)\n -m 1,3-5 # 1st, from 3rd to 5th column\n -m 1- # 1st and later columns (all columns)\n -m -3- # the last 3 columns\n -m -3--2 # the 2nd and 3rd to last columns\n -m 1- -r -1 # all columns are center-aligned, except the last column\n # which is right-aligned. -r overides -m.\n\n 2. Remaining rows are read and immediately outputted, one by one, till the end.\n\nStyles:\n\n Some preset styles are provided (-S/--style).\n\n default:\n\n id size\n -- ----\n 1 Huge\n 2 Tiny\n\n plain:\n\n id size\n 1 Huge\n 2 Tiny\n\n simple:\n\n -----------\n id size\n -----------\n 1 Huge\n 2 Tiny\n -----------\n\n 3line:\n\n \u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n id size\n -----------\n 1 Huge\n 2 Tiny\n \u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n\n grid:\n\n +----+------+\n | id | size |\n +====+======+\n | 1 | Huge |\n +----+------+\n | 2 | Tiny |\n +----+------+\n\n light:\n\n \u250c----\u252c------\u2510\n | id | size |\n \u251c====\u253c======\u2524\n | 1 | Huge |\n \u251c----\u253c------\u2524\n | 2 | Tiny |\n \u2514----\u2534------\u2518\n\n bold:\n\n \u250f\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n \u2503 id \u2503 size \u2503\n \u2523\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n \u2503 1 \u2503 Huge \u2503\n \u2523\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n \u2503 2 \u2503 Tiny \u2503\n \u2517\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u251b\n\n double:\n\n \u2554\u2550\u2550\u2550\u2550\u2566\u2550\u2550\u2550\u2550\u2550\u2550\u2557\n \u2551 id \u2551 size \u2551\n \u2560\u2550\u2550\u2550\u2550\u256c\u2550\u2550\u2550\u2550\u2550\u2550\u2563\n \u2551 1 \u2551 Huge \u2551\n \u2560\u2550\u2550\u2550\u2550\u256c\u2550\u2550\u2550\u2550\u2550\u2550\u2563\n \u2551 2 \u2551 Tiny \u2551\n \u255a\u2550\u2550\u2550\u2550\u2569\u2550\u2550\u2550\u2550\u2550\u2550\u255d\n\nUsage:\n csvtk pretty [flags] \n\nFlags:\n -m, --align-center strings align right for selected columns (field index/range or column name, type\n \"csvtk pretty -h\" for examples)\n -r, --align-right strings align right for selected columns (field index/range or column name, type\n \"csvtk pretty -h\" for examples)\n -n, --buf-rows int the number of rows to determine the min and max widths (0 for all rows)\n (default 1024)\n --clip clip longer cell instead of wrapping\n --clip-mark string clip mark (default \"...\")\n -h, --help help for pretty\n -W, --max-width int max width\n -w, --min-width int min width\n -s, --separator string fields/columns separator (default \" \")\n -S, --style string output syle. available vaules: default, plain, simple, 3line, grid,\n light, bold, double. 
check https://github.com/shenwei356/stable\n -x, --wrap-delimiter string delimiter for wrapping cells (default \" \")\n\n
Examples:
default
$ csvtk pretty testdata/names.csv\nid first_name last_name username\n-- ---------- --------- --------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n\n$ csvtk pretty testdata/names.csv -H\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
three-line table (-S 3line)
$ cat testdata/names.csv | csvtk pretty -S 3line\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n id first_name last_name username\n----------------------------------------\n 11 Rob Pike rob\n 2 Ken Thompson ken\n 4 Robert Griesemer gri\n 1 Robert Thompson abc\n NA Robert Abel 123\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n\n$ cat testdata/names.csv | csvtk pretty -S 3line -H\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n id first_name last_name username\n 11 Rob Pike rob\n 2 Ken Thompson ken\n 4 Robert Griesemer gri\n 1 Robert Thompson abc\n NA Robert Abel 123\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n
align right/center for some columns
$ csvtk pretty testdata/names.csv -w 6 -S bold -r 1,username -m first_name \n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 id \u2503 first_name \u2503 last_name \u2503 username \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 11 \u2503 Rob \u2503 Pike \u2503 rob \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 2 \u2503 Ken \u2503 Thompson \u2503 ken \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 4 \u2503 Robert \u2503 Griesemer \u2503 gri \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 1 \u2503 Robert \u2503 Thompson \u2503 abc \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 NA \u2503 Robert \u2503 Abel \u2503 123 \u2503\n\u2517\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u251b\n\n$ csvtk pretty testdata/names.csv -w 6 -S bold -m 1- -r -1\n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 id \u2503 first_name \u2503 last_name \u2503 username \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 11 \u2503 Rob \u2503 Pike \u2503 rob \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 2 \u2503 Ken \u2503 Thompson \u2503 ken 
\u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 4 \u2503 Robert \u2503 Griesemer \u2503 gri \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 1 \u2503 Robert \u2503 Thompson \u2503 abc \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 NA \u2503 Robert \u2503 Abel \u2503 123 \u2503\n\u2517\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u251b\n
custom separator
$ csvtk pretty testdata/names.csv -s \" | \"\nid | first_name | last_name | username\n-- | ---------- | --------- | --------\n11 | Rob | Pike | rob\n2 | Ken | Thompson | ken\n4 | Robert | Griesemer | gri\n1 | Robert | Thompson | abc\nNA | Robert | Abel | 123\n
Set the minimum and maximum width.
$ csvtk pretty testdata/long.csv -w 5 -W 40\nid name message\n----- ------------------ ----------------------------------------\n1 Donec Vitae Quis autem vel eum iure reprehenderit\n qui in ea voluptate velit esse.\n2 Quaerat Voluptatem At vero eos et accusamus et iusto odio.\n3 Aliquam lorem Curabitur ullamcorper ultricies nisi.\n Nam eget dui. Etiam rhoncus. Maecenas\n tempus, tellus eget condimentum\n rhoncus, sem quam semper libero.\n
Clipping cells instead of wrapping
$ csvtk pretty testdata/long.csv -w 5 -W 40 --clip\nid name message\n----- ------------------ ----------------------------------------\n1 Donec Vitae Quis autem vel eum iure reprehenderit...\n2 Quaerat Voluptatem At vero eos et accusamus et iusto odio.\n3 Aliquam lorem Curabitur ullamcorper ultricies nisi....\n
Change the output style
$ csvtk pretty testdata/long.csv -W 40 -S grid\n+----+--------------------+------------------------------------------+\n| id | name | message |\n+====+====================+==========================================+\n| 1 | Donec Vitae | Quis autem vel eum iure reprehenderit |\n| | | qui in ea voluptate velit esse. |\n+----+--------------------+------------------------------------------+\n| 2 | Quaerat Voluptatem | At vero eos et accusamus et iusto odio. |\n+----+--------------------+------------------------------------------+\n| 3 | Aliquam lorem | Curabitur ullamcorper ultricies nisi. |\n| | | Nam eget dui. Etiam rhoncus. Maecenas |\n| | | tempus, tellus eget condimentum |\n| | | rhoncus, sem quam semper libero. |\n+----+--------------------+------------------------------------------+\n
Custom delimiter for wrapping
$ csvtk pretty testdata/lineages.csv -W 60 -x ';' -S light\n\u250c-------\u252c------------------\u252c--------------------------------------------------------------\u2510\n| taxid | name | complete lineage |\n\u251c=======\u253c==================\u253c==============================================================\u2524\n| 9606 | Homo sapiens | cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa; |\n| | | Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata; |\n| | | Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii; |\n| | | Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria; |\n| | | Eutheria;Boreoeutheria;Euarchontoglires;Primates; |\n| | | Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae; |\n| | | Homininae;Homo;Homo sapiens |\n\u251c-------\u253c------------------\u253c--------------------------------------------------------------\u2524\n| 562 | Escherichia coli | cellular organisms;Bacteria;Pseudomonadota; |\n| | | Gammaproteobacteria;Enterobacterales;Enterobacteriaceae; |\n| | | Escherichia;Escherichia coli |\n\u2514-------\u2534------------------\u2534--------------------------------------------------------------\u2518\n
transpose

Usage
transpose CSV data\n\nUsage:\n csvtk transpose [flags]\n\n
Examples
$ cat testdata/digitals.tsv\n4 5 6\n1 2 3\n7 8 0\n8 1,000 4\n\n$ csvtk transpose -t testdata/digitals.tsv\n4 1 7 8\n5 2 8 1,000\n6 3 0 4\n
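An illustrative, untested example: judging from the example above, the header row is transposed like any other row, so for testdata/names.csv the output should look roughly like:

$ csvtk transpose testdata/names.csv
id,11,2,4,1,NA
first_name,Rob,Ken,Robert,Robert,Robert
last_name,Pike,Thompson,Griesemer,Thompson,Abel
username,rob,ken,gri,abc,123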
"},{"location":"usage/#csv2json","title":"csv2json","text":"Usage
convert CSV to JSON format\n\nUsage:\n csvtk csv2json [flags]\n\nFlags:\n -b, --blanks do not convert \"\", \"na\", \"n/a\", \"none\", \"null\", \".\" to null\n -h, --help help for csv2json\n -i, --indent string indent. if given blank, output json in one line. (default \" \")\n -k, --key string output json as an array of objects keyed by a given filed rather than as a\n list. e.g -k 1 or -k columnA\n -n, --parse-num strings parse numeric values for nth column, multiple values are supported and\n \"a\"/\"all\" for all columns\n\n
Examples
test data
$ cat testdata/data4json.csv \nID,room,name,status\n3,G13,Simon,true\n5,103,Anna,TRUE\n1e-3,2,,N/A\n
default operation
$ cat testdata/data4json.csv | csvtk csv2json\n[\n {\n \"ID\": \"3\",\n \"room\": \"G13\",\n \"name\": \"Simon\",\n \"status\": true\n },\n {\n \"ID\": \"5\",\n \"room\": \"103\",\n \"name\": \"Anna\",\n \"status\": true\n },\n {\n \"ID\": \"1e-3\",\n \"room\": \"2\",\n \"name\": null,\n \"status\": null\n }\n]\n
change indent
$ cat testdata/data4json.csv | csvtk csv2json -i \"\"\n[{\"ID\":\"3\",\"room\":\"G13\",\"name\":\"Simon\",\"status\":true},{\"ID\":\"5\",\"room\":\"103\",\"name\":\"Anna\",\"status\":true},{\"ID\":\"1e-3\",\"room\":\"2\",\"name\":null,\"status\":null}]\n
output JSON as an array of objects keyed by a given field rather than as a list.
$ cat testdata/data4json.csv | csvtk csv2json -k ID\n{\n \"3\": {\n \"ID\": \"3\",\n \"room\": \"G13\",\n \"name\": \"Simon\",\n \"status\": true\n },\n \"5\": {\n \"ID\": \"5\",\n \"room\": \"103\",\n \"name\": \"Anna\",\n \"status\": true\n },\n \"1e-3\": {\n \"ID\": \"1e-3\",\n \"room\": \"2\",\n \"name\": null,\n \"status\": null\n }\n}\n
for CSV without header row
$ cat testdata/data4json.csv | csvtk csv2json -H\n[\n [\n \"ID\",\n \"room\",\n \"name\",\n \"status\"\n ],\n [\n \"3\",\n \"G13\",\n \"Simon\",\n \"true\"\n ],\n [\n \"5\",\n \"103\",\n \"Anna\",\n \"TRUE\"\n ],\n [\n \"1e-3\",\n \"2\",\n \"\",\n \"N/A\"\n ]\n]\n
parse numeric values.
# cat testdata/data4json.csv | csvtk csv2json -n all # for all columns\n# cat testdata/data4json.csv | csvtk csv2json -n 1,2 # for multiple columns\n$ cat testdata/data4json.csv | csvtk csv2json -n 1 # for single column\n[\n {\n \"ID\": 3,\n \"room\": \"G13\",\n \"name\": \"Simon\",\n \"status\": true\n },\n {\n \"ID\": 5,\n \"room\": \"103\",\n \"name\": \"Anna\",\n \"status\": true\n },\n {\n \"ID\": 1e-3,\n \"room\": \"2\",\n \"name\": null,\n \"status\": null\n }\n]\n
do not convert \"\", \"na\", \"n/a\", \"none\", \"null\", \".\" to null (just like csvjon --blanks in csvkit)
$ cat testdata/data4json.csv | csvtk csv2json --blanks\n[\n {\n \"ID\": \"3\",\n \"room\": \"G13\",\n \"name\": \"Simon\",\n \"status\": true\n },\n {\n \"ID\": \"5\",\n \"room\": \"103\",\n \"name\": \"Anna\",\n \"status\": true\n },\n {\n \"ID\": \"1e-3\",\n \"room\": \"2\",\n \"name\": \"\",\n \"status\": \"N/A\"\n }\n]\n
values with \"
, \\
, \\n
.
$ cat testdata/data4json2.csv\ntest\nnone\n\"Make America \"\"great\"\" again\"\n\\nations\n\"This is a\nMULTILINE\nstring\"\n\n$ csvtk csv2json testdata/data4json2.csv\n[\n {\n \"test\": null\n },\n {\n \"test\": \"Make America \\\"great\\\" again\"\n },\n {\n \"test\": \"\\\\nations\"\n },\n {\n \"test\": \"This is a\\nMULTILINE\\nstring\"\n }\n]\n
space2tab

Usage
convert space delimited format to TSV\n\nUsage:\n csvtk space2tab [flags]\n\nFlags:\n -b, --buffer-size string size of buffer, supported unit: K, M, G. You need increase the value when\n \"bufio.Scanner: token too long\" error reported (default \"1G\")\n -h, --help help for space2tab\n\n
Examples
$ echo a b | csvtk space2tab\na b\n
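An illustrative, untested pipeline: since space2tab emits TSV, it can be chained with tab2csv (listed in the command overview) to produce CSV; assuming default settings the result should be:

$ echo a b | csvtk space2tab | csvtk tab2csv
a,b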
"},{"location":"usage/#csv2md","title":"csv2md","text":"Usage
convert CSV to markdown format\n\nAttention:\n\n csv2md treats the first row as header line and requires them to be unique\n\nUsage:\n csvtk csv2md [flags]\n\nFlags:\n -a, --alignments string comma separated alignments. e.g. -a l,c,c,c or -a c (default \"l\")\n -w, --min-width int min width (at least 3) (default 3)\n\n
Examples
give single alignment symbol
$ cat testdata/names.csv | csvtk csv2md -a left\n|id |first_name|last_name|username|\n|:--|:---------|:--------|:-------|\n|11 |Rob |Pike |rob |\n|2 |Ken |Thompson |ken |\n|4 |Robert |Griesemer|gri |\n|1 |Robert |Thompson |abc |\n|NA |Robert |Abel |123 |\n
result:
| id | first_name | last_name | username |
|:---|:-----------|:----------|:---------|
| 11 | Rob        | Pike      | rob      |
| 2  | Ken        | Thompson  | ken      |
| 4  | Robert     | Griesemer | gri      |
| 1  | Robert     | Thompson  | abc      |
| NA | Robert     | Abel      | 123      |

give alignment symbols of all fields
$ cat testdata/names.csv | csvtk csv2md -a c,l,l,l\n|id |first_name|last_name|username|\n|:-:|:---------|:--------|:-------|\n|11 |Rob |Pike |rob |\n|2 |Ken |Thompson |ken |\n|4 |Robert |Griesemer|gri |\n|1 |Robert |Thompson |abc |\n|NA |Robert |Abel |123 |\n
result
| id  | first_name | last_name | username |
|:---:|:-----------|:----------|:---------|
| 11  | Rob        | Pike      | rob      |
| 2   | Ken        | Thompson  | ken      |
| 4   | Robert     | Griesemer | gri      |
| 1   | Robert     | Thompson  | abc      |
| NA  | Robert     | Abel      | 123      |

csv2rst

Usage
convert CSV to readable aligned table\n\nAttention:\n\n 1. row span is not supported.\n\nUsage:\n csvtk csv2rst [flags]\n\nFlags:\n -k, --cross string charactor of cross (default \"+\")\n -s, --header string charactor of separator between header row and data rowws (default \"=\")\n -h, --help help for csv2rst\n -b, --horizontal-border string charactor of horizontal border (default \"-\")\n -p, --padding string charactor of padding (default \" \")\n -B, --vertical-border string charactor of vertical border (default \"|\")\n\n
Example
With header row
$ csvtk csv2rst testdata/names.csv \n+----+------------+-----------+----------+\n| id | first_name | last_name | username |\n+====+============+===========+==========+\n| 11 | Rob | Pike | rob |\n+----+------------+-----------+----------+\n| 2 | Ken | Thompson | ken |\n+----+------------+-----------+----------+\n| 4 | Robert | Griesemer | gri |\n+----+------------+-----------+----------+\n| 1 | Robert | Thompson | abc |\n+----+------------+-----------+----------+\n| NA | Robert | Abel | 123 |\n+----+------------+-----------+----------+\n
No header row
$ csvtk csv2rst -H -t testdata/digitals.tsv \n+---+-------+---+\n| 4 | 5 | 6 |\n+---+-------+---+\n| 1 | 2 | 3 |\n+---+-------+---+\n| 7 | 8 | 0 |\n+---+-------+---+\n| 8 | 1,000 | 4 |\n+---+-------+---+\n
Unicode
$ cat testdata/unicode.csv | csvtk csv2rst\n+-------+---------+\n| value | name |\n+=======+=========+\n| 1 | \u6c88\u4f1f |\n+-------+---------+\n| 2 | \u6c88\u4f1fb |\n+-------+---------+\n| 3 | \u6c88\u5c0f\u4f1f |\n+-------+---------+\n| 4 | \u6c88\u5c0f\u4f1fb |\n+-------+---------+\n
Misc
$ cat testdata/names.csv | head -n 1 | csvtk csv2rst \n+----+------------+-----------+----------+\n| id | first_name | last_name | username |\n+====+============+===========+==========+\n\n$ cat testdata/names.csv | head -n 1 | csvtk csv2rst -H\n+----+------------+-----------+----------+\n| id | first_name | last_name | username |\n+----+------------+-----------+----------+\n\n$ echo | csvtk csv2rst -H\n[ERRO] xopen: no content\n\n$ echo \"a\" | csvtk csv2rst -H\n+---+\n| a |\n+---+\n\n$ echo \"\u6c88\u4f1f\" | csvtk csv2rst -H\n+------+\n| \u6c88\u4f1f |\n+------+\n
csv2xlsx

Usage
convert CSV/TSV files to XLSX file\n\nAttention:\n\n 1. Multiple CSV/TSV files are saved as separated sheets in .xlsx file.\n 2. All input files should all be CSV or TSV.\n 3. First rows are freezed unless given '-H/--no-header-row'.\n\nUsage:\n csvtk csv2xlsx [flags]\n\nFlags:\n -f, --format-numbers save numbers in number format, instead of text\n -h, --help help for csv2xlsx\n\n
Examples
Single input
$ csvtk csv2xlsx ../testdata/names.csv -o output.xlsx\n\n# check content\n\n$ csvtk xlsx2csv -a output.xlsx\nindex sheet\n1 Sheet1\n\n$ csvtk xlsx2csv output.xlsx | md5sum \n8e9d38a012cb02279a396a2f2dbbbca9 -\n\n$ csvtk cut -f 1- ../testdata/names.csv | md5sum \n8e9d38a012cb02279a396a2f2dbbbca9 -\n
Merging multiple CSV/TSV files into one .xlsx file.
$ csvtk csv2xlsx ../testdata/names*.csv -o output.xlsx\n\n$ csvtk xlsx2csv -a output.xlsx\nindex sheet\n1 names\n2 names.reorder\n3 names.with-unmatched-colname\n
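Presumably, TSV input also works when the global -t flag is given, e.g. with the header-less digitals.tsv file used earlier (untested sketch):

# -t: input is TSV; -H: input has no header row (untested)
$ csvtk csv2xlsx -t -H ../testdata/digitals.tsv -o digitals.xlsx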
xlsx2csv

Usage
convert XLSX to CSV format\n\nUsage:\n csvtk xlsx2csv [flags]\n\nFlags:\n -h, --help help for xlsx2csv\n -a, --list-sheets list all sheets\n -i, --sheet-index int Nth sheet to retrieve (default 1)\n -n, --sheet-name string sheet to retrieve\n\n
Examples
list all sheets
$ csvtk xlsx2csv ../testdata/accounts.xlsx -a\nindex sheet\n1 names\n2 phones\n3 region\n
retrieve sheet by index
$ csvtk xlsx2csv ../testdata/accounts.xlsx -i 3\nname,region\nken,nowhere\ngri,somewhere\nshenwei,another\nThompson,there\n
retrieve sheet by name
$ csvtk xlsx2csv ../testdata/accounts.xlsx -n region\nname,region\nken,nowhere\ngri,somewhere\nshenwei,another\nThompson,there\n
head

Usage
print first N records\n\nUsage:\n csvtk head [flags]\n\nFlags:\n -n, --number int print first N records (default 10)\n\n
Examples
with header line
$ csvtk head -n 2 testdata/1.csv\nname,attr\nfoo,cool\nbar,handsome\n
no header line
$ csvtk head -H -n 2 testdata/1.csv\nname,attr\nfoo,cool\n
concat

Usage
concatenate CSV/TSV files by rows\n\nNote that the second and later files are concatenated to the first one,\nso only columns match that of the first files kept.\n\nUsage:\n csvtk concat [flags]\n\nFlags:\n -h, --help help for concat\n -i, --ignore-case ignore case (column name)\n -k, --keep-unmatched keep blanks even if no any data of a file matches\n -u, --unmatched-repl string replacement for unmatched data\n\n
Examples
data
$ csvtk pretty names.csv\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n\n$ csvtk pretty names.reorder.csv\nlast_name username id first_name\nPike rob 11 Rob\nThompson ken 2 Ken\nGriesemer gri 4 Robert\nThompson abc 1 Robert\nAbel 123 NA Robert\n\n$ csvtk pretty names.with-unmatched-colname.csv\nid2 First_name Last_name Username col\n22 Rob33 Pike222 rob111 abc\n44 Ken33 Thompson22 ken111 def\n
simple one
$ csvtk concat names.csv names.reorder.csv \\\n | csvtk pretty\nid first_name last_name username\n-- ---------- --------- --------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
data with unmatched column names, ignoring case
$ csvtk concat names.csv names.with-unmatched-colname.csv -i \\\n | csvtk pretty\nid first_name last_name username\n-- ---------- ---------- --------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n Rob33 Pike222 rob111\n Ken33 Thompson22 ken111\n\n $ csvtk concat names.csv names.with-unmatched-colname.csv -i -u Unmached \\\n | csvtk pretty\nid first_name last_name username\n-------- ---------- ---------- --------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\nUnmached Rob33 Pike222 rob111\nUnmached Ken33 Thompson22 ken111\n
Sometimes the data in one file do not match any column; they are discarded by default. You can keep them with the flag -k/--keep-unmatched
$ csvtk concat names.with-unmatched-colname.csv names.csv \\\n | csvtk pretty\nid2 First_name Last_name Username col\n--- ---------- ---------- -------- ---\n22 Rob33 Pike222 rob111 abc\n44 Ken33 Thompson22 ken111 def\n\n$ csvtk concat names.with-unmatched-colname.csv names.csv -k -u NA \\\n | csvtk pretty\nid2 First_name Last_name Username col\n--- ---------- ---------- -------- ---\n22 Rob33 Pike222 rob111 abc\n44 Ken33 Thompson22 ken111 def\nNA NA NA NA NA\nNA NA NA NA NA\nNA NA NA NA NA\nNA NA NA NA NA\nNA NA NA NA NA\n
sample

Usage
sampling by proportion\n\nUsage:\n csvtk sample [flags]\n\nFlags:\n -h, --help help for sample\n -n, --line-number print line number as the first column (\"n\")\n -p, --proportion float sample by proportion\n -s, --rand-seed int rand seed (default 11)\n\n
Examples
$ seq 100 | csvtk sample -H -p 0.5 | wc -l\n46\n\n$ seq 100 | csvtk sample -H -p 0.5 | wc -l\n46\n\n$ seq 100 | csvtk sample -H -p 0.1 | wc -l\n10\n\n$ seq 100 | csvtk sample -H -p 0.05 -n\n50,50\n52,52\n65,65\n
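Presumably, without -H the header row is always written and only the data rows are sampled; the exact rows kept depend on -s/--rand-seed (untested sketch):

# keep the header row and sample roughly half of the data rows (illustrative only)
$ cat testdata/names.csv | csvtk sample -p 0.5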
"},{"location":"usage/#cut","title":"cut","text":"Usage
select and arrange fields\n\nExamples:\n\n 1. Single column\n csvtk cut -f 1\n csvtk cut -f colA\n 2. Multiple columns (replicates allowed)\n csvtk cut -f 1,3,2,1\n csvtk cut -f colA,colB,colA\n 3. Column ranges\n csvtk cut -f 1,3-5 # 1, 3, 4, 5\n csvtk cut -f 3,5- # 3rd col, and 5th col to the end\n csvtk cut -f 1- # for all\n csvtk cut -f 2-,1 # move 1th col to the end\n 4. Unselect\n csvtk cut -f -1,-3 # discard 1st and 3rd column\n csvtk cut -f -1--3 # discard 1st to 3rd column\n csvtk cut -f -2- # discard 2nd and all columns on the right.\n csvtu cut -f -colA,-colB # discard colA and colB\n\nUsage:\n csvtk cut [flags]\n\nFlags:\n -m, --allow-missing-col allow missing column\n -b, --blank-missing-col blank missing column, only for using column fields\n -f, --fields string select only these fields. type \"csvtk cut -h\" for examples\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for cut\n -i, --ignore-case ignore case (column name)\n -u, --uniq-column deduplicate columns matched by multiple fuzzy column names\n\n
Examples
data:
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n
Select columns by column index: csvtk cut -f 1,2
$ cat testdata/names.csv \\\n | csvtk cut -f 1,2\nid,first_name\n11,Rob\n2,Ken\n4,Robert\n1,Robert\nNA,Robert\n\n# select more than once\n$ cat testdata/names.csv \\\n | csvtk cut -f 1,2,2\nid,first_name,first_name\n11,Rob,Rob\n2,Ken,Ken\n4,Robert,Robert\n1,Robert,Robert\nNA,Robert,Robert\n
Select columns by column names: csvtk cut -f first_name,username
$ cat testdata/names.csv \\\n | csvtk cut -f first_name,username\nfirst_name,username\nRob,rob\nKen,ken\nRobert,gri\nRobert,abc\nRobert,123\n\n# select more than once\n$ cat testdata/names.csv \\\n | csvtk cut -f first_name,username,username\nfirst_name,username,username\nRob,rob,rob\nKen,ken,ken\nRobert,gri,gri\nRobert,abc,abc\nRobert,123,123\n
Unselect:
select the 3rd and later columns (i.e., discard the 1st and 2nd): csvtk cut -f -1,-2
$ cat testdata/names.csv \\\n | csvtk cut -f -1,-2\nlast_name,username\nPike,rob\nThompson,ken\nGriesemer,gri\nThompson,abc\nAbel,123\n
select columns except 1-3
$ cat testdata/names.csv \\\n | csvtk cut -f -1--3\nusername\nrob\nken\ngri\nabc\n123\n
select all columns except first_name: csvtk cut -f -first_name
$ cat testdata/names.csv \\\n | csvtk cut -f -first_name\nid,last_name,username\n11,Pike,rob\n2,Thompson,ken\n4,Griesemer,gri\n1,Thompson,abc\nNA,Abel,123\n
Fuzzy fields using wildcard character, csvtk cut -F -f \"*_name,username\"
$ cat testdata/names.csv \\\n | csvtk cut -F -f \"*_name,username\"\nfirst_name,last_name,username\nRob,Pike,rob\nKen,Thompson,ken\nRobert,Griesemer,gri\nRobert,Thompson,abc\nRobert,Abel,123\n
All fields: csvtk cut -F -f \"*\" or csvtk cut -f 1-.
$ cat testdata/names.csv \\\n | csvtk cut -F -f \"*\"\nid,first_name,last_name,username\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n1,Robert,Thompson,abc\nNA,Robert,Abel,123\n
Field ranges (see the help message of csvtk cut for more examples)
csvtk cut -f 2-4 for columns 2, 3 and 4
$ cat testdata/names.csv \\\n | csvtk cut -f 2-4\nfirst_name,last_name,username\nRob,Pike,rob\nKen,Thompson,ken\nRobert,Griesemer,gri\nRobert,Thompson,abc\nRobert,Abel,123\n
csvtk cut -f -3--1 for discarding columns 1, 2 and 3
# or -f -1--3\n$ cat testdata/names.csv \\\n | csvtk cut -f -3--1\nusername\nrob\nken\ngri\nabc\n123\n
csvtk cut -f 2-,1 for moving the 1st column to the end
$ cat testdata/names.csv \\\n | csvtk cut -f 2-,1\nfirst_name,last_name,username,id\nRob,Pike,rob,11\nKen,Thompson,ken,2\nRobert,Griesemer,gri,4\nRobert,Thompson,abc,1\nRobert,Abel,123,NA\n
csvtk cut -f 1,1 for duplicating columns
$ cat testdata/names.csv \\\n | csvtk cut -f 1,1\nid,id\n11,11\n2,2\n4,4\n1,1\nNA,NA\n
uniq

Usage
unique data without sorting\n\nUsage:\n csvtk uniq [flags]\n\nFlags:\n -f, --fields string select these fields as keys. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for uniq\n -i, --ignore-case ignore case\n -n, --keep-n int keep at most N records for a key (default 1)\n\n
Examples:
data:
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n
unique first_name (it removes rows with duplicated first_name)
$ cat testdata/names.csv \\\n | csvtk uniq -f first_name\nid,first_name,last_name,username\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n
unique first_name, a more common way
$ cat testdata/names.csv \\\n | csvtk cut -f first_name \\\n | csvtk uniq -f 1\nfirst_name\nRob\nKen\nRobert\n
keep top 2 items for every group.
$ cat testdata/players.csv \ngender,id,name\nmale,1,A\nmale,2,B\nmale,3,C\nfemale,11,a\nfemale,12,b\nfemale,13,c\nfemale,14,d\n\n$ cat testdata/players.csv \\\n | csvtk sort -k gender:N -k id:nr \\\n | csvtk uniq -f gender -n 2\ngender,id,name\nfemale,14,d\nfemale,13,c\nmale,3,C\nmale,2,B\n
freq

Usage
frequencies of selected fields\n\nUsage:\n csvtk freq [flags]\n\nFlags:\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -i, --ignore-case ignore case\n -r, --reverse reverse order while sorting\n -n, --sort-by-freq sort by frequency\n -k, --sort-by-key sort by key\n\n
Examples
one field
$ cat testdata/names.csv \\\n | csvtk freq -f first_name | csvtk pretty\nfirst_name frequency\nKen 1\nRob 1\nRobert 3\n
sort by frequency. You can also use csvtk sort with more sorting options
$ cat testdata/names.csv \\\n | csvtk freq -f first_name -n -r \\\n | csvtk pretty\nfirst_name frequency\nRobert 3\nKen 1\nRob 1\n
sort by key
$ cat testdata/names.csv \\\n | csvtk freq -f first_name -k \\\n | csvtk pretty\nfirst_name frequency\nKen 1\nRob 1\nRobert 3\n
multiple fields
$ cat testdata/names.csv \\\n | csvtk freq -f first_name,last_name \\\n | csvtk pretty\nfirst_name last_name frequency\nRobert Abel 1\nKen Thompson 1\nRob Pike 1\nRobert Thompson 1\nRobert Griesemer 1\n
data without header row
$ cat testdata/digitals.tsv \\\n | csvtk -t -H freq -f 1\n8 1\n1 1\n4 1\n7 1\n
inter

Usage
intersection of multiple files\n\nAttention:\n\n 1. fields in all files should be the same, \n if not, extracting to another file using \"csvtk cut\".\n\nUsage:\n csvtk inter [flags]\n\nFlags:\n -f, --fields string select these fields as the key. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -i, --ignore-case ignore case\n\n
Examples:
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ cat testdata/region.csv\nname,region\nken,nowhere\ngri,somewhere\nshenwei,another\nThompson,there\n\n$ csvtk inter testdata/phones.csv testdata/region.csv\nusername\ngri\nken\nshenwei\n
"},{"location":"usage/#grep","title":"grep","text":"Usage
grep data by selected fields with patterns/regular expressions\n\nAttentions:\n\n 1. By default, we directly compare the column value with patterns,\n use \"-r/--use-regexp\" for partly matching.\n 2. Multiple patterns can be given by setting '-p/--pattern' more than once,\n or giving comma separated values (CSV formats).\n Therefore, please use double quotation marks for patterns containing\n comma, e.g., -p '\"A{2,}\"'\n\nUsage:\n csvtk grep [flags]\n\nFlags:\n --delete-matched delete a pattern right after being matched, this keeps the firstly matched\n data and speedups when using regular expressions\n -f, --fields string comma separated key fields, column name or index. e.g. -f 1-3 or -f id,id2\n or -F -f \"group*\" (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for grep\n -i, --ignore-case ignore case\n --immediate-output print output immediately, do not use write buffer\n -v, --invert invert match\n -n, --line-number print line number as the first column (\"n\")\n -N, --no-highlight no highlight\n -p, --pattern strings query pattern (multiple values supported). Attention: use double quotation\n marks for patterns containing comma, e.g., -p '\"A{2,}\"'\n -P, --pattern-file string pattern files (one pattern per line)\n -r, --use-regexp patterns are regular expression\n --verbose verbose output\n\n
Examples
Matched parts will be highlighted.
By exact keys
$ cat testdata/names.csv \\\n | csvtk grep -f last_name -p Pike -p Abel \\\n | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\nNA Robert Abel 123\n\n# another form of multiple keys \n$ csvtk grep -f last_name -p Pike,Abel,Tom\n
By regular expression: csvtk grep -f first_name -r -p Rob
$ cat testdata/names.csv \\\n | csvtk grep -f first_name -r -p Rob \\\n | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
By pattern list
$ csvtk grep -f first_name -P name_list.txt\n
Remove rows containing any missing data (NA):
$ csvtk grep -F -f \"*\" -r -p \"^$\" -v\n
Show line number
$ cat names.csv \\\n | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n\n$ cat names.csv \\\n | csvtk grep -f first_name -r -i -p rob -n \\\n | csvtk pretty\nrow id first_name last_name username\n--- -- ---------- --------- --------\n1 11 Rob Pike rob\n3 4 Robert Griesemer gri\n4 1 Robert Thompson abc\n5 NA Robert Abel 123\n
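An illustrative, untested example: combining -v/--invert with an exact pattern keeps only the rows that do not match, so for the names.csv data above the result should be:

$ cat testdata/names.csv \
    | csvtk grep -f last_name -p Thompson -v
id,first_name,last_name,username
11,Rob,Pike,rob
4,Robert,Griesemer,gri
NA,Robert,Abel,123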
filter

Usage
filter rows by values of selected fields with arithmetic expression\n\nUsage:\n csvtk filter [flags]\n\nFlags:\n --any print record if any of the field satisfy the condition\n -f, --filter string filter condition. e.g. -f \"age>12\" or -f \"1,3<=2\" or -F -f \"c*!=0\"\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for filter\n -n, --line-number print line number as the first column (\"n\")\n\n
Examples
single field
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n\n$ cat testdata/names.csv \\\n | csvtk filter -f \"id>0\" \\\n | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\n
multiple fields
$ cat testdata/digitals.tsv\n4 5 6\n1 2 3\n7 8 0\n8 1,000 4\n\n$ cat testdata/digitals.tsv \\\n | csvtk -t -H filter -f \"1-3>0\"\n4 5 6\n1 2 3\n8 1,000 4\n
using --any to print a record if any of the fields satisfies the condition
$ cat testdata/digitals.tsv \\\n | csvtk -t -H filter -f \"1-3>0\" --any\n4 5 6\n1 2 3\n7 8 0\n8 1,000 4\n
fuzzy fields
$ cat testdata/names.csv \\\n | csvtk filter -F -f \"i*!=0\"\nid,first_name,last_name,username\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n1,Robert,Thompson,abc\n
filter2

Usage
filter rows by awk-like arithmetic/string expressions\n\nThe arithmetic/string expression is supported by:\n\n https://github.com/Knetic/govaluate\n\nVariables formats:\n $1 or ${1} The first field/column\n $a or ${a} Column \"a\"\n ${a,b} or ${a b} or ${a (b)} Column name with special charactors,\n e.g., commas, spaces, and parentheses\n\nSupported operators and types:\n\n Modifiers: + - / * & | ^ ** % >> <<\n Comparators: > >= < <= == != =~ !~ in\n Logical ops: || &&\n Numeric constants, as 64-bit floating point (12345.678)\n String constants (single quotes: 'foobar')\n Date constants (single quotes)\n Boolean constants: true false\n Parenthesis to control order of evaluation ( )\n Arrays (anything separated by , within parenthesis: (1, 2, 'foo'))\n Prefixes: ! - ~\n Ternary conditional: ? :\n Null coalescence: ??\n\nCustom functions:\n - len(), length of strings, e.g., len($1), len($a), len($1, $2)\n - ulen(), length of unicode strings/width of unicode strings rendered\n to a terminal, e.g., len(\"\u6c88\u4f1f\")==6, ulen(\"\u6c88\u4f1f\")==4\n\nUsage:\n csvtk filter2 [flags]\n\nFlags:\n -f, --filter string awk-like filter condition. e.g. '$age>12' or '$1 > $3' or '$name==\"abc\"' or\n '$1 % 2 == 0'\n -h, --help help for filter2\n -n, --line-number print line number as the first column (\"n\")\n -s, --numeric-as-string treat even numeric fields as strings to avoid converting big numbers into\n scientific notation\n\n
Examples:
filter rows with id greater than 3:
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n\n$ cat testdata/names.csv \\\n | csvtk filter2 -f '$id > 3'\nid,first_name,last_name,username\n11,Rob,Pike,rob\n4,Robert,Griesemer,gri\n
arithmetic and string expressions
$ cat testdata/names.csv \\\n | csvtk filter2 -f '$id > 3 || $username==\"ken\"'\nid,first_name,last_name,username\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n
More arithmetic expressions
$ cat testdata/digitals.tsv\n4 5 6\n1 2 3\n7 8 0\n8 1,000 4\n\n$ cat testdata/digitals.tsv \\\n | csvtk filter2 -H -t -f '$1 > 2 && $2 % 2 == 0'\n7 8 0\n8 1,000 4\n\n# comparison between fields and support\n$ cat testdata/digitals.tsv \\\n | csvtk filter2 -H -t -f '$2 <= $3 || ( $1 / $2 > 0.5 )'\n4 5 6\n1 2 3\n7 8 0\n
Array expressions using in, numeric or string (case sensitive)
$ cat testdata/names.csv | csvtk filter2 -f '$first_name in (\"Ken\", \"Rob\", \"robert\")'\nid,first_name,last_name,username\\\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n\n$ cat testdata/names.csv | csvtk filter2 -f '$id in (2, 4)'\nid,first_name,last_name,username\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n\n# negate by wrapping entire expression in `!()`\n$ cat testdata/names.csv | csvtk filter2 -f '!($username in (\"rob\", \"ken\"))'\nid,first_name,last_name,username\n4,Robert,Griesemer,gri\n1,Robert,Thompson,abc\nNA,Robert,Abel,123\n
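An illustrative, untested sketch of the custom len() function mentioned in the help above; in names.csv only "Griesemer" is longer than 8 characters, so the expected output is:

$ cat testdata/names.csv | csvtk filter2 -f 'len($last_name) > 8'
id,first_name,last_name,username
4,Robert,Griesemer,gri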
join

Usage
join files by selected fields (inner, left and outer join).\n\nAttention:\n\n 1. Multiple keys supported\n 2. Default operation is inner join, use --left-join for left join\n and --outer-join for outer join.\n\nUsage:\n csvtk join [flags]\n\nAliases:\n join, merge\n\nFlags:\n -f, --fields string Semicolon separated key fields of all files, if given one, we think all the\n files have the same key columns. Fields of different files should be separated\n by \";\", e.g -f \"1;2\" or -f \"A,B;C,D\" or -f id (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for join\n -i, --ignore-case ignore case\n -n, --ignore-null do not match NULL values\n -k, --keep-unmatched keep unmatched data of the first file (left join)\n -L, --left-join left join, equals to -k/--keep-unmatched, exclusive with --outer-join\n --na string content for filling NA data\n -P, --only-duplicates add filenames as colname prefixes or add custom suffixes only for duplicated\n colnames\n -O, --outer-join outer join, exclusive with --left-join\n -p, --prefix-filename add each filename as a prefix to each colname. if there's no header row, we'll\n add one\n -e, --prefix-trim-ext trim extension when adding filename as colname prefix\n -s, --suffix strings add suffixes to colnames from each file\n\n
Examples:
data
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ cat testdata/region.csv\nname,region\nken,nowhere\ngri,somewhere\nshenwei,another\nThompson,there\n
All files have same key column: csvtk join -f id file1.csv file2.csv
$ csvtk join -f 1 testdata/phones.csv testdata/region.csv \\\n | csvtk pretty\nusername phone region\ngri 11111 somewhere\nken 22222 nowhere\nshenwei 999999 another\n
keep unmatched (left join)
$ csvtk join -f 1 testdata/phones.csv testdata/region.csv --left-join \\\n | csvtk pretty\nusername phone region\ngri 11111 somewhere\nrob 12345 \nken 22222 nowhere\nshenwei 999999 another\n
keep unmatched and fill with something
$ csvtk join -f 1 testdata/phones.csv testdata/region.csv --left-join --na NA \\\n | csvtk pretty\nusername phone region\ngri 11111 somewhere\nrob 12345 NA\nken 22222 nowhere\nshenwei 999999 another\n
Outer join
$ csvtk join -f 1 testdata/phones.csv testdata/region.csv --outer-join --na NA \\\n | csvtk pretty\nusername phone region\ngri 11111 somewhere\nrob 12345 NA\nken 22222 nowhere\nshenwei 999999 another\nThompson NA there\n
Files have different key columns: csvtk join -f \"username;username;name\" testdata/names.csv phone.csv address.csv -k. Note that fields of different files are separated with \";\", not \",\".
$ csvtk join -f \"username;name\" testdata/phones.csv testdata/region.csv --left-join --na NA \\\n | csvtk pretty\nusername phone region\ngri 11111 somewhere\nrob 12345 NA\nken 22222 nowhere\nshenwei 999999 another\n
Adding each filename as a prefix to each colname
$ cat testdata/1.csv \nname,attr\nfoo,cool\nbar,handsome\nbob,beutiful\n\n$ cat testdata/2.csv \nname,major\nbar,bioinformatics\nbob,microbiology\nbob,computer science\n\n$ csvtk join testdata/{1,2}.csv \\\n | csvtk pretty \nname attr major\n---- -------- -----------------\nbar handsome bioinformatics\nbob beutiful microbiology\nbob beutiful computer science\n\n$ csvtk join testdata/{1,2}.csv --prefix-filename \\\n | csvtk pretty \nname 1.csv-attr 2.csv-major\n---- ---------- -----------------\nbar handsome bioinformatics\nbob beutiful microbiology\nbob beutiful computer science\n\n# trim the file extention\n$ csvtk join testdata/{1,2}.csv --prefix-filename --prefix-trim-ext \\\n | csvtk pretty \nname 1-attr 2-major\n---- -------- -----------------\nbar handsome bioinformatics\nbob beutiful microbiology\nbob beutiful computer science\n
Adding each filename as a prefix to each colname for data without header row
$ cat testdata/A.f.csv \na,x,1\nb,y,2\n\n$ cat testdata/B.f.csv \na,x,3\nb,y,4\n\n$ cat testdata/C.f.csv \na,x,5\nb,y,6\n\n$ csvtk join -H testdata/{A,B,C}.f.csv \\\n | csvtk pretty -H\na x 1 x 3 x 5\nb y 2 y 4 y 6\n\n$ csvtk join -H testdata/{A,B,C}.f.csv -p \\\n | csvtk pretty\nkey1 A.f.csv A.f.csv B.f.csv B.f.csv C.f.csv C.f.csv\n---- ------- ------- ------- ------- ------- -------\na x 1 x 3 x 5\nb y 2 y 4 y 6\n\n# trim file extention\n$ csvtk join -H testdata/{A,B,C}.f.csv -p -e \\\n | csvtk pretty\nkey1 A.f A.f B.f B.f C.f C.f\n---- --- --- --- --- --- ---\na x 1 x 3 x 5\nb y 2 y 4 y 6\n\n# use column 1 and 2 as keys\n$ csvtk join -H testdata/{A,B,C}.f.csv -p -e -f 1,2 \\\n | csvtk pretty\nkey1 key2 A.f B.f C.f\n---- ---- --- --- ---\na x 1 3 5\nb y 2 4 6\n\n# change column names furthor\n$ csvtk join -H testdata/{A,B,C}.f.csv -p -e -f 1,2 \\\n | csvtk rename2 -F -f '*' -p '\\.f$' \\\n | csvtk pretty\nkey1 key2 A B C\n---- ---- - - -\na x 1 3 5\nb y 2 4 6\n
add suffixes to colnames from each file (-s/--suffix)
$ csvtk join -H testdata/{A,B,C}.f.csv -s A,B,C \\\n | csvtk pretty\nkey1 c2-A c3-A c2-B c3-B c2-C c3-C\n---- ---- ---- ---- ---- ---- ----\na x 1 x 3 x 5\nb y 2 y 4 y 6\n
split

Usage
split CSV/TSV into multiple files according to column values\n\nNotes:\n\n 1. flag -o/--out-file can specify out directory for splitted files.\n 2. flag -s/--prefix-as-subdir can create subdirectories with prefixes of\n keys of length X, to avoid writing too many files in the output directory.\n\nUsage:\n csvtk split [flags]\n\nFlags:\n -g, --buf-groups int buffering N groups before writing to file (default 100)\n -b, --buf-rows int buffering N rows for every group before writing to file (default 100000)\n -f, --fields string comma separated key fields, column name or index. e.g. -f 1-3 or -f\n id,id2 or -F -f \"group*\" (default \"1\")\n --force overwrite existing output directory (given by -o).\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for split\n -i, --ignore-case ignore case\n -G, --out-gzip force output gzipped file\n -p, --out-prefix string output file prefix, the default value is the input file. use -p \"\" to\n disable outputting prefix\n -s, --prefix-as-subdir int create subdirectories with prefixes of keys of length X, to avoid writing\n too many files in the output directory\n\n
Examples
Test data
$ cat names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n
split according to first_name
$ csvtk split names.csv -f first_name\n$ ls *.csv\nnames.csv names-Ken.csv names-Rob.csv names-Robert.csv\n\n$ cat names-Ken.csv\nid,first_name,last_name,username\n2,Ken,Thompson,ken\n\n$ cat names-Rob.csv\nid,first_name,last_name,username\n11,Rob,Pike,rob\n\n$ cat names-Robert.csv\nid,first_name,last_name,username\n4,Robert,Griesemer,gri\n1,Robert,Thompson,abc\nNA,Robert,Abel,123\n
split according to first_name and last_name
$ csvtk split names.csv -f first_name,last_name\n$ ls *.csv\nnames.csv names-Robert-Abel.csv names-Robert-Thompson.csv\nnames-Ken-Thompson.csv names-Robert-Griesemer.csv names-Rob-Pike.csv\n
the flag -o/--out-file can specify the output directory for the split files
$ seq 10000 | csvtk split -H -o result\n$ ls result/*.csv | wc -l\n10000\n
Do not output the prefix: use -p \"\".
$ echo -ne \"1,ACGT\\n2,GGCA\\n3,ACAAC\\n\"\n1,ACGT\n2,GGCA\n3,ACAAC\n\n$ echo -ne \"1,ACGT\\n2,GGCA\\n3,ACAAC\\n\" | csvtk split -H -f 2 -o t -p \"\" -s 3 --force\n\n$ tree t\nt\n\u251c\u2500\u2500 ACA\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ACAAC.csv\n\u251c\u2500\u2500 ACG\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ACGT.csv\n\u2514\u2500\u2500 GGC\n \u2514\u2500\u2500 GGCA.csv\n\n4 directories, 3 files\n
extreme example 1: lots (10M) of rows in a single group
$ yes 2 | head -n 10000000 | gzip -c > t.gz\n\n$ memusg -t csvtk -H split t.gz\nelapsed time: 5.859s\npeak rss: 41.45 MB\n\n# check\n$ zcat t-2.gz | wc -l\n10000000\n$ zcat t-2.gz | md5sum\nf194afd7cecf645c0e3cce50c9bc526e -\n$ zcat t.gz | md5sum\nf194afd7cecf645c0e3cce50c9bc526e -\n
extreme example 2: lots (10K) of groups
$ seq 10000 | gzip -c > t2.gz\n\n$ memusg -t csvtk -H split t2.gz -o t2\nelapsed time: 20.856s\npeak rss: 23.77 MB\n\n# check\n$ ls t2/*.gz | wc -l\n10000\n$ zcat t2/*.gz | sort -k 1,1n | md5sum\n72d4ff27a28afbc066d5804999d5a504 -\n$ zcat t2.gz | md5sum\n72d4ff27a28afbc066d5804999d5a504 -\n
Since v0.31.0, the flag -s/--prefix-as-subdir can create subdirectories named with key prefixes of length X, to avoid writing too many files in the output directory.
$ memusg -t csvtk -H split t2.gz -o t2 -s 3\nelapsed time: 2.668s\npeak rss: 1.79 GB\n
$ fd .gz$ t2 | rush 'zcat {}' | sort -k 1,1n | md5sum
72d4ff27a28afbc066d5804999d5a504 -
$ tree t2/ | more\nt2/\n\u251c\u2500\u2500 100\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-10000.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1000.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1001.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1002.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1003.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1004.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1005.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1006.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1007.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1008.gz\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 t2-1009.gz\n\u251c\u2500\u2500 101\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1010.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1011.gz\n...\n\u251c\u2500\u2500 t2-994.gz\n\u251c\u2500\u2500 t2-995.gz\n\u251c\u2500\u2500 t2-996.gz\n\u251c\u2500\u2500 t2-997.gz\n\u251c\u2500\u2500 t2-998.gz\n\u251c\u2500\u2500 t2-999.gz\n\u251c\u2500\u2500 t2-99.gz\n\u2514\u2500\u2500 t2-9.gz\n\n901 directories, 10000 files\n
splitxlsx

Usage
split XLSX sheet into multiple sheets according to column values\n\nStrengths: Sheet properties are remained unchanged.\nWeakness : Complicated sheet structures are not well supported, e.g.,\n 1. merged cells\n 2. more than one header row\n\nUsage:\n csvtk splitxlsx [flags]\n\nFlags:\n -f, --fields string comma separated key fields, column name or index. e.g. -f 1-3 or -f id,id2\n or -F -f \"group*\" (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for splitxlsx\n -i, --ignore-case ignore case (cell value)\n -a, --list-sheets list all sheets\n -N, --sheet-index int Nth sheet to retrieve (default 1)\n -n, --sheet-name string sheet to retrieve\n\n
Examples
example data
# list all sheets\n$ csvtk xlsx2csv -a accounts.xlsx\nindex sheet\n1 names\n2 phones\n3 region\n\n# data of sheet \"names\"\n$ csvtk xlsx2csv accounts.xlsx | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
split sheet \"names\" according to first_name
$ csvtk splitxlsx accounts.xlsx -n names -f first_name\n\n$ ls accounts.*\naccounts.split.xlsx accounts.xlsx\n\n$ csvtk splitxlsx -a accounts.split.xlsx\nindex sheet\n1 names\n2 phones\n3 region\n4 Rob\n5 Ken\n6 Robert\n\n$ csvtk xlsx2csv accounts.split.xlsx -n Rob \\\n | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n\n$ csvtk xlsx2csv accounts.split.xlsx -n Robert \\\n | csvtk pretty\nid first_name last_name username\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
comb

Usage
compute combinations of items at every row\n\nUsage:\n csvtk comb [flags]\n\nAliases:\n comb, combination\n\nFlags:\n -h, --help help for comb\n -i, --ignore-case ignore-case\n -S, --nat-sort sort items in natural order\n -n, --number int number of items in a combination, 0 for no limit, i.e., return all combinations\n (default 2)\n -s, --sort sort items in a combination\n\n
Examples:
$ cat players.csv \ngender,id,name\nmale,1,A\nmale,2,B\nmale,3,C\nfemale,11,a\nfemale,12,b\nfemale,13,c\nfemale,14,d\n\n# put names of one group in one row\n$ cat players.csv \\\n | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk cut -f 2 \nname\nA;B;C\na;b;c;d\n\n# n = 2\n$ cat players.csv \\\n | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk cut -f 2 \\\n | csvtk comb -d ';' -n 2\nA,B\nA,C\nB,C\na,b\na,c\nb,c\na,d\nb,d\nc,d\n\n# n = 3\n$ cat players.csv \\\n | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk cut -f 2 \\\n | csvtk comb -d ';' -n 3\nA,B,C\na,b,c\na,b,d\na,c,d\nb,c,d\n\n# n = 0\n$ cat players.csv \\\n | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk cut -f 2 \\\n | csvtk comb -d ';' -n 0\nA\nB\nA,B\nC\nA,C\nB,C\nA,B,C\na\nb\na,b\nc\na,c\nb,c\na,b,c\nd\na,d\nb,d\na,b,d\nc,d\na,c,d\nb,c,d\na,b,c,d\n\n
"},{"location":"usage/#fix","title":"fix","text":"Usage
fix CSV/TSV with different numbers of columns in rows\n\nHow to:\n 1. First -n/--buf-rows rows are read to check the maximum number of columns.\n The default value 0 means all rows will be read.\n 2. Buffered and remaining rows with fewer columns are appended with empty\n cells before output.\n 3. An error will be reported if the number of columns of any remaining row\n is larger than the maximum number of columns.\n\nUsage:\n csvtk fix [flags]\n\nFlags:\n -n, --buf-rows int the number of rows to determine the maximum number of columns. 0 for all rows.\n -h, --help help for fix\n\n\n
Examples
$ cat testdata/unequal_ncols.csv\nid,first_name,last_name\n11,\"Rob\",\"Pike\"\n2,Ken,Thompson\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\"\n\n\n$ cat testdata/unequal_ncols.csv | csvtk pretty\n[ERRO] record on line 4: wrong number of fields\n\n\n\n$ cat testdata/unequal_ncols.csv | csvtk fix | csvtk pretty -S grid\n[INFO] the maximum number of columns in all 6 rows: 4\n+----+------------+-----------+-----+\n| id | first_name | last_name | |\n+====+============+===========+=====+\n| 11 | Rob | Pike | |\n+----+------------+-----------+-----+\n| 2 | Ken | Thompson | |\n+----+------------+-----------+-----+\n| 4 | Robert | Griesemer | gri |\n+----+------------+-----------+-----+\n| 1 | Robert | Thompson | abc |\n+----+------------+-----------+-----+\n| NA | Robert | | |\n+----+------------+-----------+-----+\n\n
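For very large files, the default -n 0 buffers every row before output; a smaller buffer trades memory for the limitation described in the help above. A minimal sketch, assuming a hypothetical big.csv:
# rows after the buffered ones that turn out to be wider than the detected maximum will raise an error\n$ cat big.csv | csvtk fix -n 10000 | gzip -c > big.fixed.csv.gz\n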
"},{"location":"usage/#fix-quotes","title":"fix-quotes","text":"Usage
fix malformed CSV/TSV caused by double-quotes\n\nThis command fixes fields not appropriately enclosed by double-quotes\nto meet the RFC4180 standard (https://rfc-editor.org/rfc/rfc4180.html).\n\nWhen and how to:\n 1. Values containing bare double quotes. e.g.,\n a,abc\" xyz,d\n Error information: bare \" in non-quoted-field.\n Fix: adding the flag -l/--lazy-quotes.\n Using this command:\n a,abc\" xyz,d -> a,\"abc\"\" xyz\",d\n 2. Values with double quotes in the begining but not in the end. e.g.,\n a,\"abc\" xyz,d\n Error information: extraneous or missing \" in quoted-field.\n Using this command:\n a,\"abc\" xyz,d -> a,\"\"\"abc\"\" xyz\",d\n\nNext:\n 1. You can process the data without the flag -l/--lazy-quotes.\n 2. Use 'csvtk del-quotes' if you want to restore the original format.\n\nLimitation:\n 1. Values containing line breaks are not supported.\n\nUsage:\n csvtk fix-quotes [flags]\n\nFlags:\n -b, --buffer-size string size of buffer, supported unit: K, M, G. You need increase the value when\n \"bufio.Scanner: token too long\" error reported (default \"1G\")\n -h, --help help for fix-quotes\n\n
Examples:
Test data, in which there are five cases with values containing double quotes.
$ cat testdata/malformed.tsv\n1 Cellvibrio no quotes & not tab\n2 \"Cellvibrio gilvus\" quotes can be removed\n3 \"quotes required\" quotes needed (with a tab in the cell)\n4 fake\" record bare double-quote in non-quoted-field\n5 \"Cellvibrio\" Winogradsky only with doub-quote in the beginning\n6 fake record2\" \"only with doub-quote in the end\"\n\n$ cat testdata/malformed.tsv | csvtk cut -f 1-\n[ERRO] parse error on line 2, column 3: bare \" in non-quoted-field\n\n# -l does not work, and it's messed up.\n$ cat testdata/malformed.tsv | csvtk cut -f 1- -l\n1 Cellvibrio no quotes & not tab\n\"2 \"\"Cellvibrio gilvus\"\" quotes can be removed\"\n\"3 \"\"quotes required\"\" quotes needed (with a tab in the cell)\"\n\"4 fake\"\" record bare double-quote in non-quoted-field\"\n\"5 \"\"Cellvibrio\"\" Winogradsky only with doub-quote in the beginning\"\n\"6 fake record2\"\" \"\"only with doub-quote in the end\"\"\"\n
Fix it!!!
$ cat testdata/malformed.tsv | csvtk fix-quotes -t\n1 Cellvibrio no quotes & not tab\n2 \"Cellvibrio gilvus\" quotes can be removed\n3 \"quotes required\" quotes needed (with a tab in the cell)\n4 \"fake\"\" record\" bare double-quote in non-quoted-field\n5 \"\"\"Cellvibrio\"\" Winogradsky\" only with doub-quote in the beginning\n6 \"fake record2\"\"\" \"only with doub-quote in the end\"\n\n# pretty\n$ cat testdata/malformed.tsv | csvtk fix-quotes -t | csvtk pretty -Ht -S grid\n+---+--------------------------+----------------------------------------+\n| 1 | Cellvibrio | no quotes & not tab |\n+---+--------------------------+----------------------------------------+\n| 2 | Cellvibrio gilvus | quotes can be removed |\n+---+--------------------------+----------------------------------------+\n| 3 | quotes required | quotes needed (with a tab in the cell) |\n+---+--------------------------+----------------------------------------+\n| 4 | fake\" record | bare double-quote in non-quoted-field |\n+---+--------------------------+----------------------------------------+\n| 5 | \"Cellvibrio\" Winogradsky | only with doub-quote in the beginning |\n+---+--------------------------+----------------------------------------+\n| 6 | fake record2\" | only with doub-quote in the end |\n+---+--------------------------+----------------------------------------+\n\n# do something, like searching rows containing double-quotes.\n# since the command-line argument parser csvtk uses parse the value of flag -p\n# as CSV data, we have to use -p '\"\"\"\"' to represents one double-quotes,\n# where the outter two double quotes are used to quote the value,\n# and the two inner double-quotes actually means an escaped double-quote\n#\n$ cat testdata/malformed.tsv \\\n | csvtk fix-quotes -t \\\n | csvtk grep -Ht -f 2 -r -p '\"\"\"\"'\n4 \"fake\"\" record\" bare double-quote in non-quoted-field\n5 \"\"\"Cellvibrio\"\" Winogradsky\" only with doub-quote in the beginning\n6 \"fake record2\"\"\" only with doub-quote in the end\n
Note that fixed rows are different from the original ones; you can use csvtk del-quotes
to reset them.
$ cat testdata/malformed.tsv \\\n | csvtk fix-quotes -t \\\n | csvtk filter2 -t -f '$1 > 0' \\\n | csvtk del-quotes -t\n1 Cellvibrio no quotes & not tab\n2 Cellvibrio gilvus quotes can be removed\n3 \"quotes required\" quotes needed (with a tab in the cell)\n4 fake\" record bare double-quote in non-quoted-field\n5 \"Cellvibrio\" Winogradsky only with doub-quote in the beginning\n6 fake record2\" only with doub-quote in the end\n
Usage
remove extra double quotes added by 'fix-quotes'\n\nLimitation:\n 1. Values containing line breaks are not supported.\n\nUsage:\n csvtk del-quotes [flags]\n\nFlags:\n -h, --help help for del-quotes\n
Examples: see the examples of fix-quotes
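A minimal sketch using the malformed.tsv test file from the fix-quotes section: fix the quoting for processing, then strip the added quotes again (restored.tsv is an arbitrary output name):
$ cat testdata/malformed.tsv | csvtk fix-quotes -t | csvtk del-quotes -t > restored.tsv\n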
"},{"location":"usage/#add-header","title":"add-header","text":"Usage
add column names\n\nUsage:\n csvtk add-header [flags]\n\nFlags:\n -h, --help help for add-header\n -n, --names strings column names to add, in CSV format\n\n
Examples:
No new colnames given:
$ seq 3 | csvtk mutate -H \\\n | csvtk add-header\n[WARN] colnames not given, c1, c2, c3... will be used\nc1,c2\n1,1\n2,2\n3,3\n
Adding new colnames:
$ seq 3 | csvtk mutate -H \\\n | csvtk add-header -n a,b\na,b\n1,1\n2,2\n3,3\n$ seq 3 | csvtk mutate -H \\\n | csvtk add-header -n a -n b\na,b\n1,1\n2,2\n3,3\n\n$ seq 3 | csvtk mutate -H -t \\\n | csvtk add-header -t -n a,b\na b\n1 1\n2 2\n3 3\n
Usage
delete column names\n\nAttention:\n  1. It deletes the first line of every input file.\n\nUsage:\n  csvtk del-header [flags]\n\nFlags:\n  -h, --help   help for del-header\n\n
Examples:
$ seq 3 | csvtk add-header\nc1\n1\n2\n3\n\n$ seq 3 | csvtk add-header | csvtk del-header\n1\n2\n3\n\n$ seq 3 | csvtk del-header -H\n1\n2\n3\n
"},{"location":"usage/#rename","title":"rename","text":"Usage
rename column names with new names\n\nUsage:\n csvtk rename [flags]\n\nFlags:\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -n, --names string comma separated new names\n\n
Examples:
Setting new names: csvtk rename -f A,B -n a,b
or csvtk rename -f 1-3 -n a,b,c
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ cat testdata/phones.csv \\\n | csvtk rename -f 1-2 -n \u59d3\u540d,\u7535\u8bdd \\\n | csvtk pretty \n\u59d3\u540d \u7535\u8bdd\ngri 11111\nrob 12345\nken 22222\nshenwei 999999\n
Fields can also be given in any order
$ cat testdata/phones.csv \\\n | csvtk rename -f 2,1 -n \u7535\u8bdd,\u59d3\u540d \\\n | csvtk pretty\n\u59d3\u540d \u7535\u8bdd\ngri 11111\nrob 12345\nken 22222\nshenwei 999999\n
Usage
rename column names by regular expression\n\nSpecial replacement symbols:\n\n {nr} ascending number, starting from --start-num\n {kv} Corresponding value of the key (captured variable $n) by key-value file,\n n can be specified by flag --key-capt-idx (default: 1)\n\nUsage:\n csvtk rename2 [flags]\n\nFlags:\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for rename2\n -i, --ignore-case ignore case\n -K, --keep-key keep the key as value when no value found for the key\n --key-capt-idx int capture variable index of key (1-based) (default 1)\n --key-miss-repl string replacement for key with no corresponding value\n -k, --kv-file string tab-delimited key-value file for replacing key with value\n when using \"{kv}\" in -r (--replacement)\n -A, --kv-file-all-left-columns-as-value treat all columns except 1th one as value for kv-file with\n more than 2 columns\n --nr-width int minimum width for {nr} in flag -r/--replacement. e.g.,\n formating \"1\" to \"001\" by --nr-width 3 (default 1)\n -p, --pattern string search regular expression\n -r, --replacement string renamement. supporting capture variables. e.g. $1\n represents the text of the first submatch. ATTENTION: use\n SINGLE quote NOT double quotes in *nix OS or use the \\\n escape character. Ascending number is also supported by\n \"{nr}\".use ${1} instead of $1 when {kv} given!\n -n, --start-num int starting number when using {nr} in replacement (default 1)\n\n
Examples:
Add a prefix and a suffix to all column names.
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ cat testdata/phones.csv \\\n | csvtk rename2 -F -f \"*\" -p \"(.*)\" -r 'prefix_${1}_suffix'\nprefix_username_suffix,prefix_phone_suffix\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n
supporting {kv}
and {nr}
in csvtk replace
. e.g., replace barcode with sample name.
$ cat barcodes.tsv\nSample Barcode\nsc1 CCTAGATTAAT\nsc2 GAAGACTTGGT\nsc3 GAAGCAGTATG\nsc4 GGTAACCTGAC\nsc5 ATAGTTCTCGT\n\n$ cat table.tsv\ngene ATAGTTCTCGT GAAGCAGTATG GAAGACTTGGT AAAAAAAAAA\ngene1 0 0 3 0\ngen1e2 0 0 0 0\n\n# note that, we must arrange the order of barcodes.tsv to KEY-VALUE\n$ csvtk cut -t -f 2,1 barcodes.tsv\nBarcode Sample\nCCTAGATTAAT sc1\nGAAGACTTGGT sc2\nGAAGCAGTATG sc3\nGGTAACCTGAC sc4\nATAGTTCTCGT sc5\n\n# here we go!!!!\n\n$ csvtk rename2 -t -k <(csvtk cut -t -f 2,1 barcodes.tsv) \\\n -f -1 -p '(.+)' -r '{kv}' --key-miss-repl unknown table.tsv\ngene sc5 sc3 sc2 unknown\ngene1 0 0 3 0\ngen1e2 0 0 0 0\n
{nr}
, in case you need it
$ echo \"a,b,c,d\" \\\n | csvtk rename2 -p '(.+)' -r 'col_{nr}' -f -1 --start-num 2\na,col_2,col_3,col_4\n
Usage
replace data of selected fields by regular expression\n\nNote that the replacement supports capture variables.\ne.g. $1 represents the text of the first submatch.\nATTENTION: use SINGLE quote NOT double quotes in *nix OS.\n\nExamples: Adding space to cell values.\n\n csvtk replace -p \"(.)\" -r '$1 '\n\nOr use the \\ escape character.\n\n csvtk replace -p \"(.)\" -r \"\\$1 \"\n\nmore on: http://shenwei356.github.io/csvtk/usage/#replace\n\nSpecial replacement symbols:\n\n {nr} Record number, starting from 1\n {kv} Corresponding value of the key (captured variable $n) by key-value file,\n n can be specified by flag --key-capt-idx (default: 1)\n\nUsage:\n csvtk replace [flags]\n\nFlags:\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB\n (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for replace\n -i, --ignore-case ignore case\n -K, --keep-key keep the key as value when no value found for the key\n --key-capt-idx int capture variable index of key (1-based) (default 1)\n --key-miss-repl string replacement for key with no corresponding value\n -k, --kv-file string tab-delimited key-value file for replacing key with value\n when using \"{kv}\" in -r (--replacement)\n -A, --kv-file-all-left-columns-as-value treat all columns except 1th one as value for kv-file with\n more than 2 columns\n --nr-width int minimum width for {nr} in flag -r/--replacement. e.g.,\n formating \"1\" to \"001\" by --nr-width 3 (default 1)\n -p, --pattern string search regular expression\n -r, --replacement string replacement. supporting capture variables. e.g. $1\n represents the text of the first submatch. ATTENTION: for\n *nix OS, use SINGLE quote NOT double quotes or use the \\\n escape character. Record number is also supported by\n \"{nr}\".use ${1} instead of $1 when {kv} given!\n
Examples
remove Chinese characters
$ csvtk replace -F -f \"*_name\" -p \"\\p{Han}+\" -r \"\"\n
replace by key-value files
$ cat data.tsv\nname id\nA ID001\nB ID002\nC ID004\n\n$ cat alias.tsv\n001 Tom\n002 Bob\n003 Jim\n\n$ csvtk replace -t -f 2 -p \"ID(.+)\" -r \"N: {nr}, alias: {kv}\" -k alias.tsv data.tsv\n[INFO] read key-value file: alias.tsv\n[INFO] 3 pairs of key-value loaded\nname id\nA N: 1, alias: Tom\nB N: 2, alias: Bob\nC N: 3, alias\n
Usage
round float to n decimal places\n\nUsage:\n csvtk round [flags]\n\nFlags:\n -a, --all-fields all fields, overides -f/--fields\n -n, --decimal-width int limit floats to N decimal points (default 2)\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for round\n\n
Examples:
$ cat testdata/floats.csv | csvtk pretty\na b\n0.12345 abc\nNA 0.9999198549640733\n12.3 e3\n1.4814505299984235e-05 -3.1415926E05\n\n# one or more fields\n$ cat testdata/floats.csv | csvtk round -n 2 -f b | csvtk pretty \na b\n0.12345 abc\nNA 1.00\n12.3 e3\n1.4814505299984235e-05 -3.14E05\n\n# all fields\n$ cat testdata/floats.csv | csvtk round -n 2 -a | csvtk pretty \na b\n0.12 abc\nNA 1.00\n12.30 e3\n1.48e-05 -3.14E05\n
"},{"location":"usage/#mutate","title":"mutate","text":"Usage
create a new column from selected fields by regular expression\n\nUsage:\n csvtk mutate [flags]\n\nFlags:\n --after string insert the new column right after the given column name\n --at int where the new column should appear, 1 for the 1st column, 0 for the last column\n --before string insert the new column right before the given column name\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -i, --ignore-case ignore case\n --na for unmatched data, use blank instead of original data\n -n, --name string new column name\n -p, --pattern string search regular expression with capture bracket. e.g. (default \"^(.+)$\")\n\n
Examples
By default, copy a column: csvtk mutate -f id -n newname
Extract prefix of data as group name using regular expression (get \"A\" from \"A.1\" as group name):
csvtk mutate -f sample -n group -p \"^(.+?)\\.\"\n
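A self-contained sketch of the same pattern on invented inline data:
# expected to append a \"group\" column holding A, A and B\n$ echo -ne \"sample,value\\nA.1,10\\nA.2,11\\nB.1,20\\n\" \\\n    | csvtk mutate -f sample -n group -p \"^(.+?)\\.\"\n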
get the first letter as new column
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ cat testdata/phones.csv \\\n | csvtk mutate -f username -p \"^(\\w)\" -n first_letter\nusername,phone,first_letter\ngri,11111,g\nrob,12345,r\nken,22222,k\nshenwei,999999,s\n
specify the position of the new column (see similar examples of csvtk mutate2
)
$ cat testdata/phones.csv \\\n | csvtk mutate -f username -p \"^(\\w)\" -n first_letter --at 2\nusername,first_letter,phone\ngri,g,11111\nrob,r,12345\nken,k,22222\nshenwei,s,999999\n\n$ cat testdata/phones.csv \\\n | csvtk mutate -f username -p \"^(\\w)\" -n first_letter --after username\nusername,first_letter,phone\ngri,g,11111\nrob,r,12345\nken,k,22222\nshenwei,s,999999\n\n$ cat testdata/phones.csv \\\n | csvtk mutate -f username -p \"^(\\w)\" -n first_letter --before username\nfirst_letter,username,phone\ng,gri,11111\nr,rob,12345\nk,ken,22222\ns,shenwei,99999\n
Usage
create a new column from selected fields by awk-like arithmetic/string expressions\n\nThe arithmetic/string expression is supported by:\n\n https://github.com/Knetic/govaluate\n\nVariables formats:\n $1 or ${1} The first field/column\n $a or ${a} Column \"a\"\n ${a,b} or ${a b} or ${a (b)} Column name with special charactors,\n e.g., commas, spaces, and parentheses\n\nSupported operators and types:\n\n Modifiers: + - / * & | ^ ** % >> <<\n Comparators: > >= < <= == != =~ !~\n Logical ops: || &&\n Numeric constants, as 64-bit floating point (12345.678)\n String constants (single quotes: 'foobar')\n Date constants (single quotes)\n Boolean constants: true false\n Parenthesis to control order of evaluation ( )\n Arrays (anything separated by , within parenthesis: (1, 2, 'foo'))\n Prefixes: ! - ~\n Ternary conditional: ? :\n Null coalescence: ??\n\nCustom functions:\n - len(), length of strings, e.g., len($1), len($a), len($1, $2)\n - ulen(), length of unicode strings/width of unicode strings rendered\n to a terminal, e.g., len(\"\u6c88\u4f1f\")==6, ulen(\"\u6c88\u4f1f\")==4\n\nUsage:\n csvtk mutate2 [flags]\n\nFlags:\n --after string insert the new column right after the given column name\n --at int where the new column should appear, 1 for the 1st column, 0 for the last column\n --before string insert the new column right before the given column name\n -w, --decimal-width int limit floats to N decimal points (default 2)\n -e, --expression string arithmetic/string expressions. e.g. \"'string'\", '\"abc\"', ' $a + \"-\" + $b ',\n '$1 + $2', '$a / $b', ' $1 > 100 ? \"big\" : \"small\" '\n -h, --help help for mutate2\n -n, --name string new column name\n -s, --numeric-as-string treat even numeric fields as strings to avoid converting big numbers into\n scientific notation\n\n
Example
Constants
$ cat testdata/digitals.tsv \\\n | csvtk mutate2 -t -H -e \" 'abc' \"\n4 5 6 abc\n1 2 3 abc\n7 8 0 abc\n8 1,000 4 abc\n\n$ val=123 \\\n && cat testdata/digitals.tsv \\\n | csvtk mutate2 -t -H -e \" $val \"\n4 5 6 123\n1 2 3 123\n7 8 0 123\n8 1,000 4 123\n
String concatenation
$ cat testdata/names.csv \\\n | csvtk mutate2 -n full_name -e ' $first_name + \" \" + $last_name ' \\\n | csvtk pretty\nid first_name last_name username full_name\n11 Rob Pike rob Rob Pike\n2 Ken Thompson ken Ken Thompson\n4 Robert Griesemer gri Robert Griesemer\n1 Robert Thompson abc Robert Thompson\nNA Robert Abel 123 Robert Abel\n
Math
$ cat testdata/digitals.tsv | csvtk mutate2 -t -H -e '$1 + $3' -w 0\n4 5 6 10\n1 2 3 4\n7 8 0 7\n8 1,000 4 12\n
Bool
$ cat testdata/digitals.tsv | csvtk mutate2 -t -H -e '$1 > 5'\n4 5 6 false\n1 2 3 false\n7 8 0 true\n8 1,000 4 true\n
Ternary condition (? :
)
$ cat testdata/digitals.tsv | csvtk mutate2 -t -H -e '$1 > 5 ? \"big\" : \"small\" '\n4 5 6 small\n1 2 3 small\n7 8 0 big\n8 1,000 4 big\n
Null coalescence (??
)
$ echo -e \"one,two\\na1,a2\\n,b2\\na2,\" | csvtk pretty \none two\n--- ---\na1 a2\n b2\na2\n\n$ echo -e \"one,two\\na1,a2\\n,b2\\na2,\" \\\n | csvtk mutate2 -n three -e '$one ?? $two' \\\n | csvtk pretty\none two three\n--- --- -----\na1 a2 a1\n b2 b2\na2 a2\n
Specify the position of the new column
$ echo -ne \"a,b,c\\n1,2,3\\n\"\na,b,c\n1,2,3\n\n# in the end (default)\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate2 -e '$a+$c' -n x -w 0\na,b,c,x\n1,2,3,4\n\n# in the beginning\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate2 -e '$a+$c' -n x -w 0 --at 1\nx,a,b,c\n4,1,2,3\n\n# at another position\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate2 -e '$a+$c' -n x -w 0 --at 3\na,b,x,c\n1,2,4,3\n\n# right after the given column name\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate2 -e '$a+$c' -n x -w 0 --after a\na,x,b,c\n1,4,2,3\n\n# right before the given column name\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate2 -e '$a+$c' -n x -w 0 --before c\na,b,x,c\n1,2,4,3\n
Usage
create a new column from selected fields with Go-like expressions\n\nThe expression language is supported by Expr:\n\n https://expr-lang.org/docs/language-definition\n\nVariables formats:\n $1 or ${1} The first field/column\n $a or ${a} Column \"a\"\n ${a,b} or ${a b} or ${a (b)} Column name with special charactors,\n e.g., commas, spaces, and parentheses\n\nSupported Operators:\n\n Arithmetic: + - / * ^ ** %\n Comparison: > >= < <= == !=\n Logical: not ! and && or ||\n String: + contains startsWith endsWith\n Regex: matches\n Range: ..\n Slice: [:]\n Pipe: |\n Ternary conditional: ? :\n Null coalescence: ??\n\nSupported Literals:\n\n Arrays: [1, 2, 3]\n Boolean: true false\n Float: 0.5 .5\n Integer: 42 0x2A 0o52 0b101010\n Map: {a: 1, b: 2}\n Null: nil\n String: \"foo\" 'bar'\n\nSee Expr language definition link for documentation on built-in functions.\n\nCustom functions:\n - ulen(), length of unicode strings/width of unicode strings rendered\n to a terminal, e.g., len(\"\u6c88\u4f1f\")==6, ulen(\"\u6c88\u4f1f\")==4\n\nUsage:\n csvtk mutate3 [flags]\n\nFlags:\n --after string insert the new column right after the given column name\n --at int where the new column should appear, 1 for the 1st column, 0 for the last column\n --before string insert the new column right before the given column name\n -w, --decimal-width int limit floats to N decimal points (default 2)\n -e, --expression string arithmetic/string expressions. e.g. \"'string'\", '\"abc\"', ' $a + \"-\" + $b ',\n '$1 + $2', '$a / $b', ' $1 > 100 ? \"big\" : \"small\" '\n -h, --help help for mutate3\n -n, --name string new column name\n -s, --numeric-as-string treat even numeric fields as strings to avoid converting big numbers into\n scientific notation\n
Examples
Constants
$ cat testdata/digitals.tsv \\\n | csvtk mutate3 -t -H -e \" 'abc' \"\n4 5 6 abc\n1 2 3 abc\n7 8 0 abc\n8 1,000 4 abc\n\n$ val=123 \\\n && cat testdata/digitals.tsv \\\n | csvtk mutate3 -t -H -e \" $val \"\n4 5 6 123\n1 2 3 123\n7 8 0 123\n8 1,000 4 123\n
String concatenation
$ cat testdata/names.csv \\\n | csvtk mutate3 -n full_name -e ' $first_name + \" \" + $last_name ' \\\n | csvtk pretty\nid first_name last_name username full_name\n11 Rob Pike rob Rob Pike\n2 Ken Thompson ken Ken Thompson\n4 Robert Griesemer gri Robert Griesemer\n1 Robert Thompson abc Robert Thompson\nNA Robert Abel 123 Robert Abel\n
Math
$ cat testdata/digitals.tsv | csvtk mutate3 -t -H -e '$1 + $3' -w 0\n4 5 6 10\n1 2 3 4\n7 8 0 7\n8 1,000 4 12\n
Bool
$ cat testdata/digitals.tsv | csvtk mutate3 -t -H -e '$1 > 5'\n4 5 6 false\n1 2 3 false\n7 8 0 true\n8 1,000 4 true\n
Ternary condition (? :
)
$ cat testdata/digitals.tsv | csvtk mutate3 -t -H -e '$1 > 5 ? \"big\" : \"small\" '\n4 5 6 small\n1 2 3 small\n7 8 0 big\n8 1,000 4 big\n
Null coalescence (??
)
$ echo -e \"one,two\\na1,a2\\n,b2\\na2,\" | csvtk pretty\none two\n--- ---\na1 a2\n b2\na2\n\n$ echo -e \"one,two\\na1,a2\\n,b2\\na2,\" \\\n | csvtk mutate3 -n three -e '$one ?? $two' \\\n | csvtk pretty\none two three\n--- --- -----\na1 a2 a1\n b2 b2\na2 a2\n
Specify the position of the new column
$ echo -ne \"a,b,c\\n1,2,3\\n\"\na,b,c\n1,2,3\n\n# in the end (default)\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate3 -e '$a+$c' -n x -w 0\na,b,c,x\n1,2,3,4\n\n# in the beginning\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate3 -e '$a+$c' -n x -w 0 --at 1\nx,a,b,c\n4,1,2,3\n\n# at another position\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate3 -e '$a+$c' -n x -w 0 --at 3\na,b,x,c\n1,2,4,3\n\n# right after the given column name\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate3 -e '$a+$c' -n x -w 0 --after a\na,x,b,c\n1,4,2,3\n\n# right before the given column name\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate3 -e '$a+$c' -n x -w 0 --before c\na,b,x,c\n1,2,4,3\n
Usage
separate column into multiple columns\n\nUsage:\n csvtk sep [flags]\n\nFlags:\n --drop drop extra data, exclusive with --merge\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -h, --help help for sep\n -i, --ignore-case ignore case\n --merge only splits at most N times, exclusive with --drop\n --na string content for filling NA data\n -n, --names strings new column names\n -N, --num-cols int preset number of new created columns\n -R, --remove remove input column\n -s, --sep string separator\n -r, --use-regexp separator is a regular expression\n
Examples:
$ cat players.csv | csvtk collapse -f 1 -v 3 -s ';'\ngender,name\nmale,A;B;C\nfemale,a;b;c;d\n\n# set number of new columns as 3.\n$ cat players.csv | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk sep -f 2 -s ';' -n p1,p2,p3,p4 -N 4 --na NA \\\n | csvtk pretty\ngender name p1 p2 p3 p4\n------ ------- -- -- -- --\nmale A;B;C A B C NA\nfemale a;b;c;d a b c d\n\n# set number of new columns as 3, drop extra values \n$ cat players.csv | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk sep -f 2 -s ';' -n p1,p2,p3 --drop \\\n | csvtk pretty\ngender name p1 p2 p3\n------ ------- -- -- --\nmale A;B;C A B C\nfemale a;b;c;d a b c\n\n# set number of new columns as 3, split as most 3 parts\n$ cat players.csv | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk sep -f 2 -s ';' -n p1,p2,p3 --merge \\\n | csvtk pretty\ngender name p1 p2 p3\n------ ------- -- -- ---\nmale A;B;C A B C\nfemale a;b;c;d a b c;\n\n#\n$ echo -ne \"taxid\\tlineage\\n9606\\tEukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens\\n\"\ntaxid lineage\n9606 Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens\n\n$ echo -ne \"taxid\\tlineage\\n9606\\tEukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens\\n\" \\\n | csvtk sep -t -f 2 -s ';' -n kindom,phylum,class,order,family,genus,species --remove \\\n | csvtk pretty -t\ntaxid kindom phylum class order family genus species\n----- --------- -------- -------- -------- --------- ----- ------------\n9606 Eukaryota Chordata Mammalia Primates Hominidae Homo Homo sapiens\n
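The separator may also be a regular expression via -r/--use-regexp; a minimal sketch on invented inline data:
$ echo -ne \"id,name\\n1,Homo sapiens\\n2,Mus musculus\\n\" \\\n    | csvtk sep -f name -s ' +' -r -n genus,species\n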
"},{"location":"usage/#gather","title":"gather","text":"Usage
gather columns into key-value pairs, like tidyr::gather/pivot_longer\n\nUsage:\n csvtk gather [flags]\n\nAliases:\n gather, longer\n\nFlags:\n -f, --fields string fields for gathering. e.g -f 1,2 or -f columnA,columnB, or -f -columnA for\n unselect columnA\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for longer\n -k, --key string name of key column to create in output\n -v, --value string name of value column to create in outpu\n\n
Examples:
$ cat testdata/names.csv | csvtk pretty -S simple\n----------------------------------------\nid first_name last_name username\n----------------------------------------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n----------------------------------------\n\n$ cat testdata/names.csv \\\n | csvtk gather -k item -v value -f -1 \\\n | csvtk pretty -S simple\n-----------------------------\nid item value\n-----------------------------\n11 first_name Rob\n11 last_name Pike\n11 username rob\n2 first_name Ken\n2 last_name Thompson\n2 username ken\n4 first_name Robert\n4 last_name Griesemer\n4 username gri\n1 first_name Robert\n1 last_name Thompson\n1 username abc\nNA first_name Robert\nNA last_name Abel\nNA username 123\n-----------------------------\n
"},{"location":"usage/#spread","title":"spread","text":"Usage
spread a key-value pair across multiple columns, like tidyr::spread/pivot_wider\n\nUsage:\n csvtk spread [flags]\n\nAliases:\n spread, wider, scatter\n\nFlags:\n -h, --help help for spread\n -k, --key string field of the key. e.g -k 1 or -k columnA\n --na string content for filling NA data\n -s, --separater string separater for values that share the same key (default \"; \")\n -v, --value string field of the value. e.g -v 1 or -v columnA\n\n
Examples:
Shuffled columns:
$ csvtk cut -f 1,4,2,3 testdata/names.csv \\\n | csvtk pretty -S simple\n----------------------------------------\nid username first_name last_name\n----------------------------------------\n11 rob Rob Pike\n2 ken Ken Thompson\n4 gri Robert Griesemer\n1 abc Robert Thompson\nNA 123 Robert Abel\n----------------------------------------\n
data -> gather/longer -> spread/wider. Note that the orders of both rows and columns are kept :)
$ csvtk cut -f 1,4,2,3 testdata/names.csv \\\n | csvtk gather -k item -v value -f -1 \\\n | csvtk spread -k item -v value \\\n | csvtk pretty -S simple\n----------------------------------------\nid username first_name last_name\n----------------------------------------\n11 rob Rob Pike\n2 ken Ken Thompson\n4 gri Robert Griesemer\n1 abc Robert Thompson\nNA 123 Robert Abel\n----------------------------------------\n
No header rows
$ echo -ne \"a,a,0\\nb,b,0\\nc,c,0\\na,b,1\\na,c,2\\nb,c,3\\n\"\na,a,0\nb,b,0\nc,c,0\na,b,1\na,c,2\nb,c,3\n\n$ echo -ne \"a,a,0\\nb,b,0\\nc,c,0\\na,b,1\\na,c,2\\nb,c,3\\n\" \\\n | csvtk spread -H -k 2 -v 3 \\\n | csvtk pretty -S bold\n\u250f\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2513\n\u2503 \u2503 a \u2503 b \u2503 c \u2503\n\u2523\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u252b\n\u2503 a \u2503 0 \u2503 1 \u2503 2 \u2503\n\u2523\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u252b\n\u2503 b \u2503 \u2503 0 \u2503 3 \u2503\n\u2523\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u252b\n\u2503 c \u2503 \u2503 \u2503 0 \u2503\n\u2517\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u251b\n
"},{"location":"usage/#unfold","title":"unfold","text":"Usage
unfold multiple values in cells of a field\n\nExample:\n\n $ echo -ne \"id,values,meta\\n1,a;b,12\\n2,c,23\\n3,d;e;f,34\\n\" \\\n | csvtk pretty\n id values meta\n 1 a;b 12\n 2 c 23\n 3 d;e;f 34\n\n\n $ echo -ne \"id,values,meta\\n1,a;b,12\\n2,c,23\\n3,d;e;f,34\\n\" \\\n | csvtk unfold -f values -s \";\" \\\n | csvtk pretty\n id values meta\n 1 a 12\n 1 b 12\n 2 c 23\n 3 d 34\n 3 e 34\n 3 f 34\n\nUsage:\n csvtk unfold [flags]\n\nFlags:\n -f, --fields string field to expand, only one field is allowed. type \"csvtk unfold -h\" for examples\n -h, --help help for unfold\n -s, --separater string separater for folded values (default \"; \")\n
"},{"location":"usage/#fold","title":"fold","text":"Usage
fold multiple values of a field into cells of groups\n\nAttention:\n\n Only grouping fields and value filed are outputted.\n\nExample:\n\n $ echo -ne \"id,value,meta\\n1,a,12\\n1,b,34\\n2,c,56\\n2,d,78\\n\" \\\n | csvtk pretty\n id value meta\n 1 a 12\n 1 b 34\n 2 c 56\n 2 d 78\n\n $ echo -ne \"id,value,meta\\n1,a,12\\n1,b,34\\n2,c,56\\n2,d,78\\n\" \\\n | csvtk fold -f id -v value -s \";\" \\\n | csvtk pretty\n id value\n 1 a;b\n 2 c;d\n\n $ echo -ne \"id,value,meta\\n1,a,12\\n1,b,34\\n2,c,56\\n2,d,78\\n\" \\\n | csvtk fold -f id -v value -s \";\" \\\n | csvtk unfold -f value -s \";\" \\\n | csvtk pretty\n id value\n 1 a\n 1 b\n 2 c\n 2 d\n\nUsage:\n csvtk fold [flags]\n\nAliases:\n fold, collapse\n\nFlags:\n -f, --fields string key fields for grouping. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -F, --fuzzy-fields using fuzzy fields (only for key fields), e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for fold\n -i, --ignore-case ignore case\n -s, --separater string separater for folded values (default \"; \")\n -v, --vfield string value field for folding\n\n\n
Examples
data
$ csvtk pretty teachers.csv\nlab teacher class\ncomputational biology Tom Bioinformatics\ncomputational biology Tom Statistics\ncomputational biology Rob Bioinformatics\nsequencing center Jerry Bioinformatics\nsequencing center Nick Molecular Biology\nsequencing center Nick Microbiology\n
List teachers for every lab/class. uniq
is used to deduplicate items.
$ cat teachers.csv \\\n | csvtk uniq -f lab,teacher \\\n | csvtk fold -f lab -v teacher \\\n | csvtk pretty\n\nlab teacher\ncomputational biology Tom; Rob\nsequencing center Jerry; Nick\n\n$ cat teachers.csv \\\n | csvtk uniq -f class,teacher \\\n | csvtk fold -f class -v teacher -s \", \" \\\n | csvtk pretty\n\nclass teacher\nStatistics Tom\nBioinformatics Tom, Rob, Jerry\nMolecular Biology Nick\nMicrobiology Nick\n
Multiple key fields supported
$ cat teachers.csv \\\n | csvtk fold -f teacher,lab -v class \\\n | csvtk pretty\n\nteacher lab class\nTom computational biology Bioinformatics; Statistics\nRob computational biology Bioinformatics\nJerry sequencing center Bioinformatics\nNick sequencing center Molecular Biology; Microbiology\n
Usage
format date of selected fields\n\nDate parsing is supported by: https://github.com/araddon/dateparse\nDate formating is supported by: https://github.com/metakeule/fmtdate\n\nTime zones:\n format: Asia/Shanghai\n whole list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones\n\nOutput format is in MS Excel (TM) syntax.\nPlaceholders:\n\n M - month (1)\n MM - month (01)\n MMM - month (Jan)\n MMMM - month (January)\n D - day (2)\n DD - day (02)\n DDD - day (Mon)\n DDDD - day (Monday)\n YY - year (06)\n YYYY - year (2006)\n hh - hours (15)\n mm - minutes (04)\n ss - seconds (05)\n\n AM/PM hours: 'h' followed by optional 'mm' and 'ss' followed by 'pm', e.g.\n\n hpm - hours (03PM)\n h:mmpm - hours:minutes (03:04PM)\n h:mm:sspm - hours:minutes:seconds (03:04:05PM)\n\n Time zones: a time format followed by 'ZZZZ', 'ZZZ' or 'ZZ', e.g.\n\n hh:mm:ss ZZZZ (16:05:06 +0100)\n hh:mm:ss ZZZ (16:05:06 CET)\n hh:mm:ss ZZ (16:05:06 +01:00)\n\nUsage:\n csvtk fmtdate [flags]\n\nFlags:\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n --format string output date format in MS Excel (TM) syntax, type \"csvtk fmtdate -h\" for\n details (default \"YYYY-MM-DD hh:mm:ss\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for fmtdate\n -k, --keep-unparsed keep the key as value when no value found for the key\n -z, --time-zone string timezone aka \"Asia/Shanghai\" or \"America/Los_Angeles\" formatted time-zone,\n type \"csvtk fmtdate -h\" for details\n\n
Examples
$ csvtk xlsx2csv date.xlsx | csvtk pretty \ndata value\n------------------- -----\n2021-08-25 11:24:21 1\n08/25/21 11:24 p8 2\nNA 3\n 4\n\n$ csvtk xlsx2csv date.xlsx \\\n | csvtk fmtdate --format \"YYYY-MM-DD hh:mm:ss\" \\\n | csvtk pretty \ndata value\n------------------- -----\n2021-08-25 11:24:21 1\n2021-08-25 11:24:00 2\n 3\n 4\n\n$ csvtk xlsx2csv date.xlsx \\\n | csvtk fmtdate --format \"YYYY-MM-DD hh:mm:ss\" -k \\\n | csvtk pretty \ndata value\n------------------- -----\n2021-08-25 11:24:21 1\n2021-08-25 11:24:00 2\nNA 3\n 4\n
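A hedged sketch combining a custom output format with the -z/--time-zone flag (the input row is invented; the placeholders are those listed above):
$ echo -ne \"time,value\\n08/25/21 11:24,1\\n\" \\\n    | csvtk fmtdate -f time --format \"DD/MM/YYYY hh:mm\" -z Asia/Shanghai\n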
"},{"location":"usage/#sort","title":"sort","text":"Usage
sort by selected fields\n\nUsage:\n csvtk sort [flags]\n\nFlags:\n -h, --help help for sort\n -i, --ignore-case ignore-case\n -k, --keys strings keys (multiple values supported). sort type supported, \"N\" for natural order,\n \"n\" for number, \"u\" for user-defined order and \"r\" for reverse. e.g., \"-k 1\" or\n \"-k A:r\" or \"\"-k 1:nr -k 2\" (default [1])\n -L, --levels strings user-defined level file (one level per line, multiple values supported).\n format: <field>:<level-file>. e.g., \"-k name:u -L name:level.txt\n
Examples
data
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n
By a single column: csvtk sort -k 1
or csvtk sort -k last_name
in alphabetical order
$ cat testdata/names.csv \\\n | csvtk sort -k first_name\nid,first_name,last_name,username\n2,Ken,Thompson,ken\n11,Rob,Pike,rob\nNA,Robert,Abel,123\n1,Robert,Thompson,abc\n4,Robert,Griesemer,gri\n
in reversed alphabetical order (key:r
)
$ cat testdata/names.csv \\\n | csvtk sort -k first_name:r\nid,first_name,last_name,username\nNA,Robert,Abel,123\n1,Robert,Thompson,abc\n4,Robert,Griesemer,gri\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n
in numerical order (key:n
)
$ cat testdata/names.csv \\\n | csvtk sort -k id:n\nid,first_name,last_name,username\nNA,Robert,Abel,123\n1,Robert,Thompson,abc\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n11,Rob,Pike,rob\n
in natural order (key:N
)
$ cat testdata/names.csv | csvtk sort -k id:N\nid,first_name,last_name,username\n1,Robert,Thompson,abc\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n11,Rob,Pike,rob\nNA,Robert,Abel,123\n
in natural order (key:N
), a bioinformatics example
$ echo \"X,Y,1,10,2,M,11,1_c,Un_g,1_g\" | csvtk transpose \nX\nY\n1\n10\n2\nM\n11\n1_c\nUn_g\n1_g\n\n$ echo \"X,Y,1,10,2,M,11,1_c,Un_g,1_g\" \\\n | csvtk transpose \\\n | csvtk sort -H -k 1:N\n1\n1_c\n1_g\n2\n10\n11\nM\nUn_g\nX\nY\n
By multiple columns: csvtk sort -k 1,2
or csvtk sort -k 1 -k 2
or csvtk sort -k last_name,age
# by first_name and then last_name\n$ cat testdata/names.csv | csvtk sort -k first_name -k last_name\nid,first_name,last_name,username\n2,Ken,Thompson,ken\n11,Rob,Pike,rob\nNA,Robert,Abel,123\n4,Robert,Griesemer,gri\n1,Robert,Thompson,abc\n\n# by first_name and then ID\n$ cat testdata/names.csv | csvtk sort -k first_name -k id:n\nid,first_name,last_name,username\n2,Ken,Thompson,ken\n11,Rob,Pike,rob\nNA,Robert,Abel,123\n1,Robert,Thompson,abc\n4,Robert,Griesemer,gri\n
By user-defined order
# user-defined order/level\n$ cat testdata/size_level.txt\ntiny\nmini\nsmall\nmedium\nbig\n\n# original data\n$ cat testdata/size.csv\nid,size\n1,Huge\n2,Tiny\n3,Big\n4,Small\n5,Medium\n\n$ csvtk sort -k 2:u -i -L 2:testdata/size_level.txt testdata/size.csv\nid,size\n2,Tiny\n4,Small\n5,Medium\n3,Big\n1,Huge\n
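Sort keys can be mixed, e.g. a user-defined level plus a numeric tie-breaker; a hedged sketch reusing the files above:
$ csvtk sort -k 2:u -k 1:nr -i -L 2:testdata/size_level.txt testdata/size.csv\n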
Usage
plot common figures\n\nNotes:\n\n 1. Output file can be set by flag -o/--out-file.\n 2. File format is determined by the out file suffix.\n Supported formats: eps, jpg|jpeg, pdf, png, svg, and tif|tiff\n 3. If flag -o/--out-file not set (default), image is written to stdout,\n you can display the image by pipping to \"display\" command of Imagemagic\n or just redirect to file.\n\nUsage:\n csvtk plot [command]\n\nAvailable Commands:\n box plot boxplot\n hist plot histogram\n line line plot and scatter plot\n\nFlags:\n --axis-width float axis width (default 1.5)\n -f, --data-field string column index or column name of data (default \"1\")\n --format string image format for stdout when flag -o/--out-file not given. available\n values: eps, jpg|jpeg, pdf, png, svg, and tif|tiff. (default \"png\")\n -g, --group-field string column index or column name of group\n --height float Figure height (default 4.5)\n -h, --help help for plot\n --label-size int label font size (default 14)\n --na-values strings NA values, case ignored (default [,NA,N/A])\n --scale float scale the image width/height, tick, axes, line/point and font sizes\n proportionally (default 1)\n --skip-na skip NA values in --na-values\n --tick-label-size int tick label font size (default 12)\n --tick-width float axis tick width (default 1.5)\n --title string Figure title\n --title-size int title font size (default 16)\n --width float Figure width (default 6)\n --x-max string maximum value of X axis\n --x-min string minimum value of X axis\n --xlab string x label text\n --y-max string maximum value of Y axis\n --y-min string minimum value of Y axis\n --ylab string y label text\n\n
Note that most of the flags of plot are global flags shared by its subcommands hist, box and line.
Notes on image output: the output file is set with -o/--out-file and the image format is determined by the file suffix; if -o/--out-file is not given, the image is written to stdout, so you can pipe it to the \"display\" command of ImageMagick or just redirect it to a file.
Usage
plot histogram\n\nNotes:\n\n 1. Output file can be set by flag -o/--out-file.\n 2. File format is determined by the out file suffix.\n Supported formats: eps, jpg|jpeg, pdf, png, svg, and tif|tiff\n 3. If flag -o/--out-file not set (default), image is written to stdout,\n you can display the image by pipping to \"display\" command of Imagemagic\n or just redirect to file.\n\nUsage:\n csvtk plot hist [flags]\n\nFlags:\n --bins int number of bins (default 50)\n --color-index int color index, 1-7 (default 1)\n -h, --help help for hist\n --line-width float line width (default 1)\n --percentiles calculate percentiles\n\n
Examples
example data
$ zcat testdata/grouped_data.tsv.gz | head -n 5 | csvtk -t pretty\nGroup Length GC Content\nGroup A 97 57.73\nGroup A 95 49.47\nGroup A 97 49.48\nGroup A 100 51.00\n
plot histogram with data of the second column:
$ csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 \\\n --title Histogram -o histogram.png\n
You can also write the image to stdout and pipe it to the \"display\" command of ImageMagick:
$ csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 | display\n
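Other documented flags can be combined, e.g. changing the bin count and writing straight to a PDF with -o/--out-file; a hedged sketch:
$ csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 \\\n    --bins 30 --percentiles -o histogram.pdf\n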
Usage
plot boxplot\n\nNotes:\n\n 1. Output file can be set by flag -o/--out-file.\n 2. File format is determined by the out file suffix.\n Supported formats: eps, jpg|jpeg, pdf, png, svg, and tif|tiff\n 3. If flag -o/--out-file not set (default), image is written to stdout,\n you can display the image by pipping to \"display\" command of Imagemagic\n or just redirect to file.\n\nUsage:\n csvtk plot box [flags]\n\nFlags:\n --box-width float box width\n --color-index int color index, 1-7 (default 1)\n -h, --help help for box\n --horiz horize box plot\n --line-width float line width (default 1.5)\n --point-size float point size (default 3)\n\n
Examples
Plot a boxplot of the \"GC Content\" (third) column, grouped by the \"Group\" column.
csvtk -t plot box testdata/grouped_data.tsv.gz -g \"Group\" -f \"GC Content\" \\\n --width 3 --title \"Box plot\" \\\n > boxplot.png\n
Plot a horizontal boxplot of the \"Length\" (second) column, grouped by the \"Group\" column.
$ csvtk -t plot box testdata/grouped_data.tsv.gz -g \"Group\" -f \"Length\" \\\n    --height 3 --width 5 --horiz --title \"Horiz box plot\" \\\n    > boxplot2.png\n
Usage
line plot and scatter plot\n\nNotes:\n\n 1. Output file can be set by flag -o/--out-file.\n 2. File format is determined by the out file suffix.\n Supported formats: eps, jpg|jpeg, pdf, png, svg, and tif|tiff\n 3. If flag -o/--out-file not set (default), image is written to stdout,\n you can display the image by pipping to \"display\" command of Imagemagic\n or just redirect to file.\n\nUsage:\n csvtk plot line [flags]\n\nFlags:\n --color-index int color index, 1-7 (default 1)\n -x, --data-field-x string column index or column name of X for command line\n -y, --data-field-y string column index or column name of Y for command line\n -h, --help help for line\n --legend-left locate legend along the left edge of the plot\n --legend-top locate legend along the top edge of the plot\n --line-width float line width (default 1.5)\n --point-size float point size (default 3)\n --scatter only plot points\n\n
Examples
example data
$ head -n 5 testdata/xy.tsv\nGroup X Y\nA 0 1\nA 1 1.3\nA 1.5 1.5\nA 2.0 2\n
plot line plot with X-Y data
$ csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group \\\n --title \"Line plot\" \\\n > lineplot.png\n
plot scatter
$ csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group \\\n --title \"Scatter\" --scatter \\\n > lineplot.png\n
Usage
stream file to stdout and report progress on stderr\n\nUsage:\n csvtk cat [flags]\n\nFlags:\n -b, --buffsize int buffer size (default 8192)\n -h, --help help for cat\n -L, --lines count lines instead of bytes\n -p, --print-freq int print frequency (-1 for print after parsing) (default 1)\n -s, --total int expected total bytes/lines (default -1)\n
Examples
Stream file, report progress in bytes
csvtk cat file.tsv\n
Stream file from stdin, report progress in lines
tac input.tsv | csvtk cat -L -s `wc -l < input.tsv` -\n
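csvtk cat can also sit at the head of a longer pipeline to watch its progress; a hedged sketch in which input.tsv and the downstream filter are only illustrative:
$ csvtk cat -L -s `wc -l < input.tsv` input.tsv \\\n    | csvtk filter2 -t -f '$3 > 0' > filtered.tsv\n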
Usage
generate shell autocompletion script\n\nSupported shell: bash|zsh|fish|powershell\n\nBash:\n\n # generate completion shell\n csvtk genautocomplete --shell bash\n\n # configure if never did.\n # install bash-completion if the \"complete\" command is not found.\n echo \"for bcfile in ~/.bash_completion.d/* ; do source \\$bcfile; done\" >> ~/.bash_completion\n echo \"source ~/.bash_completion\" >> ~/.bashrc\n\nZsh:\n\n # generate completion shell\n csvtk genautocomplete --shell zsh --file ~/.zfunc/_csvtk\n\n # configure if never did\n echo 'fpath=( ~/.zfunc \"${fpath[@]}\" )' >> ~/.zshrc\n echo \"autoload -U compinit; compinit\" >> ~/.zshrc\n\nfish:\n\n csvtk genautocomplete --shell fish --file ~/.config/fish/completions/csvtk.fish\n\nUsage:\n csvtk genautocomplete [flags]\n\nFlags:\n --file string autocompletion file (default \"/home/shenwei/.bash_completion.d/csvtk.sh\")\n -h, --help help for genautocomplete\n --shell string autocompletion type (bash|zsh|fish|powershell) (default \"bash\")\n\n
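For PowerShell the same flags apply; how to load the generated script is left to your profile setup (a hedged sketch, the output path is arbitrary):
csvtk genautocomplete --shell powershell --file csvtk.ps1\n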
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"csvtk - a cross-platform, efficient and practical CSV/TSV toolkit","text":"Similar to FASTA/Q format in field of Bioinformatics, CSV/TSV formats are basic and ubiquitous file formats in both Bioinformatics and data science.
People usually use spreadsheet software like MS Excel to process table data. However this is all by clicking and typing, which is not automated and is time-consuming to repeat, especially when you want to apply similar operations with different datasets or purposes.
You can also accomplish some CSV/TSV manipulations using shell commands, but more code is needed to handle the header line. Shell commands do not support selecting columns with column names either.
csvtk
is convenient for rapid data investigation and also easy to integrate into analysis pipelines. It could save you lots of time in (not) writing Python/R scripts.
csvkit
-f \"-id,-name\"
for all fields except \"id\" and \"name\", -F -f \"a.*\"
for all fields with prefix \"a.\".sep=,
) of separator declaration used by MS Excel54 subcommands in total.
Information
headers
: prints headersdim
: dimensions of CSV filenrow
: print number of recordsncol
: print number of columnssummary
: summary statistics of selected numeric or text fields (groupby group fields)watch
: online monitoring and histogram of selected fieldcorr
: calculate Pearson correlation between numeric columnsFormat conversion
pretty
: converts CSV to a readable aligned tablecsv2tab
: converts CSV to tabular formattab2csv
: converts tabular format to CSVspace2tab
: converts space delimited format to TSVcsv2md
: converts CSV to markdown formatcsv2rst
: converts CSV to reStructuredText formatcsv2json
: converts CSV to JSON formatcsv2xlsx
: converts CSV/TSV files to XLSX filexlsx2csv
: converts XLSX to CSV formatSet operations
head
: prints first N recordsconcat
: concatenates CSV/TSV files by rowssample
: sampling by proportioncut
: select and arrange fieldsgrep
: greps data by selected fields with patterns/regular expressionsuniq
: unique data without sortingfreq
: frequencies of selected fieldsinter
: intersection of multiple filesfilter
: filters rows by values of selected fields with arithmetic expressionfilter2
: filters rows by awk-like arithmetic/string expressionsjoin
: join files by selected fields (inner, left and outer join)split
splits CSV/TSV into multiple files according to column valuessplitxlsx
: splits XLSX sheet into multiple sheets according to column valuescomb
: compute combinations of items at every rowEdit
fix
: fix CSV/TSV with different numbers of columns in rowsfix-quotes
: fix malformed CSV/TSV caused by double-quotesdel-quotes
: remove extra double-quotes added by fix-quotes
add-header
: add column namesdel-header
: delete column namesrename
: renames column names with new namesrename2
: renames column names by regular expressionreplace
: replaces data of selected fields by regular expressionround
: round float to n decimal placesmutate
: creates new columns from selected fields by regular expressionmutate2
: creates a new column from selected fields by awk-like arithmetic/string expressionsmutate3
: create a new column from selected fields with Go-like expressionsfmtdate
: format date of selected fieldsTransform
transpose
: transposes CSV datasep
: separate column into multiple columnsgather
: gather columns into key-value pairs, like tidyr::gather/pivot_longer
spread
: spread a key-value pair across multiple columns, like tidyr::spread/pivot_wider
unfold
: unfold multiple values in cells of a fieldfold
: fold multiple values of a field into cells of groupsOrdering
sort
: sorts by selected fieldsPloting
plot
see usageplot hist
histogramplot box
boxplotplot line
line plot and scatter plotMisc
cat
stream file and report progressversion
print version information and check for updategenautocomplete
generate shell autocompletion script (bash|zsh|fish|powershell)Download Page
csvtk
is implemented in Go programming language, executable binary files for most popular operating systems are freely available in release page.
Just download compressed executable file of your operating system, and decompress it with tar -zxvf *.tar.gz
command or other tools. And then:
For Linux-like systems
If you have root privilege simply copy it to /usr/local/bin
:
sudo cp csvtk /usr/local/bin/\n
Or copy to anywhere in the environment variable PATH
:
mkdir -p $HOME/bin/; cp csvtk $HOME/bin/\n
For windows, just copy csvtk.exe
to C:\\WINDOWS\\system32
.
# >= v0.31.0\nconda install -c conda-forge csvtk\n\n# <= v0.31.0\nconda install -c bioconda csvtk\n
"},{"location":"#method-3-install-via-homebrew","title":"Method 3: Install via homebrew","text":"brew install csvtk\n
"},{"location":"#method-4-for-go-developer-latest-stabledev-version","title":"Method 4: For Go developer (latest stable/dev version)","text":"go get -u github.com/shenwei356/csvtk/csvtk\n
"},{"location":"#method-5-for-archlinux-aur-users-may-be-not-the-latest","title":"Method 5: For ArchLinux AUR users (may be not the latest)","text":"yaourt -S csvtk\n
"},{"location":"#command-line-completion","title":"Command-line completion","text":"Bash:
# generate completion shell\ncsvtk genautocomplete --shell bash\n\n# configure if never did.\n# install bash-completion if the \"complete\" command is not found.\necho \"for bcfile in ~/.bash_completion.d/* ; do source \\$bcfile; done\" >> ~/.bash_completion\necho \"source ~/.bash_completion\" >> ~/.bashrc\n
Zsh:
# generate completion shell\ncsvtk genautocomplete --shell zsh --file ~/.zfunc/_csvtk\n\n# configure if never did\necho 'fpath=( ~/.zfunc \"${fpath[@]}\" )' >> ~/.zshrc\necho \"autoload -U compinit; compinit\" >> ~/.zshrc\n
fish:
csvtk genautocomplete --shell fish --file ~/.config/fish/completions/csvtk.fish\n
"},{"location":"#compared-to-csvkit","title":"Compared to csvkit
","text":"csvkit, attention: this table wasn't updated for many years.
Features csvtk csvkit Note Read Gzip Yes Yes read gzip files Fields ranges Yes Yes e.g.-f 1-4,6
Unselect fields Yes -- e.g. -1
for excluding first column Fuzzy fields Yes -- e.g. ab*
for columns with name prefix \"ab\" Reorder fields Yes Yes it means -f 1,2
is different from -f 2,1
Rename columns Yes -- rename with new name(s) or from existed names Sort by multiple keys Yes Yes bash sort like operations Sort by number Yes -- e.g. -k 1:n
Multiple sort Yes -- e.g. -k 2:r -k 1:nr
Pretty output Yes Yes convert CSV to readable aligned table Unique data Yes -- unique data of selected fields frequency Yes -- frequencies of selected fields Sampling Yes -- sampling by proportion Mutate fields Yes -- create new columns from selected fields Replace Yes -- replace data of selected fields Similar tools:
More examples and tutorial.
Attention
-H
on.-t
for tab-delimited files.#
will be ignored, if the header row starts with #
, please assign flag -C
another rare symbol, e.g. $
.-I/--ignore-illegal-row
to skip these lines if necessary. You can also use \"csvtk fix\" to fix files with different numbers of columns in rows. If double-quotes exist in fields not enclosed with double-quotes, e.g.,
x,a \"b\" c,1\n
It would report error:
bare `\"` in non-quoted-field.\n
Please switch on the flag -l
or use csvtk fix-quotes
to fix it.
If some fields have only a double-quote either at the beginning or at the end, e.g.,
x,d \"e\",\"a\" b c,1\n
It would report an error:
extraneous or missing \" in quoted-field\n
Please use csvtk fix-quotes
to fix it, and use csvtk del-quotes
to reset to the original format as needed.
Examples
Pretty result
$ csvtk pretty names.csv\nid first_name last_name username\n-- ---------- --------- --------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n\n$ csvtk pretty names.csv -S 3line\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n id first_name last_name username\n----------------------------------------\n 11 Rob Pike rob\n 2 Ken Thompson ken\n 4 Robert Griesemer gri\n 1 Robert Thompson abc\n NA Robert Abel 123\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n\n$ csvtk pretty names.csv -S bold -w 5 -m 1-\n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 id \u2503 first_name \u2503 last_name \u2503 username \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 11 \u2503 Rob \u2503 Pike \u2503 rob \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 2 \u2503 Ken \u2503 Thompson \u2503 ken \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 4 \u2503 Robert \u2503 Griesemer \u2503 gri \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 1 \u2503 Robert \u2503 Thompson \u2503 abc \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 NA \u2503 Robert \u2503 Abel \u2503 123 \u2503\n\u2517\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u251b\n
Summary of selected numeric fields, supporting \"group-by\"
$ cat testdata/digitals2.csv \\\n | csvtk summary -i -f f4:sum,f5:sum -g f1,f2 \\\n | csvtk pretty\nf1 f2 f4:sum f5:sum\nbar xyz 7.00 106.00\nbar xyz2 4.00 4.00\nfoo bar 6.00 3.00\nfoo bar2 4.50 5.00\n
Select fields/columns (cut
)
csvtk cut -f 1,2
csvtk cut -f first_name,username
csvtk cut -f -1,-2
or csvtk cut -f -first_name
csvtk cut -F -f \"*_name,username\"
csvtk cut -f 2-4
for column 2,3,4 or csvtk cut -f -3--1
for discarding column 1,2,3csvtk cut -f 1-
or csvtk cut -F -f \"*\"
Search by selected fields (grep
) (matched parts will be highlighted as red)
csvtk grep -f first_name -p Robert -p Rob
csvtk grep -f first_name -r -p Rob
csvtk grep -f first_name -P name_list.txt
csvtk grep -F -f \"*\" -r -p \"^$\" -v
Rename column names (rename
and rename2
)
csvtk rename -f A,B -n a,b
or csvtk rename -f 1-3 -n a,b,c
csvtk rename2 -f 1- -p \"(.*)\" -r 'prefix_$1'
for adding prefix to all column names.Edit data with regular expression (replace
)
csvtk replace -F -f \"*_name\" -p \"\\p{Han}+\" -r \"\"
Create new column from selected fields by regular expression (mutate
)
csvtk mutate -f id
csvtk mutate -f sample -n group -p \"^(.+?)\\.\" --after sample
Sort by multiple keys (sort
)
csvtk sort -k 1
or csvtk sort -k last_name
csvtk sort -k 1,2
or csvtk sort -k 1 -k 2
or csvtk sort -k last_name,age
csvtk sort -k 1:n
or csvtk sort -k 1:nr
for sorting by number in reverse order
csvtk sort -k region -k age:n -k id:nr
csvtk sort -k chr:N
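A minimal sketch of natural-order sorting; chrs.csv here is a hypothetical file, not part of the csvtk test data:

$ cat chrs.csv
chr
chr2
chr10
chr1
$ csvtk sort -k chr:N chrs.csv
chr
chr1
chr2
chr10

With plain string sorting (-k chr), chr10 would sort before chr2.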
Join multiple files by keys (join
)
csvtk join -f id file1.csv file2.csv
csvtk join -f \"username;username;name\" names.csv phone.csv adress.csv -k
Filter by numbers (filter
)
csvtk filter -f \"id>0\"
csvtk filter -f \"1-3>0\"
--any
to print a record if any of the fields satisfies the condition: csvtk filter -f "1-3>0" --any
csvtk filter -F -f \"A*!=0\"
Filter rows by awk-like arithmetic/string expressions (filter2
)
csvtk filter2 -f '$3>0'
csvtk filter2 -f '$id > 0'
csvtk filter2 -f '$id > 3 || $username==\"ken\"'
csvtk filter2 -H -t -f '$1 > 2 && $2 % 2 == 0'
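A small sketch combining two string conditions on names.csv (expected output):

$ csvtk filter2 -f '$first_name=="Rob" || $username=="ken"' names.csv
id,first_name,last_name,username
11,Rob,Pike,rob
2,Ken,Thompson,ken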
Plotting
csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 | display\n
csvtk -t plot box testdata/grouped_data.tsv.gz -g \"Group\" \\\n -f \"GC Content\" --width 3 --title \"Box plot\" | display\n
csvtk -t plot box testdata/grouped_data.tsv.gz -g \"Group\" -f \"Length\" \\\n --height 3 --width 5 --horiz --title \"Horiz box plot\" | display\n
csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group | display\n
csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group --scatter | display\n
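Instead of piping the image to a viewer, the figure can be written to a file with the global -o flag (a sketch; the default image format is PNG and the exact figure depends on the test data):

csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 -o histogram.png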
We are grateful to Zhiluo Deng and Li Peng for suggesting features and reporting bugs.
Thanks to Albert Vilella for feature suggestions, which make csvtk more feature-rich.
"},{"location":"#contact","title":"Contact","text":"Create an issue to report bugs, propose new functions or ask for help.
Or leave a comment.
"},{"location":"#license","title":"License","text":"MIT License
"},{"location":"#starchart","title":"Starchart","text":""},{"location":"bioinf/","title":"Bioinf","text":""},{"location":"chinese/","title":"\u4e2d\u6587\u4ecb\u7ecd","text":"\u5982\u540c\u751f\u7269\u4fe1\u606f\u9886\u57df\u4e2d\u7684FASTA/Q\u683c\u5f0f\u4e00\u6837\uff0cCSV/TSV\u4f5c\u4e3a\u8ba1\u7b97\u673a\u3001\u6570\u636e\u79d1\u5b66\u548c\u751f\u7269\u4fe1\u606f\u7684\u57fa\u672c\u683c\u5f0f\uff0c\u5e94\u7528\u975e\u5e38\u5e7f\u6cdb\u3002\u5e38\u7528\u7684\u5904\u7406\u8f6f\u4ef6\u5305\u62ec\uff1a
However, while spreadsheet software and text editors are certainly powerful, they rely on mouse operations and are not suited to batch processing; shell commands such as sed/awk/cut are designed for generic tabular data and do not handle CSV files with header rows well; and writing a Python/R script for one small operation is overkill and hard to reuse.
Before csvtk, the existing tools were mainly csvkit written in Python, xsv written in Rust and miller written in C, each with its own strengths and weaknesses. At the time I had just finished developing seqkit and had plenty of time while the paper was under submission, so I wanted to strike while the iron was hot and build yet another wheel.
So I decided to write a command-line tool covering the common operations on CSV/TSV files, and that is csvtk.
"},{"location":"chinese/#_1","title":"\u4ecb\u7ecd","text":"\u57fa\u672c\u4fe1\u606f
Features
In the two or three years before developing csvtk, I had already written several reusable Python/Perl scripts (https://github.com/shenwei356/datakit), including csv2tab, csvtk_grep, csv_join, csv_melt, intersection and unique. So my plan was to integrate these existing functions first and then extend csvtk according to new needs.
So far csvtk has 27 subcommands, falling into the following categories:
headers: intuitively print the header row (best used before working on a CSV file with many columns)
stats: basic statistics
stats2: basic statistics of selected numeric columns
pretty: convert to a pretty, highly readable layout (one of the most frequently used commands)
csv2tab: convert CSV to tab-delimited format (TSV)
tab2csv: convert TSV to CSV
space2tab: convert space-delimited format to TSV
transpose: transpose CSV/TSV
csv2md: convert CSV/TSV to markdown format (handy when writing documents)
head: print the first N records
sample: random sampling by proportion
cut: select specific columns, supporting selection by column index or name, range selection, fuzzy selection and negative selection (one of the most frequently used commands, very powerful)
uniq: return records that are unique by the specified column(s) as keys, without sorting
freq: count by the specified column(s) (frequently used)
inter: intersection of multiple files
grep: search using the specified column(s) as keys (one of the most frequently used commands; the search can be restricted to given columns)
filter: filter rows by numeric values of the specified column(s)
filter2: filter rows by awk-like arithmetic/string expressions on the specified column(s)
join: join multiple files (frequently used)
rename: directly rename the specified column(s) (simple and practical)
rename2: rename the specified column(s) with regular expressions (simple and practical)
replace: edit the specified column(s) by regular-expression replacement (one of the most frequently used commands; editing can be restricted to given columns)
mutate: create a new column from an existing column with a regular expression (often used to generate extra test columns)
mutate2: create a new column from existing column(s) with awk-like arithmetic/string expressions (frequently used)
gather: similar to the gather function of the R tidyr package
sort: sort by the specified column(s)
plot: basic plotting
plot hist: histogram
plot box: boxplot
plot line: line plot and scatter plot
version: print version information and check for new versions
genautocomplete: generate a configuration file for Bash auto-completion; restart the terminal for it to take effect

Attention:
Input files are assumed to have a header row; if not, please switch on the global flag -H.
CSV is the default format; for tab-delimited files, please switch on the global flag -t.
If fields contain "", please switch on the global flag -l.
Lines starting with # are treated as comment lines; if the header row contains #, please assign another rare character (e.g. $) to the global flag -C.

Only a few examples are given here; for more, please see the usage manual: http://bioinf.shenwei.me/csvtk/usage/
Example data
$ cat names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n
Improve readability
$ cat names.csv | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
Convert to markdown
$ cat names.csv | csvtk csv2md\nid |first_name|last_name|username\n:--|:---------|:--------|:-------\n11 |Rob |Pike |rob\n2 |Ken |Thompson |ken\n4 |Robert |Griesemer|gri\n1 |Robert |Thompson |abc\nNA |Robert |Abel |123\n
Rendered result:
id first_name last_name username 11 Rob Pike rob 2 Ken Thompson ken 4 Robert Griesemer gri 1 Robert Thompson abc NA Robert Abel 123
Select specified columns by column index or name; the column order can be changed
$ cat names.csv | csvtk cut -f 3,1 | csvtk pretty\n$ cat names.csv | csvtk cut -f last_name,id | csvtk pretty\nlast_name id\nPike 11\nThompson 2\nGriesemer 4\nThompson 1\nAbel NA\n
Select multiple columns with wildcards
$ cat names.csv | csvtk cut -F -f '*name,id' | csvtk pretty\nfirst_name last_name username id\nRob Pike rob 11\nKen Thompson ken 2\nRobert Griesemer gri 4\nRobert Thompson abc 1\nRobert Abel 123 NA\n
Delete columns 2 and 3 (the second method below selects a range; note that -3 comes first and -2 second)
$ cat names.csv | csvtk cut -f -2,-3 | csvtk pretty\n$ cat names.csv | csvtk cut -f -3--2 | csvtk pretty\n$ cat names.csv | csvtk cut -f -first_name,-last_name | csvtk pretty\nid username\n11 rob\n2 ken\n4 gri\n1 abc\nNA 123\n
Search by a specified column; exact matching by default
$ cat names.csv | csvtk grep -f id -p 1 | csvtk pretty\nid first_name last_name username\n1 Robert Thompson abc\n
Fuzzy search (regular expression)
$ cat names.csv | csvtk grep -f id -p 1 -r | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n1 Robert Thompson abc\n
Use a file as the source of patterns
$ cat names.csv | csvtk grep -f id -P id-files.txt\n
Simple replacement on a specified column
$ cat names.csv | csvtk replace -f id -p '(\\d+)' -r 'ID: $1' \\\n | csvtk pretty\nid first_name last_name username\nID: 11 Rob Pike rob\nID: 2 Ken Thompson ken\nID: 4 Robert Griesemer gri\nID: 1 Robert Thompson abc\nNA Robert Abel 123\n
Replace using a key-value file (seqkit and brename support similar operations)
$ cat data.tsv\nname id\nA ID001\nB ID002\nC ID004\n\n$ cat alias.tsv\n001 Tom\n002 Bob\n003 Jim\n\n$ csvtk replace -t -f 2 -p \"ID(.+)\" -r \"N: {nr}, alias: {kv}\" -k \\\n alias.tsv data.tsv\nname id\nA N: 1, alias: Tom\nB N: 2, alias: Bob\nC N: 3, alias: 004\n
Join tables; the key column of each file needs to be specified: by default the first column is used for every file; if the column (names) are the same, providing one is enough; if they differ, separate them with semicolons
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ csvtk join -f 'username;username' --keep-unmatched names.csv phones.csv \\\n | csvtk pretty\nid first_name last_name username phone\n11 Rob Pike rob 12345\n2 Ken Thompson ken 22222\n4 Robert Griesemer gri 11111\n1 Robert Thompson abc\nNA Robert Abel 123\n
csvtk is implemented in the Go programming language; executable binary files for most popular operating systems are freely available on the release page.

Latest version:
csvtk filter2/mutate2/mutate3:
csvtk pretty: -w/--min-width and -W/--max-width accept multiple values for setting column-specific thresholds; new style round for round corners.

Notes
Run csvtk version to check for updates!!!
Run csvtk genautocomplete to update the Bash completion script!!!

Download Page
csvtk is implemented in the Go programming language; executable binary files for most popular operating systems are freely available on the release page.
Just download the compressed executable file for your operating system and decompress it with the tar -zxvf *.tar.gz
command or other tools, and then:
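For example, on a 64-bit Linux machine the steps might look like the following; the exact asset name varies by version and platform, so check the release page for the right file:

# download csvtk_linux_amd64.tar.gz (or the matching asset) from the release page, then:
tar -zxvf csvtk_linux_amd64.tar.gz
./csvtk version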
For Linux-like systems
If you have root privilege, simply copy it to /usr/local/bin
:
sudo cp csvtk /usr/local/bin/\n
Or copy it to any directory listed in the environment variable PATH
:
mkdir -p $HOME/bin/; cp csvtk $HOME/bin/\n
For Windows, just copy csvtk.exe
to C:\\WINDOWS\\system32
.
# >= v0.31.0\nconda install -c conda-forge csvtk\n\n# <= v0.31.0\nconda install -c bioconda csvtk\n
"},{"location":"download/#method-3-install-via-homebrew-may-be-not-the-latest","title":"Method 3: Install via homebrew (may be not the latest)","text":"brew install csvtk\n
"},{"location":"download/#method-4-for-go-developer-latest-stabledev-version","title":"Method 4: For Go developer (latest stable/dev version)","text":"go get -u github.com/shenwei356/csvtk/csvtk\n
"},{"location":"download/#method-5-for-archlinux-aur-users-may-be-not-the-latest","title":"Method 5: For ArchLinux AUR users (may be not the latest)","text":"yaourt -S csvtk\n
"},{"location":"download/#method-6-compiling-from-source-latest-stabledev-version","title":"Method 6: Compiling from source (latest stable/dev version)","text":"# ------------------- install golang -----------------\n\n# download Go from https://go.dev/dl\nwget https://go.dev/dl/go1.17.12.linux-amd64.tar.gz\n\ntar -zxf go1.17.12.linux-amd64.tar.gz -C $HOME/\n\n# or \n# echo \"export PATH=$PATH:$HOME/go/bin\" >> ~/.bashrc\n# source ~/.bashrc\nexport PATH=$PATH:$HOME/go/bin\n\n\n# ------------- the latest stable version -------------\n\ngo get -v -u github.com/shenwei356/csvtk/csvtk\n\n# The executable binary file is located in:\n# ~/go/bin/csvtk\n# You can also move it to anywhere in the $PATH\nmkdir -p $HOME/bin\ncp ~/go/bin/csvtk $HOME/bin/\n\n# --------------- the development version --------------\n\ngit clone https://github.com/shenwei356/csvtk\ncd csvtk/csvtk/\ngo build\n\n# The executable binary file is located in:\n# ./csvtk\n# You can also move it to anywhere in the $PATH\nmkdir -p $HOME/bin\ncp ./csvtk $HOME/bin/\n
"},{"location":"download/#shell-completion","title":"Shell-completion","text":"Bash:
# generate completion shell\ncsvtk genautocomplete --shell bash\n\n# configure if never did.\n# install bash-completion if the \"complete\" command is not found.\necho \"for bcfile in ~/.bash_completion.d/* ; do source \\$bcfile; done\" >> ~/.bash_completion\necho \"source ~/.bash_completion\" >> ~/.bashrc\n
Zsh:
# generate completion shell\ncsvtk genautocomplete --shell zsh --file ~/.zfunc/_csvtk\n\n# configure if never did\necho 'fpath=( ~/.zfunc \"${fpath[@]}\" )' >> ~/.zshrc\necho \"autoload -U compinit; compinit\" >> ~/.zshrc\n
fish:
csvtk genautocomplete --shell fish --file ~/.config/fish/completions/csvtk.fish\n
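Optionally, as the usage text later in this document mentions, you can create a soft link named tsvtk, which behaves like csvtk with -t/--tabs switched on by default (a sketch assuming csvtk was copied to /usr/local/bin as above):

sudo ln -s /usr/local/bin/csvtk /usr/local/bin/tsvtk
tsvtk dim input.tsv    # equivalent to: csvtk -t dim input.tsv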
"},{"location":"download/#release-history","title":"Release history","text":"csvtk filter2/mutate2/mutate3
:csvtk csv2json
:csvtk mutate3
: create a new column from selected fields with Go-like expressions. Contributed by @moorereason 172csvtk sort/join
:csvtk sort
:csvtk summary
:csvtk rename2
:-n/--start-num
. #286--nr-width
.csvtk replace
:{nr}
. #286csvtk csv2json
:csvtk split
:csvtk spread
:csvtk grep
:csvtk fix-quotes
:-b, --buffer-size
.csvtk plot
:--scale
for scaling the image width/height, tick, axes, line/point and font sizes proportionally, advised by @tseemann.csvtk plot line
:csvtk hist
:--line-width
.csvtk box
:--line-width
, --point-size
, and color-index
.csvtk
:--quiet
. #261-U, --delete-header
for disabling output of the header row. Supported commands: concat, csv2tab/tab2csv, csv2xlsx/xlsx2csv, cut, filter, filter2, freq, fold/unfold, gather, fmtdate, grep, head, join, mutate, mutate2, replace, round, sample. #258-Z/--show-row-number
: head.csvtk dim
:csvtk concat
:csvtk spread
:-k
and -v
.csvtk sort
:csvtk filter/filter2
:-Z
.csvtk xls2csv
:csvtk pretty
:-n/--buf-rows
from 128 to 1024, and 0 for loading all data.csvtk join
:-s/--suffix
for adding suffixes to colnames from each file. #263fix-quotes
: fix malformed CSV/TSV caused by double-quotes. #260del-quotes
: remove extra double-quotes added by fix-quotes
.csvtk del-header
:csvtk concat
:csvtk sort
:csvtk filter2
:in
keyword. #195csvtk plot
:--tick-label-size
.csvtk pretty
:csvtk
:-X
for the flag --infile-list
. #249csvtk pretty
:-m/--align-center
and -r/--align-right
. #244csvtk spread
:csvtk join
:-P/--prefix-duplicates
: add filenames as colname prefixes only for duplicated colnames. #246csvtk mutate2
:csvtk xlsx2csv
:open /tmp/excelize-: no such file or directory
error for big .xlsx
files. #251csvtk comb
:csvtk pretty
:-H/--no-header-row
, introduced in v0.27.0.3line
for three-line table.csvtk csv2xlsx
:csvtk splitxlsx
:invalid worksheet index
. #1617csvtk filter2/mutate2
:csvtk
:csvtk grep -f 2-
. #120-Z/--show-row-number
, supported commands: cut, csv2tab, csv2xlsx, tab2csv, pretty.csvtk spread
: spread a key-value pair across multiple columns, like tidyr::spread/pivot_wider. #91, #236, #239csvtk mutate/mutate2
:--at
, --before
, --after
for specifying the position of the new column. #193csvtk cut
:-i/--ignore-case
.csvtk pretty
:csvtk round
:7.1E-1
.csvtk summary
:csvtk corr/watch
:csvtk
:--infile-list
accepts stdin \"-\". #210csvtk fix
: fix CSV/TSV with different numbers of columns in rows. #226csvtk pretty
: rewrite to support wrapping cells. #206 #209 #228csvtk cut/fmtdate/freq/grep/rename/rename2/replace/round
: allow duplicated column names.csvtk csv2xlsx
: optionally stores numbers as float. #217csvtk xlsx2csv
: fix bug where xlsx2csv
treats small number (padj < 1e-25) as 0. It's solved by updating the excelize package. #261csvtk join
: a new flag for adding filename as column name prefix. by @tetedange13 #202csvtk mutate2
: fix wrongly treating strings like E10
as numbers in scientific notation. #219csvtk sep
: fix the logic. #218csvtk space2tab
: fix \"bufio.Scanner: token too long\". #231csvtk
: report empty files.csvtk join
: fix loading file with no records.csvtk filter2/muate2
:${var}
with special charactors including commas, spaces, and parentheses, e.g., ${a,b}
, ${a b}
, or ${a (b)}
. #186csvtk sort
: fix checking non-existed fileds.csvtk plot box/hist/line
: new flag --skip-na
for skipping missing data. #188csvtk csv2xlsx
: stores number as float. #192csvtk summary
: new functions argmin
and argmax
. #181csvtk mutate2/summary
:mutate2
: remove the option -L/--digits
.-w/--decimal-width
to limit floats to N decimal points.csvtk fmtdate
: format date of selected fields. #159csvtk grep
: fix bug for searching with -r -p .
.csvtk csv2rst
: fix bug for data containing unicode. #137csvtk filter2
: fix bug for date expression. #146csvtk mutate2/filter2
: len()
. #153csvtk cut
: new flags -m/--allow-missing-col
and -b/--blank-missing-col
. #156csvtk pretty
: still add header row for empty column.csvtk csv2md
: better format.csvtk join
: new flag -n/--ignore-null
. #163csvtk csv2rst
for converting CSV to reStructuredText format. #137csvtk pretty
: add header separator line. #123csvtk mutate2/summary
: fix message and doc. Thanks @VladimirAlexiev #127csvtk mutate2
: fix null coalescence: ??. #129csvtk genautocomplete
: supports bash|zsh|fish|powershell. #126csvtk cat
: fix progress bar. #130csvtk grep
: new flag immediate-output
.csvtk csv2xlsx
: fix bug for table with > 26 columns. 138csvtk
:-t
does not override -D
anymore. #114tsvtk
the -t/--tabs
option for tab input is set. Thanks @bsipos. #117csvtk csv2xlsx
for converting CSV/TSV file(s) to a single .xlsx
file.csvtk unfold
for unfolding multiple values in cells of a field. #103csvtk collapse
to csvtk fold
, for folding multiple values of a field into cells of groups.csvtk cut
: support range format 2-
to choose 2nd column to the end. #106csvtk round
: fix bug of failing to round scientific notation with value small than one, e.g., 7.1E-1
.csvtk nrow/ncol
for printing number of rows or columns.round
to round float to n decimal places. #112csvtk headers
: file name and column index is optional outputted with new flag -v/--verbose
.csvtk dim
: new flags --tabular
, --cols
, --rows
, -n/--no-files
.csvtk dim/ncol/nrow
: can handle empty files now. #108csvtk csv2json
#104:-b/--blank
: do not convert \"\", \"na\", \"n/a\", \"none\", \"null\", \".\" to null-n/--parse-num
: parse numeric values for nth column(s), multiple values are supported and \"a\"/\"all\" for all columns.csvtk xlsx2csv
: fix output for ragged table. #110csvtk join
: fix bug for joining >2 files.csvtk uniq
: new flag -n/--keep-n
for keeping first N records of every key.csvtk cut
: support repeatedly selecting columns. #106csvtk comb
: compute combinations of items at every row.csvtk sep
: separate column into multiple columns. #96csvtk
:-I
) and empty (-E
) rows. #97--infile-list
for giving file of input files list (one file per line), if given, they are appended to files from cli argumentscsvtk join
:-i/--ignore-case
. #99-L/--left-join
: left join, equals to -k/--keep-unmatched, exclusive with --outer-join
-O/--outer-join
: outer join, exclusive with --left-join--fill
to --na
.csvtk filter2
: fix bug when column names start with digits, e.g., 1000g2015aug
. Thank @VorontsovIE (#44)csvtk concat
: allow one input file. #98csvtk mutate
: new flag -R/--remove
for removing input column.csvtk
:csvtk cut -f a, b
.csvtk summary
: fix err of q1 and q3. #90csvtk version
: making checking update optional.watch
: online monitoring and histogram of selected field.corr
: calculate Pearson correlation between numeric columns.cat
: stream file and report progress.csvtk split
: fix bug of repeatedly output header line when number of output files exceed value of --buf-groups
. #83csvtk plot hist
: new option --percentiles
to add percentiles to histogram x label. #88csvtk replace/rename2/splitxlsx
: fix flag conflicts with global flag -I
since v0.18.0.csvtk replace/rename2
: removing shorthand flag -I
for --key-capt-idx
.csvtk splitxlsx
: changing shorthand flag of --sheet-index
from -I
to -N
.csvtk sort
: fix mutiple-key-sort containing natural order sorting. #79csvtk xlsx2csv
: reacts to global flags -t
, -T
, -D
and -E
. #78csvtk
: add new flag --ignore-illegal-row
to skip illegal rows. #72csvtk summary
: add more textual/numeric operations. #64csvtk sort
: fix bug for sorting by columns with empty values. #70csvtk grep
: add new flag --delete-matched
to delete a pattern right after being matched, this keeps the firstly matched data and speedups when using regular expressions. #77csvtk add-header
and csvtk del-header
for adding/deleting column names. [#62]csvtk csv2json
: convert CSV to JSON format.csvtk stats2
.csvtk summary
: summary statistics of selected digital fields (groupby group fields), usage and examples. #59csvtk replace
: add flag --nr-width
: minimum width for {nr} in flag -r/--replacement. e.g., formating \"1\" to \"001\" by --nr-width 3
(default 1)csvtk rename2/replace
: add flag -A, --kv-file-all-left-columns-as-value
, for treating all columns except 1th one as value for kv-file with more than 2 columns. #56csvtk
: add global flag -E/--ignore-empty-row
to skip empty row. #50csvtk mutate2
: add flag -s/--digits-as-string
for not converting big digits into scientific notation. #46csvtk sort
: add support for sorting in natural order. #49csvtk
: supporting multi-line fields by replacing multicorecsv with standard library encoding/csv, while losing support for metaline which was supported since v0.7.0. It also gain a little speedup.csvtk sample
: add flag -n/--line-number
to print line number as the first column (\"n\")csvtk filter2
: fix bug when column names start with digits, e.g., 1000g2015aug
(#44)csvtk rename2
: add support for similar repalecement symbols {kv} and {nr}
in csvtk replace
concat
for concatenating CSV/TSV files by rows #38csvtk
: add support for environment variables for frequently used global flags #39CSVTK_T
for flag -t/--tabs
CSVTK_H
for flag -H/--no-header-row
mutate2
: add support for eval expression WITHOUT column index symbol, so we can add some string constants #37pretty
: better support for files with duplicated column namescollapse
: collapsing one field with selected fields as keysfreq
: keeping original order of keys by defaultsplit
:-G/--out-gzip
for forcing output gzipped filesplit
to split CSV/TSV into multiple files according to column valuessplitxlxs
to split XLSX sheet into multiple sheets according to column valuescsvtk
, automatically check BOM (byte-order mark) and discard itxlsx2csv
to convert XLSX to CSV formatgrep
, filter
, filter2
: add flag -n/--line-number
to print line-number as the first columncut
: add flag -i/--ignore-case
to ignore case of column namecsvtk replace
: fix bug when replacing with key-value pairs brought in v0.8.0csvtk mutate2
: create new column from selected fields by awk-like arithmetic/string expressionsgenautocomplete
to generate shell autocompletion script!csvtk gather
for gathering columns into key-value pairs.csvtk sort
: support sorting by user-defined order.cut
, filter
, fitler2
, freq
, grep
, inter
, mutate
, rename
, rename2
, replace
, stats2
, uniq
.-F/--fuzzy-fields
.-t
, which overrides both -d
and -D
. If you want other delimiter for tabular input, use -t $'\\t' -D \"delimiter\"
.csvtk plot box
and csvtk plot line
: fix bugs for special cases of input-F/--fuzzy-fields
csvtk pretty
and csvtk csv2md
: add attention that these commands treat the first row as header line and require them to be unique.csvtk stat
renamed to csvtk stats
, old name is still available as an alias.csvtk stat2
renamed to csvtk stats2
, old name is still available as an alias.csvtk cut
: minor bug: panic when no fields given. i.e., csvtk cut
. All relevant commands have been fixed.csvtk grep
: large performance improvement by discarding goroutine (multiple threads), and keeping output in order of input.cut
, filter
, freq
, grep
, inter
, mutate
, rename
, rename2
, replace
, stat2
, and uniq
.csvtk filter2
, filtering rows by arithmetic/string expressions like awk
.csvtk cut
: delete flag -n/--names
, move it to a new command csvtk headers
csvtk headers
csvtk head
csvtk sample
csvtk grep
: fix result highlight when flag -v
is on.csvtk join
: support the 2nd or later files with entries with same ID.csvtk freq
: frequencies of selected fields-n
is not required anymore when flag -H
in csvtk mutate
csvtk grep
: if the pattern matches multiple parts, the text will be wrongly edited.csvtk replace
: -K
(--keep-key
) keep the key as value when no value found for the key. This is open in default in previous versions.csvtk sort
resultcsvtk grep -r -p
, when value of -p
contain \"[\" and \"]\" at the beginning or end, they are wrongly parsed.csvtk cut
supports ordered fields output. e.g., csvtk cut -f 2,1
outputs the 2nd column in front of 1th column.csvtk plot
can plot three types of plots by subcommands:csvtk plot hist
: histogramcsvtk plot box
: boxplotcsvtk plot line
: line plot and scatter plot-f \"-id\"
csvtk replace
support replacement symbols {nr}
(record number) and {kv}
(corresponding value of the key ($1) by key-value file)--fill
for csvtk join
, so we can fill the unmatched data\\r\\n
from a dependency packagecsv2md
version
which could check for updatecsvtk replace
that head row should not be edited.csvtk grep -t -P
inter
grep
csv2md
pretty
csvtk cut -n
filter
pretty
-- convert CSV to readable aligned tablegrep
grep
stat
that failed to considerate files with header rowstat2
- summary of selected number fieldsstat
prettier--colnames
to cut
-f
(--fields
) of join
supports single value now--keep-unmathed
to join
mutate
The CSV parser used by csvtk follows the RFC4180 specification.
"},{"location":"faq/#bare-in-non-quoted-field","title":"bare \" in non-quoted-field","text":" 5. Each field may or may not be enclosed in double quotes (however\n some programs, such as Microsoft Excel, do not use double quotes\n at all). If fields are not enclosed with double quotes, then\n double quotes may not appear inside the fields. For example:\n\n \"aaa\",\"bbb\",\"ccc\" CRLF\n zzz,yyy,xxx\n\n 6. Fields containing line breaks (CRLF), double quotes, and commas\n should be enclosed in double-quotes. For example:\n\n \"aaa\",\"b CRLF\n bb\",\"ccc\" CRLF\n zzz,yyy,xxx\n\n 7. If double-quotes are used to enclose fields, then a double-quote\n appearing inside a field must be escaped by preceding it with\n another double quote. For example:\n\n \"aaa\",\"b\"\"bb\",\"ccc\"\n
If a single double-quote exists in a non-quoted field, an error will be reported, e.g.,
$ echo 'a,abc\" xyz,d'\na,abc\" xyz,d\n\n$ echo 'a,abc\" xyz,d' | csvtk cut -f 1-\n[ERRO] parse error on line 1, column 6: bare \" in non-quoted-field\n
You can add the flag -l/--lazy-quotes
to fix this.
$ echo 'a,abc\" xyz,d' | csvtk cut -f 1- -l\na,\"abc\"\" xyz\",d\n
"},{"location":"faq/#extraneous-or-missing-in-quoted-field","title":"extraneous or missing \" in quoted-field","text":"But for the situation below, -l/--lazy-quotes
won't help:
$ echo 'a,\"abc\" xyz,d'\na,\"abc\" xyz,d\n\n$ echo 'a,\"abc\" xyz,d' | csvtk cut -f 1-\n[ERRO] parse error on line 1, column 7: extraneous or missing \" in quoted-field\n\n$ echo 'a,\"abc\" xyz,d' | csvtk cut -f 1- -l\na,\"abc\"\" xyz,d\n\"\n\n$ echo 'a,\"abc\" xyz,d' | csvtk cut -f 1- -l | csvtk dim\nfile num_cols num_rows\n- 2 0\n
You need to use csvtk fix-quotes (available in v0.29.0 or later versions):
$ echo 'a,\"abc\" xyz,d' | csvtk fix-quotes\na,\"\"\"abc\"\" xyz\",d\n\n$ echo 'a,\"abc\" xyz,d' | csvtk fix-quotes | csvtk cut -f 1-\na,\"\"\"abc\"\" xyz\",d\n\n$ echo 'a,\"abc\" xyz,d' | csvtk fix-quotes | csvtk cut -f 1- | csvtk dim\nfile num_cols num_rows\n- 3 0\n
Use del-quotes if you need the original format after some operations.
$ echo 'a,\"abc\" xyz,d' | csvtk fix-quotes | csvtk cut -f 1- | csvtk del-quotes\na,\"abc\" xyz,d\n
"},{"location":"tutorial/","title":"Tutorial","text":""},{"location":"tutorial/#analyzing-otu-table","title":"Analyzing OTU table","text":""},{"location":"tutorial/#data","title":"Data","text":"Here is a mock OTU table from 16S rRNA sequencing result. Columns are sample IDs in format of \"GROUP.ID\"
$ cat otu_table.csv\nTaxonomy,A.1,A.2,A.3,B.1,B.2,B.3,C.1,C.2\nProteobacteria,.13,.29,.13,.16,.13,.22,.30,.23\nFirmicutes,.42,.06,.49,.41,.55,.41,.32,.38\nBacteroidetes,.19,.62,.12,.33,.16,.29,.34,.35\nDeferribacteres,.17,.00,.24,.01,.01,.01,.01,.01\n
What a mess! Let's make it prettier!
$ csvtk pretty otu_table.csv\nTaxonomy A.1 A.2 A.3 B.1 B.2 B.3 C.1 C.2\nProteobacteria .13 .29 .13 .16 .13 .22 .30 .23\nFirmicutes .42 .06 .49 .41 .55 .41 .32 .38\nBacteroidetes .19 .62 .12 .33 .16 .29 .34 .35\nDeferribacteres .17 .00 .24 .01 .01 .01 .01 .01\n
"},{"location":"tutorial/#steps","title":"Steps","text":"Counting
$ csvtk stat otu_table.csv\nfile num_cols num_rows\notu_table.csv 9 4\n
Column names
$ csvtk headers otu_table.csv\n# otu_table.csv\n1 Taxonomy\n2 A.1\n3 A.2\n4 A.3\n5 B.1\n6 B.2\n7 B.3\n8 C.1\n9 C.2\n
Convert to tab-delimited table
$ csvtk csv2tab otu_table.csv\nTaxonomy A.1 A.2 A.3 B.1 B.2 B.3 C.1 C.2\nProteobacteria .13 .29 .13 .16 .13 .22 .30 .23\nFirmicutes .42 .06 .49 .41 .55 .41 .32 .38\nBacteroidetes .19 .62 .12 .33 .16 .29 .34 .35\nDeferribacteres .17 .00 .24 .01 .01 .01 .01 .01\n
Extract data of groups A and B and save to a file with -o otu_table.gAB.csv
$ csvtk cut -F -f \"Taxonomy,A.*,B.*\" otu_table.csv -o otu_table.gAB.csv\n\n$ csvtk pretty otu_table.gAB.csv\nTaxonomy A.1 A.2 A.3 B.1 B.2 B.3\nProteobacteria .13 .29 .13 .16 .13 .22\nFirmicutes .42 .06 .49 .41 .55 .41\nBacteroidetes .19 .62 .12 .33 .16 .29\nDeferribacteres .17 .00 .24 .01 .01 .01\n
Search some rows by fields. Matched parts will be highlighted in red
$ csvtk grep -f Taxonomy -r -p \"tes\" otu_table.gAB.csv -T\n
Result:
Transpose
$ csvtk transpose otu_table.gAB.csv -o otu_table.gAB.t.csv\n\n$ csvtk pretty otu_table.gAB.t.csv\nTaxonomy Proteobacteria Firmicutes Bacteroidetes Deferribacteres\nA.1 .13 .42 .19 .17\nA.2 .29 .06 .62 .00\nA.3 .13 .49 .12 .24\nB.1 .16 .41 .33 .01\nB.2 .13 .55 .16 .01\nB.3 .22 .41 .29 .01\n
Rename the first column
$ csvtk rename -f 1 -n \"sample\" otu_table.gAB.t.csv -o otu_table.gAB.t.r.csv\n\n$ csvtk pretty otu_table.gAB.t.r.csv\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres\nA.1 .13 .42 .19 .17\nA.2 .29 .06 .62 .00\nA.3 .13 .49 .12 .24\nB.1 .16 .41 .33 .01\nB.2 .13 .55 .16 .01\nB.3 .22 .41 .29 .01\n
Add group column
$ csvtk mutate -p \"(.+?)\\.\" -n group otu_table.gAB.t.r.csv -o otu_table2.csv\n\n$ csvtk pretty otu_table2.csv\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.1 .13 .42 .19 .17 A\nA.2 .29 .06 .62 .00 A\nA.3 .13 .49 .12 .24 A\nB.1 .16 .41 .33 .01 B\nB.2 .13 .55 .16 .01 B\nB.3 .22 .41 .29 .01 B\n
Rename groups:
$ csvtk replace -f group -p \"A\" -r \"Ctrl\" otu_table2.csv \\\n | csvtk replace -f group -p \"B\" -r \"Treatment\" \\\n > otu_table3.csv\n\n$ csvtk pretty -s \" \" otu_table3.csv\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.1 .13 .42 .19 .17 Ctrl\nA.2 .29 .06 .62 .00 Ctrl\nA.3 .13 .49 .12 .24 Ctrl\nB.1 .16 .41 .33 .01 Treatment\nB.2 .13 .55 .16 .01 Treatment\nB.3 .22 .41 .29 .01 Treatment\n
Sort by abundance of Proteobacteria in descending order.
$ csvtk sort -k Proteobacteria:nr otu_table3.csv \\\n | csvtk pretty -s \" \"\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.2 .29 .06 .62 .00 Ctrl\nB.3 .22 .41 .29 .01 Treatment\nB.1 .16 .41 .33 .01 Treatment\nB.2 .13 .55 .16 .01 Treatment\nA.3 .13 .49 .12 .24 Ctrl\nA.1 .13 .42 .19 .17 Ctrl\n
Sort by abundance of Proteobacteria in descending order and Firmicutes in ascending order
$ csvtk sort -k Proteobacteria:nr -k Firmicutes:n otu_table3.csv \\\n | csvtk pretty -s \" \"\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.2 .29 .06 .62 .00 Ctrl\nB.3 .22 .41 .29 .01 Treatment\nB.1 .16 .41 .33 .01 Treatment\nA.1 .13 .42 .19 .17 Ctrl\nA.3 .13 .49 .12 .24 Ctrl\nB.2 .13 .55 .16 .01 Treatment\n
Filter samples with abundance greater than 0 in all taxa (i.e., in all columns except sample and group; you can also use -f "2-5>0").
$ cat otu_table3.csv \\\n | csvtk filter -f \"2-5>0\" \\\n | csvtk pretty -s \" \" \nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.1 .13 .42 .19 .17 Ctrl\nA.3 .13 .49 .12 .24 Ctrl\nB.1 .16 .41 .33 .01 Treatment\nB.2 .13 .55 .16 .01 Treatment\nB.3 .22 .41 .29 .01 Treatment\n
Most of the time, we may only want to remove samples with an abundance of 0 in all taxa.
$ cat otu_table3.csv \\\n | csvtk filter -f \"2-5>0\" --any \\\n | csvtk pretty -s \" \"\nsample Proteobacteria Firmicutes Bacteroidetes Deferribacteres group\nA.1 .13 .42 .19 .17 Ctrl\nA.2 .29 .06 .62 .00 Ctrl\nA.3 .13 .49 .12 .24 Ctrl\nB.1 .16 .41 .33 .01 Treatment\nB.2 .13 .55 .16 .01 Treatment\nB.3 .22 .41 .29 .01 Treatment\n
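The steps above can also be chained into a single pipeline. A sketch that reproduces the sorted table in one pass (the group-renaming step is omitted for brevity):

$ csvtk cut -F -f "Taxonomy,A.*,B.*" otu_table.csv \
    | csvtk transpose \
    | csvtk rename -f 1 -n sample \
    | csvtk mutate -p "(.+?)\." -n group \
    | csvtk sort -k Proteobacteria:nr \
    | csvtk pretty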
Attention
By default, csvtk assumes input files have a header row; if not, switch flag -H on.
By default, csvtk handles CSV files; use flag -t for tab-delimited files.
By default, lines starting with # will be ignored; if the header row starts with #, please assign flag -C another rare symbol, e.g. $.
The CSV parser requires all lines to have the same number of fields/columns; use -I/--ignore-illegal-row to skip illegal lines if necessary. You can also use "csvtk fix" to fix files with different numbers of columns in rows.
If double-quotes exist in fields not enclosed with double-quotes, e.g.,
x,a \"b\" c,1\n
It would report error:
bare `\"` in non-quoted-field.\n
Please switch on the flag -l
or use csvtk fix-quotes
to fix it.
If some fields have only a double-quote either at the beginning or at the end, e.g.,
x,d \"e\",\"a\" b c,1\n
It would report error:
extraneous or missing \" in quoted-field\n
Please use csvtk fix-quotes
to fix it, and use csvtk del-quotes
to reset to the original format as needed.
Information
Format conversion
Set operations
Edit
Transform
Ordering
Plotting
Misc
Usage
csvtk -- a cross-platform, efficient and practical CSV/TSV toolkit\n\nVersion: 0.32.0\n\nAuthor: Wei Shen <shenwei356@gmail.com>\n\nDocuments : http://shenwei356.github.io/csvtk\nSource code: https://github.com/shenwei356/csvtk\n\nAttention:\n\n 1. By default, csvtk assumes input files have header row, if not, switch flag \"-H\" on.\n 2. By default, csvtk handles CSV files, use flag \"-t\" for tab-delimited files.\n 3. Column names should be unique.\n 4. By default, lines starting with \"#\" will be ignored, if the header row\n starts with \"#\", please assign flag \"-C\" another rare symbol, e.g. '$'.\n 5. Do not mix use field (column) numbers and names to specify columns to operate.\n 6. The CSV parser requires all the lines have same numbers of fields/columns.\n Even lines with spaces will cause error.\n Use '-I/--ignore-illegal-row' to skip these lines if neccessary.\n You can also use \"csvtk fix\" to fix files with different numbers of columns in rows.\n 7. If double-quotes exist in fields not enclosed with double-quotes, e.g.,\n x,a \"b\" c,1\n It would report error:\n bare \" in non-quoted-field.\n Please switch on the flag \"-l\" or use \"csvtk fix-quotes\" to fix it.\n 8. If somes fields have only a double-quote eighter in the beginning or in the end, e.g.,\n x,d \"e\",\"a\" b c,1\n It would report error:\n extraneous or missing \" in quoted-field\n Please use \"csvtk fix-quotes\" to fix it, and use \"csvtk del-quotes\" to reset to the\n original format as needed.\n\nEnvironment variables for frequently used global flags:\n\n - \"CSVTK_T\" for flag \"-t/--tabs\"\n - \"CSVTK_H\" for flag \"-H/--no-header-row\"\n - \"CSVTK_QUIET\" for flag \"--quiet\"\n\nYou can also create a soft link named \"tsvtk\" for \"csvtk\",\nwhich sets \"-t/--tabs\" by default.\n\nUsage:\n csvtk [command]\n\nCommands for Information:\n corr calculate Pearson correlation between two columns\n dim dimensions of CSV file\n headers print headers\n ncol print number of columns\n nrow print number of records\n summary summary statistics of selected numeric or text fields (groupby group fields)\n watch monitor the specified fields\n\nFormat Conversion:\n csv2json convert CSV to JSON format\n csv2md convert CSV to markdown format\n csv2rst convert CSV to reStructuredText format\n csv2tab convert CSV to tabular format\n csv2xlsx convert CSV/TSV files to XLSX file\n pretty convert CSV to a readable aligned table\n space2tab convert space delimited format to TSV\n splitxlsx split XLSX sheet into multiple sheets according to column values\n tab2csv convert tabular format to CSV\n xlsx2csv convert XLSX to CSV format\n\nCommands for Set Operation:\n comb compute combinations of items at every row\n concat concatenate CSV/TSV files by rows\n cut select and arrange fields\n filter filter rows by values of selected fields with arithmetic expression\n filter2 filter rows by awk-like arithmetic/string expressions\n freq frequencies of selected fields\n grep grep data by selected fields with patterns/regular expressions\n head print first N records\n inter intersection of multiple files\n join join files by selected fields (inner, left and outer join)\n sample sampling by proportion\n split split CSV/TSV into multiple files according to column values\n uniq unique data without sorting\n\nCommands for Edit:\n add-header add column names\n del-header delete column names\n del-quotes remove extra double quotes added by 'fix-quotes'\n fix fix CSV/TSV with different numbers of columns in rows\n fix-quotes fix malformed CSV/TSV 
caused by double-quotes\n fmtdate format date of selected fields\n mutate create new column from selected fields by regular expression\n mutate2 create a new column from selected fields by awk-like arithmetic/string expressions\n mutate3 create a new column from selected fields with Go-like expressions\n rename rename column names with new names\n rename2 rename column names by regular expression\n replace replace data of selected fields by regular expression\n round round float to n decimal places\n\nCommands for Data Transformation:\n fold fold multiple values of a field into cells of groups\n gather gather columns into key-value pairs, like tidyr::gather/pivot_longer\n sep separate column into multiple columns\n spread spread a key-value pair across multiple columns, like tidyr::spread/pivot_wider\n transpose transpose CSV data\n unfold unfold multiple values in cells of a field\n\nCommands for Ordering:\n sort sort by selected fields\n\nCommands for Ploting:\n plot plot common figures\n\nCommands for Miscellaneous Functions:\n cat stream file to stdout and report progress on stderr\n\nAdditional Commands:\n genautocomplete generate shell autocompletion script (bash|zsh|fish|powershell)\n version print version information and check for update\n\nFlags:\n -C, --comment-char string lines starting with commment-character will be ignored. if your header\n row starts with '#', please assign \"-C\" another rare symbol, e.g. '$'\n (default \"#\")\n -U, --delete-header do not output header row\n -d, --delimiter string delimiting character of the input CSV file (default \",\")\n -h, --help help for csvtk\n -E, --ignore-empty-row ignore empty rows\n -I, --ignore-illegal-row ignore illegal rows. You can also use 'csvtk fix' to fix files with\n different numbers of columns in rows\n -X, --infile-list string file of input files list (one file per line), if given, they are appended\n to files from cli arguments\n -l, --lazy-quotes if given, a quote may appear in an unquoted field and a non-doubled quote\n may appear in a quoted field\n -H, --no-header-row specifies that the input CSV file does not have header row\n -j, --num-cpus int number of CPUs to use (default 4)\n -D, --out-delimiter string delimiting character of the output CSV file, e.g., -D $'\\t' for tab\n (default \",\")\n -o, --out-file string out file (\"-\" for stdout, suffix .gz for gzipped out) (default \"-\")\n -T, --out-tabs specifies that the output is delimited with tabs. Overrides \"-D\"\n --quiet be quiet and do not show extra information and warnings\n -Z, --show-row-number show row number as the first column, with header row skipped\n -t, --tabs specifies that the input CSV file is delimited with tabs. Overrides \"-d\"\n\nUse \"csvtk [command] --help\" for more information about a command.\n
"},{"location":"usage/#headers","title":"headers","text":"Usage
print headers\n\nUsage:\n csvtk headers [flags]\n\nFlags:\n -h, --help help for headers\n -v, --verbose print verbose information\n\n
Examples
$ csvtk headers testdata/[12].csv\nname\nattr\nname\nmajor\n\n$ csvtk headers testdata/[12].csv -v\n# testdata/1.csv\n1 name\n2 attr\n# testdata/2.csv\n1 name\n2 major\n
"},{"location":"usage/#dimnrowncol","title":"dim/nrow/ncol","text":"Usage
dim:
dimensions of CSV file\n\nUsage:\n csvtk dim [flags]\n\nAliases:\n dim, size, stats, stat\n\nFlags:\n --cols only print number of columns\n -h, --help help for dim\n -n, --no-files do not print file names\n --rows only print number of rows\n --tabular output in machine-friendly tabular format\n\n
nrow:
print number of records\n\nUsage:\n csvtk nrow [flags]\n\nAliases:\n nrow, nrows\n\nFlags:\n -n, --file-name print file names\n -h, --help help for nrow\n\n
ncol:
print number of columns\n\nUsage:\n csvtk ncol [flags]\n\nAliases:\n ncol, ncols\n\nFlags:\n -n, --file-name print file names\n -h, --help help for ncol\n\n
Examples
with header row
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n\n$ cat testdata/names.csv | csvtk size\nfile num_cols num_rows\n- 4 5\n\n$ cat testdata/names.csv | csvtk nrow\n5\n\n$ cat testdata/names.csv | csvtk ncol\n4\n\n$ csvtk nrow testdata/names.csv testdata/phones.csv -n\n5 testdata/names.csv\n4 testdata/phones.csv\n
no header row
$ cat testdata/digitals.tsv\n4 5 6\n1 2 3\n7 8 0\n8 1,000 4\n\n$ cat testdata/digitals.tsv \\\n | csvtk size -t -H\nfile num_cols num_rows\n- 3 4\n\n$ cat testdata/names.csv | csvtk nrow -H\n3\n\n$ cat testdata/names.csv | csvtk ncol -H\n4\n
Usage
summary statistics of selected numeric or text fields (groupby group fields)\n\nAttention:\n\n 1. Do not mix use field (column) numbers and names.\n\nAvailable operations:\n\n # numeric/statistical operations\n # provided by github.com/gonum/stat and github.com/gonum/floats\n countn (count numeric values), min, max, sum, argmin, argmax,\n mean, stdev, variance, median, q1, q2, q3,\n entropy (Shannon entropy),\n prod (product of the elements)\n\n # textual/numeric operations\n count, first, last, rand, unique/uniq, collapse, countunique\n\nUsage:\n csvtk summary [flags]\n\nFlags:\n -w, --decimal-width int limit floats to N decimal points (default 2)\n -f, --fields strings operations on these fields. e.g -f 1:count,1:sum or -f colA:mean. available\n operations: argmax, argmin, collapse, count, countn, countuniq,\n countunique, entropy, first, last, max, mean, median, min, prod, q1, q2,\n q3, rand, stdev, sum, uniq, unique, variance\n -g, --groups string group via fields. e.g -f 1,2 or -f columnA,columnB\n -h, --help help for summary\n -i, --ignore-non-numbers ignore non-numeric values like \"NA\" or \"N/A\"\n -S, --rand-seed int rand seed for operation \"rand\" (default 11)\n -s, --separater string separater for collapsed data (default \"; \")\n\n
Examples
data
$ cat testdata/digitals2.csv \nf1,f2,f3,f4,f5\nfoo,bar,xyz,1,0\nfoo,bar2,xyz,1.5,-1\nfoo,bar2,xyz,3,2\nfoo,bar,xyz,5,3\nfoo,bar2,xyz,N/A,4\nbar,xyz,abc,NA,2\nbar,xyz,abc2,1,-1\nbar,xyz,abc,2,0\nbar,xyz,abc,1,5\nbar,xyz,abc,3,100\nbar,xyz2,abc3,2,3\nbar,xyz2,abc3,2,1\n
use flag -i/--ignore-non-numbers
$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:sum\n[ERRO] column 4 has non-digital data: N/A, you can use flag -i/--ignore-non-numbers to skip these data\n\n$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:sum -i\nf4:sum\n21.50\n
multiple fields supported
$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:sum,f5:sum -i\nf4:sum,f5:sum\n21.50,118.00\n
using field numbers instead of column names is still supported
$ cat testdata/digitals2.csv \\\n | csvtk summary -f 4:sum,5:sum -i\nf4:sum,f5:sum\n21.50,118.00\n
but remember not to mix column numbers and names
$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:sum,5:sum -i\n[ERRO] column \"5\" not existed in file: -\n\n$ cat testdata/digitals2.csv \\\n | csvtk summary -f 4:sum,f5:sum -i\n[ERRO] failed to parse f5 as a field number, you may mix the use of field numbers and column names\n
groupby
$ cat testdata/digitals2.csv \\\n | csvtk summary -i -f f4:sum,f5:sum -g f1,f2 \\\n | csvtk pretty\nf1 f2 f4:sum f5:sum\n--- ---- ------ ------\nbar xyz 7.00 106.00\nbar xyz2 4.00 4.00\nfoo bar 6.00 3.00\nfoo bar2 4.50 5.00\n
for data without header line
$ cat testdata/digitals2.csv | sed 1d \\\n | csvtk summary -H -i -f 4:sum,5:sum -g 1,2 \\\n | csvtk pretty -H\nbar xyz 7.00 106.00\nbar xyz2 4.00 4.00\nfoo bar 6.00 3.00\nfoo bar2 4.50 5.00\n
numeric/statistical operations
$ cat testdata/digitals2.csv \\\n | csvtk summary -i -g f1 -f f4:countn,f4:mean,f4:stdev,f4:q1,f4:q2,f4:mean,f4:q3,f4:min,f4:max \\\n | csvtk pretty\nf1 f4:countn f4:mean f4:stdev f4:q1 f4:q2 f4:mean f4:q3 f4:min f4:max\n--- --------- ------- -------- ----- ----- ------- ----- ------ ------\nbar 6 1.83 0.75 1.25 2.00 1.83 2.00 1.00 3.00\nfoo 4 2.62 1.80 1.38 2.25 2.62 3.50 1.00 5.00\n
textual/numeric operations
$ cat testdata/digitals2.csv \\\n | csvtk summary -i -g f1 -f f2:count,f2:first,f2:last,f2:rand,f2:collapse,f2:uniq,f2:countunique \\\n | csvtk pretty\nf1 f2:count f2:first f2:last f2:rand f2:collapse f2:uniq f2:countunique\n--- -------- -------- ------- ------- ----------------------------------- --------- --------------\nbar 7 xyz xyz2 xyz2 xyz; xyz; xyz; xyz; xyz; xyz2; xyz2 xyz; xyz2 2\nfoo 5 bar bar2 bar2 bar; bar2; bar2; bar; bar2 bar2; bar\n
mixed operations
$ cat testdata/digitals2.csv \\\n | csvtk summary -i -g f1 -f f4:collapse,f4:max \\\n | csvtk pretty\nf1 f4:collapse f4:max\n--- -------------------- ------\nbar NA; 1; 2; 1; 3; 2; 2 3.00\nfoo 1; 1.5; 3; 5; N/A 5.00\n
count
and countn
(count of digits)
$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:count,f4:countn -i \\\n | csvtk pretty\nf4:count f4:countn\n-------- ---------\n12 10\n\n# details:\n$ cat testdata/digitals2.csv \\\n | csvtk summary -f f4:count,f4:countn,f4:collapse -i -g f1 \\\n | csvtk pretty\nf1 f4:count f4:countn f4:collapse\n--- -------- --------- --------------------\nbar 7 6 NA; 1; 2; 1; 3; 2; 2\nfoo 5 4 1; 1.5; 3; 5; N/A\n
Usage
monitor the specified fields\n\nUsage:\n csvtk watch [flags]\n\nFlags:\n -B, --bins int number of histogram bins (default -1)\n -W, --delay int sleep this many seconds after plotting (default 1)\n -y, --dump print histogram data to stderr instead of plotting\n -f, --field string field to watch\n -h, --help help for watch\n -O, --image string save histogram to this PDF/image file\n -L, --log log10(x+1) transform numeric values\n -x, --pass passthrough mode (forward input to output)\n -p, --print-freq int print/report after this many records (-1 for print after EOF) (default -1)\n -Q, --quiet supress all plotting to stderr\n -R, --reset reset histogram after every report\n
Examples
Read the whole file, plot a histogram of the field on the terminal and to a PDF
csvtk -t watch -O hist.pdf -f MyField input.tsv\n
Monitor a TSV stream, print histogram every 1000 records
cat input.tsv | csvtk -t watch -f MyField -p 1000 -\n
Monitor a TSV stream, print histogram every 1000 records, hang forever for updates
tail -f +0 input.tsv | csvtk -t watch -f MyField -p 1000 -\n
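To inspect the binned counts themselves rather than the terminal plot, the -y/--dump flag listed above prints the histogram data to stderr instead of plotting; for example:

csvtk -t watch -f MyField -y input.tsv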
Usage
calculate Pearson correlation between two columns\n\nUsage:\n csvtk corr [flags]\n\nFlags:\n -f, --fields string comma separated fields\n -h, --help help for corr\n -i, --ignore_nan Ignore non-numeric fields to avoid returning NaN\n -L, --log Calcute correlations on Log10 transformed data\n -x, --pass passthrough mode (forward input to output)\n
Examples
csvtk -t corr -i -f Foo,Bar input.tsv\n
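The -L/--log flag listed above computes the correlation on log10-transformed values, which can help with heavily skewed columns; for example:

csvtk -t corr -i -L -f Foo,Bar input.tsv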
Usage
convert CSV to a readable aligned table\n\nHow to:\n 1. First -n/--buf-rows rows are read to check the minimum and maximum widths\n of each columns.\n\n You can also set the global or column-specific (the number of values need\n equal to the number of columns) thresholds via -w/--min-width and -W/--max-width.\n\n 1a. Cells longer than the maximum width will be wrapped (default) or\n clipped (--clip).\n Usually, the text is wrapped in space (-x/--wrap-delimiter). But if one\n word is longer than the -W/--max-width, it will be force split.\n 1b. Texts are aligned left (default), center (-m/--align-center)\n or right (-r/--align-right). Users can specify columns with column names,\n field indexes or ranges.\n Examples:\n -m A,B # column A and B\n -m 1,2 # 1st and 2nd column \n -m -1 # the last column (it's not unselecting in other commands)\n -m 1,3-5 # 1st, from 3rd to 5th column\n -m 1- # 1st and later columns (all columns)\n -m -3- # the last 3 columns\n -m -3--2 # the 2nd and 3rd to last columns\n -m 1- -r -1 # all columns are center-aligned, except the last column\n # which is right-aligned. -r overides -m.\n\n 2. Remaining rows are read and immediately outputted, one by one, till the end.\n\nStyles:\n\n Some preset styles are provided (-S/--style).\n\n default:\n\n id size\n -- ----\n 1 Huge\n 2 Tiny\n\n plain:\n\n id size\n 1 Huge\n 2 Tiny\n\n simple:\n\n -----------\n id size\n -----------\n 1 Huge\n 2 Tiny\n -----------\n\n 3line:\n\n \u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n id size\n -----------\n 1 Huge\n 2 Tiny\n \u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n\n grid:\n\n +----+------+\n | id | size |\n +====+======+\n | 1 | Huge |\n +----+------+\n | 2 | Tiny |\n +----+------+\n\n light:\n\n \u250c----\u252c------\u2510\n | id | size |\n \u251c====\u253c======\u2524\n | 1 | Huge |\n \u251c----\u253c------\u2524\n | 2 | Tiny |\n \u2514----\u2534------\u2518\n\n round:\n\n \u256d----\u252c------\u256e\n | id | size |\n \u251c====\u253c======\u2524\n | 1 | Huge |\n \u251c----\u253c------\u2524\n | 2 | Tiny |\n \u2570----\u2534------\u256f\n\n bold:\n\n \u250f\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n \u2503 id \u2503 size \u2503\n \u2523\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n \u2503 1 \u2503 Huge \u2503\n \u2523\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n \u2503 2 \u2503 Tiny \u2503\n \u2517\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u251b\n\n double:\n\n \u2554\u2550\u2550\u2550\u2550\u2566\u2550\u2550\u2550\u2550\u2550\u2550\u2557\n \u2551 id \u2551 size \u2551\n \u2560\u2550\u2550\u2550\u2550\u256c\u2550\u2550\u2550\u2550\u2550\u2550\u2563\n \u2551 1 \u2551 Huge \u2551\n \u2560\u2550\u2550\u2550\u2550\u256c\u2550\u2550\u2550\u2550\u2550\u2550\u2563\n \u2551 2 \u2551 Tiny \u2551\n \u255a\u2550\u2550\u2550\u2550\u2569\u2550\u2550\u2550\u2550\u2550\u2550\u255d\n\nUsage:\n csvtk pretty [flags] \n\nFlags:\n -m, --align-center strings align right for selected columns (field index/range or column name, type\n \"csvtk pretty -h\" for examples)\n -r, --align-right strings align right for selected columns (field index/range or column name, type\n \"csvtk pretty -h\" for examples)\n -n, --buf-rows int the number of rows to determine the min and max widths (0 for all rows)\n (default 1024)\n --clip clip longer cell instead of wrapping\n --clip-mark string clip mark (default \"...\")\n -h, --help help for pretty\n -W, 
--max-width strings max width, multiple values (max widths for each column, 0 for no limit)\n should be separated by commas. E.g., -W 40,20,0 limits the max widths of\n 1st and 2nd columns\n -w, --min-width strings min width, multiple values (min widths for each column, 0 for no limit)\n should be separated by commas. E.g., -w 0,10,10 limits the min widths of\n 2nd and 3rd columns\n -s, --separator string fields/columns separator (default \" \")\n -S, --style string output syle. available vaules: default, plain, simple, 3line, grid,\n light, round, bold, double. check https://github.com/shenwei356/stable\n -x, --wrap-delimiter string delimiter for wrapping cells (default \" \")\n\n
Examples:
default
$ csvtk pretty testdata/names.csv\nid first_name last_name username\n-- ---------- --------- --------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n\n$ csvtk pretty testdata/names.csv -H\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
three-line table
$ cat testdata/names.csv | csvtk pretty -S 3line\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n id first_name last_name username\n----------------------------------------\n 11 Rob Pike rob\n 2 Ken Thompson ken\n 4 Robert Griesemer gri\n 1 Robert Thompson abc\n NA Robert Abel 123\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n\n$ cat testdata/names.csv | csvtk pretty -S 3line -H\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n id first_name last_name username\n 11 Rob Pike rob\n 2 Ken Thompson ken\n 4 Robert Griesemer gri\n 1 Robert Thompson abc\n NA Robert Abel 123\n\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\n
align right/center for some columns
$ csvtk pretty testdata/names.csv -w 6 -S bold -r 1,username -m first_name \n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 id \u2503 first_name \u2503 last_name \u2503 username \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 11 \u2503 Rob \u2503 Pike \u2503 rob \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 2 \u2503 Ken \u2503 Thompson \u2503 ken \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 4 \u2503 Robert \u2503 Griesemer \u2503 gri \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 1 \u2503 Robert \u2503 Thompson \u2503 abc \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 NA \u2503 Robert \u2503 Abel \u2503 123 \u2503\n\u2517\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u251b\n\n$ csvtk pretty testdata/names.csv -w 6 -S bold -m 1- -r -1\n\u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n\u2503 id \u2503 first_name \u2503 last_name \u2503 username \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 11 \u2503 Rob \u2503 Pike \u2503 rob \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 2 \u2503 Ken \u2503 Thompson \u2503 ken 
\u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 4 \u2503 Robert \u2503 Griesemer \u2503 gri \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 1 \u2503 Robert \u2503 Thompson \u2503 abc \u2503\n\u2523\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u252b\n\u2503 NA \u2503 Robert \u2503 Abel \u2503 123 \u2503\n\u2517\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u251b\n
custom separator
$ csvtk pretty testdata/names.csv -s \" | \"\nid | first_name | last_name | username\n-- | ---------- | --------- | --------\n11 | Rob | Pike | rob\n2 | Ken | Thompson | ken\n4 | Robert | Griesemer | gri\n1 | Robert | Thompson | abc\nNA | Robert | Abel | 123\n
Set the global minimum and maximum width.
$ csvtk pretty testdata/long.csv -w 5 -W 40\nid name message\n----- ------------------ ----------------------------------------\n1 Donec Vitae Quis autem vel eum iure reprehenderit\n qui in ea voluptate velit esse.\n2 Quaerat Voluptatem At vero eos et accusamus et iusto odio.\n3 Aliquam lorem Curabitur ullamcorper ultricies nisi.\n Nam eget dui. Etiam rhoncus. Maecenas\n tempus, tellus eget condimentum\n rhoncus, sem quam semper libero.\n
Set column-specific minimum and maximum widths.
$ csvtk pretty testdata/long.csv -w 5,25,0 -W 0,30,40 -m 1,2 -S round\n\u256d-------\u252c---------------------------\u252c------------------------------------------\u256e\n| id | name | message |\n\u251c=======\u253c===========================\u253c==========================================\u2524\n| 1 | Donec Vitae | Quis autem vel eum iure reprehenderit |\n| | | qui in ea voluptate velit esse. |\n\u251c-------\u253c---------------------------\u253c------------------------------------------\u2524\n| 2 | Quaerat Voluptatem | At vero eos et accusamus et iusto odio. |\n\u251c-------\u253c---------------------------\u253c------------------------------------------\u2524\n| 3 | Aliquam lorem | Curabitur ullamcorper ultricies nisi. |\n| | | Nam eget dui. Etiam rhoncus. Maecenas |\n| | | tempus, tellus eget condimentum |\n| | | rhoncus, sem quam semper libero. |\n\u2570-------\u2534---------------------------\u2534------------------------------------------\u256f\n
Clipping cells instead of wrapping
$ csvtk pretty testdata/long.csv -w 5 -W 40 --clip\nid name message\n----- ------------------ ----------------------------------------\n1 Donec Vitae Quis autem vel eum iure reprehenderit...\n2 Quaerat Voluptatem At vero eos et accusamus et iusto odio.\n3 Aliquam lorem Curabitur ullamcorper ultricies nisi....\n
Change the output style
$ csvtk pretty testdata/long.csv -W 40 -S grid\n+----+--------------------+------------------------------------------+\n| id | name | message |\n+====+====================+==========================================+\n| 1 | Donec Vitae | Quis autem vel eum iure reprehenderit |\n| | | qui in ea voluptate velit esse. |\n+----+--------------------+------------------------------------------+\n| 2 | Quaerat Voluptatem | At vero eos et accusamus et iusto odio. |\n+----+--------------------+------------------------------------------+\n| 3 | Aliquam lorem | Curabitur ullamcorper ultricies nisi. |\n| | | Nam eget dui. Etiam rhoncus. Maecenas |\n| | | tempus, tellus eget condimentum |\n| | | rhoncus, sem quam semper libero. |\n+----+--------------------+------------------------------------------+\n
Custom delimiter for wrapping
$ csvtk pretty testdata/lineages.csv -W 60 -x ';' -S light\n\u250c-------\u252c------------------\u252c--------------------------------------------------------------\u2510\n| taxid | name | complete lineage |\n\u251c=======\u253c==================\u253c==============================================================\u2524\n| 9606 | Homo sapiens | cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa; |\n| | | Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata; |\n| | | Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii; |\n| | | Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria; |\n| | | Eutheria;Boreoeutheria;Euarchontoglires;Primates; |\n| | | Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae; |\n| | | Homininae;Homo;Homo sapiens |\n\u251c-------\u253c------------------\u253c--------------------------------------------------------------\u2524\n| 562 | Escherichia coli | cellular organisms;Bacteria;Pseudomonadota; |\n| | | Gammaproteobacteria;Enterobacterales;Enterobacteriaceae; |\n| | | Escherichia;Escherichia coli |\n\u2514-------\u2534------------------\u2534--------------------------------------------------------------\u2518\n
transpose

Usage
transpose CSV data\n\nUsage:\n csvtk transpose [flags]\n\n
Examples
$ cat testdata/digitals.tsv\n4 5 6\n1 2 3\n7 8 0\n8 1,000 4\n\n$ csvtk transpose -t testdata/digitals.tsv\n4 1 7 8\n5 2 8 1,000\n6 3 0 4\n
"},{"location":"usage/#csv2json","title":"csv2json","text":"Usage
convert CSV to JSON format\n\nUsage:\n csvtk csv2json [flags]\n\nFlags:\n -b, --blanks do not convert \"\", \"na\", \"n/a\", \"none\", \"null\", \".\" to null\n -h, --help help for csv2json\n -i, --indent string indent. if given blank, output json in one line. (default \" \")\n -k, --key string output json as an array of objects keyed by a given filed rather than as a\n list. e.g -k 1 or -k columnA\n -n, --parse-num strings parse numeric values for nth column, multiple values are supported and\n \"a\"/\"all\" for all columns\n\n
Examples
test data
$ cat testdata/data4json.csv \nID,room,name,status\n3,G13,Simon,true\n5,103,Anna,TRUE\n1e-3,2,,N/A\n
default operation
$ cat testdata/data4json.csv | csvtk csv2json\n[\n {\n \"ID\": \"3\",\n \"room\": \"G13\",\n \"name\": \"Simon\",\n \"status\": true\n },\n {\n \"ID\": \"5\",\n \"room\": \"103\",\n \"name\": \"Anna\",\n \"status\": true\n },\n {\n \"ID\": \"1e-3\",\n \"room\": \"2\",\n \"name\": null,\n \"status\": null\n }\n]\n
change indent
$ cat testdata/data4json.csv | csvtk csv2json -i \"\"\n[{\"ID\":\"3\",\"room\":\"G13\",\"name\":\"Simon\",\"status\":true},{\"ID\":\"5\",\"room\":\"103\",\"name\":\"Anna\",\"status\":true},{\"ID\":\"1e-3\",\"room\":\"2\",\"name\":null,\"status\":null}]\n
output JSON as an array of objects keyed by a given field rather than as a list.
$ cat testdata/data4json.csv | csvtk csv2json -k ID\n{\n \"3\": {\n \"ID\": \"3\",\n \"room\": \"G13\",\n \"name\": \"Simon\",\n \"status\": true\n },\n \"5\": {\n \"ID\": \"5\",\n \"room\": \"103\",\n \"name\": \"Anna\",\n \"status\": true\n },\n \"1e-3\": {\n \"ID\": \"1e-3\",\n \"room\": \"2\",\n \"name\": null,\n \"status\": null\n }\n}\n
for CSV without header row
$ cat testdata/data4json.csv | csvtk csv2json -H\n[\n [\n \"ID\",\n \"room\",\n \"name\",\n \"status\"\n ],\n [\n \"3\",\n \"G13\",\n \"Simon\",\n \"true\"\n ],\n [\n \"5\",\n \"103\",\n \"Anna\",\n \"TRUE\"\n ],\n [\n \"1e-3\",\n \"2\",\n \"\",\n \"N/A\"\n ]\n]\n
parse numeric values.
# cat testdata/data4json.csv | csvtk csv2json -n all # for all columns\n# cat testdata/data4json.csv | csvtk csv2json -n 1,2 # for multiple columns\n$ cat testdata/data4json.csv | csvtk csv2json -n 1 # for single column\n[\n {\n \"ID\": 3,\n \"room\": \"G13\",\n \"name\": \"Simon\",\n \"status\": true\n },\n {\n \"ID\": 5,\n \"room\": \"103\",\n \"name\": \"Anna\",\n \"status\": true\n },\n {\n \"ID\": 1e-3,\n \"room\": \"2\",\n \"name\": null,\n \"status\": null\n }\n]\n
do not convert \"\", \"na\", \"n/a\", \"none\", \"null\", \".\" to null (just like csvjon --blanks in csvkit)
$ cat testdata/data4json.csv | csvtk csv2json --blanks\n[\n {\n \"ID\": \"3\",\n \"room\": \"G13\",\n \"name\": \"Simon\",\n \"status\": true\n },\n {\n \"ID\": \"5\",\n \"room\": \"103\",\n \"name\": \"Anna\",\n \"status\": true\n },\n {\n \"ID\": \"1e-3\",\n \"room\": \"2\",\n \"name\": \"\",\n \"status\": \"N/A\"\n }\n]\n
values with \"
, \\
, \\n
.
$ cat testdata/data4json2.csv\ntest\nnone\n\"Make America \"\"great\"\" again\"\n\\nations\n\"This is a\nMULTILINE\nstring\"\n\n$ csvtk csv2json testdata/data4json2.csv\n[\n {\n \"test\": null\n },\n {\n \"test\": \"Make America \\\"great\\\" again\"\n },\n {\n \"test\": \"\\\\nations\"\n },\n {\n \"test\": \"This is a\\nMULTILINE\\nstring\"\n }\n]\n
space2tab

Usage
convert space delimited format to TSV\n\nUsage:\n csvtk space2tab [flags]\n\nFlags:\n -b, --buffer-size string size of buffer, supported unit: K, M, G. You need increase the value when\n \"bufio.Scanner: token too long\" error reported (default \"1G\")\n -h, --help help for space2tab\n\n
Examples
$ echo a b | csvtk space2tab\na b\n
"},{"location":"usage/#csv2md","title":"csv2md","text":"Usage
convert CSV to markdown format\n\nAttention:\n\n csv2md treats the first row as header line and requires them to be unique\n\nUsage:\n csvtk csv2md [flags]\n\nFlags:\n -a, --alignments string comma separated alignments. e.g. -a l,c,c,c or -a c (default \"l\")\n -w, --min-width int min width (at least 3) (default 3)\n\n
Examples
give a single alignment symbol
$ cat testdata/names.csv | csvtk csv2md -a left\n|id |first_name|last_name|username|\n|:--|:---------|:--------|:-------|\n|11 |Rob |Pike |rob |\n|2 |Ken |Thompson |ken |\n|4 |Robert |Griesemer|gri |\n|1 |Robert |Thompson |abc |\n|NA |Robert |Abel |123 |\n
result:
id   first_name   last_name   username
11   Rob          Pike        rob
2    Ken          Thompson    ken
4    Robert       Griesemer   gri
1    Robert       Thompson    abc
NA   Robert       Abel        123

give alignment symbols for all fields
$ cat testdata/names.csv | csvtk csv2md -a c,l,l,l\n|id |first_name|last_name|username|\n|:-:|:---------|:--------|:-------|\n|11 |Rob |Pike |rob |\n|2 |Ken |Thompson |ken |\n|4 |Robert |Griesemer|gri |\n|1 |Robert |Thompson |abc |\n|NA |Robert |Abel |123 |\n
result
id   first_name   last_name   username
11   Rob          Pike        rob
2    Ken          Thompson    ken
4    Robert       Griesemer   gri
1    Robert       Thompson    abc
NA   Robert       Abel        123

csv2rst

Usage
convert CSV to readable aligned table\n\nAttention:\n\n 1. row span is not supported.\n\nUsage:\n csvtk csv2rst [flags]\n\nFlags:\n -k, --cross string charactor of cross (default \"+\")\n -s, --header string charactor of separator between header row and data rowws (default \"=\")\n -h, --help help for csv2rst\n -b, --horizontal-border string charactor of horizontal border (default \"-\")\n -p, --padding string charactor of padding (default \" \")\n -B, --vertical-border string charactor of vertical border (default \"|\")\n\n
Example
With header row
$ csvtk csv2rst testdata/names.csv \n+----+------------+-----------+----------+\n| id | first_name | last_name | username |\n+====+============+===========+==========+\n| 11 | Rob | Pike | rob |\n+----+------------+-----------+----------+\n| 2 | Ken | Thompson | ken |\n+----+------------+-----------+----------+\n| 4 | Robert | Griesemer | gri |\n+----+------------+-----------+----------+\n| 1 | Robert | Thompson | abc |\n+----+------------+-----------+----------+\n| NA | Robert | Abel | 123 |\n+----+------------+-----------+----------+\n
No header row
$ csvtk csv2rst -H -t testdata/digitals.tsv \n+---+-------+---+\n| 4 | 5 | 6 |\n+---+-------+---+\n| 1 | 2 | 3 |\n+---+-------+---+\n| 7 | 8 | 0 |\n+---+-------+---+\n| 8 | 1,000 | 4 |\n+---+-------+---+\n
Unicode
$ cat testdata/unicode.csv | csvtk csv2rst\n+-------+---------+\n| value | name |\n+=======+=========+\n| 1 | \u6c88\u4f1f |\n+-------+---------+\n| 2 | \u6c88\u4f1fb |\n+-------+---------+\n| 3 | \u6c88\u5c0f\u4f1f |\n+-------+---------+\n| 4 | \u6c88\u5c0f\u4f1fb |\n+-------+---------+\n
Misc
$ cat testdata/names.csv | head -n 1 | csvtk csv2rst \n+----+------------+-----------+----------+\n| id | first_name | last_name | username |\n+====+============+===========+==========+\n\n$ cat testdata/names.csv | head -n 1 | csvtk csv2rst -H\n+----+------------+-----------+----------+\n| id | first_name | last_name | username |\n+----+------------+-----------+----------+\n\n$ echo | csvtk csv2rst -H\n[ERRO] xopen: no content\n\n$ echo \"a\" | csvtk csv2rst -H\n+---+\n| a |\n+---+\n\n$ echo \"\u6c88\u4f1f\" | csvtk csv2rst -H\n+------+\n| \u6c88\u4f1f |\n+------+\n
csv2xlsx

Usage
convert CSV/TSV files to XLSX file\n\nAttention:\n\n 1. Multiple CSV/TSV files are saved as separated sheets in .xlsx file.\n 2. All input files should all be CSV or TSV.\n 3. First rows are freezed unless given '-H/--no-header-row'.\n\nUsage:\n csvtk csv2xlsx [flags]\n\nFlags:\n -f, --format-numbers save numbers in number format, instead of text\n -h, --help help for csv2xlsx\n\n
Examples
Single input
$ csvtk csv2xlsx ../testdata/names.csv -o output.xlsx\n\n# check content\n\n$ csvtk xlsx2csv -a output.xlsx\nindex sheet\n1 Sheet1\n\n$ csvtk xlsx2csv output.xlsx | md5sum \n8e9d38a012cb02279a396a2f2dbbbca9 -\n\n$ csvtk cut -f 1- ../testdata/names.csv | md5sum \n8e9d38a012cb02279a396a2f2dbbbca9 -\n
Merging multiple CSV/TSV files into one .xlsx file.
$ csvtk csv2xlsx ../testdata/names*.csv -o output.xlsx\n\n$ csvtk xlsx2csv -a output.xlsx\nindex sheet\n1 names\n2 names.reorder\n3 names.with-unmatched-colname\n
xlsx2csv

Usage
convert XLSX to CSV format\n\nUsage:\n csvtk xlsx2csv [flags]\n\nFlags:\n -h, --help help for xlsx2csv\n -a, --list-sheets list all sheets\n -i, --sheet-index int Nth sheet to retrieve (default 1)\n -n, --sheet-name string sheet to retrieve\n\n
Examples
list all sheets
$ csvtk xlsx2csv ../testdata/accounts.xlsx -a\nindex sheet\n1 names\n2 phones\n3 region\n
retrieve sheet by index
$ csvtk xlsx2csv ../testdata/accounts.xlsx -i 3\nname,region\nken,nowhere\ngri,somewhere\nshenwei,another\nThompson,there\n
retrieve sheet by name
$ csvtk xlsx2csv ../testdata/accounts.xlsx -n region\nname,region\nken,nowhere\ngri,somewhere\nshenwei,another\nThompson,there\n
head

Usage
print first N records\n\nUsage:\n csvtk head [flags]\n\nFlags:\n -n, --number int print first N records (default 10)\n\n
Examples
with header line
$ csvtk head -n 2 testdata/1.csv\nname,attr\nfoo,cool\nbar,handsome\n
no header line
$ csvtk head -H -n 2 testdata/1.csv\nname,attr\nfoo,cool\n
concat

Usage
concatenate CSV/TSV files by rows\n\nNote that the second and later files are concatenated to the first one,\nso only columns match that of the first files kept.\n\nUsage:\n csvtk concat [flags]\n\nFlags:\n -h, --help help for concat\n -i, --ignore-case ignore case (column name)\n -k, --keep-unmatched keep blanks even if no any data of a file matches\n -u, --unmatched-repl string replacement for unmatched data\n\n
Examples
data
$ csvtk pretty names.csv\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n\n$ csvtk pretty names.reorder.csv\nlast_name username id first_name\nPike rob 11 Rob\nThompson ken 2 Ken\nGriesemer gri 4 Robert\nThompson abc 1 Robert\nAbel 123 NA Robert\n\n$ csvtk pretty names.with-unmatched-colname.csv\nid2 First_name Last_name Username col\n22 Rob33 Pike222 rob111 abc\n44 Ken33 Thompson22 ken111 def\n
simple one
$ csvtk concat names.csv names.reorder.csv \\\n | csvtk pretty\nid first_name last_name username\n-- ---------- --------- --------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
data with unmatched column names, ignoring case
$ csvtk concat names.csv names.with-unmatched-colname.csv -i \\\n | csvtk pretty\nid first_name last_name username\n-- ---------- ---------- --------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n Rob33 Pike222 rob111\n Ken33 Thompson22 ken111\n\n $ csvtk concat names.csv names.with-unmatched-colname.csv -i -u Unmached \\\n | csvtk pretty\nid first_name last_name username\n-------- ---------- ---------- --------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\nUnmached Rob33 Pike222 rob111\nUnmached Ken33 Thompson22 ken111\n
Sometimes the data of a file does not match any column of the first file and is discarded by default, but you can keep it using the flag -k/--keep-unmatched
$ csvtk concat names.with-unmatched-colname.csv names.csv \\\n | csvtk pretty\nid2 First_name Last_name Username col\n--- ---------- ---------- -------- ---\n22 Rob33 Pike222 rob111 abc\n44 Ken33 Thompson22 ken111 def\n\n$ csvtk concat names.with-unmatched-colname.csv names.csv -k -u NA \\\n | csvtk pretty\nid2 First_name Last_name Username col\n--- ---------- ---------- -------- ---\n22 Rob33 Pike222 rob111 abc\n44 Ken33 Thompson22 ken111 def\nNA NA NA NA NA\nNA NA NA NA NA\nNA NA NA NA NA\nNA NA NA NA NA\nNA NA NA NA NA\n
sample

Usage
sampling by proportion\n\nUsage:\n csvtk sample [flags]\n\nFlags:\n -h, --help help for sample\n -n, --line-number print line number as the first column (\"n\")\n -p, --proportion float sample by proportion\n -s, --rand-seed int rand seed (default 11)\n\n
Examples
$ seq 100 | csvtk sample -H -p 0.5 | wc -l\n46\n\n$ seq 100 | csvtk sample -H -p 0.5 | wc -l\n46\n\n$ seq 100 | csvtk sample -H -p 0.1 | wc -l\n10\n\n$ seq 100 | csvtk sample -H -p 0.05 -n\n50,50\n52,52\n65,65\n
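A small additional sketch (not from the original examples): because the seed can be fixed with -s/--rand-seed, repeating a run with the same seed should return the same subset. Comparing checksums is just an assumed way to verify that.

# reproducibility check (assumption based on the -s/--rand-seed flag listed above)
$ seq 100 | csvtk sample -H -p 0.5 -s 11 | md5sum
$ seq 100 | csvtk sample -H -p 0.5 -s 11 | md5sum   # same seed, so the two checksums are expected to match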
"},{"location":"usage/#cut","title":"cut","text":"Usage
select and arrange fields\n\nExamples:\n\n 1. Single column\n csvtk cut -f 1\n csvtk cut -f colA\n 2. Multiple columns (replicates allowed)\n csvtk cut -f 1,3,2,1\n csvtk cut -f colA,colB,colA\n 3. Column ranges\n csvtk cut -f 1,3-5 # 1, 3, 4, 5\n csvtk cut -f 3,5- # 3rd col, and 5th col to the end\n csvtk cut -f 1- # for all\n csvtk cut -f 2-,1 # move 1th col to the end\n 4. Unselect\n csvtk cut -f -1,-3 # discard 1st and 3rd column\n csvtk cut -f -1--3 # discard 1st to 3rd column\n csvtk cut -f -2- # discard 2nd and all columns on the right.\n csvtu cut -f -colA,-colB # discard colA and colB\n\nUsage:\n csvtk cut [flags]\n\nFlags:\n -m, --allow-missing-col allow missing column\n -b, --blank-missing-col blank missing column, only for using column fields\n -f, --fields string select only these fields. type \"csvtk cut -h\" for examples\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for cut\n -i, --ignore-case ignore case (column name)\n -u, --uniq-column deduplicate columns matched by multiple fuzzy column names\n\n
Examples
data:
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n
Select columns by column index: csvtk cut -f 1,2
$ cat testdata/names.csv \\\n | csvtk cut -f 1,2\nid,first_name\n11,Rob\n2,Ken\n4,Robert\n1,Robert\nNA,Robert\n\n# select more than once\n$ cat testdata/names.csv \\\n | csvtk cut -f 1,2,2\nid,first_name,first_name\n11,Rob,Rob\n2,Ken,Ken\n4,Robert,Robert\n1,Robert,Robert\nNA,Robert,Robert\n
Select columns by column names: csvtk cut -f first_name,username
$ cat testdata/names.csv \\\n | csvtk cut -f first_name,username\nfirst_name,username\nRob,rob\nKen,ken\nRobert,gri\nRobert,abc\nRobert,123\n\n# select more than once\n$ cat testdata/names.csv \\\n | csvtk cut -f first_name,username,username\nfirst_name,username,username\nRob,rob,rob\nKen,ken,ken\nRobert,gri,gri\nRobert,abc,abc\nRobert,123,123\n
Unselect:
select all columns except the 1st and 2nd (i.e., the 3rd column onward): csvtk cut -f -1,-2
$ cat testdata/names.csv \\\n | csvtk cut -f -1,-2\nlast_name,username\nPike,rob\nThompson,ken\nGriesemer,gri\nThompson,abc\nAbel,123\n
select columns except 1-3
$ cat testdata/names.csv \\\n | csvtk cut -f -1--3\nusername\nrob\nken\ngri\nabc\n123\n
select columns except first_name: csvtk cut -f -first_name
$ cat testdata/names.csv \\\n | csvtk cut -f -first_name\nid,last_name,username\n11,Pike,rob\n2,Thompson,ken\n4,Griesemer,gri\n1,Thompson,abc\nNA,Abel,123\n
Fuzzy fields using the wildcard character: csvtk cut -F -f "*_name,username"
$ cat testdata/names.csv \\\n | csvtk cut -F -f \"*_name,username\"\nfirst_name,last_name,username\nRob,Pike,rob\nKen,Thompson,ken\nRobert,Griesemer,gri\nRobert,Thompson,abc\nRobert,Abel,123\n
All fields: csvtk cut -F -f "*" or csvtk cut -f 1-.
$ cat testdata/names.csv \\\n | csvtk cut -F -f \"*\"\nid,first_name,last_name,username\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n1,Robert,Thompson,abc\nNA,Robert,Abel,123\n
Field ranges (see "csvtk cut -h" for more examples)
csvtk cut -f 2-4 for columns 2, 3, and 4
$ cat testdata/names.csv \\\n | csvtk cut -f 2-4\nfirst_name,last_name,username\nRob,Pike,rob\nKen,Thompson,ken\nRobert,Griesemer,gri\nRobert,Thompson,abc\nRobert,Abel,123\n
csvtk cut -f -3--1 for discarding columns 1, 2, and 3
# or -f -1--3\n$ cat testdata/names.csv \\\n | csvtk cut -f -3--1\nusername\nrob\nken\ngri\nabc\n123\n
csvtk cut -f 2-,1 for moving the 1st column to the end.
$ cat testdata/names.csv \\\n | csvtk cut -f 2-,1\nfirst_name,last_name,username,id\nRob,Pike,rob,11\nKen,Thompson,ken,2\nRobert,Griesemer,gri,4\nRobert,Thompson,abc,1\nRobert,Abel,123,NA\n
csvtk cut -f 1,1 for duplicating columns
$ cat testdata/names.csv \\\n | csvtk cut -f 1,1\nid,id\n11,11\n2,2\n4,4\n1,1\nNA,NA\n
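A hedged sketch of the missing-column flags listed in the usage above; the column name not_a_column is made up here, and the exact behavior (skipping vs. blanking the column) is assumed from the flag descriptions.

# -m: tolerate a selected column that does not exist (assumed to be skipped)
$ cat testdata/names.csv | csvtk cut -f id,not_a_column -m

# -b: keep the selected-but-missing column as blank cells (column names only, per the flag description)
$ cat testdata/names.csv | csvtk cut -f id,not_a_column -b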
uniq

Usage
unique data without sorting\n\nUsage:\n csvtk uniq [flags]\n\nFlags:\n -f, --fields string select these fields as keys. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for uniq\n -i, --ignore-case ignore case\n -n, --keep-n int keep at most N records for a key (default 1)\n\n
Examples:
data:
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n
unique first_name (it removes rows with duplicated first_name)
$ cat testdata/names.csv \\\n | csvtk uniq -f first_name\nid,first_name,last_name,username\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n
unique first_name, a more common way
$ cat testdata/names.csv \\\n | csvtk cut -f first_name \\\n | csvtk uniq -f 1\nfirst_name\nRob\nKen\nRobert\n
keep top 2 items for every group.
$ cat testdata/players.csv \ngender,id,name\nmale,1,A\nmale,2,B\nmale,3,C\nfemale,11,a\nfemale,12,b\nfemale,13,c\nfemale,14,d\n\n$ cat testdata/players.csv \\\n | csvtk sort -k gender:N -k id:nr \\\n | csvtk uniq -f gender -n 2\ngender,id,name\nfemale,14,d\nfemale,13,c\nmale,3,C\nmale,2,B\n
freq

Usage
frequencies of selected fields\n\nUsage:\n csvtk freq [flags]\n\nFlags:\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -i, --ignore-case ignore case\n -r, --reverse reverse order while sorting\n -n, --sort-by-freq sort by frequency\n -k, --sort-by-key sort by key\n\n
Examples
one field
$ cat testdata/names.csv \\\n | csvtk freq -f first_name | csvtk pretty\nfirst_name frequency\nKen 1\nRob 1\nRobert 3\n
sort by frequency. You can also use csvtk sort, which has more sorting options
$ cat testdata/names.csv \\\n | csvtk freq -f first_name -n -r \\\n | csvtk pretty\nfirst_name frequency\nRobert 3\nKen 1\nRob 1\n
sort by key
$ cat testdata/names.csv \\\n | csvtk freq -f first_name -k \\\n | csvtk pretty\nfirst_name frequency\nKen 1\nRob 1\nRobert 3\n
multiple fields
$ cat testdata/names.csv \\\n | csvtk freq -f first_name,last_name \\\n | csvtk pretty\nfirst_name last_name frequency\nRobert Abel 1\nKen Thompson 1\nRob Pike 1\nRobert Thompson 1\nRobert Griesemer 1\n
data without header row
$ cat testdata/digitals.tsv \\\n | csvtk -t -H freq -f 1\n8 1\n1 1\n4 1\n7 1\n
inter

Usage
intersection of multiple files\n\nAttention:\n\n 1. fields in all files should be the same, \n if not, extracting to another file using \"csvtk cut\".\n\nUsage:\n csvtk inter [flags]\n\nFlags:\n -f, --fields string select these fields as the key. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -i, --ignore-case ignore case\n\n
Examples:
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ cat testdata/region.csv\nname,region\nken,nowhere\ngri,somewhere\nshenwei,another\nThompson,there\n\n$ csvtk inter testdata/phones.csv testdata/region.csv\nusername\ngri\nken\nshenwei\n
"},{"location":"usage/#grep","title":"grep","text":"Usage
grep data by selected fields with patterns/regular expressions\n\nAttentions:\n\n 1. By default, we directly compare the column value with patterns,\n use \"-r/--use-regexp\" for partly matching.\n 2. Multiple patterns can be given by setting '-p/--pattern' more than once,\n or giving comma separated values (CSV formats).\n Therefore, please use double quotation marks for patterns containing\n comma, e.g., -p '\"A{2,}\"'\n\nUsage:\n csvtk grep [flags]\n\nFlags:\n --delete-matched delete a pattern right after being matched, this keeps the firstly matched\n data and speedups when using regular expressions\n -f, --fields string comma separated key fields, column name or index. e.g. -f 1-3 or -f id,id2\n or -F -f \"group*\" (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for grep\n -i, --ignore-case ignore case\n --immediate-output print output immediately, do not use write buffer\n -v, --invert invert match\n -n, --line-number print line number as the first column (\"n\")\n -N, --no-highlight no highlight\n -p, --pattern strings query pattern (multiple values supported). Attention: use double quotation\n marks for patterns containing comma, e.g., -p '\"A{2,}\"'\n -P, --pattern-file string pattern files (one pattern per line)\n -r, --use-regexp patterns are regular expression\n --verbose verbose output\n\n
Examples
Matched parts will be highlighted.
By exact keys
$ cat testdata/names.csv \\\n | csvtk grep -f last_name -p Pike -p Abel \\\n | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\nNA Robert Abel 123\n\n# another form of multiple keys \n$ csvtk grep -f last_name -p Pike,Abel,Tom\n
By regular expression: csvtk grep -f first_name -r -p Rob
$ cat testdata/names.csv \\\n | csvtk grep -f first_name -r -p Rob \\\n | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
By pattern list
$ csvtk grep -f first_name -P name_list.txt\n
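A minimal self-contained sketch, since the pattern file is not shown above; name_list.txt is a hypothetical file with one pattern (an exact first name) per line.

# create a hypothetical pattern file
$ printf "Ken\nRob\n" > name_list.txt

# keep rows whose first_name equals any pattern in the file
$ cat testdata/names.csv | csvtk grep -f first_name -P name_list.txt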
Remove rows containing any missing data (empty cells):
$ csvtk grep -F -f \"*\" -r -p \"^$\" -v\n
Show line number
$ cat names.csv \\\n | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n\n$ cat names.csv \\\n | csvtk grep -f first_name -r -i -p rob -n \\\n | csvtk pretty\nrow id first_name last_name username\n--- -- ---------- --------- --------\n1 11 Rob Pike rob\n3 4 Robert Griesemer gri\n4 1 Robert Thompson abc\n5 NA Robert Abel 123\n
filter

Usage
filter rows by values of selected fields with arithmetic expression\n\nUsage:\n csvtk filter [flags]\n\nFlags:\n --any print record if any of the field satisfy the condition\n -f, --filter string filter condition. e.g. -f \"age>12\" or -f \"1,3<=2\" or -F -f \"c*!=0\"\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for filter\n -n, --line-number print line number as the first column (\"n\")\n\n
Examples
single field
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n\n$ cat testdata/names.csv \\\n | csvtk filter -f \"id>0\" \\\n | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\n
multiple fields
$ cat testdata/digitals.tsv\n4 5 6\n1 2 3\n7 8 0\n8 1,000 4\n\n$ cat testdata/digitals.tsv \\\n | csvtk -t -H filter -f \"1-3>0\"\n4 5 6\n1 2 3\n8 1,000 4\n
using --any to print a record if any of the fields satisfies the condition
$ cat testdata/digitals.tsv \\\n | csvtk -t -H filter -f \"1-3>0\" --any\n4 5 6\n1 2 3\n7 8 0\n8 1,000 4\n
fuzzy fields
$ cat testdata/names.csv \\\n | csvtk filter -F -f \"i*!=0\"\nid,first_name,last_name,username\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n1,Robert,Thompson,abc\n
filter2

Usage
filter rows by awk-like arithmetic/string expressions\n\nThe arithmetic/string expression is supported by:\n\n https://github.com/Knetic/govaluate\n\nVariables formats:\n $1 or ${1} The first field/column\n $a or ${a} Column \"a\"\n ${a,b} or ${a b} or ${a (b)} Column name with special charactors,\n e.g., commas, spaces, and parentheses\n\nSupported operators and types:\n\n Modifiers: + - / * & | ^ ** % >> <<\n Comparators: > >= < <= == != =~ !~ in\n Logical ops: || &&\n Numeric constants, as 64-bit floating point (12345.678)\n String constants (single quotes: 'foobar')\n Date constants (single quotes)\n Boolean constants: true false\n Parenthesis to control order of evaluation ( )\n Arrays (anything separated by , within parenthesis: (1, 2, 'foo'))\n Prefixes: ! - ~\n Ternary conditional: ? :\n Null coalescence: ??\n\nCustom functions:\n - len(), length of strings, e.g., len($1), len($a), len($1, $2)\n - ulen(), length of unicode strings/width of unicode strings rendered\n to a terminal, e.g., len(\"\u6c88\u4f1f\")==6, ulen(\"\u6c88\u4f1f\")==4\n\nUsage:\n csvtk filter2 [flags]\n\nFlags:\n -f, --filter string awk-like filter condition. e.g. '$age>12' or '$1 > $3' or '$name==\"abc\"' or\n '$1 % 2 == 0'\n -h, --help help for filter2\n -n, --line-number print line number as the first column (\"n\")\n -s, --numeric-as-string treat even numeric fields as strings to avoid converting big numbers into\n scientific notation\n\n
Examples:
filter rows with id greater than 3:
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n\n$ cat testdata/names.csv \\\n | csvtk filter2 -f '$id > 3'\nid,first_name,last_name,username\n11,Rob,Pike,rob\n4,Robert,Griesemer,gri\n
arithmetic and string expressions
$ cat testdata/names.csv \\\n | csvtk filter2 -f '$id > 3 || $username==\"ken\"'\nid,first_name,last_name,username\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n
More arithmetic expressions
$ cat testdata/digitals.tsv\n4 5 6\n1 2 3\n7 8 0\n8 1,000 4\n\n$ cat testdata/digitals.tsv \\\n | csvtk filter2 -H -t -f '$1 > 2 && $2 % 2 == 0'\n7 8 0\n8 1,000 4\n\n# comparison between fields and support\n$ cat testdata/digitals.tsv \\\n | csvtk filter2 -H -t -f '$2 <= $3 || ( $1 / $2 > 0.5 )'\n4 5 6\n1 2 3\n7 8 0\n
Array expressions using in, with numeric or string values (case-sensitive)
$ cat testdata/names.csv | csvtk filter2 -f '$first_name in (\"Ken\", \"Rob\", \"robert\")'\nid,first_name,last_name,username\\\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n\n$ cat testdata/names.csv | csvtk filter2 -f '$id in (2, 4)'\nid,first_name,last_name,username\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n\n# negate by wrapping entire expression in `!()`\n$ cat testdata/names.csv | csvtk filter2 -f '!($username in (\"rob\", \"ken\"))'\nid,first_name,last_name,username\n4,Robert,Griesemer,gri\n1,Robert,Thompson,abc\nNA,Robert,Abel,123\n
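A short sketch of the custom len() function listed in the usage above; it should keep only rows whose first_name is longer than three characters.

$ cat testdata/names.csv | csvtk filter2 -f 'len($first_name) > 3'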
join

Usage
join files by selected fields (inner, left and outer join).\n\nAttention:\n\n 1. Multiple keys supported\n 2. Default operation is inner join, use --left-join for left join\n and --outer-join for outer join.\n\nUsage:\n csvtk join [flags]\n\nAliases:\n join, merge\n\nFlags:\n -f, --fields string Semicolon separated key fields of all files, if given one, we think all the\n files have the same key columns. Fields of different files should be separated\n by \";\", e.g -f \"1;2\" or -f \"A,B;C,D\" or -f id (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for join\n -i, --ignore-case ignore case\n -n, --ignore-null do not match NULL values\n -k, --keep-unmatched keep unmatched data of the first file (left join)\n -L, --left-join left join, equals to -k/--keep-unmatched, exclusive with --outer-join\n --na string content for filling NA data\n -P, --only-duplicates add filenames as colname prefixes or add custom suffixes only for duplicated\n colnames\n -O, --outer-join outer join, exclusive with --left-join\n -p, --prefix-filename add each filename as a prefix to each colname. if there's no header row, we'll\n add one\n -e, --prefix-trim-ext trim extension when adding filename as colname prefix\n -s, --suffix strings add suffixes to colnames from each file\n\n
Examples:
data
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ cat testdata/region.csv\nname,region\nken,nowhere\ngri,somewhere\nshenwei,another\nThompson,there\n
All files have the same key column: csvtk join -f id file1.csv file2.csv
$ csvtk join -f 1 testdata/phones.csv testdata/region.csv \\\n | csvtk pretty\nusername phone region\ngri 11111 somewhere\nken 22222 nowhere\nshenwei 999999 another\n
keep unmatched (left join)
$ csvtk join -f 1 testdata/phones.csv testdata/region.csv --left-join \\\n | csvtk pretty\nusername phone region\ngri 11111 somewhere\nrob 12345 \nken 22222 nowhere\nshenwei 999999 another\n
keep unmatched and fill with something
$ csvtk join -f 1 testdata/phones.csv testdata/region.csv --left-join --na NA \\\n | csvtk pretty\nusername phone region\ngri 11111 somewhere\nrob 12345 NA\nken 22222 nowhere\nshenwei 999999 another\n
Outer join
$ csvtk join -f 1 testdata/phones.csv testdata/region.csv --outer-join --na NA \\\n | csvtk pretty\nusername phone region\ngri 11111 somewhere\nrob 12345 NA\nken 22222 nowhere\nshenwei 999999 another\nThompson NA there\n
Files have different key columns: csvtk join -f "username;username;name" testdata/names.csv phone.csv adress.csv -k. Note that key fields of different files are separated with ";", not ",".
$ csvtk join -f \"username;name\" testdata/phones.csv testdata/region.csv --left-join --na NA \\\n | csvtk pretty\nusername phone region\ngri 11111 somewhere\nrob 12345 NA\nken 22222 nowhere\nshenwei 999999 another\n
Adding each filename as a prefix to each colname
$ cat testdata/1.csv \nname,attr\nfoo,cool\nbar,handsome\nbob,beutiful\n\n$ cat testdata/2.csv \nname,major\nbar,bioinformatics\nbob,microbiology\nbob,computer science\n\n$ csvtk join testdata/{1,2}.csv \\\n | csvtk pretty \nname attr major\n---- -------- -----------------\nbar handsome bioinformatics\nbob beutiful microbiology\nbob beutiful computer science\n\n$ csvtk join testdata/{1,2}.csv --prefix-filename \\\n | csvtk pretty \nname 1.csv-attr 2.csv-major\n---- ---------- -----------------\nbar handsome bioinformatics\nbob beutiful microbiology\nbob beutiful computer science\n\n# trim the file extention\n$ csvtk join testdata/{1,2}.csv --prefix-filename --prefix-trim-ext \\\n | csvtk pretty \nname 1-attr 2-major\n---- -------- -----------------\nbar handsome bioinformatics\nbob beutiful microbiology\nbob beutiful computer science\n
Adding each filename as a prefix to each colname for data without header row
$ cat testdata/A.f.csv \na,x,1\nb,y,2\n\n$ cat testdata/B.f.csv \na,x,3\nb,y,4\n\n$ cat testdata/C.f.csv \na,x,5\nb,y,6\n\n$ csvtk join -H testdata/{A,B,C}.f.csv \\\n | csvtk pretty -H\na x 1 x 3 x 5\nb y 2 y 4 y 6\n\n$ csvtk join -H testdata/{A,B,C}.f.csv -p \\\n | csvtk pretty\nkey1 A.f.csv A.f.csv B.f.csv B.f.csv C.f.csv C.f.csv\n---- ------- ------- ------- ------- ------- -------\na x 1 x 3 x 5\nb y 2 y 4 y 6\n\n# trim file extention\n$ csvtk join -H testdata/{A,B,C}.f.csv -p -e \\\n | csvtk pretty\nkey1 A.f A.f B.f B.f C.f C.f\n---- --- --- --- --- --- ---\na x 1 x 3 x 5\nb y 2 y 4 y 6\n\n# use column 1 and 2 as keys\n$ csvtk join -H testdata/{A,B,C}.f.csv -p -e -f 1,2 \\\n | csvtk pretty\nkey1 key2 A.f B.f C.f\n---- ---- --- --- ---\na x 1 3 5\nb y 2 4 6\n\n# change column names furthor\n$ csvtk join -H testdata/{A,B,C}.f.csv -p -e -f 1,2 \\\n | csvtk rename2 -F -f '*' -p '\\.f$' \\\n | csvtk pretty\nkey1 key2 A B C\n---- ---- - - -\na x 1 3 5\nb y 2 4 6\n
add suffixes to colnames from each file (-s/--suffix)
$ csvtk join -H testdata/{A,B,C}.f.csv -s A,B,C \\\n | csvtk pretty\nkey1 c2-A c3-A c2-B c3-B c2-C c3-C\n---- ---- ---- ---- ---- ---- ----\na x 1 x 3 x 5\nb y 2 y 4 y 6\n
split

Usage
split CSV/TSV into multiple files according to column values\n\nNotes:\n\n 1. flag -o/--out-file can specify out directory for splitted files.\n 2. flag -s/--prefix-as-subdir can create subdirectories with prefixes of\n keys of length X, to avoid writing too many files in the output directory.\n\nUsage:\n csvtk split [flags]\n\nFlags:\n -g, --buf-groups int buffering N groups before writing to file (default 100)\n -b, --buf-rows int buffering N rows for every group before writing to file (default 100000)\n -f, --fields string comma separated key fields, column name or index. e.g. -f 1-3 or -f\n id,id2 or -F -f \"group*\" (default \"1\")\n --force overwrite existing output directory (given by -o).\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for split\n -i, --ignore-case ignore case\n -G, --out-gzip force output gzipped file\n -p, --out-prefix string output file prefix, the default value is the input file. use -p \"\" to\n disable outputting prefix\n -s, --prefix-as-subdir int create subdirectories with prefixes of keys of length X, to avoid writing\n too many files in the output directory\n\n
Examples
Test data
$ cat names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n
split according to first_name
$ csvtk split names.csv -f first_name\n$ ls *.csv\nnames.csv names-Ken.csv names-Rob.csv names-Robert.csv\n\n$ cat names-Ken.csv\nid,first_name,last_name,username\n2,Ken,Thompson,ken\n\n$ cat names-Rob.csv\nid,first_name,last_name,username\n11,Rob,Pike,rob\n\n$ cat names-Robert.csv\nid,first_name,last_name,username\n4,Robert,Griesemer,gri\n1,Robert,Thompson,abc\nNA,Robert,Abel,123\n
split according to first_name and last_name
$ csvtk split names.csv -f first_name,last_name\n$ ls *.csv\nnames.csv names-Robert-Abel.csv names-Robert-Thompson.csv\nnames-Ken-Thompson.csv names-Robert-Griesemer.csv names-Rob-Pike.csv\n
The flag -o/--out-file can specify the output directory for the split files.
$ seq 10000 | csvtk split -H -o result\n$ ls result/*.csv | wc -l\n10000\n
Do not output a prefix: use -p "".
$ echo -ne \"1,ACGT\\n2,GGCA\\n3,ACAAC\\n\"\n1,ACGT\n2,GGCA\n3,ACAAC\n\n$ echo -ne \"1,ACGT\\n2,GGCA\\n3,ACAAC\\n\" | csvtk split -H -f 2 -o t -p \"\" -s 3 --force\n\n$ tree t\nt\n\u251c\u2500\u2500 ACA\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ACAAC.csv\n\u251c\u2500\u2500 ACG\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 ACGT.csv\n\u2514\u2500\u2500 GGC\n \u2514\u2500\u2500 GGCA.csv\n\n4 directories, 3 files\n
extreme example 1: lots (10M) of rows in a single group
$ yes 2 | head -n 10000000 | gzip -c > t.gz\n\n$ memusg -t csvtk -H split t.gz\nelapsed time: 5.859s\npeak rss: 41.45 MB\n\n# check\n$ zcat t-2.gz | wc -l\n10000000\n$ zcat t-2.gz | md5sum\nf194afd7cecf645c0e3cce50c9bc526e -\n$ zcat t.gz | md5sum\nf194afd7cecf645c0e3cce50c9bc526e -\n
extreme example 2: lots (10K) of groups
$ seq 10000 | gzip -c > t2.gz\n\n$ memusg -t csvtk -H split t2.gz -o t2\nelapsed time: 20.856s\npeak rss: 23.77 MB\n\n# check\n$ ls t2/*.gz | wc -l\n10000\n$ zcat t2/*.gz | sort -k 1,1n | md5sum\n72d4ff27a28afbc066d5804999d5a504 -\n$ zcat t2.gz | md5sum\n72d4ff27a28afbc066d5804999d5a504 -\n
Since v0.31.0, the flag -s/--prefix-as-subdir can create subdirectories named with key prefixes of length X, to avoid writing too many files in the output directory.
$ memusg -t csvtk -H split t2.gz -o t2 -s 3\nelapsed time: 2.668s\npeak rss: 1.79 GB\n
$ fd .gz$ t2 | rush 'zcat {}' | sort -k 1,1n | md5sum
72d4ff27a28afbc066d5804999d5a504 -
$ tree t2/ | more\nt2/\n\u251c\u2500\u2500 100\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-10000.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1000.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1001.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1002.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1003.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1004.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1005.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1006.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1007.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1008.gz\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 t2-1009.gz\n\u251c\u2500\u2500 101\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1010.gz\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 t2-1011.gz\n...\n\u251c\u2500\u2500 t2-994.gz\n\u251c\u2500\u2500 t2-995.gz\n\u251c\u2500\u2500 t2-996.gz\n\u251c\u2500\u2500 t2-997.gz\n\u251c\u2500\u2500 t2-998.gz\n\u251c\u2500\u2500 t2-999.gz\n\u251c\u2500\u2500 t2-99.gz\n\u2514\u2500\u2500 t2-9.gz\n\n901 directories, 10000 files\n
splitxlsx

Usage
split XLSX sheet into multiple sheets according to column values\n\nStrengths: Sheet properties are remained unchanged.\nWeakness : Complicated sheet structures are not well supported, e.g.,\n 1. merged cells\n 2. more than one header row\n\nUsage:\n csvtk splitxlsx [flags]\n\nFlags:\n -f, --fields string comma separated key fields, column name or index. e.g. -f 1-3 or -f id,id2\n or -F -f \"group*\" (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for splitxlsx\n -i, --ignore-case ignore case (cell value)\n -a, --list-sheets list all sheets\n -N, --sheet-index int Nth sheet to retrieve (default 1)\n -n, --sheet-name string sheet to retrieve\n\n
Examples
example data
# list all sheets\n$ csvtk xlsx2csv -a accounts.xlsx\nindex sheet\n1 names\n2 phones\n3 region\n\n# data of sheet \"names\"\n$ csvtk xlsx2csv accounts.xlsx | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
split sheet \"names\" according to first_name
$ csvtk splitxlsx accounts.xlsx -n names -f first_name\n\n$ ls accounts.*\naccounts.split.xlsx accounts.xlsx\n\n$ csvtk splitxlsx -a accounts.split.xlsx\nindex sheet\n1 names\n2 phones\n3 region\n4 Rob\n5 Ken\n6 Robert\n\n$ csvtk xlsx2csv accounts.split.xlsx -n Rob \\\n | csvtk pretty\nid first_name last_name username\n11 Rob Pike rob\n\n$ csvtk xlsx2csv accounts.split.xlsx -n Robert \\\n | csvtk pretty\nid first_name last_name username\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n
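A small additional sketch: the sheet can presumably also be selected by index with -N/--sheet-index; assuming sheet 1 is "names" as listed above, this should be equivalent to -n names.

$ csvtk splitxlsx accounts.xlsx -N 1 -f first_name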
comb

Usage
compute combinations of items at every row\n\nUsage:\n csvtk comb [flags]\n\nAliases:\n comb, combination\n\nFlags:\n -h, --help help for comb\n -i, --ignore-case ignore-case\n -S, --nat-sort sort items in natural order\n -n, --number int number of items in a combination, 0 for no limit, i.e., return all combinations\n (default 2)\n -s, --sort sort items in a combination\n\n
Examples:
$ cat players.csv \ngender,id,name\nmale,1,A\nmale,2,B\nmale,3,C\nfemale,11,a\nfemale,12,b\nfemale,13,c\nfemale,14,d\n\n# put names of one group in one row\n$ cat players.csv \\\n | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk cut -f 2 \nname\nA;B;C\na;b;c;d\n\n# n = 2\n$ cat players.csv \\\n | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk cut -f 2 \\\n | csvtk comb -d ';' -n 2\nA,B\nA,C\nB,C\na,b\na,c\nb,c\na,d\nb,d\nc,d\n\n# n = 3\n$ cat players.csv \\\n | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk cut -f 2 \\\n | csvtk comb -d ';' -n 3\nA,B,C\na,b,c\na,b,d\na,c,d\nb,c,d\n\n# n = 0\n$ cat players.csv \\\n | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk cut -f 2 \\\n | csvtk comb -d ';' -n 0\nA\nB\nA,B\nC\nA,C\nB,C\nA,B,C\na\nb\na,b\nc\na,c\nb,c\na,b,c\nd\na,d\nb,d\na,b,d\nc,d\na,c,d\nb,c,d\na,b,c,d\n\n
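A hedged variation of the pipeline above: adding -s/--sort (listed in the usage) should sort the items within each combination.

$ cat players.csv \
    | csvtk collapse -f 1 -v 3 -s ';' \
    | csvtk cut -f 2 \
    | csvtk comb -d ';' -n 2 -s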
"},{"location":"usage/#fix","title":"fix","text":"Usage
fix CSV/TSV with different numbers of columns in rows\n\nHow to:\n 1. First -n/--buf-rows rows are read to check the maximum number of columns.\n The default value 0 means all rows will be read.\n 2. Buffered and remaining rows with fewer columns are appended with empty\n cells before output.\n 3. An error will be reported if the number of columns of any remaining row\n is larger than the maximum number of columns.\n\nUsage:\n csvtk fix [flags]\n\nFlags:\n -n, --buf-rows int the number of rows to determine the maximum number of columns. 0 for all rows.\n -h, --help help for fix\n\n\n
Examples
$ cat testdata/unequal_ncols.csv\nid,first_name,last_name\n11,\"Rob\",\"Pike\"\n2,Ken,Thompson\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\"\n\n\n$ cat testdata/unequal_ncols.csv | csvtk pretty\n[ERRO] record on line 4: wrong number of fields\n\n\n\n$ cat testdata/unequal_ncols.csv | csvtk fix | csvtk pretty -S grid\n[INFO] the maximum number of columns in all 6 rows: 4\n+----+------------+-----------+-----+\n| id | first_name | last_name | |\n+====+============+===========+=====+\n| 11 | Rob | Pike | |\n+----+------------+-----------+-----+\n| 2 | Ken | Thompson | |\n+----+------------+-----------+-----+\n| 4 | Robert | Griesemer | gri |\n+----+------------+-----------+-----+\n| 1 | Robert | Thompson | abc |\n+----+------------+-----------+-----+\n| NA | Robert | | |\n+----+------------+-----------+-----+\n\n
"},{"location":"usage/#fix-quotes","title":"fix-quotes","text":"Usage
fix malformed CSV/TSV caused by double-quotes\n\nThis command fixes fields not appropriately enclosed by double-quotes\nto meet the RFC4180 standard (https://rfc-editor.org/rfc/rfc4180.html).\n\nWhen and how to:\n 1. Values containing bare double quotes. e.g.,\n a,abc\" xyz,d\n Error information: bare \" in non-quoted-field.\n Fix: adding the flag -l/--lazy-quotes.\n Using this command:\n a,abc\" xyz,d -> a,\"abc\"\" xyz\",d\n 2. Values with double quotes in the begining but not in the end. e.g.,\n a,\"abc\" xyz,d\n Error information: extraneous or missing \" in quoted-field.\n Using this command:\n a,\"abc\" xyz,d -> a,\"\"\"abc\"\" xyz\",d\n\nNext:\n 1. You can process the data without the flag -l/--lazy-quotes.\n 2. Use 'csvtk del-quotes' if you want to restore the original format.\n\nLimitation:\n 1. Values containing line breaks are not supported.\n\nUsage:\n csvtk fix-quotes [flags]\n\nFlags:\n -b, --buffer-size string size of buffer, supported unit: K, M, G. You need increase the value when\n \"bufio.Scanner: token too long\" error reported (default \"1G\")\n -h, --help help for fix-quotes\n\n
Examples:
Test data, in which there are five cases with values containing double quotes.
$ cat testdata/malformed.tsv\n1 Cellvibrio no quotes & not tab\n2 \"Cellvibrio gilvus\" quotes can be removed\n3 \"quotes required\" quotes needed (with a tab in the cell)\n4 fake\" record bare double-quote in non-quoted-field\n5 \"Cellvibrio\" Winogradsky only with doub-quote in the beginning\n6 fake record2\" \"only with doub-quote in the end\"\n\n$ cat testdata/malformed.tsv | csvtk cut -f 1-\n[ERRO] parse error on line 2, column 3: bare \" in non-quoted-field\n\n# -l does not work, and it's messed up.\n$ cat testdata/malformed.tsv | csvtk cut -f 1- -l\n1 Cellvibrio no quotes & not tab\n\"2 \"\"Cellvibrio gilvus\"\" quotes can be removed\"\n\"3 \"\"quotes required\"\" quotes needed (with a tab in the cell)\"\n\"4 fake\"\" record bare double-quote in non-quoted-field\"\n\"5 \"\"Cellvibrio\"\" Winogradsky only with doub-quote in the beginning\"\n\"6 fake record2\"\" \"\"only with doub-quote in the end\"\"\"\n
Fix it!!!
$ cat testdata/malformed.tsv | csvtk fix-quotes -t\n1 Cellvibrio no quotes & not tab\n2 \"Cellvibrio gilvus\" quotes can be removed\n3 \"quotes required\" quotes needed (with a tab in the cell)\n4 \"fake\"\" record\" bare double-quote in non-quoted-field\n5 \"\"\"Cellvibrio\"\" Winogradsky\" only with doub-quote in the beginning\n6 \"fake record2\"\"\" \"only with doub-quote in the end\"\n\n# pretty\n$ cat testdata/malformed.tsv | csvtk fix-quotes -t | csvtk pretty -Ht -S grid\n+---+--------------------------+----------------------------------------+\n| 1 | Cellvibrio | no quotes & not tab |\n+---+--------------------------+----------------------------------------+\n| 2 | Cellvibrio gilvus | quotes can be removed |\n+---+--------------------------+----------------------------------------+\n| 3 | quotes required | quotes needed (with a tab in the cell) |\n+---+--------------------------+----------------------------------------+\n| 4 | fake\" record | bare double-quote in non-quoted-field |\n+---+--------------------------+----------------------------------------+\n| 5 | \"Cellvibrio\" Winogradsky | only with doub-quote in the beginning |\n+---+--------------------------+----------------------------------------+\n| 6 | fake record2\" | only with doub-quote in the end |\n+---+--------------------------+----------------------------------------+\n\n# do something, like searching rows containing double-quotes.\n# since the command-line argument parser csvtk uses parse the value of flag -p\n# as CSV data, we have to use -p '\"\"\"\"' to represents one double-quotes,\n# where the outter two double quotes are used to quote the value,\n# and the two inner double-quotes actually means an escaped double-quote\n#\n$ cat testdata/malformed.tsv \\\n | csvtk fix-quotes -t \\\n | csvtk grep -Ht -f 2 -r -p '\"\"\"\"'\n4 \"fake\"\" record\" bare double-quote in non-quoted-field\n5 \"\"\"Cellvibrio\"\" Winogradsky\" only with doub-quote in the beginning\n6 \"fake record2\"\"\" only with doub-quote in the end\n
Note that the fixed rows differ from the original ones; you can use csvtk del-quotes to restore the original format.
$ cat testdata/malformed.tsv \\\n | csvtk fix-quotes -t \\\n | csvtk filter2 -t -f '$1 > 0' \\\n | csvtk del-quotes -t\n1 Cellvibrio no quotes & not tab\n2 Cellvibrio gilvus quotes can be removed\n3 \"quotes required\" quotes needed (with a tab in the cell)\n4 fake\" record bare double-quote in non-quoted-field\n5 \"Cellvibrio\" Winogradsky only with doub-quote in the beginning\n6 fake record2\" only with doub-quote in the end\n
del-quotes

Usage
remove extra double quotes added by 'fix-quotes'\n\nLimitation:\n 1. Values containing line breaks are not supported.\n\nUsage:\n csvtk del-quotes [flags]\n\nFlags:\n -h, --help help for del-quotes\n
Examples: see the examples of fix-quotes
"},{"location":"usage/#add-header","title":"add-header","text":"Usage
add column names\n\nUsage:\n csvtk add-header [flags]\n\nFlags:\n -h, --help help for add-header\n -n, --names strings column names to add, in CSV format\n\n
Examples:
No new colnames given:
$ seq 3 | csvtk mutate -H \\\n | csvtk add-header\n[WARN] colnames not given, c1, c2, c3... will be used\nc1,c2\n1,1\n2,2\n3,3\n
Adding new colnames:
$ seq 3 | csvtk mutate -H \\\n | csvtk add-header -n a,b\na,b\n1,1\n2,2\n3,3\n$ seq 3 | csvtk mutate -H \\\n | csvtk add-header -n a -n b\na,b\n1,1\n2,2\n3,3\n\n$ seq 3 | csvtk mutate -H -t \\\n | csvtk add-header -t -n a,b\na b\n1 1\n2 2\n3 3\n
del-header

Usage
delete column names\n\nAttention:\n 1. It delete the first lines of all input files.\n\nUsage:\n csvtk del-header [flags]\n\nFlags:\n -h, --help help for del-header\n\n
Examples:
$ seq 3 | csvtk add-header\nc1\n1\n2\n3\n\n$ seq 3 | csvtk add-header | csvtk del-header\n1\n2\n3\n\n$ seq 3 | csvtk del-header -H\n1\n2\n3\n
"},{"location":"usage/#rename","title":"rename","text":"Usage
rename column names with new names\n\nUsage:\n csvtk rename [flags]\n\nFlags:\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -n, --names string comma separated new names\n\n
Examples:
Setting new names: csvtk rename -f A,B -n a,b or csvtk rename -f 1-3 -n a,b,c
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ cat testdata/phones.csv \\\n | csvtk rename -f 1-2 -n \u59d3\u540d,\u7535\u8bdd \\\n | csvtk pretty \n\u59d3\u540d \u7535\u8bdd\ngri 11111\nrob 12345\nken 22222\nshenwei 999999\n
The fields can also be given in any order
$ cat testdata/phones.csv \\\n | csvtk rename -f 2,1 -n \u7535\u8bdd,\u59d3\u540d \\\n | csvtk pretty\n\u59d3\u540d \u7535\u8bdd\ngri 11111\nrob 12345\nken 22222\nshenwei 999999\n
rename2

Usage
rename column names by regular expression\n\nSpecial replacement symbols:\n\n {nr} ascending number, starting from --start-num\n {kv} Corresponding value of the key (captured variable $n) by key-value file,\n n can be specified by flag --key-capt-idx (default: 1)\n\nUsage:\n csvtk rename2 [flags]\n\nFlags:\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for rename2\n -i, --ignore-case ignore case\n -K, --keep-key keep the key as value when no value found for the key\n --key-capt-idx int capture variable index of key (1-based) (default 1)\n --key-miss-repl string replacement for key with no corresponding value\n -k, --kv-file string tab-delimited key-value file for replacing key with value\n when using \"{kv}\" in -r (--replacement)\n -A, --kv-file-all-left-columns-as-value treat all columns except 1th one as value for kv-file with\n more than 2 columns\n --nr-width int minimum width for {nr} in flag -r/--replacement. e.g.,\n formating \"1\" to \"001\" by --nr-width 3 (default 1)\n -p, --pattern string search regular expression\n -r, --replacement string renamement. supporting capture variables. e.g. $1\n represents the text of the first submatch. ATTENTION: use\n SINGLE quote NOT double quotes in *nix OS or use the \\\n escape character. Ascending number is also supported by\n \"{nr}\".use ${1} instead of $1 when {kv} given!\n -n, --start-num int starting number when using {nr} in replacement (default 1)\n\n
Examples:
Add a prefix and a suffix to all column names.
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ cat testdata/phones.csv \\\n | csvtk rename2 -F -f \"*\" -p \"(.*)\" -r 'prefix_${1}_suffix'\nprefix_username_suffix,prefix_phone_suffix\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n
{kv} and {nr} are supported, just as in csvtk replace. E.g., replacing barcodes with sample names.
$ cat barcodes.tsv\nSample Barcode\nsc1 CCTAGATTAAT\nsc2 GAAGACTTGGT\nsc3 GAAGCAGTATG\nsc4 GGTAACCTGAC\nsc5 ATAGTTCTCGT\n\n$ cat table.tsv\ngene ATAGTTCTCGT GAAGCAGTATG GAAGACTTGGT AAAAAAAAAA\ngene1 0 0 3 0\ngen1e2 0 0 0 0\n\n# note that, we must arrange the order of barcodes.tsv to KEY-VALUE\n$ csvtk cut -t -f 2,1 barcodes.tsv\nBarcode Sample\nCCTAGATTAAT sc1\nGAAGACTTGGT sc2\nGAAGCAGTATG sc3\nGGTAACCTGAC sc4\nATAGTTCTCGT sc5\n\n# here we go!!!!\n\n$ csvtk rename2 -t -k <(csvtk cut -t -f 2,1 barcodes.tsv) \\\n -f -1 -p '(.+)' -r '{kv}' --key-miss-repl unknown table.tsv\ngene sc5 sc3 sc2 unknown\ngene1 0 0 3 0\ngen1e2 0 0 0 0\n
{nr}, in case you need it
$ echo \"a,b,c,d\" \\\n | csvtk rename2 -p '(.+)' -r 'col_{nr}' -f -1 --start-num 2\na,col_2,col_3,col_4\n
replace

Usage
replace data of selected fields by regular expression\n\nNote that the replacement supports capture variables.\ne.g. $1 represents the text of the first submatch.\nATTENTION: use SINGLE quote NOT double quotes in *nix OS.\n\nExamples: Adding space to cell values.\n\n csvtk replace -p \"(.)\" -r '$1 '\n\nOr use the \\ escape character.\n\n csvtk replace -p \"(.)\" -r \"\\$1 \"\n\nmore on: http://shenwei356.github.io/csvtk/usage/#replace\n\nSpecial replacement symbols:\n\n {nr} Record number, starting from 1\n {kv} Corresponding value of the key (captured variable $n) by key-value file,\n n can be specified by flag --key-capt-idx (default: 1)\n\nUsage:\n csvtk replace [flags]\n\nFlags:\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB\n (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for replace\n -i, --ignore-case ignore case\n -K, --keep-key keep the key as value when no value found for the key\n --key-capt-idx int capture variable index of key (1-based) (default 1)\n --key-miss-repl string replacement for key with no corresponding value\n -k, --kv-file string tab-delimited key-value file for replacing key with value\n when using \"{kv}\" in -r (--replacement)\n -A, --kv-file-all-left-columns-as-value treat all columns except 1th one as value for kv-file with\n more than 2 columns\n --nr-width int minimum width for {nr} in flag -r/--replacement. e.g.,\n formating \"1\" to \"001\" by --nr-width 3 (default 1)\n -p, --pattern string search regular expression\n -r, --replacement string replacement. supporting capture variables. e.g. $1\n represents the text of the first submatch. ATTENTION: for\n *nix OS, use SINGLE quote NOT double quotes or use the \\\n escape character. Record number is also supported by\n \"{nr}\".use ${1} instead of $1 when {kv} given!\n
Examples
Remove Chinese characters
$ csvtk replace -F -f \"*_name\" -p \"\\p{Han}+\" -r \"\"\n
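A tiny illustration of the command above with made-up data (the en_name column and its values are hypothetical; the output is the expected result, not taken from the original docs):
$ echo -ne \"id,en_name\\n1,Tom\u6c64\u59c6\\n2,Bob\\n\" \\\n | csvtk replace -F -f \"*_name\" -p \"\\p{Han}+\" -r \"\"\nid,en_name\n1,Tom\n2,Bob\n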
Replace values using a key-value file
$ cat data.tsv\nname id\nA ID001\nB ID002\nC ID004\n\n$ cat alias.tsv\n001 Tom\n002 Bob\n003 Jim\n\n$ csvtk replace -t -f 2 -p \"ID(.+)\" -r \"N: {nr}, alias: {kv}\" -k alias.tsv data.tsv\n[INFO] read key-value file: alias.tsv\n[INFO] 3 pairs of key-value loaded\nname id\nA N: 1, alias: Tom\nB N: 2, alias: Bob\nC N: 3, alias\n
Usage
round float to n decimal places\n\nUsage:\n csvtk round [flags]\n\nFlags:\n -a, --all-fields all fields, overides -f/--fields\n -n, --decimal-width int limit floats to N decimal points (default 2)\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for round\n\n
Examples:
$ cat testdata/floats.csv | csvtk pretty\na b\n0.12345 abc\nNA 0.9999198549640733\n12.3 e3\n1.4814505299984235e-05 -3.1415926E05\n\n# one or more fields\n$ cat testdata/floats.csv | csvtk round -n 2 -f b | csvtk pretty \na b\n0.12345 abc\nNA 1.00\n12.3 e3\n1.4814505299984235e-05 -3.14E05\n\n# all fields\n$ cat testdata/floats.csv | csvtk round -n 2 -a | csvtk pretty \na b\n0.12 abc\nNA 1.00\n12.30 e3\n1.48e-05 -3.14E05\n
"},{"location":"usage/#mutate","title":"mutate","text":"Usage
create a new column from selected fields by regular expression\n\nUsage:\n csvtk mutate [flags]\n\nFlags:\n --after string insert the new column right after the given column name\n --at int where the new column should appear, 1 for the 1st column, 0 for the last column\n --before string insert the new column right before the given column name\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -i, --ignore-case ignore case\n --na for unmatched data, use blank instead of original data\n -n, --name string new column name\n -p, --pattern string search regular expression with capture bracket. e.g. (default \"^(.+)$\")\n\n
Examples
By default, copy a column: csvtk mutate -f id -n newname
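As a quick sketch of this default behaviour, using testdata/phones.csv from the examples below (the column name user_copy is only illustrative; the output is the expected result):
$ cat testdata/phones.csv \\\n | csvtk mutate -f username -n user_copy\nusername,phone,user_copy\ngri,11111,gri\nrob,12345,rob\nken,22222,ken\nshenwei,999999,shenwei\n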
Extract the prefix of a value as the group name using a regular expression (e.g., get \"A\" from \"A.1\"):
csvtk mutate -f sample -n group -p \"^(.+?)\\.\"\n
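For instance, with a hypothetical two-row table (the sample/reads columns are made up; the output is the expected result):
$ echo -ne \"sample,reads\\nA.1,100\\nB.2,200\\n\" \\\n | csvtk mutate -f sample -n group -p \"^(.+?)\\.\"\nsample,reads,group\nA.1,100,A\nB.2,200,B\n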
Get the first letter as a new column
$ cat testdata/phones.csv\nusername,phone\ngri,11111\nrob,12345\nken,22222\nshenwei,999999\n\n$ cat testdata/phones.csv \\\n | csvtk mutate -f username -p \"^(\\w)\" -n first_letter\nusername,phone,first_letter\ngri,11111,g\nrob,12345,r\nken,22222,k\nshenwei,999999,s\n
Specify the position of the new column (see similar examples of csvtk mutate2)
$ cat testdata/phones.csv \\\n | csvtk mutate -f username -p \"^(\\w)\" -n first_letter --at 2\nusername,first_letter,phone\ngri,g,11111\nrob,r,12345\nken,k,22222\nshenwei,s,999999\n\n$ cat testdata/phones.csv \\\n | csvtk mutate -f username -p \"^(\\w)\" -n first_letter --after username\nusername,first_letter,phone\ngri,g,11111\nrob,r,12345\nken,k,22222\nshenwei,s,999999\n\n$ cat testdata/phones.csv \\\n | csvtk mutate -f username -p \"^(\\w)\" -n first_letter --before username\nfirst_letter,username,phone\ng,gri,11111\nr,rob,12345\nk,ken,22222\ns,shenwei,99999\n
Usage
create a new column from selected fields by awk-like arithmetic/string expressions\n\nThe arithmetic/string expression is supported by:\n\n https://github.com/Knetic/govaluate\n\nVariables formats:\n $1 or ${1} The first field/column\n $a or ${a} Column \"a\"\n ${a,b} or ${a b} or ${a (b)} Column name with special charactors,\n e.g., commas, spaces, and parentheses\n\nSupported operators and types:\n\n Modifiers: + - / * & | ^ ** % >> <<\n Comparators: > >= < <= == != =~ !~\n Logical ops: || &&\n Numeric constants, as 64-bit floating point (12345.678)\n String constants (single quotes: 'foobar')\n Date constants (single quotes)\n Boolean constants: true false\n Parenthesis to control order of evaluation ( )\n Arrays (anything separated by , within parenthesis: (1, 2, 'foo'))\n Prefixes: ! - ~\n Ternary conditional: ? :\n Null coalescence: ??\n\nCustom functions:\n - len(), length of strings, e.g., len($1), len($a), len($1, $2)\n - ulen(), length of unicode strings/width of unicode strings rendered\n to a terminal, e.g., len(\"\u6c88\u4f1f\")==6, ulen(\"\u6c88\u4f1f\")==4\n\nUsage:\n csvtk mutate2 [flags]\n\nFlags:\n --after string insert the new column right after the given column name\n --at int where the new column should appear, 1 for the 1st column, 0 for the last column\n --before string insert the new column right before the given column name\n -w, --decimal-width int limit floats to N decimal points (default 2)\n -e, --expression string arithmetic/string expressions. e.g. \"'string'\", '\"abc\"', ' $a + \"-\" + $b ',\n '$1 + $2', '$a / $b', ' $1 > 100 ? \"big\" : \"small\" '\n -h, --help help for mutate2\n -n, --name string new column name\n -s, --numeric-as-string treat even numeric fields as strings to avoid converting big numbers into\n scientific notation\n\n
Examples
Constants
$ cat testdata/digitals.tsv \\\n | csvtk mutate2 -t -H -e \" 'abc' \"\n4 5 6 abc\n1 2 3 abc\n7 8 0 abc\n8 1,000 4 abc\n\n$ val=123 \\\n && cat testdata/digitals.tsv \\\n | csvtk mutate2 -t -H -e \" $val \"\n4 5 6 123\n1 2 3 123\n7 8 0 123\n8 1,000 4 123\n
String concatenation
$ cat testdata/names.csv \\\n | csvtk mutate2 -n full_name -e ' $first_name + \" \" + $last_name ' \\\n | csvtk pretty\nid first_name last_name username full_name\n11 Rob Pike rob Rob Pike\n2 Ken Thompson ken Ken Thompson\n4 Robert Griesemer gri Robert Griesemer\n1 Robert Thompson abc Robert Thompson\nNA Robert Abel 123 Robert Abel\n
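The custom len() function listed in the help works the same way; a hedged sketch with testdata/phones.csv (the column name name_len is illustrative; the output is the expected result):
$ cat testdata/phones.csv \\\n | csvtk mutate2 -n name_len -e ' len($username) ' -w 0\nusername,phone,name_len\ngri,11111,3\nrob,12345,3\nken,22222,3\nshenwei,999999,7\n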
Math
$ cat testdata/digitals.tsv | csvtk mutate2 -t -H -e '$1 + $3' -w 0\n4 5 6 10\n1 2 3 4\n7 8 0 7\n8 1,000 4 12\n
Bool
$ cat testdata/digitals.tsv | csvtk mutate2 -t -H -e '$1 > 5'\n4 5 6 false\n1 2 3 false\n7 8 0 true\n8 1,000 4 true\n
Ternary condition (? :)
$ cat testdata/digitals.tsv | csvtk mutate2 -t -H -e '$1 > 5 ? \"big\" : \"small\" '\n4 5 6 small\n1 2 3 small\n7 8 0 big\n8 1,000 4 big\n
Null coalescence (??)
$ echo -e \"one,two\\na1,a2\\n,b2\\na2,\" | csvtk pretty \none two\n--- ---\na1 a2\n b2\na2\n\n$ echo -e \"one,two\\na1,a2\\n,b2\\na2,\" \\\n | csvtk mutate2 -n three -e '$one ?? $two' \\\n | csvtk pretty\none two three\n--- --- -----\na1 a2 a1\n b2 b2\na2 a2\n
Specify the position of the new column
$ echo -ne \"a,b,c\\n1,2,3\\n\"\na,b,c\n1,2,3\n\n# in the end (default)\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate2 -e '$a+$c' -n x -w 0\na,b,c,x\n1,2,3,4\n\n# in the beginning\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate2 -e '$a+$c' -n x -w 0 --at 1\nx,a,b,c\n4,1,2,3\n\n# at another position\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate2 -e '$a+$c' -n x -w 0 --at 3\na,b,x,c\n1,2,4,3\n\n# right after the given column name\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate2 -e '$a+$c' -n x -w 0 --after a\na,x,b,c\n1,4,2,3\n\n# right before the given column name\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate2 -e '$a+$c' -n x -w 0 --before c\na,b,x,c\n1,2,4,3\n
Usage
create a new column from selected fields with Go-like expressions\n\nThe expression language is supported by Expr:\n\n https://expr-lang.org/docs/language-definition\n\nVariables formats:\n $1 or ${1} The first field/column\n $a or ${a} Column \"a\"\n ${a,b} or ${a b} or ${a (b)} Column name with special charactors,\n e.g., commas, spaces, and parentheses\n\nSupported Operators:\n\n Arithmetic: + - / * ^ ** %\n Comparison: > >= < <= == !=\n Logical: not ! and && or ||\n String: + contains startsWith endsWith\n Regex: matches\n Range: ..\n Slice: [:]\n Pipe: |\n Ternary conditional: ? :\n Null coalescence: ??\n\nSupported Literals:\n\n Arrays: [1, 2, 3]\n Boolean: true false\n Float: 0.5 .5\n Integer: 42 0x2A 0o52 0b101010\n Map: {a: 1, b: 2}\n Null: nil\n String: \"foo\" 'bar'\n\nSee Expr language definition link for documentation on built-in functions.\n\nCustom functions:\n - ulen(), length of unicode strings/width of unicode strings rendered\n to a terminal, e.g., len(\"\u6c88\u4f1f\")==6, ulen(\"\u6c88\u4f1f\")==4\n\nUsage:\n csvtk mutate3 [flags]\n\nFlags:\n --after string insert the new column right after the given column name\n --at int where the new column should appear, 1 for the 1st column, 0 for the last column\n --before string insert the new column right before the given column name\n -w, --decimal-width int limit floats to N decimal points (default 2)\n -e, --expression string arithmetic/string expressions. e.g. \"'string'\", '\"abc\"', ' $a + \"-\" + $b ',\n '$1 + $2', '$a / $b', ' $1 > 100 ? \"big\" : \"small\" '\n -h, --help help for mutate3\n -n, --name string new column name\n -s, --numeric-as-string treat even numeric fields as strings to avoid converting big numbers into\n scientific notation\n
Examples
Constants
$ cat testdata/digitals.tsv \\\n | csvtk mutate3 -t -H -e \" 'abc' \"\n4 5 6 abc\n1 2 3 abc\n7 8 0 abc\n8 1,000 4 abc\n\n$ val=123 \\\n && cat testdata/digitals.tsv \\\n | csvtk mutate3 -t -H -e \" $val \"\n4 5 6 123\n1 2 3 123\n7 8 0 123\n8 1,000 4 123\n
String concatenation
$ cat testdata/names.csv \\\n | csvtk mutate3 -n full_name -e ' $first_name + \" \" + $last_name ' \\\n | csvtk pretty\nid first_name last_name username full_name\n11 Rob Pike rob Rob Pike\n2 Ken Thompson ken Ken Thompson\n4 Robert Griesemer gri Robert Griesemer\n1 Robert Thompson abc Robert Thompson\nNA Robert Abel 123 Robert Abel\n
Math
$ cat testdata/digitals.tsv | csvtk mutate3 -t -H -e '$1 + $3' -w 0\n4 5 6 10\n1 2 3 4\n7 8 0 7\n8 1,000 4 12\n
Bool
$ cat testdata/digitals.tsv | csvtk mutate3 -t -H -e '$1 > 5'\n4 5 6 false\n1 2 3 false\n7 8 0 true\n8 1,000 4 true\n
Ternary condition (? :)
$ cat testdata/digitals.tsv | csvtk mutate3 -t -H -e '$1 > 5 ? \"big\" : \"small\" '\n4 5 6 small\n1 2 3 small\n7 8 0 big\n8 1,000 4 big\n
Null coalescence (??)
$ echo -e \"one,two\\na1,a2\\n,b2\\na2,\" | csvtk pretty\none two\n--- ---\na1 a2\n b2\na2\n\n$ echo -e \"one,two\\na1,a2\\n,b2\\na2,\" \\\n | csvtk mutate3 -n three -e '$one ?? $two' \\\n | csvtk pretty\none two three\n--- --- -----\na1 a2 a1\n b2 b2\na2 a2\n
Specify the position of the new column
$ echo -ne \"a,b,c\\n1,2,3\\n\"\na,b,c\n1,2,3\n\n# in the end (default)\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate3 -e '$a+$c' -n x -w 0\na,b,c,x\n1,2,3,4\n\n# in the beginning\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate3 -e '$a+$c' -n x -w 0 --at 1\nx,a,b,c\n4,1,2,3\n\n# at another position\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate3 -e '$a+$c' -n x -w 0 --at 3\na,b,x,c\n1,2,4,3\n\n# right after the given column name\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate3 -e '$a+$c' -n x -w 0 --after a\na,x,b,c\n1,4,2,3\n\n# right before the given column name\n$ echo -ne \"a,b,c\\n1,2,3\\n\" | csvtk mutate3 -e '$a+$c' -n x -w 0 --before c\na,b,x,c\n1,2,4,3\n
Usage
separate column into multiple columns\n\nUsage:\n csvtk sep [flags]\n\nFlags:\n --drop drop extra data, exclusive with --merge\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -h, --help help for sep\n -i, --ignore-case ignore case\n --merge only splits at most N times, exclusive with --drop\n --na string content for filling NA data\n -n, --names strings new column names\n -N, --num-cols int preset number of new created columns\n -R, --remove remove input column\n -s, --sep string separator\n -r, --use-regexp separator is a regular expression\n
Examples:
$ cat players.csv | csvtk collapse -f 1 -v 3 -s ';'\ngender,name\nmale,A;B;C\nfemale,a;b;c;d\n\n# set the number of new columns to 4, filling missing values with NA\n$ cat players.csv | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk sep -f 2 -s ';' -n p1,p2,p3,p4 -N 4 --na NA \\\n | csvtk pretty\ngender name p1 p2 p3 p4\n------ ------- -- -- -- --\nmale A;B;C A B C NA\nfemale a;b;c;d a b c d\n\n# set the number of new columns to 3, dropping extra values\n$ cat players.csv | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk sep -f 2 -s ';' -n p1,p2,p3 --drop \\\n | csvtk pretty\ngender name p1 p2 p3\n------ ------- -- -- --\nmale A;B;C A B C\nfemale a;b;c;d a b c\n\n# set the number of new columns to 3, splitting at most 3 times (--merge)\n$ cat players.csv | csvtk collapse -f 1 -v 3 -s ';' \\\n | csvtk sep -f 2 -s ';' -n p1,p2,p3 --merge \\\n | csvtk pretty\ngender name p1 p2 p3\n------ ------- -- -- ---\nmale A;B;C A B C\nfemale a;b;c;d a b c;d\n\n# a bioinformatics example: split a taxonomic lineage into ranks\n$ echo -ne \"taxid\\tlineage\\n9606\\tEukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens\\n\"\ntaxid lineage\n9606 Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens\n\n$ echo -ne \"taxid\\tlineage\\n9606\\tEukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens\\n\" \\\n | csvtk sep -t -f 2 -s ';' -n kingdom,phylum,class,order,family,genus,species --remove \\\n | csvtk pretty -t\ntaxid kingdom phylum class order family genus species\n----- --------- -------- -------- -------- --------- ----- ------------\n9606 Eukaryota Chordata Mammalia Primates Hominidae Homo Homo sapiens\n
"},{"location":"usage/#gather","title":"gather","text":"Usage
gather columns into key-value pairs, like tidyr::gather/pivot_longer\n\nUsage:\n csvtk gather [flags]\n\nAliases:\n gather, longer\n\nFlags:\n -f, --fields string fields for gathering. e.g -f 1,2 or -f columnA,columnB, or -f -columnA for\n unselect columnA\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for longer\n -k, --key string name of key column to create in output\n -v, --value string name of value column to create in outpu\n\n
Examples:
$ cat testdata/names.csv | csvtk pretty -S simple\n----------------------------------------\nid first_name last_name username\n----------------------------------------\n11 Rob Pike rob\n2 Ken Thompson ken\n4 Robert Griesemer gri\n1 Robert Thompson abc\nNA Robert Abel 123\n----------------------------------------\n\n$ cat testdata/names.csv \\\n | csvtk gather -k item -v value -f -1 \\\n | csvtk pretty -S simple\n-----------------------------\nid item value\n-----------------------------\n11 first_name Rob\n11 last_name Pike\n11 username rob\n2 first_name Ken\n2 last_name Thompson\n2 username ken\n4 first_name Robert\n4 last_name Griesemer\n4 username gri\n1 first_name Robert\n1 last_name Thompson\n1 username abc\nNA first_name Robert\nNA last_name Abel\nNA username 123\n-----------------------------\n
"},{"location":"usage/#spread","title":"spread","text":"Usage
spread a key-value pair across multiple columns, like tidyr::spread/pivot_wider\n\nUsage:\n csvtk spread [flags]\n\nAliases:\n spread, wider, scatter\n\nFlags:\n -h, --help help for spread\n -k, --key string field of the key. e.g -k 1 or -k columnA\n --na string content for filling NA data\n -s, --separater string separater for values that share the same key (default \"; \")\n -v, --value string field of the value. e.g -v 1 or -v columnA\n\n
Examples:
Shuffled columns:
$ csvtk cut -f 1,4,2,3 testdata/names.csv \\\n | csvtk pretty -S simple\n----------------------------------------\nid username first_name last_name\n----------------------------------------\n11 rob Rob Pike\n2 ken Ken Thompson\n4 gri Robert Griesemer\n1 abc Robert Thompson\nNA 123 Robert Abel\n----------------------------------------\n
data -> gather/longer -> spread/wider. Note that the order of both rows and columns is kept :)
$ csvtk cut -f 1,4,2,3 testdata/names.csv \\\n | csvtk gather -k item -v value -f -1 \\\n | csvtk spread -k item -v value \\\n | csvtk pretty -S simple\n----------------------------------------\nid username first_name last_name\n----------------------------------------\n11 rob Rob Pike\n2 ken Ken Thompson\n4 gri Robert Griesemer\n1 abc Robert Thompson\nNA 123 Robert Abel\n----------------------------------------\n
No header rows
$ echo -ne \"a,a,0\\nb,b,0\\nc,c,0\\na,b,1\\na,c,2\\nb,c,3\\n\"\na,a,0\nb,b,0\nc,c,0\na,b,1\na,c,2\nb,c,3\n\n$ echo -ne \"a,a,0\\nb,b,0\\nc,c,0\\na,b,1\\na,c,2\\nb,c,3\\n\" \\\n | csvtk spread -H -k 2 -v 3 \\\n | csvtk pretty -S bold\n\u250f\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2513\n\u2503 \u2503 a \u2503 b \u2503 c \u2503\n\u2523\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u252b\n\u2503 a \u2503 0 \u2503 1 \u2503 2 \u2503\n\u2523\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u252b\n\u2503 b \u2503 \u2503 0 \u2503 3 \u2503\n\u2523\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u254b\u2501\u2501\u2501\u252b\n\u2503 c \u2503 \u2503 \u2503 0 \u2503\n\u2517\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u253b\u2501\u2501\u2501\u251b\n
"},{"location":"usage/#unfold","title":"unfold","text":"Usage
unfold multiple values in cells of a field\n\nExample:\n\n $ echo -ne \"id,values,meta\\n1,a;b,12\\n2,c,23\\n3,d;e;f,34\\n\" \\\n | csvtk pretty\n id values meta\n 1 a;b 12\n 2 c 23\n 3 d;e;f 34\n\n\n $ echo -ne \"id,values,meta\\n1,a;b,12\\n2,c,23\\n3,d;e;f,34\\n\" \\\n | csvtk unfold -f values -s \";\" \\\n | csvtk pretty\n id values meta\n 1 a 12\n 1 b 12\n 2 c 23\n 3 d 34\n 3 e 34\n 3 f 34\n\nUsage:\n csvtk unfold [flags]\n\nFlags:\n -f, --fields string field to expand, only one field is allowed. type \"csvtk unfold -h\" for examples\n -h, --help help for unfold\n -s, --separater string separater for folded values (default \"; \")\n
"},{"location":"usage/#fold","title":"fold","text":"Usage
fold multiple values of a field into cells of groups\n\nAttention:\n\n Only grouping fields and value filed are outputted.\n\nExample:\n\n $ echo -ne \"id,value,meta\\n1,a,12\\n1,b,34\\n2,c,56\\n2,d,78\\n\" \\\n | csvtk pretty\n id value meta\n 1 a 12\n 1 b 34\n 2 c 56\n 2 d 78\n\n $ echo -ne \"id,value,meta\\n1,a,12\\n1,b,34\\n2,c,56\\n2,d,78\\n\" \\\n | csvtk fold -f id -v value -s \";\" \\\n | csvtk pretty\n id value\n 1 a;b\n 2 c;d\n\n $ echo -ne \"id,value,meta\\n1,a,12\\n1,b,34\\n2,c,56\\n2,d,78\\n\" \\\n | csvtk fold -f id -v value -s \";\" \\\n | csvtk unfold -f value -s \";\" \\\n | csvtk pretty\n id value\n 1 a\n 1 b\n 2 c\n 2 d\n\nUsage:\n csvtk fold [flags]\n\nAliases:\n fold, collapse\n\nFlags:\n -f, --fields string key fields for grouping. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n -F, --fuzzy-fields using fuzzy fields (only for key fields), e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for fold\n -i, --ignore-case ignore case\n -s, --separater string separater for folded values (default \"; \")\n -v, --vfield string value field for folding\n\n\n
Examples
data
$ csvtk pretty teachers.csv\nlab teacher class\ncomputational biology Tom Bioinformatics\ncomputational biology Tom Statistics\ncomputational biology Rob Bioinformatics\nsequencing center Jerry Bioinformatics\nsequencing center Nick Molecular Biology\nsequencing center Nick Microbiology\n
List teachers for every lab/class. uniq is used to deduplicate items.
$ cat teachers.csv \\\n | csvtk uniq -f lab,teacher \\\n | csvtk fold -f lab -v teacher \\\n | csvtk pretty\n\nlab teacher\ncomputational biology Tom; Rob\nsequencing center Jerry; Nick\n\n$ cat teachers.csv \\\n | csvtk uniq -f class,teacher \\\n | csvtk fold -f class -v teacher -s \", \" \\\n | csvtk pretty\n\nclass teacher\nStatistics Tom\nBioinformatics Tom, Rob, Jerry\nMolecular Biology Nick\nMicrobiology Nick\n
Multiple key fields are supported:
$ cat teachers.csv \\\n | csvtk fold -f teacher,lab -v class \\\n | csvtk pretty\n\nteacher lab class\nTom computational biology Bioinformatics; Statistics\nRob computational biology Bioinformatics\nJerry sequencing center Bioinformatics\nNick sequencing center Molecular Biology; Microbiology\n
Usage
format date of selected fields\n\nDate parsing is supported by: https://github.com/araddon/dateparse\nDate formating is supported by: https://github.com/metakeule/fmtdate\n\nTime zones:\n format: Asia/Shanghai\n whole list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones\n\nOutput format is in MS Excel (TM) syntax.\nPlaceholders:\n\n M - month (1)\n MM - month (01)\n MMM - month (Jan)\n MMMM - month (January)\n D - day (2)\n DD - day (02)\n DDD - day (Mon)\n DDDD - day (Monday)\n YY - year (06)\n YYYY - year (2006)\n hh - hours (15)\n mm - minutes (04)\n ss - seconds (05)\n\n AM/PM hours: 'h' followed by optional 'mm' and 'ss' followed by 'pm', e.g.\n\n hpm - hours (03PM)\n h:mmpm - hours:minutes (03:04PM)\n h:mm:sspm - hours:minutes:seconds (03:04:05PM)\n\n Time zones: a time format followed by 'ZZZZ', 'ZZZ' or 'ZZ', e.g.\n\n hh:mm:ss ZZZZ (16:05:06 +0100)\n hh:mm:ss ZZZ (16:05:06 CET)\n hh:mm:ss ZZ (16:05:06 +01:00)\n\nUsage:\n csvtk fmtdate [flags]\n\nFlags:\n -f, --fields string select only these fields. e.g -f 1,2 or -f columnA,columnB (default \"1\")\n --format string output date format in MS Excel (TM) syntax, type \"csvtk fmtdate -h\" for\n details (default \"YYYY-MM-DD hh:mm:ss\")\n -F, --fuzzy-fields using fuzzy fields, e.g., -F -f \"*name\" or -F -f \"id123*\"\n -h, --help help for fmtdate\n -k, --keep-unparsed keep the key as value when no value found for the key\n -z, --time-zone string timezone aka \"Asia/Shanghai\" or \"America/Los_Angeles\" formatted time-zone,\n type \"csvtk fmtdate -h\" for details\n\n
Examples
$ csvtk xlsx2csv date.xlsx | csvtk pretty \ndata value\n------------------- -----\n2021-08-25 11:24:21 1\n08/25/21 11:24 p8 2\nNA 3\n 4\n\n$ csvtk xlsx2csv date.xlsx \\\n | csvtk fmtdate --format \"YYYY-MM-DD hh:mm:ss\" \\\n | csvtk pretty \ndata value\n------------------- -----\n2021-08-25 11:24:21 1\n2021-08-25 11:24:00 2\n 3\n 4\n\n$ csvtk xlsx2csv date.xlsx \\\n | csvtk fmtdate --format \"YYYY-MM-DD hh:mm:ss\" -k \\\n | csvtk pretty \ndata value\n------------------- -----\n2021-08-25 11:24:21 1\n2021-08-25 11:24:00 2\nNA 3\n 4\n
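The same works on plain CSV/TSV from stdin; a hedged example with a made-up date column (the output is the expected result, not taken from the original docs):
$ echo -ne \"when,value\\n2021/08/25 11:24:21,1\\n\" \\\n | csvtk fmtdate -f when --format \"YYYY-MM-DD\"\nwhen,value\n2021-08-25,1\n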
"},{"location":"usage/#sort","title":"sort","text":"Usage
sort by selected fields\n\nUsage:\n csvtk sort [flags]\n\nFlags:\n -h, --help help for sort\n -i, --ignore-case ignore-case\n -k, --keys strings keys (multiple values supported). sort type supported, \"N\" for natural order,\n \"n\" for number, \"u\" for user-defined order and \"r\" for reverse. e.g., \"-k 1\" or\n \"-k A:r\" or \"\"-k 1:nr -k 2\" (default [1])\n -L, --levels strings user-defined level file (one level per line, multiple values supported).\n format: <field>:<level-file>. e.g., \"-k name:u -L name:level.txt\n
Examples
data
$ cat testdata/names.csv\nid,first_name,last_name,username\n11,\"Rob\",\"Pike\",rob\n2,Ken,Thompson,ken\n4,\"Robert\",\"Griesemer\",\"gri\"\n1,\"Robert\",\"Thompson\",\"abc\"\nNA,\"Robert\",\"Abel\",\"123\"\n
By single column: csvtk sort -k 1 or csvtk sort -k last_name, in alphabetical order
$ cat testdata/names.csv \\\n | csvtk sort -k first_name\nid,first_name,last_name,username\n2,Ken,Thompson,ken\n11,Rob,Pike,rob\nNA,Robert,Abel,123\n1,Robert,Thompson,abc\n4,Robert,Griesemer,gri\n
in reversed alphabetical order (key:r)
$ cat testdata/names.csv \\\n | csvtk sort -k first_name:r\nid,first_name,last_name,username\nNA,Robert,Abel,123\n1,Robert,Thompson,abc\n4,Robert,Griesemer,gri\n11,Rob,Pike,rob\n2,Ken,Thompson,ken\n
in numerical order (key:n)
$ cat testdata/names.csv \\\n | csvtk sort -k id:n\nid,first_name,last_name,username\nNA,Robert,Abel,123\n1,Robert,Thompson,abc\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n11,Rob,Pike,rob\n
in natural order (key:N)
$ cat testdata/names.csv | csvtk sort -k id:N\nid,first_name,last_name,username\n1,Robert,Thompson,abc\n2,Ken,Thompson,ken\n4,Robert,Griesemer,gri\n11,Rob,Pike,rob\nNA,Robert,Abel,123\n
in natural order (key:N), a bioinformatics example
$ echo \"X,Y,1,10,2,M,11,1_c,Un_g,1_g\" | csvtk transpose \nX\nY\n1\n10\n2\nM\n11\n1_c\nUn_g\n1_g\n\n$ echo \"X,Y,1,10,2,M,11,1_c,Un_g,1_g\" \\\n | csvtk transpose \\\n | csvtk sort -H -k 1:N\n1\n1_c\n1_g\n2\n10\n11\nM\nUn_g\nX\nY\n
By multiple columns: csvtk sort -k 1,2 or csvtk sort -k 1 -k 2 or csvtk sort -k last_name,age
# by first_name and then last_name\n$ cat testdata/names.csv | csvtk sort -k first_name -k last_name\nid,first_name,last_name,username\n2,Ken,Thompson,ken\n11,Rob,Pike,rob\nNA,Robert,Abel,123\n4,Robert,Griesemer,gri\n1,Robert,Thompson,abc\n\n# by first_name and then ID\n$ cat testdata/names.csv | csvtk sort -k first_name -k id:n\nid,first_name,last_name,username\n2,Ken,Thompson,ken\n11,Rob,Pike,rob\nNA,Robert,Abel,123\n1,Robert,Thompson,abc\n4,Robert,Griesemer,gri\n
By user-defined order
# user-defined order/level\n$ cat testdata/size_level.txt\ntiny\nmini\nsmall\nmedium\nbig\n\n# original data\n$ cat testdata/size.csv\nid,size\n1,Huge\n2,Tiny\n3,Big\n4,Small\n5,Medium\n\n$ csvtk sort -k 2:u -i -L 2:testdata/size_level.txt testdata/size.csv\nid,size\n2,Tiny\n4,Small\n5,Medium\n3,Big\n1,Huge\n
Usage
plot common figures\n\nNotes:\n\n 1. Output file can be set by flag -o/--out-file.\n 2. File format is determined by the out file suffix.\n Supported formats: eps, jpg|jpeg, pdf, png, svg, and tif|tiff\n 3. If flag -o/--out-file not set (default), image is written to stdout,\n you can display the image by pipping to \"display\" command of Imagemagic\n or just redirect to file.\n\nUsage:\n csvtk plot [command]\n\nAvailable Commands:\n box plot boxplot\n hist plot histogram\n line line plot and scatter plot\n\nFlags:\n --axis-width float axis width (default 1.5)\n -f, --data-field string column index or column name of data (default \"1\")\n --format string image format for stdout when flag -o/--out-file not given. available\n values: eps, jpg|jpeg, pdf, png, svg, and tif|tiff. (default \"png\")\n -g, --group-field string column index or column name of group\n --height float Figure height (default 4.5)\n -h, --help help for plot\n --label-size int label font size (default 14)\n --na-values strings NA values, case ignored (default [,NA,N/A])\n --scale float scale the image width/height, tick, axes, line/point and font sizes\n proportionally (default 1)\n --skip-na skip NA values in --na-values\n --tick-label-size int tick label font size (default 12)\n --tick-width float axis tick width (default 1.5)\n --title string Figure title\n --title-size int title font size (default 16)\n --width float Figure width (default 6)\n --x-max string maximum value of X axis\n --x-min string minimum value of X axis\n --xlab string x label text\n --y-max string maximum value of Y axis\n --y-min string minimum value of Y axis\n --ylab string y label text\n\n
Note that most of the flags of plot are global flags of the subcommands hist, box and line.
Notes on image output: if -o/--out-file is not set, the image is written to stdout; you can display it by piping to the \"display\" command of ImageMagick, or just redirect it to a file.
plot histogram\n\nNotes:\n\n 1. Output file can be set by flag -o/--out-file.\n 2. File format is determined by the out file suffix.\n Supported formats: eps, jpg|jpeg, pdf, png, svg, and tif|tiff\n 3. If flag -o/--out-file not set (default), image is written to stdout,\n you can display the image by pipping to \"display\" command of Imagemagic\n or just redirect to file.\n\nUsage:\n csvtk plot hist [flags]\n\nFlags:\n --bins int number of bins (default 50)\n --color-index int color index, 1-7 (default 1)\n -h, --help help for hist\n --line-width float line width (default 1)\n --percentiles calculate percentiles\n\n
Examples
example data
$ zcat testdata/grouped_data.tsv.gz | head -n 5 | csvtk -t pretty\nGroup Length GC Content\nGroup A 97 57.73\nGroup A 95 49.47\nGroup A 97 49.48\nGroup A 100 51.00\n
Plot a histogram with data from the second column:
$ csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 \\\n --title Histogram -o histogram.png\n
You can also write the image to stdout and pipe it to the \"display\" command of ImageMagick:
$ csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 | display\n
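Since the output format follows the file suffix (see the notes above), writing a PDF instead of a PNG only requires changing the file name; a minor variation of the first command:
$ csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 \\\n --title Histogram -o histogram.pdf\n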
Usage
plot boxplot\n\nNotes:\n\n 1. Output file can be set by flag -o/--out-file.\n 2. File format is determined by the out file suffix.\n Supported formats: eps, jpg|jpeg, pdf, png, svg, and tif|tiff\n 3. If flag -o/--out-file not set (default), image is written to stdout,\n you can display the image by pipping to \"display\" command of Imagemagic\n or just redirect to file.\n\nUsage:\n csvtk plot box [flags]\n\nFlags:\n --box-width float box width\n --color-index int color index, 1-7 (default 1)\n -h, --help help for box\n --horiz horize box plot\n --line-width float line width (default 1.5)\n --point-size float point size (default 3)\n\n
Examples
Plot a boxplot with data from the \"GC Content\" (third) column; the group information is in the \"Group\" column.
csvtk -t plot box testdata/grouped_data.tsv.gz -g \"Group\" -f \"GC Content\" \\\n --width 3 --title \"Box plot\" \\\n > boxplot.png\n
Plot a horizontal boxplot with data from the \"Length\" (second) column; the group information is in the \"Group\" column.
$ csvtk -t plot box testdata/grouped_data.tsv.gz -g \"Group\" -f \"Length\" \\\n --height 3 --width 5 --horiz --title \"Horiz box plot\" \\\n > boxplot2.png\n
Usage
line plot and scatter plot\n\nNotes:\n\n 1. Output file can be set by flag -o/--out-file.\n 2. File format is determined by the out file suffix.\n Supported formats: eps, jpg|jpeg, pdf, png, svg, and tif|tiff\n 3. If flag -o/--out-file not set (default), image is written to stdout,\n you can display the image by pipping to \"display\" command of Imagemagic\n or just redirect to file.\n\nUsage:\n csvtk plot line [flags]\n\nFlags:\n --color-index int color index, 1-7 (default 1)\n -x, --data-field-x string column index or column name of X for command line\n -y, --data-field-y string column index or column name of Y for command line\n -h, --help help for line\n --legend-left locate legend along the left edge of the plot\n --legend-top locate legend along the top edge of the plot\n --line-width float line width (default 1.5)\n --point-size float point size (default 3)\n --scatter only plot points\n\n
Examples
example data
$ head -n 5 testdata/xy.tsv\nGroup X Y\nA 0 1\nA 1 1.3\nA 1.5 1.5\nA 2.0 2\n
Plot a line plot with X-Y data
$ csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group \\\n --title \"Line plot\" \\\n > lineplot.png\n
Plot a scatter plot (--scatter)
$ csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group \\\n --title \"Scatter\" --scatter \\\n > lineplot.png\n
Usage
stream file to stdout and report progress on stderr\n\nUsage:\n csvtk cat [flags]\n\nFlags:\n -b, --buffsize int buffer size (default 8192)\n -h, --help help for cat\n -L, --lines count lines instead of bytes\n -p, --print-freq int print frequency (-1 for print after parsing) (default 1)\n -s, --total int expected total bytes/lines (default -1)\n
Examples
Stream file, report progress in bytes
csvtk cat file.tsv\n
Stream file from stdin, report progress in lines
tac input.tsv | csvtk cat -L -s `wc -l < input.tsv` -\n
Usage
generate shell autocompletion script\n\nSupported shell: bash|zsh|fish|powershell\n\nBash:\n\n # generate completion shell\n csvtk genautocomplete --shell bash\n\n # configure if never did.\n # install bash-completion if the \"complete\" command is not found.\n echo \"for bcfile in ~/.bash_completion.d/* ; do source \\$bcfile; done\" >> ~/.bash_completion\n echo \"source ~/.bash_completion\" >> ~/.bashrc\n\nZsh:\n\n # generate completion shell\n csvtk genautocomplete --shell zsh --file ~/.zfunc/_csvtk\n\n # configure if never did\n echo 'fpath=( ~/.zfunc \"${fpath[@]}\" )' >> ~/.zshrc\n echo \"autoload -U compinit; compinit\" >> ~/.zshrc\n\nfish:\n\n csvtk genautocomplete --shell fish --file ~/.config/fish/completions/csvtk.fish\n\nUsage:\n csvtk genautocomplete [flags]\n\nFlags:\n --file string autocompletion file (default \"/home/shenwei/.bash_completion.d/csvtk.sh\")\n -h, --help help for genautocomplete\n --shell string autocompletion type (bash|zsh|fish|powershell) (default \"bash\")\n\n
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index ae26e902aba41c148d3f7f840244719976afba6b..13e0d5e5759f5fa7bac609d8c7bfa9a8c90cee03 100644
GIT binary patch
delta 13
Ucmb=gXP58h;F!B4a3XsJ03EReVgLXD
delta 13
Ucmb=gXP58h;CSrjIFY>q03AOBIsgCw
diff --git a/usage/index.html b/usage/index.html
index 5b4ca37..1d2ec41 100644
--- a/usage/index.html
+++ b/usage/index.html
@@ -1327,7 +1327,7 @@ Usage
csvtk -- a cross-platform, efficient and practical CSV/TSV toolkit
-Version: 0.31.1
+Version: 0.32.0
Author: Wei Shen <shenwei356@gmail.com>
@@ -1826,8 +1826,11 @@ pretty
How to:
1. First -n/--buf-rows rows are read to check the minimum and maximum widths
- of each columns. You can also set the global thresholds -w/--min-width and
- -W/--max-width.
+   of each column.
+
+   You can also set the global or column-specific (the number of values needs to be
+ equal to the number of columns) thresholds via -w/--min-width and -W/--max-width.
+
1a. Cells longer than the maximum width will be wrapped (default) or
clipped (--clip).
Usually, the text is wrapped in space (-x/--wrap-delimiter). But if one
@@ -1903,6 +1906,16 @@ pretty
| 2 | Tiny |
└----┴------┘
+ round:
+
+ ╭----┬------╮
+ | id | size |
+ ├====┼======┤
+ | 1 | Huge |
+ ├----┼------┤
+ | 2 | Tiny |
+ ╰----┴------╯
+
bold:
┏━━━━┳━━━━━━┓
@@ -1936,11 +1949,15 @@ pretty
--clip clip longer cell instead of wrapping
--clip-mark string clip mark (default "...")
-h, --help help for pretty
- -W, --max-width int max width
- -w, --min-width int min width
+ -W, --max-width strings max width, multiple values (max widths for each column, 0 for no limit)
+ should be separated by commas. E.g., -W 40,20,0 limits the max widths of
+ 1st and 2nd columns
+ -w, --min-width strings min width, multiple values (min widths for each column, 0 for no limit)
+ should be separated by commas. E.g., -w 0,10,10 limits the min widths of
+ 2nd and 3rd columns
-s, --separator string fields/columns separator (default " ")
-S, --style string output syle. available vaules: default, plain, simple, 3line, grid,
- light, bold, double. check https://github.com/shenwei356/stable
+ light, round, bold, double. check https://github.com/shenwei356/stable
-x, --wrap-delimiter string delimiter for wrapping cells (default " ")
@@ -2036,7 +2053,7 @@ Set the minimum and maximum width.
+Set the global minimum and maximum width.
$ csvtk pretty testdata/long.csv -w 5 -W 40
id name message
----- ------------------ ----------------------------------------
@@ -2050,6 +2067,24 @@ pretty
Set min and max widths for all columns.
+$ csvtk pretty testdata/long.csv -w 5,25,0 -W 0,30,40 -m 1,2 -S round
+╭-------┬---------------------------┬------------------------------------------╮
+| id | name | message |
+├=======┼===========================┼==========================================┤
+| 1 | Donec Vitae | Quis autem vel eum iure reprehenderit |
+| | | qui in ea voluptate velit esse. |
+├-------┼---------------------------┼------------------------------------------┤
+| 2 | Quaerat Voluptatem | At vero eos et accusamus et iusto odio. |
+├-------┼---------------------------┼------------------------------------------┤
+| 3 | Aliquam lorem | Curabitur ullamcorper ultricies nisi. |
+| | | Nam eget dui. Etiam rhoncus. Maecenas |
+| | | tempus, tellus eget condimentum |
+| | | rhoncus, sem quam semper libero. |
+╰-------┴---------------------------┴------------------------------------------╯
+
+Clipping cells instead of wrapping
$ csvtk pretty testdata/long.csv -w 5 -W 40 --clip
id name message