<!DOCTYPE html>
<html>
<head>
<title>Caltech Library's Digital Library Development Sandbox</title>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="/css/site.css">
</head>
<body>
<header>
<a href="http://library.caltech.edu"><img src="/assets/liblogo.gif" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="./">README</a></li>
<li><a href="LICENSE">LICENSE</a></li>
<li><a href="INSTALL.html">INSTALL</a></li>
<li><a href="user-manual.html">User Manual</a></li>
<li><a href="how-to/">Tutorials</a></li>
<li><a href="search.html">Search Docs</a></li>
<li><a href="about.html">About</a></li>
<li><a href="https://github.com/caltechlibrary/datatools">GitHub</a></li>
</ul>
</nav>
<section>
<h1 id="name">NAME</h1>
<p>csvjoin</p>
<h1 id="synopsis">SYNOPSIS</h1>
<p>csvjoin <a href="#options">OPTIONS</a> CSV1 CSV2 COL1 COL2</p>
<h1 id="description">DESCRIPTION</h1>
<p>csvjoin outputs CSV content based on two CSV files with matching
column values. Each CSV input file has a designated column to match on.
The values are compared as strings. Columns are counted from one rather
than zero.</p>
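<p>The matching rule described above can be sketched in a few lines of
Python. This is only an illustration, not csvjoin's implementation: the
function name <code>join_rows</code> is hypothetical, and emitting the
first file's row followed by the second file's row is an assumption about
the output layout. It does show the one-based column numbering and the
string comparison, which is case insensitive unless requested
otherwise.</p>
<pre><code class="language-python">import csv
import io


def join_rows(csv1_text, csv2_text, col1, col2, case_sensitive=False):
    """Illustrative join: pair rows whose key cells match as strings.

    col1 and col2 are one-based column numbers, as in this manual page.
    Concatenating the two matching rows is an assumption about the output.
    """
    rows1 = list(csv.reader(io.StringIO(csv1_text)))
    rows2 = list(csv.reader(io.StringIO(csv2_text)))

    def key(row, col):
        cell = row[col - 1]  # convert one-based column to zero-based index
        return cell if case_sensitive else cell.lower()

    merged = []
    for row1 in rows1:
        for row2 in rows2:
            if key(row1, col1) == key(row2, col2):
                merged.append(row1 + row2)
    return merged


csv1 = "1,Alice\n2,Bob\n"
csv2 = "x,y,z,ALICE\nx,y,z,Carol\n"
print(join_rows(csv1, csv2, col1=2, col2=4))</code></pre>
<p>With the sample data, only the "Alice" row pairs with the "ALICE" row,
and passing <code>case_sensitive=True</code> yields no matches at all.</p>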
<h1 id="options">OPTIONS</h1>
<dl>
<dt>-help</dt>
<dd>
display help
</dd>
<dt>-license</dt>
<dd>
display license
</dd>
<dt>-version</dt>
<dd>
display version
</dd>
<dt>-allow-duplicates</dt>
<dd>
allow duplicates when searching for matches
</dd>
<dt>-case-sensitive</dt>
<dd>
make a case sensitive match (default is case insensitive)
</dd>
<dt>-col1</dt>
<dd>
column to join on in the first CSV file
</dd>
<dt>-col2</dt>
<dd>
column to join on in the second CSV file
</dd>
<dt>-contains</dt>
<dd>
match columns based on csv1/col1 contained in csv2/col2
</dd>
<dt>-csv1</dt>
<dd>
first CSV filename
</dd>
<dt>-csv2</dt>
<dd>
second CSV filename
</dd>
<dt>-d, -delimiter</dt>
<dd>
set delimiter character
</dd>
<dt>-delete-cost</dt>
<dd>
deletion cost to use when calculating Levenshtein edit distance
</dd>
<dt>-in-memory</dt>
<dd>
if true, read both CSV files into memory
</dd>
<dt>-insert-cost</dt>
<dd>
insertion cost to use when calculating Levenshtein edit distance
</dd>
<dt>-levenshtein</dt>
<dd>
match columns using Levenshtein edit distance
</dd>
<dt>-max-edit-distance</dt>
<dd>
maximum edit distance for match using Levenshtein distance
</dd>
<dt>-o, -output</dt>
<dd>
output filename
</dd>
<dt>-quiet</dt>
<dd>
suppress error messages
</dd>
<dt>-stop-words</dt>
<dd>
a colon delimited list of stop words to ignore when matching
</dd>
<dt>-substitute-cost</dt>
<dd>
substitution cost to use when calculating Levenshtein edit distance
</dd>
<dt>-trim-leading-space</dt>
<dd>
trim leading space in field(s) for CSV input
</dd>
<dt>-trimspaces</dt>
<dd>
trim spaces around cell values before comparing
</dd>
<dt>-use-lazy-quotes</dt>
<dd>
use lazy quotes for CSV input
</dd>
<dt>-verbose</dt>
<dd>
output processing count to stderr
</dd>
</dl>
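<p>The -levenshtein, -insert-cost, -delete-cost, -substitute-cost and
-max-edit-distance options refer to a weighted edit distance. The Python
sketch below shows how such a distance is commonly computed with dynamic
programming. The names <code>edit_distance</code> and
<code>is_match</code> are hypothetical, the default costs of 1 are an
assumption, and treating a pair as a match when its distance does not
exceed the maximum is also an assumption rather than a statement about
csvjoin's internals.</p>
<pre><code class="language-python">def edit_distance(a, b, insert_cost=1, delete_cost=1, substitute_cost=1):
    """Weighted Levenshtein distance between strings a and b."""
    # previous[j] is the cost of turning the prefix of a seen so far
    # (initially the empty string) into the first j characters of b.
    previous = [j * insert_cost for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        # Turning i characters of a into an empty string takes i deletions.
        current = [i * delete_cost]
        for j, char_b in enumerate(b, start=1):
            same = 0 if char_a == char_b else substitute_cost
            current.append(min(
                previous[j] + delete_cost,     # drop char_a
                current[j - 1] + insert_cost,  # add char_b
                previous[j - 1] + same,        # keep or replace
            ))
        previous = current
    return previous[len(b)]


def is_match(a, b, max_edit_distance, **costs):
    # Assumed rule: a pair matches when its distance stays within the limit.
    return not edit_distance(a, b, **costs) > max_edit_distance


print(edit_distance("kitten", "sitting"))
print(is_match("Caltech", "Caltch", max_edit_distance=1))</code></pre>
<p>Raising one of the costs makes that kind of edit count for more, so
fewer near-misses of that type fall under a given maximum edit
distance.</p>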
<h1 id="examples">EXAMPLES</h1>
<p>Build a merged CSV file from data1.csv and data2.csv where the value
in column 2 of data1.csv matches the value in column 4 of data2.csv,
writing the results to merged-data.csv.</p>
<pre><code> csvjoin -csv1=data1.csv -col1=2 \
-csv2=data2.csv -col2=4 \
-output=merged-data.csv</code></pre>
<p>csvjoin 1.2.12</p>
</section>
<footer>
<span><h1><a href="http://caltech.edu">Caltech</a></h1></span>
<span>© 2023 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span>
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
<span><a href="mailto:[email protected]">Email Us</a></span>
<a class="cl-hide" href="sitemap.xml">Site Map</a>
</footer>
</body>
</html>