forked from curl/trurl
-
Notifications
You must be signed in to change notification settings - Fork 0
/
trurl.1
432 lines (385 loc) · 15.8 KB
/
trurl.1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
.\" **************************************************************************
.\" * _ _ ____ _
.\" * Project ___| | | | _ \| |
.\" * / __| | | | |_) | |
.\" * | (__| |_| | _ <| |___
.\" * \___|\___/|_| \_\_____|
.\" *
.\" * Copyright (C) Daniel Stenberg, <[email protected]>, et al.
.\" *
.\" * This software is licensed as described in the file COPYING, which
.\" * you should have received as part of this distribution. The terms
.\" * are also available at https://curl.se/docs/copyright.html.
.\" *
.\" * You may opt to use, copy, modify, merge, publish, distribute and/or sell
.\" * copies of the Software, and permit persons to whom the Software is
.\" * furnished to do so, under the terms of the COPYING file.
.\" *
.\" * This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY
.\" * KIND, either express or implied.
.\" *
.\" * SPDX-License-Identifier: curl
.\" *
.\" **************************************************************************
.\" You can view this file with:
.\" man -l trurl.1
.\"
.TH trurl 1 "April 27, 2023" "trurl" "trurl Manual"
.SH NAME
trurl \- transpose URLs
.SH SYNOPSIS
.B trurl [options / URLs]
.SH DESCRIPTION
.B trurl
parses, manipulates and outputs URLs and parts of URLs.
It uses the RFC 3986 definition of URLs and it uses libcurl's URL parser to do
so, which includes a few "extensions". The URL support is limited to
"hierarchical" URLs, the ones that use "://" separators after the scheme.
Typically you pass in one or more URLs and decide what of that you want
output. Possibly modifying the URL as well.
trurl knows URLs and every URL consists of up to ten separate and independent
"components". These components can be extracted, removed and updated with
trurl and they are referred to by their respective names: scheme, user,
password, options, host, port, path, query, fragment and zoneid.
.SH "OPTIONS"
Options start with one or two dashes. Many of the options require an additional
value next to them.
Any other argument is interpreted as a URL argument, and is treated as if it
was following a \fI--url\fP option.
The first argument that is exactly two dashes ("--"), marks the end of
options; any argument after the end of options is interpreted as a URL
argument even if it starts with a dash.
.IP "-a, --append [component]=[data]"
Append data to a component. This can only append data to the path and the
query components.
For path, this URL encodes and appends the new segment to the path, separated
with a slash.
For query, this URL encodes and appends the new segment to the query,
separated with an ampersand (&). If the appended segment contains an equal
sign ('=') that one will be kept verbatim and both sides of the first
occurrence will be URL encoded separately.
.IP "--accept-space"
When set, trurl will try to accept spaces as part of the URL and instead URL
encode such occurrences accordingly.
According to RFC 3986, a space cannot legally be part of a URL. This option
provides a best-effort to convert the provided string into a valid URL.
.IP "--default-port"
When set, trurl will use the scheme's default port number for URLs with a known
scheme, and without an explicit port number.
Note that trurl only knows default port numbers for URL schemes that are
supported by libcurl.
Since, by default, trurl removes default port numbers from URLs with a known
scheme, this option is pretty much ignored unless one of \fI--get\fP,
\fI--json\fP, and \fI--keep-port\fP is not also specified.
.IP "-f, --url-file [file name]"
Read URLs to work on from the given file. Use the file name "-" (a single
minus) to tell trurl to read the URLs from stdin.
Each line needs to be a single valid URL. trurl will remove one carriage return
character at the end of the line if present, trim off all the trailing space and
tab characters, and skip all empty (after trimming) lines.
The maximum line length supported in a file like this is 4094 bytes. Lines that
exceed that length are skipped, and a warning is printed to stderr when they are
encountered.
.IP "-g, --get [format]"
Output text and URL data according to the provided format string. Components
from the URL can be output when specified as \fB{component}\fP or
\fB[component]\fP, with the name of the part show within curly braces or
brackets. You can not mix braces and brackets for this purpose in the same
command line.
The following component names are available (case sensitive): url, scheme,
user, password, options, host, port, path, query, fragment and zoneid.
\fB{component}\fP will expand to nothing if the given component does
not have a value.
Components are shown URL decoded by default. If you instead write the
component prefixed with a colon like "{:path}", it gets output URL encoded.
You may also prefix components with \fBdefault:\fP and/or \fBpuny:\fP or \fBidn:\fP,
in any order.
If \fBdefault:\fP is specified, like "{default:url}" or
"{default:port}", and the port is not explicitly specified in the URL,
the scheme's default port will be output if it is known.
If \fBpuny:\fP is specified, like "{puny:url}" or "{puny:host}", the
"punycoded" version of the host name will be used in the output. This
option is mutually exclusive with \fBidn:\fP.
If \fBidn:\fP is specified like "{idn:url}" or "{idn:host}", the International
Domain Name version of the host name will be used in the output if it is provided as a correctly encoded punycode version. This
option is mutually exclusive with \fBpuny:\fP.
If \fI--default-port\fP is specified, all formats are expanded as if
they used \fIdefault:\fP; and if \fI--punycode\fP is used, all formats
are expanded as if they used \fIpuny:\fP. Also note that "{url}" is
affected by the \fI--keep-port\fP option.
Hosts provided as IPv6 numerical addresses will be provided within square
brackets. Like "[fe80::20c:29ff:fe9c:409b]".
Hosts provided as IPv4 numerical addresses will be "normalized" and provided
as four dot-separated decimal numbers when output.
You can access specific keys in the query string using the format
\fB{query:key}\fP. Then the value of the first matching key will be output
using a case sensitive match. When extracting a URL decoded query key that
contains %00, such octet will be replaced with a single period '.' in the
output.
You can access specific keys in the query string and out all values using the
format \fB{query-all:key}\fP. This looks for 'key' case sensitively and will
output all values for that key space-separated.
The "format" string supports the following backslash sequences:
\&\\\\ - backslash
\&\\t - tab
\&\\n - newline
\&\\r - carriage return
\&\\{ - an open curly brace that does not start a variable
\&\\[ - an open bracket that does not start a variable
All other text in the format string will be shown as-is.
.IP "-h, --help"
Show the help output.
.IP "--iterate [component]=[item1 item2 ...]"
Set the component to multiple values and output the result once for each
iteration. Several combined iterations are allowed to generate combinations,
but only one \fI--iterate\fP option per component. The listed items to iterate
over should be separated by single spaces.
.IP "--json"
Outputs all set components of the URLs as JSON objects. All components of the
URL that have data will get populated in the parts object using their
component names. See below for details on the format.
.IP "--keep-port"
By default, trurl removes default port numbers from URLs with a known scheme
even if they are explicitly specified in the input URL. This options, makes
trurl not remove them.
.IP "--no-guess-scheme"
Disables libcurl's scheme guessing feature. URLs that do not contain a scheme
will be treated as invalid URLs.
.IP "--punycode"
Uses the "punycoded" version of the host name, which is how International Domain
Names are converted into plain ASCII. If the host name is not using IDN, the
regular ASCII name is used.
.IP "--as-idn"
Converts a "punycoded" ASCII host name to its original International Domain
Name in Unicode. If the host name is not using punycode then the original host
name is used.
.IP "--query-separator [what]"
Specify the single letter used for separating query pairs. The default is "&"
but at least in the past sometimes semicolons ";" or even colons ":" have been
used for this purpose. If your URL uses something other than the default
letter, setting the right one makes sure trurl can do its query operations
properly.
.IP "--redirect [URL]"
Redirect the URL to this new location.
The redirection is performed on the base URL, so, if no base URL is specified,
no redirection will be performed.
.IP "--replace [data]"
Replaces a URL query.
data can either take the form of a single value, or as a key/value pair in the
shape \fIfoo=bar\fP. If replace is called on an item that isn't in the list of
queries trurl will ignore that item.
.IP "--force-replace [data]"
Works the same as \fI--replace\fP, but trurl will append a missing query string if
it is not in the query list already.
.IP "-s, --set [component][:]=[data]"
Set this URL component. Setting blank string ("") will clear the component
from the URL.
The following components can be set: url, scheme, user, password,
options, host, port, path, query, fragment and zoneid.
If a simple "="-assignment is used, the data is URL encoded when applied. If
":=" is used, the data is assumed to already be URL encoded and will be stored
as-is.
If no URL or \fI--url-file\fP argument is provided, trurl will try to create
a URL using the components provided by the \fI--set\fP options. If not enough
components are specified, this will fail.
.IP "--sort-query"
The "variable=content" tuplets in the query component are sorted in a case
insensitive alphabetical order. This helps making URLs identical that
otherwise only had their query pairs in different orders.
.IP "--url [URL]"
Set the input URL to work with. The URL may be provided without a scheme,
which then typically is not actually a legal URL but trurl will try to figure
out what is meant and guess what scheme to use (unless \fI--no-guess-scheme\fP
is used).
Providing multiple URLs will make trurl act on all URLs in a serial fashion.
If the URL cannot be parsed for whatever reason, trurl will simply move on to
the next provided URL - unless \fI--verify\fP is used.
.IP "--urlencode"
Outputs URL encoded version of components by default when using \fI--get\fP or
\fI--json\fP.
.IP "--trim [component]=[what]"
Trims data off a component. Currently this can only trim a query component.
"what" is specified as a full word or as a word prefix (using a single
trailing asterisk ('*')) which makes trurl remove the tuples from the query
string that match the instruction.
To match a literal trailing asterisk instead of using a wildcard, escape it with
a backslash in front of it. Like "\\*".
.IP "-v, --version"
Show version information and exit.
.IP "--verify"
When a URL is provided, return error immediately if it does not parse as a
valid URL. In normal cases, trurl can forgive a bad URL input.
.IP "--quiet"
Suppress (some) notes and warnings.
.SH "JSON output format"
The \fI--json\fP option outputs a JSON array with one or more objects. One for
each URL.
Each URL JSON object contains a number of properties, a series of key/value
pairs. The exact set depends on the given URL.
.IP "url"
This key exists in every object. It is the complete URL. Affected by
\fI--default-port\fP, \fI--keep-port\fP, and \fI--punycode\fP.
.IP "parts"
This key exists in every object, and contains an object with a key for
each of the settable URL components. If a component is missing, it means
it is not present in the URL. The parts are URL decoded unless \fI--urlencode\fP
is used.
.RS
.TP
.B "scheme"
The URL scheme.
.TP
.B "user"
The user name.
.TP
.B "password"
The password.
.TP
.B "options"
The options. Note that only a few URL schemes support the "options"
component.
.TP
.B "host"
The and normalized host name. It might be a UTF-8 name if an IDN name was used.
It can also be a normalized IPv4 or IPv6 address. An IPv6 address always starts
with a bracket (\fB[\fP) - and no other host names can contain such a symbol. If
\fI--punycode\fP is used, the punycode version of the host is outputted instead.
.TP
.B "port"
The provided port number as a string. If the port number was not provided in the
URL, but the scheme is a known one, and \fI--default-port\fP is in use, the
default port for that scheme will be provided here.
.TP
.B "path"
The path. Including the leading slash.
.TP
.B "query"
The full query, excluding the question mark separator.
.TP
.B "fragment"
The fragment, excluding the pound sign separator.
.TP
.B "zoneid"
The zone id, which can only be present in an IPv6 address. When this key is
present, then \fBhost\fP is an IPv6 numerical address.
.RE
.IP "params"
This key contains an array of query key/value objects. Each such pair is
listed with "key" and "value" and their respective contents in the output.
The key/values are extracted from the query where they are separated by
ampersands (\fB&\fP) - or the user sets with \fB--query-separator\fP.
The query pairs are listed in the order of appearance in a left-to-right
order, but can be made alpha-sorted with \fB--sort-query\fP.
It is only present if the URL has a query.
.SH EXAMPLES
.IP "Replace the host name of a URL"
.nf
$ trurl --url https://curl.se --set host=example.com
https://example.com/
.fi
.IP "Create a URL by setting components"
.nf
$ trurl --set host=example.com --set scheme=ftp
ftp://example.com/
.fi
.IP "Redirect a URL"
.nf
$ trurl --url https://curl.se/we/are.html --redirect here.html
https://curl.se/we/here.html
.fi
.IP "Change port number"
This also shows how trurl will remove dot-dot sequences
.nf
$ trurl --url https://curl.se/we/../are.html --set port=8080
https://curl.se:8080/are.html
.fi
.IP "Extract the path from a URL"
.nf
$ trurl --url https://curl.se/we/are.html --get '{path}'
/we/are.html
.fi
.IP "Extract the port from a URL"
This gets the default port based on the scheme if the port is not set in the
URL.
.nf
$ trurl --url https://curl.se/we/are.html --get '{default:port}'
443
.fi
.IP "Append a path segment to a URL"
.nf
$ trurl --url https://curl.se/hello --append path=you
https://curl.se/hello/you
.fi
.IP "Append a query segment to a URL"
.nf
$ trurl --url "https://curl.se?name=hello" --append query=search=string
https://curl.se/?name=hello&search=string
.fi
.IP "Read URLs from stdin"
.nf
$ cat urllist.txt | trurl --url-file -
\&...
.fi
.IP "Output JSON"
.nf
$ trurl "https://fake.host/search?q=answers&user=me#frag" --json
[
{
"url": "https://fake.host/search?q=answers&user=me#frag",
"parts": [
"scheme": "https",
"host": "fake.host",
"path": "/search",
"query": "q=answers&user=me"
"fragment": "frag",
],
"params": [
{
"key": "q",
"value": "answers"
},
{
"key": "user",
"value": "me"
}
]
}
]
.fi
.IP "Remove tracking tuples from query"
.nf
$ trurl "https://curl.se?search=hey&utm_source=tracker" --trim query="utm_*"
https://curl.se/?search=hey
.fi
.IP "Show a specific query key value"
.nf
$ trurl "https://example.com?a=home&here=now&thisthen" -g '{query:a}'
home
.fi
.IP "Sort the key/value pairs in the query component"
.nf
$ trurl "https://example.com?b=a&c=b&a=c" --sort-query
https://example.com?a=c&b=a&c=b
.fi
.IP "Work with a query that uses a semicolon separator"
.nf
$ trurl "https://curl.se?search=fool;page=5" --trim query="search" --query-separator ";"
https://curl.se?page=5
.fi
.IP "Accept spaces in the URL path"
.nf
$ trurl "https://curl.se/this has space/index.html" --accept-space
https://curl.se/this%20has%20space/index.html
.fi
.IP "Create multiple variations of a URL with different schemes"
.nf
$ trurl "https://curl.se/path/index.html" --iterate "scheme=http ftp sftp"
http://curl.se/path/index.html
ftp://curl.se/path/index.html
sftp://curl.se/path/index.html
.fi
.SH WWW
https://curl.se/trurl
.SH "SEE ALSO"
.BR curl_url_set (3),
.BR curl_url_get (3)