forked from simonsj/fdupes-jody
-
Notifications
You must be signed in to change notification settings - Fork 0
/
jdupes.1
339 lines (311 loc) · 11.7 KB
/
jdupes.1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
.TH JDUPES 1
.\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection
.\" other parms are allowed: see man(7), man(1)
.SH NAME
jdupes \- finds and performs actions upon duplicate files
.SH SYNOPSIS
.B jdupes
[
.I options
]
.I FILES and/or DIRECTORIES
\|.\|.\|.
.SH "DESCRIPTION"
Searches the given path(s) for duplicate files. Such files are found by
comparing file sizes, then partial and full file hashes, followed by a
byte-by-byte comparison. The default behavior with no other "action
options" specified (delete, summarize, link, dedupe, etc.) is to print
sets of matching files.
.SH OPTIONS
.TP
.B -@ --loud
output annoying low-level debug info while running
.TP
.B -0 --printnull
when printing matches, use null bytes instead of CR/LF bytes, just
like 'find -print0' does. This has no effect with any action mode other
than the default "print matches" (delete, link, etc. will still print
normal line endings in the output.)
.TP
.B -1 --one-file-system
do not match files that are on different filesystems or devices
.TP
.B -A --nohidden
exclude hidden files from consideration
.TP
.B -B --dedupe
issue the btrfs same-extents ioctl to trigger a deduplication on
disk. The program must be built with btrfs support for this option
to be available
.TP
.B -C --chunksize=\fIBYTES\fR
set the I/O chunk size manually; larger values may improve performance
on rotating media by reducing the number of head seeks required, but
also increases memory usage and can reduce performance in some cases
.TP
.B -D --debug
if this feature is compiled in, show debugging statistics and info
at the end of program execution
.TP
.B -d --delete
prompt user for files to preserve, deleting all others (see
.B CAVEATS
below)
.TP
.B -f --omitfirst
omit the first file in each set of matches
.TP
.B -H --hardlinks
normally, when two or more files point to the same disk area they are
treated as non-duplicates; this option will change this behavior
.TP
.B -h --help
displays help
.TP
.B -i --reverse
reverse (invert) the sort order of matches
.TP
.B -I --isolate
isolate each command-line parameter from one another; only match if the
files are under different parameter specifications
.TP
.B -L --linkhard
replace all duplicate files with hardlinks to the first file in each set
of duplicates
.TP
.B -m --summarize
summarize duplicate file information
.TP
.B -M --printwithsummary
print matches and summarize the duplicate file information at the end
.TP
.B -N --noprompt
when used together with \-\-delete, preserve the first file in each set of
duplicates and delete the others without prompting the user
.TP
.B -n --noempty
exclude zero-length files from consideration; this option is the default
behavior and does nothing (also see \fB\-z/--zeromatch\fP)
.TP
.B -O --paramorder
parameter order preservation is more important than the chosen sort; this
is particularly useful with the \fB\-N\fP option to ensure that automatic
deletion behaves in a controllable way
.TP
.B -o --order\fR=\fIWORD\fR
order files according to WORD:
time - sort by modification time
name - sort by filename (default)
.TP
.B -p --permissions
don't consider files with different owner/group or permission bits as
duplicates
.TP
.B -P --print=type
print extra information to stdout; valid options are:
early - matches that pass early size/permission/link/etc. checks
partial - files whose partial hashes match
fullhash - files whose full hashes match
.TP
.B -Q --quick
.B [WARNING: RISK OF DATA LOSS, SEE CAVEATS]
skip byte-for-byte verification of duplicate pairs (use hashes only)
.TP
.B -q --quiet
hide progress indicator
.TP
.B -R --recurse:
for each directory given after this option follow subdirectories
encountered within (note the ':' at the end of option; see the
Examples section below for further explanation)
.TP
.B -r --recurse
for every directory given follow subdirectories encountered within
.TP
.B -l --linksoft
replace all duplicate files with symlinks to the first file in each set
of duplicates
.TP
.B -S --size
show size of duplicate files
.TP
.B -s --symlinks
follow symlinked directories
.TP
.B -T --partial-only
.B [WARNING: EXTREME RISK OF DATA LOSS, SEE CAVEATS]
match based on hash of first block of file data, ignoring the rest
.TP
.B -v --version
display jdupes version and compilation feature flags
.TP
.B -x --xsize=[+]SIZE (NOTE: deprecated in favor of \-X)
exclude files of size less than SIZE from consideration, or if SIZE is
prefixed with a '+' i.e.
jdupes -x +226 [files]
then exclude files larger than SIZE. Suffixes K/M/G can be used.
.TP
.B -X --extfilter=spec:info
exclude/filter files based on specified criteria; supported specs are:
.RS
.IP `size[+-=]:number[suffix]'
Match only if size is greater (+), less than (-), or equal to (=) the
specified number, with an optional multiplier suffix. The +/- and =
specifiers can be combined: "size+=:4K" will match if size is greater
than or equal to four kilobytes (4096 bytes). Suffixes supported are
K/M/G/T/P/E with a B or iB extension (all case-insensitive); no extension
or an iB extension specify binary multipliers while a B extension
specifies decimal multipliers (ex: 4K or 4KiB = 4096, 4KB = 4000.)
.RE
.TP
.B -z --zeromatch
consider zero-length files to be duplicates; this replaces the old
default behavior when \fB\-n\fP was not specified
.TP
.B -Z --softabort
if the user aborts the program (as with CTRL-C) act on the matches that
were found before the abort was received. For example, if -L and -Z are
specified, all matches found prior to the abort will be hard linked. The
default behavior without -Z is to abort without taking any actions.
.SH NOTES
A set of arrows are used in hard linking to show what action was taken on
each link candidate. These arrows are as follows:
.TP
.B ---->
This file was successfully hard linked to the first file in the duplicate
chain
.TP
.B -@@->
This file was successfully symlinked to the first file in the chain
.TP
.B -==->
This file was already a hard link to the first file in the chain
.TP
.B -//->
Linking this file failed due to an error during the linking process
.PP
Duplicate files are listed together in groups with each file displayed on a
separate line. The groups are then separated from each other by blank lines.
.SH EXAMPLES
.TP
.B jdupes a --recurse: b
will follow subdirectories under b, but not those under a.
.TP
.B jdupes a --recurse b
will follow subdirectories under both a and b.
.TP
.B jdupes -O dir1 dir3 dir2
will always place 'dir1' results first in any match set (where relevant)
.SH CAVEATS
Using
.B \-1
or
.BR \-\-one\-file\-system
prevents matches that cross filesystems, but a more relaxed form of this
option may be added that allows cross-matching for all filesystems that
each parameter is present on.
When using
.B \-d
or
.BR \-\-delete ,
care should be taken to insure against accidental data loss.
.B \-Z
or
.BR \-\-softabort
used to be --hardabort in jdupes prior to v1.5 and had the opposite behavior.
Defaulting to taking action on abort is probably not what most users would
expect. The decision to invert rather than reassign to a different option
was made because this feature was still fairly new at the time of the change.
The
.B \-O
or
.BR \-\-paramorder
option allows the user greater control over what appears in the first
position of a match set, specifically for keeping the \fB\-N\fP option
from deleting all but one file in a set in a seemingly random way. All
directories specified on the command line will be used as the sorting
order of result sets first, followed by the sorting algorithm set by
the \fB\-o\fP or \fB\-\-order\fP option. This means that the order of
all match pairs for a single directory specification will retain the
old sorting behavior even if this option is specified.
When used together with options
.B \-s
or
.BR \-\-symlink ,
a user could accidentally preserve a symlink while deleting the file it
points to.
The
.B \-Q
or
.BR \-\-quick
option only reads each file once, hashes it, and performs comparisons
based solely on the hashes. There is a small but significant risk of a
hash collision which is the purpose of the failsafe byte-for-byte
comparison that this option explicitly bypasses. Do not use it on ANY data
set for which any amount of data loss is unacceptable. This option is not
included in the help text for the program due to its risky nature.
.B You have been warned!
The
.B \-T
or
.BR \-\-partial\-only
option produces results based on a hash of the first block of file data
in each file, ignoring everything else in the file. Partial hash checks
have always been an important exclusion step in the jdupes algorithm,
usually hashing the first 4096 bytes of data and allowing files that are
different at the start to be rejected early. In certain scenarios it may
be a useful heuristic for a user to see that a set of files has the same
size and the same starting data, even if the remaining data does not
match; one example of this would be comparing files with data blocks that
are damaged or missing such as an incomplete file transfer or checking a
data recovery against known-good copies to see what damaged data can be
deleted in favor of restoring the known-good copy. This option is meant
to be used with informational actions and
.B can result in EXTREME DATA LOSS
if used with options that delete files, create hard links, or perform
other destructive actions on data based on the matching output. Because
of the potential for massive data destruction,
.B this option MUST BE SPECIFIED TWICE
to take effect and will error out if it is only specified once.
Using the
.B \-C
or
.BR \-\-chunksize
option to override I/O chunk size can increase performance on rotating
storage media by reducing "head thrashing," reading larger amounts of
data sequentially from each file. This tunable size can have bad side
effects; the default size maximizes algorithmic performance without
regard to the I/O characteristics of any given device and uses a modest
amount of memory, but other values may greatly increase memory usage or
incur a lot more system call overhead. Try several different values to
see how they affect performance for your hardware and data set. This
option does not affect match results in any way, so even if it slows
down the file matching process it will not hurt anything.
.SH REPORTING BUGS
Send bug reports to [email protected] or use the issue tracker at:
http://github.com/jbruchon/jdupes/issues
.SH SUPPORTING DEVELOPMENT
If you find this program useful, please consider financially supporting
its continued development by visiting the following URL:
https://www.subscribestar.com/JodyBruchon
.SH AUTHOR
jdupes is created and maintained by Jody Bruchon <[email protected]>
and was forked from fdupes 1.51 by Adrian Lopez <[email protected]>
.SH LICENSE
The MIT License (MIT)
Copyright (C) 2015-2020 Jody Lee Bruchon and contributors
Forked from fdupes 1.51, Copyright (C) 1999-2014 Adrian Lopez and contributors
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.