Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for btrfs deduplication #52

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yath
Copy link

@yath yath commented Dec 21, 2015

Thie adds an option -D (or --dedupe) to fdupes which issues the
BTRFS_IOC_FILE_EXTENT_SAME ioctl on files that it detects as duplicates,
causing btrfs to deduplicate the data on disk.

This build option needs to be turned on separately in the Makefile by
setting the HAVE_BTRFS_IOCTL_H preprocessor macro to not break builds
missing this header.

Also fixes a memory leak in escapefilename().

@yath
Copy link
Author

yath commented Dec 21, 2015

The btrfs-enabled duperemove utility has support for consuming fdupes duplicate lists and deduplicating them. Because of this, adding the functionality directly to fdupes itself seems unnecessary.

SLOC Directory SLOC-by-Language (Sorted)
7569 duperemove ansic=7307,sh=262
1373 fdupes ansic=1373

There’s vim, why would anyone need another editor?

Note also that fdupes does a byte-for-byte data check which will be repeated by the btrfs driver anyway, slowing the process down.

Yeah, one could probably skip the byte-for-byte check in that case.

(My fork of fdupes has a semi-documented -Q option to skip byte-for-byte checking which alleviates the redundancy in this use case, but it is unlikely such an option will be added to fdupes itself due to the inherent danger of it.)

You are welcome to merge my patch to your fork. Once it’s landed in Debian I’ll use it.

Thie adds an option -B (or --dedupe) to fdupes which issues the
BTRFS_IOC_FILE_EXTENT_SAME ioctl on files that it detects as duplicates,
causing btrfs to deduplicate the data on disk.

This build option needs to be turned on separately in the Makefile by
setting the HAVE_BTRFS_IOCTL_H preprocessor macro to not break builds
missing this header.

Also fixes a memory leak in escapefilename().
@yath
Copy link
Author

yath commented Dec 21, 2015

Oh, I would also suggest changing -D to -B since a simple typo can result in -d (delete!) instead of -D and if they happen to use the -N with it, it'll just delete half their stuff instead of dedupe.

Good point, thanks. I’ve amended this commit and it also skips verifying the file contents now.

Could you try it out and open a separate issue report on my fork if it doesn't work as expected?

Yep, works. I think you could just rip out the couple of lines starting at https://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/tree/ioctl.h#n344 (IANAL though).

@adrianlopezroche
Copy link
Owner

If you decide to ask Debian (or any other distro) to include your version
of fdupes, I do hope you'll give the executable a different name than
"fdupes".

On Mon, Dec 21, 2015, 11:46 AM Jody Bruchon [email protected]
wrote:

Vim is the only editor I use ;-) As for the line counts, I'd say that's
not automatically a good reason to avoid dupreremove though the fact that
it requires sqlite3 and glib2 to work is a more compelling argument for
inclusion directly into fdupes. My fork is highly unlikely to end up in
Debian, though, and I don't know how to poke them to include it anyway...


Reply to this email directly or view it on GitHub
#52 (comment)
.

@adrianlopezroche
Copy link
Owner

Hello Sebastian. I am reluctant to include system-specific code in mainline
fdupes, but thank you for contributing.

On Mon, Dec 21, 2015, 11:09 AM Sebastian Schmidt [email protected]
wrote:

Thie adds an option -D (or --dedupe) to fdupes which issues the
BTRFS_IOC_FILE_EXTENT_SAME ioctl on files that it detects as duplicates,
causing btrfs to deduplicate the data on disk.

This build option needs to be turned on separately in the Makefile by
setting the HAVE_BTRFS_IOCTL_H preprocessor macro to not break builds
missing this header.

Also fixes a memory leak in escapefilename().

You can view, comment on, or merge this pull request online at:

#52
Commit Summary

  • Add support for btrfs deduplication

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#52.

@sandrotosi
Copy link

with my debian maintainer hat on, I can say that 'fdupes' in debian is tracking https://github.com/adrianlopezroche/fdupes/ and it will keep doing it for the foreseeable future. but I have to admit the development here is a bit on hold, and there are a lot of forks out there which might have interesting features to add to fdupes. @adrianlopezroche do you plan to resume fdupes development? that'd be great!

@adrianlopezroche
Copy link
Owner

Jody Bruchon: Thanks.

On Tue, Dec 22, 2015 at 1:08 AM Jody Bruchon [email protected]
wrote:

@adrianlopezroche https://github.com/adrianlopezroche I renamed my fork
in the code and documentation to set it apart as a completely different
program. If someone decides to add it to a distribution as a package, it
will live on its own and not interfere with any installed fdupes. I hope
this addresses your previous concern sufficiently. See
jbruchon/fdupes-jody@13f6c51
https://github.com/jbruchon/fdupes-jody/commit/13f6c51917256ca30a491e81231d899784096720
for more info.


Reply to this email directly or view it on GitHub
#52 (comment)
.

@adrianlopezroche
Copy link
Owner

Sandro Tosi: I haven't abandoned fdupes. I am much more careful about
adding new features (don't want fdupes to grow too complex), but there's
still some features on which I intend to continue working, and am open to
bugfixes.

On Mon, Dec 21, 2015 at 9:18 PM Sandro Tosi [email protected]
wrote:

with my debian maintainer hat on, I can say that 'fdupes' in debian is
tracking https://github.com/adrianlopezroche/fdupes/ and it will keep
doing it for the foreseeable future. but I have to admit the development
here is a bit on hold, and there are a lot of forks out there which might
have interesting features to add to fdupes. @adrianlopezroche
https://github.com/adrianlopezroche do you plan to resume fdupes
development? that'd be great!


Reply to this email directly or view it on GitHub
#52 (comment)
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants