qsplit: Create manifest files for parallel copy using rsync or robocopy

This document details two versions of qsplit:

Original qsplit, which creates rsync and robocopy manifests
Newer qsplit rsync only, which is better at handling situations where you may be running multiple iterations of rsync

qsplit: Create manifest files for parallel copy using rsync or robocopy

The qsplit utility is used to move data from a Qumulo cluster by using Qumulo file and directory aggregates from the REST API. Qsplit uses the read_dir_aggregates API to build a list of paths (in ~log(n) time) that can be fed to rsync in order to optimize a migration from a Qumulo cluster to another target path.

Using this approach you should see a significant performance improvement over running rsync in the traditional way, rsync -av -r [src] [dest]. The performance should be better for the following reasons:

No file crawl is needed by rsync because we're passing a filespec in --files-from
Multiple instances of rsync run in parallel
Running them on different client machines avoids burying any one NIC and keeps things busy and active

Example usage: python3 qsplit.py --ip 192.168.1.88 -u admin -b 4 /media

A detailed qsplit example

First, a little about the "algorithm":

Divide a Qumulo cluster into N equal partitions. A partition is a list of paths. The partitioning is based on capacity (block count), which is obtained from fs_read_dir_aggregates. (You can also specify partitioning using the argument -a files.)
Feed each partition to an rsync client
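The partitioning step can be illustrated with a minimal sketch. This is not the code qsplit itself uses: it is a generic greedy heuristic that assumes you already have (path, size) pairs, and the function name partition_by_size and the sample data are hypothetical.

```python
import heapq


def partition_by_size(entries, num_buckets):
    """Split (path, size) pairs into num_buckets lists of roughly equal total size.

    Greedy heuristic: take entries largest first and always add the next
    one to the bucket with the smallest running total.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    buckets = [[] for _ in range(num_buckets)]
    # Heap of (running_total, bucket_index); the lightest bucket is on top.
    heap = [(0, i) for i in range(num_buckets)]
    heapq.heapify(heap)
    for path, size in sorted(entries, key=lambda e: e[1], reverse=True):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(path)
        heapq.heappush(heap, (total + size, idx))
    return buckets


# Hypothetical sample data: (relative path, block count)
sample = [
    ("a/big.iso", 700),
    ("b/song.mp3", 300),
    ("c/clip.mov", 400),
    ("d/notes.txt", 100),
    ("e/album", 500),
]
buckets = partition_by_size(sample, 2)
for number, bucket in enumerate(buckets, start=1):
    print(number, bucket)
```

A greedy split like this does not guarantee perfectly equal buckets, but it keeps the totals close when there are many more paths than buckets.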

As an example, I run the command like this:

python3 qsplit.py --ip 192.168.1.88 -b 4 /music

This will create four 'bucket files' for host '192.168.1.88' and path '/music'. A bucket is a list of filepaths, and bucket files use the naming convention split_bucket_[n].txt, where 'n' runs from 1 to the number of buckets specified (four in the example above). If you do not specify a '-b' param, qsplit will create a single bucket with all of the filepaths for the specified source and path.

Once the files are created you can copy them to different machines/NICs to perform rsyncs (or robocopies) in parallel. You could also run the rsyncs on a single machine with separate processes but you'd likely bury the machine NIC with traffic that way. So one way to use these manifests is:

Copy the text files produced by qsplit to somewhere the client machines can reach them
ssh to [n] different client machines with separate NICs
Mount the cluster [src] and [dest] on each machine
On each machine run rsync in the following fashion:

rsync -av -r --files-from=split_bucket_[n].txt [src qumulo cluster mount] [target cluster mount]
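As a rough sketch of how the per-machine commands line up with the bucket files, the snippet below builds one rsync argument list per bucket. The helper name rsync_command and the mount paths /mnt/src/music and /mnt/dest/music are assumptions made for illustration; only the rsync flags and the split_bucket_[n].txt naming come from the text above.

```python
def rsync_command(bucket_number, src_mount, dest_mount):
    """Return the rsync argument list for one bucket file."""
    manifest = "split_bucket_{}.txt".format(bucket_number)
    return ["rsync", "-av", "-r", "--files-from=" + manifest, src_mount, dest_mount]


# Hypothetical mount points; one command per bucket for a four-bucket run.
commands = [
    rsync_command(n, "/mnt/src/music", "/mnt/dest/music") for n in range(1, 5)
]
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line is meant to be run on a different client machine. The lists could also be handed to subprocess.run if you prefer to launch rsync from Python.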

NOTE: the file paths in the bucket text files are all relative to the path specified when running qsplit. If you created filepaths for '/music', then '/music' should be your [src qumulo cluster mount] point so that the relative filepaths can resolve.
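To make the relative-path point concrete, here is a small sketch of how an entry from a bucket file resolves against the source mount. The mount point /mnt/qumulo/music and the sample entry are hypothetical, and the sketch assumes nothing about whether entries carry a leading slash (it strips one if present).

```python
import posixpath


def resolve(src_mount, relative_path):
    """Join a bucket-file entry onto the mount point of the path given to qsplit."""
    # Strip any leading slash so the entry is always treated as relative.
    return posixpath.join(src_mount, relative_path.lstrip("/"))


# Hypothetical mount of the cluster's /music directory on a client machine.
print(resolve("/mnt/qumulo/music", "rock/track01.flac"))
```

If the mount pointed at the cluster root instead of '/music', the same entry would resolve to a path that does not exist.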

Windows/robocopy option

qsplit.py now also offers a --robocopy (or -r) option for Windows environments which writes out file specs using backslashes rather than forward slashes:

python3 qsplit.py -r --ip 192.168.1.88 -u admin -b 4 /media
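The difference in the output amounts to a separator change. The following is only an illustration of that conversion, not the code qsplit.py runs; the function name to_windows_spec and the sample path are hypothetical.

```python
def to_windows_spec(path):
    """Rewrite a forward-slash file spec with backslashes."""
    return path.replace("/", "\\")


print(to_windows_spec("rock/album/track01.flac"))
```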

qsplit rsync only: Create manifest files for parallel copy using rsync

Example usage: python3 qsplit-rsync-only.py --host 192.168.1.88 -b 4 /music

This will create four files that can be used with a command like the following:

rsync --filter '. rsync-filter-001.txt' -a Q/ T/

Prerequisites

Python 2.7

  pip install -r requirements.txt

You can verify that you have the Qumulo REST API installed by running the following command at a command prompt:

  pip list

You should see something like the following output:

...
qumulo-api (2.6.10)
...
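The same check can be done from Python. This sketch assumes a Python 3.8 or later interpreter, where importlib.metadata is part of the standard library; the helper name installed_version is hypothetical, and the package name qumulo-api is taken from the pip list output above.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("qumulo-api"))
```

A result of None means the package is not installed in the interpreter you ran the check with.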
