segment_liftover is a Python tool for converting segments between genome assemblies without breaking them apart. See the main README for an introduction and installation instructions.
Usage: segment_liftover [OPTIONS]

Options:
  -i, --input_dir TEXT             The directory to start processing.
  -o, --output_dir TEXT            The directory to write new files.
  -c, --chain_file TEXT            Specify the chain file name.
  -si, --segment_input_file TEXT   Specify the segment input file name.
  -so, --segment_output_file TEXT  Specify the segment output file name.
  -pi, --probe_input_file TEXT     Specify the probe input file name.
  -po, --probe_output_file TEXT    Specify the probe output file name.
  -l, --liftover TEXT              Specify the location of the UCSC liftover program.
  -t, --test_mode INTEGER          Only process a limited number of files.
  -f, --file_indexing              Only generate the index file.
  -x, --index_file FILENAME        Specify an index file containing file paths.
  -m, --mapping_file FILENAME      Specify a pre-defined file of position mappings.
  --step_size INTEGER              The step size of approximate conversion (in bases, default:400).
  --range INTEGER                  The searching range of approximate conversion (in kilo bases, default:10).
  --beta FLOAT                     Parameter in quality control.
  --no_approximate_conversion      Do not perform approximate conversion.
  --new_segment_header TEXT...     Specify 4 new column names for new segment files.
  --new_probe_header TEXT...       Specify 3 new column names for new probe files.
  --resume TEXT...                 Specify a index file and a progress file to resume an interrupted job.
  --demo TEXT                      Copy example files to a user defined directory and run a demonstration.
  --log_path TEXT                  Specify the directory to write logging files.
  --help                           Show this message and exit.

segment_liftover is designed for batch processing, so it requires the user to provide an input directory, an output directory, and a pattern for the input file names.
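As a rough illustration of these required pieces, the following Python sketch assembles a command line from the options documented here. The directory names and the file pattern are placeholders chosen for this example, not values from the project, and the helper build_command is hypothetical. The command is only printed, not executed.

```python
import shlex


def build_command(input_dir, output_dir, chain, segment_pattern):
    """Assemble a segment_liftover argument list from the documented options."""
    return [
        "segment_liftover",
        "-i", input_dir,          # root directory to traverse
        "-o", output_dir,         # where converted files are written
        "-c", chain,              # chain file name or keyword
        "-si", segment_pattern,   # pattern for segment input files
    ]


# Placeholder paths and pattern, used only for illustration.
cmd = build_command("/data/hg18", "/data/hg19", "hg18ToHg19", "segments.tsv")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run once the paths point at real data.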
-i, --input_dir TEXT
The root directory of all the files that need to be processed. The program will traverse all the sub-directories and index all the files that match the user-defined pattern.
-o, --output_dir TEXT
Where the converted files should be stored. It keeps the same directory structure as the input directory.
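The traversal and the mirrored output layout can be pictured with a short Python sketch. This is not the tool's own code: the function names are hypothetical, and matching the whole file name against the pattern is an assumption made for the example.

```python
import os
import re


def index_files(input_dir, pattern):
    """Walk input_dir and return the sorted paths whose file name matches pattern."""
    regex = re.compile(pattern)
    matches = []
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            # Assumption: the whole file name has to match the pattern.
            if regex.fullmatch(name):
                matches.append(os.path.join(root, name))
    return sorted(matches)


def mirrored_path(input_path, input_dir, output_dir):
    """Map an input file to the same relative location under output_dir."""
    relative = os.path.relpath(input_path, input_dir)
    return os.path.join(output_dir, relative)
```

For example, under these assumptions an input file at input_dir/a/b/segments.tsv would be written to output_dir/a/b/segments.tsv.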
-l, --liftover TEXT
segment_liftover depends on the UCSC liftOver program. By default, segment_liftover looks for liftOver on the system path. Users can also provide the path to liftOver manually.
-c, --chain_file TEXT
segment_liftover uses the UCSC liftOver program to perform the basic conversion. Since liftOver requires a chain file for conversion, so does segment_liftover.
segment_liftover provides common human chain files as keywords: hg18ToHg19, hg18ToHg38, hg19ToHg38, hg19ToHg18, hg38ToHg19.
Other chain files can be downloaded from the UCSC Genome Browser download site.
-si, --segment_input_file TEXT
-pi, --probe_input_file TEXT
At least one input file name needs to be given. It is used for the automatic file traversal and discovery in the input directory.
The TEXT is interpreted as a regular expression.
As a regular expression, the "." in segment.tsv matches any single character, so the input segment.tsv behaves like the shell pattern segment?tsv. Most of the time this is not a problem.
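The effect of the unescaped dot can be checked with Python's re module. Whether segment_liftover matches the full file name or only part of it is not stated here, so the use of fullmatch below is an assumption made for the demonstration.

```python
import re

loose = re.compile("segment.tsv")     # "." matches any single character
strict = re.compile(r"segment\.tsv")  # escaped dot matches only a literal "."

for name in ["segment.tsv", "segment_tsv", "segmentXtsv"]:
    # Print whether each pattern accepts the candidate file name.
    print(name, bool(loose.fullmatch(name)), bool(strict.fullmatch(name)))
```

If stray matches such as segment_tsv would be a problem in a particular directory tree, escaping the dot in the pattern is the usual remedy.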
-so, --segment_output_file TEXT
-po, --probe_output_file TEXT
When specified, the output files are renamed; otherwise, the input file names are kept.
-f, --file_indexing
Traversing a large or complex directory may take hours to complete. With this option, the program stops after generating the index file, which is saved as ./logs/fileList.log. This is particularly useful for running jobs in parallel.
-x, --index_file FILENAME
Instead of beginning with file traversal, which may be unnecessary to repeat, segment_liftover can start from a pre-defined or previously generated index file.
Each line of the file should be the path of an input file; nothing else should be in this file.
All the required options are still needed to generate the directory structure and to distinguish segment files from probe files.
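One way to use an index file for parallel runs is to split it into several smaller index files and start one segment_liftover process per file with -x. The sketch below is only an illustration of that idea: the helper names and the chunk file naming are hypothetical, and it assumes the index file has exactly one path per line as described above.

```python
def split_index(paths, n_chunks):
    """Distribute paths round-robin into n_chunks lists (some may be empty)."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    return [paths[i::n_chunks] for i in range(n_chunks)]


def write_chunks(index_file, n_chunks, prefix):
    """Read one path per line from index_file and write <prefix>_<k>.log files."""
    with open(index_file) as handle:
        paths = [line.strip() for line in handle if line.strip()]
    names = []
    for k, chunk in enumerate(split_index(paths, n_chunks)):
        name = f"{prefix}_{k}.log"
        with open(name, "w") as out:
            # One path per line, nothing else, as the index format requires.
            out.write("\n".join(chunk) + ("\n" if chunk else ""))
        names.append(name)
    return names
```

Whether concurrent runs can safely share one log directory is not stated here, so giving each run its own --log_path is a cautious choice.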
-t, --test_mode INTEGER RANGE
When this option is given, the program will only process a limited number of files.
--no_approximate_conversion
When this option is given, the program will only use conversion results from liftOver.
-m, --mapping_file FILENAME
After each successful run, segment_liftover writes a log of all re-converted segments and probes to ./logs/approximate_conversion.log.
This file can be reused as the mapping file to reduce processing time substantially when the segments or probes come from the same or a similar pipeline or platform.
--step_size INTEGER
--range INTEGER
When liftOver fails to convert a segment or probe, segment_liftover tries to re-convert it by approximation. The region surrounding the failed position is searched, and the closest convertible position is used as a substitute. The distance between two searched positions and the maximum distance of the search are defined by step_size and range, respectively.
The default settings are step_size = 400 and range = 10. step_size is counted in bases and range in kilobases.
These options have a significant impact on the processing time of re-conversion. In general, they should be chosen in relation to the average distance between adjacent probes on the experiment platform.
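The search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: convert is a hypothetical callable that returns the converted position or None when liftOver fails, and the order in which the two sides are tried is chosen arbitrarily for the example.

```python
def approximate_convert(position, convert, step_size=400, search_range_kb=10):
    """Return (source_position, converted_position) for the nearest convertible
    neighbour of position, or None if nothing within the range converts.

    convert is a hypothetical callable: position -> converted position or None.
    """
    max_distance = search_range_kb * 1000  # range is given in kilobases
    distance = step_size
    while distance <= max_distance:
        # Try both sides at the same distance before moving further out.
        for candidate in (position - distance, position + distance):
            if candidate < 0:
                continue  # skip positions before the start of the chromosome
            converted = convert(candidate)
            if converted is not None:
                return candidate, converted
        distance += step_size
    return None
```

In this sketch the defaults allow at most 2 × (10000 / 400) = 50 candidate positions per failed conversion, which gives a feel for why a smaller step_size or a larger range increases the run time.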
--beta FLOAT
This parameter controls the strictness of one of the quality control conditions; the default value is 2. Please refer to the paper for more details.
--new_segment_header TEXT
--new_probe_header TEXT
When specified, the output files will use the given new column names in the header.
--resume TEXT
Two files are needed to resume from an interruption: the index file and the progress file. Both are log files generated by segment_liftover and saved as output_dir/logs/fileList.log and output_dir/logs/progress.log.
--log_path TEXT
By default, all log files are saved in output_dir/logs/. Users can also specify a different directory for log files.
--demo TEXT
When a directory path is provided, an example data set is copied to that directory and segment_liftover is then run on it with default parameters.