CARTA uses a mixed-integer linear program (MILP) to solve a constrained maximum parsimony problem, inferring (i) a cell differentiation map and (ii) an ancestral cell type labeling for a set of cell lineage trees.
CARTA takes as input:
- Cell lineage trees
- Terminal cell type annotations for each cell in each cell lineage tree
- An integer constraint specifying the number of progenitors in the inferred cell differentiation map
CARTA requires the following dependencies:
- python3 (version 3.9 is required to build cassiopeia correctly)
- numpy
- pandas
- gurobipy
- networkx
- cassiopeia
- ete3
- snakemake (>=5.2.0), needed only for generating simulation and real data instances
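A quick way to confirm that an environment matches the list above is sketched below. It is only an illustration: the helper name `check_environment` is hypothetical, and it assumes each package's import name is the same as the name listed here.

```python
import importlib.util
import sys

# Assumption: each import name matches the package name in the list above.
REQUIRED_MODULES = ["numpy", "pandas", "gurobipy", "networkx", "cassiopeia", "ete3"]


def check_environment(modules=REQUIRED_MODULES):
    """Return (python_ok, missing).

    python_ok is True when the interpreter is Python 3.9, the version the
    list above says is needed to build cassiopeia. missing lists every
    module name that cannot be located in the current environment.
    """
    python_ok = sys.version_info[:2] == (3, 9)
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python 3.9:", "yes" if python_ok else "no")
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

The check only locates the modules and does not import them, so it does not verify that a Gurobi license is available.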
The inputs to CARTA are specified as follows:
- A tab-delimited file, which has on each line the locations of the newick and state annotation files of the set of cell lineage trees over which to infer the cell differentiation map.
- Example: data/gastruloid/TLS_locations.txt
- An integer k specifying the number of progenitors in the inferred cell differentiation map. k is zero-indexed, i.e. k = 0 specifies only the root progenitor.
- A file containing all terminal cell types must be provided, with each cell type on its own line.
- Example: data/gastruloid/TLS_states.txt
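The two input files described above can be produced with a few lines of Python. The sketch below is illustrative only: the helper names and the file names used with them are hypothetical, and it assumes that each line of the locations file lists the newick path first and the state annotation path second, with no header row. Compare against data/gastruloid/TLS_locations.txt and data/gastruloid/TLS_states.txt before relying on it.

```python
import csv
from pathlib import Path


def write_locations_file(path, tree_and_label_paths):
    """Write one tab-delimited line per tree.

    Assumed column order: newick file path, then state annotation file path.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for newick_path, labels_path in tree_and_label_paths:
            writer.writerow([newick_path, labels_path])


def write_states_file(path, cell_types):
    """Write each terminal cell type on its own line."""
    Path(path).write_text("".join(f"{cell_type}\n" for cell_type in cell_types))


def read_locations_file(path):
    """Read the locations file back as a list of (newick, labels) tuples."""
    with open(path, newline="") as handle:
        return [tuple(row) for row in csv.reader(handle, delimiter="\t") if row]
```

For example, `write_locations_file("my_locations.txt", [("trees/tree_1.nwk", "labels/tree_1.txt")])` writes a single line with the two paths separated by a tab.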
usage: run_ilp.py [--prefix PREFIX] [-k K] [--file_locations FILE_LOCATIONS] [--states_file STATES_FILE] [--normalize_method NORMALIZE_METHOD] [--time_limit_min TIME_LIMIT_MIN] [--enforce_tree]
required arguments:
--prefix PREFIX filepath for folder at which to store output files
-k K number of progenitors in the output
--file_locations FILE_LOCATIONS txt file with newick and state annotation file locations
--states_file STATES_FILE file containing the terminal states
optional arguments:
--normalize_method NORMALIZE_METHOD The weights for each terminal cell state corresponding to w_s(t). Default is w_s(t) = 1 for each terminal cell type
--time_limit_sec TIME_LIMIT_SEC The time limit in seconds. Default is 6 hours.
--enforce_tree Whether to enforce that the output cell differentiation map is a tree. Default is False
An example of usage is as follows. This command can be run from the directory that contains this README file.
python src/run_ilp.py --prefix test -k 5 --file_locations data/gastruloid/TLS_locations.txt --states_file data/gastruloid/TLS_states.txt
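When the command is launched from a script, for instance to try several values of k, it can help to assemble the argument list programmatically. The helper below is a hypothetical sketch, not part of CARTA. It uses only the flags shown in the usage text above and assumes the interpreter is available as `python`.

```python
import shlex


def build_run_ilp_command(prefix, k, file_locations, states_file,
                          script="src/run_ilp.py"):
    """Return the argument list for one run_ilp.py invocation.

    Only the four required arguments from the usage text are set.
    """
    if k < 0:
        # k = 0 specifies only the root progenitor, so negative values are rejected.
        raise ValueError("k must be a non-negative integer")
    return [
        "python", script,
        "--prefix", prefix,
        "-k", str(k),
        "--file_locations", file_locations,
        "--states_file", states_file,
    ]


if __name__ == "__main__":
    command = build_run_ilp_command(
        "test", 5,
        "data/gastruloid/TLS_locations.txt",
        "data/gastruloid/TLS_states.txt",
    )
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the directory that contains this README file.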
To generate a cell differentiation map from the progenitors output by CARTA, with each edge weighted by the number of cells in the dataset that traverse it, run the following command.
python src/build_DAG_from_labeled_trees.py --prefix test --file_locations data/gastruloid/TLS_locations.txt --states_file data/gastruloid/TLS_states.txt --node_labels_file test_nodeLabels.txt
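To make the edge weights concrete, the sketch below counts edge traversals on a toy labeled tree. It is not the implementation in build_DAG_from_labeled_trees.py. It assumes one plausible reading of "traverse", namely that a cell traverses an edge (u, v) of the map when its root-to-leaf path steps from a node labeled u to a node labeled v. The data structures and the labels `P0`, `P1`, `A`, `B`, `C` are hypothetical.

```python
from collections import Counter


def count_edge_traversals(parent, label):
    """Count, per (source label, target label) pair, the leaves whose
    root-to-leaf path crosses that pair.

    parent maps each non-root node to its parent node.
    label maps every node to its progenitor or cell type label.
    """
    internal = set(parent.values())
    leaves = [node for node in label if node not in internal]
    weights = Counter()
    for leaf in leaves:
        crossed = set()  # each leaf contributes at most once per edge
        node = leaf
        while node in parent:
            above, below = label[parent[node]], label[node]
            if above != below:  # no edge when the label is unchanged
                crossed.add((above, below))
            node = parent[node]
        weights.update(crossed)
    return weights


# Toy tree: root r has children a and b; a has children c and d.
toy_parent = {"a": "r", "b": "r", "c": "a", "d": "a"}
toy_label = {"r": "P0", "a": "P1", "b": "C", "c": "A", "d": "B"}
toy_weights = count_edge_traversals(toy_parent, toy_label)
```

In this toy tree the leaves c and d both pass through the progenitor P1, so the edge (P0, P1) receives weight 2, while (P0, C), (P1, A), and (P1, B) each receive weight 1.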
Currently, the newick files encoding the TLS cell lineage trees are stored in data/gastruloid/input_trees, and the metadata files containing the cell type annotations are stored in data/gastruloid/formatted_and_reduced_labels.