We encountered a problem while analyzing data with jcvi. Below are the steps we took and the issue we ran into.
Firstly, we prepared bed and cds files for 2 species, using the same file format for both. One species was named LHG and the other Species2.
The format of bed is shown below (LHG_scaffold_3):
Secondly, we performed ortholog analysis for LHG against all other species, using a command like `python -m jcvi.compara.catalog ortholog {main_species} {other_species} --no_strip_names`, to extract collinearity information for all genes in LHG_scaffold_3.
We obtained the collinearity files for LHG_scaffold_3 against each species. We show the content of each file (the entries for gene SGR00005418.1 as they appear in each file), such as:
LHG_scaffold_3.Species2_v2_5.anchors
LHG_scaffold_3.Species2_v2_5.lifted.anchors
Thirdly, we extracted several genes (including SGR00005418.1) from LHG_scaffold_3, instead of using all genes in LHG_scaffold_3, and reran the analysis.
LHG_UGTscaffold_3.Species2_v2_5.lifted.anchors: was not generated.
LHG_UGTscaffold_3.Species2_v2_5.anchors: The file is empty, so gene SGR00005418.1 is not in it.
The question is: in the second step, when inputting all genes, gene SGR00005418.1 found anchors in the other species. But in the third step, when inputting only a subset of genes, gene SGR00005418.1 did not find any anchors in the other species. Are there any filtering steps between the last.filtered file and the anchors file? (By the way, I also tried changing --min_size to 1 or 0, still no luck.)
Would you please give me some suggestions? When we focus on specific gene sets, should we feed only those gene sets or the whole scaffold to the jcvi.compara.catalog step?
Thank you very much.
In your third step, did you include the neighbors that formed the synteny block close to SGR00005418.1?
Filtering the input files down to a subset of the genes is not recommended, since jcvi relies on the LAST hits from other genes (even those that do not end up in the blocks) as part of the filtering decision. So subsetting may lead to unexpected results.
Would you please explain a bit why you want to run on subsets, rather than extracting what you need from the full run?
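For reference, extracting from the full run could look roughly like the sketch below. It is only an illustration, not part of jcvi: it assumes the `.anchors` file uses `###` lines to separate synteny blocks and whitespace-separated columns with the two gene IDs first. The helper name `blocks_with_genes` and the `Sp2_g*` partner gene IDs in the example are hypothetical.

```python
from typing import Iterable, List


def blocks_with_genes(anchors_text: str, genes: Iterable[str]) -> List[List[str]]:
    """Return the synteny blocks (as lists of raw lines) that mention any wanted gene.

    Assumption: blocks are separated by lines starting with '#', and the first
    two whitespace-separated columns of each data line are the paired gene IDs.
    """
    wanted = set(genes)
    blocks: List[List[str]] = []
    current: List[str] = []
    for raw in anchors_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A separator line closes the block collected so far.
            if current:
                blocks.append(current)
            current = []
            continue
        current.append(line)
    if current:
        blocks.append(current)
    # Keep whole blocks so the neighbors of each gene of interest are preserved.
    return [
        block
        for block in blocks
        if any(wanted & set(line.split()[:2]) for line in block)
    ]


# Made-up example content; only SGR00005418.1 comes from this thread.
example = (
    "###\n"
    "SGR00005417.1\tSp2_g1\t500\n"
    "SGR00005418.1\tSp2_g2\t450\n"
    "###\n"
    "SGR00009999.1\tSp2_g9\t300\n"
)
hits = blocks_with_genes(example, ["SGR00005418.1"])
```

With a real run, the text would come from reading the full `.anchors` file produced in the second step. Keeping entire blocks rather than single lines retains the neighboring gene pairs that support each anchor.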
In my third step, I included the neighbors that formed the synteny block close to SGR00005418.1. However, only two genes are close to SGR00005418.1, such as:
The reason I want to run on subsets is that the full dataset is large. I want to simplify the data and analyze only the genes I am interested in, which would be more convenient.
To my surprise, the data processing results were different.