Genome length normalisation + kraken2 confidence score threshold #3

Open
fconstancias opened this issue Sep 27, 2022 · 6 comments

@fconstancias

fconstancias commented Sep 27, 2022

Dear @SilasK,

Thanks for your hard work in the field!

I am analysing a mouse metagenome dataset and trying different tools and approaches for an exhaustive characterisation of the taxonomic composition of the fecal metagenomes (i.e., metaphlan4, mOTUs, kraken-bracken on CMMG).

I have 3 questions (so far) regarding the kraken pipeline:

  • Do you have any recommendation regarding the confidence score threshold --confidence when working with CMMG and mouse fecal metagenomes?
  • How can I access the lengths of the genomes you used to build the kraken-bracken CMMG databases? Is it possible to retrieve this information from the kraken2 database files? I would like to perform genome length normalisation.
  • How could I get the taxonomic path of the reference genomes, so that I can agglomerate data at higher taxonomic ranks?
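
To make the second point concrete, the normalisation I have in mind would look roughly like the sketch below. It assumes a samples x genomes read-count table and a per-genome length table. The names `normalise_by_genome_length`, `counts` and `genome_lengths`, as well as the toy values, are hypothetical; where the real lengths come from is exactly what I am asking.

```python
import pandas as pd


def normalise_by_genome_length(counts, genome_lengths):
    """Convert read counts (samples x genomes) to length-normalised relative abundances."""
    missing = counts.columns.difference(genome_lengths.index)
    if len(missing) > 0:
        raise KeyError(f"No genome length for: {list(missing)}")
    # Reads per base: at equal abundance, a longer genome recruits more reads.
    per_base = counts.div(genome_lengths.loc[counts.columns], axis="columns")
    # Rescale each sample (row) so that its abundances sum to 1.
    return per_base.div(per_base.sum(axis="columns"), axis="index")


# Toy data (made up): two samples, two genomes.
counts = pd.DataFrame(
    {"genome_A": [100, 300], "genome_B": [100, 100]},
    index=["sample_1", "sample_2"],
)
genome_lengths = pd.Series({"genome_A": 2_000_000, "genome_B": 4_000_000})

relative_abundance = normalise_by_genome_length(counts, genome_lengths)
print(relative_abundance)
```

In this toy case both genomes receive 100 reads in sample_1, but genome_A is half as long, so it ends up with two thirds of the normalised abundance.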

Thanks for your help.

Best,

Florentin

@SilasK
Owner

SilasK commented Sep 29, 2022

Sorry for the delay.

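```python
import pandas as pd


def rename_kraken_table(kraken_table, Tax):
    """
    Rename the columns of a species-level Kraken table from species names
    to the matching index labels of the taxonomy table `Tax`.

    The table is modified in place and also returned.
    """
    assert kraken_table.columns.is_unique

    # Map each species name to its index label in Tax
    species2ref = pd.Series(Tax.index, index=Tax.species)
    # If a species occurs more than once, keep only its first entry
    species2ref = species2ref.loc[~species2ref.index.duplicated(keep="first")]

    # Raises KeyError if a column is not a species listed in Tax
    kraken_table.columns = species2ref.loc[kraken_table.columns].to_numpy()
    return kraken_table
```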

(Let me test this function once again)
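
Once the columns are genome indexes, agglomerating to a higher rank is essentially a group-by on the taxonomy table. A minimal sketch, assuming `Tax` is indexed by genome ID and has one column per rank. The function name `agglomerate`, the column name `genus` and the toy data are assumptions for illustration, not the actual CMMG files.

```python
import pandas as pd


def agglomerate(abundance, Tax, rank):
    """Sum abundances (samples x genomes) over all genomes sharing a taxon at `rank`."""
    # Taxon label of every genome, in the same order as the table columns.
    taxa = Tax.loc[abundance.columns, rank]
    # Transpose so genomes are rows, sum rows per taxon, then transpose back.
    return abundance.T.groupby(taxa.to_numpy()).sum().T


# Toy taxonomy and abundance table (made up).
Tax = pd.DataFrame(
    {
        "genus": ["Bacteroides", "Bacteroides", "Lactobacillus"],
        "species": ["Bacteroides sp. 1", "Bacteroides sp. 2", "Lactobacillus sp. 1"],
    },
    index=["g1", "g2", "g3"],
)
abundance = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["s1", "s2"],
    columns=["g1", "g2", "g3"],
)

genus_table = agglomerate(abundance, Tax, "genus")
print(genus_table)
```

Genomes whose value at the chosen rank is missing are dropped by the group-by, so it may be worth checking for missing ranks before summing.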

- What exactly do you mean by the confidence threshold? This is a parameter for?

I apologize; I would like to polish the code, but I don't have much time right now.

@fconstancias
Author

Dear @SilasK,

Thanks for the details.

> What exactly do you mean by the confidence threshold? This is a parameter for?

Please see --confidence from the manual or here. It might not be that important for habitat-specific databases, but I was wondering whether you had experimented a bit with it using cmmg.
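
For context, my reading of the manual is that a label's score is C/Q, where C is the number of k-mers assigned within the clade rooted at that label and Q is the number of k-mers queried; if the score is below the threshold, the label moves up to its parent until the threshold is met or the read ends up unclassified. The toy sketch below only illustrates that idea and is not Kraken 2's actual code. The taxonomy, taxon IDs and k-mer counts are made up, and `apply_confidence` is a hypothetical name.

```python
def apply_confidence(label, kmer_hits, parent, threshold):
    """Move `label` up the tree until its clade score C/Q reaches `threshold`.

    kmer_hits maps taxon ID -> number of k-mers assigned to it (0 = no hit).
    parent maps taxon ID -> parent taxon ID (the root has no entry).
    Returns 0 (unclassified) if no ancestor reaches the threshold.
    """
    total = sum(kmer_hits.values())  # Q: all queried k-mers, including no-hit ones

    def in_clade(taxon, root):
        # True if `taxon` is `root` or one of its descendants.
        while taxon is not None and taxon != 0:
            if taxon == root:
                return True
            taxon = parent.get(taxon)
        return False

    node = label
    while node is not None:
        # C: k-mers assigned to `node` or to anything below it.
        clade_hits = sum(n for t, n in kmer_hits.items() if in_clade(t, node))
        if total > 0 and clade_hits / total >= threshold:
            return node
        node = parent.get(node)
    return 0


# Toy taxonomy: 1 = root, 2 = genus, 3 and 4 = species within genus 2.
parent = {2: 1, 3: 2, 4: 2}
# 20 k-mers in the read: 4 hit species 3, 2 hit species 4, 2 hit genus 2, 12 have no hit.
kmer_hits = {3: 4, 4: 2, 2: 2, 0: 12}

low = apply_confidence(3, kmer_hits, parent, threshold=0.1)   # species score 4/20 is enough
mid = apply_confidence(3, kmer_hits, parent, threshold=0.3)   # needs the genus score 8/20
high = apply_confidence(3, kmer_hits, parent, threshold=0.5)  # even the root only reaches 8/20
print(low, mid, high)
```

So a higher threshold pushes reads to higher ranks or to unclassified, which is why I would expect it to matter less when most reads match the habitat-specific database well.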

Best.

@SilasK
Owner

SilasK commented Sep 30, 2022

No, I haven't used the confidence score. Do you usually use it?

@fconstancias
Author

fconstancias commented Sep 30, 2022

I have previously worked with environmental samples using the kraken refseq database and used a --confidence of 0.1, based on some results here. With cmmg on mouse GIT metagenomes it might not be needed. I have run the kraken2+bracken pipeline using --confidence values of 0, 0.05, 0.1 and 0.2, and can share the results when I get there.
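
For reference, the sweep is set up roughly as in the sketch below, which only builds and prints the command lines without running them. The sample name, file paths, database directory and read length are placeholders, and `build_commands` is a hypothetical helper; the read length passed to bracken has to match one the Bracken database was built for.

```python
import shlex

CONFIDENCE_VALUES = [0, 0.05, 0.1, 0.2]


def build_commands(sample, reads_1, reads_2, db_dir, confidence, read_length=150):
    """Return the kraken2 and bracken command lines for one sample and one threshold."""
    prefix = f"{sample}.conf{confidence}"
    report = f"{prefix}.kreport"
    kraken2 = [
        "kraken2", "--db", db_dir,
        "--confidence", str(confidence),
        "--paired",
        "--report", report,
        "--output", f"{prefix}.kraken",
        reads_1, reads_2,
    ]
    # Bracken re-estimates species-level ("S") abundances from the kraken2 report.
    bracken = [
        "bracken", "-d", db_dir,
        "-i", report,
        "-o", f"{prefix}.bracken",
        "-r", str(read_length),
        "-l", "S",
    ]
    return kraken2, bracken


# Placeholder inputs (hypothetical paths).
commands = [
    build_commands("mouse_01", "mouse_01_R1.fastq.gz", "mouse_01_R2.fastq.gz", "cmmg_db", c)
    for c in CONFIDENCE_VALUES
]

for kraken2_cmd, bracken_cmd in commands:
    print(shlex.join(kraken2_cmd))
    print(shlex.join(bracken_cmd))
```

Keeping the threshold in the output file names makes it easy to compare the resulting reports side by side.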

@jorondo1

Hi @SilasK and @fconstancias,

I'd love to have your input on this: DerrickWood/kraken2#265 (comment)

cheers

@fconstancias
Author

Hi @jorondo1,

Sorry for the delay; check AGalanis97's answer there.

3 participants