Newer versions report "Illegal instruction" and exit on Mac with M2 #16

Open
raufs opened this issue Aug 1, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@raufs

raufs commented Aug 1, 2024

Hi Martin,

Hope all is well! I just noticed that on my Mac setup, versions v0.9.8 and v0.9.10 report the following and exit:

gecco run -o /Users/raufs/Coding_Projects/test_lsapan/test_pan/test_case/lsaBGC-Pan_Results/GECCO_Results/Cutibacterium_acnes_GB_GCA_011525575_1/ -g /Users/raufs/Coding_Projects/test_lsapan/test_pan/test_case/lsaBGC-Pan_Results/Gene_Calling/Cutibacterium_acnes_GB_GCA_011525575_1.gbk
:heavy_check_mark: Loading sequences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.7/4.7 MiB 100% 0:00:00 0:00:00Illegal instruction: 4

This error does not occur on Linux systems with these versions. Reverting to v0.9.6 works as expected on the Mac. I tried running with the verbose flag, but it gave the same error message.

The Mac has an M2 chip, if that helps. Installation was via conda. Happy to help with testing or to share additional info.

Kind regards,
Rauf

raufs changed the title "Issues with newer versions on Mac with M2" to "Newer versions report 'Illegal instruction' and exit on Mac with M2" on Aug 1, 2024
@althonos
Member

Hey @raufs !

I'm pretty sure this is coming from Pyrodigal: you get the error right after loading sequences, which is probably when Pyrodigal is executed. There's a chance the CPU feature detection doesn't work properly and causes the wrong platform code to be executed. Would you mind trying to run the Pyrodigal CLI on a test example and checking whether you get the error with the latest version? According to my changelog, GECCO v0.9.6 should be using Pyrodigal v2.0.0 while v0.9.10 is using v3.0.0, so it would be helpful if you could confirm which of those the bug happens with. If it is indeed a Pyrodigal bug, I'll transfer the issue there.
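On an Apple Silicon Mac, one quick check related to "the wrong platform code" is which architecture the conda environment's Python actually runs as. The sketch below is a hypothetical helper, not part of GECCO or Pyrodigal. It assumes the macOS sysctl key `sysctl.proc_translated` (1 for a process translated by Rosetta 2, 0 for a native one, absent on Intel Macs) and queries it in-process through `ctypes`, so the answer refers to the interpreter itself and not to a child process.

```python
import ctypes
import ctypes.util
import platform


def rosetta_translated():
    """Return True/False on macOS, or None where the check does not apply.

    Assumption: the `sysctl.proc_translated` key is 1 when the calling
    process runs under Rosetta 2 and 0 when it runs natively; the key does
    not exist on Intel Macs, so a failed lookup is reported as None.
    """
    if platform.system() != "Darwin":
        return None  # not macOS: Rosetta is irrelevant
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    value = ctypes.c_int(0)
    size = ctypes.c_size_t(ctypes.sizeof(value))
    # int sysctlbyname(const char *name, void *oldp, size_t *oldlenp,
    #                  void *newp, size_t newlen)
    ret = libc.sysctlbyname(
        b"sysctl.proc_translated",
        ctypes.byref(value),
        ctypes.byref(size),
        None,
        ctypes.c_size_t(0),
    )
    if ret != 0:
        return None  # key missing (e.g. Intel Mac) or lookup failed
    return value.value == 1


def describe_interpreter():
    """Summarize the OS and architecture this Python process runs as."""
    return {
        "system": platform.system(),    # e.g. "Darwin" or "Linux"
        "machine": platform.machine(),  # e.g. "arm64" or "x86_64"
        "rosetta_translated": rosetta_translated(),
    }


if __name__ == "__main__":
    print(describe_interpreter())
```

On an M2, a native (osx-arm64) environment should report `machine` as `arm64`; `x86_64` together with `rosetta_translated` being `True` would mean the interpreter and its compiled extensions are Intel builds running under Rosetta 2, so any CPU feature detection happens in the translated x86_64 environment.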

Cheers,
Martin

@althonos althonos added the bug Something isn't working label Aug 14, 2024
@raufs
Author

raufs commented Aug 14, 2024

Pyrodigal versions appear the same in both conda environments with GECCO v0.9.6 and GECCO v0.9.10. Both environments have pyrodigal v3.5.1.

It seems I only tested running GECCO with a full-genome GenBank file as input, which was the basis of my initial report. However, when using genomes in FASTA format as input, we see the reverse: v0.9.10 works as expected, but v0.9.6 fails with the error message:

(/Users/raufs/Coding_Projects/test_gecco/gecco_0.9.6) Raufs-Mac-mini:input_genomes raufs$ gecco run -g Cutibacterium_avidum_GB_GCA_000477695_1.fasta -o test/

x An unexpected error occurred. Consider opening a new issue on the bug tracker ( https://github.com/zellerlab/GECCO/issues/new ) if it persists, including the traceback below:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/raufs/Coding_Projects/test_gecco/gecco_0.9.6/lib/python3.12/site-packages/gecco/cli/comma │
│                                                                                                  │
│   156 │   │   │   │   subcmd.quiet = self.quiet                                                  │
│   157 │   │   │   │   subcmd.progress.disable = self.args["--no-progress-bar"]                   │
│   158 │   │   │   # run the subcommand                                                           │
│ ❱ 159 │   │   │   return subcmd.execute(ctx)                                                     │
│   160 │   │   except CommandExit as sysexit:                                                     │
│   161 │   │   │   return sysexit.code                                                            │
│   162 │   │   except KeyboardInterrupt:                                                          │
│                                                                                                  │
│ /Users/raufs/Coding_Projects/test_gecco/gecco_0.9.6/lib/python3.12/site-packages/gecco/cli/comma │
│                                                                                                  │
│   254 │   │   │   self._make_output_directory(outputs)                                           │
│   255 │   │   │   # load sequences and extract genes                                             │
│   256 │   │   │   sequences = list(self._load_sequences())                                       │
│ ❱ 257 │   │   │   genes = self._extract_genes(sequences)                                         │
│   258 │   │   │   if genes:                                                                      │
│   259 │   │   │   │   self.success("Found", "a total of", len(genes), "genes", level=1)          │
│   260 │   │   │   else:                                                                          │
│                                                                                                  │
│ /Users/raufs/Coding_Projects/test_gecco/gecco_0.9.6/lib/python3.12/site-packages/gecco/cli/comma │
│                                                                                                  │
│   135 │   │   self.info("Extracting", "genes from input sequences", level=1)                     │
│   136 │   │   if self.cds_feature is None:                                                       │
│   137 │   │   │   self.info("Using", "Pyrodigal in metagenomic mode", level=2)                   │
│ ❱ 138 │   │   │   orf_finder: ORFFinder = PyrodigalFinder(metagenome=True, mask=self.mask, cpus= │
│   139 │   │   else:                                                                              │
│   140 │   │   │   self.info("Using", f"record features named {self.cds_feature!r}", level=2)     │
│   141 │   │   │   orf_finder = CDSFinder(feature=self.cds_feature, locus_tag=self.locus_tag)     │
│                                                                                                  │
│ /Users/raufs/Coding_Projects/test_gecco/gecco_0.9.6/lib/python3.12/site-packages/gecco/orf.py:72 │
│                                                                                                  │
│    69 │   │   self.metagenome = metagenome                                                       │
│    70 │   │   self.mask = mask                                                                   │
│    71 │   │   self.cpus = cpus                                                                   │
│ ❱  72 │   │   self.orf_finder =  pyrodigal.OrfFinder(meta=metagenome, mask=mask)                 │
│    73 │                                                                                          │
│    74 │   def _train(self, records: Iterable[SeqRecord]) -> pyrodigal.TrainingInfo:              │
│    75 │   │   sequences = []                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: module 'pyrodigal' has no attribute 'OrfFinder'

So perhaps the issue is related to parsing GenBank inputs in v0.9.10. As far as I have seen, everything works as expected on Linux regardless of the version used.

Hope it's helpful!
Rauf
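The `AttributeError` above fits a version mismatch. GECCO v0.9.6 was written against Pyrodigal v2 and calls `pyrodigal.OrfFinder`, but the environment has Pyrodigal v3.5.1. As far as I know, Pyrodigal v3 exposes the equivalent class as `GeneFinder`, which is an assumption in the sketch below. The helper `resolve_gene_finder` is hypothetical, not GECCO code, and shows a tolerant lookup that tries the newer name first.

```python
def resolve_gene_finder(module):
    """Return the gene-finder class from a pyrodigal-like module.

    Assumption: Pyrodigal v3 provides `GeneFinder`, while Pyrodigal v2
    provided `OrfFinder`; the newer name is tried first.
    """
    for name in ("GeneFinder", "OrfFinder"):
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    module_name = getattr(module, "__name__", repr(module))
    raise AttributeError(
        f"module {module_name!r} has neither 'GeneFinder' nor 'OrfFinder'"
    )


if __name__ == "__main__":
    try:
        import pyrodigal
    except ImportError:
        print("pyrodigal is not installed in this environment")
    else:
        finder_cls = resolve_gene_finder(pyrodigal)
        print(pyrodigal.__version__, finder_cls.__name__)
```

Under the same assumption, `resolve_gene_finder(pyrodigal)(meta=True, mask=False)` would construct a finder on either major version, provided both accept the `meta` and `mask` keywords that GECCO passes.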

@raufs
Author

raufs commented Aug 14, 2024

Here is an example full genome GenBank I used as input for the initial report:

Cutibacterium_avidum_RS_GCF_902375045_1.gbk.gz

I create these myself in lsaBGC-Pan, so perhaps there is something on my end to improve in how they are generated. I will definitely look into it later. They do work as input for GECCO v0.9.6 (on both the M2 Mac and Linux) and for v0.9.10 (but only on Linux).

@althonos
Member

Ah sorry -- what I meant was to try running Pyrodigal directly on some genome with your laptop to see if it's the culprit that is crashing GECCO.

The "Illegal instruction" error (the SIGILL signal, which is signal number 4, hence the "Illegal instruction: 4" message) happens when the CPU attempts to run, well, an illegal instruction. This usually happens when the CPU tries to run SIMD code it doesn't support, for instance AVX2 or SSE4.1 on older computers. Since Pyrodigal and PyHMMER both use SIMD, I guess one of these two is crashing, and I'd suspect Pyrodigal because the crash happens right after the Loading sequences progress bar completes.
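To illustrate how a feature-detection mistake turns into a crash, here is a minimal sketch of runtime SIMD dispatch with a portable fallback. The backend names and the feature table are hypothetical and are not Pyrodigal's or PyHMMER's actual internals. Detection here is Linux-only (reading `/proc/cpuinfo`) and returns an empty set elsewhere, which selects the generic path. If detection ever reports a feature the CPU lacks, the selected backend would execute unsupported instructions and the process would die with SIGILL.

```python
import platform

# Ordered from most to least preferred; each entry lists the CPU features
# the backend needs. These names are illustrative only.
BACKENDS = [
    ("avx2", {"avx2"}),      # x86-64 with AVX2
    ("sse4.1", {"sse4_1"}),  # x86-64 with SSE4.1 (Linux spells it sse4_1)
    ("neon", {"asimd"}),     # AArch64 Linux reports NEON as "asimd"
    ("generic", set()),      # portable scalar code, always safe
]


def cpu_features():
    """Best-effort CPU feature detection (Linux only; empty set elsewhere)."""
    try:
        with open("/proc/cpuinfo") as handle:
            for line in handle:
                key, _, value = line.partition(":")
                # x86 kernels use "flags", ARM kernels use "Features"
                if key.strip() in ("flags", "Features"):
                    return set(value.split())
    except OSError:
        pass
    return set()


def choose_backend(features):
    """Pick the first backend whose required features are all present.

    A wrong `features` set (e.g. claiming avx2 on a CPU without it) makes
    this choose code the CPU cannot run: that is the SIGILL scenario.
    """
    for name, required in BACKENDS:
        if required <= features:
            return name
    return "generic"


if __name__ == "__main__":
    print(platform.machine(), choose_backend(cpu_features()))
```

The design choice worth noting is the unconditional `generic` entry at the end: when detection is unsure, falling back to scalar code is slower but cannot trigger an illegal instruction.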

Maybe you could just try:

$ python -m pyrodigal -i <some_genome_file.fna>

to check if this works, or if it immediately crashes?
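If the CLI run is inconclusive, roughly the same call can be made from Python in the environment where GECCO crashes. The sketch below mirrors the metagenomic mode seen in the traceback (`PyrodigalFinder(metagenome=True, ...)`), but it is not GECCO's code, and GECCO may pass further options (such as `mask`). It assumes Pyrodigal v3, as installed in both environments here, with `GeneFinder(meta=True)` and `find_genes` accepting a nucleotide string. `read_fasta` is a hypothetical minimal helper. Printing with `flush=True` after each record shows how far it gets before any crash.

```python
import sys


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # keep only the identifier before the first whitespace
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def main(path):
    import pyrodigal  # imported lazily so read_fasta() works without it

    finder = pyrodigal.GeneFinder(meta=True)  # metagenomic mode, no training
    with open(path) as handle:
        for name, seq in read_fasta(handle):
            genes = finder.find_genes(seq)
            print(f"{name}\t{len(genes)} genes", flush=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like `python run_pyrodigal_meta.py some_genome_file.fna`, where the script name is a placeholder.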

@raufs
Author

raufs commented Aug 29, 2024

I see. What is odd here is that Pyrodigal (v3.5.1) runs great on all systems; I actually use it to create the custom GenBank files I feed into GECCO. Running the pyrodigal command in the same conda environment where GECCO reports the "Illegal instruction" works great on the M2 Mac.

This is related to the lsaBGC-Pan suite, a re-implementation of lsaBGC, and I get around this by using GECCO v0.9.6 as a dependency, which works great, so there is no rush here!

Maybe of interest to you and the other GECCO co-authors: lsaBGC-Pan can now co-process both antiSMASH and GECCO predictions. Similar to your study, applying this to a well-sequenced Streptomyces species, we saw that this leads to a substantial increase in BGC predictions compared to using antiSMASH alone.

@althonos
Member

Hmm... Would you mind running GECCO in verbose mode? If Pyrodigal is not the culprit, I'm wondering what the problem may be... You can run gecco -vv run instead of gecco run to get the verbose output instead of the progress bar.
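Since the crash kills the process without a Python traceback, another option is Python's standard `faulthandler` module, which installs handlers for fatal signals including SIGILL and dumps the Python stack before the process dies. The sketch below demonstrates this on POSIX with a stand-in child process that sends SIGILL to itself. The function `run_native_step` is a hypothetical placeholder for native code hitting an unsupported instruction.

```python
import subprocess
import sys

# Stand-in for a native extension executing an unsupported instruction:
# the child simply sends SIGILL to itself (POSIX only).
CHILD = """
import os, signal
def run_native_step():
    os.kill(os.getpid(), signal.SIGILL)
run_native_step()
"""


def run_with_faulthandler(code):
    """Run `code` in a child interpreter with faulthandler enabled.

    "-X faulthandler" enables the handler at startup; on a fatal signal it
    prints "Fatal Python error: ..." and the Python stack to stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    rc, err = run_with_faulthandler(CHILD)
    print("exit code:", rc)  # negative signal number on POSIX
    print(err)
```

For an installed entry point such as the gecco command, the same handler can be turned on with the PYTHONFAULTHANDLER environment variable, e.g. `PYTHONFAULTHANDLER=1 gecco -vv run -g <input.gbk>`. The dumped stack should then show which GECCO function was running when the illegal instruction occurred.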

> Maybe of interest to you and the other GECCO co-authors: lsaBGC-Pan can now co-process both antiSMASH and GECCO predictions. Similar to your study, applying this to a well-sequenced Streptomyces species, we saw that this leads to a substantial increase in BGC predictions compared to using antiSMASH alone.

I've seen your tweet about it, that's really exciting!

@raufs
Author

raufs commented Aug 29, 2024

Sure thing, here is the more detailed output:

(/Users/raufs/Coding_Projects/test_gecco/gecco_0.9.10) Raufs-Mac-mini:Gene_Calling raufs$ gecco -vv run -g Cutibacterium_avidum_GB_GCA_000413335_1.gbk
2024-08-29 12:15:04 Raufs-Mac-mini.local gecco[3348] INFO Using output folder '.'
2024-08-29 12:15:04 Raufs-Mac-mini.local gecco[3348] INFO Detecting sequence format from file contents
2024-08-29 12:15:04 Raufs-Mac-mini.local gecco[3348]   OK Detected format of input as 'genbank'
2024-08-29 12:15:04 Raufs-Mac-mini.local gecco[3348] INFO Loading sequences from genomic file 'Cutibacterium_avidum_GB_GCA_000413335_1.gbk'
2024-08-29 12:15:04 Raufs-Mac-mini.local gecco[3348]   OK Found 2 sequences
✔ Loading sequences ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.7/4.7 MiB 100% 0:00:00 0:00:00Illegal instruction: 4
