Help needed for RuntimeError: size of tensor a must match the size of tensor b #364

Open
MapleHe opened this issue Oct 3, 2024 · 2 comments


MapleHe commented Oct 3, 2024

Thanks for your excellent tool.
Previously I used Vamb 3.0.2 for data analysis.
I am now trying to run the latest version of Vamb 4, but I hit the RuntimeError shown in the logs below and have not figured out how to solve it.
I would be grateful for any suggestions.

Environment

  • Python: 3.12.6
  • Vamb: 4.1.4.dev136+g5090ecc

Commands

conda run -p vamb4 vamb bin default --outdir ${WORKING_DIR1} -m 2000 -p ${PROGRAM_T} --cuda --fasta contigs.fa --bamdir ${BAM_DIR}

## OR ##

conda run -p vamb4 vamb bin default --outdir ${WORKING_DIR1} -m 2000 -p ${PROGRAM_T} --cuda --fasta contigs.fa --bamdir ${BAM_DIR} -o "."

Other notes

  • The contigs in contigs.fa were assembled with metaSPAdes, then manually renamed, filtered, and concatenated. I assume the result is equivalent to the output of Vamb's concatenate.py script with the --keepnames option. The contig headers are formatted as:
>Sample1-XX-XX.NODE_X_X.1111
AAAAA
>Sample2-XX-XX.NODE_X_X.1111
AAAAA
  • The BAM files in BAM_DIR were generated with bwa-mem2 by mapping each sample's reads separately to the concatenated contigs, then sorted with samtools:
Sample1-XX-XX.bam
Sample2-XX-XX.bam
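
As a quick sanity check of the header format described above, here is a minimal Python sketch that splits each header into a sample part and a contig part. It is not Vamb's own code: `sample_of` and `contigs_per_sample` are hypothetical helpers, and the sketch assumes the sample name is everything before the first occurrence of the separator.

```python
from collections import Counter


def sample_of(header, separator="."):
    """Return the sample part of a FASTA header.

    Assumption: the sample name is the text before the FIRST separator.
    """
    fields = header.lstrip(">").split()      # drop '>' and any description
    name = fields[0] if fields else ""
    sample, sep, contig = name.partition(separator)
    if not sep or not sample or not contig:
        raise ValueError(
            f"header {header!r} is not of the form <sample>{separator}<contig>"
        )
    return sample


def contigs_per_sample(headers, separator="."):
    """Count how many contig headers belong to each sample."""
    return Counter(sample_of(h, separator) for h in headers)


# Placeholder headers in the same shape as the ones listed above
headers = [">Sample1-XX-XX.NODE_X_X.1111", ">Sample2-XX-XX.NODE_X_X.1111"]
counts = contigs_per_sample(headers)
print(counts)
```

A header that lacks the separator, or has nothing on one side of it, raises a ValueError, which makes malformed names easy to spot before binning.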

Logs

The full log is available in log.txt.

Here is the tail of the Vamb log:

2024-10-03 16:21:36.260 | INFO    | Clustering
2024-10-03 16:21:36.260 | INFO    | 	Windowsize: 300
2024-10-03 16:21:36.260 | INFO    | 	Min successful thresholds detected: 15
2024-10-03 16:21:36.260 | INFO    | 	Max clusters: None
2024-10-03 16:21:36.261 | INFO    | 	Use CUDA for clustering: True
2024-10-03 16:21:36.261 | INFO    | 	Binsplitter: "."
2024-10-03 16:21:55.278 | ERROR   | An error has been caught in function 'main', process 'MainProcess' (2581899), thread 'MainThread' (140562393429824):
Traceback (most recent call last):

  File "/projects/Software/miniforge3/envs/vamb4/bin/vamb", line 8, in <module>
    sys.exit(main())
    │   │    └ <function main at 0x7fd622ce2200>
    │   └ <built-in function exit>
    └ <module 'sys' (built-in)>

> File "/maps/projects/Software/vamb/vamb/__main__.py", line 2183, in main
    run(runner, opt.common.general)
    │   │       │   │      └ <vamb.__main__.GeneralOptions object at 0x7fd622ecf7e0>
    │   │       │   └ <vamb.__main__.BinnerCommonOptions object at 0x7fd622cf65d0>
    │   │       └ <vamb.__main__.BinDefaultOptions object at 0x7fd622fe6600>
    │   └ functools.partial(<function run_bin_default at 0x7fd622ce1620>, <vamb.__main__.BinDefaultOptions object at 0x7fd622fe6600>)
    └ <function run at 0x7fd622ce0680>

  File "/maps/projects/Software/vamb/vamb/__main__.py", line 647, in run
    runner()
    └ functools.partial(<function run_bin_default at 0x7fd622ce1620>, <vamb.__main__.BinDefaultOptions object at 0x7fd622fe6600>)

  File "/maps/projects/Software/vamb/vamb/__main__.py", line 1204, in run_bin_default
    cluster_and_write_files(
    └ <function cluster_and_write_files at 0x7fd622ce1080>

  File "/maps/projects/Software/vamb/vamb/__main__.py", line 1090, in cluster_and_write_files
    for i, cluster in enumerate(clusters):
        │  │                    └ <itertools.islice object at 0x7fd609a5ed90>
        │  └ <vamb.cluster.Cluster object at 0x7fd608fcfd80>
        └ 14959

  File "/maps/projects/Software/vamb/vamb/cluster.py", line 297, in __next__
    cluster, _, points = self.find_cluster()
                         │    └ <function ClusterGenerator.find_cluster at 0x7fd6364640e0>
                         └ ClusterGenerator(85 points, 14960 clusters)

  File "/maps/projects/Software/vamb/vamb/cluster.py", line 541, in find_cluster
    threshold = self.find_threshold(distances)
                │    │              └ tensor([0.5873, 0.2544, 0.5492, 0.2756, 0.7862, 0.4555, 0.5639, 0.3698, 0.3409,
                │    │                        0.4678, 0.7385, 0.4397, 0.2854, 0.467...
                │    └ <function ClusterGenerator.find_threshold at 0x7fd636464040>
                └ ClusterGenerator(85 points, 14960 clusters)

  File "/maps/projects/Software/vamb/vamb/cluster.py", line 455, in find_threshold
    below_xmax = (distances <= _XMAX) & self.kept_mask
                  │            │        │    └ <member 'kept_mask' of 'ClusterGenerator' objects>
                  │            │        └ ClusterGenerator(85 points, 14960 clusters)
                  │            └ 0.3
                  └ tensor([0.5873, 0.2544, 0.5492, 0.2756, 0.7862, 0.4555, 0.5639, 0.3698, 0.3409,
                            0.4678, 0.7385, 0.4397, 0.2854, 0.467...

RuntimeError: The size of tensor a (87) must match the size of tensor b (85) at non-singleton dimension 0
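
To make the message easier to read: the failing line takes the element-wise AND of two masks, `distances <= _XMAX` (87 entries) and `self.kept_mask` (85 entries), and the two lengths disagree. The sketch below is a plain-Python stand-in for that operation, not Vamb's or PyTorch's code; `combine_masks` is a hypothetical helper and lists replace tensors.

```python
def combine_masks(distances, kept_mask, xmax=0.3):
    """Element-wise AND of (distance <= xmax) with kept_mask.

    Both inputs must have the same length, mirroring the requirement
    that the two tensors in the traceback share dimension 0.
    """
    if len(distances) != len(kept_mask):
        raise ValueError(
            f"size of distances ({len(distances)}) must match "
            f"size of kept_mask ({len(kept_mask)})"
        )
    return [d <= xmax and keep for d, keep in zip(distances, kept_mask)]


# First four distance values from the traceback, with an illustrative mask
distances = [0.5873, 0.2544, 0.5492, 0.2756]
ok = combine_masks(distances, [True, True, False, True])

# Same mismatch as in the log: 87 distances against 85 mask entries
try:
    combine_masks([0.1] * 87, [True] * 85)
except ValueError as err:
    message = str(err)
    print(message)
```

So the error itself only says that the number of distances and the length of the mask are out of sync by two points at this iteration; it does not say which of the two is wrong.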

MapleHe commented Oct 4, 2024

I reran the pipeline from the concatenation and mapping steps and got the same error.

  1. concatenate contigs
concatenate.py -m 2000 contigs.2k.fa sample1.contigs.fasta sample2.contigs.fasta
bwa-mem2 index -p contigs.2k contigs.2k.fa
  2. mapping to contigs
bwa-mem2 mem contigs.2k sample1_1.fq sample1_2.fq | \
    samtools view -bS -F 3584 - | \
    samtools sort -O bam -o bams/sample1.contigs.bam

## same for sample 2
  3. run vamb
conda run -p vamb4 vamb bin default --outdir vamb_output -m 2000 -p 32 --cuda --fasta contigs.2k.fa.gz --bamdir bams/ 
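
One check that may help rule out an input mismatch is comparing the sequence names in the FASTA with the `@SQ` reference names in each BAM header (for example the text printed by `samtools view -H`). The following is a minimal sketch under that assumption, not part of Vamb; the helper names and the example names such as `S1CNODE_1` are made up for illustration.

```python
def fasta_names(fasta_text):
    """Collect sequence names (first word after '>') from FASTA text."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def bam_reference_names(sam_header_text):
    """Collect the SN: values of @SQ lines from SAM/BAM header text."""
    names = []
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def compare(fasta, bam):
    """Return (names only in the FASTA, names only in the BAM header)."""
    fasta_set, bam_set = set(fasta), set(bam)
    return sorted(fasta_set - bam_set), sorted(bam_set - fasta_set)


# Made-up example data, not taken from the real files
fasta_text = ">S1CNODE_1\nACGT\n>S1CNODE_2\nACGT\n>S2CNODE_1\nACGT\n"
sam_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:S1CNODE_1\tLN:4\n"
    "@SQ\tSN:S2CNODE_1\tLN:4\n"
)
only_in_fasta, only_in_bam = compare(
    fasta_names(fasta_text), bam_reference_names(sam_header)
)
print(only_in_fasta, only_in_bam)
```

If both lists come back empty for every BAM file, the inputs are consistent in naming and the problem more likely lies elsewhere.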


MapleHe commented Oct 11, 2024

Update:

These two versions of Vamb work fine on the same data; the separator can be either the default "C" or a custom ".".

  • Built from source, branch v4.1.3
  • pip-installed version v3.0.9
