bug in quantms related with parquet and SDRF-pipelines #425

Open
ypriverol opened this issue Oct 6, 2024 · 1 comment
@ypriverol
Member

Description of the bug

Plus 28 more processes waiting for tasks…
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/quantms] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:INPUT_CHECK:SAMPLESHEET_CHECK (PXD000001.sdrf.tsv)'

Caused by:
Process NFCORE_QUANTMS:QUANTMS:INPUT_CHECK:SAMPLESHEET_CHECK (PXD000001.sdrf.tsv) terminated with an error exit status (1)

Command executed:

quantmsutilsc checksamplesheet --exp_design "PXD000001.sdrf.tsv" --is_sdrf --skip_factor_validation --use_ols_cache_only 2>&1 | tee input_check.log

cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:INPUT_CHECK:SAMPLESHEET_CHECK":
quantms-utils: $(pip show quantms-utils | grep "Version" | awk -F ': ' '{print $2}')
END_VERSIONS

Command exit status:
1

Command output:
2024-10-06 11:26:49,019 [] - platform is linux
2024-10-06 11:26:49,071 [wrapper] - CACHEDIR=/tmp/matplotlib-1e0a9r_x
2024-10-06 11:26:49,071 [__init__] - font search path [PosixPath('/usr/local/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/ttf'), PosixPath('/usr/local/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/afm'), PosixPath('/usr/local/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts')]
Fontconfig error: No writable cache directories
2024-10-06 11:26:49,348 [_load_fontmanager] - generated new fontManager
Traceback (most recent call last):
File "/usr/local/bin/quantmsutilsc", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/quantmsutils/quantmsutilsc.py", line 38, in main
cli()
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/quantmsutils/sdrf/check_samplesheet.py", line 189, in checksamplesheet
check_sdrf(
File "/usr/local/lib/python3.10/site-packages/quantmsutils/sdrf/check_samplesheet.py", line 55, in check_sdrf
errors = df.validate(DEFAULT_TEMPLATE, use_ols_cache_only)
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf.py", line 79, in validate
errors = default_schema.validate(self, use_ols_cache_only=use_ols_cache_only)
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf_schema.py", line 218, in validate
error_ontology_terms = self.validate_columns(panda_sdrf, use_ols_cache_only=use_ols_cache_only)
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf_schema.py", line 329, in validate_columns
errors += column.validate(series)
File "/usr/local/lib/python3.10/site-packages/pandas_schema/column.py", line 27, in validate
return [error for validation in self.validations for error in validation.get_errors(series, self)]
File "/usr/local/lib/python3.10/site-packages/pandas_schema/column.py", line 27, in <listcomp>
return [error for validation in self.validations for error in validation.get_errors(series, self)]
File "/usr/local/lib/python3.10/site-packages/pandas_schema/validation.py", line 85, in get_errors
simple_validation = ~self.validate(series)
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf_schema.py", line 149, in validate
ontology_terms = client.search(
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/ols.py", line 286, in search
terms = self.cache_search(term, ontology)
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/ols.py", line 414, in cache_search
duckdb_conn = duckdb.execute(
File "/usr/local/lib/python3.10/site-packages/duckdb/__init__.py", line 225, in execute
return conn.execute(query, parameters, multiple_parameter_sets, **kwargs)
duckdb.duckdb.ConversionException: Conversion Error: In Parquet reader of file "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/psi-ms.parquet": failed to cast column "accession" from type VARCHAR to INTEGER: Could not convert string 'NCIT:C25330' to INT32

In file "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/psi-ms.parquet" the column "accession" has type VARCHAR, but we are trying to read it as type INTEGER.
This can happen when reading multiple Parquet files. The schema information is taken from the first Parquet file by default. Possible solutions:

  • Enable the union_by_name=True option to combine the schema of all Parquet files (duckdb.org/docs/data/multiple_files/combining_schemas)
  • Use a COPY statement to automatically derive types from an existing table.

Command error:
2024-10-06 11:26:49,019 [] - platform is linux
2024-10-06 11:26:49,071 [wrapper] - CACHEDIR=/tmp/matplotlib-1e0a9r_x
2024-10-06 11:26:49,071 [__init__] - font search path [PosixPath('/usr/local/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/ttf'), PosixPath('/usr/local/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/afm'), PosixPath('/usr/local/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts')]
Fontconfig error: No writable cache directories
2024-10-06 11:26:49,348 [_load_fontmanager] - generated new fontManager
Traceback (most recent call last):
File "/usr/local/bin/quantmsutilsc", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/quantmsutils/quantmsutilsc.py", line 38, in main
cli()
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/quantmsutils/sdrf/check_samplesheet.py", line 189, in checksamplesheet
check_sdrf(
File "/usr/local/lib/python3.10/site-packages/quantmsutils/sdrf/check_samplesheet.py", line 55, in check_sdrf
errors = df.validate(DEFAULT_TEMPLATE, use_ols_cache_only)
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf.py", line 79, in validate
errors = default_schema.validate(self, use_ols_cache_only=use_ols_cache_only)
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf_schema.py", line 218, in validate
error_ontology_terms = self.validate_columns(panda_sdrf, use_ols_cache_only=use_ols_cache_only)
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf_schema.py", line 329, in validate_columns
errors += column.validate(series)
File "/usr/local/lib/python3.10/site-packages/pandas_schema/column.py", line 27, in validate
return [error for validation in self.validations for error in validation.get_errors(series, self)]
File "/usr/local/lib/python3.10/site-packages/pandas_schema/column.py", line 27, in <listcomp>
return [error for validation in self.validations for error in validation.get_errors(series, self)]
File "/usr/local/lib/python3.10/site-packages/pandas_schema/validation.py", line 85, in get_errors
simple_validation = ~self.validate(series)
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/sdrf/sdrf_schema.py", line 149, in validate
ontology_terms = client.search(
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/ols.py", line 286, in search
terms = self.cache_search(term, ontology)
File "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/ols.py", line 414, in cache_search
duckdb_conn = duckdb.execute(
File "/usr/local/lib/python3.10/site-packages/duckdb/__init__.py", line 225, in execute
return conn.execute(query, parameters, multiple_parameter_sets, **kwargs)
duckdb.duckdb.ConversionException: Conversion Error: In Parquet reader of file "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/psi-ms.parquet": failed to cast column "accession" from type VARCHAR to INTEGER: Could not convert string 'NCIT:C25330' to INT32

In file "/usr/local/lib/python3.10/site-packages/sdrf_pipelines/ols/psi-ms.parquet" the column "accession" has type VARCHAR, but we are trying to read it as type INTEGER.
This can happen when reading multiple Parquet files. The schema information is taken from the first Parquet file by default. Possible solutions:

  • Enable the union_by_name=True option to combine the schema of all Parquet files (duckdb.org/docs/data/multiple_files/combining_schemas)
  • Use a COPY statement to automatically derive types from an existing table.

Work dir:
/Users/yperez/work/quantms/work/ce/9f2986f1eb38825f0d7f4a75cae3d4

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

-- Check '.nextflow.log' file for details
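
The DuckDB message in the log says that, when several Parquet files are read together, the schema is taken from the first file by default, and that combining the schemas of all files is one possible way out. As a rough illustration of that failure mode, here is a toy model in plain Python. It is not DuckDB's implementation and does not touch the real `psi-ms.parquet`; the two "files", the numeric accession values, and all function names are hypothetical. Only the string 'NCIT:C25330' comes from the log. Whether the cache query in `sdrf_pipelines` actually spans several Parquet files is an assumption here.

```python
# Toy model only: this is NOT how DuckDB reads Parquet. It just mimics
# "schema from the first file" versus "schema combined over all files".

def infer_type(values):
    """Return 'INTEGER' if every value parses as an int, else 'VARCHAR'."""
    try:
        for value in values:
            int(value)
        return "INTEGER"
    except (TypeError, ValueError):
        return "VARCHAR"


def cast(value, target):
    """Cast one value to the target type; int() raises ValueError on 'NCIT:C25330'."""
    if target == "INTEGER":
        return int(value)
    return str(value)


def read_first_file_schema(files, column):
    """Take the column type from the first file and apply it to every file."""
    target = infer_type(files[0][column])
    return [cast(v, target) for f in files for v in f[column]]


def read_unified_schema(files, column):
    """Pick a type that fits every file (VARCHAR as soon as one file needs it)."""
    types = {infer_type(f[column]) for f in files}
    target = "INTEGER" if types == {"INTEGER"} else "VARCHAR"
    return [cast(v, target) for f in files for v in f[column]]


# Hypothetical data: two "files" whose accession columns disagree on type.
file_a = {"accession": ["1000031", "1000032"]}
file_b = {"accession": ["NCIT:C25330"]}

try:
    read_first_file_schema([file_a, file_b], "accession")
    first_file_result = "ok"
except ValueError as exc:
    first_file_result = f"failed: {exc}"

unified = read_unified_schema([file_a, file_b], "accession")

print(first_file_result)
print(unified)
```

In this sketch the first reader fails on the second file because the type was fixed as INTEGER from the first one, while the second reader falls back to VARCHAR for the whole column and keeps every accession as a string.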

Command used and terminal output

No response

Relevant files

No response

System information

No response

@ypriverol ypriverol added the bug Something isn't working label Oct 6, 2024
@ypriverol ypriverol self-assigned this Oct 6, 2024
@ypriverol ypriverol changed the title from "big in quantms related with parquet and SDRF-pipelines" to "bug in quantms related with parquet and SDRF-pipelines" Oct 7, 2024
@ypriverol
Member Author

A PR for this is in progress: bigbio/quantms-utils#29
