Add passthrough options for BCO (#36)
---------

Signed-off-by: Ben Sherman <[email protected]>
bentsherman authored Nov 5, 2024
1 parent 881881e commit 6d73833
Showing 6 changed files with 244 additions and 31 deletions.
180 changes: 180 additions & 0 deletions BCO.md
@@ -0,0 +1,180 @@
# Additional BCO configuration

*New in version 1.3.0*

The `bco` format supports additional "pass-through" options for certain BCO fields. These fields cannot be inferred automatically from a pipeline or run, so they must be set through the config. External systems can use these config options to inject fields automatically.

The following config options are supported:

- `prov.formats.bco.provenance_domain.review`
- `prov.formats.bco.provenance_domain.derived_from`
- `prov.formats.bco.provenance_domain.obsolete_after`
- `prov.formats.bco.provenance_domain.embargo`
- `prov.formats.bco.usability_domain`
- `prov.formats.bco.description_domain.keywords`
- `prov.formats.bco.description_domain.xref`
- `prov.formats.bco.execution_domain.external_data_endpoints`
- `prov.formats.bco.execution_domain.environment_variables`

Apart from `environment_variables` (see the note below), these options correspond exactly to fields in the BCO JSON schema. Refer to the [BCO User Guide](https://docs.biocomputeobject.org/user_guide/) for more information about these fields.

*NOTE: The `environment_variables` setting differs from the BCO standard in that it specifies only the variable names. A variable in this list is written to the BCO, together with its value, only if it is set in the execution environment.*
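The filtering described in the note can be sketched in plain Groovy. This is only an illustration: `selectEnv` is a hypothetical helper, and the `env` map with its values stands in for the real execution environment.

```groovy
// Hypothetical helper: keep only the listed variables that are actually set.
// `names` plays the role of the `environment_variables` config option,
// `env` plays the role of the execution environment.
Map<String,String> selectEnv(List<String> names, Map<String,String> env) {
    names.inject([:]) { acc, name ->
        if( env.containsKey(name) )
            acc.put(name, env.get(name))
        acc
    }
}

// Example values are made up for illustration.
def env = [HOSTTYPE: 'x86_64-linux', HOME: '/home/user']
def selected = selectEnv(['HOSTTYPE', 'EDITOR'], env)

// EDITOR is not set, so it is left out; HOME is set but was not requested.
println selected
```

Under these assumptions, only `HOSTTYPE` ends up in the result, because it is the only name that is both listed and present in the environment.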

Here is an example config based on the BCO User Guide:

```groovy
prov {
    formats {
        bco {
            provenance_domain {
                review = [
                    [
                        "status": "approved",
                        "reviewer_comment": "Approved by GW staff. Waiting for approval from FDA Reviewer",
                        "date": "2017-11-12T12:30:48-0400",
                        "reviewer": [
                            "name": "Charles Hadley King",
                            "affiliation": "George Washington University",
                            "email": "[email protected]",
                            "contribution": "curatedBy",
                            "orcid": "https://orcid.org/0000-0003-1409-4549"
                        ]
                    ],
                    [
                        "status": "approved",
                        "reviewer_comment": "The revised BCO looks fine",
                        "date": "2017-12-12T12:30:48-0400",
                        "reviewer": [
                            "name": "Eric Donaldson",
                            "affiliation": "FDA",
                            "email": "[email protected]",
                            "contribution": "curatedBy"
                        ]
                    ]
                ]
                derived_from = 'https://example.com/BCO_948701/1.0'
                obsolete_after = '2118-09-26T14:43:43-0400'
                embargo = [
                    "start_time": "2000-09-26T14:43:43-0400",
                    "end_time": "2000-09-26T14:43:45-0400"
                ]
            }
            usability_domain = [
                "Identify baseline single nucleotide polymorphisms (SNPs)[SO:0000694], (insertions)[SO:0000667], and (deletions)[SO:0000045] that correlate with reduced (ledipasvir)[pubchem.compound:67505836] antiviral drug efficacy in (Hepatitis C virus subtype 1)[taxonomy:31646]",
                "Identify treatment emergent amino acid (substitutions)[SO:1000002] that correlate with antiviral drug treatment failure",
                "Determine whether the treatment emergent amino acid (substitutions)[SO:1000002] identified correlate with treatment failure involving other drugs against the same virus",
                "GitHub CWL example: https://github.com/mr-c/hive-cwl-examples/blob/master/workflow/hive-viral-mutation-detection.cwl#L20"
            ]
            description_domain {
                keywords = [
                    "HCV1a",
                    "Ledipasvir",
                    "antiviral resistance",
                    "SNP",
                    "amino acid substitutions"
                ]
                xref = [
                    [
                        "namespace": "pubchem.compound",
                        "name": "PubChem-compound",
                        "ids": ["67505836"],
                        "access_time": "2018-13-02T10:15-05:00"
                    ],
                    [
                        "namespace": "pubmed",
                        "name": "PubMed",
                        "ids": ["26508693"],
                        "access_time": "2018-13-02T10:15-05:00"
                    ],
                    [
                        "namespace": "so",
                        "name": "Sequence Ontology",
                        "ids": ["SO:000002", "SO:0000694", "SO:0000667", "SO:0000045"],
                        "access_time": "2018-13-02T10:15-05:00"
                    ],
                    [
                        "namespace": "taxonomy",
                        "name": "Taxonomy",
                        "ids": ["31646"],
                        "access_time": "2018-13-02T10:15-05:00"
                    ]
                ]
            }
            execution_domain {
                external_data_endpoints = [
                    [
                        "url": "protocol://domain:port/application/path",
                        "name": "generic name"
                    ],
                    [
                        "url": "ftp://data.example.com:21/",
                        "name": "access to ftp server"
                    ],
                    [
                        "url": "http://eutils.ncbi.nlm.nih.gov/entrez/eutils",
                        "name": "access to e-utils web service"
                    ]
                ]
                environment_variables = ["HOSTTYPE", "EDITOR"]
            }
        }
    }
}
```
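The options do not all have to be set together. Judging by the renderer changes in this commit, each option is read independently, with an empty list or a null value used when it is missing. A shorter config could therefore look like the following sketch, in which the keyword values and the URL are placeholders rather than recommended settings:

```groovy
prov {
    formats {
        bco {
            // Only two of the pass-through options are set here;
            // the remaining ones are simply left out.
            provenance_domain {
                derived_from = 'https://example.com/BCO_948701/1.0'
            }
            description_domain {
                keywords = ["HCV1a", "SNP"]
            }
        }
    }
}
```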

Alternatively, you can set each option from a pipeline parameter, which makes it easier for an external system to supply the values:

```groovy
prov {
    formats {
        bco {
            provenance_domain {
                review = params.bco_provenance_domain_review
                derived_from = params.bco_provenance_domain_derived_from
                obsolete_after = params.bco_provenance_domain_obsolete_after
                embargo = params.bco_provenance_domain_embargo
            }
            usability_domain = params.bco_usability_domain
            description_domain {
                keywords = params.bco_description_domain_keywords
                xref = params.bco_description_domain_xref
            }
            execution_domain {
                external_data_endpoints = params.bco_execution_domain_external_data_endpoints
                environment_variables = params.bco_execution_domain_environment_variables
            }
        }
    }
}
```

This way, the pass-through options can be provided as JSON in a [params file](https://nextflow.io/docs/latest/reference/cli.html#run):

```jsonc
{
    "bco_provenance_domain_review": [
        // ...
    ],
    "bco_provenance_domain_derived_from": "...",
    "bco_provenance_domain_obsolete_after": "...",
    "bco_provenance_domain_embargo": {
        "start_time": "...",
        "end_time": "..."
    },
    "bco_usability_domain": [
        // ...
    ],
    "bco_description_domain_keywords": [
        // ...
    ],
    "bco_description_domain_xref": [
        // ...
    ],
    "bco_execution_domain_external_data_endpoints": [
        // ...
    ],
    "bco_execution_domain_environment_variables": [
        // ...
    ]
}
```
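As a rough sketch of the external-system side, the following Groovy script builds such a params map and prints it as JSON. The helper `buildBcoParams` and the sample values are hypothetical; only the `bco_` parameter naming follows the config shown above.

```groovy
import groovy.json.JsonOutput

// Hypothetical helper: prefix each field name with "bco_" so that it matches
// the params referenced in the config (e.g. params.bco_usability_domain).
Map<String,Object> buildBcoParams(Map<String,Object> fields) {
    fields.collectEntries { name, value -> ["bco_${name}".toString(), value] }
}

// Sample values are placeholders for whatever the external system knows.
def bcoParams = buildBcoParams([
    provenance_domain_derived_from: 'https://example.com/BCO_948701/1.0',
    description_domain_keywords: ['HCV1a', 'SNP'],
    execution_domain_environment_variables: ['HOSTTYPE', 'EDITOR']
])

// Print the JSON document; redirect the output to a file to use it as a params file.
println JsonOutput.prettyPrint(JsonOutput.toJson(bcoParams))
```

The resulting file can then be passed to the run with the `-params-file` option described in the linked CLI reference.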
2 changes: 1 addition & 1 deletion README.md
@@ -44,7 +44,7 @@ Configuration scope for the desired output formats. The following formats are av

- `bco`: Render a [BioCompute Object](https://biocomputeobject.org/). Supports the `file` and `overwrite` options.

Visit the [BCO User Guide](https://docs.biocomputeobject.org/user_guide/) to learn more about this format and how to extend it with information that isn't available to Nextflow.
*New in version 1.3.0*: additional "pass-through" options are available for BCO fields that can't be inferred from the pipeline. See [BCO.md](./BCO.md) for more information.

- `dag`: Render the task graph as a Mermaid diagram embedded in an HTML document. Supports the `file` and `overwrite` options.

18 changes: 7 additions & 11 deletions plugins/nf-prov/build.gradle
@@ -56,21 +56,17 @@ sourceSets {

dependencies {
// This dependency is exported to consumers, that is to say found on their compile classpath.
compileOnly 'io.nextflow:nextflow:23.04.0'
compileOnly 'io.nextflow:nextflow:24.10.0'
compileOnly 'org.slf4j:slf4j-api:1.7.10'
compileOnly 'org.pf4j:pf4j:3.4.1'
// add here plugins depepencies
compileOnly 'org.pf4j:pf4j:3.12.0'

// test configuration
testImplementation "org.codehaus.groovy:groovy:3.0.8"
testImplementation "org.codehaus.groovy:groovy-nio:3.0.8"
testImplementation 'io.nextflow:nextflow:23.04.0'
testImplementation ("org.codehaus.groovy:groovy-test:3.0.8") { exclude group: 'org.codehaus.groovy' }
testImplementation 'io.nextflow:nextflow:24.10.0'
testImplementation ("cglib:cglib-nodep:3.3.0")
testImplementation ("org.objenesis:objenesis:3.1")
testImplementation ("org.spockframework:spock-core:2.0-M3-groovy-3.0") { exclude group: 'org.codehaus.groovy'; exclude group: 'net.bytebuddy' }
testImplementation ('org.spockframework:spock-junit4:2.0-M3-groovy-3.0') { exclude group: 'org.codehaus.groovy'; exclude group: 'net.bytebuddy' }
testImplementation ('com.google.jimfs:jimfs:1.1')
testImplementation ("org.objenesis:objenesis:3.2")
testImplementation ("org.spockframework:spock-core:2.3-groovy-4.0") { exclude group: 'org.codehaus.groovy'; exclude group: 'net.bytebuddy' }
testImplementation ('org.spockframework:spock-junit4:2.3-groovy-4.0') { exclude group: 'org.codehaus.groovy'; exclude group: 'net.bytebuddy' }
testImplementation ('com.google.jimfs:jimfs:1.2')

// see https://docs.gradle.org/4.1/userguide/dependency_management.html#sec:module_replacement
modules {
57 changes: 47 additions & 10 deletions plugins/nf-prov/src/main/nextflow/prov/BcoRenderer.groovy
@@ -23,10 +23,14 @@ import java.time.format.DateTimeFormatter
import groovy.json.JsonOutput
import groovy.transform.CompileStatic
import nextflow.Session
import nextflow.SysEnv
import nextflow.config.Manifest
import nextflow.processor.TaskRun
import nextflow.script.WorkflowMetadata
import nextflow.util.CacheHelper

import static nextflow.config.Manifest.ContributionType

/**
* Renderer for the BioCompute Object (BCO) format.
*
@@ -63,10 +67,21 @@ class BcoRenderer implements Renderer {
final nextflowMeta = metadata.nextflow

final dateCreated = DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(metadata.start)
final authors = (manifest.author ?: '').tokenize(',')*.trim()
final contributors = getContributors(manifest)
final nextflowVersion = nextflowMeta.version.toString()
final params = session.config.params as Map

final config = session.config
final review = config.navigate('prov.formats.bco.provenance_domain.review', []) as List<Map<String,?>>
final derived_from = config.navigate('prov.formats.bco.provenance_domain.derived_from') as String
final obsolete_after = config.navigate('prov.formats.bco.provenance_domain.obsolete_after') as String
final embargo = config.navigate('prov.formats.bco.provenance_domain.embargo') as Map<String,String>
final usability = config.navigate('prov.formats.bco.usability_domain', []) as List<String>
final keywords = config.navigate('prov.formats.bco.description_domain.keywords', []) as List<String>
final xref = config.navigate('prov.formats.bco.description_domain.xref', []) as List<Map<String,?>>
final external_data_endpoints = config.navigate('prov.formats.bco.execution_domain.external_data_endpoints', []) as List<Map<String,String>>
final environment_variables = config.navigate('prov.formats.bco.execution_domain.environment_variables', []) as List<String>

// create BCO manifest
final bco = [
"object_id": null,
@@ -75,18 +90,20 @@ class BcoRenderer implements Renderer {
"provenance_domain": [
"name": manifest.name ?: "",
"version": manifest.version ?: "",
"review": review,
"derived_from": derived_from,
"obsolete_after": obsolete_after,
"embargo": embargo,
"created": dateCreated,
"modified": dateCreated,
"contributors": authors.collect( name -> [
"contribution": ["authoredBy"],
"name": name
] ),
"license": ""
"contributors": contributors,
"license": manifest.license
],
"usability_domain": [],
"usability_domain": usability,
"extension_domain": [],
"description_domain": [
"keywords": [],
"keywords": keywords,
"xref": xref,
"platform": ["Nextflow"],
"pipeline_steps": tasks.sort( (task) -> task.id ).collect { task -> [
"step_number": task.id,
@@ -112,8 +129,12 @@ class BcoRenderer implements Renderer {
]
]
],
"external_data_endpoints": [],
"environment_variables": [:]
"external_data_endpoints": external_data_endpoints,
"environment_variables": environment_variables.inject([:]) { acc, name ->
if( SysEnv.containsKey(name) )
acc.put(name, SysEnv.get(name))
acc
}
],
"parametric_domain": params.toConfigObject().flatten().collect( (k, v) -> [
"param": k,
@@ -171,4 +192,20 @@ class BcoRenderer implements Renderer {
path.text = JsonOutput.prettyPrint(JsonOutput.toJson(bco))
}

private List getContributors(Manifest manifest) {
manifest.contributors.collect { c -> [
"name": c.name,
"affiliation": c.affiliation,
"email": c.email,
"contribution": c.contribution.collect { ct -> CONTRIBUTION_TYPES[ct] },
"orcid": c.orcid
] }
}

private static Map<ContributionType, String> CONTRIBUTION_TYPES = [
(ContributionType.AUTHOR) : "authoredBy",
(ContributionType.MAINTAINER) : "curatedBy",
(ContributionType.CONTRIBUTOR) : "curatedBy",
]

}
16 changes: 8 additions & 8 deletions plugins/nf-prov/src/main/nextflow/prov/DagRenderer.groovy
@@ -64,8 +64,8 @@ class DagRenderer implements Renderer {
}

private Map<TaskRun,Vertex> getVertices(Set<TaskRun> tasks) {
def result = [:]
for( def task : tasks ) {
Map<TaskRun,Vertex> result = [:]
for( final task : tasks ) {
final inputs = task.getInputFilesMap()
final outputs = ProvHelper.getTaskOutputs(task)

@@ -154,7 +154,7 @@ class DagRenderer implements Renderer {
}

// render task outputs
final outputs = [:] as Map<Path,String>
Map<Path,String> outputs = [:]

dag.vertices.each { task, vertex ->
vertex.outputs.each { path ->
@@ -184,11 +184,11 @@ class DagRenderer implements Renderer {
* @param vertices
*/
private Map getTaskTree(Map<TaskRun,Vertex> vertices) {
def taskTree = [:]
final taskTree = [:]

for( def entry : vertices ) {
def task = entry.key
def vertex = entry.value
for( final entry : vertices ) {
final task = entry.key
final vertex = entry.value

// infer subgraph keys from fully qualified process name
final result = getSubgraphKeys(task.processor.name)
@@ -200,7 +200,7 @@ class DagRenderer implements Renderer {

// navigate to given subgraph
def subgraph = taskTree
for( def key : keys ) {
for( final key : keys ) {
if( key !in subgraph )
subgraph[key] = [:]
subgraph = subgraph[key]
2 changes: 1 addition & 1 deletion plugins/nf-prov/src/resources/META-INF/MANIFEST.MF
@@ -3,4 +3,4 @@ Plugin-Id: nf-prov
Plugin-Version: 1.2.4
Plugin-Class: nextflow.prov.ProvPlugin
Plugin-Provider: nextflow
Plugin-Requires: >=23.04.0
Plugin-Requires: >=24.10.0
