Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roslin qc #151

Open
wants to merge 46 commits into
base: develop
Choose a base branch
from
Open

Roslin qc #151

wants to merge 46 commits into from

Conversation

stevekm
Copy link
Member

@stevekm stevekm commented Feb 24, 2020

Roslin QC Operator

@stevekm
Copy link
Member Author

stevekm commented Feb 24, 2020

Travis CI Fails with ModuleNotFoundError: No module named 'gitdb.utils.compat'; @sivkovic what was the fix for this?

@stevekm stevekm requested a review from sivkovic February 24, 2020 17:53
@sivkovic
Copy link
Collaborator

@stevekm fix for this was version bump of GitPython https://github.com/mskcc/beagle/blob/develop/requirements.txt#L11. Fix is already on develop. Did you merge develop back to your branch?

@stevekm
Copy link
Member Author

stevekm commented Feb 24, 2020

needs to be updated with the new Roslin CWL output format for operator input

@stevekm
Copy link
Member Author

stevekm commented Feb 24, 2020

This is blocked by #154 until we can figure out a way to get copies of the updated Roslin pipeline output format into Beagle.

@Harmon758
Copy link

Travis CI Fails with ModuleNotFoundError: No module named 'gitdb.utils.compat'; @sivkovic what was the fix for this?

@stevekm fix for this was version bump of GitPython https://github.com/mskcc/beagle/blob/develop/requirements.txt#L11. Fix is already on develop. Did you merge develop back to your branch?

See gitpython-developers/GitPython#983.

Test that the API Run Create Serializer works and creates a Run
"""
# start with 0 runs in the database
self.assertEqual(len(Run.objects.all()), 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes much more sense to be tested through POST /v0/run/api/, but we can leave it for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you POST to <localhost:1234>/v0/run/api/ you are going to get the real server, not the test-instance.


# merge the dict's; the second dict will overwrite args from the first
normal = {**normal_r1, **normal_r2}
normal['R1'] = file_to_cwl(pair['normal']['R1'].file)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this converted to CWL format? Operator needs to create a job where files are referenced by uri, which means they should be specified with the location instead of the path and uri should be either in format juno://<file_path> or bid://<file_id_in_beagle>.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not clear when I first started building the Operator, but the code works well for producing the CWL output needed in order to develop the Operator so I am not sure what to do with it now that the Operator has been refactored to output Job JSON instead of CWL JSON

runner/operator/roslin_qc_operator/bin/input.py Outdated Show resolved Hide resolved
runparams['pi_email'] = "NA"
return(runparams)

def get_db_files(assay, references_json = "runner/operator/roslin_operator/reference_jsons/roslin_resources.json"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8. There shouldn't be in kwargs between =
def get_db_files(assay, references_json="runner/operator/roslin_operator/reference_jsons/roslin_resources.json"):

runner/operator/roslin_qc_operator/bin/input.py Outdated Show resolved Hide resolved
db_files['ref_fasta'] = references["request_files"]['ref_fasta']

# need to resolve some items to URI's
for key in ['fp_genotypes', 'hotspot_list_maf', 'ref_fasta']:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these files specified in references_json runner/operator/roslin_operator/reference_jsons/roslin_resources.json? If they are why do we have this logic to check is it a path or uri?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because some of the paths must be sent to CWL as character strings because they correspond to files inside containers, while other items must be passed as CWL File records and must be sent as URI's for Beagle to convert back to CWL.

runner/operator/roslin_qc_operator/roslin_qc_operator.py Outdated Show resolved Hide resolved
Operator.__init__(self, model, request_id = request_id, run_ids = run_ids)

def get_pipeline_id(self):
return "9b7f2ac8-03a5-4c44-ae87-1d9f6500d19a"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this anymore as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to remove it and things started breaking. Not sure why I think it has to do with the recent changes to how Operators work.

@@ -95,6 +96,9 @@ def process_triggers():
@shared_task
def create_run_task(run_id, inputs, output_directory=None):
logger.info("Creating and validating Run")
if settings.DUMP_JSON:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like putting this in a task. Prints like this for debug are fine, but I think we should remove them from production code. I don't like the idea of generating files in which are not used anywere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required for development, debug, and manual run setup as per #162 and #150. Otherwise there is no straightforward method to actually see what your Job JSON and final output CWL look like. Its easier to leave it in as a feature than to have to re-add it every single time you need to use it for debugging.

for request_id in request_ids:
logging.info("Submitting requestId %s to pipeline %s" % (request_id, pipeline_name))
create_jobs_from_request.delay(request_id, pipeline.operator_id)
if pipeline_name == "roslin-qc":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't we handling roslin-qc operator the same way as any other operator? request_handlers shouldn't have any operator specific logic in them.

Copy link
Member Author

@stevekm stevekm Feb 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because Roslin QC Operator requires that all request_ids and run_ids be passed, while other Operators only currently support a single request_id arg. Trying to refactor all other Operators to be able to use this would take a lot more work. Also, its less confusing when the logic for dictating the handling behavior is placed up front here near the endpoint/view, instead of buried way down inside some other module.

@stevekm
Copy link
Member Author

stevekm commented Mar 4, 2020

@sivkovic the copy outputs operator is failing the tests, I am not sure what is wrong with it; https://travis-ci.com/mskcc/beagle/builds/151844563#L2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants