
Implement API endpoint that returns TSV report about submissions that are pending review #1443

Draft · wants to merge 7 commits into main

Conversation

@cristina-stonepedraza commented Nov 12, 2024

Add a script to api.py and a query to crud.py to pull information from the submission portal database, and then generate a TSV report of NMDC submissions that have been submitted.

This PR references issue 2047 in nmdc-schema: microbiomedata/nmdc-schema#2047
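For context, here is a minimal, self-contained sketch of the "generate a TSV report" part using only the Python standard library; the column names and sample values below are placeholders, not the PR's actual fields.

import csv
import io

# Placeholder rows standing in for data pulled from the submission portal database.
rows = [
    ("Sample A", "soil", "broad scale value", "local scale value", "medium value"),
    ("Sample B", "water", "broad scale value", "local scale value", "medium value"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Sample Name", "Package", "Broad Scale", "Local Scale", "Medium"])
writer.writerows(rows)

tsv_report = buffer.getvalue()  # this string is what the endpoint would return as its body
print(tsv_report)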

@eecavanna (Collaborator) left a comment

Hi @cristina-stonepedraza, I think this looks great overall! The code looks clean to me and I had no issues following it.

I left some feedback about path, function, and variable naming, as well as one comment about the endpoint description. If you have any questions about these, you can message me here, on Slack, etc.

There is one more thing I want to see in this PR:

  • An automated test

I posted information about that below.

Comment on lines +647 to +650
r"""
Generate a report of NMDC submissions that are submitted pending review to allow review
of submission triads and evaluate for approval
"""

I revised the endpoint description after looking at the code. Here's what I came up with:

Suggested change
- r"""
- Generate a report of NMDC submissions that are submitted pending review to allow review
- of submission triads and evaluate for approval
- """
+ r"""
+ Generate a TSV-formatted report of biosamples belonging to submissions
+ that have a status of "Submitted- Pending Review".
+ The report indicates which environmental package/extension, broad scale,
+ local scale, and medium are specified for each biosample. The report is
+ designed to facilitate the review of submissions by NMDC team members.
+ """

@@ -637,6 +637,97 @@ async def download_zip_file(
)


@router.get(
"/metadata_submission/mixs",

I recommend updating the URL path so that the final part describes what is being GET-ed (i.e. gotten/retrieved via the endpoint). In this case, I think of it as a "report" (I'm not too familiar with MIxS and how it relates to what the report contains; my guess is that it is the name of an ontology that the values in the "environment..." fields come from).

Suggested change
- "/metadata_submission/mixs",
+ "/metadata_submission/mixs_report",


# Get sample names from each sample type
for sample_type in sample_data:
samples = sample_data[sample_type] if sample_type in sample_data else {}

Assuming samples is typically a list (as opposed to a dictionary), I'd recommend defaulting to an empty list (instead of an empty dictionary) here.

Suggested change
- samples = sample_data[sample_type] if sample_type in sample_data else {}
+ samples = sample_data[sample_type] if sample_type in sample_data else []
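For what it's worth, dict.get gives a more concise equivalent. Here is a tiny runnable sketch of the same idea; the sample_data contents and sample type keys are placeholders:

sample_data = {"soil_data": [{"samp_name": "Sample A"}]}  # placeholder input
for sample_type in ("soil_data", "water_data"):
    # Fall back to an empty list when the sample type is absent, mirroring the suggested change.
    samples = sample_data.get(sample_type, [])
    print(sample_type, samples)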

for x in samples:

# Get the sample name
name = x["samp_name"] if "samp_name" in x else {}

Assuming name normally contains a string, I'd recommend defaulting to an empty string here (instead of an empty dictionary).

Suggested change
- name = x["samp_name"] if "samp_name" in x else {}
+ name = x["samp_name"] if "samp_name" in x else ""

I have the same (analogous) thought about broad, local, and medium below.
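The same dict.get shorthand works for these string-valued fields. A tiny sketch with a placeholder record (the same pattern applies to broad, local, and medium):

x = {"samp_name": "Sample A"}  # placeholder biosample record
# Fall back to an empty string when the key is absent, mirroring the suggested change.
name = x.get("samp_name", "")
print(name)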

Comment on lines +686 to +689
# Get the env local scale
local = x["env_local_scale"] if "env_local_scale" in x else {}
local = str(local)
local = local.replace("\t", "").replace("\r", "").replace("\n", "").lstrip("_")

I'd prefer the variable not be named local—for two reasons:

  1. The word "local" can have a special meaning in programming (i.e. local versus global) and I think future readers (of this code or stack traces involving this code) will find it confusing (at first glance).
  2. I think "local" is an adjective describing "scale". Based on that, I think the value this variable will contain is some kind of "scale".

With those things in mind, I'd recommend renaming this variable to local_scale (or even env_local_scale).

I have the same (analogous) thought about broad.
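Applied to the quoted lines, the rename would look roughly like this (keeping the existing default and whitespace-stripping logic as-is; the x record here is a placeholder so the snippet runs on its own):

x = {"env_local_scale": "some local scale value"}  # placeholder biosample record

# Get the env local scale
env_local_scale = x["env_local_scale"] if "env_local_scale" in x else {}
env_local_scale = str(env_local_scale)
env_local_scale = env_local_scale.replace("\t", "").replace("\r", "").replace("\n", "").lstrip("_")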

@@ -686,6 +686,19 @@ def get_query_for_all_submissions(db: Session):
return all_submissions


def get_query_for_submitted_pending_review(db: Session):
r"""
Returns a SQLAlchemy query that can be used to retrieve submissions pending review.

👍 Thanks for including a docstring!

@@ -686,6 +686,19 @@ def get_query_for_all_submissions(db: Session):
return all_submissions


def get_query_for_submitted_pending_review(db: Session):

I'd prefer this be named...

- def get_query_for_submitted_pending_review(db: Session):
+ def get_query_for_submitted_pending_review_submissions(db: Session):

...as a way of indicating that the function can be used to get submissions.
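Since the diff only shows the signature and docstring, here is a rough sketch of the shape such a helper usually has in SQLAlchemy; SubmissionMetadata is an assumed model name (not necessarily what this PR uses), and the status string is taken from the endpoint description suggested above:

from sqlalchemy.orm import Session

def get_query_for_submitted_pending_review_submissions(db: Session):
    r"""
    Returns a SQLAlchemy query that can be used to retrieve submissions pending review.
    """
    # `SubmissionMetadata` is a stand-in for the project's actual submission model.
    return db.query(SubmissionMetadata).filter(
        SubmissionMetadata.status == "Submitted- Pending Review"
    )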

@eecavanna changed the title from "2047 mixs endpoint" to "Implement API endpoint that returns TSV report about submissions that are pending review" on Nov 14, 2024
@eecavanna

Hi @cristina-stonepedraza, here's an example of an automated test that targets an endpoint like this one:

# Imports these tests rely on; `fakes`, `SubmissionEditorRole`, and the fixtures
# (`db`, `client`, `logged_in_user`, `logged_in_admin_user`) come from the existing
# nmdc-server test suite.
from csv import DictReader
from datetime import datetime, timedelta

from fastapi.testclient import TestClient
from sqlalchemy.orm import Session


def test_get_metadata_submissions_report_as_non_admin(
    db: Session, client: TestClient, logged_in_user
):
    response = client.request(method="GET", url="/api/metadata_submission/report")
    assert response.status_code == 403


def test_get_metadata_submissions_report_as_admin(
    db: Session, client: TestClient, logged_in_admin_user
):
    now = datetime.utcnow()

    # Create two submissions, only one of which is owned by the logged-in user.
    logged_in_user = logged_in_admin_user  # allows us to reuse some code snippets
    submission = fakes.MetadataSubmissionFactory(
        author=logged_in_user,
        author_orcid=logged_in_user.orcid,
        created=now,
    )
    fakes.SubmissionRoleFactory(
        submission=submission,
        submission_id=submission.id,
        user_orcid=logged_in_user.orcid,
        role=SubmissionEditorRole.owner,
    )
    other_user = fakes.UserFactory()
    other_submission = fakes.MetadataSubmissionFactory(
        author=other_user,
        author_orcid=other_user.orcid,
        created=now + timedelta(seconds=1),
        metadata_submission={
            "studyForm": {
                "studyName": "My study name",
                "piName": "My PI name",
                "piEmail": "My PI email",
            },
        },
        status="in-progress",
        source_client="field_notes",
    )
    db.commit()

    response = client.request(method="GET", url="/api/metadata_submission/report")
    assert response.status_code == 200

    # Confirm the response payload is a TSV file having the fields and values we expect;
    # i.e. below its header row, it has two data rows, each representing a submission,
    # ordered from most recently-created to least recently-created.
    # Reference: https://docs.python.org/3/library/csv.html#csv.DictReader
    fieldnames = [
        "Submission ID",
        "Author ORCID",
        "Author Name",
        "Study Name",
        "PI Name",
        "PI Email",
        "Source Client",
        "Status",
    ]
    reader = DictReader(response.text.splitlines(), fieldnames=fieldnames, delimiter="\t")
    rows = [row for row in reader]
    assert len(rows) == 3  # includes the header row
    header_row = rows[0]  # gets the header row
    assert len(list(header_row.keys())) == len(fieldnames)
    data_row = rows[1]  # gets the first data row (the most recently-created submission)
    assert data_row["Submission ID"] == str(other_submission.id)
    assert data_row["Author ORCID"] == other_user.orcid
    assert data_row["Author Name"] == other_user.name
    assert data_row["Study Name"] == "My study name"
    assert data_row["PI Name"] == "My PI name"
    assert data_row["PI Email"] == "My PI email"
    assert data_row["Source Client"] == "field_notes"
    assert data_row["Status"] == "in-progress"
    data_row = rows[2]  # gets the second data row
    assert data_row["Submission ID"] == str(submission.id)
    assert data_row["Author ORCID"] == logged_in_user.orcid
    assert data_row["Author Name"] == logged_in_user.name
    assert data_row["Study Name"] == ""
    assert data_row["PI Name"] == ""
    assert data_row["PI Email"] == ""
    assert data_row["Source Client"] == ""  # upstream faker lacks `source_client` attribute
    assert data_row["Status"] == "In Progress"  # matches value in upstream faker

There are actually two tests there because the endpoint in question is only accessible to admins. One of the tests focuses on that aspect (i.e. security).

By the way, I do wonder whether we will restrict access to this reporting endpoint also. You/we can discuss that with @mslarae13.
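If we do decide to restrict it, here is a minimal, self-contained sketch of one way to gate a FastAPI route to admins; get_current_user, the is_admin flag, and the route path are placeholders rather than the project's actual auth helpers:

from fastapi import Depends, FastAPI, HTTPException

app = FastAPI()

def get_current_user():
    # Placeholder dependency; the real app would pull the user from its auth machinery.
    class User:
        is_admin = False
    return User()

def require_admin(user=Depends(get_current_user)):
    # Reject non-admin callers with 403, matching the first test above.
    if not getattr(user, "is_admin", False):
        raise HTTPException(status_code=403, detail="Admins only")
    return user

@app.get("/api/metadata_submission/report")
def get_metadata_submissions_report(user=Depends(require_admin)):
    return "..."  # the TSV payload would be returned here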
