Use `read_csv_to_dataframe` in validation #1419

emlys · 2023-09-26T16:54:54Z

Description

Fixes #1379

This sets us up for #327.

Split out some of the functionality of `utils.read_csv_to_dataframe` into a new function, `validation.get_validated_dataframe`. This new function enforces a contract: given a CSV path and a spec, `get_validated_dataframe` either returns a dataframe that matches the spec or raises an error if that's not possible. `utils.read_csv_to_dataframe` now handles the lower-level CSV parsing details.
`validation.get_validated_dataframe` is used both in validation and in `execute`. If it raises an error during validation, that error message is returned as a validation message; if it raises an error in `execute`, the error is raised. This is convenient because we get the same message in both contexts without duplicating code.
All tables are now read in via `utils.read_csv_to_dataframe`: indirectly through `validation.get_validated_dataframe` in most cases, or directly in a few places (the HRA criteria table and the Wave Energy machine performance table). This lets us apply the same basic rules to all CSVs, such as encoding, dropping empty rows/columns, and lower-casing column names.
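A minimal sketch of the contract and of how one function can serve both contexts. It assumes a simplified spec of the form `{'columns': {name: {'type': ...}}}` and only casts `number` columns; `validate` and `execute` here are hypothetical stand-ins for a model's validation and execute entry points, not the real InVEST functions.

```python
import io

import pandas as pd

# Hypothetical, simplified spec format: {'columns': {name: {'type': ...}}}
SPEC = {'columns': {'id': {'type': 'number'}, 'value': {'type': 'number'}}}


def get_validated_dataframe(csv_path, spec):
    """Return a dataframe that matches `spec`, or raise a ValueError.

    Simplified sketch: checks that each spec column is present and casts
    'number' columns to float. No other types are handled here.
    """
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip().str.lower()
    for col, col_spec in spec['columns'].items():
        if col not in df.columns:
            raise ValueError(f'Expected the column "{col}" but did not find it')
        if col_spec['type'] == 'number':
            try:
                df[col] = df[col].astype(float)
            except ValueError as err:
                raise ValueError(
                    f'Values in the column "{col}" could not be interpreted '
                    f'as numbers: {err}') from err
    return df


def validate(csv_path, spec):
    """Validation context: turn the error into a validation message."""
    try:
        get_validated_dataframe(csv_path, spec)
    except ValueError as err:
        return str(err)
    return None  # no problems found


def execute(csv_path, spec):
    """Execute context: the same error propagates to the caller unchanged."""
    table = get_validated_dataframe(csv_path, spec)
    return table['value'].sum()


# The validation message for a bad table is the text execute would raise
message = validate(io.StringIO('id,value\n1,abc\n'), SPEC)
```

Because `validate` only stringifies whatever `get_validated_dataframe` raises, the message a user sees during validation is the same one `execute` would raise, without a second copy of the checks.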

Checklist

Updated HISTORY.rst and link to any relevant issue (if these changes are user-facing)
Updated the user's guide (if needed)
Tested the Workbench UI (if relevant)

emlys · 2023-09-26T17:12:48Z

src/natcap/invest/coastal_vulnerability.py

@@ -462,6 +462,13 @@
                                    "type": "integer",
                                    "about": "Shore point ID"
                                },
+                                "R_hab": {


Moved this up so that the `r_hab` column is matched before we reach the `[HABITAT]` pattern, which matches everything.
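An illustration of why the order matters, assuming that spec keys are tried in dictionary order, that the first match wins, and that a bracketed key such as `[HABITAT]` acts as a wildcard. The helpers `key_to_regex` and `match_column` and the `shore_id` key are hypothetical, not InVEST's actual matching code.

```python
import re


def key_to_regex(key):
    """Compile a spec key into a regex; "[NAME]" placeholders match anything."""
    # Split around placeholders, escape the literal pieces, rejoin with ".+"
    pieces = re.split(r'\[[^\]]+\]', key.lower())
    return re.compile('.+'.join(re.escape(piece) for piece in pieces))


def match_column(column, spec_columns):
    """Return the first spec key whose pattern matches `column`."""
    for key in spec_columns:  # dicts preserve insertion order in Python 3.7+
        if key_to_regex(key).fullmatch(column.lower()):
            return key
    raise ValueError(f'Column "{column}" does not match any key in the spec')


# "R_hab" listed before the catch-all "[HABITAT]" pattern
good_order = {'shore_id': {}, 'R_hab': {}, '[HABITAT]': {}}
# Same keys, but the catch-all comes first and shadows "R_hab"
bad_order = {'shore_id': {}, '[HABITAT]': {}, 'R_hab': {}}
```

With `good_order`, `match_column('r_hab', good_order)` returns `'R_hab'`; with `bad_order`, the wildcard claims the column first and the result is `'[HABITAT]'`.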

emlys · 2023-11-08T17:57:58Z

src/natcap/invest/wave_energy.py

-                    "units": u.kilowatt
-                },
-                "hsmax": {
+            "columns": {


I switched this to columns so that we don't have to validate the individual rows, as discussed on the last team call.
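A sketch of what a column-oriented spec for a parameter table might look like, assuming a hypothetical layout with `name` and `value` columns and one parameter per row (the `CAPMAX` and `HSMAX` rows are illustrative). Only the column headers are checked against the spec; the row names are lower-cased but not validated one by one. The spec shape and `read_machine_params` are assumptions, not InVEST's actual code.

```python
import io

import pandas as pd

# Hypothetical column-oriented spec: it describes the columns only, so the
# individual parameter rows are not validated one by one.
MACHINE_PARAM_SPEC = {
    'columns': {
        'name': {'type': 'freestyle_string'},
        'value': {'type': 'number'},
    }
}


def read_machine_params(csv_file, spec):
    """Return a {parameter name: value} dict from a name/value table."""
    df = pd.read_csv(csv_file)
    df.columns = df.columns.str.strip().str.lower()
    missing = [col for col in spec['columns'] if col not in df.columns]
    if missing:
        raise ValueError(f'Missing expected column(s): {", ".join(missing)}')
    # Lower-case the row headers so "HSMAX" and "hsmax" are treated alike
    names = df['name'].str.strip().str.lower()
    values = df['value'].astype(float)
    return dict(zip(names, values))


params = read_machine_params(
    io.StringIO('NAME,VALUE\nCAPMAX,750\nHSMAX,5.5\n'), MACHINE_PARAM_SPEC)
```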

phargogh

Thanks @emlys! I really like the new contract enforced by the validated dataframe; it really cleans things up conceptually.

I just had a few minor comments and suggestions. Would you object to adding a note to HISTORY about this? I think the main thing worth mentioning there is the improved consistency of table validation, which should make validation errors more readable. Just an idea... I defer to you on it!

phargogh · 2023-11-08T18:49:17Z

src/natcap/invest/utils.py

-def read_csv_to_dataframe(path, spec, **kwargs):
+def read_csv_to_dataframe(path, **kwargs):


Now that we're removing the spec, I think this docstring becomes much, much simpler! Could you update the docstring to reflect the current state of this function?
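Without the `spec` argument, the docstring only needs to describe the generic parsing rules. A minimal sketch of a spec-free reader, assuming the basic rules listed in the description (encoding, dropping empty rows/columns, lower-casing column names) and that extra keyword arguments are forwarded to `pandas.read_csv`. The `utf-8-sig` encoding choice, and which rules live here rather than in `get_validated_dataframe`, are assumptions.

```python
import io

import pandas as pd


def read_csv_to_dataframe(path, **kwargs):
    """Read a CSV into a dataframe, applying basic rules common to all tables.

    - Decodes the file as UTF-8, ignoring a leading byte-order mark.
    - Drops rows and columns that contain no data.
    - Strips whitespace from column names and lower-cases them.

    Args:
        path (str or file-like): the CSV to read.
        **kwargs: additional keyword arguments passed to ``pandas.read_csv``.

    Returns:
        pandas.DataFrame
    """
    df = pd.read_csv(path, encoding='utf-8-sig', **kwargs)
    # how='all' only drops rows/columns in which every value is missing
    df = df.dropna(how='all').dropna(axis='columns', how='all')
    df.columns = df.columns.str.strip().str.lower()
    return df


# Example input: a byte-order mark, an all-empty row, and an empty last column
example = read_csv_to_dataframe(
    io.BytesIO(b'\xef\xbb\xbfID, Value ,\n1,2,\n,,\n3,4,\n'))
```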

phargogh · 2023-11-08T19:09:00Z

src/natcap/invest/validation.py

+                elif col_spec['type'] == 'boolean':
+                    df[col] = df[col].astype('boolean')


Should we have an else base case, in case something falls through the other cases?

Sure! That should never happen with our code because the model specs are tested, but it would be good to have in case of plugins.
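A sketch of the casting step with the suggested `else` base case, assuming a simplified spec where each column declares a `type` (the `number` and `integer` names, and the messages, are assumptions; only `boolean` appears in the diff above). An unrecognized type, for example from a plugin, raises instead of falling through silently, and any exception raised while casting is reported as a `ValueError`, in the spirit of the later commit "handle any type of exception in dataframe type casting".

```python
import pandas as pd


def cast_columns(df, columns_spec):
    """Cast each column of `df` to the dtype implied by its spec type."""
    for col, col_spec in columns_spec.items():
        col_type = col_spec['type']
        if col_type == 'number':
            dtype = float
        elif col_type == 'integer':
            dtype = 'Int64'  # pandas nullable integer dtype
        elif col_type == 'boolean':
            dtype = 'boolean'  # pandas nullable boolean dtype
        else:
            # Base case: tested model specs never get here, but a plugin's
            # spec might declare a type this function doesn't know about
            raise ValueError(
                f'Column "{col}" has an unsupported type: "{col_type}"')
        try:
            df[col] = df[col].astype(dtype)
        except Exception as err:
            # Casting can fail with different exception types depending on
            # the target dtype, so report any failure the same way
            raise ValueError(
                f'Could not interpret the values in column "{col}" as '
                f'{col_type}: {err}') from err
    return df
```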

phargogh · 2023-11-08T19:27:54Z

src/natcap/invest/coastal_vulnerability.py

@@ -462,6 +462,13 @@
                                    "type": "integer",
                                    "about": "Shore point ID"
                                },
+                                "R_hab": {


Thanks for pointing this out! Could you add an inline comment about this so we don't forget in the future?

phargogh · 2023-11-08T20:00:39Z

src/natcap/invest/wave_energy.py

-                    "units": u.kilowatt
-                },
-                "hsmax": {
+            "columns": {


Yep, makes sense! Could you add an inline comment about this so we are sure to remember?

emlys · 2023-11-09T22:41:25Z

@phargogh I think I addressed all your comments

phargogh

Looks great, thank you!

emlys added 2 commits September 25, 2023 14:29

refactor parts of read_csv_to_dataframe to new func get_validated_dataframe

39f4aca

tests passing

b4ea905

emlys commented Sep 26, 2023

emlys added 7 commits September 26, 2023 10:17

clean up

12a6b63

remove dropping empty rows from read_csv_to_dataframe to get_validated_dataframe

6ee7d38

remove duplicated kwarg

b865b1a

read in wave energy tables with row headers lowercased

8f6c5e1

Merge branch 'main' into task/1379

6e45494

update machine param table spec to opt out of row validation

6c66244

clean up

d4aa41f

emlys self-assigned this Nov 7, 2023

emlys added 3 commits November 7, 2023 15:29

update user guide commit hash

9c814af

handle any type of exception in dataframe type casting

99db5f5

update user guide commit hash

71b09a1

emlys marked this pull request as ready for review November 8, 2023 17:32

emlys requested a review from phargogh November 8, 2023 17:32

emlys commented Nov 8, 2023

phargogh requested changes Nov 8, 2023

emlys and others added 2 commits November 8, 2023 13:04

Update src/natcap/invest/validation.py

9428057

Co-authored-by: James Douglass <[email protected]>

Update tests/test_habitat_quality.py

5d5c4ef

Co-authored-by: James Douglass <[email protected]>

emlys mentioned this pull request Nov 8, 2023

Abstraction for model input pre-processing #1451

Open

emlys and others added 5 commits November 8, 2023 15:55

clean up for natcap#1419

1effbb6

add history note natcap#1379

0891e97

Merge branch 'main' into task/1379

bec9cc1

fix syntax errors

3ad9f01

Merge branch 'task/1379' of https://github.com/emlys/invest into task/1379

c2e13ea

emlys requested a review from phargogh November 9, 2023 22:41

phargogh enabled auto-merge November 9, 2023 22:55

phargogh approved these changes Nov 9, 2023

phargogh merged commit 6769776 into natcap:main Nov 9, 2023
25 checks passed

emlys deleted the task/1379 branch October 3, 2024 22:52
