Biological and physical data not matched up for a number of data sets #25

Open
clayton33 opened this issue Apr 13, 2021 · 2 comments

@clayton33
Contributor

clayton33 commented Apr 13, 2021

When several datasets were created, specifically those that use data from station occupations, the biological (BO) and physical (PO) data were combined simply with `dplyr::bind_rows()`, which stacks the two sets of rows instead of matching them. This can be seen in the code, and also in the data itself; see the reprex below.

```r
library(azmpdata)
#> 
#>  casaultb/azmpdata status:
#>  (Package ver: 0.2019.0.9100) Up to date
#>  (Data ver:2021-01-14 ) is up to date
data("Discrete_Occupations_Stations")
library(oce)
#> Loading required package: gsw
#> Loading required package: testthat
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
d <- Discrete_Occupations_Stations
# example of using HL2 data from 2010 for a certain day
okrow <- d[['station']] %in% 'HL2' & d[['year']] %in% 2010 & d[['month']] %in% 1 & d[['day']] %in% 6
dd <- as.data.frame(d[okrow, ])
dd
#>      station latitude longitude year month day             event_id
#> 2345     HL2 44.26666 -63.31666 2010     1   6 18VA10666_2010666001
#> 2346     HL2 44.26666 -63.31666 2010     1   6 18VA10666_2010666001
#> 2347     HL2 44.26666 -63.31666 2010     1   6 18VA10666_2010666001
#> 2348     HL2 44.26666 -63.31666 2010     1   6 18VA10666_2010666001
#> 2349     HL2 44.26666 -63.31666 2010     1   6 18VA10666_2010666001
#> 2350     HL2 44.26666 -63.31666 2010     1   6 18VA10666_2010666001
#> 2351     HL2 44.26666 -63.31666 2010     1   6 18VA10666_2010666001
#> 2352     HL2 44.26666 -63.31666 2010     1   6 18VA10666_2010666001
#> 2353     HL2 44.26666 -63.31666 2010     1   6 18VA10666_2010666001
#> 2354     HL2 44.26666 -63.31666 2010     1   6 18VA10666_2010666001
#> 9595     HL2 44.26670 -63.31670 2010     1   6                 <NA>
#> 9596     HL2 44.26670 -63.31670 2010     1   6                 <NA>
#> 9597     HL2 44.26670 -63.31670 2010     1   6                 <NA>
#> 9598     HL2 44.26670 -63.31670 2010     1   6                 <NA>
#> 9599     HL2 44.26670 -63.31670 2010     1   6                 <NA>
#> 9600     HL2 44.26670 -63.31670 2010     1   6                 <NA>
#> 9601     HL2 44.26670 -63.31670 2010     1   6                 <NA>
#> 9602     HL2 44.26670 -63.31670 2010     1   6                 <NA>
#> 9603     HL2 44.26670 -63.31670 2010     1   6                 <NA>
#>              sample_id depth nominal_depth chlorophyll nitrate phosphate
#> 2345 2010666001_306670     1             1       0.453   3.921     0.709
#> 2346 2010666001_306669     5             5       0.396   3.899     0.695
#> 2347 2010666001_306668    10            10       0.404   3.963     0.704
#> 2348 2010666001_306667    20            20       0.404   3.900     0.694
#> 2349 2010666001_306666    30            30       0.388   3.820     0.690
#> 2350 2010666001_306665    40            40       0.356   3.798     0.673
#> 2351 2010666001_306664    50            50       0.356   3.632     0.660
#> 2352 2010666001_306663    75            75       0.247   3.870     0.688
#> 2353 2010666001_306662   100           100       0.117   5.800     0.834
#> 2354 2010666001_306661   140           140       0.082   9.479     1.054
#> 9595              <NA>     0            NA          NA      NA        NA
#> 9596              <NA>    10            NA          NA      NA        NA
#> 9597              <NA>    20            NA          NA      NA        NA
#> 9598              <NA>    30            NA          NA      NA        NA
#> 9599              <NA>    50            NA          NA      NA        NA
#> 9600              <NA>    75            NA          NA      NA        NA
#> 9601              <NA>   100            NA          NA      NA        NA
#> 9602              <NA>   125            NA          NA      NA        NA
#> 9603              <NA>   150            NA          NA      NA        NA
#>      silicate cruiseNumber sea_temperature salinity sigmaTheta descriptor
#> 2345    5.410         <NA>              NA       NA         NA       <NA>
#> 2346    5.408         <NA>              NA       NA         NA       <NA>
#> 2347    5.359         <NA>              NA       NA         NA       <NA>
#> 2348    5.343         <NA>              NA       NA         NA       <NA>
#> 2349    4.826         <NA>              NA       NA         NA       <NA>
#> 2350    4.646         <NA>              NA       NA         NA       <NA>
#> 2351    4.540         <NA>              NA       NA         NA       <NA>
#> 2352    4.492         <NA>              NA       NA         NA       <NA>
#> 2353    7.244         <NA>              NA       NA         NA       <NA>
#> 2354   10.417         <NA>              NA       NA         NA       <NA>
#> 9595       NA   BCD2010666            3.18    30.78      24.50  18VA10666
#> 9596       NA   BCD2010666            3.18    30.77      24.50  18VA10666
#> 9597       NA   BCD2010666            3.18    30.77      24.50  18VA10666
#> 9598       NA   BCD2010666            3.38    30.94      24.61  18VA10666
#> 9599       NA   BCD2010666            3.75    31.20      24.78  18VA10666
#> 9600       NA   BCD2010666            3.92    31.32      24.87  18VA10666
#> 9601       NA   BCD2010666            3.96    31.79      25.24  18VA10666
#> 9602       NA   BCD2010666            4.29    32.36      25.66  18VA10666
#> 9603       NA   BCD2010666            5.54    33.05      26.07  18VA10666
```
Created on 2021-04-13 by the reprex package (v2.0.0)
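The pattern in the reprex, where no row carries both a biological and a physical value, can also be checked programmatically. A minimal base R sketch, assuming the column names shown in the reprex output; the function name `count_matched_rows` is hypothetical, and the four rows are copied from the HL2, 2010-01-06 output above.

```r
# Hypothetical diagnostic: classify rows by whether they hold BO values, PO values, or both.
# After a plain bind_rows() stack, the "both" count is expected to be 0.
count_matched_rows <- function(df, bio_cols, phys_cols) {
  has_bio  <- rowSums(!is.na(df[bio_cols]))  > 0  # at least one biological value present
  has_phys <- rowSums(!is.na(df[phys_cols])) > 0  # at least one physical value present
  c(bio_only  = sum(has_bio & !has_phys),
    phys_only = sum(!has_bio & has_phys),
    both      = sum(has_bio & has_phys))
}

# Two BO rows and two PO rows taken from the reprex output
stacked <- data.frame(
  depth           = c(1, 10, 0, 10),
  chlorophyll     = c(0.453, 0.404, NA, NA),
  nitrate         = c(3.921, 3.963, NA, NA),
  sea_temperature = c(NA, NA, 3.18, 3.18),
  salinity        = c(NA, NA, 30.78, 30.77)
)

count_matched_rows(stacked,
                   bio_cols  = c("chlorophyll", "nitrate"),
                   phys_cols = c("sea_temperature", "salinity"))
```

Run against a full dataset such as `Discrete_Occupations_Stations`, a zero in `both` would indicate that the BO and PO rows were stacked rather than matched.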

One potential way to combine them is by the metadata common to both the BO and PO data, specifically year, month, day, depth, station and mission descriptor. The biological data uses one naming convention for the mission descriptor, and the physical data uses another (generally referred to as the cruise number). Some effort has already gone into creating a lookup table to match them. Note that two lookup tables were created: one that is auto-generated by running some code, and another that was created manually to fill the gaps for known matches that were not identified in the database. I will follow up with a list of datasets where this should be done.
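A minimal sketch of the proposed join in base R, using `merge()` so that it runs without extra packages. It assumes the BO mission descriptor is available as a `descriptor` column (in the reprex it appears only as the prefix of `event_id`) and that the PO rows carry `cruiseNumber`. The two-column layout of the lookup tables and the way the auto-generated and manual tables are stacked are hypothetical; the values come from the HL2, 2010-01-06 example.

```r
# Sketch only: match BO and PO rows through a descriptor <-> cruiseNumber lookup.
# Column names follow the reprex output; the lookup-table layout is hypothetical.

# Three biological rows (mission descriptor naming convention)
bio <- data.frame(
  station = "HL2", year = 2010, month = 1, day = 6,
  descriptor = "18VA10666",
  depth = c(1, 10, 20),
  chlorophyll = c(0.453, 0.404, 0.404),
  stringsAsFactors = FALSE
)

# Three physical rows from the same occupation (cruise number naming convention)
phys <- data.frame(
  station = "HL2", year = 2010, month = 1, day = 6,
  cruiseNumber = "BCD2010666",
  depth = c(0, 10, 20),
  sea_temperature = c(3.18, 3.18, 3.18),
  stringsAsFactors = FALSE
)

# Hypothetical lookup tables: one auto-generated, one filled in by hand,
# stacked into a single map (the manual one is empty here)
lookup_auto   <- data.frame(descriptor = "18VA10666", cruiseNumber = "BCD2010666",
                            stringsAsFactors = FALSE)
lookup_manual <- data.frame(descriptor = character(0), cruiseNumber = character(0),
                            stringsAsFactors = FALSE)
lookup <- unique(rbind(lookup_auto, lookup_manual))

# Attach the BO-style descriptor to every physical row
phys <- merge(phys, lookup, by = "cruiseNumber", all.x = TRUE)

# Full outer join on the shared metadata: matched depths share one row,
# unmatched depths keep NA in the other source's columns
key <- c("station", "year", "month", "day", "depth", "descriptor")
combined <- merge(bio, phys, by = key, all = TRUE)
combined
```

With exact matching on `depth`, some rows stay unmatched: in the HL2 example the BO depths 1, 5, 40 and 140 m have no PO counterpart, and the PO depths 0, 125 and 150 m have no BO counterpart, so how to treat those rows would still need to be decided.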

@clayton33
Contributor Author

Datasets where the BO and PO data should be properly merged:

  • Discrete_Occupations_Sections
  • Discrete_Occupations_Stations

@clayton33
Contributor Author

clayton33 commented Apr 15, 2021

So far, this exercise has revealed a few things:

  1. The Cabot Strait stations were named incorrectly in the physical data; for example, a station was named CS1 when it should have been CSL1. I have corrected this in the code that produces the data, and the corrected data have been pushed to the FTP site.
  2. The biological data in Discrete_Occupations_Sections contain all three replicate measurements at a given station and depth. My understanding is that this dataset should include the results presented in the figures in the RES doc, i.e. the averaged (?) values. If we are including all biological measurements, why aren't we also providing all of the physical data, meaning the 1 dbar resolution data? Maybe @casaultb could clarify the decision regarding the biological data.
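Both points above translate into small clean-up steps. A minimal base R sketch, assuming that only station names of the form `CS<digits>` need the missing `L`, and that averaging the replicates per station, date and depth is what is wanted (the averaging is still an open question in this thread). The function names are hypothetical, and the chlorophyll values in the usage example are illustrative, not real measurements.

```r
# Hypothetical helper: "CS1" -> "CSL1"; names already correct (e.g. "CSL2")
# or from other lines (e.g. "HL2") are left unchanged.
fix_cabot_strait <- function(station) {
  sub("^CS([0-9]+)$", "CSL\\1", station)
}

# Hypothetical helper: collapse replicate biological measurements to one mean
# per station, date and depth. A group whose values are all NA yields NaN.
average_replicates <- function(bio, value_cols,
                               by_cols = c("station", "year", "month", "day", "depth")) {
  aggregate(bio[value_cols], by = bio[by_cols], FUN = mean, na.rm = TRUE)
}

# Usage with illustrative values: three replicates at 10 m, one sample at 20 m
fix_cabot_strait(c("CS1", "CSL2", "HL2"))

bio <- data.frame(
  station = "HL2", year = 2010, month = 1, day = 6,
  depth = c(10, 10, 10, 20),
  chlorophyll = c(0.30, 0.40, 0.50, 0.20),  # illustrative only
  stringsAsFactors = FALSE
)
avg <- average_replicates(bio, value_cols = "chlorophyll")
avg
```

The renaming would need to run on the physical data before any join on `station`, otherwise the Cabot Strait rows could not match their biological counterparts.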
