create function to predicted biomass against DBH, by species #73

Closed
teixeirak opened this issue Mar 12, 2019 · 63 comments

Comments

@teixeirak
Member

@maurolepore,

In order to check current calculations, and for future users to be able to visualize what allodb is giving in terms of predicted biomass, we'll want to make the following plot type:
x-axis: DBH (cm)
y-axis: biomass (kg)
one colored line or series of points for each species at a site, spanning the range of sizes observed at the site
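
A minimal sketch of the plot described above, assuming a census with `species` and `dbh` columns (dbh in cm) and one allometric equation per species. The species names, the coefficients, and the helper `predict_curve()` are made up for illustration and do not come from allodb.

```r
# Hypothetical census: observed DBH (cm) for two species at one site.
census <- data.frame(
  species = c("sp_a", "sp_a", "sp_a", "sp_b", "sp_b"),
  dbh     = c(5, 20, 60, 10, 35),
  stringsAsFactors = FALSE
)

# Hypothetical allometries of the form exp(a + b * log(dbh)), biomass in kg.
coefs <- data.frame(
  species = c("sp_a", "sp_b"),
  a = c(-2.48, -2.21),
  b = c(2.48, 2.41),
  stringsAsFactors = FALSE
)

# Evaluate one species' equation over the range of sizes observed for it.
predict_curve <- function(sp, n = 50) {
  observed <- census$dbh[census$species == sp]
  co <- coefs[coefs$species == sp, ]
  dbh <- seq(min(observed), max(observed), length.out = n)
  data.frame(species = sp, dbh = dbh, biomass = exp(co$a + co$b * log(dbh)),
             stringsAsFactors = FALSE)
}

curves <- do.call(rbind, lapply(coefs$species, predict_curve))

# One colored line per species, with a legend (built only if ggplot2 is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(curves, ggplot2::aes(dbh, biomass, colour = species)) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "DBH (cm)", y = "Biomass (kg)")
  # print(p) draws the plot.
}
```

Swapping `geom_line()` for `geom_point()` on the census rows themselves would give the scatter-plot variant.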

@maurolepore
Member

maurolepore commented Mar 12, 2019

Thanks!

spanning the range of sizes observed at the site

How would you cut the dbh range for each species? Here are some alternatives:

dbh_species1 <- c(11.1, 11.3, 15, 25.8, 25.9, 27, 30.1, 33)

ggplot2::cut_number(dbh_species1, 3)
#> [1] [11.1,18.6] [11.1,18.6] [11.1,18.6] (18.6,26.6] (18.6,26.6] (26.6,33]  
#> [7] (26.6,33]   (26.6,33]  
#> Levels: [11.1,18.6] (18.6,26.6] (26.6,33]

ggplot2::cut_width(dbh_species1, 3)
#> [1] [10.5,13.5] [10.5,13.5] (13.5,16.5] (25.5,28.5] (25.5,28.5] (25.5,28.5]
#> [7] (28.5,31.5] (31.5,34.5]
#> 8 Levels: [10.5,13.5] (13.5,16.5] (16.5,19.5] (19.5,22.5] ... (31.5,34.5]

ggplot2::cut_interval(dbh_species1, 3)
#> [1] [11.1,18.4] [11.1,18.4] [11.1,18.4] (25.7,33]   (25.7,33]   (25.7,33]  
#> [7] (25.7,33]   (25.7,33]  
#> Levels: [11.1,18.4] (18.4,25.7] (25.7,33]

Created on 2019-03-12 by the reprex package (v0.2.1)

@teixeirak
Member Author

We don't want to bin sizes-- just display a continuous linear function of biomass as a function of DBH. It could be a scatter plot or a line plot of the equation. I'm including an example of how I'd expect this to look for SCBI, with the modification that I'd like a separate color for each species + legend.

scbi_Allometries

@maurolepore
Member

I'm tagging this as high priority because it's a good way to engage reviewers, which is crucial in the development process.

@maurolepore
Member

maurolepore commented Mar 13, 2019

Here are two plots similar to what you describe, except that here I'm not using species names but species codes. Notice that agb is the one that comes with the dataset -- not the one we calculate. That'll come a little later.

ALTERNATIVE 1

image

ALTERNATIVE 2

image

image

image

@teixeirak
Member Author

I'd actually like to have both. The first is valuable for picking out anomalies, the second for seeing what's going on with each species.

@teixeirak
Member Author

teixeirak commented Mar 13, 2019 via email

@maurolepore
Member

maurolepore commented Mar 14, 2019

@teixeirak and @gonzalezeb,

Here is an article exploring dbh vs. biomass (compared to dbh vs. agb). Please review and comment if this makes sense.

https://github.com/forestgeo/fgeo.biomass/blob/master/.buildignore/dbh-vs-biomass.md

Notice that we still lack some features that should produce better results. For example, for each row in a census dataset, the code currently sums the biomass from all equations associated with it (based on site and species). This should be correct except when the multiple equations refer not to different parts of a tree but to trees of different diameters. We still don't handle dbh-specific equations.
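
To make the caveat concrete, here is a small sketch of the summing step on a toy table. The column names (`rowid`, `part`, `dbh_range`) and all values are assumptions for illustration, not the actual fgeo.biomass data structure.

```r
# Hypothetical long table: one row per (census row, matching equation).
matched <- data.frame(
  rowid     = c(1, 1, 2, 3, 3),
  part      = c("stem", "branches", "whole tree", "whole tree", "whole tree"),
  dbh_range = c("any", "any", "any", "< 10 cm", ">= 10 cm"),
  biomass   = c(120, 30, 80, 4, 95),
  stringsAsFactors = FALSE
)

# Behaviour as described above: add up every equation matched to a census row.
total <- aggregate(biomass ~ rowid, data = matched, FUN = sum)
total
```

Summing is right for census row 1, where the equations cover different parts of the tree. It is wrong for census row 3, where the two equations are alternatives for different diameters and only one of them should apply.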

Also, some species have no value and I'll be exploring why.

@maurolepore
Member

maurolepore commented Mar 14, 2019

BTW,

  • I don't think the word "prediction" applies in this analysis. Am I misinterpreting what you want? Is the issue title accurate?

  • This is the kind of analysis that I think an intern could start doing right now. Later, the code will have nicer functions and more features, but it would be great to have someone closer to the biology of these trees checking for scientific correctness as the code evolves.

@teixeirak
Member Author

To clarify, by biomass do you mean total aboveground biomass from allodb? Whereas 'agb' is what's currently in the SCBI data table (calculated based off tropical allometry)?

@teixeirak
Member Author

@gonzalezeb, assuming my interpretation there is correct, it appears that our equations for Platanus occidentalis and Nyssa sylvatica are off. Can you please check?

@maurolepore
Member

maurolepore commented Mar 14, 2019

To clarify, by biomass do you mean total aboveground biomass from allodb? Whereas 'agb' is what's currently in the SCBI data table (calculated based off tropical allometry)?

Yes,

  • biomass: Calculated based on equations from allodb.
  • agb: From the agb column in the SCBI data -- presumably calculated using equations for tropical species. Should be a good reference against which to compare the results using equations from allodb.

@teixeirak
Member Author

teixeirak commented Mar 14, 2019

Okay, I take it your example plot (# 1 below; from the link Mauro sent) would look more like the example I sent (# 2 below; made by Valentine based on Erika's compilation of equations) when you include larger diameters (but exclude Nyssa sylvatica and Platanus occidentalis)?
@gonzalezeb, plotting on the scale of # 1 below highlights huge divergence in predicted biomass for trees of ~50cm, both based on our allometries and the tropical allometries. Does this make sense? It's hard to tell which species are in which group. Oaks are definitely in the higher-biomass group, and maybe pines in the lower group? I'd assume this is driven by wood density?

image

image

@teixeirak
Member Author

Regarding review of these equations, I do not see that as the role for an intern. @maurolepore, you've already produced code to generate these plots. @gonzalezeb and I (to a lesser extent) should be the ones reviewing these plots to make sure output looks reasonable. In the longer term, issue #16 calls for code to flag equations that give unreasonable output. Right now, to start (and maybe this is all we'll ever want), it may be worth writing a very simple script where each equation will be evaluated at ~3 dbh values (e.g., 1 cm, 50 cm, 100 cm; counting only those within range of the equation's DBH limits) and equations flagged if they don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds).
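
A rough sketch of such a script, under stated assumptions: the equation strings follow the style of the allodb `equations` table, but the IDs, the DBH limits, the `dbh_max_cm` column name, the deliberately broken third equation, and the plausibility limits are all invented placeholders. Real thresholds would need expert input, and units are not handled here.

```r
# Equation strings in the style of the allodb `equations` table.
# IDs, DBH limits, and the "broken" equation are invented for this sketch.
eqs <- data.frame(
  equation_id = c("eq_ok_1", "eq_ok_2", "eq_broken"),
  allometry   = c("exp(-2.48+2.4835*log(dbh))",
                  "exp(-2.2118+2.4133*log(dbh))",
                  "exp(2.48+2.4835*log(dbh))"),
  dbh_min_cm  = c(1, 5, 1),
  dbh_max_cm  = c(100, 60, 100),
  stringsAsFactors = FALSE
)

# Placeholder limits on plausible biomass (kg) at each test DBH (cm).
limits <- data.frame(
  dbh   = c(1, 50, 100),
  lower = c(0.01, 100, 1000),
  upper = c(5, 5000, 30000)
)

check_equation <- function(id, allometry, dbh_min, dbh_max) {
  # Only test DBH values inside the equation's stated range.
  test <- limits[limits$dbh >= dbh_min & limits$dbh <= dbh_max, ]
  # Evaluate the equation string once per test DBH.
  biomass <- vapply(test$dbh, function(dbh) eval(parse(text = allometry)),
                    numeric(1))
  data.frame(
    equation_id = rep(id, nrow(test)),
    dbh         = test$dbh,
    biomass     = biomass,
    flagged     = biomass < test$lower | biomass > test$upper,
    stringsAsFactors = FALSE
  )
}

report <- do.call(rbind, Map(check_equation, eqs$equation_id, eqs$allometry,
                             eqs$dbh_min_cm, eqs$dbh_max_cm))

# Rows that fall outside the placeholder limits.
report[report$flagged, ]
```

Here only the invented `eq_broken` is flagged, at all three test sizes; `eq_ok_2` is evaluated at 50 cm only because its assumed range is 5 to 60 cm.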

@maurolepore
Member

maurolepore commented Mar 14, 2019

huge divergence in predicted biomass for trees of ~50cm, both based on our allometries and the tropical allometries. ... It's hard to tell which species are in which group.

Good point. Here I added a reference to tell those species apart.

source

image

image

image

@maurolepore
Member

Right now, to start (and maybe this is all we'll ever want), it may be worth writing a very simple script where each equation will be evaluated at ~3 dbh values (e.g., 1 cm, 50 cm, 100 cm; counting only those within range of the equation's DBH limits) and equations flagged if they don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds)

Good idea. I'll follow up at forestgeo/fgeo.biomass#22

@gonzalezeb
Contributor

I see the problems in the equations for Platanus, Nyssa, and two others. I am fixing them. You are just too fast for me!

@teixeirak
Member Author

Thanks for the plot, @maurolepore! From this, it's mainly the hickories and Tilia that are low. This doesn't make much sense, as their wood density tends to be on the high end. @gonzalezeb, what do you think?

@maurolepore
Member

equations flagged if they don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds)

How about deriving that reasonable range from the deviation from a curve fit to dbh vs. the preexisting agb value for each species?

Although the preexisting agb was calculated for tropical trees, the plots above show that it is likely close enough to the biomass we should get from allodb to pick out obvious errors in the equations.
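
One way this idea might be sketched, using simulated data for a single species. The data, the log-log model, and the tolerance of a factor of 3 are all assumptions chosen for illustration; two values are inflated by 1000 to mimic a units error.

```r
set.seed(1)

# Simulated trees of one species: dbh (cm), preexisting agb, new biomass.
trees <- data.frame(dbh = seq(5, 80, length.out = 40))
trees$agb <- exp(-2.3 + 2.45 * log(trees$dbh) + rnorm(40, sd = 0.05))
trees$biomass <- trees$agb * exp(rnorm(40, sd = 0.1))

# Mimic a units error in two rows.
trees$biomass[c(10, 30)] <- trees$biomass[c(10, 30)] * 1000

# Reference curve: log(agb) as a linear function of log(dbh).
fit <- lm(log(agb) ~ log(dbh), data = trees)

# Deviation of the new biomass from the reference curve, on the log scale.
deviation <- log(trees$biomass) - predict(fit, newdata = trees)

# Flag values more than a factor of 3 away from the curve (assumed tolerance).
tolerance <- log(3)
trees$suspicious <- abs(deviation) > tolerance

which(trees$suspicious)
```

Working on the log scale means the tolerance is a ratio rather than an absolute difference in kg, so it applies equally to small and large trees.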

@teixeirak
Member Author

teixeirak commented Mar 15, 2019 via email

@maurolepore
Member

@gonzalezeb,

FYI, in this update, the biomass of Liriodendron sp. appears too high, which Krista suggests might be solved once I support dbh-specific equations. Also see a few suspicious points over 40,000 [kg].

The most important accomplishment of today is that I can now explain all missing biomass values. I'm ready to move on to more sophisticated features that should result in more precise biomass.

@maurolepore
Member

Regarding Liriodendron tulipifera, I wonder if the issue may be one of units. That may explain such a difference, where the resulting biomass values are greater not by a little but by orders of magnitude.

@teixeirak
Member Author

Incorrect units could definitely cause this kind of problem, but I believe @gonzalezeb already checked that.

@gonzalezeb
Contributor

Interestingly, I didn't change that equation in my recent fixes, so I'm not sure why those large biomass values didn't show up before.
At the same time, I just confirmed that equation 94f593 (the one giving those bad values) is incorrect, so I will review it.

@teixeirak
Member Author

It wouldn't have shown up bad before if it was being applied only within the specified DBH range.
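
A minimal sketch of applying an equation only within its DBH range. The equations, the range limits, the `dbh_max_cm` column name, and the helper `biomass_in_range()` are hypothetical and are not taken from allodb or fgeo.biomass.

```r
# Hypothetical equations for one species, each valid for a DBH range (cm).
eqs <- data.frame(
  equation_id = c("small_trees", "large_trees"),
  allometry   = c("exp(-2.0+2.3*log(dbh))", "exp(-2.5+2.5*log(dbh))"),
  dbh_min_cm  = c(0, 10),
  dbh_max_cm  = c(10, 150),
  stringsAsFactors = FALSE
)

# Return biomass from the single equation whose range contains `dbh`;
# return NA when no equation (or more than one) applies.
biomass_in_range <- function(dbh) {
  ok <- which(dbh >= eqs$dbh_min_cm & dbh < eqs$dbh_max_cm)
  if (length(ok) != 1) return(NA_real_)
  eval(parse(text = eqs$allometry[ok]))
}

result <- sapply(c(5, 40, 200), biomass_in_range)
result
```

A 200 cm tree falls outside both assumed ranges, so it gets `NA` rather than an extrapolated value. Ignoring the limits would instead apply every equation to every tree, which is how an equation that is reasonable within its range can still produce extreme values.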

@maurolepore
Member

Thanks @gonzalezeb, it's now looking much better. It will continue to improve as we support more features.

image

@maurolepore
Member

maurolepore commented Mar 20, 2019

@gonzalezeb,

We now support dbh-specific equations. There is a new update at http://bit.ly/demo-dbh-vs-biomass but please see also forestgeo/fgeo.biomass#27

image

image

@teixeirak
Member Author

Wonderful! On a first pass, this looks much more like what I expected, and I don't see any problems.

@maurolepore
Member

maurolepore commented Mar 21, 2019

We still don't support generic equations. On one branch of the project I temporarily removed generic equations completely (discussion at #72).

Removing generic equations should produce more accurate biomass estimates for the species we have equations for, but it comes at the cost of losing some species.

Compare (@teixeirak):

@teixeirak
Member Author

There are a lot of species that rely on generic equations. What do we need to be able to handle them? Please let me know if there are any barriers you face in order to incorporate them (other than just time to work on this).

@maurolepore
Member

maurolepore commented Mar 22, 2019

allo_find() now automatically prefers expert equations but falls back to generic equations that match the exact same site and species. This change isn't yet in the master branch but will be merged soon. For now you can see it in action on this branch.

Notice that trees whose species is given as Genus sp. can't be matched to a species in allodb. We need to support that feature (forestgeo/fgeo.biomass#30).

allo_find(census_species)
#> Assuming `dbh` in [mm] (required to find dbh-specific equations).
#> * Matching equations by site and species.
#> * Refining equations according to dbh.
#> * Using generic equations where expert equations can't be found.
#> Warning:   Can't find equations matching these species:
#>   acer sp, carya sp, crataegus sp, fraxinus sp, juniperus virginiana, quercus prinus, quercus sp, ulmus sp, unidentified unk
#> Warning: Can't find equations for 17132 rows (inserting `NA`).
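
The "prefer expert, fall back to generic" rule can be sketched as follows. This is not the implementation of `allo_find()`; the table, the equation IDs, and the helper `prefer_expert()` are hypothetical.

```r
# Hypothetical matches between census species and equations at one site.
matches <- data.frame(
  site           = "scbi",
  species        = c("acer rubrum", "acer rubrum", "asimina triloba"),
  equation_group = c("Expert", "Generic", "Generic"),
  equation_id    = c("eq_1", "eq_2", "eq_3"),
  stringsAsFactors = FALSE
)

# Keep expert equations wherever a site-species pair has any;
# otherwise keep that pair's generic equations.
prefer_expert <- function(x) {
  key <- paste(x$site, x$species)
  has_expert <- as.logical(ave(x$equation_group == "Expert", key, FUN = any))
  x[x$equation_group == "Expert" | !has_expert, ]
}

prefer_expert(matches)
```

In this toy table the generic equation for "acer rubrum" is dropped because an expert one exists, while "asimina triloba" keeps its generic equation.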

@teixeirak
Member Author

teixeirak commented Mar 22, 2019 via email

@maurolepore
Member

There may be cases where “Genus sp.” would have a site-specific equation.

I see some examples of this already.

library(tidyverse)
library(allodb)

master() %>% 
  select(site, species, equation_group) %>% 
  filter(equation_group == "Generic") %>% 
  # Escape the dot so it matches a literal "." rather than any character.
  filter(str_detect(species, " sp\\.$")) %>% 
  distinct()
#> Joining `equations` and `sitespecies` by 'equation_id'; then `sites_info` by 'site'.
#> # A tibble: 15 x 3
#>    site           species       equation_group
#>    <chr>          <chr>         <chr>         
#>  1 serc           Ligustrum sp. Generic       
#>  2 lilly dicky    Crataegus sp. Generic       
#>  3 serc           Vaccinium sp. Generic       
#>  4 tyson          Crataegus sp. Generic       
#>  5 umbc           Crataegus sp. Generic       
#>  6 harvard forest Crataegus sp. Generic       
#>  7 serc           Quercus sp.   Generic       
#>  8 yosemite       Salix sp.     Generic       
#>  9 wind river     Abies sp.     Generic       
#> 10 yosemite       Abies sp.     Generic       
#> 11 lilly dicky    Carya sp.     Generic       
#> 12 scbi           Carya sp.     Generic       
#> 13 serc           Carya sp.     Generic       
#> 14 tyson          Carya sp.     Generic       
#> 15 umbc           Carya sp.     Generic

... fall back on generic equations for any site ... I’d be happy to enter some examples if needed

Okay, with two examples I should have enough to build the logic.
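
As a starting point, the matching of "Genus sp." names to genus-level equations might look like the sketch below. The lookup table and equation IDs are placeholders, and the census names follow the lower-case, no-dot style shown in the warning above.

```r
# Hypothetical lookup of genus-level equations; IDs are placeholders.
genus_equations <- data.frame(
  equation_taxa = c("Carya", "Quercus"),
  equation_id   = c("eq_carya", "eq_quercus"),
  stringsAsFactors = FALSE
)

census_species <- c("carya sp", "quercus sp", "acer rubrum", "ulmus sp")

# A name is genus-only if it ends in " sp" or " sp." (case-insensitive).
is_genus_only <- grepl(" sp\\.?$", census_species, ignore.case = TRUE)

# Drop the " sp" suffix to get the genus, then match ignoring case.
genus <- sub(" sp\\.?$", "", census_species, ignore.case = TRUE)
idx <- match(tolower(genus), tolower(genus_equations$equation_taxa))

result <- data.frame(
  species     = census_species,
  equation_id = ifelse(is_genus_only, genus_equations$equation_id[idx], NA),
  stringsAsFactors = FALSE
)
result
```

Full species names such as "acer rubrum" are left for the regular species-level match, and a genus with no entry in the lookup ("ulmus sp" here) stays `NA`.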

@teixeirak
Member Author

I will provide a couple examples soon.

@teixeirak
Member Author

Should I put the examples in this table? That is, do edits go to the .csv files in raw data, as opposed to the R tables? (Note that I just fixed a typo in that file, assuming that will make it into R tables.)

@maurolepore
Member

maurolepore commented Mar 26, 2019

If by "example" we mean a mock, then you may simply give me the names of a couple of allometries and tell me which sites and taxa they should match. I can create a toy dataset to build some code around.

If instead by "example" we mean something that is production ready (i.e. not a toy but something that can be really used), then certainly the files you want to edit live in the directory data-raw/csv_database/ (https://github.com/forestgeo/allodb/blob/master/data-raw/csv_database/). In this case you'll need to enter new rows in the equations table, each row with an equation_id from this file (please read this short explanation), the taxa and corresponding allometry, and all other relevant information to complete the row. Also you will need to enter the exact same equation_id in the sitespecies table (because equation_id is the key linking the sitespecies with the equations table) along with the site and other relevant information to complete the row. Below is a short view of the most important columns in each of those two tables. In this case we can tag your commits to make sure that Erika can later review them.

RE

That is, do edits go to the .csv files in raw data, as opposed to the R tables?

That's right. The census tables are provided by users and, after some wrangling, matched with allodb tables by the values in the columns "species" and "site".

library(allodb)
library(tidyverse)

list(equations = equations, sitespecies = sitespecies) %>% 
  map(~ select(.x, matches("^site|^species|^equation|allometry$")))
#> $equations
#> # A tibble: 175 x 3
#>    equation_id equation_allometry                equation_form          
#>    <chr>       <chr>                             <chr>                  
#>  1 2060ea      10^(1.1891+1.419*(log10(dbh^2)))  10^(a+b*(log10(dbh^c)))
#>  2 a4d879      10^(1.2315+1.6376*(log10(dbh^2))) 10^(a+b*(log10(dbh^c)))
#>  3 c59e03      exp(7.217+1.514*log(dbh))         exp(a+b*log(dbh))      
#>  4 96c0af      10^(2.5368+1.3197*(log10(dbh)))   10^(a+b*(log10(dbh)))  
#>  5 529234      10^(2.0865+0.9449*(log10(dbh)))   10^(a+b*(log10(dbh)))  
#>  6 ae65ed      exp(-2.48+2.4835*log(dbh))        exp(a+b*log(dbh))      
#>  7 9c4cc9      10^(-1.326+2.762*(log10(dbh)))    10^(a+b*(log10(dbh)))  
#>  8 7f7777      exp(-2.5095+2.5437*log(dbh))      exp(a+b*log(dbh))      
#>  9 cf733d      exp(5.67+1.97*log(dbh))           exp(a+b*log(dbh))      
#> 10 f08fff      exp(-2.2118+2.4133*log(dbh))      exp(a+b*log(dbh))      
#> # ... with 165 more rows
#> 
#> $sitespecies
#> # A tibble: 772 x 6
#>    site    species    species_code equation_group equation_id equation_taxa
#>    <chr>   <chr>      <chr>        <chr>          <chr>       <chr>        
#>  1 Lilly ~ Acer rubr~ 316          Expert         7c72ed      Acer rubrum  
#>  2 Lilly ~ Acer rubr~ 316          Expert         2060ea      Acer rubrum  
#>  3 Lilly ~ Acer sacc~ 318          Expert         a4d879      Acer sacchar~
#>  4 Lilly ~ Amelanchi~ 356          Expert         c59e03      Amelanchier  
#>  5 Lilly ~ Amelanchi~ 356          Expert         96c0af      Amelanchier  
#>  6 Lilly ~ Amelanchi~ 356          Expert         529234      Amelanchier  
#>  7 Lilly ~ Asimina t~ 367          Generic        ae65ed      Mixed hardwo~
#>  8 Lilly ~ Carpinus ~ 391          Generic        ae65ed      Mixed hardwo~
#>  9 Lilly ~ Carya alba 409          Expert         9c4cc9      Carya        
#> 10 Lilly ~ Carya cor~ 402          Expert         9c4cc9      Carya        
#> # ... with 762 more rows

Created on 2019-03-26 by the reprex package (v0.2.1)
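
To illustrate how the keys fit together, here is a toy sketch of the two joins: census to sitespecies by site and species, then to equations by `equation_id`. The tables, site and species names, and IDs are invented; only the column names and the equation strings follow the output above. Units of dbh are ignored.

```r
# Toy versions of the two allodb tables, with only the key columns.
equations <- data.frame(
  equation_id        = c("eq_1", "eq_2"),
  equation_allometry = c("exp(-2.48+2.4835*log(dbh))",
                         "10^(-1.326+2.762*(log10(dbh)))"),
  stringsAsFactors = FALSE
)
sitespecies <- data.frame(
  site        = c("site_a", "site_a"),
  species     = c("species one", "species two"),
  equation_id = c("eq_1", "eq_2"),
  stringsAsFactors = FALSE
)

# Toy census supplied by a user; "species three" has no equation.
census <- data.frame(
  site    = "site_a",
  species = c("species one", "species two", "species three"),
  dbh     = c(12, 30, 8),
  stringsAsFactors = FALSE
)

# Census -> sitespecies by site and species; then -> equations by equation_id.
joined <- merge(census, sitespecies, by = c("site", "species"), all.x = TRUE)
joined <- merge(joined, equations, by = "equation_id", all.x = TRUE)

# Evaluate each row's allometry at its own dbh; unmatched rows stay NA.
joined$biomass <- mapply(function(eqn, dbh) {
  if (is.na(eqn)) NA_real_ else eval(parse(text = eqn))
}, joined$equation_allometry, joined$dbh, USE.NAMES = FALSE)

joined[, c("species", "dbh", "equation_id", "biomass")]
```

Because `equation_id` is the only link between the two tables, a new row in `equations` has no effect until the same ID also appears in `sitespecies`.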

@teixeirak
Member Author

I added some (real) generic equation entries to the sitespecies table (this commit). These were equations that were already in the database, not totally new equations. For reasons relating to issue #85, I haven't checked whether they would represent the range of scenarios you may encounter, or even ever be called upon with the current set of ForestGEO sites. You can start with that and let me know if you need others.

@maurolepore
Member

maurolepore commented Mar 26, 2019

Your commit is now tagged -- meaning that it's easy to find and revert

image

I'll follow up on this at #72

@maurolepore
Member

We now support shrubs. You can see the updated biomass analysis at http://bit.ly/demo-dbh-vs-biomass

After your comments I should soon be able to close #41.
I'm only unsure about how to interpret dbh_min_cm when using shrub equations whose independent variable is dba. My question is at #41 (comment).

@maurolepore
Member

@teixeirak, and @gonzalezeb, FYI:

  • We now have a much simpler interface.
  • The analysis of dbh versus biomass is now in the README.

@gonzalezeb
Contributor

This issue is handled by the functions get_biomass and illustrate_allodb.


3 participants