Write helpers to categorize data #44
All that follows is by @gabrielareto (via @).
We need, in general, code to go from quantitative variables to qualitative variables, to allow flexible splitting of the data. This is code to go from dbh to size class.
It assumes a minimum dbh of 10 mm and that everything above about 1 m (~ exp(7) mm) belongs to the largest size class. It assumes size classes are evenly distributed on the log scale. I think this makes sense for trees in our plots. In some plots with small trees, exp(6) may be a more appropriate lower bound for the largest size class. An extra line could merge two size classes if the largest size class is "too empty".
We need something similar to calculate the quadrat for every individual, at any grid size. This code calculates a boolean matrix of individuals by quadrats. It's vectorized, but that matrix may be too big for a full dataset, and slow because of memory issues. Think about it.
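The dbh-to-size-class rule described above (minimum dbh of 10 mm, classes evenly spaced on the log scale, one open-ended class above exp(7) mm) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `dbh_to_size_class()`, its arguments, and the default of five classes are hypothetical and are not taken from the original code or from any fgeo package.

```r
# Hypothetical helper (not part of fgeo): classify dbh (in mm) into size
# classes whose lower bounds are evenly spaced on the log scale.
dbh_to_size_class <- function(dbh, min_dbh = 10, log_upper = 7, n_classes = 5) {
  stopifnot(is.numeric(dbh), n_classes >= 2, log(min_dbh) < log_upper)
  # Lower bounds of the classes, evenly spaced between log(min_dbh) and
  # log_upper; the last one, exp(log_upper), opens the largest class.
  lower <- exp(seq(log(min_dbh), log_upper, length.out = n_classes))
  # Use min_dbh itself as the first break to avoid floating-point drift
  # in exp(log(min_dbh)), and close the largest class with Inf.
  breaks <- c(min_dbh, lower[-1], Inf)
  # Stems smaller than min_dbh fall outside every interval and become NA.
  cut(dbh, breaks = breaks, right = FALSE, labels = FALSE)
}

dbh <- c(10, 25, 150, 1200, 5)
dbh_to_size_class(dbh)
```

Setting `log_upper = 6` would give the lower bound suggested above for plots with small trees. Merging a "too empty" largest class is not handled in this sketch.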
The final function should return quadrat codes with consistent names. I suggest the x0_y0 format, using the coordinates in meters of the bottom-left corner of each quadrat, always with 4 digits because some plots are longer than 1 km but all are shorter than 10 km. A utility function refill_zeros <- function(x, nchar = 4) should be useful. It may also return the lx, ly coordinates, lx = gx %% gridsize.
These quantity-to-category functions could either return the categories, or append new columns to the full dataset, returning a larger full table. In any case, all of them should behave similarly.
Best regards, Gabriel
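One way to get x0_y0 quadrat codes without building an individuals-by-quadrats matrix is to round each coordinate down to its quadrat corner. The sketch below assumes gx and gy are numeric coordinates in meters; the name `quadrat_id()` and its arguments are hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not part of fgeo): quadrat codes in x0_y0 format,
# where x0 and y0 are the bottom-left corner of the quadrat in meters.
quadrat_id <- function(gx, gy, gridsize = 20, width = 4) {
  stopifnot(
    is.numeric(gx), is.numeric(gy),
    length(gx) == length(gy), gridsize > 0
  )
  # Bottom-left corner of the quadrat that contains each individual.
  x0 <- floor(gx / gridsize) * gridsize
  y0 <- floor(gy / gridsize) * gridsize
  # Left-pad with zeros so every code has the same number of digits.
  pad <- function(v) formatC(v, width = width, format = "d", flag = "0")
  data.frame(
    quadrat = paste(pad(x0), pad(y0), sep = "_"),
    lx = gx %% gridsize,  # local coordinates within the quadrat
    ly = gy %% gridsize,
    stringsAsFactors = FALSE
  )
}

quadrat_id(gx = c(0, 25.5, 1010), gy = c(0, 47, 5), gridsize = 20)
```

Because each individual is handled with plain vector arithmetic, memory use grows with the number of individuals rather than with individuals times quadrats. Missing coordinates are not treated specially in this sketch.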
From @gabrielareto on August 31, 2017 21:5
Vectorized function to add zeros to each position in character vectors:
add_zeroes <- add_zeros <- function(x, min.n.char = 4)
Note that it doesn't work with a matrix; sanity checks are required.
This is already implemented in
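A zero-padding helper with the signature shown in this comment might look like the sketch below. The body is an assumption written for illustration, not the original implementation; it adds the sanity check mentioned above by rejecting matrices outright.

```r
# Sketch only: left-pad each element of a vector with zeros until it has
# at least `min.n.char` characters. The body is an assumption.
add_zeros <- function(x, min.n.char = 4) {
  # Sanity checks: accept plain atomic vectors only (no matrices or arrays).
  stopifnot(
    is.atomic(x), is.null(dim(x)),
    length(min.n.char) == 1, min.n.char >= 0
  )
  x <- as.character(x)
  # Number of zeros each element still needs (never negative).
  n_missing <- pmax(min.n.char - nchar(x), 0)
  # strrep() is vectorized over `times`, so no loop is needed.
  out <- paste0(strrep("0", n_missing), x)
  # Keep missing values missing instead of turning them into text.
  out[is.na(x)] <- NA_character_
  out
}

add_zeros(c("5", "20", "12345"))
```

Elements that are already longer than `min.n.char` are returned unchanged rather than truncated.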
From @gabrielareto on August 31, 2017 21:46
From @gabrielareto on August 31, 2017 22:0
We need to decide whether we want functions that take any input and return any output, or functions that take the table as input and return an expanded table as output. The first is more general (anyone could use it for their datasets); the second includes a step that we think is necessary in our workflow. Maybe arguments should be passed as
If census.data is the input, the output could be the expanded census.data. If not, just return what is new. Example with the dbh to size class function:
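A hypothetical sketch of that behaviour: one function that returns the expanded table when it receives a data frame and returns only the new values when it receives a plain vector. The function name, the `size_class` column name, and the break values are all assumptions made for illustration.

```r
# Hypothetical helper (not part of fgeo): vector in, vector out; or
# table in, expanded table out. Break values are placeholders.
dbh_size_class <- function(x, breaks = c(10, 100, 1000, Inf)) {
  classify <- function(dbh) {
    cut(dbh, breaks = breaks, right = FALSE, labels = FALSE)
  }
  if (is.data.frame(x)) {
    # Table input: append the new column and return the larger table.
    stopifnot("dbh" %in% names(x))
    x$size_class <- classify(x$dbh)
    return(x)
  }
  # Any other input: return only what is new.
  stopifnot(is.numeric(x))
  classify(x)
}

census.data <- data.frame(tag = c("a", "b", "c"), dbh = c(15, 150, 1500))
dbh_size_class(census.data$dbh)  # just the size classes
dbh_size_class(census.data)      # census.data plus a size_class column
```

Keeping the classification in one inner function means both interfaces behave the same way, which is the consistency asked for earlier in the thread.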
That's an important trade-off. Whenever possible, I build general functions. But often the interface is simplest if the input is ForestGEO-like data. The two kinds of functions are tagged with a
From @gabrielareto on September 12, 2017 21:46
This function reads a vector of quadrat IDs in the x0_y0 format (explained above) and returns the "traditional" Q20 code in 'column row' format. This should be useful only to explore data in other formats, from some habitat maps, etc., but forestr should not rely on this naming convention. It allows any separator character and lets you declare the index of the first column or row (0 or 1).
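The conversion described in this comment could be sketched as below, assuming the quadrat IDs hold corner coordinates in meters separated by an underscore. The name `quadrat_to_col_row()` and all of its arguments are hypothetical; the original function is not reproduced here.

```r
# Hypothetical helper (not part of fgeo): convert x0_y0 quadrat IDs to
# 'column row' codes.
quadrat_to_col_row <- function(quadrat, gridsize = 20, sep = "",
                               first = 1, width = 2) {
  stopifnot(is.character(quadrat), first %in% c(0, 1), gridsize > 0)
  parts <- strsplit(quadrat, "_", fixed = TRUE)
  stopifnot(all(lengths(parts) == 2))
  # Corner coordinates (in meters) encoded in each ID.
  x0 <- as.numeric(vapply(parts, `[`, character(1), 1))
  y0 <- as.numeric(vapply(parts, `[`, character(1), 2))
  pad <- function(v) formatC(v, width = width, format = "d", flag = "0")
  # Column and row indices, counted from `first` (0 or 1).
  paste(pad(x0 / gridsize + first), pad(y0 / gridsize + first), sep = sep)
}

quadrat_to_col_row(c("0000_0000", "0020_0040"))
quadrat_to_col_row(c("0000_0000", "0020_0040"), sep = "-", first = 0)
```

The sketch assumes every corner coordinate is a whole multiple of `gridsize`; IDs built with a different grid size would need an extra check.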
Something similar is already implemented:
# Similar
library(fgeo)
#> -- Attaching packages ------------------------------------------------------------------- fgeo 0.0.0.9002 --
#> v fgeo.x 0.0.0.9000 v fgeo.analyze 0.0.0.9003
#> v fgeo.plot 0.0.0.9402 v fgeo.tool 0.0.0.9005
#> -- Conflicts --------------------------------------------------------------------------- fgeo_conflicts() --
#> x fgeo.tool::filter() masks stats::filter()
census <- tibble(
gx = c(0, 20, 40),
gy = c(0, 20, 40)
)
add_col_row(census, gridsize = 20, plotdim = c(100, 100))
#> # A tibble: 3 x 4
#> gx gy col row
#> <dbl> <dbl> <chr> <chr>
#> 1 0 0 01 01
#> 2 20 20 02 02
#> 3 40 40 03 03
Created on 2018-12-30 by the reprex package (v0.2.1)
From @maurolepore on August 31, 2017 16:19
Gabriel proposed to develop a friendly way to categorize (cut) numeric variables. Important ones include:
add_quad().
Created on 2018-07-02 by the reprex package (v0.2.0).
Copied from original issue: forestgeo/fgeo.abundance#46