
code profiling: constantDensitySampling() #10

Open · dylanbeaudette opened this issue Feb 12, 2018 · 4 comments

@dylanbeaudette (Member)

There is likely room for improvement in constantDensitySampling():

  • parallelize sample.by.poly() as it is applied over the polygons of an sp object
  • more efficient use of try() / if() within sample.by.poly() (see the sketch just below)
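
I have not worked out what the tighter error handling should look like yet; here is a minimal sketch, assuming sample.by.poly() more or less wraps sp::spsample() for a single polygon (the name samplePolygon and its arguments are hypothetical stand-ins):

```r
library(sp)

# hypothetical per-polygon sampler: a single tryCatch() replaces paired
# try() / if() checks, and a NULL return value means "skip this polygon"
samplePolygon <- function(p, n, sampling.type = 'regular') {
  s <- tryCatch(
    spsample(p, n = n, type = sampling.type),
    error = function(e) NULL
  )
  # spsample() can also return too few (or zero) points for tiny polygons
  if (is.null(s) || length(s) < 1)
    return(NULL)
  s
}

# callers can then drop failed polygons in a single pass:
# res <- lapply(poly.list, samplePolygon, n = 100)
# res <- res[!sapply(res, is.null)]
```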

Profiling data for constantDensitySampling(), see documentation:

[screenshot: profiling output]

Some recent stats:

  • Drummer series extent (1,481,412 ac.) @ 0.01 pts / ac. = 13 seconds
  • San Joaquin series extent (457,649 ac.) @ 0.01 pts / ac. = 0.9 seconds
  • Auburn series extent (365,341 ac.) @ 0.01 pts / ac. = 0.6 seconds
@dylanbeaudette (Member, Author)

There is only limited support for sampling sf objects; see the porting wiki. Currently, only "random" sampling of sf objects is supported, and it is about as efficient as sp::spsample().

Eventually, we will want to migrate over to sf because of the spatial indexing.
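
A rough sketch of what constant-density sampling might eventually look like on an sf object (st_area() and st_sample() are from sf; the density-to-size calculation and the helper name are assumptions):

```r
library(sf)

# hypothetical constant-density random sampling on an sf polygon layer;
# assumes a projected CRS with linear units of meters
constantDensitySample_sf <- function(x, pts.per.ac = 0.01) {
  a.ac <- as.numeric(st_area(x)) / 4046.86      # m^2 -> acres
  n <- pmax(1, round(a.ac * pts.per.ac))        # per-feature sample sizes
  st_sample(x, size = n, type = 'random')
}

# e.g. s <- constantDensitySample_sf(st_as_sf(spdf), pts.per.ac = 0.01)
```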

@dylanbeaudette (Member, Author) commented Feb 12, 2018

Experiments with parallel processing on Windows (parallel package: parLapplyLB()) suggest that the underlying functions are mostly tied up with disk I/O, especially when polygons are large. There isn't likely to be much of an improvement unless:

  • I am doing it incorrectly
  • there are many, many polygons to sample (n.polygons > n.cores)
  • sampling a single polygon is fast relative to total required sampling time
  • further efficiencies can be found in sample.by.poly()

Worth the effort? Not sure. I'll commit these changes and test further.
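
For reference, the parLapplyLB() pattern being tested looks roughly like this (a sketch only; the cluster size, the sample.by.poly() arguments, and the per-feature split are assumptions):

```r
library(sp)
library(parallel)

# split a SpatialPolygonsDataFrame ('spdf', hypothetical) into single-feature pieces
poly.list <- lapply(seq_len(nrow(spdf)), function(i) spdf[i, ])

# PSOCK cluster (the only option on Windows): workers are fresh R sessions
cl <- makeCluster(detectCores() - 1)
clusterEvalQ(cl, library(sp))          # load sp on every worker
clusterExport(cl, 'sample.by.poly')    # ship the sampler (assumes it is visible in .GlobalEnv)

# load-balanced apply: useful when polygon sizes, and thus sampling times, vary a lot
res <- parLapplyLB(cl, poly.list, function(p) sample.by.poly(p, n.pts.per.ac = 0.01))
stopCluster(cl)

# combine per-polygon samples, dropping failures
res <- do.call('rbind', res[!sapply(res, is.null)])
```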

Here is some great discussion on when / where / how parallelization might be possible on Windows. More details here.

There is a lot of overhead when using parallel on Windows:

[screenshot: timing results]

@dylanbeaudette (Member, Author)

Strange: constantDensitySampling(parallel=TRUE) is 2x slower than serial processing, unless the function code is sourced into the current R session. How could this be?
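
One way to pin this down might be to time the two variants back to back (the file name, the test polygons 'x', and the argument values here are assumptions):

```r
# 'x' is a hypothetical SpatialPolygonsDataFrame to sample

# packaged version of the function
t.pkg <- system.time(
  constantDensitySampling(x, n.pts.per.ac = 0.01, parallel = TRUE)
)

# same code sourced into the global environment (hypothetical local copy)
source('constantDensitySampling.R')
t.src <- system.time(
  constantDensitySampling(x, n.pts.per.ac = 0.01, parallel = TRUE)
)

rbind(packaged = t.pkg['elapsed'], sourced = t.src['elapsed'])
```

If the sourced copy is reliably faster, the difference may have to do with how the packaged function and its enclosing environment are serialized out to the workers, but that is only a guess.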

dylanbeaudette added a commit that referenced this issue Feb 13, 2018
@dylanbeaudette (Member, Author)

I don't think that parallel processing is working as intended, even for a very large number of samples. Consider alternatives to sp::spsample.
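
One quick way to evaluate an alternative, e.g. sf::st_sample(), against sp::spsample() on the same geometry (the test polygons 'spdf' and the sample size are assumptions):

```r
library(sp)
library(sf)

x.sf <- st_as_sf(spdf)    # same polygons, sf representation
n <- 10000                # large-ish sample size, to expose timing differences

system.time(s.sp <- spsample(spdf, n = n, type = 'random'))
system.time(s.sf <- st_sample(x.sf, size = n, type = 'random'))
```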
