createWeights.R (for Frescalo) contains arguably inappropriate hard-coded distance function #240
Comments
Thanks @sacrevert, not something I had considered.
Just noting here that I have created a quick tool for visualising Frescalo weights (https://github.com/sacrevert/visualiseFresNeighbours) and sets of new weights at various geographic scales (https://github.com/sacrevert/frescaloNeighbourhoods). It would be interesting to compare the existing approach in sparta to these new sets, which use newer land cover information, additional geological information, and the cosine similarity measure (rather than the Euclidean approach currently encoded in sparta).
Quick comparison here between the sparta approach and what I did. It doesn't actually make a great deal of difference in this case: some neighbourhoods show differences, and the new ones are slightly more coherent ecologically, but the effect on trend estimates is probably negligible. Still, it might be wise to give the user an option, or a warning, with regard to the dissimilarity measure, as it could have bigger effects in other cases. See https://github.com/sacrevert/frescaloNeighbourhoods/blob/main/spartaWeightsComparison.pdf
@sacrevert thank you for doing the comparison and taking the time to put together the PDF. Realistically, given other priorities, I don't see any changes being made to sparta's frescalo functionality in the near future. I'd be happy to review and pull in any changes that you want to make, but I realise you may well not have the time either.
No worries. No, I probably won't have time either : )
Just noting that this function to create neighbourhood land cover-based weights uses `dist()` with default options. This is a Euclidean distance measure that is potentially inappropriate for very sparse matrices, because lots of shared zeros between items can have a strong influence on the distance measures (a similar issue often comes up in community ecology). A warning, and the option to use something like the cosine similarity measure, would be desirable. Efficient code for the latter is the second answer here: https://stats.stackexchange.com/questions/31565/compute-a-cosine-dissimilarity-matrix-in-r
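To illustrate the concern, here is a minimal sketch in base R. It is not the code in createWeights.R and not the Stack Exchange answer linked above. The `land_cover` matrix, its site names, and the `cosine_dissimilarity()` helper are all hypothetical, made up for the example. Sites a and b have the same cover composition in different amounts, while site c shares no cover classes with site a.

```r
# Hypothetical land cover matrix: rows are sites, columns are cover classes.
# Most entries are zero, as in a sparse land cover table.
land_cover <- rbind(
  site_a = c(0.2, 0.1, 0.0, 0.0, 0, 0),
  site_b = c(0.8, 0.4, 0.0, 0.0, 0, 0),
  site_c = c(0.0, 0.0, 0.2, 0.1, 0, 0)
)

# Euclidean distance, which is what dist() computes with default options.
euclid <- as.matrix(dist(land_cover))

# Hypothetical helper: cosine dissimilarity, 1 - (x . y) / (|x| * |y|).
cosine_dissimilarity <- function(m) {
  norms <- sqrt(rowSums(m^2))
  if (any(norms == 0)) {
    stop("cosine similarity is undefined for all-zero rows")
  }
  d <- 1 - tcrossprod(m) / outer(norms, norms)
  d[d < 0] <- 0   # clamp tiny negative values caused by rounding
  diag(d) <- 0
  d
}

cosine <- cosine_dissimilarity(land_cover)

# Compare how each measure ranks site_a against the other two sites.
print(round(euclid, 3))
print(round(cosine, 3))
```

With these made-up values, the Euclidean measure puts site_a closer to site_c (no classes in common) than to site_b (same composition), because the two sparse, low-valued rows both sit near the origin. The cosine measure gives the opposite ranking. Whether this matters for real Frescalo weights depends on the data.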