-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add figure-loss-small-data animint as a test #123
base: master
Are you sure you want to change the base?
Conversation
Since the PR only adds a test and not any capability, I did not add NEWS item and increase version number. Please let me know if it still recommended |
acontext("FigureLossSmall") | ||
library(data.table) | ||
library(animint2) | ||
library(jointseg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use jointseg:: instead of library(jointseg)
library(animint2) | ||
library(jointseg) | ||
|
||
loss.small <- readRDS("loss.small.rds") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of adding another data set to animint2 (which is already almost too big for CRAN), would it be possible to compute these data? if not, a new data set is OK, but please save it as data/something.RData that we can access via data(something,package="animint2")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to compute these data instead of storing it. My first attempt at computing the data increases the runtime and memory of the test. Currently it takes 1.3 Gb and 600 seconds on my machine. There may be ways to reduce that by optimizing how loss.small and nb are computed. I will spend some time investigating it:
https://github.com/tdhock/changepoint-data-structure/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be great to use a simple/small version of the data (instead of the full/large data), as long as it captures the essence of the issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how many clickSelects/showSelected values are there, but typically only 2 subsets are necessary for a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote the test to compute data instead of storing it. Using a small subset of data reduces the runtime. The sample function controls the size of the subset:
sample(1:.N, 5)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good start, thanks, please revise
also it looks like update_axes is commented in one of the plots, is that the one that will stop working if update_axes is included? |
Yes, the commented update_axes is the one which stops working. The y axis is the one which does not work. I added another plot to the tests such that both with update_axes and without update_axes are tested |
The tests successfully runs on my local machine, however fails after pushing to GitHub: Let me spend time investigating it further |
you may have to add neuroblastoma to Suggests DESCRIPTION to fix "there is no package called neuroblastoma" |
I rewrote the test to simulate neuroblastoma data set and replicate the issue. The simulated data set uses same fields and a uniform random distribution of values. We no longer have dependency on neuroblastoma package. However two other dependencies to jointseg, penaltyLearning packages were required to help compute the simulated data set |
Hi Toby and Yufan, I was able to remove dependency of neuroblastoma data and also reduce the time required to compute the data instead. Please let me know how to latest commit looks |
Looks like good progress, thanks! Next I believe we need to add a test that fails. geom_tallrect(aes(
xmin=min.lambda, xmax=max.lambda),
alpha=0.5,
clickSelects="changes",
showSelected="pid.chr",
data=some.selection) and add a test about the X axis limits on that plot (max X should not always be 5/max over all pid.chr values, it should be smaller when we click on some other pid.chr values). Also please delete vizWithoutUpdateAxes, since it is irrelevant (the test should be about the update axis functionality). |
Also here is the rendered data viz on the new gallery https://tdhock.github.io/2024-05-22-changepoint-model-selection/ (without the geom_tallrect which causes the problematic X axis) |
loss.small.rds and nb.rds stores subset of data from neuroblastoma. Storing the subset helps avoid loading entire neuroblastoma and improves test executing time
Closes #80