Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add figure-loss-small-data animint as a test #123

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

siddhesh195
Copy link
Contributor

loss.small.rds and nb.rds stores subset of data from neuroblastoma. Storing the subset helps avoid loading entire neuroblastoma and improves test executing time

Closes #80

@siddhesh195
Copy link
Contributor Author

Since the PR only adds a test and not any capability, I did not add NEWS item and increase version number. Please let me know if it still recommended

acontext("FigureLossSmall")
library(data.table)
library(animint2)
library(jointseg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use jointseg:: instead of library(jointseg)

library(animint2)
library(jointseg)

loss.small <- readRDS("loss.small.rds")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of adding another data set to animint2 (which is already almost too big for CRAN), would it be possible to compute these data? if not, a new data set is OK, but please save it as data/something.RData that we can access via data(something,package="animint2")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to compute these data instead of storing it. My first attempt at computing the data increases the runtime and memory of the test. Currently it takes 1.3 Gb and 600 seconds on my machine. There may be ways to reduce that by optimizing how loss.small and nb are computed. I will spend some time investigating it:
https://github.com/tdhock/changepoint-data-structure/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be great to use a simple/small version of the data (instead of the full/large data), as long as it captures the essence of the issue

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how many clickSelects/showSelected values are there, but typically only 2 subsets are necessary for a test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote the test to compute data instead of storing it. Using a small subset of data reduces the runtime. The sample function controls the size of the subset:
sample(1:.N, 5)

Copy link
Collaborator

@tdhock tdhock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good start, thanks, please revise

tests/testthat/test-renderer3-figure-loss-small-data.R Outdated Show resolved Hide resolved
@tdhock
Copy link
Collaborator

tdhock commented Apr 25, 2024

also it looks like update_axes is commented in one of the plots, is that the one that will stop working if update_axes is included?
Could you please include two plots, one with update_axes, and one without, and test them both?
Also I guess this is not going to be working until after we fix #48 so I guess we can work on that fix in this branch.

@siddhesh195
Copy link
Contributor Author

also it looks like update_axes is commented in one of the plots, is that the one that will stop working if update_axes is included? Could you please include two plots, one with update_axes, and one without, and test them both? Also I guess this is not going to be working until after we fix #48 so I guess we can work on that fix in this branch.

Yes, the commented update_axes is the one which stops working. The y axis is the one which does not work. I added another plot to the tests such that both with update_axes and without update_axes are tested

@siddhesh195
Copy link
Contributor Author

The tests successfully runs on my local machine, however fails after pushing to GitHub:
Error in find.package(package, lib.loc, verbose = verbose): there is no package called ‘neuroblastoma’

Let me spend time investigating it further

@tdhock
Copy link
Collaborator

tdhock commented Apr 29, 2024

you may have to add neuroblastoma to Suggests DESCRIPTION to fix "there is no package called neuroblastoma"
but again it would be preferable if you could create a small simulated data set that replicates the issue (so we could avoid adding the dependency)

@siddhesh195
Copy link
Contributor Author

you may have to add neuroblastoma to Suggests DESCRIPTION to fix "there is no package called neuroblastoma" but again it would be preferable if you could create a small simulated data set that replicates the issue (so we could avoid adding the dependency)

I rewrote the test to simulate neuroblastoma data set and replicate the issue. The simulated data set uses same fields and a uniform random distribution of values. We no longer have dependency on neuroblastoma package. However two other dependencies to jointseg, penaltyLearning packages were required to help compute the simulated data set

@siddhesh195
Copy link
Contributor Author

Hi Toby and Yufan,

I was able to remove dependency of neuroblastoma data and also reduce the time required to compute the data instead. Please let me know how to latest commit looks

@tdhock
Copy link
Collaborator

tdhock commented May 22, 2024

Looks like good progress, thanks!
First of all I think the "Selected data and model" plot could be improved by plotting the geom_segment correctly, see below for an example (first segment should start at first data point, last segment should end at last data point etc)
image

Next I believe we need to add a test that fails.
#80 (comment) says "if I add clickSelects it stops working (but it should still work)." which is a bit hard to understand (sorry) but here is a more detailed explanation.
Adding the geom_tallrect with clickSelects="changes" yields the following viz below, where the limits on the X axis on the last plot are incorrect -- penalty goes all the way up to 5 but it should be smaller (as in viz above)
image
So please add the following geom to vizWithUpdateAxes$lines

    geom_tallrect(aes(
      xmin=min.lambda, xmax=max.lambda),
      alpha=0.5,
      clickSelects="changes",
      showSelected="pid.chr",
      data=some.selection)

and add a test about the X axis limits on that plot (max X should not always be 5/max over all pid.chr values, it should be smaller when we click on some other pid.chr values).

Also please delete vizWithoutUpdateAxes, since it is irrelevant (the test should be about the update axis functionality).

@tdhock
Copy link
Collaborator

tdhock commented May 22, 2024

Also here is the rendered data viz on the new gallery https://tdhock.github.io/2024-05-22-changepoint-model-selection/ (without the geom_tallrect which causes the problematic X axis)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add update_axes test case
2 participants