Evaluation, Reproducibility, Benchmarks Meeting 29
Nicholas Heller edited this page Oct 23, 2024 · 1 revision
Date: 23rd October, 2024
Attendees:
- Olivier
- Annika
- Carole
- Nicola
- Michela
- Nick
- Lena
Broadly:
- Consensus recommendations + implementation in MONAI
Specific Ideas:
- Ranking analysis implemented into MONAI (currently in R)
- Confidence intervals when it comes to hierarchical datasets
- We could have a shared codebase that is not mature and not meant for deployment, but intended for our own experiments
- Gathering the data necessary to inform the consensus recommendations that we'd like to present
- Specifically, inference results for many algorithms over many tasks
- Specifically, what recommendations would we like to make about inference and comparing algorithms?
- Proper use of bootstrapping
- Aggregation in the presence of hierarchical data (e.g. IPD meta-analysis / random-effects analysis)
- Graphs
- Recommendations w.r.t. standard deviation
- Foundations for computing them
- Across data points vs. across folds
- More
- Do we need to move to parametric approaches?
- If their assumptions are fulfilled, parametric approaches are usually more accurate and give more statistical power
- Allows for Bayesian approaches (credible intervals, etc.)
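To make the "across data points vs. across folds" distinction concrete, here is a minimal sketch that computes both standard deviations from the same set of scores. The function name `spread_summary` and the per-case scores are hypothetical and only illustrate the idea; nothing here is an agreed recommendation or an existing MONAI API.

```python
import statistics


def spread_summary(scores_by_fold):
    """Return (SD across all data points, SD across per-fold means).

    scores_by_fold is a list of folds, each a list of per-case scores.
    """
    # Pool every case from every fold into one flat list.
    pooled = [score for fold in scores_by_fold for score in fold]
    # Reduce each fold to its mean before measuring the spread.
    fold_means = [statistics.mean(fold) for fold in scores_by_fold]
    return statistics.stdev(pooled), statistics.stdev(fold_means)


# Hypothetical per-case Dice scores for three cross-validation folds.
folds = [
    [0.80, 0.90, 0.70],
    [0.85, 0.75, 0.95],
    [0.60, 0.90, 0.90],
]
sd_points, sd_folds = spread_summary(folds)
print(f"SD across data points: {sd_points:.3f}")
print(f"SD across fold means:  {sd_folds:.3f}")
```

In this made-up example the two numbers differ by roughly a factor of four, which is why a reported standard deviation is hard to interpret unless it says which of the two was computed.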
Short term
- It would be great to get something out soon that may not be complete, but covers many common use cases
- Would be very nice to include hierarchical data, but might not be feasible
- Non-independence of the datasets needs to be addressed (video frames, for example)
- Re: Parametric vs nonparametric
- We should let the empirical data guide us
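One common way to handle non-independent data such as video frames is a cluster bootstrap, which resamples whole clusters (videos, patients) rather than individual frames. The sketch below is only an illustration under that assumption, not a method the group settled on. The function `cluster_bootstrap_ci`, its parameters, and the example scores are all hypothetical.

```python
import random
import statistics


def cluster_bootstrap_ci(scores_by_cluster, n_boot=2000, alpha=0.05, seed=0):
    """Approximate percentile CI for the mean score, resampling whole clusters.

    scores_by_cluster maps a cluster id (e.g. a video or patient) to the
    list of per-item scores (e.g. per-frame Dice) inside that cluster.
    """
    rng = random.Random(seed)
    clusters = list(scores_by_cluster.values())
    boot_means = []
    for _ in range(n_boot):
        # Draw clusters with replacement; each drawn cluster stays intact,
        # so the dependence between its items is preserved.
        sample = rng.choices(clusters, k=len(clusters))
        pooled = [score for cluster in sample for score in cluster]
        boot_means.append(statistics.mean(pooled))
    boot_means.sort()
    # Nearest-rank percentiles of the bootstrap distribution.
    low = boot_means[round((alpha / 2) * (n_boot - 1))]
    high = boot_means[round((1 - alpha / 2) * (n_boot - 1))]
    return low, high


# Hypothetical per-frame scores grouped by video.
videos = {
    "video_a": [0.90, 0.91, 0.89, 0.92],
    "video_b": [0.60, 0.62, 0.61],
    "video_c": [0.80, 0.79, 0.81, 0.80, 0.82],
}
low, high = cluster_bootstrap_ci(videos)
print(f"95% CI for the mean score: [{low:.3f}, {high:.3f}]")
```

This version pools frames after resampling, so longer videos carry more weight. Averaging within each cluster first would weight clusters equally instead, and which of the two is appropriate is exactly the kind of question a recommendation would need to settle.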
Some preliminary data to start with
- The Decathlon might be a good fit
- We've already worked with this data
- Has many tasks
- Segmentation is very common
- Can this be shared?
- Metrics themselves -- almost certainly. They are public in most cases
- Predictions -- sure, but they won't be useful without the ground truth, which can't be shared
- Michela can compute new metrics if needed
- MICCAI 2015 challenges
- Have lots of metrics for these, but they're somewhat outdated