Evaluation, Reproducibility, Benchmarks Meeting 29

Nicholas Heller edited this page Oct 23, 2024 · 1 revision

Minutes of Meeting 29

Date: 23rd October, 2024

Present

  • Olivier
  • Annika
  • Carole
  • Nicola
  • Michela
  • Nick
  • Lena

Brainstorming for New Projects

Broadly:

  • Consensus recommendations + implementation in MONAI

Specific Ideas:

  • Implement ranking analysis in MONAI (currently implemented in R)
  • Confidence intervals for hierarchical datasets
  • A shared codebase that need not be mature or deployment-ready, intended only for our own experiments
  • Gathering the data needed to inform the consensus recommendations that we'd like to present
    • Specifically, inference results for many algorithms over many tasks
  • Specifically, what recommendations would we like to make about inference and comparing algorithms?
    • Proper use of bootstrapping
    • Aggregation in the presence of hierarchical data (e.g. IPD meta-analysis/random-effects analysis)
    • Graphs
    • Recommendations w.r.t. standard deviation
      • Foundations for computing them
      • Across data points vs. across folds
  • More
    • Do we need to move to parametric approaches?
      • If their assumptions are met, parametric methods are usually more accurate and give more statistical power
      • Allows for Bayesian approaches (credible intervals, etc.)
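
To make the "across data points vs. across folds" distinction above concrete, here is a minimal sketch using only the Python standard library. The function name `spread_summaries` and the scores are hypothetical and made up for illustration; they are not taken from any dataset discussed in the meeting. It is meant only to show that the two standard deviations measure different things, not to recommend one over the other.

```python
import statistics


def spread_summaries(scores_by_fold):
    """Return two different standard deviations for per-case metric values.

    scores_by_fold: list of lists, one inner list of per-case scores per fold.
    (Hypothetical helper, for illustration only.)
    """
    # Pool every case from every fold and measure case-to-case variability.
    all_scores = [score for fold in scores_by_fold for score in fold]
    # Reduce each fold to its mean and measure fold-to-fold variability.
    fold_means = [statistics.mean(fold) for fold in scores_by_fold]
    return {
        "sd_across_cases": statistics.stdev(all_scores),
        "sd_across_folds": statistics.stdev(fold_means),
    }


# Made-up Dice-like scores for three folds (illustrative values only).
example_scores = [
    [0.80, 0.90, 0.70],
    [0.85, 0.75, 0.80],
    [0.60, 0.95, 0.88],
]
summary = spread_summaries(example_scores)
print(summary)
```

In this toy example the fold means are nearly identical, so the across-fold standard deviation is much smaller than the across-case one. A report that does not say which of the two it shows can therefore be misleading.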

Short term

  • It would be great to get something out soon that may not be complete, but covers many common use cases
    • Would be very nice to include hierarchical data, but might not be feasible
      • Non-independence of the datasets needs to be addressed (video frames, for example)
    • Re: Parametric vs nonparametric
      • We should let the empirical data guide us
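
One common way to handle the non-independence mentioned above (e.g. frames from the same video) is a cluster bootstrap, which resamples whole clusters rather than individual data points. The sketch below is an assumption about how this could look, not a method the group agreed on. The function name `cluster_bootstrap_ci`, the choice of statistic (mean of per-video means, so each video counts equally), the percentile interval, and the example scores are all hypothetical.

```python
import random
import statistics


def cluster_bootstrap_ci(scores_by_cluster, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of cluster means.

    scores_by_cluster: dict mapping a cluster id (e.g. a video) to its
    per-frame metric values. Clusters are resampled with replacement;
    frames within a cluster are never resampled individually.
    """
    rng = random.Random(seed)
    # Each cluster contributes one number, so resampling these means is
    # equivalent to resampling whole clusters for this statistic.
    cluster_means = [statistics.mean(v) for v in scores_by_cluster.values()]
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(cluster_means, k=len(cluster_means))
        estimates.append(statistics.mean(resample))
    estimates.sort()
    # Simple percentile interval read off the sorted bootstrap estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.mean(cluster_means), lower, upper


# Made-up per-frame scores grouped by video (illustrative values only).
videos = {
    "video_a": [0.90, 0.91, 0.89],
    "video_b": [0.70, 0.72],
    "video_c": [0.85, 0.84, 0.86, 0.85],
    "video_d": [0.60, 0.65],
}
point, lower, upper = cluster_bootstrap_ci(videos)
print(point, lower, upper)
```

Treating every frame as an independent sample would typically give a narrower interval here, because frames within one video are highly correlated and add little independent information.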

Some preliminary data to start with

  • The Decathlon might be a good fit
    • We've already worked with this data
    • Has many tasks
    • Segmentation is very common
    • Can this be shared?
      • Metrics themselves -- almost certainly; they are public in most cases
      • Predictions -- sure, but they won't be useful without the ground truth, which can't be shared
      • Michela can compute new metrics if needed
  • MICCAI 2015 challenges
    • We have lots of metrics for these, but they're somewhat outdated