MIMIC Clinical Decision Making Analysis

🔥 New Addition: Llama 3.3 has been added to the leaderboard! 🔥

🔥 New Addition: Llama 3.1 has been added to the leaderboard! 🔥

🔥 New Addition: OpenBio has been added to the leaderboard! 🔥

🔥 New Addition: Llama 3 has been added to the leaderboard! 🔥

This folder contains the code used during the analysis of the paper "Evaluating and Mitigating Limitations of Large Language Models in Clinical Decision Making".

To run the analysis, you must first download the appropriate models.

Llama 2 chat: https://huggingface.co/TheBloke/Llama-2-70B-Chat-GPTQ

OASST: https://huggingface.co/TheBloke/Llama2-70B-OASST-SFT-v10-GPTQ

WizardLM: https://huggingface.co/TheBloke/WizardLM-70B-V1.0-GPTQ

Clinical Camel: https://huggingface.co/wanglab/ClinicalCamel-70B (The model must first be quantized using quantize.py)

Meditron: https://huggingface.co/TheBloke/meditron-70B-GPTQ

Then, after creating the dataset using the code from https://github.com/paulhager/MIMIC-Clinical-Decision-Making-Dataset, run all experiments with the code from https://github.com/paulhager/MIMIC-Clinical-Decision-Making-Framework.

The experiments that need to be run are specified in jobs_to_run.sh. Note the comment at the top of the file. As these are hundreds of experiments, we highly recommend using a cluster with some type of multi-job scheduling. We leave the exact implementation up to the specifics of the users system.

Once the files are created, they can be aggregated and pulled using download.py. The results should be grouped by experiment in seperate folders evaluate_cdm.py and evaluate_fi.py. Then, execute all of the cells of benchmark.ipynb, specifying the path to each experiment, to generate all results of the paper.

Visit https://huggingface.co/spaces/MIMIC-CDM/leaderboard to check out the current leaderboard. I will update this as new models are released. If you would like a model to be tested and put on the board, please write me an email at paul (dot) hager (at) tum (dot) de.

Environment

To setup the environment, create a new virtual environment with python=3.10 of your choosing and then run

pip install --no-deps -r requirements.txt

Citation

If you found this code and dataset useful, please cite our paper and dataset with:

Hager, P., Jungmann, F., Holland, R. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat Med (2024). https://doi.org/10.1038/s41591-024-03097-1

@article{hager_evaluation_2024,
	title = {Evaluation and mitigation of the limitations of large language models in clinical decision-making},
	issn = {1546-170X},
	url = {https://doi.org/10.1038/s41591-024-03097-1},
	doi = {10.1038/s41591-024-03097-1},,
	journaltitle = {Nature Medicine},
	shortjournal = {Nature Medicine},
	author = {Hager, Paul and Jungmann, Friederike and Holland, Robbie and Bhagat, Kunal and Hubrecht, Inga and Knauer, Manuel and Vielhauer, Jakob and Makowski, Marcus and Braren, Rickmer and Kaissis, Georgios and Rueckert, Daniel},
	date = {2024-07-04},
}

Hager, P., Jungmann, F., & Rueckert, D. (2024). MIMIC-IV-Ext Clinical Decision Making: A MIMIC-IV Derived Dataset for Evaluation of Large Language Models on the Task of Clinical Decision Making for Abdominal Pathologies (version 1.0). PhysioNet. https://doi.org/10.13026/2pfq-5b68.

@misc{hager_mimic-iv-ext_nodate,
	title = {{MIMIC}-{IV}-Ext Clinical Decision Making: A {MIMIC}-{IV} Derived Dataset for Evaluation of Large Language Models on the Task of Clinical Decision Making for Abdominal Pathologies},
	url = {https://physionet.org/content/mimic-iv-ext-cdm/1.0/},
	shorttitle = {{MIMIC}-{IV}-Ext Clinical Decision Making},
	publisher = {{PhysioNet}},
	author = {Hager, Paul and Jungmann, Friederike and Rueckert, Daniel},
	urldate = {2024-07-04},
	doi = {10.13026/2PFQ-5B68},
	note = {Version Number: 1.0
Type: dataset},
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MIMIC Clinical Decision Making Analysis

Environment

Citation

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
LICENSE.txt		LICENSE.txt
README.md		README.md
benchmark.ipynb		benchmark.ipynb
download.py		download.py
evaluate_cdm.py		evaluate_cdm.py
evaluate_fi.py		evaluate_fi.py
jobs_to_run.sh		jobs_to_run.sh
quantize.py		quantize.py
ref_range_test.py		ref_range_test.py
requirements.txt		requirements.txt

License

paulhager/MIMIC-Clinical-Decision-Making-Analysis

Folders and files

Latest commit

History

Repository files navigation

MIMIC Clinical Decision Making Analysis

Environment

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages