Initial federated COVID-rich ICU database documentation. #209

Open. 23 commits to be merged into base: main. Showing changes from 13 commits.

Commits
a8fd7f8
mimicfed-initial-documentation
danamouk Sep 22, 2023
d886750
fix typo
danamouk Sep 22, 2023
c720f84
minor edits
danamouk Sep 27, 2023
d3104c2
fix abbreviations
danamouk Sep 27, 2023
f3e35fc
removes freely available
danamouk Sep 27, 2023
1a96d34
adds MIMIC-Fed is not yet released and its structure is subject to ch…
danamouk Sep 27, 2023
0b8287f
restructures mimicfed
danamouk Sep 27, 2023
f2580ef
adds mimic feature block for consistency.
danamouk Sep 28, 2023
1d5e663
update NMHC Branded Name w/ HealthCare.
danamouk Oct 4, 2023
a94bf9e
restructures harmonized multicenter db to the main docs page.
danamouk Oct 4, 2023
c397bb6
minor edits and removes mentions of federated
danamouk Oct 4, 2023
4a74db5
edits formatting.
danamouk Oct 4, 2023
9739e7e
updates links and text.
danamouk Oct 4, 2023
1a81904
replaces endouttime with edregttime
danamouk Oct 10, 2023
e6fa4d1
removes order_provider_id
danamouk Oct 10, 2023
9cc1e41
adds more explicit description for hospital expire flag.
danamouk Oct 10, 2023
d54af00
clarifies ICD10 use compared to ICD9 for 2020-2022 period.
danamouk Oct 10, 2023
a34c0e9
removes caregiver-id.
danamouk Oct 10, 2023
edb72f0
adds references to mimic code repo for mappings.
danamouk Oct 10, 2023
2f8806e
fixes typos in start/stoptime & adds itemid and loinc for examples.
danamouk Oct 10, 2023
5f05e6f
adds a clarifying statement about dod in northwestern db.
danamouk Oct 10, 2023
908c7c3
updates mutlicenter to mimic-northwestern
danamouk Oct 10, 2023
0f0db10
updates MIMIC-NW docs.
danamouk Jun 10, 2024
4 changes: 4 additions & 0 deletions content/en/_index.html
@@ -52,6 +52,10 @@
The Note module contains deidentified free-text clinical notes for hospitalized patients.
{{% /blocks/feature %}}

{{% blocks/feature icon="fa-lightbulb" title="MIMIC-Northwestern" url="/docs/multi-center/" %}}
A large harmonized multi-center COVID-rich ICU database from Beth Israel Deaconess Medical Center (BIDMC) and Northwestern Memorial HealthCare (NMHC) spanning 2020 to 2022.
{{% /blocks/feature %}}

{{% blocks/feature icon="fa-scroll" title="MIMIC-III" url="/docs/iii/" %}}
MIMIC-III is an older version of MIMIC. It contains an older group of patients (ending in 2012), and a subset of the ICU and hospital information available in MIMIC-IV.
We highly recommend researchers starting new studies to use the above modules in MIMIC-IV.
31 changes: 31 additions & 0 deletions content/en/docs/multi-center/_index.md
@@ -0,0 +1,31 @@
---
title: "MIMIC-Northwestern documentation"
linktitle: Multi-center
weight: 45

cascade:
- type: "docs"
  _target:
    path: "/**"

description: >
  MIMIC-Northwestern: A Harmonized Multi-center COVID-rich ICU Database
---
We introduce MIMIC-Northwestern, a large harmonized multi-center COVID-rich ICU database. It comprises deidentified health-related data from Beth Israel Deaconess Medical Center (BIDMC) and Northwestern Memorial HealthCare (NMHC) spanning 2020 to 2022, capturing the data distribution shifts during this critical period. The database adopts a data structure similar to that of MIMIC-IV v2.2.

Notably, Northwestern Memorial HealthCare (NMHC) uses the Epic electronic medical records (EMR) system. To make the EMR data available for research and quality assurance, the NM EMR systems transfer selected data into a relational Enterprise Data Warehouse (NM EDW).

The NM EDW tables are categorized into two primary categories, Fact and Dimension, following data warehousing conventions. As implemented in the NM EDW, Fact tables primarily contain events (such as encounters, admissions, diagnosis events, procedure orders, and medication orders), while Dimension tables describe persistent attributes of entities (patients, procedure names, the medication formulary).

The NM EDW also includes auxiliary tables not directly related to patient care, such as a list of International Classification of Diseases codes (ICD-9 and ICD-10). In response to the COVID-19/SARS-CoV-2 pandemic, a COVID-19 data mart was created within the EDW to provide convenient access to information on COVID-19 patients, lab results, medications, and treatments.

The MIMIC-Northwestern database is currently organized into two distinct modules to highlight the source of the data:

- [Hosp](/docs/multi-center/modules/hosp/) - Hospital level data including patients, admissions, labs, ICD diagnoses for billing purposes, prescriptions, and electronic medication administration records.
- [ICU](/docs/multi-center/modules/icu/) - ICU level data including ICU stays, procedure events, and chartevents (vital signs).

{{% pageinfo %}}
The MIMIC-Northwestern database is not yet released and its structure is subject to change.
{{% /pageinfo %}}

The table structures adopted to align with MIMIC's data structure are detailed in the respective section for each module. Additionally, we have incorporated COVID-related concepts, standard terminologies (LOINC, RxNorm, SNOMED, ICD-9/10), and derived mappings (for drug administration) into the dataset. This integration not only supports current multi-center initiatives but also facilitates interoperability, allowing for data exchange and collaboration across healthcare systems.
10 changes: 10 additions & 0 deletions content/en/docs/multi-center/modules/_index.md
@@ -0,0 +1,10 @@
---
title: "Modules"
linkTitle: "Modules"
weight: 3
date: 2023-09-18
description: >
  Description of the data contained in each of the MIMIC-Northwestern modules.
---

Data within the modules will be made available on PhysioNet.
14 changes: 14 additions & 0 deletions content/en/docs/multi-center/modules/hosp/_index.md
@@ -0,0 +1,14 @@
---
title: "Hosp"
linkTitle: "Hosp"
date: 2023-09-18
weight: 20
description: >
  The Hosp module comprises data sourced from the comprehensive Electronic Health Record (EHR) systems of both BIDMC and NMHC hospitals. Information covered includes patient and admission information, laboratory measurements, billed diagnoses, medication orders, and electronic medication administration records.
---

The Hosp module contains data derived from the hospital-wide EHR of BIDMC and NMHC. These measurements are predominantly recorded during the hospital stay, though some tables include data from outside the hospital as well (e.g. outpatient laboratory tests in *labevents*).

Information includes patient and admission details (*patients*, *admissions*), laboratory measurements (*labevents*, *d_labitems*), hospital billing information (*diagnoses_icd*, *d_icd_diagnoses*), medication orders (*prescriptions*), and electronic medication administration records (*emar*).


145 changes: 145 additions & 0 deletions content/en/docs/multi-center/modules/hosp/admissions.md
@@ -0,0 +1,145 @@
---
title: "admissions table"
linktitle: "admissions"
date: 2023-09-18
weight: 1
description: >
  Detailed information about hospital stays, including admission, discharge, and death times, as well as admission type, admission location, and discharge location; additionally, patient details such as insurance, language, marital status, and race are recorded at the hospital stay level.
---

The *admissions* table gives information regarding a patient's admission to the hospital.

### Links to

* *patients* on `subject_id`

## Table columns

Name | Postgres data type
---- | ----
`subject_id` | INTEGER NOT NULL
`hadm_id` | INTEGER NOT NULL
`admittime` | TIMESTAMP NOT NULL
`dischtime` | TIMESTAMP
`deathtime` | TIMESTAMP
`admission_type` | VARCHAR(40) NOT NULL
`admission_location` | VARCHAR(60)
`discharge_location` | VARCHAR(60)
`insurance` | VARCHAR(255)
`language` | VARCHAR(10)
`marital_status` | VARCHAR(30)
`race` | VARCHAR(80)
`ethnicity` | VARCHAR(80)
`edregtime` | TIMESTAMP
`edouttime` | TIMESTAMP
`hospital_expire_flag` | SMALLINT

## Detailed description

The *admissions* table defines all hospitalizations in the database. Hospitalizations are assigned a unique random integer known as the `hadm_id`.

### `subject_id`

`subject_id` is a unique identifier for each patient and can be used to identify data associated with a specific patient. It is a cryptographic random number, and each patient has a `subject_id` which is consistent across tables. In the *admissions* table, the same `subject_id` may appear in more than one row.

### `hadm_id`

Each row of this table contains a unique `hadm_id`, which represents a single patient's admission to the hospital. It is possible for this table to have duplicate `subject_id` values, indicating that a single patient had multiple admissions to the hospital. The *admissions* table can be linked to the *patients* table using `subject_id`.
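
As a minimal sketch of the one-to-many relationship described above, the following uses Python's built-in `sqlite3` module with a few made-up rows to count admissions per patient. The identifiers and timestamps are hypothetical, and only a subset of the columns is shown; the released tables are described in Postgres terms and may differ.

```python
import sqlite3

# Hypothetical toy data; real identifiers and values will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (subject_id INTEGER NOT NULL);
CREATE TABLE admissions (
    subject_id INTEGER NOT NULL,
    hadm_id INTEGER NOT NULL,
    admittime TEXT NOT NULL
);
INSERT INTO patients VALUES (101), (102);
INSERT INTO admissions VALUES
    (101, 9001, '2020-04-01 10:00:00'),
    (101, 9002, '2021-01-15 08:30:00'),
    (102, 9003, '2020-11-20 22:10:00');
""")

# Link admissions to patients on subject_id and count stays per patient.
admissions_per_patient = conn.execute("""
    SELECT p.subject_id, COUNT(a.hadm_id) AS n_admissions
    FROM patients AS p
    LEFT JOIN admissions AS a ON a.subject_id = p.subject_id
    GROUP BY p.subject_id
    ORDER BY p.subject_id
""").fetchall()

print(admissions_per_patient)
```

A `LEFT JOIN` is used so that a patient without any admission row would still appear with a count of zero.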

### `admittime`

`admittime` provides the date and time the patient was admitted to the hospital.

### `dischtime`

`dischtime` provides the date and time the patient was discharged from the hospital.

### `deathtime`

`deathtime` provides the time of in-hospital death for the patient. Note that `deathtime` is only present if the patient died in-hospital, and if present is almost always the same as the patient's `dischtime`. However, there may be some discrepancies.
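
One way to surface such discrepancies is to compare the two timestamps directly. The sketch below is illustrative only: the rows are invented, and the timestamp format is an assumption about how the values might be exported.

```python
from datetime import datetime

# Assumed export format for TIMESTAMP columns.
FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical rows: (hadm_id, dischtime, deathtime); deathtime is None for survivors.
rows = [
    (9001, "2020-04-10 14:00:00", None),
    (9002, "2021-01-20 03:15:00", "2021-01-20 03:15:00"),
    (9003, "2020-12-01 09:00:00", "2020-12-01 07:30:00"),
]

def gap_hours(dischtime, deathtime):
    """Return dischtime minus deathtime in hours, or None if either is missing."""
    if dischtime is None or deathtime is None:
        return None
    delta = datetime.strptime(dischtime, FMT) - datetime.strptime(deathtime, FMT)
    return delta.total_seconds() / 3600

# Keep only admissions where both times exist and they differ.
discrepancies = []
for hadm_id, dischtime, deathtime in rows:
    gap = gap_hours(dischtime, deathtime)
    if gap is not None and gap != 0:
        discrepancies.append((hadm_id, gap))

print(discrepancies)
```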

### `admission_type`

`admission_type` is useful for classifying the urgency of the admission. The NM EDW database contributes additional admission types, including 'Emergency', 'Urgent', 'Elective', 'Elective-Routine', and 'Trauma'.

### `admission_location`

`admission_location` provides information about the hospital department into which the patient was initially admitted. There are 24 admission locations from NM EDW, including 'Neurology', 'Radiation Oncology', 'Pediatrics', 'Medicine', 'Respiratory Therapy', 'Cardiology', 'Cardiac Rehabilitation', 'Pre-Admission Testing', 'Neurological Intensive Care', 'Orthopaedic Surgery', 'Sleep Medicine', 'Gastroenterology', 'Unknown', 'Obstetrics and Gynecology', 'Emergency Medicine', 'Research', 'Intensive Care', 'Gynecology', 'Pediatric Intensive Care', 'Radiology', 'Pathology', 'Obstetrics', and 'Surgery'. Note that 'Pediatrics' is the name of a unit or room and is not necessarily exclusive to pediatric patients. The data being shared pertain to the adult hospital, with patients aged 18 and above.


### `discharge_location`

Similarly, `discharge_location` is the disposition of the patient after they are discharged from the hospital. There are 33 discharge locations from NM EDW. Some of the 33 discharge locations are suppressed under 'Other Facility' for privacy.

NMHC discharge locations:

| Discharge Location                                      | Expanded Meaning (for clarity)    |
| ------------------------------------------------------- | --------------------------------- |
| Expired | Died |
| Planned Readmission - DC/transferred to acute inpatient rehab | |
| ED Dismissed-Never Arrived | |
| Home with Equipment or O2 | |
| Shelter | |
| Expired - Hospice | Died in Hospice |
| Home with Home Health Care | |
| Planned Readmission - DC/transferred to skilled nursing facility | |
| Acute Inpatient Rehabilitation | |
| Home or Self Care | |
| Group Home | |
| Planned Readmission - Discharged to home/self-care | |
| Left Against Medical Advice | |
| Inpatient Hospice | |
| Admitted to L&D | Admitted to Labor and Delivery |
| Planned Readmission - DC/transferred to nursing home (custodial) | |
| unknown | |
| Cancer Center or Children's Hospital | |
| Home with Outpatient Services | |
| Critical Access Hospital | |
| Planned Readmission - DC/transferred to other type of healthcare institution | |
| Gift of Hope / Still a Patient | |
| Nursing Home (Custodial) | |
| Home with Hospice | |
| VA System Facility | |
| Planned Readmission - DC/transferred to Long-term Acute Care Hospital (LTAC) | |
| Swing Bed | |
| Against Medical Advice (AMA) or Elopement | |
| Skilled Nursing Facility or Subacute Rehab Care | |
| Designated Disaster Alternative Care Site | |
| Acute Care Hospital | |
| Long-Term Acute Care Hospital (LTAC) | |


### `insurance`, `language`, `marital_status`, `race`, `ethnicity`

The `insurance`, `language`, `marital_status`, `race`, and `ethnicity` columns provide information about patient demographics for the given hospitalization. Note that BIDMC has only one column, `race`; we have added an `ethnicity` column to incorporate NMHC's data.

The race column in NMHC includes:

- American Indian or Alaska Native
- Other
- Unknown
- 2 or more races
- Unable to Answer
- Native Hawaiian or Other Pacific Islander
- Asian
- White
- Declined
- Black or African American

The ethnicity column in NMHC includes:

- Not Hispanic or Latino
- Hispanic or Latino
- Declined
- Unable to Answer

### `edregtime`

The date and time at which the patient's arrival in the emergency department was registered.

### `edouttime`

The date and time at which the patient left the emergency department, either by discharge from the hospital or by transfer.

### `hospital_expire_flag`

This is a binary flag which indicates whether the patient died within the given hospitalization. `1` indicates death in the hospital, as noted in the `dod` column of the *patients* table, and `0` indicates survival to hospital discharge.
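
Because the flag is coded as 0/1, summing it counts in-hospital deaths and averaging it gives the proportion of admissions that ended in death. The following sketch uses Python's built-in `sqlite3` module with invented rows; the values are hypothetical and only a subset of columns is shown.

```python
import sqlite3

# Hypothetical toy data; real identifiers and values will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (
    subject_id INTEGER NOT NULL,
    hadm_id INTEGER NOT NULL,
    hospital_expire_flag SMALLINT
);
INSERT INTO admissions VALUES
    (101, 9001, 0),
    (101, 9002, 1),
    (102, 9003, 0),
    (103, 9004, 0);
""")

# COUNT gives admissions, SUM gives deaths, AVG gives the in-hospital mortality proportion.
n_admissions, n_deaths, mortality = conn.execute("""
    SELECT COUNT(*), SUM(hospital_expire_flag), AVG(hospital_expire_flag)
    FROM admissions
""").fetchone()

print(n_admissions, n_deaths, mortality)
```

Note that this is mortality per admission, not per patient; a per-patient figure would first need to aggregate by `subject_id`.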
77 changes: 77 additions & 0 deletions content/en/docs/multi-center/modules/hosp/d_icd_diagnoses.md
@@ -0,0 +1,77 @@
---
title: "d_icd_diagnoses"
linktitle: "d_icd_diagnoses"
weight: 6
date: 2023-09-18
description: >
  Dimension table for *diagnoses_icd*; provides a description of ICD-9/ICD-10 billed diagnoses.
---

The *d_icd_diagnoses* table defines International Classification of Diseases (ICD) Version 9 and 10 codes for **diagnoses**. These codes are assigned at the end of the patient's stay and are used by the hospital to bill for care provided.

### Links to

* *diagnoses_icd* on `icd_code` and `icd_version`

## Table columns

Name | Postgres data type
---- | ----
`icd_code` | CHAR(7) NOT NULL
`icd_version` | INTEGER NOT NULL
`long_title` | VARCHAR(255)

## Detailed Description

### `icd_code`

`icd_code` is the International Classification of Diseases (ICD) code.

### `icd_version`

There are two versions for this coding system: version 9 (ICD-9) and version 10 (ICD-10). These can be differentiated using the `icd_version` column. [ICD-9](https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes) and [ICD-10](https://www.cms.gov/Medicare/Coding/ICD10/2019-ICD-10-CM.html) diagnosis codes are acquired from the Centers for Medicare & Medicaid Services (CMS).

Review comments on this line:

Member: all of the data for this subset should be ICD-10 - can't imagine NW using ICD-9 in 2020 onward - perhaps make it clear the description of ICD-9 is just for informational purposes?

Contributor Author: yes, added a clarifying statement to indicate ICD 9 codes are just for informational purposes, after mandate of ICD10 in 2015 based on CMS.


In general, ICD-10 codes are more detailed, though code mappings (or "cross-walks") exist which convert ICD-9 codes to ICD-10 codes.

Both ICD-9 and ICD-10 codes are often presented with a decimal. This decimal is not required for interpretation of an ICD code; i.e. the `icd_code` of '0010' is equivalent to '001.0'.
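
A small helper can make the two spellings comparable. This is a sketch only; the function name `normalize_icd_code` is introduced here for illustration and is not part of the database or any library.

```python
def normalize_icd_code(code: str) -> str:
    """Return the code without its optional decimal point.

    The result stays a string so that leading zeros are preserved.
    """
    return code.strip().replace(".", "").upper()


print(normalize_icd_code("001.0"))
print(normalize_icd_code("u07.1"))
```

Converting the code to an integer instead would silently turn '0010' into 10, which is a different code.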

ICD-9 and ICD-10 codes have distinct formats. ICD-9 codes are strings of up to 5 characters which are entirely numeric (with the exception of codes prefixed with "E" or "V", which are used for external causes of injury or supplemental classification). Importantly, ICD-9 codes are retained as strings in the database because the leading 0s in codes are meaningful.

ICD-10 codes are 3-7 characters long and always begin with a letter, followed by a digit and then further digits or letters.
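
The format differences can be approximated with regular expressions, which also shows why the `icd_version` column is still needed. The patterns below are simplified assumptions written for this illustration, not an official validator: they check only the general shape of a code, and some codes (for example those starting with "E" or "V") fit both shapes.

```python
import re

# Simplified shape checks (assumptions for illustration, applied after removing the decimal).
ICD9_SHAPE = re.compile(r"^(?:\d{3,5}|V\d{2,4}|E\d{3,4})$")
ICD10_SHAPE = re.compile(r"^[A-Z]\d[0-9A-Z]{1,5}$")


def plausible_versions(icd_code: str) -> set:
    """Return the ICD versions (9 and/or 10) whose general shape the code matches."""
    code = icd_code.strip().replace(".", "").upper()
    versions = set()
    if ICD9_SHAPE.match(code):
        versions.add(9)
    if ICD10_SHAPE.match(code):
        versions.add(10)
    return versions


for example in ["0010", "U07.1", "J12.81", "V1005"]:
    print(example, sorted(plausible_versions(example)))
```

Because a code such as 'V1005' matches both shapes, the version should be read from `icd_version` rather than inferred from the code string.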

ICD-11 became the official [WHO standard](https://www.who.int/standards/classifications/classification-of-diseases) on January 1, 2022 but has not been adopted in the US. The US Centers for Medicare & Medicaid Services (CMS) and HIPAA have required ICD-10 since October 1, 2015, so the description of ICD-9 codes here is for informational purposes only.

### `long_title`

The `long_title` provides a description of the ICD code. For example, the ICD-10 code U07.1 has a `long_title` of 'COVID-19 (confirmed by laboratory testing)'.

In the tables below, we provide ICD-10 codes related to COVID-19 or long COVID.

Terminologies related to COVID markers in the ICD-10:

| icd_code | long_title |
| -------- | -------------------------------------------------------- |
| U07.1 | COVID-19 (confirmed by laboratory testing) |
| U07.2 | COVID-19, virus not identified |
| U10.9 | Multisystem inflammatory syndrome associated with COVID-19 |
| J12.81 | Pneumonia due to SARS-associated coronavirus |


Terminologies for Long COVID markers in ICD-10:

| icd_code | long_title |
| -------- | ---------------------------------------------- |
| U09.9 | Post COVID-19 condition, unspecified |

Terminologies related to other COVID aspects in ICD-10:

| icd_code | long_title |
|----------|-----------------------------------------------------------------------------------------------------------|
| U08.9 | Personal history of COVID-19, unspecified (not a marker) |
| B97.2 | Coronavirus as the cause of diseases classified elsewhere (not necessarily COVID-19) |
| B97.21 | SARS-associated coronavirus as the cause of diseases classified elsewhere |
| Z28.31 | Underimmunization for COVID-19 status (see detailed codes below) |
| Z28.310 | Unvaccinated for COVID-19 |
| Z28.311 | Partially vaccinated for COVID-19 |
| B97.29 | Other coronavirus as the cause of diseases classified elsewhere (SUPERSEDED; early coding guidelines) |
| Z20.822 | Contact with and (suspected) exposure to COVID-19 (unconfirmed) |
| Z86.16 | Personal history of COVID-19 |
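
As a rough sketch of how the marker codes above might be applied, the following groups the codes from the first two tables into sets and labels a diagnosis row accordingly. The function name, the category labels, and the assumption that stored codes may omit the decimal are all illustrative choices, not part of the database.

```python
# Codes copied from the marker tables above (written with decimals).
COVID_MARKERS = {"U07.1", "U07.2", "U10.9", "J12.81"}
LONG_COVID_MARKERS = {"U09.9"}


def strip_decimal(code: str) -> str:
    """Remove the optional decimal so both spellings compare equal."""
    return code.strip().replace(".", "").upper()


COVID_CODES = {strip_decimal(c) for c in COVID_MARKERS}
LONG_COVID_CODES = {strip_decimal(c) for c in LONG_COVID_MARKERS}


def covid_category(icd_code: str, icd_version: int):
    """Return 'covid', 'long_covid', or None for a single diagnosis row."""
    if icd_version != 10:
        return None  # the marker codes listed above are ICD-10 only
    code = strip_decimal(icd_code)
    if code in COVID_CODES:
        return "covid"
    if code in LONG_COVID_CODES:
        return "long_covid"
    return None


# Hypothetical (icd_code, icd_version) rows.
for row in [("U071", 10), ("U09.9", 10), ("0010", 9), ("Z8616", 10)]:
    print(row, covid_category(*row))
```

Codes from the third table (history, exposure, vaccination status) are deliberately left unlabeled here, since the table describes them as other aspects rather than markers.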
57 changes: 57 additions & 0 deletions content/en/docs/multi-center/modules/hosp/d_labitems.md
@@ -0,0 +1,57 @@
---
title: "d_labitems"
linktitle: "d_labitems"
weight: 4
date: 2023-09-18
description: >
  Dimension table for *labevents*; provides a description of all lab items.
---

## *d_labitems*

*d_labitems* contains definitions for all `itemid` associated with lab measurements in the MIMIC database. All data in *labevents* link to the *d_labitems* table. Each unique (`fluid`, `category`, `label`) tuple in the hospital database was assigned an `itemid` in this table, and the use of this `itemid` facilitates efficient storage and querying of the data.

Laboratory data contains information collected and recorded in the hospital laboratory database. This includes measurements made in wards within the hospital and clinics outside the hospital. Most concepts in this table have been mapped to LOINC codes, an openly available ontology which facilitates interoperability.

For the data sourced from NMHC, Illinois law defines certain categories of information as Sensitive Protected Health Information (SPHI) which require special treatment. SPHI includes genetic counseling but does not include genetic testing.

To facilitate further multi-center initiatives, the lab mappings to standard terminologies (LOINC) will be released.

### Links to

* *labevents* on `itemid`

## Table columns

Name | Postgres data type
---- | ----
`itemid` | INTEGER
`label` | VARCHAR(50)
`fluid` | VARCHAR(50)
`category` | VARCHAR(50)

## Detailed Description

### `itemid`

A unique identifier for a laboratory concept. `itemid` is unique to each row, and can be used to identify data in *labevents* associated with a specific concept.
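
The lookup described above can be sketched as a join on `itemid`. The example uses Python's built-in `sqlite3` module; the `itemid` values, labels, and measurements are invented, and the three-column shape of *labevents* shown here is an assumption made for illustration (the real table has more columns).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE d_labitems (
    itemid INTEGER,
    label VARCHAR(50),
    fluid VARCHAR(50),
    category VARCHAR(50)
);
-- Assumed minimal shape for labevents, for illustration only.
CREATE TABLE labevents (
    subject_id INTEGER,
    itemid INTEGER,
    valuenum REAL
);
INSERT INTO d_labitems VALUES
    (1, 'Lactate', 'Blood', 'Blood Gas'),
    (2, 'Glucose', 'Urine', 'Chemistry');
INSERT INTO labevents VALUES
    (101, 1, 2.1),
    (101, 2, 85.0),
    (102, 1, 4.3);
""")

# Resolve each measurement's concept through d_labitems, then filter by label and fluid.
lactate_rows = conn.execute("""
    SELECT le.subject_id, d.label, d.fluid, le.valuenum
    FROM labevents AS le
    INNER JOIN d_labitems AS d ON d.itemid = le.itemid
    WHERE d.label = ? AND d.fluid = ?
    ORDER BY le.subject_id
""", ("Lactate", "Blood")).fetchall()

print(lactate_rows)
```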

### `label`

The `label` column describes the concept which is represented by the `itemid`.

We provide a list of common COVID-19 tests and measurements in the database, as defined by LOINC terminology, below:

- SARS-CoV-2 (COVID-19) [Presence] in Specimen by Organism specific culture
- SARS-CoV-2 (COVID-19) Ag [Presence] in Respiratory specimen by Rapid immunoassay
- SARS-CoV-2 (COVID-19) N gene [Cycle Threshold #] in Specimen by NAA with probe detection
- SARS-CoV-2 (COVID-19) E gene [Cycle Threshold #] in Respiratory specimen by NAA with probe detection

Review comments on this list:

Member: would be useful to add itemid and the LOINC for these examples

Contributor Author: updated!

Member: hmm I don't see it? something wrong with my viewer or did the commit not go through?

Contributor Author (danamouk, Oct 19, 2023): it is actually part of commit 2f8806e.


### `fluid`

`fluid` describes the substance on which the measurement was made. These include blood, cerebrospinal fluid, joint fluid, ascites, urine and other body fluid.

### `category`

`category` provides higher level information as to the type of measurement. These categories include hematology, chemistry, and blood gas. For example, a category of 'ABG' indicates that the measurement is an arterial blood gas.