
noharm-ai/brateca


BRATECA

BRATECA (BRAzilian TErtiary CAre dataset) is a clinical dataset containing more than 400 million words across over 2.5 million free-text clinical notes from over 70,000 individual admissions in 10 hospitals located in two Brazilian states. Where available, the dataset also includes patient, prescription, and exam information for these admissions. The data is collected, deidentified, and managed by the Institute for Artificial Intelligence in Healthcare, a non-profit startup from Brazil whose interdisciplinary team of data scientists and practicing healthcare professionals, such as pharmacists and physicians, develops smart systems for clinical pharmacy. The Institute has made the dataset available for credentialed access.

Dataset details

BRATECA contains 73,040 admission records of 52,973 unique adults (18 years of age or older) extracted from 10 hospitals located in two Brazilian states. Many of these admissions are associated with specialty treatment wards, as follows: publicly funded wards (12,096 admissions total); intensive care wards (4,666 admissions total); obstetrics wards (5,550 admissions total); COVID-19 wards (1,714 admissions total); surgical wards (25,004 admissions total); emergency wards (37,392 admissions total); and ambulatory wards (3,107 admissions total). The remaining 8,674 admissions are not associated with any specialty ward.
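The ward counts above add up to more than the total number of admissions, which suggests that a single admission can be associated with more than one specialty ward. The short Python sketch below only re-adds the figures quoted in this section. It assumes that the 8,674 remaining admissions are those with no specialty ward at all; the variable names are illustrative and do not come from the BRATECA scripts.

```python
# Ward-level admission counts as quoted in the dataset details.
ward_admissions = {
    "publicly funded": 12_096,
    "intensive care": 4_666,
    "obstetrics": 5_550,
    "COVID-19": 1_714,
    "surgical": 25_004,
    "emergency": 37_392,
    "ambulatory": 3_107,
}

TOTAL_ADMISSIONS = 73_040
# Assumption: these admissions have no specialty ward association.
NO_SPECIALTY_WARD = 8_674

# Admissions linked to at least one specialty ward.
admissions_with_ward = TOTAL_ADMISSIONS - NO_SPECIALTY_WARD

# Sum of all per-ward counts (an admission may be counted more than once).
ward_associations = sum(ward_admissions.values())

# Ward associations beyond one per admission.
extra_associations = ward_associations - admissions_with_ward

print(f"Admissions with a specialty ward: {admissions_with_ward}")
print(f"Total ward associations:          {ward_associations}")
print(f"Associations beyond one per admission: {extra_associations}")
```

Under that assumption, the per-ward totals should be read as overlapping counts rather than as a partition of the 73,040 admissions.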

The median patient age is 54 (Q1 = 38, Q3 = 68), and 41.3% of the patients are male. By race, 70.7% of patients are identified as white, 3.8% as mixed, 3.8% as black, and 0.2% as yellow. The patient mortality rate is 6.5%. Each admission is paired with laboratory exam results (2,374,807 total), prescriptions and their itemized contents (519,318 total), and clinical notes (2,849,572 total).

An interactive dashboard has been created for the dataset using Google Data Studio, and is openly available.

Resources on this page

This page is a repository for all scripts used in the creation of the BRATECA dataset. They have been made freely available for transparency and are of little practical use to anyone without direct access to the Institute's database.

For those looking to be credentialed for access to the BRATECA dataset, please read the Data Access section of this readme for more information.

Data Access

In order to receive access, the researcher must complete the following steps:

  • Access the PhysioNet platform
  • Complete a course on protecting human research participants
  • If the requester is a student, their supervisor must also agree to the terms of confidentiality

Once the process is complete, the researcher's request will be evaluated. The evaluation may take up to two weeks. If the request is accepted, the researcher will be granted access to the dataset.

Citing This Work

Full Text, BibTeX
