NSIDES project data release v0.1
Data are available at http://tatonettilab.org/resources/nsides/.
This release includes data for drug side effects (OFFSIDES) and drug-drug pair side effects (TWOSIDES).
These data represent an update from the data released with the publication of [1], to include adverse events reported to the FDA through FAERS up to and including 2014.
We are releasing source code (notebooks and scripts) on GitHub, as well as data as both database dumps and compressed .csv files.
[1] Tatonetti, Nicholas P., Patrick P. Ye, Roxana Daneshjou, and Russ B. Altman. "Data-driven prediction of drug effects and interactions." Science Translational Medicine 4, no. 125 (2012): 125ra31. doi:10.1126/scitranslmed.3003377
Summary information
Number of | Value |
---|---|
Drugs (≥ 1 exposure) | 3,394 |
Adverse event types (≥ 1 occurrence) | 17,552 |
Drug-event pairs | 9,505,200 |
Significant* drug-event pairs | 125,647 |
Drug-drug-event triplets† | 222,155,888 |
Significant* drug-drug-event triplets† | 5,729,992 |
* A drug-event pair or drug-drug-event triplet is considered significant when LOG(PRR) - 1.96 * PRR_error > LOG(2).
† This is not filtered by OFFSIDES, meaning a drug-drug-event triplet can be significant even if one of the drugs is more significantly associated with the event by itself.
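The significance criterion above can be sketched as a small Python function. This is only an illustration: it assumes `LOG` is the natural logarithm, and it treats rows with zero, `NaN`, or infinite values as not significant, which is an assumption rather than something stated in this release. The function name `is_significant` is hypothetical.

```python
import math


def is_significant(prr, prr_error, z=1.96, threshold=2.0):
    """Apply LOG(PRR) - z * PRR_error > LOG(threshold) to one row."""
    # Assumption: zero, NaN, or infinite inputs never count as significant.
    if not (prr > 0) or math.isinf(prr):
        return False
    if math.isnan(prr_error) or math.isinf(prr_error):
        return False
    return math.log(prr) - z * prr_error > math.log(threshold)


if __name__ == "__main__":
    print(is_significant(5.0, 0.2))
    print(is_significant(2.5, 0.3))
```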
Notes on methods used to compute data
Signal detection methods
A contingency table can be drawn using exposed and unexposed cohorts produced by propensity score matching.
Cohort | Had outcome | Didn't have outcome |
---|---|---|
Drug exposed | A | B |
Not drug exposed | C | D |
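Using these definitions, the proportional reporting ratio is

$$PRR = \frac{A / (A + B)}{C / (C + D)}$$

and the error, PRR_s (reported as PRR_error in the data), is

$$PRR_s = \sqrt{\frac{1}{A} + \frac{1}{C} - \frac{1}{A + B} - \frac{1}{C + D}}$$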
Several consequences of these definitions should be taken into account when inspecting the data.
- PRR is `NaN` when both A and C are zero.
- PRR is `Inf` when C is zero but A is greater than zero.
- PRR is zero when A is zero and C is not zero.
- PRR_s is `Inf` when A or C or both are zero.
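The edge cases listed above can be reproduced with a short Python sketch. It assumes the standard definitions PRR = (A / (A + B)) / (C / (C + D)) and PRR_s = sqrt(1/A + 1/C - 1/(A + B) - 1/(C + D)), and it assumes both cohorts are non-empty (A + B > 0 and C + D > 0). The function name `prr_and_error` is hypothetical and is not taken from the project's source code.

```python
import math


def prr_and_error(a, b, c, d):
    """Return (PRR, PRR_s) for one contingency table.

    Assumes a + b > 0 and c + d > 0 (both cohorts are non-empty).
    """
    if a == 0 and c == 0:
        prr = math.nan  # 0 / 0
    elif c == 0:
        prr = math.inf  # positive rate divided by a zero rate
    else:
        prr = (a / (a + b)) / (c / (c + d))

    if a == 0 or c == 0:
        prr_s = math.inf  # 1/A or 1/C is unbounded
    else:
        prr_s = math.sqrt(1 / a + 1 / c - 1 / (a + b) - 1 / (c + d))
    return prr, prr_s
```

Python raises `ZeroDivisionError` rather than returning `Inf` or `NaN` for these divisions, which is why the special cases are handled explicitly.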
Computational notes
Propensity score matching was used to account for covariates.
OFFSIDES and TWOSIDES were handled in the same way.
The (drug exposure) propensity scores were computed using 10 bootstrap iterations.
For each iteration, only a fraction of reports were used to fit a logistic regression model that predicted drug exposure.
Specifically, non-exposed reports were sampled in equal number to exposed reports.
If fewer than 100 reports were exposed, then all exposed reports (however many were available) were used together with 100 unexposed reports for propensity score calculation.
Final propensity scores were calculated using an average across the 10 bootstrap iterations.
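The sampling and averaging steps described above might look roughly like the following Python sketch. It is not the project's implementation: the model fitting is left to a hypothetical callable `fit_and_score`, which stands in for fitting a logistic regression on the sample and scoring every report. The sketch also assumes that all exposed reports are used in every iteration and that unexposed reports are sampled without replacement.

```python
import random


def bootstrap_propensity(exposed, unexposed, fit_and_score,
                         n_iter=10, min_unexposed=100, seed=0):
    """Average propensity scores over bootstrap iterations.

    exposed, unexposed: lists of report IDs.
    fit_and_score: hypothetical callable taking (exposed_ids, sampled_unexposed_ids)
        and returning a dict of report_id -> propensity score.
    """
    rng = random.Random(seed)
    # As many unexposed as exposed reports, or 100 when fewer than 100
    # reports are exposed (capped by what is available).
    n_unexposed = min(max(len(exposed), min_unexposed), len(unexposed))

    totals = {}
    for _ in range(n_iter):
        sample = rng.sample(unexposed, n_unexposed)
        for report_id, score in fit_and_score(exposed, sample).items():
            totals[report_id] = totals.get(report_id, 0.0) + score
    # Final score is the mean across iterations.
    return {report_id: total / n_iter for report_id, total in totals.items()}
```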
Propensity score matching used the following bins: [0, 0.2, 0.4, 0.6, 0.8, 1].
These bins were used to create case and control sets with similar compositions.
For each bin, 10 times the number of cases were sampled from control (unexposed) reports.
We only added cases or controls from bins that had at least one case and at least one control.
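A minimal sketch of the binned matching described above is given below. Several details are assumptions, since this document does not specify them: bins are treated as closed on the left (a score of exactly 1.0 is placed in the last bin), and controls are sampled with replacement so that 10 times the number of cases can always be drawn. The names `bin_index` and `match_by_bins` are hypothetical.

```python
import bisect
import random

BIN_EDGES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def bin_index(score):
    """Map a score in [0, 1] to one of the five bins (0 to 4)."""
    return min(bisect.bisect_right(BIN_EDGES, score) - 1, len(BIN_EDGES) - 2)


def match_by_bins(case_scores, control_scores, ratio=10, seed=0):
    """case_scores, control_scores: dicts of report_id -> propensity score."""
    rng = random.Random(seed)
    cases_by_bin, controls_by_bin = {}, {}
    for report_id, score in case_scores.items():
        cases_by_bin.setdefault(bin_index(score), []).append(report_id)
    for report_id, score in control_scores.items():
        controls_by_bin.setdefault(bin_index(score), []).append(report_id)

    cases, controls = [], []
    for index, bin_cases in cases_by_bin.items():
        bin_controls = controls_by_bin.get(index)
        if not bin_controls:
            continue  # keep only bins with at least one case and one control
        cases.extend(bin_cases)
        # Assumption: sampling with replacement.
        controls.extend(rng.choices(bin_controls, k=ratio * len(bin_cases)))
    return cases, controls
```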
While some filtering was conducted for both OFFSIDES and TWOSIDES (A > 0 for both flat files and database dumps and PRR > 0.1 for flat files), TWOSIDES contains pairwise associations even for pairs where one of the drugs alone has a higher association score than the pair.
Flat files
Two flat files are made available for download: `OFFSIDES.csv.xz` and `TWOSIDES.csv.xz`.
The files have the following schemata:
OFFSIDES.csv.xz
column name | data type | description |
---|---|---|
drug_rxnorn_id | integer | RxNorm CUI (RxCUI) of each drug |
drug_concept_name | string | String name of each drug |
condition_meddra_id | integer | MedDRA code of each (adverse event) condition |
condition_concept_name | string | String name of each condition |
A | integer | Number of reports prescribed the drug that had the condition |
B | integer | Number of reports prescribed the drug that did not have the condition |
C | integer | Number of reports not prescribed the drug† that had the condition |
D | integer | Number of reports not prescribed the drug† that did not have the condition |
PRR | float | Proportional reporting ratio* |
PRR_error | float | Proportional reporting ratio error* |
mean_reporting_frequency | float | A / (A + B) |
† Number of controls is determined using propensity-score matching with controls sampled at 10-to-1 relative to cases.
* See signal detection methods
Note that this table does not include any drug-condition combinations for which both A and C were zero.
In other words, no information is presented for conditions that did not occur or for drugs that caused no adverse events.
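As an illustration, the compressed flat file can be streamed with Python's standard library without decompressing it to disk first. This sketch assumes the file is UTF-8 encoded and that its header row matches the schema above. The helper names `iter_rows` and `rows_for_condition` are hypothetical.

```python
import csv
import lzma


def iter_rows(path):
    """Yield each row of an xz-compressed CSV file as a dict keyed by column name."""
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def rows_for_condition(path, condition_name):
    """Collect all rows whose condition_concept_name equals condition_name."""
    return [
        row for row in iter_rows(path)
        if row["condition_concept_name"] == condition_name
    ]
```

Note that `csv.DictReader` returns every value as a string, so numeric columns such as `A` or `PRR` need to be converted (for example with `int(row["A"])` or `float(row["PRR"])`) before use.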
TWOSIDES.csv.xz
Each drug pair occurs only once in the TWOSIDES file, since drug pair ordering is not meaningful.
Each pair's order is determined by OMOP CDM IDs in the `TWOSIDES` database table, which means there is no particular order between RxNorm IDs in this flat file.
column name | data type | description |
---|---|---|
drug_1_rxnorm_id | integer | RxNorm CUI (RxCUI) of the first drug |
drug_1_concept_name | string | String name of the first drug |
drug_2_rxnorm_id | integer | RxNorm CUI (RxCUI) of the second drug |
drug_2_concept_name | string | String name of the second drug |
condition_meddra_id | integer | MedDRA code of each (adverse event) condition |
condition_concept_name | string | String name of each condition |
A | integer | Number of reports prescribed the drug pair that had the condition |
B | integer | Number of reports prescribed the drug pair that did not have the condition |
C | integer | Number of reports not prescribed the drug pair† that had the condition |
D | integer | Number of reports not prescribed the drug pair† that did not have the condition |
PRR | float | Proportional reporting ratio* |
PRR_error | float | Proportional reporting ratio error* |
mean_reporting_frequency | float | A / (A + B) |
† Number of controls is determined using propensity-score matching with controls sampled at 10-to-1 relative to cases.
* See signal detection methods
Note that this table does not include any drug-drug-condition triplets for which both A and C were zero.
In other words, no information is presented for conditions that did not occur or for drug pairs that caused no adverse events.
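Because each pair appears only once and the RxNorm IDs within a pair are in no particular order, a lookup by drug pair should not depend on argument order. One way to handle this, sketched here with hypothetical helper names, is to index rows by a sorted tuple of the two IDs.

```python
def pair_key(drug_1_rxnorm_id, drug_2_rxnorm_id):
    """Return an order-independent key for a drug pair."""
    return tuple(sorted((int(drug_1_rxnorm_id), int(drug_2_rxnorm_id))))


def index_by_pair(rows):
    """Group TWOSIDES rows (dicts keyed by column name) by unordered drug pair."""
    index = {}
    for row in rows:
        key = pair_key(row["drug_1_rxnorm_id"], row["drug_2_rxnorm_id"])
        index.setdefault(key, []).append(row)
    return index
```

With this index, `index[pair_key(x, y)]` and `index[pair_key(y, x)]` refer to the same rows.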
Database dumps
The data contained in this release are stored in a MySQL database called `effect_nsides` (MySQL Ver 14.14 Distrib 5.7.26, for Linux (x86_64)).
The code used to build and populate the tables in this database is located in nb/4.insert_tables/.
The dump was created using
/usr/bin/mysqldump -p$PASSWORD effect_nsides | gzip > effect_nsides-2019-11-13.sql.gz
To load the database dumps into a local MySQL database, use the following commands:
mysqladmin create effect_nsides
gunzip < effect_nsides-2019-11-13.sql.gz | mysql effect_nsides
For more information about loading from these dump files, see the MySQL documentation on this topic.
Table schemata
`effect_nsides` has the following tables: `REPORT`, `CONDITION_CONCEPT`, `CONDITION_OCCURRENCE`, `DRUG_CONCEPT`, `DRUG_EXPOSURE`, `OFFSIDES`, and `TWOSIDES`.
These tables have the following schemata:
REPORT
column name | data type | description |
---|---|---|
report_id | int | Unique ID for a report |
report_year | int | Year in which the report was received |
person_age | int | Age of the affected person |
person_sex | char(1) | Sex of the affected person |
Note that both the `person_age` and `person_sex` columns contain `NULL` values.
`person_sex` has been coded to be one of three values: 'M', 'F', or 'U'.
`person_age` is normalized to units of years, though some values are clearly erroneous reports (e.g. 1054, 869, 5200).
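When working with `person_age`, it may help to treat missing and implausible values the same way. The sketch below does this with a cutoff of 120 years, which is an arbitrary assumption chosen for illustration and is not part of this release. The function name `clean_age` is hypothetical.

```python
def clean_age(person_age, max_plausible_age=120):
    """Return the age in years, or None when it is missing or implausible."""
    if person_age is None:  # NULL in the database
        return None
    if 0 <= person_age <= max_plausible_age:
        return person_age
    return None  # e.g. 1054, 869, 5200
```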
CONDITION_CONCEPT
column name | data type | description |
---|---|---|
condition_concept_id | int | OMOP CDM concept_id of each (adverse event) condition |
condition_concept_name | varchar(255) | String name of each (adverse event) condition |
condition_meddra_id | int | MedDRA code of each (adverse event) condition |
condition_snomed_id | int | SNOMED-CT code of each (adverse event) condition |
Note that the `condition_snomed_id` column does contain `NULL` values, as the map between MedDRA and SNOMED-CT is imperfect.
CONDITION_OCCURRENCE
column name | data type | description |
---|---|---|
report_id | int | Unique ID for a report |
condition_concept_id | int | OMOP CDM concept_id of an (adverse event) condition |
This table simply connects rows from `REPORT` with rows in `CONDITION_CONCEPT`.
DRUG_CONCEPT
column name | data type | description |
---|---|---|
drug_concept_id | int | OMOP CDM concept_id of each drug |
drug_concept_name | varchar(255) | String name of each drug |
rxnorm_concept_id | int | RxNorm CUI (RxCUI) of each drug |
drugbank_concept_id | varchar(255) | DrugBank Accession Number (DB*) of each drug |
chebi_concept_id | int | ChEBI ID of each drug |
Note that `drugbank_concept_id` and `chebi_concept_id` both contain `NULL` values, as the maps among these terminologies are imperfect.
DRUG_EXPOSURE
column name | data type | description |
---|---|---|
report_id | int | Unique ID for a report |
drug_concept_id | int | OMOP CDM concept_id of a drug |
This table simply connects rows from `REPORT` with rows in `DRUG_CONCEPT`.
OFFSIDES
column name | data type | description |
---|---|---|
drug_concept_id | int | OMOP CDM concept_id of a drug |
condition_concept_id | int | OMOP CDM concept_id of an (adverse event) condition |
A | integer | Number of reports prescribed the drug that had the condition |
B | integer | Number of reports prescribed the drug that did not have the condition |
C | integer | Number of reports not prescribed the drug† that had the condition |
D | integer | Number of reports not prescribed the drug† that did not have the condition |
PRR | float | Proportional reporting ratio* |
PRR_error | float | Proportional reporting ratio error* |
mean_reporting_frequency | float | A / (A + B) |
† Number of controls is determined using propensity-score matching with controls sampled at 10-to-1 relative to cases.
* See signal detection methods
Note that this table does not include any drug-condition combinations for which both A and C were zero.
In other words, no information is presented for conditions that did not occur or for drugs that caused no adverse events.
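Since the `OFFSIDES` table stores only concept IDs, drug and condition names have to be recovered by joining against `DRUG_CONCEPT` and `CONDITION_CONCEPT`. The following sketch shows such a join. To keep it self-contained it uses Python's built-in `sqlite3` with a tiny in-memory stand-in for the real MySQL database, and the rows are made up for illustration only. With a MySQL driver the parameter placeholder is typically `%s` rather than `?`.

```python
import sqlite3

QUERY = """
SELECT d.drug_concept_name, c.condition_concept_name, o.A, o.PRR
FROM OFFSIDES AS o
JOIN DRUG_CONCEPT AS d ON d.drug_concept_id = o.drug_concept_id
JOIN CONDITION_CONCEPT AS c ON c.condition_concept_id = o.condition_concept_id
WHERE d.drug_concept_name = ?
ORDER BY o.PRR DESC
"""


def top_effects(connection, drug_name):
    """Return (drug, condition, A, PRR) rows for one drug, highest PRR first."""
    return connection.execute(QUERY, (drug_name,)).fetchall()


# In-memory stand-in with made-up rows and only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DRUG_CONCEPT (drug_concept_id INT, drug_concept_name TEXT);
CREATE TABLE CONDITION_CONCEPT (condition_concept_id INT, condition_concept_name TEXT);
CREATE TABLE OFFSIDES (drug_concept_id INT, condition_concept_id INT, A INT, PRR REAL);
INSERT INTO DRUG_CONCEPT VALUES (1, 'example drug');
INSERT INTO CONDITION_CONCEPT VALUES (10, 'example condition'), (11, 'other condition');
INSERT INTO OFFSIDES VALUES (1, 10, 4, 2.5), (1, 11, 2, 7.0);
""")

if __name__ == "__main__":
    for row in top_effects(conn, "example drug"):
        print(row)
```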
TWOSIDES
column name | data type | description |
---|---|---|
drug_concept_id_1 | int | OMOP CDM concept_id of first drug |
drug_concept_id_2 | int | OMOP CDM concept_id of second drug |
condition_concept_id | int | OMOP CDM concept_id of an (adverse event) condition |
A | integer | Number of reports prescribed the drug pair that had the condition |
B | integer | Number of reports prescribed the drug pair that did not have the condition |
C | integer | Number of reports not prescribed the drug pair† that had the condition |
D | integer | Number of reports not prescribed the drug pair† that did not have the condition |
PRR | float | Proportional reporting ratio* |
PRR_error | float | Proportional reporting ratio error* |
mean_reporting_frequency | float | A / (A + B) |
† Number of controls is determined using propensity-score matching with controls sampled at 10-to-1 relative to cases.
* See signal detection methods
Note that this table does not include any drug-drug-condition triplets for which both A and C were zero.
In other words, no information is presented for conditions that did not occur or for drug pairs that caused no adverse events.
Questions
This work is the result of efforts by a number of people.
Questions or comments are best made by opening an issue on the GitHub repository.
Additionally, as this work is the result of a rotation project in the Tatonetti Lab at Columbia University, questions can also be sent to Michael Zietz (zietzm) or Nicholas Tatonetti directly.