readHCUP makes it easy to read and work with datasets from Healthcare Cost and Utilization Project (HCUP). readHCUP’s functions are designed to work with the ASC files directly, so there is no need to preprocess the data for any of the supported datasets. The number of supported datasets is currently limited while the package is in beta, but more datasets will be added soon.
The currently supported datasets:
- NIS 2016-2021
The current import method utilizes readr, and additional import functions (data.table, etc) will be added in the near future.
Please feel free to create an issue if you have any questions, feedback, or feature requests.
You can install the development version of readHCUP from GitHub with:
# install.packages("devtools")
devtools::install_github("jonbry/readHCUP")
library(readHCUP)
# The following uses an example NIS dataset
# Read theNIS dataset
df <- read_nis("inst/data/NIS_2019_test_data.ASC", 2019)
# Read only the first 5 observations
df_5 <- read_nis("NIS_2019_test_data.ASC", 2019, n_max = 5)
# Read in only the first three diagnostic codes (columns) of the first 10 observations
df_3dx <- read_nis("NIS_2019_test_data.ASC", 2019,
col_select = c("I10_DX1", "I10_DX2", "I10_DX3"),
n_max = 10)
By default, the read_nis()
automatically returns the corrected version
of the data. For example, HCUP released a corrections for
PCLASS_ORPROC
in the NIS 2019 and 2020 datasets. Usually, you’d need
download a csv file with the corrections and then update the values in
the dataset. This can be a bit of a hassle when there are 7M+ records,
so the corrections are automatically applied when using read_nis()
.
- Note: In order for the corrections to be applied,
KEY_NIS
andPCLASS_ORPROC
need to be included in your dataset. If they are not included,read_nis()
will still return the data and you will receive a warning that corrections were not applied.
If you don’t want the corrections to be automatically applied, use
corrected = FALSE
:
# Read dataset the first 10 records of the dataset without corrections.
df <- read_nis("NIS_2019_test_data.ASC", 2019, n_max = 10, corrected = FALSE)
The structure of the NIS dataset can change each year, which means
read_nis()
needs to be updated to support each NIS dataset. You can
find a list of readHCUP’s supported datasets by running the following:
View(supported_datasets)
supported_datasets
includes:
-
A
data
column is the name of the dataset and the year -
A
dataset_file_name
column is the file name that was provided by the HCUP Central Distributor
The NIS dataset has over 150 variables, which are covered in detail on
HCUP’s website. The
descriptions()
function allows you to get a list of all of the
variable descriptions:
d_list <- descriptions(nis, 2019)
head(d_list)