- Added `load_files()` to load data using explicit file paths.
- GitHub release with QC report
- Update package documentation
- General package maintenance.
- Bugfix release that disables problematic dv.loader::load_data file name partial matching.
First release
- Major changes before first release
- Changed name of package from "dataloader" to "dv.loader" to avoid CRAN collision
- Changed the `domains` parameter name throughout to `file_names`
- Changed the `study_dir` parameter name throughout to `sub_dir`, because it is used to reference a subdirectory of `get_cre_path`.
- Improved tests and test coverage
- code coverage > 93% and refactored tests to use BDD
- Refactored to take out the OOP design. Now you just use the functions directly without having to create a "dataloader" object.
- Took out "meta" object from the return of `load_data()`. Now, metadata is appended directly to each dataframe as an attribute.
  - E.g., `attr(df, "meta")` to view.
- package style changed from camelCase to snake_case
- better documentation added with more examples
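A minimal base R sketch of the metadata-as-attribute pattern noted above. The data frame, the list name `data_list`, and the metadata fields (`file_name`, `size_bytes`) are hypothetical stand-ins, not the fields the package actually records.

```r
# Hypothetical stand-in for one data frame returned by load_data().
adsl <- data.frame(USUBJID = c("01", "02"), AGE = c(34L, 51L))

# Attach file-level metadata as an attribute (field names are illustrative only).
attr(adsl, "meta") <- list(file_name = "adsl.rds", size_bytes = 1024L)

# Attributes travel with the data frame when it is stored in a list.
data_list <- list(adsl = adsl)

# View the metadata for one table.
meta <- attr(data_list[["adsl"]], "meta")
meta$file_name
```

Because the metadata lives on each data frame, no separate "meta" element is needed in the returned list.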
- The API for `dataloader` has changed considerably. Now there are three public functions:
- loadData() for bringing data into memory
- createDB() for creating an SQLite database
- now creates a new "attr" attributes table containing the column-level attributes of the files read keyed on table (domain) name.
- getTableRefs() for retrieving a list of table references from a DB connection
- Support for indexing the SQLite DB added.
  - Pass a list to the "file_names" arg in `createDB()` such that:
    - the elements of the list are domain names
    - the values of the elements are character vectors representing columns that you want indexed
    - check the result in the "sqlite_stat1" metatable retrieved from `getTableRefs()`
- If you specify `useDB = T` and there's an existing database file, then `load_data()` will connect to it and return its tables.
- `load_data()` will check for an existing DB based on the user's input for `dbFileName`. If that is left `NULL`, then it will use the default name based on `studyDir`, and if that is `NULL`, then it will use `db`. Otherwise, it will just create a new database (if `useDB = T`).
- The API to `read_file()` has changed. You no longer need to specify `isRDS`. The function will figure that out based on the `file_name` passed in.
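A rough sketch of how a reader could infer the file type from `file_name`, as the `read_file()` note above describes. The helper name `detect_reader()` and the `sas7bdat` extension are assumptions made for illustration; this is not the package's implementation.

```r
# Hypothetical helper: pick a reader type from the file extension.
detect_reader <- function(file_name) {
  # tools::file_ext() returns the text after the last dot, e.g. "RDS".
  ext <- tolower(tools::file_ext(file_name))
  switch(ext,
    rds = "rds",
    sas7bdat = "sas", # assumed extension for SAS data sets
    stop("Unsupported file extension: ", ext)
  )
}

detect_reader("adsl.RDS")
detect_reader("adlb.sas7bdat")
```

Lower-casing the extension means the caller does not need a separate flag such as `isRDS`.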
- `data.loader` is now called `dataloader`
- `dataloader`'s local DB functionality is now passing unit tests, meaning that the integrity of data flow from the producing system (CARE) is ensured.
- The output of `load_data()` for databases includes attributes from the original file. So, even if those attributes are lost when the data is loaded into the database, you can recover them. Here is how to recover them for a given column and dataframe:
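```r
# dataList: the list returned by load_data() for a database
attr <- dataList[["attr"]]
# attributes for every column of the "adlb" table
attr_adlb <- attr["adlb"]
# attributes for a single column
attr_adlb_studyid <- attr[["adlb"]][["STUDYID"]]
attr_adlb_studyid
# $label
# [1] "Study Identifier"
#
# $format.sas
# [1] "$"
```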
- `load_data()` by default returns a list, named by the `file_names` passed, that contains a dataframe for each file along with metadata for that table.
- Create a DB table connection (created with `dplyr::tbl(dl$dbConn, "myDomain")`) by using `useDB = T` in `load_data()`.
- `set_base_path()` has been removed, and the `base_path` attribute is now private to restrict which locations on CARE this module can access.
- The only arg that is mandatory is `file_names`. If `studyDir` is left as `NULL`, then it just uses the working directory as default.
- `data.loader` now supports the creation of local SQL databases for managing very large files. Users should interact with the `db_conn` connection using the DBI package, found on CRAN. Usage is detailed below in the benchmarking section. Also, internally, the `load_data()` function has been refactored to separate out the importing of data from the creation of the local database.
- This release also fixes a bug where if a user did not provide a "/" in front of the `studyDir` arg, then it wouldn't be able to find the right path.
- `isLocal` in the `load_data()` API has been replaced with `useWD` (for "use working directory") to make more sense.
- A flag for preferring SAS files over RDS files has been added to `load_data()`
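A small sketch of path handling that tolerates a missing or extra "/" in front of `studyDir`, the situation the bug fix above addresses. The helper name `build_path()` and the example paths are hypothetical; the package's actual fix may differ.

```r
# Hypothetical helper: join a base path and a study directory so that
# "study1" and "/study1" resolve to the same location.
build_path <- function(base_path, study_dir) {
  base_path <- sub("/+$", "", base_path) # drop trailing slashes
  study_dir <- sub("^/+", "", study_dir) # drop leading slashes
  file.path(base_path, study_dir)
}

build_path("/data", "study1")
build_path("/data/", "/study1")
```

Both calls produce the same path, so the caller does not have to remember whether to include the leading slash.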
`data.loader` is now an R6 class for internal scoping of `base_path`. This way, users can change the "working directory" of the data loader module without affecting the working directory of their global environment. A new function is available called `set_base_path()` for this purpose. See usage below for how to create a "dataloader" object. Otherwise, usage is the same as in V0.1.0.
- Initial commit.
`data.loader` has the functions `load_data(studyDir, file_names)` and `set_base_path()`.