pldf is a framework for working with JavaScript objects as if they were DataFrames. The pldf class allows data to be manipulated through a range of functions similar to dplyr verbs, which simplifies a number of common data processing tasks. Details of all available functions can be found below.
pldf includes built-in support for rendering and updating data in interactive HTML tables, called pldt. However, this feature is not yet fully implemented and should not be used outside of testing.
A new pldf can be created by passing suitably prepared data to a new instance of the pldf class. pldf expects data to be structured as a JavaScript Object populated with arrays. Object keys are used as column headers, arrays are used to populate each column, and array indexes are used as row numbers.
let mydata = {
  "col1": [1, 2, 3, 4, 5],
  "col2": ["one", "two", "three", "four", "five"],
  "col3": [true, true, false, false, false]
}
let mydf = new pldf(mydata)
pldf will accept any type of value that can be stored in a JavaScript Object, including functions or other Objects. However, pldf functions are only able to evaluate strings, numbers and Booleans.
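The column-oriented layout described above can be illustrated without pldf itself. The following is a minimal sketch in plain JavaScript; getRow is a hypothetical helper introduced here for illustration and is not part of pldf. It shows how one array index across all columns corresponds to one row.

```javascript
// Column-oriented data in the shape pldf expects:
// keys are column headers, arrays hold the values of each column.
const mydata = {
  col1: [1, 2, 3, 4, 5],
  col2: ["one", "two", "three", "four", "five"],
  col3: [true, true, false, false, false]
};

// Hypothetical helper (not a pldf method): collect the values stored at one
// array index in every column, giving a view of a single row.
function getRow(data, index) {
  const row = {};
  for (const header of Object.keys(data)) {
    row[header] = data[header][index];
  }
  return row;
}

const secondRow = getRow(mydata, 1);
// secondRow → { col1: 2, col2: "two", col3: true }
console.log(secondRow);
```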
For users looking to include a pldt data table, an optional second argument accepts an HTML ID reference, which indicates where the table should be rendered. An optional third argument allows users to control how the pldt renders; however, it is generally recommended to make these changes by editing the defaultSpec object at the top of the pldf.js script instead.
let mydf = new pldf(mydata, "myid", myspec)
Any changes made to a pldf Object overwrite the data held by that Object. For this reason, if you want to keep your original DataFrame, you should use the clone() command to create a new instance of the DataFrame before making changes.
let df_copy = mydf.clone()
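The reason a copy is needed can be shown with plain objects. This sketch does not use pldf: copyColumns and dropFirstRowInPlace are hypothetical stand-ins that only mimic an operation that changes data in place and a copy that protects the original. How clone() is actually implemented is not covered here.

```javascript
// Hypothetical stand-in for a clone step: copy every column into a new array.
function copyColumns(data) {
  const copy = {};
  for (const header of Object.keys(data)) {
    copy[header] = data[header].slice(); // new array per column
  }
  return copy;
}

// Hypothetical stand-in for an operation that modifies data in place.
function dropFirstRowInPlace(data) {
  for (const header of Object.keys(data)) {
    data[header].shift(); // mutates the array it was given
  }
}

const original = { col1: [1, 2, 3], col2: ["one", "two", "three"] };
const backup = copyColumns(original);

dropFirstRowInPlace(original);

// original.col1 → [2, 3]
// backup.col1   → [1, 2, 3]
console.log(original.col1, backup.col1);
```

Note that this sketch copies each column array but not the values inside it, so nested Objects stored in a column would still be shared between the two copies.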
The content of the pldf can be exported to CSV using the toCSV command, which accepts an optional filename.
mydf.toCSV("mydownload.csv")
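As a rough illustration of what a CSV export of column-oriented data involves, the sketch below builds a CSV string from a plain Object of arrays. columnsToCsv is a hypothetical helper, not the toCSV implementation, and it only produces the text; it does not save or download a file. The quoting rule (wrap fields containing commas, quotes or line breaks in double quotes) is an assumption of this sketch.

```javascript
// Hypothetical helper (not pldf's toCSV): serialise column-oriented data to CSV text.
function columnsToCsv(data) {
  const headers = Object.keys(data);
  const rowCount = headers.length > 0 ? data[headers[0]].length : 0;

  // Quote a field only when it contains a comma, a double quote or a line break.
  const quoteField = (value) => {
    const text = String(value);
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  };

  const lines = [headers.map(quoteField).join(",")];
  for (let i = 0; i < rowCount; i++) {
    lines.push(headers.map((h) => quoteField(data[h][i])).join(","));
  }
  return lines.join("\n");
}

const csv = columnsToCsv({ col1: [1, 2], col2: ["one", "two, too"] });
// csv is three lines:
// col1,col2
// 1,one
// 2,"two, too"
console.log(csv);
```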
The arrange function sorts the pldf by the values in the specified column. Values are sorted in ascending order by default, but this can be overridden by passing "desc" as the optional direction argument.
mydf.arrange("col1", "desc");
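Sorting column-oriented data means reordering every column by the same permutation of row indexes. The sketch below shows one way to do that in plain JavaScript; sortByColumn is a hypothetical helper, not the arrange implementation, and it assumes the sort column holds numbers.

```javascript
// Hypothetical helper (not pldf's arrange): sort all columns by one numeric column.
function sortByColumn(data, column, direction) {
  const sign = direction === "desc" ? -1 : 1;

  // Build the list of row indexes, then sort the indexes by the column's values.
  const order = data[column].map((_, i) => i);
  order.sort((a, b) => sign * (data[column][a] - data[column][b]));

  // Apply the same row order to every column so rows stay intact.
  const sorted = {};
  for (const header of Object.keys(data)) {
    sorted[header] = order.map((i) => data[header][i]);
  }
  return sorted;
}

const unsorted = { col1: [3, 1, 2], col2: ["three", "one", "two"] };
const ascending = sortByColumn(unsorted, "col1");
const descending = sortByColumn(unsorted, "col1", "desc");
// ascending.col2  → ["one", "two", "three"]
// descending.col2 → ["three", "two", "one"]
console.log(ascending, descending);
```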
The bind function adds the rows of another pldf to the first pldf. Headers of the two pldfs must be identical.
mydf.bind(otherpldf);
The filter function removes any rows from the pldf that do not meet the specified criteria. The base filter function keeps all rows for which the value in the specified column is equal to the specified value.
mydf.filter("col1", 3)
In addition, filter includes a flexible filterFn function that allows for user-specified conditions. Each condition should be written as a function that takes up to two values, one from each of the specified columns, and returns a Boolean.
// User specified way to filter for rows with equal values
mydf.filterFn("col1", "col2", function(a,b){
  return a === b;
});
Finally, filter includes two shortcut functions, which allow the user to find values that are greater or less than the specified condition.
mydf.filterGt("col1", 3) // Returns values greater than 3
mydf.filterLt("col1", 3) // Returns values less than 3
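The idea behind a predicate-based filter can be sketched on plain column-oriented data. filterRows below is a hypothetical helper, not the filterFn implementation: it evaluates the predicate once per row and keeps the rows for which it returns true. A greater-than check is then just one particular predicate.

```javascript
// Hypothetical helper (not pldf's filterFn): keep the rows for which
// predicate(valueFromColA, valueFromColB) returns true.
function filterRows(data, colA, colB, predicate) {
  const keep = [];
  for (let i = 0; i < data[colA].length; i++) {
    if (predicate(data[colA][i], data[colB][i])) keep.push(i);
  }
  const filtered = {};
  for (const header of Object.keys(data)) {
    filtered[header] = keep.map((i) => data[header][i]);
  }
  return filtered;
}

const scores = {
  team: ["a", "b", "c", "d"],
  home: [1, 2, 3, 0],
  away: [1, 0, 3, 2]
};

// Rows where the two columns hold equal values.
const draws = filterRows(scores, "home", "away", (a, b) => a === b);
// draws.team → ["a", "c"]

// A greater-than filter expressed as a predicate on a single column.
const highScores = filterRows(scores, "home", "home", (a) => a > 1);
// highScores.team → ["b", "c"]
console.log(draws, highScores);
```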
The merge function combines two pldfs by a specified column, matching the order of rows to the values of that column. By default, merge drops rows from the second DataFrame whose values in the evaluation column are not contained in the first DataFrame's evaluation column. To override this behaviour, users can set keepy to true.
mydf.merge(otherpldf, "col1", "col2", true)
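The default behaviour described above resembles a left join. The sketch below shows that idea on plain column-oriented data. leftJoin is a hypothetical helper, not the merge implementation, and it makes several assumptions that the real function may not share: a single key column with the same name in both inputs, only the first matching row in the second input is used, and rows with no match are filled with null.

```javascript
// Hypothetical helper (not pldf's merge): left join on a shared key column.
function leftJoin(left, right, key) {
  // Map each key value in `right` to the index of its first occurrence.
  const rightIndex = new Map();
  right[key].forEach((value, i) => {
    if (!rightIndex.has(value)) rightIndex.set(value, i);
  });

  // Start from a copy of the left columns, so the left row order is kept.
  const merged = {};
  for (const header of Object.keys(left)) {
    merged[header] = left[header].slice();
  }

  // Add each non-key column from `right`, aligned to the left key values.
  for (const header of Object.keys(right)) {
    if (header === key) continue;
    merged[header] = left[key].map((value) =>
      rightIndex.has(value) ? right[header][rightIndex.get(value)] : null
    );
  }
  return merged;
}

const people = { id: [1, 2, 3], name: ["Ann", "Ben", "Cy"] };
const ages = { id: [3, 1, 4], age: [30, 41, 25] };
const joined = leftJoin(people, ages, "id");
// joined.id  → [1, 2, 3]
// joined.age → [41, null, 30]   (id 4 exists only in `ages`, so it is dropped)
console.log(joined);
```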
The mutate function adds the columns of another pldf to the first pldf. The two pldfs must have an identical number of rows.
mydf.mutate(otherpldf);
In addition, mutateFn can be used to add new columns to the pldf that base their values off one or more of the other columns in the DataFrame, based on a function provided by the user. This function should take two values from the specified columns, and should return a new value for the new column.
// User specified way to create a column "newcol" that adds values from two existing columns
mydf.mutateFn("newcol", "col1", "col2", function(a,b){
  return a + b;
});
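The row-by-row evaluation that mutateFn describes can be sketched on plain column-oriented data. addColumn below is a hypothetical helper, not the mutateFn implementation; it calls the supplied function once per row with the values from the two named columns.

```javascript
// Hypothetical helper (not pldf's mutateFn): derive a new column from two others.
function addColumn(data, newCol, colA, colB, fn) {
  const result = {};
  for (const header of Object.keys(data)) {
    result[header] = data[header].slice();
  }
  // Call fn once per row with the values from colA and colB.
  result[newCol] = data[colA].map((a, i) => fn(a, data[colB][i]));
  return result;
}

const prices = { net: [10, 20, 30], tax: [2, 4, 6] };
const withGross = addColumn(prices, "gross", "net", "tax", (a, b) => a + b);
// withGross.gross → [12, 24, 36]

// With a numeric column and a text column, + concatenates instead of adding.
const mixed = addColumn({ n: [1, 2], s: ["one", "two"] }, "both", "n", "s", (a, b) => a + b);
// mixed.both → ["1one", "2two"]
console.log(withGross, mixed);
```

The second call is a reminder that JavaScript's + operator concatenates when either operand is a string, so a function like the one above only produces sums when both columns hold numbers.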
Finally, mutate includes a number of shortcut functions, which cover a range of common use cases.
mydf.mutateAv("newcol", "col1", "col2") // newcol is the average of the two columns
mydf.mutateSm("newcol", "col1", "col2") // newcol is the sum of the two columns
mydf.mutateDf("newcol", "col1", "col2") // newcol is the difference of the columns
mydf.mutateDv("newcol", "col1", "col2") // newcol is equal to col1 divided by col2
mydf.mutateTx("newcol", "My label") // newcol contains the specified text in every row
mydf.mutateRn("newcol", "col1") // newcol contains the ordered rank of values in the specified column
The remove function deletes columns from the pldf. The remove function expects an array of column headers, even if only one column is being deleted.
mydf.remove(["col1", "col2"])
The rename function renames a column from the pldf. Similarly, the renameAll function renames all specified columns.
mydf.rename("col1", "Value")
mydf.renameAll(["col1", "col2"], ["Value", "Text"])
The replace function substitutes all matching values in a column with the new specified value. Similarly, the replaceNulls function substitutes all null or undefined values with the specified value.
mydf.replace("col1", 1, 0)
mydf.replaceNulls("col1", 1)
The select function returns only the specified columns, removing all others from the pldf.
mydf.select(["col1", "col2"])
The slice function returns only the specified rows, removing all others from the pldf.
mydf.slice(1, 3)
The summarise function works similarly to a Pivot table, reducing the pldf to unique values in the specified groupby array, and adding, averaging or counting the values in the specified column.
mydf.summarise(["col2", "col3"], "col1", "sum")
mydf.summarise(["col2", "col3"], "col1", "mean")
mydf.summarise(["col2", "col3"], "col1", "count")
At present, summarise only supports the three functions shown above (sum, mean, and count); support for a user-specified function will be added at a later stage.
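The pivot-table behaviour can be sketched on plain column-oriented data. summariseColumns below is a hypothetical helper, not the summarise implementation: it groups rows by the combined values of the group-by columns and reduces the value column with sum, mean or count. Returning groups in order of first appearance is an assumption of this sketch.

```javascript
// Hypothetical helper (not pldf's summarise): group rows and reduce one column.
function summariseColumns(data, groupBy, valueCol, method) {
  const groups = new Map(); // serialised group key → { keys, sum, count }
  for (let i = 0; i < data[valueCol].length; i++) {
    const keys = groupBy.map((h) => data[h][i]);
    const id = JSON.stringify(keys);
    if (!groups.has(id)) groups.set(id, { keys, sum: 0, count: 0 });
    const group = groups.get(id);
    group.sum += data[valueCol][i];
    group.count += 1;
  }

  // Rebuild a column-oriented result with one row per group.
  const result = {};
  for (const h of groupBy) result[h] = [];
  result[valueCol] = [];
  for (const { keys, sum, count } of groups.values()) {
    groupBy.forEach((h, j) => result[h].push(keys[j]));
    const value = method === "sum" ? sum : method === "count" ? count : sum / count;
    result[valueCol].push(value);
  }
  return result;
}

const sales = {
  region: ["N", "S", "N", "S", "N"],
  amount: [10, 20, 30, 40, 80]
};
const totals = summariseColumns(sales, ["region"], "amount", "sum");
const means = summariseColumns(sales, ["region"], "amount", "mean");
const counts = summariseColumns(sales, ["region"], "amount", "count");
// totals.region → ["N", "S"], totals.amount → [120, 60]
// means.amount  → [40, 30]
// counts.amount → [3, 2]
console.log(totals, means, counts);
```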
The widen function is used to convert text in a given column to column headers, and populate these columns with their associated values.
Widen accepts three required inputs and one optional input: a reference column, which remains a single column against which the names and values are assigned; a names column, whose unique values are used as the names of the new columns; a values column, which is used to populate these new columns; and an optional sort column, which is normally linked to the reference column (for example, as a numeric reference for a set of dates).
mydf.widen("date", "name", "value", "datesort");
Note that missing values are assigned a value of zero, not NA. Widen removes all other columns from the pldf. The base widen is anchored by a single reference column only. To widen by multiple columns, use widenMulti.
mydf.widenMulti(["date", "measure"], "name", "value", "datesort");
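The long-to-wide reshaping that widen performs can be sketched on plain column-oriented data. widenColumns below is a hypothetical helper, not the widen implementation. It covers the single reference column case only, leaves out the optional sort column, and fills missing combinations with zero as described above. Keeping reference values and names in order of first appearance is an assumption of this sketch.

```javascript
// Hypothetical helper (not pldf's widen): turn the unique values of a names
// column into new columns, filled from a values column.
function widenColumns(data, refCol, namesCol, valuesCol) {
  const refs = [...new Set(data[refCol])];    // one output row per reference value
  const names = [...new Set(data[namesCol])]; // one output column per unique name

  const wide = { [refCol]: refs };
  for (const name of names) {
    wide[name] = refs.map(() => 0); // missing combinations stay at zero
  }

  // Place each value in the column for its name and the row for its reference.
  data[refCol].forEach((ref, i) => {
    wide[data[namesCol][i]][refs.indexOf(ref)] = data[valuesCol][i];
  });
  return wide;
}

const long = {
  date: ["Mon", "Mon", "Tue"],
  name: ["apples", "pears", "apples"],
  value: [3, 5, 4]
};
const wide = widenColumns(long, "date", "name", "value");
// wide.date   → ["Mon", "Tue"]
// wide.apples → [3, 4]
// wide.pears  → [5, 0]   (no pears row for Tue, so zero)
console.log(wide);
```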
pldt is an interactive HTML table that updates as changes are made to the pldf object. pldt is currently in development and additional information will be added at a later date.
Additional helper functions, particularly to assist with preparing data before passing it to a pldf, will be added in future updates.
The prep_JSONarray function converts data structured as an array of JSON objects into data that is suitably structured for a pldf. Users should specify which keys from the Objects in the array should be used as column headers in their new pldf. If no keys are provided, the function will try to use the keys found in the first object in the array.
let rawdata = [
  {"col1":1, "col2":"One", "col3":true},
  {"col1":2, "col2":"Two", "col3":true},
  {"col1":3, "col2":"Three", "col3":false}
]
let mydata = prep_JSONarray(rawdata, ["col1", "col2", "col3"]);
let mydf = new pldf(mydata)
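The conversion described above, from an array of row objects to an Object of column arrays, can be sketched in a few lines. rowsToColumns below is a hypothetical stand-in written for illustration; the actual prep_JSONarray may behave differently, for example in how it handles keys missing from some rows (here they become undefined).

```javascript
// Hypothetical stand-in (not the real prep_JSONarray): convert an array of
// row objects into an Object of column arrays.
function rowsToColumns(rows, keys) {
  // Fall back to the keys of the first row when none are given.
  const headers = keys || (rows.length > 0 ? Object.keys(rows[0]) : []);
  const columns = {};
  for (const header of headers) {
    columns[header] = rows.map((row) => row[header]);
  }
  return columns;
}

const rows = [
  { col1: 1, col2: "One", col3: true },
  { col1: 2, col2: "Two", col3: true },
  { col1: 3, col2: "Three", col3: false }
];

const allColumns = rowsToColumns(rows);
// allColumns.col2 → ["One", "Two", "Three"]

const someColumns = rowsToColumns(rows, ["col1", "col3"]);
// Object.keys(someColumns) → ["col1", "col3"]
console.log(allColumns, someColumns);
```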
The prep_NamedObjects function converts data structured as an object of JSON objects into data that is suitably structured for a pldf.
let rawdata = {
  "One": {"col1":1, "col2":true},
  "Two": {"col1":2, "col2":true},
  "Three": {"col1":3, "col2":false}
}
let mydata = prep_NamedObjects(rawdata);
let mydf = new pldf(mydata)
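One plausible way to flatten an object of named objects into columns is sketched below. namedObjectsToColumns is a hypothetical stand-in, not the real prep_NamedObjects. In particular, storing the outer keys in a column called "name" is an assumption made for this sketch, as is taking the column headers from the first inner object.

```javascript
// Hypothetical stand-in (not the real prep_NamedObjects): flatten an object of
// named objects into column arrays. The "name" header is an assumption.
function namedObjectsToColumns(named, nameHeader = "name") {
  const names = Object.keys(named);
  // Take the inner headers from the first named object, if there is one.
  const innerKeys = names.length > 0 ? Object.keys(named[names[0]]) : [];

  const columns = { [nameHeader]: names };
  for (const key of innerKeys) {
    columns[key] = names.map((n) => named[n][key]);
  }
  return columns;
}

const named = {
  One: { col1: 1, col2: true },
  Two: { col1: 2, col2: true },
  Three: { col1: 3, col2: false }
};
const flattened = namedObjectsToColumns(named);
// flattened.name → ["One", "Two", "Three"]
// flattened.col1 → [1, 2, 3]
// flattened.col2 → [true, true, false]
console.log(flattened);
```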