pldf is a framework for working with JavaScript objects as if they were DataFrames

cwoodza/pldf

pldf

Overview

pldf is a framework for working with JavaScript objects as if they were DataFrames. The pldf class allows data to be manipulated through a range of functions similar to dplyr verbs, which simplifies a number of common data processing tasks. Details of all available functions can be found below.

pldf includes built-in support for rendering and updating data in interactive HTML tables, called pldt. However, this feature is not yet fully implemented and should not be used outside of testing.

Table of Contents
  1. Overview
  2. Getting started
  3. pldf functions
    1. arrange
    2. bind
    3. filter
    4. merge
    5. mutate
    6. remove
    7. rename
    8. replace
    9. select
    10. slice
    11. summarise
    12. widen
  4. pldt functions
  5. Helper functions

Getting started

A new pldf can be created by passing suitably prepared data to a new instance of the pldf class. pldf expects data to be structured as a JavaScript Object populated with arrays. Object keys are used as column headers, arrays are used to populate each column, and array indexes are used as row numbers.

let mydata = {
    "col1":[1, 2, 3, 4, 5],
    "col2":["one", "two", "three", "four", "five"],
    "col3":[true, true, false, false, false]
}

let mydf = new pldf(mydata)

pldf will accept any type of value that can be stored in a JavaScript Object, including functions or other Objects. However, pldf functions are only able to evaluate strings, numbers and Booleans.
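To make the column layout concrete, the sketch below reads one row out of the same kind of object using plain JavaScript. It does not use pldf; rowAt is a hypothetical helper written only for this illustration.

```javascript
// Illustrative only: plain JavaScript, not part of the pldf API.
const exampleData = {
    "col1": [1, 2, 3, 4, 5],
    "col2": ["one", "two", "three", "four", "five"],
    "col3": [true, true, false, false, false]
};

// rowAt is a hypothetical helper: it collects the value stored at one
// array index from every column, which gives a single "row".
function rowAt(data, index) {
    const row = {};
    for (const key of Object.keys(data)) {
        row[key] = data[key][index];
    }
    return row;
}

const headers = Object.keys(exampleData);          // column headers
const rowCount = exampleData[headers[0]].length;   // number of rows
const thirdRow = rowAt(exampleData, 2);            // row at array index 2
console.log(headers, rowCount, thirdRow);
```

Every column array should have the same length, because each array index is treated as one row.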

For users looking to include a pldt data table, an optional second argument accepts an HTML ID reference, which indicates where the table should be rendered. An optional third argument controls how the pldt renders; however, it is generally recommended to make these changes by editing the defaultSpec object at the top of the pldf.js script instead.

let mydf = new pldf(mydata, "myid", myspec)

Any changes made to a pldf Object overwrite the data it holds. To keep your original DataFrame, use the clone() command to create a new instance before making changes.

let df_copy = mydf.clone()

The content of the pldf can be exported to CSV using the toCSV command, which accepts an optional filename.

mydf.toCSV("mydownload.csv")
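For reference, the sketch below shows one way a column-structured object could be turned into CSV text. It is not the pldf toCSV implementation; columnsToCSV and quoteField are hypothetical names, and the quoting rule (wrap fields containing commas, quotes or line breaks) is an assumption based on common CSV practice.

```javascript
// Hypothetical helper, not the pldf toCSV implementation.
function columnsToCSV(data) {
    const headers = Object.keys(data);
    const rowCount = headers.length ? data[headers[0]].length : 0;

    // Quote a field if it contains a comma, a double quote, or a line break.
    const quoteField = (value) => {
        const text = String(value);
        return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
    };

    // First line holds the headers, then one line per array index.
    const lines = [headers.map(quoteField).join(",")];
    for (let i = 0; i < rowCount; i++) {
        lines.push(headers.map((h) => quoteField(data[h][i])).join(","));
    }
    return lines.join("\n");
}

const csvText = columnsToCSV({ "col1": [1, 2], "col2": ["one", "two, three"] });
console.log(csvText);
```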

pldf functions

arrange

The arrange function sorts the pldf by the values in the specified column. Values are sorted in ascending order by default, but this can be overridden by passing "desc" as the optional direction argument.

mydf.arrange("col1", "desc");
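Sorting column-structured data means reordering every column by the same row order. The sketch below shows the idea in plain JavaScript. sortColumnsBy is a hypothetical helper rather than pldf's own code, and unlike arrange it returns a new object instead of changing the data in place.

```javascript
// Hypothetical helper illustrating the idea behind arrange; not pldf code.
function sortColumnsBy(data, column, direction = "asc") {
    // Start with the row indexes, then sort them by the chosen column.
    const order = data[column].map((_, i) => i);
    const sign = direction === "desc" ? -1 : 1;
    order.sort((a, b) => {
        const x = data[column][a];
        const y = data[column][b];
        if (x < y) return -1 * sign;
        if (x > y) return 1 * sign;
        return 0;
    });

    // Apply the same row order to every column so rows stay aligned.
    const sorted = {};
    for (const key of Object.keys(data)) {
        sorted[key] = order.map((i) => data[key][i]);
    }
    return sorted;
}

const sortedData = sortColumnsBy(
    { "col1": [2, 3, 1], "col2": ["two", "three", "one"] },
    "col1",
    "desc"
);
console.log(sortedData);
```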

bind

The bind function adds the rows of another pldf to the first pldf. Headers of the two pldfs must be identical.

mydf.bind(otherpldf);
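The sketch below illustrates the header requirement in plain JavaScript. appendRows is a hypothetical helper, not the pldf implementation, and throwing an error on mismatched headers is an assumption made for the example.

```javascript
// Hypothetical helper illustrating the idea behind bind; not pldf code.
function appendRows(first, second) {
    const firstKeys = Object.keys(first);
    const secondKeys = Object.keys(second);

    // Both objects must contain exactly the same set of headers.
    const sameHeaders =
        firstKeys.length === secondKeys.length &&
        firstKeys.every((key) => secondKeys.includes(key));
    if (!sameHeaders) {
        throw new Error("Headers must be identical");
    }

    // Append each column of the second object to the matching column of the first.
    const combined = {};
    for (const key of firstKeys) {
        combined[key] = first[key].concat(second[key]);
    }
    return combined;
}

const boundData = appendRows(
    { "col1": [1, 2], "col2": ["one", "two"] },
    { "col1": [3], "col2": ["three"] }
);
console.log(boundData);
```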

filter

The filter function removes any rows from the pldf that do not meet the specified criteria. The base filter function keeps all rows for which the value in the specified column is equal to the specified value.

mydf.filter("col1", 3)

In addition, filter includes a flexible filterFn function that allows for user-specified conditions. Each condition should be written as a function that evaluates up to two values, taken from the specified columns, and returns a Boolean.

// User specified way to filter for rows with equal values
mydf.filterFn("col1", "col2", function(a,b){
    return(a === b);
});

Finally, filter includes two shortcut functions, which allow the user to find values that are greater or less than the specified condition.

mydf.filterGt("col1", 3) // Returns values greater than 3
mydf.filterLt("col1", 3) // Returns values less than 3
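The row-wise test that a filter performs can be sketched in plain JavaScript as follows. filterRows is a hypothetical helper, not pldf code, and it returns a new object rather than overwriting the original data.

```javascript
// Hypothetical helper illustrating predicate-based filtering; not pldf code.
function filterRows(data, colA, colB, predicate) {
    // Collect the indexes of the rows for which the predicate returns true.
    const keep = [];
    for (let i = 0; i < data[colA].length; i++) {
        if (predicate(data[colA][i], data[colB][i])) {
            keep.push(i);
        }
    }

    // Keep only those rows in every column.
    const filtered = {};
    for (const key of Object.keys(data)) {
        filtered[key] = keep.map((i) => data[key][i]);
    }
    return filtered;
}

const filteredData = filterRows(
    { "col1": [1, 2, 3], "col2": [1, 5, 3], "col3": ["a", "b", "c"] },
    "col1",
    "col2",
    (a, b) => a === b
);
console.log(filteredData);
```

A greater-than or less-than shortcut can be expressed with the same helper by passing the same column twice and comparing against a fixed value inside the predicate.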

merge

The merge function combines two pldfs by a specified column, matching the order of rows to the values of that column. By default, merge drops rows from the second DataFrame whose values in the evaluation column are not contained in the first DataFrame's evaluation column. To override this behaviour, users can set the keepy argument to true.

mydf.merge(otherpldf, "col1", "col2", true)

mutate

The mutate function adds the columns of another pldf to the first pldf. The two pldfs must have an identical number of rows.

mydf.mutate(otherpldf);

In addition, mutateFn can be used to add a new column whose values are derived from other columns in the DataFrame, using a function provided by the user. This function should take two values from the specified columns and return the value for the new column.

// User specified way to create a column "newcol" that adds values from two existing columns
mydf.mutateFn("newcol", "col1", "col2", function(a,b){
    return(a + b);
});
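The sketch below shows the same row-by-row calculation in plain JavaScript. addColumn is a hypothetical helper, not pldf code, and it returns a new object rather than modifying the original.

```javascript
// Hypothetical helper illustrating a derived column; not pldf code.
function addColumn(data, newcol, colA, colB, fn) {
    // Copy the existing columns, then compute the new one row by row.
    const result = { ...data };
    result[newcol] = data[colA].map((a, i) => fn(a, data[colB][i]));
    return result;
}

const withSum = addColumn(
    { "col1": [1, 2, 3], "col2": [10, 20, 30] },
    "newcol",
    "col1",
    "col2",
    (a, b) => a + b
);
console.log(withSum.newcol);
```

Note that the + operator concatenates when either value is a string. With the Getting started data, where col2 holds text, a + b would produce values such as "1one" rather than a numeric sum.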

Finally, mutate includes a number of shortcut functions, which cover a range of common use cases.

mydf.mutateAv("newcol", "col1", "col2") // newcol is the average of the two columns
mydf.mutateSm("newcol", "col1", "col2") // newcol is the sum of the two columns
mydf.mutateDf("newcol", "col1", "col2") // newcol is the difference of the columns
mydf.mutateDv("newcol", "col1", "col2") // newcol is equal to col1 divided by col2
mydf.mutateTx("newcol", "My label") // newcol contains the specified text in every row
mydf.mutateRn("newcol", "col1") // newcol contains the ordered rank of values in the specified column

remove

The remove function deletes columns from the pldf. The remove function expects an array of column headers, even if only one column is being deleted.

mydf.remove(["col1", "col2"])

rename

The rename function renames a single column in the pldf. Similarly, the renameAll function renames each of the specified columns, matching old and new names by position.

mydf.rename("col1", "Value")
mydf.renameAll(["col1", "col2"], ["Value", "Text"])

replace

The replace function substitutes all matching values in a column with the new specified value. Similarly, the replaceNulls function substitutes all null or undefined values with the specified value.

mydf.replace("col1", 1, 0)
mydf.replaceNulls("col1", 1)

select

The select function returns only the specified columns, removing all others from the pldf.

mydf.select(["col1", "col2"])

slice

The slice function returns only the specified rows, removing all others from the pldf.

mydf.slice(1, 3)

summarise

The summarise function works similarly to a pivot table, reducing the pldf to the unique combinations of values in the specified groupby array, and summing, averaging or counting the values in the specified column.

mydf.summarise(["col2", "col3"], "col1", "sum")
mydf.summarise(["col2", "col3"], "col1", "mean")
mydf.summarise(["col2", "col3"], "col1", "count")

At present, summarise supports only the three functions shown above (sum, mean, and count); a function with a user-specified input will be added at a later stage.
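The grouping step can be sketched in plain JavaScript as follows. groupSum is a hypothetical helper that covers only the sum case. It is not the pldf implementation, and the order of the output rows (first appearance of each group) is an assumption of this sketch.

```javascript
// Hypothetical helper illustrating a grouped sum; not pldf code.
function groupSum(data, groupBy, valueCol) {
    // Build one running total per unique combination of group values.
    const totals = new Map();
    for (let i = 0; i < data[valueCol].length; i++) {
        const keyValues = groupBy.map((g) => data[g][i]);
        const key = JSON.stringify(keyValues);
        if (!totals.has(key)) {
            totals.set(key, { keyValues, sum: 0 });
        }
        totals.get(key).sum += data[valueCol][i];
    }

    // Convert the totals back into the column-structured layout.
    const summary = {};
    for (const g of groupBy) summary[g] = [];
    summary[valueCol] = [];
    for (const { keyValues, sum } of totals.values()) {
        groupBy.forEach((g, j) => summary[g].push(keyValues[j]));
        summary[valueCol].push(sum);
    }
    return summary;
}

const summaryData = groupSum(
    {
        "col1": [1, 2, 3, 4],
        "col2": ["a", "a", "b", "b"],
        "col3": [true, true, true, false]
    },
    ["col2", "col3"],
    "col1"
);
console.log(summaryData);
```

A mean or count could follow the same pattern by also tracking the number of rows in each group.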

widen

The widen function is used to convert text in a given column to column headers, and populate these columns with their associated values.

Widen accepts four inputs, the last of which is optional: a reference column, which remains a single column against which the names and values are assigned; a names column, whose unique values are used as the names of the new columns; a values column, which is used to populate these new columns; and an optional sort column, which is normally linked to the reference column (for example, as a numeric reference for a set of dates).

mydf.widen("date", "name", "value", "datesort");

Note that missing values are assigned a value of zero, not NA. Widen removes all other columns from the pldf. The base widen is anchored by a single reference column only. To widen by multiple columns, use widenMulti.

mydf.widenMulti(["date", "measure"], "name", "value", "datesort");
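The long-to-wide reshaping can be sketched in plain JavaScript as follows. widenColumns is a hypothetical helper covering only the single reference column case without the optional sort column. It is not the pldf implementation, and when the same reference and name pair occurs more than once, this sketch keeps the last value.

```javascript
// Hypothetical helper illustrating long-to-wide reshaping; not pldf code.
function widenColumns(data, refCol, namesCol, valuesCol) {
    // Unique reference values become rows; unique names become new columns.
    const refs = [...new Set(data[refCol])];
    const names = [...new Set(data[namesCol])];

    // Start every new column filled with zero, so missing values stay at zero.
    const wide = { [refCol]: refs };
    for (const name of names) {
        wide[name] = refs.map(() => 0);
    }

    // Place each value in the cell for its reference row and name column.
    for (let i = 0; i < data[refCol].length; i++) {
        const row = refs.indexOf(data[refCol][i]);
        wide[data[namesCol][i]][row] = data[valuesCol][i];
    }
    return wide;
}

const wideData = widenColumns(
    {
        "date": ["Mon", "Mon", "Tue"],
        "name": ["apples", "pears", "apples"],
        "value": [3, 5, 4]
    },
    "date",
    "name",
    "value"
);
console.log(wideData);
```

In this example there is no "pears" entry for "Tue", so that cell is filled with zero.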

pldt functions

pldt is an interactive HTML table that updates as changes are made to the pldf object. pldt is currently in development and additional information will be added at a later date.

Helper functions

Additional helper functions, particularly to assist with preparing data before passing it to a pldf, will be added in future updates.

prep_JSONarray

The prep_JSONarray function converts data structured as an array of JSON objects into data that is suitably structured for a pldf. Users should specify which keys from the Objects in the array should be used as column headers in their new pldf. If no keys are provided, the function will try to use the keys found in the first object in the array.

let rawdata = [
    {"col1":1, "col2":"One", "col3":true},
    {"col1":2, "col2":"Two", "col3":true},
    {"col1":3, "col2":"Three", "col3":false}
]

let mydata = prep_JSONarray(rawdata, ["col1", "col2", "col3"]);

let mydf = new pldf(mydata)
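The conversion from row objects to columns can be sketched in plain JavaScript as follows. rowsToColumns is a hypothetical helper written for illustration, not the prep_JSONarray implementation. The fallback to the keys of the first object mirrors the behaviour described above.

```javascript
// Hypothetical helper illustrating the row-to-column conversion; not pldf code.
function rowsToColumns(rows, keys) {
    // Use the supplied keys, or fall back to the keys of the first row.
    const headers = keys || (rows.length ? Object.keys(rows[0]) : []);

    // Build one array per header, holding that key's value from every row.
    const columns = {};
    for (const header of headers) {
        columns[header] = rows.map((row) => row[header]);
    }
    return columns;
}

const columnData = rowsToColumns([
    { "col1": 1, "col2": "One", "col3": true },
    { "col1": 2, "col2": "Two", "col3": true },
    { "col1": 3, "col2": "Three", "col3": false }
]);
console.log(columnData);
```

In this sketch, a row that lacks one of the keys contributes undefined to that column.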

prep_NamedObjects

The prep_NamedObjects function converts data structured as an object of named JSON objects into data that is suitably structured for a pldf.

let rawdata = {
    "One": {"col1":1, "col2":true},
    "Two": {"col1":2, "col2":true},
    "Three": {"col1":3, "col2":false}
}

let mydata = prep_NamedObjects(rawdata);

let mydf = new pldf(mydata)
