From 546259ddaba0e8ab1506729113688f85ca2986fd Mon Sep 17 00:00:00 2001 From: Ani Date: Wed, 20 Nov 2024 13:32:24 -0700 Subject: [PATCH] Using hyperlinks and vignette() calls for readability (#6617) --- vignettes/datatable-intro.Rmd | 8 ++++---- vignettes/datatable-joins.Rmd | 6 +++--- vignettes/datatable-keys-fast-subset.Rmd | 14 +++++++------- vignettes/datatable-reference-semantics.Rmd | 14 +++++++------- vignettes/datatable-sd-usage.Rmd | 2 +- ...tatable-secondary-indices-and-auto-indexing.Rmd | 12 ++++++------ 6 files changed, 28 insertions(+), 28 deletions(-) diff --git a/vignettes/datatable-intro.Rmd b/vignettes/datatable-intro.Rmd index a0ce8bf047..d32a25eb90 100644 --- a/vignettes/datatable-intro.Rmd +++ b/vignettes/datatable-intro.Rmd @@ -101,7 +101,7 @@ You can also convert existing objects to a `data.table` using `setDT()` (for `da getOption("datatable.print.nrows") ``` -* `data.table` doesn't set or use *row names*, ever. We will see why in the *"Keys and fast binary search based subset"* vignette. +* `data.table` doesn't set or use *row names*, ever. We will see why in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette. ### b) General form - in what way is a `data.table` *enhanced*? {#enhanced-1b} @@ -479,7 +479,7 @@ ans **Keys:** Actually `keyby` does a little more than *just ordering*. It also *sets a key* after ordering by setting an `attribute` called `sorted`. -We'll learn more about `keys` in the `vignette("datatable-keys-fast-subset", package="data.table")`; for now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`. +We'll learn more about `keys` in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette; for now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`. 
### c) Chaining @@ -659,7 +659,7 @@ We have seen so far that, * We can also sort a `data.table` using `order()`, which internally uses data.table's fast order for better performance. -We can do much more in `i` by keying a `data.table`, which allows for blazing fast subsets and joins. We will see this in the `vignette("datatable-keys-fast-subset", package="data.table")` and the `vignette("datatable-joins", package="data.table")`. +We can do much more in `i` by keying a `data.table`, which allows for blazing fast subsets and joins. We will see this in the vignettes [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) and [`vignette("datatable-joins", package="data.table")`](datatable-joins.html). #### Using `j`: @@ -693,7 +693,7 @@ We can do much more in `i` by keying a `data.table`, which allows for blazing fa As long as `j` returns a `list`, each element of the list will become a column in the resulting `data.table`. -We will see how to *add/update/delete* columns *by reference* and how to combine them with `i` and `by` in the next vignette (`vignette("datatable-reference-semantics", package="data.table")`). +We will see how to *add/update/delete* columns *by reference* and how to combine them with `i` and `by` in the [next vignette (`vignette("datatable-reference-semantics", package="data.table")`)](datatable-reference-semantics.html). *** diff --git a/vignettes/datatable-joins.Rmd b/vignettes/datatable-joins.Rmd index b3b30598d1..a35f78bb21 100644 --- a/vignettes/datatable-joins.Rmd +++ b/vignettes/datatable-joins.Rmd @@ -26,9 +26,9 @@ In this vignette you will learn how to perform any join operation using resource It assumes familiarity with the `data.table` syntax. 
If that is not the case, please read the following vignettes: -- `vignette("datatable-intro", package="data.table")` -- `vignette("datatable-reference-semantics", package="data.table")` -- `vignette("datatable-keys-fast-subset", package="data.table")` +- [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) +- [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) +- [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) *** diff --git a/vignettes/datatable-keys-fast-subset.Rmd b/vignettes/datatable-keys-fast-subset.Rmd index d60552ea8f..6f77c3de24 100644 --- a/vignettes/datatable-keys-fast-subset.Rmd +++ b/vignettes/datatable-keys-fast-subset.Rmd @@ -24,13 +24,13 @@ knitr::opts_chunk$set( .old.th = setDTthreads(1) ``` -This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the `vignette("datatable-intro", package="data.table")` and the `vignette("datatable-reference-semantics", package="data.table")` first. +This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the vignettes [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) and [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) first. *** ## Data {#data} -We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`. 
+We will use the same `flights` data as in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. ```{r echo = FALSE} options(width = 100L) @@ -58,7 +58,7 @@ In this vignette, we will ### a) What is a *key*? -In the `vignette("datatable-intro", package="data.table")`, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*. +In the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*. But first, let's start by looking at *data.frames*. All *data.frames* have a row names attribute. Consider the *data.frame* `DF` below. @@ -143,7 +143,7 @@ head(flights) * Alternatively you can pass a character vector of column names to the function `setkeyv()`. This is particularly useful while designing functions to pass columns to set key on as function arguments. -* Note that we did not have to assign the result back to a variable. This is because like the `:=` function we saw in the `vignette("datatable-reference-semantics", package="data.table")`, `setkey()` and `setkeyv()` modify the input *data.table* *by reference*. They return the result invisibly. +* Note that we did not have to assign the result back to a variable. This is because like the `:=` function we saw in the [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) vignette, `setkey()` and `setkeyv()` modify the input *data.table* *by reference*. They return the result invisibly. * The *data.table* is now reordered (or sorted) by the column we provided - `origin`. 
Since we reorder by reference, we only require additional memory of one column of length equal to the number of rows in the *data.table*, and is therefore very memory efficient. @@ -262,7 +262,7 @@ flights[.("LGA", "TPA"), .(arr_delay)] * The *row indices* corresponding to `origin == "LGA"` and `dest == "TPA"` are obtained using *key based subset*. -* Once we have the row indices, we look at `j` which requires only the `arr_delay` column. So we simply select the column `arr_delay` for those *row indices* in the exact same way as we have seen in `vignette("datatable-intro", package="data.table")`. +* Once we have the row indices, we look at `j` which requires only the `arr_delay` column. So we simply select the column `arr_delay` for those *row indices* in the exact same way as we have seen in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. * We could have returned the result by using `with = FALSE` as well. @@ -290,7 +290,7 @@ flights[.("LGA", "TPA"), max(arr_delay)] ### d) *sub-assign* by reference using `:=` in `j` -We have seen this example already in the `vignette("datatable-reference-semantics", package="data.table")`. Let's take a look at all the `hours` available in the `flights` *data.table*: +We have seen this example already in the [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) vignette. Let's take a look at all the `hours` available in the `flights` *data.table*: ```{r} # get all 'hours' in flights @@ -498,7 +498,7 @@ In this vignette, we have learnt another method to subset rows in `i` by keying * combine key based subsets with `j` and `by`. Note that the `j` and `by` operations are exactly the same as before. -Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. But it may not be always desirable to set key and physically reorder the *data.table*. 
In the next `vignette("datatable-secondary-indices-and-auto-indexing", package="data.table")`, we will address this using a *new* feature -- *secondary indexes*. +Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. But it may not be always desirable to set key and physically reorder the *data.table*. In the [next vignette (`vignette("datatable-secondary-indices-and-auto-indexing", package="data.table")`)](datatable-secondary-indices-and-auto-indexing.html), we will address this using a *new* feature -- *secondary indexes*. ```{r, echo=FALSE} diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd index 0c55fc4a1d..b6d895af14 100644 --- a/vignettes/datatable-reference-semantics.Rmd +++ b/vignettes/datatable-reference-semantics.Rmd @@ -23,13 +23,13 @@ knitr::opts_chunk$set( collapse = TRUE) .old.th = setDTthreads(1) ``` -This vignette discusses *data.table*'s reference semantics which allows to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the `vignette("datatable-intro", package="data.table")` first. +This vignette discusses *data.table*'s reference semantics which allows to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette first. 
*** ## Data {#data} -We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`. +We will use the same `flights` data as in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. ```{r echo = FALSE} options(width = 100L) @@ -169,7 +169,7 @@ We see that there are totally `25` unique values in the data. Both *0* and *24* flights[hour == 24L, hour := 0L] ``` -* We can use `i` along with `:=` in `j` the very same way as we have already seen in the `vignette("datatable-intro", package="data.table")`. +* We can use `i` along with `:=` in `j` the very same way as we have already seen in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. * Column `hour` is replaced with `0` only on those *row indices* where the condition `hour == 24L` specified in `i` evaluates to `TRUE`. @@ -234,7 +234,7 @@ head(flights) * We provide the columns to group by the same way as shown in the *Introduction to data.table* vignette. For each group, `max(speed)` is computed, which returns a single value. That value is recycled to fit the length of the group. Once again, no copies are being made at all. `flights` *data.table* is modified *in-place*. -* We could have also provided `by` with a *character vector* as we saw in the `vignette("datatable-intro", package="data.table")`, e.g., `by = c("origin", "dest")`. +* We could have also provided `by` with a *character vector* as we saw in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette, e.g., `by = c("origin", "dest")`. # @@ -253,7 +253,7 @@ head(flights) * Note that since we allow assignment by reference without quoting column names when there is only one column as explained in [Section 2c](#delete-convenience), we can not do `out_cols := lapply(.SD, max)`. That would result in adding one new column named `out_cols`. Instead we should do either `c(out_cols)` or simply `(out_cols)`. 
Wrapping the variable name with `(` is enough to differentiate between the two cases. -* The `LHS := RHS` form allows us to operate on multiple columns. In the RHS, to compute the `max` on columns specified in `.SDcols`, we make use of the base function `lapply()` along with `.SD` in the same way as we have seen before in the `vignette("datatable-intro", package="data.table")`. It returns a list of two elements, containing the maximum value corresponding to `dep_delay` and `arr_delay` for each group. +* The `LHS := RHS` form allows us to operate on multiple columns. In the RHS, to compute the `max` on columns specified in `.SDcols`, we make use of the base function `lapply()` along with `.SD` in the same way as we have seen before in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. It returns a list of two elements, containing the maximum value corresponding to `dep_delay` and `arr_delay` for each group. # Before moving on to the next section, let's clean up the newly created columns `speed`, `max_speed`, `max_dep_delay` and `max_arr_delay`. @@ -369,7 +369,7 @@ However we could improve this functionality further by *shallow* copying instead * It is used to *add/update/delete* columns by reference. -* We have also seen how to use `:=` along with `i` and `by` the same way as we have seen in the `vignette("datatable-intro", package="data.table")`. We can in the same way use `keyby`, chain operations together, and pass expressions to `by` as well all in the same way. The syntax is *consistent*. +* We have also seen how to use `:=` along with `i` and `by` the same way as we have seen in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. We can in the same way use `keyby`, chain operations together, and pass expressions to `by` as well all in the same way. The syntax is *consistent*. 
* We can use `:=` for its side effect or use `copy()` to not modify the original object while updating by reference. @@ -379,6 +379,6 @@ setDTthreads(.old.th) # -So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the next vignette `vignette("datatable-keys-fast-subset", package="data.table")` to perform *blazing fast subsets* by *keying data.tables*. +So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the [next vignette (`vignette("datatable-keys-fast-subset", package="data.table")`)](datatable-keys-fast-subset.html) to perform *blazing fast subsets* by *keying data.tables*. *** diff --git a/vignettes/datatable-sd-usage.Rmd b/vignettes/datatable-sd-usage.Rmd index f005b1594e..2f91f0bb1d 100644 --- a/vignettes/datatable-sd-usage.Rmd +++ b/vignettes/datatable-sd-usage.Rmd @@ -124,7 +124,7 @@ head(unique(Teams[[fkt[1L]]])) Note: -1. The `:=` is an assignment operator to update the `data.table` in place without making a copy. See `vignette("datatable-reference-semantics", package="data.table")` for more. +1. The `:=` is an assignment operator to update the `data.table` in place without making a copy. See [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) for more. 2. The LHS, `names(.SD)`, indicates which columns we are updating - in this case we update the entire `.SD`. 3. The RHS, `lapply()`, loops through each column of the `.SD` and converts the column to a factor. 4. We use the `.SDcols` to only select columns that have pattern of `teamID`. 
diff --git a/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd b/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd index 85ce0d67c3..7be917032d 100644 --- a/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd +++ b/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd @@ -24,13 +24,13 @@ knitr::opts_chunk$set( .old.th = setDTthreads(1) ``` -This vignette assumes that the reader is familiar with data.table's `[i, j, by]` syntax, and how to perform fast key based subsets. If you're not familiar with these concepts, please read the *"Introduction to data.table"*, *"Reference semantics"* and *"Keys and fast binary search based subset"* vignettes first. +This vignette assumes that the reader is familiar with data.table's `[i, j, by]` syntax, and how to perform fast key based subsets. If you're not familiar with these concepts, please read the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html), [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html), and [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignettes first. *** ## Data {#data} -We will use the same `flights` data as in the `vignette("datatable-intro", package="data.table")`. +We will use the same `flights` data as in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette. ```{r echo = FALSE} options(width = 100L) @@ -193,7 +193,7 @@ flights[.("JFK", "LAX"), on = c("origin", "dest")][1:5] ### b) Select in `j` -All the operations we will discuss below are no different to the ones we already saw in the `vignette("datatable-keys-fast-subset", package="data.table")`. Except we'll be using the `on` argument instead of setting keys. +All the operations we will discuss below are no different to the ones we already saw in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette. 
Except we'll be using the `on` argument instead of setting keys. #### -- Return `arr_delay` column alone as a data.table corresponding to `origin = "LGA"` and `dest = "TPA"` @@ -219,7 +219,7 @@ flights[.("LGA", "TPA"), max(arr_delay), on = c("origin", "dest")] ### e) *sub-assign* by reference using `:=` in `j` -We have seen this example already in the `vignette("datatable-reference-semantics", package="data.table")` and the `vignette("datatable-keys-fast-subset", package="data.table")`. Let's take a look at all the `hours` available in the `flights` *data.table*: +We have seen this example already in the vignettes [`vignette("datatable-reference-semantics", package="data.table")`](datatable-reference-semantics.html) and [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html). Let's take a look at all the `hours` available in the `flights` *data.table*: ```{r} # get all 'hours' in flights @@ -253,7 +253,7 @@ head(ans) ### g) The *mult* argument -The other arguments including `mult` work exactly the same way as we saw in the `vignette("datatable-keys-fast-subset", package="data.table")`. The default value for `mult` is "all". We can choose, instead only the "first" or "last" matching rows should be returned. +The other arguments including `mult` work exactly the same way as we saw in the [`vignette("datatable-keys-fast-subset", package="data.table")`](datatable-keys-fast-subset.html) vignette. The default value for `mult` is "all". We can choose, instead only the "first" or "last" matching rows should be returned. #### -- Subset only the first matching row where `dest` matches *"BOS"* and *"DAY"* @@ -327,7 +327,7 @@ system.time(dt[x %in% 1989:2012]) In recent version we extended auto indexing to expressions involving more than one column (combined with `&` operator). In the future, we plan to extend binary search to work with more binary operators like `<`, `<=`, `>` and `>=`. 
-We will discuss fast *subsets* using keys and secondary indices to *joins* in the next vignette, `vignette("datatable-joins", package="data.table")`. +We will discuss fast *subsets* using keys and secondary indices to *joins* in the [next vignette (`vignette("datatable-joins", package="data.table")`)](datatable-joins.html). ***