From 9f83871db90f9bfa1d559b0e4ae78cb1763bf9b4 Mon Sep 17 00:00:00 2001
From: Kyle Haynes <5267027+KyleHaynes@users.noreply.github.com>
Date: Mon, 23 Dec 2024 08:05:53 +1000
Subject: [PATCH] various tweaks to joins rmd file

---
 vignettes/datatable-joins.Rmd | 38 +++++++++++++++++------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/vignettes/datatable-joins.Rmd b/vignettes/datatable-joins.Rmd
index a35f78bb2..e86e6df84 100644
--- a/vignettes/datatable-joins.Rmd
+++ b/vignettes/datatable-joins.Rmd
@@ -126,7 +126,7 @@ The next diagram shows a description for each basic argument. In the following s
 x[i, on, nomatch]
 | |  |   |
 | |  |   \__ If NULL only returns rows linked in x and i tables
-| |  \____ a character vector o list defining match logict
+| |  \____ a character vector or list defining match logic
 | \_____ primary data.table, list or data.frame
 \____ secondary data.table
 ```
@@ -304,7 +304,7 @@ ProductReceived[Products,
 
 Despite both tables have the same information, they present some relevant differences:
 
-- They present different order for their columns
+- They present their columns in a different order.
 - They have some name differences on their columns names:
   - The `id` column of first table has the same information as the `product_id` in the second table.
   - The `i.id` column of first table has the same information as the `id` in the second table.
@@ -391,7 +391,7 @@ Here some important considerations:
   
 - **Row level**
   - All rows from in the `i` table were kept as we never received any banana but row is still part of the results.
-  - The row related to `product_id = 6` is no part of the results any more as it is not present in the `Products` table.
+  - The row related to `product_id = 6` is no longer part of the results, as it is not present in the `Products` table.
 
 
 #### 3.5.1. Joining after chain operations
@@ -510,7 +510,7 @@ Use this method if you need to combine columns from 2 tables based on one or mor
 
 As we saw in the previous section, any of the prior operations can keep the missing `product_id = 6` and the **soda** (`product_id = 4`) as part of the results.
 
-To save this problem, we can use the `merge` function even thought it is lower than using the native `data.table`'s joining syntax.
+To solve this problem, we can use the `merge` function, even though it is slower than using `data.table`'s native joining syntax.
 
 ```{r}
 merge(x = Products,
@@ -526,22 +526,22 @@ merge(x = Products,
 
 A non-equi join is a type of join where the condition for matching rows is not based on equality, but on other comparison operators like <, >, <=, or >=. This allows for **more flexible joining criteria**. In `data.table`, non-equi joins are particularly useful for operations like:
 
-- Finding the nearest match
-- Comparing ranges of values between tables
+- Finding the nearest match.
+- Comparing ranges of values between tables.
 
 It's a great alternative if after applying a right of inner join:
 
-- You want to decrease the number of returned rows based on comparing numeric columns of different table.
+- You want to decrease the number of returned rows based on comparing numeric columns of different tables.
 - You don't need to keep the columns from table `x`*(secondary data.table)* in the final table.
 
-To illustrate how this work, let's center over attention on how are the sales and receives for product 2.
+To illustrate how this works, let's focus on the sales and receipts for product 2.
   
 ```{r}
 ProductSalesProd2 = ProductSales[product_id == 2L]
 ProductReceivedProd2 = ProductReceived[product_id == 2L]
 ```
 
-If want to know, for example, if can find any receive that took place before a sales date, we can apply the next code.
+If we want to know, for example, whether any product was received before a sale date, we can apply the following.
 
 ```{r}
 ProductReceivedProd2[ProductSalesProd2,
@@ -552,16 +552,16 @@ ProductReceivedProd2[ProductSalesProd2,
 
 What does happen if we just apply the same logic on the list passed to `on`?
 
-- As this opperation it's still a right join, it returns all rows from the `i` table, but only shows the values for `id` and `count` when the rules are met.
+- As this operation is still a right join, it returns all rows from the `i` table, but only shows the values for `id` and `count` when the rules are met.
 
-- The date related `ProductReceivedProd2` was omited from this new table.
+- The `date` column from `ProductReceivedProd2` was omitted from this new table.
 
 ```{r}
 ProductReceivedProd2[ProductSalesProd2,
                      on = list(product_id, date < date)]
 ```
 
-Now, after applying the join, we can limit the results only show the cases that meet all joining criteria.                                                               
+Now, after applying the join, we can limit the results to show only the cases that meet all the joining criteria.
 
 ```{r}
 ProductReceivedProd2[ProductSalesProd2,
@@ -574,7 +574,7 @@ ProductReceivedProd2[ProductSalesProd2,
 
 Rolling joins are particularly useful in time-series data analysis. They allow you to **match rows based on the nearest value** in a sorted column, typically a date or time column. 
 
-This is valuable when you need to align data from different sources **that may not have exactly matching timestamps**, or when you want to carry forward the most recent value. 
+This is valuable when you need to align data from different sources **whose timestamps may not match exactly**, or when you want to carry forward the most recent value.
 
 For example, in financial data, you might use a rolling join to assign the most recent stock price to each transaction, even if the price updates and transactions don't occur at the exact same times.
 
@@ -594,7 +594,7 @@ ProductPriceHistory = data.table(
 ProductPriceHistory
 ```
 
-Now, we can perform a right join giving a different prices for each product based on the sale date.
+Now, we can perform a right join that gives each product a different price based on the sale date.
 
 ```{r}
 ProductPriceHistory[ProductSales,
@@ -617,9 +617,9 @@ ProductPriceHistory[ProductSales,
 
 ### 7.1. Subsets as joins
 
-As we just saw in the prior section the `x` table gets filtered by the values available in the `i` table. Actually, that process is faster than passing a Boolean expression to the `i` argument.
+As we saw in the previous section, the `x` table gets filtered by the values available in the `i` table. In fact, this process is faster than passing a Boolean expression to the `i` argument.
 
-To filter the `x` table at speed we don't to pass a complete `data.table`, we can pass a `list()` of vectors with the values that we want to keep or omit from the original table.
+To filter the `x` table quickly, we don't need to pass a complete `data.table`; we can pass a `list()` of vectors with the values that we want to keep or omit from the original table.
 
 For example, to filter dates where the market received 100 units of bananas (`product_id = 1`) or popcorn (`product_id = 3`) we can use the following:
 
@@ -628,7 +628,7 @@ ProductReceived[list(c(1L, 3L), 100L),
                 on = c("product_id", "count")]
 ```
 
-As at the end, we are filtering based on a join operation the code returned a **row that was not present in original table**. To avoid that behavior, it is recommended to always to add the argument `nomatch = NULL`.
+Because we are ultimately filtering with a join operation, the code returned a **row that was not present in the original table**. To avoid that behavior, it is recommended to always add the argument `nomatch = NULL`.
 
 ```{r}
 ProductReceived[list(c(1L, 3L), 100L),
@@ -644,7 +644,7 @@ ProductReceived[!list(c(1L, 3L), 100L),
                 on = c("product_id", "count")]
 ```
 
-If you just want to filter a value for a single **character column**, you can omit calling the `list()` function pass the value to been filtered in the `i` argument.
+If you only want to filter on values of a single **character column**, you can omit the `list()` call and pass the values to be filtered directly to the `i` argument.
 
 ```{r}
 Products[c("banana","popcorn"),
@@ -674,7 +674,7 @@ copy(Products)[ProductPriceHistory,
 
 In this operation:
 
-- The function `copy` prevent that `:=` changes by reference the `Products` table.s
+- The function `copy` creates a ***deep*** copy of the `Products` table, preventing modifications made by `:=` from changing the original table by reference.
 - We join `Products` with `ProductPriceHistory` based on `id` and `product_id`.
 - We update the `price` column with the latest price from `ProductPriceHistory`.
 - We add a new `last_updated` column to track when the price was last changed.