From 55e0f4935392c7da99a39bbe6dbf675222be1ecd Mon Sep 17 00:00:00 2001
From: Ryan Brideau
Date: Thu, 16 Dec 2021 23:17:03 -0500
Subject: [PATCH 1/3] Fixed the dead URL

---
 www/src/2e/chapter-3-obtaining-data.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/www/src/2e/chapter-3-obtaining-data.html b/www/src/2e/chapter-3-obtaining-data.html
index dabde60..747caed 100644
--- a/www/src/2e/chapter-3-obtaining-data.html
+++ b/www/src/2e/chapter-3-obtaining-data.html
@@ -454,7 +454,7 @@

The -H option specifies that the CSV file has no header.

Let’s demonstrate in2csv using a spreadsheet that contains the 2000 most popular songs according to an annual Dutch marathon radio program Top 2000. To extract its data, you invoke in2csv as follows:

-
$ curl "https://www.nporadio2.nl/data/download/TOP-2000-2020.xlsx" > top2000.xls
+
$ curl "https://cms-assets.nporadio.nl/npoRadio2/TOP-2000-2021.xlsx?v=1639653660" > top2000.xls
 x
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed

From b3af8ca68a7476fa70494fc5fd812ba02653f69a Mon Sep 17 00:00:00 2001
From: Ryan Brideau 
Date: Thu, 16 Dec 2021 23:21:11 -0500
Subject: [PATCH 2/3] Fixed column name

---
 www/src/2e/chapter-3-obtaining-data.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/www/src/2e/chapter-3-obtaining-data.html b/www/src/2e/chapter-3-obtaining-data.html
index 747caed..9023e33 100644
--- a/www/src/2e/chapter-3-obtaining-data.html
+++ b/www/src/2e/chapter-3-obtaining-data.html
@@ -464,7 +464,7 @@ 

BadZipFile: File is not a zip file

Who is Danny Vera? The most popular song is supposed to be Bohemian Rhapsody, of course. Well, at least Queen appears plenty of times in the Top 2000 so I can’t really complain:

-
$ csvgrep top2000.csv --columns ARTIEST --regex '^Queen$' | csvlook -I 
+
$ csvgrep top2000.csv --columns artiest --regex '^Queen$' | csvlook -I 
 StopIteration:
 StopIteration:

The value after the --regex options is a regular expression (or regex). It’s a special syntax for defining patterns. Here, I only want to match artists that exactly match “Queen,” so I use the caret (^) and dollar sign ($) to match the start and end of the values in the column ARTIEST.

From 144efa70090d75e36968e00bf1b7791efa39b2fc Mon Sep 17 00:00:00 2001
From: Ryan Brideau
Date: Thu, 16 Dec 2021 23:23:17 -0500
Subject: [PATCH 3/3] Update chapter-3-obtaining-data.html

---
 www/src/2e/chapter-3-obtaining-data.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/www/src/2e/chapter-3-obtaining-data.html b/www/src/2e/chapter-3-obtaining-data.html
index 9023e33..0a1273f 100644
--- a/www/src/2e/chapter-3-obtaining-data.html
+++ b/www/src/2e/chapter-3-obtaining-data.html
@@ -467,7 +467,7 @@

$ csvgrep top2000.csv --columns artiest --regex '^Queen$' | csvlook -I 
 StopIteration:
 StopIteration:
-

The value after the --regex options is a regular expression (or regex). It’s a special syntax for defining patterns. Here, I only want to match artists that exactly match “Queen,” so I use the caret (^) and dollar sign ($) to match the start and end of the values in the column ARTIEST.

+

The value after the --regex options is a regular expression (or regex). It’s a special syntax for defining patterns. Here, I only want to match artists that exactly match “Queen,” so I use the caret (^) and dollar sign ($) to match the start and end of the values in the column artiest.

By the way, the tools in2csv, csvgrep, and csvlook are part of CSVkit, which is a collection of command-line tools to work with CSV data.

The format of the file is automatically determined by the extension, .xlsx in this case. If you were to pipe the data into in2csv, you would have to specify the format explicitly.
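
To make the effect of the caret and dollar sign in the --regex pattern more concrete, here is a minimal sketch that does not need the spreadsheet at all. It uses grep -E as a stand-in for csvgrep, and the four artist names are hypothetical sample values, not rows taken from the Top 2000 file. The assumption is that the anchors behave the same way when csvgrep applies the pattern to each value in the artiest column.

```shell
# Hypothetical sample values standing in for the artiest column.
artists='Queen
Queen & David Bowie
The Queen Experience
Queens of the Stone Age'

# Unanchored: keeps every value that contains "Queen" anywhere.
unanchored=$(printf '%s\n' "$artists" | grep -E 'Queen')

# Anchored: keeps only values that are exactly "Queen".
anchored=$(printf '%s\n' "$artists" | grep -E '^Queen$')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

Without the anchors all four sample values are kept. With them, only the value that is exactly Queen remains.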