diff --git a/www/src/2e/chapter-3-obtaining-data.html b/www/src/2e/chapter-3-obtaining-data.html
index dabde60..0a1273f 100644
--- a/www/src/2e/chapter-3-obtaining-data.html
+++ b/www/src/2e/chapter-3-obtaining-data.html
@@ -454,7 +454,7 @@

The -H option specifies that the CSV file has no header.

Let’s demonstrate in2csv using a spreadsheet that contains the 2000 most popular songs according to an annual Dutch marathon radio program Top 2000. To extract its data, you invoke in2csv as follows:

-
$ curl "https://www.nporadio2.nl/data/download/TOP-2000-2020.xlsx" > top2000.xls
+
$ curl "https://cms-assets.nporadio.nl/npoRadio2/TOP-2000-2021.xlsx?v=1639653660" > top2000.xls
 x
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
@@ -464,10 +464,10 @@ 

BadZipFile: File is not a zip file

Who is Danny Vera? The most popular song is supposed to be Bohemian Rhapsody, of course. Well, at least Queen appears plenty of times in the Top 2000 so I can’t really complain:

-
$ csvgrep top2000.csv --columns ARTIEST --regex '^Queen$' | csvlook -I 
+
$ csvgrep top2000.csv --columns artiest --regex '^Queen$' | csvlook -I 
 StopIteration:
 StopIteration:
-

The value after the --regex options is a regular expression (or regex). It’s a special syntax for defining patterns. Here, I only want to match artists that exactly match “Queen,” so I use the caret (^) and dollar sign ($) to match the start and end of the values in the column ARTIEST.

+

The value after the --regex option is a regular expression (or regex). It’s a special syntax for defining patterns. Here, I only want to match artists that exactly match “Queen,” so I use the caret (^) and dollar sign ($) to match the start and end of the values in the column artiest.
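
To see what the caret and dollar sign change, here is a minimal sketch using Python's re module. CSVkit is written in Python, so this is a reasonable stand-in, but how csvgrep applies the pattern internally is an assumption here. The artist values are hypothetical samples, not rows taken from the spreadsheet.

```python
import re

# Hypothetical sample values standing in for the artiest column.
artists = [
    "Queen",
    "Queen & David Bowie",
    "Queens of the Stone Age",
    "Danny Vera",
]

# Anchored: the whole value must be exactly "Queen".
anchored = re.compile(r"^Queen$")
# Unanchored: "Queen" may appear anywhere in the value.
unanchored = re.compile(r"Queen")

exact = [a for a in artists if anchored.search(a)]
loose = [a for a in artists if unanchored.search(a)]

print(exact)
print(loose)
```

Without the anchors, any value that merely contains “Queen” is kept as well, which is why the command above uses '^Queen$'.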

By the way, the tools in2csv, csvgrep, and csvlook are part of CSVkit, which is a collection of command-line tools to work with CSV data.

The format of the file is automatically determined by the extension, .xlsx in this case. If you were to pipe the data into in2csv, you would have to specify the format explicitly.