diff --git a/www/src/2e/chapter-3-obtaining-data.html b/www/src/2e/chapter-3-obtaining-data.html
index dabde60..0a1273f 100644
--- a/www/src/2e/chapter-3-obtaining-data.html
+++ b/www/src/2e/chapter-3-obtaining-data.html
@@ -454,7 +454,7 @@
➊ The -H option specifies that the CSV file has no header.
Let’s demonstrate in2csv using a spreadsheet that contains the 2000 most popular songs according to an annual Dutch marathon radio program Top 2000.
To extract its data, you invoke in2csv as follows:
-$ curl "https://www.nporadio2.nl/data/download/TOP-2000-2020.xlsx" > top2000.xls
+$ curl "https://cms-assets.nporadio.nl/npoRadio2/TOP-2000-2021.xlsx?v=1639653660" > top2000.xls
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
@@ -464,10 +464,10 @@ BadZipFile: File is not a zip file
Who is Danny Vera? The most popular song is supposed to be Bohemian Rhapsody, of course. Well, at least Queen appears plenty of times in the Top 2000 so I can’t really complain:
-$ csvgrep top2000.csv --columns ARTIEST --regex '^Queen$' | csvlook -I ➊
+$ csvgrep top2000.csv --columns artiest --regex '^Queen$' | csvlook -I ➊
StopIteration:
StopIteration:
-➊ The value after the --regex options is a regular expression (or regex). It’s a special syntax for defining patterns. Here, I only want to match artists that exactly match “Queen,” so I use the caret (^) and dollar sign ($) to match the start and end of the values in the column ARTIEST.
+➊ The value after the --regex options is a regular expression (or regex). It’s a special syntax for defining patterns. Here, I only want to match artists that exactly match “Queen,” so I use the caret (^) and dollar sign ($) to match the start and end of the values in the column artiest.
By the way, the tools in2csv, csvgrep, and csvlook are part of CSVkit, which is a collection of command-line tools to work with CSV data.
The format of the file is automatically determined by the extension, .xlsx in this case. If you were to pipe the data into in2csv, you would have to specify the format explicitly.
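The anchoring described in callout ➊ can be checked without the spreadsheet. The following is a minimal sketch, not the chapter's own example: the artist values are made up for illustration, and it uses grep -E as a stand-in for csvgrep, on the assumption that the caret and dollar sign behave the same way in both for a pattern this simple.

```shell
# Hypothetical sample values standing in for the artiest column.
artists='Queen
Queen & David Bowie
Queens of the Stone Age
Danny Vera'

# Anchored: the whole value must be exactly "Queen".
exact=$(printf '%s\n' "$artists" | grep -E '^Queen$')

# Unanchored: any value that contains "Queen" matches.
loose=$(printf '%s\n' "$artists" | grep -E 'Queen')

printf 'anchored:\n%s\n\nunanchored:\n%s\n' "$exact" "$loose"
```

Without the anchors, collaborations and similarly named bands are matched as well, which is why the command in the diff wraps the name in ^ and $.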