Cast is a CLI tool that reads strings or complex data sets from CSV files and outputs them in other text formats.
Cast can also be used as a command-line interface to xxHash for generating hash values.
This Python script is a command line tool for generating SQL commands or other output with automatically hashed values using an algorithm of the xxHash family. The data to be processed is entered through the console or by reading in a text file. Furthermore, the printed results can be precisely formatted using templates.
When reading CSV files, the column names are available as placeholders within the template, which allows the data records to be automatically converted into a suitable SQL insert command, for instance. In fact, any use case that requires automatic conversion to another text file format is conceivable.
- Install a Python interpreter (version 3.9 or later).
- Install the dependency with pip install xxhash.
- Place the program file cast (without the Python extension .py) under ~/.local/bin (Linux).
- In the hidden file .bashrc in the user's home directory, add the line export PATH=$HOME/.local/bin:$PATH if this search path for executable scripts is not yet known.
- Make the script file executable with chmod +x cast.
python cast <mode> <strings> [<options>]
Or, if stored in the local or system-wide bin folder as cast:
cast [<mode>] <strings> [<options>]
The first parameter specifies the algorithm, with the following options available:
- 32 → xxh32
- 64 → xxh64
- 3_64 → xxh3_64
- 3_128 → xxh3_128
- uuid → hexadecimal digest of xxh3_128 formatted as a UUID
In addition, each algorithm has a variant that returns a hexadecimal number (for example 32x) and a variant that returns a signed integer (for example 32s). The signed variant can be particularly useful in the context of PostgreSQL, which only provides signed integer types.
By default, 64 is assumed.
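The signed and UUID variants can be pictured with a short sketch. It is not taken from cast's source: the helper names to_signed, as_uuid, and hash_variants are hypothetical, and it assumes that strings are encoded as UTF-8 before hashing. Only hash_variants needs the xxhash package.

```python
import uuid


def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned integer as a two's-complement signed integer."""
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value


def as_uuid(hex_digest: str) -> str:
    """Format a 32-character hexadecimal digest in the 8-4-4-4-12 UUID layout."""
    return str(uuid.UUID(hex=hex_digest))


def hash_variants(text: str) -> dict:
    """Return the unsigned, hexadecimal, and signed xxh32 digests of a string.

    Assumption: the text is hashed as UTF-8 bytes. The xxhash import is kept
    local so the two helpers above can be tried without the dependency.
    """
    import xxhash

    digest = xxhash.xxh32(text.encode("utf-8"))
    return {
        "unsigned": digest.intdigest(),
        "hex": digest.hexdigest(),
        "signed": to_signed(digest.intdigest(), 32),
    }
```

Under this reading, an unsigned 32-bit digest of 2889454976 becomes -1405512320 as a signed value, which fits into a PostgreSQL integer column, while any digest below 2147483648 is left unchanged.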
- --read / -r
  Load a file in which each line is treated as a string to be hashed. For a CSV file, the hash value is calculated from all columns separated by commas, unless --input is used to specify exactly which columns should be hashed and how. For this reason, the program assumes that CSV files have a header row.
- --write / -w
  Specify the file to which the results are written instead of printing them to the console. If the specified file does not yet exist, it is created automatically.
- --input / -i
  Template with the placeholder {string}, which specifies how strings are to be hashed. When a CSV file is read, individual column values can also be addressed using their column names as placeholders.
- --output / -o
  Template that specifies exactly how records are output, with the possible placeholders {string}, {input}, and {hash}, as well as all column names of a CSV file that was read, where spaces between words are replaced by underscores. By default, "{input}" => {hash} is used as the template; for CSV files, however, an additional column for the generated hash values is added at the beginning.
- --template / -t
  Template for the overall output, where the placeholder {records} stands for all records.
- --spacing / -s
  Characters inserted between the individual records; by default a simple line break "\n".
Important Note
For Bash to interpret escape sequences such as the line break in ";\n", such strings must be written as $';\n'.
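How the input, output, and overall templates interact with the spacing can be sketched as follows. This is only an illustration under assumptions: it supposes that the placeholders behave like Python's str.format fields, the function name render is hypothetical, and hash_fn stands in for whichever xxHash variant was selected.

```python
from typing import Callable, Iterable, Mapping


def render(
    rows: Iterable[Mapping[str, str]],
    hash_fn: Callable[[str], object],
    input_tpl: str = "{string}",
    output_tpl: str = '"{input}" => {hash}',
    template: str = "{records}",
    spacing: str = "\n",
) -> str:
    """Fill the per-record templates, join the records, and wrap the result."""
    records = []
    for row in rows:
        # Text that gets hashed, built from the row's placeholders.
        hashed_text = input_tpl.format(**row)
        # {input} and {hash} are made available next to the row's own values.
        values = {**row, "input": hashed_text, "hash": hash_fn(hashed_text)}
        records.append(output_tpl.format(**values))
    # The spacing goes between records; the overall template wraps them all.
    return template.format(records=spacing.join(records))
```

For plain strings, each row would simply be a dictionary such as {"string": "Hello, world!"}; for CSV input, the keys would be the column names with underscores. Passing len as hash_fn is enough to try the template logic without any hashing library.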
Input with Default Settings:
cast 64 "Hello, world!" "This is a test string."
Output in Custom Format:
cast 64 "Hello, world!" "This is a test." -o "'{input}': {hash}"
Generate an SQL insert command with records from a CSV file
By specifying a template, an insert command can be generated:
cast 32s \
-r capitals.csv \
-w capitals.sql \
-i "{country_code},{capital_city}" \
-o "({hash}, '{country_code}', '{capital_city}')" \
-t $'insert into City\n\t(hash, country, capital)\nvalues\n\t{records};\n' \
-s $',\n\t'
With this CSV file as the data set,
capital city, country code, country
Washington D.C., US, United States
Ottawa, CA, Canada
Berlin, DE, Germany
Tokyo, JP, Japan
Canberra, AU, Australia
Paris, FR, France
Brasília, BR, Brazil
Moscow, RU, Russia
Beijing, CN, China
New Delhi, IN, India
the following output is generated:
insert into City
	(hash, country, capital)
values
	(1507852509, 'US', 'Washington D.C.'),
	(2050315825, 'CA', 'Ottawa'),
	(-1405512320, 'DE', 'Berlin'),
	(1261058448, 'JP', 'Tokyo'),
	(1366882969, 'AU', 'Canberra'),
	(-1994286539, 'FR', 'Paris'),
	(1797318940, 'BR', 'Brasília'),
	(2116051181, 'RU', 'Moscow'),
	(-711255517, 'CN', 'Beijing'),
	(1246361623, 'IN', 'New Delhi');
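The mapping from CSV header names to placeholders such as {country_code} can be approximated with Python's csv module. The sketch below is an assumption about the behavior, not cast's actual code: the function name read_rows is hypothetical, and it supposes that surrounding whitespace is stripped from headers and values and that spaces inside header names become underscores.

```python
import csv
import io


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text with a header row into dicts keyed by placeholder names."""
    # skipinitialspace drops the blank that follows each comma in the data.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # "capital city" becomes "capital_city" so it can serve as a placeholder.
    header = [name.strip().replace(" ", "_") for name in next(reader)]
    return [
        dict(zip(header, (value.strip() for value in row)))
        for row in reader
        if row  # ignore empty lines
    ]
```

A row parsed this way can be passed straight to a template, for example "{country_code},{capital_city}".format(**row), which would produce the text that is then hashed for that record.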
A Python interpreter version 3.9 or higher is expected. Furthermore, the following dependency must be installed using pip:
pip install xxhash
The source code is in the public domain.