README.md: add note that 'input_format' now supported 'csvdictreader' when used as a library
bxparks committed Jul 19, 2023
1 parent 7e0c046 commit 6823ac5
Showing 1 changed file (README.md) with 12 additions and 0 deletions.
@@ -267,6 +267,18 @@

csv` flag. The support is not as robust as for JSON files. For example, the CSV
format supports only the comma separator, and does not support the pipe (`|`) or
tab (`\t`) characters.

**Side Note**: As of v1.6.0, the `input_format` parameter also supports the
`csvdictreader` option, which allows using the
[csv.DictReader](https://docs.python.org/3/library/csv.html) class. That class
can be customized to handle other delimiters, such as tabs. However, this
requires writing a custom Python script that uses `bigquery_schema_generator` as
a library. See the [`SchemaGenerator.deduce_schema()` from
csv.DictReader](#SchemaGeneratorDeduceSchemaFromCsvDictReader) section below. It
is probably possible to enable this functionality through the command line
script, but it was not obvious how to expose the various options of
`csv.DictReader` through command line flags. I didn't spend any time on this
problem because this is not a feature that I use personally.
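A minimal sketch of the tab-delimited case, assuming `bigquery_schema_generator` v1.6.0 or later is installed. The `csv.DictReader(..., delimiter="\t")` part is standard library; the `SchemaGenerator(input_format="csvdictreader")` call, the `(schema_map, error_logs)` return value of `deduce_schema()`, and `flatten_schema()` are my assumptions about the library API and should be checked against the section referenced above. The helper names `make_tab_reader()` and `deduce_schema_from_tsv()` and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical tab-separated sample; a real script would open a file instead.
TSV_DATA = "name\tage\tactive\nalice\t30\ttrue\nbob\t\tfalse\n"


def make_tab_reader(text):
    """Return a csv.DictReader that splits fields on tabs instead of commas."""
    return csv.DictReader(io.StringIO(text), delimiter="\t")


def deduce_schema_from_tsv(text):
    """Feed a tab-delimited DictReader to SchemaGenerator (assumed API).

    Assumes bigquery_schema_generator v1.6.0+ is installed, that
    SchemaGenerator accepts input_format='csvdictreader', and that
    deduce_schema() returns (schema_map, error_logs). Returns None when
    the package cannot be imported.
    """
    try:
        from bigquery_schema_generator.generate_schema import SchemaGenerator
    except ImportError:
        return None
    generator = SchemaGenerator(input_format="csvdictreader")
    schema_map, error_logs = generator.deduce_schema(
        input_data=make_tab_reader(text)
    )
    return generator.flatten_schema(schema_map), error_logs


if __name__ == "__main__":
    # Each row is a dict keyed by the header line, split on tabs.
    for row in make_tab_reader(TSV_DATA):
        print(row)
```

The only change from the default CSV handling is the `delimiter` argument; any other `csv.DictReader` option (quoting, dialect) could be set the same way, which is the flexibility that is hard to expose through command line flags.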

Unlike `bq load`, the `generate_schema.py` script reads every record in the
input data file to deduce the table's schema. It prints the schema in JSON
format to STDOUT.
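To illustrate why reading every record matters, here is a toy sketch. It is not the type-inference logic of `generate_schema.py`: the type names, the widening rules, and the helpers `infer_type()`, `merge_types()`, and `deduce_types()` are hypothetical simplifications. A column whose only non-integer value appears in the last record is still detected as `FLOAT`, because no record is skipped, and the result is printed as JSON to STDOUT.

```python
import json
import sys


def infer_type(value):
    """Classify one CSV cell using simplified, hypothetical rules."""
    if value == "":
        return None  # an empty cell says nothing about the column type
    if value.lower() in ("true", "false"):
        return "BOOLEAN"
    try:
        int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(value)
        return "FLOAT"
    except ValueError:
        return "STRING"


def merge_types(old, new):
    """Widen a column's type when a later record disagrees with earlier ones."""
    if old is None:
        return new
    if new is None or old == new:
        return old
    if {old, new} == {"INTEGER", "FLOAT"}:
        return "FLOAT"
    return "STRING"  # any other conflict falls back to STRING


def deduce_types(rows):
    """Scan every row (no sampling) and return a BigQuery-style schema list."""
    types = {}
    for row in rows:
        for name, value in row.items():
            types[name] = merge_types(types.get(name), infer_type(value))
    return [
        {"name": name, "type": col_type or "STRING", "mode": "NULLABLE"}
        for name, col_type in types.items()
    ]


if __name__ == "__main__":
    # Only the last record reveals that column "x" holds floats.
    records = [{"x": str(i)} for i in range(1000)] + [{"x": "2.5"}]
    json.dump(deduce_types(records), sys.stdout, indent=2)
    print()
```

A tool that inspected only the first few hundred records would have typed `x` as `INTEGER` here, which is the kind of mismatch that scanning the whole file avoids.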