# xls2txt: converting spreadsheets to text
## Purpose
`xls2txt` and `xls2csv` convert spreadsheet files to text for
compatibility with terminals and command-line utilities (e.g. `diff` or
`less`). Despite the name, they should work with both Excel (xls, xlsx
or xlsb) and OpenDocument (ods) files.
The two executables provided by this package broadly work the same
way: each returns one line per row, separated by a unix newline (`\n`),
and quotes field values if necessary. They differ only in their *default*
field (cell) separator:
- `xls2txt` separates cells with a TAB character
- `xls2csv` uses a comma (`,`)
The tab separator seems like a better default for readability and
compatibility with various unix utilities.
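
Downstream scripts can read the output back with a standard CSV parser. The following is a minimal Python sketch, assuming that quoted fields follow the usual CSV convention (double quotes around fields that contain the separator). The `parse_rows` helper and the sample text are hypothetical and only serve as illustration.

```python
import csv
import io


def parse_rows(text, field_separator="\t"):
    """Parse converter-style output into a list of rows (lists of strings)."""
    # newline="" leaves line endings untouched so the csv module can
    # handle newlines that appear inside quoted fields.
    stream = io.StringIO(text, newline="")
    reader = csv.reader(stream, delimiter=field_separator)
    return [row for row in reader]


# Hypothetical sample: two rows, the second with a quoted field holding a TAB.
sample = 'name\tqty\n"a\tb"\t3\n'
rows = parse_rows(sample)
```

Passing `field_separator=","` handles comma-separated output the same way.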
## Interface
Both utilities take the same parameter and options, differing only in
their default field separator:
* `PATH` is a mandatory path to a spreadsheet file to convert.
* `--sheet` (`-s`) is the sheet to convert. By default, the first
sheet is converted.
`sheet` can be either the *name* of a sheet, or its position in the
workbook (starting at 1, though 0 will be treated as 1).
* `--record-separator` (`-r`) is the separator used between
  sheet rows. It defaults to a unix newline for both utilities.
* `--field-separator` (`-f`) is the separator used between
  sheet cells. It defaults to a TAB character for `xls2txt` and a
  comma for `xls2csv`.
* `--formula` specifies the *formula display mode*:
  - `cached-value` (the default) never displays formulas: a formula
    cell shows its cached value if it has one, and is empty otherwise
  - `if-empty` shows the cached value if there is one, and the
    formula otherwise
  - `always` always displays the formula of a formula cell, never
    the cached value
* the converted data is written to stdout
* error messages can be written to stdout, including on success,
  e.g. if an error occurs while parsing formulas and the formula mode
  is not the default, an error is signaled and the formula mode
  is then reset to the default
* returns `0` if the entire conversion succeeded, `1` if any of the
  following occurred:
  - no input file was provided
  - the input file was not found or not recognized as a valid
    spreadsheet file
  - the provided record or field separator is invalid (it must be a
    single ASCII character)
  - the specified sheet was not found
  - an error was found in one of the sheet cells
  - a problem occurred while writing data to stdout
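
The interface above can be driven from a script. The following Python sketch builds an argument list from the documented options and treats a non-zero exit status as a failure. The helper names (`build_command`, `convert`) are hypothetical, and passing each option as a separate `--flag value` pair is an assumption about how the command line is parsed.

```python
import subprocess

# Formula display modes listed in the interface description.
FORMULA_MODES = ("cached-value", "if-empty", "always")


def build_command(path, program="xls2txt", sheet=None,
                  field_separator=None, record_separator=None, formula=None):
    """Build an argument list from the documented options."""
    command = [program]
    if sheet is not None:
        # A sheet is either a name or a 1-based position.
        command += ["--sheet", str(sheet)]
    separators = (("--field-separator", field_separator),
                  ("--record-separator", record_separator))
    for flag, value in separators:
        if value is not None:
            # Separators must be a single ASCII character.
            if len(value) != 1 or not value.isascii():
                raise ValueError(f"{flag} must be a single ASCII character")
            command += [flag, value]
    if formula is not None:
        if formula not in FORMULA_MODES:
            raise ValueError(f"unknown formula mode: {formula!r}")
        command += ["--formula", formula]
    command.append(path)
    return command


def convert(path, **options):
    """Run the converter and return its stdout; raise on a non-zero exit."""
    command = build_command(path, **options)
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{command[0]} exited with status {result.returncode}")
    return result.stdout
```

For example, `convert("book.ods", sheet=2, formula="if-empty")` would run the converter on the second sheet, where `book.ods` is a placeholder file name.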
## Recipes
### git text conversion
This allows viewing textual diffs of spreadsheet files using `git log`
or `git diff` rather than getting an unhelpful "binary files differ":
1. create a `$HOME/.gitattributes` file, or set an arbitrary file as
the attributes file (`git config --global core.attributesFile
<filename>`)
2. in that file, associate the relevant spreadsheet extensions with
   a diff driver name (`spreadsheet` here):

       *.ods diff=spreadsheet
       *.xls diff=spreadsheet
       *.xlsx diff=spreadsheet
       *.xlsb diff=spreadsheet

3. set `xls2txt` (or `xls2csv`), possibly configured as you desire, as
the diff text converter:

       git config --global diff.spreadsheet.textconv xls2txt

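
The steps above configure git globally. As a per-repository alternative, the same setup can be scripted. The following Python sketch assumes a `.gitattributes` file at the repository root and a repository-local `git config` (no `--global`). The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Extensions taken from the recipe above.
EXTENSIONS = ("ods", "xls", "xlsx", "xlsb")


def attribute_lines(extensions=EXTENSIONS, driver="spreadsheet"):
    """Return one gitattributes line per spreadsheet extension."""
    return [f"*.{ext} diff={driver}" for ext in extensions]


def write_attributes(repo_dir, driver="spreadsheet"):
    """Append any missing lines to <repo_dir>/.gitattributes."""
    target = Path(repo_dir) / ".gitattributes"
    existing = target.read_text().splitlines() if target.exists() else []
    missing = [line for line in attribute_lines(driver=driver)
               if line not in existing]
    if missing:
        target.write_text("\n".join(existing + missing) + "\n")
    return missing


def configure_textconv(repo_dir, converter="xls2txt", driver="spreadsheet"):
    """Set the text converter for this repository only."""
    subprocess.run(
        ["git", "-C", str(repo_dir), "config",
         f"diff.{driver}.textconv", converter],
        check=True,
    )
```

Calling `write_attributes(".")` followed by `configure_textconv(".")` from the root of a repository would apply both steps to that repository alone.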
## Changelog
### 1.1.0
* Add support for displaying formulas
  - if there is no cached value for a formula cell
  - or instead of the cached value
### 1.0.2
* Update dependencies to latest versions, bump edition
### 1.0.1
* Switch back to calamine proper as 0.16.1 guarantees sheets are
in-order
## Thanks
* [calamine](https://github.com/tafia/calamine) makes getting data out
of spreadsheet files a breeze
* [rust-csv](https://github.com/BurntSushi/rust-csv) is not strictly
necessary as generating value-separated data is fairly easy (as
opposed to consuming it), but it provides confidence that the output
will be properly quoted / escaped when necessary