Positional indexing and other data-cleaning features

@johnkerl johnkerl released this 01 Sep 03:03

Features:

  • The new positional-indexing feature resolves #236 from @aborruso. You can now get the name of the 3rd field of each record via $[[3]], and its value via $[[[3]]]. Both are usable on either the left-hand or right-hand side of assignment statements, so you can more easily do things like renaming fields programmatically within the DSL.

  • There is a new capitalize DSL function, complementing the already-existing toupper. This stems from #236.

  • There is a new skip-trivial-records verb, resolving #197. Similarly, there is a new remove-empty-columns verb, resolving #206. Both are useful for data-cleaning use-cases.

  • Another pair is #181 and #256. While Miller uses mmap internally (and invisibly) to get approximately a 20% performance boost over not using it, this can cause out-of-memory issues when reading either large files or too many small ones. Now, Miller automatically avoids mmap in these cases. You can still use --mmap or --no-mmap if you want manual control of this.

  • There is a new --ivar option for the nest verb which complements the already-existing --evar. This is from #260 thanks to @jgreely.

  • There is a new keystroke-saving urandrange DSL function: urandrange(low, high) is the same as low + (high - low) * urand(). This arose from #243.

  • There is a new -v option for the cat verb which writes a low-level record-structure dump to standard error.

  • There is a new -N option for mlr which is a keystroke-saver for --implicit-csv-header --headerless-csv-output.
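
To make the semantics of a few of these features concrete, here is a rough Python sketch. It is an illustration only, not Miller's implementation: records are modeled as ordered dicts of strings, and the helper names (field_name, field_value, rename_field, skip_trivial_records, remove_empty_columns) are hypothetical stand-ins for the DSL syntax and verbs described above. Treating "trivial" as "no fields, or every value empty" and "empty column" as "empty in every record" is my reading of the verb descriptions.

```python
import random

def field_name(record, n):
    """Name of the n-th field (1-based), analogous to reading $[[n]]."""
    return list(record)[n - 1]

def field_value(record, n):
    """Value of the n-th field (1-based), analogous to reading $[[[n]]]."""
    return list(record.values())[n - 1]

def rename_field(record, n, new_name):
    """Rename the n-th field, keeping its position, analogous to $[[n]] = "new_name".
    Assumption: new_name is not already a field name in the record."""
    old = field_name(record, n)
    return {(new_name if k == old else k): v for k, v in record.items()}

def skip_trivial_records(records):
    """Drop records with no fields or with every value empty (assumed semantics)."""
    return [r for r in records if any(v != "" for v in r.values())]

def remove_empty_columns(records):
    """Drop fields that are empty in every record (assumed semantics).
    This needs all records at once, so it cannot stream."""
    nonempty = {k for r in records for k, v in r.items() if v != ""}
    return [{k: v for k, v in r.items() if k in nonempty} for r in records]

def urandrange(low, high):
    """low + (high - low) * urand(), with random.random() standing in for urand()."""
    return low + (high - low) * random.random()

record = {"a": "1", "b": "2", "c": "3"}
print(field_name(record, 3))         # c
print(field_value(record, 3))        # 3
print(rename_field(record, 3, "z"))  # {'a': '1', 'b': '2', 'z': '3'}

rows = [{"x": "1", "y": ""}, {"x": "", "y": ""}, {}, {"x": "4", "y": ""}]
print(remove_empty_columns(skip_trivial_records(rows)))  # [{'x': '1'}, {'x': '4'}]
```

Chaining the two cleaning helpers mirrors a then-chain of the two verbs: trivial records are dropped first, then any column that is still empty everywhere.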

Documentation:

Bugfixes:

  • There was a SEGV using nest within then-chains, fixed in response to #220.

  • Quotes and backslashes weren't being escaped in JSON output with --jvquoteall; reported on #222.
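
For reference, here is what correct escaping of such a value looks like under standard JSON rules. This is a sketch using Python's json module to show the expected shape of the output, not Miller's own code; the sample value is made up for illustration.

```python
import json

# Hypothetical field value containing two double quotes and one backslash.
value = 'say "hi" \\ bye'

# A JSON encoder must emit \" for each quote and \\ for each backslash.
encoded = json.dumps(value)
print(encoded)  # "say \"hi\" \\ bye"

# Round-tripping recovers the original string only if escaping was done.
assert json.loads(encoded) == value
```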

An extra thank-you:

I've never code-named releases but if I were to code-name 5.5.0 I would call it "aborruso". Andrea has contributed many fantastic feature requests, as well as driving a huge volume of Miller-related discussions in StackExchange (#212). Mille grazie al mio amico @aborruso!