Shell WTFs
Blogged about many of these:
- parsing bash is undecidable -- arrays vs. associative arrays with `"${a[i+1]}"` (see the sketch after this list)
- word splitting as a hack for lack of arrays
- `${}` language ambiguity with `${####}` and `${x///}`, etc.
- `exec {fd}< input.txt` is a terrible syntax for `fd = open('input.txt')`
- `test` builtin ambiguity
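
A minimal sketch of the subscript ambiguity (bash; the variable names are only illustrative):

```sh
declare -a indexed=(x y z)
i=1
echo "${indexed[i+1]}"             # indexed array: the subscript is arithmetic -- prints 'z'

declare -A assoc=(['i+1']=value)
echo "${assoc[i+1]}"               # associative array: the subscript is the literal key -- prints 'value'
```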
Other:
- programming errors are confused with runtime errors:
  - trying to assign to or `unset` a readonly variable just causes a status 1, which can be ignored. Need `errexit` to make it a hard failure. (see the sketch below)
  - if you pass too many arguments to `continue`, it prints an error, but might continue anyway (dash/mksh/zsh) or exit the shell (bash)
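
  A sketch of the readonly problem (behavior observed in bash; the variable name is illustrative):

  ```sh
  readonly x=1
  x=2                      # "x: readonly variable" -- but only exit status 1
  echo 'still running'     # the failure is easy to ignore
  set -e
  x=3                      # with errexit it finally becomes a hard failure
  ```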
- dynamic scope
- `errexit` problems -- subshell/command sub, local -- two different issues
- `getopts` builtin is implemented in all shells, but `OPTIND` is a global variable with wildly diverging behavior. There's no reliable way to tell when it should be reset, because `getopts` is called in a loop. This is a fundamental design flaw. (see the sketch below)
  - it also sets globals `OPTARG` and the second `opt` argument
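
  A sketch of the reset problem (bash; the function and flag names are made up):

  ```sh
  parse() {
    while getopts 'x:' opt; do
      echo "flag x = $OPTARG"
    done
  }
  parse -x one    # prints: flag x = one
  parse -x two    # prints nothing: the global OPTIND still points past the
                  # first call's arguments (adding 'local OPTIND=1' works around it in bash)
  ```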
- issue #3, arithmetic parsing at runtime (see the sketch below)
  - this is actually ShellShock-like behavior in not just bash, but bash and all ksh derivatives!
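
  A sketch of why this matters (bash; the variable name and payload are made up):

  ```sh
  # untrusted data reaching an arithmetic context can execute commands
  n='a[$(echo INJECTED >&2)]'
  echo $(( n + 1 ))    # bash evaluates n recursively, and the array subscript
                       # undergoes command substitution -- INJECTED appears on stderr
  ```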
- `eval` and `echo` shouldn't implicitly join multiple args -- this is a confusion of strings and arrays
  - someone at RC was confused about this
- `trap` shouldn't take a string to be eval'd? Why not the name of a function?
- multiple expression languages per type, leads to WTFs (see the sketch below)
  - `(( a = b ))` is assignment of variable names
  - `(( a == b ))` is equality of variable names
  - `[[ a = b ]]` is equality of strings, like `[[ 'a' == 'b' ]]`
  - `[[ a == b ]]` is equality of strings, like `[[ 'a' == 'b' ]]`
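
  A minimal sketch of the difference (bash):

  ```sh
  a=5; b=5
  (( a == b )) && echo 'arithmetic: equal'   # names are dereferenced: prints
  [[ a == b ]] && echo 'strings: equal'      # compares the literal strings 'a' and 'b': silent
  ```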
- undefined variables are `0` in the arithmetic context
- multiple `+=` operators (see the sketch below)
  - `a+=b` vs. `(( a += b ))`
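
  A minimal sketch (bash):

  ```sh
  a=1; b=2
  (( a += b )); echo "$a"   # 3  -- arithmetic addition
  a=1
  a+=b; echo "$a"           # 1b -- outside (( )), += appends the literal string 'b'
  ```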
- `type-compat.test.sh` -- horrible runtime parsing of array declarations pointed out by Nix devs
  - there is a fundamental redundancy between literals like `a=()` and `declare +a myarray=()`
- runtime globbing -- it shouldn't happen after variable substitution. Then you can end up globbing untrusted data? `shopt -s simple_word_eval` fixes this in Oil.
- `$* "$*" $@ "$@"` are not orthogonal. You never need `$*` and `$@`. `"$*"` joins by IFS?
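
  A minimal sketch of the differences (bash):

  ```sh
  set -- 'a b' c
  printf '[%s]\n' "$@"   # [a b] [c]   -- arguments preserved
  printf '[%s]\n' "$*"   # [a b c]     -- joined with the first character of IFS
  printf '[%s]\n' $*     # [a] [b] [c] -- word-split again (and subject to globbing)
  ```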
- hacky syntax rules
  - here doc `EOF` vs `'EOF'` / `"EOF"` / `\EOF` -- this is a very hacky rule. The thing that's easiest to implement.
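
    A minimal sketch of the quoted-delimiter rule (any POSIX shell):

    ```sh
    cat <<EOF
    $HOME is expanded here
    EOF

    cat <<'EOF'
    $HOME is literal here
    EOF
    ```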
  - `getopts` leading `:` for error handling is hacky
- `read` shouldn't return 1 on lack of newline -- it still modified the variable
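
  A minimal sketch (bash):

  ```sh
  printf 'no newline' | { read -r line; echo "status=$? line=$line"; }
  # status=1 line=no newline  -- read "failed", but the variable was still set
  ```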
- `[[ foo.py == *.py ]]` shouldn't do globbing, should be a different operator
- bash WTF: a different lex state for `[[ foo =~ (.*) ]]` -- no quotes needed, in fact no quotes allowed! (see the sketch below)
  - the `( ) |` chars are special
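
  A minimal sketch (bash >= 3.2):

  ```sh
  [[ foobar =~ (foo|bar) ]] && echo 'regex match'      # unquoted: treated as an ERE, matches
  [[ foobar =~ '(foo|bar)' ]] && echo 'literal match'  # quoted: treated as a literal string, no match
  ```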
- arrays:
  - `${myarray}` is the same as `${myarray[0]}`
  - `${mystr[@]}` is silently allowed
  - decay to strings on equality -- `[[ "${a[@]}" == "${b[@]}" ]]` doesn't work (see the sketch below)
  - until bash 4.4, lack of ability to use empty arrays and `set -u`
    - fundamental confusion between unset variables and empty arrays. present in `mksh`.
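
  A sketch of the decay problem (bash; the arrays are made up):

  ```sh
  a=('x y'); b=(x y)    # different arrays...
  [[ "${a[@]}" == "${b[@]}" ]] && echo 'compare equal'   # ...but both decay to the string 'x y'
  ```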
- extended glob
  - overloading of `*` in `*(a*|b)`
  - bash specific: `shopt -s extglob; echo @(a|b)` gives a syntax error, but if you change the `;` to a newline, it doesn't. It does dynamic parsing!!!
  - ambiguity of `[[ !(a == a) ]]` -- is it a negation of an equality test, or an extended glob? See doc/osh-manual.md.
  - use case: matching `*.py` without `*_test.py` with extended glob: `echo */!(*_test).py`
    - this syntax is confusing! not at all like regexes!
    - I guess `!(*_test)` is like a negative lookahead and then `.*`?
- var ref
  - `${!ref}` quirks: `set -u` is respected with strings, but not arrays
- Overloaded `!` and `@` syntax:
  - `${!ref}` is a var ref, `${a[@]}` is an array, but `${!a[@]}` is not a ref to an array! It means something totally different.
  - this means that when substituting var refs, it's hard to know how many args will be generated
- argument parsing: `set -eou pipefail` is a very confusing syntax to parse. `set -oo` or `set -ee`.
- Too many sublanguages, most of them fully recursive:
  - command
  - word
  - arithmetic
  - `[[`, and then at runtime `test` / `[`
  - brace expansion -- this is recursive
  - glob -- non-recursive, but extended glob is recursive
  - regular expressions -- recursive
- `IFS` is used with two different algorithms: splitting a line for `read`, and "splicing" an unquoted word into an argv array. POSIX says they are related, but in practice they seem different? At the very least, one supports backslash escaping and the other doesn't (`read -r`). Or you can look at it a different way: one supports quotes AND backslashes; the other supports just backslashes.
- two different syntaxes for octal C escapes: `echo -e '\0377'` and `echo $'\377'`. FWIW C is the latter -- don't need a leading zero, and Python uses it.
- string variables with hidden structure
  - the first char of `$PS4` is treated differently
  - characters in `$IFS` are treated differently, depending on whether they're whitespace or not.
- `break` or `continue` in a subshell in a loop is syntactically valid, but doesn't do what it looks like because of the process boundary. (from Connor at RC, see `spec/loop.test.sh`)
- Assignments can have redirects:
  - `FOO=$(ls /OOPS 2>/dev/null)` vs. `FOO=$(ls /OOPS) 2>/dev/null`
  - assignment builtins can also have redirects
- Semicolon vs. newline can be significant!!! Sometimes a `;` doesn't behave the same as a newline.
  - Sometimes shells miss optimizations (`test/syscall`)
  - Bash's failglob behaves differently. It aborts everything on the same line, even if there's a `;`. But it doesn't abort across lines.
- word elision is confusing and can result in command elision, e.g. `$(true)`. From `help-bash@`.
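
  A minimal sketch (any POSIX shell):

  ```sh
  $(true)          # expands to nothing: the whole command is elided, no error
  $(echo echo hi)  # expands to 'echo hi' and runs it
  ```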
- Double quotes within double quotes is an awkward syntax, but sometimes necessary: `echo "${x:-"a*b"}"`
- single quoted arg to double quoted brace sub is treated differently based on operator (see the sketch below)
  - `"${x:-'default'}"` -- single quotes are literals
  - `"${x#'glob'}"` and `"${x//'glob'}"` -- single quotes are processed by the shell
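
  A minimal sketch (bash; the values are made up):

  ```sh
  unset x
  echo "${x:-'default'}"   # 'default'  -- the single quotes come out literally
  x=aXb
  echo "${x#'aX'}"         # b          -- here the single quotes are shell quoting
  ```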
- Brace expansion: ascending/descending ranges and positive/negative steps give too many degrees of freedom. The semantics are inconsistent and confusing. In bash, `{1..7..2}` and `{1..7..-2}` are the same thing, but not in `zsh`.
- The same syntax is reused for different semantics (see the sketch below)
  - sometimes a word is in a context where you need a sequence of strings (`EvalWordSequence()`):
    - SimpleCommand
    - for loops
    - arrays
  - sometimes a word is in a context where you need a single string (`EvalWordToString()`):
    - `x=$word`
    - `case $word in $pat) ...`
    - `echo hi > $word`
  - `"$@"` decays in one case but not the other. `shopt -s strict-array` eliminates this.
  - Oil fixes this by only allowing `@words` in the "sequence of strings" context, but not in the "string" context
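
  A minimal sketch of the two contexts (bash):

  ```sh
  word='a b'
  x=$word                   # single-string context: no splitting, x is 'a b'
  printf '[%s]\n' $word     # sequence context: splits into [a] and [b]
  case $word in ('a b') echo 'case does not split' ;; esac
  ```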
- `set` without args shows VARIABLES, even though the `set` builtin sets shell options, not variables!
  - `set -o` shows the options, with confusing/inconsistent syntax
- `printf`
  - `%c` to get a char doesn't respect unicode; it will slice a UTF-8 character, producing binary garbage (see the sketch below)
  - `%6.4s` is overspecified -- `%6s` is the same
  - `%6.4d` does something weird -- it pads with zeros AND spaces. It doesn't mean "width" and "precision".
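
  A minimal sketch (bash; the values are made up):

  ```sh
  printf '%6.4d\n' 42    # '  0042' -- zero-padded to 4 digits, then space-padded to width 6
  printf '%c\n' 'μ'      # emits only the first byte of the multi-byte UTF-8 sequence
  ```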
- `[[]` is a single left bracket. Conflicts with `[[:alpha:]]`. User should write `[\[]` instead.
  - Likewise, `[]]` should be `[\]]`.
- With `${#s}` or `${s:1:3}`, invalid utf-8 causes nonsensical results to be returned. No errors are reported.
, invalid utf-8 causes nonsensical results to be returned. No errors are reported. - The stack doesn't line up!
BASH_SOURCE
is off by one fromFUNCNAME
andBASH_LINENO
This is documented but makes no sense! Sort of like the parsing of regexes after=~
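
  A sketch of the documented off-by-one (bash; the function name is made up):

  ```sh
  # per the bash manual, ${FUNCNAME[i]} was called from file ${BASH_SOURCE[i+1]}
  # at line ${BASH_LINENO[i]} -- note the mismatched indices
  where() {
    echo "${FUNCNAME[0]} called from ${BASH_SOURCE[1]}:${BASH_LINENO[0]}"
  }
  ```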
- History substitution syntax is ill-defined, with hacks to avoid conflict with `${!indirect}`, `!(foo|bar)`, etc.
  - there is a horrible code snippet in `bashline.c` I believe
  - https://github.com/oilshell/oil/commit/22ea84e43289c5ea3e26917b171a8016a88cff26
- `${arr[@]::}` means length zero, while `${arr[@]: }` means length N -- empty expression is zero, unset expression is N
  - UNDOCUMENTED: #oil-discuss > `${arr[@]::}` in bash - is it documented?
TODO: organize the criticisms in these categories:
- syntactic puns: the same character is used to mean different things
- opposite problem: different characters/conventions are used to mean the same thing (negation, etc.)
  - `(( a == b ))` vs `[[ a == b ]]` (although they differ slightly)
- sloppiness with types: string, array, undefined vs. empty
- dynamic parsing -- confusing data and code.
  - arithmetic inside strings: `s=1+2; [[ $s -eq 3 ]]`
  - `echo -e '\n'` and `printf '\n' "\n"` vs. `$'\n'`
  - local, declare, etc. and array syntax (`type-compat.test.sh`)
  - `shopt -s extglob` changes the parsing algorithm, and it doesn't work on the same line!!! `bash -c 'shopt -s extglob; echo @(a|b)'`
- macro processing / word expansion confuses code and data
  - the "else-whatever" pattern confuses code and data
  - globs that don't match evaluate to themselves (fixed by `nullglob` and `simple-word-eval`)
  - syntax errors in brace expansion evaluate to themselves
  - tilde expansion evaluates to itself if it doesn't exist
- lack of error checking / invalid input.
  - `echo -e \x` is `NUL` in mksh and zsh, but `\x` in bash. It's a syntax error in C. Shell generally has the "keep going" mindset of JavaScript/PHP/Perl, which makes it hard to use.
  - likewise with `\1` -- should be a syntax error. Or even `\d` should be `\\d`.
  - TODO: maybe strict-backslash can handle this?
- Escaping constructs: `\`, `'single quotes'`, `"double quotes"`, and `$'C-style strings'`
  - arbitrary `CompoundWord` to `glob()` or `fnmatch()` input, which allows `\` escaping but not double quoting.
  - arbitrary `CompoundWord` to `regcomp()` input, where characters like `[` are special too
  - respect `\` escape in `read` without `-r`
  - `\n` outside of double quotes evaluates to `n`. Inside double quotes, it's `\n` (which is the same as the behavior inside single quotes). Note that neither evaluates to a newline! That only happens with `$'\n'`.
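
    A minimal sketch (bash's default `echo`, without `-e`):

    ```sh
    echo \n       # n   -- the backslash is dropped outside quotes
    echo "\n"     # \n  -- preserved, but still not a newline
    echo $'\n'    # an actual newline
    ```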
  - The quoting of `$(command subs)` is different than that of `` `backticks` ``, e.g. with respect to double quotes and other backticks. This is very confusing and shell behaviors diverge once you have 2 or 3 levels of quoting.
- `BASH_REGEX` and `REGEX_CHARS` lexer modes. This is orthogonal to the `regcomp()` algorithm
  - Pathological example: `[[ foo =~ [ab]<>(foo|bar) ]]` ???
- Different leading char for flag: `set -e` vs `set +e`, `declare -a` vs. `declare +a`
- Different flags: `shopt -s` vs `shopt -u`
- An Extra Flag:
  - `export` vs. `export -n` -- remove the export bit
- Different builtin: `alias` and `unalias` are opposites
  - `set` and `unset` aren't opposites! One sets options and argv. The other unsets variables.
- capitalization: `echo -e` vs `echo -E`
- No args: `set` -- prints functions
  - `readonly`, `export` -- prints vars with those properties
- `-p` arg: `declare -p`
  - `shopt -p` -- prints both `set` and `shopt` options
  - `alias -p`
test
-- no reason for this other than speed? -
time
-- because it should be a block? But you could do this with a more general mechanism -
kill
-- for job specs -
printf
-- don't see a reason for this -
getopts
-- tighter integration, because we want to mutate shell variables. Doesn't behave like a builtin, but has the syntax of one.
- all the flags: `read -n`, `echo -n`, etc.
- not shell, but a common pattern: `date +%m` vs `date +%M` -- I can never remember which. I don't know what `+` means either.
- `tar xvzf foo.tar.gz` can just be `tar -x -v -z < foo.tar.gz`
  - or `tar --verbose --extract --gzip < foo.tar.gz`
See Unix Tools
A questionable Pattern? These builtins don't behave like external commands because they can mutate memory.

- `read varname`
- `getopts SPEC varname`
- `printf -v name '%s' value`