Parsing and unparsing has super-linear runtime #8

Open
thurstond opened this issue Jan 28, 2021 · 6 comments
Labels
enhancement New feature or request

Comments

@thurstond

thurstond commented Jan 28, 2021

Parsing and unparsing have super-linear runtime, largely because OCaml list append and string concatenation are not optimized.

Parsing: parse_tilde quadratic (/cubic?) behavior

for n in 1000 2000 4000 8000 16000 32000 64000; do perl -e "print '~'; print 'a' x $n;" > /tmp/t; time ./parse_to_json.native /tmp/t > /dev/null; done

user    0m0.015s
user    0m0.034s
user    0m0.153s
user    0m0.830s
user    0m4.540s
user    0m27.661s
user    3m4.591s

(real/sys runtime omitted for brevity)
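
One rough way to read the timings above: if runtime grows like c * n^k, then doubling n multiplies the time by 2^k, so k can be estimated as log2 of the ratio between consecutive rows. The sketch below does that arithmetic on the user times listed above (with 3m4.591s written as 184.591 seconds). The names `times` and `exponents` are made up for illustration, and the small inputs are probably dominated by startup overhead, so only the last few estimates say much.

```ocaml
(* User times in seconds from the parse_tilde run above; n doubles each row. *)
let times = [0.015; 0.034; 0.153; 0.830; 4.540; 27.661; 184.591]

(* For t ~ c * n^k, doubling n multiplies t by 2^k, so k = log2 (t2 / t1). *)
let rec exponents = function
  | t1 :: (t2 :: _ as rest) -> (log (t2 /. t1) /. log 2.0) :: exponents rest
  | _ -> []

let () =
  (* Print one estimated exponent per doubling of n. *)
  List.iter (fun k -> Printf.printf "%.2f\n" k) (exponents times)
```

On these numbers the estimates for the last few doublings fall between 2 and 3, which fits the "quadratic or cubic?" uncertainty rather than settling it.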

Notes:

  • This is likely because of ast.ml: | c::s' -> parse_tilde (acc @ [c]) s'. It's not obvious to me why it seems to be cubic (rather than quadratic) runtime though.
    • An alternative would be to prepend to the list, and then reverse the result at the end.
  • The parse_to_json.native test application is from https://github.com/andromeda/pash/tree/main/compiler/parser
  • To avoid conflating the libdash runtime with the JSON serialization, I disabled the JSON serialization code (print_ast) in the test app
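
As a minimal sketch of the prepend-then-reverse alternative mentioned in the notes above (not the actual ast.ml code; `collect_append` and `collect_rev` are hypothetical names), the two accumulation styles look like this:

```ocaml
(* Quadratic: each [acc @ [c]] copies the whole accumulator built so far. *)
let rec collect_append acc = function
  | [] -> acc
  | c :: rest -> collect_append (acc @ [c]) rest

(* Linear: cons onto the front, then reverse once at the end. *)
let rec collect_rev acc = function
  | [] -> List.rev acc
  | c :: rest -> collect_rev (c :: acc) rest

let () =
  (* Both styles produce the same list; only the cost differs. *)
  let input = List.init 1000 (fun i -> Char.chr (97 + (i mod 26))) in
  assert (collect_append [] input = collect_rev [] input)
```

The accumulator is held in reverse order while scanning, so anything that reads it mid-scan has to reverse it first.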

Parsing: to_assign cubic (?) behavior

for n in 1000 2000 4000 8000 16000 32000 64000; do perl -e "print 'f' x $n; print '=pay_respects;'" > /tmp/t1; time ./parse_to_json.native /tmp/t1 > /dev/null; done

user    0m0.011s
user    0m0.021s
user    0m0.098s
user    0m0.571s
user    0m3.749s
user    0m25.097s
user    2m52.969s

This also has an expensive list append operation: ast.ml | C c :: a -> to_assign (v @ [c]) a.

Unparsing: ^ string concatenation considered harmful

OCaml's "^" string operator is not optimized; concatenating n strings one at a time can take O(n^2) runtime (https://discuss.ocaml.org/t/whats-the-fastest-way-to-concatenate-strings/2616/7). This is arguably a compiler issue; for example, CPython optimizes for common cases (https://mail.python.org/pipermail/python-dev/2013-February/124031.html).

ast.ml's unparsing is essentially a series of "^" operations, hence everything is going to have a worst-case runtime that's super-linear.

Here's an example of a long pipeline, showing quadratic runtime for json_to_shell:

for n in 2000 4000 8000 16000 32000 64000 128000; do (echo -n "echo 'Hello '"; for f in `seq 1 $n`; do echo -n "| cat "; done; echo) > /tmp/s1; cat /tmp/s1 | ./parse_to_json.native > /tmp/j1; time ./json_to_shell.native /tmp/j1 | md5sum; done

user    0m0.035s
user    0m0.112s
user    0m0.508s
user    0m1.920s
user    0m13.909s
user    0m52.966s
user    4m26.274s

I also tried removing the Ast.to_string operation in json_to_shell.ml, to show that the JSON deserialization by itself is fast (linear); thus, the quadratic runtime is due to the libdash core.

Unparsing: fresh_marker for heredocs is slow on adversarial inputs

fresh_marker tries increasingly long variants of the marker (EOF, EOFF, EOFFF, ...) until it finds one that is not contained in the heredoc. This is slow for adversarial inputs that deliberately use all those markers:

cat <<CTHULHU
EOF
EOFF
EOFFF
EOFFFF
EOFFFFF
CTHULHU
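
The search described above can be sketched roughly as follows. This is an illustration under assumptions, not the real fresh_marker: `contains_sub` and `fresh_marker_sketch` are hypothetical names, and the sketch assumes a plain substring test against the whole heredoc body.

```ocaml
(* [contains_sub s sub] is true when [sub] occurs somewhere in [s] (naive scan). *)
let contains_sub s sub =
  let n = String.length s and m = String.length sub in
  let rec at i = i + m <= n && (String.sub s i m = sub || at (i + 1)) in
  at 0

(* Hypothetical stand-in for fresh_marker: try EOF, EOFF, EOFFF, ... in turn. *)
let rec fresh_marker_sketch body marker =
  if contains_sub body marker then fresh_marker_sketch body (marker ^ "F")
  else marker

let () =
  (* The heredoc body from the example above. *)
  let body = "EOF\nEOFF\nEOFFF\nEOFFFF\nEOFFFFF\n" in
  print_endline (fresh_marker_sketch body "EOF")
```

Every rejected candidate costs another scan of the body, so a body that contains k successive markers forces k + 1 scans before a free one is found.
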
@mgree
Collaborator

mgree commented Jan 28, 2021

These are amazing (a/k/a awful) performance issues, thanks for finding them!

@thurstond
Author

Another source of inefficiency in parse_tilde is that OCaml evaluates eagerly by default, so implode is called on every per-character iteration of parse_tilde, instead of just once when the ret value is actually needed.
That is, for an n-character input, implode is called about n times, leading to O(n^2) runtime.

Code snippet where this happens in ast.ml:

parse_tilde acc = 
  let ret = if acc = [] then None else Some (implode acc) in
  function
  | [] -> (ret , [])
  ...
  | c::s' -> parse_tilde (acc @ [c]) s'  

To show this, I changed dash.ast to add a printf to the implode function:

let implode l =
  let s = Bytes.create (List.length l) in
  let rec imp i l =
    match l with
    | []  -> ()
    | (c::l) -> (Bytes.set s i c; imp (i+1) l)
  in
  imp 0 l;
  Printf.printf "implode created: %s\n" (Bytes.to_string s);
  Bytes.unsafe_to_string s

Output:

/pash/compiler/parser# echo "~lovecraft" | ./parse_to_json.native
implode created: l
implode created: lo
implode created: lov
implode created: love
implode created: lovec
implode created: lovecr
implode created: lovecra
implode created: lovecraf
implode created: lovecraft
["Command",[1,[],[[["T",["Some","lovecraft"]]]],[]]]

There is a "Lazy" module in OCaml, or just use Haskell 🙃
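
For reference, a minimal sketch of what the Lazy route could look like. The `lazy` keyword and `Lazy.force` are standard OCaml; `scan`, the call counter, and this simplified `implode` are stand-ins invented for the example, not the ast.ml definitions.

```ocaml
(* Stand-in for ast.ml's implode, instrumented to count how often it runs. *)
let calls = ref 0

let implode l =
  incr calls;
  String.of_seq (List.to_seq l)

(* Simplified scanner: [ret] is a suspended computation, so implode only
   runs if and when [Lazy.force ret] is reached. *)
let rec scan acc =
  let ret = lazy (if acc = [] then None else Some (implode (List.rev acc))) in
  function
  | [] -> Lazy.force ret
  | c :: rest -> scan (c :: acc) rest

let () =
  match scan [] ['l'; 'o'; 'v'; 'e'] with
  | Some s -> Printf.printf "%s (implode calls: %d)\n" s !calls
  | None -> print_endline "empty"
```

A suspension is still allocated on every iteration, so this trades the repeated implode work for a small constant cost per character.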

@mgree
Collaborator

mgree commented Jan 29, 2021

😬 😅

@thurstond
Author

Linear-time versions of parse_tilde and to_assign (changing list append to list prepend + one-off reverse, and removing the eager implode in parse_tilde):

and maybe_implode_rev acc =
  if acc = [] then None else Some (implode (List.rev acc))

and parse_tilde acc =
  function
  | [] -> (maybe_implode_rev acc, [])
  (* CTLESC *)
  | '\129'::_ as s -> None, s
  (* CTLQUOTEMARK *)
  | '\136'::_ as s -> None, s
  (* terminal: CTLENDVAR, /, : *)
  | '\131'::_ as s -> maybe_implode_rev acc, s
  | ':'::_ as s -> maybe_implode_rev acc, s
  | '/'::_ as s -> maybe_implode_rev acc, s
  (* ordinary char *)
  (* TODO 2019-01-03 only characters from the portable character set *)
  | c::s' -> parse_tilde (c :: acc) s'

  • The copy-and-pasted maybe_implode_rev acc is not as pretty as having ret (or a lazy version of it), but it's fast.

and to_assign v = function
  | [] -> failwith ("Never found an '=' sign in assignment, got " ^ implode (List.rev v))
  | C '=' :: a -> (implode (List.rev v),a)
  | C c :: a -> to_assign (c :: v) a
  | _ -> failwith "Unexpected special character in assignment"

@mgree
Collaborator

mgree commented Feb 18, 2021

Relatedly, parse_arg and arg_char are using the call stack when they shouldn't need to.

mgree added the enhancement New feature or request label Feb 24, 2022
@mgree
Collaborator

mgree commented Jul 10, 2022

After #17 merges, the only remaining issue is the pervasively expensive string concatenation. Converting to_string to use Buffer should resolve this last issue.
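
A minimal sketch of the difference, assuming nothing about how to_string is actually structured: `join_concat` and `join_buffer` are hypothetical helpers, while `Buffer.create`, `Buffer.add_string` and `Buffer.contents` are the standard library calls.

```ocaml
(* Quadratic in total length: each [^] allocates a new string and copies both sides. *)
let join_concat parts = List.fold_left (fun acc p -> acc ^ p) "" parts

(* Amortized linear: the buffer grows in place and is copied out once. *)
let join_buffer parts =
  let buf = Buffer.create 64 in
  List.iter (Buffer.add_string buf) parts;
  Buffer.contents buf

let () =
  (* A long pipeline similar to the json_to_shell benchmark input. *)
  let parts = "echo 'Hello '" :: List.init 2000 (fun _ -> "| cat ") in
  assert (join_concat parts = join_buffer parts)
```

In a recursive unparser, the usual shape is to thread one Buffer through the traversal and have each case append to it, rather than returning strings that the caller then concatenates.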

mgree added a commit that referenced this issue Dec 14, 2023
* options: Do not set commandname in procargs

We set commandname in procargs when we don't have to.  This results
in a duplicated output of arg0 when an error occurs.

Reported-by: Olivier Duclos <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* expand: Fix double-decrement in argstr

Due to a double decrement in argstr we may miss field separators
at the end of a word in certain situations.

Reported-by: Martijn Dekker <[email protected]>
Fixes: 3cd5386 ("expand: Do not reprocess data when...")
Signed-off-by: Herbert Xu <[email protected]>

* eval: Reset handler when entering a subshell

As it is a subshell can execute code that is only meant for the
parent shell when it executes a longjmp that is caught by something
like evalcommand.  This patch fixes it by resetting the handler
when entering a subshell.

Reported-by: Martijn Dekker <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* parser: Fix old-style command substitution here-document crash

On Wed, Jul 25, 2018 at 12:38:27PM +0000, project-repo wrote:
> Hi,
> I am working on a project in which I use the honggfuzz fuzzer to fuzz open
> source software and I decided to fuzz dash. In doing so I discovered a
> NULL pointer dereference in src/redir.ch on line 305. Following is a
> backtrace as supplied by the address sanitizer:
>
> AddressSanitizer:DEADLYSIGNAL
> =================================================================
> ==39623==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x0000005768ed bp 0x7ffc00273df0 sp 0x7ffc00273c60 T0)
> ==39623==The signal is caused by a READ memory access.
> ==39623==Hint: address points to the zero page.
>     #0 0x5768ec in openhere /home/jfe/dash/src/redir.c:305:29
>     #1 0x574d92 in openredirect /home/jfe/dash/src/redir.c:230:7
>     #2 0x5737fe in redirect /home/jfe/dash/src/redir.c:121:11
>     #3 0x576017 in redirectsafe /home/jfe/dash/src/redir.c:424:3
>     #4 0x522326 in evalcommand /home/jfe/dash/src/eval.c:828:11
>     #5 0x520010 in evaltree /home/jfe/dash/src/eval.c:288:12
>     #6 0x5270da in evaltreenr /home/jfe/dash/src/eval.c:332:2
>     #7 0x526f04 in evalbackcmd /home/jfe/dash/src/eval.c:640:3
>     #8 0x539020 in expbackq /home/jfe/dash/src/expand.c:522:2
>     #9 0x5332d7 in argstr /home/jfe/dash/src/expand.c:343:4
>     #10 0x5322f7 in expandarg /home/jfe/dash/src/expand.c:196:2
>     #11 0x528118 in fill_arglist /home/jfe/dash/src/eval.c:659:3
>     #12 0x5213b6 in evalcommand /home/jfe/dash/src/eval.c:769:13
>     #13 0x520010 in evaltree /home/jfe/dash/src/eval.c:288:12
>     #14 0x554423 in cmdloop /home/jfe/dash/src/main.c:234:8
>     #15 0x553bcc in main /home/jfe/dash/src/main.c:176:3
>     #16 0x7f201c2b2a86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21a86)
>     #17 0x41dfb9 in _start (/home/jfe/dash/src/dash+0x41dfb9)
>
> AddressSanitizer can not provide additional info.
> SUMMARY: AddressSanitizer: SEGV /home/jfe/dash/src/redir.c:305:29 in openhere
> ==39623==ABORTING
>
> This bug can be reproduced by running "dash < min" where min is the file
> attached. I was able to reproduce this bug with the current git version
> and the current debian version.
>
> cheers
> project-repo
>
> <<A
> `<<A(`

Thanks for the report! This is caused by the recent change to
save/restore here-document list around command substitutions.  In
doing so we must finish existing here-documents prior to restoring
the old here-document list.  This is done for new-style command
substitutions but not for old-style.

This patch fixes it by doing it for both.

Reported-by: project-repo <[email protected]>
Fixes: 51e2d88 ("parser: Save/restore here-documents in...")
Signed-off-by: Herbert Xu <[email protected]>

* expand: Fix trailing newlines processing in backquote expanding

According to POSIX.1-2008 we should remove newlines only at the end of
the substitution. Newlines-only substitutions cause dash to remove
newlines before the beginning of the substitution. The following code:

    cat <<END
    1
    $(echo "")
    2
    END

prints "1<newline>2" instead of expected "1<newline><newline>2".

This patch fixes trailing newlines processing in backquote expanding.

Signed-off-by: Nikolai Merinov <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* parser: Only accept single-digit parameter expansion outside of braces

On Thu, Apr 25, 2019 at 01:39:52AM +0000, Michael Orlitzky wrote:
> The POSIX spec says,
>
>   The parameter name or symbol can be enclosed in braces, which are
>   optional except for positional parameters with more than one digit or
>   when parameter is a name and is followed by a character that could be
>   interpreted as part of the name.
>
> However, dash seems to diverge from that behavior when we get to $10:
>
>   $ cat test.sh
>   echo $10
>
>   $ dash ./test.sh one two three four five six seven eight nine ten
>   ten
>
>   $ bash ./test.sh one two three four five six seven eight nine ten
>   one0

This patch should fix the problem.

Signed-off-by: Herbert Xu <[email protected]>

* shell: delete AC_PROG_YACC

Signed-off-by: Herbert Xu <[email protected]>

* redir: Clear saved redirections in subshell

When we enter a subshell we need to drop the saved redirections
as otherwise a subsequent unwindredir could produce incorrect
results.

This patch does this by simply clearing redirlist.  While we
could actually free the memory underneath for subshells it isn't
really worth the trouble for now.

In order to ensure that this is done in every place where we enter
a subshell, this patch adds a new mkinit hook called forkreset.
The calls closescript, clear_traps and reset_handler are also added
to the forkreset hook.

This fixes a bug where the first two functions weren't called
if we enter a subshell without forking.

Reported-by: Harald van Dijk <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* builtin: Fix seconds part of times(1)

The seconds part of the times(1) built-in is wrong as it does not
exclude the minutes part of the result.  This patch fixes it.

This problem was first noted by Michael Greenberg who also sent
a similar patch.

Reported-by: Michael Greenberg <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* jobs: Rename DOWAIT_NORMAL to DOWAIT_NONBLOCK

To make it clearer what it is doing: nonblocking wait()

Signed-off-by: Denys Vlasenko <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* var: Remove poplocalvars() always-zero argument, make it static

Signed-off-by: Denys Vlasenko <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* jobs: Fix infinite loop in waitproc

After we changed the resetting of gotsigchld so that it is only
done if jp is NULL, we can now get an infinite loop in waitproc
if gotsigchld is set but there is no outstanding child because
everything had been waited for previously without gotsigchld being
zeroed.

This patch fixes it by always zeroing gotsigchld as we did before.
The bug that the previous patch was trying to fix is now resolved
by switching the blocking mode to DOWAIT_NORMAL after the specified
job has been completed so that we really do wait for all outstanding
dead children.

Reported-by: Harald van Dijk <[email protected]>
Fixes: 6c691b3 ("jobs: Only clear gotsigchld when waiting...")
Signed-off-by: Herbert Xu <[email protected]>

* parser: Fix handling of empty aliases

Dash was incorrectly handling empty aliases. When attempting to use an
empty alias with nothing else, I'm (incorrectly) prompted for more
input:

```
$ alias empty=''
$ empty
>
```

Other shells (e.g., bash, yash) correctly handle the lone, empty alias as an
empty command:

```
$ alias empty=''
$ empty
$
```

The problem here is that we incorrectly enter the loop eating TNLs
in readtoken().  This patch fixes it by setting checkkwd correctly.

Reported-by: Michael Greenberg <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* parser: Catch errors in expandstr

On Fri, Dec 13, 2019 at 02:51:34PM +0000, Simon Ser wrote:
> Just noticed another dash bug: when setting invalid PS1 values dash
> enters an infinite loop.
>
> For instance, setting PS1='$(' makes dash print many of these:
>
>    dash: 1: Syntax error: end of file unexpected (expecting ")")
>
> It would be nice to fallback to the default PS1 value on error.

This patch fixes it by using the literal value of PS1 should an
error occur during expansion.

On Wed, Feb 26, 2020 at 09:12:04PM +0000, Ron Yorston wrote:
>
> There's another case that should be handled.  PS1='`xxx(`' causes the
> shell to exit because the old-style backquote leaves an additional file
> on the stack.

Ron's change has been folded into this patch.

Reported-by: Simon Ser <[email protected]>
Reported-by: Ron Yorston <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* parser: Fix alias expansion after heredoc or newlines

This script should print OK:

	alias a="case x in " b=x
	a
	b) echo BAD;; esac

	alias BEGIN={ END=}
	BEGIN
		cat <<- EOF > /dev/null
			$(:)
		EOF
	END

	: <<- EOF &&
		$(:)
	EOF
	BEGIN
		echo OK
	END

However, because the value of checkkwd is either zeroed when it
shouldn't, or isn't zeroed when it should, dash currently gets
it wrong in every case.

This patch fixes it by saving checkkwd and zeroing it where needed.

Suggested-by: Harald van Dijk <[email protected]>
Reported-by: Harald van Dijk <[email protected]>
Reported-by: Martijn Dekker <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* expand: Remove unused expandmeta() flag parameter

Signed-off-by: Denys Vlasenko <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* shell: mktokens relative TMPDIR

The mktokens script fails when /tmp isn't writable (e.g., when building
in a sandbox with a different TMPDIR). Replace absolute references to
/tmp to relative references to TMPDIR. If TMPDIR is unset or null,
default to /tmp.

The mkbuiltins script was already hardened to work relative to TMPDIR,
also defaulting to /tmp.

v2 ensures that TMPDIR is quoted.
v3 adds an extra quotation that prevents extra pathname expansions.

Signed-off-by: Michael Greenberg <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* input: Fix compiling against libedit with -fno-common

With -fno-common, which will be enabled by default in GCC 10, we see
this error:

ld: input.o:(.bss+0x0): multiple definition of `el';
histedit.o:(.bss+0x8): first defined here

To fix this, simply remove the definition as it is not needed.

Signed-off-by: Jeroen Roovers <[email protected]>
Signed-off-by: Mike Gilbert <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* shell: Always use explicit large file API

There are some remaining stat/readdir calls in dash that may lead
to spurious EOVERFLOW errors on 32-bit platforms.  This patch changes
them (as well as open(2)) to use the explicit large file API.

Reported-by: Tatsuki Sugiura <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* parser: Save and restore heredoclist in expandstr

On Sun, May 17, 2020 at 01:19:28PM +0100, Harald van Dijk wrote:
>
> This still does not restore the state completely. It does not clean up any
> pending heredocs. I see:
>
>   $ PS1='$(<<EOF "'
>   src/dash: 1: Syntax error: Unterminated quoted string
>   $(<<EOF ":
>   >
>
> That is, after entering the ':' command, the shell is still trying to read
> the heredoc from the prompt.

This patch saves and restores the heredoclist in expandstr.

It also removes a bunch of unnecessary volatiles as those variables
are only referenced in case of a longjmp other than one started by
a signal like SIGINT.

Reported-by: Harald van Dijk <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* shell: Fix typos

Signed-off-by: Martin Michlmayr <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* parser: Fix double-backslash nl in old-style command sub

When handling backslashes within an old-style command substitution,
we should not call pgetc_eatbnl because that would treat the next
backslash character as another escape character if it was then
followed by a new-line.

This patch fixes it by calling pgetc.

Reported-by: Matt Whitlock <[email protected]>
Fixes: 6bbc71d ("parser: use pgetc_eatbnl() in more places")
Signed-off-by: Herbert Xu <[email protected]>

* Release 0.5.11.

* parser: Get rid of PEOA

PEOA is a special character used to mark an alias as being finished
so that we don't enter an infinite loop with nested aliases.  It
complicates the parser because we have to ensure that it is skipped
where necessary and not copied to the resulting token text.

This patch removes it and instead delays the marking of aliases
until the second pgetc.  This has the same effect as the current
PEOA code while keeping the complexities within the input code.

Signed-off-by: Herbert Xu <[email protected]>

* eval: Prevent recursive PS4 expansion

Yaroslav Halchenko <[email protected]> wrote:
>
> I like to (ab)use PS4 and set -x for tracing execution of scripts.
> Reporting time and PID is very useful in this context.
>
> I am not 100% certain if bash's behavior (of actually running the command
> embedded within PS4 string, probably eval'ing it) is actually POSIX
> compliant, posh seems to not do that; but I think it is definitely not
> desired for dash to just stall:
>
> - the script:
>
> #!/bin/sh
> set -x
> export PS4='+ $(date +%T.%N) [$$] '
>
> echo "lets go"
> sleep 1
> echo "done $var"
>
> - bash:
>
> /tmp > bash --posix test.sh
> +export 'PS4=+ $(date +%T.%N) [$$] '
> +PS4='+ $(date +%T.%N) [$$] '
> + 09:15:48.982296333 [2764323] echo 'lets go'
> lets go
> + 09:15:48.987829613 [2764323] sleep 1
> + 09:15:49.994485037 [2764323] echo 'done '
> done
>
>
> - posh:
> exit:130 /tmp > posh test.sh
> +export PS4=+ $(date +%T.%N) [$$]
> + $(date +%T.%N) [$$] echo lets go
> lets go
> + $(date +%T.%N) [$$] sleep 1
> + $(date +%T.%N) [$$] echo done
> done
>
> - dash: (stalls it set -x)
>
> /tmp > dash test.sh
> +export PS4=+ $(date +%T.%N) [$$]
> ^C^C

This patch fixes the infinite loop caused by repeated expansions
of PS4.

Reported-by: Yaroslav Halchenko <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* redir: Retry open64 on EINTR

It is possible for open64 to block on named pipes, and therefore
it can be interrupted by signals and return EINTR.  We should only
let it fail with EINTR if real signals are pending (i.e., it should
not fail on SIGCHLD if SIGCHLD has not been trapped).

This patch adds a new helper sh_open to retry the open64 call if
necessary.  It also calls sh_error when appropriate.

Fixes: 3800d49 ("[JOBS] Fix dowait signal race")
Reported-by: Samuel Thibault <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* shell: Enable fnmatch/glob by default

As fnmatch(3) and glob(3) from glibc are now working consistently,
this patch enables them by default.

Signed-off-by: Herbert Xu <[email protected]>

* expand: Make glob(3) interruptible by SIGINT

If glob(3) is used then it can't be interrupted by SIGINT.  This
is bad when an expansion causes a large number of entries to be
generated.  This patch improves things by adding an int_pending
check to gl_opendir call.  Note that this is still not perfect,
e.g., the sort would still be uninterruptible.

Signed-off-by: Herbert Xu <[email protected]>

* error: Remove USE_NORETURN ifdef

The USE_NORETURN was added because gcc was buggy almost 20 years
ago.  This is no longer needed and this patch removes it.

Signed-off-by: Herbert Xu <[email protected]>

* jobs: Fix waitcmd busy loop

We need to clear gotsigchld in waitproc because it is used as
a loop conditional for the waitcmd case.  Without it waitcmd
may busy loop after a SIGCHLD.

This patch also changes gotsigchld into a volatile sig_atomic_t
to prevent compilers from optimising its accesses away.

Fixes: 6c691b3 ("jobs: Only clear gotsigchld when waiting...")
Signed-off-by: Herbert Xu <[email protected]>

* eval: Check nflag in evaltree instead of cmdloop

This patch moves the nflag check from cmdloop into evaltree.  This
is so that nflag will be in force even if we enter the shell via a
path other than cmdloop, e.g., through sh -c.

Reported-by: Joey Hess <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* man: fix formatting

Fix formatting according to the output of "mandoc -Tlint".

Overview:

  Start each sentence on a new line.

  Protect a punctuation mark in a macro call with '\&'.

  Trim trailing space.

  Add a missing comma in a row of words.

  Use an en-dash instead of '--' if there is space around it.
  An em-dash is used without space around it.

  Comment out ".Pp" macros that do nothing.

  Split long sentences after a punctuation mark.

  Remove a "-width ..." for a ".Bl -item" macro, as it has no influence

Details:

mandoc: ./src/bltin/echo.1:69:38: WARNING: new sentence, new line
mandoc: ./src/bltin/echo.1:75:35: WARNING: new sentence, new line
mandoc: ./src/bltin/printf.1:205:12: WARNING: skipping empty macro: No
mandoc: ./src/bltin/printf.1:284:28: STYLE: whitespace at end of input line
mandoc: ./src/bltin/printf.1:288:20: STYLE: whitespace at end of input line
mandoc: ./src/bltin/printf.1:293:28: STYLE: whitespace at end of input line
mandoc: ./src/bltin/printf.1:353:31: WARNING: new sentence, new line
mandoc: ./src/bltin/printf.1:74:2: STYLE: useless macro: Tn
mandoc: ./src/bltin/printf.1:111:2: STYLE: useless macro: Tn
mandoc: ./src/bltin/printf.1:116:2: STYLE: useless macro: Tn
mandoc: ./src/bltin/printf.1:279:2: STYLE: useless macro: Tn
mandoc: ./src/bltin/printf.1:334:2: WARNING: unusual Xr punctuation: none before vis(3)
mandoc: ./src/bltin/printf.1:334:2: WARNING: unusual Xr order: vis(3) after printf(9)
mandoc: ./src/bltin/printf.1:348:2: STYLE: useless macro: Tn
mandoc: ./src/bltin/printf.1:333:6: STYLE: referenced manual not found: Xr printf 9
mandoc: ./src/bltin/printf.1:334:6: STYLE: referenced manual not found: Xr vis 3
mandoc: ./src/bltin/test.1:46:16: WARNING: skipping empty macro: Cm
mandoc: ./src/bltin/test.1:105:5: STYLE: useless macro: Tn
mandoc: ./src/dash.1:1180:58: WARNING: new sentence, new line
mandoc: ./src/dash.1:1186:13: STYLE: whitespace at end of input line
mandoc: ./src/dash.1:1194:38: WARNING: new sentence, new line
mandoc: ./src/dash.1:1200:35: WARNING: new sentence, new line
mandoc: ./src/dash.1:1474:71: WARNING: new sentence, new line
mandoc: ./src/dash.1:1783:62: WARNING: new sentence, new line
mandoc: ./src/dash.1:2061:22: WARNING: new sentence, new line
mandoc: ./src/dash.1:2311:54: WARNING: new sentence, new line
mandoc: ./src/dash.1:2315:63: WARNING: new sentence, new line
mandoc: ./src/dash.1:37:2: WARNING: prologue macros out of order: Dt after Os
mandoc: ./src/dash.1:87:2: STYLE: useless macro: Tn
mandoc: ./src/dash.1:94:2: STYLE: useless macro: Tn
mandoc: ./src/dash.1:343:2: STYLE: useless macro: Tn
mandoc: ./src/dash.1:442:17: STYLE: verbatim "--", maybe consider using \(em
mandoc: ./src/dash.1:466:2: STYLE: useless macro: Tn
mandoc: ./src/dash.1:581:34: STYLE: verbatim "--", maybe consider using \(em
mandoc: ./src/dash.1:583:25: STYLE: verbatim "--", maybe consider using \(em
mandoc: ./src/dash.1:585:43: STYLE: verbatim "--", maybe consider using \(em
mandoc: ./src/dash.1:595:11: STYLE: verbatim "--", maybe consider using \(em
mandoc: ./src/dash.1:618:29: STYLE: verbatim "--", maybe consider using \(em
mandoc: ./src/dash.1:697:2: WARNING: skipping paragraph macro: Pp before Bd
mandoc: ./src/dash.1:1344:2: STYLE: useless macro: Tn
mandoc: ./src/dash.1:1420:2: WARNING: skipping paragraph macro: Pp before Bd
mandoc: ./src/dash.1:1434:2: WARNING: skipping paragraph macro: Pp before Bd
mandoc: ./src/dash.1:1556:2: STYLE: useless macro: Tn
mandoc: ./src/dash.1:1587:2: STYLE: useless macro: Tn
mandoc: ./src/dash.1:1746:2: STYLE: useless macro: Tn
mandoc: ./src/dash.1:1875:5: STYLE: useless macro: Tn
mandoc: ./src/dash.1:1525:2: WARNING: skipping paragraph macro: Pp before It
mandoc: ./src/dash.1:2182:2: WARNING: skipping paragraph macro: Pp before It
mandoc: ./src/dash.1:2247:2: WARNING: sections out of conventional order: Sh ENVIRONMENT
mandoc: ./src/dash.1:2323:11: WARNING: skipping -width argument: Bl -item
mandoc: ./src/dash.1:2347:31: STYLE: consider using OS macro: Nx
mandoc: ./src/dash.1:92:6: STYLE: referenced manual not found: Xr ksh 1 (2 times)
mandoc: ./src/dash.1:253:6: STYLE: referenced manual not found: Xr emacs 1
mandoc: ./src/dash.1:2253:9: STYLE: referenced manual not found: Xr passwd 4
mandoc: ./src/dash.1:2330:6: STYLE: referenced manual not found: Xr csh 1

Signed-off-by: Bjarni Ingi Gislason <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* shell: Group readdir64/dirent64 with open64

The test for open64 is separate from stat64 for macOS.  However,
the newly introduced tests for readdir64/dirent64 should be grouped
with open64 instead of stat64 as otherwise they cause similar build
failures.

Reported-by: Martijn Dekker <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* shell: Disable glob again as it strips trailing slashes

On Mon, Nov 16, 2020 at 01:47:48PM +1100, Herbert Xu wrote:
> René Scharfe <[email protected]> wrote:
> >
> > on Debian testing dash eats trailing slashes of parameters that happen
> > to be regular files when expanding "$@".  Example:
> >
> >   $ rm -f foo bar
> >   $ touch foo
> >   $ dash -c 'echo "$0" "$@"' baz foo/ bar/ ./
> >   baz foo bar/ ./
>
> In fact you just have to do
>
> 	dash -c 'echo bar\/'
>
> This is a bug in glob(3).  It's stripping the slash.
>
> I guess we'll just have to disable glob again.

This patch disables glob(3) by default.

Reported-by: René Scharfe <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* jobs: Only block in waitcmd on first run

This patch ensures that waitcmd never blocks unless there are
outstanding jobs.  This could otherwise trigger a hang if children
were created prior to the shell coming into existence, or if
there are backgrounded children of other kinds (e.g., a here-
document).

Fixes: 6c691b3 ("jobs: Only clear gotsigchld when waiting...")
Reported-by: Michael Biebl <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* shell: Fail if building --with-libedit and can't find libedit

Previously, configure --with-libedit would only fail in the case where
libedit is available but its header file histedit.h is not.

Fixes: 13537aa ("[BUILD] Added --with-libedit option to...")
Signed-off-by: Herbert Xu <[email protected]>

* input: Clear unget on RESET

On Sat, Dec 19, 2020 at 02:23:44PM +0100, Denys Vlasenko wrote:
> Current git:
>
> $ ;l
> dash: 1: Syntax error: ";" unexpected
> $ s
> COPYING    ChangeLog.O    Makefile.am  aclocal.m4  autom4te.cache
> config.h     config.log     configure       dash
> dollar_altvalue1.tests  missing  stamp-h1
> ChangeLog  Makefile    Makefile.in  autogen.sh  compile
> config.h.in  config.status  configure.ac  depcomp  install-sh
>   src      trace

This patch fixes it by clearing ungetc on RESET.

Fixes: 17db43b ("input: Allow two consecutive calls to pungetc")
Reported-by: Denys Vlasenko <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* jobs: Block signals during tcsetpgrp

Harald van Dijk <[email protected]> wrote:
> On 19/12/2020 22:21, Steffen Nurpmeso wrote:
>> Steffen Nurpmeso wrote in
>>   <20201219172838.1B-WB%[email protected]>:
>>   |Long story short, after falsely accusing BSD make of not working
>>
>> After dinner i shortened it a bit more, and attach it again, ok?
>> It is terrible, but now less redundant than before.
>> Sorry for being so terse, that problem crosses my head for about
>> a week, and i was totally mislead and if you bang your head
>> against the wall so many hours bugs or misbehaviours in a handful
>> of other programs is not the expected outcome.
>
> I think a minimal test case is simply
>
> all:
>         $(SHELL) -c 'trap "echo TTOU" TTOU; set -m; echo all good'
>
> unless I accidentally oversimplified.
>
> The SIGTTOU is caused by setjobctl's xtcsetpgrp(fd, pgrp) call to make
> its newly started process group the foreground process group when job
> control is enabled, where xtcsetpgrp is a wrapper for tcsetpgrp. (That's
> in dash, the other variants may have some small differences.) tcsetpgrp
> has this little bit in its specification:
>
>        Attempts to use tcsetpgrp() from a process which is a member of
>        a background process group on a fildes associated with its con‐
>        trolling  terminal  shall  cause the process group to be sent a
>        SIGTTOU signal. If the calling thread is blocking SIGTTOU  sig‐
>        nals  or  the  process is ignoring SIGTTOU signals, the process
>        shall be allowed to perform the operation,  and  no  signal  is
>        sent.
>
> Ordinarily, when job control is enabled, SIGTTOU is ignored. However,
> when a trap action is specified for SIGTTOU, the signal is not ignored,
> and there is no blocking in place either, so the tcsetpgrp() call is not
> allowed.
>
> The lowest impact change to make here, the one that otherwise preserves
> the existing shell behaviour, is to block signals before calling
> tcsetpgrp and unblocking them afterwards. This ensures SIGTTOU does not
> get raised here, but also ensures that if SIGTTOU is sent to the shell
> for another reason, there is no window where it gets silently ignored.
>
> Another way to fix this is by not trying to make the shell start a new
> process group, or at least not make it the foreground process group.
> Most other shells appear to not try to do this.

This patch implements the blocking of SIGTTOU (and everything else)
while we call tcsetpgrp.

Reported-by: Steffen Nurpmeso <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* jobs: Always reset SIGINT/SIGQUIT handlers

On Fri, Jan 08, 2021 at 08:55:41PM +0000, Harald van Dijk wrote:
> On 18/05/2018 19:39, Herbert Xu wrote:
> > This patch adds basic vfork support for the case of a simple command.
> > ...  @@ -879,17 +892,30 @@ forkchild(struct job *jp, union node *n, int
> > mode)
> >   		}
> >   	}
> >   	if (!oldlvl && iflag) {
> > -		setsignal(SIGINT);
> > -		setsignal(SIGQUIT);
> > +		if (mode != FORK_BG) {
> > +			setsignal(SIGINT);
> > +			setsignal(SIGQUIT);
> > +		}
> >   		setsignal(SIGTERM);
> >   	}
> > +
> > +	if (lvforked)
> > +		return;
> > +
> >   	for (jp = curjob; jp; jp = jp->prev_job)
> >   		freejob(jp);
> >   }
>
> This leaves SIGQUIT ignored in background jobs in interactive shells.
>
>   ENV= dash -ic 'dash -c "kill -QUIT \$\$; echo huh" & wait'
>
> As of dash 0.5.11, this prints "huh". Before, the subprocess killed
> itself before it could print anything. Other shells do not leave SIGQUIT
> ignored.
>
> (In a few other shells, this also prints "huh", but in those other shells,
> that is because the inner shell chooses to ignore SIGQUIT, not because the
> outer shell leaves it ignored.)

Thanks for catching this.  I have no idea how that got in there
and it makes no sense whatsoever.  This patch removes the if
conditional.

Fixes: e94a964 ("eval: Add vfork support")
Reported-by: Harald van Dijk <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* eval: Do not cache value of eflag in evaltree

Patrick Brünn <[email protected]> wrote:
>
> Since we are migrating to Debian bullseye, we discovered a new behavior
> with our scripts, which look like this:
>>#!/bin/sh
>>cleanup() {
>>        set +e
>>        rmdir ""
>>}
>>set -eu
>>trap 'cleanup' EXIT INT TERM
>>echo 'Hello world!'
>
> With old dash v0.5.10.2 this script would return 0 as we expected it.
> But since commit 62cf695 it returns
> the last exit code of our cleanup function.
> Reverting that commit gives a merge conflict, but it seems to fix _our_
> problem. As that topic appears too complex to us I want to ask the
> experts here:
>
> Is this change in behavior intended, by dash?
>
> Our workaround at the moment would be:
>>trap 'cleanup || true' EXIT INT TERM

Thanks for the report.  This is actually a fairly old bug with
set -e that's just been exposed by the exit status change.  What's
really happening is that cleanup itself is triggering a set -e
exit incorrectly because evaltree cached the value of eflag prior
to the function call.

This patch should fix the problem.
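
A minimal way to check the behaviour described above, as a sketch: the helper below runs the reported script in a child shell and prints that shell's exit status. The function name `cleanup_trap_status` and the `SHELL_UNDER_TEST` variable are illustrative assumptions, not part of dash. Going by the report, a shell with this fix (or the older v0.5.10.2) should print 0, while an affected build prints the exit status of the failing rmdir.

```shell
#!/bin/sh
# Hypothetical helper (not part of dash): run the reported script in a child
# shell and print that shell's exit status.
# SHELL_UNDER_TEST is an assumed variable name; it defaults to "sh".
cleanup_trap_status() {
    status=0
    "${SHELL_UNDER_TEST:-sh}" -c '
        cleanup() {
            set +e      # errexit is turned off inside the function
            rmdir ""    # fails, but must no longer abort the shell
        }
        set -eu
        trap cleanup EXIT INT TERM
        echo "Hello world!"
    ' >/dev/null 2>&1 || status=$?
    echo "$status"
}

cleanup_trap_status
```

Setting `SHELL_UNDER_TEST` to the path of a locally built dash lets the same check be run before and after applying the patch.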

Reported-by: Patrick Brünn <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Tested-by: Patrick Brünn <[email protected]>

* shell: Call CHECK_DECL on stat64

On macOS it is possible to find stat64 at link-time but not at
compile-time.  To make the build process more robust we should
check for the header file as well as the library.

Reported-by: Saagar Jha <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* parser: Fix VSLENGTH parsing with trailing garbage

On Sat, Jun 19, 2021 at 02:44:46PM +0200, Denys Vlasenko wrote:
>
> CTLVAR and CTLBACKQ are not properly handled if encountered
> inside ${#...}. Testcase:
>
> dash -c "`printf 'echo ${#1\x82}'`" 00 111 222
>
> It should execute "echo ${#1 <byte 0x82> }" and thus print "3"
> (the length of $1, which is "111").
>
> Instead, it segfaults.
>
> (Ideally, it should fail since "1 <byte 0x82>" is not a valid
> variable name, but currently dash accepts e.g. "${#1abc}"
> as if it is "${#1}bc". A separate, less serious bug...).

In fact these two bugs are one and the same.  This patch fixes
both by detecting the invalid substitution and not emitting it
into the node tree.

Incidentally this reveals a bug in how we parse ${#10} that got
introduced recently, which is also fixed here.

Reported-by: Denys Vlasenko <[email protected]>
Fixes: 7710a92 ("parser: Only accept single-digit parameter...")
Signed-off-by: Herbert Xu <[email protected]>

* input: Remove special case for unget EOF

Commit 17db43b (input: Allow two
consecutive calls to pungetc) ensures that EOF is handled like any
other character with respect to unget.  As a result it's possible
to remove the special case for unget of EOF in preadbuffer.

Signed-off-by: Ron Yorston <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* expand: Always quote caret when using fnmatch

This patch forces ^ to be a literal when we use fnmatch.

In order to allow for the extra space to quote the caret, the
function _rmescapes will allocate up to twice the memory if the
flag RMESCAPE_GLOB is set.

Fixes: 7638476 ("shell: Enable fnmatch/glob by default")
Reported-by: Christoph Anton Mitterer <[email protected]>
Suggested-by: Harald van Dijk <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* expand: Add ifsfree to expand to fix a logic error that causes a buffer over-read

On Mon, Jun 20, 2022 at 02:27:10PM -0400, Alex Gorinson wrote:
> Due to a logic error in the ifsbreakup function in expand.c if a
> heredoc and normal command is run one after the other by means of a
> semi-colon, when the second command drops into ifsbreakup the command
> will be evaluated with the ifslastp/ifsfirst struct that was set when
> the here doc was evaluated. This results in a buffer over-read that
> can leak the program's heap, stack, and arena addresses which can be
> used to beat ASLR.
>
> Steps to Reproduce:
> First bug:
> cmd args: ~/exampleDir/example> dash
> $ M='AAAAAAAAAAAAAAAAA'    <note: 17 A's>
> $ q00(){
> $ <<000;echo
> $ ${D?$M$M$M$M$M$M}        <note: 6 $M's>
> $ 000
> $ }
> $ q00                      <note: After the q00 is typed in, the leak
> should be echo'd out; this works with ash, busybox ash, and dash and
> with all option args.>
>
> Patch:
> Adding the following to expand.c will fix both bugs in one go.
> (Thank you to Harald van Dijk and Michael Greenberg for doing the
> heavy lifting for this patch!)
> ==========================
> --- a/src/expand.c
> +++ b/src/expand.c
> @@ -859,6 +859,7 @@
> if (discard)
> return -1;
>
> +ifsfree();
> sh_error("Bad substitution");
> }
>
> @@ -1739,6 +1740,7 @@
> } else
> msg = umsg;
> }
> +ifsfree();
> sh_error("%.*s: %s%s", end - var - 1, var, msg, tail);
>  }
> ==========================

Thanks for the report!

I think it's better to add the ifsfree() call to the exception
handling path as other sh_error calls may trigger this too.

Reported-by: Alex Gorinson <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* eval: Always set exitstatus in evaltree

There is no harm in setting exitstatus unconditionally in evaltree.

Signed-off-by: Herbert Xu <[email protected]>

* eval: Check eflag after redirection error

>
> This is a POSIX violation, and quite a grave one at that:
> set -e is oft[1] used to guard against precisely this type of error!
>
> The same happens if set -e is executed.
>
> All quotes POSIX.1, Issue 7, TC2:
> sh, OPTIONS:
>  > The -a, -b, -C, -e, -f, -m, -n, -o option, -u, -v, and -x options
>  > are described as part of the set utility in Special Built-In
>  > Utilities.
>
> set, DESCRIPTION, -e:
>  > When this option is on, when any command fails (for any of the
>  > reasons listed in Consequences of Shell Errors or by returning an
>  > exit status greater than zero), the shell immediately shall exit, as
>  > if by executing the exit special built-in utility with no arguments,
>  > with the following exceptions:
>  >
>  > 1. The failure of any individual command in a multi-command pipeline
>  >    shall not cause the shell to exit. Only the failure of the
>  >    pipeline itself shall be considered.
>  > 2. The -e setting shall be ignored when executing the compound list
>  >    following the while, until, if, or elif reserved word, a pipeline
>  >    beginning with the ! reserved word, or any command of an AND-OR
>  >    list other than the last.
>  > 3. If the exit status of a compound command other than a subshell
>  >    command was the result of a failure while -e was being ignored,
>  >    then -e shall not apply to this command.
>
> XCU, 2.9.4: Shell Command Language, Shell Commands, Compound Commands:
> The while Loop:
>  > The format of the while loop is as follows:
>  >
>  > while compound-list-1
>  > do
>  >   compound-list-2
>  > done
> (until is equivalent).
> The if Conditional Construct:
>  > The format for the if construct is as follows:
>  >
>  > if compound-list
>  > then
>  >   compound-list
>  > [elif compound-list
>  > then
>  >   compound-list] ...
>  > [else
>  >   compound-list]
>  > fi
>
> It follows, therefore, that
>  * Exception 1. does not apply as there is no pipeline
>  * Exception 2. does not apply, as the redirection does /not/ follow
>    "while" or "if" directly and is /not/ part of the conditional
>    compound-list
>  * in the "for" case, there is no such provision, so this is likely not
>    a confusion w.r.t. the conditional compound-lists
>  * Exception 3. does not apply as -e was not being ignored while the
>    compound commands were being executed (indeed, the compound commands
>    do not run at all, as evidenced by the program terminating)
>
> [1]: https://salsa.debian.org/glibc-team/glibc/-/merge_requests/6#note_329899
> ----- End forwarded message -----

Yes we should check the exit status after redirections.
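
The quoted report does not include its original test case, so the following is only an assumed reproducer in the same spirit: a compound command whose redirection fails while `set -e` is active. The helper name `redirection_error_under_errexit`, the `SHELL_UNDER_TEST` variable, and the path `/nonexistent/dir/input` are all illustrative. With the check described above in place, the child shell is expected to exit at the failed redirection rather than continue.

```shell
#!/bin/sh
# Assumed reproducer: the original test case is not part of the quoted text.
# Run a compound command whose redirection fails under "set -e" in a child
# shell and report whether execution continued past it.
# SHELL_UNDER_TEST is an assumed variable name; it defaults to "sh".
redirection_error_under_errexit() {
    out=$("${SHELL_UNDER_TEST:-sh}" -ec '
        for i in 1; do
            echo "loop body"
        done </nonexistent/dir/input
        echo "survived"
    ' 2>/dev/null) || :
    case $out in
        *survived*) echo "continued after failed redirection" ;;
        *)          echo "exited at failed redirection" ;;
    esac
}

redirection_error_under_errexit
```

The `for` loop is used because the report notes that none of the listed `set -e` exceptions covers it, so the failure cannot be attributed to a conditional compound-list.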

Reported-by: наб <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* parser: Add VSBIT to ensure subtype is never zero

Harald van Dijk <[email protected]> wrote:
> On 21/11/2022 13:08, Harald van Dijk wrote:
>> On 21/11/2022 02:38, Christoph Anton Mitterer wrote:
>>> reject_filtered_cmd()
>>> {
>>> 	reject_and_die "disallowed command${restrict_path_list:+
>>> (restrict-path: \"${restrict_path_list//|/\", \"}\")}"
>>> }
>>>
>>> reject_filtered_cmd
>>[...]
>> This should either result in the ${...//...} being skipped, or the "Bad
>> substitution" error. Currently, what happens instead is it attempts, but
>> fails, to skip the ${...//...}.
>
> The reason it fails is because the word is cut off.
>
> Variable substitutions are encoded as a CTLVAR special character,
> followed by a byte indicating the type of substitution, followed by the
> rest of the substitution data. The type of substitution is the VSNORMAL,
> VSMINUS, etc. seen in parser.h. An invalid substitution is encoded as a
> value of 0.
>
> When we define a function, we clone the function body in order to
> preserve it. Cloning the function body is done by cloning each node.
> Cloning a "word" node (NARG) involves copying the characters that make
> up the word up to and including the terminating null byte.
>
> These two interact badly. The invalid substitution is seen as
> terminating the word, the rest of the word is not copied, but the
> expansion code does not have any way of seeing that anything got cut off
> and happily continues attempting to process the rest of the word.
>
> If dash decides to issue an error in this case, this is not a problem:
> the null byte is guaranteed to be copied, and if processing is
> guaranteed to stop if a null byte is encountered, everything works out.
>
> If dash decides to not issue an error in this case, the encoding of bad
> substitutions needs to change to a non-null byte. It appears that if we
> set the byte to VSNUL, the expansion logic is already able to handle it,
> but I have not tested this extensively.

Thanks for the analysis Harald!

This patch does basically what you've described except it uses a new
bit to avoid any confusion with a genuine VSNUL.

Fixes: 3df3edd ("[PARSER] Report substition errors at...")
Reported-by: Christoph Anton Mitterer <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* eval: Test evalskip before flipping status for NNOT

On Tue, Dec 06, 2022 at 10:15:03AM +0000, Harald van Dijk wrote:
>
> There is a long-standing bug that may or may not be harder to fix if this
> patch goes in, depending on how you want to fix it. Here's a script that
> already fails on current dash.
>
>   f() {
>     if ! return 0
>     then :
>     fi
>   }
>   f
>
> This should return 0, and does return 0 in bash and ksh (and almost all
> shells), but returns 1 in dash.
>
> There are a few possible ways of fixing it. Some of them rely on continuing
> to conditionally set exitstatus.

This can be fixed simply by testing evalskip prior to flipping the
status.
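
The reproducer from the report can be wrapped into a small regression check, sketched here. The function `f` is taken from the quoted script; the wrapper `negated_return_status` is a hypothetical name introduced for illustration. Per the report, the printed status should be 0 once evalskip is tested before the status is flipped, and 1 on an affected dash.

```shell
#!/bin/sh
# The function from the report: "!" must not invert the status when the
# negated command is a "return" that is already unwinding the function.
f() {
    if ! return 0
    then :
    fi
}

# Hypothetical wrapper (not part of dash): call f and print its exit status.
negated_return_status() {
    status=0
    f || status=$?
    echo "$status"
}

negated_return_status
```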

Reported-by: Harald van Dijk <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* Release 0.5.12.

---------

Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Nikolai Merinov <[email protected]>
Signed-off-by: Denys Vlasenko <[email protected]>
Signed-off-by: Michael Greenberg <[email protected]>
Signed-off-by: Jeroen Roovers <[email protected]>
Signed-off-by: Mike Gilbert <[email protected]>
Signed-off-by: Martin Michlmayr <[email protected]>
Signed-off-by: Bjarni Ingi Gislason <[email protected]>
Signed-off-by: Ron Yorston <[email protected]>
Co-authored-by: Herbert Xu <[email protected]>
Co-authored-by: Nikolai Merinov <[email protected]>
Co-authored-by: Fangrui Song <[email protected]>
Co-authored-by: Denys Vlasenko <[email protected]>
Co-authored-by: Jeroen Roovers <[email protected]>
Co-authored-by: Martin Michlmayr <[email protected]>
Co-authored-by: Bjarni Ingi Gislason <[email protected]>
Co-authored-by: C. McEnroe <[email protected]>
Co-authored-by: Ron Yorston <[email protected]>