Parsing and unparsing has super-linear runtime #8
Comments
These are amazing (a/k/a awful) performance issues, thanks for finding them!
Another source of inefficiency in Code snippet where this happens in ast.ml:
To show this, I changed
Output:
There is a "Lazy" module in OCaml, or just use Haskell 🙃
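For readers unfamiliar with it, here is a minimal sketch of what the standard library's `Lazy` module provides: a suspended computation that runs at most once. The function `expensive_render` and the `calls` counter are hypothetical stand-ins for illustration only; they are not taken from ast.ml.

```ocaml
(* Hypothetical counter used only to observe how often the work runs. *)
let calls = ref 0

(* Hypothetical stand-in for an expensive computation; not from ast.ml. *)
let expensive_render (n : int) : string =
  incr calls;
  String.concat " " (List.init n string_of_int)

let () =
  (* [lazy e] builds a suspension; [e] is not evaluated yet. *)
  let rendered = lazy (expensive_render 5) in
  assert (!calls = 0);
  (* The first [Lazy.force] runs the computation and caches the result. *)
  let s1 = Lazy.force rendered in
  (* A later force returns the cached value without recomputing. *)
  let s2 = Lazy.force rendered in
  assert (!calls = 1);
  assert (s1 = s2);
  print_endline s1
```

Whether laziness helps in this particular case depends on whether the expensive value is actually needed on every path, which the thread does not settle.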
😬 😅
Linear-time versions of
Relatedly,
* options: Do not set commandname in procargs We set commandname in procargs when we don't have to. This results in a duplicated output of arg0 when an error occurs. Reported-by: Olivier Duclos <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * expand: Fix double-decrement in argstr Due to a double decrement in argstr we may miss field separators at the end of a word in certain situations. Reported-by: Martijn Dekker <[email protected]> Fixes: 3cd5386 ("expand: Do not reprocess data when...") Signed-off-by: Herbert Xu <[email protected]> * eval: Reset handler when entering a subshell As it is a subshell can execute code that is only meant for the parent shell when it executes a longjmp that is caught by something like evalcommand. This patch fixes it by resetting the handler when entering a subshell. Reported-by: Martijn Dekker <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * parser: Fix old-style command substitution here-document crash On Wed, Jul 25, 2018 at 12:38:27PM +0000, project-repo wrote: > Hi, > I am working on a project in which I use the honggfuzz fuzzer to fuzz open > source software and I decided to fuzz dash. In doing so I discovered a > NULL pointer dereference in src/redir.ch on line 305. Following is a > backtrace as supplied by the address sanitizer: > > AddressSanitizer:DEADLYSIGNAL > ================================================================= > ==39623==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x0000005768ed bp 0x7ffc00273df0 sp 0x7ffc00273c60 T0) > ==39623==The signal is caused by a READ memory access. > ==39623==Hint: address points to the zero page. 
> #0 0x5768ec in openhere /home/jfe/dash/src/redir.c:305:29
> #1 0x574d92 in openredirect /home/jfe/dash/src/redir.c:230:7
> #2 0x5737fe in redirect /home/jfe/dash/src/redir.c:121:11
> #3 0x576017 in redirectsafe /home/jfe/dash/src/redir.c:424:3
> #4 0x522326 in evalcommand /home/jfe/dash/src/eval.c:828:11
> #5 0x520010 in evaltree /home/jfe/dash/src/eval.c:288:12
> #6 0x5270da in evaltreenr /home/jfe/dash/src/eval.c:332:2
> #7 0x526f04 in evalbackcmd /home/jfe/dash/src/eval.c:640:3
> #8 0x539020 in expbackq /home/jfe/dash/src/expand.c:522:2
> #9 0x5332d7 in argstr /home/jfe/dash/src/expand.c:343:4
> #10 0x5322f7 in expandarg /home/jfe/dash/src/expand.c:196:2
> #11 0x528118 in fill_arglist /home/jfe/dash/src/eval.c:659:3
> #12 0x5213b6 in evalcommand /home/jfe/dash/src/eval.c:769:13
> #13 0x520010 in evaltree /home/jfe/dash/src/eval.c:288:12
> #14 0x554423 in cmdloop /home/jfe/dash/src/main.c:234:8
> #15 0x553bcc in main /home/jfe/dash/src/main.c:176:3
> #16 0x7f201c2b2a86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21a86)
> #17 0x41dfb9 in _start (/home/jfe/dash/src/dash+0x41dfb9)
>
> AddressSanitizer can not provide additional info.
> SUMMARY: AddressSanitizer: SEGV /home/jfe/dash/src/redir.c:305:29 in openhere
> ==39623==ABORTING
>
> This bug can be reproduced by running "dash < min" where min is the file
> attached. I was able to reproduce this bug with the current git version
> and the current debian version.
>
> cheers
> project-repo
>
> <<A
> `<<A(`

Thanks for the report!

This is caused by the recent change to save/restore the here-document list around command substitutions. In doing so we must finish existing here-documents prior to restoring the old here-document list. This is done for new-style command substitutions but not for old-style. This patch fixes it by doing it for both.
Reported-by: project-repo <[email protected]>
Fixes: 51e2d88 ("parser: Save/restore here-documents in...")
Signed-off-by: Herbert Xu <[email protected]>

* expand: Fix trailing newlines processing in backquote expanding

According to POSIX.1-2008 we should remove newlines only at the end of the substitution. Newline-only substitutions cause dash to remove newlines before the beginning of the substitution. The following code:

    cat <<END
    1
    $(echo "")
    2
    END

prints "1<newline>2" instead of the expected "1<newline><newline>2".

This patch fixes trailing newlines processing in backquote expanding.

Signed-off-by: Nikolai Merinov <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* parser: Only accept single-digit parameter expansion outside of braces

On Thu, Apr 25, 2019 at 01:39:52AM +0000, Michael Orlitzky wrote:
> The POSIX spec says,
>
> The parameter name or symbol can be enclosed in braces, which are
> optional except for positional parameters with more than one digit or
> when parameter is a name and is followed by a character that could be
> interpreted as part of the name.
>
> However, dash seems to diverge from that behavior when we get to $10:
>
> $ cat test.sh
> echo $10
>
> $ dash ./test.sh one two three four five six seven eight nine ten
> ten
>
> $ bash ./test.sh one two three four five six seven eight nine ten
> one0

This patch should fix the problem.

Signed-off-by: Herbert Xu <[email protected]>

* shell: delete AC_PROG_YACC

Signed-off-by: Herbert Xu <[email protected]>

* redir: Clear saved redirections in subshell

When we enter a subshell we need to drop the saved redirections as otherwise a subsequent unwindredir could produce incorrect results.

This patch does this by simply clearing redirlist. While we could actually free the memory underneath for subshells it isn't really worth the trouble for now.

In order to ensure that this is done in every place where we enter a subshell, this patch adds a new mkinit hook called forkreset.
The calls closescript, clear_traps and reset_handler are also added to the forkreset hook. This fixes a bug where the first two functions weren't called if we enter a subshell without forking. Reported-by: Harald van Dijk <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * builtin: Fix seconds part of times(1) The seconds part of the times(1) built-in is wrong as it does not exclude the minutes part of the result. This patch fixes it. This problem was first noted by Michael Greenberg who also sent a similar patch. Reported-by: Michael Greenberg <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * jobs: Rename DOWAIT_NORMAL to DOWAIT_NONBLOCK To make it clearer what it is doing: nonblocking wait() Signed-off-by: Denys Vlasenko <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * var: Remove poplocalvars() always-zero argument, make it static Signed-off-by: Denys Vlasenko <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * jobs: Fix infinite loop in waitproc After we changed the resetting of gotsigchld so that it is only done if jp is NULL, we can now get an infinite loop in waitproc if gotsigchld is set but there is no outstanding child because everything had been waited for previously without gotsigchld being zeroed. This patch fixes it by always zeroing gotsigchld as we did before. The bug that the previous patch was trying to fix is now resolved by switching the blocking mode to DOWAIT_NORMAL after the specified job has been completed so that we really do wait for all outstanding dead children. Reported-by: Harald van Dijk <[email protected]> Fixes: 6c691b3 ("jobs: Only clear gotsigchld when waiting...") Signed-off-by: Herbert Xu <[email protected]> * parser: Fix handling of empty aliases Dash was incorrectly handling empty aliases. 
When attempting to use an empty alias with nothing else, I'm (incorrectly) prompted for more input: ``` $ alias empty='' $ empty > ``` Other shells (e.g., bash, yash) correctly handle the lone, empty alias as an empty command: ``` $ alias empty='' $ empty $ ``` The problem here is that we incorrectly enter the loop eating TNLs in readtoken(). This patch fixes it by setting checkkwd correctly. Reported-by: Michael Greenberg <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * parser: Catch errors in expandstr On Fri, Dec 13, 2019 at 02:51:34PM +0000, Simon Ser wrote: > Just noticed another dash bug: when setting invalid PS1 values dash > enters an infinite loop. > > For instance, setting PS1='$(' makes dash print many of these: > > dash: 1: Syntax error: end of file unexpected (expecting ")") > > It would be nice to fallback to the default PS1 value on error. This patch fixes it by using the literal value of PS1 should an error occur during expansion. On Wed, Feb 26, 2020 at 09:12:04PM +0000, Ron Yorston wrote: > > There's another case that should be handled. PS1='`xxx(`' causes the > shell to exit because the old-style backquote leaves an additional file > on the stack. Ron's change has been folded into this patch. Reported-by: Simon Ser <[email protected]> Reported-by: Ron Yorston <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * parser: Fix alias expansion after heredoc or newlines This script should print OK: alias a="case x in " b=x a b) echo BAD;; esac alias BEGIN={ END=} BEGIN cat <<- EOF > /dev/null $(:) EOF END : <<- EOF && $(:) EOF BEGIN echo OK END However, because the value of checkkwd is either zeroed when it shouldn't, or isn't zeroed when it should, dash currently gets it wrong in every case. This patch fixes it by saving checkkwd and zeroing it where needed. 
Suggested-by: Harald van Dijk <[email protected]> Reported-by: Harald van Dijk <[email protected]> Reported-by: Martijn Dekker <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * expand: Remove unused expandmeta() flag parameter Signed-off-by: Denys Vlasenko <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * shell: mktokens relative TMPDIR The mktokens script fails when /tmp isn't writable (e.g., when building in a sandbox with a different TMPDIR). Replace absolute references to /tmp to relative references to TMPDIR. If TMPDIR is unset or null, default to /tmp. The mkbuiltins script was already hardened to work relative to TMPDIR, also defaulting to /tmp. v2 ensures that TMPDIR is quoted. v3 adds an extra quotation that prevents extra pathname expansions. Signed-off-by: Michael Greenberg <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * input: Fix compiling against libedit with -fno-common With -fno-common, which will be enabled by default in GCC 10, we see this error: ld: input.o:(.bss+0x0): multiple definition of `el'; histedit.o:(.bss+0x8): first defined here To fix this, simply remove the definition as it is not needed. Signed-off-by: Jeroen Roovers <[email protected]> Signed-off-by: Mike Gilbert <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * shell: Always use explicit large file API There are some remaining stat/readdir calls in dash that may lead to spurious EOVERFLOW errors on 32-bit platforms. This patch changes them (as well as open(2)) to use the explicit large file API. Reported-by: Tatsuki Sugiura <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * parser: Save and restore heredoclist in expandstr On Sun, May 17, 2020 at 01:19:28PM +0100, Harald van Dijk wrote: > > This still does not restore the state completely. It does not clean up any > pending heredocs. 
I see: > > $ PS1='$(<<EOF "' > src/dash: 1: Syntax error: Unterminated quoted string > $(<<EOF ": > > > > That is, after entering the ':' command, the shell is still trying to read > the heredoc from the prompt. This patch saves and restores the heredoclist in expandstr. It also removes a bunch of unnecessary volatiles as those variables are only referenced in case of a longjmp other than one started by a signal like SIGINT. Reported-by: Harald van Dijk <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * shell: Fix typos Signed-off-by: Martin Michlmayr <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * parser: Fix double-backslash nl in old-style command sub When handling backslashes within an old-style command substitution, we should not call pgetc_eatbnl because that would treat the next backslash character as another escape character if it was then followed by a new-line. This patch fixes it by calling pgetc. Reported-by: Matt Whitlock <[email protected]> Fixes: 6bbc71d ("parser: use pgetc_eatbnl() in more places") Signed-off-by: Herbert Xu <[email protected]> * Release 0.5.11. * parser: Get rid of PEOA PEOA is a special character used to mark an alias as being finished so that we don't enter an infinite loop with nested aliases. It complicates the parser because we have to ensure that it is skipped where necessary and not copied to the resulting token text. This patch removes it and instead delays the marking of aliases until the second pgetc. This has the same effect as the current PEOA code while keeping the complexities within the input code. Signed-off-by: Herbert Xu <[email protected]> * eval: Prevent recursive PS4 expansion Yaroslav Halchenko <[email protected]> wrote: > > I like to (ab)use PS4 and set -x for tracing execution of scripts. > Reporting time and PID is very useful in this context. 
> > I am not 100% certain if bash's behavior (of actually running the command > embedded within PS4 string, probably eval'ing it) is actually POSIX > compliant, posh seems to not do that; but I think it is definitely not > desired for dash to just stall: > > - the script: > > #!/bin/sh > set -x > export PS4='+ $(date +%T.%N) [$$] ' > > echo "lets go" > sleep 1 > echo "done $var" > > - bash: > > /tmp > bash --posix test.sh > +export 'PS4=+ $(date +%T.%N) [$$] ' > +PS4='+ $(date +%T.%N) [$$] ' > + 09:15:48.982296333 [2764323] echo 'lets go' > lets go > + 09:15:48.987829613 [2764323] sleep 1 > + 09:15:49.994485037 [2764323] echo 'done ' > done > > > - posh: > exit:130 /tmp > posh test.sh > +export PS4=+ $(date +%T.%N) [$$] > + $(date +%T.%N) [$$] echo lets go > lets go > + $(date +%T.%N) [$$] sleep 1 > + $(date +%T.%N) [$$] echo done > done > > - dash: (stalls it set -x) > > /tmp > dash test.sh > +export PS4=+ $(date +%T.%N) [$$] > ^C^C This patch fixes the infinite loop caused by repeated expansions of PS4. Reported-by: Yaroslav Halchenko <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * redir: Retry open64 on EINTR It is possible for open64 to block on named pipes, and therefore it can be interrupted by signals and return EINTR. We should only let it fail with EINTR if real signals are pending (i.e., it should not fail on SIGCHLD if SIGCHLD has not been trapped). This patch adds a new helper sh_open to retry the open64 call if necessary. It also calls sh_error when appropriate. Fixes: 3800d49 ("[JOBS] Fix dowait signal race") Reported-by: Samuel Thibault <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * shell: Enable fnmatch/glob by default As fnmatch(3) and glob(3) from glibc are now working consistently, this patch enables them by default. Signed-off-by: Herbert Xu <[email protected]> * expand: Make glob(3) interruptible by SIGINT If glob(3) is used then it can't be interrupted by SIGINT. 
This is bad when an expansion causes a large number of entries to be generated. This patch improves things by adding an int_pending check to gl_opendir call. Note that this is still not perfect, e.g., the sort would still be uninterruptible. Signed-off-by: Herbert Xu <[email protected]> * error: Remove USE_NORETURN ifdef The USE_NORETURN was added because gcc was buggy almost 20 years ago. This is no longer needed and this patch removes it. Signed-off-by: Herbert Xu <[email protected]> * jobs: Fix waitcmd busy loop We need to clear gotsigchld in waitproc because it is used as a loop conditional for the waitcmd case. Without it waitcmd may busy loop after a SIGCHLD. This patch also changes gotsigchld into a volatile sig_atomic_t to prevent compilers from optimising its accesses away. Fixes: 6c691b3 ("jobs: Only clear gotsigchld when waiting...") Signed-off-by: Herbert Xu <[email protected]> * eval: Check nflag in evaltree instead of cmdloop This patch moves the nflag check from cmdloop into evaltree. This is so that nflag will be in force even if we enter the shell via a path other than cmdloop, e.g., through sh -c. Reported-by: Joey Hess <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * man: fix formatting Fix formatting according to the output of "mandoc -Tlint". Overview: Start each sentence on a new line. Protect a punctuation mark in a macro call with '\&'. Trim trailing space. Add a missing comma in a row of words. Use an en-dash instead of '--' if there is space around it. An em-dash is used without space around it. Comment out ".Pp" macros that do nothing. Split long sentences after a punctuation mark. Remove a "-width ..." 
for a ".Bl -item" macro, as it has no influence Details: mandoc: ./src/bltin/echo.1:69:38: WARNING: new sentence, new line mandoc: ./src/bltin/echo.1:75:35: WARNING: new sentence, new line mandoc: ./src/bltin/printf.1:205:12: WARNING: skipping empty macro: No mandoc: ./src/bltin/printf.1:284:28: STYLE: whitespace at end of input line mandoc: ./src/bltin/printf.1:288:20: STYLE: whitespace at end of input line mandoc: ./src/bltin/printf.1:293:28: STYLE: whitespace at end of input line mandoc: ./src/bltin/printf.1:353:31: WARNING: new sentence, new line mandoc: ./src/bltin/printf.1:74:2: STYLE: useless macro: Tn mandoc: ./src/bltin/printf.1:111:2: STYLE: useless macro: Tn mandoc: ./src/bltin/printf.1:116:2: STYLE: useless macro: Tn mandoc: ./src/bltin/printf.1:279:2: STYLE: useless macro: Tn mandoc: ./src/bltin/printf.1:334:2: WARNING: unusual Xr punctuation: none before vis(3) mandoc: ./src/bltin/printf.1:334:2: WARNING: unusual Xr order: vis(3) after printf(9) mandoc: ./src/bltin/printf.1:348:2: STYLE: useless macro: Tn mandoc: ./src/bltin/printf.1:333:6: STYLE: referenced manual not found: Xr printf 9 mandoc: ./src/bltin/printf.1:334:6: STYLE: referenced manual not found: Xr vis 3 mandoc: ./src/bltin/test.1:46:16: WARNING: skipping empty macro: Cm mandoc: ./src/bltin/test.1:105:5: STYLE: useless macro: Tn mandoc: ./src/dash.1:1180:58: WARNING: new sentence, new line mandoc: ./src/dash.1:1186:13: STYLE: whitespace at end of input line mandoc: ./src/dash.1:1194:38: WARNING: new sentence, new line mandoc: ./src/dash.1:1200:35: WARNING: new sentence, new line mandoc: ./src/dash.1:1474:71: WARNING: new sentence, new line mandoc: ./src/dash.1:1783:62: WARNING: new sentence, new line mandoc: ./src/dash.1:2061:22: WARNING: new sentence, new line mandoc: ./src/dash.1:2311:54: WARNING: new sentence, new line mandoc: ./src/dash.1:2315:63: WARNING: new sentence, new line mandoc: ./src/dash.1:37:2: WARNING: prologue macros out of order: Dt after Os mandoc: ./src/dash.1:87:2: 
STYLE: useless macro: Tn mandoc: ./src/dash.1:94:2: STYLE: useless macro: Tn mandoc: ./src/dash.1:343:2: STYLE: useless macro: Tn mandoc: ./src/dash.1:442:17: STYLE: verbatim "--", maybe consider using \(em mandoc: ./src/dash.1:466:2: STYLE: useless macro: Tn mandoc: ./src/dash.1:581:34: STYLE: verbatim "--", maybe consider using \(em mandoc: ./src/dash.1:583:25: STYLE: verbatim "--", maybe consider using \(em mandoc: ./src/dash.1:585:43: STYLE: verbatim "--", maybe consider using \(em mandoc: ./src/dash.1:595:11: STYLE: verbatim "--", maybe consider using \(em mandoc: ./src/dash.1:618:29: STYLE: verbatim "--", maybe consider using \(em mandoc: ./src/dash.1:697:2: WARNING: skipping paragraph macro: Pp before Bd mandoc: ./src/dash.1:1344:2: STYLE: useless macro: Tn mandoc: ./src/dash.1:1420:2: WARNING: skipping paragraph macro: Pp before Bd mandoc: ./src/dash.1:1434:2: WARNING: skipping paragraph macro: Pp before Bd mandoc: ./src/dash.1:1556:2: STYLE: useless macro: Tn mandoc: ./src/dash.1:1587:2: STYLE: useless macro: Tn mandoc: ./src/dash.1:1746:2: STYLE: useless macro: Tn mandoc: ./src/dash.1:1875:5: STYLE: useless macro: Tn mandoc: ./src/dash.1:1525:2: WARNING: skipping paragraph macro: Pp before It mandoc: ./src/dash.1:2182:2: WARNING: skipping paragraph macro: Pp before It mandoc: ./src/dash.1:2247:2: WARNING: sections out of conventional order: Sh ENVIRONMENT mandoc: ./src/dash.1:2323:11: WARNING: skipping -width argument: Bl -item mandoc: ./src/dash.1:2347:31: STYLE: consider using OS macro: Nx mandoc: ./src/dash.1:92:6: STYLE: referenced manual not found: Xr ksh 1 (2 times) mandoc: ./src/dash.1:253:6: STYLE: referenced manual not found: Xr emacs 1 mandoc: ./src/dash.1:2253:9: STYLE: referenced manual not found: Xr passwd 4 mandoc: ./src/dash.1:2330:6: STYLE: referenced manual not found: Xr csh 1 Signed-off-by: Bjarni Ingi Gislason <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * shell: Group readdir64/dirent64 with open64 The test for 
open64 is separate from stat64 for macOS. However, the newly introduced tests for readdir64/dirent64 should be grouped with open64 instead of stat64 as otherwise they cause similar build failures.

Reported-by: Martijn Dekker <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* shell: Disable glob again as it strips trailing slashes

On Mon, Nov 16, 2020 at 01:47:48PM +1100, Herbert Xu wrote:
> René Scharfe <[email protected]> wrote:
> >
> > on Debian testing dash eats trailing slashes of parameters that happen
> > to be regular files when expanding "$@". Example:
> >
> > $ rm -f foo bar
> > $ touch foo
> > $ dash -c 'echo "$0" "$@"' baz foo/ bar/ ./
> > baz foo bar/ ./
>
> In fact you just have to do
>
> dash -c 'echo bar\/'
>
> This is a bug in glob(3). It's stripping the slash.
>
> I guess we'll just have to disable glob again.

This patch disables glob(3) by default.

Reported-by: René Scharfe <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* jobs: Only block in waitcmd on first run

This patch ensures that waitcmd never blocks unless there are outstanding jobs. This could otherwise trigger a hang if children were created prior to the shell coming into existence, or if there are backgrounded children of other kinds (e.g., a here-document).

Fixes: 6c691b3 ("jobs: Only clear gotsigchld when waiting...")
Reported-by: Michael Biebl <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

* shell: Fail if building --with-libedit and can't find libedit

Previously, configure --with-libedit would only fail in the case where libedit is available but its header file histedit.h is not.
Fixes: 13537aa ("[BUILD] Added --with-libedit option to...") Signed-off-by: Herbert Xu <[email protected]> * input: Clear unget on RESET On Sat, Dec 19, 2020 at 02:23:44PM +0100, Denys Vlasenko wrote: > Current git: > > $ ;l > dash: 1: Syntax error: ";" unexpected > $ s > COPYING ChangeLog.O Makefile.am aclocal.m4 autom4te.cache > config.h config.log configure dash > dollar_altvalue1.tests missing stamp-h1 > ChangeLog Makefile Makefile.in autogen.sh compile > config.h.in config.status configure.ac depcomp install-sh > src trace This patch fixes it by clearing ungetc on RESET. Fixes: 17db43b ("input: Allow two consecutive calls to pungetc") Reported-by: Denys Vlasenko <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * jobs: Block signals during tcsetpgrp Harald van Dijk <[email protected]> wrote: > On 19/12/2020 22:21, Steffen Nurpmeso wrote: >> Steffen Nurpmeso wrote in >> <20201219172838.1B-WB%[email protected]>: >> |Long story short, after falsely accusing BSD make of not working >> >> After dinner i shortened it a bit more, and attach it again, ok? >> It is terrible, but now less redundant than before. >> Sorry for being so terse, that problem crosses my head for about >> a week, and i was totally mislead and if you bang your head >> against the wall so many hours bugs or misbehaviours in a handful >> of other programs is not the expected outcome. > > I think a minimal test case is simply > > all: > $(SHELL) -c 'trap "echo TTOU" TTOU; set -m; echo all good' > > unless I accidentally oversimplified. > > The SIGTTOU is caused by setjobctl's xtcsetpgrp(fd, pgrp) call to make > its newly started process group the foreground process group when job > control is enabled, where xtcsetpgrp is a wrapper for tcsetpgrp. (That's > in dash, the other variants may have some small differences.) 
tcsetpgrp > has this little bit in its specification: > > Attempts to use tcsetpgrp() from a process which is a member of > a background process group on a fildes associated with its con‐ > trolling terminal shall cause the process group to be sent a > SIGTTOU signal. If the calling thread is blocking SIGTTOU sig‐ > nals or the process is ignoring SIGTTOU signals, the process > shall be allowed to perform the operation, and no signal is > sent. > > Ordinarily, when job control is enabled, SIGTTOU is ignored. However, > when a trap action is specified for SIGTTOU, the signal is not ignored, > and there is no blocking in place either, so the tcsetpgrp() call is not > allowed. > > The lowest impact change to make here, the one that otherwise preserves > the existing shell behaviour, is to block signals before calling > tcsetpgrp and unblocking them afterwards. This ensures SIGTTOU does not > get raised here, but also ensures that if SIGTTOU is sent to the shell > for another reason, there is no window where it gets silently ignored. > > Another way to fix this is by not trying to make the shell start a new > process group, or at least not make it the foreground process group. > Most other shells appear to not try to do this. This patch implements the blocking of SIGTTOU (and everything else) while we call tcsetpgrp. Reported-by: Steffen Nurpmeso <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * jobs: Always reset SIGINT/SIGQUIT handlers On Fri, Jan 08, 2021 at 08:55:41PM +0000, Harald van Dijk wrote: > On 18/05/2018 19:39, Herbert Xu wrote: > > This patch adds basic vfork support for the case of a simple command. > > ... 
@@ -879,17 +892,30 @@ forkchild(struct job *jp, union node *n, int > > mode) > > } > > } > > if (!oldlvl && iflag) { > > - setsignal(SIGINT); > > - setsignal(SIGQUIT); > > + if (mode != FORK_BG) { > > + setsignal(SIGINT); > > + setsignal(SIGQUIT); > > + } > > setsignal(SIGTERM); > > } > > + > > + if (lvforked) > > + return; > > + > > for (jp = curjob; jp; jp = jp->prev_job) > > freejob(jp); > > } > > This leaves SIGQUIT ignored in background jobs in interactive shells. > > ENV= dash -ic 'dash -c "kill -QUIT \$\$; echo huh" & wait' > > As of dash 0.5.11, this prints "huh". Before, the subprocess process killed > itself before it could print anything. Other shells do not leave SIGQUIT > ignored. > > (In a few other shells, this also prints "huh", but in those other shells, > that is because the inner shell chooses to ignore SIGQUIT, not because the > outer shell leaves it ignored.) Thanks for catching this. I have no idea how that got in there and it makes no sense whatsoever. This patch removes the if conditional. Fixes: e94a964 ("eval: Add vfork support") Reported-by: Harald van Dijk <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * eval: Do not cache value of eflag in evaltree Patrick Brünn <[email protected]> wrote: > > Since we are migrating to Debian bullseye, we discovered a new behavior > with our scripts, which look like this: >>#!/bin/sh >>cleanup() { >> set +e^M >> rmdir "" >>} >>set -eu >>trap 'cleanup' EXIT INT TERM >>echo 'Hello world!' > > With old dash v0.5.10.2 this script would return 0 as we expected it. > But since commit 62cf695 it returns > the last exit code of our cleanup function. > Reverting that commit gives a merge conflict, but it seems to fix _our_ > problem. As that topic appears too complex to us I want to ask the > experts here: > > Is this change in behavior intended, by dash? > > Our workaround at the moment would be: >>trap 'cleanup || true' EXIT INT TERM Thanks for the report. 
This is actually a fairly old bug with set -e that's just been exposed by the exit status change. What's really happening is that cleanup itself is triggering a set -e exit incorrectly because evaltree cached the value of eflag prior to the function call. This patch should fix the problem. Reported-by: Patrick Brünn <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Tested-by: Patrick Brünn <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * shell: Call CHECK_DECL on stat64 On macOS it is possible to find stat64 at link-time but not at compile-time. To make the build process more robust we should check for the header file as well as the library. Reported-by: Saagar Jha <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * parser: Fix VSLENGTH parsing with trailing garbage On Sat, Jun 19, 2021 at 02:44:46PM +0200, Denys Vlasenko wrote: > > CTLVAR and CTLBACKQ are not properly handled if encountered > inside {$#...}. Testcase: > > dash -c "`printf 'echo ${#1\x82}'`" 00 111 222 > > It should execute "echo ${#1 <byte 0x82> }" and thus print "3" > (the length of $1, which is "111"). > > Instead, it segfaults. > > (Ideally, it should fail since "1 <byte 0x82>" is not a valid > variable name, but currently dash accepts e.g. "${#1abc}" > as if it is "${#1}bc". A separate, less serious bug...). In fact these two bugs are one and the same. This patch fixes both by detecting the invalid substitution and not emitting it into the node tree. Incidentally this reveals a bug in how we parse ${#10} that got introduced recently, which is also fixed here. Reported-by: Denys Vlasenko <[email protected]> Fixes: 7710a92 ("parser: Only accept single-digit parameter...") Signed-off-by: Herbert Xu <[email protected]> * input: Remove special case for unget EOF Commit 17db43b (input: Allow two consecutive calls to pungetc) ensures that EOF is handled like any other character with respect to unget. 
As a result it's possible to remove the special case for unget of EOF in preadbuffer. Signed-off-by: Ron Yorston <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * expand: Always quote caret when using fnmatch This patch forces ^ to be a literal when we use fnmatch. In order to allow for the extra space to quote the caret, the function _rmescapes will allocate up to twice the memory if the flag RMESCAPE_GLOB is set. Fixes: 7638476 ("shell: Enable fnmatch/glob by default") Reported-by: Christoph Anton Mitterer <[email protected]> Suggested-by: Harald van Dijk <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * expand: Add ifsfree to expand to fix a logic error that causes a buffer over-read On Mon, Jun 20, 2022 at 02:27:10PM -0400, Alex Gorinson wrote: > Due to a logic error in the ifsbreakup function in expand.c if a > heredoc and normal command is run one after the other by means of a > semi-colon, when the second command drops into ifsbreakup the command > will be evaluated with the ifslastp/ifsfirst struct that was set when > the here doc was evaluated. This results in a buffer over-read that > can leak the program's heap, stack, and arena addresses which can be > used to beat ASLR. > > Steps to Reproduce: > First bug: > cmd args: ~/exampleDir/example> dash > $ M='AAAAAAAAAAAAAAAAA' <note: 17 A's> > $ q00(){ > $ <<000;echo > $ ${D?$M$M$M$M$M$M} <note: 6 $M's> > $ 000 > $ } > $ q00 <note: After the q00 is typed in, the leak > should be echo'd out; this works with ash, busybox ash, and dash and > with all option args.> > > Patch: > Adding the following to expand.c will fix both bugs in one go. > (Thank you to Harald van Dijk and Michael Greenberg for doing the > heavy lifting for this patch!) 
> ========================== > --- a/src/expand.c > +++ b/src/expand.c > @@ -859,6 +859,7 @@ > if (discard) > return -1; > > +ifsfree(); > sh_error("Bad substitution"); > } > > @@ -1739,6 +1740,7 @@ > } else > msg = umsg; > } > +ifsfree(); > sh_error("%.*s: %s%s", end - var - 1, var, msg, tail); > } > ========================== Thanks for the report! I think it's better to add the ifsfree() call to the exception handling path as other sh_error calls may trigger this too. Reported-by: Alex Gorinson <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * eval: Always set exitstatus in evaltree There is no harm in setting exitstatus unconditionally in evaltree. Signed-off-by: Herbert Xu <[email protected]> * eval: Check eflag after redirection error > > This is a POSIX violation, and quite a grave one at that: > set -e is oft[1] used to guard against precisely this type of error! > > The same happens if set -e is executed. > > All quotes POSIX.1, Issue 7, TC2: > sh, OPTIONS: > > The -a, -b, -C, -e, -f, -m, -n, -o option, -u, -v, and -x options > > are described as part of the set utility in Special Built-In > > Utilities. > > set, DESCRIPTION, -e: > > When this option is on, when any command fails (for any of the > > reasons listed in Consequences of Shell Errors or by returning an > > exit status greater than zero), the shell immediately shall exit, as > > if by executing the exit special built-in utility with no arguments, > > with the following exceptions: > > > > 1. The failure of any individual command in a multi-command pipeline > > shall not cause the shell to exit. Only the failure of the > > pipeline itself shall be considered. > > 2. The -e setting shall be ignored when executing the compound list > > following the while, until, if, or elif reserved word, a pipeline > > beginning with the ! reserved word, or any command of an AND-OR > > list other than the last. > > 3. 
If the exit status of a compound command other than a subshell > > command was the result of a failure while -e was being ignored, > > then -e shall not apply to this command. > > XCU, 2.9.4: Shell Command Language, Shell Commands, Compound Commands: > The while Loop: > > The format of the while loop is as follows: > > > > while compound-list-1 > > do > > compound-list-2 > > done > (until is equivalent). > The if Conditional Construct: > > The format for the if construct is as follows: > > > > if compound-list > > then > > compound-list > > [elif compound-list > > then > > compound-list] ... > > [else > > compound-list] > > fi > > It follows, therefore, that > * Exception 1. does not apply as there is no pipeline > * Exception 2. does not apply, as the redirection does /not/ follow > "while" or "if" directly and is /not/ part of the conditional > compound-list > * in the "for" case, there is no such provision, so this is likely not > a confusion w.r.t. the conditional compound-lists > * Exception 3. does not apply as -e was not being ignored while the > compound commands were being executed (indeed, the compound commands > do not run at all, as evidenced by the program terminating) > > [1]: https://salsa.debian.org/glibc-team/glibc/-/merge_requests/6#note_329899 > ----- End forwarded message ----- Yes we should check the exit status after redirections. Reported-by: наб <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * parser: Add VSBIT to ensure subtype is never zero Harald van Dijk <[email protected]> wrote: > On 21/11/2022 13:08, Harald van Dijk wrote: >> On 21/11/2022 02:38, Christoph Anton Mitterer wrote: >>> reject_filtered_cmd() >>> { >>> reject_and_die "disallowed command${restrict_path_list:+ >>> (restrict-path: \"${restrict_path_list//|/\", \"}\")}" >>> } >>> >>> reject_filtered_cmd >>[...] >> This should either result in the ${...//...} being skipped, or the "Bad >> substitution" error. 
Currently, what happens instead is it attempts, but >> fails, to skip the ${...//...}. > > The reason it fails is because the word is cut off. > > Variable substitutions are encoded as a CTLVAR special character, > followed by a byte indicating the type of substitution, followed by the > rest of the substitution data. The type of substitution is the VSNORMAL, > VSMINUS, etc. seen in parser.h. An invalid substitution is encoded as a > value of 0. > > When we define a function, we clone the function body in order to > preserve it. Cloning the function body is done by cloning each node. > Cloning a "word" node (NARG) involves copying the characters that make > up the word up to and including the terminating null byte. > > These two interact badly. The invalid substitution is seen as > terminating the word, the rest of the word is not copied, but the > expansion code does not have any way of seeing that anything got cut off > and happily continues attempting to process the rest of the word. > > If dash decides to issue an error in this case, this is not a problem: > the null byte is guaranteed to be copied, and if processing is > guaranteed to stop if a null byte is encountered, everything works out. > > If dash decides to not issue an error in this case, the encoding of bad > substitutions needs to change to a non-null byte. It appears that if we > set the byte to VSNUL, the expansion logic is already able to handle it, > but I have not tested this extensively. Thanks for the analysis Harald! This patch does basically what you've described except it uses a new bit to avoid any confusion with a genuine VSNUL. 
Fixes: 3df3edd ("[PARSER] Report substition errors at...") Reported-by: Christoph Anton Mitterer <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Cheers, Signed-off-by: Herbert Xu <[email protected]> * eval: Test evalskip before flipping status for NNOT On Tue, Dec 06, 2022 at 10:15:03AM +0000, Harald van Dijk wrote: > > There is a long-standing bug that may or may not be harder to fix if this > patch goes in, depending on how you want to fix it. Here's a script that > already fails on current dash. > > f() { > if ! return 0 > then : > fi > } > f > > This should return 0, and does return 0 in bash and ksh (and almost all > shells), but returns 1 in dash. > > There are a few possible ways of fixing it. Some of them rely on continuing > to conditionally set exitstatus. This can be fixed simply by testing evalskip prior to flipping the status. Reported-by: Harald van Dijk <[email protected]> Signed-off-by: Herbert Xu <[email protected]> * Release 0.5.12. --------- Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Nikolai Merinov <[email protected]> Signed-off-by: Denys Vlasenko <[email protected]> Signed-off-by: Michael Greenberg <[email protected]> Signed-off-by: Jeroen Roovers <[email protected]> Signed-off-by: Mike Gilbert <[email protected]> Signed-off-by: Martin Michlmayr <[email protected]> Signed-off-by: Bjarni Ingi Gislason <[email protected]> Signed-off-by: Ron Yorston <[email protected]> Co-authored-by: Herbert Xu <[email protected]> Co-authored-by: Nikolai Merinov <[email protected]> Co-authored-by: Fangrui Song <[email protected]> Co-authored-by: Denys Vlasenko <[email protected]> Co-authored-by: Jeroen Roovers <[email protected]> Co-authored-by: Martin Michlmayr <[email protected]> Co-authored-by: Bjarni Ingi Gislason <[email protected]> Co-authored-by: C. McEnroe <[email protected]> Co-authored-by: Ron Yorston <[email protected]>
Parsing and unparsing have super-linear runtime, largely because OCaml list append and string concatenation are not optimized.
Parsing: parse_tilde quadratic (/cubic?) behavior
(real/sys runtime omitted for brevity)
Notes:
The recursive case uses an expensive list append operation:
| c::s' -> parse_tilde (acc @ [c]) s'
It's not obvious to me why the runtime seems to be cubic (rather than quadratic), though.
Parsing: to_assign cubic (?) behavior
to_assign also has an expensive list append operation in ast.ml:
| C c :: a -> to_assign (v @ [c]) a
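
Both match arms above append a single element on the right of the accumulator, and `@` copies its entire left operand each time. A common remedy is to cons onto the front and reverse once at the end. The sketch below is only an illustration of that pattern: `take_prefix_slow` and `take_prefix_fast` are hypothetical stand-ins that collect characters up to the first `/`, not the actual `parse_tilde` or `to_assign` from ast.ml.

```ocaml
(* Hypothetical stand-in for the accumulate-until-delimiter loops in ast.ml:
   collect characters up to the first '/' and return (prefix, rest). *)

(* Quadratic: [acc @ [c]] copies all of [acc] on every step. *)
let rec take_prefix_slow (acc : char list) (s : char list)
    : char list * char list =
  match s with
  | [] -> (acc, [])
  | '/' :: _ -> (acc, s)
  | c :: s' -> take_prefix_slow (acc @ [c]) s'

(* Linear: cons onto the front (O(1) per step), reverse once when done. *)
let take_prefix_fast (s : char list) : char list * char list =
  let rec go rev_acc s =
    match s with
    | [] -> (List.rev rev_acc, [])
    | '/' :: _ -> (List.rev rev_acc, s)
    | c :: s' -> go (c :: rev_acc) s'
  in
  go [] s
```

The two functions return the same result; only the cost of building the prefix differs. The helper `go` is also tail-recursive, so it does not grow the stack with the input length.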
Unparsing: ^ string concatenation considered harmful
OCaml's "^" string operator is not optimized; concatenating n strings one at a time can take O(n^2) time, because each "^" allocates a new string and copies both operands (https://discuss.ocaml.org/t/whats-the-fastest-way-to-concatenate-strings/2616/7). This is arguably a compiler issue; for example, CPython optimizes for common cases (https://mail.python.org/pipermail/python-dev/2013-February/124031.html).
ast.ml's unparsing is essentially a series of "^" operations, hence everything is going to have a worst-case runtime that's super-linear.
Here's an example of a long pipeline, showing quadratic runtime for json_to_shell:
I also tried removing the Ast.to_string operation in json_to_shell.ml, to show that the JSON deserialization by itself is fast (linear); thus, the quadratic runtime is due to the libdash core.
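
One standard way to avoid the repeated copying is to write into a `Buffer` from the OCaml standard library and extract the string once at the end. The sketch below assumes a hypothetical unparser that only joins command strings with `" | "`; `join_slow` and `join_buffer` are illustrative names and do not come from ast.ml.

```ocaml
(* Hypothetical stand-in for an unparser: join commands with " | ". *)

(* Quadratic in the output size: every [^] allocates a fresh string
   and copies everything accumulated so far. *)
let join_slow (cmds : string list) : string =
  match cmds with
  | [] -> ""
  | first :: rest ->
    List.fold_left (fun acc c -> acc ^ " | " ^ c) first rest

(* Amortized linear: the buffer grows as needed and each piece is
   copied into it once; the final string is built by one more copy. *)
let join_buffer (cmds : string list) : string =
  let buf = Buffer.create 256 in
  List.iteri
    (fun i c ->
      if i > 0 then Buffer.add_string buf " | ";
      Buffer.add_string buf c)
    cmds;
  Buffer.contents buf
```

For a plain join like this one, `String.concat " | " cmds` gives the same result in a single call. A `Buffer` is the more general tool when the output is produced by a recursive walk over an AST.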
Unparsing: fresh_marker for heredocs is slow on adversarial inputs
fresh_marker tries increasingly long variants of the marker (EOF, EOFF, EOFFF, ...) until it finds one that is not contained in the heredoc. This is slow for adversarial inputs that deliberately use all those markers:
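
A possible single-pass alternative, sketched under two assumptions: markers have the form `EO` followed by one or more `F`s, and "not contained" means "not a substring of the heredoc". If the longest run of consecutive `F` characters in the heredoc has length k, then a marker with k+1 `F`s cannot occur anywhere in it. The names `longest_f_run` and `fresh_marker_linear` are hypothetical and are not taken from libdash.

```ocaml
(* Length of the longest run of consecutive 'F' characters in [s].
   One pass over the string. *)
let longest_f_run (s : string) : int =
  let best = ref 0 and cur = ref 0 in
  String.iter
    (fun c ->
      if c = 'F' then begin
        incr cur;
        if !cur > !best then best := !cur
      end else cur := 0)
    s;
  !best

(* A marker with one more 'F' than any run in the heredoc. If it were a
   substring of the heredoc, the heredoc would contain a longer run of
   'F's than [longest_f_run] reported, which is impossible. *)
let fresh_marker_linear (heredoc : string) : string =
  "EO" ^ String.make (longest_f_run heredoc + 1) 'F'
```

This may pick a longer marker than strictly necessary, since it counts every run of `F`s and not only those preceded by `EO`. In exchange, it scans the heredoc once regardless of its contents.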