Delay the crash system command
One of the HanaSR tests crashes a cluster node running HANA.
The crash command is executed through an ssh channel. The problem is that, as soon as the
system crashes, the ssh connection is interrupted, leaving the ssh client blocked.
The idea is to compose the remotely executed command from a sleep followed by the crash
and run both in the background. This gives the ssh client time to close the
session before the crash happens.
Remove the timeout=0 behavior and stop forwarding the whole content of args
to run_ssh_command.
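
The delay-then-act pattern described in this message can be tried locally without crashing anything. The following is a minimal sketch, assuming a POSIX sh. The helper name `delayed_run` and the marker file are hypothetical; `touch` stands in for the real `echo b > /proc/sysrq-trigger` action, and no `sudo` or ssh is involved.

```shell
#!/bin/sh
# Hypothetical helper: run action "$2" after "$1" seconds, detached from the caller.
delayed_run() {
    delay="$1"
    action="$2"
    # Redirect stdio so the background job does not hold the caller's
    # terminal (or, over ssh, the session channel) open.
    sh -c "sleep $delay; $action" >/dev/null 2>&1 </dev/null &
}

marker="${TMPDIR:-/tmp}/delayed_run_demo.$$"
rm -f "$marker"
delayed_run 1 "touch '$marker'"

# The call returns at once, so the marker does not exist yet.
if [ -e "$marker" ]; then early=yes; else early=no; fi

# After the delay has elapsed, the action has run.
sleep 3
if [ -e "$marker" ]; then late=yes; else late=no; fi

echo "before delay: $early, after delay: $late"
rm -f "$marker"
```

In the commit itself the same role is played by `sudo -b sh -c "sleep 5; ..."`: the caller gets control back immediately and has a few seconds to close the session before the action fires.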
mpagot committed Sep 13, 2024
1 parent c2a84bf commit 130a755
Showing 1 changed file with 21 additions and 11 deletions.
32 changes: 21 additions & 11 deletions lib/sles4sap_publiccloud.pm
@@ -353,9 +353,9 @@ sub stop_hana {
     $args{method} //= 'stop';
     my $timeout = bmwqemu::scale_timeout($args{timeout} // 300);
     my %commands = (
-        stop => "HDB stop",
-        kill => "HDB kill -x",
-        crash => "echo b > /proc/sysrq-trigger &"
+        stop => 'HDB stop',
+        kill => 'HDB kill -x',
+        crash => 'sudo -b sh -c "sleep 5; echo b > /proc/sysrq-trigger"'
     );
     croak("HANA stop method '$args{method}' unknown.") unless $commands{$args{method}};
@@ -368,14 +368,24 @@ sub stop_hana {
     if ($args{method} eq "crash") {
         # Crash needs to be executed as root and wait for host reboot
         $self->{my_instance}->wait_for_ssh(timeout => $timeout);
-        $self->{my_instance}->run_ssh_command(cmd => "sudo su -c sync", timeout => "0", %args);
-        # Close SSH mux file before Crash test
-        $self->{my_instance}->run_ssh_command(cmd => " ", timeout => "0", ssh_opts => "-O exit");
-        $self->{my_instance}->run_ssh_command(cmd => 'sudo su -c "' . $cmd . '"',
-            timeout => "0",
-            # Try only extending ssh_opts
-            ssh_opts => "-o ServerAliveInterval=2 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o LogLevel=ERROR",
-            %args);
+        $self->{my_instance}->run_ssh_command(cmd => "sudo su -c sync", timeout => $timeout);
+        $self->{my_instance}->run_ssh_command(
+            cmd => $cmd,
+            # This timeout ensures that run_ssh_command itself completes in a reasonable amount of time.
+            # It is not about how long the crash takes remotely, as the crash is
+            # configured to run in the background.
+            # So, in theory, run_ssh_command is expected to return immediately.
+            # Also consider that run_ssh_command internally uses this value for
+            # two different guard mechanisms.
+            timeout => 10,
+            # Try only extending ssh_opts. -fn is needed to be able to detach, leaving the crash to run in the background
+            ssh_opts => "-fn -o ServerAliveInterval=2 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o LogLevel=DEBUG3");
+        # The crash command runs a sleep 5 remotely. As it is also put in the background,
+        # run_ssh_command returns immediately, even before the remote system executes the crash command.
+        # So sleep here to give the remote system time to execute the crash procedure.
+        sleep 50;
         # It is better to wait till ssh disappear
         record_info("Wait ssh disappear start");
         my $out = $self->{my_instance}->wait_for_ssh(timeout => 60, wait_stop => 1);
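
The last step of the hunk waits for ssh on the crashed node to stop responding. How `wait_for_ssh(wait_stop => 1)` does this is not shown in this commit, so the following is only a generic sketch of "poll until a probe starts failing, with a timeout", assuming a POSIX sh. The helper name `wait_gone` is hypothetical, and a local marker file stands in for the reachable ssh service.

```shell
#!/bin/sh
# Hypothetical helper: poll until the probe command starts failing.
# usage: wait_gone TIMEOUT_SECONDS INTERVAL_SECONDS PROBE [ARGS...]
# Returns 0 once the probe fails, 1 if it still succeeds when the timeout is reached.
wait_gone() {
    timeout="$1"; interval="$2"; shift 2
    elapsed=0
    while "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Demo: the marker file plays the role of "the service is still reachable".
marker="${TMPDIR:-/tmp}/wait_gone_demo.$$"
touch "$marker"
( sleep 1; rm -f "$marker" ) &

if wait_gone 10 1 test -e "$marker"; then gone=yes; else gone=no; fi
echo "service disappeared: $gone"
```

Against a real host the probe could be any command that succeeds only while port 22 answers, for example `nc -z "$host" 22` where `nc` is installed (an assumption, not something this commit uses). Polling like this bounds the wait by a timeout and reports whether the service actually went away, instead of relying only on a fixed `sleep`.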