[NativeAOT] correctly initialize CONTEXT before failing fast #81010

Merged: 7 commits into dotnet:main from austin/FailFastContext on Feb 6, 2023

Contributor

@AustinWise AustinWise commented Jan 23, 2023

Short Explanation

The CONTEXT structure is not being initialized properly before RaiseFailFastException is called.

Long Explanation

Consider this program:

using System;
using System.Globalization;

static class Program
{
    static void FillStack(byte value)
    {
        Span<byte> bytes = stackalloc byte[1024 * 16];
        bytes.Fill(value);
    }

    static void Main(string[] args)
    {
        if (args.Length != 0)
        {
            byte fillValue = byte.Parse(args[0], NumberStyles.HexNumber);
            AppDomain.CurrentDomain.UnhandledException += (_,_) => FillStack(fillValue);
        }
        throw new Exception();
    }
}

On NativeAOT on Windows, the last act of a process that throws an unhandled exception is to call RaiseFailFastException. The instruction pointer and CONTEXT structure passed to RaiseFailFastException point at the function that raised the unhandled exception. If a debugger such as Visual Studio is attached when RaiseFailFastException is called, you will see a call stack pointing to the faulting function.

Depending on what argument you pass to the program, you get different call stacks.

If you pass 0 you get:

 	ntdll.dll!NtRaiseException()	Unknown
 	KernelBase.dll!RaiseFailFastException()	Unknown
 	reproNative.exe!S_P_CoreLib_Interop_Kernel32__RaiseFailFastException_0()	Unknown
>	reproNative.exe!S_P_CoreLib_Interop_Kernel32__RaiseFailFastException() Line 46	Unknown
 	reproNative.exe!S_P_CoreLib_System_RuntimeExceptionHelpers__FailFast_1() Line 279	Unknown
 	reproNative.exe!S_P_CoreLib_System_RuntimeExceptionHelpers__RuntimeFailFast() Line 219	Unknown
 	reproNative.exe!S_P_CoreLib_System_Runtime_EH__UnhandledExceptionFailFastViaClasslib() Line 205	Unknown
 	reproNative.exe!S_P_CoreLib_System_Runtime_EH__DispatchEx() Line 650	Unknown
 	reproNative.exe!S_P_CoreLib_System_Runtime_EH__RhThrowEx() Line 572	Unknown
 	reproNative.exe!RhpThrowEx() Line 190	Unknown
 	reproNative.exe!repro_Program__Main() Line 24	Unknown
 	reproNative.exe!repro__Module___MainMethodWrapper()	Unknown
 	reproNative.exe!repro__Module___StartupCodeMain()	Unknown
 	reproNative.exe!wmain(int argc, wchar_t * * argv) Line 216	C++
 	reproNative.exe!invoke_main() Line 91	C++
 	reproNative.exe!__scrt_common_main_seh() Line 288	C++
 	reproNative.exe!__scrt_common_main() Line 331	C++
 	reproNative.exe!wmainCRTStartup(void * __formal) Line 17	C++
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

That's less than ideal as it does not directly point at the faulting function.

If you pass ff you get garbage:

 	00000000f2d49ff1()	Unknown
 	000001ed07011578()	Unknown
 	000001ed07011548()	Unknown
>	reproNative.exe!repro_Program___c__DisplayClass1_0___Main_b__0() Line 22	Unknown
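
A plausible reading of the two results above is that the CONTEXT handed to RaiseFailFastException is only partly filled in, so the remaining fields keep whatever bytes the UnhandledException handler left on the stack. The following C++ sketch models that effect portably. FakeContext, BuildOnStaleBytes, and BuildZeroed are hypothetical names used for illustration, and the struct is not the real Windows CONTEXT layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a register context; NOT the Windows CONTEXT layout.
struct FakeContext {
    std::uint64_t rip;
    std::uint64_t rsp;
    std::uint16_t segCs;
    std::uint16_t segSs;
    std::uint32_t eflags;
};

// Builds a context on top of "stale" memory and sets only rip and rsp,
// so every other field keeps the stale byte pattern.
FakeContext BuildOnStaleBytes(unsigned char staleByte,
                              std::uint64_t rip, std::uint64_t rsp) {
    unsigned char storage[sizeof(FakeContext)];
    std::memset(storage, staleByte, sizeof storage);  // models leftover stack contents

    FakeContext ctx;
    std::memcpy(&ctx, storage, sizeof ctx);           // ctx now holds the stale bytes
    assert(std::memcmp(&ctx, storage, sizeof ctx) == 0);

    ctx.rip = rip;
    ctx.rsp = rsp;
    return ctx;
}

// Zeroes the whole context first, then sets the known fields.
FakeContext BuildZeroed(std::uint64_t rip, std::uint64_t rsp) {
    FakeContext ctx;
    std::memset(&ctx, 0, sizeof ctx);
    ctx.rip = rip;
    ctx.rsp = rsp;
    return ctx;
}
```

With a stale byte of 0x00 the unset fields happen to look harmless, while 0xff turns them into nonsense, which mirrors the difference between passing 0 and ff to the repro program.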

After this PR is merged, the call stack correctly points at the faulting function:

>	reproNative.exe!repro_Program__Main() Line 24	Unknown
 	reproNative.exe!repro__Module___MainMethodWrapper()	Unknown
 	reproNative.exe!repro__Module___StartupCodeMain()	Unknown
 	reproNative.exe!wmain(int argc, wchar_t * * argv) Line 216	C++
 	reproNative.exe!invoke_main() Line 91	C++
 	reproNative.exe!__scrt_common_main_seh() Line 288	C++
 	reproNative.exe!__scrt_common_main() Line 331	C++
 	reproNative.exe!wmainCRTStartup(void * __formal) Line 17	C++
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

TODO before merging:

  • AMD64: capture CS, SS, and RFLAGS into PAL_LIMITED_CONTEXT when throwing an exception
  • AMD64: restore CS, SS, and RFLAGS in RhpCopyContextFromExInfo
  • In RhpCopyContextFromExInfo, capture the CS and SS segment registers from the current thread and stuff them into the CONTEXT.
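
The TODO items above can be sketched as a single copy routine: zero the destination, mark which registers are valid, copy what the limited context captured, then fill in CS, SS, and the flags. This is a simplified model rather than the runtime's code. LimitedContext, FullContext, kControlRegistersValid, and CopyLimitedToFull are hypothetical stand-ins, and the real PAL_LIMITED_CONTEXT and CONTEXT have different layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-in for the registers captured at throw time.
struct LimitedContext {
    std::uint64_t ip;
    std::uint64_t sp;
    std::uint64_t fp;
};

// Hypothetical, simplified stand-in for the full OS context.
struct FullContext {
    std::uint32_t contextFlags;
    std::uint32_t eflags;
    std::uint16_t segCs;
    std::uint16_t segSs;
    std::uint64_t rip;
    std::uint64_t rsp;
    std::uint64_t rbp;
    std::uint64_t scratch[4];  // registers the limited context never captured
};

constexpr std::uint32_t kControlRegistersValid = 0x1;  // hypothetical flag bit

// Writes a fully initialized FullContext into the caller's buffer.
void CopyLimitedToFull(void* dest, std::size_t destSize,
                       const LimitedContext& src,
                       std::uint16_t cs, std::uint16_t ss,
                       std::uint32_t eflags) {
    assert(dest != nullptr && destSize >= sizeof(FullContext));

    FullContext ctx;
    std::memset(&ctx, 0, sizeof ctx);           // no field is left undefined
    ctx.contextFlags = kControlRegistersValid;  // tell the consumer what is valid
    ctx.rip = src.ip;
    ctx.rsp = src.sp;
    ctx.rbp = src.fp;
    ctx.segCs = cs;
    ctx.segSs = ss;
    ctx.eflags = eflags;

    std::memcpy(dest, &ctx, sizeof ctx);
}
```

Zeroing before copying means any field the routine does not know about ends up as zero, regardless of what the destination buffer held before.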

@ghost ghost added the community-contribution label Jan 23, 2023
@ghost

ghost commented Jan 23, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

TODO before merging:

  • AMD64: capture CS, SS, and RFLAGS into PAL_LIMITED_CONTEXT when throwing an exception
  • AMD64: restore CS, SS, and RFLAGS in RhpCopyContextFromExInfo

I won't bother with x86 since it is not currently supported by NativeAOT.

Author: AustinWise
Assignees: -
Labels:

community-contribution, area-NativeAOT-coreclr

Milestone: -

@jkotas
Member

jkotas commented Jan 23, 2023

capture CS, SS, and RFLAGS into PAL_LIMITED_CONTEXT when throwing an exception

I do not think we want to grow PAL_LIMITED_CONTEXT for this. We can capture these registers from the current thread by calling GetThreadContext or a similar API.

@AustinWise AustinWise marked this pull request as ready for review January 27, 2023 18:21
@AustinWise
Contributor Author

I did not bother getting the segment registers when running on non-Windows, as the only use of the CONTEXT downstream of RhpCopyContextFromExInfo is on Windows.

@VSadov
Member

VSadov commented Jan 27, 2023

I did not bother getting the segment registers when running on non-Windows, as the only use of the CONTEXT downstream of RhpCopyContextFromExInfo is on Windows.

In that case there is no need for a TODO there. It would be better to add a comment explaining that we do not care about these registers on non-Windows. Maybe zero-initialize them for determinism.
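
A minimal sketch of the approach discussed in this thread: read the segment registers from the current thread where the CONTEXT is actually consumed (Windows on x64) and zero them everywhere else. It assumes RtlCaptureContext as the "similar API" to GetThreadContext. SegmentSnapshot and SnapshotSegments are hypothetical names, and the PR itself may obtain the values differently.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_WIN32) && (defined(_M_X64) || defined(__x86_64__))
#define SKETCH_CAN_CAPTURE 1
#include <windows.h>
#else
#define SKETCH_CAN_CAPTURE 0
#endif

// Hypothetical result type: the two segment registers plus a flag saying
// whether they were really read from the current thread.
struct SegmentSnapshot {
    bool captured;
    std::uint16_t cs;
    std::uint16_t ss;
};

SegmentSnapshot SnapshotSegments() {
    SegmentSnapshot snap{};  // captured = false, cs = 0, ss = 0
#if SKETCH_CAN_CAPTURE
    CONTEXT ctx{};
    RtlCaptureContext(&ctx);  // fills in the calling thread's registers
    snap.captured = true;
    snap.cs = ctx.SegCs;
    snap.ss = ctx.SegSs;
    // User-mode code runs at privilege level 3, so the low two bits of CS are set.
    assert((snap.cs & 3) == 3);
#endif
    // On other platforms the context is not consumed downstream, so the
    // zeros are kept purely for determinism.
    return snap;
}
```

Keeping the platform check in one helper leaves a single place for the comment that explains why non-Windows builds do not capture these registers.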

Member

@VSadov VSadov left a comment

LGTM. Thanks!

@AustinWise
Contributor Author

FYI, this problem reproduces with .NET 7.0.2 on Windows 11 22H2. So it should be considered for servicing to the .NET 7 branch.

Member

@janvorli janvorli left a comment

LGTM, thank you!

@AustinWise
Contributor Author

The test failure looks like #75244. It is the "coreclr Pri0 Runtime Tests Run windows x86 checked" job that failed, so presumably it is unrelated to this NativeAOT change.

@jkotas jkotas merged commit 29b6a51 into dotnet:main Feb 6, 2023
@jkotas
Member

jkotas commented Feb 6, 2023

FYI, this problem reproduces with .NET 7.0.2 on Windows 11 22H2. So it should be considered for servicing to the .NET 7 branch.

I am not able to reproduce it. What is the program that you were able to reproduce it with?

@AustinWise
Contributor Author

AustinWise commented Feb 6, 2023

FYI, this problem reproduces with .NET 7.0.2 on Windows 11 22H2. So it should be considered for servicing to the .NET 7 branch.

I am not able to reproduce it. What is the program that you were able to reproduce it with?

I made a GitHub repo with the reproduction program and a script that builds and runs:

https://github.com/AustinWise/dotnet-81010

The readme also lists the exact versions of the software used.

@AustinWise AustinWise deleted the austin/FailFastContext branch February 6, 2023 06:06
@jkotas
Member

jkotas commented Feb 8, 2023

Thanks for the great repro!

@jkotas
Member

jkotas commented Feb 8, 2023

/backport to release/7.0

@github-actions
Contributor

github-actions bot commented Feb 8, 2023

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/4127325824

@ghost ghost locked as resolved and limited conversation to collaborators Mar 10, 2023