OOM: scan linux kernel bytecode #624

Closed
small-cat opened this issue May 31, 2023 · 14 comments
Labels
external bug A bug in depending code

Comments

@small-cat

I used wllvm to build kernel 5.10.59 and extracted the bitcode file vmlinux.bc. Then I ran phasar on it:

./phasar-cli -m vmlinux.bc -D ide-solvertest

The process was killed after a while. According to the top command, it needed more than 80 GB of memory on macOS before the OS killed it.

The kernel can be built with clang LTO, which aggregates all the LLVM IR into a single module and optimizes it. So why does phasar need so much memory to read the bitcode and run the analysis?

Is this a bug?

@small-cat
Author

The size of vmlinux.bc is about 387 MB, and phasar can read the LLVM IR successfully:

./phasar-cli -m vmlinux.bc --emit-ir

@small-cat
Author

After debugging with lldb, I found that the program is stuck in the function computeFunctionsAliasSet.

@MMory
Member

MMory commented May 31, 2023

Hi @small-cat,

it's interesting that you intend to analyze the kernel with phasar.

In order to reproduce your issue the following would be helpful:

a) Which phasar version are you using? Which kernel configuration did you choose? What command lines did you use to configure/build the kernel?
b) How much memory does the system have in total? How long did you run the analysis before it crashed?

Your invocation of phasar computes interprocedural whole-program points-to information as a pre-analysis. The compiler does not do this and thus does not need that much memory. Your issue could be related to some other bug, but could also be "just" the OS killing the process due to memory exhaustion.

The challenge for me reproducing this issue might actually be the memory size of my machine, which is below 80GB. I might get a machine with more memory to try myself.

Best regards
Martin

@small-cat
Author

I use phasar v0323 and LLVM 14.0.0. I built the kernel (5.10.59) with allyesconfig:

make ARCH=arm64 LLVM=1 LLVM_IAS=1 allyesconfig

My computer has 32 GB of memory. The process ran for about 8-10 minutes before it was killed.

I checked the dmesg logs and confirmed that it was killed by the OOM killer.

@MMory
Member

MMory commented Jun 2, 2023

What happens if you use a kernel config that includes fewer parts of the kernel? I just failed to build the kernel with clang 14.0.6 and allyesconfig, but not ARCH=arm64 because it seems I don't have clang built with that backend. I'm trying a defconfig build now.

@MMory
Member

MMory commented Jun 2, 2023

With a defconfig build it is also running out of memory for me. The culprit is the points-to analysis; it looks like it is grossly overapproximating and thus needs lots of memory. With tinyconfig it succeeds.

What is your use case? Are you interested in analyzing the whole kernel?

We need to check whether the exploding memory usage is a bug or an unavoidable consequence of the points-to analysis.

@small-cat
Author

Yes, I am trying to analyze the whole kernel. If I trim the kernel down to a few parts, phasar works fine.

@small-cat
Author

The program runs out of memory in computeValuesAliasSet in LLVMAliasSet.cpp. I found that many of the function calls in the LLVM IR are calls to LLVM's own built-in functions. Are these necessary for the analysis, and do they affect the analysis process? I wonder if I can skip these functions and only pay attention to the code (functions) I am interested in, instead of the LLVM IR built-in functions.

@fabianbs96
Member

Hi @small-cat, I could reproduce the error on my system (I compiled with defconfig and ARCH=x86_64, but that should not really matter).

I could boil down the problem to a single alias-query in the LLVMAliasSet::computeFunctionsAliasSet function that I isolated to exclude any interference of phasar:

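#include <cassert>
#include <filesystem>
#include <tuple>

#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/CFLAndersAliasAnalysis.h"
#include "llvm/Analysis/TypeBasedAliasAnalysis.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"

int main(int Argc, const char **Argv) {
  if (Argc < 2 || !std::filesystem::exists(Argv[1]) ||
      std::filesystem::is_directory(Argv[1])) {
    llvm::errs() << "myphasartool\n"
                    "A small PhASAR-based example program\n\n"
                    "Usage: myphasartool path/to/vmlinux.bc\n";
    return 1;
  }

  // Load the bitcode (or textual IR) file into a module.
  llvm::LLVMContext Ctx;
  llvm::SMDiagnostic Diag;
  auto Mod = llvm::parseIRFile(Argv[1], Diag, Ctx);
  if (!Mod) {
    Diag.print(nullptr, llvm::errs());
    return 1;
  }

  // The two values whose aliasing relationship triggers the problem.
  auto Fun = Mod->getFunction("x86_64_start_kernel");
  assert(Fun);

  auto Arg = Fun->getArg(0);
  assert(Arg);

  auto Glob = Mod->getNamedGlobal("pgdir_shift");
  assert(Glob);

  llvm::PassBuilder PB;
  llvm::FunctionAnalysisManager FAM;

  // Register the AAManager before the default analyses so that it uses
  // CFLAndersAA, TypeBasedAA, and BasicAA.
  FAM.registerPass([&] {
    llvm::AAManager AA;
    AA.registerFunctionAnalysis<llvm::CFLAndersAA>();
    AA.registerFunctionAnalysis<llvm::TypeBasedAA>();
    AA.registerFunctionAnalysis<llvm::BasicAA>();
    return AA;
  });

  PB.registerFunctionAnalyses(FAM);

  llvm::FunctionPassManager FPM;
  std::ignore = FPM.run(*Fun, FAM);
  llvm::AAResults &AAR = FAM.getResult<llvm::AAManager>(*Fun);

  // This single alias query is enough to exhaust memory.
  std::ignore = AAR.alias(Arg, Glob);
}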

Just running the above code snippet leads to the same error (you may need to adjust the function/global names). This strongly suggests that the error is indeed within LLVM, in the CFLAndersAA implementation, which unfortunately is no longer maintained by LLVM.
In the code snippet above we try to resolve an aliasing relationship between a function parameter and a global variable. This leads the CFL analysis to recursively analyze the callers of the function (in this case all transitive callers of x86_64_start_kernel), which results in a state explosion that it does not seem to handle properly.
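
To get a feel for what "all transitive callers" means, here is a minimal sketch on a toy reverse call graph. It is not LLVM's CFL implementation; the `CallerMap` type, the `transitiveCallers` helper, and any function names you feed it are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Toy reverse call graph: callee name -> names of its direct callers.
// (Hypothetical type for illustration only.)
using CallerMap = std::map<std::string, std::vector<std::string>>;

// Collects every function that can reach `Start` through a chain of calls.
std::set<std::string> transitiveCallers(const CallerMap &Callers,
                                        const std::string &Start) {
  assert(!Start.empty() && "need a function name to start from");
  std::set<std::string> Seen;
  std::queue<std::string> Work;
  Work.push(Start);
  while (!Work.empty()) {
    std::string Cur = Work.front();
    Work.pop();
    auto It = Callers.find(Cur);
    if (It == Callers.end())
      continue; // nobody calls Cur
    for (const auto &Caller : It->second)
      if (Seen.insert(Caller).second) // first time we see this caller
        Work.push(Caller);
  }
  return Seen;
}
```

An analysis that has to reason about each function in this set, and about the values flowing between them, does correspondingly more work as the set grows.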

In #626 we provide an option to disable the CFL alias analysis; you may want to try this out.
Until #626 is merged, you may also want to try out a different call-graph resolver strategy (if the dataflow analysis that you want to perform does not need alias info). For that you can use the --call-graph-analysis option of the phasar-cli tool to e.g., set the callgraph resolver to rta that does not depend on alias information.

@fabianbs96 added the "external bug" (A bug in depending code) label Jun 6, 2023
@fabianbs96
Member

Small addition: I have tried running phasar with --alias-analysis=cflsteens and it seems to work. However, the precision of the analysis results may then be worse than with the Andersen analysis, and llvm::CFLSteensAA is just as unmaintained by LLVM as llvm::CFLAndersAA, but for this case it seems to work.

Note that we are working on alias/pointer analyses implemented completely within PhASAR, but getting this right is a challenging task that will take quite a while still.

@small-cat
Author

small-cat commented Jun 14, 2023

@fabianbs96 Thank you very much for your reply. I tried call-graph analysis with cha instead of otf (the default) and ran an empty analysis on vmlinux.bc with the following command:

./phasar-cli -m vmlinux.bc -C cha -D ifds-solvertest --entry-points=irq_enter

The program terminated with a core dump. I also tried ifds-uninit and ide-solvertest, and both core dumped too.

I read the paper to understand the principle of the IFDS framework and debugged the program. The bug occurs when computing path edges in the tabulation algorithm, in the propagate() function, but not always. Each time I run the program, the crash location in the backtrace is different. The ESG is too big to analyze the bug, and I have not been able to reproduce it with a small test case so far.

May I send you an email, or does phasar have a forum/community for discussion?

@small-cat
Author

I found the cause of the core dump: a stack overflow. Phasar implements the IFDS/IDE framework recursively, and when it analyzes the LLVM IR of the kernel, the recursion becomes so deep that it overflows the stack. Setting the stack size with ulimit -s unlimited resolves the error on Linux for now.
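
For reference, the limit that ulimit -s controls can also be inspected from inside a program through the POSIX getrlimit/setrlimit calls. The following is only a rough sketch: the helper names `softStackLimitBytes` and `raiseStackLimitToHardMax` are made up, and whether an already running process benefits from raising the limit depends on the platform, so setting it in the shell before launching phasar remains the safer route.

```cpp
#include <sys/resource.h>

#include <cassert>
#include <cstdint>
#include <optional>

// Returns the soft stack limit in bytes, or std::nullopt if the limit is
// unlimited or cannot be queried. (Hypothetical helper.)
std::optional<std::uint64_t> softStackLimitBytes() {
  rlimit Lim{};
  if (getrlimit(RLIMIT_STACK, &Lim) != 0 || Lim.rlim_cur == RLIM_INFINITY)
    return std::nullopt;
  return static_cast<std::uint64_t>(Lim.rlim_cur);
}

// Tries to raise the soft stack limit to the hard limit, which is the most
// an unprivileged process may request. Returns true on success.
// (Hypothetical helper.)
bool raiseStackLimitToHardMax() {
  rlimit Lim{};
  if (getrlimit(RLIMIT_STACK, &Lim) != 0)
    return false;
  assert(Lim.rlim_cur <= Lim.rlim_max && "soft limit never exceeds hard limit");
  Lim.rlim_cur = Lim.rlim_max;
  return setrlimit(RLIMIT_STACK, &Lim) == 0;
}
```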

@fabianbs96
Member

Hi @small-cat,
sorry for the late reply. You are right, PhASAR's IDESolver is highly recursive in the version that you use. In the meantime, we have modified the solver to work in an iterative way instead. So, if you can upgrade, you can already make use of it. Then you don't need to increase the stack limit anymore.
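
To illustrate the difference between the two styles, here is a minimal sketch on a toy graph. It is not PhASAR's solver code; the `Graph` type and both function names are assumptions made for this example. The recursive variant uses one stack frame per node on the current path, while the iterative variant keeps pending work in a heap-allocated worklist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy graph: Succ[N] lists the successors of node N. This is a stand-in for
// the edges a solver propagates along, not PhASAR's data structures.
using Graph = std::vector<std::vector<std::size_t>>;

// Recursive propagation: the call depth grows with the length of the path,
// so a very long chain can exhaust the call stack.
void propagateRecursive(const Graph &Succ, std::size_t Node,
                        std::vector<bool> &Reached) {
  if (Reached[Node])
    return;
  Reached[Node] = true;
  for (std::size_t Next : Succ[Node])
    propagateRecursive(Succ, Next, Reached);
}

// Iterative propagation: pending nodes live in a worklist on the heap, so
// the call depth stays constant regardless of the graph's shape.
std::vector<bool> propagateIterative(const Graph &Succ, std::size_t Start) {
  assert(Start < Succ.size() && "start node must exist");
  std::vector<bool> Reached(Succ.size(), false);
  std::vector<std::size_t> Worklist{Start};
  Reached[Start] = true;
  while (!Worklist.empty()) {
    std::size_t Node = Worklist.back();
    Worklist.pop_back();
    for (std::size_t Next : Succ[Node]) {
      if (!Reached[Next]) {
        Reached[Next] = true;
        Worklist.push_back(Next);
      }
    }
  }
  return Reached;
}
```

Both variants compute the same reachable set; only the iterative one is safe to run on a chain with hundreds of thousands of nodes under a default stack limit.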

@small-cat
Author

@fabianbs96 Sorry, I did not notice the commits after v0323. I will upgrade and give it a try. Thanks very much.
