OOM: scan linux kernel bytecode #624

small-cat · 2023-05-31T03:35:10Z

I used wllvm to build kernel 5.10.59 and extracted the bitcode file vmlinux.bc. Then I ran phasar:

./phasar-cli -m vmlinux.bc -D ide-solvertest

The process was killed after a while. In top I could see that it needed a huge amount of memory, 80G+ on macOS, before the OS killed it.

The kernel can be built with clang LTO, which aggregates all the LLVM IR into a single module and optimizes it as a whole. So why does phasar need so much memory to read the bitcode and run the analysis?

Is this a bug?

small-cat · 2023-05-31T03:40:29Z

vmlinux.bc is about 387M, and phasar can read the LLVM IR successfully:

./phasar-cli -m vmlinux.bc --emit-ir

small-cat · 2023-05-31T08:53:10Z

After debugging with lldb, I found that the program trapped in the function computeFunctionsAliasSet.

MMory · 2023-05-31T12:54:08Z

Hi @small-cat,

it's interesting that you intend to analyze the kernel with phasar.

In order to reproduce your issue the following would be helpful:

a) Which phasar version are you using? Which kernel configuration did you choose? What command lines did you use to configure/build the kernel?
b) How much memory does the system have in total? How long did you run the analysis before it crashed?

Your invocation of phasar computes interprocedural whole-program points-to information as a pre-analysis. The compiler does not do this and thus does not need that much memory. Your issue could be related to some other bug, but it could also "just" be the OS killing the process due to memory exhaustion.

The challenge for me in reproducing this issue might actually be the memory size of my machine, which is below 80GB. I might get a machine with more memory to try it myself.

Best regards
Martin

small-cat · 2023-06-01T03:24:48Z

I use phasar v0323 and LLVM 14.0.0, and I used allyesconfig to build the kernel (5.10.59):

make ARCH=arm64 LLVM=1 LLVM_IAS=1 allyesconfig

My computer has 32GB of memory. The process ran for about 8-10 min before it was killed.

I checked the logs from dmesg and confirmed that it was killed by the OOM killer.

MMory · 2023-06-02T11:13:58Z

What happens if you use a kernel config that includes fewer parts of the kernel? I just failed to build the kernel with clang 14.0.6 and allyesconfig (without ARCH=arm64, because it seems my clang is not built with that backend). I'm trying a defconfig build now.

MMory · 2023-06-02T12:31:56Z

With a defconfig build it also runs out of memory for me. The culprit is the points-to analysis: it looks like it is grossly overapproximating and therefore needs lots of memory. With tinyconfig it succeeds.
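To see why an overapproximating points-to analysis can reach tens of gigabytes, a rough model helps: if every pointer keeps an explicit points-to set, memory grows with the number of pointers times the average set size. The sketch below is a back-of-the-envelope estimate under that assumption only; the function name estimatePointsToBytes and the numbers in the usage note are hypothetical, not measured on vmlinux.bc and not taken from PhASAR's actual data structures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model: one explicit points-to set per pointer, each entry
// costing BytesPerEntry. Overapproximation inflates AvgSetSize, and the total
// grows proportionally with it.
std::uint64_t estimatePointsToBytes(std::uint64_t NumPointers,
                                    std::uint64_t AvgSetSize,
                                    std::uint64_t BytesPerEntry) {
  return NumPointers * AvgSetSize * BytesPerEntry;
}
```

For example, a hypothetical 1,000,000 pointers whose sets each hold 10,000 entries at 8 bytes per entry already come to 80,000,000,000 bytes (about 80 GB), which is why set sizes matter far more than the 387M size of the bitcode itself.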

What is your use case? Are you interested in analyzing the whole kernel?

We need to investigate whether the exploding memory usage is a bug or an unavoidable consequence of the points-to analysis.

small-cat · 2023-06-05T02:35:42Z

Yes, I am trying to analyze the whole kernel. If I trim the kernel down to a few parts, phasar works fine.

small-cat · 2023-06-06T11:56:38Z

The program runs out of memory in computeValuesAliasSet in LLVMAliasSet.cpp. I found that many of the function calls in the LLVM IR are calls to LLVM APIs. Are these LLVM APIs necessary for the analysis, and do they affect it? I wonder if I can skip these functions and only pay attention to the code (functions) we care about, instead of the LLVM IR builtin functions.
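One way to express "skip the LLVM builtins" is to filter on the intrinsic name prefix: LLVM reserves names beginning with "llvm." for intrinsics, and in a real LLVM-based tool llvm::Function::isIntrinsic() performs this check. The sketch below works on plain function names so that it stays self-contained; filterOutIntrinsics is a hypothetical helper, not a PhASAR option. Note that skipping intrinsics is not automatically sound for an alias analysis: llvm.memcpy, for instance, copies memory that may contain pointers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True if Name carries LLVM's reserved intrinsic prefix "llvm.",
// e.g. "llvm.memcpy.p0i8.p0i8.i64" or "llvm.dbg.value".
bool isLLVMIntrinsicName(const std::string &Name) {
  return Name.rfind("llvm.", 0) == 0; // prefix check starting at index 0
}

// Hypothetical pre-filter: keep only non-intrinsic functions, preserving order.
std::vector<std::string>
filterOutIntrinsics(const std::vector<std::string> &Names) {
  std::vector<std::string> Kept;
  for (const auto &Name : Names) {
    if (!isLLVMIntrinsicName(Name)) {
      Kept.push_back(Name);
    }
  }
  return Kept;
}
```

Applied to {"irq_enter", "llvm.memcpy.p0i8.p0i8.i64", "llvm.dbg.value", "x86_64_start_kernel"}, the filter keeps only irq_enter and x86_64_start_kernel.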

fabianbs96 · 2023-06-06T17:56:29Z

Hi @small-cat, I could reproduce the error on my system (I compiled with defconfig and ARCH=x86_64, but that should not really matter).

I boiled the problem down to a single alias query in the LLVMAliasSet::computeFunctionsAliasSet function, which I isolated to exclude any interference from phasar:

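// Written against LLVM 14 (the version used in this thread), which still ships CFLAndersAA.
#include <cassert>
#include <filesystem>
#include <tuple>

#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/CFLAndersAliasAnalysis.h"
#include "llvm/Analysis/TypeBasedAliasAnalysis.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/raw_ostream.h"

int main(int Argc, const char **Argv) {
  // Expect one argument: an existing bitcode file (not a directory).
  if (Argc < 2 || !std::filesystem::exists(Argv[1]) ||
      std::filesystem::is_directory(Argv[1])) {
    llvm::errs() << "myphasartool\n"
                    "A small PhASAR-based example program\n\n"
                    "Usage: myphasartool path/to/vmlinux.bc\n";
    return 1;
  }

  // Load the whole-kernel bitcode module.
  llvm::LLVMContext Ctx;
  llvm::SMDiagnostic Diag;
  auto Mod = llvm::parseIRFile(Argv[1], Diag, Ctx);
  if (!Mod) {
    Diag.print(nullptr, llvm::errs());
    return 1;
  }

  // Query operands: the first parameter of a kernel entry function and a
  // global variable (the names depend on the architecture and kernel config).
  auto Fun = Mod->getFunction("x86_64_start_kernel");
  assert(Fun);

  auto Arg = Fun->getArg(0);
  assert(Arg);

  auto Glob = Mod->getNamedGlobal("pgdir_shift");
  assert(Glob);

  llvm::PassBuilder PB;
  llvm::FunctionAnalysisManager FAM;

  // Register a custom AA stack (CFL-Anders, TBAA, BasicAA) before the defaults;
  // registerFunctionAnalyses() keeps an AAManager that is already registered.
  FAM.registerPass([&] {
    llvm::AAManager AA;
    AA.registerFunctionAnalysis<llvm::CFLAndersAA>();
    AA.registerFunctionAnalysis<llvm::TypeBasedAA>();
    AA.registerFunctionAnalysis<llvm::BasicAA>();
    return AA;
  });

  PB.registerFunctionAnalyses(FAM);

  llvm::FunctionPassManager FPM;
  std::ignore = FPM.run(*Fun, FAM);
  llvm::AAResults &AAR = FAM.getResult<llvm::AAManager>(*Fun);

  // This single alias query is enough to exhaust memory on vmlinux.bc.
  std::ignore = AAR.alias(Arg, Glob);
}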

Just running the code snippet above leads to the same error (you may need to adjust the function/global names). This strongly suggests that the error indeed lies within LLVM, in the CFLAndersAA implementation, which unfortunately is no longer maintained by LLVM.
In the code snippet above we try to resolve the aliasing relationship between a function parameter and a global variable. This leads the CFL analysis to recursively analyze the callers of the function (in this case all transitive callers of x86_64_start_kernel), causing a state explosion that it does not seem to handle properly.
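The blow-up comes from the size of the caller set: one query about a parameter of x86_64_start_kernel pulls in every function that can transitively reach it. The following sketch collects such transitive callers with a breadth-first search over reversed call edges. It is a simplified model for illustration only, using a hypothetical string-keyed call graph; it is not how CFLAndersAA is implemented internally.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: caller -> list of direct callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns every function that can (directly or transitively) call Target.
std::set<std::string> transitiveCallers(const CallGraph &CG,
                                        const std::string &Target) {
  // Reverse the edges: callee -> list of direct callers.
  std::map<std::string, std::vector<std::string>> Callers;
  for (const auto &[Caller, Callees] : CG) {
    for (const auto &Callee : Callees) {
      Callers[Callee].push_back(Caller);
    }
  }

  // Breadth-first search over the reversed edges, starting at Target.
  std::set<std::string> Seen;
  std::queue<std::string> Work;
  Work.push(Target);
  while (!Work.empty()) {
    std::string Fn = Work.front();
    Work.pop();
    auto It = Callers.find(Fn);
    if (It == Callers.end()) {
      continue; // no known callers
    }
    for (const auto &Caller : It->second) {
      if (Seen.insert(Caller).second) { // first time we reach this caller
        Work.push(Caller);
      }
    }
  }
  return Seen;
}
```

On a call graph where a calls b, and both b and d call c, transitiveCallers(CG, "c") returns {a, b, d}. On a kernel-sized call graph, the same set can cover a large part of the program, and an analysis that reasons about each of these callers pays for all of them.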

In #626 we provide an option to disable the CFL alias analysis; you may want to try this out.
Until #626 is merged, you may also want to try a different call-graph resolver strategy (if the dataflow analysis that you want to perform does not need alias information). For that, you can use the --call-graph-analysis option of the phasar-cli tool to, e.g., set the call-graph resolver to rta, which does not depend on alias information.

fabianbs96 · 2023-06-11T16:37:25Z

Small addition: I have tried running phasar with --alias-analysis=cflsteens and it seems to work. However, the precision of the analysis results may then be worse than with the Andersen analysis, and llvm::CFLSteensAA is just as unmaintained by LLVM as llvm::CFLAndersAA; but for this case it seems to work.
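The precision difference can be seen on the tiny program p = &a; q = &b; p = q;. An inclusion-based (Andersen-style) analysis keeps pts(q) = {b}, while a unification-based (Steensgaard-style) analysis merges the pointees of p and q, so pts(q) becomes {a, b}. The sketch below implements both textbook techniques for just these two statement forms; it is a simplified model, not how LLVM's CFL-based CFLAndersAA/CFLSteensAA work internally, and the names Stmt, andersen and Steensgaard are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Two statement forms: Lhs = &Rhs (AddrOf) and Lhs = Rhs (Copy).
enum class StmtKind { AddrOf, Copy };
struct Stmt {
  StmtKind Kind;
  std::string Lhs;
  std::string Rhs;
};

// Andersen-style: pts(Lhs) must include {Rhs} (AddrOf) or pts(Rhs) (Copy);
// re-apply all constraints until no set changes.
std::map<std::string, std::set<std::string>>
andersen(const std::vector<Stmt> &Stmts) {
  std::map<std::string, std::set<std::string>> Pts;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (const auto &S : Stmts) {
      if (S.Kind == StmtKind::AddrOf) {
        Changed |= Pts[S.Lhs].insert(S.Rhs).second;
      } else {
        const std::set<std::string> Src = Pts[S.Rhs]; // copy: Lhs may equal Rhs
        for (const auto &V : Src) {
          Changed |= Pts[S.Lhs].insert(V).second;
        }
      }
    }
  }
  return Pts;
}

// Steensgaard-style: union-find over locations; each class has at most one
// pointee class, and assignments unify pointee classes instead of copying.
class Steensgaard {
public:
  explicit Steensgaard(const std::vector<Stmt> &Stmts) {
    for (const auto &S : Stmts) {
      int L = nodeFor(S.Lhs);
      int R = nodeFor(S.Rhs);
      if (S.Kind == StmtKind::AddrOf) {
        join(pointeeOf(L), R);
      } else {
        join(pointeeOf(L), pointeeOf(R));
      }
    }
  }

  // All named variables in the pointee class of Var.
  std::set<std::string> pointsTo(const std::string &Var) {
    std::set<std::string> Result;
    auto It = Ids.find(Var);
    if (It == Ids.end() || Pointee[find(It->second)] == -1) {
      return Result;
    }
    int Target = find(Pointee[find(It->second)]);
    for (const auto &[Name, Id] : Ids) {
      if (find(Id) == Target) {
        Result.insert(Name);
      }
    }
    return Result;
  }

private:
  std::vector<int> Parent;  // union-find parent links
  std::vector<int> Pointee; // pointee node per class root, -1 if none yet
  std::map<std::string, int> Ids;

  int newNode() {
    int Id = static_cast<int>(Parent.size());
    Parent.push_back(Id);
    Pointee.push_back(-1);
    return Id;
  }
  int nodeFor(const std::string &Name) {
    auto It = Ids.find(Name);
    if (It != Ids.end()) {
      return It->second;
    }
    int Id = newNode();
    Ids.emplace(Name, Id);
    return Id;
  }
  int find(int X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // path halving
      X = Parent[X];
    }
    return X;
  }
  // Pointee class of X, creating an unnamed placeholder location if needed.
  int pointeeOf(int X) {
    X = find(X);
    if (Pointee[X] == -1) {
      int Fresh = newNode();
      Pointee[X] = Fresh;
    }
    return find(Pointee[X]);
  }
  // Merge two classes and, recursively, their pointee classes.
  void join(int A, int B) {
    A = find(A);
    B = find(B);
    if (A == B) {
      return;
    }
    int PA = Pointee[A];
    int PB = Pointee[B];
    Parent[B] = A;
    if (PA == -1) {
      Pointee[A] = PB;
    } else if (PB != -1) {
      join(PA, PB);
    }
  }
};
```

Unification-based analysis is known for its almost-linear running time and small memory footprint; the price is the coarser sets shown for q above.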

Note that we are working on alias/pointer analyses implemented completely within PhASAR, but getting this right is a challenging task that will still take quite a while.

small-cat · 2023-06-14T12:24:41Z

@fabianbs96 Thank you very much for your reply. I tried the call-graph analysis with cha instead of otf (the default) and tested the empty analysis on vmlinux.bc with the following command:

./phasar-cli -m vmlinux.bc -C cha -D ifds-solvertest --entry-points=irq_enter

The program terminated with a core dump. I also tried ifds-uninit and ide-solvertest, and they dumped core too.

I read the paper to understand the principles of the IFDS framework and debugged the program. The crash occurs while computing PathEdges in the propagate() function of the tabulation algorithm, but not always: each time I run the program, the crash point in the backtrace is different. The ESG (exploded supergraph) is too big to analyze the bug directly, and so far I have not been able to reproduce the bug with a small case.
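For readers unfamiliar with the tabulation algorithm, the core of propagate() is: record a path edge the first time it is seen and schedule it for processing. The following heavily simplified, intraprocedural sketch shows that idea with an explicit worklist. A path edge is modelled as a (node, fact) pair reached from the entry with the zero fact; the call/return handling and summary edges of the full IFDS algorithm are omitted, and the types and the tabulate function are hypothetical, not PhASAR's API.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>
#include <vector>

using Node = int;
using Fact = int;
// Simplified path edge: Fact holds on entry to Node (source fact is always zero).
using PathEdge = std::pair<Node, Fact>;
// Control-flow successors of each node.
using Cfg = std::vector<std::vector<Node>>;
// Flow function: facts holding after Node, given one incoming fact.
using FlowFn = std::function<std::vector<Fact>(Node, Fact)>;

std::set<PathEdge> tabulate(const Cfg &G, Node Entry, Fact Zero,
                            const FlowFn &Flow) {
  std::set<PathEdge> PathEdges;
  std::vector<PathEdge> WorkList;
  // propagate(): record an edge once and schedule it for processing.
  auto Propagate = [&](PathEdge E) {
    if (PathEdges.insert(E).second) {
      WorkList.push_back(E);
    }
  };
  Propagate({Entry, Zero});
  while (!WorkList.empty()) {
    auto [N, D] = WorkList.back();
    WorkList.pop_back();
    for (Node Succ : G[N]) {
      for (Fact D2 : Flow(N, D)) {
        Propagate({Succ, D2});
      }
    }
  }
  return PathEdges;
}
```

With a CFG 0 -> 1 -> 2 where node 0 generates fact 1 from the zero fact and every other node passes facts through unchanged, tabulate yields five path edges, including (2, 1).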

May I send you an email, or does phasar have a forum/community for discussions?

small-cat · 2023-07-05T03:13:14Z

I found the cause of the core dump: a stack overflow. Phasar implements the IFDS/IDE framework recursively, and when phasar analyzes the LLVM IR of the kernel, the recursion gets so deep that it overflows the stack. Setting the stack size to unlimited with ulimit -s unlimited resolves the error on Linux, at least so far.
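The underlying problem is that a recursive graph traversal needs one stack frame per step along a path, so a long enough path in the exploded supergraph exhausts the stack. The sketch below contrasts a recursive depth-first search with one that keeps pending nodes in a heap-allocated std::vector; this general technique is what lets an iterative solver work without raising the stack limit. It is a generic illustration with hypothetical names, not PhASAR's solver code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>;

// Recursive DFS: every newly visited node adds a stack frame, so a path of
// length L needs roughly L nested calls.
void visitRecursive(const Graph &G, std::size_t N, std::vector<bool> &Seen) {
  if (Seen[N]) {
    return;
  }
  Seen[N] = true;
  for (std::size_t Succ : G[N]) {
    visitRecursive(G, Succ, Seen);
  }
}

// Iterative DFS: pending nodes live in a heap-allocated vector, so the depth
// is bounded by available memory rather than by the thread's stack size.
std::size_t countReachable(const Graph &G, std::size_t Start) {
  std::vector<bool> Seen(G.size(), false);
  std::vector<std::size_t> Stack{Start};
  std::size_t Count = 0;
  while (!Stack.empty()) {
    std::size_t N = Stack.back();
    Stack.pop_back();
    if (Seen[N]) {
      continue;
    }
    Seen[N] = true;
    ++Count;
    for (std::size_t Succ : G[N]) {
      if (!Seen[Succ]) {
        Stack.push_back(Succ);
      }
    }
  }
  return Count;
}

// Builds a single chain 0 -> 1 -> ... -> Len-1.
Graph makeChain(std::size_t Len) {
  Graph G(Len);
  for (std::size_t I = 0; I + 1 < Len; ++I) {
    G[I].push_back(I + 1);
  }
  return G;
}
```

countReachable(makeChain(1000000), 0) returns 1000000 without growing the call stack, whereas visitRecursive on the same chain would need about a million nested frames.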

fabianbs96 · 2023-07-08T12:34:02Z

Hi @small-cat,
sorry for the late reply. You are right, PhASAR's IDESolver is highly recursive in the version that you use. In the meantime, we have modified the solver to work iteratively instead. So, if you can upgrade, you can already make use of it, and then you no longer need to increase the stack limit.

small-cat · 2023-07-13T07:01:31Z

@fabianbs96 Sorry, I did not notice the commits after v0323. I will upgrade and give it a try, thanks very much.

fabianbs96 added the "external bug" label (A bug in depending code) on Jun 6, 2023

small-cat closed this as completed Aug 9, 2023
