Details of Linux itimer implementation and SIGPROF delivery may cause very skewed profiles #252

Open
urisimchoni opened this issue Oct 9, 2018 · 0 comments

When multiple CPUs are busy running the Java program, the accuracy of honest-profiler depends on each SIGPROF being delivered to a running thread chosen uniformly at random. On Linux this does not seem to be the case, at least not when the setitimer() API is used.
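
To make the dependence on uniform delivery concrete, here is a minimal simulation sketch. It is not honest-profiler code and it does not touch real signals: the class name `SignalDeliverySketch`, the seed, and the 0.9 bias are hypothetical values chosen for illustration, not measurements. It only shows how the per-thread tick counts drift away from the real CPU split once delivery is no longer uniform.

```java
import java.util.Random;

// Hypothetical illustration only: models two always-running threads and
// counts how many simulated SIGPROF deliveries each one receives.
class SignalDeliverySketch {

    // pFirst is the probability that a signal lands on thread 0.
    // 0.5 models uniform delivery; any other value models a biased kernel choice.
    static long[] simulate(int signals, double pFirst, long seed) {
        Random rng = new Random(seed);
        long[] ticks = new long[2];
        for (int i = 0; i < signals; i++) {
            int target = rng.nextDouble() < pFirst ? 0 : 1;
            ticks[target]++;
        }
        return ticks;
    }

    public static void main(String[] args) {
        long[] uniform = simulate(10_000, 0.5, 42L);
        long[] skewed = simulate(10_000, 0.9, 42L); // 0.9 is an assumed bias, not a measured one
        System.out.printf("uniform: t1=%d t2=%d%n", uniform[0], uniform[1]);
        System.out.printf("skewed:  t1=%d t2=%d%n", skewed[0], skewed[1]);
    }
}
```

Both simulated threads burn the same amount of CPU, so only the uniform case yields a profile that reflects reality.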

I've tested on a physical machine running Fedora 28 (kernel 4.18.10-200.fc28.x86_64) and also on an AWS EC2 m5.xlarge machine running CentOS 7.5.

To illustrate, consider the following program:

public class Main {
    private static void wait1() {
        long base = System.nanoTime();
        while (System.nanoTime() - base < 1000 * 1000 * 1000)
            ;
    }

    private static void wait2() {
        long base = System.nanoTime();
        while (System.nanoTime() - base < 1000 * 1000 * 1000)
            ;
    }

    public static void main(String args[]) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            wait1();
        });
        Thread t2 = new Thread(() -> {
            wait2();
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}

Compile and run:

javac Main.java
time java -agentpath:liblagent.so=interval=3,logPath=log.hpl,start=1 Main
Starting sampling
Stopping sampling

real	0m1.145s
user	0m2.173s
sys	0m0.034s

The output of the time command shows that two CPUs were running the two threads. We would expect roughly the same number of ticks to be assigned to each thread. The result I get in a typical run is this:
[image: hp]
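
For reference, a rough baseline for what an unskewed profile of this run would look like can be worked out from the time output. This is a back-of-the-envelope sketch under assumptions not stated above: that interval=3 means 3 milliseconds, that the timer counts user plus system CPU time of the whole process (as ITIMER_PROF does), and that all of that CPU time belongs to the two busy threads, which ignores JVM startup, JIT and GC threads. The class name `ExpectedTicks` is hypothetical.

```java
// Hypothetical helper: estimates how many profiling ticks a run should
// produce, and how many each busy thread should get under uniform delivery.
class ExpectedTicks {

    // Total ticks if the timer fires once per intervalMillis of process CPU time.
    static long totalTicks(long cpuMillis, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        return cpuMillis / intervalMillis;
    }

    // Even split across busyThreads, which is what uniform delivery should approach.
    static double ticksPerThread(long cpuMillis, long intervalMillis, int busyThreads) {
        return (double) totalTicks(cpuMillis, intervalMillis) / busyThreads;
    }

    public static void main(String[] args) {
        long cpuMillis = 2173 + 34; // user + sys from the time output above
        long intervalMillis = 3;    // assumes interval=3 is in milliseconds
        System.out.println("total ticks (upper bound): " + totalTicks(cpuMillis, intervalMillis));
        System.out.println("ticks per thread if uniform: " + ticksPerThread(cpuMillis, intervalMillis, 2));
    }
}
```

Under those assumptions the upper bound is about 735 ticks in total, so roughly 367 per thread. A profile that departs far from an even split is skewed by signal delivery, not by the program.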

It's important to stress that the issue is with setitimer() itself: it can be demonstrated in a C program as well, and is also reported here

Using timer_create() with per-thread timers (CLOCK_THREAD_CPUTIME_ID) seems to produce accurate profiles, at least in a test C program.
