Quantcast
Channel: Timur Akhmadeev – Official Pythian Blog
Viewing all articles
Browse latest Browse all 14

A Simple Way to Monitor Java in Linux

$
0
0

A quick and easy way to know what is it inside Java process that is using your CPU. Using just Linux command line tools and JDK supplied command line utilities.

Introduction

Here are a few things you need to know before starting. Following the links is not necessary, they are available for the reference.

  • there are different vendors of Java Virtual Machine. This post is about Oracle’s JVM which is called HotSpot. Linux x86-64 is considered as OS platform. Most of the things about HotSpot are applicable to other vendors too but with slight changes. OSes other than Linux may add some more complications
  • it’s called Virtual Machine, because it virtualizes runtime environment for a Java application. So to know where to look at you need to know a few things about how specific VM is organized. For a detailed overview of the HotSpot, please refer to this article
  • on Linux, a thread inside HotSpot VM is mapped directly to an OS level thread. Well, it may not be true on all OSes, but for modern Linux kernels this is true. So every thread on the OS level is a thread inside a Java application
  • there are generally two types of threads inside a HotSpot VM: native and application threads. Application threads are those that run some Java code, and usually this is what applications are using to run their code. Native threads run something which is not written in Java, usually it’s code in C/C++ and usually all these threads are special utility threads started by a VM itself.

Identifying Threads

Since a Java program may start many threads, each executing some program code, it is necessary to understand which threads are using CPUs. On Linux, top -H will show you CPU usage on a per-thread basis. Here is an example. First, a process which consumes CPU:

top - 16:32:29 up 10:29,  3 users,  load average: 1.08, 0.64, 0.56
Tasks: 172 total,   1 running, 171 sleeping,   0 stopped,   0 zombie
Cpu(s): 48.7%us, 51.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   1500048k total,  1476708k used,    23340k free,    62012k buffers
Swap:  4128764k total,    75356k used,  4053408k free,   919836k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7599 oracle    20   0 1151m  28m 9100 S 85.5  1.9   0:12.67 java
 2575 oracle    -2   0  709m  10m 8336 S 10.6  0.7  47:34.05 oracle
 2151 root      20   0  207m  44m 6420 S  1.7  3.0   0:27.18 Xorg

If we check the details of CPU usage for PID=7599 with “top -H -p 7599”, then we will see something like this:

top - 16:40:39 up 10:37,  3 users,  load average: 1.47, 1.25, 0.90
Tasks:  10 total,   1 running,   9 sleeping,   0 stopped,   0 zombie
Cpu(s): 49.3%us, 50.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   1500048k total,  1460468k used,    39580k free,    50140k buffers
Swap:  4128764k total,    76208k used,  4052556k free,   912644k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7601 oracle    20   0 1151m  28m 9100 R 85.9  1.9   7:19.98 java
 7602 oracle    20   0 1151m  28m 9100 S  1.0  1.9   0:02.95 java
 7599 oracle    20   0 1151m  28m 9100 S  0.0  1.9   0:00.01 java

So there is 1 execution thread inside a Java process, which is constantly on top and is utilizing around 85% of a single core.

Now the next thing to know is: what is this thread doing. To answer that question we need to know two things: thread stacks from a Java process and a way to map OS level thread to a Java thread. As I mentioned previously, there is one to one mapping between OS and Java level threads in HotSpot running on Linux.

To get a thread dump we need to use a JDK utility called jstack:

[oracle@oel6u4-2 test]$ jstack 7599
2014-02-28 16:57:23
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.12-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f05a0001000 nid=0x1e66 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Low Memory Detector" daemon prio=10 tid=0x00007f05c4088000 nid=0x1db8 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f05c4085800 nid=0x1db7 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f05c4083000 nid=0x1db6 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f05c4081000 nid=0x1db5 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f05c4064800 nid=0x1db4 in Object.wait() [0x00007f05c0631000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000eb8a02e0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
	- locked <0x00000000eb8a02e0> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00007f05c4062800 nid=0x1db3 in Object.wait() [0x00007f05c0732000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000eb8a0380> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:485)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
	- locked <0x00000000eb8a0380> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]
   java.lang.Thread.State: RUNNABLE
	at java.security.SecureRandom.getInstance(SecureRandom.java:254)
	at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:176)
	at java.security.SecureRandom.<init>(SecureRandom.java:133)
	at RandomUser.main(RandomUser.java:9)

"VM Thread" prio=10 tid=0x00007f05c405c000 nid=0x1db2 runnable

"VM Periodic Task Thread" prio=10 tid=0x00007f05c408b000 nid=0x1db9 waiting on condition

JNI global references: 975

To map OS level thread to a Java thread in a thread dump, we need to convert native thread ID from Linux to base 16, and search for “nid=$ID” in the stack trace. In our case thread ID is 7601 which is 0x1db1, and the Java thread had following stack trace at the time of running jstack:

"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]
   java.lang.Thread.State: RUNNABLE
	at java.security.SecureRandom.getInstance(SecureRandom.java:254)
	at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:176)
	at java.security.SecureRandom.<init>(SecureRandom.java:133)
	at RandomUser.main(RandomUser.java:9)

A Way to Monitor

Here is a way to get a stack trace of a thread inside a Java process with command line tools (PID and TID are Process ID of Java process, and Thread ID of an interesting thread on the OS level):

[oracle@oel6u4-2 test]$ jstack $PID | awk '/ nid='"$(printf '%#x' $TID)"' /,/^$/'
"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:220)
	at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:185)
	at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(NativePRNG.java:247)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:261)
	- locked <0x00000000eb8a3370> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
	at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
	at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
	- locked <0x00000000e43adc90> (a java.security.SecureRandom)
	at java.security.SecureRandom.next(SecureRandom.java:455)
	at java.util.Random.nextInt(Random.java:189)
	at RandomUser.main(RandomUser.java:9)

As you can see here, the thread is executing a main method of RandomUser class – at least at the time of taking a thread dump. If you would like to see how this changes over time, then a simple watch command may help to get an idea if this thread stack changes frequently or not:

watch -n .5 "jstack $PID | awk '/ nid='"$(printf '%#x' $TID)"' /,/^$/'"

Every 0.5s: jstack 7599 | awk '/ nid='0x1db1' /,/^$/'                                                                                Fri Mar 14 16:29:37 2014

"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]
   java.lang.Thread.State: RUNNABLE
        at java.util.LinkedHashMap$LinkedHashIterator.<init>(LinkedHashMap.java:345)
        at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
        at java.util.LinkedHashMap$KeyIterator.<init>(LinkedHashMap.java:383)
        at java.util.LinkedHashMap.newKeyIterator(LinkedHashMap.java:396)
        at java.util.HashMap$KeySet.iterator(HashMap.java:874)
        at java.util.HashSet.iterator(HashSet.java:153)
        at java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1005)
        at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1004)
        at java.security.SecureRandom.getPrngAlgorithm(SecureRandom.java:523)
        at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:165)
        at java.security.SecureRandom.<init>(SecureRandom.java:133)
        at RandomUser.main(RandomUser.java:9)

So this way you could see what the application thread is doing right now. Since it could be quite a lot of different type of work, the next reasonable step is to add aggregation.

[oracle@oel6u4-2 test]$ ./prof.sh 7599 7601
Sampling PID=7599 every 0.5 seconds for 10 samples
      6  "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]    java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:220)
	at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:185)
	at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(NativePRNG.java:247)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:261)
	- locked <address> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
	at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
	at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
	- locked <address> (a java.security.SecureRandom)
	at java.security.SecureRandom.next(SecureRandom.java:455)
	at java.util.Random.nextInt(Random.java:189)
	at RandomUser.main(RandomUser.java:9)

      2  "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]    java.lang.Thread.State: RUNNABLE
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:268)
	- locked <address> (a java.lang.Object)
	at sun.security.provider.NativePRNG$RandomIO.access$200(NativePRNG.java:108)
	at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:97)
	at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
	- locked <address> (a java.security.SecureRandom)
	at java.security.SecureRandom.next(SecureRandom.java:455)
	at java.util.Random.nextInt(Random.java:189)
	at RandomUser.main(RandomUser.java:9)

      1  "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]    java.lang.Thread.State: RUNNABLE
	at java.util.Random.nextInt(Random.java:189)
	at RandomUser.main(RandomUser.java:9)

      1  "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]    java.lang.Thread.State: RUNNABLE
	at java.security.SecureRandom.next(SecureRandom.java:452)
	at java.util.Random.nextInt(Random.java:189)
	at RandomUser.main(RandomUser.java:9)

Here is what’s inside the prof.sh script:

#!/bin/bash

P_PID=$1
P_NID=$2

if [ "$P_SLEEP" == "" ]; then
  P_SLEEP=0.5
fi

if [ "$P_CNT" == "" ]; then
  P_CNT=10
fi

echo Sampling PID=$P_PID every $P_SLEEP seconds for $P_CNT samples

if [ "$P_NID" == "" ]; then
  CMD="awk '//'"
else
  CMD="awk '/ nid='"$(printf '%#x' $P_NID)"' /,/^$/'"
fi

for i in `seq $P_CNT`
do
  jstack $P_PID | eval $CMD
  sleep $P_SLEEP;
done |
  awk ' BEGIN { x = 0; s = "" }
    /nid=/ { x = 1; }
    // {if (x == 1) {s = s ", "gensub(/<\w*>/, "<address>", "g") } }
    /^$/ { if ( x == 1) { print s; x = 0; s = ""; } }' |
  sort | uniq -c | sort -n -r | head -10 |
  sed -e 's/$/\n/g' -e 's/\t/\n\t/g' -e 's/,//g'

The idea of the script is based on the method from poor man’s profiler adapted for HotSpot thread dumps. The script does the following things:

  • Takes $P_CNT thread dumps of the Java process ID passed as $1 (10 by default)
  • If a native thread ID has been supplied as $2, then searches for the thread stack of this thread in the thread dump
  • Concatenates each thread stack trace into a comma-separated string
  • Aggregates strings and sorts them by the number of occurrences
  • Prettifies the output: removes tabs, commas, and adds new lines back to the thread stack

Conclusion

With a few little things it is possible to understand quite a lot of things in almost any situation related to Java: you can find out the most frequent stack trace by sampling thread dumps.
With this knowledge it is then easy to understand why an issue happening. In my test case, the application instantly generates random numbers without a pause, and 1 thread is occupying 1 CPU core.


Viewing all articles
Browse latest Browse all 14

Trending Articles