
JBoss Data Grid 6 Implementations with OpenJDK

John Osborne
Middleware Solutions Architect
Red Hat
josborne@redhat.com
November 2014

Executive Summary
In recent years, IT departments have been tasked with addressing the performance and scalability
concerns related to the exponential growth in data. Many enterprise IT departments are turning to in-memory
data grid (IMDG) technologies such as JBoss Data Grid in order to meet their Service
Level Agreements (SLAs). JBoss Data Grid is an enterprise IMDG built on the open
source community project Infinispan; it has helped enterprise IT departments meet their data
performance and scalability requirements by providing features like extremely fast (O(1)) read/write
access to data, elastic scaling, high availability, fault tolerance, and much more. When implementing an
IMDG technology like JBoss Data Grid, performance is essential at every step of the process, from
development to production.
There are several factors to take into consideration regarding the performance of a distributed cache.
At the hardware level it is now commonplace to see very large machines capable of running JVMs with
large heaps. Java has also evolved to help meet the demands of large heaps through JVM
enhancements, including the release of its new garbage collector (G1) in Java 7. The core of JBoss Data
Grid is a Java application and will require JVM tuning, whether running in server mode (accessing the
cache remotely through protocols like REST, memcached, or Hot Rod) or in library mode (accessing the
cache via a local Java API). When using large heaps, the JVM itself requires significant tuning to
fully optimize it for a given workload.
This white paper focuses mainly on the considerations and steps involved in
maximizing the OpenJDK 7 JVM for JBoss Data Grid workloads. All of the JVM flags that are
mentioned are also supported by the Oracle JDK. While operating system and network tuning are not
the main focus of the paper, a few critical tweaks are also mentioned. By reading this white paper,
architects and developers will be able to tune OpenJDK 7 for a successful implementation of a JBoss
Data Grid application. At the time of writing, Java 8 has been released but is not yet
supported by JBoss Data Grid; this white paper addresses several considerations for transitioning to
Java 8 in future deployments. This white paper does not address holistic system
performance. The best practices around maximizing network throughput, CPU usage, or I/O latency
(when using a cache store for persistence) still need to be analyzed in order to find system bottlenecks;
this white paper assumes that the Java components themselves can be tuned to provide optimized performance.
It also assumes that the reader already has a basic understanding of IMDG technology
and JBoss Data Grid architectures.

Introduction
JBoss Data Grid is an enterprise-class, open source, in-memory data grid platform. It is
implemented as a distributed in-memory key-value NoSQL store. JBoss Data Grid can be configured to
run in several different architectures; the cache supports local, invalidation, replicated, and distributed
modes. Most frequently, it is deployed as a data tier to improve application throughput by
offloading I/O from a persistent data store. JBoss Data Grid supports several SQL and NoSQL data
persistence options. The data grid itself acts as a schema-less key/value store, allowing applications
to store distinct objects without a rigid, fixed data model.
The data grid is typically run on commodity servers to form a cluster that can scale up and
down elastically based upon the fluctuating needs of the application. By scaling elastically in replicated
or distributed mode, the grid is highly scalable and fault tolerant, with no single point of failure. In
the event of a server failure, JBoss Data Grid provides full data integrity through its support for the
Java Transaction API (JTA). It also supports parallel processing through its MapReduce framework
and the ability to run tasks on all or some of the nodes through its Distributed Execution Framework.
A broad range of sectors have implemented JBoss Data Grid technology to
power applications, including telecommunications, e-commerce, mobile, gaming, and financial
services. Regardless of the sector, the main reason to consider a data grid is performance. When
performance is critical, it is necessary to pay close attention to the run-time components in order to
maximize the value of JBoss Data Grid.
The technology itself can be embedded into a Java application by including the appropriate JAR
files and instantiating the components programmatically. It can also be used as a remote data grid by
running remote JBoss Data Grid nodes in a cluster and connecting over a socket. In both scenarios the
technology itself is a Java application running within a JVM. The performance of Java has grown with
every release over the last decade, in large part due to new compilers and garbage collection algorithms
as well as numerous JVM optimizations and tunable JVM flags. The performance of
the JVM is highly influenced by tuning these flags. Unfortunately, optimizing Java often requires
expertise not only in the Java APIs but also in the JVM tuning flags. This paper attempts to relieve the
JBoss Data Grid developer, architect, or administrator of the guesswork involved in making such
optimizations. The most up-to-date JBoss Data Grid documentation should always be referenced as the
official Red Hat word on tuning JBoss Data Grid; this white paper complements that documentation
by providing guidance, tips, and lessons learned.

Capacity Planning and Sizing The Java Heap


Proper sizing of the Java Heap is a basic, yet critical, part of configuring JBoss Data Grid. An
incorrectly sized Java Heap will often lead to poor performance. The most common issues resulting
from a poorly sized Java Heap are increased latency, reduced throughput, and potentially out-of-memory
exceptions. Proper capacity planning is crucial to sizing the Java Heap.
Capacity planning is a several-step process:
1. Determine whether it is appropriate to run a replicated or distributed cache. A replicated cache will
allow for the most redundancy. A distributed cache allows for redundancy while providing the
ability to scale horizontally beyond the size of an individual heap.
2. Assess which portion of the application's data set needs to be held in memory. If the data set
is expected to grow, that will need to be taken into account.
3. Decide how many nodes will run within the JBoss Data Grid cluster, running one JVM per
server. Running several JBoss Data Grid nodes in a distributed or replicated configuration will
allow for fault tolerance of the application. By running one JVM per server, context switching
and contention will be reduced and the additional cores on each node will allow
garbage collection to be maximized. Java garbage collection is very efficient and processors
usually hit 100% utilization during garbage collection, so it is important that several JVMs are
not in contention for the same processors. This is especially true for heaps greater than 32GB, as
the garbage collectors will require additional processing power. If you are running a distributed
cache, take into account the number of node failures that the cluster is expected to tolerate and
include spare capacity accordingly.
4. Determine the max size of the Java Heap using the following guidelines:
- The data distribution should be balanced (heap sizes should be the same across the cluster).
- There is enough physical memory (not virtual memory) on each server to support the
JVM and other system processes. If the system runs out of physical memory then
swapping will occur, which causes a severe performance penalty. This is most
noticeable during a garbage collection cycle, when memory is constantly swapped back
and forth between physical memory and disk. Swapping during a full garbage collection
of a large heap could cause the application to be completely halted for a long period of
time.
- Adequate heap space is left for JBoss Data Grid and the JVM. JBoss Data Grid will
require additional space for computations and searches. The JVM will require additional
space for critical tasks like garbage collection, compaction, temporary objects, network
transfer buffers, etc. A good rule of thumb is to keep the heap sized such that only 1/3 of
the heap is full after a full garbage collection. Red Hat recommends that a maximum of
50% of the heap is used for active cache data. In order to make this determination, bring
the application to a normal state and then force a full garbage collection, which can be done in
a number of ways. One easy way is to use jconsole to force a full garbage collection and
then observe how much memory is still in use by the application. Alternatively, on the
command line of a Linux-based system you can force a full garbage collection
by running jcmd <pid> GC.run and observe the memory in use by the application
using top (the resident memory, RES). This test also ensures that you
are sizing the heap based on actual, not hypothetical, memory usage of the
application.

Finally, set the min and max sizes of the Java Heap to the same value, which disables JVM
adaptive sizing; adaptive sizing can decrease performance and increase garbage collection while the JVM
is re-sized.
Here is one example:
- Distributed Cache
- Estimated Data Set to be held in memory = 128GB
- 8 node cluster which is able to tolerate 2 node failures. This will require 2 redundant nodes
(numowners=3 in cache configuration).
- Largest estimated data set in each JVM:
(Total Data Set / Nodes) × (1 + Node Failures / (Nodes − Node Failures))
= (128 GB / 8) × (1 + 2 / (8 − 2)) = 21.33 GB
minimum heap = 42.66 GB, recommended heap = 64 GB
- Adjust accordingly based upon actual memory usage utilizing jconsole.
- Verify that there is enough physical memory on the machine to avoid swapping.
In this example the recommended JVM Settings would be -Xms64G -Xmx64G.
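The arithmetic above can also be captured in a short, stand-alone sketch. The class name and hard-coded values below are illustrative only; substitute your own capacity-planning numbers.

// Hypothetical sizing helper illustrating the calculation above; not part of any JBoss Data Grid API.
public class HeapSizingEstimate {
    public static void main(String[] args) {
        double totalDataSetGb = 128;   // estimated data set held in memory
        int nodes = 8;                 // cluster size
        int nodeFailures = 2;          // tolerated node failures (numOwners = nodeFailures + 1)

        // Largest data set per JVM after the cluster loses nodeFailures nodes
        double perNodeDataGb = (totalDataSetGb / nodes)
                * (1 + (double) nodeFailures / (nodes - nodeFailures));

        // Keep active cache data at or below 50% of the heap (Red Hat recommendation above)
        double minimumHeapGb = perNodeDataGb * 2;

        System.out.printf("Per-node data: %.2f GB, minimum heap: %.2f GB%n",
                perNodeDataGb, minimumHeapGb);
    }
}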

Why Tiered Compilation Is Critical


The Java compiler has two main tasks: enabling the code to be platform independent and ensuring
high performance. Enabling platform independence is accomplished through the use of a static compiler
like javac, which takes Java code and translates it into bytecode that can be executed anywhere by the
Java Virtual Machine. Ensuring high performance is not so simple. It is accomplished first by
optimizations in the static compiler, and then the bytecode is compiled even further into machine language by a
dynamic compiler.
The dynamic compilers in Java are typically called just-in-time (JIT) compilers and are critical
to the high performance that Java has maintained in recent years. JIT compilers improve performance
by compiling bytecode into machine language, which typically leads to about a 10X improvement in code
performance. The JIT compiler compiles bytecode into machine language on the fly as code is executed, based upon
environment and run-time profile data. JIT compilers will optimize and recompile code over time as
more profiling data becomes available to the JVM. This allows JVM performance to adapt over time
to changes in the application load or environment. The JIT compiler can have a drastic impact on
performance, so selecting and tuning the JIT compiler is a crucial part of optimizing JBoss Data Grid.
OpenJDK 7 uses technology from Oracle's HotSpot JVM for JIT compilation. The name
HotSpot originates from the sections of the code that are compiled. In a typical Java program, 90
percent of a program's processing time is spent executing 10 percent of the code. The performance of
the application is greatly dependent on how fast that small subset of code can be executed. That small
subset of code is known as the HotSpot. Fortunately, for the developer or architect who is
implementing JBoss Data Grid, maximizing the dynamic compilation of the hot spots only requires
selecting a compiler and making a small modification to its parameters.
The first compiler is the C1 compiler, which is enabled with the -client JVM flag. The C1 compiler
is designed for smaller applications that have few resources and are usually short lived. The C1
compiler is often beneficial for applications that need fast start-up times because it dynamically
compiles bytecode very early in the application's lifecycle, without waiting for extensive profiling
and environment data. Since the C1 compiler compiles bytecode into machine language very early in the
process, applications usually perform optimally with the C1 compiler soon after the application is
deployed.
Most JBoss Data Grid applications are long-running, server-side enterprise applications which
can benefit from further compilation. Typically for these types of applications, it makes more
sense to use the C2 compiler, which waits longer to gather more profiling data before optimizing the
bytecode into machine language. This allows the JVM to apply more advanced algorithms and
optimizations when further compiling the code. In all cases, waiting for more profiling data will yield
better results over time, allowing the JVM to perform more complex analysis of which optimization
paths would be the most beneficial to the JBoss Data Grid application. The more extensive profiling and
analysis comes at the cost of CPU cycles and a larger code cache, which will need to be tuned.
Although the C1 compiler yields the best results initially, the C2 compiler typically provides much
better performance in a long-running Java application. The C2 compiler is enabled using the -server
JVM flag. This is the default setting for 64-bit versions of Java 7. As a matter of fact, passing -client
to a 64-bit JVM on Linux or Windows (not true on Solaris) will still result in the JVM using the C2
compiler.
One issue with the C2 compiler is that it is not triggered until the code has been executed 10,000 times,
which is often the reason for a warm-up period when running benchmark tests. The C1 compiler
maintains a lower threshold of 1,500. For both compilers the threshold can be modified with the
-XX:CompileThreshold flag. The JVM itself uses a counter to keep track of when the code should be
compiled. That counter decrements over time, so it mainly measures how many times the code has been
executed recently. One option is to simply lower the compile threshold, but that risks filling up the code
cache with machine language that has not been fully optimized by the JVM.
If a JBoss Data Grid application uses the C2 compiler but has critical code that never hits the
compile threshold, then that code will be stuck running slowly in interpreted form. Fortunately, there is a solution to
this problem: tiered compilation. Tiered compilation combines the best features of both compilers.
The C1 compiler is most active during application start-up and handles optimizations triggered by
lower performance-counter thresholds. The C1 compiler also inserts performance counters and prepares
instruction sets for more advanced optimizations later completed by the C2 compiler. The C2 compiler
delivers more advanced optimizations later in the execution cycle. 64-bit versions of OpenJDK 7 are
packaged with the C1 compiler solely for tiered compilation. Tiered compilation is disabled by default
in Java 7 but enabled by default in Java 8. JBoss Data Grid applications should always use
tiered compilation, which is enabled with the -XX:+TieredCompilation flag.
Tiered compilation is new to Java 7 and requires some tuning to the code cache which is where
the machine language is stored after it is compiled by the JIT compiler. In Java 8 the default code cache
for tiered compilation is 240MB; in Java 7 the default code cache is 48MB for the C2 compiler alone
and 96MB when using tiered compilation. For Java 7 implementations it is best to increase the code
cache, which has a fixed size and will not accept additional compiled code once it is full. If the code
cache fills up, you could get stuck running slow interpreted code. Once JBoss Data Grid applications
are supported on Java 8 this will no longer be an issue, but for now it is best to increase the code cache.
The code cache can be increased by setting the -XX:ReservedCodeCacheSize flag. I recommend using
at least 256MB. To view the size of the code cache you can connect with jconsole and view the
memory pool code cache in the memory tab.
For JBoss Data Grid implementations, start with the following parameters and, if necessary, modify the
code cache size as indicated by jconsole:
-server -XX:+TieredCompilation -XX:ReservedCodeCacheSize=256m
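In server mode these options are typically appended to the server's JAVA_OPTS (for example in bin/standalone.conf in a default JBoss Data Grid server distribution; the exact file and variable may differ in your environment):

JAVA_OPTS="$JAVA_OPTS -server -XX:+TieredCompilation -XX:ReservedCodeCacheSize=256m"

In library mode the same flags are passed directly on the java command line of the embedding application.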

Selecting a Garbage Collector


The Java garbage collector's essential role is to free unreferenced memory so that JBoss Data
Grid can continue to allocate memory without running out of space. It also needs to reclaim that memory without
impacting the latency and throughput of the data grid. The performance of the garbage collector is
mostly determined by how fast it can find unused objects, move them to free up memory, and
free memory in contiguous blocks so that it can be allocated appropriately by the application. In
OpenJDK 7 the garbage collector will often need to pause the application threads in order to process
garbage collection. These pauses are called stop-the-world pauses and have a large impact on the
performance of JBoss Data Grid.
All of the garbage collectors in OpenJDK 7 split the heap into various regions. The first region
is the young generation, which is subdivided into the eden space and survivor spaces. The second region
is the old generation. All OpenJDK 7 garbage collectors will incur a stop-the-world pause when the
young generation fills up (a minor garbage collection), at which point unreferenced objects are discarded
and referenced objects are moved to empty the eden space. The young generation is then compacted.
Once the old generation fills up there will be a full garbage collection, which will cause long stop-the-world pauses.
A JBoss Data Grid implementation, like all Java-based distributed data structures, requires
careful attention to detail when selecting and tuning the garbage collector. The ultimate goal is to avoid
full garbage collections and to minimize minor garbage collections of the young generation. This is
especially true in replicated or distributed cache configurations. Even with concurrent garbage
collection, stop-the-world pauses can still occur due to fragmentation1 or when the garbage collector
can no longer keep up with the application's object allocation. During stop-the-world garbage collection
pauses, JBoss Data Grid is completely halted. If this occurs during a data transfer between nodes then
the data being sent across the network will start to fill the network buffers (a later section will show
how to modify these values). If the buffers fill up completely before the application resumes, then
packet loss will occur, which could result in JBoss Data Grid timeouts2. In the worst case, the stability
of the entire cluster could be affected by especially long garbage collections. Typically, garbage collection
performance scales worse than linearly, meaning that when a heap is doubled it will take more than
twice as long for a full garbage collection. With very large heaps, a full garbage collection could pause
the application threads for so long that the cluster would act as if the paused node had been
terminated.
1. Fragmentation occurs when the free space in the heap is fragmented into small pieces of memory spread across various locations. When
fragmentation occurs there is often a drastic decrease in performance as the system cannot allocate enough contiguous space for the
application. This causes the garbage collector to work harder and can often lead to a full stop-the-world garbage collection.
2. JBoss Data Grid timeouts can be modified in the cache configuration.
Tuning garbage collection for JBoss Data Grid requires careful attention to detail and some
analysis to see how the application behaves. Unfortunately, there is no silver bullet to garbage
collection as the correct tuning will be dependent on application workloads. The goal of this section is
to assist developers and architects with the tuning and performance considerations of garbage
collection with JBoss Data Grid.
OpenJDK 7 has four available garbage collectors. The first is the serial garbage collector
(-XX:+UseSerialGC), which uses a single thread to process garbage collection and should never be used for a
high performance Java application. Making a decision between the other three garbage collectors will
depend on the configuration, requirements, and performance testing.
The first decision to make is whether you want to use a throughput collector or a concurrent
collector. OpenJDK 7 only has one throughput collector and it is enabled by setting
-XX:+UseParallelGC -XX:+UseParallelOldGC. This is the default collector for 64-bit OpenJDK 7 as it
provides the highest throughput in an isolated Java application. The throughput collector is extremely
efficient because it pauses all application threads during minor and full garbage collections and utilizes
all machine CPUs for maximum garbage collection throughput. When garbage collection is completed,
all of the machine CPUs are used for application processing. The throughput collector fully compacts
the old generation during a full GC in order to prevent fragmentation. JBoss Data Grid applications
using the throughput collector will often notice a major drawback: since it constantly stops all
application threads, it often leads to frequent latency outliers, especially for larger heap sizes. In a
distributed data structure like JBoss Data Grid, the constant stop-the-world pauses could
also cause packet loss. If a node attempts to re-balance by sending data grid entries to a node that
is stopped for garbage collection, the socket buffer will fill until either the machine resumes processing
or the socket buffer runs out of space. This could lead to JBoss Data Grid timeouts. Furthermore, since
the throughput collector frequently incurs a stop-the-world pause for a full garbage collection,
performance will degrade with larger heaps since the stop-the-world pauses will be longer.
Due to the throughput collector's inherent nature to purposely and frequently stop all application
threads, there is only a limited use case for it in JBoss Data Grid. It can be fruitful for developers and
architects who value total throughput over predictable latency in limited cases where there is a smaller
heap (< 4 GB) and large network transfers are not expected to occur. JBoss Data Grid implementations
without large network transfers are typically read-heavy configurations with minimal distributed
preloading requirements. Large network transfers from preloading or write-heavy caches can cause
timeouts during long garbage collection pauses. Batch jobs can also cause several long pauses and
should be avoided with the throughput collector. In most cases it is best to use a concurrent garbage
collector. One final caveat is that the concurrent collectors assume the machine is highly
multi-threaded. If you are implementing a high performance application with JBoss Data Grid it should
be assumed that you have a multiprocessor machine capable of handling simultaneous application logic
and garbage collection. If for some reason you are running JBoss Data Grid on a machine that is CPU-limited,
then the throughput collector may be able to compensate for the limited available CPU
resources. However, in most cases it is best to go with a concurrent collector.
If you have decided to use a concurrent garbage collector then your options in OpenJDK 7 are
either to use the Concurrent Mark Sweep (CMS) collector or the Garbage First (G1) collector, which is
new to OpenJDK 7. Both concurrent collectors make it possible to concurrently execute garbage
collection in the background with minimal stops to the application threads. Both concurrent collectors
use additional CPU overall since they are processing garbage collection in the background but the
application will often experience fewer and shorter pauses. This is very important when running JBoss
Data Grid in a distributed or replicated configuration.
The CMS collector is enabled with the following flags: -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC. As with all garbage collection algorithms in OpenJDK 7, it will still stop all
application threads during a minor garbage collection of the young generation. The CMS collector tries to
avoid a full garbage collection by using background threads to periodically scan through the old
generation and discard unused objects. Typically, CMS will only stop application threads during a
minor garbage collection and for very short periods while collecting the old generation. However, a
poorly tuned CMS collector can still result in extremely long pauses which can hurt the performance of
JBoss Data Grid. In CMS, the background threads need to scan the entire old generation before freeing
objects. The time to scan the old generation is dependent on the heap size, so if the old generation runs out
of adequate space before the scanning is completed then a concurrent mode failure occurs, which will
result in a drastic drop in performance. This often happens when the background threads do not have
enough CPU resources to complete their tasks in a timely manner. Since the background threads cannot do
compaction, a concurrent mode failure will also occur if the old generation gets too fragmented. In the
case of a concurrent mode failure the CMS collector will stop all application threads for a full garbage
collection, which will be done by a single thread! This occurs often on large heaps when the CMS
background threads have trouble keeping up with the heap size. If a concurrent mode failure occurs
during a large network transfer it will almost certainly result in a JBoss Data Grid timeout. For this
reason, it is best to avoid CMS with larger heap sizes unless it is tuned and tested properly. A well
tuned CMS implementation may provide better throughput than the G1 collector. However, the worst
case performance with CMS will often be worse than with G1 due to the risk of a concurrent mode
failure.
Fortunately, OpenJDK 7 fully supports the G1 collector, which was designed to do a better job of
handling background garbage collection for large heaps. Heap fragmentation can still occur with the G1
collector, but it is much less likely since the G1 collector can compact the heap in the background. The
G1 collector is enabled with the following flag: -XX:+UseG1GC. Like CMS, G1 still stops all
application threads for a minor garbage collection of the young generation. G1 divides the old
generation into regions and compacts part of the old generation as it goes, which
makes heap fragmentation less likely. The G1 increase in performance for larger heaps comes
with a trade-off of overhead: G1 requires more CPU resources than CMS. G1 may also require very
short pauses for compaction of various regions of the heap. However, the pauses are usually much
shorter than CMS full garbage collections because it is only compacting a region of the old generation and not
the entire old generation.
The G1 collector makes a best effort to divide the heap in a way that will meet a user-defined
pause-time target (-XX:MaxGCPauseMillis), which defaults to 200ms. For large heaps, it is best to increase the
target to 500ms or 1000ms; the G1 collector should be able to keep up with that configuration. One
final benefit to the G1 collector over the CMS collector is that it requires far less tuning. As previously
mentioned, JBoss Data Grid does not yet support Java 8, but there have been many improvements in
the G1 collector in Java 8 as well, which will be taken advantage of in a future release.
The best choice of garbage collector for a JBoss Data Grid implementation will vary per use
case and can only be found by experimenting. However, most developers and architects of JBoss Data
Grid are going to be running larger heaps and therefore the G1 collector should be the collector of
choice for high overall throughput and good worst-case performance. If the JBoss Data Grid
implementer is not as concerned with worst-case performance and is more concerned with total
throughput, then a well tuned CMS implementation may be a better fit. A good way to start tuning G1 is
to set -XX:MaxGCPauseMillis to the worst-case pause time allowed by your SLA. The throughput
collector will be beneficial in limited use cases and the CMS collector will be beneficial for small to
medium sized heaps, although it must be tuned to avoid single-threaded concurrent mode failures,
which could result in pauses of 10-30 seconds for large heaps that have not been tuned.
The default garbage collector in OpenJDK 7 (the throughput collector) has a very limited use case in
JBoss Data Grid. In nearly all cases it is best to use one of the concurrent collectors. A well tuned
CMS implementation may give the highest throughput, but it will require a lot of testing and tuning and
will often give poor worst-case performance. The G1 collector will provide high throughput without all
the tuning, in addition to improving on the worst-case performance. Most developers and architects
will be best served by starting with the G1 collector and setting the worst-case pause time to meet their
SLA with the following parameters:
-XX:+UseG1GC -XX:MaxGCPauseMillis=<x>
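Putting the heap sizing, compiler, and collector recommendations together, a complete set of starting flags might look like the line below. The heap size and pause target are placeholders only; replace them with values from your own capacity planning and SLA.

-Xms64G -Xmx64G -server -XX:+TieredCompilation -XX:ReservedCodeCacheSize=256m -XX:+UseG1GC -XX:MaxGCPauseMillis=500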

Tuning the garbage collector to avoid JBoss Data Grid pauses


For the limited JBoss Data Grid use cases that call for the throughput collector, there is fortunately very
little tuning required. Simply by setting a static heap size, and thus turning off adaptive sizing, you will get the
maximum performance (or very close to it) with the throughput collector.
For use cases in which the CMS collector is used, there is some important tuning necessary to avoid
concurrent mode failures. As mentioned in the last section, concurrent mode failures are single threaded
and very expensive, so CMS needs to be tuned to avoid them. One way to avoid concurrent mode
failures is to simply make the old generation larger, by making the heap bigger or by changing the ratio of
the young generation to the old generation. Another way is to
increase how many background threads run (and how often they run) to keep the garbage collector one step ahead of the
application. The easiest and most effective way to avoid concurrent mode failures is to make the old
generation larger. This can be done by tuning the -XX:NewRatio flag, which is the ratio of the old
generation to the young generation. The default setting is 2, which indicates that the old generation is 2 times the
young generation, or 2/3 of the heap size. Increasing this setting will reduce the size of the young
generation and increase the size of the old generation. This will in turn cause more frequent minor
garbage collections but will help prevent concurrent mode failures. Note that in cases where concurrent
mode failures are not occurring, increasing the new ratio (and decreasing the young generation)
may decrease total throughput since you will be inducing more frequent minor garbage
collections. JBoss Data Grid itself requires young generation space to do computations, so you cannot
simply calculate the required young generation space from your application data alone. Adjusting the new ratio
will require testing and tuning to find the ideal configuration.
Using the CMS collector is often a large balancing act. By setting a larger young generation size
(decreasing the -XX:NewRatio flag), you will often increase total throughput, unless a concurrent mode
failure occurs. By setting a larger old generation size (increasing the -XX:NewRatio flag), more frequent
minor garbage collections will occur due to the smaller young generation, which may decrease total
throughput. However, by configuring a larger old generation you will give the cores more time to run
garbage collection threads in the background and thus greatly reduce your chance of inducing a
concurrent mode failure.
JBoss Data Grid implementations that use CMS may also be best served by adjusting the Eden
space size. Internal testing by the Red Hat Quality Assurance team shows that you can improve
performance even further with CMS by increasing the Eden space size. Since JBoss Data Grid itself
utilizes short-lived buffers and other collections to do internal computations, it is often best to increase
the default size of the Eden space within the young generation. OpenJDK 7 defaults to
-XX:SurvivorRatio=8, which sets the ratio of the eden space to each survivor space at 8:1, meaning that
each survivor space will be 1/10th of the young generation. For larger young generation sizes, you can
push the limits even further by doubling or quadrupling the -XX:SurvivorRatio to 16 or 32, which will
decrease minor garbage collections and increase total throughput.
In addition, make sure to turn off adaptive sizing (set the min heap equal to the max heap), or else the JVM
will modify the generation sizing behind the scenes during execution.
By default, the CMS collector does not collect permgen. Permgen often fills up when the same
archive (the JBoss Data Grid application) is deployed multiple times to the same running environment. If
the permgen space fills up, a full garbage collection will be executed. In order to turn on permgen
garbage collection in CMS, use the following flags: -XX:+CMSPermGenSweepingEnabled
-XX:+CMSClassUnloadingEnabled.
The G1 collector, if tuned properly, should not experience a full garbage collection. G1 will still
stop JBoss Data Grid application threads for minor garbage collections and for some of the concurrent
garbage collection cycles that compact regions in the old generation. Similar to CMS, increasing the size
of the old generation will reduce the chances of a full garbage collection. However, unlike CMS, G1
has much more efficient internal algorithms to adjust the generation sizes based upon the
-XX:MaxGCPauseMillis flag. If you set -XX:MaxGCPauseMillis to a value that the G1 collector
cannot keep up with, it will reduce the young generation size automatically, which will result in
more frequent minor garbage collections but lowers the chance of a full garbage collection. However,
with a -XX:MaxGCPauseMillis target that is set too low, the number of old generation regions that can
be collected and compacted will be reduced, increasing the chance of a full garbage collection. A higher
-XX:MaxGCPauseMillis will allow the heap to be sized to meet a more flexible target, which will allow
the G1 collector to produce fewer minor and full garbage collections and increase overall throughput. If
you still see full garbage collections occur after setting the flag to match your SLA, then increase the size of
the overall heap. If full garbage collections still occur, then increase the number of background threads
given to the garbage collector with the -XX:ConcGCThreads flag. If CPU is not available to increase
the thread count, then make the background threads kick off sooner by adjusting the
-XX:InitiatingHeapOccupancyPercent setting. G1 will only collect PermGen during a full garbage
collection, and that cannot be modified.
For JBoss Data Grid implementations use the following guidelines for garbage collection tuning:
- Throughput collector: turn off adaptive sizing (Xms=Xmx).
- CMS: turn off adaptive sizing, adjust the generation sizes (via -XX:NewRatio) so that the old
generation is large enough to avoid concurrent mode failures without hampering throughput,
increase the Eden space size (via -XX:SurvivorRatio), and
use the -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled flags
to enable permgen GC. Always make sure to test any CMS implementation thoroughly to avoid
concurrent mode failures.
- G1: turn off adaptive sizing and set -XX:MaxGCPauseMillis to a reasonable target (i.e. 500ms or
1000ms). If full garbage collections still occur, increase the heap size and then add more background
threads.
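As an illustration only, a CMS starting point that follows these guidelines might look like the line below; the heap size, -XX:NewRatio, and -XX:SurvivorRatio values are examples that must be validated by testing against your own workload, not recommendations for every deployment.

-Xms16G -Xmx16G -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=3 -XX:SurvivorRatio=16 -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled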

What about PermGen?


Historically, OpenJDK has kept class definitions and class metadata in the Permanent
Generation (PermGen) space. The PermGen space is being phased out starting in Java 7, and in Java 8 it has
been completely replaced with Metaspace. When sizing the PermGen space, keep in mind that in Java
7 several things have already been moved out of PermGen, including class statics and
interned strings.
As previously stated, Java should be tuned to avoid full garbage collections when running JBoss
Data Grid in replicated or distributed mode. Therefore, from a JBoss Data Grid perspective, it is
important to ensure that the PermGen space is sized properly. If the PermGen space fills up or needs to be
re-sized, it will trigger a full garbage collection of the entire heap.
Fortunately, simple tuning can be implemented to avoid a PermGen-induced full garbage
collection. Similar to setting the heap space, it is best to turn off adaptive sizing by
setting the initial size equal to the max size: in OpenJDK 7, -XX:PermSize=N and
-XX:MaxPermSize=N; in Java 8, -XX:MetaspaceSize=N and -XX:MaxMetaspaceSize=N.
In order to determine the required PermGen space, connect with jconsole to the JBoss Data
Grid container after start-up and record how much PermGen space is in use. If developers will be
pushing new archives (e.g., during sprint cycles) to the running environment, then it is best to increase
the PermGen size even further to avoid full garbage collections. Since most JBoss Data Grid
implementations will be using the G1 garbage collector, it is important to have PermGen sized properly,
because G1 only collects PermGen during a full garbage collection.
For JBoss Data Grid implementations use the following guidelines for PermGen tuning:
Turn off adaptive sizing for PermGen, -XX:PermSize=N and -XX:MaxPermSize=N.
256MB is a good default for both settings, but will most likely need to be increased for
environments with frequent deployments to the container.
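For example, applying the 256MB default suggested above:
-XX:PermSize=256m -XX:MaxPermSize=256m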

Some More Fine Tuning


Enabling large pages (huge pages on Linux) is another enhancement for a high performing
JBoss Data Grid implementation. Increasing the page size increases the likelihood that a
page of memory can be accessed via the fast translation lookaside buffer (TLB) as opposed to the slower
global page table. The TLB can only reference a limited number of pages, so increasing the page size
increases the TLB hit rate.
Many JBoss Data Grid implementations use large heaps, which makes enabling large pages
even more important than for an average-sized Java application. You can enable the JVM to use large
pages with the -XX:+UseLargePages flag; however, the operating system must also be configured to
support large pages. On Linux, large pages are not enabled by default. If the operating system supports
large pages you can run a simple test to verify its configuration: execute the command java
-Xmx<JDG max heap size>g -XX:+UseLargePages -version. If large pages are misconfigured you will
see the following error: OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory (errno
= 1).
Some Linux configurations support transparent huge pages (THP), which allow users to do
away with much of the operating system level tuning and the Java parameters required to use huge
pages. However, large Java applications can suffer performance degradation with transparent huge pages, and it is best
to disable this setting for JBoss Data Grid configurations.
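On Red Hat Enterprise Linux, for example, huge pages can be reserved and transparent huge pages disabled with commands like the following (run as root). The page count and file paths are illustrative, and additional requirements such as memlock limits vary by distribution and kernel, so treat this as a hedged sketch and consult the operating system documentation.

# reserve enough 2MB huge pages to cover the heap plus headroom (value shown is illustrative for a 64GB heap)
sysctl -w vm.nr_hugepages=34000
# disable transparent huge pages (path may differ by distribution)
echo never > /sys/kernel/mm/transparent_hugepage/enabled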
Enabling Thread Local Allocation Buffers (TLABs) can also offer a boost for some JBoss Data
Grid applications. This setting allows threads to have their own allocation space in Eden and will
reduce contention on shared memory resources. When TLABs are enabled, object allocation from threads
is very fast and efficient. However, since more memory is being allocated in Eden, the young generation
will fill up faster, resulting in more minor garbage collections. If the JBoss Data Grid
implementation will be utilizing a significant amount of memory for each thread then enable TLABs
with the -XX:+UseTLAB setting. If needed, you can modify the default allocation size for each thread
with -XX:TLABSize.
Compressing the Java object pointers (-XX:+UseCompressedOops) can also help limit the
memory footprint of object pointers and reduce garbage collection. In OpenJDK 7, this setting is
already enabled when the max heap size (-Xmx) is set to less than 32GB. Since CompressedOops only
compresses references up to 32GB, it is not applicable for heaps larger than 32GB.
Additionally, modifying socket buffer sizes is a critical component of JBoss Data Grid fine
tuning. If the JBoss Data Grid server is paused for garbage collection during a network transfer then the
socket buffers will buffer the data. If the buffers fill up, then packet loss and JBoss Data Grid timeouts
can occur. Therefore, it makes sense to increase the size of the buffers.
The socket buffer sizes need to be modified for both the send window and the receive window. The
JBoss Data Grid recommendation is to set the receive window to 25MB and the send window to
1MB. Start by modifying the operating system configuration. This is not required for Windows users,
but for Linux users it can be done by executing the following as root:

sysctl -w net.core.rmem_max=26214400
sysctl -w net.core.wmem_max=1048576
JBoss Data Grid utilizes JGroups as its peer-to-peer library for node-to-node
communication. JGroups can make use of both the TCP and UDP network protocols. It is recommended to
use UDP so JBoss Data Grid can leverage UDP multicasting, which allows JBoss Data Grid nodes to
send messages to the entire cluster without creating a separate TCP connection and handshake for
each node. JGroups also provides message delivery guarantees even over an unreliable protocol like
UDP. Modify the JGroups settings to match the operating system settings. In the JGroups configuration
file ensure the following settings:
UDP
- ucast_recv_buf_size = 20,000,000
- ucast_send_buf_size = 1,000,000
- mcast_recv_buf_size = 25,000,000
- mcast_send_buf_size = 1,000,000
TCP
- recv_buf_size = 25,000,000
- send_buf_size = 640,000
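In the JGroups XML configuration these settings appear as attributes on the transport protocol element. A trimmed, illustrative UDP example (other attributes omitted) might look like:

<UDP mcast_recv_buf_size="25000000" mcast_send_buf_size="1000000"
     ucast_recv_buf_size="20000000" ucast_send_buf_size="1000000" ... />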
The other JGroups settings will work well for most JBoss Data Grid applications. Some users
may need to match JGroups characteristics such as thread pool sizes and time-to-live (TTL) with the
specific requirements of the application.
Since networking will be the most expensive component in most JBoss Data Grid applications, it
is also important to enable jumbo frames, which allow network interface cards (NICs) to send frames
that are larger than the 1500-byte standard and take advantage of greater bandwidth. Many JBoss Data
Grid implementations will be sending larger packets, and increasing the frame size will reduce
packet fragmentation. Enabling jumbo frames on an existing infrastructure can be a difficult task. To
support jumbo frames the network communication path needs to support them end to end,
which means configuring the operating system, VMkernel, virtual switches, physical switches, routers, etc. to all
support jumbo frames for ingress and egress networking. Retrofitting existing infrastructures for jumbo
frames may not be realistic, but it will likely be beneficial for JBoss Data Grid applications, especially
those running on 10G+ networks.


Enabling Jumbo Frames will only help JBoss Data Grid performance. Jumbo Frames only
changes the upper limit on the frame size and does not impact the minimum size. Path MTU Discovery
(PMTUD), ensures that if Jumbo Frames is not supported, end-to-end, then it will not be used. PMTUD
occurs at the IP-layer and thus works for JGroups configurations running TCP and UDP. PMTUD must
not be blocked by firewalls or packets could be dropped by switches that do not support the MTU. On
Linux you can test the size of the MTU through two JBoss Data Grid nodes with ping by using the
following command ping -c 3 -M do -s 8068 jdgnode2. Adjust the packet size as appropriate,
subtracting 32 bytes for the packet headers.
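As a hedged example, the MTU on a Linux node can be raised for testing with the ip command; the interface name is an assumption and the change is not persistent across reboots:

ip link set dev em1 mtu 9000

After raising the MTU on every hop, re-run the ping test above to confirm that jumbo frames pass end to end.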
For JBoss Data Grid implementations:
- Always enable large pages (-XX:+UseLargePages) at the JVM and operating system level
- For threaded applications that allocate a lot of local objects, enable thread local allocation buffers
(-XX:+UseTLAB)
- CompressedOops is already enabled by default and only applicable for heaps smaller than 32GB.
- If possible, enable Jumbo Frames, especially on 10G+ networks.

Reducing the JBoss Data Grid Application Memory Footprint


JBoss Data Grid applications are high-performing, in-memory data grid solutions that should
not rely solely on the JIT compiler and optimized garbage collection for efficiency. Building memory-efficient
Java applications is important, especially with large heaps and distributed configurations. Here
are some good guidelines for creating a memory-efficient Java application.
1. Use smaller instance variables. This is a simple but effective way to reduce the footprint. If you can
use a byte (1 byte) instead of a long (8 bytes), it can be a big gain for large distributed
applications creating millions of objects.
2. If possible, use a character array (char[]) instead of a String. A Java String carries roughly 32 bytes
of object overhead in addition to its backing char[], which itself has 16 bytes of array overhead.
Therefore, the total size of a Java String is approximately 32 bytes + 16 bytes + (number of characters * 2),
rounded up to a multiple of 8; an 8-character Java String (64-bit) would consume about 64 bytes.
A plain character array of 8 characters would consume 16 + 8*2 = 32 bytes. In this scenario, a
character array consumes half as many bytes as a Java String. Another option is to use a byte[]
with UTF-8 encoding and, in this example, save an additional 8 bytes.
3. For frequently duplicated Java Strings, use String interning to ensure that you are not wasting
space storing the same value repeatedly. In a replicated or distributed JBoss Data Grid this will
only be valuable for duplicate Strings on the same node. By default, JBoss Data Grid stores
objects unserialized within the JVM unless the storeAsBinary option is enabled.
4. Replace wrapper objects with primitive variables where possible. Wrapper objects have
overhead: for instance, a double consumes 8 bytes while a Double consumes 24 bytes (2/3 of which is JVM
overhead).
5. Use fewer objects. Even simple objects cost at least 16 bytes of overhead, plus 8 bytes for every
reference to that object. More complex objects can frequently add 100-200 bytes in overhead.
6. Use arrays and array-based data structures over collections when possible. For example,
an ArrayList has significantly less overhead than a LinkedList or a HashSet.
7. Avoid allocating the same object multiple times (see the sketch after this list).
- The most common scenario is when developers iteratively create temporary objects in a loop.
If it is a mutable object, just create it once prior to the loop. For immutable objects like Strings
and wrapper objects, be aware that reusing the same variable and "modifying" it actually creates
a new object. For custom objects, implement a reset() method that allows for object reuse. For
large loops, minimizing object creation will have a high impact on the health of the JVM and
will in turn reduce garbage collection.
- The second most common scenario is multi-line String concatenation, which results in the
compiler generating a StringBuilder object for each line. The best practice is to create the
String all on one line or explicitly use a StringBuilder. If possible, avoid String concatenation
in a loop.

8. Consider lazy instantiation for infrequently used operations. If you have a method that returns
an expensive object and is only called on a small minority of instances, it makes sense to
compute the return value on demand. This will not make sense for common operations, since
memory will not be saved.
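The sketch below illustrates a few of these guidelines; the class, field, and method names are hypothetical and not part of any JBoss Data Grid API.

import java.util.List;

// Illustrative only: contrasts allocation-heavy patterns with the memory-friendly alternatives above.
public class FootprintExamples {

    // Guideline 7: a reusable, mutable helper object with a reset() method.
    static class RowBuffer {
        private long id;
        private byte status;              // guideline 1: a byte instead of a long where the range allows
        void reset() { id = 0; status = 0; }
        void fill(long id, byte status) { this.id = id; this.status = status; }
        long id() { return id; }
    }

    // Guideline 7: one explicit StringBuilder instead of repeated String concatenation in a loop.
    static String join(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String value : values) {
            sb.append(value).append(',');
        }
        return sb.toString();
    }

    // The helper is created once outside the loop and reset on each iteration.
    static long process(List<long[]> rows) {
        RowBuffer buffer = new RowBuffer();
        long total = 0;
        for (long[] row : rows) {
            buffer.reset();
            buffer.fill(row[0], (byte) row[1]);
            total += buffer.id();
        }
        return total;
    }
}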
For JBoss Data Grid implementations, always use Java best practices for minimizing memory
footprint.

Monitoring JVM Health with JBoss Data Grid


It is important to monitor the JBoss Data Grid system from end to end, including JVM
health, system memory, CPU, disk I/O, and networking. This section will focus specifically on
JVM health. The two most important aspects of JVM health with respect to JBoss Data Grid are
memory consumption and garbage collection performance. OpenJDK 7 includes several useful
monitoring tools including jconsole, jstat, jvisualvm, jhat, and jmap. These are critical tools for
monitoring the health of JBoss Data Grid.
Memory consumption needs to be closely monitored. Red Hat recommends that at most 50% of
memory is consumed by application data so that the rest of the space can be optimized for searches,
computations, and JVM management. This limit may vary from configuration to configuration. Static
caches may be able to add slightly more data to their heap, while search and computation intensive
applications may only be able to fill their heap with 1/3 application data before noticing a drop in
performance. The total heap space consumption should be monitored on periodic intervals to ensure
that the application data is not growing past 50% of the heap. If it does, there will be contention for
other operations like garbage collection. Insufficient memory will likely cause prolonged processing
spikes, severe latency, and overall instability in the cluster. Distributed configurations need to be
sized properly, not only for individual node capacity but also to leave spare capacity in
order to withstand the loss of nodes in the cluster.
In most situations, degradations in JBoss Data Grid response time directly correlate with garbage
collection pauses, so monitoring memory is crucial because of how it affects garbage collection activity. A well
tuned garbage collector (especially G1) will rarely pause the JBoss Data Grid application for a full
garbage collection, but minor garbage collections will cause frequent small pauses in the application. If
there is a high object allocation rate, the young generation will quickly fill up and cause a minor
garbage collection, pausing the JBoss Data Grid application. Although each minor garbage collection is
very short, it is important to minimize the number of minor garbage collections. From a memory
perspective, use jconsole and look at the JVM old generation memory dashboard. If the old generation
grows continuously and then drops off after a full garbage collection, it is possible that the young
generation is too small, causing temporary objects to get moved into the old generation. This issue will
also show up as a high rate of minor garbage collections. It is a common issue when using the
CMS collector, which requires more manual tuning; in that case, decrease the -XX:NewRatio setting to
allow for a larger young generation. If the application itself is simply creating too many temporary
objects, this may also be a good time for a code review.
A good way to quickly look at the health of the JBoss Data Grid application is to look at a
histogram of the heap distribution. OpenJDK 7 has a couple of ways to do that from the console using
jcmd and jmap. Generating a histogram will show the instance count and total size for each class in the
application. This can be very helpful in identifying memory eaters without doing a full heap dump,
which takes time to analyze and requires a lot of spare disk space. By generating trending histograms
you can also quickly see if there are memory leaks in the application. Here are two commands to
generate heap histograms in OpenJDK 7:
- jmap -histo <pid>, or force a full garbage collection first with jmap -histo:live <pid>
- jcmd <pid> GC.class_histogram


When unused objects are still referenced by the application, they will not be garbage collected and
will cause a leak. If you are putting a lot of items in the cache that may only be needed for a short
period of time, make sure to utilize expiration and eviction policies to keep the cache fresh.
Sometimes a deeper analysis of the JBoss Data Grid application is required, at which point a full
heap dump will be needed. There are several ways to create a full heap dump, but it is often easiest to
generate it either automatically with Java flags or on the command line with jmap or jcmd. The commands are:
- jmap -dump:live,file=my_jdg_stack.bin <JDG pid>. Adding the live parameter forces a full
garbage collection before the heap dump.
- jcmd <pid> GC.heap_dump /path/file.hprof. By default a full garbage collection is forced
before the heap dump.
You can configure the JVM to automatically create heap dumps before and after full garbage
collections with the following runtime flags: -XX:+HeapDumpAfterFullGC
-XX:+HeapDumpBeforeFullGC. If JBoss Data Grid is generating out-of-memory errors you can also
automatically generate a heap dump with the flag -XX:+HeapDumpOnOutOfMemoryError.
VisualVM is an open source Java troubleshooting tool that is sometimes not shipped with
OpenJDK but can be downloaded separately. You can generate and analyze heap dumps using
VisualVM. If you're running JBoss Developer Studio you can also install the Eclipse Memory Analyzer
to process the heap dumps.
The biggest impact on JBoss Data Grid application latency will be full garbage collections,
which stop all application threads during the collection. A stop-the-world pause will also impact JBoss
Data Grid throughput and scalability. Always monitor garbage collection times with either jstat or
jconsole. The collection times are also exposed via JMX so that they can be monitored with other JMX-based
solutions. Jstat works from the command line and allows users to see several different views of garbage
collection during program execution. The command jstat -gccause <pid> 1000 will continuously
print valuable garbage collection information to the console. If garbage collection needs to be forced, it
can be done with jcmd using the command jcmd <pid> GC.run. If VisualVM is your preferred tool
then add the VisualGC plugin for visual garbage collection monitoring.
Additionally, you can collect garbage collection logs with Java flags. This process has very low
overhead and can provide a lot of insight into JBoss Data Grid performance, so it makes sense to enable garbage
collection logging even on a production system. For useful logging, add the following Java flags:
-verbose:gc -Xloggc:<filename> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime. Make sure to update the output file location passed to the
-Xloggc flag. Although you can manually review the garbage collection logs, the best way to analyze them is to
use either GC Histogram or GCViz, which are separate downloads. If you observe a high rate of
garbage collection, make sure to review the garbage collection tuning section.
Although slightly out of the scope of this white paper, it is important to also monitor high CPU
utilization and network saturation. OpenJDK's concurrent and throughput garbage collectors are very
efficient, so seeing the CPUs pegged near or at 100% during a garbage collection is normal, but
consistently high CPU utilization could lead to unpredictable behavior. In a replicated or distributed
environment in which there are large network transfers, it is important to look for network saturation.
Consult your system administrator for the network saturation rate compared to the theoretical
throughput for your particular network. You can gather metrics from your local NIC using open source
tools like nicstat, which will automatically print the utilization rate. However, you may also need
visibility into the switches and routers to get the full picture. If the network is saturated then it will become a
large bottleneck for JBoss Data Grid.
For JBoss Data Grid implementations, make sure to monitor the memory footprint and the impact of
garbage collection on the application. Consistently high CPU usage and network saturation can also cause
unpredictable behavior.

Benchmarking JBoss Data Grid


JBoss Data Grid can be benchmarked accurately by following several important guidelines. The
first guideline is to benchmark using the target configuration. If JBoss Data Grid will be in production
as a distributed cache, make sure to benchmark it as a distributed cache, as it will behave differently
than a local cache. The second guideline is to make sure you have a warm-up period. As noted in the
compilation section, the HotSpot JVM will generate machine language for the frequently executed
portions of the application. As a general rule you will want to wait at least 30 seconds and run through at
least 10,000 iterations of the code before benchmarking it. Third, make sure that memory and garbage
collection parameters have been tuned appropriately. Additionally, always ensure that your tests are
multi-threaded; this is especially true for blocking operations like synchronous puts into the cache.
Determining the ideal number of threads will be trial and error, but 100 threads is a good starting
point for servers with 16 cores. If the CPU is not near maximum capacity then try adding more threads.
Finally, it is best to use bulk operations like putAll() as opposed to put(). This will reduce the number
of messages that need to traverse the network, which will greatly increase throughput. Adjusting the
size of the bulk operation is another trial-and-error process. Run benchmarking tests several times to
ensure that differences are not due to random chance. If necessary, use t-tests to determine the statistical
significance of the results.
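A minimal library-mode sketch of the bulk-write guideline is shown below. It assumes the default cache configuration purely for brevity; a real benchmark would use the production distributed-cache configuration, multiple threads, and a warm-up phase as described above.

import java.util.HashMap;
import java.util.Map;

import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;

public class BulkPutExample {
    public static void main(String[] args) {
        // Embedded (library mode) cache manager with the default configuration
        DefaultCacheManager cacheManager = new DefaultCacheManager();
        Cache<String, String> cache = cacheManager.getCache();

        // Build a batch locally, then send it with one bulk operation
        Map<String, String> batch = new HashMap<String, String>();
        for (int i = 0; i < 1000; i++) {
            batch.put("key-" + i, "value-" + i);
        }
        cache.putAll(batch);   // one putAll() instead of 1,000 individual put() calls

        cacheManager.stop();
    }
}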
Fortunately, there are several open source projects that will assist with benchmarking JBoss
Data Grid. The most important tool is RadarGun, a benchmarking framework specifically targeted at
distributed data structures. RadarGun can measure performance and scalability while helping to
identify and fix bottlenecks in your JBoss Data Grid application. The Yahoo! Cloud Serving Benchmark
(YCSB) can also be used to benchmark JBoss Data Grid in client/server mode; YCSB will test
latency against the remote JBoss Data Grid server. Apache JMeter can also be used to generate load on
JBoss Data Grid in client/server mode.
Benchmark JBoss Data Grid applications using the recommended guidelines. When possible, use open
source benchmarking tools to generate load on the data grid.

Conclusion
Developing and tuning an in-memory distributed cache requires a lot more attention to detail
than a standard Java application. In order to maximize performance, tuning will be required, mostly
focused on heap sizes and garbage collection. Garbage collection pauses will have the greatest
impact on JBoss Data Grid latency. Even minor garbage collections are stop-the-world events in all
Java garbage collectors, including the concurrent collectors. Most JBoss Data Grid applications will see
maximum performance using either the CMS or G1 collector. The CMS collector, if tuned properly,
may provide the highest throughput. However, the G1 collector will provide high throughput without
all the tuning headaches and will also provide a better worst case than the CMS collector. A
finely tuned G1 collector should never stop JBoss Data Grid for a full garbage collection, and is therefore
the recommended garbage collector of choice.
A JBoss Data Grid application that is producing large outliers in response time probably needs
tuning of the heap sizes or garbage collection parameters. While this white paper makes several
suggestions, the only way to truly maximize JBoss Data Grid is to experiment with various parameters.
This white paper can be used as an enhancement to the existing JBoss Data Grid documentation to assist in
that experimentation. OpenJDK 7 provides several tools to analyze the various segments
of the heap as well as to analyze the impact of garbage collection on JBoss Data Grid. When benchmarking JBoss
Data Grid make sure to use the recommended guidelines and leverage open source frameworks like
RadarGun.
