
S309137

Migrating Applications from Local to Distributed Caching with Oracle Coherence



Contents
Introduction: Oracle Coherence Hands-On Lab
Base Environment
Quick overview of the application
Running the application with local cache
Explore the application. Understand how it works.
Update application to run with distributed cache
   Configuration Subsystem
   Cache Store
   Distributed Cache Worker implementation
Run application with distributed cache
Test failover and redundancy
Shutdown everything

Introduction: Oracle Coherence Hands-On Lab

In this hands-on lab, you will take an application that loads data from a set of large files, caches the data
in memory, and performs complex queries and updates against the cached data, with changes persisted
back to the files. You will then modify this application to cache the data in a Coherence data grid and
perform these computations against it, learning the common Coherence APIs, configuration files, and usage
patterns as you go along. By doing this, you will become familiar with some core Coherence concepts,
such as the configuration subsystem, the read-through and write-through mechanisms for loading and
persisting data to the backend data source, the JMX subsystem, and the use of EntryProcessors for
lock-free, concurrent, well-performing updates.

For this hands-on lab, we have built a sample application that simulates the access and update of a set
of dictionaries of some outdated (and frankly made up) languages. In the next section, you will get an
overview of the application. The initial version of the application will work against the dictionaries
loaded up in your application's local JVM. We will then walk you through updating the application to
work with the data loaded in a Coherence data grid.

This hands-on lab is focused on Coherence. There is no dependency on a database or an application
server. The knowledge you gain here is directly applicable in use cases where Coherence is used within
an Application Server or fronting a database.

Base Environment
The lab machine provides the following environment, which you will use throughout this lab:
- Oracle Coherence 3.5 for Java
- Eclipse 3.5 Galileo
- JRockit 1.6.0
On the desktop, you will find a folder named S309137 Migrating Applications from Local to Distributed
Caching with Oracle Coherence. This folder contains a shortcut to the lab folder, which you will be working
with. The actual lab folder (accessed from the shortcut) contains scripts to assist you with the lab. There
is also a solutions directory in there, which contains the solutions for this lab (for reference if you get
stuck).

Quick overview of the application

This sample application allows you to access the following information about words in a set of
languages:
- description
- synonyms
- antonyms
- daily frequency of use of each word
- the year in which the word became obsolete

The words and their metadata (as described above) are stored in a number of zip files in the working
directory. For each language LANG, a zip file called lang-LANG.zip exists and contains an entry for each
word.

The application exposes a command line interface with very simple commands to interact with the
dictionaries.

The sections below show some typical usage.


Running the application with local cache

For your convenience, batch scripts are provided that allow you to set up your environment and run the
application. Hints are provided below; double-clicking a script allows you to bypass typing into the
cmd shell.
Every command must be run with the environment properly set up. You will need to do this for every
cmd shell window you open.

set JAVA_HOME=c:\jrmc_3.1
set COHERENCE_HOME=c:\coherence
set CLASSPATH=.;%COHERENCE_HOME%\lib\coherence.jar;%COHERENCE_HOME%\lib\tangosol.jar
set PATH=.;%JAVA_HOME%\bin;%PATH%;

To start, compile the application.
javac -d . app\*.java
Hint: you can just run the app-compile.bat script.

Now generate the dictionaries as zip files containing the words for the different languages. This will take
about 5 minutes.
java -Xms1g -Xmx1g app.CreateDictionary
Hint: you can just run the app-create-dict.bat script.

Now run the application using the local cache. Here, the cached data is stored in Collection objects
on the local Java heap.
java -Xms1g -Xmx1500m -Xmanagement app.Main -local
Hint: You can just run the app-local.bat script.

This will open up a prompt as below:
Main>

At the prompt, type help to see the different commands and how to use them:
Main> help


Let us get familiar with the application from a user's point of view. At startup, the dictionary has not
been loaded up into memory.

Let us look up information about the word word100003 from the language lang1. Since the dictionary is
not loaded up into memory, the application will retrieve the information from the zip files directly, and
cache just that result into memory.

Main> show lang1 word100003


Now, update the description associated with the word word100003 from the language lang1:
Main> update -d lang1 word100003 this new description is cool
Run the show command again to ensure that the cache was updated. Also, check the zip file (lang-
lang1.zip), look at the entry for word100003, and ensure that the change was written transparently
to the backend zip file. You will notice that updates are slow. Unfortunately, Java has poor support for
updating zip files, so we had to completely recreate the zip file for each changed word (this is why an
update takes a while).
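
To see why, here is a minimal sketch of the copy-and-rewrite dance that replacing a single zip entry requires in Java. The class and method names are illustrative, not the lab's actual code:

import java.io.*;
import java.util.zip.*;

public class ZipUpdateSketch {
    // "Updating" one entry means streaming the whole archive into a new file.
    static void replaceEntry(File zip, String entryName, byte[] newData) throws IOException {
        File tmp = new File(zip.getPath() + ".tmp");
        ZipInputStream in = new ZipInputStream(new FileInputStream(zip));
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(tmp));
        byte[] buf = new byte[8192];
        for (ZipEntry e; (e = in.getNextEntry()) != null; ) {
            out.putNextEntry(new ZipEntry(e.getName()));
            if (e.getName().equals(entryName)) {
                out.write(newData);                     // the one replaced entry
            } else {
                for (int n; (n = in.read(buf)) > 0; ) { // copy every other entry as-is
                    out.write(buf, 0, n);
                }
            }
            out.closeEntry();
        }
        in.close();
        out.close();
        zip.delete();
        tmp.renameTo(zip);
    }
}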


Load up the whole dictionary for all languages into memory so that we can do some more complex
queries. This may take up to 30 seconds to load all 10 dictionaries.

Main> load

Now, perform a query. Search for words in the language lang1 that are synonymous with synonym674:

Main> find lang1 -s synonym674

Let us do a more complex query. Search for words in the language lang1 that are synonymous with both
synonym674 and synonym1430, but are the opposite (antonym) of antonym2243:

Main> find lang1 -s synonym674 -s synonym1430 -a antonym2243

For both of these queries, you get an UnsupportedOperationException, because query support is hard to
implement in our local cache. A database and SQL would have made life much easier, but we do not have
access to them for our sample application.
Hint: This is possible and easy using Coherence.

Let us try some more commands.

Look up the languages currently supported in our dictionary
Main> langs

Now look up the stats of the currently loaded cache. This should show you the number of words
currently loaded into the cache for each language.
Main> stats



Explore the application. Understand how it works.

Now, we will go ahead and explore the application to see how it works under the hood.
The application is run from the working directory. For each language LANG, there is a zip file in that
directory called lang-LANG.zip, which contains metadata for each word in the language. Open up lang-
lang1.zip and look at the contents to get familiar with the format.


This working directory is already set up as an Eclipse project. Open up Eclipse and import the project into
your workspace. From the File menu, select Import. In the dialog box that comes up, open the
General node in the tree menu and select Existing Projects into Workspace. Click Next. Click the
radio button beside Select root directory and click the Browse button to select the working
directory. Select the project oow_hol_coherence and click Finish.
Hint: You can just run the eclipse.bat script. The working directory is set up as a pre-configured Eclipse
workspace and project.

You should now see the project in your workspace.

The Java source files all reside under the app directory. Please read through them to get familiar with
what they do.
Record.java - Encapsulates a word in a language.
Main.java - The main command line interface; parses the user inputs and calls the appropriate commands.
Worker.java - An interface that allows us to decouple the implementation of the cache from the application.
LocalWorker.java - An implementation of Worker that uses a local hash map to store the data and (tries to) manage the computation itself.
Helper.java - Contains some shared helper functions.
Metrics.java - A table model containing the languages and the memory used for each language within the cache.
Monitor.java - A Swing UI that regularly gets updated metrics from the cache and displays them as a Swing table.


Once you have a good understanding of the application, feel free to go back to the previous section, run
the lab again, and get familiar with the application. The rest of the lab will go a lot smoother once
you get the hang of what the application is trying to accomplish.


Update application to run with distributed cache

Now the lab gets more interesting. We will walk through a number of steps to update the application so
that it runs against the Coherence data grid.

In this lab, we aim to achieve the following using Coherence:
- Store the cached data in multiple external JVMs that appear to us as one giant local heap with quick access
- Transparently read records through the cache, even if they have not been loaded into Coherence
- Transparently let the cache persist updates to the backend data source
- Perform queries and computation on the Coherence cache
- Configure the Coherence cluster so that other users on the network do not conflict with us
- Make updates efficiently, with the minimum number of network hops
- Monitor the application (including the distributed cache) transparently from a single location


To achieve this, we will understand and leverage the following concepts from Coherence:
- Invocable Maps and Entry Processors
- JMX functionality
- Read-Through and Write-Through caching strategies
- Configuration subsystem
- Coherence Services
- Partitioned (Distributed) and Near cache topologies

First, let us define an architecture strategy. In general, Coherence supports a partitioned cache in which a
number of backups are stored on one or more members of the data grid for failover. In addition, a near
cache can wrap a partitioned cache, so that a copy of the data is also stored on each local JVM (and
network access may be bypassed). We will leverage both mechanisms for our caching.

The actual words and their metadata are stored in the partitioned (distributed) cache. There will be a
different named cache used for each language. For example, language lang1 will be stored in the cache
called dict-lang1. The list of available languages and some other state we need will be stored in the near
cache (called app-shared). This near cache will actually wrap a different partitioned cache.
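
For reference, clients obtain these caches by name through the CacheFactory API. A minimal sketch using the cache names above (the class itself is illustrative):

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CacheLookupSketch {
    public static void main(String[] args) {
        // Coherence joins (or forms) the cluster on first use and maps each
        // cache name to a caching scheme via coherence-cache-config.xml.
        NamedCache dict = CacheFactory.getCache("dict-lang1");   // partitioned dictionary cache
        NamedCache shared = CacheFactory.getCache("app-shared"); // near cache for shared state
        System.out.println("dict-lang1 currently holds " + dict.size() + " words");
        CacheFactory.shutdown();
    }
}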

We will also use a CacheStore so that read requests to the cache will transparently go to the backend
when a cache miss happens, and updates will also transparently go to the backend zip files when we
want them to.

Configuration Subsystem
Coherence is an extensively configurable system.

At startup, Coherence will find the file tangosol-coherence-override.xml on your classpath. This
configuration is for system-wide settings. In this file, you can configure things like your cluster address,
cluster multicast port, cluster name, etc.

Copy tangosol-coherence-override.xml from the solutions directory into your working directory. It uses a
configuration that restricts the cluster to the local machine, and it also defines a separate distributed
cache service for storing the shared state (e.g. the list of languages).
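
For illustration, such an override file might look like the sketch below; the cluster name, multicast address, and port here are made-up values (use the actual file from the solutions directory in the lab). A multicast time-to-live of 0 is what restricts the cluster to the local machine:

<coherence>
  <cluster-config>
    <member-identity>
      <!-- a unique cluster name keeps you from joining a neighbor's cluster -->
      <cluster-name>oow-hol-cluster</cluster-name>
    </member-identity>
    <multicast-listener>
      <address>224.3.5.1</address>
      <port>31337</port>
      <!-- TTL 0: multicast packets never leave this machine -->
      <time-to-live>0</time-to-live>
    </multicast-listener>
  </cluster-config>
</coherence>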

Once Coherence gets a request to look up a cache, it loads the coherence-cache-config.xml file.
You should configure your caches here, setting up those that will use the distributed (partitioned) scheme
separately from those that should use the near scheme. Copy coherence-cache-config.xml from the
solutions directory into your working directory, and ensure the following (a sketch of such a configuration
follows this list):
- Two separate cache schemes are defined, for the dictionary caches (using a distributed scheme) and the shared cache (using a near scheme) respectively.
- A naming convention is used, where caches with names matching dict-* are mapped to the distributed scheme, while the cache app-shared is mapped to the near scheme.
- A cache store is configured which takes the cache name as a parameter. The cache store should apply to the dict-* caches alone, since these are the only ones that need to load or persist data to the backend zip files.
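
A sketch of what such a configuration might look like (the scheme and service names are illustrative; the actual file is in the solutions directory):

<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>dict-*</cache-name>
      <scheme-name>dictionary-scheme</scheme-name>
    </cache-mapping>
    <cache-mapping>
      <cache-name>app-shared</cache-name>
      <scheme-name>shared-near-scheme</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>

  <caching-schemes>
    <distributed-scheme>
      <scheme-name>dictionary-scheme</scheme-name>
      <backing-map-scheme>
        <read-write-backing-map-scheme>
          <internal-cache-scheme>
            <local-scheme/>
          </internal-cache-scheme>
          <cachestore-scheme>
            <class-scheme>
              <class-name>app.DistCacheStore</class-name>
              <init-params>
                <init-param>
                  <param-type>java.lang.String</param-type>
                  <!-- {cache-name} passes e.g. "dict-lang1" to the cache store -->
                  <param-value>{cache-name}</param-value>
                </init-param>
              </init-params>
            </class-scheme>
          </cachestore-scheme>
        </read-write-backing-map-scheme>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>

    <near-scheme>
      <scheme-name>shared-near-scheme</scheme-name>
      <front-scheme>
        <local-scheme/>
      </front-scheme>
      <back-scheme>
        <distributed-scheme>
          <!-- a separate service isolates the shared state from the dictionaries -->
          <service-name>SharedDistributedCache</service-name>
          <backing-map-scheme>
            <local-scheme/>
          </backing-map-scheme>
          <autostart>true</autostart>
        </distributed-scheme>
      </back-scheme>
    </near-scheme>
  </caching-schemes>
</cache-config>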

Now that we have gotten the configuration out of the way, let us go about working on the actual
application.
Cache Store
First, create the CacheStore, which does the work of reading data during a cache miss and persisting data
after a cache update. We want an explicit update to the cache to write through to the zip files, but not a
bulk load. This means that not every cache put should write back to the zip files. We can control this
using a flag stored in the cache itself: the shared cache (app-shared) will be used to store the flag.
During a bulk load, we will set the flag in the cache, and remove it once the bulk load is done. The
cache store will not write through to the backend while the flag is present; any other cache put will
result in a write-through to the backend. Look at the way app\LocalWorker.java reads and writes
individual records to/from the zip files. Implement that same logic in app\DistCacheStore.java.
Hint: Look at the solutions directory for how this is done.
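
For orientation, here is a minimal sketch of the pattern described above. The flag key and the zip helper methods are illustrative assumptions, not the actual solution code:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.cache.AbstractCacheStore;

// Hypothetical cache store for one dict-* cache; the cache name is passed in
// via the {cache-name} init-param from coherence-cache-config.xml.
public class DistCacheStoreSketch extends AbstractCacheStore {
    private final String cacheName;

    public DistCacheStoreSketch(String cacheName) {
        this.cacheName = cacheName;
    }

    // Called on a cache miss: read the one word from the backend zip file.
    public Object load(Object key) {
        return ZipHelper.readRecord(cacheName, (String) key); // hypothetical helper
    }

    // Called on a cache put: skip the write-through while a bulk load is running.
    public void store(Object key, Object value) {
        if (CacheFactory.getCache("app-shared").containsKey("bulk-load")) {
            return; // bulk load in progress; do not rewrite the zip file
        }
        ZipHelper.writeRecord(cacheName, (String) key, value); // hypothetical helper
    }
}

Reading app-shared from inside the store works cleanly because it is hosted on the separate cache service defined in the override file, not on the dictionary caches' own service.
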
Distributed Cache Worker implementation
Next, create the actual implementation of the Worker interface, similar to the app\LocalWorker.java
implementation, that interacts with Coherence. Call this app\DistWorker.java. In this
implementation, we need to store all state in the Coherence cache, so that anyone in the Coherence
cluster can access it. This includes:
- languages
- dictionaries (words)

Implement the following methods, using the guidelines below (sketches of the update and find cases follow this list):

getLanguages - Store the list of languages in the app-shared cache and retrieve it from there as needed.

getRecord - Simply retrieve the word from the cache. Coherence will ensure that it checks the backend zip file if it does not have the word, since the cache store has been configured.

update - One way to implement this would be to retrieve the Record from Coherence to your local JVM, make updates on the local JVM, and then send the updated Record back over to Coherence. However, this can cause unnecessary network traffic, which could become a bottleneck if the records are large (especially compared to the change you want to make). A more efficient way is to send your updates directly to the Coherence node that hosts the data. You achieve this using EntryProcessors and the InvocableMap. In addition, this method must be smart enough to write through to the backend zip files only when requested; this is controlled by the bulk-load flag described in the Cache Store section, since the CacheStore only persists to the zip files when that flag is not set.

bulkInsert - For large uploads, it is more efficient to upload to Coherence in batches. Coherence has a putAll API which can be used for this.

find - Unlike app.LocalWorker (where implementing complex queries without a database is difficult, and left unimplemented), Coherence has extremely powerful query functionality that can easily express complex SQL-like queries: the Coherence Filters API. An added advantage is that Coherence runs your query in parallel across all the nodes in the cluster and returns your results faster. The more Coherence nodes you have, the less data each one holds and the less work each one does, meaning that your query performance scales linearly with the number of Coherence nodes.

updateMetrics - Coherence has an extensive JMX feature set: all the management information can be federated into any number of nodes in the Coherence cluster that you deem should hold the federated management information for the cluster. We will leverage this to keep track of the number of entries in the cache, how these entries are distributed across the Coherence data grid, and how memory is used across the grid.

clear - Removes all entries stored in the cache for a given language.
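
To make the update case concrete, here is a minimal sketch of an entry processor. The class name and the Record setter are illustrative assumptions, not the actual solution code:

import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// Hypothetical processor that changes a Record's description in place on the
// Coherence member that owns the entry, so the full Record never crosses the wire.
public class UpdateDescriptionProcessor extends AbstractProcessor
        implements java.io.Serializable {
    private final String description;

    public UpdateDescriptionProcessor(String description) {
        this.description = description;
    }

    public Object process(InvocableMap.Entry entry) {
        Record record = (Record) entry.getValue();
        record.setDescription(description); // assumes Record exposes this setter
        entry.setValue(record);             // marks the entry as modified
        return null;
    }
}

A client then sends only the change, not the whole record, to the owning node with a call along the lines of cache.invoke("word100003", new UpdateDescriptionProcessor("new text")).

For the find case, here is a sketch of a compound query using the Filters API, under the assumption that Record exposes getSynonyms() and getAntonyms() collection getters:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.Filter;
import com.tangosol.util.extractor.ReflectionExtractor;
import com.tangosol.util.filter.AllFilter;
import com.tangosol.util.filter.ContainsFilter;
import java.util.Set;

public class FindSketch {
    public static void main(String[] args) {
        NamedCache dict = CacheFactory.getCache("dict-lang1");
        // Words synonymous with synonym674 and synonym1430, antonymous to antonym2243.
        // Coherence evaluates the filter in parallel on the members that own the data.
        Filter filter = new AllFilter(new Filter[] {
            new ContainsFilter(new ReflectionExtractor("getSynonyms"), "synonym674"),
            new ContainsFilter(new ReflectionExtractor("getSynonyms"), "synonym1430"),
            new ContainsFilter(new ReflectionExtractor("getAntonyms"), "antonym2243")
        });
        Set results = dict.entrySet(filter);
        System.out.println(results.size() + " matching words");
        CacheFactory.shutdown();
    }
}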



Look in the solutions directory for the full solution for the DistWorker.java implementation.

Finally, update the command line interface app\Main.java so that the -dist command line parameter
switches to the app.DistWorker implementation. Do this by un-commenting the call that
instantiates DistWorker below.

public static void main(String[] args) throws Exception {
    Main m = new Main();
    m.worker = new LocalWorker();
    for (int i = 0; i < args.length; i++) {
        if (args[i].equals("-dist")) {
            //m.worker = new DistWorker();
        }
    }
    m.run();
}



Run application with distributed cache

Compile your application as in the earlier section: open a cmd shell, set up your environment, and run
javac.
Hint: You can run the app-compile.bat script

Now, start up three Coherence cache servers, using the command line:
java -Xms256m -Xmx512m -Dtangosol.coherence.management.remote=true com.tangosol.net.DefaultCacheServer
Hint: You can just run the coherence-cache-server.bat script. Double-click it three times to start three servers.

This will start up a Coherence cache server configured to expose its management (JMX) information
around the cluster.

Now run the application using the distributed cache implementation, using the command line:
java -Xms256m -Xmx512m -Xmanagement -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true -Dtangosol.coherence.distributed.localstorage=false app.Main -dist
Hint: You can just run the app-dist.bat script

This will start the command line interface to your application, setting that JVM up as a Coherence cluster
member which does not store data, but which collects management information from all the Coherence
cluster members and stores it in its JMX MBeanServer. You will thus be able to see all the
management information from the whole Coherence cluster locally, within your own JVM.
Note that once the data has been loaded into Coherence, you can run multiple clients and have them all
share the distributed in-memory data grid.

At the prompt, run stats, which will pop up a Swing UI that updates itself as the Coherence cluster
membership and contents change. From this Swing UI table, you can see the membership of the
Coherence cluster and how much of the data each member holds, in near-real time (the Swing UI
updates itself every 5 seconds).

Main> stats

As mentioned above, this Swing table updates itself every five seconds with the JMX information that
has been federated to your local JVM. Each row represents the amount of data stored by one
Coherence server on behalf of the cluster, and the memory being used in MB. For example, in a
sample run, the Coherence JVM with node id 1 was the primary store for 10089 words of lang2,
10075 words of lang1, and 9924 words of lang3, and used 114MB of memory. The next rows show how
much the Coherence JVMs with node ids 2 and 4 hold.
As you add or shut down Coherence cache servers, and even as you bulk-load the records into Coherence,
monitor this Swing UI and watch how Coherence automatically redistributes the cached data. Look at
app\DistMonitor.java for the full implementation.
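
For a flavor of how a monitor like this can read the federated metrics, here is a minimal sketch against the platform MBeanServer. The MBean name pattern and the Size attribute follow Coherence's standard Cache MBeans; the class itself is illustrative:

import java.lang.management.ManagementFactory;
import java.util.Iterator;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxStatsSketch {
    public static void main(String[] args) throws Exception {
        // With -Dtangosol.coherence.management=all, this JVM federates the
        // cluster's MBeans into its own platform MBeanServer.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set names = server.queryNames(
            new ObjectName("Coherence:type=Cache,name=dict-lang1,tier=back,*"), null);
        for (Iterator it = names.iterator(); it.hasNext(); ) {
            ObjectName name = (ObjectName) it.next();
            System.out.println("node " + name.getKeyProperty("nodeId")
                + " holds " + server.getAttribute(name, "Size") + " entries of dict-lang1");
        }
    }
}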
You can also open up JConsole or JRMC to look at the rich set of JMX management metrics exposed by
Coherence.
Hint: You can run the jrockit-mc.bat script


Once that is done, run the other commands as shown in the previous section. A sampling of those
commands is below.
Main> help
Main> langs
Main> show lang1 word100003
Main> update -d lang1 word100003 this new description is cool
Main> load
Main> find lang1 -s synonym674
Main> find lang1 -s synonym674 -s synonym1430 -a antonym2243
Main> exit
To show the beauty of this distributed solution, run another instance of the client. At the prompt, just
run the find command. You will see that it works: any new client does not have to load up the data,
since all the data and state are stored externally in the distributed cache.
Hint: You can run the app-dist.bat script


Test failover and redundancy

You can test the failover and redundancy of your Coherence implementation by shutting down some
instances and bringing others back up. Do this by typing Ctrl-C in some of the Coherence cache
server windows, and/or starting some new Coherence servers.
Hint: You can use the coherence-cache-server.bat script.

Watch the windows of the Coherence cache servers that are still live. Notice how the Coherence cluster
automatically rebalances the data. Watch the Swing UI to see updated stats on how the data is
rebalanced and how memory is used across the cache servers.


Shutdown everything

To shut down everything, type Ctrl-C in all your open windows.
