giladmanor@yahoo.com
Data buckets
For me, software development is just a nice way of saying ‘bit moving’. A
good friend of mine used to describe himself as a bit reorganizer. We
rearrange invisible magnets, he would say, setting their tiny arrows of
residual currents to point this way or that. We are a bunch of “bitniks”
and we are all about data.
It occurred to me that the Eskimos have their fourteen words for snow, and they say
that the Bedouins have nine words to describe sand. I felt so alone. I felt a need to
discover my own flavors of data. It took me a while, but then, in a single perfect
moment of clarity, I realized what lay before me.
The orchestration of the moment was this: in the middle of a design meeting, with
yelling and shouting all around, we were discussing optimization and performance,
and spirits were high. My thoughts went back to when I first learned about application
design. The fact is that when developing any business application, the first step
to take is to determine the set of business flows that describe the scope and
functionality of that application within the organizations it's meant to serve.
Listing these business flows by rank and cardinality is no bother at all. The simplest
evaluation I could think of is according to frequency of use and the sheer number of
users that would eventually use the flow.
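That yardstick is simple enough to sketch. Here is a minimal illustration, with entirely hypothetical names and numbers; ranking is just a sort on the product of frequency and user count:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: rank business flows by frequency of use
// multiplied by the number of users of the flow.
public class FlowRanking {
    record BusinessFlow(String name, long usesPerDay, long userCount) {
        long rank() { return usesPerDay * userCount; } // the simple yardstick
    }

    static List<BusinessFlow> ranked(List<BusinessFlow> flows) {
        return flows.stream()
                .sorted(Comparator.comparingLong(BusinessFlow::rank).reversed())
                .toList();
    }

    public static void main(String[] args) {
        var flows = List.of(
                new BusinessFlow("sell-policy", 5000, 300),
                new BusinessFlow("manage-customers", 800, 40),
                new BusinessFlow("maintain-products", 20, 5));
        ranked(flows).forEach(f -> System.out.println(f.name() + " -> " + f.rank()));
    }
}
```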
I thought that the categorization of the data body by the same yardstick could
provide me with the flavors of data I was looking for.
http://giladmanor.blogspot.com/
giladmanor@yahoo.com
Java is a wonderful language, my favorite actually, versatile and strong. In the
context of this discussion (!), Java has one drawback: the Java Virtual Machine sits
far, so far away from the data acted upon.
Unlike hieroglyphic COBOL, Java needs special machinery to access its data. In this
case, the sheer number of solutions testifies to the complexity of the problem.
It's safe to say that there are absolutely no free meals. Every solution ever invented
to accommodate the data access issue carries its own costs and complications.
Careful mapping of the data orientation by category and flavor might reduce the
friction in complex systems that depend on the availability of massive bulks of data.
And here I am getting to my point: mapping the data reduces the friction in
complex systems, and mapping the data needs more flavors.
To make it interesting, I will refer to the use of external services and fixed
configuration.
It is easy to see that, in the hierarchy of the business flows, the main workflow is
selling an insurance policy (stitching the customer to the product), followed by
the workflow for managing the customer base. Far behind come the workflows
for the creation, versioning and maintenance of the product list.
In my example application, this would be the data handled in the policy-selling
workflow. The data consists of the stitching tables between the products and
the customer. The stitching tables may also describe a single shopping cart or a
single contract with the customer.
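A toy sketch of that stitching table (the record and method names are mine, not from any real schema): each row ties one customer to one product, and a customer's rows together form the cart or contract:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "stitching" between customers and products:
// each Stitch row ties one customer to one product; the set of stitches
// for a customer describes a shopping cart or a contract.
public class Stitching {
    record Customer(long id, String name) {}
    record Product(long id, String description) {}
    record Stitch(long customerId, long productId) {}

    static final List<Stitch> STITCH_TABLE = new ArrayList<>();

    // The policy-selling workflow: stitch the customer to the product.
    static void sell(Customer c, Product p) {
        STITCH_TABLE.add(new Stitch(c.id(), p.id()));
    }

    // All product ids stitched to this customer, i.e. the contract.
    static List<Long> contractOf(Customer c) {
        return STITCH_TABLE.stream()
                .filter(s -> s.customerId() == c.id())
                .map(Stitch::productId)
                .toList();
    }
}
```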
In many cases the data that is added or modified stays within the boundaries of a
single session, and then there may be no reason for cache optimization.
In cases where concurrent change of the same data is permitted, the
synchronization between sessions should be handled with great care and with an
understanding of the business implications. It's important to remember that
deadlock issues are ten times easier to handle from the business standpoint than
from the technological one.
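One common way to keep those implications at the business level is optimistic concurrency: sessions never block each other, and a stale update is simply rejected and handed back to the business flow to resolve. A minimal sketch, with names of my own invention rather than from any specific framework:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of optimistic concurrency between two sessions that
// touch the same customer record: each update carries the version it read;
// a stale version is rejected so the *business* flow decides what to do,
// instead of blocking and risking a deadlock.
public class VersionedCustomer {
    private volatile String details = "";
    private final AtomicLong version = new AtomicLong();

    public long read() { return version.get(); }

    public String details() { return details; }

    // Returns false when another session modified the record first.
    public synchronized boolean update(long versionRead, String newDetails) {
        if (version.get() != versionRead) return false; // stale read: back to business logic
        details = newDetails;
        version.incrementAndGet();
        return true;
    }
}
```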
There are several solutions for second-level caching. To name a couple, there is
the Ehcache project, which I'm using in the product I'm working on, and I have also
heard about the Terracotta project.
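Ehcache and Terracotta supply this off the shelf; purely to illustrate what a second-level cache does between sessions, here is a stdlib-only toy (not how either product is actually wired up): entities loaded once are shared across sessions instead of being re-read from the database.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Stdlib-only toy showing what a second-level cache buys you: the loader
// (standing in for a database round trip) runs only on a cache miss, and
// the loaded value is then shared by every session.
public class SecondLevelCache<K, V> {
    private final ConcurrentMap<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // the expensive database read
    private int misses;                  // miss counter, for illustration only

    public SecondLevelCache(Function<K, V> loader) { this.loader = loader; }

    public V get(K key) {
        return store.computeIfAbsent(key, k -> { misses++; return loader.apply(k); });
    }

    public int misses() { return misses; }
}
```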
In the example application, the customer base management workflows rank
second. The customer base is modified intensively, when additional customers are
introduced to the database or when existing customers change status or details.
Having two possible concurrent sessions (the main business session and the
customer-database update session) both accessing the customer data requires
special attention and awareness of the business implications of concurrent
modification.
On top of that, the products relevant to the main business flow of selling insurance
policies are the subset of products that are complete and ready to be sold.
This implies that, in relation to the selling workflow, the product list is static.
The caching implementation for the product list could then be very simple: a cache
pocket for the product list could be refreshed by messaging or on a timer, while
remaining static as far as the main business flow is concerned.
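A sketch of such a cache pocket, assuming a hypothetical product source and using a plain `ScheduledExecutorService` for the time-based refresh (a message listener could call the same `refresh()`):

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical cache pocket for the product list: the selling flow always
// reads an immutable snapshot, while a background task swaps in a fresh
// copy on a fixed schedule. The list stays static for the business flow.
public class ProductListCache {
    private final AtomicReference<List<String>> snapshot;
    private final Supplier<List<String>> source; // stands in for the product database

    public ProductListCache(Supplier<List<String>> source) {
        this.source = source;
        this.snapshot = new AtomicReference<>(List.copyOf(source.get()));
    }

    public List<String> products() { return snapshot.get(); } // static for the caller

    public void refresh() { snapshot.set(List.copyOf(source.get())); }

    // Time-based variant; messaging could trigger refresh() instead.
    public ScheduledExecutorService startTimedRefresh(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::refresh, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```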
System configuration
Data that is retrieved from system configuration, property files, XML data structures
or tables belongs to the data bucket that reflects changes only when the system is
rebooted.
Relating to the example application, this bucket may contain anything from
connection pool sizes to i18n (internationalization) bundles.
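As a sketch of this bucket, here is a hypothetical holder (names and property keys are mine) that parses its properties once at construction and is never refreshed afterwards; a change only takes effect on reboot:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Sketch of the reboot-only bucket: configuration is read once at startup
// and held for the life of the process; nothing refreshes it at runtime.
public final class BootConfig {
    private final Properties props = new Properties();

    public BootConfig(String propertySource) {
        try {
            props.load(new StringReader(propertySource)); // e.g. a .properties file
        } catch (IOException e) {
            throw new IllegalStateException("cannot start without configuration", e);
        }
    }

    // Hypothetical key; falls back to a default when the key is absent.
    public int connectionPoolSize() {
        return Integer.parseInt(props.getProperty("db.pool.size", "10"));
    }
}
```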
In my view, special attention has to be given to deciding what not to cache.
My way of approaching cache optimization for external data services is to use the
same ranking method as before, but to determine for which services caching is
irrelevant and for which it would be beneficial.
For the insurance example, I would never cache services that have a narrow scope
of relevance; a service that validates bank accounts is too volatile to cache. On the
other hand I might consider caching data for age group premium rates.
My view of best practice for optimization in the case of an external data service is
to exclude the task of updating the cached data from the thread that serves
the business flow. Instead, I would consider maintaining an independent thread that
checks for data modifications every once in a while and updates the cached data
independently.
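A rough sketch of that arrangement, with all names hypothetical: the business flow only ever reads the cached value, while a separate daemon thread polls the external service for a newer version and swaps the cache, so no request ever pays for the remote call.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical cache for an external data service (e.g. age-group premium
// rates): reads never block or call the service; an independent updater
// thread checks for modifications and swaps the cached value.
public class ServiceCache {
    record Versioned(long version, String payload) {}

    private final AtomicReference<Versioned> cached;

    public ServiceCache(Versioned initial) { this.cached = new AtomicReference<>(initial); }

    public String read() { return cached.get().payload(); } // business-flow path

    // Called only by the independent updater thread, never the business flow.
    public boolean refreshIfModified(Supplier<Versioned> service) {
        Versioned latest = service.get();
        Versioned current = cached.get();
        if (latest.version() <= current.version()) return false; // nothing changed
        return cached.compareAndSet(current, latest);
    }

    public Thread startUpdater(Supplier<Versioned> service, long pollMillis) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                refreshIfModified(service);
                try { Thread.sleep(pollMillis); } catch (InterruptedException e) { return; }
            }
        }, "cache-updater");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```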
My Eskimo vocabulary
The basic motivation for refining the resolution of data terminology is the
optimization of cache implementations.
There is an Eskimo saying that premature optimization is the root of all evil (no
it's not, I made that up). In most cases this is absolutely true, but I would like to
argue that cache optimization is elementary to a degree that it has to be addressed
in the early stages of application design.
But in any case, I have found my little Eskimo vocabulary for data in an application
and I am happy.