
Proceedings of the 42nd Hawaii International Conference on System Sciences - 2009

A Modeling Tool for Multidimensional Data using the ADAPT Notation

Peter Gluchowski, Christian Kurze, Christian Schieder


Chemnitz University of Technology
{peter.gluchowski | christian.kurze | christian.schieder}@wirtschaft.tu-chemnitz.de

978-0-7695-3450-3/09 $25.00 © 2009 IEEE

Abstract

Conceptual models are designed to express the requirements of information systems in a formal and self-explanatory manner from a business user’s point of view. Furthermore, they should contain enough information for IT staff to use them as a basis for technical deployment.

Within the domain of OLAP information systems, multidimensional data models are used to translate the specific user requirements into multidimensional data structures. Application Design for Analytical Processing Technologies (ADAPT) is one of the appropriate notations to express these models on a conceptual level. Up to the publication of this paper, ADAPT has lacked any formal foundation. We contribute to solving this issue by proposing a metamodel for ADAPT and a software prototype that transforms conceptual data models into logical ones in order to ease the development of data warehouse systems.

1. Introduction

Numerous steps must be taken during the development of data warehouse systems to accomplish a highly productive realization of user requirements. Although the development process often does not follow sequential procedure models, but passes through various feedback loops and iterations, the process stages of requirements engineering and technical design prove to be of great importance. Regrettably, the transfer between the two phases turns out to be difficult, as the qualitative description of business requirements cannot immediately be converted into a formal technical specification.

Especially in the context of data modeling it is advisable to apply special techniques that support the modeling of relevant structures on a conceptual level, independently of the database technology eventually used, to overcome this issue. These conceptual data models summarize the results of the information demand analysis and serve as a foundation for further developments [8, 9]. Usually, conceptual modeling relies on a graphical notation that allows writing, understanding and managing schemata by both designers and users [1].

The conceptual notation analyzed in this paper, called Application Design for Analytical Processing Technologies (ADAPT) [2], was first published in 1996 and offers a set of symbols which facilitate the visualization of multidimensional data structures in a mostly self-explanatory way. To the best of our knowledge, ADAPT lacks any formal foundation to date. The present article closes this gap by introducing a metamodel which contains precise specifications for admissible combinations and connections of model elements.

While conceptual models are to capture user requirements, logical models are to capture implementation aspects. They hide some details of physical data storage but could be implemented on a computer system directly. Modeling multidimensional data structures on a conceptual level reveals the navigation structures along the dimensions in a self-explanatory way, but the logical representation, for example by relational tables, does not; hierarchy levels and attributes are simple columns without further metadata.

The next step in implementing data models is their transformation into physical models. Their concepts describe the details of how data is stored in a specific system. Physical data models are meant for computer specialists and are therefore not covered further within this paper.

Formally correct ADAPT models can be applied as a base for the automatic compilation of logical data models. However, the transformation into a logical model is not trivial, as different database technologies and several realization options compete against each other [10]. This paper provides a solution which leads to the automatic transformation of ADAPT diagrams into logical data schemata for diverse target platforms.

The remainder of this paper is structured as follows: section two introduces basic multidimensional concepts; section three briefly overviews related work. An introduction to both ADAPT and the metamodel is
provided in section four. Part five focuses on the essential conversion steps from conceptual towards logical data models. The last section discusses the prototype and gives an outlook on future research work.

2. Multidimensional Data Modeling

The ontological discrimination between schema and instance layer is a fundamental aspect in modeling multidimensional data structures. This section elaborates on the conceptual elements used to describe multidimensional data schemata [3, 20]: dimensions, hierarchies, measures, and attributes. They are used to turn business data into actionable information. Indeed, a simple spreadsheet’s structure is a kind of multidimensional data structure, namely a two-dimensional one.

Dimensions represent the essential components of a multidimensional schema. They are a partially ordered set of dimension elements (also called dimension members or dimension positions), which represent the dimension’s individual values, that is, the dimension’s instance. For example, an analysis of a company’s turnover might be parameterized by the dimensions ‘product’, ‘organizational unit’ and ‘time’. Each single product sold by our company is an element of the product dimension.

Dimension elements might be grouped by means of hierarchically ordered levels, so-called dimension hierarchies. Within the schema description dimension elements are condensed into generalized abstract dimension levels (or hierarchy levels). To refer to the product dimension’s example, this leads to an aggregation of single products into product subcategories and/or product categories. The examined levels’ elements are connected by parent-child relationships. Usually this results in tree structures with a root node (top level element), several leaf nodes (leaf level elements) and multiple nodes in between on the particular levels. Deviations from this ideal structure, such as parallel hierarchies or unbalanced tree structures, will occur in practical use.

The semantics of measures are determined by the semantics of their descriptive dimensions. By spanning a spatial structure of orthogonal dimensions and defining cells at the dimension element intersections, a multidimensional matrix is created. This is often referred to as a data cube or hypercube. The contents of these cells, the so-called facts, are precise numerical values of the modeled business measures. Within the research community there is neither a distinct nor a generally accepted definition of the terms measure, measurement and fact. According to our understanding, measures describe the schema of facts. Within the given example, the measure ‘turnover’ is determined by the dimensions ‘product’, ‘organizational unit’ and ‘time’. Its data type should be ‘currency’. Picking out individual dimension members such as product ‘x’, organizational unit ‘y’ and time period ‘z’ allows indexing a single turnover value, the fact ‘t’.

Two different alternatives exist to model measures. On the one hand it is possible to create one data cube for each measure. On the other hand, one cube can contain several measures by introducing a measure dimension, with each member representing one measure.

For the remaining part of the paper it is essential that measures are not isolated from each other. Often, they are linked via calculation rules. These rules are used for the dynamic calculation of dependent facts from the externally given independent facts. Complex dependencies within systems of financial control can be created in this way.

Attributes are an integral part: business attributes in particular give a deeper insight into multidimensional data structures. There are two distinct options. Attaching attributes to a dimension implies that every single element within this dimension inherits these attributes, for example each product and each product group. Applying an attribute to a certain hierarchy level opens up the possibility of attributes that are valid for this individual level only; each product might have a weight whereas a product category will not have a weight.

3. Related Work

This section categorizes some of the related work. Due to the numerous studies in this area, it can only be a short overview and makes no claim to be complete.

3.1. Modeling Frameworks

Several methodologies are intended to represent multidimensional data models on a conceptual level. They can be categorized into three different types [1]: extensions to the Entity-Relationship model, extensions to the UML, and ad hoc models. Each one is appropriate to represent basic multidimensional concepts but they differ significantly in their ability to represent more sophisticated concepts such as irregular hierarchies.

Entity-Relationship based models extend the basic ER notation by means of multidimensional concepts. The ME/R model [4] extends the ER model by adding three elements: a fact relationship set, a dimension
level set and a rolls-up relationship set. Using this approach, only static data structures can be described; dynamic and functional aspects are not covered. [5] uses dimensions, fact relationships, cardinalities in Martin notation, hierarchies and analysis criteria as well as a ‘xor’ operator to express hierarchy anomalies.

Adaptations of UML are exemplified in [6] and [7] as well as the customizations of the model driven architecture (MDA) such as [8]. The approach in [6] uses stereotypes to typify classes as cubes, dimension levels, measures and so on; they are displayed with the same icons as in the ADAPT notation. [7] proposes a UML profile for multidimensional data modeling. The MDA approach presented in [8] tries to offer a framework for all the relevant data warehouse components, for example ETL processes, data sources and repositories. The authors extend the UML as well as the Common Warehouse Metamodel (CWM) and use the Query/View/Transform (QVT) language for establishing transformations between different models.

Ad hoc approaches raise the level of abstraction by directly using domain concepts. Therefore, ad hoc approaches can be seen as Domain Specific Languages (DSLs) [18]. They are especially useful if the modelers themselves are not software developers; language visualization and ease of use are emphasized. Modelers do not have to cope with stereotyped classes or variations of entity types; they can intuitively use multidimensional modeling concepts like cubes and dimensions.

Therefore ad hoc approaches differ from ER or UML approaches by not adapting a certain notation to new fields of application, but developing a new visual language in order to support modelers on a higher level of abstraction.

ADAPT ranks among these ad hoc approaches. There are several other methodologies with a formal foundation such as the Dimensional Fact model [9]. The model consists of some fact schemes whose basic elements are facts, measures, attributes, dimensions and hierarchies. They are accompanied by several other features such as attributes or the additivity of attributes along dimensions.

We have chosen ADAPT for the purpose of this paper because it allows the creation of semantically rich models due to several different model elements. Furthermore, ADAPT is relatively easy to learn and mostly self-explanatory. In contrast to other ad hoc modeling techniques a stencil for Microsoft Visio has been made available free of charge, thus facilitating the application of the modeling language in practice.

3.2. Model Transformation

Some of the above-named papers already include possibilities to transform conceptual models into logical representations. Several approaches concentrate explicitly on transforming conceptual models into logical ones like [5]; and sometimes the other way round, which is actually more relevant, as seen in [10]. These approaches are often discussed under the term schema evolution. It has gained much interest in both research and practice. Therefore, an online bibliography concentrating on this area is available [17]. Currently 418 papers on schema evolution are listed.

In most cases, the methodologies concentrate on a star schema or one of its variants. The prototype presented in section 5 tries to offer an open architecture which can be extended in order to support more than one target schema.

A new level of abstraction has been introduced by the concepts of model management, as commented in section 6. Currently, the cited online bibliography [17] contains 61 papers on this approach.

4. ADAPT foundations

The ADAPT notation emerged during the 1990s in the course of an attempt to create a graphical, business oriented representation of OLAP data models [2]. Due to its pragmatic roots, the notation lacks any formal foundation. Further, by the time of the publication of this paper, the modeling language had largely been ignored in scientific publishing on conceptual modeling. To overcome these deficiencies we will introduce the basic building blocks of the modeling language, demonstrate their usage and subsequently provide a formal foundation by presenting a UML-based metamodel.

4.1. Modeling with ADAPT

The notation provides a variety of symbols which are depicted in figure 1. Each one of them represents a conceptual object of an OLAP application.

A common issue of modeling in analytical contexts is the necessity to not just model the schema of a multidimensional problem (such as hierarchy levels), but to model specific instances within this schema where appropriate (such as individual dimension members). Dynamic aspects, especially calculation models of derived measures, are of great importance. ADAPT closes both gaps by offering special symbols, as described within the following paragraphs.
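
As an illustration of the two gaps named above, the following Python sketch separates a schema (dimension and measure names) from its instances (facts indexed by dimension members) and derives a dependent measure through a calculation rule. All names and numbers, including the `cost` and `margin` measures, are hypothetical and chosen only for illustration; ADAPT itself is a graphical notation and prescribes no such data structure.

```python
# Schema layer: which dimensions span the cube and which measures are stored.
DIMENSIONS = ("product", "organizational unit", "time")
INDEPENDENT_MEASURES = ("turnover", "cost")  # 'cost' is a hypothetical measure

# Instance layer: facts indexed by one member per dimension plus a measure.
facts = {
    ("x", "y", "z", "turnover"): 1200.0,
    ("x", "y", "z", "cost"): 900.0,
}

# Calculation rules: derived measure -> (input measures, function).
# 'margin' is a hypothetical dependent measure.
rules = {
    "margin": (("turnover", "cost"), lambda turnover, cost: turnover - cost),
}


def lookup(product, unit, period, measure):
    """Return a stored fact or compute a dependent fact from its inputs."""
    key = (product, unit, period, measure)
    if key in facts:
        return facts[key]
    inputs, rule = rules[measure]
    # Evaluate every input measure for the same cell coordinates.
    return rule(*(lookup(product, unit, period, name) for name in inputs))
```

Calling `lookup("x", "y", "z", "margin")` evaluates the rule on the two stored facts that share the same cell coordinates, which mirrors the idea of dependent facts being calculated dynamically from independent ones.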

The basic elements of this notation are Hypercube (or Cube for short) and Dimension. A Hypercube is the basic unit of storage for business data in a multidimensional database; physically, it is an n-dimensional array. A dimension is the representation of an axis of such a cube. Detailed modeling of dimensions is done via the symbols Hierarchy, Level, Member, Attribute, Scope, and Model.

[Figure 1: symbols for Cube, Dimension, Hierarchy, Level, Member, Attribute, Model and Scope]
Figure 1. Basic symbols

We show the elements’ utilization and interaction by applying the notation to model the business case of a company which wants to analyze its product sales (e.g. units and turnover) by different organizational units (e.g. stores) to customers (e.g. in different geographic locations) over time (e.g. months). This simple example can be sufficiently modeled by using one cube (Sales) with five dimensions (Organizational Unit, Customer, Product, Time, Measures). In a first step we refrain from displaying the dimensions in detail. The resulting diagram is shown in figure 2.

[Figure 2: cube Sales connected to the dimensions Organizational Unit, Customer, Product, Time and Measures]
Figure 2. Sample cube with five dimensions

By taking a detailed look at the product dimension we can demonstrate the other design elements. Our example company sells products from different suppliers in several product subcategories which aggregate into product categories. For the abstract representation of these objects the Level element is used. The sales department wants to analyze the data by supplier as well as product categorization. Therefore, we need to model two parallel hierarchies (Category and Supplier) with different depths or number of levels.

The Member element is particularly useful when modeling dimensions with few elements such as budget, actual and variance in a scenario dimension or a small number of key performance indicators in a measure dimension. Dependencies between measures or scenarios can be indicated by using the Model element. Models help to represent calculation rules within systems of financial control, for example the DuPont-System. To improve diagram clarity it can be useful to depict one cube per measure and use the model element to show calculations between cubes.

[Figure 3: connector symbols for Loose Precedence, Strict Precedence, Self Precedence, Used By and Connector]
Figure 3. Connector symbols

Various types of relations may exist among the conceptual representations of hierarchy levels. The connection operators Loose Precedence (one arrow) and Strict Precedence (two arrows) connecting the hierarchy levels allow the modeling of unbalanced tree and forest structures. The usage of Loose Precedence between product and product subcategory suggests that not all products can be assigned to a specific product line. In other words, there is not necessarily a parent object for each instance on the upper next level. Strict Precedence on the other hand indicates the mandatory aggregation of product subcategory to product category; for each instance there is always a corresponding parent element. Figure 3 displays the provided connector symbols representing the different relationships. Self Precedence qualifies recursive relationships within a hierarchy. Used By is a specialized connector for characterizing input parameters for and dependencies between models. The simple Connector denotes all other relationships between any two elements.

Scopes or subsets summarize related dimension elements. They are suitable for modeling categories within dimensions. The elements of a scope are either explicitly defined or can be derived from other database objects by calculations. Our example uses the results from an ABC analysis to classify products into three different classes (A, B and C class products).

[Figure 4: subset operator symbols for Fully Exclusive, Fully Overlapping, Partially Exclusive and Partially Overlapping]
Figure 4. Subset operators

Four Subset operators visualize relationships among different scopes. Fully Exclusive indicates disjoint subsets which contain the complete original set of dimension members. Fully Overlapping separates the total dimension into non-disjoint parts. Partially Exclusive connects disjoint subsets, which do not cover the complete population. Partially Overlapping
indicates partial division of the population into subsets; dimension members may be found in more than one subset. In our example each product exclusively appears in one class. Therefore we apply the Fully Exclusive operator.

[Figure 5: dimension Product with the attributes Description English and Description Spanish; parallel hierarchies Category (levels Product category, Product subcategory, Product) and Supplier (levels Supplier, Product); attributes Package type, Package size and Weight on the Product level; scopes A class product, B class product and C class product]
Figure 5. Sample product dimension

By using the Attribute element we can depict additional information on the characteristics of dimension elements. Attributes can be assigned to all elements of a dimension, elements of a certain level or members of a dimension scope. In our example we use attributes to associate language-dependent descriptions with every single element of the dimension at all hierarchy levels. However, package type, package size and weight only make sense on single products.

Figure 5 shows the entire representation of the product dimension in ADAPT as described in our example case above. Our exposition thereby is limited to the application of the most important elements. For a more detailed explanation see [2].

4.2. ADAPT Metamodel

This section introduces the ADAPT metamodel, which is depicted in figure 6. It has been created following the design science approach [19]. Furthermore, ADAPT can be seen as a Domain Specific Language, and the development of metamodels is a typical approach in implementing DSLs [18]. We make use of the UML notation whereby classes represent modeling objects as well as operators; stereotyped associations represent the different connector types.

Within the first step of the development process, we selected the most important elements of ADAPT which support the basic ideas of multidimensional modeling presented above. These elements form the classes within the metamodel. The second step consisted of establishing relationships between the modeling elements according to [2].

The navigation directions of the metamodel’s associations indicate the direction in which the arrow heads of each connection symbol should lead. For example, the association between Cube and Dimension is navigable from Cube towards Dimension and stereotyped with connector. This implies the usage of a connector (see figure 2) between Cube and Dimension with the arrow pointing towards the dimension. Please notice that the arrows of Strict Precedence and Loose Precedence are heading to the parent level when connecting hierarchy levels with each other.

[Figure 6: UML class diagram with the classes Cube, Dimension, Hierarchy, Level, Member, Attribute, Model, Scope and Subset Operator (with the child classes Fully Exclusive, Partially Exclusive, Fully Overlapping and Partially Overlapping); associations are stereotyped as connector, strict precedence, loose precedence, self precedence or used by, and carry multiplicities and XOR constraints]
Figure 6. ADAPT metamodel
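
The role of navigation directions can be illustrated with a small admissibility check. The Python sketch below encodes only combinations spelled out in the text (a connector from Cube to Dimension and precedence connectors between levels) as (source type, stereotype, target type) triples. It is a simplified, hypothetical rule table rather than the complete metamodel of figure 6, and it ignores multiplicities and XOR constraints.

```python
# Hypothetical subset of admissible (source type, stereotype, target type)
# combinations; the full metamodel contains more associations.
ADMISSIBLE = {
    ("Cube", "connector", "Dimension"),
    ("Level", "strict precedence", "Level"),
    ("Level", "loose precedence", "Level"),
    ("Level", "self precedence", "Level"),
}


def check_edges(node_types, edges):
    """Return the edges whose typed triple is not in the rule table.

    node_types maps element names to metamodel class names;
    edges is a list of (source, stereotype, target) tuples.
    """
    violations = []
    for source, stereotype, target in edges:
        triple = (node_types[source], stereotype, node_types[target])
        if triple not in ADMISSIBLE:
            violations.append((source, stereotype, target))
    return violations


node_types = {
    "Sales": "Cube",
    "Product": "Dimension",
    "Product subcategory": "Level",
    "Product category": "Level",
}
edges = [
    ("Sales", "connector", "Product"),  # arrow points to the dimension
    ("Product subcategory", "strict precedence", "Product category"),
    ("Product", "connector", "Sales"),  # wrong direction
]
violations = check_edges(node_types, edges)
```

Because direction is part of each triple, the reversed connector from the dimension back to the cube is reported while the other two edges pass.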

In order to increase clarity, the classes do not contain any attributes. Every class should have at least a name as well as containers for storing the connected elements. In case of Cube this would include a collection of corresponding dimensions. A name can be omitted only at Subset Operator and its child classes.

A dimension is linked with one or more members, hierarchies or attributes via connectors. The XOR constraint expresses that either individual dimension members or hierarchies exist.

There might be a connector or a strict precedence between hierarchies and the uppermost hierarchy level. Using strict precedence implies that there should be an artificial overall hierarchy level. Strict precedence and loose precedence as well as self precedence connect single hierarchy levels. Strict or loose coupling allows more than one parent level; in case of a recursive relationship exactly one parent and one child level exist. Subset operators divide a hierarchy level into different dimension views. A connector establishes the connection between level and subset operator; more than one subset operator can be connected to one hierarchy level. The subset itself consists of one or more scopes.

Attributes are, as already stated, attached to dimensions. Alternatively, they might be linked to hierarchy levels and scopes via connectors.

Calculation models have at least two inbound members and create one calculated value. A member is allowed to take part in more than one calculation model. Calculation rules between different cubes are not considered here.

The proposed metamodel does not support all aspects of the ADAPT notation. For example, scopes are bound to hierarchy levels via subset operators. The authors of [2] see the possibility to link scopes directly with dimensions in order to categorize individually modeled members. It is also not intended to limit the range of attributes’ possible values.

The use of identifiers should be restricted. Ideally, each identifier is distinct within its object type. For instance, each dimension and each hierarchy level require different names. The same does not apply to elements with different semantics, for example both a dimension and a hierarchy can be identically named.

5. Modeling Tool Prototype

The presented prototype should be easily extendable by means of multiple database systems that can be accessed via the modeling tool. The main problem behind this thought is the existence of various target platforms which differ in the way they store multidimensional data: ROLAP, MOLAP and HOLAP. Therefore, a way of accessing multiple transformation algorithms must be found, and each algorithm has to be parameterized. The right-hand side of figure 7 summarizes this approach. In order to concentrate on this requirement we use the concept of reutilization by not inventing a new modeling language or a new editor but using ADAPT and an existing ADAPT stencil for Microsoft Visio.

5.1. Architecture concept

The prototype is based on a three-layer architecture as shown in figure 7. An editor for ADAPT is situated within the graphical or presentation layer. In our case this will be Microsoft Visio because of an already well-proven stencil. Starting with Visio 2003, it is possible to save diagrams in an XML format, DatadiagramML (formerly known as XML for Visio).

The abstraction layer acts as an intermediary between graphical modeling and transformations.

[Figure 7: a multidimensional problem is modeled as an ADAPT diagram (presentation layer), converted into a typed graph (abstraction layer) and passed via the ServiceLoader API to transformations T that produce SQL-DDL, XMI (CWM) and further targets (transformation layer)]
Figure 7. Prototype architecture
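
One way to picture the three layers is as a small pipeline: a diagram is reduced to a graph of typed elements, and interchangeable, parameterized transformation functions turn that graph into target-specific output. The following Python sketch is an illustration under stated assumptions only: `DiagramGraph`, `register`, `transform` and the `element-list` target are hypothetical names, and a plain dictionary stands in for whatever plug-in mechanism a real tool would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DiagramGraph:
    """Abstraction layer: element name -> element type, plus typed edges."""
    nodes: Dict[str, str] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)


# Transformation layer: target name -> function(graph, parameters) -> text.
TRANSFORMATIONS: Dict[str, Callable[[DiagramGraph, dict], str]] = {}


def register(target: str):
    """Decorator that makes a transformation available under a target name."""
    def decorator(fn):
        TRANSFORMATIONS[target] = fn
        return fn
    return decorator


@register("element-list")
def list_elements(graph: DiagramGraph, params: dict) -> str:
    # A deliberately trivial target: one entry per element, sorted by name.
    separator = params.get("separator", "\n")
    return separator.join(
        f"{etype}: {name}" for name, etype in sorted(graph.nodes.items())
    )


def transform(graph: DiagramGraph, target: str, **params) -> str:
    """Look up the transformation for a target and apply it with parameters."""
    return TRANSFORMATIONS[target](graph, params)


graph = DiagramGraph(nodes={"Sales": "Cube", "Product": "Dimension"})
output = transform(graph, "element-list", separator="; ")
```

New targets can be added by registering another function, without touching `transform` or the graph structure, which is the extensibility property the architecture aims for.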

ADAPT diagrams are represented as directed typed graphs in order to reduce the loss of information between diagrams and internal storage. A typed graph $G$ can be defined as the following tuple: $G = (N, E, t_N, t_E, s, t)$ with a finite set of nodes $N$, a finite set of edges $E$, an assignment of a type to each node $t_N: N \to \Sigma_N$ as well as to each edge $t_E: E \to \Sigma_E$, and the assignment of source and target for each edge $s, t: E \to N$ [10]. The instances of $N$, $E$, $s$ and $t$ arise from the ADAPT diagram whereas $\Sigma_E$ and $\Sigma_N$ reflect the modeling elements of the notation itself.

The third layer provides transformational functionalities. Therefore, one of the most important requirements is extensibility, that is, the ability to add new transformation algorithms without modifying the source code of the modeling toolset.

In version 6 of the Java Standard Edition (SE), the ServiceLoader API has been made public [16]. It helps to find, load and use so-called service providers. Within the context of this API a Service is a collection of interfaces and classes that provide access to specific program functionality. A service provider reflects the actual implementation. In the case of the prototype each transformation algorithm represents a service provider. They are defined by the service provider interface (SPI), which consists of a set of public interfaces and abstract classes. Only the classes and methods contained within the SPI are visible to the actual application.

To create an SPI it is necessary to find a generic interface which offers capabilities for creating multidimensional modeling constructs. The following operations are suitable for our purpose:
• create database <database>: This operation creates a new database as a container for all other structures and quantitative values.
• create dimension <dimension>: Dimensions are essential for multidimensional data structures. Each dimension has to receive a distinct name.
• create measure <measure>: As a next step, measures have to be instantiated within the database.
• create cube <name> <var1>, <var2>, ..., <varm> <dim1>, <dim2>, ..., <dimn>: The fourth method assembles data cubes from the given dimensions and variables.
A reference to a partial graph is passed to each method.

5.2. Transformation Examples

The prototype is evaluated by a case study, which has been simplified for the purpose of this paper. A second production cube containing four dimensions (Product, Plant, Time and Measures) extends the example given in section 4.1. The dimensions product and time are shared by both cubes. Further research will include evaluation in real-world business applications. At the moment there are negotiations on how to integrate our software in a large project within the telecommunication industry sector.

5.2.1. Galaxy Schema. Within a galaxy schema, cubes and dimensions correspond to fact and dimension relations, respectively. They are connected via foreign key relationships [3].

In order to identify the modeling constructs in the generated schema, prefixes which precede each identifier (such as relations or columns) are introduced. A summary is given in table 1.

Table 1. Prefixes for identifying relations and their attributes

Model element                                               Prefix
Fact table                                                  fact_
Dimension table                                             dim_
Primary key                                                 pk_
Foreign key                                                 fk_
Variable / Measure                                          m_
Hierarchy level                                             level_
Parent level of recursive relationships (self precedence)   parent_level_
Attribute                                                   attr_
Dimension scope (connected via Exclusive operator)          subset_
Dimension scope (connected via Partially operator)          scope_

A second assumption concerns the data types of each individual modeling element. Table 2 outlines them. Our future task is to use transformation parameters to implement a solution which allows a flexible setting of data types.

Table 2. Data types for modeling elements within a galaxy schema

Model element                                      Data type
Primary key                                        INTEGER
Foreign key (has to be the same as primary key)    INTEGER
Fact                                               DOUBLE
Hierarchy level                                    VARCHAR(255)
Attribute                                          VARCHAR(255)
Dimension Scope                                    VARCHAR(255)

Each dimension corresponds to a relation dim_<dimension name> with a serially-numbered primary key. Attributes of the dimension itself correspond to a separate column attr_<attribute
name>. If there are dimension members, only one single column <dimension name> is generated. The members are added later on via INSERT statements.

[Figure 8: three panels linked by a transformation T. Conceptual (ADAPT): the cubes Production (dimensions Product, Plant, Time, Measures) and Sales (dimensions Organizational Unit, Customer, Product, Time, Measures). Logical (galaxy schema): the dimension tables dim_product, dim_organizationalunit, dim_customer, dim_plant and dim_time, and the fact tables fact_production (fk_product, fk_plant, fk_time, m_units) and fact_sales (fk_organizationalunit, fk_customer, fk_product, fk_time, m_units, m_turnover). Code (SQL DDL): CREATE DATABASE and USE statements followed by one CREATE TABLE statement per dimension and per cube.]
Figure 8. Transformation steps from conceptual ADAPT towards SQL DDL

According to the metamodel, hierarchies are permitted only if there are no dimension members. Hierarchy levels are represented in a column level_<level name>; the same holds for attributes of levels: attr_<attribute name>. Levels in parallel hierarchies are represented by one column for each level. Subsets connected via Fully Exclusive or Partially Exclusive result in one column per subset operator. The column’s name corresponds to the names of each scope: subset_<name scope 1_name scope 2_ ... _name scope n>. An alternative would be the creation of a column with a SET data type containing one entry for each scope. Overlapping operators (Fully Overlapping and Partially Overlapping) result in one column for each scope. This allows an element to be contained in different scopes. Recursive connections between hierarchy levels are represented by a column parent_level_<level name>.

Fact tables fact_<cube name> are connected via foreign key relationships to dimension tables (fk_<dimension name>). For each measure a column fact_<fact name> is created. An alternative would be the creation of a measure dimension; each measure would map onto one dimension element.

5.2.2. Common Warehouse Metamodel (CWM). Another worthwhile approach is mapping structural information onto the CWM. The OLAP package within the analysis layer appears appropriate for this task. The assumptions in table 3 apply to data types.

The Netbeans Metadata Repository (MDR) is an implementation of a Meta Object Facility (MOF) compliant repository. It is appropriate for generating interfaces according to the Java Metadata Interface (JMI) specification. These interfaces are used to access an extent within the repository which contains the metamodel and its instance.

Table 3. Data types for modeling elements within the OLAP package of CWM

Model element            Data type
Attribute                String
Identifying Attribute    Integer
Fact                     Float

A schema has to be instantiated as a container for storing the model elements of the whole ADAPT diagram. Each cube and each dimension map onto the corresponding classes of the CWM. A CubeDimensionAssociation connects cubes and dimensions. The mapping of a dimension is organized as follows: for each dimension level an object has to be generated and attached to the corresponding dimension. Hierarchies are represented as instances of the class LevelBasedHierarchy. Additional semantics can be given to hierarchies by using the class ValueBasedHierarchy. It is useful for hierarchies that have some kind of topological order within the individual levels. For reasons of complexity, the current implementation only supports instances of LevelBasedHierarchy.

Despite the relatively low depth of CWM, some arguments can be found to build a conceptual metadata
schema on the basis of CWM. Some parts of the metadata can be stored according to the standard, which leads to an easier exchange of metadata via XMI. Standard conformant extensions of the reference model offer the possibility to adapt it to enterprise specific needs. These extensions can be easily published and used by other CWM users. The usage of XMI offers the advantage of checking files against a document type definition or an XML schema in order to discover transmission errors.

[Figure 9 shows three panels. Conceptual (ADAPT): the same model as in Figure 8, with the cubes Production and Sales and the dimensions Product, Plant, Organizational Unit, Customer and Time. Logical (CWM): a Schema object containing the Cube objects Production and Sales and the Dimension objects Product, Time, Plant, Organizational Unit and Customer, connected by CubeDimensionAssociation objects, together with LevelBasedHierarchy, HierarchyLevelAssociation and Level objects. Code (XMI): an XMI document with an XMI.header and an XMI.content element, the latter holding a CWMOLAP:Schema with CWMOLAP:Cube and CWMOLAP:Dimension elements.]

Figure 9. Transformation steps from conceptual ADAPT towards XMI

6. Discussion and Future Work

The prototype shows the possibility of finding a generic interface for algorithmic transformation of conceptual data models based on the ADAPT notation into different logical representations. A suitable way of parameterization has to be implemented.

For an aggregation of the Java tool and Visio into one implementation, the features of the Eclipse Graphical Editing Framework (GEF) have to be evaluated. The research prototype GraMMi [12] offers interesting thoughts about a repository-based graphical modeling toolset. Another attractive proposition, developing multidimensional data models collaboratively within a web browser, also needs to be investigated.

In this context some consideration has to be given to a quality evaluation of the data models. [11] proposes a two-dimensional approach: the first dimension is the classification concept (syntax, semantics and pragmatics), the second dimension includes measures (absolute and relative measures). Further research work needs to be checked and adapted to ADAPT diagrams.

Within the transformation layer an optimization of the generated schemata has to be implemented. Much improvement can be made within a star or galaxy schema. Some suggestions are given in [5].

The prototype makes use of object-at-a-time operations. This can be understood as the transformation of the original problem into an object-oriented one and manipulating the models and mappings in that representation. To raise the level of abstraction it would be better to use model-at-a-time and mapping-at-a-time operators. They are expected to improve programmers’ productivity for metadata applications [13].

Systems supporting such functionality are discussed under the term model management systems (MMS). They support “the creation, compilation, reuse, evolution, and execution of mappings between schemas represented in a wide range of metamodels” [14]. Such an MMS is not a user-oriented tool. In fact it is a reusable component that can be integrated into specific applications relating to data programmability problems.

The most relevant operators for our problem at the current state of research are generation as well as execution of mappings. New transformations could be generated quickly, in the best case even without writing any Java code; a graphical definition would be ideal.

An important topic on our research agenda is a reverse engineering approach for existing data warehouses, which is not currently implemented and will require further research. In this context, visualization of data models is not restricted to the ADAPT notation; sometimes different representations are more useful. Fact sheets that can be viewed with Excel showing a description or the calculation of a measure are one example.
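As an illustration of the SQL DDL transformation summarized in Figure 8, the column naming conventions (level_, attr_, parent_level_, subset_, fk_ and the fact table prefix) can be sketched in a few lines of Python. This is a minimal sketch and not the prototype's Java code: the classes Dimension and Cube, the helper ident, the name normalisation (lower case, spaces removed) and the SQL data types are assumptions made for the example. The pk_ column follows Figure 8. The text names measure columns fact_<fact name>, whereas Figure 8 shows m_units, so the prefix is left as a parameter.

```python
from dataclasses import dataclass, field


def ident(name):
    """Lower-case a model name and drop spaces (assumed normalisation)."""
    return name.lower().replace(" ", "")


@dataclass
class Dimension:  # hypothetical stand-in for an ADAPT dimension
    name: str
    levels: list = field(default_factory=list)             # hierarchy level names
    attributes: list = field(default_factory=list)         # level attribute names
    recursive_levels: list = field(default_factory=list)   # levels with a recursive connection
    exclusive_subsets: list = field(default_factory=list)  # one list of scope names per operator


@dataclass
class Cube:  # hypothetical stand-in for an ADAPT cube
    name: str
    dimensions: list
    measures: list


def dimension_ddl(dim):
    """Build the CREATE TABLE statement for one dimension table."""
    cols = [f"pk_{ident(dim.name)} INT PRIMARY KEY"]
    cols += [f"level_{ident(l)} VARCHAR(255)" for l in dim.levels]
    cols += [f"parent_level_{ident(l)} INT" for l in dim.recursive_levels]
    cols += [f"attr_{ident(a)} VARCHAR(255)" for a in dim.attributes]
    # One column per exclusive subset operator, named after all of its scopes.
    cols += ["subset_" + "_".join(ident(s) for s in scopes) + " VARCHAR(255)"
             for scopes in dim.exclusive_subsets]
    return f"CREATE TABLE dim_{ident(dim.name)} (" + ", ".join(cols) + ");"


def fact_ddl(cube, measure_prefix="fact_"):
    """Build the CREATE TABLE statement for one fact table."""
    cols = [f"fk_{ident(d)} INT" for d in cube.dimensions]
    cols += [f"{measure_prefix}{ident(m)} FLOAT" for m in cube.measures]
    return f"CREATE TABLE fact_{ident(cube.name)} (" + ", ".join(cols) + ");"


# Example taken from the Sales cube and Time dimension of Figure 8.
print(dimension_ddl(Dimension("Time", levels=["Year", "Month"])))
print(fact_ddl(Cube("Sales", ["Customer", "Product", "Time"], ["Units", "Turnover"])))
```

Keeping the naming rules in two small functions makes it easy to swap a convention, for example passing measure_prefix="m_" to reproduce the column names shown in Figure 8.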
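The CWM mapping summarized in Figure 9 can be illustrated in the same spirit. The sketch below uses plain Python classes that merely borrow the names of the CWM OLAP classes mentioned in the text (Schema, Cube, Dimension, Level, CubeDimensionAssociation, LevelBasedHierarchy); they are stand-ins and not the JMI interfaces generated by MDR. The function map_adapt_to_cwm and its dictionary-based input format are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    name: str


@dataclass
class LevelBasedHierarchy:
    levels: list  # ordered Level objects of one hierarchy


@dataclass
class Dimension:
    name: str
    levels: list = field(default_factory=list)
    hierarchies: list = field(default_factory=list)


@dataclass
class CubeDimensionAssociation:
    dimension: Dimension


@dataclass
class Cube:
    name: str
    associations: list = field(default_factory=list)


@dataclass
class Schema:
    cubes: list = field(default_factory=list)
    dimensions: list = field(default_factory=list)


def map_adapt_to_cwm(adapt_dimensions, adapt_cubes):
    """Map a simplified ADAPT model onto CWM-like objects.

    adapt_dimensions: {dimension name: [level names in hierarchy order]}
    adapt_cubes:      {cube name: [dimension names]}
    """
    schema = Schema()  # container for all model elements of the diagram
    by_name = {}
    for dim_name, level_names in adapt_dimensions.items():
        dim = Dimension(dim_name)
        # One Level object per dimension level, attached to its dimension.
        dim.levels = [Level(n) for n in level_names]
        if dim.levels:
            dim.hierarchies.append(LevelBasedHierarchy(list(dim.levels)))
        schema.dimensions.append(dim)
        by_name[dim_name] = dim
    for cube_name, dim_names in adapt_cubes.items():
        cube = Cube(cube_name)
        # A CubeDimensionAssociation connects the cube to each of its dimensions.
        cube.associations = [CubeDimensionAssociation(by_name[n]) for n in dim_names]
        schema.cubes.append(cube)
    return schema


# Example based on the Sales cube of Figure 9.
schema = map_adapt_to_cwm(
    {"Time": ["Year", "Month"], "Product": ["Product Category", "Product"], "Customer": []},
    {"Sales": ["Customer", "Product", "Time"]},
)
print([a.dimension.name for a in schema.cubes[0].associations])
```

Only level-based hierarchies are modelled here, mirroring the restriction of the current implementation stated in the text.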

7. References

[1] S. Rizzi, A. Abelló, J. Lechtenbörger, and J. Trujillo, “Research in Data Warehouse Modeling and Design: Dead or Alive?”, Proceedings of the 9th ACM international workshop on Data warehousing and OLAP, ACM Press, Arlington, Virginia, USA, 2006, pp. 3-10.

[2] D. Bulos, S. Forsman, “Getting Started with ADAPT”, Symmetry white paper, http://symcorp.com/downloads/ADAPT_white_paper.pdf [05-29-2008], 2006.

[3] R. Kimball, M. Ross, “The Data Warehouse Toolkit – The Complete Guide to Dimensional Modeling”, Wiley, New York, 2002.

[4] C. Sapia, M. Blaschka, G. Höfling, B. Dinter, “Extending the E/R Model for the Multidimensional Paradigm”, Proceedings of the Workshops on Data Warehousing and Data Mining: Advances in Database Technologies, Springer, London, 1998.

[5] E. Malinowski, E. Zimányi, “Hierarchies in a Multidimensional Model: From Conceptual Modeling to Logical Representation”, Data & Knowledge Engineering, Elsevier Science Publishers B. V., Amsterdam, 2006, pp. 348-377.

[6] T. Priebe, G. Pernul, “Metadaten-gestützter Data-Warehouse-Entwurf mit ADAPTed UML”, H. U. Buhl, A. Huther, B. Reitwiesner (Eds.), “Information Age Economy, 5. Internationale Tagung Wirtschaftsinformatik 2001”, Physica, Heidelberg, 2001 (in German).

[7] S. Luján-Mora, J. Trujillo, I. Song, “A UML profile for multidimensional modeling in data warehouses”, Data & Knowledge Engineering, Elsevier Science Publishers B. V., Amsterdam, 2006, pp. 725-769.

[8] J.-N. Mazón, J. Trujillo, “An MDA approach for the development of data warehouses”, Decision Support Systems, Elsevier Science Publishers B. V., Amsterdam, 2008, pp. 41-58.

[9] M. Golfarelli, D. Maio, S. Rizzi, “The Dimensional Fact Model: A Conceptual Model for Data Warehouses”, International Journal of Cooperative Information Systems, World Scientific Publishing, 1998, pp. 215-247.

[10] M. Blaschka, C. Sapia, G. Höfling, “On schema evolution in multidimensional databases”, Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery, Springer, London, 1999, pp. 153-164.

[11] J.-P. van Belle, “A Proposed Framework for the Analysis and Evaluation of Business Models”, Proceedings of SAICSIT, 2004, pp. 210-215.

[12] C. Sapia, M. Blaschka, G. Höfling, “GraMMi: Using a Standard Repository Management System to Build a Generic Graphical Modeling Tool”, Proceedings of the 33rd Hawaii International Conference on System Sciences, IEEE, 2000, p. 8058.

[13] P. A. Bernstein, “Applying Model Management to classical Meta Data Problems”, Proceedings of the 2003 CIDR Conference, 2003, pp. 209-220.

[14] P. A. Bernstein, S. Melnik, “Model Management 2.0: Manipulating Richer Mappings”, Proceedings of the 2007 ACM SIGMOD international conference on Management of data, ACM, 2007, pp. 1-12.

[15] D. Kensche, C. Quix, M. A. Chatti, M. Jarke, “GeRoMe: A Generic Role Based Metamodel for Model Management”, Journal on Data Semantics, LNCS 4380, Springer, 2007, pp. 82-117.

[16] J. O’Conner, “Creating Extensible Applications With the Java Platform”, http://java.sun.com/developer/technicalArticles/javase/extensible/ [05-12-2008].

[17] P. Bernstein, E. Rahm, “An Online Bibliography on Schema Evolution”, ACM SIGMOD Record, New York, 2006, pp. 30-31.

[18] J. Luoma, S. Kelly, J.-H. Tolvanen, “Defining Domain-Specific Modeling Languages: Collected Experiences”, Proceedings of the 4th OOPSLA Workshop on Domain-Specific Modeling, Computer Science and Information System Reports, Technical Reports, TR-33, University of Jyväskylä, Finland 2004.

[19] A. R. Hevner, S. T. March, J. Park, S. Ram, “Design Science in Information Systems Research”, MIS Quarterly, Vol. 28 No. 1, 2004, pp. 75-105.

[20] N. Raden, “Modeling a Data Warehouse”, http://www.hiredbrains.com/artic3.html [08-21-2008], 1995.
