
CSCI 8715 Spatial databases

Spatial Data warehouses


A Survey
Group 4: Nipun Garg 4282567 Surabhi Mithal 4282643 http://www-users.cs.umn.edu/~smithal/

Abstract The field of spatial data warehouses has emerged over the past decade from the need to analyze large volumes of spatial data. Data stored in a spatial data warehouse is queried using spatial online analytical processing (SOLAP) systems. Research in the field has concentrated on conceptual models, materialization of spatial indexes, aggregation operations and SOLAP. In this paper we give an overview of the core concepts of a spatial data warehouse and of recent advancements in the field.

1. Introduction
Spatial data warehouses aim at effective and efficient analytical querying of spatial data. Spatial databases are suited to regular transactional queries that involve little historical data or aggregation, whereas the queries needed to support decision making are difficult to answer on spatial databases. This gave rise to the field of spatial data warehouses, which combines traditional data warehouses with spatial databases. Spatial data warehouses build on the concepts of data warehouses and additionally provide support to store, index, aggregate and analyze spatial data [33]. A data warehouse consists of facts and dimensions modeled in a star or snowflake schema. A data cube is a lattice of cuboids, each corresponding to a level of aggregation over the dimension hierarchies. Cells of the data cube may be precomputed for efficient query processing. Common OLAP operations include slicing, dicing, roll-up and drill-down. These concepts are extended to spatial data in a spatial data warehouse. Integrating spatial data with data cubes makes many interesting spatial aggregation queries possible.
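To make the OLAP operations mentioned above concrete, the following minimal Python sketch performs a roll-up (city to state) and a slice (one year) on a tiny fact table. The table contents, the column layout and the function names are hypothetical and serve only as an illustration; a real warehouse would run such operations inside the DBMS.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, state, year, sales). All values are invented.
FACTS = [
    ("Minneapolis", "MN", 2009, 120),
    ("St. Paul", "MN", 2009, 80),
    ("Minneapolis", "MN", 2010, 150),
    ("Madison", "WI", 2009, 60),
    ("Madison", "WI", 2010, 90),
]


def roll_up_to_state(facts):
    """Roll up the location dimension from city to state, summing the measure."""
    totals = defaultdict(int)
    for _city, state, year, sales in facts:
        totals[(state, year)] += sales
    return dict(totals)


def slice_by_year(facts, year):
    """Slice: fix the time dimension to a single value."""
    return [row for row in facts if row[2] == year]


by_state = roll_up_to_state(FACTS)
slice_2009 = slice_by_year(FACTS, 2009)
```

Here the roll-up merges the two Minnesota cities of 2009 into one cell, while the slice keeps only the rows for 2009.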

Figure 1: The taxonomy of spatial and spatiotemporal data warehouses

Figure 1 shows the relationship between spatial data warehouses, spatiotemporal data warehouses and conventional data warehouses.

The major characteristics of a spatial data warehouse include:
o Conceptual model: Star and snowflake schemas for spatial attributes.
o Spatially enabled components: Spatial measures, spatial dimensions and spatial hierarchies.
o Spatial OLAP operations: Operations like roll-up and drill-down extended to spatial predicates [36].
o Efficient query processing:
   o Indices and materialized views
   o Joins and aggregation queries

Spatial data warehouses have been an active topic of research in the past decade, because the popularity of spatial information, such as maps created from images and other information received from satellites, has increased tremendously. These data sets are huge and have to be analyzed efficiently to make the best use of the information gathered. Research in this field mostly focuses on:
o Spatial multidimensional models: Conceptual models for efficient representation of spatial data warehouses.
o Materialized spatial indexes: Extension of spatial indexes to store aggregated spatial data.
o Aggregate operations: Various aggregation operations over hierarchies.
o SOLAP: Client applications on spatial data warehouses [19].

Examples of spatial data warehouses include the US census dataset, the EOS archives [54], Microsoft TerraServer [17] and Spatial Eye [55].

[Figure 2 (flow of phases): Spatial data acquisition and consolidation (ETL), then addition of temporal features (optional), then the spatial data warehouse (spatial data marts and metadata), then spatial cube formation, then presentation through SOLAP tools]

Figure 2: High-level view of the phases in the implementation of a spatial data warehouse

Spatial data warehouses have a wide variety of application domains, such as logistics, forecasting, security, detection of environmental changes and health monitoring. These involve spatial data which has to be analyzed over various dimensions and at multiple resolutions.

1.1. Contributions and Related work


Related work on spatial data warehouses takes the following forms:
o Journal papers and conference proceedings [1-45] [48-52]: These are the basic resources for research and current trends in the field of spatial data warehousing.
o Books: [46] explains in detail the basic concepts of spatial and temporal data warehouses. The reference book [47] explains the terms and terminology of spatial databases.
o Online encyclopedias like Wikipedia [53] are a good source of basic-level information and concepts.

Spatial data warehouses and SOLAP are rapidly gaining importance due to the capabilities they provide. To our knowledge, there is no recent survey paper in the literature on spatial data warehouses. Some literature gives an overview of the general concepts of spatial data warehouses [19] [20] [48], but we could not find a recent detailed survey which covers all the important aspects of a spatial data warehouse. Although [48] covers some aspects of spatial and spatio-temporal data warehouses, important topics such as benchmarks and spatial OLAP tools are not discussed at length. There are many open research issues in spatial data warehouses; we have classified them and present them in a consolidated manner. We have also analysed trends such as spatiotemporal data warehouses and why they are gaining importance.

Table 1 below shows the classification of the research literature on spatial data warehouses by topic.

Topic                  Subcategory                                                 Papers
Conceptual Models      Spatial multidimensional model                              [1], [4], [5], [12], [23]
                       Requirements of a conceptual model for SDW                  [21]
                       Mapping of conceptual model to physical schema              [3]
                       Spatio-temporal model                                       [6], [14]
Storage and Querying   Indexing: Materialized                                      [7], [8], [10]
                       Indexing: Spatiotemporal                                    [9], [43], [44]
                       Indexing: GiST                                              [27]
                       Selective materialization: Object-based                     [11]
Aggregation            General concepts and issues: Pre-aggregation                [12], [28]
                       General concepts and issues: Geometric aggregation model    [30]
                       Aggregation operations                                      [15], [22], [47]
Spatial OLAP (SOLAP)   General concepts                                            [20], [30], [32]
                       Tools for SOLAP                                             [32], [34], [35], [36]
                       Benchmark                                                   [25], [37], [38], [39], [40], [41]
                       Extension of OLAP cubes                                     [18]
Spatio-temporal DW     General concepts and issues                                 [13], [16], [48]
                       Trajectory data warehousing                                 [13], [42], [44]

Table 1: Classification of topics & research literature

The broad classification in terms of concepts and algorithms is presented in Figure 3.

[Figure 3 (hierarchy tree): Spatial data warehouses branch into (a) Index: materialized (R*a tree, aR tree), GiST based, and spatiotemporal (aRB tree, aHRB tree, a3DRB tree); (b) Conceptual model: MultiDimER model and other multidimensional models; (c) Aggregation: concepts (pre-aggregation, geometric aggregation) and aggregation operations (distributive, algebraic and holistic; BigCube aggregation operations)]

Figure 3: Broad classification presented as hierarchy tree

1.2. Scope
The scope of this paper is to study the broad concepts of spatial data warehouses and the associated research needs. The paper discusses conceptual models, indices for spatial data warehouses, materialization and aggregation over hierarchies, SOLAP and benchmarks. It also outlines the broad research areas in these topics. In addition, the latest emerging trend in this field, the spatiotemporal data warehouse, is explored.

1.3. Organization
The paper is organized as follows. Section 2 gives a brief overview of conceptual models for spatial data warehouses (SDWs), with a focus on existing models. Section 3 presents the current storage and indexing techniques for SDWs and analyzes future research needs. Section 4 describes the important concept of aggregation. Sections 5 and 6 cover the SOLAP tools for efficiently querying spatial data warehouses and the evaluation benchmarks for SDWs, respectively. Finally, Section 7 describes spatio-temporal data warehouses, which are the latest trend.

2. Conceptual models for Spatial Data warehouses


A conceptual model is a representation of concepts and the relationships between them [37]. Its primary purpose is to capture the requirements of decision-making users without concern for implementation details. Conceptual models exist for relational and spatial databases, but they do not carry over well to spatial data warehouses because of the presence of hierarchies, aggregations, measures and dimensions.

2.1. Existing models


Various multidimensional models have been proposed for spatial data warehouses. [1] proposes a multidimensional model in which measures and dimensions are modelled as complex objects. It provides the concepts of entity schema and entity instances and uses them to define hierarchies, aggregations and the data cube. The MultiDimER model [4] [5] is a conceptual model for spatial data warehouses which introduces concepts such as spatial levels, spatial hierarchies, spatial measures and spatial fact relationships. It is flexible in the sense that it does not require spatial dimensions to be present for a spatial fact to exist, and it allows real-world hierarchies [5] to be represented in the model. The basic concepts of the MultiDimER model are:
o Spatial level: A level for which spatial characteristics are stored. A topological relationship exists between different spatial levels.
o Spatial hierarchy: A hierarchy which includes at least one spatial level.
o Spatial dimension: Extending the concept of dimensions in a data warehouse, a spatial dimension is a dimension that has at least one spatial hierarchy. In the model, spatial dimensions are of 3 types, with the following hierarchies:
   o Non-spatial
   o Spatial-to-non-spatial
   o Fully spatial
o Spatial fact relationship: A fact relationship that requires a spatial join between two or more spatial dimensions.
o Spatial measure: A measure that is either a numerical value calculated using topological operators (Length in Figure 4) or a geometry which can be aggregated along the hierarchies.
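The definitions above are layered: a hierarchy is spatial if it contains a spatial level, and a dimension is spatial if it contains a spatial hierarchy. The following Python sketch encodes just that layering. The class names, fields and the example dimensions are assumptions made for illustration and are not part of the MultiDimER notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Level:
    name: str
    is_spatial: bool = False  # True when the level stores spatial characteristics


@dataclass
class Hierarchy:
    levels: List[Level]  # ordered from finest to coarsest

    def is_spatial(self) -> bool:
        # A spatial hierarchy includes at least one spatial level.
        return any(level.is_spatial for level in self.levels)


@dataclass
class Dimension:
    name: str
    hierarchies: List[Hierarchy] = field(default_factory=list)

    def is_spatial(self) -> bool:
        # A spatial dimension has at least one spatial hierarchy.
        return any(h.is_spatial() for h in self.hierarchies)


# Hypothetical dimensions: a City -> State hierarchy and a Month -> Year hierarchy.
location = Dimension("Location", [Hierarchy([Level("City", True), Level("State", True)])])
time = Dimension("Time", [Hierarchy([Level("Month"), Level("Year")])])
```

With these definitions, `location.is_spatial()` holds while `time.is_spatial()` does not.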

Figure 4: MultiDimER model for highway maintenance [4]

In the MultiDimER model of Figure 4, Length is a spatial measure, and City and Highway Segment are spatial dimensions because they have spatial hierarchies. An example of a spatial hierarchy here is City and State. [3] describes the mapping of the MultiDimER conceptual model to a physical model, which is implemented in Oracle 10g Spatial. The paper discusses the implementation issues of schemas created from conceptual models. A spatial level defined in the MultiDimER model corresponds to a table in the database, and the relationships between levels are represented by many-to-one relationships between tables. The basic requirements for the design of an effective multidimensional model for spatial data warehouses are described in [21]. We have classified the requirements presented in [21] by the area they belong to. Figure 5 summarizes our classification:

o Simplicity: Easy to understand while capturing all basic elements; independence between specification and implementation.
o Conceptual model: Implementation independent; flexible in terms of spatial and non-spatial attributes.
o Hierarchy: Multiple as well as explicit hierarchies; handles data with different granularities; irregular spatial hierarchies supported.
o Aggregation: Support for thematic and geometric aggregation; user-defined aggregates; avoids incorrect aggregation; dimensionless and measureless aggregation; geospatial aggregation.
o Data: Handles changes over time; OLAP operations including drill-through and drill-across; handles uncertainty.

Figure 5: Requirements of a spatial multidimensional model

3. Storage and Indexing


3.1. Indexing
Indices form an important part of a data warehouse, spatial or non-spatial. If the right index structures are built on the columns of dimensions and facts, the performance of queries, especially ad hoc queries, is greatly enhanced. There have been some proposals [8] [11] [27] for extending existing index structures to suit the needs of a spatial data warehouse.

[27] describes an extension of the Generalized Search Tree (GiST) framework for efficient OLAP queries on a spatial data warehouse. GiST provides 2 interfaces to extend: the Predicate and GiST interfaces. The GiST search algorithm uses the predicate Consistent to find all leaf nodes that are consistent with the query predicate. [27] introduces a new state for this predicate, called partial true, and proposes a new search algorithm for answering OLAP queries efficiently.

3.1.1. Materialized Indexing


The R*a tree [10] extends the R* tree for efficient OLAP operations by materializing aggregates in the index structure. The paper shows that storing aggregates in the inner nodes of the index tree improves the response time of OLAP slice and dice queries, because fewer accesses to secondary memory are needed. It presents a modified recursive range query algorithm which exploits this precomputation and is particularly useful for range queries. The results also show that the extra space needed for storing the aggregated data is linear in the size of the structure.

While the R*a tree [10] introduces the concept of storing aggregates in the index, it does not consider spatial objects. The aR tree [8] follows the same idea of materializing the index by extending the R tree for spatial data warehouses. For spatial data, OLAP operations may need a specific hierarchy which is not defined at design time. The aR tree stores, for each minimum bounding rectangle (MBR), the results of aggregation functions over all the objects it encloses. The example in Figure 6 depicts an aR tree with 5 MBRs (a1 through a5) and the COUNT of spatial objects within each of them.

Figure 6: The aR tree [8]

The advantages of this approach are:
o The aR tree defines a hierarchy among MBRs that forms a data cube lattice model. This gives scope for selective materialization of the structure.
o The idea can be extended to storing results of window queries or all other types of aggregate operators.
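The benefit of storing aggregates in the index can be sketched in a few lines of Python. The example below is a simplified illustration under several assumptions, not the algorithm of [8] or [10]: the tree is built by hand rather than by R tree insertion, every node keeps only a COUNT, data objects are points, and the names `Node` and `range_count` are hypothetical. The key step is that a node whose MBR lies entirely inside the query window contributes its stored count without any descent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect
    count: int  # pre-aggregated COUNT of the data objects below this node
    children: List["Node"] = field(default_factory=list)  # empty for a data object


def range_count(node: Node, window: Rect) -> int:
    """COUNT of objects in the window, using stored aggregates where possible."""
    if not intersects(node.mbr, window):
        return 0
    if contains(window, node.mbr):
        return node.count  # whole subtree qualifies: no need to descend
    if not node.children:
        return node.count  # assumed policy: an overlapping data object is counted
    return sum(range_count(child, window) for child in node.children)


def point(x: float, y: float) -> Node:
    return Node((x, y, x, y), 1)


# Hand-built example with two MBRs holding 2 and 3 point objects.
a1 = Node((1, 1, 2, 2), 2, [point(1, 1), point(2, 2)])
a2 = Node((6, 2, 8, 7), 3, [point(6, 6), point(7, 7), point(8, 2)])
root = Node((1, 1, 8, 7), 5, [a1, a2])
```

For the window (0, 0, 3, 3) the count of a1 is returned directly from the inner node, whereas the window (5, 5, 9, 9) only partially overlaps a2 and forces a descent to its objects.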

While the aR tree is considered quite effective for aggregation queries, its effectiveness degrades when the number of dimensions is large [37]; with a significant number of dimensions, its complexity approaches that of sequential scanning. [7] presents an implementation and exploration of aR trees [8] for spatial data warehouses.

3.1.2. Spatiotemporal Indices

Most indexing approaches for spatial data warehouses focus on either spatial [8] [10] [27] or temporal indexes [43] [44]. Spatio-temporal data warehouses need spatial and temporal structures to be integrated for efficiency. The aggregate R-B-tree (aRB-tree) [9] is an extended R tree in which each entry has a pointer to a B tree that stores historical aggregated data about the MBR. It has been proposed for static spatial dimensions. Figure 7 shows the structure of an aRB tree.

Figure 7: An aRB tree [9]

The aggregate Historical R-B-tree (aHRB-tree) [9] combines the concepts of the aRB tree and the Historical R tree (HR tree) for indexing dynamic spatial dimensions. Each node stores a time span indicating the period during which it is or was valid. The other entries of a node are similar to those of the aRB tree. Each time an update happens, a new R tree is created at that timestamp. Figure 8 shows the structure of an aHRB tree.

Figure 8: An aHRB tree [9]

Another proposal for indexing dynamic spatial dimensions is the aggregate 3-dimensional R-B-tree (a3DRB-tree) [9], which addresses the size limitation of the aHRB tree. It builds one large R tree for the whole history, as opposed to the many small R trees created in the aHRB tree. The large R tree stores the different versions of all regions in the same tree.
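The core idea of the aRB tree, a spatial entry that points to a time-ordered store of aggregates, can be illustrated with a small Python sketch. This is a deliberately flattened approximation under stated assumptions: the regions are kept in a plain list instead of an R tree, a sorted list searched with `bisect` stands in for the per-region B tree, and the names `RegionHistory` and `spatiotemporal_sum` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


class RegionHistory:
    """Stand-in for the per-region B tree: timestamps with aggregated values."""

    def __init__(self, entries):
        entries = sorted(entries)  # (timestamp, value) pairs ordered by time
        self.times = [t for t, _ in entries]
        # prefix[i] holds the sum of the first i values, so any time range is O(log n).
        self.prefix = [0] + list(accumulate(v for _, v in entries))

    def total(self, t_start, t_end):
        """Sum of the values whose timestamp lies in [t_start, t_end]."""
        lo = bisect_left(self.times, t_start)
        hi = bisect_right(self.times, t_end)
        return self.prefix[hi] - self.prefix[lo]


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatiotemporal_sum(regions, window, t_start, t_end):
    """Aggregate over the regions meeting the window, restricted to a time range."""
    return sum(history.total(t_start, t_end)
               for mbr, history in regions if intersects(mbr, window))


# Hypothetical static regions (xmin, ymin, xmax, ymax) with per-timestamp counts.
regions = [
    ((0, 0, 5, 5), RegionHistory([(1, 10), (2, 20), (3, 30)])),
    ((10, 10, 15, 15), RegionHistory([(1, 5), (3, 7)])),
]
```

Because the regions never change in this sketch, it corresponds to the static-dimension case; handling regions whose extents change over time is what the aHRB and a3DRB trees add.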

3.2. Selective materialization


The selective materialization of a data cube has been studied in detail and techniques have been proposed for effectively choosing the set of cuboids to materialize [40].
[Figure 9: lattice of eight nodes labeled A to H, each annotated with a space cost; the values appearing in the figure are 100, 50, 20, 7, 10, 30, 40 and 75]

Figure 9: Example lattice with space cost for selective materialization [40]

Figure 9 shows the lattice model, which is the key to selective materialization. A downward edge from one node to another indicates that queries on the lower node can be answered from the grouping computed for the upper node. The greedy algorithm proposed in [40] outputs the set of nodes to materialize, selected on the basis of space cost.
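A greedy selection over such a lattice can be sketched as follows. This is one common benefit-driven formulation and may differ in detail from the algorithm of [40]: the cost of answering a view is taken to be the size of its cheapest materialized ancestor, the top view is always materialized, and each step picks the view that reduces the total cost the most. The node costs reuse the numbers appearing in Figure 9, but their assignment to nodes and the edges between nodes are assumptions.

```python
# Assumed lattice: space cost per view and the views directly below each view.
COST = {"A": 100, "B": 50, "C": 75, "D": 20, "E": 30, "F": 40, "G": 7, "H": 10}
CHILDREN = {"A": ["B", "C"], "B": ["D", "E"], "C": ["E", "F"],
            "D": ["G"], "E": ["G", "H"], "F": ["H"], "G": [], "H": []}


def descendants(view):
    """The view itself plus every view reachable through downward edges."""
    seen, stack = set(), [view]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(CHILDREN[v])
    return seen


def greedy_select(k, top="A"):
    """Pick up to k views (besides the top view) with the largest benefit."""
    # Initially every view is answered from the top view.
    current = {v: COST[top] for v in COST}
    chosen = []
    for _ in range(k):
        best, best_benefit = None, 0
        for v in sorted(COST):
            if v == top or v in chosen:
                continue
            # Benefit: total cost saved for all views that v can answer.
            benefit = sum(max(0, current[w] - COST[v]) for w in descendants(v))
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:
            break
        chosen.append(best)
        for w in descendants(best):
            current[w] = min(current[w], COST[best])
    return chosen


selected = greedy_select(3)
```

With these assumed inputs the first pick is B, since it halves the cost of five views at once; later picks are evaluated against the costs already reduced by earlier picks.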

Though the problem of selective materialization extends naturally to spatial data cubes, the difference in the spatial case lies in the computation cost. Without materialization of spatial data cubes, online computation becomes very time consuming because of the computationally expensive joins and other operations on spatial data. In [11] a finer-granularity approach is suggested for spatial data cubes, which focuses on cell-level rather than cuboid-level materialization. The approach, called object-based materialization, focuses on selecting a few spatial objects to precompute. The selection is based on the relative access frequency of sets of mergeable spatial regions: a set is precomputed if it is expected to be accessed frequently. The proposed algorithms assume that the cuboids to precompute have already been identified by the algorithms in [40] or by minor extensions of them.

3.3. Research needs


The index structures discussed above focus on materializing aggregations of spatial measures in the index structure. Most of the existing work is limited to numerical aggregations and other simple operations. There is a need to study the materialization of indexes that support spatio-temporal measures, such as the direction in which a movement is happening. The index selection problem is a widely known problem in the database world. It extends naturally to spatial data warehouses, where efficiency of retrieval is of prime importance. The methods proposed for selective materialization of spatial data cubes assume that information about the access frequencies of a set of selected cuboids is available. Methods that are independent of this assumption still need to be proposed.

4. Aggregation
Aggregation in data warehouses refers to summarizing the properties of data over particular dimensions of interest, most commonly time and geographic location, by applying an aggregation operation to the measure or fact data. Aggregation in spatial data warehouses refers to computing aggregate operations on measures over the union of the areas being aggregated. An example is computing the total size of the union of a number of areas.

4.1. Aggregation operations and techniques


4.1.1. Aggregation operations for spatial and non spatial data

Aggregation functions for spatial data have been grouped into three categories: distributive, algebraic and holistic [22] [47]. Table 2 lists the spatial aggregate operations in each of these categories.

Data type           Distributive                   Algebraic                     Holistic
Set of numbers      Count, Min, Max, Sum           Average, Standard             Median, MostFrequent,
                                                   Deviation, MaxN(), MinN()     Rank
Set of geometries   Minimal Orthogonal Bounding    Centroid, Center of           Nearest Neighbor Index,
                    Box, Geometric Union,          Gravity, Center of Mass       Equi Partition
                    Geometric Intersection

Table 2: Set of aggregate operations [47]

[15] describes the BigCube model for multidimensional spatial data. It classifies aggregation operations as additive, semi-additive and non-additive and describes how these are incorporated in the multidimensional model. The operations defined are listed in Table 3.

Type            BigCube aggregate operators
Additive        Count, Min (Base), Max, Sum (Apex), Concatenate, Convex Hull, Spatial Union, Spatial Intersection
Semi-additive   Average, Standard Deviation, Variance, MaxN(), MinN(), Centroid, Center of Gravity, Center of Mass
Non-additive    Median, MostFrequent, Rank, LastNonNullValue, FirstNonNullValue, Minimum Bounding Box, Nearest Neighbor, Equi Partition

Table 3: Aggregate operations for BigCube [15]
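The practical difference between the distributive, algebraic and holistic categories is whether an aggregate over a whole data set can be assembled from aggregates over its partitions. The short Python sketch below illustrates this for numeric measures; the partition values are invented for the example.

```python
from statistics import median

# Hypothetical measure values, partitioned by region.
partitions = [[3, 1, 4], [1, 5], [9, 2, 6]]

# Distributive: the SUM of the whole equals the SUM of the partial SUMs.
partial_sums = [sum(p) for p in partitions]
total = sum(partial_sums)

# Algebraic: AVERAGE needs a fixed-size summary (sum, count) per partition.
partial_counts = [len(p) for p in partitions]
average = total / sum(partial_counts)

# Holistic: MEDIAN cannot in general be derived from per-partition medians.
median_of_medians = median(median(p) for p in partitions)
true_median = median(v for p in partitions for v in p)
```

In this data the median of the per-partition medians differs from the median of all values, which is why holistic aggregates are the hardest to pre-aggregate in a data cube.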

4.2. Aggregation concepts


The operators presented in [15] [22] [47] work well with spatial objects, but aggregating spatial measures requires considering the topological relationships between them, because of the problem of double counting during aggregation. For example, a building listed as both a bowling alley and a discotheque would be counted twice when aggregating for entertainment [12].

[28] deals with this problem and describes the pre-aggregation of spatial measures. Facts are pre-processed to compute their disjoint parts, and a classification of the topological relationships between spatial measures is proposed. Pre-aggregation works if the spatial properties of the objects are distributive over some aggregate function. The drawback of the approach in [28] is that it does not address geometries other than polygons.

[30] describes a formal model for geometric aggregation. It defines three parts, namely the algebraic part, the geometric part and the classical OLAP or application part, each of which maintains separate hierarchies and interacts with the others to answer queries. Figure 10 shows an example of the three parts.

Figure 10: Geometric, algebraic and application part [30]
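The effect of double counting, and the remedy of aggregating over disjoint parts, can be shown with a small Python sketch. It is a simplification under stated assumptions and not the method of [28]: the spatial measures are axis-aligned rectangles rather than general polygons, the disjoint parts are the grid cells induced by the rectangle coordinates, and `union_area` is a hypothetical helper name.

```python
def union_area(rects):
    """Area of the union of axis-aligned rectangles (xmin, ymin, xmax, ymax)."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    area = 0
    # Every grid cell between consecutive coordinates is one disjoint part.
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            covered = any(r[0] <= x0 and x1 <= r[2] and r[1] <= y0 and y1 <= r[3]
                          for r in rects)
            if covered:
                area += (x1 - x0) * (y1 - y0)  # each part is added exactly once
    return area


# Two hypothetical overlapping regions.
rects = [(0, 0, 4, 4), (2, 2, 6, 6)]
naive_area = sum((r[2] - r[0]) * (r[3] - r[1]) for r in rects)
exact_area = union_area(rects)
```

Summing the two areas independently counts the shared 2 by 2 square twice, whereas the sum over disjoint parts counts it once.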

4.3. Research Needs


The multiple representation problem is a widely known problem in spatial databases [52]. The same spatial object may be considered a point in one application and a polygon in another. In yet another scenario a 3-dimensional representation may be used, which considers the object as a cuboid or polyhedron. Figure 11 depicts three possible representations of the same spatial object, a building in this case.

[Figure 11: the same object shown as a point, a polygon and a cuboid]

Figure 11: Multiple representation of the same object

The multiple representation problem is particularly troublesome in spatial data warehouses for 2 major reasons [12]:
1. Aggregation and consolidation of data from different sources which follow different representations.
2. SOLAP operations: During operations like roll-up and drill-down over hierarchies, the same level may hold different representations of the same object, making it difficult to choose one.

Double counting during aggregation

Double counting means incorrect aggregation of measures due to some overlapping property. For example, the same park used for both a concert and a fair may be counted twice when aggregating the objects classified as entertainment. The problem has been addressed in [11] by considering topological relationships between spatial measures and aggregating only over disjoint objects, thus avoiding incorrect aggregation. It remains an open problem when multiple representations are involved and when topological relationships must be determined for objects represented in 3 dimensions.

5. Spatial Online Analytical Processing (SOLAP)


OLAP is an approach to swiftly answering multi-dimensional analytical (MDA) queries [48]. It is a category of decision-support tools often used to provide efficient and intuitive access to a data warehouse. Examples include Cognos PowerPlay, Business Objects and Oracle Express. OLAP tools are not well suited to analyzing spatial and temporal data. GIS tools help in analyzing spatial data but still do not make full use of spatio-temporal datasets [32]. A newer approach is therefore to couple OLAP and GIS functionalities, which makes it possible to build decision support tools that are better adapted to spatio-temporal exploration and analysis of data. These are called Spatial OLAP systems, or SOLAP.

5.1. Concepts

Conventional OLAP can hold spatial data, but it treats a spatial dimension like any other dimension and pays no attention to the cartographic component of the data. Data visualization facilitates understanding of the structure of the data and supports better decision making [34]. Maps and graphics do more than make data visible; they can help drive the analysis of historical data. Without a cartographic display, OLAP tools lack an essential feature which could help complete spatio-temporal exploration and analysis processes [30].

[Figure 12: SOLAP shown as the combination of OLAP and GIS]

Figure 12: SOLAP is created by combining concepts/features of conventional OLAP & GIS

This creates a need for SOLAP, which has been defined in [24] as a visual platform built especially to support rapid and easy spatio-temporal analysis and exploration of data, following a multidimensional approach comprised of aggregation levels available in cartographic displays as well as in tabular and diagram displays.

5.2. Tools for SOLAP


In this section, we summarize the currently available tools for SOLAP. SOLAP tools can be divided into three categories [32] [34] [35] [36]:
o OLAP dominant (Business Objects, Cognos, Knosys), which provide means for aggregating data.
o GIS dominant, which focus on geometric operations.
o Visual data selections, or integrated OLAP and GIS solutions (Geo cube, SOVAT).

Figure 13 shows the classification of SOLAP tools into these categories.

[Figure 13 (tree): SOLAP tools divide into OLAP based (Business Objects, Cognos, Knosys), GIS based (LGS Group Inc.) and Integrated (Geo cube, SOVAT)]

Figure 13: An overview of Spatial OLAP tools

6. Benchmarks for Spatial Data warehouses


Benchmarking means evaluating or checking something by comparison with a standard. For a spatial data warehouse, typical questions are:
o How well does your spatial data warehouse perform?
o Does it need improvements, or is it already good enough?

To answer such questions, it is critical to assess the warehouse's performance relative to an achievable "standard" or "benchmark". Every benchmark should have well-defined success criteria. Before creating a detailed benchmark specification, it is important to decide on the most crucial technical requirements of the data warehouse. This helps to focus the benchmark along those lines.

6.1. Types of benchmarks


There are 2 types of benchmarks:
o Functional benchmarks: These are standards for evaluating what functions a system can perform.
o Performance benchmarks: These benchmarks help to determine and compare how fast a system is.

In the past few years several techniques have been implemented to improve query processing over spatial data warehouses, among them index creation and materialized views. To evaluate how efficient these techniques are, different datasets with different properties are used. Benchmarks for spatial data warehouse query processing should fit the evaluation needs of spatial data warehouses. In particular, a benchmark should be able to analyze the performance of operations such as spatial roll-up and drill-down.

6.2. Overview of existing benchmarks for spatial data


The following benchmarks exist for spatial data:

Benchmark: Description / Limitations
o VESPA [37] and measuring performance by considering spatial joins [38]: Both of these benchmarks focus on spatial predicate computation and are not aimed at assessing the efficiency of SOLAP operations.
o TPC-D: Benchmark by the Transaction Processing Performance Council for decision support systems. It supports neither indices nor materialized views [39].
o TPC-H [39]: Provides individual queries that are not known in advance. However, its schema differs from the traditional star schema.
o TPC-DS [40]: More realistic than the previous ones. It resolves the schema issue with a snowflake schema, but is aimed at refreshing the warehouse with new and changed data originating from the operational side of the business.
o Star Schema Benchmark (SSB) [41]: Extends TPC-H to enable the analysis of historical trends and provides a set of predefined queries to run over its star schema. The SSB queries refer to descriptive locations of suppliers and customers. However, SSB neither holds spatial attributes nor stores maps that would enable multidimensional queries with spatial predicates.
o Spadawan benchmark [25]: The spatial data warehouse benchmark (Spadawan) focuses on this problem by using predefined spatial hierarchies. It helps to assess query processing performance on spatial roll-up and drill-down operations. It is a performance benchmark.

Table 4: Various benchmarks and their limitations


Spadawan [25] is considered very effective for spatial data warehouse benchmarking, as it not only generates SDW datasets composed of points and polygons in spatial attributes but also supports the evaluation of different types of spatial (SOLAP) queries, covering intersection range queries, containment range queries and enclosure range queries in the spatial predicate. It also enables the evaluation of spatial roll-up and drill-down operations.
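The three kinds of range query named above differ only in the spatial predicate applied between a stored object and the query window. The following Python sketch shows the usual reading of these predicates on bounding rectangles; it assumes the common definitions (an object intersects the window, lies inside the window, or encloses the window) rather than quoting the Spadawan specification, and all object names and coordinates are invented.

```python
def intersects(obj, win):
    """Intersection: the object and the window share at least one point."""
    return obj[0] <= win[2] and win[0] <= obj[2] and obj[1] <= win[3] and win[1] <= obj[3]


def contained_in(obj, win):
    """Containment: the object lies entirely inside the window."""
    return win[0] <= obj[0] and win[1] <= obj[1] and obj[2] <= win[2] and obj[3] <= win[3]


def encloses(obj, win):
    """Enclosure: the object entirely covers the window."""
    return contained_in(win, obj)


# Hypothetical objects given as rectangles (xmin, ymin, xmax, ymax).
OBJECTS = {
    "park": (1, 1, 3, 3),
    "county": (0, 0, 10, 10),
    "lake": (4, 4, 7, 7),
    "farm": (8, 8, 12, 12),
}


def range_query(objects, window, predicate):
    return sorted(name for name, mbr in objects.items() if predicate(mbr, window))


window = (0, 0, 7, 7)
hits_intersection = range_query(OBJECTS, window, intersects)
hits_containment = range_query(OBJECTS, window, contained_in)
hits_enclosure = range_query(OBJECTS, window, encloses)
```

A benchmark would time each predicate separately, since the three result sets, and the index pruning they allow, differ considerably for the same window.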

6.3. Research Needs


There is a need for research on developing benchmarks for the evaluation of:
o Spatial data such as lines, and polygons with holes and with islands.
o Spatial data generation and SOLAP query processing.
o Additional SOLAP query types to analyze drill-across operations on extended SDW schemas.

7. Trends: Spatiotemporal data warehouses


Data warehousing applications are based on high-performance databases. Many fields deal with data that also carries spatial information, such as addresses or locations. Integrating the spatial component of the data with the data warehouse greatly increases the decision-making potential of such organizations.

7.1. Introduction to spatio-temporal data warehouses


Consider the query "How many objects visit a given area during a given time period?" This query includes both a spatial component and a time component. While spatial data warehouses look at many types and dimensions of data including the spatial context, there is a need to include the temporal aspects as well. This allows applications to see hidden relationships and patterns in data.

7.1.1. Challenges

Many applications refer to moving objects and require spatio-temporal modeling for specific analyses. Object motion of this kind defines a continuous variation in space and time, which produces huge datasets that are very difficult to handle.

7.1.2. Organization of Temporal data

Two concepts of time are involved in the temporal characteristics of geographic entities: world time and system time [13]. World time refers to the time when a change to an entity takes place in reality, whereas system time is the time at which the change is recorded in the database. Depending on the requirements, users might want to use only system time (e.g. GIS) or both (data warehouses), and having to model two types of time dimension makes spatio-temporal data warehouses even harder to design.
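The example query from the beginning of Section 7.1 can be written down directly for sampled positions. The Python sketch below is only an illustration under stated assumptions: observations are discrete samples whose timestamps are treated as world time, movement between samples is ignored, and the data and the function name `count_visitors` are invented.

```python
# Hypothetical observations: (object_id, x, y, timestamp).
OBSERVATIONS = [
    ("car1", 2, 3, 10),
    ("car1", 8, 8, 20),
    ("car2", 3, 3, 15),
    ("car2", 3, 4, 40),
    ("car3", 9, 9, 12),
]


def count_visitors(observations, area, t_start, t_end):
    """Number of distinct objects seen inside the area during [t_start, t_end]."""
    xmin, ymin, xmax, ymax = area
    visitors = {
        oid
        for oid, x, y, t in observations
        if xmin <= x <= xmax and ymin <= y <= ymax and t_start <= t <= t_end
    }
    # A set is used so that an object sampled several times is counted once.
    return len(visitors)


early_visitors = count_visitors(OBSERVATIONS, (0, 0, 5, 5), 10, 20)
late_visitors = count_visitors(OBSERVATIONS, (0, 0, 5, 5), 30, 50)
```

Counting distinct objects rather than observations matters here, because a slowly moving object may be sampled many times inside the same area.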

7.2. Trajectory data warehousing tools and techniques


Trajectory data warehousing is a branch of spatiotemporal warehousing. Spatio-temporal data cubes are essential to support trajectory data. They should allow analysis along temporal dimensions, spatial dimensions at different levels of granularity (point, cell, road) and thematic dimensions containing, for instance, demographic data.

o STAU: A spatio-temporal extension to the Oracle 10g ORDBMS which provides a data management infrastructure for historical moving object databases (MODs).
o Hermes [13]: A database engine for handling objects that change location, shape and size, either discretely or continuously in time. The prototype has been designed as an extension of STAU and supports the demands of real-time dynamic applications (e.g. location-based services, LBS).
   o It is a robust framework that provides functionality for handling spatio-temporal data.
   o It enables modeling, constructing and querying a database with dynamic objects that change location, shape and size.
   o It provides spatio-temporal functionality to state-of-the-art object-relational DBMSs (ORDBMS).
o The GeoPKDD trajectory data warehouse [42]: GeoPKDD is a project which aims at extracting user-consumable forms of knowledge from large amounts of raw spatio-temporal geographic data. Figure 14 illustrates the GeoPKDD architecture.

Figure 14: The GeoPKDD architecture [44]

Description of the architecture

First, location data is captured and forwarded to a trajectory stream manager, which performs preprocessing operations such as splitting the raw data into trajectories according to some criteria and assigning each one a trajectory identifier.
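
The splitting step can be sketched as follows. The text does not state which criteria the trajectory stream manager applies, so this minimal Python sketch assumes one plausible criterion: a new trajectory starts whenever the gap between two consecutive samples of the same object exceeds a threshold. The function name, the tuple layout and the threshold max_gap_seconds are hypothetical and are not part of GeoPKDD.

```python
from collections import defaultdict


def split_into_trajectories(points, max_gap_seconds=300):
    """Group raw location samples into trajectories.

    `points` is an iterable of (object_id, timestamp_seconds, x, y).
    Returns a dict mapping (object_id, sequence_number), used here as an
    assumed trajectory identifier, to a time-ordered list of (t, x, y).
    """
    by_object = defaultdict(list)
    for oid, t, x, y in points:
        by_object[oid].append((t, x, y))

    trajectories = {}
    for oid, samples in by_object.items():
        samples.sort()                 # order each object's samples by time
        seq = 0
        current = [samples[0]]
        for prev, cur in zip(samples, samples[1:]):
            if cur[0] - prev[0] > max_gap_seconds:
                # Gap too long: close the current trajectory, start a new one.
                trajectories[(oid, seq)] = current
                seq += 1
                current = []
            current.append(cur)
        trajectories[(oid, seq)] = current
    return trajectories


# Hypothetical raw stream; samples may arrive out of order.
raw = [
    ("a", 0, 0.0, 0.0),
    ("a", 60, 1.0, 0.0),
    ("a", 1000, 5.0, 5.0),
    ("b", 10, 2.0, 2.0),
    ("a", 1060, 6.0, 5.0),
]
trajs = split_into_trajectories(raw)
for traj_id, samples in sorted(trajs.items()):
    print(traj_id, samples)
```

Other criteria, such as a spatial gap or a maximum trajectory duration, would slot into the same loop by changing the condition that closes the current trajectory.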

These trajectories are then loaded into a moving object database (MOD), which is managed by the Hermes system. The MOD includes a relation MOD Trajectories with schema (Oid, trajectoryid, trajectory), where trajectory is of type Moving Point. Appropriate queries and Extract-Transform-Load (ETL) processes are run against the MOD to update the trajectory data warehouse (TDW) with trajectory information. The TDW model is based on the classic star schema, with a standard temporal dimension and two spatial dimensions.
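
A rough idea of such an ETL step is given by the Python sketch below. It reads rows shaped like the (Oid, trajectoryid, trajectory) relation and fills a fact table keyed by a temporal dimension and a spatial dimension. Everything beyond that row shape is an assumption: a trajectory is simplified to a list of (t, x, y) samples, the temporal dimension is an hourly bucket, only one spatial dimension (a regular grid cell) is modeled for brevity, and the measure is the number of distinct trajectories per cell and hour. The actual GeoPKDD dimensions and measures may differ.

```python
from collections import defaultdict


def load_fact_table(mod_trajectories, cell_size=10.0, bucket_seconds=3600):
    """Aggregate trajectory samples into (time_key, cell_key) fact cells.

    `mod_trajectories` is an iterable of (oid, trajectory_id, trajectory),
    where `trajectory` is a list of (timestamp_seconds, x, y) samples.
    Returns {(time_key, cell_key): number of distinct trajectories}.
    """
    presence = defaultdict(set)
    for oid, trajectory_id, trajectory in mod_trajectories:
        for t, x, y in trajectory:
            time_key = int(t // bucket_seconds)       # temporal dimension
            cell_key = (int(x // cell_size),          # spatial dimension
                        int(y // cell_size))
            presence[(time_key, cell_key)].add(trajectory_id)
    return {key: len(ids) for key, ids in presence.items()}


# Hypothetical MOD content: three trajectories of two objects.
mod = [
    ("a", 1, [(0, 1.0, 1.0), (100, 12.0, 1.0)]),
    ("b", 2, [(50, 2.0, 3.0)]),
    ("a", 3, [(4000, 1.0, 1.0)]),
]
fact = load_fact_table(mod)
for key in sorted(fact):
    print(key, fact[key])
```

Note that a distinct count of this kind is not additive: a trajectory that crosses several cells is counted once in each, so summing the cells of a region can overcount the trajectories in that region.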

7.3. Research needs


The motivation for spatio-temporal data warehouses is to exploit valuable information for decision making in applications such as mobile marketing, location-based services and traffic control management. Trajectory warehousing is an important step in this direction and leaves considerable scope for further work. Owing to the large and growing volume of this type of historical data, future research should focus on modeling, aggregation and indexing to improve the efficiency of such warehouses.

8. Future Work
Domain-specific applications of spatial data warehouses are widely discussed in the research literature [49][50][51]. Future work in this direction would be to classify the literature that exists in each specific domain and to identify the common concepts in each domain. This would not only present a broad example of the use of spatial data warehouses in a domain but also give an idea of the core concepts applied in each domain. Another research direction we would like to include is three-dimensional spatial objects in spatial data warehouses. Three-dimensional queries on spatial data warehouses may be helpful in domains like urban planning and disaster management [12]. The topological relationships for three-dimensional objects would include relationships like INSIDE and ANYINTERACT [56].

9. Summary
Spatial data warehouses have been an active area of research over the last decade. Concepts like big data are evolving, with a large share of spatial information to process, store and analyze. Given these trends, spatial data warehouses can be expected to be a significant part of future research because of their ability to provide decision makers with relevant and concise data. The survey we presented covers the broad topics of spatial data warehouses and gives an overview of trends like spatio-temporal data warehouses. The topics include conceptual models, storage and indexing, aggregations and spatial OLAP. For some of the topics we have identified areas where future research is needed. We have also summarized the benchmarks that currently exist and compared and contrasted them for spatial data warehouses.

References
[1] S. Bimonte, A. Tchounikine and M. Miquel. Towards a Spatial Multidimensional Model. DOLAP'05, November 4-5, 2005, Bremen, Germany.
[2] Yvan Bédard, Marie-Josée Proulx, Suzie Larrivée and Eveline Bernier. Modelling multiple representations into spatial data warehouses: a UML-based approach.
[3] Elzbieta Malinowski and Esteban Zimányi. Implementing Spatial Datawarehouse Hierarchies in Object-Relational DBMSs.
[4] Representing Spatiality in a Conceptual Multidimensional Model.
[5] E. Malinowski and E. Zimányi. Spatial Hierarchies and Topological Relationships in the Spatial MultiDimER Model.
[6] Alejandro Vaisman and Esteban Zimányi. Multidimensional Model Representing Continuous Fields in Spatial Data Warehouses. ACM GIS '09, November 4-6, 2009, Seattle, WA, USA.
[7] Marcin Gorawski and Rafal Malczok. Materialized aR-Tree in Distributed Spatial Data Warehouse.
[8] Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao. Efficient OLAP Operations in Spatial Data Warehouses.
[9] Dimitris Papadias, Yufei Tao, Panos Kalnis and Jun Zhang. Indexing Spatio-Temporal Data Warehouses.
[10] Marcus Jurgens and Hans-J. Lenz. The R*a-tree: An improved R*-tree with Materialized Data for Supporting Range Queries on OLAP-Data.
[11] Nebojsa Stefanovic, Jiawei Han and Krzysztof Koperski. Object-Based Selective Materialization for Efficient Implementation of Spatial Data Cubes.
[12] Elzbieta Malinowski and Esteban Zimányi. Spatial Data Warehouses: Some Solutions and Unresolved Problems.
[13] Wang Jizhou and Li Chengming. Research on the Framework of Spatio-Temporal Data Warehouse.
[14] L. Savary, T. Wan and K. Zeitouni. Spatio-Temporal Data Warehouse Design for Human Activity Pattern Analysis.
[15] Viswanathan, G., Schneider, M. BigCube: A MetaModel for Managing Multidimensional Data. In: Proceedings of the 19th Int. Conf. on Software Engineering and Data Engineering (SEDE) (2010) 237-242.
[16] Alejandro Vaisman and Esteban Zimányi. What is Spatio-Temporal Data Warehousing?
[17] Tom Barclay, Jim Gray and Don Slutz. Microsoft TerraServer: A Spatial Data Warehouse.
[18] S. Shekhar, C.T. Lu, X. Tan, S. Chawla and R. Vatsavai. Map Cube: A Visualization Tool for Spatial Data Warehouses.
[19] Yvan Bédard, Tim Merrett and Jiawei Han. Fundamentals of spatial data warehousing for geographic knowledge discovery.
[20] S. Rivest, Y. Bédard, M.J. Proulx and M. Nadeau. SOLAP: a new type of user interface to support spatio-temporal multidimensional data exploration and analysis.
[21] Ganesh Viswanathan and Markus Schneider. On the Requirements for User-Centric Spatial Data Warehousing and SOLAP.
[22] Gray J., Bosworth A., Layman A., Pirahesh H. Data Cube: a Relational Aggregation Operator Generalizing Group-by, Cross-tabs and Subtotals. ICDE, 1996.
[23] Gabriel Pestana, Miguel Mira da Silva and Yvan Bédard. Spatial OLAP Modelling: An Overview Base on Spatial Objects Changing over Time.
[24] Bédard, Y., S. Larrivée, M.-J. Proulx, P.-Y. Caron and F. Létourneau. 1997. Geospatial Data Warehousing: Positionnement technologique et stratégique. Rapport pour le Centre de recherche pour la défense de Valcartier (CRDV).
[25] Thiago Luís Lopes Siqueira, Ricardo Rodrigues Ciferri, Valéria Cesário Times and Cristina Dutra de Aguiar Ciferri. Benchmarking Spatial Data Warehouses.
[26] Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding and William Perrizo. Efficient OLAP Operations for Spatial Data Using Peano Trees.
[27] Fangyan Rao, Long Zhang, Xiu Lan Yu, Ying Li and Ying Chen. Spatial Hierarchy and OLAP-Favored Search in Spatial Data Warehouse.
[28] Torben Bach Pedersen and Nektaria Tryfona. Pre Aggregation in Spatial Data Warehouses.
[29] Sofie Haesevoets, Bart Kuijpers and Alejandro Vaisman. Spatial Aggregation: Data Model and Implementation.
[30] Nikos Pelekis, Yannis Theodoridis, Spyros Vosinakis and Themis Panayiotopoulos. Hermes: A Framework for Location-Based Data Management.
[31] Jiawei Han, Nebojsa Stefanovic and Krzysztof Koperski. Selective Materialization: An Efficient Method for Spatial Data Cube Construction.
[32] Toward Better Support for Spatial Decision Making: Defining the Characteristics of Spatial On-Line Analytical Processing (SOLAP). Geomatica, Vol. 55, No. 4, 2001, pp. 539-555.
[33] MacEachren, A. M. and M.-J. Kraak. 2001. Research challenges in geovisualization. Cartography and Geographic Information Science.
[34] Bimonte, S., Tchounikine, A., Miquel, M. Geocube, a multidimensional model and navigation operators handling complex measures: Application in spatial OLAP. Advances in Information Systems (2006) 100-109.
[35] Scotch, M., Parmanto, B. SOVAT: Spatial OLAP visualization and analysis tool. In: Proceedings of the 38th Annual Hawaii Int. Conf. on System Sciences (HICSS), IEEE (2005) 142b.
[36] Marchand, P., Brisebois, A., Bédard, Y., Edwards, G. Implementation and evaluation of a hypercube-based method for spatiotemporal exploration and analysis. ISPRS Journal of Photogrammetry and Remote Sensing 59(1-2) (2004) 6-20.
[37] Paton, N.W., Williams, M.H., Dietrich, K., Liew, O., Dinn, A., Patrick, A. VESPA: a benchmark for vector spatial databases. In BNCOD, pages 81-101, 2000.
[38] Günther, O., Oria, V., Picouet, P., Saglio, J., Scholl, M. Benchmarking spatial joins à la carte. In SSDBM, pages 32-41, 1998.
[39] Poess, M., Floyd, C. New TPC benchmarks for decision support and web commerce. SIGMOD Record, 29(4):64-71, 2000.
[40] Poess, M., Smith, B., Kollar, L., Larson, P. TPC-DS, taking decision support benchmarking to the next level. In SIGMOD, pages 582-587, 2002.
[41] O'Neil, P., O'Neil, E., Chen, X., Revilak, S. The star schema benchmark and augmented fact table indexing. In TPCTC, pages 237-252, 2009.
[42] Damiani, Vangenot, Frentzos, Marketos, Theodoridis, Verykios and Raffaetà. Geographic privacy aware Knowledge Discovery and Delivery (2007).
[43] Kim, J., Kang, S., Kim, M. Effective Temporal Aggregation using Point-based Trees. DEXA, 1999.
[44] Yang, J., Widom, J. Incremental Computation of Temporal Aggregates. ICDE, 2001.
[45] Harinarayan V., Rajaraman A., Ullman J. Implementing Data Cubes Efficiently. ACM SIGMOD, 1996.
[46] Elzbieta Malinowski and Esteban Zimányi. Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications (Data-Centric Systems and Applications). Springer; 1st ed. 2008. Corr. 2nd printing edition (April 6, 2011).
[47] Shashi Shekhar and Sanjay Chawla. Spatial Databases: A Tour. Prentice Hall, 1st edition (June 20, 2003).
[48] Leticia I. Gómez, Bart Kuijpers, Bart Moelans and Alejandro A. Vaisman. A Survey of Spatio-Temporal Data Warehousing. International Journal of Data Warehousing and Mining, 2009.
[49] Spatial Data Warehousing for Hospital Organizations: An ESRI whitepaper.
[50] Octavio Glorio, Jose-Norberto Mazón, Irene Garrigós and Juan Trujillo. Using Web-based Personalization on Spatial Data Warehouses.
[51] Michael McGuire, Aryya Gangopadhyay, Anita Komlodi and Christopher Swan. A user-centered design for a spatial data warehouse for data exploration in environmental research.
[52] S. Zlatanova, J.E. Stoter and W. Quak. Management of multiple representations in spatial DBMSs.
[53] http://en.wikipedia.org/wiki/Online_analytical_processing
[54] http://terra.nasa.gov/
[55] http://www.spatial-eye.com/Engels/Applications/Spatial-DWH/page.aspx/117
[56] http://docs.oracle.com/cd/B28359_01/appdev.111/b28400/sdo_intro.htm#BABIDJJB
