
Scalable Data Management with DB2

Matthias Nicola IBM Silicon Valley Lab mnicola@us.ibm.com

© 2009 IBM Corporation

Agenda

  • Introduction

  • DB2 Scalability for OLTP and Data Warehousing

  • DB2's Database Partitioning Feature (DPF)

    • Overview
    • Data partitioning, clustering, placement
    • Join Methods

  • TPoX Scalability in a DPF database

    • Scalability vs. Performance
    • Benchmark configuration & results

  • pureScale Overview

  • Summary

DB2 Data Server Editions

  • DB2 for z/OS

  • DB2 Enterprise Edition / IBM InfoSphere Warehouse

  • DB2 Workgroup Edition

  • DB2 Express-C (free!)

  • DB2 Everyplace

Business Value of Scalability

  • More historical data = more precise forecasts

    • Data mining needs a lot of data for pattern accuracy
    • OLAP needs a lot of data for forecast accuracy

  • Predictable costs when growth occurs

    • Often the budget is the controlling factor, not the technology
    • Low maintenance cost is important

  • No forced migrations from technology limitations

    • Enabling very large databases

DB2 Scalability for OLTP and Data Warehousing

  • Database Partitioning Feature (DPF)

  • DB2 pureScale

  • Range partitioning

  • Multi-Dimensional Clustering (MDC)

  • Compression

  • Self-Tuning Memory Management (STMM)

  • Automatic Storage

  • Workload Management

  • High Availability, Disaster Recovery

  • Recovery

  • Security and Compliance

  • Utilities: Load, Backup & Restore, Redistribute

  • Archiving

  • etc.

Agenda

  • Introduction

  • DB2 Scalability for OLTP and Data Warehousing

  • DB2's Database Partitioning Feature (DPF)

    • Overview
    • Data partitioning, clustering, placement
    • Join Methods

  • TPoX Scalability in a DPF database

    • Scalability vs. Performance
    • Benchmark configuration & results

  • pureScale Overview

  • Summary

DB2's Database Partitioning Feature (DPF)

[Diagram: a "select … from table" query runs against tables spread across database
partitions 1 … n; the DB2 engines on all partitions cooperate over the FCM network,
and each partition has its own data and log. Together they form a single database.]
  • Database is divided into multiple database partitions

  • Database partitions run on same or separate servers (shared-nothing)

  • Each partition has its own table spaces, log, configuration, etc.

  • Data is spread over N database partitions

  • Queries are executed in parallel on all database partitions
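For illustration, a minimal sketch of how such a layout is defined in SQL (the
partition numbers and the group/table space names are assumptions, not from the
slides):

  -- A partition group spanning database partitions 1-4, and a table
  -- space whose tables are spread across that group.
  CREATE DATABASE PARTITION GROUP pg_all ON DBPARTITIONNUMS (1 TO 4);

  CREATE TABLESPACE ts_sales IN DATABASE PARTITION GROUP pg_all;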

Flexible configuration options

  • Possible hardware configurations

    • All database partitions on a single machine (logical partitions)

      • easy exploitation of multi-core systems

    • All database partitions on separate machines (physical partitions)

    • Hybrid: multiple machines with several logical partitions on each

FCM (Fast Communication Manager)

[Diagram: two SMP servers, each running multiple DB2 partitions, attached through
I/O channels to a storage server.]

Example: 4 physical machines, 2 database partitions per machine

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.partition.doc/doc/c0004569.html


The Distribution Map

  • Distribution key can consist of one or multiple columns.

  • Avoid low-cardinality columns, such as "gender", "state", etc.

  • Unique indexes must contain all columns of the distribution key.

Example: for distribution key column C1, the column value 000120 is fed through
the DB2 hash algorithm, which yields index 5 into the distribution map; entry
p(5) = 2, so the row is stored on database partition 2.

Distribution map (32k entries, assigned round-robin across 4 partitions):

  i    :  0  1  2  3  4  5  6  7  …  32k
  p(i) :  1  2  3  4  1  2  3  4  …
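As a hedged illustration of these rules (table and column names are assumptions),
a two-column distribution key and a unique index that includes it:

  CREATE TABLE accounts (
    custid  INTEGER       NOT NULL,
    acctid  INTEGER       NOT NULL,
    balance DECIMAL(12,2)
  ) DISTRIBUTE BY HASH (custid, acctid);

  -- Valid: the unique index contains all distribution key columns.
  CREATE UNIQUE INDEX ix_acct ON accounts (custid, acctid);

  -- An index on (acctid) alone could not be UNIQUE: without the full
  -- distribution key, uniqueness cannot be enforced across partitions.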

Agenda

  • Introduction

  • DB2 Scalability for OLTP and Data Warehousing

  • DB2's Database Partitioning Feature (DPF)

    • Overview
    • Data partitioning, clustering, placement
    • Join Methods

  • TPoX Scalability in a DPF database

    • Scalability vs. Performance
    • Benchmark configuration & results

  • pureScale Overview

  • Summary

Single Server

[Diagram: without partitioning, the entire table and all query work reside on a
single server.]

DB2 Database Partitioning Feature = Divide Work

[Diagram: the same table is divided across database partitions 1, 2, and 3, so
each partition scans only its share of the data.]

Range Partitioning Further Reduces I/O

[Diagram: on each of database partitions 1-3, the sales table is further divided
into January, February, March, … ranges.]

CREATE TABLE sales (recordID  INT,
                    salesdate DATE,
                    ...
                    details   XML)
  DISTRIBUTE BY HASH (recordID)
  PARTITION BY RANGE (salesdate) EVERY 1 MONTHS;

Multi-Dimensional Clustering to Further Reduce I/O

[Diagram: within each database partition and month range, rows are additionally
clustered by (productID, storeID).]

CREATE TABLE sales (recordID  INT,
                    salesdate DATE,
                    productID INTEGER,
                    storeID   INTEGER,
                    ...
                    details   XML)
  DISTRIBUTE BY HASH (recordID)
  PARTITION BY RANGE (salesdate) EVERY 1 MONTHS
  ORGANIZE BY (productID, storeID);
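To see how the three layers combine, consider a hedged example query (the date
range and ID values are made up): hash distribution parallelizes the scan across
all database partitions, range partitioning limits it to the February data
partition, and MDC block indexes narrow it to the matching (productID, storeID)
cells:

  SELECT COUNT(*)
  FROM   sales
  WHERE  salesdate BETWEEN '2009-02-01' AND '2009-02-28'
    AND  productID = 42
    AND  storeID   = 7;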

Compression Reduces I/O by a Factor of 3x to 4x

[Diagram: the same sales table as before; on every database partition, each
January/February/March range is shrunk by compression.]

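A minimal sketch of enabling row compression on an existing table (a classic
REORG builds the compression dictionary; the table name follows the earlier
example):

  ALTER TABLE sales COMPRESS YES;
  REORG TABLE sales;   -- builds the dictionary and compresses existing rows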
Data Partitioning and Placement Options

  • A table can be distributed across some or all database partitions.

  • A table can be replicated, to have an identical copy on each partition.

[Diagram: database partitions 1-8. Table 1 (Sales) and Table 2 (Customer) are
distributed across the partitions; Table 3 (Product) is replicated, with an
identical copy on every partition.]

Agenda

  • Introduction

  • DB2 Scalability for OLTP and Data Warehousing

  • DB2's Database Partitioning Feature (DPF)

    • Overview
    • Data partitioning, clustering, placement
    • Join Methods

  • TPoX Scalability in a DPF database

    • Scalability vs. Performance
    • Benchmark configuration & results

  • pureScale Overview

  • Summary

Join Processing - Example

create table tab1 (pk1 int, c1 int, ...)
  distribute by hash (pk1);

create table tab2 (pk2 int, c2 int, ...)
  distribute by hash (pk2);

Logical data in the tables:

  tab1          tab2
  pk1  c1       pk2  c2
   1    3        3    2
   2    3        4    8
   3    4        5    3
   7    7        7    4
   8   12        8   15
  11   10       10   10
  12   15       12   12
                15    7

Physical data distribution (distribute by hash):

  database partition 1      database partition 2
  tab1       tab2           tab1       tab2
  pk1  c1    pk2  c2        pk1  c1    pk2  c2
   1    3     3    2         2    3     4    8
   3    4     5    3         8   12     8   15
   7    7     7    4        12   15    10   10
  11   10    15    7                   12   12

Collocated Join

create table tab1 (pk1 int, c1 int, ...) distribute by hash (pk1);
create table tab2 (pk2 int, c2 int, ...) distribute by hash (pk2);

select * from tab1, tab2 where tab1.pk1 = tab2.pk2;

  • Both tables are partitioned by the join key.

  • Any join matches are guaranteed to be within a given partition
    ("collocated"); there are no join matches across partitions.

  • Allows local joins within each partition, with no data movement.

  • Best case, best performance.

  partition 1      partition 2
  tab1    tab2     tab1    tab2
  pk1     pk2      pk1     pk2
   1       3        2       4
   3       5        8       8
   7       7       12      10
  11      15               12

Directed Join

select * from tab1, tab2 where tab1.c1 = tab2.pk2;

  permanent storage                     on the fly / in memory

  partition 1       partition 2        partition 1       partition 2
  tab1      tab2    tab1      tab2     tab1'     tab2    tab1'     tab2
  pk1  c1   pk2     pk1  c1   pk2      pk1  c1   pk2     pk1  c1   pk2
   1    3    3       2    3    4        1    3    3       3    4    4
   3    4    5       8   12    8  DTQ   2    3    5      11   10    8
   7    7    7      12   15   10  ==>   7    7    7       8   12   10
  11   10   15                12       12   15   15                12

Send rows from tab1 to those partitions where they can find join matches in tab2, i.e. redistribution of tab1, based on hashing of the join key c1.

Single Partition Directed Join

select * from tab1, tab2 where tab1.c1 = 3 and tab1.c1 = tab2.pk2;

  permanent storage                     on the fly / in memory

  partition 1       partition 2        partition 1
  tab1      tab2    tab1      tab2     tab1'     tab2'
  pk1  c1   pk2     pk1  c1   pk2      pk1  c1   pk2
   1    3    3       2    3    4  DTQ   1    3    3
   3    4    5       8   12    8  ==>   2    3
   7    7    7      12   15   10
  11   10   15                12       partition 2: eliminated from the join

Value predicates are used to optimize (reduce) the data flow and to eliminate
irrelevant partitions from the join processing.

Repartitioned Join

select * from tab1, tab2 where tab1.c1 = tab2.c2;

  permanent storage                       on the fly / in memory (via DTQ)

  partition 1        partition 2         partition 1        partition 2
  tab1     tab2      tab1     tab2       tab1'    tab2'     tab1'    tab2'
  pk1 c1   pk2 c2    pk1 c1   pk2 c2     pk1 c1   pk2 c2    pk1 c1   pk2 c2
   1   3    3   2     2   3    4   8      1   3    5   3     3   4    3   2
   3   4    5   3     8  12    8  15      2   3    8  15    11  10    7   4
   7   7    7   4    12  15   10  10      7   7   15   7     8  12    4   8
  11  10   15   7             12  12     12  15                      10  10
                                                                     12  12

Redistribute both tables by hashing on their join keys so that matching rows end up on the same partition.

Broadcast Join

select * from tab1, tab2

  permanent storage                    on the fly / in memory (via BTQ)

  partition 1      partition 2        partition 1      partition 2
  tab1    tab2     tab1    tab2       tab1'   tab2     tab1'   tab2
  pk1     pk2      pk1     pk2        pk1     pk2      pk1     pk2
   1       3        2       4          1       3        1       4
   3       5        8       8          2       5        2       8
   7       7       12      10          3       7        3      10
  11      15               12          7      15        7      12
                                       8                8
                                      11               11
                                      12               12

Broadcast a copy of one table to all database partitions.

Data Placement Option: Replicated Table

permanent storage

  partition 1       partition 2
  tab1    tab2      tab1(copy)  tab2
  pk1     pk2       pk1         pk2
   1       3         1           4
   3       5         3           8
   7       7         7          10
  11      15        11          12
   2                 2
   8                 8
  12                12

Good choice for small tables with infrequent insert/update/delete activity, such as dimension tables in a star schema.
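One way to define such a copy in DB2 is a replicated materialized query table; a
hedged sketch (the table name is an assumption):

  CREATE TABLE product_rep AS (SELECT * FROM product)
    DATA INITIALLY DEFERRED REFRESH IMMEDIATE
    REPLICATED;

  -- Populate the copy and take it out of check-pending state:
  SET INTEGRITY FOR product_rep IMMEDIATE CHECKED;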

Agenda

  • Introduction

  • DB2 Scalability for OLTP and Data Warehousing

  • DB2's Database Partitioning Feature (DPF)

    • Overview
    • Data partitioning, clustering, placement
    • Join Methods

  • TPoX Scalability in a DPF database

    • Scalability vs. Performance
    • Benchmark configuration & results

  • pureScale Overview

  • Summary

Scalability vs. Performance

  • Performance: Time to complete a given task with given resources

  • Scalability: Ability to add resources to
    • complete the same task more quickly, or
    • handle a bigger task in about the same time

  • Example: Mowing the lawn…

    • Peter does it alone in 8 hours; Peter and Bob together take 4 hours.
      → Scalability is perfect, performance is poor!

    • Jim does it alone in 1 hour; Jim and John together take 1 hour 20 minutes.
      → Performance is great, scalability is awful!

    • Mary mows the lawn in 30 minutes; Mary and Susan together need 15 minutes.
      → Performance is great, scalability is also great!
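In standard terms (not spelled out on the slide), the lawn examples correspond to
speedup S(n) = T(1) / T(n) and parallel efficiency E(n) = S(n) / n for n workers:
Peter and Bob reach S(2) = 8h / 4h = 2 and E(2) = 1 (perfect scaling, poor
absolute time); Jim and John reach S(2) = 60min / 80min = 0.75 and E(2) ≈ 0.38
(adding a resource made the task slower); Mary and Susan reach S(2) = 30min /
15min = 2 and E(2) = 1.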

Scalability Metrics

[Chart 1: fixed database size: query elapsed time falls as partitions are added.]
[Chart 2: increasing database size: query elapsed time stays flat as database size
and the number of partitions grow in proportion.]

  • Speedup: make queries against a database of fixed size faster by adding
    partitions; the amount of data per partition shrinks.

  • Scaleup ("scale-out"): hold response time constant for a growing database by
    adding partitions in proportion; the amount of data per partition remains
    constant.

  • Mathematically, these two approaches are equivalent.

  • Basic assumption: queries executed against a bigger database examine more data.

Our Test Design

[Chart: query elapsed time held constant across n partitions / 250GB,
n*2 partitions / 500GB, and n*4 partitions / 1TB.]

  • Increasing database size: 250GB / 500GB / 1TB

  • Increasing number of database partitions

  • Fixed ratio of data volume to number of partitions

  • Show constant query elapsed times to prove scalability

TPoX Benchmark

  • TPoX = Transaction Processing over XML Data

  • Open Source Benchmark: http://tpox.sourceforge.net/

  • Financial transaction processing scenario: “online brokerage”

  • Realistic test for XML databases

[Diagram: customers interact with a brokerage house, which is backed by the database.]

[Data model: Customer 1:n Account 1:n Holding n:1 Security; Customer 1:n Order n:1
Security. CustAcc documents (CustAcc.xsd): 4-20 KB; Order documents (FIXML, 41 XSD
files): 1-2 KB; Security documents (Security.xsd): 2-9 KB.]

FIXML: standardized financial XML schema for securities trading!

[Diagram: sample document structures. CustAcc: Customer (ID, Name, DateOfBirth,
Address, Phone, …) with Accounts (ID, Currency, OpeningDate, Balance) and Holdings
(Symbol, Name, Type, Quantity). Order: ID, OrignDt, TrdDt, Acct, Side, Qty, Sym.
Security: ID, Symbol, Name, SecurityType, SecurityInformation (Sector, Industry,
Category, OutstShares, FundFamily, ExpenseRatio, TotalAssets, …), Price/LastTrade,
Ask/Bid, 50DayAvg, 200DayAvg.]

TPoX Data & Schema

  • FIXML: financial industry XML schema

  • CustAcc: modeled after a real banking system that uses XML

  • Security: information similar to investment web sites

Database schema for a non-DPF DB2 database:

  • create table custacc ( cadoc XML )

  • create table security ( sdoc XML )

  • create table order ( odoc XML )

  • Scale Factor “M”, 1 TB raw data

  • 500M Order documents, 50M CustAcc documents

  • 20,833 Securities, independent of scale factor

  • 3 Simple Tables + XML Indexes


TPoX Database Schema for DPF

  • Extract certain XML element values into relational columns as distribution keys

  • Goal: enable partitioning of both tables by a common key

[Diagram: the custid value is extracted from each CustAcc document, and the custid
and secsym values from each Order document.]

order table (500M rows):    custid INTEGER, secsym VARCHAR, odoc XML
custacc table (50M rows):   custid INTEGER, cdoc XML
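A hedged sketch of the corresponding DDL (the VARCHAR length and NOT NULL
constraints are assumptions); distributing both tables by custid lets orders and
customer accounts be joined collocated, per the common-key goal above:

  CREATE TABLE custacc (
    custid INTEGER NOT NULL,
    cdoc   XML
  ) DISTRIBUTE BY HASH (custid);

  -- "ORDER" is quoted because ORDER is a reserved word in SQL.
  CREATE TABLE "ORDER" (
    custid INTEGER NOT NULL,
    secsym VARCHAR(16),
    odoc   XML
  ) DISTRIBUTE BY HASH (custid);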

What is TPoX-DSS*?

  • Decision-support workload on top of the regular XML data of the TPoX benchmark

  • A set of complex SQL/XML queries

  • Includes massive table scans, aggregation, grouping, OLAP functions, etc.

  • Focus on single-user query response time

* We might come up with a better name in the near future.

Business Questions → Complex SQL/XML Queries

Q1: Popular Securities
    Find securities that have more shares bought than sold across all orders.
    List their order quantities grouped by year.

Q2: Top 10 Most Popular Trading Weeks, Ranked by Order Volume
    For each year, find the ten most active weeks and return the buy, sell, and
    total order volumes for each week.

Q3: Average Account Balance of Premium Customers
    Calculate the average account balance of all premium customers, grouped by
    their number of accounts.

Q4: Average Balance per Number of Accounts
    Calculate the average account balance of all customers, grouped by their
    number of accounts.

Q5: Percentage of Buy Orders per Sector and Gender
    For each stock in a given sector of securities, find the percentage of buy
    orders placed by male vs. female clients.

Business Questions → Complex SQL/XML Queries

Q6: Max Stock Orders for an Industry
    List the 20% (or: x%) most expensive orders for customers in a given state
    and for a given industry (subset of securities).

Q7: Order Amounts for Two Major Currencies
    Calculate the min, max, and avg order amount for all orders in a given
    timeframe, grouped by buy/sell, for two major currencies.

Q8: Order Amounts for All Currencies
    Calculate the min, max, and avg order amount for all orders in a given
    timeframe, grouped by buy/sell and the order's currency.

Q9: Balance per Currency
    Each account is in a specific currency. Calculate the average account
    balance for each currency.

Q10: Sleeping Customers
    Find all customers having less than x orders in a given timeframe.

TPoX DSS: Query Characteristics

Query  Name                                            Tables   Characteristics
Q1     Popular Securities                              O, S     2 x XMLTABLE, Group By, Order By
Q2     Top 10 Most Popular Trading Weeks               O        Full scan of all orders, OLAP function rank()
Q3     Average Account Balance of Premium Customers    C        Indexed access to premium customers, Group By, Order By
Q4     Average Balance per Number of Accounts          C        Full scan of all customers
Q5     Percentage of buy orders per sector and gender  C, O, S  Aggregation, SQL OLAP functions, 3 x XMLTABLE, 2 x XMLEXISTS
Q6     Max Stock Orders for an Industry                C, O, S  2 x XMLTABLE, 2 x XMLEXISTS
Q7     Order Amounts for Two Major Currencies          O        Several predicates, CASE expression
Q8     Order Amounts for All Currencies                O        4 aggregation functions, Group By two XML attributes
Q9     Balance per Currency                            C        Full scan of all accounts, aggregation and grouping
Q10    Sleeping Customers                              C, O     Common table expression

All queries available upon request, in SQL/XML notation.
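As an illustration of the Q10 shape (a hypothetical sketch, not the actual
benchmark query; the timeframe, threshold, and the @TrdDt attribute path are
assumptions):

  -- Count each customer's orders in a made-up timeframe via a common
  -- table expression, then keep customers below the threshold.
  WITH order_counts (custid, n) AS (
    SELECT o.custid, COUNT(*)
    FROM   "ORDER" o
    WHERE  XMLEXISTS('declare default element namespace
                        "http://www.fixprotocol.org/FIXML-4-4";
                      $ODOC/FIXML/Order[@TrdDt >= "2009-01-01"
                                        and @TrdDt <= "2009-06-30"]')
    GROUP BY o.custid
  )
  SELECT c.custid
  FROM   custacc c LEFT OUTER JOIN order_counts oc
         ON c.custid = oc.custid
  WHERE  COALESCE(oc.n, 0) < 3;   -- "less than x orders"; x = 3 is made up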

Q5: Percentage of buy orders per sector and gender

SELECT DISTINCT secsector, gender,
       SUM(ordqty) OVER (PARTITION BY secsector, gender) AS orderqty,
       SUM(ordqty) OVER (PARTITION BY secsector, gender) * 100 /
         SUM(ordqty) OVER (PARTITION BY secsector)       AS percentage
FROM security, order, custacc,
     XMLTABLE('declare namespace s="http://tpox-benchmark.com/security";
               $SDOC/s:Security'
              COLUMNS secsector VARCHAR(30) PATH '*:SecurityInformation//*:Sector',
                      secname   VARCHAR(50) PATH '*:Name') AS T1,
     XMLTABLE('declare default element namespace "http://www.fixprotocol.org/FIXML-4-4";
               $ODOC/FIXML/Order'
              COLUMNS ordqty BIGINT PATH '*:OrdQty/@Qty') AS T2,
     XMLTABLE('declare namespace c="http://tpox-benchmark.com/custacc";
               $CADOC/c:Customer'
              COLUMNS gender VARCHAR(10) PATH '*:Gender') AS T3
WHERE order.secsym = security.secsym
  AND order.custid = custacc.custid
  AND XMLEXISTS('declare namespace s="http://tpox-benchmark.com/security";
                 $SDOC/s:Security/s:SecurityInformation/*[s:Industry="OfficeSupplies"
                                                          and s:MinInitialInvestment=5000]')
  AND XMLEXISTS('declare default element namespace "http://www.fixprotocol.org/FIXML-4-4";
                 $ODOC/FIXML/Order[@Side = "2"]')
ORDER BY secsector, gender;

Agenda

  • Introduction

  • DB2 Scalability for OLTP and Data Warehousing

  • DB2's Database Partitioning Feature (DPF)

    • Overview
    • Data partitioning, clustering, placement
    • Join Methods

  • TPoX Scalability in a DPF database

    • Scalability vs. Performance
    • Benchmark configuration & results

  • pureScale Overview

  • Summary

Data Partitioning in a Cluster

Each node has 2 Intel Xeon 5169 dual-core CPUs and 32 GB RAM. With 4 cores per
node, we use 4 database partitions per node.

[Diagram: a cluster of 8 processing nodes. The tests use 8 database partitions
for 250 GB, 16 database partitions for 500 GB, and 32 database partitions for 1 TB.]

Scalability Results: Cluster

[Chart: TPoX/DSS query response times (Q1-Q10) for 250GB / 8 partitions,
500GB / 16 partitions, and 1TB / 32 partitions.
Source: IBM internally measured results, September 2009.]

Query response times for 500GB and 1TB are close to the 250GB results!

Agenda

  • Introduction

  • DB2 Scalability for OLTP and Data Warehousing

  • DB2's Database Partitioning Feature (DPF)

    • Overview
    • Data partitioning, clustering, placement
    • Join Methods

  • TPoX Scalability in a DPF database

    • Scalability vs. Performance
    • Benchmark configuration & results

  • pureScale Overview

  • Summary

DB2 pureScale

Goals

  • Unlimited Capacity
    • Any transaction processing or ERP workload
    • Start small
    • Grow easily, with your business

  • Application Transparency
    • Avoid the risk and cost of tuning your applications to the database topology

  • Continuous Availability
    • Maintain service across planned and unplanned events

Webcast: http://www.channeldb2.com/video/db2-purescale-a-technology

Web site: http://www.ibm.com/software/data/db2/linux-unix-windows/editions-features-purescale.html

DB2 pureScale: Technology Overview

[Diagram: clients connect through a single database view to multiple members;
each member runs cluster services (CS) and writes its own log, and all members
communicate over the cluster interconnect with a primary and a secondary PowerHA
pureScale facility, sharing storage access to one database.]

  • Clients connect anywhere and see a single database
    • Clients connect into any member
    • Automatic load balancing and client reroute may change the underlying
      physical member to which a client is connected

  • DB2 engine runs on several host computers
    • Members co-operate with each other to provide coherent access to the
      database from any member

  • Integrated cluster services
    • Failure detection, recovery automation, cluster file system
    • In partnership with STG (GPFS, RSCT) and Tivoli (SA MP)

  • Low-latency, high-speed interconnect
    • Special optimizations provide significant advantages on RDMA-capable
      interconnects (e.g. InfiniBand)

  • PowerHA pureScale technology
    • Efficient global locking and buffer management
    • Synchronous duplexing to the secondary ensures availability

  • Data sharing architecture
    • Shared access to the database
    • Members write to their own logs
    • Logs are accessible from another host (used during recovery)

Scale with Ease

  • Without changing applications
    • Efficient coherency protocols designed to scale without application change
    • Applications automatically and transparently workload-balanced across members

  • Without administrative complexity
    • No data redistribution required

  • To 128 members in the initial release
    • Limited by testing resources

[Diagram: a single database view over many DB2 members, each with its own log,
sharing one database.]

What is a PowerHA pureScale?

  • Software technology that assists in global buffer coherency management and
    global locking
    • Derived from System z Parallel Sysplex & Coupling Facility technology
    • Software based

  • Services provided include
    • Group Bufferpool (GBP)
    • Global Lock Management (GLM)
    • Shared Communication Area (SCA)

  • Members duplex GBP, GLM, and SCA state to both a primary and a secondary
    • Done synchronously
    • Duplexing is optional (but recommended)
    • Set up automatically, by default

[Diagram: each member holds its bufferpool(s), db2 agents and other threads, and
its log buffer, dbheap, and other heaps, with its own log; the GBP, GLM, and SCA
state is duplexed on a primary and a secondary facility; all members share a
single database (a single database partition).]

The Role of the GBP

  • GBP acts as a fast disk cache
    • Dirty pages are stored in the GBP, then later written to disk
    • Provides fast retrieval of such pages when needed by other members

  • GBP includes a "Page Registry"
    • Keeps track of which pages are buffered in each member and at what memory
      address
    • Used for fast invalidation of such pages when they are written to the GBP

  • Force-at-Commit (FAC) protocol ensures coherent access to data across members
    • DB2 "forces" (writes) updated pages to the GBP at COMMIT (or before)
    • GBP synchronously invalidates any copies of such pages on other members
    • New references to the page on other members will retrieve the new copy
      from the GBP
    • In-progress references to the page can continue

[Diagram: Client B on member 1 runs "update T1 set C1=X where C2=Y" and commits;
the updated page is forced to the GBP, whose page registry (tracking copies on M1
and M2) invalidates member 2's copy, so Client C's "select from T1 where C2=Y" on
member 2 retrieves the new page from the GBP.]

Stealth System Maintenance

  • Goal: allow DBAs to apply system maintenance without negotiating an outage
    window

  • Procedure:
    1. Drain (aka quiesce)
    2. Remove & maintain
    3. Re-integrate
    4. Repeat until done

  • Enables continuous availability

[Diagram: one member at a time is drained, removed for maintenance, and
re-integrated, while the remaining members keep serving the single database view.]

Achieving Efficient Scaling: Key Design Points

  • Deep RDMA exploitation over a low-latency fabric
    • Enables round-trip response times of ~10-15 microseconds

  • Silent invalidation
    • Informs members of page updates with no CPU cycles on those members
    • No interrupt or other message processing required
    • Increasingly important as the cluster grows

  • Hot pages available from GBP memory without disk I/O
    • RDMA and dedicated threads enable read-page operations in ~10s of
      microseconds

[Diagram: member lock managers and buffer managers communicate directly with the
GBP, GLM, and SCA.]

[Chart: scalability of transaction throughput.]

    Questions / Discussion

    mnicola@us.ibm.com

Backup Slides

    Features to Minimize Planned Outages

  • Backup: fast, scalable, granular (see the sketch after this list)
    • Online or offline
    • Fully parallel and scalable
    • Can be throttled
    • Partition-level backup
    • Table space-level backup
    • Full, incremental, or delta
    • Volume snapshot support

  • Load: fast, scalable, and granular
    • Fully parallel and scalable
    • Partition-level
    • Online load
    • Online index rebuild

  • Automatic log management

  • Other utilities
    • Online statistics collection
    • Online index create and reorganization
    • Online reorganization
    • Online inspect

  • Dynamic operations
    • Configuration parameters
    • Buffer pool operations
    • Container operations

  • Space management
    • Online container management
    • Automatic storage
    • Online index reorganization
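A hedged sketch of those backup options as CLP commands (the database name, table
space name, path, and priority value are assumptions):

  -- Online, compressed, throttled backup of the whole database:
  BACKUP DATABASE tpox ONLINE TO /backup/tpox COMPRESS UTIL_IMPACT_PRIORITY 50;

  -- Online incremental backup of a single table space:
  BACKUP DATABASE tpox TABLESPACE (ts_sales) ONLINE INCREMENTAL TO /backup/tpox;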

    Features to Minimize Unplanned Outages

  • Hardware failures
    • Integration with TSA cluster manager
    • Built-in redundancy can't be turned off
    • Consistency bits
    • Log mirroring
    • Automatic mirroring of critical data files
    • Support for RAID

  • Fast recovery
    • Continuous checkpointing
    • Parallel recovery
    • Automatic recovery tuning
    • Filtered recovery
    • Dynamic debugging capability

  • High availability
    • Clustering / failover support
    • Integrated with TSM
    • Automatic client reroute

  • Human and application errors
    • Point-in-time (PIT) recovery
    • Drop table recovery

  • Miscellaneous
    • Infinite active logging
    • Online container operations

    OLAP Optimization Advisor

  • InfoSphere Warehouse designs the aggregates to support dimensional analysis
    for you, using:
    • Statistics
    • Meta-data that describes the cubes (hierarchies, dimensions, measures, etc.)

  • Optimizes with an understanding of the trade-off between load times and
    query performance

    Universal Cubing Services Access

[Diagram: portals, web applications, dashboards, interactive reports, ad hoc
analysis, and common desktop tools reach InfoSphere Warehouse through universal
cube access (ODBO, XMLA), via IBM Cognos 8 BI, IBM DataQuant & DB2 QMF,
Microsoft Excel, and Cubeware Cockpit.]

    InfoSphere Warehouse Data Mining

[Diagram: data mining embedded into applications and processes. SOA processes, BI
analytical tools, web analytical apps, and the Mining Visualizer access DB2
InfoSphere Warehouse through a SQL interface; modeling builds models and scoring
returns results, with in-database data mining and SQL scoring functions over
structured and unstructured data.]

  • Enterprise-level data mining

  • High-speed, in-database scoring

    InfoSphere Warehouse Text Analytics

    • Analyze and extract structured data from text

    • Makes data available to normal reporting and analysis tools
    • From customer call center records, claim forms, etc.

  • Benefits
    • Target specific information hidden within text
    • Competitive edge by driving further business insight
    • Drives a greater ROI for your applications

  • Business value examples
    • Better product categorization
    • Early warning on customer attrition
    • Fraud detection
    • Product defect analysis
    • Better customer profiling

  • Simple text analysis capabilities for text columns stored in warehouse tables
    • Pattern-matching rules and simple linguistics
    • Enhance existing reports and data mining with insights gleaned from text
    • Simple rules and dictionary editor

    InfoSphere Warehouse Design Studio

  • Leverage and extend InfoSphere Data Architect:
    • Design and modify database physical models (schema & storage design, etc.)
    • Design and model OLAP objects
    • Design and model warehouse transformation and mining flows

  • Key features:
    • Database design, or reverse engineer an existing database or DDL (RDA)
    • View/modify the schema
    • Compare/sync DB objects
    • Analyze design (best practices and dependencies), validation
    • DB2 storage modeling: table space, buffer pool, partition
    • Generate script & deploy: on data models and flow models
    • Impact analysis: on data models and flow models

    What’s new in TPoX 2.0

    • TPoX 2.0 includes pervasive change to the benchmark

    • TPoX 2.0 test results not comparable to previous versions of TPoX

Data Generator

  TPoX V1.3 and earlier                 TPoX 2.0
  Based on Toxgene (3rd-party tool,     Complete rewrite: a single
  lack of support)                      Java-based program
  Slow (> 5 days for 1TB of data)       Fast (6 hours for 1TB of data)
  Can't generate dense account IDs      Account IDs are now dense for CUSTACC
  Large number of small XML files       Small number of larger files, each
                                        containing 50K XML documents

Data Distribution

                                        TPoX V1.3 and earlier   TPoX 2.0
  # of CUSTACC vs. # of ORDER           1:5                     1:10
  XML document size range               1-20KB                  1-23KB
  Account IDs of customers              Not dense               Dense
  Total XML document size of            Slightly less           Slightly larger
  "100GB" scale                         than 100GB              than 100GB
  Avg # of accounts per customer        1.5                     2.0

Workload and WorkloadDriver

  TPoX V1.3 and earlier                 TPoX 2.0
  Workload description file in a        Workload description file in XML
  proprietary format, hard to read      format, easy to read and create
  WorkloadDriver reads input            WorkloadDriver reads input documents
  documents from a large number         from a smaller number of larger files,
  of small files                        improving performance for reading
                                        XML input documents
  Update transactions U1, U5, and U6    Update transactions U1, U5, and U6
  select the account to update based    select the account to update based
  on customer ID                        on account ID

These changes have improved the performance of generating and consuming TPoX XML
data in large-scale TPoX benchmarks!

NOTE: please refer to the TPoX V2.0 Release Notes at
http://sourceforge.net/projects/tpox for more detail.
  • More information on XML data management in DB2 for Linux, UNIX, Windows and
    DB2 for z/OS: http://tinyurl.com/pureXML