
10g Automatic Workload Management

Erik Peterson
Real Application Clusters Development
Agenda

• Paradigm Shift
• AWM Background & Benefits
• AWM Blueprints
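Traditional Deployment - Silos
[Figure: separate silos for HR, DW, CRM, Retail, and Batch]
Daytime
[Figure: load on the HR, DW, CRM, Retail, and Batch silos]
Night – Xmas Season
[Figure: load on the HR, DW, CRM, Retail, and Batch silos]
Night – Payday
[Figure: load on the HR, DW, CRM, Retail, and Batch silos]
Grid - Daytime
[Figure: one grid shared by DW, HR, Retail, Batch, and CRM, with idle capacity]
Grid – Night Payday
[Figure: one grid shared by DW, Retail, Batch, HR, and CRM, with idle capacity]
Grid – Night Xmas
[Figure: one grid shared by DW, Retail, Batch, HR, and CRM, with idle capacity]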
What is Automatic Workload Management?
• AWM is an abstraction that customers use to divide work into logical workloads.
• Services are the first stage in AWM
• Each service represents a workload with
– common function
– common service level thresholds
– common priority
– (common resource footprint)
– e.g. mail server: IMAP, postman, garbage collector, monitor
Service Types
• Application services
– Functional
  Sessions using a function are grouped together,
  e.g. SAP dialog and update functions
– Data dependent
  Mapping of work requests to services occurs in the object
  relational mapping layer and TP monitors.
  Because the database is shared, ranges are fully dynamic
– TAF Pre-connect
• Internal services
– SYS$BACKGROUND
– SYS$USERS
Sample Work Sheet

Service       Usage           Priority  Response time (sec)  Preferred Instances           Available Instances
                                        warning, critical
ERP           Client service  1         0.5, 0.75            RAC01, RAC02                  RAC03, RAC04
CRM           Client service  2         0.5, 1.0             RAC03, RAC04                  RAC01, RAC02
SELF_SERVICE  Client service  2         1.0, 1.5             RAC01, RAC02, RAC03, RAC04    -
HOT_BATCH     Job scheduler   3                              RAC01                         RAC02, RAC03, RAC04
STD_BATCH     Job scheduler   4                              RAC01, RAC02, RAC03, RAC04    -
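
The preferred and available columns of the worksheet can be read as a placement rule. The following is a minimal sketch of that rule under a simplifying assumption: each failed preferred instance is replaced by one surviving available instance. The class and method names are hypothetical, and the real clusterware applies more policy than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model of one worksheet row (hypothetical names, not an Oracle API).
final class ServicePlacement {

    // Returns the instances that would offer the service, given the instances that are up.
    static List<String> place(List<String> preferred, List<String> available, Set<String> up) {
        List<String> running = new ArrayList<>();
        int failedPreferred = 0;
        for (String inst : preferred) {
            if (up.contains(inst)) {
                running.add(inst);
            } else {
                failedPreferred++;
            }
        }
        // Assumption: one surviving available instance stands in for each failed preferred one.
        for (String inst : available) {
            if (failedPreferred == 0) {
                break;
            }
            if (up.contains(inst)) {
                running.add(inst);
                failedPreferred--;
            }
        }
        return running;
    }
}
```

For the ERP row, losing RAC01 would leave the service on RAC02 and bring in RAC03 under this model.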
Creating Services

• Services are maintained in the data dictionary
• Each service has attributes
– globally unique name
– service thresholds (response time, CPU/service)
– resource consumer group (ratios or priorities)
• HA business rules are maintained in OCR
– preferred configuration for availability
– TAF policy
– Data Guard site role
Creating Services

• Oracle 10g single instance
– DBMS_SERVICE to create and administer services
• Oracle 10g RAC
– DBCA, NETCA or SRVCTL to create and administer services
• EM or PL/SQL to create thresholds, consumer groups, job classes, detail monitoring, traces
• See Configuration Guide
Using Services

Client-side usage
– TNS connect data, e.g. for JDBC or Easy Connect: scott/tiger@myservice
  TNS names for OCI/Net
  URL for thin JDBC, JDBC OCI
  or maintained in the Oracle Internet Directory

Server-side usage
– Job class definition
– PQ/PDML – inherited from the query coordinator
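
To make the client-side options concrete, here is a small sketch that builds the two common connect strings by service name. The host name, port, and class name are hypothetical; the URL and descriptor shapes follow the usual thin JDBC and Oracle Net forms, so check them against the documentation for your driver release.

```java
// Hypothetical helper: builds connect strings that name a service, not a SID or instance.
final class ServiceConnect {

    // Thin JDBC URL in the //host:port/service form.
    static String thinUrl(String host, int port, String service) {
        return "jdbc:oracle:thin:@//" + host + ":" + port + "/" + service;
    }

    // Oracle Net connect descriptor, as it would appear in a TNS names entry.
    static String tnsDescriptor(String host, int port, String service) {
        return "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=" + host + ")(PORT=" + port + "))"
             + "(CONNECT_DATA=(SERVICE_NAME=" + service + ")))";
    }
}
```

Because only the service name appears in the connect data, the application does not need to change when the service moves to another instance.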
Services are a unit for performance
• A new dimension for performance tuning
– workloads are visible and measurable
– tuning by “service and SQL” replaces “session and SQL” in most systems where sessions are shared.
– Performance measures for real transactions
• Alerts and actions when performance goals are violated
Automatic Workload Management
Provides Visibility
Service Measurement

Goal:
• Hands-free sharing of resources based on business rules
– response time, availability, and priority, not physical hardware and software limitations
• The workload measurement features are fully integrated with Oracle 10g
– in both single instance and RAC environments.
Service Measurement

• End users complain about
– Response time problems
– Throughput problems
for the services that they use.
• Need to measure the REAL user experience
– by service for important functions, transparently
– superior to using synthetic queries, session counts, run queue length
Service Thresholds and Alerts

• DBMS_SERVER_ALERT.SET_THRESHOLD
– SERVICE_ELAPSED_TIME
– SERVICE_CPU_TIME
– Warning and critical levels for observed periods
– Import from EM baselines
• Comparison of response time against accepted minimum levels
– a desire for the wall clock time to be, at most, a certain value.
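
The warning and critical levels amount to a simple comparison of observed elapsed time per call against two limits. The sketch below illustrates only that comparison; it is not how DBMS_SERVER_ALERT is implemented, and the class name and the use of seconds as the unit are assumptions for the example.

```java
// Hypothetical model of a service threshold with warning and critical levels.
final class ServiceThreshold {

    enum Level { NORMAL, WARNING, CRITICAL }

    private final double warningSec;
    private final double criticalSec;

    ServiceThreshold(double warningSec, double criticalSec) {
        if (warningSec > criticalSec) {
            throw new IllegalArgumentException("warning level must not exceed critical level");
        }
        this.warningSec = warningSec;
        this.criticalSec = criticalSec;
    }

    // Classifies one observed elapsed time per call, in seconds.
    Level classify(double elapsedSecPerCall) {
        if (elapsedSecPerCall >= criticalSec) {
            return Level.CRITICAL;
        }
        if (elapsedSecPerCall >= warningSec) {
            return Level.WARNING;
        }
        return Level.NORMAL;
    }
}
```

With the ERP worksheet values (0.5 warning, 0.75 critical), an observed 0.1940 seconds per call classifies as NORMAL.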
Example Service Metrics

Service time - current
NAME                         ELA(s)/CALL  CPU(s)/CALL
---------------------------  -----------  -----------
ERP                               0.1940       0.0082

Service time - history every 60 seconds
NAME                         ELA(s)/CALL  CPU(s)/CALL
---------------------------  -----------  -----------
ERP                               0.1940       0.0082
                                  0.2046       0.0085
                                  0.2154       0.0093
                                  0.2248       0.0105
                                  0.2160       0.0097
                                  0.2185       0.0104
                                  0.2211       0.0104
30 service, module, action statistics
• user calls
• DB time – response time
• DB CPU – CPU/service
• parse count (total)
• parse time elapsed
• parse time cpu
• execute count
• sql execute elapsed time
• sql execute cpu time
• opened cursors cumulative
• session logical reads
• physical reads
• physical writes
• redo size
• user commits
• workarea executions - optimal
• workarea executions - onepass
• workarea executions - multipass
• session cursor cache hits
• user rollbacks
• db block changes
• gc cr blocks received
• gc cr block receive time
• gc current blocks received
• gc current block receive time
• cluster wait time
• concurrency wait time
• application wait time
• user I/O wait time
Useful Service Views

• Service status in V$ACTIVE_SERVICES, DBA_SERVICES, V$SESSION, V$ACTIVE_SESSION_HISTORY
• Service performance in V$SERVICE_STATS, V$SERVICE_EVENTS, V$SERVICE_WAIT_CLASSES, V$SERVICEMETRIC, V$SERVICEMETRIC_HISTORY
• Service, MODULE, and ACTION performance in V$SERV_MOD_ACT_STATS.
AWR Automatically Measures Service
• AWR measures response time, resource used
– Automatically for work done in every service
• AWR monitors thresholds, sends AWR alerts
– response time, CPU used
– maintains runtime history every 30 minutes
• Statistics collection and tracing are HA
– persistent across service relocation and instance restart.
– enabled and disabled globally for RAC.
Service, Module, Action

• Each service can be qualified further by MODULE, ACTION to identify important operations
– a user-explicable unit for measuring response time and resource consumption.
– NO SQL*NET round trip using OCI in 10g
• DBMS_MONITOR.SERV_MOD_ACT_STAT_ENABLE
– statistics and tracing for services, modules and actions
Automatic Workload Management
Performance Tracking

• Set MODULE / ACTION using JDBC
• Note: no extra message exchange with server
– setting “bundled” with call

// Requires oracle.jdbc.OracleConnection; ds is an existing DataSource.
String[] metrics = new String[OracleConnection.END_TO_END_STATE_INDEX_MAX];
metrics[OracleConnection.END_TO_END_MODULE_INDEX] = "myModule";
metrics[OracleConnection.END_TO_END_ACTION_INDEX] = "myAction";
OracleConnection conn = (OracleConnection) ds.getConnection();
conn.setEndToEndMetrics(metrics, (short) 0);  // second argument is the sequence number
Automatic Workload Management
Performance Tracking

• Set MODULE / ACTION using OCI
• No extra message exchanged – “bundled”
OCIAttrSet(session, OCI_HTYPE_SESSION, (dvoid *) "set salary",
           (ub4) strlen("set salary"), OCI_ATTR_ACTION, error_handle);

• Set MODULE / ACTION using PL/SQL
• Does require extra message exchanges
DBMS_APPLICATION_INFO.SET_MODULE(
  module_name => 'add_employee',
  action_name => 'record contact info');
Services are a unit for availability
• Services are recovered and operated
– fast, independently, in parallel, and according to business rules
– when instances are later repaired, services not running are automatically restarted
– as soon as a service changes state, a callout can trigger application recovery and load balancing
– no need to start entire software stacks afresh
Services are a unit for management
• Single system image for workloads
• Full location transparency
– number of instances is transparent to the application
• Each workload is managed in isolation
– configured, administered, enabled, disabled, measured
– Rolling changes by workload
• Prioritization for workloads
Everything switches to service

• Data dictionary maintains services
• AWR measures performance of services
• Database resource manager uses service in place of users for priorities
• Job scheduler, PQ, and Streams queues run under services
• RAC keeps services available within a site
• Data Guard Broker with RAC keeps primary services available across sites
Database Resource Manager

• DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING
– Automatically sets consumer groups for services at connect time.
– alter session to change consumer groups within service
• Using ratios
– e.g. two-thirds of resource to payroll and one-third to CRM.
• Using priorities
– satisfy the highest priority services first, followed by the next priority services, and so on.
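
The difference between the ratio and priority approaches can be sketched with a little arithmetic. The code below is an illustrative model only, not the Resource Manager algorithm; the class name, the capacity units, and the demand figures used with it are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the two allocation styles named on the slide.
final class ResourcePlanSketch {

    // Ratio plan: split capacity in proportion to each service's share.
    static Map<String, Double> byRatio(double capacity, Map<String, Integer> shares) {
        int total = 0;
        for (int s : shares.values()) {
            total += s;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("shares must sum to a positive number");
        }
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : shares.entrySet()) {
            out.put(e.getKey(), capacity * e.getValue() / total);
        }
        return out;
    }

    // Priority plan: demands are listed highest priority first;
    // each service takes what it needs from whatever capacity remains.
    static Map<String, Double> byPriority(double capacity, Map<String, Double> demandsInPriorityOrder) {
        double remaining = capacity;
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : demandsInPriorityOrder.entrySet()) {
            double granted = Math.min(remaining, e.getValue());
            out.put(e.getKey(), granted);
            remaining -= granted;
        }
        return out;
    }
}
```

With shares of 2 for payroll and 1 for CRM, byRatio gives payroll two-thirds of the capacity. With byPriority, a lower priority service receives nothing once higher priority demand has consumed the capacity.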
Service High Availability using RAC 10g
• Focus is on protecting the application services
– more flexible and more cost effective than HA solutions that focus on the availability of single physical systems
• Services are available continuously with load shared across one or more instances.
• Any server in the RAC can offer services
– in response to failures
– in response to planned maintenance
– in response to runtime demands
Handling Planned Outages

A rolling patch needs to be applied to each instance.
Use SRVCTL for all operations.
1. Relocate services running at instance-1, optionally force the sessions to disconnect.
2. Stop instance-1. Optionally disable.
3. Change instance-1 to use the patched environment and complete tests to verify correctness.
4. Enable, if disabled. Start instance-1.
5. Repeat for each other instance.
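
The steps above can be sketched as a command plan for one instance. This is a hedged illustration: the database, instance, and service names are hypothetical, the optional disable and enable steps are left out, and the SRVCTL option letters follow the 10g forms as commonly documented, so verify them against the SRVCTL reference for your release before use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: lists the commands for patching one instance in a rolling fashion.
final class RollingPatchPlan {

    static List<String> forInstance(String db, String instance, String target, List<String> services) {
        List<String> cmds = new ArrayList<>();
        // Step 1: move each service off the instance to be patched.
        for (String svc : services) {
            cmds.add("srvctl relocate service -d " + db + " -s " + svc
                   + " -i " + instance + " -t " + target);
        }
        // Step 2: stop the instance.
        cmds.add("srvctl stop instance -d " + db + " -i " + instance);
        // Step 3: manual work, recorded as a comment line in the plan.
        cmds.add("# switch " + instance + " to the patched environment and verify");
        // Step 4: start the instance again.
        cmds.add("srvctl start instance -d " + db + " -i " + instance);
        return cmds;
    }
}
```

Step 5 corresponds to calling forInstance once per instance, choosing a relocation target that is still running each time.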
End to End HA

• Fast, out of band notification of service changes
– service, database, instance, hostname, status, reason, timestamp
– 3 states - up, down, not restarting
• Built into JDBC connection pools
• Application callouts for everyone
End to End HA - Example

• Instance 1 fails, taking the ERP service down with it
Event 1 : instance DOWN
Action – Log a fault ticket for instance 1
Event 2 : service ERP DOWN at instance 1
Action – Clean up sessions using ERP at instance 1
• Service ERP starts at instance 3
Event 3 : service ERP UP at instance 3
Action – Balance sessions using ERP to start using instance 3
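
The event-to-action mapping in this example can be sketched as a small callout handler. The key=value payload format and all names below are assumptions made for illustration; the actual event payload delivered to callouts should be taken from the Oracle documentation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical callout handler: maps an HA event to the action from the example.
final class HaEventSketch {

    // Parses a whitespace-separated list of key=value pairs (assumed format).
    static Map<String, String> parse(String payload) {
        Map<String, String> fields = new HashMap<>();
        for (String token : payload.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                fields.put(token.substring(0, eq).toLowerCase(), token.substring(eq + 1));
            }
        }
        return fields;
    }

    // Chooses an action; an event without a service field is treated as an instance event.
    static String action(Map<String, String> event) {
        String status = event.getOrDefault("status", "").toLowerCase();
        String instance = event.getOrDefault("instance", "?");
        String service = event.get("service");
        if (service == null) {
            return status.equals("down") ? "Log a fault ticket for " + instance : "No action";
        }
        switch (status) {
            case "down":
                return "Clean up sessions using " + service + " at " + instance;
            case "up":
                return "Rebalance sessions using " + service + " toward " + instance;
            default:
                return "No action";
        }
    }
}
```

A real handler would also act on the "not restarting" state; the example on the slide does not define an action for it, so the sketch leaves it as a no-op.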
AWM Blueprints
Retailer

Challenge
• Large DW Box
• Small OLTP Box
• Need HA for OLTP
Retailer
DW & OLTP on mixed sized nodes
[Figure: Node-1 (16 CPUs) runs DW and holds the OLTP failover target (OLTP’); Node-2 (4 CPUs) runs OLTP]
Only OLTP will fail over to DW

Game Company – Login System

Challenge
• Need Session Stickiness
• Want Full Use of Resources
Game Company
Stickiness Sets

           Node-1      Node-2      Node-3      Node-4
Initial    A, B, C     D, E, F     G, H, I     J, K, L
Alternate  D’, G’, J’  A’, H’, K’  B’, E’, L’  C’, F’, I’

Sessions kept on same instance
All nodes fully utilized
Stock Exchange

Challenge
• 3 critical applications
• Want to share spare capacity
Stock Exchange
Singleton Services

           Node-1     Node-2     Node-3     Node-4
Service A  Preferred  0-100%     0-100%     Alt 0-100%
Service B  0-100%     Preferred  0-100%     Alt 0-100%
Service C  0-100%     0-100%     Preferred  Alt 0-100%
           0-100%     0-100%     0-100%

Gains: Dedicated Servers, Shared Alternative


Manufacturer - DW

Challenges
• Consolidation effort of 4 different systems
• Critical reports getting affected by Ad Hoc Queries
• Users starting reporting before aggregation has finished
Manufacturer - DW
Services by Priority
Shared Nodes

                              Node-1  Node-2  Node-3  Node-4
Critical           P1 (50%)   0-100%  0-100%  0-100%  0-100%
Standard           P2 (30%)   0-100%  0-100%  0-100%  0-100%
Adhoc              P3 (100%)  0-100%  0-100%  0-100%  0-100%
Loads/Aggregation  P2 (30%)   0-100%  0-100%  0-100%  0-100%

Gains: Manages Priorities, Visibility of Use, Can turn off reporting during aggregation
Manufacturer DW
Implement Modules for Further Visibility

• Finance
• Services
• Sales

Gains: Greater visibility & control


Manufacturer DW
EM Screen View
[Figure: EM screen with two panels.
 Top Services: critical, standard, adhoc, load.
 Top Modules (by Service): sales (standard), finance (critical), services (critical), sales (adhoc), finance (standard), finance (adhoc), finance (load), sales (load).]

Gains: Greater visibility & control


Hosting Provider

Challenges
• Need to provide any application with the power it needs at the time it needs it
• Specific usage for any one application varies strongly
• Dedicated servers are not cost effective
Hosting Provider
Hosting GRID
Shared Nodes

           Node-1  Node-2  Node-3  Node-4  Node-5  Node-6  Node-7  Node-8
Service-A  0-100%  0-100%  0-100%  0-100%  TBD     TBD     TBD     TBD
Service-B  TBD     TBD     TBD     0-100%  0-100%  0-100%  TBD     TBD
Service-C  TBD     TBD     TBD     TBD     TBD     TBD     0-100%  0-100%
Service-D  0-100%  0-100%  0-100%  0-100%  0-100%  0-100%  0-100%  0-100%

8 nodes (8 shared): RAC per service on 8 nodes.
(32 nodes needed without grid)
Getting the best out of Oracle - Configuration
• Plan your services
– application to service, data range to service
– global name, HA configuration, priority, response time
• Use service: not SID, not Instance, not Host
– Use service to connect
– Use virtual IP for database access
– Use cluster alias to eliminate address lists.
• Use service for jobs and PQ.
Getting the best out of Oracle - Runtime
• Make applications measurable
– instrument with MODULE and ACTION
– use DBMS_MONITOR to gather statistics
• For priorities – use Resource Manager
• For load balancing
– use CLB to balance connections by service.
– use service metrics to “deal requests” from mid-tier connection pools by service.
Getting the best out of Oracle - Recovery
• Use JDBC connection pools for fast failover.
– Surviving sessions continue FAST.
– Interrupted sessions detect the error FAST.
• Use TAF callbacks to trap and handle errors.
• Use HA callouts/events (up, down, not restarting) to notify the application to take appropriate action.
– Save and recall non-transactional state.
– Check transaction outcome and resubmit.
