Platinum Services Fault Monitoring
What to Expect
Contents
Document Objective
Overview
Remote Fault Monitoring
Fault Monitoring Framework
Key Components of the Gateway
Managing the OASG
Oracle Advanced Support Portal
Fault Monitoring Details
Customer Requirement and Obligations
Fault Monitoring Roles and Responsibilities
Activities
Oracle Platinum Services Fault Monitoring Implementation Prerequisites
Oracle Platinum Services Fault Monitoring Implementation
Oracle Platinum Services Fault Monitoring Event for Oracle Exadata and Oracle Zero Data Loss Recovery Appliance
Activity
Who
When
Oracle Platinum Services Fault Monitoring Events for Oracle SuperCluster, Oracle Exalogic, and Oracle Exadata
Appendix I Oracle Platinum Services Fault Monitoring Events
ASR Fault Events
OEM Fault Events
*Related to Oracle Database component only
Appendix II Description of Common For-Fee Monitoring Items
Appendix III Access Requirements
Oracle Access to Data
Appendix IV Process Flow Diagrams
High-level Process Flow for Oracle SuperCluster, Oracle Exalogic, and Oracle Exadata
Overview
Remote Fault Monitoring
Remote Fault Monitoring, referred to as Fault Monitoring in this document, is a deliverable of Oracle
Platinum Services. Oracle Platinum Services remotely monitors for faults in the hardware, database,
operating system and networking components of Certified Platinum Configurations twenty-four (24)
hours per day, seven (7) days per week and provides a mechanism to trigger the creation of a Service
Request (SR) on behalf of the customer. Fault Monitoring is subject to the Oracle Platinum Services
Technical Support Policy.
Fault Monitoring focuses on helping you maintain system and component functionality. Oracle
determines whether an event constitutes a fault. For a list of Oracle Platinum Services fault monitoring
events, please see Appendix I.
You may purchase additional monitoring services for a fee. Examples of for-fee monitoring include, but
are not limited to, performance, availability and capacity monitoring. For a description of each, please see
Appendix II.
One gateway can monitor multiple Engineered Systems (for example, up to eight (8) Full Rack
SuperCluster machines) as long as they are network-accessible and the network connection between the
OASG and the Engineered System is reliable with low latency. In conjunction with the Oracle
Continuous Connection Network (OCCN) transport layer, the OASG establishes secure connectivity to
Oracle via SSL. Learn more about gateway security by watching this video.
Oracle Enterprise Manager: Oracle Enterprise Manager (OEM) is the standard tool for
monitoring and managing Oracle products. With Oracle Platinum Services Fault Monitoring,
OEM is the primary tool for detecting software faults. The OEM software included with the OASG
also includes rule-based fault detection functionality that automatically creates a Service Request
(SR) and uploads related diagnostics, when available, upon detection of critical OEM issues with
Exadata and Recovery Appliance. A client-side OEM agent is installed on the Certified Platinum
Configuration as a communication mechanism with OEM.
Oracle Auto Service Request: Oracle Auto Service Request (ASR) is used to detect hardware
faults and automatically create the associated SR. ASR detects faults in compute nodes, storage
cells, and their Oracle Integrated Lights Out Managers (ILOM). For more information on ASR,
see the Auto Service Request (ASR) documentation.
Oracle Configuration Manager: Oracle Configuration Manager (OCM) captures Engineered
System configuration information and uploads the data to My Oracle Support. The configuration
data is extracted and uploaded every twenty-four (24) hours and is analyzed by Oracle Support.
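The OEM rule-based fault detection described above (a critical issue on Exadata or Recovery Appliance creates an SR, and related diagnostics are uploaded when available) can be pictured as a predicate plus a payload builder. This is a minimal sketch under stated assumptions: the `Event` fields, the target-type strings, the severity value `"Critical"`, and the `build_sr_request` helper are all hypothetical and do not represent OEM's rule engine or any Oracle API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical target types covered by the automatic SR rule.
SR_TARGET_TYPES = {"Exadata", "Recovery Appliance"}


@dataclass
class Event:
    """A simplified OEM event (hypothetical shape, for illustration only)."""
    target: str
    target_type: str
    severity: str  # e.g. "Critical" or "Warning"
    message: str
    diagnostics: list[str] = field(default_factory=list)  # collected files, if any


def build_sr_request(event: Event) -> Optional[dict]:
    """Return a hypothetical SR payload if the event matches the rule, else None."""
    if event.severity != "Critical" or event.target_type not in SR_TARGET_TYPES:
        return None
    return {
        "target": event.target,
        "summary": event.message,
        # Diagnostics are attached only when they are available (may be empty).
        "attachments": list(event.diagnostics),
    }
```

Keeping the rule as a pure function makes it easy to test against sample events without any connection to OEM or My Oracle Support.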
The OASG will be monitored, managed, and maintained by Oracle remotely via the OCCN connection.
Oracle monitors the entire event flow, from the OEM agent installed on the Certified Platinum
Configuration to the OCM Collections housed at Oracle. This ensures that Oracle is alerted to any breakdown
in communication between components or to any software failure, including issues with OEM itself.
Oracle Platinum Services leverages OEM to monitor OASG system resources such as disk, memory,
and CPU. If the OASG is running on Oracle-owned hardware, Oracle Platinum Services will leverage
ASR to monitor the key components of the hardware and engage Oracle Support accordingly.
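Monitoring the event flow end to end amounts to checking that every hop in the chain has reported recently enough. A minimal sketch, assuming each hop records a last-seen timestamp: the hop names and the 5-minute windows are illustrative assumptions, not Oracle thresholds; only the 24-hour window mirrors the OCM upload interval stated above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical chain of hops and the maximum silence tolerated for each.
MAX_AGE = {
    "oem_agent": timedelta(minutes=5),
    "oasg": timedelta(minutes=5),
    "ocm_collection": timedelta(hours=24),  # OCM uploads every 24 hours
}


def stale_hops(last_seen: dict[str, datetime], now: datetime) -> list[str]:
    """Return the hops that have not reported within their allowed window.

    A hop with no recorded timestamp is treated as stale, so a component
    that never reported is flagged just like one that stopped reporting.
    """
    stale = []
    for hop, max_age in MAX_AGE.items():
        seen = last_seen.get(hop)
        if seen is None or now - seen > max_age:
            stale.append(hop)
    return stale


# Example: the gateway has been silent for 30 minutes.
now = datetime(2017, 2, 1, 12, 0, tzinfo=timezone.utc)
print(stale_hops({"oem_agent": now, "oasg": now - timedelta(minutes=30),
                  "ocm_collection": now - timedelta(hours=2)}, now))
```

Checking every hop, rather than only the final destination, is what lets a breakdown be attributed to a specific point in the chain.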
The OASP provides a view of your configuration items, incident management, change management, user
account management, and reporting.
See the OASP Quick Reference Guide or the Oracle Advanced Support Portal Demo for more
information. Sample fault event telemetry and sample configuration item details visible to the customer
can be found in Appendix VIII.
For additional details on required firewall ports, please see the Oracle Advanced Support Gateway
Security Guide. For additional details on access requirements, please see Appendix III.
Note: Without a continuous inbound connection, Oracle cannot validate faults, which voids the
15-minute resolution and 30-minute joint debugging target response times of Oracle Platinum Services.
1 sudo allows a user to execute a command or process with the privileges of another user, typically the
superuser or root, without having to grant full access to those privileged accounts.
Updated: February 1, 2017 Page 7 of 34 Author: Oracle
Copyright 2017, Oracle and/or its affiliates. All rights reserved.
Fault Monitoring Roles and Responsibilities
Oracle Platinum Driver: Oracle assigns each customer a Platinum Driver to provide key information
during the customer's consideration of Platinum Services. The goal is to verify that the customer is fully
qualified, fully understands the requirements and responsibilities of the service, is committed to
Platinum Services, and completes the prerequisites before implementation begins. Once implementation
of Platinum Services is underway, the Platinum Driver may engage the customer to help resolve delays
and to ensure that customer expectations are well managed and met.

Oracle Implementation Engineer (IE): The Implementation Engineer is the primary point of contact and
technical manager for customers during the Oracle Platinum Services implementation. From receiving
ownership of the Platinum Implementation SR (PISR) until handover to the delivery organization, the IE
acts as the technical project lead and remotely installs all technical aspects of the fault monitoring,
Oracle Auto Service Request (ASR), and Oracle Configuration Manager (OCM) solution. The IE is also
responsible for coordinating the resources and activities to deliver and install the Engineered System.

Oracle Platinum Control Center: The Oracle Platinum Control Center is responsible for fault event
management after a fault is detected, including managing faults in the OASP, fault notification, and SR
creation.

Customer Contact: Customer contact(s) are notified of verified fault events received by Oracle Platinum
Services. Notification is by email only and can be sent to individuals or to an alias.

Oracle Field Engineer: The Oracle Field Engineer (FE) is responsible for the Oracle Platinum hardware
gateway installation (on Oracle hardware), OASG installation, and Platinum connectivity to Oracle.

Customer Platinum Manager: The customer assigns an employee or contractor to fill the Customer
Platinum Manager role. The Customer Platinum Manager is the point of contact (POC) for Oracle and is
responsible for coordinating customer resources, installation-related activities (for example, opening
firewall ports), and the decisions needed for a smooth implementation. This POC is also responsible for
integration with customer processes and for meeting the planned Go Live schedule. Additional
responsibilities include managing customer stakeholder decisions and, when necessary, consulting within
the company to acquire expertise for service integration: network expert(s), security expert(s), and the
target system owner(s).
The following Oracle Platinum Services monitored faults, determined by Oracle, are standard and not
subject to customization.
System columns: Exadata, Exalogic, SuperCluster, Zero Data Loss Recovery Appliance, ZS3-ES and ZS4-4 Racked System

Item  Item Name                                               Systems   Description
7     ZFS Cluster                                             x         Detection of standby node failure
9     ZFS # spare disks available                             x         Detection of spare disks availability
8     Voting Disk Alert*                                      x x x     CRS-160(4|5|6)
11    OCR Alert*                                              x x x     CRS-(1006|1008|1010|1011|1009)
12    Oracle High Availability Service Alert*                 x x x     CRS-(1202|1402|1602|1603)
13    CRS Resource Alert*                                     x x x     CRS-120(3|5|6)
14    Generic Incident*                                       x x x x   ORA-(227|239|240|255|445|494|3137|4036|24982|25319|29770|29771|32701|32703|32704|56729)
15    Cluster Error*                                          x x x     ORA-29740
17    Generic Internal Error (Exadata Storage Cell and DB)*   x x x x   ORA-600
20    Out of Memory*                                          x x x x   ORA-403(0|1)
21    File Access Error*                                      x x x x   ORA-376
22    Deadlock (System)*                                      x x x x   ORA-4020
25    Media failure*                                          x x x x   ORA-(1242|1243)
27    Recovery Appliance task failure                         x         ORA-(45168|45111)
28    Recovery Appliance timer failure                        x         ORA-45169
29    Recovery Appliance metadata corruption                  x         ORA-45109
30    Corruption in backup piece                              x         ORA-(45132|45167)
31    Corruption in backup data                               x         ORA-45165
35    Module1 Phase2 Threshold Evaluation                     x x x x   PDU
36    Module1 Phase1 Threshold Evaluation                     x x x x   PDU
37    Module1 Phase3 Threshold Evaluation                     x x x x   PDU

*Related to Oracle Database component only
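Most descriptions in the table above are error-code patterns; for example, CRS-160(4|5|6) covers CRS-1604, CRS-1605, and CRS-1606. The sketch below shows how a log line could be matched against the code-based items (the ZFS and PDU items are omitted). It is illustrative only: it assumes codes appear in the usual ORA-/CRS- form, possibly zero-padded (ORA-00600), and `classify` is a hypothetical helper, not the way OEM evaluates these events.

```python
import re

# Code-based events from the table, rewritten as (facility, numeric codes).
FAULT_EVENTS = {
    "Voting Disk Alert": ("CRS", {1604, 1605, 1606}),
    "OCR Alert": ("CRS", {1006, 1008, 1009, 1010, 1011}),
    "Oracle High Availability Service Alert": ("CRS", {1202, 1402, 1602, 1603}),
    "CRS Resource Alert": ("CRS", {1203, 1205, 1206}),
    "Generic Incident": ("ORA", {227, 239, 240, 255, 445, 494, 3137, 4036, 24982,
                                 25319, 29770, 29771, 32701, 32703, 32704, 56729}),
    "Cluster Error": ("ORA", {29740}),
    "Generic Internal Error": ("ORA", {600}),
    "Out of Memory": ("ORA", {4030, 4031}),
    "File Access Error": ("ORA", {376}),
    "Deadlock (System)": ("ORA", {4020}),
    "Media failure": ("ORA", {1242, 1243}),
    "Recovery Appliance task failure": ("ORA", {45168, 45111}),
    "Recovery Appliance timer failure": ("ORA", {45169}),
    "Recovery Appliance metadata corruption": ("ORA", {45109}),
    "Corruption in backup piece": ("ORA", {45132, 45167}),
    "Corruption in backup data": ("ORA", {45165}),
}

# Matches codes such as "ORA-00600" or "CRS-1604". Leading zeros are skipped
# so zero-padded message codes compare equal to the table's values, and the
# trailing word boundary keeps "ORA-6000" from matching ORA-600.
CODE_RE = re.compile(r"\b(ORA|CRS)-0*(\d+)\b")


def classify(message: str) -> list[str]:
    """Return names of listed fault events whose codes appear in the message."""
    hits = []
    for facility, number in CODE_RE.findall(message):
        code = int(number)
        for name, (event_facility, codes) in FAULT_EVENTS.items():
            if event_facility == facility and code in codes and name not in hits:
                hits.append(name)
    return hits


print(classify("ORA-00600: internal error code"))
```

Comparing integer codes instead of raw strings avoids having to reproduce each alternation pattern as a regular expression.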
Performance Monitoring: Measures IT service components against agreed-upon metrics and
thresholds.
Availability Monitoring: Measures the availability of key IT infrastructure components against
a defined availability target.
Capacity Monitoring: Measures resource utilization and performance against the defined
capacity plan, with the ability to adjust based on changing demand.
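Availability monitoring comes down to simple arithmetic: measured availability is uptime divided by total time, compared with the target. The sketch below is illustrative only; the function names are hypothetical, and the 99.9% target and 30-day window used in the example are sample values, not Oracle service levels.

```python
def availability_pct(total_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the period during which the component was available."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime(total_minutes: float, target_pct: float) -> float:
    """Minutes of downtime an availability target permits over the period."""
    return total_minutes * (100.0 - target_pct) / 100.0


def meets_target(total_minutes: float, downtime_minutes: float, target_pct: float) -> bool:
    """True if measured availability is at or above the target."""
    return availability_pct(total_minutes, downtime_minutes) >= target_pct


month = 30 * 24 * 60  # 43,200 minutes in a 30-day month
print(allowed_downtime(month, 99.9))
print(meets_target(month, 40, 99.9))
```

For a 30-day month (43,200 minutes), a 99.9% target permits about 43.2 minutes of downtime, so 40 minutes of downtime meets it and 50 minutes does not.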
Integrated Lights Out Manager
  root          Yes  Yes  x x x x x   To set SNMP parameters and create the orarom monitoring account
  orarom        Yes  Yes  x x x x x   Ongoing monitoring. This account is created during the setup by Oracle.

Compute/DB hosts
  root          Yes  Yes  x x x x     Required for implementing the solution, creating the orarom user, and configuring monitoring
  orarom        Yes  Yes  x x x x     Ongoing monitoring; primary owner of the OEM agent. This account is created during the setup by Oracle.

                                      Define SNMP parameters
  cellmonitor   Yes       x x x       Ongoing monitoring

ASM
  asmsnmp       Yes  Yes  x x x       To configure ASM monitoring from OEM and for ongoing monitoring

DBMS
  dbsnmp        Yes  Yes  x x x       To configure DB monitoring for OEM, ongoing monitoring, and configuration data collection

IB Switches
                                      Define SNMP parameters
  nm2user       Yes  Yes  x x x x     To monitor InfiniBand switches

  enable        Yes       x x x x     To define SNMP parameters

PDUs
  Admin         Yes  Yes  x x x x     To define SNMP parameters

ZFS
  root          Yes  Yes  x x x       To create shares for agent installation (Exalogic only) and to run the workflow to enable OEM monitoring
  orarom        Yes  Yes  x x x       Created during installation and assigned to the agent role, which is used for ongoing monitoring

Control VMs for Exalogic release 2.0.6.x.x
  root          Yes  Yes  x

Ops Center VM and Exalogic OVMM VM for release 2.0.4.x.x
  root          Yes  Yes  x

Domains & Zones
  root          Yes  Yes  x

Recovery Appliance (Admin)
  rasys         Yes  No   x           Ongoing monitoring

Recovery Appliance
  root          Yes  Yes  x           Initial activation and one-time SSH communication between nodes
Description
Hostname: xyzdb02
Product Type: EM ASR PRODUCT
Summary:SASR:ORA-600 - This is an automated database error on an Exadata System
Hardware Component:
Name:NA
Id:NA
Description
Hostname: xyzdb01
Product Type: ORCL,SPARC-T4-4
Summary:ASR:Memory module correctable errors exceeding acceptable levels.
The number of correctable errors associated with this memory module has exceeded
Message-ID: SUN4V-8002-3R
UUID: [UUID #]
Time: Jun 9, 2014 6:44 AM (UTC)
Severity: Major
FRU = hc://:chassis-mfg=unknown:chassis-name=ORCL,SPARC-T4-4:chassis-part=7020893:chassis-serial=[chassis serial #]:fru-serial=[fru-serial #]:fru-part=07020578,HMT42GR7BMR4A-G/chassis=0/cpuboard=0/dimm=8
Part number = 07020578,HMT42GR7BMR4A-G
Certainty = 95
Class = fault.memory.dimm-page-retires-excessive
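To work with telemetry like the samples above programmatically, its "Key: value" and "Key = value" lines can be split into fields. This is a sketch based only on these samples: the field layout is inferred from them and is not a documented Oracle format, the helper names are hypothetical, and the FRU parser assumes the hc:// authority/path shape seen in the FRU line (colon-separated attributes, then slash-separated components).

```python
from typing import Optional


def split_field(line: str) -> Optional[tuple[str, str]]:
    """Split a line at whichever of ':' or ' = ' occurs first."""
    positions = [(line.find(sep), sep) for sep in (":", " = ") if sep in line]
    if not positions:
        return None  # free-text line, e.g. a wrapped summary
    idx, sep = min(positions)
    return line[:idx].strip(), line[idx + len(sep):].strip()


def parse_telemetry(text: str) -> dict[str, str]:
    """Collect key/value fields; lines without a separator are skipped."""
    fields = {}
    for line in text.splitlines():
        pair = split_field(line)
        if pair and pair[0]:
            fields[pair[0]] = pair[1]
    return fields


def parse_hc_fmri(fmri: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split an hc:// FRU string into authority attributes and path components."""
    prefix = "hc://"
    if not fmri.startswith(prefix):
        raise ValueError("not an hc:// FRU string")
    # Everything before the first '/' is the authority; the rest is the path.
    authority, _, path = fmri[len(prefix):].partition("/")
    auth = dict(item.split("=", 1) for item in authority.split(":") if "=" in item)
    comps = dict(item.split("=", 1) for item in path.split("/") if "=" in item)
    return auth, comps
```

Choosing the earliest separator matters because values such as "ASR:Memory module..." and "hc://..." contain colons of their own.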
For a demonstration of the OASP, see Oracle Platinum Services - Oracle Advanced Support Portal
(OASP) Overview [Video] (Doc ID 1607117.1).
Name: ec1-vm-sample.sample.org
Type: Server
External Id:
Make: Sun
Description:
Status: Production
UUID:
Barcode: host
Architecture:
Firmware:
IP Address Type Primary Assigned CI
xx.yy.zzz.aa Management IP ec1-vm-sample.sample.org
Hardware component details (columns): Name/Identifier, Type, Model, Manufacturer, Serial Number, Part Number, Revision, Location, Size In Bytes, PCI Id, Media Type, Version, Vendor Name, Vendor Specific Information, Registry Source, Virtual Machine, Description, Name, ID
Installed Firmware Register / Installed Firmware (columns): Description, Type, Version, Installation Date, Provider, Release Date