on z System
David Raften
Raften@us.ibm.com
Contents
2.2 Call Home
2.3 IBM zAware
3.3 Disaster Recovery
6 Summary
Appendix B References
Data center managers are constantly being asked to do more with a fixed or
declining budget. Yet, the data center needs to have more capacity to
support more workloads, be more secure, and be more available. With these
external forces, business as usual does not work.
There are four categories of data center expenses:
Hardware
Software
People / System Management
Facilities
In most places in the world, most of the I/T budget goes to either software
fees or system management.
While a single z System server may be more expensive than a single x86-based
server, server hardware is actually one of the smallest components of the
budget. In fact, hardware costs have tended to be constant for over 10
years, while software and system management costs continue to increase.
Moving workloads to Linux on z Systems can help reduce expenses by
significantly reducing software and people/system management costs,
simplifying the infrastructure and avoiding server sprawl, while at the same
time also reducing environmental expenses. This enables more resources to be
concentrated on developing a secure, modern, highly available
infrastructure.
The Linux operating system is designed to run on any platform: x86, Power,
or z System. The major difference between a Linux distribution running on a
PC and one running on a z Systems mainframe is in the device drivers and the
instruction set architecture used to interface with the host hardware. The Linux
kernel, GNU environment, compilers, utilities, network protocols, memory
management, process management, etc. are all the same. Linux is designed
so one can create a Linux application on one platform and run it on any other.
More importantly, it has the same system management interface on any
platform. From a system programming or application programming point of
view, it does not matter what the underlying hardware is.
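To illustrate the point, the short script below is a hypothetical example (not part of any IBM or distribution tooling) that uses only the Python standard library and runs unchanged on an x86, Power, or z System host; only the architecture string it reports differs.

```python
#!/usr/bin/env python3
"""Minimal portability sketch: identical code on x86_64, ppc64le, or s390x."""
import platform

def describe_host():
    # platform.machine() reports the hardware architecture, for example
    # "x86_64", "ppc64le", or "s390x" for Linux on z Systems.
    return {
        "architecture": platform.machine(),
        "kernel": platform.release(),
        "platform": platform.platform(),
    }

if __name__ == "__main__":
    # The application logic is identical on every platform; only the
    # reported values differ.
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```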
Although the Linux operating system and applications do not care what the
underlying hardware is, it does matter from a cost and availability
perspective. In the past, many data centers chose to run a single application
at a time on a server, often getting 10% utilization out of the server. Today
with virtualization such as VMware, the utilization is higher, but not much.
When one considers the number of servers configured for:
Development
Quality Assurance / Test
Production
Backup for production at the primary site
Then double it for disaster recovery
The utilization is often no more than 35%. Although some users get the
primary production server at a higher utilization, the average across the data
center is low. This is even more dramatic when one considers the
configuration for a single application is then duplicated for each of the
hundreds or thousands of applications being run. Each of the tens of
thousands of servers incurs expenses of:
System Management. This is also a large part of the data center budget:
how do you keep all the software current? How do you maintain the
hardware?
Facilities. Each one uses electricity, floor space, and cooling chiller
systems. The availability of electricity has driven many companies to
spend millions of dollars to create new data centers away from cities
where power is more readily available.
Hardware. If you buy a product, expecting to use all of it, are you
happy if you can only use a third? Why is this acceptable for servers?
Because z Systems can improve the average utilization of all the servers to
near 100%, significantly fewer servers are needed to run the workload. The
savings can be even greater by then using faster servers. Many sites have
seen a 20:1 reduction in the number of cores after moving applications to
Linux on z, with up to a 40% reduction in total data center expenses.
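The consolidation arithmetic can be sketched as follows. All of the inputs (server count, cores per server, utilization, and the assumed per-core capacity ratio) are illustrative assumptions rather than measurements; the sketch only shows why low average utilization translates into a large reduction in cores.

```python
# Illustrative consolidation arithmetic; every input below is an assumption.
distributed_servers = 1000        # dev, QA, prod, backup, and D/R servers
cores_per_server = 8
avg_utilization = 0.35            # average utilization cited in the text

# Useful work actually being done, expressed in "busy cores".
busy_cores = distributed_servers * cores_per_server * avg_utilization

# Assume each z core delivers several times the work of a distributed core
# for these workloads (workload dependent) and runs near full utilization.
z_core_capacity_ratio = 5.0
z_target_utilization = 0.95

z_cores_needed = busy_cores / (z_core_capacity_ratio * z_target_utilization)

print(f"Busy distributed cores:   {busy_cores:.0f}")
print(f"Estimated z cores needed: {z_cores_needed:.0f}")
print(f"Core consolidation ratio: "
      f"{distributed_servers * cores_per_server / z_cores_needed:.0f}:1")
```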
The IBM z Systems mainframe servers were designed with over 50 years of
experience, with availability as the primary core value. The "z" in z
System stands for "zero down time." Almost all the features developed for
the z Systems are available to any operating system and application being
hosted, including z/VM and Linux. In every generation the z System looks for
new ways to provide additional protection for its core components with
redundancy and seamless failover. These components include the core CPs,
Cache, Memory, I/O, Power, and Cooling. The design also looks at other areas
such as Security, since an outage caused by an external attack is still an
outage, and at interaction with applications to proactively detect problems
before they occur.
As soon as the first Linux transaction runs on z Systems, it gets all
the availability protection that the z System is known for without
any application change.
IBM has a requirement: each System z server needs to be better than its
predecessor. To make this happen, the z System addresses the different
levels of availability concerns for each of the major subsystems. Some major
functions include the following:
bus and fabric. For security, there is a hardware-based tamper-resistant
cryptographic accelerator. The z System is the only class of servers to
obtain EAL Level 5 certification. Every part is stress tested multiple times
during the different manufacturing phases.
Call Home
IBM zAware
As well as the Linux instance, the memory contents and interrupt stack are
also relocated. The design point is to avoid unplanned outages while doing a
planned outage.
VMware supports live guest relocation with its vMotion technology, but with
a different design point. It was designed not to provide for planned
outages, but rather to try to help avoid unplanned outages if there is a
possible future hardware problem. It does whatever it takes to move guests
off of one server and onto another quickly. This sometimes results in guest
availability being negatively affected. z/VM runs on z System servers, where
hardware availability is not an issue.
Another differentiator is the flexibility of where a guest can be relocated.
x86 servers often do not support full backward compatibility. In this
environment one must plan the relocation target for each guest and upgrade
the servers as a group, or else have the administrator fence off specific
instruction sets if a guest is moved to an older server model. By design,
z System supports full backward compatibility. Applications written in 1965
can still run on today's servers. While some hardware features such as
hardware encryption may not be consistent across the servers, the Linux
guests will still run uninterrupted.
One advantage of Linux is that it looks and feels the same across platforms
from an application and systems point of view. Applications can be ported
without changes. It has the same system management interface
independent of the base hardware platform. But Linux functionality is not
the same across all platforms. There are a number of functions that the
Linux distributors put into their code to take advantage of the z System
hardware capability. A few examples include capabilities to avoid unplanned
and planned outages, improve dump processing, manage the hardware resources,
balance load, exploit the Crypto Express hardware, and fail over across
cryptographic adapters. It is recommended that you talk to your Linux
distributor's sales representative for a complete list of its value-add
capabilities for IBM z Systems.
The Linux Health Checker tool can identify potential problems before they
impact your system's availability or cause outages. It collects and compares
the active Linux settings and system status for a system with the values
provided by health-check authors or defined by the user. It produces output
in the form of detailed messages, which provide information about potential
problems and the suggested actions to take.
Although the Linux Health Checker will run on any Linux platform that meets
the software requirements, the currently available health check plug-ins
focus on Linux on z Systems. Examples of health checks include the following
(a conceptual sketch of the check pattern follows this list):
Configuration errors
Single points of failure
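The health-check plug-ins and their interface are supplied with the tool itself; the sketch below is only a hypothetical illustration of the underlying pattern, comparing active settings against expected values and reporting exceptions, and is not the Linux Health Checker plug-in API.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of the health-check pattern: compare active settings
against expected values and report potential problems. Not the lnxhc API."""

# Expected values a health-check author (or the user) might define.
EXPECTED = {
    "/proc/sys/kernel/panic": "60",    # reboot 60s after a kernel panic
    "/proc/sys/vm/swappiness": "10",   # limit swapping for server workloads
}

def read_setting(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None

def run_checks(expected):
    findings = []
    for path, want in expected.items():
        have = read_setting(path)
        if have is None:
            findings.append(f"EXCEPTION {path}: setting could not be read")
        elif have != want:
            findings.append(f"EXCEPTION {path}: active={have}, expected={want}")
    return findings

if __name__ == "__main__":
    for finding in run_checks(EXPECTED):
        print(finding)
```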
Disaster Recovery
When most people think of when they would need to implement a disaster
recovery plan, they think of major front-page events: natural or man-made
disasters such as flooding, earthquakes, or plane crashes. In reality, it is
much more likely that a site will sustain a failure or temporary outage due
to other, smaller factors. Real examples have included an air conditioner
failure, a train derailment, a snake shorting out the power supply, a coffee
machine leaking water, and smoke from a nearby restaurant. Often the
management decision is not to declare a disaster because it would take too
long to restore service at the recovery site, there would be data loss, and
there is no easy plan to bring service back to the primary site. This is
usually not due to issues with the z System servers, but rather with the
distributed environment. The decision is to gut it out and wait until
service can be restored. While this is happening, the company is losing
money.
There are two common options for how the recovery site is managed: it can
be in-house, owned by the company, or it can be managed by a business
resiliency service provider. These have two very different implications for
recovering the x86 servers, with much smaller differences for z System servers.
the same generic (3390) disk due to the standardized interface. This, plus
the fact that there is no internal disk on z System, reduces the complexity
of managing different disk devices and driver levels.
Distributed systems need to restore the production images prior to restoring
databases. These production images often have many different drive
volumes (C-drive, D-drive, etc.), sometimes with a dozen or more drives for
each system. This can easily amount to hundreds of drives that need to be
restored using tools such as Tivoli Storage Manager, Symantec NetBackup
servers, Fibre Channel libraries, etc. If restoring from tape, one quickly
runs into a tape drive bottleneck. If restoring over the LAN, the network
becomes the bottleneck. This process can typically take six or more hours
just to bring up the backup/restore servers before database restores can
even be started. This process is not needed with System z: just connect to
the RESLIB volume containing the z/VM libraries, IPL the LPARs, and you have
immediate access to the applications and data.
Database restoration on distributed systems can also be an issue. If using
tape, the manner in which tapes store data, the data volume, and the file
data size are all factors in restore time. If it takes the mounting of
multiple tapes to access a single server's data for restore, then other
systems are waiting for access to the tape drives. If the data is restored
via the network, there needs to be enough network bandwidth on the
backup/restore server network adapters and the LAN to restore the data.
Crossing low-bandwidth hops can cause a restore bottleneck. z Systems can
run 50 to 100 restore jobs at a time when restoring from fibre channel
library media. Restores occur via the FICON SAN and are not LAN based. In
addition, there are 8x8 configurable FICON paths to reach 64GB bandwidth per
subsystem.
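Rough arithmetic shows why the transport matters. Every number in the sketch below (data volume, LAN utilization, per-job throughput, job count) is an assumption chosen only to illustrate the order-of-magnitude gap between a single LAN-fed restore and many parallel SAN-attached restore jobs.

```python
# Rough restore-time arithmetic; every input below is an assumption.
data_to_restore_tb = 50
tb = 1024 ** 4                                # bytes per TiB

# LAN-based restore funneled through a single 1 GbE backup server link,
# assuming roughly 70% effective utilization of the wire.
lan_throughput_bps = 1e9 * 0.70 / 8           # bytes per second
lan_hours = data_to_restore_tb * tb / lan_throughput_bps / 3600

# SAN-based restore: many jobs in parallel, each sustaining an assumed
# 100 MB/s from fibre channel library media.
parallel_jobs = 50
per_job_bps = 100 * 1024 ** 2                 # bytes per second
san_hours = data_to_restore_tb * tb / (parallel_jobs * per_job_bps) / 3600

print(f"LAN-based restore:    ~{lan_hours:.0f} hours")
print(f"Parallel SAN restore: ~{san_hours:.1f} hours")
```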
Production sites have hundreds or more servers, but for testing purposes the
disaster recovery provider may not have the same number of servers
available. For example, one could have 250 servers in production but be able
to access only 150 servers for a D/R test. This leaves a hole in the D/R
plan, since one is never sure that all the applications will come up without
any problems. This is not the case with z System, as the business resiliency
providers all host enough z servers for valid testing.
Finally, z System supports extensive end-to-end automation, such as with
GDPS (see section GDPS Virtual Appliance). This not only speeds up processes
but, more importantly, removes people as a Single Point of Failure during
recovery. It is designed to automate all actions needed to restart the
production workload in under one hour. This works not only with z/OS, but
also with Linux on z images.
Due to the time needed to fully restore the distributed environment, bring
up the applications, and resolve any data consistency issues between tape
restores and disk remote copy restores, many sites have gotten to the point
of just bringing up the distributed server environments and data after three
days, then declaring "Success!" without actually running the applications or
resolving the consistency issues. This leaves a big hole in the D/R testing,
with the possibility of unknown problems coming up should a real situation
happen. z Systems are often fully restored and tested within a single shift.
Due to the issues described above, many of the larger corporations have
chosen to invest in the hardware and facility expenses for a dedicated
in-house recovery site. This resolves many of the issues described above,
but at the cost of keeping another copy of the physical hardware (servers,
disk, routers, etc.) at the remote site, the floor space and energy usage of
all that equipment, and the system management expense of making sure the D/R
site stays a mirror image of the production site. Consequently, many clients
find themselves slowly moving production into their recovery environments
over time to justify the costs. Eventually the fine line between production
and recovery environments becomes blurred.
Similarly, an additional complication is that sites often want to run
Development and Test workloads on the recovery servers. In the wake of a
disaster, the testing infrastructure disappears just when it is needed the
most, since it is preempted for Production. One needs to plan for where this
work will now be run.
The most significant issue that is not resolved by in-house disaster recovery
is the complexity of recovery. The more heterogeneous servers there are, the
more one needs to constantly fine-tune and practice the D/R plan. Some
considerations include:
How many backup tools are used? For each tool, are multiple people trained
to use them?
Once the databases are restored, they may not be usable. Different
applications have different Recovery Time Objectives (RTO), that is, how
long the business can accept the application being unavailable, and Recovery
Point Objectives (RPO), how much data the business can afford to lose. The
least expensive option is to make a copy of the database every 24 hours and
send the tapes off site. This supports an RPO of 24 hours and an RTO of
typically three days. At the other end of the spectrum is disk-to-disk
remote copy, which supports an RPO of 0 (no data loss) with an RTO of 2
hours or less. A list of disaster recovery options can be found at:
http://en.wikipedia.org/wiki/Seven_tiers_of_disaster_recovery. As one moves
up the tiers, the cost increases. With that in mind, many use different D/R
options depending upon the application.
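As a simple illustration of matching applications to D/R options, the sketch below picks the least expensive option that still meets an application's RTO and RPO requirements. The tier boundaries and the sample applications are hypothetical, not a recommended mapping.

```python
# Hypothetical mapping of applications to D/R options by RTO/RPO requirements.
# Tier boundaries and sample applications are illustrative only.

DR_OPTIONS = [
    # (delivered RPO in hours, delivered RTO in hours, D/R option)
    (0,  2,  "synchronous disk remote copy (e.g. Metro Mirror)"),
    (1,  8,  "asynchronous disk remote copy"),
    (24, 72, "daily backup with off-site tape"),
]

def choose_option(rpo_hours, rto_hours):
    """Return the least expensive option that still meets RPO and RTO."""
    # Iterate from cheapest (bottom of list) to most expensive.
    for max_rpo, max_rto, option in reversed(DR_OPTIONS):
        if rpo_hours >= max_rpo and rto_hours >= max_rto:
            return option
    return DR_OPTIONS[0][2]  # fall back to the most stringent option

applications = {
    "core banking database": (0, 1),    # (tolerable RPO hours, RTO hours)
    "reporting warehouse":   (4, 8),
    "internal wiki":         (24, 72),
}

for app, (rpo, rto) in applications.items():
    print(f"{app}: {choose_option(rpo, rto)}")
```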
Many recommend a mixed-tier approach, with the D/R solution used for each
application dependent on its recovery time objective (RTO) and recovery
point objective (RPO). Some servers will then recover on pre-staged
dedicated assets and others on hot-site syndicated hardware made available
within 24 hours of an event. In this case, systems and data may be recovered
in order of priority with a staggered RTO. Critical database application
data, usually on z System, can be made available within 4 hours or less, and
applications, web services, and data on distributed systems can be made
available in 24 hours or more. This causes complications. It is often the
case that applications share common files. Not only that, but often Tier 1
applications rely on data generated by Tier 3 applications.
How is data consistency maintained when some data is 30 seconds old and
other data is 24 hours old? Are the applications run and the data corruption
accepted? How is the corruption resolved? Can the required nightly batch
jobs be run? Even if all the data is replicated by disk remote copy, there
is still the issue of a common consistency group across the different disk
vendors being used.
The IBM SAN Volume Controller (SVC) can resolve the issue of providing a
single consistency group across disk vendors by using the same Metro
Mirror or Global Mirror session for all the disks being virtualized under it.
There are several tools that can be used to help manage and monitor the
remote copy environment. These include IBM Spectrum Control, Virtual
Storage Center (VSC), Tivoli Productivity Center for Replication (TPC-R),
and GDPS. Note that the GDPS Control LPAR requires z/OS with ECKD disk.
In a real site disaster, there is not the luxury of several weeks' notice to
update D/R plans and get the key personnel to the remote site ahead of time.
In fact, the key personnel may be unavailable, may not physically be able to
get to the remote site or connect to it through the network, may have other
priorities such as the physical safety of family members or the home, or may
not survive the event. GDPS is an integrated, end-to-end, automated disaster
recovery solution designed to remove people as a Single Point of Failure.
There are several different flavors of GDPS, depending on the type of
replication being used. They include GDPS/PPRC HyperSwap Manager and
GDPS/PPRC to manage and automate synchronous Metro Mirror replication,
GDPS/XRC and GDPS/Global Mirror for asynchronous replication, and
GDPS/Active-Active based on long-distance, software-based replication.
GDPS/PPRC enables HyperSwap, the ability to dynamically switch to the
secondary disk without requiring applications to be quiesced. Swapping with
under seven seconds of user impact time for 10,000 device pairs, this
provides near-continuous data availability for planned actions and unplanned
events. It provides disk remote copy management and data consistency for
remote disk up to 200 km away with qualified DWDMs. GDPS/PPRC is designed to
fully automate the recovery at the remote site. This includes disk
reconfiguration, managing servers, Sysplex resources, CBU, activation
profiles, etc. GDPS/PPRC can be used with any disk vendor that supports the
Metro Mirror protocol. GDPS automation includes:
All this is done to support a Recovery Time Objective (RTO) less than an hour
with a Recovery Point Objective (RPO) of zero.
GDPS/PPRC is application and data independent. It can be used to provide a
consistent recovery for z/OS as well as non-z/OS data. This is especially
important when a multi-tier application has dependencies on multiple
operating system architectures. It is not enough that z/OS data is
consistent; it needs to be consistent with non-z/OS data to allow rapid
business resumption. As well as everything listed above, additional
automation of the Linux on z environment includes:
Coordinated Site Takeover with z/OS
Coordinated HyperSwap with z/OS
Single point of control
Coordinated recovery from a Linux node or cluster failure
Monitor heartbeats for node or cluster failure
Automatically re-IPL failing node(s) in the failing cluster
Data consistency across System z, Linux and/or z/VM
Disk Subsystem maintenance (planned actions)
Non-disruptively HyperSwap z/VM and guests or native Linux
Live Guest Relocation
Orderly shutdown / startup
Start / Stop Linux clusters and nodes
Start / Stop maintenance mode for clusters and nodes
Disk Subsystem failure (unplanned actions)
Non-disruptively HyperSwap z/VM and guests following a
HyperSwap trigger
Policy-based order to restart Linux clusters and nodes
Single point of control to manage disk mirroring configurations
The GDPS Virtual Appliance is based on GDPS/PPRC and is designed for sites
that do not have the z/OS skills to manage the GDPS controlling system
(K-Sys). The GDPS Virtual Appliance delivers the GDPS/PPRC capabilities
through a self-contained GDPS controlling system that is delivered as an
appliance. A graphical user interface is provided for monitoring the
environment and performing various actions, including maintaining the GDPS
controlling system, making z/OS invisible to the system programmers. This
provides IBM z Systems customers who run z/VM and its associated guests,
such as Linux on z Systems, with high availability and disaster recovery
benefits similar to what is available for z/OS systems.
The automation capability of GDPS is unique and without peer in the
distributed world.
Summary
Power:
N+1 Power subsystems
N+1 Internal batteries
Dual AC inputs
Voltage transformation module (VTM) technology with triple
redundancy on the VTM.
Cooling
Hybrid cooling system
N+1 blowers
Modular refrigeration units
Cores:
Memory:
Redundant Array of Independent Memory (RAIM). Based on the
RAID concept for disk, memory can be set up to recover if there
are any failures in a memory array. This provides protection at
the dynamic random access memory (DRAM), dual inline
memory module (DIMM), and memory channel levels.
Extensive error detection and correction from DIMM level
failures, including components such as the controller application
specific integrated circuit (ASIC), the power regulators, the
clocks, and the board
Error detection and correction from Memory channel failures
such as signal lines, control lines, and drivers/receivers on the
MCM
ECC on memory, control circuitry, system memory data bus, and
fabric controller
Dynamic memory chip sparing
Hardware memory scrubbing
Storage protection facility
Memory capacity backup
Partial memory restart
Cache / Arrays
Translation lookaside buffer retry / delete
Redundant branch history table
Concurrent L1 and L2 cache delete
Concurrent L1 and L2 cache directory delete
L1 and L2 cache relocate
ECC for cache
Input / Output
FCP end to end checking
Redundant I/O interconnect
Multiple channel paths
Redundant Ethernet service network w/ VLAN
System Assist Processors (SAPs)
Separate I/O CHPIDs
Shared I/O capability
Security
Integrated cryptographic accelerator
Tamper-resistant Crypto Express feature
Trusted Key Entry (TKE) 5.2 with optional Smart Card reader
EAL Level 5 certified; the only platform that attained this level
General
Extensive testing of all parts, components, and system during
the manufacturing phases
Comprehensive field tracking
Transparent Oscillator failover
Automatic Support Element switchover
Service processor reboot and sparing
ECC on drawer interconnect
Redundant drawer interconnect
Frame Bolt Down Feature
Storage Protection Keys
FlashExpress (improved dump data capture)
Power
Cooling
Concurrent Thermal maintenance
Cores
Memory
Concurrent
Concurrent
Concurrent
Concurrent
Concurrent
Security
Dynamically add Crypto Express processor
Concurrent Crypto-PCI upgrade
General
Concurrent Microcode (Firmware) updates: install and activate driver levels
and Microcode Load (MCL) levels based upon bundle number while applications
are still running.
Concurrent major LIC upgrades (CPUs, LPAR, channels, OSA, Power and Thermal,
Service Processor, HMC)
Dynamic Swapping of Processor Types
On/Off Capacity Upgrades on Demand (OOCUoD)
Capacity Backup (CBU)
Concurrent service processor maintenance
Dynamic logical partition (LPAR) add
Dynamic add Logical CP to a Partition
Appendix B References
http://www.redbooks.ibm.com/redpieces/abstracts/sg246374.html?Open
GDPS Home Page
http://www.ibm.com/systems/z/advantages/gdps/index.html
Linux on z Systems Tuning
http://www.ibm.com/developerworks/linux/linux390/perf/tuning_diskio.html
Linux on System z Disk I/O Performance
http://www.vm.ibm.com/education/lvc/LVC0918.pdf
Effectively running Linux on IBM System z in a virtualized environment and cloud
http://events.linuxfoundation.org/sites/events/files/eeus13_mild.pdf
Mainframe Total Cost of Ownership Issues
http://www-01.ibm.com/software/htp/tpf/tpfug/tgs07/tgs07e.pdf