
Availability benefits of Linux on z Systems

David Raften
Raften@us.ibm.com

Contents

1     Why Linux on z Systems
1.1   Availability, the hidden expense
2     The z System hardware level
2.1   Designed for zero down time
2.2   Call Home
2.3   IBM zAware
3     Availability at the z/VM Hypervisor level
3.1   Hardware virtualization at native speed
3.2   Live Guest Relocation
3.3   Thousands of times easier on z Systems
4     The Linux level
4.1   Linux Health Checker (lnxhc)
5     Disaster Recovery
5.1   Disaster Recovery Disaster
5.1.1 D/R using recovery service provider
5.1.2 D/R using in-house recovery site
5.1.3 Maintaining a Consistency Group
5.2   GDPS/PPRC and GDPS Virtual Appliance
6     Summary
Appendix A  Selected z System availability features
6.1   Unplanned outage avoidance
6.2   Planned outage avoidance
Appendix B  References

Data center managers are constantly being asked to do more with a fixed or
declining budget. Yet, the data center needs to have more capacity to
support more workloads, be more secure, and more available. With these
external forces, business as usual does not work.
There are four categories of data center expenses:
Hardware
Software
People / System Management
Facilities
In most places in the world, most of the IT budget goes to either software
fees or system management.
While a single z Systems server may be more expensive than a single
x86-based server, server hardware is actually one of the smallest components
of the budget. In fact, hardware costs have tended to be constant for over 10
years, while software and system management costs continue to increase.
Moving workloads to Linux on z Systems can significantly reduce software and
people/system management costs by simplifying the infrastructure and
avoiding server sprawl, while at the same time also reducing environmental
expenses. This frees more resources to concentrate on developing a secure,
modern, highly available infrastructure.

Why Linux on z Systems

The Linux operating system is designed to run on any platform: x86, Power,
or z System. The major difference between a Linux distribution running on a
PC and one on a z Systems mainframe is the device drivers and the
instruction set architecture to interface with the host hardware. The Linux
kernel, GNU environment, compilers, utilities, network protocols, memory
management, process management, etc. are all the same. Linux is designed
so one can create a Linux application on one platform and run it on any other.
More importantly, it has the same system management interface on any
platform. From a system programming or application programming point of
view, it does not matter what the underlying hardware is.
Although the Linux operating system and applications do not care what the
underlying hardware is, it does matter from a cost and availability
perspective. In the past, many data centers chose to run a single application
at a time on a server, often getting 10% utilization out of the server. Today
with virtualization such as VMware, the utilization is higher, but not much.
When one considers the number of servers configured for:
Development
Quality Assurance / Test
Production
Backup for production at the primary site
Then double it for disaster recovery

The utilization is often no more than 35%. Although some users get the
primary production server at a higher utilization, the average across the data
center is low. This is even more dramatic when one considers the
configuration for a single application is then duplicated for each of the
hundreds or thousands of applications being run. Each of the tens of
thousands of servers incurs expenses of:

Software. Often the biggest data center expense; many products
charge by the number of cores on the server. It doesn't matter if the
server is 35% busy, 100% busy, or 0% busy.

System Management. Also a large part of the data center budget, how
do you maintain all the software to keep it current? How do you
maintain the hardware?

Facilities. Each one uses electricity, floor space, and cooling chiller
systems. The availability of electricity has driven many companies to
spend millions of dollars to create new data centers away from cities
where power is more readily available.

Hardware. If you buy a product, expecting to use all of it, are you
happy if you can only use a third? Why is this acceptable for servers?

Because z Systems can improve the average utilization of all the servers to
near 100%, significantly fewer servers are needed to run the workload. The
savings can be even greater by then using faster servers. Many sites have
seen a 20:1 reduction in the number of cores after moving applications to
Linux on z, with up to a 40% reduction in total data center expenses.

Availability, the hidden expense

When calculating expenses, hardware, software, system management, and
facilities costs are what is usually tracked. But there is an expense that is not
often looked at: the cost to the business when applications are unavailable.
Depending on the industry, this cost can be $1 million or more for each hour
of downtime. It can be estimated by summing the factors below (a simple
worked sketch follows the list):

Missed business opportunity. Look at the number of transactions not run
during the period of the outage and the average revenue generated by each
transaction, adjusted for the transactions that can be deferred until the
system comes up again.

Loss of productivity. What is the hourly cost of all the affected employees
who can no longer do their jobs? What is the hourly cost of the data center?

Loss of brand image and customers. If the system is often unavailable, or
even just performing badly, how many customers will permanently move to
your competition?

Other factors. These include financial penalties, overtime payments, and
wasted goods.
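A minimal sketch of that summation, with purely hypothetical numbers (the
transaction volume, revenue, and staffing figures are illustrative assumptions,
not measured values):

    # Rough hourly downtime cost estimate (all inputs are illustrative assumptions)
    def downtime_cost_per_hour(tx_per_hour, revenue_per_tx, deferred_fraction,
                               affected_employees, hourly_labor_cost,
                               data_center_hourly_cost, other_costs_per_hour):
        # Missed business opportunity: only transactions that are truly lost count
        lost_revenue = tx_per_hour * revenue_per_tx * (1.0 - deferred_fraction)
        # Loss of productivity: idle staff plus the cost of the data center itself
        lost_productivity = affected_employees * hourly_labor_cost + data_center_hourly_cost
        # Brand damage is real but subjective, so it is left out of this sketch
        return lost_revenue + lost_productivity + other_costs_per_hour

    # Example: 100,000 transactions/hour at $8 each, 40% deferrable,
    # 500 affected employees at $60/hour, $20,000/hour data center cost,
    # $50,000/hour in penalties and overtime
    print(downtime_cost_per_hour(100_000, 8.0, 0.4, 500, 60.0, 20_000, 50_000))
    # -> 580000.0, i.e. well over half a million dollars per hour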


The cost per minute of an outage increases with the duration of the outage.
The impact on customer service is subjective and depends on how frequently
outages occur and for how long. The more customers are affected by an
outage, the greater the chance of them taking their business elsewhere.
Different hardware platforms have different availability characteristics. This
affects the bottom line on the Total Cost of Ownership for the application
solution.

The z System hardware level


Designed for zero down time

The IBM z Systems mainframe servers were designed with over 50 years of
experience, with availability as the primary core value. The "z" in z Systems
stands for "zero down time." Almost all the features developed for the z
Systems are available to any operating system and application being hosted,
including z/VM and Linux. In every generation the z Systems looks for new
ways to provide additional protection for its core components with redundancy
and seamless failover. These components include the core CPs, Cache,
Memory, I/O, Power, and Cooling. It also looks at other areas such as
Security, since an outage caused by an external attack is still an outage, and
at interaction with applications to proactively detect problems before they
occur.
As soon as the first Linux transaction runs on z Systems, it gets all
the availability protection that the z System is known for without
any application change.
IBM has a requirement that each System z server needs to be better than its
predecessor. To make this happen, the z Systems addresses the different
levels of availability concerns for each of the major subsystems. Some major
functions include the following:

Unplanned outage avoidance.

While many platforms provide n+1 components, the z Systems goes further.
It has dual power supplies and Transparent CPU Sparing: if there is a problem
with any general or special purpose core, spare cores that come with the
server detect this and take over. This is invisible to the applications, which
continue without any interruption. z Systems has a Redundant Array of
Independent Memory (RAIM). Based on the RAID concept for disk, memory
can be set up to recover if there are any failures in a memory array. This
provides protection at the dynamic random access memory (DRAM), dual
inline memory module (DIMM), and memory channel levels, with extensive
error detection and correction on all components, including bus and fabric.
For Security there is a hardware-based tamper-resistant cryptographic
accelerator. The z Systems is the only class of servers to obtain EAL Level 5
certification. Every part is stress tested multiple times during the different
manufacturing phases.

Planned outage avoidance


Every major hardware component supports dynamic maintenance and
repair. This encompasses the Cores (including oscillators), Power,
Cooling, Memory (including IO cage, STIs, channels), and cryptographic
processor. The z System supports dynamic firmware updates and
dynamic driver load updates. This can be done while your key
applications are running. Dynamic IO reconfiguration allows
redefinition of channel types. Dynamic swapping of processor types
allows for price advantages. Dynamic LPAR add allows for workload
flexibility.

Power and Thermal management


While providing the ability to save on data center power consumption, power
and thermal management also provides improved availability for the server.
Static power save mode is designed to reduce power consumption on z
Systems servers when full performance is not required. It can be switched on
and off during runtime with no disruption to currently running workloads,
aside from the change in performance. As well as providing a 20% - 30%
reduction in power consumption (depending on system configuration), the
availability benefit is in silicon reliability when operating at lower
temperatures, as well as less mechanical component wear. Typical examples
of using static power save mode include:

Periods of lower utilization - weekends, third shift.

Capacity backup systems - systems used for emergency backup; keep them
"running" but reduce energy consumption. Systems can quickly be brought
back to full performance.

Reduced IBM interaction (touches) with customer systems

The design includes extra hardware that will not be replaced on first failure
at the customer site, better problem management, and better diagnostics
with first failure data capture.

A more detailed, although not exhaustive, list of availability features of the z
Systems can be found in Appendix A, Selected z System availability features.
The z Systems hardware provides other features to improve availability
through proactive error detection and notification. These include:

Call Home

IBM zAware

Call Home

The Call Home service is an automated notification process that detects
problem conditions on the server and reports them to IBM Support, sometimes
even before the problem manifests itself. The service watches your system
for error conditions such as:
Primary SE loss of communications with the Alternate SE
Memory Sparing Threshold is reached
High humidity is sensed inside the machine
Alternate SE is fenced due to automatic switchover.
The server also looks for degraded conditions, in which the server is still
operating but some hardware is not working:
Loss of channels due to CPC hardware failure
Loss of memory
The drawer is no longer functioning
Capacity BackUp (CBU) resources have expired
Processor cycle time reduced due to temperature problem
CPC was IMLed during cycle time reduction.
Repeated intermittent problems
When it detects a problem, the call home service automatically gathers the
basic information to resolve the problem and sends an email with log files or
other diagnostics for the failure condition. IBM Support processes the email
information, opens a Problem Management Record (PMR) and assigns it to a
support engineer who investigates the problem. The call home service
ensures that the PMR contains the required information about the system and
problem. The IBM Support server also sends an email to alert a designated
administration contact with the PMR number.
The ability to diagnose and report on troublesome but still working
components has at times allowed IBM customer engineers (CEs) to arrive
with a replacement and dynamically change the part before any failure has
occurred.

IBM zAware

The IBM System z Advanced Workload Analysis Reporter (IBM zAware) is an
integrated, self-learning analytics solution that helps identify unusual
behaviors of workloads based on message pattern recognition analytics. It
intelligently examines Linux on System z messages for potential deviations,
inconsistencies, or variations from the norm, providing out-of-band monitoring
and machine learning of operating system health. Large operating system
environments can sometimes generate more than 25 million messages per
day. This can make manual analysis time-consuming and error-prone when
exceptional problems occur. IBM zAware provides a graphical user interface
(GUI) and APIs for easy drill-down into message anomalies, which can lead to
faster problem detection and resolution, increasing availability.

IBM zAware provides:

Support for native or guest Linux on z Systems message log analysis
The ability to process message streams with or without message IDs
The ability to group multiple systems that have similar operational
characteristics for modeling and analysis
Recognition of dynamic activation and deactivation of a Linux image into a
group, and appropriate modeling and analysis
User-defined grouping. For Linux on IBM z Systems, the user can group
multiple systems' data into a combined model: by workload (one for all web
servers, one for all databases, and so on); by "solution" (for instance, one
model for your cloud); or by VM host
A heat map display, which provides a consolidated, aggregated, higher-level
view with the ability to drill down to detail views
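IBM zAware's analytics are proprietary, but the general idea of message
pattern recognition can be illustrated with a small, purely hypothetical sketch
(not the IBM zAware algorithm): build a model of which message IDs are
normal for a system, then flag an interval that contains rare or unseen
messages.

    from collections import Counter

    # Purely illustrative sketch of message-frequency anomaly scoring.
    # This is NOT the IBM zAware algorithm, just the general concept.
    def build_model(training_messages):
        """Count how often each message ID appears during a 'normal' period."""
        return Counter(msg_id for msg_id, _text in training_messages)

    def score_interval(model, interval_messages, rare_threshold=3):
        """Flag message IDs that are unseen or rare relative to the model."""
        anomalies = []
        for msg_id, text in interval_messages:
            if model.get(msg_id, 0) < rare_threshold:
                anomalies.append((msg_id, text))
        return anomalies

    # Hypothetical usage with (message_id, message_text) tuples from syslog
    model = build_model([("EXT4-fs", "mounted filesystem"), ("CRON", "job started")] * 100)
    print(score_interval(model, [("OOM", "Out of memory: kill process 4242")]))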

Availability at the z/VM Hypervisor level

IBM z/VM is the premier mainframe virtualization platform, supporting
thousands of virtual servers in a single footprint, more than any other
platform. The z/VM hypervisor is designed and developed in conjunction
with the z Systems hardware. As such, it can exploit new hardware functions
for performance, security, and availability and pass these benefits on to its
guests such as Linux, as well as z/VSE, z/OS, and z/TPF.
There are many examples of z/VM exploiting hardware functions. Some
examples include High Performance FICON (zHPF) for more I/O throughput,
HyperPAV for less I/O contention, Simultaneous Multi-Threading (SMT) for
performance, hardware based cryptographic acceleration, zEDC Express for
high-performance, low-latency hardware data compression while reducing
disk space and improving channel and networking bandwidth, and of course,
all of the availability features described in the previous chapter.
As well as the performance and security capabilities, from an availability
perspective the power of z/VM is its ability to efficiently virtualize hardware
components as well as its implementation of Live Guest Relocation.

Hardware virtualization at native speed

The ability to virtualize hardware components adds another layer of
availability. Efficient virtualization of processor, memory, communications,
I/O, and networking resources helps reduce the need to duplicate and manage
hardware, programming, and data resources. z/VM can significantly
over-commit these real resources and allow users to create a set of virtual
machines with assets that exceed the amount of real hardware available.
This reduces hardware requirements and simplifies system management.
Because resources are virtualized, a z/VM guest sees only what z/VM presents
to it. If there is a problem with a hardware component, z/VM can seamlessly
switch to use the redundant component and hide this from the guest.
One example is the ability to balance the workload across multiple
cryptographic devices, and should one device fail or be brought offline, z/VM
can transparently shift Linux systems using that device to an alternate
cryptographic device without user intervention.
Another example of this is Multi-VSwitch Link Aggregation support. It allows a
port group of OSA-Express network adapter features to span multiple virtual
switches within a single z/VM system or between multiple z/VM systems. A
single VSwitch can provide a link aggregation group across multiple network
adapters, and make that highly-available connection available to guests
transparently. Sharing a Link Aggregation Port Group with multiple virtual
switches increases optimization and utilization of the OSA-Express adapters
when handling larger traffic loads, and enables sharing the network traffic
among multiple adapters while still presenting only a single network interface
to the guest.
HiperSockets can be used for communication between Linux, z/OS, z/VM,
and z/VSE instances on the same server. It provides an internal virtual IP
network using memory-to-memory communication. This not only improves
response time, but also saves processor utilization. In addition, the complete
virtualization of the network infrastructure provides efficient and secure
communication.
If your system does not have good performance, do you consider it
available? Your users may not, nor may the help desk representatives they
are complaining to.
z/VM exploits the hardware functions through direct execution of the machine
instructions. Since it knows what hardware it is running on there is no need
for an additional layer to trap hardware-directed instructions such as for disk
or network access, and then emulate them. This allows z/VM and its guests
to run significantly faster than other solutions that need to interrupt
execution, emulate instructions, and then follow pointers to the emulated
code. The system runs at native speed.
The virtualization capabilities in a single System z footprint can help to
support thousands of virtual Linux servers. Since a single IBM System z
server doesn't require external networking to communicate between the
virtual Linux servers, all of the Linux servers are in a single box,
communicating via very fast internal I/O connections.
The ability of z/VM to provide simple virtualization at high performance helps
provide availability as seen by the end user.

Live Guest Relocation

The most prevalent outage type in a z Systems environment is for software or
hardware maintenance or upgrades. The IBM z/VM Single System Image
Feature provides live guest relocation, a process where a running virtual
machine can be relocated from one z/VM member system of a cluster to
another. Virtual servers can be moved to another LPAR on the same or a
different z Systems server without disruption to the business. Relocating
virtual servers can be useful for load balancing and for moving workload off
of a physical server or member system that requires maintenance. After
maintenance is applied to a member, guests can be relocated back to that
member, thereby allowing z/VM maintenance while keeping the Linux on
System z virtual servers available.
Checks are in place before a Linux guest is relocated to help avoid application
disruption. Some checks include:

It has enough resources available on the target system, such as memory,
CPU, and so on.
It has the same networking definition, for example VLAN, VSWITCH.
It is disconnected and accessible when the guest is being relocated.
It has access to the same or equivalent devices on the target system.

As well as the Linux instance, the memory contents and interrupt stack are
also relocated. The design point is to avoid unplanned outages while doing a
planned outage.
VMware supports live guest relocation with its vMotion technology, but it has
a different design point. It was designed not to provide for planned outages,
but rather to help avoid unplanned outages if there is a possible future
hardware problem. It does whatever it takes to move guests off of one server
and on to another quickly. This sometimes results in guest availability being
negatively affected. z/VM runs on z Systems servers, where hardware
availability is not an issue.
Another differentiation is the flexibility of where a guest can be relocated.
x86 servers often do not support full backward compatibility. In this
environment one must plan the target for each guest and upgrade the
servers as a group, or else have the administrator fence off specific instruction
sets if a guest is moved to an older server model. By design, z Systems
supports full backward compatibility. Applications written in 1965 can still run
on today's servers. While some hardware features such as hardware
encryption may not be consistent across the servers, the Linux guests will
still run uninterrupted.

Thousands of times easier on z Systems

Sites have virtualized over 50 distributed Linux cores on a single z Systems
core. With 140 usable cores on the z13, these users can obtain a server
reduction of over 7000 to 1. IBM internal tests have in fact run over 41,000
separate Linux guests on a single server, all managed by a single z/VM
hypervisor.
Massive virtualization massively reduces the amount of hardware in the
infrastructure. There are correspondingly fewer servers, and since
Linux-to-Linux communication can take place over virtual links, exponentially
fewer cables and ports. From an availability point of view, the end-to-end
availability as measured by the user requires all of the components to be
available. For example, if four components are touched, such as servers,
routers, and ports, and each is 99% available, then the net effective
availability is .99 x .99 x .99 x .99 = .9606, or only about 96% available.
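A minimal sketch of that serial-availability arithmetic (the component counts
and availability figures are just the illustrative values from the text):

    # End-to-end availability of components connected in series:
    # every component must be up for the service to be up.
    def end_to_end_availability(component_availabilities):
        total = 1.0
        for a in component_availabilities:
            total *= a
        return total

    # Four components at 99% each, as in the text
    print(end_to_end_availability([0.99] * 4))   # -> 0.96059601, about 96%

    # Fewer components in the path means higher end-to-end availability
    print(end_to_end_availability([0.99] * 2))   # -> 0.9801, about 98%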
System management is that much easier. Cloning servers is much easier
than installing servers. Provisioning new servers can be done in just a couple
of minutes, as compared to days for real servers. All this affects availability.
There is also less hardware that can fail. The distributed model for providing
high availability is to deploy redundant physical servers. Often this means
more than just two, but rather several physical servers clustered together, so
that if any one of them fails there will be enough spare capacity spread
around the surviving servers in the cluster to absorb the failed server's work.
But something often not considered is that as the number of physical servers
increases, so does the number of potential points of failure: you have
eliminated single points of failure, but by increasing physical components you
have increased the odds that something will fail. By contrast, you can put
z/VM LPARs on the same server and, with only two z/VM instances, eliminate
all single points of failure except for the z Systems server itself, which, as
explained above, is highly available. Furthermore, since CPU capacity can be
shared between those two LPARs, if one entire z/VM should fail the surviving
z/VM will instantly and transparently inherit the failed z/VM's CPU capacity
(although not its memory). It is like squeezing a balloon: one side gets
smaller and the other gets bigger.
From a disaster recovery point of view, recovery planning and actions are that
much easier. There are fewer servers, hypervisors, and multi-vendor
provisioning tools to worry about. In the event of a total site failure, bringing
production images and workloads up at the recovery site can now consistently
be done within a single shift. Best of all, if a z/OS system is already installed
at the site, the same planning, tools, skills, and infrastructure can be used for
z/OS as for Linux on z. See the Disaster Recovery chapter for more discussion
on this.
As an additional benefit, having fewer servers greatly reduces software fees,
system management expenses, total hardware expenses, energy usage, and
floor space.

The Linux level

One advantage of Linux is that it looks and feels the same across platforms
from an application and systems point of view. Applications can be ported
without changes. It has the same system management interface
independent of the base hardware platform. But the Linux functionality is not
the same across all platforms. There are a number of functions that the Linux
distributors put into their code to take advantage of the z System hardware
capability. A few examples include the capability to avoid unplanned and
planned outages, improved dump processing, management of hardware
resources, load balancing, exploitation of the Crypto Express hardware, and
failover across cryptographic adapters. It is recommended that you talk to
your Linux sales representative for a complete list of their value-add
capabilities for IBM z Systems.

Linux Health Checker (lnxhc)

The Linux Health Checker tool can identify potential problems before they
impact your system's availability or cause outages. It collects the active Linux
settings and system status for a system and compares them with values
provided by health-check authors or defined by the user. It produces output in
the form of detailed messages, which provide information about potential
problems and the suggested actions to take.
Although the Linux Health Checker will run on any Linux platform that meets
the software requirements, the currently available health check plug-ins
focus on Linux on z Systems. Examples of health checks include:

Configuration errors

Deviations from best-practice setups

Hardware running in degraded mode

Unused accelerator hardware

Single points of failure

Some specific health checks include, but are not limited to:

Verify that the bootmap file is up-to-date
Screen users with superuser privileges
Check whether the path to the OpenSSL library is configured correctly
Check for CHPIDs that are not available
Confirm that automatic problem reporting is activated
Ensure that panic-on-oops is switched on
Check whether the CPUs run with reduced capacity
Check for an excessive number of unused I/O devices
Spot getty programs on the /dev/console device
Check Linux on z/VM for the "nopav" DASD parameter
Check file systems for adequate free space
Check file systems for an adequate number of free inodes
Check whether the recommended runlevel is used and set as default
Check the kernel message log for out-of-memory (OOM) occurrences
Check for an excessive error ratio for outbound HiperSockets traffic
Check the inbound network traffic for an excessive error or drop ratio
Confirm that the dump-on-panic function is enabled
Identify bonding interfaces that aggregate qeth interfaces with the same
CHPID
Identify qeth interfaces that do not have an optimal number of buffers
Identify network services that are known to be insecure
Identify unusable I/O devices
Identify multipath setups that consist of a single path only
Identify unused terminals (TTY)
Identify I/O devices that are in use although they are on the exclusion list
Identify I/O devices that are not associated with a device driver

The Linux Health Checker is available for download from
http://lnxhc.sourceforge.net/
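The collect-and-compare idea behind these checks can be sketched in a few
lines. The following is a minimal, hypothetical example in the spirit of the
"adequate free space" check; it is not lnxhc code or its plug-in API, and the
90% threshold is an illustrative assumption:

    import os

    # Minimal sketch of a "file system free space" style health check.
    # Not lnxhc code or its plug-in API; threshold is an illustrative assumption.
    def check_filesystem_free_space(mount_point="/", max_used_percent=90.0):
        stats = os.statvfs(mount_point)
        total_blocks = stats.f_blocks
        free_blocks = stats.f_bavail          # blocks available to non-root users
        used_percent = 100.0 * (total_blocks - free_blocks) / total_blocks
        if used_percent > max_used_percent:
            return ("EXCEPTION", f"{mount_point} is {used_percent:.1f}% full; "
                                 "consider freeing space or growing the file system")
        return ("OK", f"{mount_point} is {used_percent:.1f}% full")

    print(check_filesystem_free_space("/"))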

Disaster Recovery

When most people think of when they would need to implement a disaster
recovery plan, they picture major "front page" events such as natural or
man-made disasters: flooding, earthquakes, or plane crashes. But in reality, it
is much more likely that a site will sustain a failure or temporary outage due
to other, smaller factors. Real examples have included air conditioner failure,
train derailment, a snake shorting out the power supply, a coffee machine
leaking water, or smoke from a nearby restaurant. Often the management
decision is to not declare a disaster because it would take too long to restore
service at the recovery site, there will be data loss, and there is no easy plan
to bring service back to the primary site. This is usually not due to issues with
the z Systems servers, but rather with the distributed environment. The
decision is to "gut it out" and wait until service can be restored. While this is
happening, money is being lost for the company.

Disaster Recovery Disaster

There are two common options for how the recovery site is managed: it can
be in-house, owned by the company, or it can be managed by a business
resiliency service provider. These have two very different implications for
recovering the x86 servers, with far fewer differences for z Systems servers.

D/R using recovery service provider

One difference between z Systems and distributed environments is the
variability. There are a lot of different distributed operating systems such as
Windows, Unix (AIX, Sun, HP-UX, ...), or Linux (RHEL, SUSE, ...). Each of
these comes with different release and version levels. On top of that are the
different hypervisors such as VMware, KVM, or Hyper-V. As a result, many of
these operating systems have dependencies tied to a specific hardware
abstraction layer, which is tied to physical or virtual systems. It is impossible
for a recovery service provider to duplicate the exact same hardware
configuration for all its customers, so in a disaster recovery situation,
especially if there is a regional event, the hardware configuration at the
recovery site will be different from what is being run at the production site. In
fact, it may well be different from what was tested. Massive virtualization
with z Systems using z/VM and its management tools such as IBM Wave
greatly simplifies this, so that a Linux image is recovered at the same level
that it was running on in the primary production site, with compatible
hardware.
How do you recover on dissimilar hardware? With a finite amount of assets
at the recovery site, a recovery service provider cannot mirror the specific
hardware configuration for every client, including the server type, storage
type, firewalls, load balancers, routers, gateways, and so on. Kernel drivers
may be tied to specific hardware, so before restored systems can be started,
it may be necessary to first modify or update the operating system level and
device drivers to match the target recovery hardware. Since the service
provider may only guarantee an "equal or greater" hardware platform, nothing
is known ahead of time about what will be used. If there are issues, then
multiple skill sets are needed to do problem determination. Consequently, you
may run into performance issues once the recovered systems come up, due to
applications being tied to specific hardware devices, or some systems may
simply not recover on the new hardware. This issue can be eliminated by
running Linux on z. Even though there can be different levels of z Systems
hardware (z196, zEC12, z13, ...) at different driver levels, all of the z Systems
hardware is backwards compatible. In addition, the extreme virtualization
provided by z/VM reduces the amount of hardware variability.
Dissimilar disk is also an issue, for the same reasons as dissimilar servers.
Although SCSI-attached disk is supported with Linux on z, with support for FB
format data, many choose to place the Linux data on ECKD-formatted disk for
advantages in system management, reliability, and lower CPU consumption.
This disk is storage agnostic: ECKD disk from any storage vendor appears as
the same generic (3390) disk due to the standardized interface. This, plus the
fact that there is no internal disk on z Systems, reduces the complexity of
managing different disk devices and driver levels.
Distributed systems need to restore the production images prior to restoring
databases. These production images often have many different drive
volumes (C-drive, D-drive, etc.) sometimes with a dozen or more drives for
each system. This can easily amount to hundreds of drives that need to be
restored using tools such as the Tivoli Storage Manager, Symantec
NetBackUp servers, Fiber Channel libraries, etc. If restoring from tape, one
quickly runs into a tape drive bottleneck. If restoring from LAN, then the
network becomes a bottleneck. This process can typically take 6 or more
hours to just bring up the backup restore servers before even database
restores can be started. This process is not needed with System z. Just
connect to the RESLIB volume containing the z/VM libraries, IPL the LPARs,
and you have immediate access to the applications and data.
Database restoration on distributed systems can also be an issue. If using
tape, the manner in which tapes store data, and the data volume and file
data size, are factors in restores. If it takes the mounting of multiple tapes to
restore a single server's data, then other systems are waiting for access to
the tape drives. If the data is restored via the network, there needs to be
enough network bandwidth on the Backup/Restore server network adapters
and LAN to restore the data. Crossing low-bandwidth hops could cause a
restore bottleneck. z Systems can run 50 - 100 restore jobs at a time if
restoring from fiber channel library media. Restores occur via the FICON SAN
environment and are NOT LAN based. In addition, there are 8x8 configurable
FICON paths to reach 64GB bandwidth per subsystem.
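A rough back-of-the-envelope sketch of why the restore path matters; the
data volumes and link speeds below are illustrative assumptions, not
measurements of any particular configuration:

    # Time to restore a given amount of data over a given effective throughput
    def restore_hours(data_terabytes, effective_gigabits_per_second):
        data_bits = data_terabytes * 8e12          # 1 TB ~ 8e12 bits (decimal TB)
        seconds = data_bits / (effective_gigabits_per_second * 1e9)
        return seconds / 3600.0

    # Example: 50 TB of backup data
    print(restore_hours(50, 1))     # single 1 Gb/s LAN path  -> about 111 hours
    print(restore_hours(50, 8))     # one 8 Gb/s FICON path   -> about 14 hours
    print(restore_hours(50, 64))    # eight 8 Gb/s paths      -> under 2 hours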
Production sites have hundreds or more servers. But for testing purposes the
disaster recovery provider may not have the same number of servers
available. For example, one could have 250 servers in production, but only be
able to access 150 servers for a D/R test. This leaves a hole in the D/R plan,
since one is never sure that all the applications will come up without any
problems. This is not the case with z Systems, as the business resiliency
providers all host enough z servers for valid testing.
Finally, z Systems supports extensive end-to-end automation, such as with
GDPS (see the section "GDPS/PPRC and GDPS Virtual Appliance"). This not
only speeds up processes but, more importantly, removes people as a Single
Point of Failure during recovery. It is designed to automate all actions needed
to restart production workload in under 1 hour. This works with not only z/OS,
but also Linux on z images.
Due to the time needed to fully restore the distributed environment, bring up
the applications, and resolve any data consistency issues between tape
restores and disk remote copy restores, many sites have gotten to the point
of just bringing up the distributed server environments and data after three
days, then declaring "Success!" without actually running the applications or
resolving the consistency issues. This leaves a big hole in the D/R testing,
with the possibility of unknown problems coming up should a real situation
happen. z Systems are often fully restored and tested within a single shift.

D/R using in-house recovery site

Due to the issues described above, many of the larger corporations have
chosen to invest in the hardware and facility expenses for a dedicated
in-house recovery site. This resolves many of those issues, but at the
expense of the costs to keep another copy of the physical hardware, such as
servers, disk, and routers, at the remote site, the floor space and energy
usage of all the equipment, and the system management costs of making
sure the D/R site stays a mirror image of the production site.
Consequently, many clients find themselves slowly moving production into
their recovery environments over time to justify costs. Eventually the fine
line between production and recovery environments becomes blurred.
Similarly, an additional complication is that sites often want to run
Development and Test workloads on the recovery servers. In the wake of a
disaster the testing infrastructure disappears just when it is needed the most
since it becomes preempted for Production. One needs to plan for where this
work will now be run.
The most significant issue that is not resolved by in-house disaster recovery
is the complexity of recovery. The more heterogeneous servers there are, the
more one needs to constantly fine-tune and practice the D/R plan. Some
considerations include:

Documentation - Is the plan well documented, detailed, easy to follow,
consolidated, and current? In a real disaster, experienced staff may not be
available.

Complexity of applications - Which applications are the critical ones that
need to be restarted first? What about their dependencies? Is the e-mail
system more important than customer-facing applications, so you can
communicate problems that may come up during recovery?

Plethora of server types and levels - The more components, the more
people on site are required to manage the recovery, and the more things can
go wrong. What about compatibility of the different hardware and software
levels? Does the configuration at the D/R site reflect the same configuration
at the production sites?

Multiple disk types - How do you ensure data consistency across vendors?
How do you protect against corrupted data? Do the tapes have all the needed
current data?

How many backup tools are used - For each tool, are multiple people
trained to use them?

Is there a plan to get back to the original environment - This is
something often not taken into consideration. How do you resynchronize the
data back onto the original site?


Despite having an in-house recovery site, due to the complexity of trying to
manage and control the recovery for hundreds or thousands of distributed
servers, the recovery time objective (RTO) is at times not attainable.

Maintaining a Consistency Group

Once the databases are restored, they may not be usable. Different
applications have different Recovery Time Objectives (RTO), how long the
business can accept the application being unavailable, and Recovery Point
Objectives (RPO), how much data the business can afford to lose. The least
expensive option is to make a copy of the database every 24 hours and send
the tapes off site. This supports an RPO of 24 hours and an RTO of typically
three days. At the other end of the spectrum is disk-to-disk remote copy,
which supports an RPO of 0 (no data loss) with an RTO of 2 hours or less. A
list of disaster recovery options can be found at
http://en.wikipedia.org/wiki/Seven_tiers_of_disaster_recovery . As one moves
up the tiers, the cost increases. With that in mind, many use different D/R
options depending upon the application.
Many recommend a mixed-tier approach, with the D/R solution used
depending on the recovery time objective (RTO) and recovery point
objective (RPO) of the applications being run. Some servers will then
recover on pre-staged dedicated assets and others on hot-site syndicated
hardware made available within 24 hours of an event. In this case, systems
and data may be recovered in order of priority with a staggered RTO. Critical
database application data, usually on z Systems, can be made available within
4 hours or less, and applications, web services, and data on distributed
systems can be made available in 24 hours or more. This causes
complications. It is often the case that applications share common files. Not
only that, but often Tier 1 applications rely on data generated by Tier 3
applications.
How is data consistency maintained when some data is 30 seconds old and
other data is 24 hours old? Are the applications run and the data corruption
accepted? How is the corruption resolved? Can the required nightly batch
jobs be run? Even if all the data is replicated by disk remote copy, due to the
different disk vendors being used, there is still the issue of a common
consistency group between the vendors.
The IBM SAN Volume Controller (SVC) can resolve the issue of providing a
single consistency group between disk vendors by using the same Metro
Mirror or Global Mirror session for all the disk being virtualized under it.
There are several tools that can be used to help manage and monitor the
remote copy environment. This includes IBM Spectrum Control, Virtual
Storage Center (VSC), Tivoli Productivity Center for Replication (TPC-R) and
GDPS. Note that the GDPS Control LPAR requires z/OS with ECKD disk.

GDPS/PPRC and GDPS Virtual Appliance

In a real site disaster, there is not the luxury of several weeks' notice to
update D/R plans and get the key personnel to the remote site ahead of time.
In fact, the key personnel may be unavailable, may not physically be able to
get to the remote site or connect to it through the network, may have other
priorities such as the physical safety of family members or the home, or may
not survive the event. GDPS is an integrated, end-to-end, automated disaster
recovery solution designed to remove people as a Single Point of Failure.
There are several different flavors of GDPS, depending on the type of
replication being used. It includes GDPS/PPRC HyperSwap Manager and
GDPS/PPRC to manage and automate synchronous Metro Mirror replication,
GDPS/XRC and GDPS/Global Mirror for asynchronous replication, and
GDPS/Active-Active, based on long-distance, software-based replication.
GDPS/PPRC enables HyperSwap, the ability to dynamically switch to
secondary disk without requiring applications to be quiesced. Swapping
10,000 device pairs in under seven seconds of user impact time, this provides
near-continuous data availability for planned actions and unplanned events.
It provides disk remote copy management, and data consistency for remote
disk up to 200 km away with qualified DWDMs. GDPS/PPRC is designed to
fully automate the recovery at the remote site. This includes disk
reconfiguration, managing servers, Sysplex resources, CBU, activation
profiles, etc. GDPS/PPRC can be used with any disk vendor that supports the
Metro Mirror protocol. GDPS automation includes:

Disk error detection
HyperSwap for disk availability
Freeze capability to ensure data consistency, with intelligent determination
of the freeze trigger (mirroring or disk issues)
Perform disk reconfiguration
Perform tape reconfiguration
Perform CF reconfiguration
Manage CBU / OOCUoD policies
Manage STP configuration
Shut down discretionary workload on Site 2
Load Production IODF
Modify activation profile on HMC
IPL Prod LPARs
Respond to startup messages
Initiate application startup
Verify network connections
Manage z/OS resources such as Couple Data Sets, checkpoint data sets, etc.
Toggle between sites

All this is done to support a Recovery Time Objective (RTO) less than an hour
with a Recovery Point Objective (RPO) of zero.
GDPS/PPRC is application and data independent. It can be used to provide a
consistent recovery for z/OS as well as non-z/OS data. This is especially
important when a multi-tier application has dependencies upon multiple
operating system architectures. It is not enough that z/OS data is consistent;
it needs to be consistent with non-System z data as well, to allow rapid
business resumption. As well as everything listed above, additional
automation of the Linux on z environment includes:
Coordinated Site Takeover with z/OS
Coordinated HyperSwap with z/OS
Single point of control
Coordinated recovery from a Linux node or cluster failure
  Monitor heartbeats for node or cluster failure
  Automatically re-IPL failing node(s) in the failing cluster
Data consistency across System z, Linux and/or z/VM
Disk Subsystem maintenance (planned actions)
  Non-disruptively HyperSwap z/VM and guests or native Linux
  Live Guest Relocation
  Orderly shutdown / startup
  Start / Stop Linux clusters and nodes
  Start / Stop maintenance mode for clusters and nodes
Disk Subsystem failure (unplanned actions)
  Non-disruptively HyperSwap z/VM and guests following a HyperSwap trigger
  Policy-based order to restart Linux clusters and nodes
Single point of control to manage disk mirroring configurations
GDPS Virtual Appliance is based on GDPS/PPRC and is designed for sites that
do not have the z/OS skills to manage the GDPS control system (K-Sys). The
GDPS Virtual Appliance delivers the GDPS/PPRC capabilities through a
self-contained GDPS controlling system that is delivered as an appliance. A
graphical user interface is provided for monitoring the environment and
performing various actions, including maintaining the GDPS control system,
making z/OS invisible to the system programmers. This provides IBM z
Systems customers who run z/VM and associated guests, such as Linux on z
Systems, with similar high availability and disaster recovery benefits to what
is available for z/OS systems.
The automation capability of GDPS is unique and without peer in the
distributed world.

Summary

With the proliferation of smart phones and mobile computing, users
increasingly have higher expectations for availability, and when service is
unavailable, it is easier to share frustrations with friends on social media.
Users seeing this information can become dissatisfied with a brand, even if
they personally were not affected. This impacts customer retention and
bottom-line profitability.
If given a choice of which infrastructure to place customer-facing and
mission-critical applications on, one would want to choose the platform that
provides the most benefit for the corporation. Much has been written about
the Total Cost of Ownership (TCO) benefits of Linux on z Systems, including
what is found at www.ibm.com/systems/z/os/linux/resources/doc_wp.html ,
even without considering the availability impacts on cost. When one adds the
benefits of a highly available and secure hardware base, extreme
virtualization that is also designed to share hardware resources, additional
RAS customization supplied by the Linux distributor, and fast, automated
end-to-end disaster recovery, placing Linux applications on z Systems
becomes the best choice for the business.

Appendix A Selected z System availability features

Unplanned outage avoidance

Unplanned outage avoidance by using n+1 components is what one normally
thinks of when thinking about availability; the z Systems goes well beyond
that. A partial list of availability features includes:

Power:
N+1 Power subsystems
N+1 Internal batteries
Dual AC inputs
Voltage transformation module (VTM) technology with triple
redundancy on the VTM.

Cooling
Hybrid cooling system
N+1 blowers
Modular refrigeration units

Cores:
Dual instruction and execution with instruction retry
Concurrently checkstop individual cores without outage
Transparent CPU Sparing, so if there is a problem with a core, then
spares that come with the server would detect this and take over.
This would be invisible to the applications and they would continue
without any interruption.
Point-to-point SMP fabric

Memory:
Redundant Array of Independent Memory (RAIM). Based on the
RAID concept for disk, memory can be set up to recover if there
are any failures in a memory array. This provides protection at
the dynamic random access memory (DRAM), dual inline
memory module (DIMM), and memory channel levels.
Extensive error detection and correction from DIMM level
failures, including components such as the controller application
specific integrated circuit (ASIC), the power regulators, the
clocks, and the board
Error detection and correction from Memory channel failures
such as signal lines, control lines, and drivers/receivers on the
MCM
ECC on memory, control circuitry, system memory data bus, and
fabric controller
Dynamic memory chip sparing
Hardware memory scrubbing
Storage protection facility
Memory capacity backup
Partial memory restart

Cache / Arrays
Translation lookaside buffer retry / delete
Redundant branch history table
Concurrent L1 and L2 cache delete
Concurrent L1 and L2 cache directory delete
L1 and L2 cache relocate
ECC for cache

Input / Output
FCP end to end checking
Redundant I/O interconnect
Multiple channel paths
Redundant Ethernet service network w/ VLAN
System Assist Processors (SAPs)
Separate I/O CHPIDs
Shared I/O capability

Address limit checking
Dynamic path reconnect
Channel subsystem monitoring

Security
Integrated cryptographic accelerator
Tamper-resistant Crypto Express feature
Trusted Key Entry (TKE) 5.2 with optional Smart Card reader
EAL Level 5 certified - the only platform that attained this level

General
Extensive testing of all parts, components, and system during
the manufacturing phases
Comprehensive field tracking
Transparent Oscillator failover
Automatic Support Element switchover
Service processor reboot and sparing
ECC on drawer interconnect
Redundant drawer interconnect
Frame Bolt Down Feature
Storage Protection Keys
FlashExpress (improved dump data capture)

Planned outage avoidance

Another aspect of availability is avoidance of planned outages. Some System
z features in support of this include:

Power
Concurrent internal battery maintenance
Concurrent Power maintenance

Cooling
Concurrent Thermal maintenance

Cores
Concurrent processor book repair / add
Transparent Oscillator maintenance

Memory
Concurrent memory repair / add
Concurrent memory upgrade
Concurrent memory bus adapter replacement
Concurrent MBA hub upgrade

Input / Output
Concurrent repair on all parts in an I/O cage
Upgrade on any I/O card type
Concurrently checkstop individual channels
Concurrent STI repair
Concurrent I/O cage controller maintenance
Dynamic I/O reconfiguration
Hot-pluggable I/O
Transparent SAP sparing
Dynamic SAP reassignment
Dynamic I/O Enablement

Security
Dynamically add Crypto Express processor
Concurrent Crypto-PCI upgrade

General
Concurrent Microcode (Firmware) updates - Install and Activate driver
levels and MicroCode Load (MCL) levels based upon bundle number
while applications are still running
Concurrent major LIC upgrades (CPUs, LPAR, channels, OSA, Power and
Thermal, Service Processor, HMC, ...)
Dynamic Swapping of Processor Types
On/Off Capacity Upgrades on Demand (OOCUoD)
Capacity Backup (CBU)
Concurrent service processor maintenance
Dynamic logical partition (LPAR) add
Dynamic add Logical CP to a Partition

Appendix B References

ZSW03236USEN - High-Availability of System Resources: Architectures for
Linux on IBM System z Servers
ZSL03210USEN - Comparing Virtualization Methods
IBM Systems Journal -
http://researchweb.watson.ibm.com/journal/index.html
SG24-6374 - GDPS Family: An Introduction to Concepts and Capabilities -
http://www.redbooks.ibm.com/redpieces/abstracts/sg246374.html?Open
GDPS Home Page -
http://www.ibm.com/systems/z/advantages/gdps/index.html
Linux on z Systems Tuning -
http://www.ibm.com/developerworks/linux/linux390/perf/tuning_diskio.html
Linux on System z Disk I/O Performance -
http://www.vm.ibm.com/education/lvc/LVC0918.pdf
Effectively running Linux on IBM System z in a virtualized environment and
cloud - http://events.linuxfoundation.org/sites/events/files/eeus13_mild.pdf
Mainframe Total Cost of Ownership Issues -
http://www-01.ibm.com/software/htp/tpf/tpfug/tgs07/tgs07e.pdf
