Release 6.0
N281842
November 8, 2006
Veritas NetBackup
Backup Planning and Performance Tuning Guide
Copyright © 2003 - 2006 Symantec Corporation. All rights reserved.
Symantec, the Symantec logo, and NetBackup are trademarks or registered trademarks of
Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be
trademarks of their respective owners.
Portions of this software are derived from the RSA Data Security, Inc. MD5
Message-Digest Algorithm. Copyright 1991-92, RSA Data Security, Inc. Created 1991. All
rights reserved.
The product described in this document is distributed under licenses restricting its use,
copying, distribution, and decompilation/reverse engineering. No part of this document
may be reproduced in any form by any means without prior written authorization of
Symantec Corporation and its licensors, if any.
Symantec Corporation
20330 Stevens Creek Blvd.
Cupertino, CA 95014
www.symantec.com
Section I
Section I helps you lay the foundation of good backup performance through
planning and configuring your NetBackup installation. Section I also includes
some best practices.
Section I includes these chapters:
■ NetBackup capacity planning
■ Master server configuration guidelines
■ Media server configuration guidelines
■ Media configuration guidelines
■ Database backup guidelines
■ Best practices
Note: For a discussion of tuning factors and general recommendations that may
be applied to an existing installation, see Section II.
Chapter 1
NetBackup capacity
planning
This chapter explains how to design your backup system as a foundation for
good performance.
This chapter includes the following sections:
■ “Introduction” on page 13
■ “Analyzing your backup requirements” on page 14
■ “Designing your backup system” on page 16
■ “Questionnaire for capacity planning” on page 37
Purpose
Veritas NetBackup is a high-performance data protection application. Its
architecture is designed for large and complex distributed computing
environments. NetBackup provides a scalable storage management server that
can be configured for network backup, recovery, archival, and file migration
services.
This manual is for administrators who want to analyze, evaluate, and tune
NetBackup performance. This manual is intended to answer questions such as
the following: How big should the backup server be? How can the NetBackup
server be tuned for maximum performance? How many CPUs and tape drives
are needed? How can backups be configured to run as fast as possible? How can
recovery times be improved? What tools can characterize or measure how
NetBackup is handling data?
Note: The most critical factors in performance are rooted in hardware rather than
software. Hardware selection and configuration have roughly four times the
weight that software has in determining performance. Although this guide
provides some hardware configuration assistance, it is assumed for the most
part that your devices are correctly configured.
Disclaimer
It is assumed you are familiar with NetBackup and your applications, operating
systems, and hardware. The information in this manual is advisory only,
presented in the form of guidelines. Changes to an installation undertaken as a
result of the information contained herein should be verified in advance for
appropriateness and accuracy. Some of the information contained herein may
apply only to certain hardware or operating system architectures.
Introduction
The first step toward accurately estimating your backup requirements is a
complete understanding of your environment. Many performance issues can be
traced to hardware or environmental issues. A basic understanding of the entire
backup data path is important in determining the maximum performance you
can expect from your installation.
Every backup environment has a bottleneck. It may be a fast bottleneck, but it
will determine the maximum performance obtainable with your system.
Example:
Consider the following configuration. In this environment, backups run
slowly (that is, they do not complete in the scheduled backup
window). Total throughput is 8 to 10 megabytes per second.
What makes the backups run slowly? How can NetBackup or the environment be
configured to increase backup performance in this situation?
The explanation is that the LAN, having a speed of 100 megabits per second, has
a theoretical throughput of 12.5 megabytes per second. In practice, 100BaseT
throughput is unlikely to exceed 70% utilization. Therefore, the best delivered
data rate is about 8 megabytes per second to the NetBackup server. The
throughput can be even lower than this, when TCP/IP packet headers,
TCP-window size constraints, router hops (packet latency for ACK packets
delays the sending of the next data packet), host CPU utilization, filesystem
overhead, and other LAN users’ activity are considered. Since the LAN is the
slowest element in the backup path, it is the first place to look in order to
increase backup performance in this configuration.
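The arithmetic behind these figures can be checked with a quick shell sketch; the 70% utilization figure is the assumption stated above:
echo "scale=2; 100 / 8" | bc          # 12.50 megabytes/second theoretical
echo "scale=2; 100 / 8 * 0.70" | bc   # 8.75 megabytes/second best delivered rate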
Analyzing your backup requirements
backup system, explains how to calculate the amount of data you can
transfer over those networks in a given time.
Depending on the amount of data that you want to back up and the
frequency of those backups, you might want to consider installing a private
network just for backups.
■ What new systems will be added to your site in the next six months?
It is important to plan for future growth when designing your backup
system. By analyzing the potential future growth of your current or future
systems, you can ensure that your backup solution accommodates
the kind of environment that you will have in the future. Remember to add
any resulting growth factor that you incur to your total backup solution.
■ Will user-directed backups or restores be allowed?
Allowing users to do their own backups and restores can reduce the time it
takes to initiate certain operations. However, user-directed operations can
also result in higher support costs and the loss of some flexibility.
User-directed operations can monopolize media and tape drives when you
most need them. They can also generate more support calls and training
issues while the users become familiar with the new backup system. You
will need to decide whether allowing user access to some of your backup
systems’ functions is worth the potential costs.
Other factors to consider when planning your backup capacity include:
■ Data type: What are the types of data: text, graphics, database? How
compressible is the data? How many files are involved? Will the data be
encrypted? (Note that encrypted backups may run slower. See “Encryption”
on page 133 for more information.)
■ Data location: Is the data local or remote? What are the characteristics of
the storage subsystem? What is the exact data path? How busy is the
storage subsystem?
■ Change management: Because hardware and software infrastructure will
change over time, is it worth the cost to create an independent test-backup
environment to ensure your production environment will work with the
changed components?
Note: The ideas and examples that follow are based on standard and ideal
calculations. Your numbers will differ based on your particular environment,
data, and compression rates.
■ “Calculate the required data transfer rate for your backups” on page 17
■ “Calculate how long it will take to back up to tape” on page 18
■ “Calculate how many tape drives are needed” on page 20
■ “Calculate the required data transfer rate for your network(s)” on page 21
■ “Calculate the size of your NetBackup catalog” on page 22
■ “Calculate the size of the EMM server” on page 23
■ “Calculate media needed for full and incremental backups” on page 25
■ “Calculate the size of the tape library needed to store your backups” on
page 26
■ “Design your master backup server based on your previous findings” on
page 27
■ “Estimate the number of master servers needed” on page 29
■ “Design your media server” on page 31
■ “Estimate the number of media servers needed” on page 32
■ “Design your NOM server” on page 33
■ “Summary” on page 36
Example: Calculating your ideal data transfer rate during the week
Assumptions:
Amount of data to back up during a full backup = 500 gigabytes
Amount of data to back up during an incremental backup = 20% of a full backup
Daily backup window = 8 hours
Solution 1:
Full backup = 500 gigabytes
Ideal data transfer rate = 500 gigabytes/8 hours = 62.5 gigabytes/hour
Solution 2:
Incremental backup = 100 gigabytes
Ideal data transfer rate = 100 gigabytes/8 hours = 12.5 gigabytes/hour
To calculate your ideal data transfer rate during the weekends, divide the
amount of data that needs to be backed up by the length of the weekend backup
window.
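The same arithmetic lends itself to a quick shell sketch for what-if checks; the figures below are the assumptions from the example above:
echo "scale=1; 500 / 8" | bc   # full backup: 62.5 gigabytes/hour
echo "scale=1; 100 / 8" | bc   # incremental backup: 12.5 gigabytes/hour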
Note also that a backup of encrypted data may take more time. See “Encryption”
on page 133 for more information.
Actual transfer rates depend on variables including your file system layout,
system CPU load, and memory usage.
Table 1-2 Drive controller data transfer rates
100BaseT (switched) 36 32
To calculate your NetBackup catalog size, you need to know how much data you
will be backing up for full and incremental backups, how often these backups
will be performed, and for how long they will be retained. Here are two simple
formulas to calculate these values:
Data being tracked = (Amount of data to back up) * (Number of backups) *
(Retention period)
NetBackup catalog size = 120 * (number of files)
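As a hedged illustration of the second formula (assuming the factor of 120 represents bytes of catalog space per file), a catalog tracking one million files would need roughly:
echo "120 * 1000000 / 1048576" | bc   # approximately 114 megabytes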
Note: If you select NetBackup’s True Image Restore option, your catalog will be
twice as large as a catalog without this option selected. True Image Restore
collects the information required to restore directories to their contents at the
time of any selected full or incremental backup. Because the additional
information that NetBackup collects for incremental backups is the same as that
of a full backup, incremental backups take much more disk space when you
collect True Image Restore information.
Note: This space must be included when determining size requirements for a
master or media server, depending on where the EMM server is installed.
Space for the NBDB on the EMM server is required in the following two
locations:
UNIX
/usr/openv/db/data
/usr/openv/db/staging
Windows
install_path\NetBackupDB\data
install_path\NetBackupDB\staging
Calculate the required space for the NBDB in each of the two directories, as
follows:
60 MB + (2 KB * number of volumes configured for EMM)
where EMM is the Enterprise Media Manager, and volumes are NetBackup
(EMM) media volumes. Note that 60 MB is the default amount of space needed
for the NBDB database used by the EMM server. It includes pre-allocated space
for configuration information for devices and storage units.
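For example, a hypothetical EMM server with 1,000 configured volumes would need, in each of the two directories:
60 MB + (2 KB * 1000) = 60 MB + 2 MB = 62 MB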
Note: During NetBackup installation, the install script looks for 60 MB of free
space in the above /data directory; if there is insufficient space, the installation
fails. The space in /staging is only required when a hot catalog backup is run.
Example: Calculating how many tapes are needed to store all your
backups
Preliminary calculations:
Size of full backups = 500 gigabytes * 4 (per month) * 6 months = 12
terabytes
Size of incremental backups = (20% of 500 gigabytes) * 30 * 1 month = 3 terabytes
Total data tracked = 12 terabytes + 3 terabytes = 15 terabytes
Solution 1:
Tape drive type = LTO gen 1
Tape capacity without compression = 100 gigabytes
Tape capacity with compression = 200 gigabytes
Without compression:
Tapes needed for full backups = 12 terabytes/100 gigabytes = 120
Tapes needed for incremental backups = 3 terabytes/100 gigabytes = 30
Total tapes needed = 120 + 30 = 150 tapes
With 2:1 compression:
Tapes needed for full backups = 12 terabytes/200 gigabytes = 60
Tapes needed for incremental backups = 3 terabytes/200 gigabytes = 15
Total tapes needed = 60 + 15 = 75 tapes
Solution 2:
Tape drive type = LTO gen 3
Tape capacity without compression = 400 gigabytes
Tape capacity with compression = 800 gigabytes
Without compression:
Tapes needed for full backups = 12 terabytes/400 gigabytes = 30
Tapes needed for incremental backups = 3 terabytes/400 gigabytes = 7.5 ~= 8
Total tapes needed = 30 + 8 = 38 tapes
With 2:1 compression:
Tapes needed for full backups = 12 terabytes/800 gigabytes = 15
Tapes needed for incremental backups = 3 terabytes/800 gigabytes = 3.75 ~= 4
Total tapes needed = 15 + 4 = 19 tapes
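All of these counts follow one pattern, the amount of data divided by tape capacity and rounded up, so a small shell helper can be used for what-if comparisons. A minimal sketch, with values in gigabytes:
tapes_needed() { echo $(( ($1 + $2 - 1) / $2 )); }   # ceiling of data/capacity
tapes_needed 12000 400   # LTO gen 3 fulls, no compression: 30
tapes_needed 3000 400    # LTO gen 3 incrementals, no compression: 8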
Calculate the size of the tape library needed to store your backups
To calculate how many robotic library tape slots are needed to store all your
backups, take the number of tapes for backup calculated in “Calculate media
needed for full and incremental backups” on page 25 and add tapes for catalog
backup and cleaning:
Tape slots needed = (Number of tapes needed for backups) + (Number of
tapes needed for catalog backups) + 1 (for a cleaning tape)
A typical example of tapes needed for catalog backup is 2.
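For example, using the 75 tapes calculated for LTO gen 1 with 2:1 compression in the previous section:
Tape slots needed = 75 + 2 + 1 = 78 slots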
Additional tapes may be needed for the following:
■ If you plan to duplicate tapes or to reserve some media for special
(non-backup) use, add those tapes to the above formula.
■ Add tapes needed for future data growth. Make sure your system has a
viable upgrade path as new tape drives become available.
In some cases, it may not be practical to design a generic server to back up all of
your systems. You might have one or several large servers that cannot be backed
up over a network within your backup window. In such cases, it is best to back up
those servers using their own locally-attached tape drives. Although this section
discusses how to design a master backup server, you can still use its information
to properly add the necessary tape drives and components to your other servers.
The next example shows how to configure a master server using the design
elements gathered from the previous sections.
Note: This table provides a rough estimate only, as a guideline for initial
planning. Note also that the RAM amounts shown below are for a base
NetBackup installation; RAM requirements vary depending on the NetBackup
features, options, and agents being used.
Component How many and what kind of component Number of CPUs per component
ATM card 1 ATM card 1
OS and NetBackup 1
Example:
A system backing up clients over the network to a local tape drive at the
rate of 10MB/second would need 100MHz of available CPU power:
50MHz to move data from the network to the NetBackup server
50MHz to move data from the NetBackup server to tape.
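This rule of thumb (5MHz of CPU power per megabyte/second of throughput, counted once for each movement of the data) can be expressed as a quick shell check; the 10MB/second rate is the assumption from the example:
rate_mb_per_sec=10
echo $(( rate_mb_per_sec * 5 * 2 ))   # 100 MHz: network-to-server plus server-to-tape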
■ Consider how much memory is needed (see “Memory needed per
master/media server component” on page 32).
At least 512 megabytes of RAM is recommended if the server is running a
Java GUI. NetBackup uses shared memory for local backups. NetBackup
buffer usage will affect how much memory is needed. See the “Tuning the
NetBackup data transfer path” chapter for more information on NetBackup
buffers.
Keep in mind that non-NetBackup processes need memory in addition to
what NetBackup needs.
A media server moves data from disk (on relevant clients) to storage
(usually disk or tape). The server must be carefully sized to maximize
throughput. Maximum throughput is attained when the server keeps its
tape devices streaming. (For an explanation of streaming, see “Tape
streaming” on page 126.)
Media server factors to consider for sizing include:
■ Disk storage access time
■ Adapter (for example, SCSI) speed
■ Bus (for example, PCI) speed
■ Tape device speed
■ Network interface (for example, 100BaseT) speed
■ Amount of system RAM
■ Other applications, if the host is non-dedicated
The platform chosen must be able to drive all network interfaces and keep all
tape devices streaming.
Sizing considerations
The size of your NOM server depends largely on the number of NetBackup
objects that NOM manages, such as the following:
■ Number of policies
■ Number of media
Based on the above factors, the following NOM server components should be
sized accordingly:
■ Disk space (for the installed NOM binary plus the NOM database, described below)
■ RAM
The next section describes the NOM database and how it affects disk space
requirements, followed by a description of sizing guidelines for NOM.
NOM database
The Sybase database used by NOM is similar to that used by NetBackup and is
installed as part of the NOM server installation.
■ The disk space needed for the initial installation of NOM depends on the
volume of data initially loaded onto the server, based on the following:
number of policy data records, number of job data records, number of media
data records, and number of catalog image records.
■ The rate of NOM database growth depends on the quantity of data being
managed: policy data, job data, media data, and catalog data.
Sizing guidelines
The following guidelines are presented in groups based on the number of objects
that your NOM server manages. The guidelines are intended for basic planning
purposes, and do not represent fixed recommendations or restrictions.
It is assumed that your NOM server is a standalone host (the host is not acting as
a NetBackup master server).
Note: Installation of NOM server software on the NetBackup master server is not
recommended.
In Table 1-8, find the installation category that matches your site, based on
number of master servers that your NOM server manages, number of jobs per
day across all master servers, and so forth.
Table 1-8 NetBackup installation categories (these are maximums for all
master servers combined)
Note: If your NetBackup installation is larger than those listed here (as to
number of NetBackup master servers, number of jobs per day, and so forth), the
behavior of NOM is unpredictable. In this case, Symantec recommends using
multiple NOM servers.
Using the NetBackup installation category from above (A, B, or C), read across in
Table 1-9 to the minimum requirements for NOM hardware.
Summary
Using the guidelines provided in this chapter, design a solution that can do a full
backup and incremental backups of your largest system within your time
window. The remainder of the backups can happen over successive days.
Eventually, your site may outgrow its initial backup solution. By following these
guidelines, you can add more capacity at a future date without having to
redesign your basic strategy. With proper design and planning, you can create a
backup strategy that will grow with your environment.
As outlined in the previous sections, the number and location of the backup
devices depend on a number of factors:
■ The amount of data on the target systems
■ The available backup and restore windows
■ The available network bandwidth
■ The speed of the backup devices
If one drive causes backup window time conflicts, another can be added,
providing an aggregate rate of two drives. The trade-off is that the second drive
imposes extra CPU, memory, and I/O loads on the media server.
If you find that you cannot complete backups in the allocated window, one
approach is to either increase your backup window or decrease the frequency of
your full and incremental backups.
Another approach is to reconfigure your site to speed up overall backup
performance. Before you make any such change, you should understand what
determines your current backup performance. List or diagram your site network
and systems configuration. Note the maximum data transfer rates for all the
components of your backup configuration and compare these against the rate
you must meet for your backup window. This will identify the slowest
components and, consequently, the cause of your bottlenecks. Some likely areas
for bottlenecks include the networks, tape drives, client OS load, and filesystem
fragmentation.
Question Explanation
System name The host name or any unique name to identify the system.
Vendor The hardware vendor who made the system (for example, Sun, HP, IBM, generic PC)
Model For example: Sun E450, HP K580, Pentium II 300MHZ, HP Proliant 8500
Used storage Total used internal and external storage capacity - if the amount of data to be backed up
is substantially different from the amount used, please note that.
Type of external array For example: Hitachi, EMC, EMC CLARiiON, STK.
Network connection For example, 10/100MB, Gigabit, T1. It is important to know if the LAN is a switched
network or not.
Hot backup required? A hot backup of a database requires the appropriate optional database agent.
Key application For example: Exchange server, accounting system, software developer's code repository,
NetBackup critical policies.
Backup window For example: incrementals run M-F from 11PM to 6AM, Fulls are all day Sunday. This
information helps determine where potential bottlenecks will be and how to configure a
solution.
Retention policy For example: incrementals for 2 weeks, full backups for 13 weeks. This information will
help determine how to size the number of slots needed in a library.
Comments? Any special situations to be aware of? Any significant patches on the operating system?
Will the backups be over a WAN? Do the backups need to go through a firewall?
Chapter 2
Master server
configuration guidelines
This chapter provides guidelines and recommendations for better performance
on the NetBackup master server.
This chapter includes the following sections:
■ “Managing NetBackup job scheduling” on page 40
■ “Miscellaneous considerations” on page 44
■ “Merging/splitting/moving servers” on page 48
■ “Guidelines for policies” on page 49
■ “Managing logs” on page 50
Managing NetBackup job scheduling
Note: The Activity Monitor may not update if there are thousands of jobs to
view. If this happens, you may need to change the memory setting using the
NetBackup Java command jnbSA with the -mx option. Refer to the
“INITIAL_MEMORY, MAX_MEMORY” subsection in the NetBackup System
Administrator’s Guide for UNIX and Linux, Volume I. Note that this situation
does not affect NetBackup's ability to continue running jobs.
Windows
echo 0 > NOexpire
■ Prevent backups from starting by shutting down bprd (NetBackup Request
Manager). This will suspend scheduling of new jobs by nbpem. To shut
down bprd, you can use the Activity Monitor in the NetBackup
Administration Console.
Restart bprd to resume scheduling.
Miscellaneous considerations
Consider the following issues when planning for or troubleshooting NetBackup.
Disk staging
With disk staging, images can be created on disk initially, then copied later to
another media type (as determined in the disk staging schedule). The media type
for the final destination is typically tape, but could be disk. This two-stage
process leverages the advantages of disk-based backups in the near term, while
preserving the advantages of tape-based backups for long term.
Note that disk staging can be used to increase backup speed. For more
information, refer to the NetBackup System Administrator’s Guide, Volume I.
NetBackup catalog strategies
[Figure: the NetBackup catalog directory structure under /usr/openv/, showing the relational database files (NBDB.db, EMM_DATA.db, EMM_INDEX.db, NBDB.log, server.conf, databases.conf, vxdbms.conf), the Bare Metal Restore database files (BMRDB.db, BMRDB.log, BMR_DATA.db, BMR_INDEX.db), and the image catalog directories (/class, /class_template, /client, /config, /error, /failure_history, /images, /jobs, /media, /vault), with per-server and per-client subdirectories (Master, Media_server, client_1 through client_n).]
Note: For NetBackup release 6.0 and beyond, it is recommended that you use
schedule-based, incremental hot catalog backups with periodic full backups as
your preferred catalog backup method.
When defining the file list, use absolute pathnames for the locations of the
NetBackup and Media Manager catalog paths and include the server name
in the path. This is in case the media server performing the backup is
changed.
■ Back up the catalog using online, hot catalog backup
This type of catalog backup is for highly active NetBackup environments in
which continual backup activity is occurring. It is considered an online, hot
method because it can be performed while regular backup activity is taking
place. This type of catalog backup is policy-based and can span more than one tape.
It also allows for incremental backups, which can significantly reduce
catalog backup times for large catalogs.
■ Store the catalog on a separate file system
The NetBackup catalog can grow quickly depending on backup frequency,
retention periods, and the number of files being backed up. If you store the
NetBackup catalog data on its own file system, this ensures that other disk
resources, root file systems, and the operating system are not impacted by
the catalog growth. For information on how to move the catalog, refer to
“Catalog compression” on page 48.
■ Change the location of the NetBackup relational database files
The location of the NetBackup relational database files can be changed
and/or split into multiple directories for better performance. For example,
by placing the transaction log file, NBDB.log, on a physically separate drive,
you gain better protection against disk failure and increased efficiency in
writing to the log file. Refer to the procedure in the section “Moving NBDB
Database Files After Installation” in the “NetBackup Relational Database”
appendix of the NetBackup System Administrator’s Guide, Volume I.
■ Delay to compress catalog
The default value for this parameter is 0, which means that NetBackup does
not compress the catalog. As your catalog increases in size, you may want to
use a value between 10 and 30 days for this parameter. When you restore
old backups, which requires looking at catalog files that have been
compressed, NetBackup automatically uncompresses the files as needed,
with minimal performance impact. For information on how to compress the
catalog, refer to “Catalog compression” on page 48.
Catalog compression
When the NetBackup image catalog becomes too large for the available disk
space, there are two ways to manage this situation:
■ Compress the image catalog
■ Move the image catalog.
For details, refer to “Moving the Image Catalog” and “Compressing and
Uncompressing the Image Catalog” in the NetBackup System Administrator’s
Guide, Volume I.
Note that NetBackup compresses images after each backup session, regardless
of whether or not any backups were successful. This happens right before the
execution of the session_notify script and the backup of the catalog. The actual
backup session is extended until compression is complete.
Merging/splitting/moving servers
A master server schedules and maintains backup information for a given set of
systems. The Enterprise Media Manager (EMM) server and its database maintain
centralized device and media related information used on all servers that are
part of the configuration. By default, the EMM server and the NetBackup
Relational Database (NBDB) that contains the EMM data are located on the
master server. A large and dynamic data center can expect to periodically
reconfigure the number and organization of its backup servers.
Guidelines for policies
excluded. Should disaster (or user error) strike, not being able to recover
files costs much more than backing up extra data.
When a policy specifies that all local drives be backed up
(ALL_LOCAL_DRIVES), nbpem initiates a parent job (nbgenjob) that
connects to the client and runs bpmount -i to get a list of mount points.
Then nbpem initiates a job with its own unique job identification number
for each mount point. Next the client bpbkar starts a stream for each job.
Then, and only then, the exclude list is read by NetBackup. When the entire
job is excluded, bpbkar exits with a status 0, stating that it sent 0 of 0 files
to backup. The resulting image files are treated just as any other successful
backup's image files. They expire in the normal fashion when the expiration
date in the image header files specifies they are to expire.
Critical policies
For online, hot catalog backups (a new feature in NetBackup 6.0), make sure to
identify those policies that are crucial to recovering your site in the event of a
disaster. For more information on hot catalog backup and critical policies, refer
to the NetBackup System Administrator’s Guide, Volume I.
Schedule frequency
To minimize the number of times you back up files that have not changed, and
to minimize your consumption of bandwidth, media, and other resources,
consider limiting the frequency of your full backups to monthly or even
quarterly, followed by weekly cumulative incremental backups and daily
incremental backups.
Managing logs
Optimizing the performance of vxlogview
As explained in the NetBackup Troubleshooting Guide, the vxlogview command
is used for viewing logs created by unified logging (VxUL). The vxlogview
command will deliver optimum performance when a file ID is specified in the
query.
For example: when viewing messages logged by the NetBackup Resource Broker
(nbrb) for a given day, you can filter out the library messages while viewing the
nbrb logs. To achieve this, run vxlogview as follows:
vxlogview -o nbrb -i nbrb -n 0
Note that -i nbrb specifies the file ID for nbrb. Specifying the file ID improves
the performance, because the search is confined to a smaller set of files.
Field 3, the type of message (default value 2), can take the following values:
1 Unknown
2 General
4 Backup
8 Archive
16 Retrieve
32 Security
64 Backup status
Field 4, the severity of the error (default value 4), can take the following values:
1 Unknown
2 Debug
4 Informational
8 Warning
16 Error
32 Critical
Fields 7 and 8 are optional entries with a default value of 0.
Note: Make sure that both your inbound network connection and your SCSI/FC
bus have enough bandwidth to feed all of your tape drives.
Example:
iSCSI (360 GB/hour)
Two LTO gen 3 drives, each rated at approximately 300 GB/hour (2:1
compression)
In this example, the tape drives require more speed than provided by the iSCSI
bus. Only one tape drive will stream given this configuration. The solution is to
add a second iSCSI bus, or to move to a connection that is fast enough to
efficiently feed data to the tape drives.
Adjusting media_error_threshold
To configure the NetBackup media error thresholds, use the nbemmcmd
command on the media server as follows. NetBackup freezes a tape volume or
downs a drive for which these values are exceeded. For more detail on the
nbemmcmd command, refer to the man page or to the NetBackup Commands
Guide.
UNIX
/usr/openv/netbackup/bin/admincmd/nbemmcmd -changesetting
-time_window unsigned integer -machinename string
-media_error_threshold unsigned integer -drive_error_threshold
unsigned integer
Windows
<install_path>\NetBackup\bin\admincmd\nbemmcmd.exe
-changesetting -time_window unsigned integer -machinename string
-media_error_threshold unsigned integer -drive_error_threshold
unsigned integer
For example, if the -drive_error_threshold is set to the default value of 2,
the drive is downed after 3 errors in 12 hours. If the
-drive_error_threshold is set to a value of 6, it would take 7 errors in the
same 12 hour period before the drive would be downed.
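For example, a hedged invocation on UNIX that raises the drive error threshold to 6 for a media server named mediaserver1 (the host name is a placeholder for this sketch):
/usr/openv/netbackup/bin/admincmd/nbemmcmd -changesetting -machinename mediaserver1 -drive_error_threshold 6
With this setting, as described above, the drive is not downed until it reports 7 errors within the time window.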
Note: The following description has nothing to do with the number of times
NetBackup retries a backup/restore that fails. That situation is controlled by the
global configuration parameter “Backup Tries” for backups and the bp.conf
entry RESTORE_RETRIES for restores. This algorithm merely deals with
whether I/O errors on tape should cause media to be frozen or drives to be
downed.
4 Use devfsadm to recreate the device nodes in /devices and the device
links in /dev for tape devices by running any one (not all) of the following
commands:
/usr/sbin/devfsadm -i st
/usr/sbin/devfsadm -c tape
/usr/sbin/devfsadm -C -c tape (Use this command to enforce cleanup if
dangling logical links are present in /dev.)
5 Reload the st driver:
/usr/sbin/modload st
6 Restart the NetBackup and Media Manager daemons.
Pooling
Here are some useful conventions for media pools (formerly known as volume
pools):
■ Configure a scratch pool for management of scratch tapes. If a scratch pool
exists, EMM can move volumes from that pool to other pools that do not
have volumes available.
■ Use the available_media script in the goodies directory. You can wrap the
available_media report in a script that redirects the report output to a
file and emails the file to the administrators daily or weekly (see the
sketch following this list). This helps track which tapes are full, frozen,
suspended, and so on. A script can also filter the output of the
available_media report to generate custom reports.
To monitor media, you can also use the NetBackup Operations Manager
(NOM). For instance, NOM can be configured to issue an alert if there are
fewer than X number of media available, or if more than X% of the media is
frozen or suspended.
■ Use the none pool for cleaning tapes.
■ Do not create too many pools. The existence of too many pools causes the
library capacity to become fragmented across the pools. Consequently, the
library becomes filled with many partially-full tapes.
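The following is a minimal sketch of the wrapper script mentioned above; the temporary file and the recipient address are assumptions to adapt for your site:
#!/bin/sh
# Hedged sketch: capture the available_media report and mail it to the
# backup administrators. The recipient address is a placeholder.
REPORT=/tmp/available_media.$$
/usr/openv/netbackup/bin/goodies/available_media > "$REPORT"
mailx -s "NetBackup available media report" backup-admins@example.com < "$REPORT"
rm -f "$REPORT"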
Introduction
Before you create a database, decide how to protect the database against
potential failures. Answer the following questions before developing your
backup strategy.
■ Is it acceptable to lose any data if a hardware failure damages some of the
files that constitute a database?
■ Will you ever need to recover to past points-in-time?
■ Does the database need to be available at all times (24x7)?
For specific information on backing up and restoring your database, refer to the
NetBackup administrator’s guide for your database product. In addition, the
manufacturer of your database product may provide publications that document
backup recommendations and methods.
Here are the tape drive cleaning methods that can be used in a NetBackup
installation:
■ Frequency-based cleaning
■ On-demand cleaning
■ TapeAlert
■ Robotic cleaning
Frequency-based cleaning
NetBackup does frequency-based cleaning by tracking the number of hours a
drive has been in use. When this time reaches a configurable parameter,
NetBackup creates a job that mounts and exercises a cleaning tape. This cleans
the drive in a preventive fashion. The advantage of this method is that typically
there are no drives unavailable awaiting cleaning. There is also no limitation on
platform or robot type. On the downside, cleaning is done more often than
necessary. This adds system wear and consumes time that could be used to write
to the drive. Another limitation is that this method is hard to tune. When new
tapes are used, drive cleaning is needed less frequently; the need for cleaning
increases as the tape inventory ages. This increases the amount of tuning
administration needed and, consequently, the margin of error.
On-demand cleaning
Refer to the NetBackup Media Manager System Administrator’s Guide for more
information on this topic.
TapeAlert
TapeAlert allows reactive cleaning for most drive types. TapeAlert allows a tape
drive to notify EMM when it needs to be cleaned. EMM then performs the
cleaning. You must have a cleaning tape configured in at least one library slot in
order to utilize this feature. TapeAlert is the recommended cleaning solution if
it can be implemented.
Not all drives, at all firmware levels, support this type of reactive cleaning. In
the case where reactive cleaning is not supported on a particular drive,
frequency-based cleaning may be substituted. This solution is not vendor or
platform specific. Symantec has not tested specific firmware levels; however,
the drive vendor should be able to confirm that the TapeAlert feature is
supported.
■ How TapeAlert works
To understand NetBackup's behavior with drive-cleaning TapeAlerts, it is
important to understand the TapeAlert interface to a drive. The TapeAlert
interface to a tape drive is via the SCSI bus, based on a Log Sense page,
which contains 64 alert flags. The conditions that cause a flag to be set and
cleared are device-specific and are determined by the device vendor.
The configuration of the Log Sense page is via a Mode Select page. The
Mode Sense/Select configuration of the TapeAlert interface is compatible
with the SMART diagnostic standard for disk drives.
NetBackup reads the TapeAlert Log Sense page at the beginning and end of
a write/read job. TapeAlert flags 20 to 25 are used for cleaning management
although some drive vendors’ implementations may vary from this.
NetBackup uses TapeAlert flag 20 (Clean Now) and TapeAlert flag 21 (Clean
Periodic) to determine when it needs to clean a drive.
When a drive is selected by NetBackup for a backup, the Log Sense page is
reviewed by bptm for status. If one of the clean flags is set, the drive will be
cleaned before the job starts.
If a backup is in progress and one of the clean flags is set, the flag is not read
until a tape is dismounted from the drive.
If a job spans media and, during the first tape, one of the clean flags is set,
the cleaning light comes on and the drive will be cleaned before the second
piece of media is mounted in the drive.
The implication is that the present job will conclude its ongoing write
despite a TapeAlert Clean Now or Clean Periodic message. That is, the
TapeAlert will not require the loss of what has been written to tape so far.
Robotic cleaning
Robotic cleaning is not proactive, and is not subject to the limitations detailed
above. By being reactive, unnecessary cleanings are eliminated, frequency
tuning is not an issue, and the drive can spend more time moving data, rather
than in maintenance operations.
Library-based cleaning is not supported by EMM for most robots, since robotic
library and operating systems vendors have implemented this type of cleaning
in many different ways.
Complete site disaster: protected by vaulting and off-site media storage.
■ Put the NetBackup catalog on different online storage than the data
being backed up.
In the case of a site storage disaster, the catalogs of the backed-up data
should not reside on the same disks as production data. The reason
behind this is straightforward: you want to avoid the case where, if a
disk drive loses production data, it also loses the catalog of the
production data, resulting in increased downtime.
■ Regularly confirm the integrity of the NetBackup catalog.
On a regular basis, such as quarterly or after major operations or
personnel changes, walk through the process of recovering a catalog
from tape. This essential part of NetBackup administration can save
hours in the event of a catastrophe.
Policy names
One good naming convention for policies is platform_datatype_server(s).
Example 1: w2k_filesystems_trundle
This policy name designates a policy for a single Windows server doing file
system backups.
Example 2: w2k_sql_servers
This policy name designates a policy for backing up a set of Windows 2000 SQL
servers. Several servers may be backed up by this policy. Servers that are
candidates for being included in a single policy are those running the same
operating system and with the same backup requirements. Grouping servers
within a single policy reduces the number of policies and eases the management
of NetBackup.
Schedule names
Create a generic scheme for schedule naming. One recommended set of schedule
names is daily, weekly, and monthly. Another recommended set of names is
incremental, cumulative, and full. This convention keeps the management of
NetBackup to a minimum. It also helps with the implementation of Vault, if your
site uses Vault.
Performance tuning
Overview
The final measure of NetBackup performance is the length of time required for
backup operations to complete (usually known as the backup window), or the
length of time required for a critical restore operation to complete. However,
measuring existing performance and improving future performance by means of
those measurements call for performance metrics more reliable and
reproducible than simple wall clock time. This chapter discusses these metrics
in more detail.
After establishing accurate metrics as described here, you can measure the
current performance of NetBackup and your system components to compile a
baseline performance benchmark. With a baseline, you can apply changes in a
controlled way. By measuring performance after each change, you can
accurately measure the effect of each change on NetBackup performance.
Server variables
It is important to eliminate all other NetBackup activity from your environment
when you are measuring the performance of a particular NetBackup operation.
One area to consider is the automatic scheduling of backup jobs by the
NetBackup scheduler.
When policies are created, they are usually set up to allow the NetBackup
scheduler to initiate the backups. The NetBackup scheduler will initiate backups
based on the traditional NetBackup frequency-based scheduling or on certain
days of the week, month, or other time interval. This process is called
calendar-based scheduling. As part of the backup policy definition, the Start
Window is used to indicate when the NetBackup scheduler can start backups
using either frequency-based or calendar-based scheduling. When you perform
backups for the purpose of performance testing, this setup might interfere since
the NetBackup scheduler may initiate backups unexpectedly, especially if the
operations you intend to measure run for an extended period of time.
The simplest way to prevent the NetBackup scheduler from running backup jobs
during your performance testing is to create a new policy specifically for use in
performance testing and to leave the Start Window field blank in the schedule
definition for that policy. This prevents the NetBackup scheduler from initiating
any backups automatically for that policy. After creating the policy, you can run
the backup on demand by using the Manual Backup command from the
NetBackup Administration Console.
To prevent the NetBackup scheduler from running backup jobs unrelated to the
performance test, you may want to set all other backup policies to inactive by
using the Deactivate command from the NetBackup Administration Console. Of
course, you must reactivate the policies to start running backups again.
You can use a user-directed backup to run the performance test as well.
However, using the Manual Backup option for a policy is preferred. With a
manual backup, the policy contains the entire definition of the backup job,
including the clients and files that are part of the performance test. Running the
backup manually, straight from the policy, means there is no doubt which policy
will be used for the backup, and makes it easier to change and test individual
backup settings from the policy dialog.
Before you start the performance test, check the Activity Monitor to make sure
there is no NetBackup processing currently in progress. Similarly, check the
Activity Monitor after the performance test for unexpected activity (such as an
unanticipated restore job) that may have occurred during the test.
Additionally, check for non-NetBackup activity on the server during the
performance test and try to reduce or eliminate it.
Network variables
Network performance is key to achieving optimum performance with
NetBackup. Ideally, you would use a completely separate network for
performance testing to avoid the possibility of skewing the results by
encountering unrelated network activity during the course of the test.
In many cases, a separate network is not available. Ensure that non-NetBackup
activity is kept to an absolute minimum during the time you are evaluating
performance. If possible, schedule testing for times when backups are not
active. Even occasional short bursts of network activity may be enough to skew
the results during portions of the performance test. If you are sharing the
network with production backups occurring for other systems, you must
account for this activity during the performance test.
Another network variable you must consider is host name resolution.
NetBackup depends heavily upon a timely resolution of host names to operate
correctly. If you have any delays in host name resolution, including reverse
name lookup to identify a server name from an incoming connection from a
certain IP address, you may want to eliminate that delay by using the HOSTS
(Windows) or /etc/hosts (UNIX) file for host name resolution on systems
involved in your performance test environment.
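For example, a hypothetical /etc/hosts entry for a client in the test environment (the address and names are placeholders):
10.0.0.21   testclient1.example.com   testclient1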
Client variables
Make sure the client system is in a relatively quiescent state during
performance testing. A lot of activity, especially disk-intensive activity such as
virus scanning on Windows, will limit the data transfer rate and skew the results
of your tests.
One possible mistake is to allow another NetBackup server, such as a production
backup server, to have access to the client during the course of the test. This
may result in NetBackup attempting to back up the same client to two different
servers at the same time, which would severely impact the results of a
performance test in progress at that time.
Different file systems have different performance characteristics. For example,
comparing data throughput results from operations on a UNIX VxFS or
Windows FAT file system to those from operations on a UNIX NFS or Windows
NTFS system may not be valid, even if the systems are otherwise identical. If you
do need to make such a comparison, factor the difference between the file
systems into your performance evaluation testing, and into any conclusions you
may draw from that testing.
Data variables
Monitoring the data you are backing up improves the repeatability of
performance testing. If possible, move the data you will use for testing backups
to its own drive or logical partition (not a mirrored drive), and defragment the
drive before you begin performance testing. For testing restores, start with an
empty disk drive or a recently defragmented disk drive with ample empty space.
This will reduce the impact of disk fragmentation on the NetBackup
performance test and yield more consistent results between tests.
Similarly, for testing backups to tape, always start each test run with an empty
piece of media. You can do this by expiring existing images for that piece of
media through the Catalog node of the NetBackup Administration Console, or by
using the bpexpdate command.
Evaluating performance
There are two primary locations from which to obtain NetBackup data
throughput statistics: the NetBackup Activity Monitor and the NetBackup All
Log Entries report. The choice of which location to use is determined by the type
of NetBackup operation you are measuring: non-multiplexed backup, restore, or
multiplexed backup.
You can obtain statistics for all three types of operations from the NetBackup All
Log Entries report. You can obtain statistics for non-multiplexed backup or
restore operations from the NetBackup Activity Monitor. For multiplexed
backup operations, you can obtain the overall statistics from the All Log Entries
report after all the individual backup operations which are part of the
multiplexed backup are complete. In this case, the statistics available in the
Activity Monitors for each of the individual backup operations are relative only
to that operation, and do not reflect the actual total data throughput to the tape
drive.
There may be small differences between the statistics available from these two
locations due to slight differences in rounding techniques between the entries in
the Activity Monitor and the entries in the All Logs report. For a given type of
operation, choose either the Activity Monitor or the All Log Entries report and
consistently record your statistics only from that location. In both the Activity
Monitor and the All Logs report, the data-streaming speed is reported in
kilobytes per second. If a backup or restore is repeated, the reported speed can
vary between repetitions depending on many factors, including the availability
of system resources and system utilization, but the reported speed can be used
to assess the performance of the data-streaming process.
The statistics from the NetBackup error logs show the actual amount of time
spent reading and writing data to and from tape. This does not include time
spent mounting and positioning the tape. Cross-referencing the information
from the error logs with data from the bpbkar log on the NetBackup client
(showing the end-to-end elapsed time of the entire process) indicates how much
time was spent on operations unrelated to reading and writing to and from the
tape.
■ Elapsed Time: This field shows the total elapsed time from when the job
was initiated to job completion and can be used as an indication of total
wall clock time for the operation.
■ KB per Second: This is the data throughput rate.
■ Kilobytes: Compare this value to the amount of data. Although it
should be comparable, the NetBackup data amount will be slightly
higher because of administrative information, known as metadata,
saved for the backed up data.
For example, if you display properties for a directory containing 500
files, each 1 megabyte in size, the directory shows a size of 500
megabytes, or 524,288,000 bytes, which is equal to 512,000 kilobytes.
The NetBackup report may show 513,255 kilobytes written, reporting
1255 kilobytes more than the file size of the directory. This is true for a
flat directory. Subdirectory structures may diverge due to the way the
operating system tracks used and available space on the disk. Also, be
aware that the operating system may be reporting how much space was
allocated for the files in question, not just how much data is actually
there. For example, if the allocation block size is 1 kilobyte, 1000 1-byte
files will report a total size of 1 megabyte, even though 1 kilobyte of
data is all that exists. The greater the number of files, the larger this
discrepancy may become.
Note: The messages shown here will vary according to the locale setting of
the master server.
Entry Statistic
started backup job for client <name>, The Date and Time fields for this entry show the time at
policy <name>, schedule <name> on storage which the backup job started.
unit <name>
successfully wrote backup id <name>, copy For a multiplexed backup, this entry shows the size of
<number>, <number> Kbytes the individual backup job and the Date and Time fields
show the time at which the job finished writing to the
storage device. The overall statistics for the multiplexed
backup group, including the data throughput rate to the
storage device, are found in a subsequent entry below.
successfully wrote <number> of <number> For multiplexed backups, this entry shows the overall
multiplexed backups, total Kbytes <number> statistics for the multiplexed backup group including
at Kbytes/sec the data throughput rate.
successfully wrote backup id <name>, copy For non-multiplexed backups, this entry essentially
<number>, fragment <number>, <number> Kbytes combines the information in the previous two entries
at <number> Kbytes/sec for multiplexed backups into one entry showing the size
of the backup job, the data throughput rate, and the
time, in the Date and Time fields, at which the job
finished writing to the storage device.
the requested operation was successfully The Date and Time fields for this entry show the time at
completed which the backup job completed. This value is later than
the “successfully wrote” entry above because it includes
extra processing time at the end of the job for tasks such
as NetBackup image validation.
begin reading backup id <name>, (restore), The Date and Time fields for this entry show the time at
copy <number>, fragment <number> from media which the restore job started reading from the storage
id <name> on drive index <number> device. (Note that the latter part of the entry is not
shown for restores from disk, as it does not apply.)
successfully restored from backup id <name>, copy For a multiplexed restore (generally speaking, all
<number>, <number> Kbytes restores from tape are multiplexed restores as
non-multiplexed restores require additional action from
the user), this entry shows the size of the individual
restore job and the Date and Time fields show the time
at which the job finished reading from the storage
device. The overall statistics for the multiplexed restore
group, including the data throughput rate, are found in
a subsequent entry below.
successfully restored <number> of <number> For multiplexed restores, this entry shows the overall
requests <name>, read total of <number> statistics for the multiplexed restore group, including
Kbytes at <number> Kbytes/sec the data throughput rate.
successfully read (restore) backup id media For non-multiplexed restores (generally speaking, only
<number>, copy <number>, fragment <number>, restores from disk are treated as non-multiplexed
<number> Kbytes at <number> Kbytes/sec restores), this entry essentially combines the
information from the previous two entries for
multiplexed restores into one entry showing the size of
the restore job, the data throughput rate, and the time,
in the Date and Time fields, at which the job finished
reading from the storage device.
Additional information
The NetBackup All Log Entries report will also have entries similar to those
described above for other NetBackup operations such as image duplication
operations used to create additional copies of a backup image. Those entries
have a very similar format and may be useful for analyzing the performance of
NetBackup for those operations.
The bptm debug log file for tape backups (or bpdm log file for disk backups) will
contain the entries that are in the All Log Entries report, as well as additional
detail about the operation that may be useful for performance analysis. One
example of this additional detail is the intermediate data throughput rate
message for multiplexed backups, as shown below:
... intermediate after <number> successful, <number> Kbytes at
<number> Kbytes/sec
This message is generated whenever an individual backup job completes that is
part of a multiplexed backup group. In the debug log file for a multiplexed
backup group consisting of three individual backup jobs, for example, there
could be two intermediate status lines, then the final (overall) throughput rate.
For a backup operation, the bpbkar debug log file will also contain additional
detail about the operation that may be useful for performance analysis.
Keep in mind, however, that writing the debug log files during the NetBackup
operation introduces some overhead that would not normally be present in a
production environment. Factor that additional overhead into any calculations
done on data captures while debug log files are in use.
The information in the All Logs report is also found in
/usr/openv/netbackup/db/error (UNIX) or
install_path\NetBackup\db\error (Windows).
See the NetBackup Troubleshooting Guide to learn how to set up NetBackup to
write these debug log files.
Evaluating UNIX system components
To measure disk I/O using the bpdm_dev_null touch file (UNIX only)
For UNIX systems, the procedure below can be useful as a follow-on to the
bpbkar procedure (above). If the bpbkar procedure shows that the disk read
performance is not the bottleneck and does not help isolate the problem, then
the bpdm_dev_null procedure described below may be helpful. If the
bpdm_dev_null procedure shows poor performance, the bottleneck is
somewhere in the data transfer between the bpbkar process on the client and
the bpdm process on the server. The problem may involve the network, or shared
memory (such as not enough buffers, or buffers that are too small). To change
shared memory settings, see “Shared memory (number and size of data buffers)”
on page 102.
Caution: If not used correctly, the following procedure can lead to data loss.
Touching the bpdm_dev_null file redirects all disk backups to /dev/null, not
just the backups that use the storage unit created by this procedure. You should
disable active production policies for the duration of this test and remove the
bpdm_dev_null touch file as soon as this test is complete.
Note: The bpdm_dev_null file re-directs any backup that uses a disk
storage unit to /dev/null.
1 Create the bpdm_dev_null touch file on the media server (typically
/usr/openv/netbackup/bpdm_dev_null).
2 Create a new disk storage unit, using /tmp or some other directory as the
image directory path.
3 Create a policy that uses the new disk storage unit.
4 Run a backup using this policy. NetBackup will create a file in the storage
unit directory as if this were a real backup to disk. This degenerate image
file will be zero bytes long.
5 To remove the zero-length file and clear the NetBackup catalog of a backup
that cannot be restored, run this command:
/usr/openv/netbackup/bin/admincmd/bpexpdate -backupid backupid
-d 0
where backupid is the name of the file residing in the storage unit directory.
Note: It is recommended that a remote host be used for monitoring of the test
host, to reduce load that might otherwise skew results.
Note: The default scale for the Processor Queue Length may not be equal to 1. Be
sure to read the data correctly. For example, if the default scale is 10x, then a
reading of 40 actually means that only 4 processes are waiting.
Chapter 8
Tuning the NetBackup data transfer path
Overview
This chapter contains information on ways to optimize NetBackup. This chapter
is not intended to provide tuning advice for particular systems. If you would like
help fine-tuning your NetBackup installation, please contact Symantec
Consulting Services.
Before examining the factors that affect backup performance, please note that
an important first step is to ensure that your system meets NetBackup’s
recommended minimum requirements. Refer to the NetBackup Installation
Guide and NetBackup Release Notes for information about these requirements.
Additionally, Symantec recommends that you have the most recent NetBackup
software patch installed.
Many performance issues can be traced to hardware or other environmental
issues. A basic understanding of the entire data transfer path is essential in
determining the maximum obtainable performance in your environment. Poor
performance is often the result of poor planning, which can be based on
unrealistic expectations of any particular component of the data transfer path.
The requirements for database backups may not be the same as for file system
backups. This information applies to file system backups unless otherwise
noted.
Basic tuning suggestions for the data path
Tuning suggestions:
■ Use multiplexing.
Multiplexing is a NetBackup option that lets you write multiple data
streams from several clients at once to a single tape drive or several tape
drives. Multiplexing can be used to improve the backup performance of
slow clients, multiple slow networks, and many small backups (such as
incremental backups). Multiplexing reduces the time each job spends
waiting for a device to become available, thereby making the best use of the
transfer rate of your storage devices.
Multiplexing is not recommended when restore speed is of paramount
interest or when your tape drives are slow. To reduce the impact of
multiplexing on restore times, you can improve your restore performance
by reducing the maximum fragment size for the storage units. If the
fragment size is small, so that the backup image is contained in several
fragments, a NetBackup restore can quickly skip to the specific fragment
containing the file to be restored. In contrast, if the fragment size is large
enough to contain the entire image, the NetBackup restore starts at the very
beginning of the image and reads through the image until it finds the
desired file.
Multiplexed backups can be de-multiplexed to improve restore performance
by using bpduplicate to move fragmented images to a sequential image
on a new tape.
■ Consider the tradeoffs of each setting carefully: compensating for one issue
can cause another. The best approach is to use the defaults unless you
anticipate or encounter an issue.
■ Adjust the backup load on the server.
Change the Limit jobs per policy attribute for one or more of the
policies that the server is backing up. For example, decreasing Limit
jobs per policy reduces the load on a server on a specific subnetwork.
Reconfiguring policies or schedules to use storage units on other
servers also reduces the load. Another possibility is to use bandwidth
limiting on one or more clients.
■ Adjust the backup load on the server during specific time periods only.
Reconfigure schedules that execute during the time periods of interest,
so they use storage units on servers that can handle the load (assuming
you are using media servers).
■ Adjust the backup load on the clients.
Change the Maximum jobs per client global attribute. For example,
increasing Maximum jobs per client increases the number of
concurrent jobs that any one client can process and therefore increases
the load.
■ Reduce the time to back up clients.
Increase the number of jobs that clients can perform concurrently, or
use multiplexing. Another possibility is to increase the number of jobs
that the server can perform concurrently for the policy or policies that
are backing up the clients.
■ Give preference to a policy.
Increase the Limit jobs per policy attribute value for the preferred
policy relative to other policies. Alternatively, increase the priority for
the policy.
■ Adjust the load between fast and slow networks.
Increase the values of Limit jobs per policy and Maximum jobs per
client for the policies and clients on a faster network. Decrease these
values for slower networks. Another solution is to use bandwidth
limiting.
■ Limit the backup load produced by one or more clients.
Use bandwidth limiting to reduce the bandwidth used by the clients.
■ Maximize the use of devices.
Use multiplexing. Also, allow as many concurrent jobs per storage unit,
policy, and client as possible without causing server, client, or network
performance issues.
■ Prevent backups from monopolizing devices.
Limit the number of devices that NetBackup can use concurrently for
each policy or limit the number of drives per storage unit. Another
approach is to exclude some of your devices from Media Manager
control.
Note: Using AUTOSENSE may cause network problems and performance issues.
Network load
There are two key considerations to monitor when you evaluate remote backup
performance:
■ The amount of network traffic
■ The amount of time that network traffic is high
Small bursts of high network traffic for short durations will have some negative
impact on the data throughput rate. However, if the network traffic remains
consistently high for a significant amount of time during the operation, the
network component of the NetBackup data transfer path will very likely be the
bottleneck. Always try to schedule backups during times when network traffic is
low. If your network is heavily loaded, you may wish to implement a secondary
network which can be dedicated to backup and restore traffic.
NetBackup media server network buffer size
The default network buffer size is computed from the NetBackup data buffer
size: (data buffer size x 4) + 1024 for backups, and (data buffer size x 2) + 1024
for restores.
For tape: because the default value for the NetBackup data buffer size is 65536
bytes, this formula results in a default NetBackup network buffer size of 263168
bytes for backups and 132096 bytes for restores.
For disk: because the default value for the NetBackup data buffer size is 262144
bytes, this formula results in a default NetBackup network buffer size of
1049600 bytes for backups and 525312 bytes for restores.
To set this parameter, create the following files:
UNIX
/usr/openv/netbackup/NET_BUFFER_SZ
/usr/openv/netbackup/NET_BUFFER_SZ_REST
Windows
install_path\NetBackup\NET_BUFFER_SZ
install_path\NetBackup\NET_BUFFER_SZ_REST
These files contain a single integer specifying the network buffer size in bytes.
For example, to use a network buffer size of 64 Kilobytes, the file would contain
65536. If the files contain the integer 0 (zero), the default value for the network
buffer size is used.
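For example, the following commands on a UNIX media server set a
256-kilobyte network buffer size for both backups and restores (the value
262144 is purely illustrative; use a value appropriate for your environment):
echo "262144" > /usr/openv/netbackup/NET_BUFFER_SZ
echo "262144" > /usr/openv/netbackup/NET_BUFFER_SZ_REST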
If the NET_BUFFER_SZ file exists, and the NET_BUFFER_SZ_REST file does not
exist, the contents of NET_BUFFER_SZ will specify the network buffer size for
both backup and restores.
If the NET_BUFFER_SZ_REST file exists, its contents will specify the network
buffer size for restores.
If both files exist, the NET_BUFFER_SZ file will specify the network buffer size
for backups, and the NET_BUFFER_SZ_REST file will specify the network buffer
size for restores.
Because local backup or restore jobs on the media server do not send data over
the network, this parameter has no effect on those operations. It is used only by
the NetBackup media server processes which read from or write to the network,
specifically, the bptm or bpdm processes. It is not used by any other NetBackup
processes on a master server, media server, or client.
This parameter is the counterpart on the media server to the Communications
Buffer Size parameter on the client, which is described below. The network
buffer sizes are not required to be the same on all of your NetBackup systems for
NetBackup to function properly; however, setting the Network Buffer Size
parameter on the media server and the Communications Buffer Size parameter
on the client (see below) to the same value has significantly improved the
throughput of the network component of the NetBackup data transfer path in
some installations.
Similarly, the network buffer size does not have a direct relationship with the
NetBackup data buffer size (described under “Shared memory (number and size
of data buffers)” on page 102); the two are tuned independently.
Note: The NET_BUFFER_SZ_REST file is not used on the client. The value in the
NET_BUFFER_SZ file is used for both backups and restores.
NOSHM forces a local backup to run as though it were a remote backup. A local
backup is a backup of a client that has a directly-attached storage unit, such as a
client that happens to be a master or media server. A remote backup is a backup
that passes the data across a network connection from the client to a master or
media server’s storage unit.
A local backup normally has one or more bpbkar processes that read from the
disk and write into shared memory, and a bptm process that reads from shared
memory and writes to the tape. A remote backup has one or more bptm (child)
processes that read from a socket connection to bpbkar and write into shared
memory, and a bptm (parent) process that reads from shared memory and
writes to the tape. NOSHM forces the remote backup model even when the client
and the media server are the same system.
For a local backup without NOSHM, shared memory is used between bptm and
bpbkar. Whether the backup is remote or local, and whether NOSHM exists or
not, shared memory is always used between bptm (parent) and bptm (child).
Note: NOSHM does not affect the shared memory used by the bptm process to
buffer data being written to tape. bptm uses shared memory for any backup,
local or otherwise.
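As a point of reference, NOSHM is an empty touch file; assuming the standard
installation paths used elsewhere in this guide, it can be enabled and disabled
on the media server as follows:
To enable: touch /usr/openv/netbackup/NOSHM
To disable: rm /usr/openv/netbackup/NOSHM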
It is okay for a client to have an entry for a server that is not currently on
the same network.
Note: Restores use the same buffer size that was used to back up the images
being restored.
The default number of shared data buffers is shown below.

Operation                   UNIX   Windows
Non-multiplexed backup      8      16
Multiplexed backup          4      8
Verify                      8      16
Import                      8      16
Duplicate                   8      16

The default size of each shared data buffer is shown below (the defaults are the
same on UNIX and Windows).

Operation                   Default size
Non-multiplexed backup      64K (tape), 256K (disk)
Multiplexed backup          64K (tape), 256K (disk)
Restore, verify, or import  same size as used for the backup
Duplicate                   read side: same size as used for the backup;
                            write side: 64K (tape), 256K (disk)
On Windows, a single tape I/O operation is performed for each shared data
buffer. Therefore, this size must not exceed the maximum block size for the tape
device or operating system. For Windows systems, the maximum block size is
generally 64K, although in some cases customers are using a larger value
successfully. For this reason, the terms “tape block size” and “shared data buffer
size” are synonymous in this context.
The NetBackup daemons do not have to be restarted for the new values to be
used. Each time a new job starts, bptm checks the configuration file and adjusts
its behavior.
Caution: It is critical to perform both backup and restore testing if the shared
data buffer size is changed. If all NetBackup media servers are not running in
the same operating system environment, it is critical to test restores on each of
the NetBackup media servers that may be involved in a restore operation. For
example, if a UNIX NetBackup media server is used to write a backup to tape
with a shared data buffer (block size) of 256 Kilobytes, then it is possible that a
Windows NetBackup media server will not be able to read that tape. In general, it
is strongly recommended you test restore as well as backup operations, to avoid
the potential for data loss. See “Testing changes made to shared memory” on
page 107.
To change the size of the shared data buffers, create the following file on the
media server:
UNIX
For tape
/usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS
For disk
/usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_DISK
Windows
For tape
install_path\NetBackup\db\config\SIZE_DATA_BUFFERS
For disk
install_path\NetBackup\db\config\SIZE_DATA_BUFFERS_DISK
This file contains a single integer specifying the size of each shared data buffer
in bytes. The integer must be a multiple of 1024 (a multiple of 32 kilobytes is
recommended); see the table below for representative values. The integer
represents the size of one tape or disk buffer in bytes. For example, to use a
shared data buffer size of 64 Kilobytes, the file would contain the integer 65536.
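For example, the following command on a UNIX media server sets a
256-kilobyte shared data buffer size for tape (262144 is illustrative; verify it
against the maximum block size of your tape drives, and test restores as
cautioned above):
echo "262144" > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS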
The NetBackup daemons do not have to be restarted for the parameter values to
be used. Each time a new job starts, bptm checks the configuration file and
adjusts its behavior.
106 Tuning the NetBackup data transfer path
NetBackup server performance
Analyze the buffer usage by checking the bptm debug log before and after
altering the size of buffer parameters.
Kilobytes   Bytes
32          32768
64          65536
96          98304
128         131072
160         163840
192         196608
224         229376
256         262144
IMPORTANT: Because the data buffer size equals the tape I/O size, the value
specified in SIZE_DATA_BUFFERS must not exceed the maximum tape I/O size
supported by the tape drive or operating system. This is usually 256 or 128
Kilobytes. Check your operating system and hardware documentation for the
maximum values. Take into consideration the total system resources and the
entire network. The Maximum Transmission Unit (MTU) for the LAN network
may also have to be changed. NetBackup expects the value for NET_BUFFER_SZ
and SIZE_DATA_BUFFERS to be in bytes, so in order to use 32k, use 32768 (32 x
1024).
Note: Some Windows tape devices are not able to write with block sizes higher
than 65536 (64 Kilobytes). Backups created on a UNIX media server with
SIZE_DATA_BUFFERS set to more than 65536 cannot be read by some Windows
media servers. This means that the Windows media server would not be able to
import or restore any images from media that were written with
SIZE_DATA_BUFFERS greater than 65536.
Note: The size of the shared data buffers used for a restore operation is
determined by the size of the shared data buffers in use at the time the backup
was written. This file is not used by restores.
Tuning the NetBackup data transfer path 107
NetBackup server performance
Note: AIX media servers do not need to tune shared memory because
AIX uses dynamic memory allocation.
The bptm debug log also records the number and size of the data buffers in use;
for a multiplexed restore, for example:
15:26:01 [21544] <2> mpx_setup_restore_shm: using 12 data
buffers, buffer size is 65536
When you change these settings, take into consideration the total system
resources and the entire network. The Maximum Transmission Unit (MTU)
for the local area network (LAN) may also have to be changed.
Note: The following section refers to the bptm process on the media server
during back up and restore operations from a tape storage device. If you are
backing up to or restoring from a disk storage device, substitute bpdm for bptm
throughout the section. For example, to activate debug logging for a disk storage
device, the following directory must be created:
/usr/openv/netbackup/logs/bpdm (UNIX) or
install_path\NetBackup\logs\bpdm (Windows).
A good balance between the data producer and the data consumer processes is
an important factor in achieving optimal performance from the NetBackup
server component of the NetBackup data transfer path.
[Figure: producer-consumer relationship during a backup — the NetBackup
client sends data across the network to the bptm child process (the producer),
which fills the shared buffers; the bptm parent process (the consumer) drains
the shared buffers and writes the data to tape.]
Local clients
When the NetBackup media server and the NetBackup client are part of the
same system, the NetBackup client is referred to as a local client.
■ Backup of local client
For a local client, the bpbkar (UNIX) or bpbkar32 (Windows) process reads
data from the disk during a backup and places it in the shared buffers. The
bptm process reads the data from the shared buffer and writes it to tape.
■ Restore of local client
During a restore of a local client, the bptm process reads data from the tape
and places it in the shared buffers. The tar (UNIX) or tar32 (Windows)
process reads the data from the shared buffers and writes it to disk.
Remote clients
When the NetBackup media server and the NetBackup client are part of two
different systems, the NetBackup client is referred to as a remote client.
■ Backup of remote client
The bpbkar (UNIX) or bpbkar32 (Windows) process on the remote client
reads data from the disk and writes it to the network. Then a child bptm
process on the media server receives data from the network and places it in
the shared buffers. The parent bptm process on the media server reads the
data from the shared buffers and writes it to tape.
■ Restore of remote client
During the restore of the remote client, the parent bptm process reads data
from the tape and places it into the shared buffers. The child bptm process
reads the data from the shared buffers and writes it to the network. The tar
(UNIX) or tar32 (Windows) process on the remote client receives the data
from the network and writes it to disk.
If a full buffer is needed by the data consumer but is not available, the data
consumer increments the Wait and Delay counters to indicate that it had to wait
for a full buffer. After a delay, the data consumer will check again for a full
buffer. If a full buffer is still not available, the data consumer increments the
Delay counter to indicate that it had to delay again while waiting for a full
buffer. The data consumer will repeat the delay and full buffer check steps until
a full buffer is available.
The algorithm for a data consumer has the following structure:
while (Buffer_Is_Not_Full) {
    ++Wait_Counter;
    while (Buffer_Is_Not_Full) {
        ++Delay_Counter;
        delay (DELAY_DURATION);
    }
}
If an empty buffer is needed by the data producer but is not available, the data
producer increments the Wait and Delay counters to indicate that it had to wait
for an empty buffer. After a delay, the data producer will check again for an
empty buffer. If an empty buffer is still not available, the data producer
increments the Delay counter to indicate that it had to delay again while waiting
for an empty buffer. The data producer will repeat the delay and empty buffer
check steps until an empty buffer is available.
The algorithm for a data producer has a similar structure:
while (Buffer_Is_Not_Empty) {
    ++Wait_Counter;
    while (Buffer_Is_Not_Empty) {
        ++Delay_Counter;
        delay (DELAY_DURATION);
    }
}
Analysis of the Wait and Delay counter values indicates which process, producer
or consumer, has had to wait most often and for how long.
There are four basic Wait and Delay Counter relationships:
■ Data Producer >> Data Consumer. The data producer has substantially
larger Wait and Delay counter values than the data consumer.
The data consumer is unable to receive data fast enough to keep the data
producer busy. Investigate means to improve the performance of the data
consumer. For a backup operation, check if the data buffer size is
appropriate for the tape drive being used (see below).
If the data consumer still has a substantially large value in this case, try
increasing the number of shared data buffers to improve performance (see
below).
■ Data Producer = Data Consumer (large value). The data producer and the
data consumer have very similar Wait and Delay counter values, but those
values are relatively large.
This may indicate that the data producer and data consumer are regularly
attempting to use the same shared data buffer. Try increasing the number
of shared data buffers to improve performance (see below).
■ Data Producer = Data Consumer (small value). The data producer and the
data consumer have very similar Wait and Delay counter values, but those
values are relatively small.
This indicates that there is a good balance between the data producer and
data consumer, which should yield good performance from the NetBackup
server component of the NetBackup data transfer path.
■ Data Producer << Data Consumer. The data producer has substantially
smaller Wait and Delay counter values than the data consumer.
The data producer is unable to deliver data fast enough to keep the data
consumer busy. Investigate ways to improve the performance of the data
producer. For a restore operation, check if the data buffer size (see below) is
appropriate for the tape drive being used.
If the data producer still has a relatively large value in this case, try
increasing the number of shared data buffers to improve performance (see
below).
The bullets above describe the four basic relationships possible. Of primary
concern is the relationship and the size of the values. Information on
determining substantial versus trivial values appears on the following pages.
The relationship of these values only provides a starting point in the analysis.
Additional investigative work may be needed to positively identify the cause of a
bottleneck within the NetBackup data transfer path.
Note: Writing the debug log files introduces some additional overhead and will
have a small impact on the overall performance of NetBackup. This impact will
be more noticeable for a high verbose level setting. Normally, you should not
need to run with debug logging enabled on a production system.
To determine wait and delay counter values for a local client backup:
1 Activate debug logging by creating these two directories on the media
server:
UNIX
/usr/openv/netbackup/logs/bpbkar
/usr/openv/netbackup/logs/bptm
Windows
install_path\NetBackup\logs\bpbkar
install_path\NetBackup\logs\bptm
2 Execute your backup.
Look at the log for the data producer (bpbkar on UNIX or bpbkar32 on
Windows) process in:
UNIX
/usr/openv/netbackup/logs/bpbkar
Windows
install_path\NetBackup\logs\bpbkar
The line you are looking for should be similar to the following, and will have
a timestamp corresponding to the completion time of the backup:
... waited 224 times for empty buffer, delayed 254 times
In this example the Wait counter value is 224 and the Delay counter value is
254.
3 Look at the log for the data consumer (bptm) process in:
UNIX
/usr/openv/netbackup/logs/bptm
Windows
install_path\NetBackup\logs\bptm
The line you are looking for should be similar to the following, and will have
a timestamp corresponding to the completion time of the backup:
... waited for full buffer 1 times, delayed 22 times
In this example, the Wait counter value is 1 and the Delay counter value is
22.
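Rather than scanning the logs by hand, you can search for the counter lines
directly. For example, on UNIX (the wildcard is illustrative; legacy debug logs
are created per day in each directory):
grep "waited" /usr/openv/netbackup/logs/bpbkar/*
grep "waited" /usr/openv/netbackup/logs/bptm/*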
To determine wait and delay counter values for a remote client backup:
1 Activate debug logging by creating this directory on the media server:
UNIX
/usr/openv/netbackup/logs/bptm
Windows
install_path\NetBackup\logs\bptm
2 Execute your backup.
3 Look at the log for the bptm process in:
UNIX
/usr/openv/netbackup/logs/bptm
Windows
install_path\NetBackup\logs\bptm
4 Delays associated with the data producer (bptm child) process will appear as
follows:
... waited for empty buffer 22 times, delayed 151 times, ...
In this example, the Wait counter value is 22 and the Delay counter value is
151.
5 Delays associated with the data consumer (bptm parent) process will appear
as:
... waited for full buffer 12 times, delayed 69 times
In this example the Wait counter value is 12, and the Delay counter value is
69.
To determine wait and delay counter values for a local client restore:
1 Activate logging by creating the two directories on the NetBackup media
server:
UNIX
/usr/openv/netbackup/logs/bptm
/usr/openv/netbackup/logs/tar
Windows
install_path\NetBackup\logs\bptm
install_path\NetBackup\logs\tar
2 Execute your restore.
Look at the log for the data consumer (tar or tar32) process in the tar log
directory created above.
The line you are looking for should be similar to the following, and will have
a timestamp corresponding to the completion time of the restore:
... waited for full buffer 27 times, delayed 79 times
In this example, the Wait counter value is 27, and the Delay counter value is
79.
3 Look at the log for the data producer (bptm) process in the bptm log
directory created above.
The line you are looking for should be similar to the following, and will have
a timestamp corresponding to the completion time of the restore:
... waited for empty buffer 1 times, delayed 68 times
In this example, the Wait counter value is 1 and the Delay counter value is
68.
To determine wait and delay counter values for a remote client restore:
1 Activate debug logging by creating the following directory on the media
server:
UNIX
/usr/openv/netbackup/logs/bptm
Windows
install_path\NetBackup\logs\bptm
2 Execute your restore.
3 Look at the log for bptm in the bptm log directory created above.
4 Delays associated with the data consumer (bptm child) process will appear
as follows:
... waited for full buffer 36 times, delayed 139 times
In this example, the Wait counter value is 36 and the Delay counter value is
139.
5 Delays associated with the data producer (bptm parent) process will appear
as follows:
... waited for empty buffer 95 times, delayed 513 times
In this example the Wait counter value is 95 and the Delay counter value is
513.
Note: When you run multiple tests, rename the current log file before each run.
NetBackup will automatically create a new log file, which prevents you from
reading the wrong set of values.
Deleting the debug log file will not stop NetBackup from generating the debug
logs. You must delete the entire directory. For example, to stop bptm logging,
you must delete the bptm subdirectory. NetBackup will automatically generate
debug logs at the specified verbose setting whenever the directory is detected.
The receive network buffer is used by the bptm child process to read from
the network during a remote backup.
...setting receive network buffer to 263168 bytes
The send network buffer is used by the bptm child process to write to the
network during a remote restore.
...setting send network buffer to 131072 bytes
See “NetBackup media server network buffer size” on page 97 for more
information about the Network Buffer Size parameter on the media server.
Suppose you wanted to analyze a local backup in which there was a 30-minute
data transfer duration baselined at 5 Megabytes/second with a total data
transfer of 9,000 Megabytes. Because a local backup is involved, if you refer to
“Roles of processes during backup and restore operations” on page 110, you can
determine that bpbkar (UNIX) or bpbkar32 (Windows) is the data producer
and bptm is the data consumer.
You would next want to determine the Wait and Delay values for bpbkar (or
bpbkar32) and bptm by following the procedures described in “Determining
wait and delay counter values” on page 112. For this example, suppose those
values were:

Process                 Wait    Delay
bpbkar (or bpbkar32)    29364   58033
bptm                    95      105
Using these values, you can determine that the bpbkar (or bpbkar32) process
is being forced to wait by a bptm process which cannot move data out of the
shared buffer fast enough.
Next, you can determine time lost due to delays by multiplying the Delay
counter value by the parent or child delay value, whichever applies.
In this example, the bpbkar (or bpbkar32) process uses the child delay value,
while the bptm process uses the parent delay value. (The defaults for these
values are 20 for child delay and 30 for parent delay.) The values are specified in
milliseconds. See “Parent/child delay values” on page 108 for more information
on how to modify these values.
Use the following equations to determine the amount of time lost due to these
delays:

bpbkar delay time = 58,033 delays x 0.020 seconds per delay
                  = approximately 1,161 seconds (19 minutes 20 seconds)
bptm delay time   = 105 delays x 0.030 seconds per delay
                  = approximately 3 seconds
This is useful in determining that the delay duration for the bpbkar (or
bpbkar32) process is significant. If this delay were entirely removed, the
resulting transfer time of 10:40 (total transfer time of 30 minutes minus delay of
19 minutes and 20 seconds) would indicate a throughput value of 14
Megabytes/sec, nearly a threefold increase. This type of performance increase
would warrant expending effort to investigate how the tape drive performance
can be improved.
The number of delays should be interpreted within the context of how much
data was moved. As the amount of data moved increases, the significance
threshold for counter values increases as well.
Again, using the example of a total of 9,000 Megabytes of data being transferred,
assume a 64-Kilobyte buffer size. You can determine the total number of
buffers to be transferred using the following equation:
Number_Slots = 9,216,000 Kilobytes / 64 Kilobytes
             = 144,000
The Wait counter values can now be expressed as a percentage of the total
number of buffers transferred:
bpbkar (or bpbkar32) = 29,364 / 144,000
                     = 20.4%
bptm = 95 / 144,000
     = 0.07%
In this example, in the 20 percent of cases where the bpbkar (or bpbkar32)
process needed an empty shared data buffer, that shared data buffer had not yet
been emptied by the bptm process. A value this large indicates a serious issue
and warrants further investigation.
In this example, on average the bpbkar (or bpbkar32) process had to delay
twice for each wait condition that was encountered. If this ratio is substantially
large, you may wish to consider increasing the parent or child delay value,
whichever one applies, to avoid the unnecessary overhead of checking for a
shared data buffer in the correct state too often. Conversely, if this ratio is close
to 1, you may wish to consider reducing the applicable delay value to check more
often and see if that increases your data throughput performance. Keep in mind
that the parent and child delay values are rarely changed in most NetBackup
installations.
The preceding information explains how to determine if the values for Wait and
Delay counters are substantial enough for concern. The Wait and Delay counters
are related to the size of data transfer. A value of 1,000 may be extreme when
only 1 Megabyte of data is being moved. The same value may indicate a
well-tuned system when gigabytes of data are being moved. The final analysis
must determine how these counters affect performance by considering such
factors as how much time is being lost and what percentage of time a process is
being forced to delay.
Note: Unless you have particular reasons for creating smaller fragments (such
as when restoring a few individual files, restoring from multiplexed backups, or
restoring from older equipment), larger fragment sizes are likely to yield better
overall performance.
Example 1:
Assume you are backing up four streams to a multiplexed tape, each stream
is a single 1-gigabyte file, and the default maximum fragment size of 1 terabyte
has been specified. The resultant backup image logically looks like the
following. ‘TM’ denotes a tape mark, or file mark, that indicates the start of a
fragment.
TM <4 gigabytes data> TM
When restoring any one of the 1 gigabyte files, the restore positions to the TM
and then has to read all 4 gigabytes to get the 1 gigabyte file.
If you set the maximum fragment size to 1 gigabyte:
TM <1 gigabyte data> TM <1 gigabyte data> TM <1 gigabyte data> TM <1
gigabyte data> TM
this does not help, since the restore still has to read all four fragments to pull
out the 1 gigabyte of the file being restored.
Example 2:
This is the same as Example 1, but assume four streams are backing up 1
gigabyte worth of /home or C:\. With the maximum fragment size (Reduce
fragment size) set to a default of 1 TB (and assuming all streams are relatively
the same performance), you again end up with:
TM <4 gigabytes data> TM
Restoring /home/file1 or C:\file1 and /home/file2 or C:\file2 from one of the
streams will have to read as much of the 4 gigabytes as necessary to restore all
the data. But, if you set Reduce fragment size to 1 gigabyte, the image looks like
this:
TM <1 gigabyte data> TM <1 gigabyte data> TM <1 gigabyte data> TM <1
gigabyte data> TM
In this case, /home/file1 or C:\file1 starts in the second fragment, and bptm
positions to the second fragment to start the restore of /home/file1 or C:\file1
(this has saved reading 1 gigabyte so far). After /home/file1 is done, if
/home/file2 or C:\file2 is in the third or fourth fragment, the restore can position
to the beginning of that fragment before it starts reading as it looks for the data.
These examples illustrate that whether fragmentation benefits a restore
depends on what the data is, what is being restored, and where in the image the
data is. In Example 2, reducing the fragment size from 1 gigabyte to half a
gigabyte (512 Megabytes) increases the chance the restore can locate by
skipping instead of reading when restoring relatively small amounts of an
image.
NUMBER_DATA_BUFFERS_RESTORE setting
This parameter can help keep other NetBackup processes busy while a
multiplexed tape is positioned during a restore. Increasing this value causes
NetBackup buffers to occupy more physical RAM. This parameter only applies
to multiplexed restores. For more information on this parameter, see “Shared
memory (number and size of data buffers)” on page 102.
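For example, the following command on a UNIX media server raises the
number of restore buffers (the value 32 is illustrative; the file follows the same
db/config touch-file convention as the other buffer parameters described
above):
echo "32" > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS_RESTORE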
Do not edit these files, because they contain offsets and byte counts that are
used for seeking to and reading the image information.
When bprd, the request daemon on the master server, receives the first stream
of a multiplexed restore request, it triggers the MPX_RESTORE_DELAY timer to
start counting the configured amount of time. At this point, bprd watches and
waits for related multiplexed jobs from the same client before starting the
overall job. If another associated stream is received within the timeout period, it
is added to the total job, and the timer is reset to the MPX_RESTORE_DELAY
period. Once the timeout has been reached without an additional stream being
received by bprd, the timeout window closes, all associated restore requests are
sent to bptm, and a tape is mounted. If any associated restore requests are
received after this event, they are queued to wait until the tape that is now “In
Use” is returned to an idle state.
If MPX_RESTORE_DELAY is not set high enough, NetBackup may need to mount
and read the same tape multiple times to collect all of the necessary header
information necessary for the restore. Ideally, NetBackup would read a
multiplexed tape, collecting all of the header information it needs, with a single
pass of the tape, thus minimizing the amount of time to restore.
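For example, to extend the delay to 60 seconds (an illustrative value), add the
following entry to the bp.conf file on the UNIX master server:
MPX_RESTORE_DELAY = 60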
Example (Oracle):
Suppose that MPX_RESTORE_DELAY is not set in the bp.conf file, so its value is
the default of 30 seconds. Suppose also that you initiate a restore from an Oracle
RMAN backup that was backed up using 4 channels or 4 streams, and you use
the same number of channels to restore.
RMAN passes NetBackup a specific data request, telling NetBackup what
information it needs to start and complete the restore. The first request is
passed and received by NetBackup in 29 seconds, causing the
MPX_RESTORE_DELAY timer to be reset. The next request is passed and
received by NetBackup in 22 seconds, so again the timer is reset. The third
request is received 25 seconds later, resetting the timer a third time, but the
fourth request is received 31 seconds after the third. Since the fourth request
was not received within the restore delay interval, NetBackup only starts three
of the four restores. Instead of reading from the tape once, NetBackup queues
the fourth restore request until the previous three requests are completed. Since
all of the multiplexed images are on the same tape, NetBackup mounts, rewinds,
and reads the entire tape again to collect the multiplexed images for the fourth
restore request.
Note that in addition to NetBackup's reading the tape twice, RMAN waits to
receive all the necessary header information before it begins the restore.
If MPX_RESTORE_DELAY had been larger than 30 seconds, NetBackup would
have received all four restore requests within the restore delay window and
collected all the necessary header information with one pass of the tape. Oracle
would have started the restore after this one tape pass, improving the restore
performance significantly.
NetBackup storage device performance
Media positioning
When a backup or restore is performed, the storage device must position the
tape so that the data is over the read/write head. Depending on the location of
the data and the overall performance of the media device, this can take a
significant amount of time. When you conduct performance analysis with media
containing multiple images, it is important to account for the time lag that
occurs before the data transfer starts.
Tape streaming
If a tape device is being used at its most efficient speed, it is said to be streaming
the data onto the tape. Generally speaking, if a tape device is streaming, there
will be little physical stopping and starting of the media. Instead the media will
be constantly spinning within the tape drive. If the tape device is not being used
at its most efficient speed, it may continually start and stop the media from
spinning. This behavior is the opposite of tape streaming and usually results in a
poor data throughput rate.
Data compression
Most tape devices support some form of data compression within the tape
device itself. Compressible data (such as text files) yields a higher data
throughput rate than non-compressible data, if the tape device supports
hardware data compression.
Tape devices typically come with two performance rates: maximum throughput
and nominal throughput. Maximum throughput is based on how fast
compressible data can be written to the tape drive when hardware compression
is enabled in the drive. Nominal throughput refers to rates achievable with
non-compressible data.
Note: Tape drive data compression cannot be set by NetBackup. Follow the
instructions provided with your OS and tape drive to be sure data compression is
set correctly.
Note: For best performance, use only one data stream to back up each physical
device on the client. Running multiple concurrent streams from a single
physical device can adversely affect the time to back up that device because the
drive heads must move back and forth between tracks containing the files for
the respective streams.
■ Multiplexing writes multiple data streams from several clients to a single
tape drive.
[Figure: multiplexing — several clients send backup data across the network to
the server, which writes the interleaved streams to one tape drive.]
■ Multi-streaming writes multiple data streams, each to its own tape drive,
unless multiplexing is used.
[Figure: multi-streaming — the server writes multiple data streams, each to its
own tape drive.]
Encryption
When the NetBackup encryption option is enabled, your backups may run
slower. How much slower depends on the throttle point in your backup path. If
the network is the issue, encryption should not hinder performance. If the
network is not the issue, then encryption may slow down the backup.
In some field tests, local backups actually ran faster with encryption than
without it, and memory utilization was found to be roughly the same with and
without encryption.
Compression
Two types of compression can be used with NetBackup, client compression
(configured in the NetBackup policy) and tape drive compression (handled by
the device hardware). Some or all of the files may also have been compressed by
other means prior to the backup.
■ Avoid using both tape compression and client compression, as this can
actually increase the amount of backed-up data.
■ Only in rare cases is it beneficial to use client (software) compression. For
very dense data, compression algorithms take a long time and often
increase the overall size of the image when compressing data that is already
compressed. In cases where the files are already compressed, the devices
should be pointed to native device drivers. In other cases, NetBackup client
compression should be turned off, and the hardware should handle the
compression.
■ On UNIX: client compression reduces the amount of data sent over the
network, but impacts the client. The NetBackup client configuration setting
MEGABYTES_OF_MEMORY may help client performance. It is undesirable
to compress files which are already compressed. If you find that this is
happening with your backups, refer to the NetBackup configuration option
COMPRESS_SUFFIX. Edit this setting through bpsetconfig.
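For example, a COMPRESS_SUFFIX entry might look like the following (the
suffix list is illustrative, and the piped form of bpsetconfig shown here should
be verified against the commands documentation for your release):
echo "COMPRESS_SUFFIX = .gz .Z .zip" | /usr/openv/netbackup/bin/admincmd/bpsetconfig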
NetBackup Java
For performance improvement, refer to the following sections in the NetBackup
System Administrator’s Guide for UNIX and Linux, Volume I: “Configuring the
NetBackup-Java Administration Console,” and the subsection “NetBackup-Java
Performance Improvement Hints.” In addition, the NetBackup Release Notes
may contain information about NetBackup Java performance.
Vault
Refer to the “Best Practices” chapter of the NetBackup Vault System
Administrator’s Guide.
Fast recovery with bare metal restore
Note: BMR requires the True image restore option. This option has implications
on the size of the NetBackup catalog. Refer to “Calculate the size of your
NetBackup catalog” on page 22 for more details.
■ Run the following bpbkar throughput test on the client with Windows:
C:\Veritas\Netbackup\bin\bpbkar32 -nocont path > NUL 2> filename
(for example, C:\Veritas\Netbackup\bin\bpbkar32 -nocont c:\
> NUL 2> temp.f)
■ When initially configuring the Windows server, optimize TCP/IP
throughput as opposed to shared file access.
■ Always select the choice of boosting background performance on Windows
versus foreground performance.
■ Turn off NetBackup Client Job Tracker if the client is a system server.
■ Regularly review the patch announcements for every server OS. Install
patches that affect TCP/IP functions, such as correcting out-of-sequence
delivery of packets.
FlashBackup
If using Advanced Client FlashBackup with a copy-on-write snapshot
method
If you are using the FlashBackup feature of Advanced Client with a
copy-on-write method such as nbu_snap, assign the snapshot cache device to a
separate hard drive. This will improve performance by reducing disk contention
and potential head thrashing due to the writing of data to maintain the
snapshot.
Note: Resizing the read buffer for incremental backups can result in a faster
backup in some cases, and a slower backup in others. The result depends on such
factors as the location of the data to be read, the size of the data to be read
relative to the size of the read buffer, and the read characteristics of the storage
device and the I/O stack. Experimentation may be necessary to achieve the best
setting.
■ On UNIX servers, increase the maximum heap size in the nomsrvctl file
(for example, MAX_HEAP=-Xmx2048m; the exact parameter name and file
location may vary by release). Save the nomsrvctl file. Then stop and
restart the NOM processes, as follows:
/opt/VRTSnom/bin/NOMAdmin -stop_service
/opt/VRTSnom/bin/NOMAdmin -start_service
■ On Windows servers, open the Registry Editor and go to the following
location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\
VRTSnomSrvr\Parameters
Increase the JVM Option Number 0 value. For example, enter -Xmx2048m
for a 2-gigabyte maximum heap size.
Then stop and restart the NOM services, as follows:
install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin.bat -stop_service
install_path\NetBackup Operations Manager\bin\admincmd\NOMAdmin.bat -start_service
Chapter 10
Tuning disk I/O performance
Note: The critical factors in performance are not software-based. They are
hardware selection and configuration. Hardware has roughly four times the
weight that software has in determining performance.
Hardware performance hierarchy
[Figure: hardware performance hierarchy — the host and its memory (level 5)
connect through PCI bridges and PCI buses (level 4) to the disk arrays below.]
In general, all data going to or coming from disk must pass through host
memory. In the following diagram, a dashed line shows the path that the data
takes through a media server.
[Figure: data path through a media server — data enters through an ethernet
PCI card, crosses the PCI bus and PCI bridge into host memory (level 5), then
travels back out through a PCI card (level 4), across fibre channel (level 3) to the
RAID controller (level 2) and the drive shelves (level 1), or to tape, ethernet, or
another non-disk device.]
The data moves up through the ethernet PCI card at the far right. The card sends
the data across the PCI bus and through the PCI bridge into host memory.
NetBackup then writes this data to the appropriate location. In a disk example,
the data passes through one or more PCI bridges, over one or more PCI buses,
through one or more PCI cards, across one or more fibre channels, and so on.
Sending data through more than one PCI card increases bandwidth by breaking
up the data into large chunks and sending a group of chunks at the same time to
multiple destinations. For example, a write of 1 MB could be split into 2 chunks
going to 2 different arrays at the same time. If the path to each array is x
bandwidth, the aggregate bandwidth will be approximately 2x.
Each level in the Performance Hierarchy diagram represents the transitions
over which data will flow. These transitions have bandwidth limits.
Between each level there are elements that can affect performance as well.
Larger disk arrays will have more than one internal FC-AL. Shelves may even
support 2 FC-AL so that there will be two paths between the RAID controller and
every shelf, which provides for redundancy and load balancing.
A typical host will support 2 or more PCI buses, with each bus supporting 1 or
more PCI cards. A bus has a topology similar to FC-AL in that only 2 endpoints
can be communicating at the same time. That is, if there are 4 cards plugged into
a PCI bus, only one of them can be communicating with the host at a given
instant. Multiple PCI buses are implemented to allow multiple data paths to be
communicating at the same time, resulting in aggregate bandwidth gains.
PCI buses have 2 key factors involved in bandwidth potential: the width of the
bus (32 or 64 bits) and the clock or cycle time of the bus (in MHz).
As a rule of thumb, a 32-bit bus can transfer 4 bytes per clock and a 64-bit bus
can transfer 8 bytes per clock. Most modern PCI buses support both 64-bit and
32-bit cards. Currently PCI buses are available in 4 clock rates:
■ 33 MHz
■ 66 MHz
■ 100 MHz (sometimes referred to as PCI-X)
■ 133 MHz (sometimes referred to as PCI-X)
PCI cards also come in different clock rate capabilities.
Backward compatibility is very common; for example, a bus rated at 100 MHz
will support 100, 66, and 33 MHz cards.
Likewise, a 64-bit bus will support both 32-bit and 64-bit cards.
They can also be mixed; for example, a 100-MHz, 64-bit bus can support any mix
of clock and width values at or below those maximums.
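As a worked example of the rule of thumb above (idealized peak rates;
real-world throughput is lower):
33 MHz x 4 bytes = 132 MB/second (32-bit bus)
66 MHz x 8 bytes = 528 MB/second (64-bit bus)
133 MHz x 8 bytes = 1064 MB/second (64-bit bus)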
You should also remember that a PCI bus is a unidirectional bus, which means
that when it is doing a transfer in one direction, it cannot move data in the other
direction, even from another card.
Example 1
A general hardware configuration could have dual 2-gigabit fibre channel ports
on a single PCI card. In such a case, the following is true:
■ Potential bandwidth is approximately 400 MB/second.
■ For maximum performance, the card must be plugged into at least a 66 MHz
PCI slot.
■ No other cards on that bus should need to transfer data at the same time.
That single card will saturate the PCI bus.
■ Putting 2 of these cards (4 ports total) onto the same bus and expecting
them to aggregate to 800 MB/second will never work unless the bus and
cards are 133 MHz.
Example 2
The following more detailed example shows a pyramid of bandwidth potentials
with aggregation capabilities at some points. Suppose you have the following
hardware:
■ 1x 66 MHz quad 1-gigabit ethernet
■ 4x 66 MHz 2-gigabit fibre channel
■ 4x disk array with 1-gigabit fibre channel port
■ 1x Sun V880 server (2x 33 MHz PCI buses and 1x 66 MHz PCI bus)
In this case, for maximum backup and restore throughput with clients on the
network, the following is one way to assemble the hardware so that no
constraints limit throughput.
■ The quad 1-gigabit ethernet card can do approximately 400 MB/second
throughput at 66 MHz.
■ It requires at least a 66 MHz bus, because putting it in a 33 MHz bus would
limit throughput to approximately 200 MB/second.
■ It will completely saturate the 66 MHz bus, so do not put any other cards on
that bus that need significant I/O at the same time.
Since the disk arrays have only 1-gigabit fibre channel ports, the fibre channel
cards will degrade to 1 gigabit each.
■ Each card can therefore move approximately 100 MB/second. With four
cards, the total is approximately 400 MB/second.
■ However, you do not have a single PCI bus available that can support that
400 MB/second, since the 66 MHz bus is already taken by the ethernet card.
■ There are two 33 MHz buses, each of which can support approximately 200
MB/second. Therefore, you can put 2 of the fibre channel cards on each of
the 2 buses.
This configuration can move approximately 400 MB/second for backup or
restore. Real-world results of a configuration like this show approximately 350
MB/second.
Tuning software for better performance
The optimum size of I/O operations depends on many factors and varies
greatly depending on the hardware setup.
Below is the performance hierarchy diagram, but in this version, each array only
has a single shelf.
[Figure: performance hierarchy with a single shelf per array — host memory
(level 5) connects through PCI bridges to Array 1 and Array 2, each containing a
single shelf of drives.]
You can implement software RAID-0 to make the two independent arrays
look like one logical device. RAID-0 is a plain stripe with no parity. Parity
protects against drive failure, and this configuration already has RAID-5
parity protecting the drives inside the array.
The software RAID-0 is configured for a stripe unit size of 512KB (the I/O
size of each unit) and a stripe width of 2 (1 for each of the arrays).
Since 1MB (the 512KB stripe unit size times the stripe width of 2) is the
optimum I/O size for the volume (the RAID-0 entity on the host), that size is
used throughout the rest of the I/O stack.
■ If possible, configure the file system mounted over the volume for 1MB. The
application performing I/O to the file system also uses an I/O size of 1MB. In
NetBackup, I/O sizes are set in the configuration touch file
.../db/config/SIZE_DATA_BUFFERS_DISK. See “Changing the size of
shared data buffers” on page 105 for more information.
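For this example, the following command (UNIX path) would set a 1MB disk
buffer size to match the volume's optimum I/O size; as with any buffer change,
test both backups and restores afterward:
echo "1048576" > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_DISK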
Chapter 11
OS-related tuning factors
This chapter provides OS-related tuning recommendations that can improve
NetBackup performance.
This chapter includes the following sections:
■ “Kernel tuning (UNIX)” on page 152
■ “Adjusting data buffer size (Windows)” on page 157
■ “Other Windows issues” on page 159
Kernel tuning (UNIX)
Note: Keep in mind that changing these parameters may affect other
applications that use the same parameters. Making sizeable changes to these
parameters may result in performance trade-offs. Usually, the best approach is
to make small changes and monitor the results.
Note: The parameters described in this section can be used on Solaris 8, 9, and
10. However, many of the following parameters are obsolete in Solaris 10. See
“Kernel parameters in Solaris 10” on page 154 for a list of the parameters now
obsolete in Solaris 10 and for further assistance with Solaris 10 parameters.
Below are brief definitions of the message queue, semaphore, and shared
memory parameters. The parameter definitions apply to a Solaris system. The
values for these parameters can be set in the file /etc/system.
■ Message queues
set msgsys:msginfo_msgmax = maximum message size
set msgsys:msginfo_msgmnb = maximum length of a message queue in
bytes. The length of the message queue is the sum of the lengths of all the
messages in the queue.
set msgsys:msginfo_msgmni = number of message queue identifiers
set msgsys:msginfo_msgtql = maximum number of outstanding messages
system-wide that are waiting to be read across all message queues.
■ Semaphores
set semsys:seminfo_semmap = number of entries in semaphore map
set semsys:seminfo_semmni = maximum number of semaphore identifiers
system-wide
set semsys:seminfo_semmns = number of semaphores system-wide
set semsys:seminfo_semmnu = maximum number of undo structures in
system
set semsys:seminfo_semmsl = maximum number of semaphores per id
Example:
This is an example of tuning the kernel parameters for NetBackup master
servers and media servers, for a Solaris 8 or 9 system. Symantec provides this
information only to assist in kernel tuning for NetBackup. See “Kernel
parameters in Solaris 10” on page 154 for Solaris 10.
These are recommended minimum values. If /etc/system already contains
any of these entries, use the larger of the existing setting and the setting
provided here. Before modifying /etc/system, use the command
/usr/sbin/sysdef -i to view the current kernel parameters.
After you have changed the settings in /etc/system, reboot the system to
allow the changed settings to take effect. After rebooting, the sysdef command
will display the new settings.
* BEGIN NetBackup recommended minimum settings for a Solaris /etc/system file
*Message queues
set msgsys:msginfo_msgmap=512
set msgsys:msginfo_msgmax=8192
set msgsys:msginfo_msgmnb=65536
set msgsys:msginfo_msgmni=256
set msgsys:msginfo_msgssz=16
set msgsys:msginfo_msgtql=512
set msgsys:msginfo_msgseg=8192
*Semaphores
set semsys:seminfo_semmap=64
set semsys:seminfo_semmni=1024
set semsys:seminfo_semmns=1024
set semsys:seminfo_semmnu=1024
set semsys:seminfo_semmsl=300
set semsys:seminfo_semopm=32
set semsys:seminfo_semume=64
*Shared memory
set shmsys:shminfo_shmmax=16777216
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=220
set shmsys:shminfo_shmseg=100
*END NetBackup recommended minimum settings
■ Socket Parameters on Solaris 8 and 9
The TCP_TIME_WAIT_INTERVAL parameter sets the amount of time to
wait after a TCP socket is closed before it can be used again. This is the time
that a TCP connection remains in the kernel's table after the connection has
been closed. The default value for most systems is 240000, which is 4
minutes (240 seconds) in milliseconds. If your server is slow because it
handles many connections, check the current value for
TCP_TIME_WAIT_INTERVAL and consider reducing it.
For Solaris or HP-UX, use the following command:
ndd -get /dev/tcp tcp_time_wait_interval
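If a change is warranted, the same utility can set the value. For example, to
reduce the interval to 60 seconds (60000 milliseconds — an illustrative value,
not a recommendation):
ndd -set /dev/tcp tcp_time_wait_interval 60000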
■ Force load parameters on Solaris 8 and 9
When system memory gets low, Solaris unloads unused drivers from
memory and reloads drivers as needed. Tape drivers are a frequent
candidate for unloading, since they tend to be less heavily used than disk
drivers. Depending on the timing of these unload and reload events for the
st (Sun), sg (Symantec), and Fibre Channel drivers, various issues may
result. These issues can range from devices “disappearing” from a SCSI bus
to system panics.
Symantec recommends adding the following “forceload” statements to the
/etc/system file. These statements prevent the st and sg drivers from
being unloaded from memory:
forceload: drv/st
forceload: drv/sg
Other statements may be necessary for various Fibre Channel drivers, such
as the following example for JNI:
forceload: drv/fcaw
The following are recommended minimum values for the corresponding
message queue, semaphore, and shared memory kernel parameters:

Parameter   Value
mesg        1
msgmap      514
msgmax      8192
msgmnb      65536
msgssz      8
msgseg      8192
msgtql      512
msgmni      256
sema        1
semmap      semmni+2
semmni      300
semmns      300
semmnu      300
semume      64
semvmx      32767
shmem       1
shmmni      300
shmseg      120
Set the desired parameters to their new values. Once all the values have been
changed, select Actions > Process New Kernel. A warning appears, informing
you that a reboot is required to move the values into place. After the reboot, the
sysdef command can be used to confirm that the correct values are in place.
Caution: Any changes to the kernel will require a reboot in order to move the
new kernel into place. Do not make changes to the parameters unless a system
reboot can be performed, or the changes will not be saved.
If this value is increased too far, it could result in damage to the HBA, the tape
drives, or any devices in between (fibre bridges and switches, for example).
encoding 159
traffic on network 97
transaction log file 47
transfer rate
    drive controllers 20
    for backups 17
    network 21
    required 19
    tape drives 18
True Image Restore option 23, 135
tuning
    basic suggestions 91
    buffer sizes 97, 99
    client performance 95
    data transfer path, overview 90
    device performance 126
    FlashBackup read buffer 136
    Linux kernel 157
    network performance 96
    restore performance 119, 124
    search performance 123
    server performance 102
    software 148
    Solaris kernel 152

U
Ultra-3 SCSI 21
Ultra320 SCSI 21
Unicode encoding 159
unified logging, viewing 50
Usenet news group 162
user-directed backup 77

V
Vault 134
verbosity level 135
Veritas-bu email list 161
viewing logs 50
virus scans 95, 135
vmstat 84
volDB 24
volume
    frozen 55
    pools 60
    suspended 55
vxlogview 50
    file ID 50
VxVM striped volumes 136

W
wait/delay counters 108, 109, 112
    analyzing problems 115
    correcting problems 118
    for local client backup 112
    for local client restore 114
    for remote client backup 113
    for remote client restore 114
wear of tape drives 126
webgui command 138
Wide Ultra 2 SCSI 21
wild cards in file lists 49
Windows Performance Monitor 86
Working Set, in memory 87