================================================================================

SunFire 15/25K domains


--------------------------------------------------------------------------------
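Set a non-default ssh escape character when connecting to the SC, so that typing
the default escape (~) does not drop you back to your original login window:

% ssh -e ^v^o sun-sc-server-name    # sets Control-O as the escape character

# (Control-V is typed first so that the shell inserts the following
# Control-O literally instead of acting on it)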

Entries required in /etc/inetd.conf on the domain for DR activities:

----
#entries for Sun DR
sun-dr stream tcp wait root /usr/lib/dcs dcs
sun-dr stream tcp6 wait root /usr/lib/dcs dcs
----

After editing the file, send inetd a SIGHUP so that it rereads it:

pkill -1 inetd

console logs on SC: /var/opt/SUNWSMS/adm/$DOMAIN-LETTER

platform logs on SC: /var/opt/SUNWSMS/adm/platform

failover SC

/opt/SUNWSMS/SMS1.5/bin/failover force

restart sms

/opt/SUNWSMS/SMS1.5/bin/failover stop
/etc/init.d/sms stop
/etc/init.d/sms start
/opt/SUNWSMS/SMS1.5/bin/failover start

reconfig with smsconfig:

setfailover off
/etc/init.d/sms stop
smsconfig -m
/etc/init.d/sms start
setfailover on
setfailover force
#must reboot each SC when this is completed
#may have to change /etc/hostname.dman0 on each domain & reconfig that NIC

#repeat on other SC

delete a board:

cfgadm -av (determine system names and attachment names)


cfgadm -c disconnect Ap_Id
cfgadm -x unassign Ap_Id (powers the board off, amber light will come on board)

add a board:

cfgadm -av (determine system names and attachment names)


cfgadm -x assign Ap_Id
cfgadm -c configure Ap_Id

To see whether a board cannot be moved because it holds permanent (kernel) memory: cfgadm -alv | grep permanent

You can check the system board status via:


cfgadm -av Ap_Id

From each OS instance, check the number of CPUs with "psrinfo" and the amount of memory with
"prtconf | head -5". On a 6800 SC, "showboards -p cpu" lists the CPUs on all boards, and
"showboards -d a -p cpu" lists the CPUs in domain a. Use "memory" in place of "cpu" in
either command to list memory instead.

somehost:SC> showboards -d a -p cpu

Component Description
--------- -----------
/N0/SB0/P0 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB0/P1 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB0/P2 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB0/P3 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB2/P0 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB2/P1 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB2/P2 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB2/P3 UltraSPARC-III, 750MHz, 8M ECache

somehost:SC> showboards -d c -p cpu

Component Description
--------- -----------
/N0/SB1/P0 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB1/P1 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB1/P2 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB1/P3 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB3/P0 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB3/P1 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB3/P2 UltraSPARC-III, 750MHz, 8M ECache
/N0/SB3/P3 UltraSPARC-III, 750MHz, 8M ECache

somehost:SC> showboards -d a -p memory

Component Size Reason
--------- ---- ------
/N0/SB0 4096 MB
/N0/SB2 4096 MB

somehost:SC> showboards -d c -p memory

Component Size Reason
--------- ---- ------
/N0/SB1 4096 MB
/N0/SB3 4096 MB
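
To total the memory reported for a domain, the size column of the memory listing can be summed. The following is a minimal sketch, assuming the "showboards -d <domain> -p memory" output has been captured as text in the format shown above; the function name total_memory_mb is hypothetical and not part of any Sun tool.

```shell
# Sum the "Size" column (in MB) of captured `showboards -d <domain> -p memory`
# output. Assumes data lines look like "/N0/SB0 4096 MB", as in the samples above.
total_memory_mb() {
    awk '$1 ~ /^\/N[0-9]+\/SB[0-9]+$/ && $3 == "MB" { sum += $2 } END { print sum + 0 }'
}

# Example using the domain "a" listing shown above.
printf '%s\n' \
    'Component Size Reason' \
    '--------- ---- ------' \
    '/N0/SB0 4096 MB' \
    '/N0/SB2 4096 MB' | total_memory_mb
```

Header and separator lines are skipped because they do not match the board path pattern.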

showplatform

showboards -v -d A
showboards -d A

moveboard -d (todomain) -r 2 -t 300 SB#

deleteboard -r 2 -t 300 SB#


addboard -d (todomain) -r 2 -t 300 SB#

--------------------------------------------------------------------------------
poweron EX0 (May already be on)
poweron SB0
flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash SB0

You can run several flashupdate commands at the same time from separate windows to
reduce the total time (about 7 minutes per board on average). They all load the
system controller, however, and the SC is the choke point, so do not run too many
at once.
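
One way to keep the number of simultaneous flashes bounded is to start them in small batches and wait for each batch to finish. The sketch below is only an illustration: the names RUN, FLASH_IMAGE, and flash_in_batches are hypothetical, the batch size of 2 is an arbitrary assumption, and -y is passed because a background job cannot answer a confirmation prompt. By default it only prints the commands (dry run).

```shell
# Flash boards in small batches so the SC is not overloaded.
# RUN defaults to "echo" (dry run); export RUN= (empty) to execute for real.
RUN=${RUN-echo}
FLASH_IMAGE=/opt/SUNWSMS/hostobjs/sgcpu.flash

flash_in_batches() {
    batch_size=$1; shift
    count=0
    for board in "$@"; do
        $RUN flashupdate -f "$FLASH_IMAGE" -y "$board" &
        count=$((count + 1))
        if [ $((count % batch_size)) -eq 0 ]; then
            wait    # let the current batch finish before starting the next
        fi
    done
    wait            # wait for any remaining partial batch
}

flash_in_batches 2 SB0 SB1 SB2 SB3
```

Review the printed commands before switching off the dry run, since flashupdate rewrites board firmware.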

As system boards become available, the domains need to be reconfigured as follows:

server1: SB0 SB1 SB2 SB3
server2: SB4 SB5 SB6 SB7 SB8 SB9 SB10
server3: SB11 SB12 SB13 SB14 SB15 SB16 SB17

Reconfigure server:

As the boards are replaced, reconfigure server to have only 4 SBs and bring
that domain up first.

On the SC, run:

deleteboard SB4 SB5 SB6 SB7 SB8 SB9 SB10 SB11 SB12 SB13 SB14 SB15 SB16 SB17
This will leave SB0 SB1 SB2 and SB3 in the server domain.
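
Long board lists are easy to mistype. As a sketch, a small helper can generate the list from a numeric range; the function name board_range is hypothetical, and the example only echoes the command instead of running it.

```shell
# Print "SB<first> ... SB<last>" so long board lists need not be typed by hand.
board_range() {
    first=$1
    last=$2
    list=
    i=$first
    while [ "$i" -le "$last" ]; do
        list="${list:+$list }SB$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Dry run: show the command rather than executing it.
echo deleteboard $(board_range 4 17)
```

Remove the leading echo to run the command on the SC once the printed list looks right.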

Then:

cd ~/extended    # Note: this directory has a .postrc file set to level 32
setkeyswitch -d server on

This will run a level 32 POST on the boards and boot the domain.

After the system comes up, it can be turned over to the DBAs.

Reconfigure server:

On the SC, run the following commands after the SBs have been flashed.

addboard -d server -y -c configure SB4 SB5 SB6 SB7 SB8 SB9 SB10
cd ~/extended
setkeyswitch -d server on

Reconfigure server:

On the SC, run the following commands after the SBs have been flashed.

addboard -d server -y -c configure SB11 SB12 SB13 SB14 SB15 SB16 SB17
cd ~/extended
setkeyswitch -d server on

Other info needed:

login as root and then:

su - sms-svc

To get to a console of a domain.

console -d server

To shutdown a domain:

from the domain:

init 0

Then from the SC:

setkeyswitch -d server off

To power a single board on

poweron SB0

Note: you may have to power on the expander board before the SB (poweron EX0).

================================================================================
----------------------------------------------------------------------
MAN Configuration Files
----------------------------------------------------------------------

/etc/opt/SUNWSMS/config/MAN.cf

-This is the MAN configuration file.
-Located on both SCs.
-Read by mand when entering MAIN SC mode.
-This file contains all the IP addresses for the SC, domain, and external
community interfaces.
-This file is created by smsconfig -m during the MAN Network
configuration stage at install.
-This file is unique to each SC and should not be copied from one SC to
another.

DO NOT edit this file by hand. Use smsconfig to change the
configuration. See the smsconfig man page for details.

Example (from a CP1500-based SC called "sc0" on a platform called "15k"):

# /etc/opt/SUNWSMS/config/MAN.cf
#
15k
C SC-FLOATER-C1 sc 192.68.66.172
C SC-TEST-hme0-C1 sc0-hme0
C SC-TEST-eri1-C1 sc0-eri1
I1 NM-I1 netmask-i1 255.255.255.224
I1 SC-I1 15k-sc-i1 13.1.1.1
I1 DA-I1 15k-a 13.1.1.2
I1 DB-I1 15k-b 13.1.1.3
I1 DC-I1 15k-c 13.1.1.4
I1 DD-I1 15k-d 13.1.1.5
I1 DE-I1 15k-e 13.1.1.6
I1 DF-I1 15k-f 13.1.1.7
I1 DG-I1 15k-g 13.1.1.8
I1 DH-I1 15k-h 13.1.1.9
I1 DI-I1 15k-i 13.1.1.10
I1 DJ-I1 15k-j 13.1.1.11
I1 DK-I1 15k-k 13.1.1.12
I1 DL-I1 15k-l 13.1.1.13
I1 DM-I1 15k-m 13.1.1.14
I1 DN-I1 15k-n 13.1.1.15
I1 DO-I1 15k-o 13.1.1.16
I1 DP-I1 15k-p 13.1.1.17
I1 DQ-I1 15k-q 13.1.1.18
I1 DR-I1 15k-r 13.1.1.19
I2 NM-I2 netmask-i2 255.255.255.252
I2 SC0-I2 sc0-i2 10.2.1.1
I2 SC1-I2 sc1-i2 10.2.1.2
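
Because MAN.cf must not be edited by hand, reading it is best done with a read-only filter. The following is a minimal sketch that lists the I1 hostname and address of each domain, assuming the four-column layout shown in the example above; the function name man_i1_domains is hypothetical.

```shell
# Print "hostname address" for each domain entry (DA-I1 .. DR-I1) on the I1
# network. Reads the named file, or standard input if no file is given.
man_i1_domains() {
    awk '$1 == "I1" && $2 ~ /^D[A-R]-I1$/ { print $3, $4 }' "$@"
}

# Example with a few lines in the MAN.cf format shown above.
printf '%s\n' \
    'I1 NM-I1 netmask-i1 255.255.255.224' \
    'I1 SC-I1 15k-sc-i1 13.1.1.1' \
    'I1 DA-I1 15k-a 13.1.1.2' \
    'I1 DB-I1 15k-b 13.1.1.3' | man_i1_domains
```

On an SC the same function could be pointed at the real file, for example man_i1_domains /etc/opt/SUNWSMS/config/MAN.cf. The netmask and SC lines are skipped because their tags do not match the domain pattern.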

/var/opt/SUNWSMS/data/<domain>/idprom.image

-This file contains the Ethernet address of the domain.
-Located on the SCs.
-This file is referenced by mand when creating the IOSRAM handoff
structure (end of HPOST).
-To view this file, you must run sysid -d <DomainID> on the SC.
-Back these files up. If lost, Sun Support will have to recreate the
files for you, or walk you through the process.

Example:

# sysid -d A
IDPROM in /var/opt/SUNWSMS/data/A/idprom.image for domain A
Format = 0x01
Machine Type = 0x82
Ethernet Address = 0:0:ae:a6:4a:42
Manufacturing Date = Wed Jun 27 14:58:00 MDT 2001
Serial number (Host ID) = 0xa64a42 (11029056)
Checksum = 0xac

/var/opt/SUNWSMS/doors/mand

-This is the door for mand.
-Located on the SCs.
-This allows other processes to invoke mand methods, for example fomd
instructing mand to enter MAIN mode.
-Note that this is not a regular file. It is a door, a file system object
that allows interprocess communication.

Example:
# file /var/opt/SUNWSMS/doors/mand
/var/opt/SUNWSMS/doors/mand: door to mand[18356]

/etc/hosts

-This is the standard Solaris[TM] hosts listing.
-Located on SCs and domains in the platform.
-It contains certain host entries specific to the MAN networks.

Example (from same SC as shown in the MAN.cf example above):

cat /etc/hosts
#
# Internet host table
#
127.0.0.1 localhost
192.68.66.148 sc0 loghost
13.1.1.2 15k-a #smsconfig-entry#
13.1.1.3 15k-b #smsconfig-entry#
13.1.1.4 15k-c #smsconfig-entry#
13.1.1.5 15k-d #smsconfig-entry#
13.1.1.6 15k-e #smsconfig-entry#
13.1.1.7 15k-f #smsconfig-entry#
13.1.1.8 15k-g #smsconfig-entry#
13.1.1.9 15k-h #smsconfig-entry#
13.1.1.10 15k-i #smsconfig-entry#
13.1.1.11 15k-j #smsconfig-entry#
13.1.1.12 15k-k #smsconfig-entry#
13.1.1.13 15k-l #smsconfig-entry#
13.1.1.14 15k-m #smsconfig-entry#
13.1.1.15 15k-n #smsconfig-entry#
13.1.1.16 15k-o #smsconfig-entry#
13.1.1.17 15k-p #smsconfig-entry#
13.1.1.18 15k-q #smsconfig-entry#
13.1.1.19 15k-r #smsconfig-entry#
192.68.66.172 sc #smsconfig-entry#
13.1.1.1 sc-i1 #smsconfig-entry#
192.68.66.250 sc0-C1-failover #smsconfig-entry#
192.68.66.170 sc0-eri1 #smsconfig-entry#
192.68.66.171 sc0-hme0 #smsconfig-entry#
10.2.1.1 sc0-i2 #smsconfig-entry#
10.2.1.2 sc1-i2 #smsconfig-entry#

/etc/hostname.*

-These files come in dman0, scman0, and scman1 varieties.
-scman0 and scman1 are located on the SCs; dman0 is located on domains only.
-They are referenced by drivers when the interfaces are plumbed during
the boot process.

Example from an SC:


# cat /etc/hostname.hme0
sc0
# cat /etc/hostname.scman0
13.1.1.1 netmask + private up
# cat /etc/hostname.scman1
10.2.1.1 netmask + private up

Example from a Domain:


# cat /etc/hostname.dman0
13.1.1.2 netmask + broadcast + private up

/etc/ipnodes
-This is the standard Solaris hosts listing which contains relevant IPv6
hosts (if used).
-Located on SCs and domains where appropriate.

/etc/netmasks

-This is the standard Solaris netmasks settings file.
-Located on system controllers and domains.
-It contains netmasks relevant to the MAN Networks.

Example:

# cat netmasks
#
# The netmasks file associates Internet Protocol (IP) address
# masks with IP network numbers.
#
# network-number netmask
#
# The term network-number refers to a number obtained from the Internet Network
# Information Center. Currently this number is restricted to being a class
# A, B, or C network number. In the future we should be able to support
# arbitrary network numbers per the Classless Internet Domain Routing
# guidelines.
#
# Both the network-number and the netmasks are specified in
# "decimal dot" notation, e.g.:
#
# 128.32.0.0 255.255.255.0
#
192.168.66.0 255.255.255.0
192.168.103.0 255.255.255.224
192.168.103.32 255.255.255.252
13.1.1.0 255.255.255.224
10.2.1.0 255.255.255.252
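
The two MAN netmasks can be sanity-checked with a little arithmetic. The I1 network has to hold the SC plus up to 18 domains (19 addresses), and the I2 network only the two SCs. The sketch below converts a dotted-decimal netmask to a prefix length and a usable host count; the function names mask_to_prefix and usable_hosts are hypothetical, and a valid contiguous netmask is assumed.

```shell
# Convert a dotted-decimal netmask to a prefix length by counting its 1 bits.
mask_to_prefix() {
    bits=0
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the mask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        while [ "$octet" -gt 0 ]; do
            bits=$((bits + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    echo "$bits"
}

# Usable host addresses = 2^(32 - prefix) - 2 (network and broadcast excluded).
usable_hosts() {
    prefix=$(mask_to_prefix "$1")
    echo $(( (1 << (32 - prefix)) - 2 ))
}

usable_hosts 255.255.255.224    # I1 network mask
usable_hosts 255.255.255.252    # I2 network mask
```

This prints 30 for the I1 mask (a /27, enough for the SC and 18 domains) and 2 for the I2 mask (a /30, exactly the two SCs).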

Note also that if no default router is configured, IPMP takes longer to
start up properly. As a result, IPMP negotiation may be slow enough that SMS
experiences difficulties during startup.
References:
- Infodoc 82102 "Sun Fire[TM] 12K/15K/E20K/E25K: Supported Ethernet settings
for System Controllers"
- Infodoc 73002 "Sun Fire[TM] 12K/15K: MAN Interface Mapping"
- SRDB 48123 "Sun Fire[TM] 12K/15K: Troubleshooting the I1 MAN Network"
- Troubleshooting Article 72578 "Sun Fire[TM] 12K/15K: Troubleshooting MAN I2
Network."
- SRDB 48144 "Sun Fire[TM] 15K: IDPROM layout for OpenBoot[TM] PROM failed"
This shows how to recreate lost/corrupted idprom.image files, as does the site
http://has.central.sun.com/starcat/15kinfo/faq/recreate_idprom.image.html.

================================================================================
15/25k domain commands

frame1-sc0:sms-svc:3> help

usage:
addboard -d domain_indicator [-c function] [-r retry_count [-t timeout]]
[-q] [-f] [-y|-n] location...
addboard -h

usage:
addcodlicense <license-signature>
addcodlicense -h

usage:
addtag -d domain_indicator [-q] [-y|-n] new_tag
addtag -h

usage:
cancelcmdsync cmdsync_descriptor
cancelcmdsync -h

usage:
/opt/SUNWSMS/SMS1.5/bin/console -d domain_indicator [[-f] | [-l] | [-g] | [-r]] [-e escapeChar]
/opt/SUNWSMS/SMS1.5/bin/console -h

usage:
deleteboard [-c function] [-r retry_count [-t timeout]] [-q] [-f] [-y|-n] location...
deleteboard -h

usage:
deletecodlicense [-f] <license-signature>
deletecodlicense -h

usage:
deletetag -d domain_indicator [-q] [-y|-n]
deletetag -h

usage:
disablecomponent [-d domain_indicator] [-i "reason"]
location...
disablecomponent [-h]

usage:
enablecodboard [-y|-n] [location]
enablecodboard -h

usage:
enablecomponent [-a | -d domain_indicator] location...
enablecomponent [-h]

usage:
flashupdate -d domain_indicator -f path [-q|-v] [-y|-n]
flashupdate -f path [-q|-v] [-y|-n] location...
flashupdate -h

path : Image file name.
location: slotID[/fp[0|1]]

usage:
help [command_name]
help -h

usage:
initcmdsync script_name [parameters]
initcmdsync -h

usage:
moveboard -d domain_indicator [-c function] [-r retry_count [-t timeout]]
[-q] [-f] [-y|-n] location
moveboard -h

usage:
poweroff [-q] [-y|-n] [location]
poweroff -h

usage:
poweron [-q] [-y|-n] [location]
poweron -h

Usage:
rcfgadm -d domain_indicator [-f] [-y|-n] [-v] [-o hardware_options]
-c function [-r retry_count [-T timeout] ] ap_id...
rcfgadm -d domain_indicator [-f] [-y|-n] [-v] [-o hardware_options]
-x hardware_function ap_id...
rcfgadm -d domain_indicator [-v] [-a] [-s listing_options]
[-o hardware_options] [-l [ap_id|ap_type...]]
rcfgadm -d domain_indicator [-v] [-o hardware_options] -t ap_id...
rcfgadm -d domain_indicator [-v] [-o hardware_options] -h
[ap_id|ap_type...]

usage:
reset -d domain_indicator[,domain_indicator]...
[-d domain_indicator[,domain_indicator]...]... [-q] [-y | -n] [-x]
reset -h

usage:
resetsc [-q] [-y|-n]
resetsc -h

usage:
runcmdsync script_name [parameters]
runcmdsync -h

usage:
savecmdsync -M identifier cmdsync_descriptor
savecmdsync -h

usage:
setbus [-q] [-y|-n] -c cs0|cs1|cs0,cs1 [-b buses] [location...]
setbus -h

usage:
setdatasync [-i interval] schedule filename
setdatasync push filename
setdatasync cancel filename
setdatasync backup
setdatasync -h

usage:
setdate [-d domain_indicator] [-u] [-q] [mmdd]HHMM|mmddHHMM[cc]yy[.SS]
setdate -h

usage:
setdefaults [-d domain_indicator [-p]] [-y|n]
setdefaults -h

usage:
setfailover on|off|force
setfailover -h

usage:
setkeyswitch -d domain_indicator [-q] [-y|-n]
position
setkeyswitch -h

usage:
setobpparams -d domain_indicator param=value...
setobpparams -h

usage:
setupplatform
setupplatform -p available [-d domain_indicator [-a|-r] location...]
setupplatform -p cod [headroom | -d domain_indicator domainRTU]
setupplatform [-d domain_indicator -]
setupplatform -h

usage:
showboards [-d domain_indicator] [-v]
showboards [-d domain_indicator] -c
showboards -h

usage:
showbus [-v]
showbus -h

usage:
showcmdsync [-v]
showcmdsync -h

usage:
showcodlicense [-r] [-v]
showcodlicense -h

usage:
showcodusage [-v] [-p <resource|domains>]
showcodusage -h

usage:
showcomponent [-a | -d domain_indicator] [-v] [location...]
showcomponent [-h]

usage:
showdatasync [-l|-Q] [-v]
showdatasync -h

usage:
showdate [-d domain_indicator] [-u] [-v]
showdate -h

usage:
showdevices [-v] [-p bydevice|byboard|query|force] location...
showdevices [-v] [-p bydevice|byboard]
-d domain_indicator
showdevices -h

usage:
showenvironment [-d domain_indicator[,domain_indicator]...]... [-p
temps|volts|currents|fans|powers[,temps|volts|currents|fans|powers]...]...
[-v]
showenvironment [-d domain_indicator[,domain_indicator]...]... [-p faults]
[-v]
showenvironment -h

usage:
showfailover [-r] [-v]
showfailover -h

usage:
showkeyswitch -d domain_indicator [-v]
showkeyswitch -h

usage:
showlogs [ -F ] [ -f filename ] [ -d domain_indicator ] [ -p m|c|s ] [ -v ]
showlogs [ -F ] [ -f filename ] [ -d domain_indicator ] [ -E ]
[ -p e [ <event_class> | list | ereport | ena0x<ena_value>
| uuid<uuid_value> | <event_code> ] [ <number> ] ]
showlogs -h

usage:
showobpparams -d domain_indicator [-v]
showobpparams -h

Usage:
showplatform [-d domain_indicator] [-p report] [-v]
showplatform -h

usage:
showxirstate -d domain_indicator [-v]
showxirstate -f filename [-v]
showxirstate -h

Back up the SMS environment
Usage:
smsbackup directory_name
smsbackup -h

Configure the SMS environment
Usage:
smsconfig -m
smsconfig -m I1 [ <domain_id> | sc | netmask ]
smsconfig -m I2 [ sc0 | sc1 | netmask ]
smsconfig -m L
smsconfig -g
smsconfig -a -u <username> -G <platform_role> platform
smsconfig -r -u <username> -G <platform_role> platform
smsconfig -a -u <username> -G <domain_role> <domain_id>
smsconfig -r -u <username> -G <domain_role> <domain_id>
smsconfig -s <security_option> *DEPRECATED*
smsconfig -l <domain_id>
smsconfig -l platform
smsconfig -v
smsconfig -h

usage:
smsconnectsc [-y|-n]
smsconnectsc -h

Restore up the SMS environment
Usage:
smsrestore filename
smsrestore -h

Change the active SMS version
Usage:
smsversion new_version
smsversion -t
smsversion -h

================================================================================
F6800 Configuration Recommendations
--------------------------------------------------------------------------------

Purpose

The purpose of this document is to examine Sun Fire Midframe servers and present
recommendations for exploiting their capabilities and avoiding issues that cause down time.

Assumptions

This document discusses a number of alternatives and makes recommendations based on two
priorities.

The first priority is to make decisions that improve the Reliability, Availability, and
Serviceability (RAS) of your platforms.

The primary goal of system administration is to provide stable, available platforms and services
so users can accomplish work in a predictable manner. It is necessary to choose options that
improve availability, thereby avoiding or reducing periods of non-availability due to planned or
unplanned events.

The second priority is to optimize performance. Only after achieving a high degree of RAS
should considerations turn to improving system and application performance.

Dynamic Reconfiguration

A core function of the Solaris Operating Environment, Dynamic Reconfiguration (DR) enables
you to safely add and remove CPU/Memory boards and I/O assemblies while the system is
operating. DR controls the software aspects of dynamically changing the hardware used by a
domain with minimal disruption to user processes running in the domain.

You can use DR to do the following:

* Shorten the interruption to system applications while installing or removing a board.

* Disable a failing device by removing it from the logical configuration before its
  failure can cause an operating system outage.

* Display the operational status of boards within the system.

* Initiate testing of a board within a domain while the domain continues to run.

* Add or remove resources in a domain while the domain continues to run.

* Invoke hardware specific functions of a board or related attachment.

In mission or business critical environments where availability is one of the most important
criteria, the real value of DR is the ability to perform numerous maintenance activities without
the need for downtime.

Memory Utilization and Configuration


On the Sun Fire Midframe server, memory controllers are implemented on the UltraSPARC III
CPU and the memory is directly associated with the CPU.

When populating a Uniboard or server with memory, the following rules and recommendations
should be noted:

* A bank needs to be completely filled with the same size DIMM.

* Memory can only be placed into a bank associated with a CPU.

* Do not fill the second bank of a CPU until all of the other CPUs on that board have had
  their first banks filled.

* Within a domain, do not fill the second bank of a CPU until all of the other system
  boards have had all of their first banks filled.

* Wherever possible, use only the same size DIMM to fill all of the banks on a system
  board.

* Wherever possible, populate all system boards within a domain with the same amount of
  memory.

* Ensure that the interleave-scope is set to within-board and interleave-mode is set to
  optimal using the setupdomain command.

* When constructing domains using system boards of unequal memory size, always define
  the domain from smallest installed memory to largest memory.

  For example, if SB0 contains 8G of memory and SB2 and SB4 each contain 4G, issue the
  following command to define the domain: addboard -d A SB2 SB4 SB0. This ensures
  that SB2 is assigned as slice 0 and contains the Solaris Operating Environment
  when the domain is booted, until the slice is reassigned to another board as
  a result of a DR operation.
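
The smallest-to-largest ordering rule can be applied mechanically. The following sketch reads "board size" pairs and prints the addboard command with the boards sorted by installed memory; the function name addboard_by_memory and the input format are assumptions made for illustration, and the command is only printed, not executed.

```shell
# Read "board size_gb" lines on standard input and print an addboard command
# that lists the boards from smallest to largest installed memory.
# Ties are broken by board name so the result is deterministic.
addboard_by_memory() {
    domain=$1
    boards=$(LC_ALL=C sort -k2,2n -k1,1 | awk '{ printf " %s", $1 }')
    echo "addboard -d $domain$boards"
}

# The example from the text: SB0 has 8G, SB2 and SB4 have 4G each.
printf '%s\n' 'SB0 8' 'SB2 4' 'SB4 4' | addboard_by_memory A
```

For the example above this prints addboard -d A SB2 SB4 SB0, matching the order given in the text.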

Memory within a system board is either permanent or non-permanent. Each is handled in a
different manner when performing DR operations.

Permanent memory is memory which is non-pageable, such as the kernel or OpenBoot PROM.
All other memory is referred to as non-permanent memory.

During DR operations involving memory, all memory must be unconfigured from the affected
system board. In the case of non-permanent memory, the memory can simply be flushed back to
disk, moved to another memory location, or swapped out. Permanent memory, on the other
hand, must be moved to another board of the same total memory size or larger using a
copy-rename technique. The reason for the different mechanism is that the permanent memory
cannot be swapped out since it contains critical kernel structures that control the operation
of the Solaris Operating Environment.

Frame Configuration

The Sun Fire 6800 Midframe server is capable of being configured into multiple Segments
(partitions) and Domains. A domain is the logical subset of system resources running an
independent copy of the Solaris Operating Environment. Additionally, the power distribution
within the platform lends itself to another form of partitioning.

A segment or partition is established by the system controller for the purpose of logically
isolating connections between segments such that the failure of a domain in one segment should
not normally affect a domain running in the other segment.

While it is possible for each segment to contain two domains, the isolation of errors is the
primary reason that segmentation is recommended for multiple domain configurations. An
exception to this rule is the use of a temporary domain to facilitate DR operations.

Segmentation is not, however, capable of preventing all failures in one segment from affecting
domains in another segment. For example, the failure of a centerplane may not be contained to a
single segment.

When segmenting a 6800, it should also be noted that each segment is assigned a pair of
Fireplane Repeater Boards. A minimum of two Fireplane Repeater Boards are required for the
operation of a domain. Failure of a repeater board will cause failure of any domain within that
segment until it is replaced.

If DR operations are to be supported on a 6800 with one domain, our recommendation is the
system be kept in single-segment mode and that the Solaris Operating Environment be loaded on
domain A. This leaves domain B available for performing DR and POST testing as required for
the DR of I/O assemblies. If two domains are required, configure the box into a dual-segment
mode and load Solaris onto Domains A and C. This will leave domains B and D available for
use in DR operations.

Sun Fire 6800 Single-Segment Domains

RP0   RP1   RP2   RP3
      Segment 0
      Domain A
      Domain B

Sun Fire 6800 Dual-Segment Domains

RP0   RP1     RP2   RP3
Segment 0     Segment 1
Domain A      Domain C
Domain B      Domain D

The Sun Fire 6800 Midframe server differs from all other Sun Fire models in that it has two
separate internal power grids, each supplied from a different RTU.

To ensure the loss of a power feed does not cause the loss of a domain, it is essential that
domains be constructed with all of the devices powered from the same grid (RTU). The boards
within the 6800 are separated as follows:

Power Grid 0    Power Grid 1
------------    ------------
SB0             SB1
SB2             SB3
SB4             SB5
IB6             IB7
IB8             IB9
RP0             RP2
RP1             RP3
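
The grid assignment above lends itself to a quick check before a domain is defined. As a sketch, the helper below maps each board to its power grid per the table and reports whether a proposed set of boards shares one grid; the function names power_grid and same_grid are hypothetical.

```shell
# Map a Sun Fire 6800 board name to its power grid, per the table above.
power_grid() {
    case $1 in
        SB0|SB2|SB4|IB6|IB8|RP0|RP1) echo 0 ;;
        SB1|SB3|SB5|IB7|IB9|RP2|RP3) echo 1 ;;
        *) echo unknown ;;
    esac
}

# Succeed only if every board given sits on the same (known) power grid.
same_grid() {
    first=$(power_grid "$1")
    [ "$first" != unknown ] || return 1
    for board in "$@"; do
        [ "$(power_grid "$board")" = "$first" ] || return 1
    done
    return 0
}

same_grid SB0 SB2 IB6 RP0 RP1 && echo "single grid"
same_grid SB0 SB1 IB6 || echo "mixed grids"
```

A domain that passes this check survives the loss of the other grid's power feed, which is the property the text asks for.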

================================================================================
Document Audience: SPECTRUM
Document ID: 51772
Title: Sun Fire[TM] 12K/15K/E20K/E25K: Remote Dynamic Reconfiguration (DR)
generates "DCA/DCS Communication Error" and showdevices is Unable to get
device information from domain.
Copyright Notice: Copyright © 2006 Sun Microsystems, Inc. All Rights
Reserved
Update Date: Tue Sep 19 00:00:00 MDT 2006
Products: Sun Fire 15K Server, Sun Fire 12K Server, Sun Fire E25K
Server, Sun Fire E20K Server
Technical Areas: System Management

Keyword(s): starcat, 12k, 15k, dca, dcs, dr, DCA/DCS communication error,
E20K, E25K, Unable to get device information from domain, rcfgadm,
showdevices, console

Problem Statement

The rcfgadm or showdevices commands generate errors from the system
controller (SC). The error message might be "DCA/DCS Communication Error"
when executing these commands. The command showdevices might generate the
following error (where x is the domain ID):

# showdevices -v -d x
Unable to get device information from domain x

This showdevices error could also be seen in Explorer data from the Main
System Controller (SC). The file, showdevices_-v_-d_x.out, which is in the
/explorer/sf15k/<Domain_ID> directory of Explorer will show the same "Unable
to get device information from domain x" error message.

The following messages might also be logged in the platform log file on the
SC ($SMSVAR/adm/platform/messages):

Jul 12 14:07:31 2005 xcat-sc0 showdevices[7496]: [0 2197706244444996 ERR ri_init.cc 85] rcfgaRequestProxy->ri_init failed. Status= 4315
Jul 12 14:07:31 2005 xcat-sc0 showdevices[7496]: [4509 2197706254650079 ERR RcfgaCallback.cc 521] server accept failed. RcfgaCallback::serverAccept: failed in ioctl domain id = I

Resolution

To resolve the problem:

* ensure that the network interfaces are properly configured and running,
* verify that relevant parameters are not commented out of key files,
* verify that the appropriate daemons are running.

scman0 and dman0

The dca <> dcs handshaking takes place over the I1 network. This means that
scman0 on the SC and dman0 on the domain must be configured and running
properly. This is often overlooked, so be sure to verify this information
with the following command:

# ifconfig -a

On SC:

scman0: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 3
        inet 10.10.1.1 netmask ffffffe0 broadcast 10.10.1.31

On domain:

dman0: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 index 3
        inet 10.10.1.3 netmask ffffffe0 broadcast 10.10.1.31
        ether 0:0:be:a8:17:57

Note that the IP addresses and netmasks on the dman0 and scman0 interfaces
should match the information stored in the /etc/opt/SUNWSMS/config/MAN.cf file
on the SC.

This should be further confirmed by running the following command on the
domain:

# ndd /dev/dman man_get_hostinfo

manc_magic = 0x4d414e43
manc_version = 01
manc_csum = 0x0
manc_ip_type = AF_INET
manc_dom_ipaddr = 10.10.1.3
manc_dom_ip_netmask = 255.255.255.224
manc_dom_ip_netnum = 10.10.1.0
manc_sc_ipaddr = 10.10.1.1
manc_dom_eaddr = 0:0:be:a8:17:57
manc_sc_eaddr = 8:0:20:fa:5f:1a
manc_iob_bitmap = 0xa0 io boards = 5.1, 7.1,
manc_golden_iob = 5
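
When comparing these values, note that ifconfig prints the netmask in hexadecimal (ffffffe0) while ndd and MAN.cf use dotted decimal (255.255.255.224). A minimal sketch of the conversion follows; the function name hex_mask_to_dotted is hypothetical.

```shell
# Convert the hex netmask printed by ifconfig (for example ffffffe0) to the
# dotted-decimal form used by ndd and MAN.cf.
hex_mask_to_dotted() {
    mask=$((0x$1))
    echo "$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"
}

hex_mask_to_dotted ffffffe0
```

For ffffffe0 this prints 255.255.255.224, which is the manc_dom_ip_netmask value shown above.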

Domain Configuration Agent (DCA)

The Domain Configuration Agent (DCA) daemon runs on the SC, one per domain.
Similar to a netcon session on a Sun Enterprise[TM] 10000 server, the DCA
provides communication between the SC and the Domain Configuration
Server (DCS) on the specified domain. If DCA is not running, the showdevices
and the rcfgadm commands fail.

To verify that DCA is running, issue the following command on the SC:

# ps -ef | grep dca
sms-dca 1614 361 0 Feb 26 ? 0:00 dca -d A
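
A plain "grep dca" also matches the grep process itself and does not show at a glance which domains are covered. As a sketch, the filter below extracts the domain IDs from captured "ps -ef" output for a per-domain SMS daemon such as dca or dxs; the function name domains_with_daemon is hypothetical, and the "<daemon> -d <ID>" argument layout is assumed from the sample output above.

```shell
# From `ps -ef` output on standard input, print the domain IDs that have the
# named per-domain daemon running (command line "<name> -d <ID>").
domains_with_daemon() {
    awk -v name="$1" '{
        for (i = 1; i < NF - 1; i++)
            if ($i == name && $(i + 1) == "-d") print $(i + 2)
    }' | LC_ALL=C sort -u
}

# Example with sample lines in the format shown above.
printf '%s\n' \
    'sms-dca 1614 361 0 Feb 26 ? 0:00 dca -d A' \
    'sms-dca 1620 361 0 Feb 26 ? 0:00 dca -d C' \
    'root 2001 1990 0 10:01:02 pts/1 0:00 grep dca' | domains_with_daemon dca
```

On an SC this would be used as ps -ef | domains_with_daemon dca. On Solaris, awk -v needs nawk or /usr/xpg4/bin/awk rather than the legacy /usr/bin/awk.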

Domain Configuration Server (DCS)

DCS is a domain daemon process that supports remote dynamic reconfiguration.
DCS must also be running on the domain in order for the showdevices or
rcfgadm commands to work on the domain.

If either command fails, check the domain for the following lines in the
/etc/inetd.conf file:

sun-dr stream tcp wait root /usr/lib/dcs dcs
sun-dr stream tcp6 wait root /usr/lib/dcs dcs

These lines must be in the /etc/inetd.conf file for the rcfgadm and
showdevices commands to work properly. If the lines are not in the file, and
showdevices fails from the SC, add the indicated lines above and restart the
inetd process as follows:

# ps -ef | grep inetd
root 151 1 0 Mar 11 ? 0:00 /usr/sbin/inetd -s
# kill -HUP 151
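
A commented-out entry is easy to miss by eye. The following sketch counts only the active (uncommented) sun-dr lines in an inetd.conf-style file; the function name sun_dr_entries is hypothetical, and the example runs against a temporary sample file rather than the real /etc/inetd.conf.

```shell
# Count uncommented sun-dr entries in an inetd.conf-style file.
# Both the tcp and tcp6 lines are expected, so a count below 2 is suspect.
sun_dr_entries() {
    grep -c '^sun-dr[[:space:]]' "$1"
}

# Example: one active entry and one commented-out entry.
sample=$(mktemp)
printf '%s\n' \
    'sun-dr stream tcp wait root /usr/lib/dcs dcs' \
    '#sun-dr stream tcp6 wait root /usr/lib/dcs dcs' > "$sample"
echo "active sun-dr entries: $(sun_dr_entries "$sample")"
rm -f "$sample"
```

On a domain that still uses inetd.conf, the function would be called as sun_dr_entries /etc/inetd.conf.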

For additional information, refer to the man page about dcs.

Note for domains running Solaris[TM] 10 (without patch 120253-02 ):

The /etc/inetd.conf file is no longer used directly to configure inetd. inetd
is now configured in the Service Management Facility (SMF). You can list the
inetd-managed SMF services with inetadm:

# inetadm
ENABLED STATE FMRI
enabled online svc:/application/font/stfsloader:default
[output omitted]
disabled disabled svc:/network/talk:default
enabled online svc:/platform/sun4u/dcs:default
[output omitted]
The /platform/sun4u/dcs service must be enabled/online.

You can now get more information from the svc:/platform/sun4u/dcs service and
list its properties via the svccfg command :

# /usr/sbin/svccfg -s svc:/platform/sun4u/dcs:default listprop


general framework
general/enabled boolean true
restarter framework NONPERSISTENT
restarter/auxiliary_state astring none
restarter/next_state astring none
restarter/state astring online
restarter/state_timestamp time 1117463395.870876000
restarter/contract count 94
inetd_state framework NONPERSISTENT
inetd_state/cur_state integer 1
inetd_state/next_state integer 13
inetd_state/start_pids integer
svc:/platform/sun4u/dcs:default> quit

If any dcs processes are running, pids will be reported in
inetd_state/start_pids.

Note that, on domains running Solaris 10 w/o 120253-02, the dcs process will
not be running if the SC has not recently communicated with the domain. It's
forked by inetd upon request (Remote DR request started from the SC). Hence,
the PPID for dcs is the inetd PID.

Ex :

# ptree 304

159 /usr/sbin/inetd -s

304 dcs

Note for domains running Solaris[TM] 10 Update 2 (with patch 120253-02 ):

Due to the fixes for :

4792021 per-socket level IPsec policy for dynamic reconfiguration

6380945 Changes required for PSARC 2006/038

introduced in patch 120253-02, dcs does not belong to inetd any longer.

Since inetd does not support per-socket IPsec, dcs was changed to run
standalone. Both dcs and cvcd are now controlled by SMF and use SMF
properties to define command line options.

Hence, running inetadm | grep dcs no longer returns information about dcs.

Use the following command to get the status from the dcs service :

# svcs dcs

STATE STIME FMRI

online 13:53:40 svc:/platform/sun4u/dcs:default


Note that, on domains running Solaris 10 U2 or w/ 120253-02, the dcs process
starts at boot time. Due to the new implementation, dcs now runs with
different options and accepts command line arguments ("-a", "-e", and "-u")
that allow the administrator to configure the IPsec encryption and
authentication options, where:

- "ah_auth" corresponds to the "-a" option.

- "esp_encr" corresponds to the "-e" option.

- "esp_auth" corresponds to the "-u" option.

See the manpages for dcs(1M) for more details.

Ex :

# ptree 220

220 /usr/lib/dcs -a md5

Note that the dcs process might not be running if the SC has not recently
communicated with the domain.

To check whether any process is actually listening on the sun-dr port (port
665), run the following on the domain:

e25ka-dom-c# netstat -an | grep 665
*.665 *.* 0 0 49152 0 LISTEN
*.665 *.* 0 0 49152 0 LISTEN

This verifies that there is indeed some process listening on the sun-dr port,
665. If there is nothing listening on port 665, then the showdevices and
addboard / deleteboard commands on the SC can never work properly.
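
A bare "grep 665" can also match unrelated lines, such as port 6650 or an address that happens to contain 665. As a sketch, the filter below matches the local-address column exactly and requires the LISTEN state; the function name port_listening is hypothetical, and the Solaris "address.port" column format shown above is assumed.

```shell
# Succeed if `netstat -an` output on standard input shows a listener on the
# given port. The local address (first column) must end in ".<port>".
port_listening() {
    awk -v port="$1" '
        $1 ~ ("\\." port "$") && $NF == "LISTEN" { found = 1 }
        END { exit !found }'
}

# Example with a line in the format shown above.
if printf '%s\n' '*.665 *.* 0 0 49152 0 LISTEN' | port_listening 665; then
    echo "sun-dr port 665 has a listener"
else
    echo "nothing is listening on port 665"
fi
```

On a domain this would be used as netstat -an | port_listening 665. On Solaris, awk -v needs nawk or /usr/xpg4/bin/awk.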

The /etc/services File

The /etc/services file must also have the following entry on the domain for
remote Dynamic Reconfiguration (DR):

sun-dr 665/tcp # Remote Dynamic Reconfiguration

If you are using NIS+, make sure that the above entry is present in the
/etc/services file of the NIS+ server. You can check this using the following
command:

$ niscat services.org_dir | grep sun-dr

sun-dr sun-dr tcp 665 Remote Dynamic Reconfiguration
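
For the local file, the same check can be scripted. The sketch below assumes
the standard services file layout (name, then port/protocol) and uses a
made-up function name, has_sun_dr_service.

```shell
# Hypothetical helper: succeeds if the given services file (default
# /etc/services) has an uncommented "sun-dr 665/tcp" entry.
has_sun_dr_service() {
    awk '$1 == "sun-dr" && $2 == "665/tcp" { found = 1 }
         END { exit !found }' "${1:-/etc/services}"
}
```

Possible use on a domain:
has_sun_dr_service || echo "sun-dr entry missing from /etc/services"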

The /etc/inet/ipsecinit.conf File on the Domain

The /etc/inet/ipsecinit.conf file should contain the following entries:

{ dport sun-dr ulp tcp } permit { auth_algs md5 }
{ sport sun-dr ulp tcp } apply { auth_algs md5 sa unique }
{ dport cvc_hostd ulp tcp } permit { auth_algs md5 }
{ sport cvc_hostd ulp tcp } apply { auth_algs md5 sa unique }

If the entries do not exist, add them and then issue:

# ipsecconf -a /etc/inet/ipsecinit.conf

Use the following command to check that the system is now running with these
settings:

# ipsecconf
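
To see which of the four entries are missing before editing the file, a
sketch such as the following could be used. It compares whole lines
literally, so an entry with different spacing is reported as missing. The
function name missing_ipsec_entries is hypothetical; on Solaris, use
/usr/xpg4/bin/grep, since /usr/bin/grep does not accept -F.

```shell
# Hypothetical helper: prints each expected policy entry that is not
# present (as an exact line) in the given file.
missing_ipsec_entries() {
    file=${1:-/etc/inet/ipsecinit.conf}
    for entry in \
        '{ dport sun-dr ulp tcp } permit { auth_algs md5 }' \
        '{ sport sun-dr ulp tcp } apply { auth_algs md5 sa unique }' \
        '{ dport cvc_hostd ulp tcp } permit { auth_algs md5 }' \
        '{ sport cvc_hostd ulp tcp } apply { auth_algs md5 sa unique }'
    do
        # -F: literal match, -x: whole line must match
        grep -F -x -e "$entry" "$file" >/dev/null 2>&1 || printf '%s\n' "$entry"
    done
}
```

An empty result means all four entries are already in the file.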

Domain X Server (DXS)

The console command uses DXS. It is similar to the netcon_server on the Sun
Enterprise[TM] 10000 server. DXS runs on the SC, one per domain.

To verify that DXS is running, issue the following command on the SC:

# ps -ef | grep dxs


sms-dxs 1609 361 0 Feb 26 ? 0:57 dxs -d A
sms-dxs 1609 361 0 Feb 26 ? 0:57 dxs -d B
sms-dxs 1609 361 0 Feb 26 ? 0:57 dxs -d C
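
To list just the domains that have a DXS instance, the ps output can be
filtered. This is a sketch that assumes the "dxs -d <letter>" argument form
shown above; the function name dxs_domains is made up.

```shell
# Hypothetical helper: reads "ps -ef" output on stdin and prints the domain
# letter of every dxs process, one per line, without duplicates.
dxs_domains() {
    awk '{
        # look for the word "dxs" followed by "-d" and a domain letter
        for (i = 1; i < NF - 1; i++)
            if ($i == "dxs" && $(i + 1) == "-d")
                print $(i + 2)
    }' | sort -u
}
```

Possible use on the SC: ps -ef | dxs_domains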

Console commands take place over the console bus but can be toggled between
the console bus and I1 network using the ~= command.

When the domain is rebooting, a message similar to "dxs disconnecting."
appears on the SC. The reboot of a domain causes an hpost -Q, which is a
quick POST from the SC.

Sun Fire[TM] 12K/15K/E20K/E25K key management daemon (sckmd)

The sckmd server process resides on a Sun Fire[TM] 12K/15K/E20K/E25K domain.


The sckmd daemon maintains the Internet Protocol Security (IPsec) Security
Associations (SAs) needed to secure the communication between the SC and the
cvcd and dcs daemons running on the domains.

The sckmd daemon must be running on the domain in order for the "showdevices"
or "rcfgadm" commands to work on the domain.

To verify that the sckmd daemon is running, issue the following command on
the domain:

# ps -ef | grep sckmd

root 24156 1 0 Apr 02 ? 0:00 /usr/platform/SUNW,Sun-Fire-15000/lib/sckmd

Failure after a Solaris[TM] 10+ OS initial installation

Upon the initial installation of a Solaris 10+ domain, showdevices/rcfgadm
will not work successfully. Running the commands will generate domain-side
console messages such as:

Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=ADD, errno=22: Invalid
argument, diagnostic code=40: Unsupported authentication algorithm
Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=DELETE, errno=3: No such
process, diagnostic code=0: No diagnostic
Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=ADD, errno=22: Invalid
argument, diagnostic code=40: Unsupported authentication algorithm
Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=DELETE, errno=3: No such
process, diagnostic code=0: No diagnostic
Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=ADD, errno=22: Invalid
argument, diagnostic code=40: Unsupported authentication algorithm
Apr 27 13:53:25 xc18-a sckmd: PF_KEY error: type=DELETE, errno=3: No such
process, diagnostic code=0: No diagnostic

To fix this, on the domain, issue the command:


# ipsecalgs -s

For a more detailed explanation on this issue, please see CR #6233334
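
To spot this condition in a log without reading it by eye, a filter like
the one below could be used. It is a sketch only: the function name
needs_ipsecalgs_sync is made up, and it assumes each sckmd message is
logged on a single line (for example in /var/adm/messages).

```shell
# Hypothetical helper: reads syslog text on stdin and succeeds if sckmd
# reported a PF_KEY "Unsupported authentication algorithm" error.
needs_ipsecalgs_sync() {
    awk '/sckmd: PF_KEY error/ && /Unsupported authentication algorithm/ { found = 1 }
         END { exit !found }'
}
```

Possible use on a domain:
needs_ipsecalgs_sync < /var/adm/messages && echo "consider: ipsecalgs -s"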

Failure after a Solaris[TM] 10 Update 2 Installation or after installing
120253-02 on Solaris[TM] 10.

After this installation or patch update, dcs fails to come online at boot
time and stays in maintenance mode:

Jul 27 13:50:30 inetd[284]: Unspecified inetd_start method for instance
svc:/platform/sun4u/dcs:default

Jul 27 13:50:30 inetd[284]: Invalid configuration for instance
svc:/platform/sun4u/dcs:default, placing in maintenance

Jul 27 13:50:30 inetd[284]: Invalid configuration for instance
svc:/platform/sun4u/dcs:default, placing in maintenance

# svcs dcs

STATE STIME FMRI

maintenance 13:52:23 svc:/platform/sun4u/dcs:default

To find out why dcs did not come online, check the
/etc/svc/volatile/platform-sun4u-dcs:default.log log file.

# svcs -xv

svc:/platform/sun4u/dcs:default (domain configuration server)

State: maintenance since Thu 20 Jul 2006 13:50:30 AM MEST

Reason: Start method failed repeatedly, last exited with status 1.

See: http://sun.com/msg/SMF-8000-KS

See: man -M /usr/share/man -s 1M dcs

See: /etc/svc/volatile/platform-sun4u-dcs:default.log

Impact: This service is not running.

To fix this, on the domain, restart the services :

# svcadm disable dcs

# svcadm enable dcs

# svcs dcs
STATE STIME FMRI

online 13:53:40 svc:/platform/sun4u/dcs:default

While dcs is not available, rcfgadm and showdevices do not work.

For a workaround and a more detailed explanation on this issue, please see CR
#6453706.
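
The check-then-restart logic for this case can be sketched as two small
functions. Both names (dcs_state, dcs_recovery_commands) are hypothetical,
and the second one only prints the commands from this section rather than
running them.

```shell
# Hypothetical helper: reads default "svcs dcs" output on stdin and prints
# the STATE column for the dcs instance.
dcs_state() {
    awk '$NF == "svc:/platform/sun4u/dcs:default" { print $1 }'
}

# Hypothetical helper: prints the restart commands given in this section
# when the state is "maintenance", and nothing for any other state.
dcs_recovery_commands() {
    case "$1" in
        maintenance) printf '%s\n' 'svcadm disable dcs' 'svcadm enable dcs' ;;
    esac
}
```

Possible use on a domain: dcs_recovery_commands "$(svcs dcs | dcs_state)"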

Failure after an upgrade to Solaris[TM] 10 Update 2.

After an upgrade to Solaris[TM] 10 Update 2, the dcs service may fail to go
online, staying in maintenance mode with the dcs process not running:

Sep 19 10:57:55 inetd[250]: Property 'name' of instance
svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 10:57:55 inetd[250]: Property 'endpoint_type' of instance
svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 10:57:55 inetd[250]: Property 'isrpc' of instance
svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 10:57:55 inetd[250]: Property 'wait' of instance
svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 10:57:55 inetd[250]: Unspecified inetd_start method for instance
svc:/platform/sun4u/dcs:default
Sep 19 10:57:55 inetd[250]: Invalid configuration for instance
svc:/platform/sun4u/dcs:default, placing in maintenance

# svcs -xv
svc:/platform/sun4u/dcs:default (domain configuration server)
State: maintenance since Tue Sep 19 10:57:55 2006
Reason: Restarter svc:/network/inetd:default gave no explanation.
See: http://sun.com/msg/SMF-8000-9C
See: man -M /usr/share/man -s 1M dcs
Impact: This service is not running.

While dcs is not available, rcfgadm and showdevices do not work.

The new manifest /var/svc/manifest/platform/sun4u/dcs.xml provided by
120253-02 (bundled in S10U2) has not been applied properly, so inetd is still
trying to start dcs. The general/restarter property for the dcs service
should now be startd and no longer inetd.

# svcprop dcs
general/enabled boolean true
general/entity_stability astring Unstable
general/restarter fmri svc:/network/inetd:default

dcs/ah_auth astring md5

[...]

See CR# 6472374 for more details.
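
A quick way to tell the two situations apart is to extract the restarter
from the svcprop output. The sketch below uses a made-up function name,
dcs_restarter, and prints "default" when no general/restarter property is
present at all.

```shell
# Hypothetical helper: reads "svcprop dcs" output on stdin and prints the
# general/restarter FMRI, or "default" when the property is absent.
dcs_restarter() {
    awk '$1 == "general/restarter" { r = $3 }
         END { print (r == "" ? "default" : r) }'
}
```

Possible use on a domain: svcprop dcs | dcs_restarter
An answer of svc:/network/inetd:default indicates the stale configuration
described here.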

To fix this problem, the new manifest must be imported using the following
procedure :

# svcs dcs
STATE STIME FMRI
maintenance 10:57:55 svc:/platform/sun4u/dcs:default
# svcadm disable dcs
Sep 19 11:02:13 v4u-15ka-c-epar02 inetd[250]: Property 'name' of instance
svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 11:02:13 v4u-15ka-c-epar02 inetd[250]: Property 'endpoint_type' of
instance svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 11:02:13 v4u-15ka-c-epar02 inetd[250]: Property 'isrpc' of instance
svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 11:02:13 v4u-15ka-c-epar02 inetd[250]: Property 'wait' of instance
svc:/platform/sun4u/dcs:default is missing, inconsistent or invalid
Sep 19 11:02:13 v4u-15ka-c-epar02 inetd[250]: Unspecified inetd_start method
for instance svc:/platform/sun4u/dcs:default

# svcs dcs
STATE STIME FMRI
disabled 11:02:13 svc:/platform/sun4u/dcs:default
# svccfg -v delete dcs
# svcs dcs
svcs: Pattern 'dcs' doesn't match any instances
STATE STIME FMRI
# svccfg -v import /var/svc/manifest/platform/sun4u/dcs.xml
svccfg: Taking "initial" snapshot for svc:/platform/sun4u/dcs:default.
svccfg: Taking "last-import" snapshot for svc:/platform/sun4u/dcs:default.
svccfg: Refreshed svc:/platform/sun4u/dcs:default.
svccfg: Successful import.
# svcs dcs
STATE STIME FMRI
disabled 11:03:04 svc:/platform/sun4u/dcs:default
# svcadm enable dcs
# svcs dcs
STATE STIME FMRI
online 11:03:20 svc:/platform/sun4u/dcs:default
# svcs -p dcs
STATE STIME FMRI
online 11:03:20 svc:/platform/sun4u/dcs:default
11:03:20 717 dcs

# svcprop dcs

general/enabled boolean false

general/entity_stability astring Unstable

dcs/ah_auth astring md5

[...]

Note that when no general/restarter is mentioned, the default one (startd)
is used.
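
The re-import procedure can be condensed into a short function. This is a
sketch, not a tested recovery script: the function name and the DCS_RUN
variable are made up, the intermediate svcs checks are left out, and by
default the steps are only printed so they can be reviewed first.

```shell
# Hypothetical helper: prints (default) or runs the manifest re-import
# steps. Set DCS_RUN="" in the environment to execute the commands.
reimport_dcs_manifest() {
    run=${DCS_RUN-echo}
    manifest=/var/svc/manifest/platform/sun4u/dcs.xml
    # each step only proceeds if the previous one succeeded
    $run svcadm disable dcs &&
    $run svccfg -v delete dcs &&
    $run svccfg -v import "$manifest" &&
    $run svcadm enable dcs
}
```

Afterwards, svcs -p dcs should show the service online with a dcs process
attached, as in the transcript above.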
--------------------------------------------------------------------------------
Document Audience: SPECTRUM
Document ID: 76047
Title: Sun Fire[TM] 12K/15K/E20K/E25K Servers: How to Replace a System Board
Using Dynamic Reconfiguration
Update Date: Fri May 13 00:00:00 MDT 2005
Products: Sun Fire 15K Server, Sun Fire 12K Server, Sun Fire E25K
Server, Sun Fire E20K Server
Technical Areas: System Controller, System Board, Firmware, Hardware
Troubleshooting

Keyword(s): DR, 12K, 15K, flashupdate, addboard, deleteboard, cfgadm, rcfgadm,
Sun Fire, 20K, 25K
Description

This document provides methods for replacing system boards on a running
domain:

-Method 1 utilizes the deleteboard and addboard commands on the System
Controller.

-Method 2 uses the cfgadm command within the domain.

-Method 3 uses the rcfgadm (remote cfgadm) command on the System Controller.

Note: When working on a security hardened system "Method 2" must be used.
Document Body
Method 1

This method uses the System Controller to replace the system board.
For this method, the System Controller issues all the commands.

1) Logically remove the board (unconfigures, disconnects, and powers off the
board).

/opt/SUNWSMS/bin/deleteboard -c unassign SB#

2) Physically remove the board.

It is safe to remove the board when the amber LED is on.

3) Physically install a new board.

4) Power on a new board:

/opt/SUNWSMS/bin/poweron SB#

5) Match firmware to existing boards:

/opt/SUNWSMS/bin/flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash -v -y SB#

6) Power off the new board:

/opt/SUNWSMS/bin/poweroff SB#

7) Logically bring the board back into the system:

/opt/SUNWSMS/bin/addboard -d <domain> -c configure SB#

The board is now back in the system, and running the Solaris Operating
System.
From the domain, verify with prtdiag.
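
To avoid retyping the board name at each step, the Method 1 sequence can be
generated as a checklist. The sketch below only prints the commands. The
function name method1_plan is made up, and SB3 and domain a in the usage
line are placeholder values.

```shell
# Hypothetical helper: prints the Method 1 command sequence for a board
# ($1, for example SB3) and a domain letter ($2). Nothing is executed.
method1_plan() {
    board=$1
    domain=$2
    bin=/opt/SUNWSMS/bin
    cat <<EOF
$bin/deleteboard -c unassign $board
# physically swap the board once the amber LED is on
$bin/poweron $board
$bin/flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash -v -y $board
$bin/poweroff $board
$bin/addboard -d $domain -c configure $board
EOF
}
```

Possible use on the SC: method1_plan SB3 a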

__________________________________________
Method 2

Method 2 replaces the system board from within the domain. For this method,
the domain issues the commands that remove and restore the board. However,
the System Controller issues three of the commands (steps 4 through 6).

1) Logically remove the board (unconfigures, disconnects, and powers off
the board):

/usr/sbin/cfgadm -v -c disconnect SB#

2) Physically remove the board.

It is safe to remove the board when the amber LED is on.

3) Physically install the new board.

4) Issue the command on the System Controller to power on the new board:

/opt/SUNWSMS/bin/poweron SB#

5) Issue the command on the System Controller to match the firmware to
existing boards:

/opt/SUNWSMS/bin/flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash -v -y SB#

6) Issue the command on the System Controller to power off the new board:

/opt/SUNWSMS/bin/poweroff SB#

7) Logically bring the board back into the system:

/usr/sbin/cfgadm -v -c configure SB#

The board is now back in the system, and running the Solaris Operating
System.
From the domain, verify with prtdiag.

__________________________________________

Method 3

Method 3 uses the System Controller, which issues the cfgadm command remotely
(rcfgadm), to replace the system board. This method is similar to Method 2,
except the domain needs to be specified with the -d option.

1) Logically remove the board (unconfigures, disconnects, and powers off
the board):

/opt/SUNWSMS/bin/rcfgadm -d <domain> -v -c disconnect SB#

2) Physically remove the board.

It is safe to remove the board when the amber light is on.

3) Physically install the new board.

4) Power on the new board:

/opt/SUNWSMS/bin/poweron SB#

5) Match firmware to existing boards:

/opt/SUNWSMS/bin/flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash -v -y SB#

6) Power off the new board:

/opt/SUNWSMS/bin/poweroff SB#

7) Logically bring the board back into the system:

/opt/SUNWSMS/bin/rcfgadm -d <domain> -v -c configure SB#

The board is now back in the system, and running the Solaris Operating
System.
From the domain, verify with prtdiag.
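
The three methods differ only in the commands for step 1 and step 7; the
poweron, flashupdate and poweroff steps on the System Controller are the
same. A sketch of that difference, with a made-up function name dr_commands
and placeholder board and domain values, is:

```shell
# Hypothetical helper: prints the remove (step 1) and restore (step 7)
# commands for a method. $1: method 1, 2 or 3; $2: board; $3: domain
# letter (not used by method 2, which runs inside the domain).
dr_commands() {
    case "$1" in
        1) printf '%s\n' "/opt/SUNWSMS/bin/deleteboard -c unassign $2" \
                         "/opt/SUNWSMS/bin/addboard -d $3 -c configure $2" ;;
        2) printf '%s\n' "/usr/sbin/cfgadm -v -c disconnect $2" \
                         "/usr/sbin/cfgadm -v -c configure $2" ;;
        3) printf '%s\n' "/opt/SUNWSMS/bin/rcfgadm -d $3 -v -c disconnect $2" \
                         "/opt/SUNWSMS/bin/rcfgadm -d $3 -v -c configure $2" ;;
        *) echo "unknown method: $1" >&2; return 1 ;;
    esac
}
```

Possible use: dr_commands 3 SB3 a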

NOTE: For all three methods above, be aware of kernel cage changes affecting
DR that were introduced with Solaris 9 KU patch 118558-05.

Reference: Kernel Cage Splitting Overview Infodoc 80991

Copyright 2006 Sun Microsystems, Inc. All Rights Reserved

--------------------------------------------------------------------------------
