
This time we want to check whether an attribute for a Virtual FC (VFC) adapter has been modified
and whether AIX has been restarted since the change. The attribute I’m interested in is
num_cmd_elems. This value is often changed from its default setting in AIX environments to
improve I/O performance on SAN-attached storage.

From kdb you can identify the VFC adapters configured on an AIX system using the vfcs
subcommand. Not only does this tell you what adapters you have, but it also identifies the VIOS
each adapter is connected to and the corresponding vfchost adapter. Nice!
(0)> vfcs
NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
fcs0 0xF1000A00103D4000 0x0008 vio1 vfchost10 0x01 0x0000
fcs1 0xF1000A00103D6000 0x0008 vio2 vfchost10 0x01 0x0000

You can view the current (running) configuration of a VFC adapter using the kdb vfcs
subcommand and the name of the VFC adapter, for example fcs1:
(0)> vfcs fcs1
adapter: fcs0 partition_num: 0x14 partition_name: aixlpar1
num_cmd_elems: 0xC8 location_code: U9119.FHA.87654A1-V20-C10-T1
state: 0x8 opened: 0x1 flags: 0x1
host_name: vio1 host_device: vfchost10
maxCmds: FA maxDMALength: 100000 SCSIid: 6F1E9A
portName: C050760399100168 nodeName: C050760399100168 nport_id: 6F1E9A
cmd_q.base_addr: 0xF1000A00102D0000 asyncq.base_addr: 0xF1000A0019761000
crq[0]: 0x8001000000000000 crq[1]: 0x8013960
ctl_head: 0x0 ctl_tail: 0x0
cmd_head: 0x0 cmd_tail: 0x0
pool_head: 0xF1000A001975BD08 pool_tail: 0xF1000A0019759D88
head_active: 0x0 tail_active: 0x0
cancel_head: 0xF1000A0019758090 cancel_tail: 0xF1000A0019758E58
num_cmds_allowed: 0xFA num_cmds_active: 0x0 num_cmds_pending: 0x0
head_flush_q: 0x0 tail_flush_q: 0x0 cmd_flush_q: 0x0
no_elem: 0x0 no_dma: 0x0 no_sglist: 0x0 bad_mad: 0x0 qfull: 0x0
h_dropped: 0x0 link_down_cnt: 0x0 num_frames: 0x0 tx_lock: 0x0
rcv_lock: 0x0 ctl_lock: 0x0 lock: 0xFFFFFFFFFFFFFFFF
ctl_event: 0xFFFFFFFFFFFFFFFF open_event: 0xFFFFFFFFFFFFFFFF
reset_cmd: 0x0 log_cmd: 0x0
link_cmd: 0x0 ctl_cmd->cmd_state: 0x0
inp_reqs: 0x12594 out_reqs: 0x1A7FD4 ctrl_reqs: 0x44
inp_bytes: 0x14973F90 out_bytes: 0xEC14EC18 npiv_logout_sent: 0x0
npiv_logout_rcvd: 0x0 act_hi_water: 0x10 pend_hi_water: 0x1

Using the output from this command we can determine the current (running) value for a number of
VFC attributes, including num_cmd_elems.

So I start with an adapter with a num_cmd_elems value of 200. Both the lsattr command and kdb
report 200 (C8 in hex) for num_cmd_elems.
# lsattr -El fcs1 -a num_cmd_elems
num_cmd_elems 200 Maximum Number of COMMAND Elements True

# echo vfcs fcs1 | kdb | grep num_cmd_elems
num_cmd_elems: 0xC8 location_code: U9119.FHA.87654A1-V20-C10-T1

I change num_cmd_elems to 400 with chdev -P. Remember, the -P flag only updates the AIX
ODM, not the running configuration of the device in the AIX kernel. For the change to take
effect you must either reboot or take the device offline and bring it back online.
# chdev -l fcs1 -a num_cmd_elems=400 -P
fcs1 changed

Now the lsattr command reports num_cmd_elems is set to 400 in the ODM.
# lsattr -El fcs1 -a num_cmd_elems
num_cmd_elems 400 Maximum Number of COMMAND Elements True

But kdb still has num_cmd_elems set to C8 (200).
# echo vfcs fcs1 | kdb | grep num_cmd_elems
num_cmd_elems: 0xC8 location_code: U9119.FHA.87654A1-V20-C10-T1

Using this technique I now have a way of checking if an AIX system has been restarted since an
attribute was changed on a VFC adapter.
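The comparison above can be scripted. What follows is only a minimal sketch, assuming the lsattr and kdb output formats shown in this post. The helper names (odm_value, running_value, compare_values) are my own invention for illustration and are not part of AIX. The helpers read text on stdin, so they can be tried on any system.

```shell
# Sketch only: compare the ODM value of num_cmd_elems (decimal, from lsattr)
# with the running value (hex, from the kdb vfcs subcommand).
# All function names here are hypothetical helpers, not AIX commands.

# Print the value column from an "lsattr -El fcsX -a num_cmd_elems" line.
odm_value() {
    awk '$1 == "num_cmd_elems" { print $2; exit }'
}

# Print the running value in decimal from "vfcs fcsX" kdb output.
running_value() {
    hex=$(awk '$1 == "num_cmd_elems:" { print $2; exit }')
    [ -n "$hex" ] && printf '%d\n' "$hex"
}

# Report whether the ODM value ($1) and the running value ($2) agree.
compare_values() {
    if [ "$1" -eq "$2" ]; then
        echo "in sync: $1"
    else
        echo "pending: ODM=$1 running=$2 (reboot or reconfigure needed)"
    fi
}
```

On an AIX system (as root) you would feed the helpers from the same commands used above, for example by piping "lsattr -El fcs1 -a num_cmd_elems" into odm_value and "echo vfcs fcs1 | kdb" into running_value, then passing both results to compare_values.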

At this point I could suggest a reboot of the system. Or I could take the adapter offline and then
online for the changes to take effect. For example:
# lspath
Enabled hdisk0 fscsi0
Enabled hdisk1 fscsi0
Enabled hdisk0 fscsi1
Enabled hdisk1 fscsi1

# rmpath -l hdisk0 -p fscsi1
# rmpath -l hdisk1 -p fscsi1

# lspath
Enabled hdisk0 fscsi0
Enabled hdisk1 fscsi0
Defined hdisk0 fscsi1
Defined hdisk1 fscsi1
# rmdev -Rl fcs1
fscsi1 Defined
fcs1 Defined
# cfgmgr
# lspath
Enabled hdisk0 fscsi0
Enabled hdisk1 fscsi0
Enabled hdisk0 fscsi1
Enabled hdisk1 fscsi1

# lsattr -El fcs1 -a num_cmd_elems
num_cmd_elems 400 Maximum Number of COMMAND Elements True

# echo vfcs fcs1 | kdb | grep num_cmd_elems
num_cmd_elems: 0x190 location_code: U9119.FHA.87654A1-V20-C10-T1

;; 0x190 hex = 400 decimal
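If there are many disks behind the adapter, typing each rmpath by hand gets tedious. Here is a rough sketch, assuming the default three-column lspath output shown above (status, disk, parent). The function name gen_rmpath is hypothetical. It only prints the commands so you can review them first; it does not run anything.

```shell
# Sketch only: read default "lspath" output (status, disk, parent) on stdin
# and print, without executing, one rmpath command per Enabled path that
# uses the given parent adapter. gen_rmpath is a hypothetical helper name.
gen_rmpath() {
    parent=$1
    awk -v p="$parent" \
        '$1 == "Enabled" && $3 == p { print "rmpath -l " $2 " -p " p }'
}
```

For example, "lspath | gen_rmpath fscsi1" would print the two rmpath commands used above. Check that every disk still has an Enabled path through another adapter before running them, then continue with rmdev -Rl and cfgmgr as shown.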


Please note: Be careful when using the kdb command. If used incorrectly, you can crash an AIX
system!

“On your blog, you explain how to find out the active value of num_cmd_elems of an fc-adapter by
using the kdb. So you can decide, if the value of lsattr is active or not ...
I wonder if you can find out the values fc_err_recov and dyntrk of the fscsiX device.?
# lsattr -El fscsi0
attach switch How this adapter is CONNECTED False
dyntrk yes Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id 0x1021f Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True

I try to use echo efscsi fscsi0 | kdb .. but I can't figure it out..

Can you help my please?”


I did a little research on his behalf and came up with an answer. However, I’m not at all surprised
he had trouble finding the right information. It's not easy, clear or documented!

I received the following information from my IBM AIX contacts.


“The following relies on internal structures that are subject to change.

The procedure was tested on 6100-06, 6100-07, and 7100-01. I don't have a lab system with
physical HBAs and 5.3 at the moment.

Hopefully the same steps should work for 5.3. You may need to first run efscsi without arguments
to load the kdb module before running efscsi fscsiX.
# kdb
(0)> efscsi fscsi1 | grep efscsi_ddi
struct efscsi_ddi ddi = 0xF1000A060084A080
(0)> dd 0xF1000A060084A080+20 2
F1000A060084A0A0: 0101020202010200 000000B400000028 ...............(
                          FFDD     NNNNNNNN

FF = fc_err_recov: 01=delayed_fail 02=fast_fail
DD = dyntrk: 00=disabled 01=enabled
NNNNNNNN = num_cmd_elems - 20 (20 reserved)
e.g. 200 - 20 = 180 = B4

So in this example, fc_err_recov is set to fast_fail (02), dyntrk is set to yes (01) and
num_cmd_elems is set to 200.”

I tested this on lab systems running AIX 6.1 TL6 and AIX 7.1 TL1. I started with an FC adapter
with dyntrk disabled (set to no), fc_err_recov set to delayed_fail and num_cmd_elems
set to 500.
# lsattr -El fscsi1
attach none How this adapter is CONNECTED False
dyntrk no Dynamic Tracking of FC Devices True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
scsi_id Adapter SCSI ID False
sw_fc_class 3 FC Class for Fabric True

# lsattr -El fcs1 -a num_cmd_elems
num_cmd_elems 500 Maximum number of COMMANDS to queue to the adapter True

# kdb
(0)> efscsi fscsi1 | grep efscsi_ddi
struct efscsi_ddi ddi = 0xF1000A060096E080

(0)> dd 0xF1000A060096E080+20 2
F1000A060096E0A0: 0101020201000100 000001E000000028 ...............(
                          FFDD     NNNNNNNN

OK, let’s break it down. From the kdb output we can determine the following:
· fc_err_recov is currently set to delayed_fail (FF=01 = fc_err_recov = delayed_fail).
· dyntrk is currently set to no (DD=00 = dyntrk = disabled).
· num_cmd_elems is currently set to 500 (NNNNNNNN=1E0 = 480; num_cmd_elems = 480 + 20 = 500).

If I set dyntrk to yes, the value changes immediately in the running kernel configuration.
I was able to make this change without a reboot because the device was not in use.
# chdev -l fscsi1 -a dyntrk=yes
# kdb
(0)> efscsi fscsi1 | grep efscsi_ddi
struct efscsi_ddi ddi = 0xF1000A0800CB6080
(0)> dd 0xF1000A0800CB6080+20 2
F1000A0800CB60A0: 0101020201010200 000001E000000028 ...............(
                          FFDD     NNNNNNNN

And now dynamic tracking is enabled (DD=01 = dyntrk = enabled, set to yes).
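Decoding those bytes by eye is error-prone, so here is a small sketch that does it for one line of dd output. It assumes exactly the layout described above (fifth byte of the first word is fc_err_recov, sixth byte is dyntrk, first 32 bits of the second word are num_cmd_elems minus 20). That layout is an internal structure and may differ on other AIX levels. The function name decode_efscsi_ddi is hypothetical.

```shell
# Sketch only: decode one "dd <efscsi_ddi address>+20 2" output line read
# from stdin, using the byte layout described in this post. The layout is
# internal to AIX and may change; decode_efscsi_ddi is a hypothetical name.
decode_efscsi_ddi() {
    # Fields: address, first 64-bit word, second 64-bit word, ASCII column.
    read -r _addr w1 w2 _rest
    ff=$(printf '%s' "$w1" | cut -c9-10)    # FF: fc_err_recov byte
    dt=$(printf '%s' "$w1" | cut -c11-12)   # DD: dyntrk byte
    nn=$(printf '%s' "$w2" | cut -c1-8)     # NNNNNNNN: num_cmd_elems - 20
    case $ff in
        01) echo "fc_err_recov=delayed_fail" ;;
        02) echo "fc_err_recov=fast_fail" ;;
        *)  echo "fc_err_recov=unknown($ff)" ;;
    esac
    case $dt in
        00) echo "dyntrk=no" ;;
        01) echo "dyntrk=yes" ;;
        *)  echo "dyntrk=unknown($dt)" ;;
    esac
    # Add back the 20 reserved command elements.
    echo "num_cmd_elems=$(( 0x$nn + 20 ))"
}
```

Feeding it the dd line from my first test (0101020201000100 000001E000000028) reports delayed_fail, dyntrk=no and num_cmd_elems=500, which matches the lsattr output.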

Poor old AIX 5.3 struggled to provide me with any information using the steps provided.

So what about max_xfer_size? For a physical FC adapter we can find the current value using the
following kdb commands:
(0)> efcs fcs1 |grep ddi
struct efc_ddi ddi = 0xF1000A06006D0080
(0)> dd 0xF1000A06006D0080+60 4
F1000A06006D00E0: 00000000000000C8 0000012C900000C1 ...........,....
F1000A06006D00F0: 900000C1000FFC00 0010000000800000 ................
Based on the output, num_cmd_elems is set to 200 (C8) and max_xfer_size is set to 1048576
(100000).

The max_xfer_size for VFC is tricky because it is contained in a structure that can and does
change between SPs and TLs. In 6100-06-01 max_xfer_size is offset 3932 bytes into the structure
so we get the value like this:
(0)> vfcs
NAME ADDRESS STATE HOST HOST_ADAP OPENED NUM_ACTIVE
fcs2 0xF100010100B38000 0xFFFF nimlab102-vfchost0 0x00 0x0000
(0)> dcal 3932
Value decimal: 3932 Value hexa: 00000F5C
(0)> dd 0xF100010100B38000+F50
F100010100B38F50: 0000002800000002 000000C800100000 ...(............
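The offset arithmetic is worth spelling out: dd prints 16 bytes per line, so offset 3932 (0xF5C) sits on the line starting at 0xF50, in the fourth 32-bit word. A minimal sketch of that calculation follows. The function name dd_offset is hypothetical, and the 3932-byte offset itself only applies to the level mentioned above.

```shell
# Sketch only: convert a decimal structure offset into the 16-byte-aligned
# offset to pass to kdb's dd, plus the position of the 32-bit word on that
# dump line. dd_offset is a hypothetical helper name.
dd_offset() {
    off=$1
    line=$(( off / 16 * 16 ))        # start of the 16-byte dump line
    word=$(( (off % 16) / 4 + 1 ))   # 1-based 32-bit word index on the line
    printf 'offset=0x%X line=0x%X word=%d\n' "$off" "$line" "$word"
}
```

Running "dd_offset 3932" gives line 0xF50 and word 4, and the fourth word on the line above is 00100000 (1048576), the max_xfer_size value.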

Perhaps the easiest way to handle changes between versions is to use the fact that max_xfer_size
sits immediately after num_cmd_elems, which is very unlikely to change. Since the structure size
does not change by very much, you can grep in the general area:
(0)> vfcs fcs2 | grep elems
num_cmd_elems: 0xC8

(0)> dd 0xF100010100B38000 200 | grep 000000C8
F100010100B38F50: 0000002800000002 000000C800100000 ...(............
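Once grep has found the line, the value you want is simply the 32-bit word that follows num_cmd_elems. Here is a rough sketch that picks it out, assuming the two-column dd line format shown above. The function name word_after is hypothetical. It will not find the value if num_cmd_elems happens to be the last word on a line, and it can be fooled by an unrelated word that holds the same value, so sanity-check the result.

```shell
# Sketch only: read kdb "dd" output lines on stdin, split the two 64-bit
# data columns into 32-bit words, and print the word that immediately
# follows the first word equal to the given 8-digit hex value.
# word_after is a hypothetical helper name.
word_after() {
    awk -v want="$1" '
    {
        words = $2 $3                 # field 1 is the address
        n = length(words) / 8         # number of 32-bit words on the line
        for (i = 1; i < n; i++) {
            if (substr(words, (i - 1) * 8 + 1, 8) == want) {
                print substr(words, i * 8 + 1, 8)
                exit
            }
        }
    }'
}
```

With the line above as input, "word_after 000000C8" prints 00100000, which is the max_xfer_size of 0x100000.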

Here are the links to my previous posts on kdb:

https://www.ibm.com/developerworks/mydeveloperworks/blogs/cgaix/entry/checking_num_cmd_elems_for_vfc_adapters_with_kdb1?lang=en
https://www.ibm.com/developerworks/mydeveloperworks/blogs/cgaix/entry/checking_your_queue_depth_with_kdb?lang=en

Attention: just a note about max_xfer_size and virtual FC adapters. In my experience, if the values
for this attribute on the VIOC do not match those on the VIOS, then you will have trouble
configuring the virtual FC adapters. Possible side effects may include your system never booting
again!

So if I change the value to 0x200000 on the client, without mirroring this value on the VIO server,
I may encounter the following effects:
# rmdev -Rl fcs1
sfwcomm1 Defined
fscsi1 Defined
fcnet1 Defined
fcs1 Defined

# chdev -l fcs1 -a max_xfer_size=0x200000
fcs1 changed

The cfgmgr command will report errors for the FC adapter.
# cfgmgr
Method error (/usr/lib/methods/cfgefscsi -l fscsi1 ):
0514-061 Cannot find a child device.
Method error (/usr/lib/methods/cfgstorfworkcom -l sfwcomm1 ):
0514-040 Error initializing a device into the kernel.

Errors, similar to the following, may appear in the AIX error report.
# errpt | grep fcs
0E0C5B31 0726123812 U S fcs1 Undefined error
8C9E9221 0726123812 I S fcs1 Informational message

You’ll observe messages in the error report that claim a request from the client was rejected by the
VIOS.
...
Request was rejected by VIOS
Response was rejected by the client
...
# errpt -aN fcs1
---------------------------------------------------------------------------
LABEL: VFC_ERR8
IDENTIFIER: 0E0C5B31
Date/Time: Thu Jul 26 12:38:29 EETDT 2012
Sequence Number: 1040
Machine Id: 00C123C64C00
Node Id: aixlpar1
Class: S
Type: UNKN
WPAR: Global
Resource Name: fcs1

Description
Undefined error

Probable Causes
PROCESSOR

Failure Causes
PROCESSOR
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
Error Location
0000 00E0
Error Type
00
RC
FFFF FFFF FFFF FFFF
VIO Server Partition Name
vio2
Physical Adapter Instance Name
vfchost50
Physical Adapter Location Code
U5873.001.8SS0071-P2-C6-T1
Physical Adapter DRC Name
U9119.FHB.87654C6-V7-C1100
Adapter N Port ID
0000 0000 0000 0000
Adapter State
0000 FFFF
Additional Information
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: VFC_ERR7
IDENTIFIER: 8C9E9221
Date/Time: Thu Jul 26 12:38:29 EETDT 2012
Sequence Number: 1039
Machine Id: 00C123C64C00
Node Id: aixlpar1
Class: S
Type: INFO
WPAR: Global
Resource Name: fcs1

Description
Informational message

Probable Causes
Request was rejected by VIOS
Response was rejected by the client

Failure Causes
PROCESSOR
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
Error Location
0000 0088
Error Type
00
RC
0000 0000 0010 0000
VIO Server Partition Name
vio2
Physical Adapter Instance Name
vfchost50
Physical Adapter Location Code
U5873.001.8SS0071-P2-C6-T1
Physical Adapter DRC Name
U9119.FHB.87654C6-V7-C1100
Adapter N Port ID
0000 0000 0000 0000
Adapter State
0000 0004

If you encounter this problem, restore the client's FC adapter attributes to their previous values
before restarting the system. If you don’t, your LPAR may no longer boot and may hang on
LED 554. Change your VIOS first, then update your VIO clients.
