Middleware Yada Yada: Solaris performance monitoring commands

iostat
vmstat
netstat

iostat

syntax:

iostat [options] interval count

options – let you specify the device type for which information is needed: disk, CPU, or terminal (-d, -c, -t, or combined as -tdc). The -x option gives extended statistics.

interval – the time period in seconds between two samples. iostat 4 reports data every 4 seconds.

count – the number of samples to report. iostat 4 5 reports data at 4-second intervals, 5 times.

example:

 $ iostat -xtc 5 2
                      extended disk statistics       tty         cpu
 disk r/s  w/s Kr/s Kw/s wait actv svc_t  %w  %b  tin tout us sy wt id
 sd0   2.6 3.0 20.7 22.7 0.1  0.2  59.2   6   19   0   84  3  85 11 0
 sd1   4.2 1.0 33.5  8.0 0.0  0.2  47.2   2   23
 sd2   0.0 0.0  0.0  0.0 0.0  0.0   0.0   0    0
 sd3  10.2 1.6 51.4 12.8 0.1  0.3  31.2   3   31

The fields have the following meanings:
  disk    name of the disk
  r/s     reads per second
  w/s     writes per second
  Kr/s    kilobytes read per second
  Kw/s    kilobytes written per second
  wait    average number of transactions waiting for service (queue length)
  actv    average number of transactions actively being serviced (removed from the queue but not yet completed)
  svc_t   average service time, in milliseconds
  %w      percent of time there are transactions waiting for service (queue non-empty)
  %b      percent of time the disk is busy (transactions in progress)

The values to look at in the iostat output are:

Reads/writes per second (r/s, w/s)

Percentage busy (%b) (%b > 5 is bad)

Service time (svc_t) (svc_t > 30ms is bad)

If a disk shows consistently high reads/writes, its percentage busy (%b) is greater than 5 percent, and its average service time (svc_t) is greater than 30 milliseconds, then one of the following actions needs to be taken:

Tune the application to use disk I/O more efficiently by modifying its disk queries and using the cache facilities available in the application servers.
Spread the file system across two or more disks using the disk striping feature of a volume manager (DiskSuite, etc.).
Increase the inode cache system parameter, ufs_ninode, which is the number of inodes held in memory. Inodes are cached globally (for UFS), not on a per-file-system basis.
Move the file system to a faster disk/controller, or replace the existing disk/controller with a faster one.
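
The %b and svc_t thresholds above can be applied to captured iostat output with a short script. The following is only a rough sketch: it assumes the column order shown in the example (disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b), the function names are made up for illustration, and it checks a single sample, so judging whether the load is "consistently" high is still left to the reader.

```python
# Rule-of-thumb thresholds quoted in the text; adjust to taste.
BUSY_PCT_LIMIT = 5.0      # %b
SERVICE_MS_LIMIT = 30.0   # svc_t, in milliseconds

COLUMNS = ["r/s", "w/s", "Kr/s", "Kw/s", "wait", "actv", "svc_t", "%w", "%b"]


def parse_iostat_x(text):
    """Turn the disk rows of `iostat -x` style output into dicts.

    Assumes the column order of the example output. Any tty/cpu
    columns appended to a row (as with -xtc) are ignored.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # banner line or blank line
        try:
            values = [float(f) for f in fields[1:10]]
        except ValueError:
            continue  # column header line
        row = dict(zip(COLUMNS, values))
        row["disk"] = fields[0]
        rows.append(row)
    return rows


def overloaded_disks(rows):
    """Names of disks exceeding both the %b and svc_t thresholds."""
    return [r["disk"] for r in rows
            if r["%b"] > BUSY_PCT_LIMIT and r["svc_t"] > SERVICE_MS_LIMIT]


SAMPLE = """\
                      extended disk statistics       tty         cpu
 disk r/s  w/s Kr/s Kw/s wait actv svc_t  %w  %b  tin tout us sy wt id
 sd0   2.6 3.0 20.7 22.7 0.1  0.2  59.2   6   19   0   84  3  85 11 0
 sd1   4.2 1.0 33.5  8.0 0.0  0.2  47.2   2   23
 sd2   0.0 0.0  0.0  0.0 0.0  0.0   0.0   0    0
 sd3  10.2 1.6 51.4 12.8 0.1  0.3  31.2   3   31
"""

if __name__ == "__main__":
    print(overloaded_disks(parse_iostat_x(SAMPLE)))
```

With the example output from this section, sd0, sd1 and sd3 exceed both thresholds, while the idle sd2 does not.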

vmstat

syntax:

vmstat [options] interval count

options – let you specify the type of information needed, such as paging (-p), cache (-c), or interrupts (-i). If no option is specified, information about processes, memory, paging, disk, interrupts, and CPU is displayed.

interval – the time period in seconds between two samples. vmstat 4 reports data every 4 seconds.

count – the number of samples to report. vmstat 4 5 reports data at 4-second intervals, 5 times.

example:

$ vmstat 5
 procs    memory             page             disk         faults        cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0  11456  4120   1  41 19  1  3  0  2  0  4  0  0   48  112  130  4 14 82
 0 0 1  10132  4280   0   4 44  0  0  0  0  0 23  0  0  211  230  144  3 35 62
 0 0 1  10132  4616   0   0 20  0  0  0  0  0 19  0  0  150  172  146  3 33 64
 0 0 1  10132  5292   0   0  9  0  0  0  0  0 21  0  0  165  105  130  1 21 78

procs
r     processes in the run queue
b     processes blocked waiting for resources (I/O, paging, etc.)
w     processes swapped out

memory (in Kbytes)
swap  amount of swap space currently available
free  size of the free list

page (in units per second).
re    page reclaims (see the -S option for how this field is modified)
mf    minor faults (see the -S option for how this field is modified)
pi    kilobytes paged in
po    kilobytes paged out
fr    kilobytes freed
de    anticipated short-term memory shortfall (Kbytes)
sr    pages scanned by clock algorithm

disk (operations per second).
There are slots for up to four disks, labeled with a single letter and number.
The letter indicates the type of disk (s = SCSI, i = IPI, etc). The number is
the logical unit number.

faults
in    (non clock) device interrupts
sy    system calls
cs    CPU context switches

cpu (percentage breakdown of CPU time usage). On multiprocessors this is an average across all processors.
us    user time
sy    system time
id    idle time
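
Because the default vmstat report is one fixed set of columns, a captured run is easy to turn into records for later checks. The sketch below assumes exactly the 22 columns of the example (including four disk slots named s0 to s3, which will differ on other machines). The header prints "sy" twice, so the two columns are renamed faults_sy and cpu_sy here; those names are this sketch's own and not something vmstat prints.

```python
# Column names in the order of the default vmstat header shown above.
# "faults_sy" and "cpu_sy" are invented names that keep the two "sy"
# columns (system calls vs. system time) apart.
COLUMNS = ["r", "b", "w", "swap", "free", "re", "mf", "pi", "po", "fr",
           "de", "sr", "s0", "s1", "s2", "s3", "in", "faults_sy", "cs",
           "us", "cpu_sy", "id"]


def parse_vmstat(text):
    """Return one dict per data line of default vmstat output."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Skip header lines and anything without the expected layout.
        if len(fields) != len(COLUMNS):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        samples.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return samples


SAMPLE = """\
r b w swap  free re mf pi po fr de sr s0 s1 s2 s3  in  sy  cs us sy id
0 0 0 11456 4120 1  41 19 1  3  0  2  0  4  0  0  48 112 130  4 14 82
0 0 1 10132 4280 0   4 44 0  0  0  0  0 23  0  0 211 230 144  3 35 62
0 0 1 10132 4616 0   0 20 0  0  0  0  0 19  0  0 150 172 146  3 33 64
0 0 1 10132 5292 0   0  9 0  0  0  0  0 21  0  0 165 105 130  1 21 78
"""

if __name__ == "__main__":
    for sample in parse_vmstat(SAMPLE):
        print(sample["r"], sample["us"], sample["cpu_sy"], sample["id"], sample["sr"])
```

Keep in mind that the first line of a real vmstat run reports averages since boot, so it is usually set aside before looking for trends.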

CPU issues:

The following columns have to be watched to determine whether there is a CPU issue:

Processes in the run queue (procs r)
User time (cpu us)
System time (cpu sy)
Idle time (cpu id)

     procs      cpu
 r b w    us sy  id
 0 0 0    4  14  82
 0 0 1    3  35  62
 0 0 1    3  33  64
 0 0 1    1  21  78

Problem symptoms:

If the number of processes in the run queue (procs r) is consistently greater than the number of CPUs on the system, the system will slow down, as there are more processes than available CPUs.
If this number is more than four times the number of available CPUs, the system is facing a shortage of CPU power, which will greatly slow down the processes on the system.
If the idle time (cpu id) is consistently 0 and the system time (cpu sy) is double the user time (cpu us), the system is facing a shortage of CPU resources.

Resolving these kinds of issues involves tuning application procedures to make efficient use of the CPU and, as a last resort, increasing CPU power or adding more CPUs to the system.
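
The three CPU symptoms can be written down as a small check over a series of vmstat samples. This is only one possible reading of the rules: "consistently" is taken to mean "in every sample supplied", "double" is taken to mean "at least twice", and the function name and message strings are made up for the sketch.

```python
def cpu_symptoms(samples, ncpus):
    """Apply the run-queue and idle/system-time rules of thumb.

    samples: list of (r, us, sy, idle) tuples, one per vmstat interval.
    ncpus:   number of CPUs on the system.
    Returns a list of human-readable findings (empty if none apply).
    """
    findings = []
    if not samples:
        return findings

    run_queues = [r for r, _, _, _ in samples]
    if all(r > 4 * ncpus for r in run_queues):
        findings.append("run queue above 4x CPU count: CPU shortage")
    elif all(r > ncpus for r in run_queues):
        findings.append("run queue above CPU count: expect slowdown")

    if all(idle == 0 and sy >= 2 * us for _, us, sy, idle in samples):
        findings.append("no idle time, system time at least double user time: CPU shortage")

    return findings


# (r, us, sy, id) values taken from the vmstat excerpt above.
EXCERPT = [(0, 4, 14, 82), (0, 3, 35, 62), (0, 3, 33, 64), (0, 1, 21, 78)]

if __name__ == "__main__":
    print(cpu_symptoms(EXCERPT, ncpus=1))
```

For the excerpt shown in this section the run queue is empty and the CPU is mostly idle, so none of the symptoms apply.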

Memory Issues:

Memory bottlenecks are identified by the scan rate (sr), the number of pages scanned by the clock algorithm per second. If the scan rate is continuously over 200 pages per second, there is a memory shortage.

Resolution:

Tune the applications & servers to make efficient use of memory and cache.
Increase system memory.
Implement priority paging in pre-Solaris 8 versions by adding the line "set priority_paging=1" to /etc/system. Remove this line if upgrading from Solaris 7 to 8 and retaining the old /etc/system file.
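
The scan-rate rule is simple to automate once the sr column has been collected over several intervals. In this sketch "continuously" is approximated as "each of the last few samples"; the min_samples parameter and the function name are assumptions of the sketch, not anything vmstat defines.

```python
SCAN_RATE_LIMIT = 200  # pages per second, the threshold from the text


def sustained_memory_shortage(scan_rates, min_samples=3):
    """True if the last `min_samples` sr readings all exceed the limit.

    `min_samples` is a hypothetical stand-in for "continuously";
    choose it to match the sampling interval in use.
    """
    recent = scan_rates[-min_samples:]
    if len(recent) < min_samples:
        return False  # not enough data to call it sustained
    return all(sr > SCAN_RATE_LIMIT for sr in recent)


if __name__ == "__main__":
    # sr column from the vmstat example above: no shortage.
    print(sustained_memory_shortage([2, 0, 0, 0]))
```

A single spike above 200 is ignored by design; only a run of high readings is reported.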

netstat

syntax:

netstat [option/s]

Options
-a              - displays the state of all sockets.
-r              - shows the system routing tables
-i              - gives statistics on a per-interface basis.
-m              - displays information from the network memory buffers. On Solaris, this shows statistics for streams
-p [proto]      - retrieves statistics for the specified protocol
-s              - shows per-protocol statistics. (some implementations allow -ss to remove fields with a value of 0 (zero) from the display.)
-D              - display the status of DHCP configured interfaces.
-n              - do not look up hostnames, display only IP addresses.
-d              - (with -i) displays dropped packets per interface.
-I [interface]  - retrieve information about only the specified interface.
-v              - be verbose
interval        - number of seconds between updates, for continuous display of statistics.

example:

$ netstat -rn

Routing Table: IPv4
Destination           Gateway               Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
192.168.1.0          192.168.1.11          U        1   1444      le0
224.0.0.0            192.168.1.11          U        1   0         le0
default              192.168.1.1           UG       1   68276
127.0.0.1            127.0.0.1             UH       1   10497     lo0

This shows the output on a Solaris machine whose IP address is 192.168.1.11, with a default router at 192.168.1.1.

Network availability

The command above is mostly useful in troubleshooting network accessibility issues. When an outside network is not accessible from a machine, check the following:

Whether the default router IP address is correct.
Whether you can ping it from your machine.
If the router address is incorrect, it can be changed with the route add command. See man route for more info.
route command examples:
$ route add default [hostname]
$ route add 192.0.2.32 [gateway_name]

If the router address is correct but you still can't ping it, there may be a network cable, hub, or switch problem, and you have to try to eliminate the faulty component.
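
The first check, whether the default router is the one you expect, can be read straight out of captured netstat -rn output. The sketch below assumes the IPv4 table layout shown in the example (destination, gateway and flags as the first three columns); the function names are made up for illustration.

```python
def default_gateways(netstat_rn_output):
    """Return the gateway address of every `default` route found."""
    gateways = []
    for line in netstat_rn_output.splitlines():
        fields = line.split()
        # A default route has destination "default" and the G (gateway) flag.
        if len(fields) >= 3 and fields[0] == "default" and "G" in fields[2]:
            gateways.append(fields[1])
    return gateways


def router_matches(netstat_rn_output, expected_router):
    """True if the expected router is configured as a default gateway."""
    return expected_router in default_gateways(netstat_rn_output)


SAMPLE = """\
Routing Table: IPv4
Destination           Gateway               Flags  Ref   Use   Interface
-------------------- -------------------- ----- ----- ------ ---------
192.168.1.0          192.168.1.11          U        1   1444      le0
224.0.0.0            192.168.1.11          U        1   0         le0
default              192.168.1.1           UG       1   68276
127.0.0.1            127.0.0.1             UH       1   10497     lo0
"""

if __name__ == "__main__":
    print(default_gateways(SAMPLE))
    print(router_matches(SAMPLE, "192.168.1.1"))
```

An empty result means no default route is configured at all, which by itself explains why outside networks are unreachable.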

Network Response

$ netstat -i
Name  Mtu   Net/Dest  Address    Ipkts     Ierrs  Opkts    Oerrs  Collis  Queue
lo0   8232  loopback  localhost  77814     0      77814    0      0       0
hme0  1500  server1   server1    10658566  3      4832511  0      279257  0

This option is used to diagnose network problems when connectivity exists but response is slow.

Values to look at:

Collisions (Collis)

Output packets (Opkts)

Input errors (Ierrs)

Input packets (Ipkts)

The above values are used to work out the following rates.

The network collision rate is calculated as follows:

Network collision rate = Output collision counts / Output packets

A network-wide collision rate greater than 10 percent indicates one of the following:

Overloaded network
Poorly configured network
Hardware problems

The input packet error rate is calculated as follows:

Input Packet Error Rate = Ierrs / Ipkts

If the input error rate is high (over 0.25 percent), the host is dropping packets. Hubs, switches, cables, etc. need to be checked for potential problems.
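
Both rates can be computed per interface from netstat -i output. The following sketch assumes the ten-column layout of the example (Name, Mtu, Net/Dest, Address, Ipkts, Ierrs, Opkts, Oerrs, Collis, Queue); the function name and the dictionary keys are inventions of the sketch.

```python
COLLISION_LIMIT = 0.10       # 10 percent, from the text
INPUT_ERROR_LIMIT = 0.0025   # 0.25 percent, from the text


def interface_rates(netstat_i_output):
    """Map interface name -> collision rate and input error rate."""
    rates = {}
    for line in netstat_i_output.splitlines():
        fields = line.split()
        if len(fields) != 10 or fields[0] == "Name":
            continue  # header, prompt line, or unexpected layout
        ipkts, ierrs, opkts, _oerrs, collis = (int(f) for f in fields[4:9])
        rates[fields[0]] = {
            # Collis / Opkts; guard against an interface with no traffic.
            "collision_rate": collis / opkts if opkts else 0.0,
            # Ierrs / Ipkts
            "input_error_rate": ierrs / ipkts if ipkts else 0.0,
        }
    return rates


SAMPLE = """\
Name    Mtu     Net/Dest    Address     Ipkts   Ierrs   Opkts   Oerrs   Collis  Queue
lo0     8232    loopback    localhost   77814      0        77814      0         0         0
hme0    1500    server1     server1     10658566       3        4832511        0         279257      0
"""

if __name__ == "__main__":
    for name, r in interface_rates(SAMPLE).items():
        flags = []
        if r["collision_rate"] > COLLISION_LIMIT:
            flags.append("high collision rate")
        if r["input_error_rate"] > INPUT_ERROR_LIMIT:
            flags.append("high input error rate")
        print(name, r, flags)
```

For the hme0 row in the example this gives a collision rate of roughly 5.8 percent and an input error rate far below 0.25 percent, so neither threshold is crossed.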

Network socket & TCP connection state

netstat gives important information about network socket and TCP connection state. This is very useful for finding open, closed, and waiting TCP connections.

The connection states returned by netstat are the following:
LISTEN         ---- Listening for incoming connections.
SYN_SENT       ---- Actively trying to establish connection.
SYN_RECEIVED   ---- Initial synchronization of the connection under way.
ESTABLISHED    ---- Connection has been established.
FIN_WAIT_1     ---- Socket closed; shutting down connection.
FIN_WAIT_2     ---- Socket closed; waiting for shutdown from remote.
CLOSE_WAIT     ---- Remote shut down; waiting for the socket to close.
CLOSING        ---- Closed, then remote shutdown; awaiting acknowledgement.
CLOSED         ---- Closed. The socket is not being used.
LAST_ACK       ---- Remote shut down, then closed; awaiting acknowledgement.
TIME_WAIT      ---- Wait after close for remote shutdown retransmission.

$ netstat -a
Local Address    Remote Address             Swind   Send-Q      Rwind       Recv-Q      State
*.*                 *.*                         0   0           24576           0       IDLE
*.22                *.*                         0   0           24576           0       LISTEN
*.22                *.*                         0   0           24576           0       LISTEN
*.*                 *.*                         0   0           24576           0       IDLE
*.32771             *.*                         0   0           24576           0       LISTEN
*.4045              *.*                         0   0           24576           0       LISTEN
*.25                *.*                         0   0           24576           0       LISTEN
*.5987              *.*                         0   0           24576           0       LISTEN
*.898               *.*                         0   0           24576           0       LISTEN
*.32772             *.*                         0   0           24576           0       LISTEN
*.32775             *.*                         0   0           24576           0       LISTEN
*.32776             *.*                         0   0           24576           0       LISTEN
*.*                 *.*                         0   0           24576           0       IDLE
192.168.1.184.22    192.168.1.186.50457     41992   0           24616           0       ESTABLISHED
192.168.1.184.22    192.168.1.186.56806     38912   0           24616           0       ESTABLISHED
192.168.1.184.22    192.168.1.183.58672     18048   0           24616           0       ESTABLISHED

If you see a lot of connections in the FIN_WAIT states, TCP/IP parameters have to be tuned, because the connections are not being closed and keep accumulating. After some time the system may run out of resources. TCP parameters can be tuned to define a timeout so that connections are released and can be used by new connections.
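
Spotting an accumulation of FIN_WAIT connections is easier with a per-state tally than by reading the raw listing. This sketch assumes, as in the example, that the state is the last column of each TCP line in netstat -a output; the function names are made up for illustration.

```python
from collections import Counter

# States listed in the table above, plus IDLE, which appears in the
# example output.
TCP_STATES = {
    "LISTEN", "SYN_SENT", "SYN_RECEIVED", "ESTABLISHED", "FIN_WAIT_1",
    "FIN_WAIT_2", "CLOSE_WAIT", "CLOSING", "CLOSED", "LAST_ACK",
    "TIME_WAIT", "IDLE",
}


def count_states(netstat_a_output):
    """Tally TCP lines by the state named in their last column."""
    counts = Counter()
    for line in netstat_a_output.splitlines():
        fields = line.split()
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts


def fin_wait_total(counts):
    """Connections in either FIN_WAIT state."""
    return counts["FIN_WAIT_1"] + counts["FIN_WAIT_2"]


# A few rows copied from the example output above.
SAMPLE = """\
Local Address    Remote Address             Swind   Send-Q      Rwind       Recv-Q      State
*.*                 *.*                         0   0           24576           0       IDLE
*.22                *.*                         0   0           24576           0       LISTEN
*.25                *.*                         0   0           24576           0       LISTEN
192.168.1.184.22    192.168.1.186.50457     41992   0           24616           0       ESTABLISHED
192.168.1.184.22    192.168.1.183.58672     18048   0           24616           0       ESTABLISHED
"""

if __name__ == "__main__":
    counts = count_states(SAMPLE)
    print(dict(counts), fin_wait_total(counts))
```

Running the tally at intervals and watching whether the FIN_WAIT total keeps growing says more than a single snapshot does.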


Wednesday, October 7, 2009
