Some basic tools for System Performance Tuning
==============================================


You may use the following guidelines to identify the bottleneck
of your system (whether the system is DISK, CPU, MEMORY,
or NETWORK bound).


Disk I/O
==========

Use iostat -x.

Look at %b (percentage of time the disk is busy).
If it is constantly over 15%, investigate.
If it is constantly over 30%, it needs to be fixed.
This figure can also be used for balancing load across disks.

Look at the average service time (svc_t, in milliseconds).
If it constantly stays over 40 ms, it needs to be fixed.


e.g.

csh> iostat -x 30

device    r/s  w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd0       0.3  0.4    2.0    3.8  0.0  0.0   47.8   0   1 
sd16     10.4 14.6   89.4  124.5  0.1  2.2   92.0   1  22 
sd17      9.9 26.3   69.4  188.9  0.1  3.3   93.0   0  25 
nfs1      0.0  0.0    0.1    0.0  0.0  0.0  323.5   0   0 
nfs2      0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0 
nfs4      0.0  0.0    0.0    0.0  0.0  0.0  374.5   0   0 
                               extended device statistics
device    r/s  w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
sd0       0.1  0.2    0.6    1.5  0.0  0.0   12.1   0   0 
sd16      5.7  5.4   41.3   42.4  0.0  0.3   26.9   0  10 
sd17      4.9  5.6   27.2   35.8  0.0  0.1    6.8   0   7 
nfs1      0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0 
nfs2      0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0 
nfs4      0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0 
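The %b and svc_t rules above can be applied mechanically to captured
iostat -x output. The Python sketch below is a hypothetical helper
(parse_iostat_x and classify_disk are illustrative names, not part of
any Solaris tool). It locates columns by the names in the "device"
header line and checks one sample at a time, so the "constantly" part
of the rules still has to be judged across several intervals.

```python
from typing import Iterable, Iterator

BUSY_INVESTIGATE = 15.0  # %b above this: investigate
BUSY_FIX = 30.0          # %b above this: needs fixing
SVC_T_FIX = 40.0         # svc_t (ms) above this: needs fixing


def classify_disk(busy_pct: float, svc_t_ms: float) -> str:
    """Classify one device sample as 'fix', 'investigate' or 'ok'."""
    if busy_pct > BUSY_FIX or svc_t_ms > SVC_T_FIX:
        return "fix"
    if busy_pct > BUSY_INVESTIGATE:
        return "investigate"
    return "ok"


def parse_iostat_x(lines: Iterable[str]) -> Iterator[tuple[str, float, float]]:
    """Yield (device, %b, svc_t) for each device row of iostat -x output."""
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "device":
            header = fields  # column names; values are looked up by name
        elif header is not None and len(fields) == len(header):
            row = dict(zip(header, fields))
            yield row["device"], float(row["%b"]), float(row["svc_t"])
        # Other lines (e.g. "extended device statistics") are skipped.
```

Because parse_iostat_x accepts any iterable of lines and yields rows
as it goes, a script could pass it sys.stdin while "iostat -x 30" is
piped in and report each interval as it arrives.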


CPU
===

Use mpstat and vmstat.

First, use mpstat to find out how many CPUs there are (one row per CPU).
Then look at the number of runnable processes (the r column) in vmstat.
If (runnable processes / number of CPUs) is constantly over 5,
action needs to be taken.
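The run-queue rule is a simple ratio. The sketch below is a
hypothetical helper that assumes the mpstat and vmstat outputs have
been captured as text. It counts distinct CPU ids in the mpstat data
rows, reads r as the first column of each vmstat data row, and treats
"constantly" as "in every sample", which is an assumption.

```python
RUNQ_PER_CPU_LIMIT = 5.0  # "constantly over 5" from the notes


def count_cpus(mpstat_text: str) -> int:
    """Count distinct CPU ids (first column of mpstat data rows)."""
    ids = set()
    for line in mpstat_text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            ids.add(fields[0])
    return len(ids)


def runq_values(vmstat_text: str) -> list[int]:
    """Return the r (runnable processes) column of each vmstat data row."""
    values = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():  # header rows start with text
            values.append(int(fields[0]))
    return values


def cpu_bound(runq: list[int], ncpu: int,
              limit: float = RUNQ_PER_CPU_LIMIT) -> bool:
    """True only if r / ncpu exceeds the limit in every sample."""
    if ncpu < 1:
        raise ValueError("need at least one CPU")
    return bool(runq) and all(r / ncpu > limit for r in runq)
```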

As for CPU utilization (the last three columns of vmstat: us, sy, id),
ideally the busy time should split about 85% user (us) and 15% system (sy).
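The sketch below reads the last three vmstat columns and computes the
user share of busy (non-idle) time. It assumes the guideline refers to
the split of busy time between us and sy rather than to absolute
percentages; the function names are hypothetical.

```python
USER_SHARE_TARGET = 85.0  # target % of busy CPU time spent in user mode


def cpu_columns(vmstat_row: str) -> tuple[int, int, int]:
    """Return (us, sy, id): the last three columns of a vmstat data row."""
    us, sy, idle = (int(x) for x in vmstat_row.split()[-3:])
    return us, sy, idle


def user_share(us: int, sy: int) -> float:
    """Percentage of busy (non-idle) CPU time spent in user mode."""
    busy = us + sy
    return 100.0 * us / busy if busy else 0.0
```

On a nearly idle machine such as the one below (1% us, 1% sy), the
split is 50/50, but with so little load it says little about the system.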


e.g.

csh> mpstat
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    3   0    0   231   28   69    5    0    0    0   279    1   1   0  99

csh> vmstat 30
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr aa dd f0 --   in   sy   cs us sy id
 0 0 0   1480  2224   0   3  0  0  0  0  0  0  0  0  0  131  279   69  1  1 99
 0 0 0 183728  3376   0  12  0  0  3  0  0  0  0  0  0  195  440  132  1  1 97
 0 0 0 184024  3656   0   0  0  0  0  0  0  0  1  0  0  166  391  102  1  1 98


Memory
======

Look at the scan rate (sr, pages scanned per second) in vmstat output
taken at 30-minute intervals. If sr is over 200, investigate and
monitor closely. You may use "ps -el" (look at the SZ column) to find
out which processes are eating up memory.
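Finding the largest processes in "ps -el" output amounts to sorting by
SZ. The sketch below is a hypothetical helper; it finds the SZ and PID
columns from the header line and takes the command name from the last
field. SZ values are compared as reported, with no unit conversion.

```python
def largest_processes(ps_text: str, n: int = 5) -> list[tuple[int, str, str]]:
    """Return up to n (SZ, PID, CMD) tuples from ps -el output, largest first."""
    lines = [line for line in ps_text.splitlines() if line.strip()]
    header = lines[0].split()
    sz_col, pid_col = header.index("SZ"), header.index("PID")
    procs = []
    for line in lines[1:]:
        fields = line.split()
        # SZ is read by position. A blank WCHAN column does not shift it,
        # because WCHAN comes after SZ.
        if len(fields) > sz_col and fields[sz_col].isdigit():
            procs.append((int(fields[sz_col]), fields[pid_col], fields[-1]))
    procs.sort(reverse=True)
    return procs[:n]
```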

If sr is over 300, take action to add RAM.
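The two scan-rate thresholds can be checked against vmstat output as
in the hypothetical sketch below. It finds the sr column by name in the
"r b w ..." header row. By default it skips the first data row, because
vmstat's first line summarizes activity since boot rather than one
interval.

```python
SR_INVESTIGATE = 200  # sr above this: investigate and monitor closely
SR_ADD_RAM = 300      # sr above this: add RAM


def classify_scan_rate(sr: int) -> str:
    """Map one scan-rate sample to the action suggested in the notes."""
    if sr > SR_ADD_RAM:
        return "add RAM"
    if sr > SR_INVESTIGATE:
        return "investigate"
    return "ok"


def scan_rates(vmstat_text: str, skip_first: bool = True) -> list[int]:
    """Return the sr column of each vmstat data row, located via the header."""
    header = None
    rates = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if fields[:3] == ["r", "b", "w"]:
            header = fields
        elif header and fields and fields[0].isdigit() and len(fields) == len(header):
            rates.append(int(fields[header.index("sr")]))
    # The first data row covers the time since boot, not one interval.
    return rates[1:] if skip_first else rates
```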

A large number in the w column can still be healthy (the processes
were swapped out because they were inactive).

The swap column under the memory heading indicates the amount of swap
space (in KB) currently available. It should not fall below 20 MB.
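Since vmstat reports swap in KB, the 20 MB floor has to be converted
before comparing. The hypothetical sketch below assumes 20 MB means
20 x 1024 KB, and reads the swap column of one data row using the
"r b w swap ..." header row.

```python
SWAP_FLOOR_KB = 20 * 1024  # 20 MB, in KB because vmstat reports swap in KB


def swap_kb(header_line: str, data_line: str) -> int:
    """Read the swap column of one vmstat data row, located via the header row."""
    return int(data_line.split()[header_line.split().index("swap")])


def swap_low(kb: int) -> bool:
    """True if available swap has fallen below the 20 MB floor."""
    return kb < SWAP_FLOOR_KB
```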

e.g.

csh> vmstat 1800
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s1 --   in   sy   cs us sy id
 0 0 0    936  1584   5  39 96 133 148 0 3  1 25 36  0 4294967202 277 176 5 14 81
 0 0 0 999528  7368   2  36 56 72 76  0  1  1 15 16  0  252  225  124  3  6 91


NETWORK I/O
============

Use "netstat -i".


Look at the Ierrs, Oerrs, and Collis columns.

e.g.

csh> netstat -i
Name  Mtu  Net/Dest      Address        Ipkts  Ierrs Opkts Oerrs Collis Queue
le0   1500 ethernet      grsun1         653637  20  116339  1    1478   0
lo0   1536 127.0.0.0     localhost      193     0    193    0    0      0


The ratios Ierrs/Ipkts and Oerrs/Opkts should be below 0.025%.
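The error ratios are simple percentages of the packet counts. The
hypothetical sketch below parses "netstat -i" output by the column
names in the "Name ..." header line. Note that these counters are
cumulative since boot, so comparing two readings taken some time apart
gives a better picture of current behaviour.

```python
ERROR_LIMIT_PCT = 0.025  # Ierrs/Ipkts and Oerrs/Opkts should stay below this


def pct(part: int, whole: int) -> float:
    """part as a percentage of whole (0.0 when whole is zero)."""
    return 100.0 * part / whole if whole else 0.0


def error_ratios(netstat_text: str) -> list[tuple[str, float, float]]:
    """Return (interface, Ierrs/Ipkts %, Oerrs/Opkts %) per netstat -i row."""
    header = None
    result = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Name":
            header = fields
        elif header and len(fields) == len(header):
            row = dict(zip(header, fields))
            result.append((row["Name"],
                           pct(int(row["Ierrs"]), int(row["Ipkts"])),
                           pct(int(row["Oerrs"]), int(row["Opkts"]))))
    return result
```

For the le0 row above, Ierrs/Ipkts is about 0.003% and Oerrs/Opkts is
about 0.0009%, both well under the limit.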


Large Ierrs => the interface just discards the packets

            => there may be faulty hardware on the network
                (faulty hardware can be anything from another
                computer system that is generating packets
                improperly to a bad connector or terminator)

            => or your system cannot receive packets fast enough
                (use spray to check it)


Large Oerrs => your system's network interface is faulty

            => something is wrong between the CPU and the Ethernet cable

            => the problem should be local, not from outsiders
                (we can do a loopback test of the
                 Ethernet interface with
                 "test net" at the ok prompt)


Collisions are normal events and don't indicate hardware
problems. However, if Collis/Opkts is constantly > 10%
 => the network is overloaded
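The collision check is another ratio. In the hypothetical sketch below,
each sample is assumed to be a (Collis, Opkts) pair taken as the
difference between two successive "netstat -i" readings, since the raw
counters are cumulative since boot; "constantly" is read as "in every
sample".

```python
COLLISION_LIMIT_PCT = 10.0  # Collis/Opkts constantly above this: overloaded


def collision_pct(collis: int, opkts: int) -> float:
    """Collisions as a percentage of output packets."""
    return 100.0 * collis / opkts if opkts else 0.0


def network_overloaded(samples: list[tuple[int, int]]) -> bool:
    """True only if every (collis, opkts) sample exceeds the limit."""
    return bool(samples) and all(
        collision_pct(c, o) > COLLISION_LIMIT_PCT for c, o in samples)
```

For le0 in the netstat -i output above, 1478 / 116339 is about 1.3%,
well below 10%.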

We may use snoop, etherfind, tcptop, or a protocol analyser
to trace the source of the network traffic
(e.g. broadcast messages or NFS packets).