Some basic tools for System Performance Tuning
==============================================

You may use the following guidelines to identify the bottleneck of your system, that is, whether the system is DISK, CPU, MEMORY, or NETWORK bound.

Disk I/O
========

Use "iostat -x".

Look at %b (percentage of time the disk is busy):
    If it is constantly over 15%, investigate.
    If it is constantly over 30%, it needs to be fixed.
    This figure can also be used to balance load across disks.

Look at the service time (svc_t, in milliseconds):
    If it constantly stays over 40, it needs to be fixed.

e.g.
    csh> iostat -x 30
    device     r/s   w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
    sd0        0.3   0.4    2.0    3.8   0.0   0.0   47.8   0   1
    sd16      10.4  14.6   89.4  124.5   0.1   2.2   92.0   1  22
    sd17       9.9  26.3   69.4  188.9   0.1   3.3   93.0   0  25
    nfs1       0.0   0.0    0.1    0.0   0.0   0.0  323.5   0   0
    nfs2       0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
    nfs4       0.0   0.0    0.0    0.0   0.0   0.0  374.5   0   0
                       extended device statistics
    device     r/s   w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
    sd0        0.1   0.2    0.6    1.5   0.0   0.0   12.1   0   0
    sd16       5.7   5.4   41.3   42.4   0.0   0.3   26.9   0  10
    sd17       4.9   5.6   27.2   35.8   0.0   0.1    6.8   0   7
    nfs1       0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
    nfs2       0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
    nfs4       0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0

CPU
===

Use mpstat and vmstat.

First use mpstat to find out how many CPUs there are (it prints one line per CPU).
Then look at the number of runnable processes (the r column) in vmstat.
    If (runnable processes / number of CPUs) is constantly over 5, action needs to be taken.

As for CPU utilization (the last 3 columns of vmstat: us, sy, and id), ideally the split should be about 85% user (us) to 15% system (sy).

e.g.
    csh> mpstat
    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
      0    3   0    0  231   28  69    5    0    0   0   279   1   1  0  99

    csh> vmstat 30
     procs   memory            page            disk       faults      cpu
     r b w   swap free re mf pi po fr de sr aa dd f0 --  in  sy  cs us sy id
     0 0 0   1480 2224  0  3  0  0  0  0  0  0  0  0  0 131 279  69  1  1 99
     0 0 0 183728 3376  0 12  0  0  3  0  0  0  0  0  0 195 440 132  1  1 97
     0 0 0 184024 3656  0  0  0  0  0  0  0  0  1  0  0 166 391 102  1  1 98

Memory
======

Look at the scan rate (sr) in vmstat output sampled at 30-minute intervals.
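The scan-rate rule lends itself to a quick scripted check. The following is a minimal Python sketch and is not part of any Solaris tool. It assumes the vmstat column layout shown in these notes. It also assumes, as vmstat normally does, that the first data line is an average since boot rather than an interval sample, so that line is skipped by default. The helper names (`column`, `scan_rate_verdict`) and the sample text are illustrative. The limits are the sr values of 200 and 300 and the 20 MB swap floor that these notes give for memory.

```python
# Minimal sketch (hypothetical helpers): read named columns from Solaris-style
# "vmstat" output and apply the memory rules of thumb from these notes.

SAMPLE = """\
 procs   memory             page              disk          faults          cpu
 r b w   swap free re mf  pi  po  fr de sr s0 s1 s1 --         in  sy  cs us sy id
 0 0 0    936 1584  5 39  96 133 148  0  3  1 25 36  0 4294967202 277 176  5 14 81
 0 0 0 999528 7368  2 36  56  72  76  0  1  1 15 16  0        252 225 124  3  6 91
"""

SR_INVESTIGATE = 200      # sr above this: investigate and monitor closely
SR_ADD_RAM = 300          # sr above this: add RAM
MIN_SWAP_KB = 20 * 1024   # available swap (KB) should not fall below 20 MB


def column(vmstat_output, name, skip_first=True):
    """Return the integer values of the first column headed `name`.

    Data lines are recognised by a leading digit. The first data line is
    assumed to be the since-boot average and is skipped unless
    skip_first is False.
    """
    rows = [line.split() for line in vmstat_output.splitlines() if line.strip()]
    header = next(r for r in rows if name in r)
    idx = header.index(name)
    data = [r for r in rows if r[0].isdigit()]
    if skip_first:
        data = data[1:]
    return [int(r[idx]) for r in data]


def scan_rate_verdict(sr):
    """Classify one scan-rate sample using the thresholds above."""
    if sr > SR_ADD_RAM:
        return "add RAM"
    if sr > SR_INVESTIGATE:
        return "investigate"
    return "ok"


if __name__ == "__main__":
    for sr, swap in zip(column(SAMPLE, "sr"), column(SAMPLE, "swap")):
        note = " (swap below 20 MB)" if swap < MIN_SWAP_KB else ""
        print(f"sr={sr} swap={swap} KB: {scan_rate_verdict(sr)}{note}")
```

Looking columns up by header name, rather than by fixed position, keeps the sketch working when the disk columns (s0, s1, aa, dd, ...) differ from machine to machine.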
If sr is over 200, investigate and monitor closely. You may use "ps -el" (look at the SZ column) to find out which processes are eating up memory.
If sr is over 300, take action and add RAM.

A large number in the w column can be healthy: it may only mean that processes were swapped out because they were inactive.

The swap column under the memory heading shows the amount of swap space (in KB) currently available. It should not fall below 20 MB.

e.g.
    csh> vmstat 1800
     procs   memory             page              disk          faults          cpu
     r b w   swap free re mf  pi  po  fr de sr s0 s1 s1 --         in  sy  cs us sy id
     0 0 0    936 1584  5 39  96 133 148  0  3  1 25 36  0 4294967202 277 176  5 14 81
     0 0 0 999528 7368  2 36  56  72  76  0  1  1 15 16  0        252 225 124  3  6 91

NETWORK I/O
===========

Use "netstat -i".
Look at the Ierrs, Oerrs, and Collis columns.

e.g.
    csh> netstat -i
    Name  Mtu   Net/Dest   Address    Ipkts   Ierrs  Opkts   Oerrs  Collis  Queue
    le0   1500  ethernet   grsun1     653637  20     116339  1      1478    0
    lo0   1536  127.0.0.0  localhost  193     0      193     0      0       0

Ierrs/Ipkts and Oerrs/Opkts should each be below 0.025%.

Large Ierrs
    => the interface is discarding incoming packets
    => there may be faulty hardware on the network (faulty hardware can be anything from another computer system that is generating packets improperly to a bad connector or terminator)
    => or your system cannot receive packets fast enough (use spray to check this)

Large Oerrs
    => your system's network interface is faulty
    => something is wrong between the CPU and the Ethernet cable
    => the problem should be local, not caused by other hosts (you can run a loopback test of the Ethernet interface with "test net" at the ok prompt)

Collisions are normal events and do not indicate hardware problems. However, if Collis/Opkts is constantly above 10%
    => the network is overloaded

You may use snoop, etherfind, tcptop, or a protocol analyser to trace the source of the network traffic (e.g. broadcast messages or NFS packets).
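The netstat ratio checks are simple arithmetic on the interface counters. Below is a minimal Python sketch. It assumes the column names in the sample output shown in these notes (Ipkts, Ierrs, Opkts, Oerrs, Collis); other systems may label or order them differently. `interface_ratios`, `pct` and the limit constants are hypothetical names used only for illustration.

```python
# Minimal sketch (hypothetical helpers): compute the netstat -i error and
# collision percentages used in these notes from the sample output.

SAMPLE = """\
Name  Mtu   Net/Dest   Address    Ipkts   Ierrs  Opkts   Oerrs  Collis  Queue
le0   1500  ethernet   grsun1     653637  20     116339  1      1478    0
lo0   1536  127.0.0.0  localhost  193     0      193     0      0       0
"""

MAX_ERROR_PCT = 0.025      # Ierrs/Ipkts and Oerrs/Opkts should stay below this
MAX_COLLISION_PCT = 10.0   # Collis/Opkts constantly above this: network overloaded


def pct(part, whole):
    """Percentage of part in whole; 0.0 when there was no traffic at all."""
    return 100.0 * part / whole if whole else 0.0


def interface_ratios(netstat_output):
    """Map each interface name to its error and collision percentages."""
    lines = [line.split() for line in netstat_output.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    ratios = {}
    for row in rows:
        f = dict(zip(header, row))  # column name -> field text for this row
        ratios[f["Name"]] = {
            "in_err_pct": pct(int(f["Ierrs"]), int(f["Ipkts"])),
            "out_err_pct": pct(int(f["Oerrs"]), int(f["Opkts"])),
            "coll_pct": pct(int(f["Collis"]), int(f["Opkts"])),
        }
    return ratios


if __name__ == "__main__":
    for name, r in interface_ratios(SAMPLE).items():
        problems = []
        if r["in_err_pct"] >= MAX_ERROR_PCT:
            problems.append("input errors")
        if r["out_err_pct"] >= MAX_ERROR_PCT:
            problems.append("output errors")
        if r["coll_pct"] > MAX_COLLISION_PCT:
            problems.append("collisions (network overloaded?)")
        print(name, {k: round(v, 4) for k, v in r.items()}, problems or "ok")
```

For the le0 sample, this gives about 0.003% input errors, 0.0009% output errors and 1.3% collisions (1478 / 116339). All three are within the limits given in these notes. Because netstat -i counters are cumulative, comparing two snapshots taken some time apart is a better test of "constantly" than a single reading.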