Friday, April 30, 2010

Trace port in AIX

AIX Command

1. netstat -Aan | grep <port>
- This shows whether the specified port is being used. The hex number in the first column is the address of the protocol control block (PCB).

2. rmsock <address> tcpcb
- This shows the process that is holding the socket. Note that this command must be run as root.

AIX Example

Let's set SVCENAME to 30542, so that the listener will use this port. Then, use the commands above to check if the port is indeed being used by DB2 LUW.

$ db2 update dbm cfg using svcename 30542
$ db2start
$ netstat -Aan | grep 30542
f10000f303321b58 tcp4 0 0 *.30542 *.* LISTEN

The netstat command above shows that port 30542 is being used for listening. To confirm that it is DB2 LUW that is using the port, run rmsock as root as follows.

$ rmsock f10000f303321b58 tcpcb
The socket 0x3321800 is being held by proccess 692476 (db2sysc).

This shows that it is the db2sysc process that is using the port, and that its PID is 692476.

Note that rmsock, despite what its name implies, does not remove the socket if the socket is being used by any process; instead, it simply reports the process holding the socket. Also note that the second argument of rmsock is the protocol control block type; it is tcpcb in the example to indicate that the protocol is TCP.
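If you need to do this often, the two commands can be chained. The following is a minimal sketch, not part of the original procedure (the port number and the loop are illustrative); it pulls the PCB address for a given port out of netstat and passes it to rmsock, so it must be run as root:

# Map a TCP port to its owning process on AIX (run as root).
PORT=30542
for pcb in `netstat -Aan | grep "\.$PORT " | awk '{print $1}'`; do
    rmsock $pcb tcpcb
done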

Windows commands for port

Windows Command

netstat -an |find /i "listening"
netstat -an |find /i "established"
netstat -ao |find /i "listening"

1. netstat -aon | findstr "<port>"

This shows whether the specified port is being used. The number in the last column is the process ID (PID) of the process holding the socket. Once the PID is determined, one can refer to the Windows Task Manager to determine which application corresponds to that PID.

Windows Example

C:\>netstat -aon | findstr "50000"
TCP 0.0.0.0:50000 0.0.0.0:0 LISTENING 2564

C:\>pslist 2564

pslist v1.28 - Sysinternals PsList
Copyright © 2000-2004 Mark Russinovich
Sysinternals

Process information for MACHINENAME:

Name Pid Pri Thd Hnd Priv CPU Time Elapsed Time
db2syscs 2564 8 15 366 30912 0:00:02.859 2:12:08.564

-------------------------

To find and trace open ports in unix

Listing all the pids:
---------------------
/usr/bin/ps -ef | sed 1d | awk '{print $2}'


Mapping the files to ports using the PID:
-------------
/usr/proc/bin/pfiles <pid> 2>/dev/null | /usr/xpg4/bin/grep <port>
or
/usr/bin/ps -o pid -o args -p <pid> | sed 1d


Mapping the sockname to port using the port number:
----------------------
for i in `ps -e|awk '{print $1}'`; do echo $i; pfiles $i 2>/dev/null | grep 'port: 8080'; done
or
pfiles -F /proc/* | nawk '/^[0-9]+/ { proc=$2} ; /[s]ockname: AF_INET/ { print proc "\n " $0 }'
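If you also want the process name rather than just the PID, the ps and pfiles approaches above can be combined. A minimal sketch, assuming Solaris and using port 8080 as an example:

# Print the PID and command line of any process holding the given port.
PORT=8080
for pid in `/usr/bin/ps -ef | sed 1d | awk '{print $2}'`; do
    if pfiles $pid 2>/dev/null | grep "port: $PORT" >/dev/null; then
        /usr/bin/ps -o pid -o args -p $pid | sed 1d
    fi
done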


There are two explanations for why "lsof" did not show what was expected:

1) One thing that might prevent lsof from printing everything is if the ports are controlled by inetd
or something similar (i.e. there is nothing actively listening on them until you try talking to them).

Also, try connecting to the port with telnet and then run lsof while the telnet session is connected.
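A quick way to check the inetd theory is sketched below, using 8080 as an example port (the service name to grep for in inetd.conf is whatever /etc/services reports for that port):

# Does the port map to a service name, and is that service run from inetd?
grep "8080/tcp" /etc/services
grep -v '^#' /etc/inetd.conf | grep <service>

# With a telnet session to the port open in another window, lsof should then
# show the process handling the connection.
telnet localhost 8080
lsof -i :8080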

2) On Solaris 10, using "lsof -i" to show the mapping of processes to TCP ports incorrectly shows all
processes that have a socket open as using port 65535, for example:

sshd 8005 root 8u IPv4 0x60007ebdac0 0t0 TCP *:65535
(LISTEN)
sendmail 1116 root 5u IPv4 0x60007ecce00 0t0 TCP *:65535
(LISTEN)

This is a known bug in lsof that cannot be fixed because of differences between Solaris 10
and previous versions. So the otherwise useful "lsof -i :<port>" is not useful here.

Thursday, April 8, 2010

EJB Container tuning

If you use applications that affect the size of the EJB Container Cache, an incorrectly sized cache can impact their performance. Monitoring with the Tivoli Performance Viewer (TPV) is a good way to diagnose whether the EJB Container Cache size setting is tuned correctly for your application. If the application has filled the cache, causing evictions to occur, TPV will show a very high rate of ejbStores() being called and probably a lower than expected CPU utilization on the application server machine.

Managed & Unmanaged Web Servers

Unmanaged web server
Unmanaged web servers reside on a system without a node agent. This is the only option in a standalone server environment and is a common option for Web servers installed outside a firewall. This topology requires that, each time the plug-in configuration file is generated, it is copied from the machine where WebSphere Application Server is installed to the machine where the Web server is running.

If the Web server is defined to an unmanaged node, you can do the following:

1.Check the status of the Web server.

2.Generate a plug-in configuration file for that Web server.

3.If the Web server is an IBM HTTP Server and the IHS Administration server is
installed and properly configured, you can also:

Display the IBM HTTP Server Error log (error.log) and Access log (access.log) files.

Start and stop the server.

Display and edit the IBM HTTP Server configuration file (httpd.conf).

Propagate the plug-in configuration file after it is generated.

You cannot propagate an updated plug-in configuration file to a non-IHS Web server that is defined to an unmanaged node. You must install an updated plug-in configuration file manually on a Web server that is defined to an unmanaged node.
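For example, a minimal sketch of the manual propagation steps (all paths, host names, and the web server definition name here are assumptions; adjust them to your installation):

# 1. Regenerate the plug-in configuration file on the WebSphere machine.
/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/bin/GenPluginCfg.sh

# 2. Copy the generated plugin-cfg.xml by hand to the unmanaged web server
#    machine, since there is no node agent to propagate it.
scp /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/config/cells/plugin-cfg.xml \
    webhost:/opt/IBM/HTTPServer/Plugins/config/webserver1/plugin-cfg.xml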
-------------------------------------------------------------------------------------
Managed Web Server

In a distributed server environment, you can define multiple Web servers. These
Web servers can be defined on managed or unmanaged nodes. A managed node
has a node agent. If the Web server is defined to a managed node, you can do
the following:

1.Check the status of the Web server.

2.Generate a plug-in configuration file for that Web server.

3.Propagate the plug-in configuration file after it is generated.

If the Web server is an IBM HTTP Server (IHS) and the IHS Administration
server is installed and properly configured, you can also:

Display the IBM HTTP Server Error log (error.log) and Access log
(access.log) files.

Start and stop the server.

Display and edit the IBM HTTP Server configuration file (httpd.conf)

Tuning data source - Connection pool tuning

You can tune the connection pool from the WAS Admin Console.

Maximum Connections: Specifies the maximum number of physical connections that can be created in this pool. These are the physical connections to the backend datastore. When this number is reached, no new physical connections are created; requestors must wait until a physical connection that is currently in use is returned to the pool. For optimal performance, set the value for the connection pool lower than the value for the Web container thread pool size. Lower settings, such as 10 to 30 connections, might perform better than higher settings, such as 100.

Minimum Connections: Specifies the minimum number of physical connections to maintain. Until this number is exceeded, the pool maintenance thread does not discard physical connections. If you set this property for a higher number of connections than your application ultimately uses at run time, you do not waste application resources. WebSphere Application Server does not create additional connections to achieve your minimum setting. Of course, if your application requires more connections than the value you set for this property, application performance diminishes as connection requests wait for fulfillment.

Connection Timeout: Specifies the interval, in seconds, after which a connection request times out and a ConnectionWaitTimeoutException is thrown.

This value indicates the number of seconds a request for a connection waits when there are no connections available in the free pool and no new connections can be created, usually because the maximum value of connections in the particular connection pool has been reached. For example, if Connection Timeout is set to 300, and the maximum number of connections are all in use, the pool manager waits for 300 seconds for a physical connection to become available. If a physical connection is not available within this time, the pool manager initiates a ConnectionWaitTimeout exception. It usually does not make sense to retry the getConnection() method; if a longer wait time is required you should increase the Connection Timeout setting value. If a ConnectionWaitTimeout exception is caught by the application, the administrator should review the expected connection pool usage of the application and tune the connection pool and database accordingly.
If the Connection Timeout is set to 0, the pool manager waits as long as necessary until a connection becomes available. This happens when the application completes a transaction and returns a connection to the pool, or when the number of connections falls below the value of Maximum Connections, allowing a new physical connection to be created.
If Maximum Connections is set to 0, which enables an infinite number of physical connections, then the Connection Timeout value is ignored.

Reap Time: Specifies the interval, in seconds, between runs of the pool maintenance thread.

For example, if Reap Time is set to 60, the pool maintenance thread runs every 60 seconds. The Reap Time interval affects the accuracy of the Unused Timeout and Aged Timeout settings. The smaller the interval, the greater the accuracy. If the pool maintenance thread is enabled, set the Reap Time value less than the values of Unused Timeout and Aged Timeout. When the pool maintenance thread runs, it discards any connections remaining unused for longer than the time value specified in Unused Timeout, until it reaches the number of connections specified in Minimum Connections. The pool maintenance thread also discards any connections that remain active longer than the time value specified in Aged Timeout.

The Reap Time interval also affects performance. Smaller intervals mean that the pool maintenance thread runs more often and degrades performance.

To disable the pool maintenance thread set Reap Time to 0, or set both Unused Timeout and Aged Timeout to 0. The recommended way to disable the pool maintenance thread is to set Reap Time to 0, in which case Unused Timeout and Aged Timeout are ignored. However, if Unused Timeout and Aged Timeout are set to 0, the pool maintenance thread runs, but only physical connections which timeout due to non-zero timeout values are discarded.


Unused Timeout: Specifies the interval in seconds after which an unused or idle connection is discarded.

Set the Unused Timeout value higher than the Reap Timeout value for optimal performance. Unused physical connections are only discarded if the current number of connections exceeds the Minimum Connections setting. For example, if the unused timeout value is set to 120, and the pool maintenance thread is enabled (Reap Time is not 0), any physical connection that remains unused for two minutes is discarded


Aged Timeout: Specifies the interval, in seconds, before a physical connection is discarded.

Setting Aged Timeout to 0 supports active physical connections remaining in the pool indefinitely. Set the Aged Timeout value higher than the Reap Timeout value for optimal performance. For example, if the Aged Timeout value is set to 1200, and the Reap Time value is not 0, any physical connection that remains in existence for 1200 seconds (20 minutes) is discarded from the pool. The only exception is if the connection is involved in a transaction when the aged timeout is reached; if it is, the connection is closed immediately after the transaction completes.


Purge Policy: Specifies how to purge connections when a stale connection or fatal connection error is detected.



EntirePool: All connections in the pool are marked stale. Any connection not in use is immediately closed. A connection in use is closed and issues a stale connection exception during the next operation on that connection. Subsequent getConnection() requests from the application result in new connections to the database opening. When using this purge policy, there is a slight possibility that some connections in the pool are closed unnecessarily when they are not stale. However, this is a rare occurrence. In most cases, a purge policy of EntirePool is the best choice.

FailingConnectionOnly: Only the connection that caused the stale connection exception is closed. Although this setting eliminates the possibility that valid connections are closed unnecessarily, it makes recovery from an application perspective more complicated. Because only the currently failing connection is closed, there is a good possibility that the next getConnection() request from the application can return a connection from the pool that is also stale, resulting in more stale connection exceptions.
The connection pretest function attempts to insulate an application from pooled connections that are not valid. When a backend resource, such as a database, goes down, pooled connections that are not valid might exist in the free pool. This is especially true when the purge policy is failingConnectionOnly; in this case, the failing connection is removed from the pool. Depending on the failure, the remaining connections in the pool might not be valid.

Memory-CPU in unix

Task I - Identifying a memory DOS and responding
In this task, you will start a memory denial of service against yourself and then add
swap on the fly to attempt to buy more time. If you were the client from the module 4
exercise, you will have to create the memory DOS script (step 16) from the module 4 lab.
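For reference, a minimal sketch of what the hog script might look like, reconstructed from the set -x trace shown in step 5 below (the source file, the /tmp file names, and the iteration limit are taken from that trace; the real module 4 script may differ):

#!/bin/ksh
# hog - fill /tmp (and therefore swap) by repeatedly copying a large file.
set -x
x=0
while :
do
    let x=x+1
    [ $x -eq 100000 ] && break
    cat /var/sadm/install/contents >> /tmp/file.$x
done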
1) Open 3 separate terminal windows. In the first terminal, start a vmstat with an interval of one second.
[1] # vmstat 1
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr dd f0 s5 s1 in sy cs us sy id
0 0 25 811888 326896 3 11 8 5 5 0 5 2 0 0 0 311 564 89 2 1 97
0 0 25 781816 264256 13 8 680 0 0 0 0 119 0 0 0 634 6097 360 1 21 78
0 1 25 781816 263288 20 0 896 0 0 0 0 135 0 0 0 742 3591 421 2 12 86

2) In the second terminal, invoke the "hog" script in the /export/home/guest directory.
[2] # cd /export/home/guest
[2] # ./hog
3) In the third terminal window, create a 128 MB swap file and add it on the fly.
[3] # mkfile 128m /export/swapfile
[3] # swap -a /export/swapfile
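As an extra check (not part of the original lab steps), swap -l lists the configured swap areas, so the new swap file should now appear alongside the existing swap slice:

[3] # swap -l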
4) Observe the vmstat output in terminal 1. Did the "swap" column grow?
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr dd f0 s5 s1 in sy cs us sy id
1 1 25 1560 8584 612 6607 624 8 8 200 0 69 0 0 0 514 7979 1086 27 72 1
1 1 25 1280 7680 534 2471 4536 3656 3744 40 439 145 0 0 0 841 3537 671 14 40
0 1 25 127608 8232 415 513 4640 5648 5688 0 202 100 0 0 0 546 615 245 1 11 88
0 3 25 123104 8464 449 558 5008 5080 5080 0 0 113 0 0 0 557 974 322 2 13 85
5) Observe the hog output in terminal 2. The script continued to run even though /tmp was full;
as soon as the swap space was added, the script resumed writing
files into the /tmp space.
cat: write error: No space left on device <---script still running even though /tmp is full
+ let x=x+1
+ [ 1805 -eq 100000 ]
+ cat /var/sadm/install/contents
+ 1>> /tmp/file.1805
cat: write error: No space left on device <---script still running even though /tmp is full
+ let x=x+1
+ [ 1806 -eq 100000 ]
+ cat /var/sadm/install/contents <---script appears to resume writing to /tmp
+ 1>> /tmp/file.1806
+ let x=x+1
6) Stop the script by issuing a ^C in terminal 2. Clean up the /tmp directory.
[3] # cd /tmp
[3] # rm -r /tmp/file*
7) Stop the vmstat by issuing a ^C in terminal 1.
Task II - Limiting the size of /tmp in the /etc/vfstab
In this exercise, you will limit the size of the /tmp filesystem and then run a memory DOS
against yourself to see if the filesystem limit worked.
1) Observe the size of the /tmp file system with the df command.
# df -k /tmp
Filesystem kbytes used avail capacity Mounted on
swap 900360 16 900344 1% /tmp
2) Edit the /etc/vfstab and configure /tmp to have a maximum size of 128m
# vi /etc/vfstab
swap - /tmp tmpfs - yes size=128m
3) Since /tmp cannot be unmounted, you will have to reboot the workstation
# init 6
4) After the workstation has rebooted, check the size of /tmp again.
Notice that it is much smaller than the previous size.
# df -k /tmp
Filesystem kbytes used avail capacity Mounted on
swap 131072 344 130728 1% /tmp
5) Open 3 terminal windows. In the first window, start a vmstat at 1 second
intervals.
[1] # vmstat 1
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr dd f0 s5 s1 in sy cs us sy id
0 0 0 856456 426696 29 122 134 0 0 0 0 19 0 0 0 363 662 157 3 8 89
0 0 0 878352 414024 0 8 0 0 0 0 0 0 0 0 0 356 171 91 0 0 100
6) In a second terminal window, start the hog script.
[2] # cd /export/home/guest
[2] # ./hog
7) Notice in terminal 2 that the hog script will error out more quickly with a "No space left on
device" error, while in terminal 1 the vmstat reports plenty of virtual memory in the "swap"
column. Since the /tmp file system has been limited, the workstation is now protected
against a /tmp DOS. However, the /tmp filesystem is still full. Any applications
that need to write to the /tmp space will be unable to do so.
Task III - Identifying a CPU DOS and responding to it.
The following task teaches how to detect a CPU DOS and prevent future CPU DOS.
The task requires you to fork bomb your own system. This may cause the system
to stop responding. Be sure to save all of your work.
1) Open 4 terminal windows. In the first terminal window, use the sar command to monitor
the process table size. Have sar monitor every second for 1000 seconds. Notice the proc-sz
value.
[1] # sar -v 1 1000
19:40:20 proc-sz ov inod-sz ov file-sz ov lock-sz
19:40:21 63/7914 0 1955/33952 0 392/392 0 0/0
19:40:22 63/7914 0 1955/33952 0 392/392 0 0/0
19:40:23 63/7914 0 1955/33952 0 392/392 0 0/0
19:40:24 63/7914 0 1955/33952 0 392/392 0 0/0
2) In the second terminal window, use the vmstat command to monitor the run queue (r) field.
[2] # vmstat 1
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr dd f0 s5 s1 in sy cs us sy id
0 0 0 800712 352024 26 102 40 0 0 0 0 5 0 0 0 334 432 124 2 3 95
0 0 0 875544 420648 0 8 0 0 0 0 0 0 0 0 0 431 317 120 1 0 99
3) In the third terminal window, login via telnet to the localhost as the user guest.
[3] # telnet localhost
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
SunOS 5.8
login: guest
Password:
Last login: Tue Jul 30 15:58:55 from localhost
Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001
$
4) As user guest, create a fork bomb by editing two scripts called "a" and "b". These
scripts do nothing but call each other and execute sleep processes. They will
continue in an infinite loop until the process table fills to capacity.
[3] $ vi a
./b &
sleep 20 &
[3] $ vi b
./a &
sleep 20 &
5) Make the scripts executable.
[3] $ chmod 555 a b
6) Execute the scripts. As soon as these scripts execute, look immediately at terminal
windows 1 and 2 and notice the drastic change.
[3] $ ./a &
7) If the system is still responsive, monitor the vmstat and sar output. Also, in terminal
window 4, issue a ps -ef command.
[4] # ps -ef
<>
guest 9723 9722 0 19:51:18 ?? 0:00 -sh
guest 9767 9766 0 19:51:18 ?? 0:00 -sh
guest 8427 1 0 0:00
guest 9709 9708 0 19:51:18 ?? 0:00 -sh
guest 9800 9774 0 19:51:18 ?? 0:00 -sh
8) As the system administrator, stop the CPU DOS by killing all of the user guest's
processes.
[4] # pkill -u guest
9) Search the user guest's home directory for any files created in the last day.
[4] # find /export/home/guest -mtime -1
/export/home/guest
/export/home/guest/a
/export/home/guest/b
Task IV - Preventing CPU DOS
The purpose of this task is to configure the /etc/system file on the server to limit
the number of processes a user can own.
1) As root on the server, open the kernel using the mdb command in read-only mode. The
mdb utility is used for core dump analysis and information gathering on a live kernel.
All of the features of mdb are covered in "ST-350 - System Fault Analysis". The following
mdb session displays the current value for the maximum number of user processes allowed
on the server.
# mdb -k
Loading modules: [ unix krtld genunix ip ufs_log nfs isp ipc random ptm logindmux ]
>maxuprc/D <-----Ask the kernel how many processes a user can own.
maxuprc:
maxuprc: 7909 <-----The kernel reports that a user can own 7909 process table slots
>max_nprocs/D <-----Ask the kernel what the total process table size can be.
max_nprocs:
max_nprocs: 7914 <-----The kernel reports that the maximum table size is 7914. This means
that a user can reserve almost the entire process table
$q <----- exit mdb
2) Since a regular user can consume almost the entire process table, set a kernel tuning parameter
in the /etc/system file to limit the maximum number of user processes.
# vi /etc/system
<>
set maxuprc=100
3) Reboot the workstation.
# init 6
4) After the workstation has rebooted, verify with mdb that the kernel tuning setting worked.
# mdb -k
Loading modules: [ unix krtld genunix ip nfs ipc ptm logindmux ]
> maxuprc/D
maxuprc:
maxuprc: 100
>
5) Open three terminal windows. In the first terminal window, use the sar command
to monitor the process table size.
[1] # sar -v 1 1000
SunOS gabriel 5.8 Generic_108528-13 sun4u 08/01/02
18:48:57 proc-sz ov inod-sz ov file-sz ov lock-sz
18:48:58 42/7914 0 1430/33952 0 264/264 0 0/0
6) In the second terminal window, use the su command to assume the identity of guest. Run
the fork bomb.
[2] # su - guest
[2] $ id
uid=1001(guest) gid=10(staff)
[2] $ ./a
7) Observe the output in terminal window #1. Did the process table continue to grow or
did it level off?
sar -v 1 1000
SunOS gabriel 5.8 Generic_108528-13 sun4u 08/01/02
18:48:57 proc-sz ov inod-sz ov file-sz ov lock-sz
18:48:58 42/7914 0 1430/33952 0 264/264 0 0/0
18:48:59 42/7914 0 1430/33952 0 264/264 0 0/0
18:49:00 42/7914 0 1430/33952 0 264/264 0 0/0
18:49:01 141/7914 0 1430/33952 0 461/461 0 0/0
18:49:02 141/7914 0 1430/33952 0 461/461 0 0/0
18:49:03 141/7914 0 1430/33952 0 461/461 0 0/0
8) The "guest" user was limited to 100 processes by tuning the kernel.