Archive for the ‘Unix’ Category

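I measured the memory bandwidth of a server using the popular STREAM benchmark. I compiled the STREAM code with STREAM_ARRAY_SIZE set to 500,000,000 (the "500MB" in the binary names below). Note that this is an element count, not a byte count: with STREAM's default 8-byte double elements, each of its three arrays takes about 4 GB, roughly 12 GB in total. The number of threads accessing memory was controlled by setting the environment variable OMP_NUM_THREADS to 1, 2, 5, 10, 20, 30, and 50.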

To compile STREAM, I used the following compile command:

# One binary per NTIMES value; -mcmodel=medium lets the >2 GB static arrays link.
for ntimes in 10 100 1000; do
  gcc -m64 -mcmodel=medium -O -fopenmp stream.c \
    -DSTREAM_ARRAY_SIZE=500000000 -DNTIMES=$ntimes \
    -o stream_multi_threaded_500MB_${ntimes}TIMES
done

# Resulting STREAM binaries
# stream_multi_threaded_500MB_10TIMES
# stream_multi_threaded_500MB_100TIMES
# stream_multi_threaded_500MB_1000TIMES
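STREAM_ARRAY_SIZE counts array elements, not bytes. stream.c allocates three static arrays (a, b, c) of STREAM_TYPE, which defaults to an 8-byte double, so a build needs roughly 3 x STREAM_ARRAY_SIZE x 8 bytes. The small bash helper below does that arithmetic. It is a sketch: the function name stream_footprint is hypothetical, and it assumes the default OFFSET of 0 and double elements unless you pass a different element size.

```bash
# Hypothetical helper: estimate the memory a STREAM build needs.
# Assumes three arrays (a, b, c), OFFSET=0, and 8-byte doubles by default.
stream_footprint() {
  local elements=$1
  local elem_bytes=${2:-8}              # sizeof(STREAM_TYPE); 8 for double
  local bytes=$(( 3 * elements * elem_bytes ))
  local mib=$(( bytes / 1024 / 1024 ))  # whole MiB, rounded down
  echo "$bytes bytes ($mib MiB)"
}
```

For example, stream_footprint 500000000 reports 12000000000 bytes (11444 MiB), about 11.2 GiB, which should agree with the total STREAM prints when it starts. The stream.c comments also ask that each array be at least four times the size of the available cache, which 4 GB per array easily satisfies.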

I built multiple versions of STREAM so I could see the effect of the iteration count (NTIMES), from 10 to 1000. Then I wrote a wrapper bash script that runs STREAM once per thread count and saves each run's output:

#!/bin/bash
#################################################
# STREAM harness to analyze memory bandwidth
#################################################
bench_home=/$USER/stream          # directory holding the STREAM binaries
out_home=$bench_home/out          # one output file per thread count
bench_exec=stream_multi_threaded_500MB_1000TIMES
host=$(hostname)

mkdir -p "$out_home"
echo "Running test: $bench_exec"

# Timer: run the given command and report its wall-clock time.
elapsed() {
   local start=$SECONDS
   "$@"
   local seconds=$(( SECONDS - start ))
   local etime_hours=$(( seconds / 3600 ))
   local etime_minutes=$(( seconds / 60 % 60 ))
   local etime_seconds=$(( seconds % 60 ))
   echo "Elapsed time: ${etime_hours}h ${etime_minutes}m ${etime_seconds}s"
}

# Run STREAM once for each thread count.
mem_stream() {
   for n in 1 2 5 10 20 30 50; do
      export OMP_NUM_THREADS=$n
      "$bench_home/$bench_exec" > "$out_home/$host.memory.$n.txt"
      echo "Thread count $n complete"
   done
}

# Main
elapsed mem_stream
exit 0
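Each run leaves a file such as $host.memory.10.txt in $out_home. STREAM's report has one line per kernel (Copy, Scale, Add, Triad) whose second field is the best rate in MB/s, so a short awk pass can line the runs up by thread count. This is a sketch: the function name summarize_stream is hypothetical, and it assumes the file naming and thread counts used by the harness script.

```bash
# Hypothetical helper: print the best Triad rate (MB/s) for each thread count.
# Assumes files named <host>.memory.<threads>.txt, as written by the harness.
summarize_stream() {
  local dir=$1 host=$2 f n
  printf '%-8s %s\n' "threads" "triad_MBps"
  for n in 1 2 5 10 20 30 50; do
    f="$dir/$host.memory.$n.txt"
    [ -f "$f" ] || continue             # skip thread counts that were not run
    # STREAM prints lines like "Triad:   <best MB/s>  <avg> <min> <max>"
    awk -v n="$n" '$1 == "Triad:" { printf "%-8s %s\n", n, $2 }' "$f"
  done
}
```

Run it as summarize_stream "$out_home" "$(hostname)" after the harness finishes. Triad is used here because it mixes reads and a write; swapping "Triad:" for "Copy:" gives the simplest kernel instead.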


Hope this writeup will help you get on the right path with STREAM.

While looking into a threading-related issue the other day, I used the following diagnostic commands.

Collecting paging activity information

To collect paging data, use the following command:

vmstat {time_between_samples_in_seconds} {number_of_samples} > vmstat.txt

For example:

vmstat 10 10 > vmstat.txt

If you start vmstat when the problem occurs, 10 samples at 10-second intervals usually capture enough data while the problem is happening. That run lasts 100 seconds, so collect the vmstat.txt file once it finishes. Keep in mind that the first line of output reports averages since boot, not the current interval.
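On Linux, paging to and from swap shows up in vmstat's si and so columns (memory swapped in from disk and out to disk, per second). A sketch that scans vmstat.txt for the peak values, assuming the procps vmstat header layout; vmstat_peak_paging is a hypothetical name. It locates the columns by name, so extra columns such as the timestamp added by -t do not shift them, and it skips the first since-boot sample.

```bash
# Hypothetical helper: report peak si/so (swap-in/swap-out) from a vmstat log.
# Assumes Linux procps vmstat output; the first sample (averages since boot)
# is skipped.
vmstat_peak_paging() {
  awk '
    # Column header line, e.g. " r  b   swpd   free ... si   so ..."
    $1 == "r" && $2 == "b" {
      for (i = 1; i <= NF; i++) {
        if ($i == "si") si = i
        if ($i == "so") so = i
      }
      next
    }
    # Data lines start with a number; skip the first (since-boot) sample.
    si && $1 ~ /^[0-9]+$/ && ++rows > 1 {
      if ($si + 0 > max_si) max_si = $si + 0
      if ($so + 0 > max_so) max_so = $so + 0
    }
    END {
      samples = (rows > 1) ? rows - 1 : 0
      printf "peak si=%d so=%d over %d samples\n", max_si, max_so, samples
    }
  ' "$1"
}
```

Running vmstat_peak_paging vmstat.txt prints one summary line; sustained non-zero so values during the problem window point to memory pressure rather than a purely CPU-bound issue.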

Collecting system CPU usage information

You can gather CPU usage information using the following command:

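top -b > top.txt

In batch mode, top keeps appending snapshots until you stop it with Ctrl+C. To make it exit on its own, set the delay with -d and the number of iterations with -n; for example, top -b -d 10 -n 10 > top.txt covers the same 100 seconds as the vmstat run. You can then collect the top.txt file.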

Collecting process CPU usage information

Gather process- and thread-level CPU activity at the point when the problem occurs, using the following command:

top -H -b -c -n 1 > top_threads.txt
cat top_threads.txt

top - 06:22:10 up 192 days, 19:00,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 542 total,   1 running, 541 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  164936028k total, 160272700k used,  4663328k free,        0k buffers
Swap:        0k total,        0k used,        0k free, 64188236k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24741 xxx      22   0 xxxg  xxxg  11m S  0.0 50.0   0:00.00 java

Or, to look at the threads of one specific process by PID, run:

top -H -b -c -p <pid> > top_threads_pid.txt

Let this command run for a short time, then stop it with Ctrl+C (or add -n with an iteration count so it exits by itself). The output is written to top_threads_pid.txt.
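With -H, the PID column in top lists individual thread IDs. If the busy process is a Java VM, as in the sample output, a HotSpot thread dump (for example from jstack) shows the same native thread ID in hexadecimal as nid=0x..., so converting the decimal ID lets you find the matching stack. A sketch with hypothetical function names, assuming a single-iteration snapshot (top -H -b -n 1) in top's default column layout, where %CPU is field 9 and COMMAND starts at field 12.

```bash
# Hypothetical helper: convert a decimal thread ID from `top -H` into the
# hex form used in HotSpot thread dumps (nid=0x...).
tid_to_nid() {
  printf '0x%x\n' "$1"
}

# Hypothetical helper: list the busiest threads in a `top -H -b -n 1` file.
# Rows after the "PID USER ..." header; field 9 is %CPU, field 12 the command.
busiest_threads() {
  local file=$1 count=${2:-5}
  awk '$1 == "PID" { body = 1; next } body && NF >= 12 { print $9, $1, $12 }' "$file" |
    sort -k1,1 -nr | head -n "$count" |
    while read -r cpu tid cmd; do
      printf '%s%% tid=%s nid=%s %s\n' "$cpu" "$tid" "$(tid_to_nid "$tid")" "$cmd"
    done
}
```

For instance, busiest_threads top_threads.txt 3 lists the three hottest threads with their hex IDs, which you can then search for in the thread dump (grep 'nid=0x...').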

Is a certain process running your CPU right into the ground? How do you find said process without picking your way through the ps aux results? With this command:

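ps -e -o pcpu,pid,cpu,nice,state,cputime,args --sort pcpu | sed '/^ 0.0 /d'

The sed expression drops processes using 0.0% CPU, and since --sort pcpu sorts in ascending order, the heaviest consumers end up at the bottom of the list. The pid column gives you the process ID, at which point you can kill it with sudo kill -9 <pid>.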
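kill -9 sends SIGKILL, which cannot be caught, so the process gets no chance to flush buffers or remove lock files. A gentler pattern is to send SIGTERM first and fall back to SIGKILL only if the process is still alive after a grace period. A sketch of that pattern; stop_pid is a hypothetical name and the 10-second default grace period is my assumption. Run it from a root shell, since sudo cannot run a shell function directly.

```bash
# Hypothetical helper: ask a process to exit, then force it if it lingers.
# Usage: stop_pid <pid> [grace_seconds]
stop_pid() {
  local pid=$1 grace=${2:-10} i
  kill -TERM "$pid" || return 1               # no such process, or no permission
  for (( i = 0; i < grace; i++ )); do
    kill -0 "$pid" 2>/dev/null || return 0    # exited after SIGTERM
    sleep 1
  done
  echo "PID $pid ignored SIGTERM for ${grace}s; sending SIGKILL" >&2
  kill -KILL "$pid"
}
```

kill -0 sends no signal at all; it only checks whether the process still exists, which is what makes the polling loop work.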

To get a list of all running processes, enter the command “ps auxw”. You might also want to try using “ps auxf” (or “ps auxfw” if the lines get truncated) – this prints everything in a nice tree format that may give you a better understanding of how and why things are running.

To get a complete listing of all listening network services using netstat, enter: netstat -altpu

You can also get similar information using lsof by entering: lsof -i | grep -Ei 'LISTEN|UDP'
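If you only want the program, PID, protocol, and address of each listening socket, the lsof output can be trimmed with awk. A sketch, assuming lsof's default column layout (NODE in field 8 holds TCP or UDP, NAME in field 9) and numeric output from -n (no host-name lookups) and -P (no port-name lookups); listening_sockets is a hypothetical name.

```bash
# Hypothetical helper: reduce `lsof -nP -i` output (read on stdin) to
# "command pid protocol address" for TCP listeners and UDP sockets.
listening_sockets() {
  awk 'NR > 1 && ($NF == "(LISTEN)" || $8 == "UDP") { print $1, $2, $8, $9 }' | sort -u
}
```

Use it as sudo lsof -nP -i | listening_sockets; running lsof as root lets it see sockets owned by other users. Matching on the (LISTEN) state field, rather than grepping the whole line, avoids false hits from command names that happen to contain "UDP".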

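To recover a lost root password on Solaris, boot from the Solaris Software CD 1 of 2 at the OpenBoot ok prompt, mount the root slice (c0t0d0s0 here; substitute your own boot disk), and edit the shadow file: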

ok boot cdrom -s
mkdir /tmp/a
mount /dev/dsk/c0t0d0s0 /tmp/a
cd /tmp/a/etc
TERM=vt100; export TERM
vi shadow

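In the line for root, delete the encrypted password: the second field, between the first and second colons (13 characters in the traditional crypt format), so that the line begins with root::. The shadow file is read-only, so save it with :wq!. Then cd /, unmount /tmp/a, reboot, and set a new root password with passwd.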
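If vi is awkward on the console, the same edit can be made with sed. A sketch, assuming the disk is still mounted at /tmp/a; clear_root_hash is a hypothetical name. It keeps a backup copy (shadow.orig, which still holds the old hash), blanks the second field of root's entry into a temporary file, and copies the result back with cat so the original file keeps its owner and permissions. It avoids bash-only features such as local, since the CD's miniroot shell may be a plain Bourne shell.

```bash
# Hypothetical helper: empty root's password field in a shadow-format file.
# Usage: clear_root_hash /tmp/a/etc/shadow
clear_root_hash() {
  shadow=$1
  cp -p "$shadow" "$shadow.orig" || return 1    # backup, keeps permissions
  sed 's/^root:[^:]*:/root::/' "$shadow.orig" > /tmp/shadow.$$ || return 1
  cat /tmp/shadow.$$ > "$shadow" && rm -f /tmp/shadow.$$
}
```

Afterwards, head -1 /tmp/a/etc/shadow should show root:: at the start of the line; remove shadow.orig once you have logged in and set a new password.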