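I measured the memory bandwidth of a server using the popular STREAM benchmark. I compiled the STREAM code with the working array size set to 500 million elements (STREAM_ARRAY_SIZE=500000000). With STREAM's default 8-byte double elements, that is roughly 4 GB per array, although the binary names below keep the "500MB" label. The number of threads accessing memory was controlled by setting the environment variable OMP_NUM_THREADS to 1, 2, 5, 10, 20, 30, and 50.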
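Because STREAM_ARRAY_SIZE counts elements rather than bytes, it helps to work out the memory footprint before running. The sketch below does the arithmetic, assuming 8-byte elements (STREAM's default double type) and the three arrays STREAM allocates. The variable names are only illustrative.

```shell
# Rough memory footprint of a STREAM build (illustrative variable names).
array_elems=500000000   # value passed as -DSTREAM_ARRAY_SIZE
bytes_per_elem=8        # assumption: default STREAM_TYPE of double, 8 bytes
num_arrays=3            # STREAM works on three arrays: a, b and c

per_array_bytes=$(( array_elems * bytes_per_elem ))
total_bytes=$(( per_array_bytes * num_arrays ))

echo "Per array: $(( per_array_bytes / 1000000 )) MB"
echo "Total:     $(( total_bytes / 1000000 )) MB"
```

On a 64-bit shell this reports 4000 MB per array and 12000 MB in total, so the server needs comfortably more than 12 GB of free memory for this build.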

To compile STREAM, I used the following command, substituting 10, 100, or 1000 for the [10-1000] placeholder:

gcc -m64 -mcmodel=medium -O -fopenmp stream.c \
-DSTREAM_ARRAY_SIZE=500000000 -DNTIMES=[10-1000] \
-o stream_multi_threaded_500MB_[10-1000]TIMES

# Compiled STREAM binaries
# stream_multi_threaded_500MB_10TIMES
# stream_multi_threaded_500MB_100TIMES
# stream_multi_threaded_500MB_1000TIMES
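The three builds can also be produced in one loop. This is only a sketch: it reuses the flags from the command above, and the DRY_RUN switch and the build_stream function name are my own additions, not part of STREAM. By default it just prints the gcc commands, so you can check them before compiling.

```shell
#!/bin/bash
# Build one STREAM binary per NTIMES value.
# DRY_RUN is a hypothetical switch: 1 (default) prints the commands,
# 0 actually runs gcc (stream.c must be in the current directory).
dry_run=${DRY_RUN:-1}

build_stream() {
  ntimes=$1
  out="stream_multi_threaded_500MB_${ntimes}TIMES"
  # Load the full gcc command into the function's positional parameters.
  set -- gcc -m64 -mcmodel=medium -O -fopenmp stream.c \
         -DSTREAM_ARRAY_SIZE=500000000 "-DNTIMES=${ntimes}" -o "$out"
  if [ "$dry_run" = "1" ]; then
    echo "$*"      # print the command only
  else
    "$@"           # run the compiler
  fi
}

for ntimes in 10 100 1000; do
  build_stream "$ntimes"
done
```

Run it once as is to inspect the commands, then again with DRY_RUN=0 to compile.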

Above, I compiled multiple versions of STREAM so I could see the effect of iteration counts from 10 to 1000. Then I created a wrapper bash script to execute STREAM and collect its output:

#!/bin/bash
#################################################
# STREAM Harness to analyze memory bandwidth
#################################################
bench_home=/$USER/stream
out_home=$bench_home/out
bench_exec=stream_multi_threaded_500MB_1000TIMES
host=$(hostname)
# Make sure the output directory exists before redirecting into it
mkdir -p "$out_home"
echo "Running Test: $bench_exec"
# Timer
elapsed() {
   # Run the given command and report how long it took
   (( seconds = SECONDS ))
   "$@"
   (( seconds = SECONDS - seconds ))
   (( etime_seconds = seconds % 60 ))
   (( etime_minutes = ( seconds - etime_seconds ) / 60 % 60 ))
   (( etime_hours   = seconds / 3600 ))
   echo "Elapsed time: ${etime_hours}h ${etime_minutes}m ${etime_seconds}s"
}
mem_stream() {
  # One STREAM run per thread count, each written to its own output file
  for n in 1 2 5 10 20 30 50
  do
    export OMP_NUM_THREADS=$n
    "$bench_home/$bench_exec" > "$out_home/$host.memory.$n.txt"
    echo "Run with $OMP_NUM_THREADS thread(s) complete"
  done
}
# Main
elapsed mem_stream
exit 0
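Once the harness has finished, the per-thread output files can be condensed into a small table. The sketch below assumes the standard STREAM result lines, where each kernel row starts with its name ("Copy:", "Scale:", "Add:", "Triad:") followed by the best rate in MB/s. The summarize_kernel function is a hypothetical helper, not part of STREAM or of the harness above.

```shell
# Print "<threads> <best rate MB/s>" for one kernel from each output file.
# Assumes result rows of the form: "Add:  <best rate> <avg> <min> <max>".
summarize_kernel() {
  kernel=$1   # Copy, Scale, Add or Triad
  dir=$2      # directory holding <host>.memory.<threads>.txt
  host=$3     # host name used in the file names
  for n in 1 2 5 10 20 30 50; do
    f="$dir/$host.memory.$n.txt"
    [ -f "$f" ] || continue          # skip thread counts that were not run
    awk -v k="$kernel:" -v n="$n" '$1 == k { print n, $2 }' "$f"
  done
}
```

For the harness above, a call such as summarize_kernel Add "$out_home" "$(hostname)" would list the Add bandwidth for each thread count.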

Sample result – ADD: [figure not available]

I hope this write-up helps you get on the right path with STREAM.