Archive for the ‘Bash’ Category

I measured the memory bandwidth of a server using the popular STREAM benchmark tool. I compiled the STREAM code with the working array size set to 500 million elements (STREAM_ARRAY_SIZE=500000000). The number of threads accessing memory was controlled by setting the environment variable OMP_NUM_THREADS to 1, 2, 5, 10, 20, 30 and 50.

To compile STREAM, I used the following compile command:

gcc -m64 -mcmodel=medium -O -fopenmp stream.c \
-DSTREAM_ARRAY_SIZE=500000000 -DNTIMES=[10-1000] \
-o stream_multi_threaded_500MB_[10-1000]TIMES

# Compiled Stream packages
# stream_multi_threaded_500MB_10TIMES
# stream_multi_threaded_500MB_100TIMES
# stream_multi_threaded_500MB_1000TIMES
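
Before wiring the binaries into a harness, a quick sanity run of one build confirms everything works (the thread count here is arbitrary):

# Single sanity-check run with 2 threads
OMP_NUM_THREADS=2 ./stream_multi_threaded_500MB_10TIMES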

Above, I compiled multiple versions of STREAM so I can see the effect of iteration counts from 10 to 1000. Then I created a wrapper bash script to execute STREAM and collect its output:

#!/bin/bash
#################################################
# STREAM Harness to analyze memory bandwidth
#################################################
bench_home=$HOME/stream
out_home=$bench_home/out
mkdir -p "$out_home"   # make sure the output directory exists
bench_exec=stream_multi_threaded_500MB_1000TIMES
host=$(hostname)
echo "Running Test: $bench_exec"
# Timer: reports wall-clock time of the command passed in as arguments
elapsed() {
   (( start = SECONDS ))
   "$@"
   (( seconds = SECONDS - start ))
   (( etime_seconds = seconds % 60 ))
   (( etime_minutes = ( seconds - etime_seconds ) / 60 % 60 ))
   (( etime_hours   = seconds / 3600 ))
   echo "Elapsed time: ${etime_hours}h ${etime_minutes}m ${etime_seconds}s"
 }
mem_stream() {
 for n in 1 2 5 10 20 30 50
  do
   export OMP_NUM_THREADS=$n
   "$bench_home/$bench_exec" > "$out_home/$host.memory.$n.txt"
   echo "Thread count $OMP_NUM_THREADS complete"
  done
}
# Main
elapsed mem_stream
exit 0

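To compare runs at a glance, the short loop below (a minimal sketch, assuming the same $out_home and $host variables as the harness above) pulls the best Triad rate out of each result file:

# Summarize Triad bandwidth (MB/s) across thread counts
for f in $out_home/$host.memory.*.txt
do
   threads=$(echo "$f" | awk -F. '{print $(NF-1)}')
   triad=$(grep '^Triad:' "$f" | awk '{print $2}')
   echo "Threads: $threads   Triad best rate: $triad MB/s"
done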

Hope this write-up will help you get on the right path with STREAM.

Oftentimes a few lines of bash script can go a long way. I have been using the following few lines and variations of them for many years, and they saved me a lot of time last week while I was trying to address a task I needed to perform in high volume. All the script does is read comma-separated values from an input file and perform an operation on each record. I believe that when a task is repeated more than once, it needs to be automated. Besides, I have never been a typing wiz. In the example below, I have a simplified (skeleton) script that reads values from an input file and echoes them back; the input file is passed as an argument to the script.

#!/bin/bash
###################################
# Reads param1 and param2 from a file
# csv file format: dbhost,dbname
###################################
# Set variables
INPUT_FILE=$1
OLDIFS=$IFS
IFS=,

[ ! -f "$INPUT_FILE" ] && { echo "$INPUT_FILE file not found"; exit 1; }
while read -r dbhost dbname
do
  echo "DB Host : ${dbhost} DB Name : ${dbname}"
  # Do parametrized command here
  # Add logic for processing below....

  # Conclude your processing and add error handling as needed
  echo "Task complete for ${dbhost}"
done < "$INPUT_FILE"
IFS=$OLDIFS

exit 0

Paste the above into a script, set the file permission to be executable and have fun.

chmod +x readfile.sh
./readfile.sh input.csv
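
For reference, a hypothetical input.csv and the output the skeleton script produces from it:

# input.csv
dbhost01,salesdb
dbhost02,hrdb

# Output of ./readfile.sh input.csv
DB Host : dbhost01 DB Name : salesdb
Task complete for dbhost01
DB Host : dbhost02 DB Name : hrdb
Task complete for dbhost02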

If you are a Linux sysadmin (or sysadmin "wannabe"), a DBA, or just a software developer... or simply tired of typing and want to do things at scale, this simple script will save you a lot of time!

I wanted to create a simple yet flexible way to parse command-line arguments in bash. I used a case statement and parameter expansion to read arguments in a simple manner. I find this very handy, and I hope you will find it useful in solving or simplifying your tasks as well. Whether it is a serious script or a quick hack, clean programming makes your script more efficient and easier to understand.
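
The heavy lifting is done by the ${i#*=} parameter expansion, which strips everything up to and including the first "=". A quick illustration:

arg="--remote-dir=/data/input"
echo "${arg#*=}"    # prints: /data/input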

usage() {
      echo -e "No command-line argument\n"
      echo "Usage: $0 "
      echo "Arguments:"
      echo -e " --copy-from-hdfs\tcopy data set resides in HDFS"
      echo -e " --copy-to-s3\t\tcopy files to S3 in AWS"
      echo -e " --gzip\t\t\tcompress source files, recommended before sending data set to S3"
      echo -e " --remote-dir=\t\tpath to input directory (HDFS directory)"
      echo -e " --local-dir=\t\tlocal tmp directory (local directory)"
      echo -e " --s3-bucket-dir=\ts3 bucket directory in AWS"
      exit 1
}

# Check command line args
if [ -z "$1" ]
 then
  usage
 else
 # Parsing commandline args
 for i in "$@"
 do
  case $i in
  -r=*|--remote-dir=*)
      # DM_DATA_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'` works too, but the parameter expansion below is nicer and more compact
      DM_DATA_DIR=${i#*=}
      ;;
  -l=*|--local-dir=*)
      #AMAZON_DATA_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`
      AMAZON_DATA_DIR=${i#*=}
      ;;
  -s3=*|--s3-bucket-dir=*)
      #S3_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`
      S3_DIR=${i#*=}
      ;;
  --copy-from-hdfs)
      COPY_FROM_HDFS=YES
      ;;
  --copy-to-s3)
      COPY_TO_S3=YES
      ;;
  -c|--gzip)
      COMPRESS=YES
      ;;
  *)
      # Unknown option
      ;;
   esac
 done
fi
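
A hypothetical invocation (the script name and paths below are made up for illustration) would look like this:

# Gzip an HDFS data set and push it to S3
./hdfs_to_s3.sh --copy-from-hdfs --copy-to-s3 --gzip \
    --remote-dir=/data/warehouse/events \
    --local-dir=/tmp/staging \
    --s3-bucket-dir=s3://my-bucket/events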

Thoughts and suggestions are welcome!