Archive for the ‘Linux’ Category

I measured the memory bandwidth of a server using the popular STREAM benchmark tool. I compiled the STREAM code with the working array size set to 500 MB. The number of threads accessing memory was controlled by setting the environment variable OMP_NUM_THREADS to 1, 2, 5, 10, 20, 30 and 50.

To compile STREAM, I used the following compile command:

gcc -m64 -mcmodel=medium -O -fopenmp stream.c \
-DSTREAM_ARRAY_SIZE=500000000 -DNTIMES=[10-1000] \
-o stream_multi_threaded_500MB_[10-1000]TIMES

# Compiled Stream packages
# stream_multi_threaded_500MB_10TIMES
# stream_multi_threaded_500MB_100TIMES
# stream_multi_threaded_500MB_1000TIMES

Above, I compiled multiple versions of STREAM so I could see the effect of varying the iteration count from 10 to 1000. I then created a wrapper bash script to execute STREAM and collect its output:

#!/bin/bash
#################################################
# STREAM Harness to analyze memory bandwidth
#################################################
bench_home=/$USER/stream
out_home=$bench_home/out
bench_exec=stream_multi_threaded_500MB_1000TIMES
host=`hostname`
echo "Running Test: $bench_exec"
# Timer
elapsed() {
   (( seconds  = SECONDS ))
   "$@"
   (( seconds = SECONDS - seconds ))
   (( etime_seconds = seconds % 60 ))
   (( etime_minutes = ( seconds - etime_seconds ) / 60 % 60 ))
   (( etime_hours   = seconds / 3600 ))
   echo "Elapsed time: ${etime_hours}h ${etime_minutes}m ${etime_seconds}s"
 }
mem_stream() {
 for n in 1 2 5 10 20 30 50
  do
   export OMP_NUM_THREADS=$n
   $bench_home/$bench_exec  > $out_home/$host.memory.$n.txt
   echo "Thread $OMP_NUM_THREADS complete"
  done
}
# Main
elapsed mem_stream
exit 0
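To pull the bandwidth numbers out of the per-thread output files the harness leaves in $out_home, a loop along these lines does the trick (a sketch that assumes the standard STREAM result lines for Copy, Scale, Add and Triad):

# Summarize the best-rate lines (MB/s) for each thread-count run
for f in $out_home/$host.memory.*.txt
do
  echo "== $f =="
  grep -E "^(Copy|Scale|Add|Triad):" "$f"
done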


I hope this write-up helps you get on the right path with STREAM.

Often a few lines of bash can go a long way. I have been using the following lines, and variations of them, for many years, and they saved me a lot of time last week while I was working through a task I needed to perform in high volume. The script simply reads comma-separated values from an input file and performs an operation for each record. I believe any task that is repeated more than once should be automated, and besides, I have never been a typing wiz. The example below is a simplified skeleton that reads values from an input file, passed as an argument to the script, and echoes them.

#!/bin/bash
###################################
# Reads two parameters per line from a CSV file
# CSV format: dbhost,dbname
###################################
# Set variables
INPUT_FILE=$1
OLDIFS=$IFS
IFS=,

[ ! -f "$INPUT_FILE" ] && { echo "$INPUT_FILE file not found"; exit 1; }
while read -r dbhost dbname
do
  echo "DB Host : ${dbhost} DB Name : ${dbname}"
  # Do parametrized command here
  # Add logic for processing below....

  # Conclude your processing and add error handling as needed
  echo "Task complete for ${dbhost}"
done < "$INPUT_FILE"
IFS=$OLDIFS

exit 0

Paste the above into a script, set the file permission to be executable and have fun.

chmod +x readfile.sh
./readfile.sh input.csv
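For illustration, a hypothetical input.csv for this skeleton would hold one comma-separated record per line, matching the two fields the read statement expects (the host and database names below are made up):

dbhost01.example.com,salesdb
dbhost02.example.com,hrdb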

If you are a Linux sysadmin (or sysadmin "wannabe"), a DBA, or a software developer, or you are just tired of typing and want to do things at scale, this simple script will save you a lot of time!

While looking at a threading-related issue the other day, I used the following commands for diagnostics.

Collecting paging activity information

To collect paging data, use the following command:

vmstat {time_between_samples_in_seconds} {number_of_samples} \
> vmstat.txt
vmstat 10 10 > vmstat.txt

If you start vmstat when the problem occurs, a value of 10 for time_between_samples_in_seconds and 10 for number_of_samples usually ensures that enough data is collected while the problem is happening. Collect the vmstat.txt file after roughly 100 seconds (10 samples x 10 seconds).
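If the problem window is hard to predict, one option is to kick the collection off in the background with timestamps so each sample can be matched to the incident afterwards (this assumes a procps-ng vmstat, which supports the -t timestamp flag):

# -t appends a timestamp column to every sample
vmstat -t 10 10 > vmstat.txt &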

Collecting system CPU usage information

You can gather CPU usage information using the following command:

top -b > top.txt

You can then collect the top.txt file.

Collecting process CPU usage information

Gather process and thread-level CPU activity information at the point at which the problem occurs, using the following command:

top -H -b -c > top_threads.txt
cat top_threads.txt

top - 06:22:10 up 192 days, 19:00,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 542 total,   1 running, 541 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  164936028k total, 160272700k used,  4663328k free,        0k buffers
Swap:        0k total,        0k used,        0k free, 64188236k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24741 xxx      22   0 xxxg  xxxg  11m S  0.0 50.0   0:00.00 java

or, if you would like to look at a specific process by its PID, issue:

top -H -b -c -p <pid> > top_threads_pid.txt

Allow this command to run for a short time. It produces a file called top_threads_pid.txt.
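If you do not know the PID up front, a pgrep lookup can be dropped straight into the command, and a bounded iteration count lets batch mode exit on its own (the java pattern below is only an example):

# Sample 5 iterations of per-thread CPU usage for the first matching java process
top -H -b -c -n 5 -p $(pgrep -f java | head -1) > top_threads_pid.txt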

s3fs is an open-source project that lets you mount your S3 storage locally, giving you access to your files at the file-system level so that you can actually work with them. I use this method to mount S3 buckets on my EC2 instances. Below, I go through the installation steps and also document some of the problems and their workarounds.

Download the s3fs source code from Google Code to your EC2 instance and decompress it:

[ec2-user@ip-10-xx-xx-xxx ~]$ wget http://s3fs.googlecode.com/files/s3fs-1.63.tar.gz
[ec2-user@ip-10-xx-xx-xxx ~]$ tar -xzf s3fs-1.63.tar.gz
--Make sure your libraries are installed/up-to-date
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo yum install gcc libstdc++-devel gcc-c++ fuse fuse-devel curl-devel libxml2-devel openssl-devel mailcap
[ec2-user@ip-10-xx-xx-xxx ~]$ cd s3fs-1.63
[ec2-user@ip-10-xx-xx-xxx ~]$ ./configure --prefix=/usr

At this point you might get the following error, indicating that s3fs requires a newer version of FUSE (http://fuse.sourceforge.net/).

configure: error: Package requirements (fuse >= 2.8.4 libcurl >= 7.0 libxml-2.0 >= 2.6 libcrypto >= 0.9) were not met:
Requested 'fuse >= 2.8.4' but version of fuse is 2.8.3
Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

Alternatively, you may set the environment variables DEPS_CFLAGS
and DEPS_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details.

Follow these steps to upgrade FUSE (downloads are posted at http://fuse.sourceforge.net/):

[ec2-user@ip-10-xx-xx-xxx ~]$ wget http://downloads.sourceforge.net/project/fuse/fuse-2.X/2.8.4/fuse-2.8.4.tar.gz
[ec2-user@ip-10-xx-xx-xxx ~]$ tar -xvf fuse-2.8.4.tar.gz
[ec2-user@ip-10-xx-xx-xxx ~]$ cd fuse-2.8.4
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo  yum -y install "gcc*" make libcurl-devel libxml2-devel openssl-devel
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo ./configure --prefix=/usr
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo make && sudo make install
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo ldconfig
--Verify that the new version is now in place
[ec2-user@ip-10-xx-xx-xxx ~]$ pkg-config --modversion fuse
2.8.3
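If pkg-config still reports the old version at this point, as in the output above, the freshly installed fuse.pc is most likely not on pkg-config's search path. As the configure error itself suggests, pointing PKG_CONFIG_PATH at the right directory is the usual workaround; the path below assumes the --prefix=/usr build placed fuse.pc under /usr/lib/pkgconfig:

[ec2-user@ip-10-xx-xx-xxx ~]$ export PKG_CONFIG_PATH=/usr/lib/pkgconfig:$PKG_CONFIG_PATH
[ec2-user@ip-10-xx-xx-xxx ~]$ pkg-config --modversion fuse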

Now we can return to the s3fs build (re-run ./configure --prefix=/usr, then make && sudo make install as before) and add the AWS credentials in the following format: Access Key ID:Secret Access Key

[ec2-user@ip-10-xx-xx-xxx ~]$ sudo vi /etc/passwd-s3fs
-- Set file permission
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo chmod 640 /etc/passwd-s3fs
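If you prefer a one-liner to vi, the same file can be written directly; the values below are placeholders for your own keys:

[ec2-user@ip-10-xx-xx-xxx ~]$ echo "YOUR_ACCESS_KEY_ID:YOUR_SECRET_ACCESS_KEY" | sudo tee /etc/passwd-s3fs
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo chmod 640 /etc/passwd-s3fs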

Now you should be able to mount your AWS S3 bucket onto a local folder like so:

[ec2-user@ip-10-xx-xx-xxx ~]$ sudo s3fs <your-bucket-name> <local-mount-point>
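To confirm the mount took, df should now list the bucket, and fusermount detaches it again when you are done (the mount point below is a placeholder):

[ec2-user@ip-10-xx-xx-xxx ~]$ df -h <local-mount-point>
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo fusermount -u <local-mount-point>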

That is about it and thanks for reading!

I wanted a simple yet flexible way to parse command-line arguments in bash. I used a case statement and shell parameter expansion to read the arguments in a simple manner. I find this very handy, and I hope you will find it useful in solving or simplifying your tasks as well. Whether it is a serious script or a quick hack, clean code makes your script more efficient and easier to understand.

usage() {
      echo -e "No command-line argument\n"
      echo "Usage: $0 "
      echo "Arguments:"
      echo -e " --copy-from-hdfs\tcopy data set resides in HDFS"
      echo -e " --copy-to-s3\t\tcopy files to S3 in AWS"
      echo -e " --gzip\t\t\tcompress source files, recommended before sending data set to S3"
      echo -e " --remote-dir=\t\tpath to input directory (HDFS directory)"
      echo -e " --local-dir=\t\tlocal tmp directory (local directory)"
      echo -e " --s3-bucket-dir=\ts3 bucket directory in AWS"
      exit 1
}

# Check command line args
if [ -z "$1" ]
 then
  usage
 else
 # Parsing command-line args
 for i in "$@"
 do
  case $i in
  -r=*|--remote-dir=*)
      #DM_DATA_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`  # this works too, but the parameter expansion below is nicer and more compact
      DM_DATA_DIR=${i#*=}
      ;;
  -l=*|--local-dir=*)
      #AMAZON_DATA_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`
      AMAZON_DATA_DIR=${i#*=}
      ;;
  -s3=*|--s3-bucket-dir=*)
      #S3_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`
      S3_DIR=${i#*=}
      ;;
  --copy-from-hdfs)
      COPY_FROM_HDFS=YES
      ;;
  --copy-to-s3)
      COPY_TO_S3=YES
      ;;
  -c|--gzip)
      COMPRESS=YES
      ;;
  *)
      # Unknown option
      ;;
   esac
 done
fi
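A hypothetical invocation of a script built around this parser would look something like the following (the script name and paths are made up for illustration):

./hdfs_to_s3.sh --copy-from-hdfs --gzip \
  --remote-dir=/user/hadoop/dataset \
  --local-dir=/tmp/staging \
  --s3-bucket-dir=s3://my-bucket/dataset \
  --copy-to-s3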

Thoughts and suggestions are welcome!

1) start import of data.sql into a dummy db when both instances are running
2) pt-stalk --collect --collect-oprofile --no-stalk for the duration of the import

  • oprofile will show where MySQL spends most of its time during the import

3) run pt-diskstats -g all --devices-regex sdb1 for the duration of the import
4) run the poor man's profiler for the duration of the import (a rough sketch follows below)
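For step 4, the classic poor man's profiler is essentially a gdb stack-sampling one-liner. A minimal sketch, assuming mysqld is the target and gdb is installed, looks like this:

#!/bin/bash
# Poor man's profiler: sample all mysqld thread stacks and aggregate identical ones
pid=$(pidof mysqld)
gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p "$pid" 2>/dev/null | \
awk '
  BEGIN { s = "" }
  /^Thread/ { if (s != "") print s; s = "" }
  /^#/ { if (s != "") { s = s "," $4 } else { s = $4 } }
  END { if (s != "") print s }' | \
sort | uniq -c | sort -rn -k1,1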

How do I check RAM speed and type (like DDR or DDR2) without opening my computer? I need to purchase RAM and I need to know the exact speed and type installed. How do I find out RAM information from a shell prompt?

$ sudo dmidecode --type 17

dmidecode is a tool for dumping a computer's DMI (some say SMBIOS) table contents in a human-readable format. This table contains a description of the system's hardware components, as well as other useful pieces of information such as serial numbers and the BIOS revision. Thanks to this table, you can retrieve the information without having to probe the actual hardware. While this is a good thing in terms of report speed and safety, it also means the presented information may be unreliable.
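Since type 17 covers every memory device, a quick grep narrows the output down to the fields the question actually asks about; this assumes the usual dmidecode field labels:

$ sudo dmidecode --type 17 | grep -E "Size|Type:|Speed"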

Is a certain process running your CPU right into the ground? How do you find said process without picking your way through the ps aux results? With this command:

ps -e -o pcpu,cpu,nice,state,cputime,args --sort pcpu | sed '/^ 0.0 /d'

…at which point you can kill it with sudo kill -9.
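A variation on the same idea, if you would rather see the busiest processes first instead of filtering out the idle ones, is to sort descending on CPU and take the top of the list:

ps -e -o pid,pcpu,cputime,args --sort=-pcpu | head -n 10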

# Start mysqld via mysqld_safe with an explicit basedir and ledir
./mysqld_safe --user=mysql --basedir=/usr/local/mysql-5.0.67-linux-x86_64-icc-glibc23 \
  --ledir=/usr/local/mysql-5.0.67-linux-x86_64-icc-glibc23/bin --mysqld=mysqld
# Watch extended status counters, refreshed every 60 seconds as relative values
./mysqladmin ext -u root -p -ri60
# Same, but only the temporary-table related counters
./mysqladmin ext -u root -p -ri60 | grep tmp

Procedure to add a swap file

You need to use the dd command to create the swap file. Next, use the mkswap command to set up a Linux swap area on a device or in a file.

a) Log in as the root user

b) Type the following command to create a 512 MB swap file (512 MB x 1024 = 524288 one-kilobyte blocks):

# dd if=/dev/zero of=/swapfile1 bs=1024 count=524288

c) Set up a Linux swap area:

# mkswap /swapfile1

d) Activate /swapfile1 swap space immediately:

# swapon /swapfile1

e) To activate /swapfile1 after a Linux system reboot, add an entry to the /etc/fstab file. Open this file using a text editor such as vi:

# vi /etc/fstab

Append the following line:

/swapfile1 swap swap defaults 0 0

The next time Linux comes up after a reboot, it will enable the new swap file for you automatically.

f) How do I verify whether swap is activated?

Simply use the free command:

$ free -m
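swapon and /proc/meminfo give the same confirmation if you want to double-check that /swapfile1 is listed:

$ swapon -s
$ grep -i swap /proc/meminfo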