Archive for the ‘regex’ Category

I wanted to create a simple yet flexible way to parse command line arguments in bash. I used case statement, and some expression expansion technique to read arguments in a simple manner. I find this very handy, and hoping you will find it useful in solving or simplifying your task as well. Whether it is a serious script or a quick hack, clean programming makes your script more efficient and also easier to understand.

usage() {
      echo -e "No command-line argument\n"
      echo "Usage: $0 <command line arguments>"
      echo "Arguments:"
      echo -e " --copy-from-hdfs\tcopy data set resides in HDFS"
      echo -e " --copy-to-s3\t\tcopy files to S3 in AWS"
      echo -e " --gzip\t\t\tcompress source files, recommended before sending data set to S3"
      echo -e " --remote-dir=\t\tpath to input directory (HDFS directory)"
      echo -e " --local-dir=\t\tlocal tmp directory (local directory)"
      echo -e " --s3-bucket-dir=\ts3 bucket directory in AWS"
      exit 1
}

# Check command line args
if [ -z $1 ]
 then
  usage
 else
 # Parsing commandline args
 for i in $*
 do
  case $i in
  -r=*|--remote-dir=*)
      #DM_DATA_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`  -- > this work but using expression expansion below is a much nicer and compact way 
      DM_DATA_DIR=${i#*=}
      ;;
  -l=*|--local-dir=*)
      #AMAZON_DATA_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`
      AMAZON_DATA_DIR=${i#*=}
      ;;
  -s3=*|--s3-bucket-dir=*)
      #S3_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`
      S3_DIR=${i#*=}
      ;;
  --copy-from-hdfs)
      COPY_FROM_HDFS=YES
      ;;
  --copy-to-s3)
      COPY_TO_S3=YES
      ;;
  -c|--gzip)
      COMPRESS=YES
      ;;
           *)
      # Unknown option
      ;;
   esac
 done

Thoughts, and suggestions are welcome!

<TAG\b[^>]*>(.*?)</TAG> matches the opening and closing pair of a specific HTML tag. Anything between the tags is captured into the first backreference. The question mark in the regex makes the star lazy, to make sure it stops before the first closing tag rather than before the last, like a greedy star would do.