Archive for the ‘AWS’ Category

s3fs is an open-source project that lets you mount S3 storage as a local file system, so you can work with the objects in a bucket as ordinary files. I use this method to mount S3 buckets on my EC2 instances. Below, I go through the installation steps and also document some of the problems and their workarounds.

Download the s3fs source code to your EC2 instance and decompress it:

[ec2-user@ip-10-xx-xx-xxx ~]$ wget http://s3fs.googlecode.com/files/s3fs-1.63.tar.gz
[ec2-user@ip-10-xx-xx-xxx ~]$ tar -xzf s3fs-1.63.tar.gz
--Make sure your libraries are installed/up-to-date
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo yum install gcc libstdc++-devel gcc-c++ fuse fuse-devel curl-devel libxml2-devel openssl-devel mailcap
[ec2-user@ip-10-xx-xx-xxx ~]$ cd s3fs-1.63
[ec2-user@ip-10-xx-xx-xxx ~]$ ./configure --prefix=/usr

At this point you might get the following error indicating that s3fs requires a newer version of Fuse (http://fuse.sourceforge.net/):

configure: error: Package requirements (fuse >= 2.8.4 libcurl >= 7.0 libxml-2.0 >= 2.6 libcrypto >= 0.9) were not met:
Requested 'fuse >= 2.8.4' but version of fuse is 2.8.3
Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

Alternatively, you may set the environment variables DEPS_CFLAGS
and DEPS_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details.

Follow the steps below to upgrade Fuse (downloads are posted at http://fuse.sourceforge.net/):

[ec2-user@ip-10-xx-xx-xxx ~]$ wget http://downloads.sourceforge.net/project/fuse/fuse-2.X/2.8.4/fuse-2.8.4.tar.gz
[ec2-user@ip-10-xx-xx-xxx ~]$ tar -xvf fuse-2.8.4.tar.gz
[ec2-user@ip-10-xx-xx-xxx ~]$ cd fuse-2.8.4
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo  yum -y install "gcc*" make libcurl-devel libxml2-devel openssl-devel
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo ./configure --prefix=/usr
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo make && sudo make install
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo ldconfig
--Verify that the new version is now in place
[ec2-user@ip-10-xx-xx-xxx ~]$ pkg-config --modversion fuse
2.8.4
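
If pkg-config still reports the old version at this point, it is most likely reading the distribution's original fuse.pc first. As the configure error above suggests, adjusting PKG_CONFIG_PATH so the new install prefix is searched first should resolve it; the paths below are assumptions based on the --prefix=/usr install and may differ on your system.

--Only needed if the old Fuse version still shows up
[ec2-user@ip-10-xx-xx-xxx ~]$ export PKG_CONFIG_PATH=/usr/lib/pkgconfig:/usr/lib64/pkgconfig:$PKG_CONFIG_PATH
[ec2-user@ip-10-xx-xx-xxx ~]$ pkg-config --modversion fuse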

With the newer Fuse in place, we can return to the s3fs build, finish the installation, and then add the AWS credentials in the following format: AWS Access Key:Secret Key
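
Finishing the s3fs build is the usual configure/make/install sequence; a minimal sketch, assuming the s3fs-1.63 directory from the download above:

[ec2-user@ip-10-xx-xx-xxx ~]$ cd s3fs-1.63
[ec2-user@ip-10-xx-xx-xxx ~]$ ./configure --prefix=/usr
[ec2-user@ip-10-xx-xx-xxx ~]$ make && sudo make install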

[ec2-user@ip-10-xx-xx-xxx ~]$ sudo vi /etc/passwd-s3fs
-- Set file permission
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo chmod 640 /etc/passwd-s3fs
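
The credentials file holds a single colon-separated line. The values below are the placeholder keys from the AWS documentation, not real credentials:

--Example contents of /etc/passwd-s3fs (placeholder keys only)
AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY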

Now you should be able to mount your AWS S3 bucket onto a local folder:

[ec2-user@ip-10-xx-xx-xxx ~]$ sudo s3fs  
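
The general form is s3fs <bucket> <mount point>. For example, with a placeholder bucket name and mount point (substitute your own):

--Bucket name and mount point below are placeholders
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo mkdir -p /mnt/s3bucket
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo s3fs mybucket /mnt/s3bucket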

That is about it and thanks for reading!

I wanted a simple yet flexible way to parse command-line arguments in bash. I used a case statement and bash parameter expansion to read the arguments in a simple manner. I find this very handy, and I hope you will find it useful in solving or simplifying your own tasks as well. Whether it is a serious script or a quick hack, clean programming makes your script more efficient and easier to understand.

usage() {
      echo -e "No command-line argument\n"
      echo "Usage: $0 "
      echo "Arguments:"
      echo -e " --copy-from-hdfs\tcopy data set resides in HDFS"
      echo -e " --copy-to-s3\t\tcopy files to S3 in AWS"
      echo -e " --gzip\t\t\tcompress source files, recommended before sending data set to S3"
      echo -e " --remote-dir=\t\tpath to input directory (HDFS directory)"
      echo -e " --local-dir=\t\tlocal tmp directory (local directory)"
      echo -e " --s3-bucket-dir=\ts3 bucket directory in AWS"
      exit 1
}

# Check command line args
if [ -z "$1" ]
 then
  usage
 else
 # Parsing command-line args
 for i in "$@"
 do
  case $i in
  -r=*|--remote-dir=*)
      #DM_DATA_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`  --> this works, but the parameter expansion below is a nicer and more compact way
      DM_DATA_DIR=${i#*=}
      ;;
  -l=*|--local-dir=*)
      #AMAZON_DATA_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`
      AMAZON_DATA_DIR=${i#*=}
      ;;
  -s3=*|--s3-bucket-dir=*)
      #S3_DIR=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`
      S3_DIR=${i#*=}
      ;;
  --copy-from-hdfs)
      COPY_FROM_HDFS=YES
      ;;
  --copy-to-s3)
      COPY_TO_S3=YES
      ;;
  -c|--gzip)
      COMPRESS=YES
      ;;
  *)
      # Unknown option
      ;;
   esac
 done
fi
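
After the parsing loop, the script can act on whatever was collected. The lines below are an illustrative sketch only; the script name and paths in the example invocation are placeholders, not part of the original script:

# Illustrative only: show what was parsed before acting on it
echo "remote-dir:     ${DM_DATA_DIR:-unset}"
echo "local-dir:      ${AMAZON_DATA_DIR:-unset}"
echo "s3-bucket-dir:  ${S3_DIR:-unset}"
echo "copy-from-hdfs=${COPY_FROM_HDFS:-NO} copy-to-s3=${COPY_TO_S3:-NO} gzip=${COMPRESS:-NO}"

# Hypothetical invocation (placeholder script name and paths):
#   ./hdfs_to_s3.sh --copy-from-hdfs --gzip --remote-dir=/user/hadoop/input \
#                   --local-dir=/tmp/staging --s3-bucket-dir=my-bucket/input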

Thoughts and suggestions are welcome!