Prometheus provides great short-term metrics storage for the Grafana stack; however, it has its scalability limits. Using remoteWrite to transition to a new architecture while preserving the existing flow of metrics is a great option for some teams.

When enabled, remote write increases the memory footprint of Prometheus. On average the increase is about 25%, but that number depends on the shape of the data. This is because each remote write destination starts a queue that reads from the write-ahead log (WAL) and writes the samples into an in-memory queue owned by a shard, which then sends a request to the configured endpoint.

       |--> queue (shard 1) ---> remote end-point
WAL ---|--> queue (shard 2) ---> remote end-point
       |--> queue (shard n) ---> remote end-point

The example below uses Helm values and demonstrates how to enable remoteWrite.

  remoteWrite:
    - url: https://<target end-point e.g. mimir>/api/v1/push
      basic_auth:
        username: <user>
        password: <password>
      tls_config:
        insecure_skip_verify: false
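
Once remote write is enabled, it is worth watching the queue and shard metrics Prometheus exposes about itself, since a growing backlog usually means the endpoint cannot keep up. A minimal check, assuming Prometheus is reachable on localhost:9090 (adjust host and port for your setup):

# Dump all remote-storage self-metrics (shards, pending/failed samples, ...)
curl -s http://localhost:9090/metrics | grep '^prometheus_remote_storage_'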

It is through improving our ability to deliver software that organizations can deliver features faster, pivot when needed, respond to compliance and security changes, and take advantage of fast feedback to attract new customers and delight existing ones.

Forsgren PhD, Nicole. Accelerate (p. 31).

I measured the memory bandwidth of a server using the popular STREAM benchmark tool. I compiled the STREAM code with the working array size (STREAM_ARRAY_SIZE) set to 500,000,000 elements. The number of threads accessing memory was controlled by setting the environment variable OMP_NUM_THREADS to 1, 2, 5, 10, 20, 30, and 50.

To compile STREAM, I used the following compile command:

gcc -m64 -mcmodel=medium -O -fopenmp stream.c \
-DSTREAM_ARRAY_SIZE=500000000 -DNTIMES=[10-1000] \
-o stream_multi_threaded_500MB_[10-1000]TIMES

# Compiled Stream packages
# stream_multi_threaded_500MB_10TIMES
# stream_multi_threaded_500MB_100TIMES
# stream_multi_threaded_500MB_1000TIMES
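
If you would rather not run the compile command by hand three times, a small loop along these lines builds all of the variants listed above (same flags, with NTIMES substituted per iteration):

#!/bin/bash
# Build the STREAM variants for 10, 100 and 1000 iterations
for n in 10 100 1000
do
  gcc -m64 -mcmodel=medium -O -fopenmp stream.c \
      -DSTREAM_ARRAY_SIZE=500000000 -DNTIMES=$n \
      -o stream_multi_threaded_500MB_${n}TIMES
done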

Above, I compiled multiple versions of STREAM so I could see the effect of varying the iteration count from 10 to 1000. Then I created a wrapper bash script to execute STREAM and collect its output:

#!/bin/bash
#################################################
# STREAM Harness to analyze memory bandwidth
#################################################
bench_home=/$USER/stream
out_home=$bench_home/out
bench_exec=stream_multi_threaded_500MB_1000TIMES
host=`hostname`
# Make sure the output directory exists before redirecting into it
mkdir -p "$out_home"
echo "Running Test: $bench_exec"
# Timer
elapsed() {
   (( seconds  = SECONDS ))
   "$@"
   (( seconds = SECONDS - seconds ))
   (( etime_seconds = seconds % 60 ))
   (( etime_minutes = ( seconds - etime_seconds ) / 60 % 60 ))
   (( etime_hours   = seconds / 3600 ))
   # Sanity check of the decomposition (not used elsewhere)
   (( verif = etime_seconds + (etime_minutes * 60) + (etime_hours * 3600) ))
   echo "Elapsed time: ${etime_hours}h ${etime_minutes}m ${etime_seconds}s"
 }
mem_stream() {
 for n in 1 2 5 10 20 30 50
  do
   export OMP_NUM_THREADS=$n
   $bench_home/$bench_exec  > $out_home/$host.memory.$n.txt
   echo "Thread $OMP_NUM_THREADS complete"
  done
}
# Main
elapsed mem_stream
exit 0
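
The interesting number in each output file is the best rate reported for the Triad kernel. A small sketch to pull those out across thread counts, assuming the file layout written by the harness above (the second column of the Triad line is the rate in MB/s):

#!/bin/bash
# Summarize the Triad bandwidth reported by STREAM for each thread count
out_home=/$USER/stream/out     # same path as in the harness
host=$(hostname)
for n in 1 2 5 10 20 30 50
do
  rate=$(grep '^Triad:' "$out_home/$host.memory.$n.txt" | awk '{print $2}')
  echo "Threads: $n  Triad best rate: $rate MB/s"
done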


I hope this write-up helps you get on the right path with STREAM.

Oftentimes a few lines of bash script can go a long way. I have been using the following few lines, and variations of them, for many years, and they saved me a lot of time last week while I was working through a task I needed to perform at high volume. All the script does is read comma-separated values from an input file and perform an operation for each record. I believe that any task repeated more than once should be automated; besides, I have never been a typing wiz. The example below is a simplified skeleton that reads values from an input file, passed as an argument to the script, and simply echoes them back.

#!/bin/bash
###################################
# Takes param1 and param2 from file
# csv file: one dbhost,dbname pair per line
###################################
# Set variables
INPUT_FILE=$1

[ ! -f "$INPUT_FILE" ] && { echo "$INPUT_FILE file not found"; exit 1; }
while IFS=, read -r dbhost dbname
do
  echo "DB Host : ${dbhost} DB Name : ${dbname}"
  # Do parametrized command here
  # Add logic for processing below....

  # Conclude your processing and add error handling as needed
  echo "Task complete for ${dbhost}"
done < "$INPUT_FILE"

exit 0

Paste the above into a script, make it executable, and have fun.

chmod +x readfile.sh
./readfile.sh input.csv
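
For reference, the input file is just one comma-separated record per line; the hosts and database names below are made up purely for illustration:

# Create a sample input file (hypothetical values)
cat > input.csv <<'EOF'
db01.example.com,orders
db02.example.com,inventory
EOF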

If you are a Linux sysadmin or sysadmin "wannabe", a DBA, or just a software developer... or simply tired of typing and want to do things at scale, this simple script will save you a lot of time!

While looking at a threading-related issue the other day, I used the following commands for diagnostics.

Collecting paging activity information

To collect paging data, use the following command:

vmstat {time_between_samples_in_seconds} {number_of_samples} \
> vmstat.txt
vmstat 10 10 > vmstat.txt

If you start vmstat when the problem occurs, a value of 10 for time_between_samples_in_seconds and 10 for number_of_samples usually ensures that enough data is collected during the problem. Collect the vmstat.txt file once the run completes, roughly 100 seconds later (10 samples at 10-second intervals).

Collecting system CPU usage information

You can gather CPU usage information using the following command:

top -b > top.txt

Let it run for a short while, stop it with Ctrl+C, and then collect the top.txt file.

Collecting process CPU usage information
Gather process and thread-level CPU activity information at the point at which the problem occurs, using the following command:

top -H -b -c > top_threads.txt
cat top_threads.txt

top - 06:22:10 up 192 days, 19:00,  1 user,  load average: 0.00, 0.00, 0.00
Tasks: 542 total,   1 running, 541 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  164936028k total, 160272700k used,  4663328k free,        0k buffers
Swap:        0k total,        0k used,        0k free, 64188236k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24741 xxx      22   0 xxxg  xxxg  11m S  0.0 50.0   0:00.00 java

Or, if you would like to look at a specific process by PID, issue:

top -H -b -c -p <pid> > top_threads_pid.txt

Allow this command to run for a short time. It produces a file called top_threads_pid.txt.
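
When I need all of this at once, I wrap the commands above in a small collection script. A rough sketch (the PID argument is optional, and the file names are just my convention):

#!/bin/bash
# Collect paging and CPU data in parallel for roughly 100 seconds
# (10 samples at 10-second intervals), matching the vmstat example above.
# Usage: ./collect_diag.sh [pid]
pid=$1

vmstat 10 10 > vmstat.txt &
top -b -n 10 -d 10 > top.txt &
if [ -n "$pid" ]
 then
  top -H -b -c -n 10 -d 10 -p "$pid" > top_threads_pid.txt &
 else
  top -H -b -c -n 10 -d 10 > top_threads.txt &
fi
wait
echo "Collection complete"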

I had a task the other day involving 110 GB of compressed log files that I wanted to import into Impala (Cloudera). Currently, Impala does not support compressed files, so I had to decompress them all. I created this handy script and thought you might find it useful. I mounted the S3 bucket using s3fs, which I mentioned in an earlier post.

#!/bin/bash
# Utils
elapsed()
{
   (( seconds  = SECONDS ))
   "$@"
   (( seconds = SECONDS - seconds ))
   (( etime_seconds = seconds % 60 ))
   (( etime_minutes = ( seconds - etime_seconds ) / 60 % 60 ))
   (( etime_hours   = seconds / 3600 ))
   # Sanity check of the decomposition (not used elsewhere)
   (( verif = etime_seconds + (etime_minutes * 60) + (etime_hours * 3600) ))

   echo "Elapsed time: ${etime_hours}h ${etime_minutes}m ${etime_seconds}s"
 }

convert()
{
# Remove the .gz extension from the compressed file name
UFILE=${FILE%.gz}

# Stream the gz file out of HDFS, decompress it, and put it back uncompressed
sudo -u hdfs hdfs dfs -cat /user/hdfs/oms/logs/$FILE | \
sudo -u hdfs gunzip -d | sudo -u hdfs hdfs dfs -put - /user/hdfs/oms/logs/$UFILE

# Discard original gz file
sudo -u hdfs hdfs dfs -rm -skipTrash /user/hdfs/oms/logs/$FILE
sudo -u hdfs hdfs dfs -ls /user/hdfs/oms/logs/$UFILE
}

for FILE in `ls /media/ephemeral0/logs/`
  do
    elapsed convert $FILE
    echo "Decompressed $FILE to $UFILE on hdfs"
  done

exit 0

s3fs is an open-source project that lets you mount your S3 storage locally so you can access your files at the file-system level and actually work with them. I use this method to mount S3 buckets on my EC2 instances. Below, I go through the installation steps and also document some of the problems and their workarounds.

Download the s3fs source code to your EC2 instance and decompress it:

[ec2-user@ip-10-xx-xx-xxx ~]$ wget http://s3fs.googlecode.com/files/s3fs-1.63.tar.gz
[ec2-user@ip-10-xx-xx-xxx ~]$ tar -xvf s3fs-1.63.tar.gz
--Make sure your libraries are installed/up-to-date
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo yum install gcc libstdc++-devel gcc-c++ fuse fuse-devel curl-devel libxml2-devel openssl-devel mailcap
[ec2-user@ip-10-xx-xx-xxx ~]$ cd s3fs-1.63
[ec2-user@ip-10-xx-xx-xxx ~]$ ./configure --prefix=/usr

At this point you might get the following error, indicating that s3fs requires a newer version of FUSE (http://fuse.sourceforge.net/).

configure: error: Package requirements (fuse >= 2.8.4 libcurl >= 7.0 libxml-2.0 >= 2.6 libcrypto >= 0.9) were not met:
Requested 'fuse >= 2.8.4' but version of fuse is 2.8.3
Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

Alternatively, you may set the environment variables DEPS_CFLAGS
and DEPS_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details.

Follow these steps to upgrade your FUSE installation (source is available at http://fuse.sourceforge.net/):

[ec2-user@ip-10-xx-xx-xxx ~]$ wget http://downloads.sourceforge.net/project/fuse/fuse-2.X/2.8.4/fuse-2.8.4.tar.gz
[ec2-user@ip-10-xx-xx-xxx ~]$ tar -xvf fuse-2.8.4.tar.gz
[ec2-user@ip-10-xx-xx-xxx ~]$ cd fuse-2.8.4
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo  yum -y install "gcc*" make libcurl-devel libxml2-devel openssl-devel
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo ./configure --prefix=/usr
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo make && sudo make install
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo ldconfig
--Verify that the new version is now in place
[ec2-user@ip-10-xx-xx-xxx ~]$ pkg-config --modversion fuse
2.8.4

Now we can return to the s3fs directory and finish the build and installation (./configure --prefix=/usr, make, and sudo make install, just as with FUSE above), then add the AWS credentials to /etc/passwd-s3fs in the following format: AccessKey:SecretKey

[ec2-user@ip-10-xx-xx-xxx ~]$ sudo vi /etc/passwd-s3fs
-- Set file permission
[ec2-user@ip-10-xx-xx-xxx ~]$ sudo chmod 640 /etc/passwd-s3fs

Now you should be able to successfully mount your AWS S3 bucket onto your local folder as such:

[ec2-user@ip-10-xx-xx-xxx ~]$ sudo s3fs <bucket_name> <local_mount_point>
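
A quick way to confirm the mount took (the mount point is whatever local folder you chose above):

# Confirm the bucket is mounted and visible at the file-system level
mount | grep s3fs
df -h <local_mount_point>
ls <local_mount_point>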

That is about it and thanks for reading!

Often I come across situations where a server comes under high CPU load. There is a simple way to find out which application thread(s) are responsible for the load. To get the thread with the highest CPU load, issue:

ps -mo pid,lwp,stime,time,cpu -p <pid>

LWP stands for Light-Weight Process and refers to an individual thread as the kernel sees it. To identify the thread, take the LWP with the highest CPU load and convert its decimal ID (xxxx) into a hexadecimal number (0xxxx).

Get the Java stack dump using the jstack -l <pid> command and find the thread whose nid field matches the hex number identified above.
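
The whole dance can be scripted. A rough sketch, assuming the JVM PID is passed as the first argument and jstack is on the PATH (the script name is hypothetical):

#!/bin/bash
# Find the busiest thread of a Java process and locate it in a jstack dump.
# Usage: ./hot_thread.sh <jvm_pid>
pid=$1

# LWP of the thread with the highest CPU usage (per-thread listing via ps -L)
lwp=$(ps -Lo lwp,pcpu -p "$pid" | tail -n +2 | sort -k2 -nr | head -1 | awk '{print $1}')

# jstack thread headers carry the native thread id as hex, e.g. nid=0x1a2b
nid=$(printf '0x%x' "$lwp")

echo "Hottest LWP: $lwp ($nid)"
jstack -l "$pid" | grep -A 20 "nid=$nid "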

I wanted to create a simple yet flexible way to parse command-line arguments in bash. I used a case statement and parameter expansion to read the arguments in a compact manner. I find this very handy and hope you will find it useful for solving or simplifying your own tasks as well. Whether it is a serious script or a quick hack, clean programming makes your script more efficient and easier to understand.

usage() {
      echo -e "No command-line argument\n"
      echo "Usage: $0 "
      echo "Arguments:"
      echo -e " --copy-from-hdfs\tcopy data set resides in HDFS"
      echo -e " --copy-to-s3\t\tcopy files to S3 in AWS"
      echo -e " --gzip\t\t\tcompress source files, recommended before sending data set to S3"
      echo -e " --remote-dir=\t\tpath to input directory (HDFS directory)"
      echo -e " --local-dir=\t\tlocal tmp directory (local directory)"
      echo -e " --s3-bucket-dir=\ts3 bucket directory in AWS"
      exit 1
}

# Check command line args
if [ -z "$1" ]
 then
  usage
 else
 # Parse command-line args
 for i in "$@"
 do
  case $i in
  -r=*|--remote-dir=*)
      # `echo $i | sed 's/[-a-zA-Z0-9]*=//'` works too, but the parameter
      # expansion below is a much nicer and more compact way
      DM_DATA_DIR=${i#*=}
      ;;
  -l=*|--local-dir=*)
      AMAZON_DATA_DIR=${i#*=}
      ;;
  -s3=*|--s3-bucket-dir=*)
      S3_DIR=${i#*=}
      ;;
  --copy-from-hdfs)
      COPY_FROM_HDFS=YES
      ;;
  --copy-to-s3)
      COPY_TO_S3=YES
      ;;
  -c|--gzip)
      COMPRESS=YES
      ;;
  *)
      # Unknown option
      ;;
   esac
 done
fi
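
Invocation then looks something like this; the script name and all of the paths and bucket names here are made up purely for illustration:

./copy_dataset.sh --copy-from-hdfs --gzip \
    --remote-dir=/user/hdfs/dataset \
    --local-dir=/tmp/staging \
    --s3-bucket-dir=my-bucket/dataset \
    --copy-to-s3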

Thoughts and suggestions are welcome!

Set innodb_stats_on_metadata=0 to prevent InnoDB from refreshing table statistics every time you query INFORMATION_SCHEMA metadata tables. The timings below show the difference on the same query.

mysql> select count(*),sum(data_length) from information_schema.tables;
+----------+------------------+
| count(*) | sum(data_length) |
+----------+------------------+
|     5581 |    3051148872493 |
+----------+------------------+
1 row in set (3 min 21.82 sec)
mysql> show variables like '%metadata';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_stats_on_metadata | ON    |
+--------------------------+-------+
mysql> set global innodb_stats_on_metadata=0;
mysql> show variables like '%metadata';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_stats_on_metadata | OFF   |
+--------------------------+-------+

mysql> select count(*),sum(data_length) from information_schema.tables;
+----------+------------------+
| count(*) | sum(data_length) |
+----------+------------------+
|     5581 |    3051148872493 |
+----------+------------------+
1 row in set (0.49 sec)
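
Note that SET GLOBAL only lasts until the next server restart. A quick sketch of applying and persisting the change from the shell; credentials and the my.cnf location will differ per install:

# Apply at runtime (needs a user with the SUPER privilege)
mysql -u root -p -e "SET GLOBAL innodb_stats_on_metadata = 0;"

# To keep the setting across restarts, also add it under the [mysqld]
# section of my.cnf:
#   innodb_stats_on_metadata = 0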