The ‘paste’ command merges multiple files line by line, and you can specify a delimiter to place between the files’ contents.

This is really useful for, say, creating a CSV from the contents of multiple files. You could run the following command and instantly have a CSV.

$ paste -d',' column1.txt column2.txt
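For instance, suppose ‘column1.txt’ and ‘column2.txt’ each hold one field per line (the contents here are made up for illustration). paste joins line N of the first file to line N of the second, with the comma in between:

$ cat column1.txt
apple
banana
$ cat column2.txt
1.25
0.50
$ paste -d',' column1.txt column2.txt
apple,1.25
banana,0.50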

Another use I have found recently is generating large SQL CREATE TABLE statements. For instance, say that I have two files. File ‘file_a.txt’ has the following contents (SQL types).

INT
INT
STRING
DOUBLE
STRING
STRING

File ‘file_b.txt’ has the corresponding SQL column names.

width
length
name
cost
comment1
comment2

I can form a CREATE TABLE statement in Bash very quickly using paste:

$ echo -e "CREATE TABLE my_table (\n$(paste -d' ' file_a.txt file_b.txt))"
CREATE TABLE my_table (
INT width
INT length
STRING name
DOUBLE cost
STRING comment1
STRING comment2)
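
One wrinkle: a valid CREATE TABLE statement needs commas between the column definitions, and paste alone doesn’t supply them. A small sed filter that appends a comma to every line except the last fills the gap (a sketch; the ‘$!’ address works in GNU and POSIX sed):

$ echo -e "CREATE TABLE my_table (\n$(paste -d' ' file_a.txt file_b.txt | sed '$!s/$/,/'))"
CREATE TABLE my_table (
INT width,
INT length,
STRING name,
DOUBLE cost,
STRING comment1,
STRING comment2)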

A lot of times, I’ll create an externally managed Hive table as a step toward constructing something better (e.g., a Snappy-compressed Parquet table created by Hive for use in Impala or Spark). The data for such a table is often partitioned by day. Instead of typing an interactive Bash command to iterate over the dates and create the nested directory structure, I wrote the following script.

For instance, I want a root directory in HDFS (say, “/user/jason/my_root_dir”) to have date directories for all days in 2014, such as:
- /user/jason/my_root_dir/2014
- /user/jason/my_root_dir/2014/01
- /user/jason/my_root_dir/2014/01/01
- /user/jason/my_root_dir/2014/01/02
- /user/jason/my_root_dir/2014/01/03
- ...
- /user/jason/my_root_dir/2014/12/31

Running “./make_partitions /user/jason/my_root_dir 2014-01-01 2014-12-31” accomplishes this. Keep in mind that this takes a while, as each directory is checked and created across the cluster.

#!/bin/bash
 
# Usage: ./make_partitions HDFS_root_dir start_date end_date
# Example: ./make_partitions /user/root/mydir 2014-01-01 2014-12-31
# Creates nested year, month, day partitions for a sequence of dates (inclusive).
# Jason B. Hill - jason@jasonbhill.com
 
# Parse input options
HDFSWD="$1"
START_DATE="$(date -d "$2" +%Y-%m-%d)"
# Advance the end date one day so the until-loop below includes it
END_DATE="$(date -d "$3 +1 days" +%Y-%m-%d)"
 
# Function to form directories based on a date
function mkdir_partition {
    # Input: $1 = date to form partition
 
    # Get date parameters
    YEAR=$(date -d "$1" +%Y)
    MONTH=$(date -d "$1" +%m)
    DAY=$(date -d "$1" +%d)
 
    # If the year doesn't exist, create it
    if ! hdfs dfs -test -e "${HDFSWD}/${YEAR}"; then
        echo "-- creating HDFS directory: ${HDFSWD}/${YEAR}"
        hdfs dfs -mkdir "${HDFSWD}/${YEAR}"
    fi
    # If the month doesn't exist, create it
    if ! hdfs dfs -test -e "${HDFSWD}/${YEAR}/${MONTH}"; then
        echo "-- creating HDFS directory: ${HDFSWD}/${YEAR}/${MONTH}"
        hdfs dfs -mkdir "${HDFSWD}/${YEAR}/${MONTH}"
    fi
    # If the day doesn't exist (it shouldn't), create it
    if ! hdfs dfs -test -e "${HDFSWD}/${YEAR}/${MONTH}/${DAY}"; then
        echo "-- creating HDFS directory: ${HDFSWD}/${YEAR}/${MONTH}/${DAY}"
        hdfs dfs -mkdir "${HDFSWD}/${YEAR}/${MONTH}/${DAY}"
    fi
}
 
# Iterate over dates and make partitions
ITER_DATE="${START_DATE}"
until [[ "${ITER_DATE}" == "${END_DATE}" ]]; do
    mkdir_partition "${ITER_DATE}"
    ITER_DATE=$(date -d "${ITER_DATE} +1 days" +%Y-%m-%d)
done
 
exit 0
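
As an aside, if your Hadoop version supports the -p flag to hdfs dfs -mkdir, the three test-and-create blocks collapse into a single call that creates any missing parent directories, much like mkdir -p locally. A minimal sketch of that variant (assuming -p is available in your distribution):

function mkdir_partition {
    # -p creates the year and month levels as needed, so no existence tests are required
    hdfs dfs -mkdir -p "${HDFSWD}/$(date -d "$1" +%Y/%m/%d)"
}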

The date-iteration logic is handy on its own. The following Bash script iterates over the dates in a range.

#!/bin/bash
 
# $1 = start date (e.g.: yyyy-mm-dd)
# $2 = end date
 
# make sure the end date is formatted correctly
# (the loop below stops when it reaches this date, so the end date itself is excluded)
end_date=$(date -d "$2" +%Y-%m-%d)
 
# set iteration date to start date and format
iter_date=$(date -d "$1" +%Y-%m-%d)
 
until [[ "${iter_date}" == "${end_date}" ]]; do
    # print the date
    echo "${iter_date}"
    # advance the date
    iter_date=$(date -d "${iter_date} +1 day" +%Y-%m-%d)
done

I’ve saved that in a file called “loopdates.sh” and made it executable with chmod 0755. An example usage follows; note that the end date itself is not printed.

$ ./loopdates.sh 2014-11-27 2014-12-02
2014-11-27
2014-11-28
2014-11-29
2014-11-30
2014-12-01
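
If you want the range to include the end date, the same trick used in make_partitions works here: advance the end date by one day when formatting it. A one-line tweak (not in the original script):

# make the end date inclusive by advancing it one day
end_date=$(date -d "$2 +1 day" +%Y-%m-%d)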