This one isn’t code related, but I come across it enough in Linux Mint VMs that I’m saving it here for myself and anyone else who may find it useful.

Problem

I currently use Linux Mint as my development environment. When I’m using Firefox (39.0+build5-0build0.14.04.1) and I close a browser window containing multiple tabs, Firefox always prompts me: “Do you want Firefox to save your tabs for the next time it starts?” There is a checkbox to stop Firefox from raising this warning in the future, but no matter how often you check it, the warning keeps coming back.

Solution

Open the ‘about:config’ page.

Set browser.tabs.warnOnClose to false (if it isn’t already).
Set browser.tabs.warnOnCloseOtherTabs to false.
Set browser.warnOnQuit to false.
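
If the changes still don’t stick (some managed or synced profiles get reset), the same preferences can be pinned in a user.js file in your Firefox profile directory, which Firefox reapplies at every startup. A minimal sketch:

user_pref("browser.tabs.warnOnClose", false);
user_pref("browser.tabs.warnOnCloseOtherTabs", false);
user_pref("browser.warnOnQuit", false);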


Carriage Return -vs- Line Feed

Every now and then, I end up with a text parsing issue that simply comes down to carriage returns versus line feeds. For instance, with Twitter’s streaming API, you can receive multiple responses that together form a single JSON entity. You can recognize this situation because the end of an entity is marked with a carriage return, while the individual chunks before the end are terminated only by line feeds (new lines).

What’s really a bit messed up is that Windows is a big fan of CRLF (‘\r\n’), while most of Unix prefers LF (‘\n’) for line terminators. Apple, meanwhile, had a thing for the lone CR (‘\r’): classic Mac OS (before OS X) used it as the line terminator, because Apple likes to be special.

This all started when computers were supposed to mimic typewriters. At the end of a line, you needed to do two things: (1) return the carriage to the left so you could keep typing, and (2) feed the paper up one line so you wouldn’t simply type over what you had just written.

Recognizing CRLF -vs- LF

Both the carriage return (CR) and line feed (LF) are represented by non-printable characters. Your terminal or text editor simply knows how to interpret them. When you tell a script to write ‘\n’ for a line feed, you’re actually referencing the non-printable ASCII character 0x0A (decimal 10). When you write ‘\r’, you’re referencing 0x0D (decimal 13).

But those characters don’t actually print. They only instruct a terminal or text editor to display the text around the characters in specific ways. So, how do you recognize them?

The Linux ‘file’ command will tell you what sort of line terminators a file has, so that’s pretty quick. Here’s an example:

$ file my_file_created_on_Windows.txt
my_file_created_on_Windows.txt: ASCII text, with very long lines, with CRLF line terminators
 
$ file my_file_created_on_Linux
my_file_created_on_Linux: ASCII text, with very long lines

If the file uses only LF terminators, that’s considered the default, so ‘file’ won’t mention it.
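
If you want to see the characters themselves, GNU ‘cat -A’ is handy: it prints each CR as ‘^M’ and marks each LF with a trailing ‘$’. A quick sketch, assuming a short Windows-style file:

$ cat -A my_file_created_on_Windows.txt
first line^M$
second line^M$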

Removing CR Terminators

You have several options for getting rid of those ‘\r’ CR characters in text. One option is to simply ‘tr’ the text in the terminal:

$ tr -d '\r' < my_file_created_on_Windows.txt > my_new_file.txt

Another option is to use a utility such as ‘dos2unix’. Yet another option is to use a more full-featured language, such as Python, and replace the characters manually:

import codecs
 
# Use codecs so the file is decoded as UTF-8 (it may contain non-ASCII text).
f_p = codecs.open('my_file_created_on_Windows.txt', 'r', 'utf-8')
g_p = codecs.open('my_new_file.txt', 'w', 'utf-8')
 
for line in f_p:
    # Strip the CR; the trailing LF is kept, so lines stay separated.
    g_p.write(line.replace('\r', ''))
f_p.close()
g_p.close()

A few notes on that Python code. First, we use the codecs module because the text may contain non-ASCII characters; here we decode it as UTF-8. Second, we strip only the CR characters: write() does not add any line terminator of its own, so we keep each line’s LF to avoid mashing the lines together.


I had written an article about running scripts in parallel using GNU Parallel, and then realized that GNU Parallel isn’t in the CentOS repositories. Since the code I’m writing requires standard-repo support, I need a different solution.

If we want to perform the same action as in the referenced article, using xargs instead of GNU Parallel, we’d run the following command:

$ echo {1..20} | xargs -n1 -P5 ./echo_sleep
1426008382 -- starting -- 2
1426008382 -- starting -- 5
1426008382 -- starting -- 1
1426008382 -- starting -- 3
1426008382 -- starting -- 4
1426008382 -- finishing -- 4
1426008382 -- starting -- 6
1426008383 -- finishing -- 1
1426008383 -- starting -- 7
1426008385 -- finishing -- 3
1426008385 -- starting -- 8
1426008386 -- finishing -- 7
1426008386 -- starting -- 9
1426008389 -- finishing -- 9
1426008389 -- starting -- 10
1426008390 -- finishing -- 2
1426008390 -- finishing -- 5
1426008390 -- starting -- 11
1426008390 -- starting -- 12
1426008391 -- finishing -- 6
1426008391 -- starting -- 13
1426008392 -- finishing -- 10
1426008392 -- starting -- 14
1426008394 -- finishing -- 8
1426008394 -- starting -- 15
1426008396 -- finishing -- 15
1426008396 -- starting -- 16
1426008397 -- finishing -- 16
1426008397 -- starting -- 17
1426008398 -- finishing -- 11
1426008398 -- starting -- 18
1426008399 -- finishing -- 12
1426008399 -- starting -- 19
1426008399 -- finishing -- 13
1426008399 -- starting -- 20
1426008399 -- finishing -- 20
1426008399 -- finishing -- 17
1426008400 -- finishing -- 14
1426008402 -- finishing -- 18
1426008408 -- finishing -- 19

Some things to note here: First, the ‘-n1’ (or ‘-n 1’) option is critical; it tells xargs how many of the echoed arguments each invocation of echo_sleep should receive. Also, the output controls for xargs aren’t as well developed as GNU Parallel’s; it’s entirely possible for the stdout of concurrently invoked scripts to collide. For this reason, you may want the invoked scripts to manage their own output (writing to separate files, for instance, or using a language like Python with better file handling) instead of simply relying on redirected bash output.
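
If colliding output is a concern, one simple workaround is to have each invocation write to its own file. A sketch using the same echo_sleep script (the ‘sh -c’ wrapper and the log file names are just for illustration):

$ echo {1..20} | xargs -n1 -P5 sh -c './echo_sleep "$1" > "out_$1.log"' _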


What I want to be able to do is to run a script on a massive number of inputs. But, I only want a specified maximum number of them to be running at any given time. GNU parallel can accomplish this very easily.

First, make sure you have GNU parallel installed. The package in most major repositories is simply called “parallel”.
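
On a Debian-based distro such as Mint, for example, installation should be as simple as:

$ sudo apt-get install parallel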

Writing A Basic Script

I’m going to write a bash script that echoes a timestamp and its input, then waits two seconds before exiting. The script looks like this:

#!/bin/bash
 
VAL="$1"
TIME="$(date +%s)"
 
echo "${TIME} -- ${VAL}"
 
sleep 2

Just to make sure it works, we chmod it to 0755 and then call it with the input “hi”.

$ ./echo_sleep hi
1425690292 -- hi

It worked just as expected: after echoing the time and input, it slept for two seconds, then exited, and my prompt returned.

Running the Script in Parallel

I want to run this script on 20 inputs, but I only ever want 5 instances running at any given time. Here’s how we do that with GNU Parallel (the input arguments for the script are given after ‘:::’). I’m just using the numbers 1..20 as the inputs.

$ parallel -j5 ./echo_sleep ::: {1..20}
1425690566 -- 1
1425690566 -- 2
1425690566 -- 3
1425690566 -- 4
1425690566 -- 5
1425690569 -- 6
1425690569 -- 7
1425690569 -- 8
1425690569 -- 9
1425690569 -- 10
1425690571 -- 11
1425690571 -- 12
1425690571 -- 13
1425690571 -- 14
1425690571 -- 15
1425690573 -- 16
1425690573 -- 17
1425690573 -- 18
1425690573 -- 19
1425690573 -- 20

Note: You can use any bash IFS-separated sequence as the input. For instance, ‘::: 1 2 3 4 5 6 7 8 9 10’ works just as well as ‘::: {1..10}’.
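
GNU Parallel can also read its inputs from stdin instead of ‘:::’; this should be equivalent to the command above:

$ seq 1 20 | parallel -j5 ./echo_sleep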

As you can see, the timestamps are two seconds apart. What if we change the script to sleep for a random amount of time? We’ll have each instance wait between 0 and 9 seconds, and print output both when it starts and when it finishes:

#!/bin/bash
 
VAL="$1"
 
echo "$(date +%s) -- starting -- ${VAL}"
 
sleep "$(($RANDOM % 10))"
 
echo "$(date +%s) -- finishing -- ${VAL}"

Now, when we run the script, we expect each instance to take a different amount of time to finish. Notice that all of the output for a given instance is printed together, once that instance finishes. (GNU Parallel has output-control mechanisms; by default it groups each job’s output, and we’re not changing that here.)

$ parallel -j5 ./echo_sleep ::: {1..20}
1425690952 -- starting -- 2
1425690954 -- finishing -- 2
1425690952 -- starting -- 1
1425690955 -- finishing -- 1
1425690952 -- starting -- 5
1425690955 -- finishing -- 5
1425690952 -- starting -- 3
1425690957 -- finishing -- 3
1425690952 -- starting -- 4
1425690959 -- finishing -- 4
1425690955 -- starting -- 7
1425690959 -- finishing -- 7
1425690955 -- starting -- 8
1425690961 -- finishing -- 8
1425690954 -- starting -- 6
1425690963 -- finishing -- 6
1425690959 -- starting -- 10
1425690963 -- finishing -- 10
1425690959 -- starting -- 11
1425690965 -- finishing -- 11
1425690961 -- starting -- 12
1425690965 -- finishing -- 12
1425690957 -- starting -- 9
1425690966 -- finishing -- 9
1425690966 -- starting -- 17
1425690967 -- finishing -- 17
1425690963 -- starting -- 13
1425690967 -- finishing -- 13
1425690963 -- starting -- 14
1425690967 -- finishing -- 14
1425690967 -- starting -- 19
1425690967 -- finishing -- 19
1425690967 -- starting -- 18
1425690969 -- finishing -- 18
1425690965 -- starting -- 15
1425690973 -- finishing -- 15
1425690965 -- starting -- 16
1425690973 -- finishing -- 16
1425690967 -- starting -- 20
1425690973 -- finishing -- 20

To get the output immediately, we can use the --linebuffer option. (There is also an --ungroup option, but it suffers from the problem of potentially mashing together the simultaneous output of two script instances.)

$ parallel -j5 --linebuffer ./echo_sleep ::: {1..20}
1425691272 -- starting -- 1
1425691272 -- starting -- 2
1425691272 -- starting -- 3
1425691272 -- starting -- 4
1425691272 -- starting -- 5
1425691273 -- finishing -- 1
1425691273 -- finishing -- 2
1425691273 -- starting -- 7
1425691273 -- starting -- 6
1425691275 -- finishing -- 4
1425691275 -- finishing -- 6
1425691275 -- starting -- 8
1425691275 -- starting -- 9
1425691276 -- finishing -- 3
1425691276 -- starting -- 10
1425691277 -- finishing -- 9
1425691277 -- starting -- 11
1425691278 -- finishing -- 8
1425691278 -- finishing -- 10
1425691278 -- starting -- 12
1425691278 -- starting -- 13
1425691280 -- finishing -- 5
1425691280 -- finishing -- 7
1425691280 -- starting -- 15
1425691280 -- starting -- 14
1425691282 -- finishing -- 12
1425691282 -- starting -- 16
1425691283 -- finishing -- 13
1425691283 -- starting -- 17
1425691284 -- finishing -- 17
1425691284 -- finishing -- 11
1425691284 -- starting -- 19
1425691284 -- starting -- 18
1425691285 -- finishing -- 14
1425691285 -- starting -- 20
1425691286 -- finishing -- 15
1425691286 -- finishing -- 18
1425691289 -- finishing -- 19
1425691289 -- finishing -- 16
1425691293 -- finishing -- 20

Now, all of the timestamps are in order.


Using the ‘du’ and ‘sort’ commands, we can get a listing of the largest directories in a given directory. For instance, in my home directory:

jason@mintSandbox ~ $ du -h --max-depth 1 | sort -hr
7.3G	.
4.7G	./data
1.4G	./mount0
391M	./.cache
389M	./bash
258M	./tmp
104M	./texts
35M	./python
28M	./.mozilla
12M	./.adobe
11M	./.config
8.8M	./bin
8.3M	./Downloads
3.7M	./.thumbnails
1.8M	./presentations
1.3M	./.macromedia
884K	./.gstreamer-0.10
524K	./.gimp-2.8
508K	./misc
388K	./.purple
164K	./.local
152K	./.java
144K	./.netExtenderCerts
140K	./.kde
72K	./scripts
64K	./.gconf
36K	./.pki
36K	./.gftp
32K	./.gnome2
28K	./.ssh
16K	./.linuxmint
12K	./.dbus
4.0K	./mount2
4.0K	./mount1
4.0K	./.gnome2_private
4.0K	./Desktop

Here, the ‘du’ command (short for ‘disk usage’) estimates the disk space consumed by each directory. The ‘-h’ option tells ‘du’ to make the output human-readable. The ‘--max-depth 1’ option tells ‘du’ not to list anything deeper than one level down. (Try issuing the command without this option and see what happens.) The ‘sort’ command then takes the output of ‘du’ and sorts it by those human-readable sizes (again, the ‘-h’ option). The extra ‘-r’ option tells sort to reverse the order, so the largest directories come first. We could pipe this to ‘head’ to reduce the output to, say, 5 rows:

jason@mintSandbox ~ $ du -h --max-depth 1 | sort -hr | head -n 5
7.3G	.
4.7G	./data
1.4G	./mount0
391M	./.cache
389M	./bash

Say you have a bunch of subdirectories of your current working directory, all including variously named files. You want to iterate over those files and apply some Bash command.

For instance, I have folders named 01, 02, 03, …, 31 (representing days in a month), and inside each of those folders sit various files. I want to gzip each of those files individually. Here’s how I do that with a single line in Bash:

$ for d in */; do for f in "$d"*; do echo "gzip ${f}"; gzip "${f}"; done; done
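
A more robust alternative, if you can assume every target is a regular file exactly one directory down, is to let ‘find’ handle the iteration (it copes with odd filenames better than shell word splitting):

$ find . -mindepth 2 -maxdepth 2 -type f -exec gzip {} +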

The ‘date’ program in Linux is incredibly powerful and can be used to manipulate dates very quickly. Here are some examples.

The current time in my present locale:

$ date
Thu Nov 13 15:41:12 MST 2014

The current time in UTC: (Use UTC, not GMT. One is an international standard for keeping time, based on atomic clocks, while the other is an old-fashioned local timezone based on when the sun is highest over Greenwich … which isn’t exactly accurate enough for international business. Plus, GMT isn’t even used there when daylight saving time is in effect. Anyway…)

$ date --utc
Thu Nov 13 22:44:06 UTC 2014

The time one day ago in UTC:

$ date --utc -d "now -1 day"
Wed Nov 12 22:44:51 UTC 2014

A specific date minus one day, formatted as we wish:

date -d "2014-10-01 -1 day" +%Y-%m-%d
2014-09-30
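
‘date’ can also convert to and from Unix timestamps (handy with epoch-stamped output like the scripts above produce); for example:

$ date --utc -d @1426008382
Tue Mar 10 17:26:22 UTC 2015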

See the manual page for date for more options.


Say you want to transfer a large file over your network with scp, but you don’t want this transfer to hog all of the network resources. You can limit scp’s bandwidth usage with the -l (lowercase L) option, specifying your bandwidth limit in Kbit/s.

So, if I want to transfer a file and limit the bandwidth used to 1 MB/s, I’d first compute that 1 MB equals 1024 KB, which in turn equals 8192 Kb. (Here B = byte and b = bit.) We end up with the command:

$ scp -l 8192 file_to_transfer user@host:/path-on-other-end
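
If rsync is available on both ends, it offers similar throttling, but its --bwlimit option takes KByte/s rather than Kbit/s, so the same 1 MB/s cap would look like:

$ rsync --bwlimit=1024 file_to_transfer user@host:/path-on-other-end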

Problem:

You can connect to MySQL locally on computer #1, and you can connect from computer #2 to computer #1 via SSH, but you cannot connect directly from computer #2 to MySQL on computer #1, and you want or need to do so.

Solution:

Create an SSH tunnel to route the MySQL connection. Here’s how.

On computer #2: Run the following command in a terminal:

ssh username@computer1address -L 3307:localhost:3306 -N

You’ll have to use your password/keys to log in at this point. What the options mean:
1) “-L 3307:localhost:3306” means that we want to bind port 3307 on the local machine (computer #2) to port 3306 on the remote machine (computer #1). After the connection is made, connections on computer #2 to localhost:3307 will actually be reaching computer #1’s port 3306.
2) “-N” tells ssh not to execute any remote command. The session will just sit there and forward traffic.

Now, you should be able to connect from computer #2 to MySQL on computer #1. On computer #2, now try:

mysql -h 127.0.0.1 -P 3307 -u username -ppassword
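
If you’d rather not dedicate a terminal to the tunnel, ssh’s -f option backgrounds the session after authentication; everything else stays the same:

ssh -f -N username@computer1address -L 3307:localhost:3306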

As of writing this (March 31, 2014), the version of Boost included in Debian Wheezy is 1.49. This is a bit problematic for me, as some very useful things are included in more recent versions (e.g., better syslog event handlers in the logging library). One of the problems I sometimes face with Boost is remembering how to compile and install it. The Boost “Getting Started” pages basically say that most of Boost is header-only and doesn’t need to be compiled, but that there are 16 libraries that must be built. Unfortunately, the instructions stop there. So, this is for future reference and for others who may find it useful. How does one compile and install all of Boost?

The first thing we need to do is make sure all of Boost’s dependencies are available. Using your favorite package manager, make sure you have the following. (Depending on your distro, they may be named slightly differently; make sure you get the dev version where required.)

- build-essential
- g++
- python-dev
- autotools-dev
- libicu-dev
- libbz2-dev

Next, download and decompress the Boost version that you’d like. I’m using 1.55.

$ wget http://downloads.sourceforge.net/project/boost/boost/1.55.0/boost_1_55_0.tar.gz
$ gunzip boost_1_55_0.tar.gz
$ tar xvf boost_1_55_0.tar

Finally, as root, build and install. Since I’m placing the installation in /usr/local (as is the default), I’m not issuing any prefix declaration here. There are really just two commands:

(root) $ ./bootstrap.sh
(root) $ ./b2 --with=all -j 2 install

The -j n option will build the libraries using n threads, so adjust it for your system. After completion, you should be able to include and link against the required headers and libraries.
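
As a quick sanity check (assuming the default /usr/local prefix), you can verify the installed version from the headers:

$ grep '#define BOOST_LIB_VERSION' /usr/local/include/boost/version.hpp
#define BOOST_LIB_VERSION "1_55"

If the linker can’t find the new libraries at build time, running ldconfig as root so /usr/local/lib is rescanned may help.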