Saving the results of an Impala query from impala-shell as a TSV file is easy. In my experience, this works better from the shell than through a service such as Hue. When I’ve done this in Hue, the name node has run out of memory because the result set was so large. Dumping to TSV from the shell doesn’t seem to cause the same problem.
Here’s a brief explanation of the options used in the command below:
-i: Connect the shell to the given Impala daemon.
-o: Write the output to the file that follows.
-B: Turn off pretty printing and write delimited output, tab-separated by default.
-f: Run the query stored in the file that follows.
The delimiter can be changed with the --output_delimiter option. In the following example, I’m connecting to the data node at data_node_01.
$ impala-shell -i data_node_01 -o output_file.tsv -B -f impala_query.sql
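As a variation on the command above, here is a minimal sketch that writes comma-delimited output with a header row instead of plain TSV. The host and file names are placeholders carried over from the example, output_file.csv is a hypothetical name, and I'm assuming your impala-shell version supports --print_header alongside -B, so check impala-shell --help before relying on it.

```shell
# Placeholder values; adjust for your own cluster and files.
host="data_node_01"
query_file="impala_query.sql"
output_file="output_file.csv"   # hypothetical output name

if command -v impala-shell >/dev/null 2>&1; then
  # -B turns off pretty printing, --output_delimiter replaces the default tab,
  # and --print_header (assumed available) adds the column names as the first row.
  impala-shell -i "$host" -B --output_delimiter=',' --print_header \
    -f "$query_file" -o "$output_file"
else
  # Skip the run on machines without the Impala client installed.
  echo "impala-shell not found on PATH" >&2
fi
```

The guard around the call is only there so the snippet fails gently on a machine without the Impala client. On a cluster node you can run the impala-shell line on its own.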