Minimizing overhead when transmitting results to clients
Always try to minimize the overhead incurred when transmitting query results back to clients.
By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.
Use techniques such as:
- Always use the LIMIT clause in queries that sample data. See LIMIT Clause for more information.
- Use the NUM_ROWS_PRODUCED_LIMIT query option to limit the number of rows produced by a query. This can be set as the default for user pools where exploratory queries are expected to run and the user might forget to add a LIMIT clause. See NUM_ROWS_PRODUCED_LIMIT Query Option for more information.
- When using impala-shell, use the -B option to avoid "prettyprinting" the result set, and redirect query results to a file using the --output_file option. For example:
  impala-shell --ssl -k -i impala-coordinator1.company.com:21000 -B --output_file /tmp/result
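
The three techniques can be combined in a single invocation. The following is a minimal sketch, not a definitive recipe: the table name sample_db.events, the cap of 1000 rows, and the run_query helper are hypothetical, and it assumes that impala-shell accepts semicolon-separated statements in the -q argument. If your version does not, place the statements in a file and pass it with -f instead. By default the helper only prints the command, so it can be reviewed before it is run against a cluster.

```shell
#!/bin/sh
# Illustrative values only; replace them with your own.
COORDINATOR="impala-coordinator1.company.com:21000"  # host from the example above
OUTPUT_FILE="/tmp/result"
ROW_CAP=1000                                         # hypothetical cap for exploratory queries
QUERY="SELECT * FROM sample_db.events LIMIT 100"     # hypothetical table, explicit LIMIT

# run_query (hypothetical helper): prints the command when DRY_RUN=1 (the default
# in this sketch) and executes it when DRY_RUN=0. The printed form is for
# inspection only and does not preserve shell quoting.
run_query() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# -B emits plain delimited rows, --output_file writes them to a file, and the
# SET statement acts as a backstop in case a later query omits LIMIT.
run_query impala-shell --ssl -k -i "$COORDINATOR" \
  -B --output_file "$OUTPUT_FILE" \
  -q "SET NUM_ROWS_PRODUCED_LIMIT=${ROW_CAP}; ${QUERY}"
```

Run the script with DRY_RUN=0 to execute the query. Setting the row cap per session, as here, is an alternative to configuring it as a pool default; the pool default is the better fit when users cannot be relied on to set it themselves.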