Minimizing overhead when transmitting results to clients
Always try to minimize the overhead incurred when transmitting query results back to clients.
By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.
Use techniques such as:
- Always use the LIMIT clause in queries to sample data. See LIMIT Clause for more information.
- Use the NUM_ROWS_PRODUCED_LIMIT query option to limit the number of rows produced by a query. This can be set as the default for user pools where exploratory queries are expected to run and the user might forget to add a LIMIT clause. See NUM_ROWS_PRODUCED_LIMIT Query Option for more information.
- When using impala-shell, use the -B option to avoid "pretty-printing" the result set, and redirect query results to a file using the --output_file option. For example:
  impala-shell --ssl -k -i impala-coordinator1.company.com:21000 -B --output_file /tmp/result
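The three techniques can be combined in a single non-interactive invocation. The following is a minimal sketch, not a definitive recipe: the table name sample_table and the row cap of 1000 are hypothetical placeholders, and it assumes that your impala-shell version accepts the --query_option flag for setting query options at startup. The script prints the command for review and runs it only when RUN=1 is set and impala-shell is installed.

```shell
#!/bin/sh
# Sketch only: the table name and row cap below are hypothetical placeholders.
COORDINATOR="impala-coordinator1.company.com:21000"  # host from the example above
OUTPUT_FILE="/tmp/result"
ROW_CAP=1000                                          # hypothetical value
QUERY="SELECT * FROM sample_table LIMIT 100"          # hypothetical table

# Hold the arguments as positional parameters so each one stays a separate word.
# --query_option is assumed to be supported by your impala-shell version.
set -- --ssl -k -i "$COORDINATOR" \
  -B --output_file "$OUTPUT_FILE" \
  "--query_option=NUM_ROWS_PRODUCED_LIMIT=$ROW_CAP" \
  -q "$QUERY"

# Show the command for review (the preview is not shell-quoted for copy-paste).
CMD_PREVIEW="impala-shell $*"
echo "$CMD_PREVIEW"

# Run only when explicitly requested and impala-shell is installed.
if [ "${RUN:-0}" = "1" ] && command -v impala-shell >/dev/null 2>&1; then
  impala-shell "$@"
fi
```

Here the LIMIT clause keeps the sample small, NUM_ROWS_PRODUCED_LIMIT acts as a safety net if the LIMIT is forgotten, and -B with --output_file avoids formatting the result set in the terminal.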
