Minimizing overhead when transmitting results to clients

Always try to minimize the overhead incurred when transmitting query results back to clients.

By: Manish Maheshwari, Data Architect and Data Scientist at Cloudera, Inc.

Use techniques such as:

  • Always use the LIMIT clause in queries to sample data. See LIMIT Clause for more information.
  • Use the NUM_ROWS_PRODUCED_LIMIT query option to limit the number of rows produced by a query. This can be set as the default for user pools where exploratory queries are expected to run and the user might forget to add a LIMIT clause. See NUM_ROWS_PRODUCED_LIMIT Query Option for more information.
  • When using the impala-shell, use the -B option to avoid “prettyprinting” the result set and redirect query results to a file using the --output_file option. For example:

    impala-shell --ssl -k -i impala-coordinator1.company.com:21000 -B --output_file /tmp/result