org.apache.hadoop.hive.ql.stats
Interface StatsAggregator

All Known Implementing Classes:
CounterStatsAggregator, CounterStatsAggregatorTez, FSStatsAggregator, JDBCStatsAggregator

public interface StatsAggregator

An interface that any statistics-gathering implementation must implement.


Method Summary
 String aggregateStats(String keyPrefix, String statType)
          This method aggregates a given statistic from all tasks (partial stats).
 boolean cleanUp(String keyPrefix)
          This method is called after all statistics have been aggregated.
 boolean closeConnection()
          This method closes the connection to the temporary storage.
 boolean connect(org.apache.hadoop.conf.Configuration hconf, Task sourceTask)
          This method connects to the temporary storage.
 

Method Detail

connect

boolean connect(org.apache.hadoop.conf.Configuration hconf,
                Task sourceTask)
This method connects to the temporary storage.

Parameters:
hconf - HiveConf that contains the connection parameters.
sourceTask -
Returns:
true if the connection is successful, false otherwise.

aggregateStats

String aggregateStats(String keyPrefix,
                      String statType)
This method aggregates a given statistic from all tasks (partial stats). After aggregation, this method also automatically removes all records that have been aggregated.

Parameters:
keyPrefix - a prefix of the keys used in StatsPublisher to publish stats. All rows whose keys start with this prefix will be aggregated. For example, if the StatsPublisher uses the following compound key to publish stats: the output directory name (unique per FileSinkOperator) + the partition specs (only for dynamic partitions) + taskID (last component of task file), then the keyPrefix for aggregation could be the first two components. This aggregates stats across all tasks for each partition.
statType - a string naming the statistic to aggregate, i.e. the key under which it was published. Ex: "numRows".
Returns:
a string representation of a long value, or null if an error or exception occurs.
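
The prefix matching described for keyPrefix can be illustrated with a small in-memory sketch. It does not use Hive's classes: PrefixAggregationSketch, its publish helper, and the "/"-separated key layout are assumptions made for illustration only. A real implementation reads from whatever temporary storage the matching StatsPublisher wrote to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in; not part of Hive.
public class PrefixAggregationSketch {
    // full key -> (statType -> partial value published by one task)
    private final Map<String, Map<String, Long>> store = new HashMap<>();

    // Hypothetical helper that plays the role of a StatsPublisher.
    public void publish(String key, String statType, long value) {
        store.computeIfAbsent(key, k -> new HashMap<>()).put(statType, value);
    }

    // Sums the given statistic over every key that starts with keyPrefix
    // and returns the total as the string form of a long.
    public String aggregateStats(String keyPrefix, String statType) {
        long total = 0;
        for (Map.Entry<String, Map<String, Long>> e : store.entrySet()) {
            if (e.getKey().startsWith(keyPrefix)) {
                total += e.getValue().getOrDefault(statType, 0L);
            }
        }
        return Long.toString(total);
    }

    public static void main(String[] args) {
        PrefixAggregationSketch stats = new PrefixAggregationSketch();
        // Assumed key layout: output directory + partition spec + task ID.
        stats.publish("/out/ds=1/000000_0", "numRows", 10L);
        stats.publish("/out/ds=1/000001_0", "numRows", 20L);
        stats.publish("/out/ds=2/000000_0", "numRows", 5L);
        // The first two components select all tasks of partition ds=1.
        System.out.println(stats.aggregateStats("/out/ds=1/", "numRows"));
    }
}
```

In this sketch the prefix ends with the separator so that "/out/ds=1/" does not also match a partition such as "/out/ds=10/". Whether a real key layout needs the same care depends on how the publisher builds its keys.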

closeConnection

boolean closeConnection()
This method closes the connection to the temporary storage.

Returns:
true if the connection is closed successfully, false otherwise.

cleanUp

boolean cleanUp(String keyPrefix)
This method is called after all statistics have been aggregated. Since we support multiple statistics, we do not perform automatic cleanup after aggregation. After this method is called, closeConnection must be called as well. This method is also used to clear the temporary statistics that have been published without being aggregated. Typically this happens when a job fails, or is forcibly stopped after publishing some statistics.

Parameters:
keyPrefix - a prefix of the keys used in StatsPublisher to publish stats. It is the same as the first parameter in aggregateStats().
Returns:
true if cleanup is successful, false otherwise.
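
The call order described above (connect, aggregate each statistic, cleanUp, then closeConnection) can be sketched as follows. This is a minimal illustration under stated assumptions, not Hive code: SimpleAggregator is a hypothetical, simplified stand-in for StatsAggregator whose connect() takes no arguments, and InMemoryAggregator exists only to make the sketch runnable. Following the cleanUp description, records are removed only in cleanUp, so several statistics can be aggregated under the same prefix first.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class AggregatorLifecycleSketch {

    // Hypothetical, simplified stand-in for StatsAggregator.
    public interface SimpleAggregator {
        boolean connect();
        String aggregateStats(String keyPrefix, String statType);
        boolean cleanUp(String keyPrefix);
        boolean closeConnection();
    }

    // In-memory implementation used only to make the sketch runnable.
    public static class InMemoryAggregator implements SimpleAggregator {
        // full key -> (statType -> partial value)
        public final Map<String, Map<String, Long>> rows = new HashMap<>();
        public boolean connected;

        public boolean connect() { connected = true; return true; }

        public String aggregateStats(String keyPrefix, String statType) {
            if (!connected) return null; // mirrors the "null on error" contract
            long total = 0;
            for (Map.Entry<String, Map<String, Long>> e : rows.entrySet()) {
                if (e.getKey().startsWith(keyPrefix)) {
                    total += e.getValue().getOrDefault(statType, 0L);
                }
            }
            return Long.toString(total);
        }

        public boolean cleanUp(String keyPrefix) {
            // Remove every temporary record published under the prefix.
            rows.keySet().removeIf(k -> k.startsWith(keyPrefix));
            return true;
        }

        public boolean closeConnection() { connected = false; return true; }
    }

    // Aggregates several statistics for one prefix, then cleans up and closes.
    public static Map<String, Long> collect(SimpleAggregator agg,
                                            String keyPrefix,
                                            String... statTypes) {
        Map<String, Long> result = new LinkedHashMap<>();
        if (!agg.connect()) {
            return result; // nothing to aggregate without a connection
        }
        try {
            for (String statType : statTypes) {
                String value = agg.aggregateStats(keyPrefix, statType);
                if (value != null) { // null signals an error for this statistic
                    result.put(statType, Long.parseLong(value));
                }
            }
        } finally {
            // cleanUp runs once, after all statistics; closeConnection follows it.
            agg.cleanUp(keyPrefix);
            agg.closeConnection();
        }
        return result;
    }
}
```

Placing cleanUp and closeConnection in a finally block is a design choice of this sketch: it also clears partial statistics when aggregation throws, which matches the stated use of cleanUp for statistics that were published but never aggregated.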


Copyright © 2014 The Apache Software Foundation. All rights reserved.