Removing scratch directories

You need to know how to periodically clear scratch directories used by Apache Hive to prevent problems, such as failing jobs.

Scratch directories where Hive stores intermediate, or temporary, files accumulate too much data over time and overflow. You can configure Hive to remove scratch directories periodically and without user intervention. Using Cloudera Manager, you add the following properties as shown in the procedure:

hive.start.cleanup.scratchdir
Value: true
Cleans up the Hive scratch directory while starting the HiveServer.
hive.server2.clear.dangling.scratchdir
Value: true
Starts a thread in HiveServer to clear out the dangling directories from the file system, such as HDFS.
hive.server2.clear.dangling.scratchdir.interval
Example Value: 1800s
  1. In Cloudera Manager, click Clusters > Hive on Tez > Configuration. Clusters > Hive on Tez > Configuration.
  2. Search for the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml setting.
  3. In the Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml setting, click +.
  4. InName enter the property name and in value enter the value.