Set up HDFS caching with Impala for improved performance.
        
Decide how much memory to devote to the HDFS cache on each host. The total
memory available for cached data is the sum of the cache sizes on all the
hosts. By default, any data block is cached on only one host, although you
can cache a block across multiple hosts by increasing the replication
factor.
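Once caching is configured, one way to see the per-host cache sizes and
their cluster-wide sum is the standard HDFS report. This is only a sketch;
the exact field names vary by Hadoop version:

  # Prints "Configured Cache Capacity", "Cache Used", and "Cache Remaining"
  # per DataNode, plus cluster-wide totals, on Hadoop versions that
  # support HDFS caching.
  hdfs dfsadmin -report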
        
- Set up HDFS caching for your Hadoop cluster:

  - Enable or disable HDFS caching through Cloudera Manager, using the
    configuration setting Maximum Memory Used for Caching for the HDFS
    service. This control sets the HDFS configuration parameter
    dfs_datanode_max_locked_memory, which specifies the upper limit of the
    HDFS cache size on each node. A sketch for checking this limit follows
    below.

  - All other manipulation of the HDFS caching settings, such as which
    files are cached, is done through the command line, either Impala DDL
    statements or the Linux hdfs cacheadmin command.
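  If you are not using Cloudera Manager, the same limit can be inspected
  directly. This is a minimal sketch, assuming a shell on a DataNode host;
  dfs.datanode.max.locked.memory is the hdfs-site.xml spelling of the
  parameter above:

    # Show the effective per-node cache limit, in bytes.
    hdfs getconf -confKey dfs.datanode.max.locked.memory

    # The locked-memory ulimit for the DataNode user must be at least as
    # large (note that ulimit -l reports kilobytes), or the DataNode
    # fails to start with caching enabled.
    ulimit -l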
 
          
         
- Using the hdfs cacheadmin command, set up one or more pools owned by the
  same user as the impalad daemon (typically impala). For example:

    hdfs cacheadmin -addPool four_gig_pool -owner impala -limit 4000000000
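  To confirm the new pool and its limit, the pool listing can be checked,
  as in the sketch below; the directive shown second is optional, and its
  path is hypothetical:

    # Show the pool created above, including its limit and usage statistics.
    hdfs cacheadmin -listPools -stats four_gig_pool

    # Optionally cache an HDFS path not managed through Impala DDL.
    hdfs cacheadmin -addDirective -path /user/shared/lookup_data -pool four_gig_pool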
- Once HDFS caching is enabled and one or more pools are available, on the
  Impala side you specify the cache pool name defined by the hdfs cacheadmin
  command in the Impala DDL statements that enable HDFS caching for a table
  or partition, such as CREATE TABLE ... CACHED IN pool or
  ALTER TABLE ... SET CACHED IN pool; see the sketch after this list.

- You can use hdfs cacheadmin -listPools to get a list of existing cache
  pools.

- You can use hdfs cacheadmin -listPools -stats to get detailed information
  about the pools.
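For example, the Impala DDL below caches a new table in the pool created
earlier and then adjusts the caching. This is a sketch; the table and
column names are hypothetical:

  -- Create a table whose data blocks are cached in the pool defined above.
  CREATE TABLE census (name STRING, census_year INT)
    CACHED IN 'four_gig_pool';

  -- Cache an existing table instead, raising the cache replication factor
  -- so that each block is cached on more than one host.
  ALTER TABLE census SET CACHED IN 'four_gig_pool' WITH REPLICATION = 2;

  -- Remove the table's data from the HDFS cache.
  ALTER TABLE census SET UNCACHED;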