Configuring Impala
After installing Impala, review and configure the mandatory and recommended settings described in this topic. If Impala is installed using Cloudera Manager, some of these configurations are completed automatically. If you installed Impala without Cloudera Manager, or if you want to customize your environment, consider making these changes.
In some cases, depending on the level of Impala, CDP, and Cloudera Manager, you might need to add particular component configuration details in one of the free-form fields on the Impala configuration pages in Cloudera Manager. These fields are labelled Safety Valve or Advanced Configuration Snippet.
- You must enable short-circuit reads, whether or not Impala was installed through Cloudera Manager. This setting goes in the Impala configuration settings, not the Hadoop-wide settings.
- If you installed Impala in an environment that is not managed by Cloudera Manager, you must enable block location tracking, and you can optionally enable native checksumming for optimal performance.
Short-Circuit Reads
Short-circuit reads require libhadoop.so (the Hadoop Native Library) to be accessible to both the server and the client. You must install it from an .rpm, .deb, or parcel to use short-circuit local reads.
To configure DataNodes for short-circuit reads:
- Copy the client core-site.xml and hdfs-site.xml configuration files from the Hadoop configuration directory to the Impala configuration directory. The default Impala configuration location is /etc/impala/conf.
- On all Impala nodes, configure the following properties in Impala's copy of hdfs-site.xml as shown:
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.domain.socket.path</name>
      <value>/var/run/hdfs-sockets/dn</value>
    </property>
    <property>
      <name>dfs.client.file-block-storage-locations.timeout.millis</name>
      <value>10000</value>
    </property>
- If /var/run/hadoop-hdfs/ is group-writable, make sure its group is root.
- After applying these changes, restart all DataNodes.
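The property settings above can be checked programmatically before restarting the DataNodes. The following is a minimal sketch using only the Python standard library. The helper names `read_properties` and `missing_short_circuit_settings` are hypothetical and are not part of Impala or Hadoop; the expected values are copied from the properties listed above.

```python
import xml.etree.ElementTree as ET

# Expected values, copied from the properties listed in this topic.
EXPECTED = {
    "dfs.client.read.shortcircuit": "true",
    "dfs.domain.socket.path": "/var/run/hdfs-sockets/dn",
    "dfs.client.file-block-storage-locations.timeout.millis": "10000",
}


def read_properties(xml_data):
    """Return a {name: value} dict from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_data)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_short_circuit_settings(xml_data, expected=EXPECTED):
    """Return {name: (expected, actual)} for settings that differ or are absent."""
    actual = read_properties(xml_data)
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Illustrative in-memory document with only one of the three properties set.
sample = (
    "<configuration>"
    "<property><name>dfs.client.read.shortcircuit</name>"
    "<value>true</value></property>"
    "</configuration>"
)
for name, (want, got) in missing_short_circuit_settings(sample).items():
    print(f"{name}: expected {want!r}, found {got!r}")
```

To check a real file, read it in binary mode and pass the contents to `missing_short_circuit_settings`, for example the hdfs-site.xml in the default Impala configuration location /etc/impala/conf. An empty result means all three properties match.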
Block Location Tracking
Enabling block location metadata lets Impala know which disk each data block is located on, which allows better utilization of the underlying disks. Impala will not start unless this setting is enabled.
To enable block location tracking:
- For each DataNode, add the following to the hdfs-site.xml file:
    <property>
      <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
      <value>true</value>
    </property>
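When many DataNodes need the same change, the edit can be scripted. The following is a minimal sketch, using the Python standard library, of adding or updating a property in a Hadoop-style configuration document. The helper `set_property` is hypothetical and is not part of Hadoop or Impala; the example works on an in-memory document rather than a real file.

```python
import xml.etree.ElementTree as ET


def set_property(root, name, value):
    """Set a property under the <configuration> root, adding it if absent."""
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return prop
    # No existing entry: append a new <property> element.
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


# Illustrative in-memory document that has the setting disabled.
root = ET.fromstring(
    "<configuration>"
    "<property><name>dfs.datanode.hdfs-blocks-metadata.enabled</name>"
    "<value>false</value></property>"
    "</configuration>"
)
set_property(root, "dfs.datanode.hdfs-blocks-metadata.enabled", "true")
print(ET.tostring(root, encoding="unicode"))
```

A real hdfs-site.xml could be loaded with `ET.parse` and saved with `ElementTree.write`. Note that the default parser does not preserve XML comments, so keep a backup of the original file if its comments matter.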
Native Checksumming
Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if that library is available.
To enable native checksumming:
If you installed CDP from packages, the native checksumming library is installed and set up correctly, and no additional steps are required.
If you installed by other means, native checksumming may not be available due to missing shared objects. Finding the message "Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" in the Impala logs indicates native checksumming may be unavailable.
To enable native checksumming, you must build and install libhadoop.so (the Hadoop Native Library).
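Searching the logs for the message quoted above can be automated. The following is a minimal sketch in Python that scans log lines for the start of that message. The helper `find_native_library_warnings` is hypothetical, and the sample lines are made up for illustration; the layout of real Impala log lines may differ.

```python
# Leading portion of the message quoted in this topic.
NATIVE_WARNING = "Unable to load native-hadoop library"


def find_native_library_warnings(lines):
    """Return (line_number, text) pairs for lines containing the warning."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if NATIVE_WARNING in line
    ]


# Made-up sample lines; only the quoted message text comes from this topic.
sample_log = [
    "illustrative line: service starting\n",
    "illustrative line: Unable to load native-hadoop library for your "
    "platform... using builtin-java classes where applicable\n",
]
for number, text in find_native_library_warnings(sample_log):
    print(f"line {number}: {text}")
```

The function accepts any iterable of lines, so an open log file can be passed directly. Any match suggests that native checksumming may be unavailable and that libhadoop.so needs to be built and installed.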