Securing Impala

You must be aware of this set of security features Impala provides to protect your critical and sensitive data.

The Impala security features have several objectives.

  • At the basic level, security prevents accidents or mistakes that could disrupt application processing, delete or corrupt data, or reveal data to unauthorized users.
  • Advanced security features and practices can harden the system against malicious users trying to gain unauthorized access or perform other disallowed operations.
  • The auditing feature provides a way to confirm that no unauthorized access occurred, and detect whether any such attempts were made.

Based on the above objectives, Impala security features are divided into the following categories.

Authentication
How does Impala verify the identity of the user to confirm that they really are allowed to exercise the privileges assigned to that user? Impala relies on the Kerberos subsystem for authentication.
Authorization
Which users are allowed to access which resources, and what operations are they allowed to perform? Impala relies on the open source Ranger project for authorization. By default (when authorization is not enabled), Impala does all read and write operations with the privileges of the impala user, which is suitable for a development/test environment but not for a secure production environment. When authorization is enabled, Impala uses the OS user ID of the user who runs impala-shell or other client program, and associates various privileges with each user.
TLS/SSL Encryption
This feature encrypts TLS/SSL network encryption, between Impala and client programs, and between the Impala-related daemons running on different nodes in the cluster.
Auditing
What operations were attempted, and did they succeed or not? This feature provides a way to look back and diagnose whether attempts were made to perform unauthorized operations. You use this information to track down suspicious activity, and to see where changes are needed in authorization policies. The audit data produced by this feature is collected by the Cloudera Manager product and then presented in a user-friendly form by the Cloudera Manager product.

The following are the important guidelines to protect a cluster running Impala against accidents and mistakes, or malicious attackers trying to access sensitive data. See the subsequent topics that describe the security features in detail.

  • Secure the root account. The root user can tamper with the impalad daemon, read and write the data files in HDFS, log into other user accounts, and access other system services that are beyond the control of Impala.
  • Restrict membership in the sudoers list (in the /etc/sudoers file). The users who can run the sudo command can do many of the same things as the root user.
  • Ensure the Hadoop ownership and permissions for Impala data files are restricted.
    The Impala authorization feature makes use of the HDFS file ownership and permissions mechanism.
    1. Set up users and assign them to groups at the OS level, corresponding to the different categories of users with different access levels for various databases, tables, and HDFS locations (URIs).
    2. Create the associated Linux users using the useradd command if necessary.
    3. Add Linux users to the appropriate groups with the usermod command.
  • Ensure the Hadoop ownership and permissions for Impala log files are restricted.

    If you issue queries containing sensitive values in the WHERE clause, such as financial account numbers, those values are stored in Impala log files in the Linux file system, and you must secure those files.

  • Ensure that the Impala web UI (available by default on port 25000 on each Impala node) is password-protected.
  • Create a policy file that specifies which Impala privileges are available to users in particular Hadoop groups (which by default map to Linux OS groups). Create the associated Linux groups using the groupadd command if necessary.
  • Design your databases, tables, and views with database and table structure to allow policy rules to specify simple, consistent rules.

    For example, if all tables related to an application are inside a single database, you can assign privileges for that database and use the * wildcard for the table name. If you are creating views with different privileges than the underlying base tables, you might put the views in a separate database so that you can use the * wildcard for the database containing the base tables, while specifying the specific names of the individual views.

  • Enable authorization for all impalad daemons.

    The authorization feature does not apply to the statestored daemon, which has no access to schema objects or data files.

  • Set up authentication using Kerberos, to make sure users really are who they say they are.