Configuring Proxy Users to Access HDFS

Hadoop allows you to configure proxy users that submit jobs or access HDFS on behalf of other users; this is called impersonation. When you enable impersonation, any job submitted through a proxy user runs with the impersonated user's existing privileges rather than those of a superuser (such as hdfs). All proxy users are configured in one location, core-site.xml, so Hadoop administrators can implement centralized access control.

To configure proxy users, set the hadoop.proxyuser.<proxy_user>.hosts, hadoop.proxyuser.<proxy_user>.groups, and hadoop.proxyuser.<proxy_user>.users properties in core-site.xml.
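
As a sketch of how the three properties fit together, the fragment below shows a complete core-site.xml <configuration> element for a hypothetical proxy user named svc_etl. The user name, group name, host names, and user list are illustrative placeholders, not values from a real cluster. With Hadoop's default impersonation provider, the request must come from a listed host, and the impersonated user must either appear in the users list or belong to one of the listed groups.

  <?xml version="1.0"?>
  <configuration>
    <!-- Hypothetical proxy user svc_etl: hosts it may connect from -->
    <property>
      <name>hadoop.proxyuser.svc_etl.hosts</name>
      <value>etl01.example.com,etl02.example.com</value>
    </property>
    <!-- It may impersonate any member of these groups... -->
    <property>
      <name>hadoop.proxyuser.svc_etl.groups</name>
      <value>analysts</value>
    </property>
    <!-- ...or any of these individually listed users -->
    <property>
      <name>hadoop.proxyuser.svc_etl.users</name>
      <value>user_a,user_b</value>
    </property>
  </configuration>

The other snippets in this section show individual <property> elements; each one belongs inside the <configuration> element of core-site.xml.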

For example, to allow user alice to impersonate any user who belongs to group_a or group_b, set hadoop.proxyuser.alice.groups as follows:

  <property>
    <name>hadoop.proxyuser.alice.groups</name>
    <value>group_a,group_b</value>
  </property>

To limit the hosts from which impersonated connections are allowed, use hadoop.proxyuser.<proxy_user>.hosts. For example, to allow user alice to make impersonated connections only from host_a and host_b:

  <property>
    <name>hadoop.proxyuser.alice.hosts</name>
    <value>host_a,host_b</value>
  </property>

If these properties are not configured for a proxy user, impersonation is not allowed and that user's impersonated connections fail.

For looser restrictions, use a wildcard (*) to allow impersonation from any host and of any user. For example, to allow user bob to impersonate any user belonging to any group, and from any host, set the properties as follows:

  <property>
    <name>hadoop.proxyuser.bob.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.bob.groups</name>
    <value>*</value>
  </property>
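
The wildcard can also be applied to a single property while the others stay restricted. As a hedged sketch, the fragment below lets a hypothetical proxy user named carol impersonate a user in any group, but only from one gateway host; carol and gateway01.example.com are illustrative placeholders.

  <!-- Impersonated connections are accepted only from this host -->
  <property>
    <name>hadoop.proxyuser.carol.hosts</name>
    <value>gateway01.example.com</value>
  </property>
  <!-- Wildcard: the impersonated user may belong to any group -->
  <property>
    <name>hadoop.proxyuser.carol.groups</name>
    <value>*</value>
  </property>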

The hadoop.proxyuser.<proxy_user>.hosts property also accepts a comma-separated list of IP addresses, IP address ranges in CIDR notation, and host names. For example, to allow user kate to impersonate user_a and user_b only from hosts in the 10.222.0.0/16 range (10.222.0.0 through 10.222.255.255) and from 10.113.221.221, set the proxy user properties as follows:

  <property>
    <name>hadoop.proxyuser.kate.hosts</name>
    <value>10.222.0.0/16,10.113.221.221</value>
  </property>
  <property>
    <name>hadoop.proxyuser.kate.users</name>
    <value>user_a,user_b</value>
  </property>
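
Because the hosts value is a comma-separated list, IP addresses, CIDR ranges, and host names can be mixed in a single entry. A minimal sketch, assuming a hypothetical proxy user named svc_etl, a hypothetical 192.168.10.0/24 subnet, and a hypothetical host name edge01.example.com:

  <!-- CIDR ranges, a single IP address, and a host name in one list -->
  <property>
    <name>hadoop.proxyuser.svc_etl.hosts</name>
    <value>10.222.0.0/16,10.113.221.221,192.168.10.0/24,edge01.example.com</value>
  </property>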

Proxy Users for Kerberos-Enabled Clusters

For secure clusters, the proxy users must have Kerberos credentials to impersonate another user.

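A proxy user cannot use delegation tokens to impersonate another user. In particular, a proxy user must not add its own delegation token to the UGI (UserGroupInformation object) created for the impersonated user, because that token would let the impersonated user connect to the service with the proxy user's privileges rather than its own.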

If a superuser (the proxy user) wants to give a delegation token to the proxy UGI of an impersonated user such as alice, it must first impersonate alice, obtain a delegation token for alice, and add that token to alice's proxy UGI. The delegation token's owner is then alice rather than the superuser.