Configuring Proxy Users to Access HDFS

Hadoop allows you to configure proxy users to submit jobs or access HDFS on behalf of other users; this is called impersonation. When you enable impersonation, any jobs submitted using a proxy are executed with the impersonated user's existing privilege levels rather than those of a superuser (such as hdfs).

All proxy users are configured in one location, core-site.xml, for Hadoop administrators to implement centralized access control.

To configure proxy users, set the hadoop.proxyuser.<proxy_user>.hosts, hadoop.proxyuser.<proxy_group>.groups, and hadoop.proxyuser.<proxy_user>.users in core-site.xml properties.

For example, to allow user alice to impersonate a user belonging to group_a and group_b, set hadoop.proxyuser.<proxy_group>.groups as follows:

   <property>
     <name>hadoop.proxyuser.alice.groups</name>
     <value>group_a,group_b</value>
   </property>

To limit the hosts from which impersonated connections are allowed, use hadoop.proxyuser.<proxy_user>.hosts. For example, to allow user alice impersonated connections only from host_a and host_b:

<property>
   <name>hadoop.proxyuser.alice.hosts</name>
   <value>host_a,host_b</value>
</property>

If the configuration properties described are not present, impersonation is not allowed and connections will fail.

For looser restrictions, use a wildcard (*) to allow impersonation from any host and of any user. For example, to allow user bob to impersonate any user belonging to any group, and from any host, set the properties as follows:

  <property>
    <name>hadoop.proxyuser.bob.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.bob.groups</name>
    <value>*</value>
  </property>

The hadoop.proxyuser.<proxy_user>.hosts property also accepts comma-separated lists of IP addresses, IP address ranges in CIDR format, or host names. For example, to allow user kate access from hosts in the range 10.222.0.0-15 and 10.113.221.221, to impersonate user_a and user_b, set the proxy user properties as follows:

<property>
     <name>hadoop.proxyuser.super.hosts</name>
     <value>10.222.0.0/16,10.113.221.221</value>
</property>
<property>
     <name>hadoop.proxyuser.super.users</name>
     <value>user1,user2</value>
</property>