Setting Up Hue Using the Command Line

Hue Configuration

This section describes the Hue configuration file, hue.ini. The location of hue.ini varies depending on how Hue is installed and is displayed in Cloudera Manager at Hue > Configuration.

Viewing the Hue Configuration

When you log in to Hue, the start-up page displays information about any misconfiguration detected.

To view the Hue configuration, do one of the following:

  • Visit http://myserver:port and click the Configuration tab.
  • Visit http://myserver:port/desktop/dump_config.

Hue Server Configuration

This section describes Hue Server settings.

Specifying the Hue Server HTTP Address

These configuration properties are under the [desktop] section in the Hue configuration file.

Hue uses the CherryPy web server. You can use the following options to change the IP address and port that the web server listens on. The default setting is port 8888 on all configured IP addresses.
# Webserver listens on this address and port 
http_host=0.0.0.0 
http_port=8888

Specifying the Secret Key

For security, you should specify the secret key that is used for secure hashing in the session store:

  1. Open the Hue configuration file.
  2. In the [desktop] section, set the secret_key property to a long series of random characters (30 to 60 characters is recommended). For example,
    secret_key=qpbdxoewsqlkhztybvfidtvwekftusgdlofbcfghaswuicmqp

Authentication

In a non-secure deployment, the first user who logs in to Hue can choose any username and password and automatically becomes an administrator. This user can create other user and administrator accounts. Hue users should correspond to the Linux users who use Hue; make sure you use the same name as the Linux username.

By default, user information is stored in the Hue database. However, the authentication system is pluggable. You can authenticate Hue with LDAP (Active Directory or OpenLDAP), or you can import users and groups from an LDAP directory.

Configuring the Hue Server for TLS/SSL

You can optionally configure Hue to serve over HTTPS. As of CDH 5, pyOpenSSL is part of the Hue build and does not need to be installed manually. To configure TLS/SSL, perform the following steps from the root of your Hue installation path:

  1. Configure Hue to use your certificate and private key by adding the following options to the Hue configuration file:
    ssl_certificate=/path/to/certificate
    ssl_private_key=/path/to/key
  2. On a production system, use a certificate signed by a well-known Certificate Authority. For testing only, you can create a self-signed certificate with the openssl command, if it is installed on your system:
    # Create a 2048-bit RSA private key
    $ openssl genrsa 2048 > host.key
    # Create a self-signed certificate from that key
    $ openssl req -new -x509 -nodes -sha256 -key host.key > host.cert
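
As a minimal sketch, the two options could sit in hue.ini as shown below. Placing them under [desktop] follows the usual hue.ini layout, and the /etc/hue/ssl directory is a hypothetical location for the host.cert and host.key files, not a path required by Hue.

```ini
[desktop]
  # Hypothetical paths; point these at wherever host.cert and host.key are stored.
  ssl_certificate=/etc/hue/ssl/host.cert
  ssl_private_key=/etc/hue/ssl/host.key
```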

Authentication Backend Options for Hue

The following table lists the authentication backends that Hue can be configured with, including the SAML backend, which enables single sign-on authentication. The backend configuration property is in the [[auth]] section under [desktop].

backend

django.contrib.auth.backends.ModelBackend

This is the default authentication backend used by Django.

desktop.auth.backend.AllowAllBackend

This backend does not require a password for users to log in. All users are automatically authenticated and the username is set to what is provided.

desktop.auth.backend.AllowFirstUserDjangoBackend

This is the default Hue backend. It creates the first user that logs in as the super user. After this, it relies on Django and the user manager to authenticate users.

desktop.auth.backend.LdapBackend

Authenticates users against an LDAP service.

desktop.auth.backend.PamBackend

Authenticates users with PAM (Pluggable Authentication Modules). The authentication mode depends on the PAM module used.

desktop.auth.backend.SpnegoDjangoBackend

SPNEGO is an authentication mechanism negotiation protocol. Authentication can be delegated to an authentication server, such as a Kerberos KDC, depending on the mechanism negotiated.

desktop.auth.backend.RemoteUserDjangoBackend

Authenticates remote users with the Django backend.

desktop.auth.backend.OAuthBackend

Delegates authentication to a third-party OAuth server.

libsaml.backend.SAML2Backend

Security Assertion Markup Language (SAML) single sign-on (SSO) backend. Delegates authentication to the configured Identity Provider.
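
As a minimal sketch of where the property goes, the fragment below selects the PAM backend. The choice of PamBackend is only an illustration; any backend name from the list above can be used in its place.

```ini
[desktop]
  [[auth]]
  # Illustrative choice; replace with the backend that matches your deployment.
  backend=desktop.auth.backend.PamBackend
```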

Beeswax Configuration

In the [beeswax] section of the configuration file, you can optionally specify the following:

hive_server_host

The fully qualified domain name or IP address of the host running HiveServer2.

hive_server_port

The port of the HiveServer2 Thrift server.

Default: 10000.

hive_conf_dir

The directory containing hive-site.xml, the HiveServer2 configuration file.
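
A [beeswax] section using these three properties might look like the following sketch. The host name is hypothetical, and /etc/hive/conf is assumed to be where hive-site.xml lives on the Hue host.

```ini
[beeswax]
  # Hypothetical host name; use the FQDN of your HiveServer2 host.
  hive_server_host=hiveserver2.example.com
  # Documented default port.
  hive_server_port=10000
  # Assumed location of hive-site.xml.
  hive_conf_dir=/etc/hive/conf
```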

Impala Query UI Configuration

In the [impala] section of the configuration file, you can optionally specify the following:

server_host

The hostname or IP address of the Impala Server.

Default: localhost.

server_port

The port of the Impalad Server.

Default: 21050

impersonation_enabled

Enables or disables impersonation when Hue communicates with Impala.

Default: False
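
For illustration, an [impala] section that points Hue at a remote Impala daemon and enables impersonation could look like this. The host name is a placeholder.

```ini
[impala]
  # Hypothetical host name of an Impala daemon.
  server_host=impalad01.example.com
  # Documented default port.
  server_port=21050
  # Off by default; enabled here only as an example.
  impersonation_enabled=true
```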

DB Query Configuration

The DB Query app can have any number of databases configured in the [[databases]] section under [librdbms]. A database is known by its section name (mysql, postgresql, and oracle as in the list below).

Database Type Configuration Properties

MySQL, Oracle or PostgreSQL:

[[[mysql]]]

# Name to show in the UI.      
## nice_name="My SQL DB"      

# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, name is the instance of the Oracle server. For Express Edition,
# this is 'xe' by default.
## name=mysqldb      

# Database backend to use. This can be:      
# 1. mysql      
# 2. postgresql      
# 3. oracle      
## engine=mysql      

# IP or hostname of the database to connect to.      
## host=localhost 
     
# Port the database server is listening to. Defaults are:      
# 1. MySQL: 3306      
# 2. PostgreSQL: 5432      
# 3. Oracle Express Edition: 1521      
## port=3306      

# Username to authenticate with when connecting to the database.      
## user=example     
 
# Password matching the username to authenticate with when      
# connecting to the database.      
## password=example
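
The template above is fully commented out. A filled-in entry for a PostgreSQL database, nested under [librdbms] and [[databases]] as described at the start of this section, might look like the sketch below. Every value is a placeholder.

```ini
[librdbms]
  [[databases]]
    [[[postgresql]]]
    # All values below are placeholders for illustration.
    nice_name="Reporting DB"
    name=reporting
    engine=postgresql
    host=db.example.com
    port=5432
    user=example
    password=example
```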

Pig Editor Configuration

In the [pig] section of the configuration file, you can optionally specify the following:

remote_data_dir

Location on HDFS where the Pig examples are stored.

Sqoop Configuration

In the [sqoop] section of the configuration file, you can optionally specify the following:

server_url

The URL of the sqoop2 server.

Job Browser Configuration

By default, any user can see submitted job information for all users. You can restrict viewing of submitted job information by optionally setting the following property under the [jobbrowser] section in the Hue configuration file:

share_jobs

Indicate that jobs should be shared with all users. If set to false, they will be visible only to the owner and administrators.
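
For example, the following fragment restricts job visibility to the owner and administrators:

```ini
[jobbrowser]
  # Only the job owner and administrators can see a submitted job.
  share_jobs=false
```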

Job Designer

In the [jobsub] section of the configuration file, you can optionally specify the following:

remote_data_dir

Location in HDFS where the Job Designer examples and templates are stored.

Oozie Editor/Dashboard Configuration

By default, any user can see all workflows, coordinators, and bundles. You can restrict viewing of workflows, coordinators, and bundles by configuring either of the following properties under the [oozie] section of the Hue configuration file:

oozie_jobs_count

Maximum number of Oozie workflows, coordinators, or bundles to retrieve in one API call.

remote_data_dir

The location in HDFS where Oozie workflows are stored.

As of CDH 5.4, Hue uses a new editor for Oozie documents. If documents were created in the old editor, they won't immediately be available to users other than the document owner. To resolve this problem, the document owner can share any documents again. Alternatively, you can revert to the old editor by setting the flag use_new_editor=false in the [oozie] section of the Hue configuration file.

Also see Liboozie Configuration.
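
A short sketch of an [oozie] section that reverts to the old editor and caps the number of items fetched per call. The value 50 is an arbitrary illustration, not a recommended setting.

```ini
[oozie]
  # Revert to the editor used before CDH 5.4.
  use_new_editor=false
  # Illustrative cap on workflows, coordinators, or bundles per API call.
  oozie_jobs_count=50
```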

Search Configuration

In the [search] section of the configuration file, you can optionally specify the following:

security_enabled

Indicate whether Solr requires clients to perform Kerberos authentication.

empty_query

Query sent when no term is entered.

Default: *:*.

solr_url

URL of the Solr server.
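
As an illustration, a [search] section for an unsecured Solr server might look like this. The host name is hypothetical, and port 8983 is assumed because it is Solr's usual default.

```ini
[search]
  # Hypothetical Solr endpoint.
  solr_url=http://solr.example.com:8983/solr/
  # Set to true only when Solr requires Kerberos authentication.
  security_enabled=false
  # Documented default query.
  empty_query=*:*
```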

HBase Configuration

In the [hbase] section of the configuration file, you can optionally specify the following:

truncate_limit

Hard limit of rows or columns per row fetched before truncating.

Default: 500

hbase_clusters

Comma-separated list of HBase Thrift servers for clusters in the format of "(name|host:port)".

Default: (Cluster|localhost:9090)

HBase Impersonation: To enable the HBase app to use impersonation, perform the following steps:
  1. Ensure you have a secure HBase Thrift server.
  2. Enable impersonation for the Thrift server by adding the following properties to hbase-site.xml on each Thrift gateway:
    <property>
      <name>hbase.regionserver.thrift.http</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.thrift.support.proxyuser</name>
      <value>true</value>
    </property>

    See: Configure doAs Impersonation for the HBase Thrift Gateway.

  3. Configure Hue to point to a valid HBase configuration directory. You will find this property under the [hbase] section of the hue.ini file.

    hbase_conf_dir

    HBase configuration directory, where hbase-site.xml is located.

    Default: /etc/hbase/conf
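
Putting the [hbase] properties together, a sketch of the section could read as follows. The Thrift host name is a placeholder; the other values are the documented defaults.

```ini
[hbase]
  # Hypothetical Thrift host; the cluster name and port match the documented default.
  hbase_clusters=(Cluster|hbase-thrift.example.com:9090)
  # Documented defaults.
  truncate_limit=500
  hbase_conf_dir=/etc/hbase/conf
```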

User Admin Configuration

In the [useradmin] section of the configuration file, you can optionally specify the following:

default_user_group

The name of the group to which a manually created user is automatically assigned.

Default: default.

Configuring an LDAP Server for User Admin

See Authenticate Hue Users with LDAP and Synchronize Hue with LDAP Server.

User Admin can interact with an LDAP server, such as Active Directory, in one of two ways:

  • You can import user and group information from your current Active Directory infrastructure using the LDAP Import feature in the User Admin application. User authentication is then performed by User Admin based on the imported user and password information. You can then manage the imported users, along with any users you create directly in User Admin.
  • You can configure User Admin to use an LDAP server as the authentication back end, which means users logging in to Hue will authenticate to the LDAP server, rather than against a username and password kept in User Admin. In this scenario, your users must all reside in the LDAP directory.

Enabling Import of Users and Groups from an LDAP Directory

User Admin can import users and groups from an Active Directory using the Lightweight Directory Access Protocol (LDAP). To use this feature, you must configure User Admin with a set of LDAP settings in the Hue configuration file.

  1. In the Hue configuration file, configure the following properties in the [[ldap]] section:

    Property

    Description

    Example

    base_dn

    The search base for finding users and groups.

    base_dn="DC=mycompany,DC=com"

    nt_domain

    The NT domain to connect to (only for use with Active Directory).

    nt_domain=mycompany.com

    ldap_url

    URL of the LDAP server.

    ldap_url=ldap://auth.mycompany.com

    ldap_cert

    Path to certificate for authentication over TLS (optional).

    ldap_cert=/mycertsdir/myTLScert

    bind_dn

    Distinguished name of the user to bind as – not necessary if the LDAP server supports anonymous searches.

    bind_dn="CN=ServiceAccount,DC=mycompany,DC=com"

    bind_password

    Password of the bind user – not necessary if the LDAP server supports anonymous searches.

    bind_password=P@ssw0rd

  2. Configure the following properties in the [[[users]]] section:

    Property

    Description

    Example

    user_filter

    Base filter for searching for users.

    user_filter="objectclass=*"

    user_name_attr

    The username attribute in the LDAP schema.

    user_name_attr=sAMAccountName

  3. Configure the following properties in the [[[groups]]] section:

    Property

    Description

    Example

    group_filter

    Base filter for searching for groups.

    group_filter="objectclass=*"

    group_name_attr

    The group name attribute in the LDAP schema.

    group_name_attr=cn
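
Assembled into one fragment, the example values from the three steps above might look like the sketch below. Nesting [[ldap]] under [desktop] is an assumption based on the usual hue.ini layout; all values are the sample values from the steps, not working credentials.

```ini
[desktop]
  [[ldap]]
  base_dn="DC=mycompany,DC=com"
  nt_domain=mycompany.com
  ldap_url=ldap://auth.mycompany.com
  # Optional: needed only for authentication over TLS.
  ldap_cert=/mycertsdir/myTLScert
  # Optional if the LDAP server supports anonymous searches.
  bind_dn="CN=ServiceAccount,DC=mycompany,DC=com"
  bind_password=P@ssw0rd

    [[[users]]]
    user_filter="objectclass=*"
    user_name_attr=sAMAccountName

    [[[groups]]]
    group_filter="objectclass=*"
    group_name_attr=cn
```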

Enabling the LDAP Server for User Authentication

You can configure User Admin to use an LDAP server as the authentication back end, which means users logging in to Hue will authenticate to the LDAP server, rather than against usernames and passwords managed by User Admin.

  1. In the Hue configuration file, configure the following properties in the [[ldap]] section:

    Property

    Description

    Example

    ldap_url

    URL of the LDAP server, prefixed by ldap:// or ldaps://

    ldap_url=ldap://auth.mycompany.com

    search_bind_authentication

    Search bind authentication is now the default instead of direct bind. To revert to direct bind, set this property to false. When search bind is used, Hue ignores the nt_domain and ldap_username_pattern properties that follow.

    search_bind_authentication=false

    nt_domain

    The NT domain over which the user connects (not strictly necessary if using ldap_username_pattern).

    nt_domain=mycompany.com

    ldap_username_pattern

    Pattern for searching for usernames. Use <username> for the username parameter. For use when using LdapBackend for Hue authentication.

    ldap_username_pattern="uid=<username>,ou=People,dc=mycompany,dc=com"
  2. If you are using TLS or secure ports, add the following property to specify the path to a TLS certificate file:

    Property

    Description

    Example

    ldap_cert

    Path to certificate for authentication over TLS.

    ldap_cert=/mycertsdir/myTLScert
  3. In the [[auth]] sub-section inside [desktop], change the following:

    backend

    Change the setting of backend from
    backend=desktop.auth.backend.AllowFirstUserDjangoBackend
    to
    backend=desktop.auth.backend.LdapBackend
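
The steps above can be sketched as a single fragment. This example assumes direct bind with a username pattern and that [[ldap]] sits under [desktop] as in the usual hue.ini layout; the URL and pattern are the sample values from the steps.

```ini
[desktop]
  [[auth]]
  backend=desktop.auth.backend.LdapBackend

  [[ldap]]
  ldap_url=ldap://auth.mycompany.com
  # Direct bind, so that ldap_username_pattern is honored.
  search_bind_authentication=false
  ldap_username_pattern="uid=<username>,ou=People,dc=mycompany,dc=com"
```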

Hadoop Configuration

The following configuration variables are under the [hadoop] section in the Hue configuration file.

HDFS Cluster Configuration

Hue currently supports only one HDFS cluster, which you define under the [[hdfs_clusters]] sub-section. The following properties are supported:

[[[default]]]

The section containing the default settings.

fs_defaultfs

The equivalent of fs.defaultFS (also referred to as fs.default.name) in a Hadoop configuration.

webhdfs_url

The HttpFS URL. The default value is the HTTP port on the NameNode.
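
A minimal sketch of the HDFS cluster definition follows. The NameNode host name is hypothetical, and port 8020 for the NameNode RPC endpoint is an assumption; check fs.defaultFS in your Hadoop configuration for the real value.

```ini
[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
    # Hypothetical NameNode host; 8020 is assumed as the RPC port.
    fs_defaultfs=hdfs://namenode.example.com:8020
    # WebHDFS on the NameNode HTTP port.
    webhdfs_url=http://namenode.example.com:50070/webhdfs/v1/
```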

YARN (MRv2) and MapReduce (MRv1) Cluster Configuration

Job Browser can display both MRv1 and MRv2 jobs, but must be configured to display one type at a time by specifying either [[yarn_clusters]] or [[mapred_clusters]] sections in the Hue configuration file.

The following YARN cluster properties are defined under the [[yarn_clusters]] sub-section:

[[[default]]]

The section containing the default settings.

resourcemanager_host

The fully qualified domain name of the host running the ResourceManager.

resourcemanager_port

The port for the ResourceManager IPC service.

submit_to

Set this to true if Oozie is configured to use a YARN cluster. It indicates that Hue should submit jobs to this YARN cluster.

proxy_api_url

URL of the ProxyServer API.

Default: http://localhost:8088

history_server_api_url

URL of the HistoryServer API.

Default: http://localhost:19888
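
For illustration, a YARN cluster definition using these properties might look like the sketch below. The host names are placeholders, and 8032 is assumed as the ResourceManager IPC port because it is the usual YARN default.

```ini
[hadoop]
  [[yarn_clusters]]
    [[[default]]]
    # Hypothetical host names.
    resourcemanager_host=rm.example.com
    # Assumed ResourceManager IPC port.
    resourcemanager_port=8032
    # Submit jobs to this YARN cluster.
    submit_to=true
    proxy_api_url=http://rm.example.com:8088
    history_server_api_url=http://jhs.example.com:19888
```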

The following MapReduce cluster properties are defined under the [[mapred_clusters]] sub-section:

[[[default]]]

The section containing the default settings.

jobtracker_host

The fully qualified domain name of the host running the JobTracker.

jobtracker_port

The port for the JobTracker IPC service.

submit_to

Set this to true if Oozie is configured to use a 0.20 MapReduce service. It indicates that Hue should submit jobs to this MapReduce cluster.

Liboozie Configuration

In the [liboozie] section of the configuration file, you can optionally specify the following:

security_enabled

Indicate whether Oozie requires clients to perform Kerberos authentication.

remote_deployment_dir

The location in HDFS where the workflows and coordinators are deployed when submitted by a non-owner.

oozie_url

The URL of the Oozie server.
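
A sketch of a [liboozie] section for an unsecured Oozie server. The host name and HDFS path are hypothetical, and port 11000 is assumed because it is the usual Oozie default.

```ini
[liboozie]
  # Hypothetical host; 11000 is assumed as the Oozie server port.
  oozie_url=http://oozie.example.com:11000/oozie
  # Set to true only when Oozie requires Kerberos authentication.
  security_enabled=false
  # Hypothetical HDFS path for deployments submitted by non-owners.
  remote_deployment_dir=/user/hue/oozie/deployments
```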

Sentry Configuration

In the [libsentry] section of the configuration file, specify the following:

hostname

Hostname or IP of server.

Default: localhost

port

The port where the Sentry service is running.

Default: 8038

sentry_conf_dir

Sentry configuration directory, where sentry-site.xml is located.

Default: /etc/sentry/conf
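
As an example, a [libsentry] section that points Hue at a remote Sentry server could read as follows. Only the host name is hypothetical; the port and directory are the documented defaults.

```ini
[libsentry]
  # Hypothetical host name of the Sentry server.
  hostname=sentry.example.com
  # Documented defaults.
  port=8038
  sentry_conf_dir=/etc/sentry/conf
```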

Hue will also automatically pick up the HiveServer2 server name from Hive's sentry-site.xml file at /etc/hive/conf.

If you have enabled Kerberos for the Sentry service, allow Hue to connect to the service by adding the hue user to the following property in the /etc/sentry/conf/sentry-store-site.xml file:
<property>
  <name>sentry.service.allow.connect</name>
  <value>impala,hive,solr,hue</value>
</property>

ZooKeeper Configuration

In the [zookeeper] section of the configuration file, you can specify the following:

host_ports

Comma-separated list of ZooKeeper servers in the format "host:port".

Example: localhost:2181,localhost:2182,localhost:2183

rest_url

The URL of the REST Contrib service (required for znode browsing).

Default: http://localhost:9998

Setting up REST Service for ZooKeeper

ZooKeeper Browser requires the ZooKeeper REST service to be running. Follow the instructions below to set this up.

Step 1: Clone and build the ZooKeeper repository

git clone https://github.com/apache/zookeeper
cd zookeeper
ant
Buildfile: /home/hue/Development/zookeeper/build.xml

init:
[mkdir] Created dir: /home/hue/Development/zookeeper/build/classes
[mkdir] Created dir: /home/hue/Development/zookeeper/build/lib
[mkdir] Created dir: /home/hue/Development/zookeeper/build/package/lib
[mkdir] Created dir: /home/hue/Development/zookeeper/build/test/lib
…

Step 2: Start the REST service

cd src/contrib/rest
nohup ant run&

Step 3: Update ZooKeeper configuration properties (if required)

If ZooKeeper and the REST service are not on the same machine as Hue, update the Hue configuration file and specify the correct hostnames and ports as shown in the sample configuration below:

[zookeeper]
...
  [[clusters]]
  ...
    [[[default]]]
      # ZooKeeper ensemble. Comma-separated list of host:port pairs.
      # e.g. localhost:2181,localhost:2182,localhost:2183
      ## host_ports=localhost:2181

      # The URL of the REST contrib service
      ## rest_url=http://localhost:9998

You should now be able to successfully run the ZooKeeper Browser app.

Configuring CDH Components for Hue

To enable communication between the Hue Server and CDH components, you must make minor changes to your CDH installation by adding the properties described in this section to your CDH configuration files in /etc/hadoop/conf/. If you are installing on a cluster, make the following configuration changes to your existing CDH installation on each node in your cluster.

WebHDFS or HttpFS Configuration

Hue can use either of the following to access HDFS data:

  • WebHDFS provides high-speed data transfer with good locality because clients talk directly to the DataNodes inside the Hadoop cluster.
  • HttpFS is a proxy service appropriate for integration with external systems that are not behind the cluster's firewall.

Both WebHDFS and HttpFS use the HTTP REST API so they are fully interoperable, but Hue must be configured to use one or the other. For HDFS HA deployments, you must use HttpFS.

To configure Hue to use either WebHDFS or HttpFS, perform the following steps:

  1. For WebHDFS only:
    1. Add the following property in hdfs-site.xml to enable WebHDFS in the NameNode and DataNodes:
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
    2. Restart your HDFS cluster.
  2. Configure Hue as a proxy user for all other users and groups, meaning it may submit a request on behalf of any other user:

    WebHDFS: Add to core-site.xml:

    <!-- Hue WebHDFS proxy user setting -->
    <property>
      <name>hadoop.proxyuser.hue.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hue.groups</name>
      <value>*</value>
    </property>

    HttpFS: Verify that /etc/hadoop-httpfs/conf/httpfs-site.xml has the following configuration:

    <!-- Hue HttpFS proxy user setting -->
    <property>
      <name>httpfs.proxyuser.hue.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>httpfs.proxyuser.hue.groups</name>
      <value>*</value>
    </property>
    If the configuration is not present, add it to /etc/hadoop-httpfs/conf/httpfs-site.xml and restart the HttpFS daemon.
  3. Verify that core-site.xml has the following configuration:
    <property>
      <name>hadoop.proxyuser.httpfs.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.httpfs.groups</name>
      <value>*</value>
    </property>
    If the configuration is not present, add it to /etc/hadoop/conf/core-site.xml and restart Hadoop.
  4. With root privileges, update hadoop.hdfs_clusters.default.webhdfs_url in hue.ini to point to the address of either WebHDFS or HttpFS.
    [hadoop]
    [[hdfs_clusters]]
    [[[default]]]
    # Use WebHdfs/HttpFs as the communication mechanism.
    WebHDFS:
    ...
    webhdfs_url=http://FQDN:50070/webhdfs/v1/

    HttpFS:

    ...
    webhdfs_url=http://FQDN:14000/webhdfs/v1/

Oozie Configuration

To run DistCp, Streaming, Pig, Sqoop, and Hive jobs in Job Designer or the Oozie Editor/Dashboard application, see Installing the Oozie ShareLib in Hadoop HDFS for instructions.

To configure Hue as a default proxy user, add the following properties to /etc/oozie/conf/oozie-site.xml:
<!-- Default proxyuser configuration for Hue -->
<property>
    <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
    <value>*</value>
</property>

Search Configuration

See Search Configuration for details on how to configure the Search application for Hue.

HBase Configuration

See HBase Configuration for details on how to configure the HBase Browser application.

Hive Configuration

The Beeswax daemon has been replaced by HiveServer2. Hue should therefore point to a running HiveServer2. This change involved the following major updates to the [beeswax] section of the Hue configuration file, hue.ini.

[beeswax]
  # Host where Hive server Thrift daemon is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  ## hive_server_host=<FQDN of HiveServer2>

  # Port on which the HiveServer2 Thrift server runs.
  ## hive_server_port=10000

Existing Hive Installation

In the Hue configuration file hue.ini, modify hive_conf_dir to point to the directory containing hive-site.xml.

No Existing Hive Installation

Familiarize yourself with the configuration options in hive-site.xml. See Hive Installation. Having a hive-site.xml is optional but often useful, particularly when setting up a metastore. You can locate it using the hive_conf_dir configuration variable.

Permissions

See File System Permissions in the Hive Installation section.