Using Search through a Proxy for High Availability
Using a proxy server to relay requests to and from the Apache Solr service can help meet availability requirements in production clusters serving many users.
A proxy server works with a set of servers that is organized into a server group. A proxy server does not necessarily work with all servers in a deployment.
Overview of Proxy Usage and Load Balancing for Search
Configuring a proxy server to relay requests to and from the Solr service has the following advantages:
- Applications connect to a single well-known host and port, rather than keeping track of the hosts where the Solr service is running. This is especially useful for non-Java Solr clients such as web browsers or command-line tools such as curl.
- If any host running the Solr service becomes unavailable, application connection requests still succeed because you always connect to the proxy server rather than to a specific host running the Solr service.
- Users can configure an SSL-terminating proxy for Solr to secure the data exchanged with external clients without requiring SSL configuration for the Solr cluster itself. This is relevant only if the Solr cluster is deployed on a trusted network and needs to communicate with clients that may not be on the same network. Many of the advantages of SSL offloading are described in SSL Offloading, Encryption, and Certificates with NGINX.
- The "coordinator host" for each Search query potentially requires more memory and CPU cycles than the other hosts that process the query. The proxy server can issue queries using round-robin scheduling, so that each connection uses a different coordinator host. This load-balancing technique lets the hosts running the Solr service share this additional work, rather than concentrating it on a single machine.
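The round-robin behavior described in the last item can be illustrated with a minimal Python sketch. It is not how any particular proxy product is implemented. The class name, the host names, and the is_up callback are hypothetical and stand in for whatever health check the proxy software performs.

```python
from itertools import cycle
from typing import Callable, Iterable


class RoundRobinBalancer:
    """Hands out backend hosts in rotation, skipping hosts reported as down."""

    def __init__(self, hosts: Iterable[str], is_up: Callable[[str], bool]):
        self._hosts = list(hosts)
        if not self._hosts:
            raise ValueError("at least one Solr host is required")
        self._rotation = cycle(self._hosts)
        self._is_up = is_up

    def next_host(self) -> str:
        # Try each host at most once per call so a full outage fails fast.
        for _ in range(len(self._hosts)):
            host = next(self._rotation)
            if self._is_up(host):
                return host
        raise RuntimeError("no Solr host is available")


# Hypothetical host names, used only for illustration.
SOLR_HOSTS = [
    "solr1.example.com:8983",
    "solr2.example.com:8983",
    "solr3.example.com:8983",
]
down = {"solr2.example.com:8983"}  # pretend this host is unavailable

balancer = RoundRobinBalancer(SOLR_HOSTS, is_up=lambda host: host not in down)
picks = [balancer.next_host() for _ in range(4)]
```

In this sketch, successive requests alternate between the two healthy hosts, so the coordinator work is spread across them and the unavailable host is never chosen.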
The following setup steps are a general outline that applies to any load-balancing proxy software.
- Download the load-balancing proxy software. It should only need to be installed and configured on a single host.
- Configure the software, typically by editing a configuration file. Set up a port on which the load balancer listens to relay Search requests back and forth.
- Specify the host and port settings for each Solr service host. These are the hosts that the load balancer chooses from when relaying each query. In most cases, use 8983, the default query and update port.
- Run the load-balancing proxy server, pointing it at the configuration file that you set up.
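As one possible illustration of the configuration steps above, the following Python sketch generates a configuration file for HAProxy. The choice of HAProxy is an assumption, since this document does not name a specific proxy product. The function name, host names, section names (solr_front, solr_back), and timeout values are also hypothetical and should be adapted to your environment.

```python
from typing import List


def render_haproxy_config(
    solr_hosts: List[str], listen_port: int = 8983, solr_port: int = 8983
) -> str:
    """Return HAProxy configuration text that round-robins over solr_hosts."""
    lines = [
        "defaults",
        "    mode http",
        "    timeout connect 5s",   # assumed values; tune for your workload
        "    timeout client 60s",
        "    timeout server 60s",
        "",
        "frontend solr_front",
        f"    bind *:{listen_port}",  # port the load balancer listens on
        "    default_backend solr_back",
        "",
        "backend solr_back",
        "    balance roundrobin",
    ]
    # One server line per Solr host; "check" enables HAProxy health checks.
    for index, host in enumerate(solr_hosts, start=1):
        lines.append(f"    server solr{index} {host}:{solr_port} check")
    return "\n".join(lines) + "\n"


# Hypothetical host names, used only for illustration.
config_text = render_haproxy_config(["solr1.example.com", "solr2.example.com"])
```

If the generated text is saved to a file, HAProxy can be started against it with its -f option, for example haproxy -f followed by the path to that file.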
Special Proxy Considerations for Clusters Using Kerberos
In a cluster using Kerberos, applications check host credentials to verify that the host they are connecting to is the same one that is actually processing the request, to prevent man-in-the-middle attacks. To establish that the load-balancing proxy server is legitimate, perform these extra Kerberos setup steps:
- This section assumes you are starting with a Kerberos-enabled cluster. For more information, see Enabling Kerberos Authentication for CDH.
- Choose the host you will use for the proxy server. Based on the Kerberos setup procedure, it should already have an entry solr/proxy_host@realm in its keytab:
- Cloudera Manager:
- Navigate to
- Set the value of Solr Load Balancer to <hostname>:<port>, specifying the hostname and port of the proxy host.
- Click Save Changes.
- Launch the Stale Configurations wizard to restart the Solr service and any dependent services.
Cloudera Manager transparently handles the keytab and dependent service updates.
- Cloudera Manager: