Tuning the metastore
Generally, you need to limit concurrent connections to Hive metastore. As the number of open connections increases, so does latency. Issues with the backend database, improper Hive use, such as extremely complex queries, a connection leak, and other factors can affect performance.
General Metastore Tuning in CDP Private Cloud Base
Try making the following changes to tune HMS performance:
- Ensure that a single query accesses no more than 10,000 table partitions. If the query joins tables, calculate the combined partition count accessed across all tables.
- Tune the backend (the RDBMS). HiveServer connects to HMS, and only HMS connects to the
RDBMS. The longer the backend takes, the more memory the HMS needs to respond to the same
requests. Limit the number of connections in the backend database.
MySQL: For example, in /etc/my.cnf:
[mysqld] datadir=/var/lib/mysql max_connections=8192 . . .
MariaDB: For example, in /etc/systemd/system/mariadb.service.d/limits.conf:
[Service] LimitNOFILE=24000 . . .
- Use default thrift properties
(8K):
hive.server2.async.exec.threads 8192 hive.server2.async.exec.wait.queue.size 8192 hive.server2.thrift.max.worker.threads 8192
- Set
datanucleus.connectionPool.maxPoolSize
for your applications. For example, if poolSize = 100, with 3 HMS instances (one dedicated to compaction), and with 4 pools per server, you can accommodate 1200 connections.