Tuning the metastore
Generally, you need to limit concurrent connections to Hive metastore. As the number of open connections increases, so does latency. Issues with the backend database, improper Hive use, such as extremely complex queries, a connection leak, and other factors can affect performance.
General Metastore Tuning in CDP Data Center
Try making the following changes to tune HMS performance:
- Buy an SSD for one or more Hive metastores.
- Cloudera recommends that a single query access no more than 10,000 table partitions. If the query joins tables, calculate the combined partition count accessed across all tables.
- Tune the backend (the RDBMS). HiveServer connects to HMS, and only HMS connects to the
RDBMS. The longer the backend takes, the more memory the HMS needs to respond to the same
requests. Limit the number of connections in the backend database.
MySQL: For example, in /etc/my.cnf:
[mysqld] datadir=/var/lib/mysql max_connections=8192 . . .
MariaDB: For example, in /etc/systemd/system/mariadb.service.d/limits.conf:
[Service] LimitNOFILE=24000 . . .
- Use default thrift properties
hive.server2.async.exec.threads 8192 hive.server2.async.exec.wait.queue.size 8192 hive.server2.thrift.max.worker.threads 8192
datanucleus.connectionPool.maxPoolSizefor your applications. For example, if poolSize = 100, with 3 HMS instances (one dedicated to compaction), and with 4 pools per server, you can accommodate 1200 connections.