Tuning the metastore

Similar to other tuning procedures, general metastore tuning involves tweaking and testing until you discover the combination of changes that improves metastore performance. Tuning suggestions include hardware and software changes.

Generally, you need to limit concurrent connections to Hive metastore. As the number of open connections increases, so does latency. Issues with the backend database, improper Hive use, such as extremely complex queries, a connection leak, and other factors can affect performance.

General Metastore Tuning

Try making the following changes to tune HMS performance:

  • Ensure that a single query accesses no more than 10,000 table partitions. If the query joins tables, calculate the combined partition count accessed across all tables.
  • Tune the backend (the RDBMS). HiveServer connects to HMS, and only HMS connects to the RDBMS. The longer the backend takes, the more memory the HMS needs to respond to the same requests. Limit the number of connections in the backend database.

    MySQL: For example, in /etc/my.cnf:

            . . .

    MariaDB: For example, in /etc/systemd/system/mariadb.service.d/limits.conf:

            . . .
  • Use default thrift properties (8K):
    hive.server2.async.exec.threads 8192
    hive.server2.async.exec.wait.queue.size 8192
    hive.server2.thrift.max.worker.threads 8192
  • Set datanucleus.connectionPool.maxPoolSize for your applications. For example, if poolSize = 100, with 3 HMS instances (one dedicated to compaction), and with 4 pools per server, you can accommodate 1200 connections.