1.2.8.2. Hive LLAP - Overview

Shows aggregated information across all of the nodes in the cluster, for example, the total cache memory from all the nodes. Use this dashboard to check that your cluster is configured and running correctly. For example, you might have configured 10 nodes but see executors and cache accounted for only 8 running nodes.

If you find an issue in this dashboard, open the LLAP Daemon dashboard to see which node is having the problem.
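The check described above (configured nodes versus nodes that actually report) can be sketched in a few lines. This is only an illustration: the function name, the node names, and the per-node fields `executors` and `cache_bytes` are hypothetical and are not taken from the dashboard or from any LLAP API.

```python
def summarize_cluster(configured_nodes, per_node_metrics):
    """Sum per-node values and list configured nodes that are not reporting.

    All names here are hypothetical; the dashboard performs its own aggregation.
    """
    missing = sorted(set(configured_nodes) - set(per_node_metrics))
    totals = {"executors": 0, "cache_bytes": 0}
    for metrics in per_node_metrics.values():
        totals["executors"] += metrics["executors"]
        totals["cache_bytes"] += metrics["cache_bytes"]
    return totals, missing


# Hypothetical example: 10 nodes configured, only 8 reporting.
configured = [f"node{i}" for i in range(1, 11)]
reported = {
    f"node{i}": {"executors": 4, "cache_bytes": 2 * 1024**3}
    for i in range(1, 9)
}
totals, missing = summarize_cluster(configured, reported)
print(totals, missing)
```

If the list of missing nodes is not empty, the totals on this dashboard are lower than the configuration implies, and the LLAP Daemon dashboard is the next place to look.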

Row

Metrics

Description

Overview

Total Executor Threads

Shows the total number of executors across all nodes.

Total Executor Memory

Shows the total amount of memory for executors across all nodes.

Total Cache Memory

Shows the total amount of memory for cache across all nodes.

Total JVM Memory

Shows the total amount of max Java Virtual Machine (JVM) memory across all nodes.

Cache Metrics Across All Nodes

Total Cache Usage

Shows the total amount of cache usage (Total, Remaining, and Used) across all nodes.

Average Cache Hit Rate

As the data is released from the cache, the curve should increase. For example, the first query should run at 0, the second at 80-90 seconds, and then the third 10% faster. If, instead, it decreases, there might be a problem in the cluster.

Average Cache Read Requests

Shows how many requests are being made for the cache and how many queries you are able to run that make use of the cache. If it says 0, for example, your cache might not be working properly and this grid might reveal a configuration issue.
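The two cache checks above (a hit rate that should rise, and read requests that should not stay at 0) can be sketched as follows. This assumes that the hit rate is the number of cache hits divided by the number of cache read requests; that formula, the function names, and the sample numbers are assumptions for illustration, not definitions taken from the dashboard.

```python
def cache_hit_rate(hits, requests):
    """Return hits / requests, or None when no read requests were made."""
    if requests == 0:
        return None
    return hits / requests


def check_cache(samples):
    """Evaluate (hits, requests) pairs given in query order.

    Returns the hit rate for each pair and a list of warnings.
    """
    rates = [cache_hit_rate(hits, requests) for hits, requests in samples]
    warnings = []
    if all(rate is None for rate in rates):
        warnings.append("no cache read requests: check the cache configuration")
    known = [rate for rate in rates if rate is not None]
    if len(known) >= 2 and known[-1] < known[0]:
        warnings.append("cache hit rate decreased over time")
    return rates, warnings


# Hypothetical samples: a cold first query, then warmer runs.
rates, warnings = check_cache([(0, 100), (85, 100), (90, 100)])
idle_rates, idle_warnings = check_cache([(0, 0), (0, 0)])
print(rates, warnings)
print(idle_rates, idle_warnings)
```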

Executor Metrics Across All Nodes

Total Executor Requests

Shows the total number of task requests that were handled, succeeded, failed, killed, evicted and rejected across all nodes.

Handled: Total requests across all sub-groups

Succeeded: Total requests that were processed. For example, if you have 8 core machines, the number of total executor requests would be 8.

Failed: Did not complete successfully because, for example, you ran out of memory

Rejected: If all task priorities are the same, but there are still not enough slots to fulfill the request, the system will reject some tasks

Evicted: Lower priority requests are evicted if the slots are filled by higher priority requests
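As a rough illustration of how the sub-groups relate to the handled total, the sketch below treats "handled" as the sum of the succeeded, failed, killed, evicted, and rejected counts and reports each sub-group's share. The counts are invented, and the exact accounting used by the dashboard is an assumption here.

```python
SUB_GROUPS = ("succeeded", "failed", "killed", "evicted", "rejected")


def request_breakdown(counts):
    """Return the handled total and the share of each sub-group.

    Assumes handled is the sum of all sub-groups, as the description suggests.
    """
    handled = sum(counts[name] for name in SUB_GROUPS)
    shares = {
        name: (counts[name] / handled if handled else 0.0)
        for name in SUB_GROUPS
    }
    return handled, shares


# Hypothetical counts summed across all nodes.
counts = {"succeeded": 90, "failed": 2, "killed": 3, "evicted": 1, "rejected": 4}
handled, shares = request_breakdown(counts)
print(handled, shares)
```

A growing share of rejected or evicted requests suggests that there are not enough execution slots for the submitted work.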

Total Execution Slots

Shows the total execution slots, the number of free or available slots, and number of slots occupied in the wait queue across all nodes.

Ideally, the number of available threads (blue) should be the same as the number of threads occupied in the queue.

Time to Kill Pre-empted Task (300s interval)

Shows the time taken to kill a query due to pre-emption, as percentile latencies (50th, 90th, 99th) in 300-second intervals.

Max Time To Kill Task (due to preemption)

Shows the maximum time taken to kill a task due to pre-emption. This grid and the one above show whether you are wasting a lot of time killing queries. Time spent while a task is waiting to be killed is time lost in the cluster. If your max time to kill is high, you might want to disable pre-emption.

Pre-emption Time Lost (300s interval)

Shows the time lost due to pre-emption, as percentile latencies (50th, 90th, 99th) in 300-second intervals.

Max Time Lost In Cluster (due to pre-emption)

Shows the maximum time lost due to pre-emption. If your max time to kill is high, you might want to disable pre-emption.

IO Elevator Metrics Across All Nodes

Column Decoding Time (30s interval)

Shows the percentile latencies (50th, 90th, 99th) for the time it takes to decode a column chunk (convert the encoded column chunk to column vector batches for processing) in 30-second intervals.

The cache comes from IO Elevator. It loads data from HDFS to the cache, and then from the cache to the executor. This metric shows how well the threads are performing and is useful to see that the threads are running.
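Several grids on this dashboard report 50th, 90th, and 99th percentile latencies over a fixed interval. The sketch below shows what those three numbers mean for one interval's worth of samples, using Python's standard `statistics.quantiles`. The sample values are invented, and the dashboard's metrics system may use a different quantile estimator, so treat this as an illustration only.

```python
from statistics import quantiles


def window_percentiles(samples_ms):
    """Return the 50th, 90th and 99th percentiles of one interval's samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples in the interval")
    # n=100 yields 99 cut points; cut point i (1-based) is the i-th percentile.
    cuts = quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


# Hypothetical decode times in milliseconds collected during one interval.
samples = list(range(1, 102))
result = window_percentiles(samples)
print(result)
```

A 99th percentile far above the 50th means that a few operations in the interval were much slower than the typical one.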

Max Column Decoding Time

Shows the maximum time taken to decode column chunk (convert encoded column chunk to column vector batches for processing).

JVM Metrics Across All Nodes

Average JVM Heap Usage

Shows the average amount of Java Virtual Machine (JVM) heap memory used across all nodes.

If the heap usage keeps increasing, you might run out of memory and the task failure count would also increase.
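One simple way to judge whether heap usage "keeps increasing" is to fit a straight line through evenly spaced samples and look at the slope. The sketch below does that with a plain least-squares fit. The function name and the sample values are hypothetical, and because garbage collection makes heap usage fluctuate, a positive slope over a short series is only a hint, not proof of a leak.

```python
def heap_growth_per_sample(samples_mb):
    """Return the least-squares slope of heap usage per sample interval."""
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mb) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_mb))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical heap readings in MB taken at regular intervals.
growing = heap_growth_per_sample([100, 110, 120, 130])
flat = heap_growth_per_sample([100, 100, 100])
print(growing, flat)
```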

Average JVM Non-Heap Usage

Shows the average amount of JVM non-heap memory used across all nodes.

Max GcTotalExtraSleepTime

Shows the maximum garbage collection extra sleep time in milliseconds across all nodes. Garbage collection extra sleep time measures when the garbage collection monitoring is delayed (for example, the thread does not wake up after 500 milliseconds).

Max GcTimeMillis

Shows the total maximum GC time in milliseconds across all nodes.

Total JVM Threads

Shows the total number of JVM threads that are in a NEW, RUNNABLE, WAITING, TIMED_WAITING, and TERMINATED state across all nodes.

JVM Metrics

Total JVM Heap Used

Shows the total amount of Java Virtual Machine (JVM) heap memory used in the daemon.

If the heap usage keeps increasing, you might run out of memory and the task failure count would also increase.

Total JVM Non-Heap Used

Shows the total amount of JVM non-heap memory used in the LLAP daemon.

If the non-heap memory is over-allocated, you might run out of memory and the task failure count would also increase.

Max GcTotalExtraSleepTime

Shows the maximum garbage collection extra sleep time in milliseconds in the LLAP daemon. Garbage collection extra sleep time measures when the garbage collection monitoring is delayed (for example, the thread does not wake up after 500 milliseconds).

Max GcTimeMillis

Shows the total maximum GC time in milliseconds in the LLAP daemon.

Max JVM Threads Runnable

Shows the maximum number of Java Virtual Machine (JVM) threads that are in RUNNABLE state.

Max JVM Threads Blocked

Shows the maximum number of JVM threads that are in BLOCKED state. If you are seeing spikes in the threads blocked, you might have a problem with your LLAP daemon.

Max JVM Threads Waiting

Shows the maximum number of JVM threads that are in WAITING state.

Max JVM Threads Timed Waiting

Shows the maximum number of JVM threads that are in TIMED_WAITING state.