Collecting metrics through HTTP

Metrics can be collected from a server process via its HTTP interface by visiting /metrics. The output of this page is JSON for easy parsing by monitoring services. This endpoint accepts several GET parameters in its query string:

  • /metrics?metrics=<substring1>,<substring2>,…​ - Limits the returned metrics to those which contain at least one of the provided substrings. The substrings also match entity names, so this may be used to collect metrics for a specific tablet.

  • /metrics?include_schema=1 - Includes metrics schema information such as unit, description, and label in the JSON output. This information is typically omitted to save space.

  • /metrics?compact=1 - Eliminates unnecessary whitespace from the resulting JSON, which can decrease bandwidth when fetching this page from a remote host.

  • /metrics?include_raw_histograms=1 - Include the raw buckets and values for histogram metrics, enabling accurate aggregation of percentile metrics over time and across hosts.

  • /metrics?level=info - Limits the returned metrics based on their severity level. The levels are ordered and lower levels include the levels above them. If no level is specified, debug is used to include all metrics. The valid values are:
    • debug - Metrics that are diagnostically helpful but generally not monitored during normal operation.
    • info - Generally useful metrics that operators always want to have available but may not be monitored under normal circumstances.
    • warn - Metrics which can often indicate operational oddities, which may need more investigation.

For example:

$ curl -s 'http://example-ts:8050/metrics?include_schema=1&metrics=connections_accepted'
[
    {
        "type": "server",
        "id": "kudu.tabletserver",
        "attributes": {},
        "metrics": [
            {
                "name": "rpc_connections_accepted",
                "label": "RPC Connections Accepted",
                "type": "counter",
                "unit": "connections",
                "description": "Number of incoming TCP connections made to the RPC server",
                "value": 92
            }
        ]
    }
]
$ curl -s 'http://example-ts:8050/metrics?metrics=log_append_latency'
[
    {
        "type": "tablet",
        "id": "c0ebf9fef1b847e2a83c7bd35c2056b1",
        "attributes": {
            "table_name": "lineitem",
            "partition": "hash buckets: (55), range: [(<start>), (<end>))",
            "table_id": ""
        },
        "metrics": [
            {
                "name": "log_append_latency",
                "total_count": 7498,
                "min": 4,
                "mean": 69.3649,
                "percentile_75": 29,
                "percentile_95": 38,
                "percentile_99": 45,
                "percentile_99_9": 95,
                "percentile_99_99": 167,
                "max": 367244,
                "total_sum": 520098
            }
        ]
    }
]