Remote Querying [Technical Preview]

Remote Querying in Streams Replication Manager (SRM) refers to the SRM Service's capability of querying other, remote SRM Services to fetch the remote cluster replication metrics. This allows users to monitor all replications of a deployment that has multiple instances of SRM through a single SRM Service.

Overview

The SRM Service role gathers, aggregates, and exposes metrics related to cluster replications. While a single cluster of SRM Service roles (SRM Service cluster) can be configured to target and gather metrics from multiple clusters, a setup like this can result in heavily loaded Service roles, which might not be suitable for your deployment. Instead, you can choose to have a single SRM Service cluster connect to other, remote SRM Service clusters and fetch metrics from them. This is called Remote Querying.

Using Remote Querying makes it possible to designate an SRM Service cluster in your deployment to act as a monitoring gateway. The designated SRM Service cluster can then be used to monitor all clusters and replications in your deployment. This way, a single SRM Service cluster can provide you with information on all clusters and replications. In addition, if you have Streams Messaging Manager (SMM) integrated with the SRM Service cluster acting as the gateway, information regarding all replications will be available in that SMM instance's UI.

How it works

Remote SRM Service clusters are discovered through Kafka. SRM Service clusters advertise themselves through their target Kafka cluster by writing data into a heartbeats topic. The information advertised is the Service role's protocol, host, port and root API path. When Remote Querying is configured for a specific SRM Service cluster, that SRM Service cluster connects to the specified external Kafka clusters, consumes the heartbeats topics, and based on the advertised information, discovers the remote SRM Service clusters.

Following discovery, an SRM Service cluster can cooperate with its remote counterparts and fetch the metrics related to remote replications. These metrics can then be queried using the SRM REST API, or viewed on the SMM UI.

Data locality

When the feature is enabled, all metrics are still processed locally. Each SRM Service cluster processes the metrics of its target Kafka cluster only. The SRM Service cluster configured to be the gateway does not take over and process the metrics of the remote SRM Service clusters. It only communicates with the remote SRM Service clusters to fetch and then serve their metrics.

However, because metrics processing remains local, when you enable the feature, additional traffic is generated between the gateway and remote SRM Service clusters. It is important that you take this into consideration especially if one or more of your SRM installations are located in a Public Cloud environment. For more information on the amount of data generated, see SRM Service data traffic reference.

Remote Querying example

Consider the following deployment:

There are three clusters, cluster A, B, and C. All clusters have Kafka and SRM deployed on them. Additionally, Cluster A has SMM installed as well. Bidirectional replication is happening between Cluster A and Cluster B. Additionally, unidirectional replication is set up from Cluster A to Cluster C. Each SRM Service cluster is targeting its co-located Kafka.

In this scenario, Remote Querying is configured and enabled for the SRM Service cluster A. This enables you to monitor all replications in the deployment using SRM Service cluster A. Additionally, information regarding all replications can be viewed in SMM deployed in Cluster A.

Without Remote Querying, SMM would only be able to display replications that are targeting Cluster A. If you wanted to monitor any other replications, you would need to manually query each SRM Service cluster separately using the REST API, or set up separate instances of SMM on each of the clusters.