Schema Registry with Flink
When Kafka is chosen as source and sink for your application, you can use Cloudera Schema Registry to register and retrieve schema information of the different Kafka topics. You must add Schema Registry dependency to your project and add the appropriate schema object to your Kafka topics.
There are several reasons why you should prefer the Schema Registry instead of custom
serializer implementations on both consumer and producer side:
- Offers automatic and efficient serialization/deserialization for avro and basic types (+ JSON in the future)
- Guarantees that only compatible data can be written to a given topic (assuming that every producer uses the registry)
- Supports safe schema evolution on both producer and consumer side
- Offers visibility to developers on the data types and they can track schema evolution for the different Kafka topics
Add the following Maven dependency or equivalent to use the schema registry integration in your
project:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-avro-cloudera-registry</artifactId>
<version>${flink.version}</version>
</dependency>
The schema registry can be plugged directly into the
FlinkKafkaConsumer
and
FlinkKafkaProducer
using the appropriate schema:-
org.apache.flink.formats.avro.registry.cloudera.ClouderaRegistryKafkaDeserializationSchema
-
org.apache.flink.formats.avro.registry.cloudera.ClouderaRegistryKafkaSerializationSchema
See the Apache Flink documentation for Kafka consumer and producer basics.
Supported types
Currently, the following data types are supported for producers and consumers:
- Avro Specific Record types
- Avro Generic Records
- Basic Java Data types: byte[], Byte, Integer, Short, Double, Float, Long, String, Boolean
To get started with Avro schemas and generated Java objects, see the Apache Avro documentation.
Security
You need to include every SSL configuration into a
Map
that is passed to
the Schema Registry
configuration.Map<String, String> sslClientConfig = new HashMap<>();
sslClientConfig.put(K_TRUSTSTORE_PATH, params.get(K_SCHEMA_REG_SSL_CLIENT_KEY + "." + K_TRUSTSTORE_PATH));
sslClientConfig.put(K_TRUSTSTORE_PASSWORD, params.get(K_SCHEMA_REG_SSL_CLIENT_KEY + "." + K_TRUSTSTORE_PASSWORD));
Map<String, Object> schemaRegistryConf = new HashMap<>();
schemaRegistryConf.put(K_SCHEMA_REG_URL, params.get(K_SCHEMA_REG_URL));
schemaRegistryConf.put(K_SCHEMA_REG_SSL_CLIENT_KEY, sslClientConfig);
For Kerberos authentication, Flink can maintain the authentication and ticket renewal
automatically. You can define an additional
RegistryClient
property to the
security.kerberos.login.contexts
parameter in Cloudera
Manager.security.kerberos.login.contexts=Client,KafkaClient,RegistryClient