Chapter 4. Schema Registry Overview

The Hortonworks DataFlow Platform (HDF) provides flow management, stream processing, and enterprise services for collecting, curating, analyzing and acting on data in motion across on-premise data centers and cloud environments.

As the diagram below illustrates, Hortonworks Schema Registry is part of the enterprise services that power the HDF platform.

Schema Registry provides a shared repository of schemas that allows applications and HDF components (NiFi, Storm, Kafka, Streaming Analytics Manager, and similar) to flexibly interact with each other.

Applications built using HDF often need a way to share metadata across three dimensions:

  • Data format

  • Schema

  • Semantics or meaning of the data

The design principle behind Schema Registry is to tackle the challenges of managing and sharing schemas between HDF components in a way that supports schema evolution: a consumer and a producer can understand different versions of a schema, read all of the information shared between both versions, and safely ignore the rest.
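This evolution property can be sketched in a few lines. The following is an illustrative model only (it is not the Schema Registry API); the field definitions and the `project` helper are hypothetical, showing Avro-style resolution where a reader keeps the fields it knows, applies defaults for fields it is missing, and ignores the rest.

```python
# Hypothetical sketch of schema evolution: a v1 consumer reading v2 data
# (and vice versa) by projecting records through the reader's schema.

V1_FIELDS = {"id": None, "name": None}                # v1: required fields only
V2_FIELDS = {"id": None, "name": None, "email": ""}   # v2 adds 'email' with a default

def project(record, reader_fields):
    """Read a record through the reader's schema: keep known fields,
    fill defaulted ones, and silently drop unknown fields."""
    out = {}
    for field, default in reader_fields.items():
        if field in record:
            out[field] = record[field]
        elif default is not None:
            out[field] = default          # missing field: use the default
        else:
            raise ValueError(f"missing required field: {field}")
    return out

# A v2 producer emits a record with the new 'email' field...
v2_record = {"id": 7, "name": "sensor-a", "email": "a@example.com"}
# ...which a v1 consumer can still read, safely ignoring 'email'.
print(project(v2_record, V1_FIELDS))   # {'id': 7, 'name': 'sensor-a'}

# A v1 producer's record is readable by a v2 consumer via the default.
v1_record = {"id": 8, "name": "sensor-b"}
print(project(v1_record, V2_FIELDS))   # {'id': 8, 'name': 'sensor-b', 'email': ''}
```

Because the resolution rules live with the schema versions rather than in each application, producers and consumers can upgrade independently.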

Hence, the value that Schema Registry provides for HDF and the applications that integrate with it is the following:

  • Centralized registry – Provides reusable schemas, avoiding the need to attach a schema to every piece of data

  • Version management – Defines the relationship between schema versions so that consumers and producers can evolve at different rates

  • Schema validation – Enables generic format conversion, generic routing, and data quality
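The first two capabilities above can be made concrete with a small sketch. The class below is a hypothetical in-memory model, not the real Schema Registry client or its REST API: schemas are registered under a name, each registration creates a new version, and consumers can fetch either a specific version or the latest one.

```python
# Hypothetical in-memory registry illustrating centralized storage and
# version management (not the actual Schema Registry API).

class ToyRegistry:
    def __init__(self):
        self._schemas = {}   # schema name -> ordered list of versions

    def register(self, name, schema):
        """Add a new version under 'name' and return its version number.
        Re-registering an identical schema returns the existing version."""
        versions = self._schemas.setdefault(name, [])
        if schema in versions:
            return versions.index(schema) + 1
        versions.append(schema)
        return len(versions)

    def get(self, name, version=None):
        """Fetch a specific version, or the latest if none is given."""
        versions = self._schemas[name]
        return versions[-1] if version is None else versions[version - 1]

registry = ToyRegistry()
v1 = registry.register("truck-events", {"fields": ["id", "speed"]})
v2 = registry.register("truck-events", {"fields": ["id", "speed", "lat"]})
print(v1, v2)                          # 1 2
print(registry.get("truck-events"))    # latest: {'fields': ['id', 'speed', 'lat']}
print(registry.get("truck-events", 1)) # pinned:  {'fields': ['id', 'speed']}
```

In the real platform the registry is a shared service, so a NiFi flow, a Storm topology, and a Kafka client can all resolve the same schema name without embedding the schema in each message.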

Figure 4.1. Schema Registry Usage in Flow Management