Audit
As customers deploy Hadoop into corporate data and processing environments, metadata and data governance become vital parts of any enterprise-ready data lake. For this reason, Hortonworks established the Data Governance Initiative (DGI) with Aetna, Merck, Target, and SAS to introduce a common approach to Hadoop data governance into the open source community. This initiative has since evolved into a new open source project called Apache Atlas. Apache Atlas is a set of core foundational governance services that enables enterprises to meet their compliance requirements within Hadoop effectively and efficiently, and that allows integration with the complete enterprise data ecosystem. These services include:
Search and lineage for datasets
Metadata-driven data access control
Indexed and searchable centralized auditing of operational events
Data lifecycle management – ingestion to disposition
Metadata interchange with other tools
Ranger also provides a centralized framework for collecting access audit history and reporting on it, including the ability to filter the data by various parameters. HDP enhances the audit information captured by the various Hadoop components and provides insights through this centralized reporting capability.
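To make the idea of filtering audit history by parameters concrete, here is a minimal sketch in Python. It does not use Ranger's API or schema: the `AuditEvent` record, its field names, the `filter_events` helper, and the sample data are all hypothetical, chosen only to illustrate how a centralized set of access events can be narrowed by user, service, outcome, or time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit record; the fields are illustrative, not Ranger's schema."""
    time: datetime
    user: str
    service: str       # e.g. "hdfs" or "hive"
    resource: str
    access_type: str   # e.g. "read" or "select"
    allowed: bool


def filter_events(
    events: Iterable[AuditEvent],
    user: Optional[str] = None,
    service: Optional[str] = None,
    allowed: Optional[bool] = None,
    since: Optional[datetime] = None,
) -> List[AuditEvent]:
    """Return events matching every filter that is set; None means 'any'."""
    matches = []
    for event in events:
        if user is not None and event.user != user:
            continue
        if service is not None and event.service != service:
            continue
        if allowed is not None and event.allowed != allowed:
            continue
        if since is not None and event.time < since:
            continue
        matches.append(event)
    return matches


# Made-up sample data, standing in for a centralized audit store.
EVENTS = [
    AuditEvent(datetime(2015, 6, 1, 9, 0), "alice", "hdfs", "/data/sales", "read", True),
    AuditEvent(datetime(2015, 6, 1, 9, 5), "bob", "hive", "default.orders", "select", False),
    AuditEvent(datetime(2015, 6, 2, 14, 30), "alice", "hive", "default.orders", "select", True),
]

# All denied accesses, then alice's activity from 2 June onward.
denied = filter_events(EVENTS, allowed=False)
alice_recent = filter_events(EVENTS, user="alice", since=datetime(2015, 6, 2))
```

Because every component's events share one record shape in this sketch, a single filter function can answer questions that span services, which is the practical benefit of collecting audit data centrally.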