Chapter 1. Security Introduction
Security is essential for organizations that store and process sensitive data in the Hadoop ecosystem. Many organizations must also adhere to strict corporate security policies.
Hadoop is a distributed framework for data storage and large-scale processing on clusters of commodity servers. Adding security to Hadoop is challenging because its interactions do not all follow the classic client-server pattern. In Hadoop the file system is partitioned and distributed, requiring authorization checks at multiple points; a submitted job is executed at a later time on nodes other than the one on which the client authenticated and submitted the job; secondary services such as a workflow system access Hadoop on behalf of users; and the system scales to thousands of servers and tens of thousands of concurrent tasks.
A Hadoop-powered "Data Lake" can provide a robust foundation for a new generation of Big Data analytics and insight, but can also increase the number of access points to an organization's data. As diverse types of enterprise data are pulled together into a central repository, the inherent security risks must be understood and addressed.
Hortonworks understands the importance of security and governance for every business. To ensure effective protection for our customers, we use a holistic approach based on five core security features:
Administration
Authentication and perimeter security
Authorization
Audit
Data protection
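To make the authentication pillar concrete: Hadoop ships in a "simple" mode that trusts whatever username the client presents, and securing a cluster starts with switching it to Kerberos authentication in core-site.xml. The sketch below shows the two standard Hadoop configuration keys involved; it is a minimal illustration, not a complete secure-mode setup (keytabs, principals, and per-service settings are also required).

```xml
<!-- core-site.xml: minimal sketch of switching Hadoop from "simple"
     (trust the client-supplied username) to Kerberos authentication. -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value> <!-- default is "simple" -->
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value> <!-- enable service-level authorization checks -->
  </property>
</configuration>
```

With these settings in place, every RPC to HDFS or YARN must carry a valid Kerberos ticket, which is the foundation the other pillars (authorization, audit, data protection) build on.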
This chapter provides an overview of the security features implemented in the Hortonworks Data Platform (HDP). Subsequent chapters in this guide provide more details on each of these security features.