Navigator Optimizer Workload Anonymizer
Anonymizer is a command-line tool designed to help satisfy data-compliance requirements when you use cloud-based services like Cloudera Navigator Optimizer. Anonymizer uses AES-256 encryption to protect sensitive information in SQL workloads you analyze. Run Anonymizer on SQL files (.csv or semicolon-separated .sql files) to:
- Mask all literals (also known as constant values or fixed data values) in SQL queries.
- Encrypt table and column names.
For more information about Anonymizer encryption, see How secure is Anonymizer encryption?
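Anonymizer performs two types of anonymization, each described in its own section below: masking literals in workloads and encrypting database schema information.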
If you do not need to mask literals or encrypt your database schema information, you can configure Anonymizer to skip either of these anonymization operations. See the CLI Reference for more information about tool options.
Masking Literals in Workloads
Security standards such as PCI-DSS (Payment Card Industry Data Security Standard) and laws such as HIPAA mandate that PII (Personally Identifiable Information) and PHI (Protected Health Information) be treated with the highest level of confidentiality. PII/PHI is defined as information that can unambiguously identify an individual. In the case of HIPAA, for example, this could be names, addresses, or social security numbers. This kind of information appears infrequently in SQL queries.
The second-highest level of confidentiality protects a limited data set. This information, such as birthday, geographical region, or hospital admission dates, could be used to identify an individual within a population. This information is sometimes contained within the literals of SQL queries. Anonymizer, which is run on the client side, irreversibly masks literals in SQL queries before sending them to the Navigator Optimizer cloud service.
For example, a healthcare organization wants to make sure that all PHI is scrubbed from its SQL queries before they are sent to the Navigator Optimizer cloud service. To accomplish this, the organization runs Anonymizer on its workload files to mask all literals before uploading them to Navigator Optimizer.
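The idea of literal masking can be illustrated with a minimal Python sketch. This is not Anonymizer's implementation: the `mask_literals` function, the `?` placeholder, and the regular expression are all assumptions made for illustration, and the pattern only covers single-quoted strings and plain numbers. Because each literal is discarded rather than transformed, the result cannot be reversed.

```python
import re

# Hypothetical pattern, for illustration only. It covers single-quoted
# strings and plain integer/decimal numbers, not every SQL literal form.
_LITERAL = re.compile(
    r"""
    '(?:[^']|'')*'          # single-quoted string; '' is an escaped quote
    | \b\d+(?:\.\d+)?\b     # number that is not part of an identifier
    """,
    re.VERBOSE,
)


def mask_literals(sql: str, placeholder: str = "?") -> str:
    """Replace every matched literal with a fixed placeholder.

    The original value is dropped, so the masking is irreversible.
    """
    return _LITERAL.sub(placeholder, sql)


query = "SELECT name FROM patients WHERE dob = '1980-04-12' AND ward_id = 7"
print(mask_literals(query))
# prints: SELECT name FROM patients WHERE dob = ? AND ward_id = ?
```

The query keeps its shape (tables, columns, and predicates), which is what a pattern-based analysis needs, while the date and the numeric value no longer appear anywhere in the output.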
Encrypting Database Schema Information
Database schemas can contain sensitive information that might be considered intellectual property (IP). For example, a database schema can reveal how a company organizes its data, and column and table names can reveal what data a company collects. Before query workloads are sent to the Navigator Optimizer cloud service, Anonymizer encrypts all column names and table names in the workloads on the client side and secures them with a password known only to the Anonymizer user. The encrypted queries retain their structure and keep SQL keywords in plain text, but all SQL identifiers, such as the names of tables, columns, databases, views, and aliases, are encrypted using AES-256.
The Navigator Optimizer cloud service provides workload analysis based on query patterns only. All private or sensitive information is encrypted and protected by the password. Schema change recommendations are computed on the encrypted table and column names. To view recommendations, you supply the password, which is never sent to the Navigator Optimizer cloud service. Decryption happens securely on the client side, in the browser.
For example, a company uses a secret proprietary algorithm to make decisions about its customers based on public Facebook profiles. A column named "facebook-id" would reveal that the company is looking at Facebook information. Anonymizer keeps this information confidential by encrypting all column and table names on the client side before the queries are uploaded to the Navigator Optimizer cloud service.
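The following Python sketch illustrates how identifiers can be hidden while keywords and query structure stay readable. It is an illustration under stated assumptions, not Anonymizer's method: Anonymizer uses AES-256, whereas this sketch swaps in keyed HMAC-SHA256 pseudonyms from the standard library and keeps a local lookup table to map tokens back to names. The function names, the `id_` token prefix, the salt, the tiny keyword set, and the `facebook_id` column are all hypothetical, and the sketch assumes literals have already been masked.

```python
import hashlib
import hmac
import re

# Tiny illustrative keyword set; a real tool would need the full SQL grammar.
SQL_KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "AS", "JOIN", "ON"}

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def derive_key(password: str) -> bytes:
    """Derive a 256-bit key from the password, locally (assumed salt)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), b"illustrative-salt", 100_000
    )


def obscure_identifiers(sql: str, key: bytes):
    """Replace non-keyword identifiers with keyed pseudonyms.

    Returns the rewritten SQL and a token-to-name table that stays on the
    client. Stand-in for AES-256: HMAC is one-way, so the table is what
    allows names to be restored here.
    """
    mapping = {}

    def replace(match):
        word = match.group(0)
        if word.upper() in SQL_KEYWORDS:
            return word  # keywords stay in plain text
        digest = hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256)
        token = "id_" + digest.hexdigest()[:16]
        mapping[token] = word
        return token

    return _IDENTIFIER.sub(replace, sql), mapping


key = derive_key("correct horse battery staple")
hidden, names = obscure_identifiers(
    "SELECT facebook_id FROM customers WHERE facebook_id = ?", key
)
print(hidden)
```

The same name always maps to the same token under a given password, so query patterns such as repeated use of one column remain visible for analysis even though the name itself is not.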
Currently, Anonymizer has the following limitations:
- Impala SQL and HiveQL function names cannot be processed in anonymized (encrypted) workload files.
  Anonymizer encrypts Impala SQL and HiveQL function names, so the Navigator Optimizer analytic processor cannot recognize them. Consequently, Navigator Optimizer cannot include the effects of functions in its analysis of anonymized Impala and Hive workloads.
- The following features do not work as expected on encrypted workloads uploaded to Navigator Optimizer:
  - Search. However, you can search on encrypted names.
  - Compatibility highlighting.
  - PDF reports, which are generated with encrypted names.
  - DDL downloads, which contain encrypted names.