Encrypting and Decrypting CSV Workload Files

Encrypting and Decrypting CSV Files without Extra Columns

Before encrypting CSV workload files, identify which column contains the query text. From the left-most column, start counting from 1. For example, in the following sample CSV file, the query text is in the third column from the left, so it occupies position 3. You must specify which column contains the query text in the command you use to encrypt the workload file in step 2 below.

Sample CSV File

"SQL_ID","CPU_TIME","SQL_FULLTEXT","USER","APP","REPORT"
43958,100,"select emps.id from emps where emps.name = 'Joe' group by emps.mgr, emps.id;","Alice Johnson","Finance","Manager Report"
235,900,"select emps.name from emps where emps.num = 007 group by emps.state, emps.name;","Bill Huntington","HR","Employee Locations"
abc1,40,"select Part.partkey, Part.name, Part.type from Part where Part.yyprice > 2095;","Carrol Robinson","Operations","Inventory"
f58,440,"select Part.partkey, Part.name, Part.mfgr FROM Part WHERE Part.name LIKE '%red';","David Stanley","Operations","Inventory"
345,67000,"select count(*) as loans from account a where a.account_state_id in (5,9);","Anne McCaffrey","Finance","Liability Report"
2341,999,"select orders.key, orders.id from orders where orders.price < 9999;","David Stanley","Sales","Revenue Report"
5ef3,678,"select mgr.name from mgr where mgr.reports > 10 group by mgr.state;","David Stanley","HR","Manager Report"
7676,34,"select vp.salary from vp where vp.grade = 'Alpha' group by vp.state;","Alice Johnson","Finance","Revenue Report"
346,67053,"select count(*) as loans from account a where a.account_state_id in (5,9);","Alice Johnson","Finance","Liability Report"

For information about formatting CSV files, see Formatting CSV Files. For a full list of command options, see Anonymizer CLI Reference.

Encrypt CSV Workload Files

  1. Determine which column contains the query text.
  2. Open a terminal window, go to the directory where the Anonymizer JAR file is located, and run the following command:
    java -jar cloudera_anonymizer-<x.x>.jar anonymize csv <path_to_workload_file> -q <column_position> 
    
    Option Description/Effect
    -jar Path to the Anonymizer JAR file
    anonymize Encrypt the file
    csv <path_to_workload_file> Path to the CSV file that contains the workload
    -q <column_position> Position of the column that contains queries in the CSV file

    Anonymizer does not encrypt the first row of CSV files.To encrypt the first row, use the -h option. See the Anonymizer CLI Reference for more command-line options.

  3. Add the following when prompted by Anonymizer:
    • Location and name of the encrypted output file.
    • Location and name of the .passkey file.
    • Location and name of the error output file.

    For each file, type the path and file name and press Enter. If you press Enter without specifying path and filename, Anonymizer assigns the following names to the files:

    Input File Name Encrypted Output File Name Key File Name Error File Name
    xyz.csv anonymized_xyz.csv xyz.csv.passkey xyz.csv.err
  4. Type the name of the vendor of your database on which the queries are used and press Enter.
  5. Anonymizer prompts you to enter a strong password. Type your password and press Enter. If the password is not strong enough, Anonymizer displays the password requirements in the terminal window.
  6. Anonymizer processes the CSV file and, when finished, writes a success message to the terminal window.

To use the encrypted workload with Navigator Optimizer, see Using Anonymizer with Navigator Optimizer.

To Decrypt CSV Workload Files

  1. Open a terminal window, go to the directory where the Anonymizer JAR file is located, and run the following command:
    java -jar cloudera_anonymizer-<x.x>.jar deanonymize csv <path_to_encrypted_workload_file> -q <column_position> -k <path_to_passkey_file>
             
    Option Description/Effect
    -jar Path to the Anonymizer JAR file
    deanonymize Decrypt the file
    csv <path_to_encrypted_workload_file> Path to the encrypted CSV workload file
    -q <column_position> Position of the column that contains queries in the CSV file
    -k <path_to_passkey_file> Path to the .passkey file created when you originally encrypted the CSV workload

    See the Anonymizer CLI Reference for more command-line options.

  2. Anonymizer prompts you for the following information:
    • The location and filename of the decrypted output file. Type the path and filename and press Enter. If you press Enter without specifying the path and filename, Anonymizer assigns a default name to the file. For example, if your encrypted filename is anonymized_xyz.csv, Anonymizer names the decrypted file deanonymized_anonymized_xyz.csv.
    • The vendor of your database where the queries are used. Type the name and press Enter.
  3. Anonymizer prompts you to enter the password. Type the password that you entered when the file was encrypted, and press Enter.
  4. Anonymizer processes the encrypted file and, when finished, writes the locations of the decrypted file, the key file, and the error output file to the terminal window.

Encrypting and Decrypting Extra Columns in CSV Workload Files Used for Workload Slice and Dice

You can encrypt the contents of any extra columns you include in your CSV workload files for Workload Slice and Dice . You use the -z command-line option to encrypt and decrypt the attributes in the extra columns.

Before you encrypt your workload file, identify the position of the column that contains the query text and the columns containing attributes used for Workload Slice and Dice. To identify columns, start counting at 1 from the left-most column. For example, in the following sample CSV file, the query text is in the third column from the left, so it occupies position 3. If you are using the USER, APP, and REPORT columns for Workload Slice and Dice, these are the fourth, fifth, and sixth columns from the left, so they occupy positions 4, 5, and 6. You must specify these column positions in the command you use to encrypt the workload file, which is described in step 2 below.

"SQL_ID","CPU_TIME","SQL_FULLTEXT","USER","APP","REPORT"
43958,100,"select emps.id from emps where emps.name = 'Joe' group by emps.mgr, emps.id;","Alice Johnson","Finance","Manager Report"
235,900,"select emps.name from emps where emps.num = 007 group by emps.state, emps.name;","Bill Huntington","HR","Employee Locations"
abc1,40,"select Part.partkey, Part.name, Part.type from Part where Part.yyprice > 2095;","Carrol Robinson","Operations","Inventory"
f58,440,"select Part.partkey, Part.name, Part.mfgr FROM Part WHERE Part.name LIKE '%red';","David Stanley","Operations","Inventory"
345,67000,"select count(*) as loans from account a where a.account_state_id in (5,9);","Anne McCaffrey","Finance","Liability Report"
2341,999,"select orders.key, orders.id from orders where orders.price < 9999;","David Stanley","Sales","Revenue Report"
5ef3,678,"select mgr.name from mgr where mgr.reports > 10 group by mgr.state;","David Stanley","HR","Manager Report"
7676,34,"select vp.salary from vp where vp.grade = 'Alpha' group by vp.state;","Alice Johnson","Finance","Revenue Report"
346,67053,"select count(*) as loans from account a where a.account_state_id in (5,9);","Alice Johnson","Finance","Liability Report"

For information about formatting CSV files, see Formatting CSV Files. For a full list of command options, see Anonymizer CLI Reference.

Encrypt CSV Workload Files with Extra Columns for Workload Slice and Dice

  1. Determine which columns contain the query text and the extra attributes.
  2. Open a terminal window, go to the directory where the Anonymizer JAR file is located, and run the following command:
    java -jar cloudera_anonymizer-<x.x>.jar anonymize csv <path_to_workload_file> -q <column_position> -z <attribute_column_positions>
    
    Option Description/Effect
    -jar Path to the Anonymizer JAR file
    anonymize Encrypt the file
    csv <path_to_encrypted_workload_file> Path to the CSV file containing the workload
    -q <column_position> Position of the column that contains queries in the CSV file
    -z <attribute_column_positions Columns that contain attributes for Workload Slice and Dice, separated by commas and no spaces

    Anonymizer does not encrypt the first row of CSV files. To encrypt the first row, use the -h option. See the Anonymizer CLI Reference for more command-line options.

  3. Anonymizer prompts you for the following information:
    • Location and file name of the encrypted output file.
    • Location and file name of the .passkey file.
    • Location and file name of the error output file.

    For each file, type the path and file name and press Enter. If you press Enter without specifying a path and filename, Anonymizer assigns the following names to the files:

    Input File Name Encrypted Output File Name Key File Name Error File Name
    xyz.csv anonymized_xyz.csv xyz.csv.passkey xyz.csv.err
  4. Type the name of the vendor of your database on which the queries are used and press Enter.
  5. Anonymizer prompts you to enter a strong password. Type your password and press Enter. If the password is not strong enough, Anonymizer displays the password requirements in the terminal window.
  6. Anonymizer processes the CSV file and, when finished, writes a success message to the terminal window.

To use the encrypted workload with Navigator Optimizer, see Using Anonymizer with Navigator Optimizer.

Decrypt CSV Workload Files that Contain Extra Columns for Workload Slice and Dice

  1. Open a terminal window, go to the directory where the Anonymizer JAR file is located, and run the following command:
    java -jar cloudera_anonymizer-<x.x>.jar deanonymize csv <path_to_encrypted_workload_file> -q <column_position> -z <attribute_column_positions> -k <path_to_passkey_file> 
             
    Option Description/Effect
    -jar Path to the Anonymizer JAR file
    deanonymize Decrypt the file
    csv <path_to_encrypted_workload_file> Path to the encrypted CSV workload file
    -q <column_position> Position of the column that contains queries in the CSV file
    -z <attribute_column_positions Columns that contain attributes for Workload Slice and Dice, separated by commas and no spaces
    -k <path_to_passkey_file> Path to the .passkey file created when you originally encrypted the CSV workload

    See the Anonymizer CLI Reference for more command-line options.

  2. Anonymizer prompts you for the following information:
    • The location and file name of the decrypted output file. Type the path and file name and press Enter. If you press Enter only, Anonymizer assigns a default name to the file. For example, if your encrypted file name is anonymized_xyz.csv, Anonymizer names the decrypted file deanonymized_anonymized_xyz.csv.
    • Identify the vendor of your database where the queries are used. Type the name and press Enter.
  3. The tool prompts you to enter the password. Type the password that you originally entered when the file was encrypted, and press Enter.
  4. The tool processes the encrypted file and when finished writes a success message to the terminal window.