Loading CSV Data into an Impala Table

This demonstration uses the tips.csv dataset. Use the following steps to save the file to a project in Cloudera Machine Learning and then load it into a table in Apache Impala.
  1. Create a new Cloudera Machine Learning project.
  2. Create a folder called data and upload tips.csv to this folder. For detailed instructions, see Managing Project Files.
  3. The next steps require access to services on the CDH cluster. If Kerberos has been enabled on the cluster, enter your credentials (username, password/keytab) in Cloudera Machine Learning to enable access.
  4. Navigate back to the project Overview page and click Open Workbench.
  5. Launch a new session (Python or R).
  6. Open the Terminal.
    1. Run the following command to create an empty table in Impala called tips, stored as comma-delimited text so that it can read the CSV file. Replace <impala_daemon_hostname> with the hostname for your Impala daemon.
      impala-shell -i <impala_daemon_hostname>:21000 -q '
        CREATE TABLE default.tips (
          `total_bill` FLOAT,
          `tip` FLOAT,
          `sex` STRING,
          `smoker` STRING,
          `day` STRING,
          `time` STRING,
          `size` TINYINT)
        ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ","
        LOCATION "hdfs:///user/hive/warehouse/tips/";'
    2. Run the following command from the project root to copy the data/tips.csv file into the HDFS location that backs the Impala table.
      hdfs dfs -put data/tips.csv /user/hive/warehouse/tips/
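
The steps above can be checked with a short shell sketch. It is only an illustration, not part of the product: the function names count_bad_rows and verify_tips_table are hypothetical, and the column count of 7 is taken from the CREATE TABLE statement above. The first helper counts rows in a local CSV file that do not have exactly seven comma-separated fields, which is worth running before the upload. The second helper assumes Impala has not yet picked up the newly copied file, so it issues a REFRESH before counting the rows in the table.

```shell
#!/bin/sh

# Hypothetical helper: print the number of lines in a CSV file whose
# comma-separated field count differs from the 7 columns of default.tips.
# Note: this is a naive check that does not handle quoted commas.
count_bad_rows() {
  awk -F',' 'NF != 7 { bad++ } END { print bad + 0 }' "$1"
}

# Hypothetical helper: reload the table's file metadata and count its rows.
# $1 is the Impala daemon hostname; port 21000 matches the steps above.
verify_tips_table() {
  impala-shell -i "$1:21000" -q 'REFRESH default.tips;'
  impala-shell -i "$1:21000" -q 'SELECT COUNT(*) FROM default.tips;'
}
```

For example, run count_bad_rows data/tips.csv from the project root before the upload and expect it to print 0. After the upload, run verify_tips_table with your Impala daemon hostname and compare the reported count with the number of lines in the file.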