Loading CSV Data into an Impala Table

For this demonstration, we will be using the tips.csv dataset. Use the following steps to save this file to a project in Cloudera Machine Learning, and then load it into a table in Apache Impala.

Create a new Cloudera Machine Learning project.
Create a folder called data and upload tips.csv to this folder. For detailed instructions, see Managing Project Files.
The next steps require access to services on the CDH cluster. If Kerberos has been enabled on the cluster, enter your credentials (username, password/keytab) in Cloudera Machine Learning to enable access.
Navigate back to the project Overview page and click Open Workbench.
Launch a new session (Python or R).

Open the Terminal.

Run the following command to create an empty table in Impala called tips. Replace <impala_daemon_hostname> with the hostname for your Impala daemon.

```
impala-shell -i <impala_daemon_hostname>:21000 -q '
  CREATE TABLE default.tips (
    `total_bill` FLOAT,
    `tip` FLOAT,
    `sex` STRING,
    `smoker` STRING,
    `day` STRING,
    `time` STRING,
    `size` TINYINT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
  LOCATION "hdfs:///user/hive/warehouse/tips/";'
```
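
The table definition above declares seven comma-delimited columns, so it can help to confirm that every row of the CSV file has seven fields before uploading it. The following is a minimal sketch, not part of the original procedure: check_csv_columns is a hypothetical helper name, and it assumes a plain comma-delimited file in which no quoted field contains a comma.

```shell
# Hypothetical helper (not a Cloudera tool): report rows whose field count
# differs from the expected number of columns.
# Assumption: simple CSV with no quoted fields that contain commas.
# Usage: check_csv_columns <file> <expected_column_count>
check_csv_columns() {
  awk -F',' -v n="$2" '
    BEGIN { bad = 0 }
    NF != n { printf "line %d has %d fields, expected %d\n", NR, NF, n; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Example for this walkthrough (run from the project root):
# check_csv_columns data/tips.csv 7 && echo "column count OK"
```

The function returns a non-zero exit status if any row has an unexpected number of fields, so it can be chained with && ahead of the upload step.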

Run the following command to load data from the data/tips.csv file into the Impala table by copying the file into the table's HDFS location.
```
hdfs dfs -put data/tips.csv /user/hive/warehouse/tips/
```
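
Impala caches file metadata, so a file copied directly into HDFS may not be visible to queries until the table is refreshed. The following sketch, which is an addition to the original steps, refreshes the table and counts its rows to confirm the load. The function name refresh_and_count_tips and the IMPALA_HOST variable are hypothetical; set IMPALA_HOST to the same Impala daemon hostname used earlier.

```shell
# Hypothetical helper: refresh Impala's metadata for default.tips, then
# count the rows to confirm that the uploaded file is readable.
# Assumption: IMPALA_HOST holds your Impala daemon hostname.
refresh_and_count_tips() {
  # Make Impala pick up files added to the table location through HDFS.
  impala-shell -i "${IMPALA_HOST}:21000" -q 'REFRESH default.tips;' &&
  # Count the rows now visible in the table.
  impala-shell -i "${IMPALA_HOST}:21000" -q 'SELECT COUNT(*) FROM default.tips;'
}

# Example:
# IMPALA_HOST=<impala_daemon_hostname>
# refresh_and_count_tips
```

If the CSV file begins with a header row, that row is counted too, because the table definition used here does not skip it.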
