Integrating Apache Hive with Apache Spark and BIPDF version

Catalog operations

Short descriptions and the syntax of catalog operations, which include creating, dropping, and describing an Apache Hive database and table from Apache Spark, helps you write HWC API apps.

Three methods of executing catalog operations are supported: .sql (recommended), .execute() ( spark.datasource.hive.warehouse.read.jdbc.mode = client), or .executeQuery() for backward compatibility in LLAP mode.

  • Set the current database for unqualified Hive table references

    hive.setDatabase(<database>)

  • Execute a catalog operation and return a DataFrame

    hive.execute("describe extended web_sales").show()

  • Show databases

    hive.showDatabases().show(100)

  • Show tables for the current database

    hive.showTables().show(100)

  • Describe a table

    hive.describeTable(<table_name>).show(100)

  • Create a database

    hive.createDatabase(<database_name>,<ifNotExists>)

  • Create an ORC table

    hive.createTable("web_sales").ifNotExists().column("sold_time_sk", "bigint").column("ws_ship_date_sk", "bigint").create()

    See the CreateTableBuilder interface section below for additional table creation options. You can also create Hive tables using hive.executeUpdate.

  • Drop a database

    hive.dropDatabase(<databaseName>, <ifExists>, <useCascade>)

  • Drop a table

    hive.dropTable(<tableName>, <ifExists>, <usePurge>)