Using the Hive Schema Tool in CDH

Use the Hive command-line schematool to upgrade or validate the Hive metastore database schema for unmanaged clusters.

Schema Version Verification and Validation

Hive records the schema version in the metastore database and verifies that the metastore schema version is compatible with the Hive binaries that are going to access the metastore. The Hive configuration properties that implicitly create or alter the existing schema are disabled by default. Consequently, Hive does not attempt to change the metastore schema implicitly. When you execute a Hive query against a metastore where the schema is not initialized or the schema is old, it fails to access the metastore and an entry similar to the following example appears in the error log:

...
Caused by: MetaException(message:Version information not found in metastore. )
        at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:5638)
...

Use Hive schematool to repair the condition that causes this error by either initializing the schema or upgrading it.

Using schematool

Use the Hive schematool to initialize the metastore schema for the current Hive version or to upgrade the schema from an older version. The tool tries to find the current schema from the metastore if it is available there.

The schematool determines the SQL scripts that are required to initialize or upgrade the schema and then executes those scripts against the metastore database. The metastore database connection information such as JDBC URL, JDBC driver, and database credentials are extracted from the Hive configuration. You can provide alternate database credentials if needed.

The following options are available as part of the schematool package:
$ schematool -help
usage: schemaTool
 -dbType <databaseType>             Metastore database type                 
 -dryRun                                  List SQL scripts (no execute)                                       
 -help                                    Print this message
 -info                                    Show config and schema details
 -initSchema                              Schema initialization
 -initSchemaTo <initTo>             Schema initialization to a version
 -passWord <password>               Override config file password
 -upgradeSchema                           Schema upgrade
 -upgradeSchemaFrom <upgradeFrom>   Schema upgrade from a version
 -userName <user>                   Override config file user name
 -validate                                Validate the database
 -verbose                                 Only print SQL statements
The dbType option must always be specified and can be one of the following:
derby|mysql|postgres|oracle

Prerequisite Configuration

Before you can use the schematool, you must add the following properties to the /etc/hive/conf/hive-site.xml file:
  • javax.jdo.option.ConnectionURL
  • javax.jdo.option.ConnectionDriverName
For example, the following hive-site.xml entries are made if you are using a MySQL database as your Hive metastore and hive1 is the database user name:
<property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://my_cluster.com:3306/hive1?useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.mysql.jdbc.Driver</value>
</property>

Usage Examples

To use the schematool command-line tool, navigate to the directory where it is located:

  • If you installed CDH using parcels, schematool is usually located at:

    /opt/cloudera/parcels/CDH/lib/hive/bin/schematool
  • If you installed CDH using packages, schematool is usually located at:

    /usr/lib/hive/bin/schematool

After you locate the executable, you can use schematool to perform the following actions:

  • Initialize your metastore to the current schema for a new Hive setup using the initSchema option.

    $ schematool -dbType mysql -initSchema -passWord <db_user_pswd> -userName
      <db_user_name>
    Metastore connection URL:
    jdbc:mysql://<cluster_address>:3306/<user_name>?useUnicode=true&characterEncoding=UTF-8
    Metastore Connection Driver : com.mysql.jdbc.Driver
    Metastore connection User: <user_name>
    Starting metastore schema initialization to <new_version>
    Initialization script hive-schema-<new_version_number>.mysql.sql
    Initialization script completed
    schemaTool completed
    
  • Get schema information using the info option.

    $ schematool -dbType mysql -info -passWord <db_user_pswd> -userName
      <db_user_name>
    Metastore connection URL:
    jdbc:mysql://<cluster_address>:3306/<user_name>?useUnicode=true&characterEncoding=UTF-8
    Metastore Connection Driver : com.mysql.jdbc.Driver
    Metastore connection User: <user_name>
    Hive distribution version:       <new_version>
    Required schema version:         <new_version>
    Metastore schema version:        <new_version>
    schemaTool completed
    
  • If you attempt to get schema information from older metastores that did not store version information or if the schema is not initialized, the tool reports an error as follows.

    $ schematool -dbType mysql -info -passWord <db_user_pswd> -userName
      <db_user_name>
    Metastore connection URL:
    jdbc:mysql://<cluster_address>:3306/<user_name>?useUnicode=true&characterEncoding=UTF-8
    Metastore Connection Driver : com.mysql.jdbc.Driver
    Metastore connection User: <user_name>
    Hive distribution version:       <new_version>
    Required schema version:         <new_version>
    org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version,
    Cause:<cause_description>
    *** schemaTool failed ***
    
  • You can upgrade the schema from a specific release by specifying the -upgradeSchemaFrom option. The -upgradeSchemaFrom option requires the Hive version and not the CDH version. See CDH 5 Packaging and Tarball Information for information about which Hive version ships with each CDH release. The following example shows how to upgrade from CDH 5.2/Hive 0.13.1:

    $ schematool -dbType mysql -passWord <db_user_pswd> -upgradeSchemaFrom
      0.13.1 -userName <db_user_name>
    Metastore connection URL:
    jdbc:mysql://<cluster_address>:3306/<user_name>?useUnicode=true&characterEncoding=UTF-8
    Metastore Connection Driver : com.mysql.jdbc.Driver
    Metastore connection User: <user_name>
    Starting upgrade metastore schema from version 0.13.1 to <new_version>
    Upgrade script upgrade-0.13.1-to-<new_version>.mysql.sql
    Completed pre-0-upgrade-0.13.1-to-<new_version>.mysql.sql
    Completed upgrade-0.13.1-to-<new_version>.mysql.sql
    schemaTool completed
    
  • Use the -validate option to verify the metastore schema. The following example shows the types of validations that are performed against the metastore schema when you use this option with schematool:

    $ schematool -dbType mysql -passWord <db_user_pswd> -userName
                   <db_user_name> -validate
    Starting metastore validation
    
    Validating schema version
    Succeeded in schema version validation.
    [SUCCESS]
    
    Validating sequence number for SEQUENCE_TABLE
    Succeeded in sequence number validation for SEQUENCE_TABLE
    [SUCCESS]
    
    Validating metastore schema tables
    Succeeded in schema table validation.
    [SUCCESS]
    
    Validating database/table/partition locations
    Succeeded in database/table/partition location validation
    [SUCCESS]
    
    Validating columns for incorrect NULL values
    Succeeded in column validation for incorrect NULL values
    [SUCCESS]
    
    Done with metastore validation: [SUCCESS]
    schemaTool completed
                 
  • If you want to find out all the required scripts for a schema upgrade, use the dryRun option.

    $ schematool -dbType mysql -upgradeSchemaFrom 0.10.0 -dryRun -passWord
                  <db_user_pswd> -userName <db_user_name>
    Metastore connection URL:
    jdbc:mysql://<cluster_address>:3306/<user_name>?useUnicode=true&characterEncoding=UTF-8
    Metastore Connection Driver : com.mysql.jdbc.Driver
    Metastore connection User: <user_name>
    Starting upgrade metastore schema from version 0.10.0 to <new_version>
    Upgrade script upgrade-0.10.0-to-0.11.0.mysql.sql
    Upgrade script upgrade-0.11.0-to-0.12.0.mysql.sql
    Upgrade script upgrade-0.12.0-to-0.13.0.mysql.sql
    Upgrade script upgrade-0.13.0-to-0.14.0.mysql.sql
    Upgrade script upgrade-0.14.0-to-1.1.0.mysql.sql
    Upgrade script upgrade-1.1.0-to-<new_version>.mysql.sql
    schemaTool completed