Using the Hive Schema Tool
Schema Version Verification
Hive now records the schema version in the metastore database and verifies that the metastore schema version is compatible with the Hive binaries that are going to access the metastore. The Hive properties to implicitly create or alter the existing schema are disabled by default. Hence, Hive will not attempt to change the metastore schema implicitly. When you execute a Hive query against an old schema, it will fail to access the metastore displaying an error message as follows:
$ build/dist/bin/hive -e "show tables" FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
The error log will contain an entry similar to the following:
... Caused by: MetaException(message:Version information not found in metastore. ) at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:5638) ...
To suppress the schema check and allow the metastore to implicitly modify the schema, you need to set the hive.metastore.schema.verification configuration property to false in hive-site.xml.
Using schematool
The Hive distribution now includes an offline tool for Hive metastore schema manipulation called schematool. This tool can be used to initialize the metastore schema for the current Hive version. It can also handle upgrading schema from an older version to the current one. The tool will try to find the current schema from the metastore if available. However, this will be applicable only to any future upgrades. In case you are upgrading from existing CDH releases like CDH 4 or CDH 3, you should specify the schema version of the existing metastore as a command line option to the tool.
The schematool figures out the SQL scripts required to initialize or upgrade the schema and then executes those scripts against the backend database. The metastore database connection information such as JDBC URL, JDBC driver and database credentials are extracted from the Hive configuration. You can provide alternate database credentials if needed.
$ schematool -help usage: schemaTool -dbType <databaseType> Metastore database type -dryRun List SQL scripts (no execute) -help Print this message -info Show config and schema details -initSchema Schema initialization -initSchemaTo <initTo> Schema initialization to a version -passWord <password> Override config file password -upgradeSchema Schema upgrade -upgradeSchemaFrom <upgradeFrom> Schema upgrade from a version -userName <user> Override config file user name -verbose Only print SQL statementsThe dbType option should always be specified and can be one of the following:
derby|mysql|postgres|oracle
Usage Examples
- Initialize your metastore to the current schema for a new Hive setup
using the initSchema
option.
$ schematool -dbType derby -initSchema Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver Metastore connection User: APP Starting metastore schema initialization to <new_version> Initialization script hive-schema-<new_version>.derby.sql Initialization script completed schemaTool completed
- Get schema information using the info
option.
$ schematool -dbType derby -info Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver Metastore connection User: APP Hive distribution version: <new_version> Required schema version: <new_version> Metastore schema version: <new_version> schemaTool completed
- If you attempt to get schema information from older metastores that did
not store version information, the tool will report an error as follows.
$ schematool -dbType derby -info Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver Metastore connection User: APP Hive distribution version: <new_version> Required schema version: <new_version> org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version. *** schemaTool failed ***
- You can upgrade schema from a CDH 4 release by specifying the upgradeSchemaFrom
option.
$ schematool -dbType derby -upgradeSchemaFrom 0.10.0 Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver Metastore connection User: APP Starting upgrade metastore schema from version 0.10.0 to <new_version> Upgrade script upgrade-0.10.0-to-<new_version>.derby.sql Completed upgrade-0.10.0-to-<new_version>.derby.sql Upgrade script upgrade-0.11.0-to-<new_version>.derby.sql Completed upgrade-0.11.0-to-<new_version>.derby.sql schemaTool completed
The Hive versions of the older CDH releases are:CDH Releases Hive Version CDH 3
0.7.0
CDH 4.0
0.8.0
CDH 4.1
0.9.0
CDH 4.2 and later
0.10.0
- If you want to find out all the required scripts for a schema upgrade,
use the dryRun
option.
$ build/dist/bin/schematool -dbType derby -upgradeSchemaFrom 0.7.0 -dryRun 13/09/27 17:06:31 WARN conf.Configuration: hive.server2.enable.impersonation is deprecated. Instead, use hive.server2.enable.doAs Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver Metastore connection User: APP Starting upgrade metastore schema from version 0.7.0 to <new_version> Upgrade script upgrade-0.7.0-to-0.8.0.derby.sql Upgrade script upgrade-0.8.0-to-0.9.0.derby.sql Upgrade script upgrade-0.9.0-to-0.10.0.derby.sql Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql Upgrade script upgrade-0.11.0-to-<new_version>.derby.sql schemaTool completed
<< Using Hive with HBase | Installing the Hive JDBC on Clients >> | |