Taking a mandatory snapshot of HDP tables
Taking a snapshot of Hive tables is mandatory before upgrading or migrating. You also need to keep track of how many tables you have before and after for comparison.
-
In Ambari, go to Services/Hive/Configs, and check the value of
hive.metastore.warehouse.dir
to determine the location of the Hive warehouse,/apps/hive/warehouse
by default. -
On any node in the cluster, as the HDFS superuser, enable snapshots.
$ sudo su - hdfs $ hdfs dfsadmin -allowSnapshot /apps/hive/warehouse
Allowing snapshot on /apps/hive/warehouse succeeded
-
Create a snapshot of the Hive warehouse.
$ hdfs dfs -createSnapshot /apps/hive/warehouse
Output includes the name and location of the snapshot.Created snapshot /apps/hive/warehouse/.snapshot/s20181204-164645.898
-
Start Hive as a user who has SELECT privileges on the tables.
$ beeline beeline> !connect jdbc:hive2:// Enter username for jdbc:hive2://: hive Enter password for jdbc:hive2://: *********
Connected to: Apache Hive (version 1.2.1000.2.6.5.0-292) Driver: Hive JDBC (version 1.2.1000.2.6.5.0-292)
-
Identify all tables outside
/apps/hive/warehouse/
.hive> USE my_database; hive> SHOW TABLES;
-
Determine the location of each table using the DESCRIBE command. For
example:
hive> DESCRIBE FORMATTED my_table partition (dt='20181130');
- Create a snapshot of the directory shown in the location section of the output.
-
Repeat steps 5-7 for each database and its tables outside
/apps/hive/warehouse/
.