Importing and restoring data into COD database

You can import your data into your CDP Operational Database (COD) database by restoring your HBase table into COD.

  • Have a location in cloud storage (for example, S3 or ABFS) with an exported snapshot in it, and have the name of the snapshot.
    If you do not already have an exported HBase snapshot, you can export your data to cloud storage using the following command:
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot [***SNAPSHOT NAME***] -copy-to [***CLOUD STORAGE LOCATION***] -mappers 10
    
    For example, the data-from-onprem snapshot can be exported into s3a://cod-external-bucket/hbase:
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot data-from-onprem -copy-to s3a://cod-external-bucket/hbase -mappers 10
    
  • Have and edge node with a configured HBase client tarball and know how to launch the hbase shell from it. For more information, see Launching HBase shell.
  1. Get your CLOUD STORAGE LOCATION for your COD database using the COD web interface.
    s3a://my_cod_bucket/cod-12345/hbase
  2. Add your bucket to the IAM policy used by IDBroker.
  3. Launch the hbase shell from the edge node.
    For more information, see Launching HBase shell.
  4. From the edge node, run the ExportSnapshot command. Use the external bucket as the source location and the COD cloud storage as the target.
    For example:
    $ cd $HBASE_HOME
    $ ./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot “data-from-onprem” --copy-from s3a://cod-external-bucket/hbase --copy-to s3a://my_cod_bucket/cod-12345/hbase
    
  5. Use the list_snapshots command and verify that your snapshot is listed.
    $ cd $HBASE_HOME
    $ ./bin/hbase shell
    $ hbase> list_snapshots
    [‘snapshot_name’]
    
  6. Use the restore_snapshot or the clone_snapshot command to reconstitute the table.
    • restore_snapshot: Overwrites current table state with that of the snapshot. It means that any data modification applied after the snapshot was taken would be lost. If the table does not exist in the given cluster, the command automatically creates it.
    • clone_snapshot: Accepts a new table name for the table in which it restores the table schema and data.
  7. Validate that all rows are present in the table using the hbase rowcounter or count command in the hbase shell.