Backup directory structure

The backup directory structure in the rootPath is considered an internal detail and could change in future versions of Kudu. Additionally, the format and content of the data and metadata files is meant for the backup and restore process only and could change in future versions of Kudu. That said, understanding the structure of the backup rootPath and how it is used can be useful when working with Kudu backups.

The backup directory structure in the rootPath is as follows:
/<rootPath>/<tableId>-<tableName>/<backup-id>/
   .kudu-metadata.json
   part-*.<format>
  • rootPath: Can be used to distinguish separate backup groups, jobs, or concerns

  • tableId: The unique internal ID of the table being backed up

  • tableName: The name of the table being backed up

    • Note: Table names are URL encoded to prevent pathing issues

  • backup-id: A way to uniquely identify/group the data for a single backup run

  • .kudu-metadata.json: Contains all of the metadata to support recreating the table, linking backups by time, and handling data format changes

    Written last so that failed backups will not have a metadata file and will not be considered at restore time or backup linking time.

  • part-*.<format>: The data files containing the tables data.

    • Currently 1 part file per Kudu partition

    • Incremental backups contain an additional “RowAction” byte column at the end

    • Currently the only supported format/suffix is parquet