Choosing a DynamoDB Table and IO Capacity

When using DynamoDB, you must pay for the IO as well as the data storage: reads, writes and queries.

You can do this in two ways:

  • Provisioned Capacity - You configure the DynamoDB table for a maximum number of reads and writes a second. You are then billed for this capacity for the life of the database, even when not using S3Guard. You are also limited to the maximum IO throughput of that provisioned capacity. This was originally the only billing mechanism for DynamoDB. Choosing an IO capacity was hard, as you needed to balance maximum throughput (usually Hive query planning) with the costs that would entail.
  • On demand- When a table is created as an On demand table, you pay per IO operation. There is no monthly fee for pre-provisioned IO capacity, and your throughput is not limited to that prepaid capacity. You are, however, billed slightly more for each individual IO Operatation.

For Cloudera applications, on demand billing mechanism is needed. When a bucket and its S3Guard table is unused, the sole dynamo DB costs are for storing the file metadata, not for any unused IO. Only when an application work with a S3Guard table, the caller is billed.

In CDP environments, S3Guard tables are always configured for on-demand billing. If you want to switch to provisioned IO capacity, you can configure using the AWS DynamoDB console.