Configure lifecycle management for logs on Azure

To avoid unnecessary costs related to ALS Gen2 cloud storage, you should create lifecycle management rules for your cloud storage container used by CDP for storing logs so that these logs get deleted once they are no longer useful.

Some examples of CDP logs stored in cloud storage are: cloudera server logs, cloudera agent logs, autossh logs, freeipa logs, ranger audit logs, datahub services logs, datalake logs, cm management services logs, and so on. These logs are mostly useful for troubleshooting, so they can be periodically deleted.

Azure allows you to set up lifecycle management rules for your ADLS cloud storage. For example, you can set a specified expiration period for a cloud storage location so that files in that location get deleted automatically on a scheduled basis. Cloudera recommends that you do this for the cloud storage location that you provided to CDP for log storage.

Consider the following when setting up lifecycle management rules:

  • As logs and data locations may overlap with each other (in case the same bucket or container is used for both), ensure to use the correct path prefixes in order to delete only the logs. The prefixes are listed below.
  • When setting an expiration period, consider how long you would like to keep the logs to allow enough time for troubleshooting. For example, in case your Data Lake, FreeIPA or Data Hub cluster is ever down, you should be able to access the logs for troubleshooting.

To set up lifecycle management:

Prefixes based on Azure environment’s logs location base

Prior to creating lifecycle management rules, review this information to ensure that you use the correct path.

The “cluster-logs” prefix is automatically generated if a bucket name without any subdirectories is used as logs location

The “cluster-logs” prefix is automatically generated if subdirectories are provided

If your environment was registered prior to February 2021: If you defined a sub-directory, then that subdirectory is used instead of “cluster-logs”
Logs location provided during environment registration abfs://mycontainer@myaccount.dfs.core.windows.net abfs://mycontainer/my-dl@myaccount.dfs.core.windows.net abfs://mycontainer/my-dl@myaccount.dfs.core.windows.net
FreeIPA prefix for lifecycle rule mycontainer/cluster-logs/freeipa mycontainer/my-dl/cluster-logs/freeipa mycontainer/my-dl/freeipa
DataLake prefix for lifecycle rule mycontainer/cluster-logs/datalake mycontainer/my-dl/cluster-logs/datalake mycontainer/my-dl/datalake
DataHub prefix for lifecycle rule mycontainer/cluster-logs/datahub mycontainer/my-dl/cluster-logs/datahub mycontainer/my-dl/datahub