Cluster Lifecycle Management with Cloudera Manager
Cloudera Manager clusters that use parcels to provide Cloudera Runtime and other components require adequate disk space in the following locations:
|Parcel Lifecycle Path (default)||Notes|
|Local Parcel Repository Path (
This path exists only on the host where Cloudera Manager Server
The default location is
Provide sufficient space to hold all the parcels you download from all configured Remote Parcel Repository URLs. Cloudera Manager deployments that manage multiple clusters store all applicable parcels for all clusters.
Parcels are provided for each operating system, so heterogeneous clusters (clusters whose hosts run different operating systems) require more space than clusters with a single operating system.
For example, a cluster with both RHEL 6.x and 7.x hosts must hold -el6 and -el7 parcels in the Local Parcel Repository Path, which requires twice the amount of space.
Lifecycle Management and Best Practices
Delete any parcels that are no longer in use from the Cloudera Manager Administration Console (never delete them manually from the command line) to recover disk space in the Local Parcel Repository Path and, at the same time, on all managed cluster hosts that hold the parcel.
Backup Considerations
Perform regular backups of this path, and
treat them as a required part of backing up Cloudera Manager Server. If you
migrate Cloudera Manager Server to a new host or restore it from a backup (for
example, after a hardware failure), recover the full content of this path to the new
host, in the
|Parcel Cache (
Managed Hosts running a Cloudera Manager Agent stage distributed parcels into
this path (as
Provide sufficient space per-host to hold all the parcels you distribute to each host.
You can configure Cloudera Manager to remove these cached
To configure this behavior in the Cloudera Manager Administration Console, select
|Host Parcel Directory (
Managed cluster hosts running a Cloudera Manager Agent extract parcels from
Provide sufficient space on each host to hold all the parcels you distribute to that host. A typical Runtime or CDH parcel is approximately 2 GB, and some third-party parcels can exceed 3 GB. If you keep multiple parcel versions staged before and after an upgrade, plan for the additional disk space that each version consumes.
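The sizing guidance above can be turned into a quick back-of-the-envelope check. The following is a minimal sketch, not a Cloudera tool: the function names, the 20% headroom default, and the example parcel sizes are all assumptions chosen for illustration, and the directory to check is whatever Host Parcel Directory is configured in your deployment.

```python
import shutil

GB = 1024 ** 3  # bytes per gigabyte (binary convention, assumed here)


def required_parcel_space(parcel_sizes_gb, headroom=0.2):
    """Return the bytes needed to hold the given parcels plus a safety margin.

    parcel_sizes_gb: sizes of every parcel version kept on the host, in GB.
    headroom: extra fraction reserved on top of the raw total (an assumption).
    """
    return int(sum(parcel_sizes_gb) * (1 + headroom) * GB)


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    # Hypothetical example: two Runtime versions kept across an upgrade
    # (about 2 GB each) plus one third-party parcel (about 3 GB).
    staged = [2.0, 2.0, 3.0]
    needed = required_parcel_space(staged)
    # "." stands in for the configured Host Parcel Directory.
    print(f"need {needed / GB:.1f} GB, fits: {has_room('.', needed)}")
```

Extracted parcels can occupy more space than the downloaded archive, so treat the result as a lower bound and adjust the headroom to suit your environment.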
You can configure Cloudera Manager to automatically remove older parcels when they are no longer in use. As an administrator, you can always delete unused parcel versions manually, but configuring these settings ensures the cleanup happens even if you forget.
To configure this behavior in the Cloudera Manager Administration Console, selectand configure the following property:
|Activity Monitor (One-time)||
The Activity Monitor works only against a MapReduce (MR1) service, not YARN. If your deployment has fully migrated to YARN and no longer uses a MapReduce (MR1) service, your Activity Monitor database is no longer growing. If more time has passed than the default Activity Monitor retention period (14 days), the Activity Monitor has already purged its data and the database is mostly empty. If your deployment meets these conditions, consider cleaning up by dropping the Activity Monitor database (only when you are satisfied that you no longer need the data or have confirmed that it is no longer in use) and removing the Activity Monitor role.
|Service Monitor and Host Monitor (One-time)||
If you used Cloudera Manager version 4.x and have since upgraded to version 5.x: the Service Monitor and Host Monitor were migrated from their previously configured RDBMS into a dedicated time series store, one for each role. After the migration, legacy database connection information remains in the configuration for these roles. It was needed for the initial migration but is no longer used for any active work.
After the above migration has taken place, the RDBMS databases previously used by the Service Monitor and Host Monitor are no longer used. Space occupied by these databases is now recoverable. If appropriate in your environment (and you are satisfied that you have long-term backups or do not need the data on disk any longer), you can drop those databases.
|Ongoing Space Reclamation||
Cloudera Management Services automatically roll up, purge, or otherwise consolidate aged data in the background. Configure retention and purging limits per role to control how and when this occurs. These configurations are discussed per entity above. Adjust the defaults to meet your space limitations or retention needs.
Log File Storage Space
All cluster hosts write out separate log files for each role instance assigned to the host. Cluster administrators can monitor and manage the disk space used by these roles and configure log rotation to prevent log files from consuming too much disk space.
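To see which roles account for the most log space on a host, a short script can total the file sizes under each log directory. This is a minimal sketch under stated assumptions: it assumes each role writes into its own subdirectory of a common log root, and both function names and the `/var/log` path in the usage example are illustrative rather than taken from Cloudera Manager.

```python
import os


def dir_usage_bytes(root):
    """Sum the sizes of regular files under root, skipping unreadable entries."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks so that linked files are not counted twice.
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file was rotated away or is not readable; ignore it.
                continue
    return total


def usage_by_subdir(root):
    """Map each immediate subdirectory of root to its total size in bytes."""
    usage = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                usage[entry.name] = dir_usage_bytes(entry.path)
    return usage


if __name__ == "__main__":
    log_root = "/var/log"  # hypothetical log root; adjust for your hosts
    sizes = usage_by_subdir(log_root)
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 1024 ** 2:10.1f} MB  {name}")
```

Running a report like this periodically shows which role logs are growing fastest and therefore where tighter log rotation settings would recover the most space.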