Dynamic Resource Pools
Minimum Required Role: Configurator (also provided by Cluster Administrator, Full Administrator)
A dynamic resource pool is a named configuration of resources and a policy for scheduling the resources among YARN applications and Impala queries running in the pool. Dynamic resource pools allow you to schedule and allocate resources to YARN applications and Impala queries based on a user's access to specific pools and the resources available to those pools. If a pool's allocation is not in use, it can be preempted and distributed to other pools. Otherwise, a pool receives a share of resources in accordance with the pool's weight. Dynamic resource pools have access control lists (ACLs) that restrict who can submit work to and administer them.
A configuration set defines the allocation of resources across pools that can be active at a given time. For example, you can define "day time" and "off hour" configuration sets, for which you specify different resource allocations during the daytime and for the remaining time of the week.
A scheduling rule defines when a configuration set is active. The configurations in the configuration set are propagated to the fair scheduler allocation file as required by the schedule. The updated files are stored in the YARN ResourceManager configuration directory /var/run/cloudera-scm-agent/process/nn-yarn-RESOURCEMANAGER on the host running the ResourceManager role. See Server and Client Configuration.
Resource pools can be nested, with subpools restricted by the settings of their parent pool.
The resources available for sharing are subject to the allocations made for each service if static service pools (cgroups) are enforced. For example, if the static pool for YARN is 75% of the total cluster resources, resource pools will use only 75% of resources.
After you create or edit dynamic resource pool settings, Refresh Dynamic Resource Pools and Discard Changes buttons display. Click Refresh Dynamic Resource Pools to propagate the settings to the fair scheduler allocation file (by default, fair-scheduler.xml). The updated files are stored in the YARN ResourceManager configuration directory /var/run/cloudera-scm-agent/process/nn-yarn-RESOURCEMANAGER on the host running the ResourceManager role. See Server and Client Configuration.
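As a rough illustration of what a fair scheduler allocation file contains, the following sketch builds a small allocation document in Python. The pool names and values are hypothetical, and the file that Cloudera Manager actually writes may contain additional elements; only the queue, weight, maxRunningApps, and schedulingPolicy elements of the Fair Scheduler format are shown.

```python
import xml.etree.ElementTree as ET


def build_allocations(pools):
    """Return fair-scheduler-style XML text for a dict of pool settings."""
    root = ET.Element("allocations")
    for name, cfg in pools.items():
        queue = ET.SubElement(root, "queue", name=name)
        ET.SubElement(queue, "weight").text = str(cfg["weight"])
        ET.SubElement(queue, "maxRunningApps").text = str(cfg["max_running_apps"])
        ET.SubElement(queue, "schedulingPolicy").text = cfg["policy"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical pools and values, for illustration only.
xml_text = build_allocations({
    "production": {"weight": 5.0, "max_running_apps": 20, "policy": "drf"},
    "development": {"weight": 1.0, "max_running_apps": 5, "policy": "fair"},
})
print(xml_text)
```

You normally do not write this file by hand; the sketch is only meant to make the relationship between pool settings and the allocation file concrete.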
For information on determining how allocated resources are actually used, see Cluster Utilization Reports.
Managing Dynamic Resource Pools
The main entry point for using dynamic resource pools with Cloudera Manager is the Dynamic Resource Pool Configuration menu item.
Viewing Dynamic Resource Pool Configuration
The configuration page lists the following properties for each pool:
- YARN - Weight, Virtual Cores, Min and Max Memory, Max Running Apps, and Scheduling Policy
- Impala Admission Control - Max Memory, Max Running Queries, Max Queued Queries, Queue Timeout, and Default Query Memory Limit
To view the configuration:
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click the YARN or Impala Admission Control tab.
Creating a YARN Dynamic Resource Pool
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click the YARN tab.
- Click Create Resource Pool. The Create Resource Pool dialog box displays showing the Configuration Sets tab.
- Specify a name and resource limits for the pool:
- In the Resource Pool Name field, specify the pool name. Enter a unique name containing only alphanumeric characters. If referencing a user or group name that contains a ".", replace the "." with "_dot_".
- Choose a configuration set. Specify a weight that indicates that pool's share of resources relative to other pools, minimum and maximums for virtual cores and memory, and a limit on the number of applications that can run simultaneously in the pool.
- Click the Scheduling Policy tab and select a policy:
- Dominant Resource Fairness (DRF) (default) - An extension of fair scheduling for more than one resource. DRF determines CPU and memory resource shares based on the availability of those resources and the needs of the job.
- Fair (FAIR) - Determines resource shares based on memory.
- First-In, First-Out (FIFO) - Determines resource shares based on when a job was added.
- If you have enabled Fair Scheduler preemption, click the Preemption tab and optionally set a preemption timeout to specify how long a job in this pool must wait before it can preempt resources from jobs in other pools. To enable preemption, follow the procedure in Enabling and Disabling Fair Scheduler Preemption.
- If you have enabled ACLs and specified users or groups, optionally click the Submission and Administration Access Control tabs to specify which users and groups can submit applications and which users can view all and kill applications. By default, anyone can submit, view all, and kill applications. To restrict either of these permissions, select Allow these users and groups and provide a comma-delimited list of users and groups in the Users and Groups fields respectively.
- Click Create.
- Click Refresh Dynamic Resource Pools.
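To make the DRF policy described above more concrete, here is a minimal sketch of the usual dominant-share calculation. It assumes the textbook form of DRF (a pool's dominant share is its largest fractional use of any resource, and the pool with the smallest dominant share is served next) and ignores pool weights; the capacities and usage figures are hypothetical.

```python
def dominant_share(usage, capacity):
    """Largest fraction of any single resource that a pool currently uses."""
    return max(usage[resource] / capacity[resource] for resource in capacity)


def next_pool_drf(pool_usage, capacity):
    """Pick the pool with the smallest dominant share (simplified DRF)."""
    return min(pool_usage, key=lambda pool: dominant_share(pool_usage[pool], capacity))


# Hypothetical cluster capacity and per-pool usage.
capacity = {"vcores": 100, "memory_gb": 400}
pool_usage = {
    "production": {"vcores": 40, "memory_gb": 80},    # dominant resource: vcores (0.40)
    "development": {"vcores": 10, "memory_gb": 120},  # dominant resource: memory (0.30)
}
print(next_pool_drf(pool_usage, capacity))
```

By contrast, the FAIR policy as described above would compare memory only, and FIFO would compare submission times.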
Enabling and Disabling Dynamic Resource Pools for Impala
By default, admission control and dynamic resource pools for Impala are disabled. Until both are enabled, the Impala Admission Control tab does not appear in the Dynamic Resource Pool Configuration tab. Therefore, you typically enable and disable both of these features simultaneously. To enable and disable Impala dynamic resource pools, follow the procedure in Enabling and Disabling Impala Admission Control Using Cloudera Manager.
Creating an Impala Dynamic Resource Pool
There is always a resource pool designated as root.default. By default, all Impala queries run in this pool when the dynamic resource pool feature is enabled for Impala. You create additional pools when your workload includes identifiable groups of queries (such as from a particular application, or a particular group within your organization) that have their own requirements for concurrency, memory use, or service level agreement (SLA). Each pool has its own settings related to memory, number of queries, and timeout interval.
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click the Impala Admission Control tab.
- Click Create Resource Pool.
- Specify a name and resource limits for the pool:
- In the Resource Pool Name field, type a unique name containing only alphanumeric characters.
- Optionally click the Submission Access Control tab to specify which users and groups can submit queries. By default, anyone can submit queries. To restrict this permission, select the Allow these users and groups option and provide a comma-delimited list of users and groups in the Users and Groups fields respectively.
- Click Create.
- Click Refresh Dynamic Resource Pools.
Choosing Settings for an Impala Dynamic Resource Pool
- Max Memory
- The maximum amount of aggregate memory, cluster-wide, that can be used by all queries running concurrently in this pool. In conjunction with the Default Query Memory Limit setting and the number of Impala hosts in the cluster, Impala determines the expected maximum memory used by all queries in the pool, and holds back any further queries once the estimated upper limit is reached.
For example, consider the following scenario:
- The cluster is running impalad daemons on 5 DataNodes.
- A dynamic resource pool has a Max Memory setting of 100 GB.
- The Default Query Memory Limit for the pool is 10 GB. Therefore, Impala expects that any query running in this pool could use up to 50 GB of memory (default query memory limit * number of Impala nodes).
- The maximum number of queries that Impala will execute concurrently within this dynamic resource pool is 2, because that is the most that could be accommodated within the 100 GB Max Memory cluster-wide limit.
- There is no memory penalty if queries use less memory than the Default Query Memory Limit per-host setting or the Max Memory cluster-wide limit. These values are only used to estimate how many queries can be run concurrently within the resource constraints for the pool.
- Max Running Queries
- The maximum number of queries that can run concurrently in this pool. The default value is unlimited. Any queries for this pool that arrive while the pool is at capacity go into the admission control queue until other queries finish. This setting is useful during the early stages of resource management, where you do not have extensive data about query memory usage, but you can still determine that the cluster performs better overall if some throttling is applied to Impala queries.
For a workload with many small queries, you typically specify a high value for this setting, or leave it with the default setting of "unlimited". This setting is more significant for a workload with expensive queries, where some number of concurrent queries saturate the memory, I/O, CPU, or network capacity of the cluster. In this case, set the value low enough that the cluster resources are not overcommitted for Impala.
Even once you have enabled memory-based admission control using other pool settings, you might leave this setting in place as a safeguard. Queries go into the queue if they exceed either the total estimated memory or the maximum number of concurrent queries for the pool.
- Max Queued Queries
- The maximum number of queries that can be in the admission control queue for a pool at any one time. Further queries that attempt to run in the pool are cancelled, until some running queries finish and other queries begin running, reducing the size of the queue. The default value is 200. Typically, this value does not need to be adjusted, because if a large number of queries are queued, you address the situation by changing other parameters such as the timeout interval or increasing the capacity of the pool.
- Queue Timeout
- The amount of time (measured in milliseconds) that a query can wait in the admission control queue for this pool before being cancelled. The default value is 60,000 (60 seconds).
Typically, in a low-concurrency workload, few or no queries are queued. Or, in an environment without a strict SLA, it does not matter if queries occasionally take much longer than normal because they are held back in the admission control queue. In cases like these, this setting is not significant and you can specify a high value to avoid cancelling queries unexpectedly. You might also need to increase the value somewhat to use Impala with some business intelligence tools that have their own timeout intervals for queries.
In a high-concurrency workload, especially for queries with a tight SLA, long wait times in the admission control queue can represent a serious problem. For example, if a query needs to run in 10 seconds, and you have tuned it so that it runs in 8 seconds, then it will violate its SLA if it waits in the admission control queue for more than 2 seconds. In a case like this, set a relatively low timeout value and monitor how many queries are cancelled because of timeouts. This technique helps you to discover capacity, tuning, and scaling problems early, and helps avoid wasting resources by running expensive queries that have already missed their SLA.
If you can identify some queries that are OK with a high timeout value, while other queries benefit from having a low timeout value, consider creating separate pools with different values for this setting.
- Default Query Memory Limit
- The equivalent of setting the MEM_LIMIT query option for each query that runs in this pool. This value represents the maximum memory the query can use on each host. If the query exceeds that much memory on a given host, it will activate the spill-to-disk mechanism, and possibly be cancelled if available memory is too low. Impala multiplies this default memory limit value by the number of Impala hosts in the cluster to estimate how many queries will fit within the total RAM represented by the Max Memory setting, which represents a cluster-wide limit.
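The arithmetic behind these settings can be sketched as follows. This is a simplified model of the estimates described above, not Impala's actual admission control code; the function names are hypothetical, and the decision logic ignores details such as queries that are already waiting in the queue.

```python
def estimated_query_memory_gb(default_query_mem_limit_gb, impala_hosts):
    """Per-query cluster-wide estimate: per-host limit times number of hosts."""
    return default_query_mem_limit_gb * impala_hosts


def max_concurrent_by_memory(max_memory_gb, default_query_mem_limit_gb, impala_hosts):
    """How many queries fit within the pool's Max Memory setting."""
    per_query = estimated_query_memory_gb(default_query_mem_limit_gb, impala_hosts)
    return max_memory_gb // per_query


def admission_decision(running, queued, max_memory_gb, per_query_gb,
                       max_running=None, max_queued=200):
    """Return 'admit', 'queue', or 'reject' for one newly arriving query."""
    memory_ok = (running + 1) * per_query_gb <= max_memory_gb
    slots_ok = max_running is None or running < max_running
    if memory_ok and slots_ok:
        return "admit"
    if queued < max_queued:
        return "queue"
    return "reject"


def queue_wait_budget_ms(sla_seconds, tuned_runtime_seconds):
    """Longest queue wait (in milliseconds) that still meets the SLA."""
    return int((sla_seconds - tuned_runtime_seconds) * 1000)


# Scenario from the text: 5 hosts, 100 GB Max Memory, 10 GB default limit.
print(max_concurrent_by_memory(100, 10, 5))   # 2
print(queue_wait_budget_ms(10, 8))            # 2000
```

With the scenario values, a third query arriving while two are running would be queued, and it would be rejected only if the queue already held Max Queued Queries entries.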
Editing Dynamic Resource Pools
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click the YARN or Impala Admission Control tab.
- Click Edit at the right of a resource pool row. Edit the properties.
- If you have enabled ACLs and specified users or groups, optionally click the Submission and Administration Access Control tabs to specify which users and groups can submit applications and which users can view all and kill applications. By default, anyone can submit, view all, and kill applications. To restrict either of these permissions, select Allow these users and groups and provide a comma-delimited list of users and groups in the Users and Groups fields respectively.
- Click Save.
- Click Refresh Dynamic Resource Pools.
YARN Pool Status and Configuration Options
Viewing Dynamic Resource Pool Status
- Go to the YARN service.
- Click the Resource Pools tab.
Adding Subpools
Pools can be nested as subpools. They share among their siblings the resources of the parent pool. Each subpool can have its own resource restrictions; if those restrictions fall within the configuration of the parent pool, the limits for the subpool take effect. If the limits for the subpool exceed those of the parent, the parent limits take effect.
Once you create subpools, jobs cannot be submitted to the parent pool; they must be submitted to a subpool.
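The rule for subpool limits amounts to taking the smaller of the subpool and parent values. A minimal sketch, with a hypothetical function name and example values:

```python
def effective_limit(subpool_limit, parent_limit):
    """A subpool limit applies only while it stays within the parent's limit."""
    return min(subpool_limit, parent_limit)


# Hypothetical memory limits in GB.
print(effective_limit(40, 100))   # subpool limit within the parent: 40
print(effective_limit(150, 100))  # subpool limit exceeds the parent: 100
```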
- Open the Dynamic Resource Pool Configuration page.
- At the right of a resource pool row, open the drop-down menu and select Create Subpool. Configure subpool properties.
- Click Create.
- Click Refresh Dynamic Resource Pools.
Configuring Default YARN Fair Scheduler Properties
- Open the Dynamic Resource Pool Configuration page.
- Click the YARN tab.
- Click the Default Settings button.
- Specify the default scheduling policy, maximum applications, and preemption timeout properties.
- Click Save.
- Click Refresh Dynamic Resource Pools.
Setting YARN User Limits
Pool properties determine the maximum number of applications that can run in a pool. To limit the number of applications specific users can run at the same time in a pool:
- Open the Dynamic Resource Pool Configuration page.
- Click the User Limits tab. The table displays a list of users and the maximum number of jobs each user can submit.
- Click Add User Limit.
- Specify a username. Enter a unique name containing only alphanumeric characters. If referencing a user or group name that contains a ".", replace the "." with "_dot_".
- Specify the maximum number of running applications.
- Click Create.
- Click Refresh Dynamic Resource Pools.
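The "_dot_" substitution required for user and group names can be expressed as a one-line helper. The function name is hypothetical; the replacement itself is the one described in the steps above.

```python
def pool_safe_name(name):
    """Replace each "." in a user or group name with "_dot_"."""
    return name.replace(".", "_dot_")


print(pool_safe_name("first.last"))  # first_dot_last
```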
Enabling ACLs
To specify whether ACLs are checked:
- Open the Dynamic Resource Pool Configuration page.
- Click the Other Settings button.
- In the Enable ResourceManager ACLs property, select the YARN service.
- Click Save Changes to commit the changes.
- Return to the Home page by clicking the Cloudera Manager logo.
- Invoke the cluster restart wizard.
- Click Restart Stale Services.
- Click Restart Now.
- Click Finish.
Configuring ACLs
To configure which users and groups can submit and kill YARN applications in any resource pool:
- Enable ACLs.
- Open the Dynamic Resource Pool Configuration page.
- Click the Other Settings button.
- In the Admin ACL property, specify which users and groups can submit and kill applications.
- Click Save Changes to commit the changes.
- Return to the Home page by clicking the Cloudera Manager logo.
- Invoke the cluster restart wizard.
- Click Restart Stale Services.
- Click Restart Now.
- Click Finish.
Defining Resource Allocations with Configuration Sets
A configuration set defines the allocation of resources across pools that can be active at a given time. For example, you can define "day time" and "off hour" configuration sets, for which you specify different resource allocations during the daytime and for the remaining time of the week.
You create configuration sets while creating scheduling rules. Once you have created a configuration set, you can configure its properties, such as weight, minimum and maximum memory and virtual cores, and maximum running applications.
Specifying Configuration Set Properties
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click the Resource Pools tab.
- For each resource pool, click Edit.
- Select a configuration set name.
- Edit the configuration set properties and click Save.
- Click Refresh Dynamic Resource Pools.
Example Configuration Sets
The Daytime configuration set assigns the production pool five times the resources of the development pool.
The Off Hour configuration set assigns the production and development pools an equal share of the resources.
See example scheduling rules for these configuration sets.
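The effect of the two example configuration sets can be approximated by dividing the available resources in proportion to pool weights. The sketch below is a simplification that assumes both pools have enough demand to use their full share; the weights 5 and 1, the 120 virtual cores, and the 75% static pool for YARN are illustrative values, not taken from a real cluster.

```python
def weighted_shares(weights, total_resources, static_pool_fraction=1.0):
    """Split the resources available to YARN among pools in proportion to weight."""
    available = total_resources * static_pool_fraction
    total_weight = sum(weights.values())
    return {pool: available * w / total_weight for pool, w in weights.items()}


# Hypothetical weights matching the example configuration sets.
daytime = {"production": 5, "development": 1}
off_hour = {"production": 1, "development": 1}

# 120 virtual cores in the cluster, 75% of which belong to the YARN static pool.
print(weighted_shares(daytime, 120, 0.75))   # production 75.0, development 15.0
print(weighted_shares(off_hour, 120, 0.75))  # 45.0 each
```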
Viewing the Properties of a Configuration Set
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- In the Configuration Sets drop-down list, select a configuration set. The properties of each pool for that configuration set display.
Configuring Configuration Set Schedules
A scheduling rule defines when a configuration set is active. The configurations in the configuration set are propagated to the fair scheduler allocation file as required by the schedule. The updated files are stored in the YARN ResourceManager configuration directory /var/run/cloudera-scm-agent/process/nn-yarn-RESOURCEMANAGER on the host running the ResourceManager role. See Server and Client Configuration.
Example Scheduling Rules
Consider the example Daytime and Off Hour configuration sets. You can define scheduling rules so that the Daytime configuration set is active every weekday from 8:00 a.m. to 5:00 p.m. and the Off Hour configuration set is active at all other times (evenings and weekends).
Adding a Scheduling Rule
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click the Scheduling Rules tab.
- Click Create Scheduling Rule.
- In the Configuration Set field, choose the configuration set to which the rule applies. Select the Create New or Use Existing option.
- Do one of the following:
- If you create a new configuration set, type a name in the Name field.
- If you use an existing configuration set, select one from the drop-down list.
- If the rule should repeat, keep the Repeat field selected, and specify the repeat frequency, and if the frequency is weekly, the repeat day or days.
- If the rule should not repeat, deselect the Repeat field, click the left side of the on field to display a drop-down calendar where you set the starting date and time. When you specify the date and time, a default time window of two hours is set in the right side of the on field. Click the right side to adjust the date and time.
- Click Create.
- Click Refresh Dynamic Resource Pools.
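The example Daytime and Off Hour schedule described earlier can be modeled as a simple lookup. This is only a sketch of the intended outcome, not of how Cloudera Manager evaluates rules: it assumes rules are checked in order, that the first matching rule wins, and that the 5:00 p.m. end time is exclusive.

```python
from datetime import datetime, time

# Weekdays use Python's convention: Monday is 0, Sunday is 6.
RULES = [
    {"config_set": "Daytime", "weekdays": {0, 1, 2, 3, 4},
     "start": time(8, 0), "end": time(17, 0)},
]
FALLBACK_CONFIG_SET = "Off Hour"


def active_configuration_set(now):
    """Return the name of the configuration set active at the given moment."""
    for rule in RULES:
        in_day = now.weekday() in rule["weekdays"]
        in_window = rule["start"] <= now.time() < rule["end"]
        if in_day and in_window:
            return rule["config_set"]
    return FALLBACK_CONFIG_SET


print(active_configuration_set(datetime(2024, 1, 3, 10, 0)))  # a Wednesday morning
print(active_configuration_set(datetime(2024, 1, 6, 10, 0)))  # a Saturday morning
```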
Reordering Scheduling Rules
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click the Scheduling Rules tab.
- Click Reorder Scheduling Rules.
- Click Move Up or Move Down in a rule row.
- Click Save.
- Click Refresh Dynamic Resource Pools.
Editing a Scheduling Rule
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click Scheduling Rules.
- Click Edit at the right of a rule.
- Edit the rule.
- Click Save.
- Click Refresh Dynamic Resource Pools.
Deleting a Scheduling Rule
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click Scheduling Rules.
- At the right of a rule, open the drop-down menu and select Delete.
- Click OK.
Assigning Applications and Queries to Resource Pools
To submit a YARN application to a specific resource pool, specify the mapreduce.job.queuename property. The YARN application's queue property is mapped to a resource pool. To submit an Impala query to a specific resource pool, specify the REQUEST_POOL option.
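As an illustration, the two settings can be attached to a job submission and to an Impala session as shown below. The helper names, the jar and class names, and the pool name are hypothetical. The -D form assumes a MapReduce job that parses generic Hadoop options, and the SET statement assumes a client such as impala-shell that accepts query options in that form.

```python
def mapreduce_queue_args(pool):
    """Command-line arguments that set mapreduce.job.queuename for one job."""
    return ["-D", f"mapreduce.job.queuename={pool}"]


def impala_request_pool_statement(pool):
    """Statement that sets the REQUEST_POOL query option for a session."""
    return f"SET REQUEST_POOL={pool};"


# Hypothetical job and pool names.
command = ["hadoop", "jar", "my-job.jar", "MyJob"] + mapreduce_queue_args("production")
print(" ".join(command))
print(impala_request_pool_statement("production"))
```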
If a specific pool is not designated, Cloudera Manager provides many options for determining how YARN applications and Impala queries are automatically assigned to resource pools. You can specify a set of ordered rules for assigning applications and queries in pools. You can also specify default pool settings directly in the YARN fair scheduler configuration.
Specifying Placement Rules and Rule Ordering
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click the YARN or Impala Admission Control tab.
- Click the Placement Rules tab.
- On the Placement Rules tab, select a placement rule. The available rules are:
- YARN rules:
- specified pool; create the pool if it doesn't exist (default 1st)
- root.<username> pool; create the pool if it doesn't exist (default 2nd)
- root.<username> pool only if the pool exists
- specified pool only if the pool exists
- root.<primaryGroupName> pool; create the pool if it doesn't exist
- root.<primaryGroupName> pool only if the pool exists
- root.<secondaryGroupName> pool only if one of these pools exists
- default pool; create the pool if it doesn't exist
- Impala rules:
- specified pool only if the pool exists (default 1st)
- default pool (default 2nd)
- root.<username> pool only if the pool exists
- root.<primaryGroupName> pool only if the pool exists
- root.<secondaryGroupName> pool only if one of these pools exists
- Click Refresh Dynamic Resource Pools.
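Ordered placement rules behave like a chain in which the first rule that yields a pool decides the placement. The following sketch models a subset of the rules listed above (specified pool, root.<username> pool, and default pool). The rule encoding, function name, and the root.default fallback are assumptions made for illustration, not the scheduler's internal representation.

```python
def place(requested_pool, username, existing_pools, rules):
    """Return the pool chosen by the first matching rule, or None if none match.

    Each rule is a (kind, create_if_missing) pair, where kind is
    "specified", "user", or "default".
    """
    for kind, create_if_missing in rules:
        if kind == "specified":
            candidate = requested_pool
        elif kind == "user":
            candidate = f"root.{username}"
        elif kind == "default":
            return "root.default"
        else:
            raise ValueError(f"unknown rule kind: {kind}")
        if candidate and (create_if_missing or candidate in existing_pools):
            return candidate
    return None


# Default orderings from the lists above.
yarn_rules = [("specified", True), ("user", True)]
impala_rules = [("specified", False), ("default", True)]

print(place(None, "alice", set(), yarn_rules))
print(place("root.etl", "alice", {"root.default"}, impala_rules))
```

In the first call no pool is specified, so the YARN rules fall through to the user's pool. In the second, the requested pool does not exist, so the Impala rules fall through to the default pool.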
Reordering Placement Rules
- Open the Dynamic Resource Pool Configuration page. If the cluster has a YARN service, the YARN tab displays. If the cluster has an Impala service enabled for dynamic resource pools, the Impala Admission Control tab displays.
- Click the YARN or Impala Admission Control tab.
- Click the Placement Rules tab.
- Click Reorder Placement Rules.
- Click Move Up or Move Down in a rule row.
- Click Save.
- Click Refresh Dynamic Resource Pools.