You must define a plug-in class that implements the DevicePlugin interface.
DevicePlugin Interface
/**
 * The interface that every vendor plug-in must implement.
 */
public interface DevicePlugin {
    /**
     * Called first, when the device plug-in framework wants to register.
     * @return DeviceRegisterRequest {@link DeviceRegisterRequest}
     * @throws Exception
     */
    DeviceRegisterRequest getRegisterRequestInfo()
        throws Exception;

    /**
     * Called when the node resource is updated.
     * @return a set of {@link Device}, {@link java.util.TreeSet} recommended
     * @throws Exception
     */
    Set<Device> getDevices() throws Exception;

    /**
     * Asks how these devices should be prepared and used
     * before or at container launch. A plug-in can perform some tasks on its
     * own or define them in DeviceRuntimeSpec to let the framework perform them.
     * For instance, define {@code VolumeSpec} to let the
     * framework create a volume before running the container.
     *
     * @param allocatedDevices A set of allocated {@link Device}.
     * @param yarnRuntime Indicates which runtime YARN will use.
     *        Could be {@code RUNTIME_DEFAULT} or {@code RUNTIME_DOCKER}
     *        in {@link DeviceRuntimeSpec} constants. The default means YARN's
     *        non-Docker container runtime is used. The docker means YARN's
     *        Docker container runtime is used.
     * @return a {@link DeviceRuntimeSpec} describing the environment,
     *         {@link VolumeSpec}, {@link MountVolumeSpec}, and so on
     * @throws Exception
     */
    DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices,
        YarnRuntimeType yarnRuntime) throws Exception;

    /**
     * Called after devices are released.
     * @param releasedDevices A set of released devices
     * @throws Exception
     */
    void onDevicesReleased(Set<Device> releasedDevices)
        throws Exception;
}
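To make the shape of an implementation concrete, here is a minimal sketch of a plug-in class covering all four methods. It is an illustration under stated assumptions, not the Hadoop reference implementation: `YarnRuntimeType`, `DeviceRegisterRequest`, `Device`, `DeviceRuntimeSpec`, and `DevicePlugin` are redeclared here as simplified stand-ins so that the example compiles without Hadoop on the classpath, and `ExampleGpuPlugin`, the `inUse` bookkeeping, the plug-in version `v1`, the device paths, and the `EXAMPLE_VISIBLE_DEVICES` variable are all hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

// ---- Simplified stand-ins (assumptions, NOT the real Hadoop classes) ----
enum YarnRuntimeType { RUNTIME_DEFAULT, RUNTIME_DOCKER }

final class DeviceRegisterRequest {
    final String resourceName;
    final String pluginVersion;
    DeviceRegisterRequest(String resourceName, String pluginVersion) {
        this.resourceName = resourceName;
        this.pluginVersion = pluginVersion;
    }
}

final class Device implements Comparable<Device> {
    final int id;
    final String devPath;
    Device(int id, String devPath) {
        this.id = id;
        this.devPath = devPath;
    }
    // Ordering by id lets devices live in a TreeSet, as the Javadoc recommends.
    @Override
    public int compareTo(Device other) {
        return Integer.compare(id, other.id);
    }
}

final class DeviceRuntimeSpec {
    final Map<String, String> envs;
    DeviceRuntimeSpec(Map<String, String> envs) {
        this.envs = envs;
    }
}

interface DevicePlugin {
    DeviceRegisterRequest getRegisterRequestInfo() throws Exception;
    Set<Device> getDevices() throws Exception;
    DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices,
        YarnRuntimeType yarnRuntime) throws Exception;
    void onDevicesReleased(Set<Device> releasedDevices) throws Exception;
}

// ---- Hypothetical vendor plug-in ----
class ExampleGpuPlugin implements DevicePlugin {
    // Ids of devices currently handed to containers (illustrative bookkeeping).
    final Set<Integer> inUse = new TreeSet<>();

    @Override
    public DeviceRegisterRequest getRegisterRequestInfo() {
        // Resource type name plus plug-in version, as described in the text.
        return new DeviceRegisterRequest("nvidia.com/gpu", "v1");
    }

    @Override
    public Set<Device> getDevices() {
        // Hard-coded for the sketch; a real plug-in would query the hardware.
        Set<Device> devices = new TreeSet<>();
        devices.add(new Device(0, "/dev/nvidia0"));
        devices.add(new Device(1, "/dev/nvidia1"));
        return devices;
    }

    @Override
    public DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices,
            YarnRuntimeType yarnRuntime) {
        allocatedDevices.forEach(d -> inUse.add(d.id));
        Map<String, String> envs = new TreeMap<>();
        if (yarnRuntime == YarnRuntimeType.RUNTIME_DOCKER) {
            // EXAMPLE_VISIBLE_DEVICES is a made-up variable name.
            envs.put("EXAMPLE_VISIBLE_DEVICES", allocatedDevices.stream()
                .map(d -> String.valueOf(d.id))
                .collect(Collectors.joining(",")));
        }
        return new DeviceRuntimeSpec(envs);
    }

    @Override
    public void onDevicesReleased(Set<Device> releasedDevices) {
        // Stand-in for cleanup work such as a device reset.
        releasedDevices.forEach(d -> inUse.remove(d.id));
    }
}
```

In a real plug-in, the stand-in types would be replaced by the classes shipped with the YARN device plug-in API, which carry more fields than shown here.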
Method | Description
---|---
getRegisterRequestInfo | The plug-in uses this method to advertise a new resource type name and register it with the ResourceManager. The DeviceRegisterRequest returned by the method consists of a plug-in version and a resource type name, for example, nvidia.com/gpu.
getDevices | This method returns the latest vendor device list on this NodeManager node. The resource count pre-defined in node-resources.xml is overridden. It is recommended that the vendor plug-in manages the allowed devices reported to YARN in its own configuration, because YARN itself supports only a blacklist, specified using the devices.denied-numbers parameter in the container-executor.cfg file. In this method, you can invoke a shell command or call a remote service over REST or RPC to get the list of devices whenever required. Note: The Device object can describe a fake device. If the major device number, minor device number, and device path are blank, the framework does not isolate the device. This makes it possible to define a fake device without real hardware.
onDevicesAllocated | This method tells the framework how to use the allocated devices. The NodeManager invokes it to let the plug-in run preparation tasks, such as creating a volume before container launch, and to obtain information on how to expose the devices to the container at launch. This information is carried in the returned DeviceRuntimeSpec. For example, DeviceRuntimeSpec can describe container launch requirements such as environment variables, device and volume mounts, and the Docker runtime type.
onDevicesReleased | The plug-in uses this method to do cleanup work, such as a device reset, before the container terminates.
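The getDevices description mentions obtaining the device list from a shell command, filtering it through the plug-in's own configuration, and the possibility of fake devices with blank hardware fields. The following sketch shows one way those three ideas could fit together. Everything in it is an assumption made for illustration: the `DeviceListParser` helper, the `Entry` type, the `id,devPath,major,minor` output format, and the use of `-1` and an empty string to represent blank fields are hypothetical and are not part of the YARN API.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper; none of these names come from the YARN API.
final class DeviceListParser {

    /** Minimal device record; -1 and "" stand for blank fields. */
    static final class Entry implements Comparable<Entry> {
        final int id;
        final String devPath;
        final int major;
        final int minor;

        Entry(int id, String devPath, int major, int minor) {
            this.id = id;
            this.devPath = devPath;
            this.major = major;
            this.minor = minor;
        }

        /** True when path, major, and minor are all blank (a fake device). */
        boolean isFake() {
            return devPath.isEmpty() && major < 0 && minor < 0;
        }

        @Override
        public int compareTo(Entry other) {
            return Integer.compare(id, other.id);
        }
    }

    /**
     * Parses lines of the assumed form "id,devPath,major,minor" and keeps
     * only the ids that the plug-in's own configuration allows.
     */
    static SortedSet<Entry> parse(String commandOutput, Set<Integer> allowedIds) {
        SortedSet<Entry> result = new TreeSet<>();
        for (String line : commandOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Limit -1 keeps trailing empty fields, needed for fake devices.
            String[] fields = trimmed.split(",", -1);
            if (fields.length != 4) {
                continue; // skip malformed lines
            }
            int id = Integer.parseInt(fields[0].trim());
            if (!allowedIds.contains(id)) {
                continue; // filtered out by the plug-in's allow list
            }
            result.add(new Entry(id, fields[1].trim(),
                parseOrBlank(fields[2]), parseOrBlank(fields[3])));
        }
        return result;
    }

    private static int parseOrBlank(String field) {
        String trimmed = field.trim();
        return trimmed.isEmpty() ? -1 : Integer.parseInt(trimmed);
    }
}
```

A real getDevices implementation would convert each parsed entry into a Device object before returning the set; the sorted set mirrors the TreeSet recommendation in the interface Javadoc.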