Custom Data Connection Development
You can develop custom data connections from within Cloudera Machine Learning Workspace and Python Sessions by using the Cloudera Machine Learning Python Data Library and implementing the Cloudera Machine Learning Custom Connection Interface.
You can view the CustomConnection interface help descriptions within a session:
import cml.data_v1 as cmldata
help(cmldata.customconnection)
Alternatively, you can inspect the source content as follows:
import cml.data_v1 as cmldata
import inspect
print(inspect.getsource(cmldata.customconnection))
Your custom connection code must implement the CustomConnection interface for the cml.data_v1 library to load your module dynamically (see Loading custom connections).
Two functions are already implemented so that the Cloudera Machine Learning Python Data Library can dynamically load your Python module implementation and make custom parameters available in self.parameters. In most cases, you will not need to reimplement these:
__init__(self, properties)
update_properties(self, properties)
The remaining interface functions are common functions that you may want to implement:
get_base_connection(self)
get_pandas_dataframe(self, query)
get_cursor(self)
print_usage(self)
override_parameters(self)
See Developing and testing your first custom connection for a simple example of how to implement these.
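To make the shape of an implementation concrete, the following is a minimal sketch rather than the library's own code. So that it runs outside a session, it defines a hypothetical stand-in for the CustomConnection base class; in a real module you would subclass the interface provided by cml.data_v1 instead. The SQLite backend, the class name SQLiteCustomConnection, the DB_PATH parameter, and the assumption that properties arrive as a dictionary are all illustrative choices, not part of the documented interface.

```python
import sqlite3


class CustomConnection:
    """Hypothetical stand-in for the interface shipped in cml.data_v1.

    It only mimics the two pre-implemented functions so that this sketch
    is runnable anywhere. The real base class may behave differently.
    """

    def __init__(self, properties):
        self.parameters = {}
        self.update_properties(properties)

    def update_properties(self, properties):
        # Assumption: properties is a dict of parameter names to values.
        self.parameters.update(properties or {})


class SQLiteCustomConnection(CustomConnection):
    """Illustrative connection backed by SQLite (not a Cloudera example)."""

    def override_parameters(self):
        # Supply a default when the hypothetical DB_PATH parameter is unset.
        self.parameters.setdefault("DB_PATH", ":memory:")

    def get_base_connection(self):
        # Create the underlying connection once and reuse it afterwards.
        self.override_parameters()
        if getattr(self, "_conn", None) is None:
            self._conn = sqlite3.connect(self.parameters["DB_PATH"])
        return self._conn

    def get_cursor(self):
        return self.get_base_connection().cursor()

    def get_pandas_dataframe(self, query):
        # Imported lazily so the rest of the class works without pandas.
        import pandas as pd

        return pd.read_sql_query(query, self.get_base_connection())

    def print_usage(self):
        print("Parameters: DB_PATH (optional, defaults to an in-memory database)")
        print("Functions: get_base_connection, get_cursor, get_pandas_dataframe")


if __name__ == "__main__":
    conn = SQLiteCustomConnection({"DB_PATH": ":memory:"})
    conn.print_usage()
    cursor = conn.get_cursor()
    cursor.execute("SELECT 1 + 1")
    print(cursor.fetchone()[0])
```

Caching the base connection in get_base_connection is a design choice of this sketch: it keeps get_cursor and get_pandas_dataframe working against the same database, which matters for an in-memory SQLite database that would otherwise be recreated empty on every call.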