Custom Data Connection Development

Custom data connections can be developed from within Cloudera Machine Learning workspaces and Python sessions by using the Cloudera Machine Learning Python Data Library to implement the Cloudera Machine Learning Custom Connection Interface.

You can view the CustomConnection interface help descriptions within a session:

import cml.data_v1 as cmldata
help(cmldata.customconnection)

Alternatively, you can inspect the source code as follows:

import cml.data_v1 as cmldata
import inspect
print(inspect.getsource(cmldata.customconnection))

Your custom connection code must implement the CustomConnection interface so that the cml.data_v1 library can load your module dynamically (see Loading custom connections).

Two functions are already implemented so that the Cloudera Machine Learning Python Data Library can dynamically load your Python module implementation and make custom parameters available in self.parameters. In most cases, you will not need to reimplement these:

  1. __init__(self, properties)
  2. update_properties(self, properties)

The remaining interface functions are common functions that you may want to implement:

  1. get_base_connection(self)
  2. get_pandas_dataframe(self, query)
  3. get_cursor(self)
  4. print_usage(self)
  5. override_parameters(self)
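The functions listed above can be illustrated with a small custom connection backed by Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions, not the library's implementation: the CustomConnection class below is a hypothetical stand-in that mimics the two pre-implemented functions by storing properties in self.parameters, and the DATABASE parameter and SQLITE_DATABASE environment variable are hypothetical names chosen for the example. In a Cloudera Machine Learning session you would implement the real interface shown by help(cmldata.customconnection) instead of the stand-in.

```python
import os
import sqlite3


class CustomConnection:
    """Hypothetical stand-in for the CustomConnection interface.

    Assumption: it mimics the two pre-implemented functions by storing the
    properties it receives in self.parameters. In a CML session, implement
    the real interface instead of this class.
    """

    def __init__(self, properties):
        self.parameters = {}
        self.update_properties(properties)

    def update_properties(self, properties):
        # Merge new or changed properties into self.parameters.
        self.parameters.update(properties)


class SQLiteConnection(CustomConnection):
    """Example custom connection backed by a local SQLite database.

    The DATABASE parameter and SQLITE_DATABASE environment variable are
    hypothetical names used only in this sketch.
    """

    def override_parameters(self):
        # Let an environment variable replace the configured database path.
        database = os.environ.get("SQLITE_DATABASE")
        if database:
            self.parameters["DATABASE"] = database

    def get_base_connection(self):
        # Apply any environment override, then open a DB-API connection
        # (defaults to an in-memory database if DATABASE is not set).
        self.override_parameters()
        return sqlite3.connect(self.parameters.get("DATABASE", ":memory:"))

    def get_cursor(self):
        # The cursor keeps a reference to its connection via cursor.connection.
        return self.get_base_connection().cursor()

    def get_pandas_dataframe(self, query):
        import pandas as pd  # imported lazily so pandas is only needed here

        conn = self.get_base_connection()
        try:
            return pd.read_sql(query, conn)
        finally:
            conn.close()

    def print_usage(self):
        print("Set the DATABASE parameter (or SQLITE_DATABASE) to a SQLite file path.")
        print("Example: conn.get_pandas_dataframe('SELECT 1 AS x')")
```

For example, SQLiteConnection({"DATABASE": ":memory:"}).get_pandas_dataframe("SELECT 1 AS x") returns a one-row DataFrame, assuming pandas is installed. Opening a fresh connection on each call keeps the sketch simple; a real connector may prefer to cache the base connection.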

See Developing and testing your first custom connection for a simple example of how to implement these.