Machine Learning Discovery and Exploration has a few prerequisites.

  • You must have Admin (write) permissions to set up or edit data connections. Only read permissions are needed to list data connections or view data connection details such as the library code.
  • To manually set up data connections, you must have the JDBC URL (for Impala or Hive connections) or the Data Lake external directory URL (for Spark connections). You can obtain this from the option menu for the virtual warehouse to which you want to connect.
  • One or more data warehouses must be already created in the environment. For more information on how to create a data warehouse, see Adding a New Virtual Warehouse.
  • For Spark data connections, you must use ML Runtimes, and specifically the Spark Runtime Add-on must be available.
  • Hive and Impala data connections also require ML Runtimes. Legacy engines may work, but are not supported.
  • For Spark data connections, you must have permissions set correctly for external access to the S3 bucket.
  • For both Hive and Impala connections, the HADOOP_USERNAME and WORKLOAD_PASSWORD must be set. The HADOOP_USERNAME is set automatically by the environment. To set the WORKLOAD_PASSWORD, see Using data connections in a project.
  • For Hive data connections, make sure that SSO is disabled.