Master Host

The master host keeps track of all critical persistent and stateful data within Cloudera Machine Learning. This data is stored at /var/lib/cdsw.

  • Project Files

    Cloudera Machine Learning uses an NFS server to store project files. Project files can include user code, any libraries that users install, and small data files. The master host provides a persistent filesystem that is exported to worker hosts using NFS. This filesystem allows users to install packages interactively and have their dependencies and code available on all Cloudera Machine Learning hosts without any need for synchronization. The files for all projects are stored on the master host at /var/lib/cdsw/current/projects. When a job or session is launched, the project’s filesystem is mounted into an isolated Docker container at /home/cdsw.

  • Relational Database

    Cloudera Machine Learning uses a PostgreSQL database that runs within a container on the master host. The database stores its data on the master host at /var/lib/cdsw/current/postgres-data.

  • Livelog

    Cloudera Machine Learning allows users to work interactively with R, Python, and Scala from their browser and display results in real time. This real-time state is stored in an internal database called Livelog, which stores data on the master host at /var/lib/cdsw/current/livelog. Users do not need to be connected to the server for results to be tracked or jobs to run.
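
The three locations above can be checked with a short script. The following is a minimal sketch, not a tool shipped with Cloudera Machine Learning: the names STATE_DIRS, dir_size_bytes, and summarize_state are hypothetical, and the only details taken from this page are the directory paths. It assumes it runs on the master host under an account that can read /var/lib/cdsw, and it reports whether each directory exists and how many bytes of regular files it holds.

```python
import os
from pathlib import Path

# Subdirectories named on this page, relative to /var/lib/cdsw/current.
# The dictionary itself is illustrative, not a product API.
STATE_DIRS = {
    "project files": "projects",
    "relational database": "postgres-data",
    "livelog": "livelog",
}


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks
    and any entries that cannot be read."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                # File vanished or is unreadable; leave it out of the total.
                pass
    return total


def summarize_state(base="/var/lib/cdsw/current"):
    """Return {label: {"path", "exists", "bytes"}} for each state directory."""
    base = Path(base)
    report = {}
    for label, subdir in STATE_DIRS.items():
        path = base / subdir
        exists = path.is_dir()
        report[label] = {
            "path": str(path),
            "exists": exists,
            "bytes": dir_size_bytes(path) if exists else 0,
        }
    return report


if __name__ == "__main__":
    for label, info in summarize_state().items():
        status = "present" if info["exists"] else "missing"
        print(f"{label}: {info['path']} ({status}, {info['bytes']} bytes)")
```

Because the base directory is a parameter, the same function can be pointed at a copy of the data, for example a restored backup, to compare it with the live directories.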