The master host keeps track of all critical persistent and stateful data within Cloudera Machine Learning. This data is stored at
- Project Files
Cloudera Machine Learning uses an NFS server to store project files, which can include user code, any libraries you install, and small data files. The master host provides a persistent filesystem that is exported to worker hosts using NFS. This filesystem lets users install packages interactively and have their dependencies and code available on all Cloudera Machine Learning nodes without any need for synchronization. The files for all projects are stored on the master host at /var/lib/cdsw/current/projects. When a job or session is launched, the project’s filesystem is mounted into an isolated Docker container at
- Relational Database
Cloudera Machine Learning uses a PostgreSQL database that runs within a container on the master host at
- Livelog
Cloudera Machine Learning allows users to work interactively with R, Python, and Scala from their browser and display results in real time. This real-time state is stored in an internal database called Livelog, which stores its data on the master host at /var/lib/cdsw/current/livelog. Because the state is kept on the master host, users do not need to stay connected to the server for results to be tracked or jobs to run.
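Since all of this state lives on the master host, it can be useful to see how much disk space each location consumes. The following is a minimal sketch, not a Cloudera-provided tool: it covers only the two paths spelled out above (projects and Livelog), and the names `STATE_DIRS`, `dir_size_bytes`, and `report` are hypothetical helpers introduced for illustration. It assumes it is run on the master host by a user with read access to those directories.

```python
import os
from pathlib import Path

# Only the two locations named in full in the text above.
# The dictionary name and labels are illustrative, not part of the product.
STATE_DIRS = {
    "projects": Path("/var/lib/cdsw/current/projects"),
    "livelog": Path("/var/lib/cdsw/current/livelog"),
}


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of the file entries under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # The entry vanished or is unreadable; skip it.
                continue
    return total


def report(dirs):
    """Map each label to its size in bytes, or None if the directory is missing."""
    return {
        label: dir_size_bytes(path) if path.is_dir() else None
        for label, path in dirs.items()
    }


if __name__ == "__main__":
    for label, size in report(STATE_DIRS).items():
        if size is None:
            print(f"{label}: directory not found")
        else:
            print(f"{label}: {size / 1024 ** 2:.1f} MiB")
```

On a host where a directory does not exist, the script reports it as not found rather than failing, so it can also serve as a quick check that the expected locations are present.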