Accessing Local Data from Your Computer
This topic includes code samples that demonstrate how to access local data for CML workloads.
If you want to perform analytics operations on existing data files (.csv, .txt, etc.) from your computer, you can upload these files directly to your Cloudera Machine Learning project. Go to the project's Overview page. Under the Files section, click Upload and select the relevant data files to be uploaded.
The following sections use the tips.csv dataset to demonstrate how to work with local data stored within your project. Upload this dataset to the data folder in your project before you run these examples.
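Before running the examples, it can help to confirm that the upload landed where the code expects it. The following is a minimal sketch, assuming the session's working directory is the project root; the helper name find_dataset and its parameters are hypothetical and not part of CML.

```python
from pathlib import Path


def find_dataset(relative_path="data/tips.csv", project_root="."):
    """Return the dataset path if the file exists, else raise FileNotFoundError.

    Assumption: project_root is the directory that contains the data folder.
    This helper is illustrative only and is not a CML API.
    """
    path = Path(project_root) / relative_path
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Upload the file to the project's data folder first."
        )
    return path
```

Calling find_dataset() at the start of a session either returns the path to pass to the reader function or fails early with a message that points back to the upload step.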
Pandas (Python)
import pandas as pd

# load data from .csv file in project
tips = pd.read_csv('data/tips.csv')

# average tip per day for female customers, highest first
tips \
  .query('sex == "Female"') \
  .groupby('day') \
  .agg({'tip': 'mean'}) \
  .rename(columns={'tip': 'avg_tip_dinner'}) \
  .sort_values('avg_tip_dinner', ascending=False)
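To see what this kind of query returns without the uploaded file, the same filter, group, and sort steps can be tried on a small in-memory frame. This is a sketch only: the four sample rows are made up for illustration and are not taken from tips.csv, the function name avg_tip_by_day is hypothetical, and it uses pandas named aggregation in place of the agg and rename pair.

```python
import pandas as pd

# Made-up rows for illustration only; not values from tips.csv.
sample = pd.DataFrame({
    "sex": ["Female", "Male", "Female", "Female"],
    "day": ["Sun", "Sun", "Sat", "Sun"],
    "tip": [2.0, 3.0, 5.0, 4.0],
})


def avg_tip_by_day(df, sex="Female"):
    """Mean tip per day for one value of the sex column, highest first."""
    return (
        df[df["sex"] == sex]
        .groupby("day")
        .agg(avg_tip=("tip", "mean"))  # named aggregation: output column = (input column, function)
        .sort_values("avg_tip", ascending=False)
    )


result = avg_tip_by_day(sample)
print(result)
```

The result is a DataFrame indexed by day with a single avg_tip column, the same shape the query on the uploaded file produces.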
dplyr (R)
library(readr)
library(dplyr)

# load data from .csv file in project
tips <- read_csv("data/tips.csv")

# query using dplyr
tips %>%
  filter(sex == "Female") %>%
  group_by(day) %>%
  summarise(
    avg_tip = mean(tip, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_tip))