To analyze existing data files (.csv, .txt, and so on) stored on your computer, you can
upload them directly to your Cloudera Data Science Workbench project.
- Navigate to the project's Overview page.
- Under the Files section, click Upload and select the data files to upload.
- Upload the tips.csv dataset to the data folder in your project before you run these
  examples.
The following sections use the tips.csv dataset to demonstrate how to work with local
data stored within your project.
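Before running the examples, it can help to confirm that the upload landed where the code expects it. The sketch below is a minimal check, assuming the file sits at data/tips.csv relative to the project root and contains the sex, day, and tip columns that the examples query. The helper name missing_columns is hypothetical and is not part of Cloudera Data Science Workbench.

```python
import csv
from pathlib import Path


def missing_columns(path, required):
    """Return the required column names that are absent from the CSV header.

    Raises FileNotFoundError if no file exists at `path`.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"{csv_path} not found; upload it to the project first")
    with csv_path.open(newline="") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    return [name for name in required if name not in header]


if __name__ == "__main__":
    # Assumed location and column names, taken from the examples in this section.
    try:
        print("missing columns:", missing_columns("data/tips.csv", ["sex", "day", "tip"]))
    except FileNotFoundError as err:
        print(err)
```

An empty list means the header contains every column the examples rely on. A FileNotFoundError suggests the file was uploaded to a different folder or not uploaded at all.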
Pandas (Python)
import pandas as pd

# Load the dataset from the project's data folder
tips = pd.read_csv('data/tips.csv')

# Average tip per day for female customers, highest first
(
    tips
    .query('sex == "Female"')
    .groupby('day')
    .agg({'tip': 'mean'})
    .rename(columns={'tip': 'avg_tip'})
    .sort_values('avg_tip', ascending=False)
)
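To make explicit what each step of the pandas chain above computes, the same filter, group, average, and sort can be sketched with only the Python standard library. This is an illustration, not a replacement for pandas: the function names avg_tip_by_day and load_rows are hypothetical, and unlike the pandas mean, this version does not skip missing tip values (an empty tip field would raise a ValueError).

```python
import csv
from collections import defaultdict
from statistics import mean


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def avg_tip_by_day(rows, sex="Female"):
    """Return (day, average tip) pairs for one sex, highest average first."""
    tips_by_day = defaultdict(list)
    for row in rows:
        # Filter step: keep only rows for the requested sex.
        if row["sex"] == sex:
            # Group step: collect tips under their day.
            tips_by_day[row["day"]].append(float(row["tip"]))
    # Aggregate step: one mean per day.
    averages = [(day, mean(values)) for day, values in tips_by_day.items()]
    # Sort step: descending by average tip.
    return sorted(averages, key=lambda pair: pair[1], reverse=True)
```

With the dataset uploaded, `avg_tip_by_day(load_rows("data/tips.csv"))` should list the same days in the same order as the pandas result, assuming the file has no missing tip values.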
dplyr (R)
library(readr)
library(dplyr)

# load data from .csv file in project
tips <- read_csv("data/tips.csv")

# query using dplyr
tips %>%
  filter(sex == "Female") %>%
  group_by(day) %>%
  summarise(
    avg_tip = mean(tip, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_tip))