Accessing Local Data from Your Computer

If you want to perform analytics operations on existing data files (.csv, .txt, etc.) from your computer, you can upload these files directly to your Cloudera Data Science Workbench project.

  1. Navigate to the project's Overview page.
  2. Under the Files section, click Upload and select the relevant data files to be uploaded.
  3. Upload the tips.csv dataset to the data folder in your project before you run the examples that follow.

The following sections use the tips.csv dataset to demonstrate how to work with local data stored within your project.
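Before running the examples, it can help to confirm that the upload landed where the code expects it. The following is a minimal sketch using only the Python standard library; the helper name preview_csv is hypothetical and not part of Cloudera Data Science Workbench.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path, n=5):
    """Return the header row and up to n data rows from a CSV file."""
    path = Path(path)
    if not path.is_file():
        # The file is missing, most likely because the upload went to a different folder.
        raise FileNotFoundError(f"{path} not found; upload it to the project first")
    with path.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])        # first line holds the column names
        rows = list(islice(reader, n))   # read at most n data rows
    return header, rows
```

Calling preview_csv("data/tips.csv") from a session started in the project should return the column names and the first few rows, assuming the file was uploaded to the data folder as described above.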
Pandas (Python)
    import pandas as pd

    # load data from .csv file in project
    tips = pd.read_csv('data/tips.csv')

    # mean tip per day for rows where sex is "Female", highest first
    tips \
      .query('sex == "Female"') \
      .groupby('day') \
      .agg({'tip' : 'mean'}) \
      .rename(columns={'tip': 'avg_tip'}) \
      .sort_values('avg_tip', ascending=False)
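To see what this query computes without needing the uploaded file, the same filter, group, and sort steps can be run on a small in-memory table. This is a sketch under stated assumptions: the function name avg_tip_by_day is hypothetical, the rows are made up for illustration and are not taken from tips.csv, and the named-aggregation form of agg assumes pandas 0.25 or later.

```python
import pandas as pd


def avg_tip_by_day(tips, sex="Female"):
    """Mean tip per day for rows matching `sex`, highest average first."""
    return (
        tips[tips["sex"] == sex]
        .groupby("day")
        # named aggregation: output column = (input column, function)
        .agg(avg_tip=("tip", "mean"))
        .sort_values("avg_tip", ascending=False)
    )


# Made-up toy rows for illustration only; these are not taken from tips.csv.
toy = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Male", "Male"],
    "day": ["Sun", "Sun", "Sat", "Sat", "Sun"],
    "tip": [2.0, 4.0, 5.0, 10.0, 1.0],
})

result = avg_tip_by_day(toy)
print(result)
```

On the toy rows, the two Sunday tips for "Female" average to 3.0 and the single Saturday tip is 5.0, so Saturday sorts first. Passing the DataFrame returned by pd.read_csv('data/tips.csv') instead of toy should give the same kind of result for the real dataset.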
dplyr (R)
    library(readr)
    library(dplyr)
    
    # load data from .csv file in project
    tips <- read_csv("data/tips.csv")
    
    # query using dplyr
    tips %>%
      filter(sex == "Female") %>%
      group_by(day) %>%
      summarise(
        avg_tip = mean(tip, na.rm = TRUE)
      ) %>%
      arrange(desc(avg_tip))