Data Access

Accessing Data in Amazon S3 Buckets

Every language supported by Cloudera Data Science Workbench has libraries available for uploading files to and downloading files from Amazon S3.

To work with S3:
  1. Add your Amazon Web Services access keys to your project's environment variables as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. (A quick check that these are visible inside a session appears after the samples.)
  2. Pick your favorite language from the code samples below. Each one downloads the R 'Old Faithful' dataset from S3.
    R
    library("devtools") 
    install_github("armstrtw/AWS.tools") 
    
    Sys.setenv("AWSACCESSKEY"=Sys.getenv("AWS_ACCESS_KEY_ID")) 
    Sys.setenv("AWSSECRETKEY"=Sys.getenv("AWS_SECRET_ACCESS_KEY")) 
    
    library("AWS.tools") 
    
    s3.get("s3://sense-files/faithful.csv")
    Python
    # Install Boto to the project
    !pip install boto
    
    # Create the Boto S3 connection object. With no arguments, Boto reads
    # AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment.
    from boto.s3.connection import S3Connection
    aws_connection = S3Connection()

    # Download the dataset to file 'faithful.csv'.
    bucket = aws_connection.get_bucket('sense-files')
    key = bucket.get_key('faithful.csv')
    key.get_contents_to_filename('/home/cdsw/faithful.csv')
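
If the connection fails with an authentication error, first confirm that the variables from step 1 are actually visible inside the session. A minimal check, shown here in Python:

    import os

    # Confirm the project environment variables from step 1 are visible
    # to the engine before opening a connection.
    for var in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        if var not in os.environ:
            raise RuntimeError("%s is not set; add it to the project's "
                               "environment variables" % var)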
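
The samples above cover only the download direction. Uploading works through the same libraries; the sketch below uses Boto's bucket and key objects, with a placeholder bucket name ('my-output-bucket') standing in for a bucket you have write access to:

    from boto.s3.connection import S3Connection

    # Reuses the same environment-variable credentials as the download sample.
    aws_connection = S3Connection()

    # 'my-output-bucket' is a placeholder; substitute a bucket you own.
    out_bucket = aws_connection.get_bucket('my-output-bucket')
    out_key = out_bucket.new_key('faithful-copy.csv')
    out_key.set_contents_from_filename('/home/cdsw/faithful.csv')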