Accessing Data in Amazon S3 Buckets

Every language in Cloudera Data Science Workbench has libraries available for uploading to and downloading from Amazon S3.

To work with S3:
  1. Add your Amazon Web Services access keys to your project's environment variables as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
  2. Pick your favorite language from the code samples below. Each one downloads the R 'Old Faithful' dataset from S3.
    R
    # Install the AWS.tools package from GitHub.
    library("devtools")
    install_github("armstrtw/AWS.tools")

    # AWS.tools expects the credentials in AWSACCESSKEY and AWSSECRETKEY,
    # so copy them over from the variables set in step 1.
    Sys.setenv("AWSACCESSKEY"=Sys.getenv("AWS_ACCESS_KEY_ID"))
    Sys.setenv("AWSSECRETKEY"=Sys.getenv("AWS_SECRET_ACCESS_KEY"))

    library("AWS.tools")

    # Download the 'Old Faithful' dataset.
    s3.get("s3://sense-files/faithful.csv")
    Python
    # Install Boto in the project.
    !pip install boto

    # Create the Boto S3 connection object. With no arguments, Boto
    # reads the credentials from the AWS_ACCESS_KEY_ID and
    # AWS_SECRET_ACCESS_KEY environment variables set in step 1.
    from boto.s3.connection import S3Connection
    aws_connection = S3Connection()

    # Download the dataset to the file 'faithful.csv'.
    bucket = aws_connection.get_bucket('sense-files')
    key = bucket.get_key('faithful.csv')
    key.get_contents_to_filename('/home/cdsw/faithful.csv')
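
The samples above cover only the download direction. For uploads, a minimal sketch with the same Boto library follows; the bucket name 'my-writable-bucket' and the object name 'faithful-copy.csv' are placeholders, and the sketch assumes your access keys have write permission on the target bucket (the public 'sense-files' bucket used above will not accept writes).
    Python
    import os
    from boto.s3.connection import S3Connection

    # Confirm the credentials from step 1 are visible to the session.
    assert os.environ.get('AWS_ACCESS_KEY_ID'), 'Set the AWS keys first (step 1).'

    # As above, Boto reads the credentials from the environment.
    aws_connection = S3Connection()

    # 'my-writable-bucket' is a placeholder; substitute a bucket you own.
    bucket = aws_connection.get_bucket('my-writable-bucket')

    # Create a new key (object name) and upload the local file to it.
    key = bucket.new_key('faithful-copy.csv')
    key.set_contents_from_filename('/home/cdsw/faithful.csv')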