Accessing Data in Amazon S3 Buckets
Every language in Cloudera Machine Learning has libraries available for uploading to and downloading from Amazon S3.
To work with S3:
- Add your Amazon Web Services access keys to your project's environment variables as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
- Pick your favorite language from the code samples below. Each one downloads the R 'Old Faithful' dataset from S3.
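Before running either sample, it can help to confirm both variables are actually set in the session. A minimal sketch (the helper name missing_credentials is ours, not part of any library):

```python
import os

# The samples below read AWS credentials from these environment
# variables (the names listed in the steps above).
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

def missing_credentials(environ=os.environ):
    """Return the names of any required AWS variables that are unset."""
    return [name for name in REQUIRED if not environ.get(name)]

# Example with a hypothetical, partially configured environment:
print(missing_credentials({"AWS_ACCESS_KEY_ID": "AKIA..."}))
# → ['AWS_SECRET_ACCESS_KEY']
```

If the returned list is non-empty, add the missing keys to the project's environment variables before continuing.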
R
library("devtools")
install_github("armstrtw/AWS.tools")

# AWS.tools reads credentials from AWSACCESSKEY and AWSSECRETKEY,
# so copy them over from the project environment variables set above.
Sys.setenv("AWSACCESSKEY" = Sys.getenv("AWS_ACCESS_KEY_ID"))
Sys.setenv("AWSSECRETKEY" = Sys.getenv("AWS_SECRET_ACCESS_KEY"))

library("AWS.tools")
s3.get("s3://sense-files/faithful.csv")
Python
# Install Boto to the project.
!pip install boto

# Create the Boto S3 connection object. With no arguments, S3Connection
# reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment.
from boto.s3.connection import S3Connection
aws_connection = S3Connection()

# Download the dataset to the file 'faithful.csv'.
bucket = aws_connection.get_bucket('sense-files')
key = bucket.get_key('faithful.csv')
key.get_contents_to_filename('/home/cdsw/faithful.csv')
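Once the download finishes, the file can be read with Python's standard csv module. A minimal sketch: the in-memory sample below stands in for opening /home/cdsw/faithful.csv, and the column layout (an unnamed index column plus 'eruptions' and 'waiting', as in R's built-in faithful dataset) is an assumption about this copy of the file:

```python
import csv
import io

# Stand-in for open('/home/cdsw/faithful.csv'); the real file has the
# same header row under the assumed layout.
sample = io.StringIO(
    '"","eruptions","waiting"\n'
    '"1",3.6,79\n'
    '"2",1.8,54\n'
)

# DictReader uses the header row as field names.
reader = csv.DictReader(sample)
rows = list(reader)
waiting = [float(row["waiting"]) for row in rows]
print(len(rows), waiting)  # → 2 [79.0, 54.0]
```

The same pattern applies to the file downloaded by either the R or the Python sample above.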