Accessing Data in Amazon S3 Buckets
Every language in Cloudera Machine Learning has libraries available for uploading to and downloading from Amazon S3.
To work with S3:
- Add your Amazon Web Services access keys to your project's environment variables as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
- Pick your favorite language from the code samples below. Each one downloads the R 'Old Faithful' dataset from S3.
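Before running either sample, it can help to confirm that the session actually sees both keys. The following is a minimal sketch using only the Python standard library; the helper name missing_aws_vars is hypothetical and is not part of Cloudera Machine Learning or any AWS library. It reports which of the two variables are unset without printing their values.

```python
import os

# Variable names taken from the steps above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("AWS credentials are set for this session.")
```

If a variable is reported as missing, add it to the project's environment variables and start a new session so the change is picked up.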
R
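# Install the AWS.tools package from GitHub.
library("devtools")
install_github("armstrtw/AWS.tools")

# Copy the project's AWS keys into the variable names AWS.tools expects.
Sys.setenv("AWSACCESSKEY"=Sys.getenv("AWS_ACCESS_KEY_ID"))
Sys.setenv("AWSSECRETKEY"=Sys.getenv("AWS_SECRET_ACCESS_KEY"))

# Download the dataset from S3.
library("AWS.tools")
s3.get("s3://sense-files/faithful.csv")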
Python
# Install Boto to the project
!pip install boto

# Create the Boto S3 connection object.
from boto.s3.connection import S3Connection
aws_connection = S3Connection()

# Download the dataset to file 'faithful.csv'.
bucket = aws_connection.get_bucket('sense-files')
key = bucket.get_key('faithful.csv')
key.get_contents_to_filename('/home/cdsw/faithful.csv')
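The Python sample above uses the legacy boto package. As an alternative, the same download can be sketched with boto3, the currently maintained AWS SDK for Python, assuming it has been installed in the project (for example with !pip install boto3). The wrapper download_from_s3 and its client parameter are hypothetical names introduced here for illustration; the bucket, key, and destination path are the ones used in the sample above.

```python
def download_from_s3(bucket, key, dest, client=None):
    """Download s3://<bucket>/<key> to the local path dest and return dest.

    Hypothetical wrapper for illustration. If no client is supplied,
    a boto3 S3 client is created.
    """
    if client is None:
        # Imported lazily so the function can be tried with a stand-in client.
        import boto3

        # boto3 picks up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
        # from the environment, so no keys are passed explicitly.
        client = boto3.client("s3")
    client.download_file(bucket, key, dest)
    return dest


# Usage, with the values from the sample above:
# download_from_s3("sense-files", "faithful.csv", "/home/cdsw/faithful.csv")
```

Accepting an optional client keeps the function easy to exercise without network access, since any object with a compatible download_file method can be passed in.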