Transforming and organizing data using Cloudera Data Engineering
You must define the Spark job configurations, resources, and schedule in the Cloudera Data Engineering (CDE) cluster.
In CDE, you perform the organization and transformation stage of the Business Intelligence at Scale pattern. To create the Spark job, use the Spark jar file bi-workflow_2.11-0.1.jar, which has been created for you and is included in your pattern artifacts download package. The Spark job converts the weather data that you put into the AWS S3 bucket from Avro format to Parquet, and makes the Parquet data available to Cloudera Data Warehouse.