Aggregating and grouping data
You use functions, such as AVG, SUM, or MAX, to aggregate data. The GROUP BY clause groups data query results in one or more columns.
- You upload a CSV file to S3 having the following data:
1,jane doe,engineer,service,2022,375000 2,john smith,sales rep,sales,2021,375000 3,naoko murai,service rep,service,2022,200000 4,somporn thong,ceo,sales,2020,500000 5,xi singh,cfo,finance,2023,222000 6,mary brown,sales rep,sales,2021,475000 7,rashida kumar,sales rep,sales,2021,299000
- You create a table having the following data in Hive or Impala as shown in the
examples below:
+-----+---------------+-------------+---------+------+----------+ | eid | name | job | dept | year | salary | +-----+---------------+-------------+---------+------+----------+ | 1 | jane doe | engineer | service | 2022 | 375000.0 | | 2 | john smith | sales rep | sales | 2021 | 375000.0 | | 3 | naoko murai | service rep | service | 2022 | 200000.0 | | 4 | somporn thong | ceo | sales | 2020 | 500000.0 | | 5 | xi singh | cfo | finance | 2023 | 222000.0 | | 6 | mary brown | sales rep | sales | 2021 | 475000.0 | | 7 | rashida kumar | sales rep | sales | 2021 | 299000.0 | +-----+---------------+-------------+---------+------+----------+
Hive and Impala example:
CREATE EXTERNAL TABLE employees (eid int, name string, job string, dept string, year int, salary float) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 's3a://cdpsaasdemo-cdp-private-default-1ogvplm/user/andrena';