Example: Analyzing archived, indexed data with Hive

A working example of analyzing archived, indexed data with Hive.

Once data has been archived or saved to HDFS, Hive tables can be used to quickly access and analyze the stored data. Hive can only analyze line-delimited JSON files. These are created by default; data saved or archived with the --json-file argument is not line delimited and cannot be analyzed with Hive. The following examples use hive-json-serde.jar to process the stored JSON data. Before creating the example tables, add the jar in the Hive shell:

ADD JAR [PATH_TO_JAR]/hive-json-serde.jar;
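
Because Hive can only read line-delimited JSON, it can help to spot-check a sample of the stored data before building tables on it. The following is a minimal Python sketch, not part of the product: the function name and the sample records are hypothetical, and the second sample is only one possible shape of non-line-delimited JSON (the exact output of --json-file is not shown here). It treats text as line delimited when every non-empty line parses on its own as a JSON object.

```python
import json


def is_line_delimited_json(text):
    """Return True if every non-empty line parses as a standalone JSON object."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return False  # nothing to analyze
    for ln in lines:
        try:
            record = json.loads(ln)
        except ValueError:
            # The line is not valid JSON by itself.
            return False
        if not isinstance(record, dict):
            # A bare number, string, or array is not a record.
            return False
    return True


# Hypothetical sample data, for illustration only.
line_delimited = (
    '{"src": "10.0.0.1", "bytes": 120}\n'
    '{"src": "10.0.0.2", "bytes": 64}\n'
)
single_document = (
    '[\n'
    '  {"src": "10.0.0.1", "bytes": 120},\n'
    '  {"src": "10.0.0.2", "bytes": 64}\n'
    ']\n'
)

print(is_line_delimited_json(line_delimited))   # one record per line
print(is_line_delimited_json(single_document))  # one document spread over lines
```

The first sample passes because each line is a complete record. The second fails on its first line, since "[" is not valid JSON on its own.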

Here are some example table schemas for various log types. External tables are recommended because they keep the archives in HDFS. First, ensure that a directory exists to hold the archived or saved line-delimited logs:

hadoop fs -mkdir [SOME_DIRECTORY_PATH]