Querying files into a DataFrame

If you have data files that are outside of a Hive or Impala table, you can use SQL to directly read JSON or Parquet files into a DataFrame.

JSON

df = sqlContext.sql("SELECT * FROM json.`input dir`")

Parquet

df = sqlContext.sql("SELECT * FROM parquet.`input dir`")