Querying files into a DataFrame
If your data files are not part of a Hive or Impala table, you can use SQL to read JSON or Parquet files directly into a DataFrame. In the query, put the format name before the path and enclose the path in backticks.
JSON
df = sqlContext.sql("SELECT * FROM json.`input dir`")
Parquet
df = sqlContext.sql("SELECT * FROM parquet.`input dir`")
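The two queries above differ only in the format prefix, so one small helper can build both. The following is a minimal sketch rather than part of any Spark API: the names file_query and read_files and the example path are hypothetical, and read_files assumes you pass in an existing SQLContext such as the sqlContext used above.

```python
# Hypothetical helpers; the names and the example path are illustrative only.

SUPPORTED_FORMATS = ("json", "parquet")  # the two formats covered in this section


def file_query(fmt, path):
    """Build a query of the form: SELECT * FROM <fmt>.`<path>`"""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported format: %r" % (fmt,))
    if "`" in path:
        # An unescaped backtick would end the quoted path early, so reject it.
        raise ValueError("path must not contain a backtick: %r" % (path,))
    return "SELECT * FROM {0}.`{1}`".format(fmt, path)


def read_files(sql_context, fmt, path):
    """Run the query on an existing SQLContext and return the resulting DataFrame."""
    return sql_context.sql(file_query(fmt, path))


# Building the query string does not need a running Spark session.
print(file_query("parquet", "/tmp/example/events"))
```

With a live context, df = read_files(sqlContext, "json", "/tmp/example/events") should behave like the JSON example above, with the assumed path substituted for the input directory.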