Developing Apache Spark Applications

Querying files into a DataFrame

If your data files are not part of a Hive or Impala table, you can use SQL to read JSON or Parquet files directly into a DataFrame. Name the file format before the dot and put the backquoted path to the input directory after it:

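# Replace `input dir` with the path to the directory that holds the JSON files.
df = sqlContext.sql("SELECT * FROM json.`input dir`")
# The same form reads Parquet files; only the format name changes.
df = sqlContext.sql("SELECT * FROM parquet.`input dir`")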
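
The two statements above differ only in the format name, so the query string can be assembled by a small helper. The following is a minimal sketch, not part of the Spark API: the names SUPPORTED_FORMATS, build_file_query, and query_files are hypothetical, and rejecting paths that contain a backquote is an assumption made here to keep the quoting simple.

```python
# Hypothetical helper names; only the query shape comes from the text above.
SUPPORTED_FORMATS = ("json", "parquet")  # the two formats this section covers


def build_file_query(fmt, path):
    """Return a SELECT statement that reads the files at `path` as `fmt`."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported format: %r" % (fmt,))
    if "`" in path:
        # The path is wrapped in backquotes, so a backquote inside it would
        # end the quoted name early. Reject it rather than guess (assumption).
        raise ValueError("path must not contain a backquote: %r" % (path,))
    return "SELECT * FROM {0}.`{1}`".format(fmt, path)


def query_files(sql_context, fmt, path):
    """Run the query through any object that exposes .sql(), such as sqlContext."""
    return sql_context.sql(build_file_query(fmt, path))
```

With an existing sqlContext, a call such as query_files(sqlContext, "parquet", "/data/events") would return the DataFrame; the path /data/events is made up for illustration.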