PySpark: File To Dataframe (Part 2)

This tutorial will explain how to read various types of files (such as JSON, Parquet, ORC and Avro) into a Spark dataframe.


Read JSON file: The json() function can be used to read data from a JSON file. Files used in this example can be downloaded from here (normal JSON) and here (multiline JSON).
Read Parquet file: The parquet() function can be used to read data from a Parquet file. The file used in this example can be downloaded from here.
Read ORC file: Spark also supports the ORC file format, which is mostly used in Hive. The orc() function can be used for this purpose. The file used in this example can be downloaded from here.
Read Avro file: The Avro file format is not native to Spark, and a spark-avro_x.xx-x.x.x.jar must be added to the Spark library to read/write Avro files. The Spark Avro jar can be downloaded from the Maven repository; here is the link to download spark-avro_2.12-3.0.3.jar. The file used in this example can be downloaded from here.