This tutorial explains how to read various types of files (such as JSON, Parquet, ORC, and Avro) into a Spark DataFrame.
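The examples below assume a running SparkSession bound to the name spark, as the pyspark shell provides out of the box. In a standalone script you would create it yourself; a minimal sketch (the app name is arbitrary):

from pyspark.sql import SparkSession

# Build a local SparkSession, or reuse one if it already exists
spark = SparkSession.builder.appName("read_files_tutorial").getOrCreate()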
# By default Spark expects JSON Lines input: one complete JSON record per line
df_json = spark.read.json("file:///path_to_file/json_tutorial_file.json")
df_json.show()
+-----+---------+-------+
|db_id| db_name|db_type|
+-----+---------+-------+
| 12| Teradata| RDBMS|
| 14|Snowflake|CloudDB|
| 15| Vertica| RDBMS|
| 17| Oracle| RDBMS|
| 19| MongoDB| NOSQL|
+-----+---------+-------+
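By default Spark infers the schema by scanning the data. If the columns are known up front, you can pass an explicit schema and skip inference; a sketch assuming the column names and types shown above (df_json_typed is just an illustrative name):

from pyspark.sql.types import StructType, StructField, LongType, StringType

db_schema = StructType([
    StructField("db_id", LongType(), True),
    StructField("db_name", StringType(), True),
    StructField("db_type", StringType(), True),
])

# An explicit schema avoids the extra pass over the data for inference
df_json_typed = spark.read.schema(db_schema).json("file:///path_to_file/json_tutorial_file.json")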
# A file that spreads one JSON record across several lines needs the multiLine option
json_df = spark.read.option("multiLine", True).json("file:///path_to_file/json_multiline_file.json")
json_df.show()
+-----+--------+-------+
|db_id| db_name|db_type|
+-----+--------+-------+
| 14|Teradata| RDBMS|
+-----+--------+-------+
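For reference, a multiline JSON file holds a record spread over several lines. Based on the output above, the file read here would look roughly like this:

{
    "db_id": 14,
    "db_name": "Teradata",
    "db_type": "RDBMS"
}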
# Read a single Parquet file
df_pq = spark.read.parquet("file:///path_to_file/parquet_tutorial_file.parquet")
# Or read every Parquet file directly under a directory, ignoring other file types
df_pq = spark.read.option("pathGlobFilter", "*.parquet").parquet("file:///path_to_directory")
df_pq.show()
+-----+---------+-------+
|db_id| db_name|db_type|
+-----+---------+-------+
| 12| Teradata| RDBMS|
| 14|Snowflake|CloudDB|
| 15| Vertica| RDBMS|
| 17| Oracle| RDBMS|
| 19| MongoDB| NOSQL|
+-----+---------+-------+
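pathGlobFilter only matches files directly under the given directory. Since the parquet reader accepts any number of paths, you can also enumerate files yourself; a sketch with hypothetical file names:

# Pass several files (or directories) in a single call
df_pq = spark.read.parquet(
    "file:///path_to_directory/part-0001.parquet",
    "file:///path_to_directory/part-0002.parquet",
)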
# recursiveFileLookup also picks up matching files in subdirectories
df_pq = spark.read.option("pathGlobFilter", "*.parquet").option("recursiveFileLookup", "true").parquet("file:///path_to_directory")
df_pq.show()
+-----+---------+-------+
|db_id| db_name|db_type|
+-----+---------+-------+
| 12| Teradata| RDBMS|
| 14|Snowflake|CloudDB|
| 15| Vertica| RDBMS|
| 17| Oracle| RDBMS|
| 19| MongoDB| NOSQL|
+-----+---------+-------+
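If the Parquet files under a directory were written with slightly different schemas (for example, a column added over time), the mergeSchema option tells Spark to reconcile them instead of taking the schema from a single file; a sketch:

# Merge the schemas of all discovered Parquet files into one DataFrame schema
df_pq = spark.read.option("mergeSchema", "true").parquet("file:///path_to_directory")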
df_orc = spark.read.orc("file:///path_to_file/orc_tutorial_file.orc")
df_orc.show()
+-----+---------+-------+
|db_id| db_name|db_type|
+-----+---------+-------+
| 12| Teradata| RDBMS|
| 14|Snowflake|CloudDB|
| 15| Vertica| RDBMS|
| 17| Oracle| RDBMS|
| 19| MongoDB| NOSQL|
+-----+---------+-------+
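The shorthand readers used so far (json, parquet, orc) are convenience wrappers around the generic format(...).load(...) form, which the Avro example below relies on. For instance, this is equivalent to the ORC read above:

# Same result as spark.read.orc(...)
df_orc = spark.read.format("orc").load("file:///path_to_file/orc_tutorial_file.orc")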
# Avro has no shorthand reader; use the generic form (requires the spark-avro package, see below)
df_avro = spark.read.format("avro").load("file:///path_to_file/avro_tutorial_file.avro")
df_avro.show()
+-----+---------+-------+
|db_id| db_name|db_type|
+-----+---------+-------+
| 12| Teradata| RDBMS|
| 14|Snowflake|CloudDB|
| 15| Vertica| RDBMS|
| 17| Oracle| RDBMS|
| 19| MongoDB| NOSQL|
+-----+---------+-------+
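Unlike JSON, Parquet, and ORC, Avro support is not bundled into Spark by default; the external spark-avro module has to be supplied when the session is launched, roughly like this (match the Scala and Spark versions to your own build, 2.12 and 3.5.0 here are only placeholders):

pyspark --packages org.apache.spark:spark-avro_2.12:3.5.0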