PySpark: DB To Dataframe

This tutorial explains how to read data from various types of databases (such as MySQL, SingleStore, and Teradata) into a Spark DataFrame using a JDBC connection.


Read From MySQL: These are examples of reading from a MySQL database over a JDBC connection. The MySQL JDBC driver jar ("mysql-connector-java-8.0.11.jar") must be present in the Spark library path to read from MySQL; it can be downloaded from the MySQL website. The first example fetches full table data using the "dbtable" option, and the second runs a custom query using the "query" option.

Replace the following attributes in the examples below: hostname, port number, database name, table name, username, and password.
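A minimal sketch of both reads, assuming the Connector/J 8.x driver class com.mysql.cj.jdbc.Driver and MySQL's default port 3306; hostname, db_name, table_name, and the credentials are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MysqlReadExample").getOrCreate()

# Example 1: fetch the full table using the "dbtable" option
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://hostname:3306/db_name")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("dbtable", "table_name")
      .option("user", "username")
      .option("password", "password")
      .load())

# Example 2: fetch the result of a custom query using the "query" option
# ("query" is available in Spark 2.4+ and cannot be set together with "dbtable")
df_query = (spark.read.format("jdbc")
            .option("url", "jdbc:mysql://hostname:3306/db_name")
            .option("driver", "com.mysql.cj.jdbc.Driver")
            .option("query", "SELECT id, name FROM table_name WHERE id > 100")
            .option("user", "username")
            .option("password", "password")
            .load())

df.show(5)

The driver jar can also be supplied at submit time, e.g. spark-submit --jars mysql-connector-java-8.0.11.jar, instead of being copied into the Spark library path.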
Read From SingleStore: These are examples of reading from a SingleStore (formerly MemSQL) database over a JDBC connection. The first example fetches full table data using the "dbtable" option, and the second runs a custom query using the "query" option.

Replace the following attributes in the examples below: hostname, port number, database name, table name, username, and password.
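A minimal sketch, assuming the cluster is reached through SingleStore's MySQL wire-protocol compatibility, so the same MySQL JDBC driver works (the dedicated SingleStore JDBC driver can be substituted if it is installed); hostname, port 3306, db_name, table_name, and the credentials are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SingleStoreReadExample").getOrCreate()

# Example 1: fetch the full table using the "dbtable" option
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://hostname:3306/db_name")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("dbtable", "table_name")
      .option("user", "username")
      .option("password", "password")
      .load())

# Example 2: fetch the result of a custom query using the "query" option
df_query = (spark.read.format("jdbc")
            .option("url", "jdbc:mysql://hostname:3306/db_name")
            .option("driver", "com.mysql.cj.jdbc.Driver")
            .option("query", "SELECT id, name FROM table_name WHERE id > 100")
            .option("user", "username")
            .option("password", "password")
            .load())

df.show(5)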
Read From Teradata: These are examples of reading from a Teradata database over a JDBC connection. The Teradata JDBC driver jar ("terajdbc4.jar") must be present in the Spark library path to read from Teradata; it can be downloaded from the Teradata website. The first example fetches full table data using the "dbtable" option, and the second runs a custom query using the "query" option.

Replace the following attributes in the examples below: hostname, database name, table name, username, and password.
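A minimal sketch, assuming the standard Teradata driver class com.teradata.jdbc.TeraDriver and the jdbc:teradata://hostname/DATABASE=db_name URL form; hostname, db_name, table_name, and the credentials are placeholders. Older driver versions may also require tdgssconfig.jar alongside terajdbc4.jar.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("TeradataReadExample").getOrCreate()

# Example 1: fetch the full table using the "dbtable" option
df = (spark.read.format("jdbc")
      .option("url", "jdbc:teradata://hostname/DATABASE=db_name")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "table_name")
      .option("user", "username")
      .option("password", "password")
      .load())

# Example 2: fetch the result of a custom query using the "query" option
df_query = (spark.read.format("jdbc")
            .option("url", "jdbc:teradata://hostname/DATABASE=db_name")
            .option("driver", "com.teradata.jdbc.TeraDriver")
            .option("query", "SELECT id, name FROM table_name WHERE id > 100")
            .option("user", "username")
            .option("password", "password")
            .load())

df.show(5)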
Read From Hive: This section explains how to read data from a Hive table. If Spark is configured with a Hive setup (Hive metastore), the table() function can be used to read data from Hive. Full table data is fetched using table() in the first example, and a custom query is run using sql() in the second example.
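A minimal sketch, assuming Hive support is enabled on the session and that db_name.table_name stands in for an existing Hive table.

from pyspark.sql import SparkSession

# enableHiveSupport() requires a working Hive setup (metastore) for Spark
spark = (SparkSession.builder
         .appName("HiveReadExample")
         .enableHiveSupport()
         .getOrCreate())

# Example 1: read the full table using table()
df = spark.table("db_name.table_name")

# Example 2: run a custom query using sql()
df_query = spark.sql("SELECT id, name FROM db_name.table_name WHERE id > 100")

df.show(5)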