PySpark: Dataframe Write Modes

Pradeep

This tutorial explains how the mode() function or the mode parameter can be used to alter the behavior of a write operation when the data (directory) or table already exists.

The mode() function can be used with a dataframe write operation for any file format or database.
Both option() and mode() functions alter the behavior of a write operation, but in different ways: mode() decides what happens when the target already exists, while option() sets format-specific write settings. Visit the Dataframe Options page to understand how the option/options functions can be used to alter write behavior.

The mode() function accepts six possible values: append, overwrite, error, errorifexists, ignore and default. These map to four distinct behaviors, because error, errorifexists and default are aliases.

Mode value	Description
append	Append content of the dataframe to existing data or table.
overwrite	Overwrite existing data with the content of dataframe.
ignore	Skip the current write operation, without raising any error, if data or table already exists.
error / errorifexists / default	Throw an exception if data or table already exists.

By default, when neither the mode() function nor the mode parameter is used during a write operation, DataFrameWriter throws an exception if the data already exists.
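The table above can be read as a small decision procedure. The sketch below models it in plain Python, without Spark: `normalize_mode` and `simulate_write` are hypothetical helper names, a target that does not exist is represented by `None`, and rows are plain list items. It is only an illustration of the rules, not how Spark implements them.

```python
# Plain-Python model of the four write behaviors (illustration only, not Spark code).
ALIASES = {
    "append": "append",
    "overwrite": "overwrite",
    "ignore": "ignore",
    # error, errorifexists and default all mean the same thing
    "error": "errorifexists",
    "errorifexists": "errorifexists",
    "default": "errorifexists",
}


def normalize_mode(mode):
    """Map any of the six accepted values to one of four behaviors."""
    if mode not in ALIASES:
        raise ValueError(f"Unknown save mode: {mode!r}")
    return ALIASES[mode]


def simulate_write(existing, new_rows, mode="default"):
    """Return the rows stored at the target after the write.

    existing is None when the target does not exist yet.
    """
    behavior = normalize_mode(mode)
    if existing is None:
        # Every mode behaves like a normal write when nothing exists.
        return list(new_rows)
    if behavior == "append":
        return list(existing) + list(new_rows)
    if behavior == "overwrite":
        return list(new_rows)
    if behavior == "ignore":
        return list(existing)
    raise FileExistsError("target already exists")
```

For example, `simulate_write(["a"], ["b"], "append")` keeps both rows, while the same call with `"ignore"` leaves only the existing row.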

Example when directory / file exists


df.write.mode("error").json("file:///path_to_directory/json_dir")

Error: pyspark.sql.utils.AnalysisException: path file:/path_to_directory/json_dir already exists.;

Example when database table exists (no mode is specified, so the default applies)


df.write.format("jdbc").options(driver="com.mysql.cj.jdbc.Driver",user="tutorial_user",password="user_password",url="jdbc:mysql://mysql.dbmstutorials.com:3306?serverTimezone=UTC&useSSL=false",dbtable="tutorial_db.spark_department").save()


Error: pyspark.sql.utils.AnalysisException: Table or view 'tutorial_db.spark_department' already exists. SaveMode: ErrorIfExists.;

The following topics are covered on this page:

➠ Overwrite Existing Data: When overwrite mode is used, the write operation replaces the existing data (directory) or table with the content of the dataframe. If the data/table does not exist, a write with overwrite mode behaves like a normal write. The examples below use CSV files, JSON files and a MySQL table, but the mode can be used with any file format or table.

Example 1: Overwrite CSV data using mode() function.


df.write.mode("overwrite").csv("file:///path_to_directory/csv_without_header")

Example 2: Overwrite CSV data using mode parameter.


df.write.csv("file:///path_to_directory/csv_without_header",mode="overwrite")

Example 3: Overwrite JSON data using mode() function.


df.write.mode("overwrite").json("file:///path_to_directory/json_dir")

Example 4: Overwrite JSON data using mode parameter.


df.write.json("file:///path_to_directory/json_dir",mode="overwrite")

Example 5: Overwrite MySQL table data using mode() function.


df.write.mode("overwrite").format("jdbc").options(driver="com.mysql.cj.jdbc.Driver",user="tutorial_user",password="user_password",url="jdbc:mysql://mysql.dbmstutorials.com:3306?serverTimezone=UTC&useSSL=false",dbtable="tutorial_db.spark_department").save()
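The JDBC examples on this page repeat the same connection options and differ only in the mode. One way to keep them readable is to build the options once and pass them with `**`. This is a sketch only: `mysql_options` and `write_department` are hypothetical helper names, and the option values are copied from the example above.

```python
def mysql_options(dbtable):
    """Connection options from the tutorial example, with the table name as a parameter."""
    return {
        "driver": "com.mysql.cj.jdbc.Driver",
        "user": "tutorial_user",
        "password": "user_password",
        "url": "jdbc:mysql://mysql.dbmstutorials.com:3306?serverTimezone=UTC&useSSL=false",
        "dbtable": dbtable,
    }


def write_department(df, mode):
    """Write df to tutorial_db.spark_department using the given write mode."""
    options = mysql_options("tutorial_db.spark_department")
    df.write.mode(mode).format("jdbc").options(**options).save()
```

With these helpers, `write_department(df, "overwrite")` issues the same write as Example 5, and passing "append" or "ignore" changes only the mode.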

➠ Append to Existing Data: When append mode is used, the write operation adds the content of the dataframe to the existing data directory or table. If the data/table does not exist, a write with append mode behaves like a normal write. The examples below use CSV files, JSON files and a MySQL table, but the mode can be used with any file format or table.

Example 1: Append CSV data using mode() function.


df.write.mode("append").csv("file:///path_to_directory/csv_without_header")

Example 2: Append CSV data using mode parameter.


df.write.csv("file:///path_to_directory/csv_without_header",mode="append")

Example 3: Append JSON data using mode() function.


df.write.mode("append").json("file:///path_to_directory/json_dir")

Example 4: Append JSON data using mode parameter.


df.write.json("file:///path_to_directory/json_dir",mode="append")

Example 5: Append data to an existing MySQL table using mode() function.


df.write.mode("append").format("jdbc").options(driver="com.mysql.cj.jdbc.Driver",user="tutorial_user",password="user_password",url="jdbc:mysql://mysql.dbmstutorials.com:3306?serverTimezone=UTC&useSSL=false",dbtable="tutorial_db.spark_department").save()

➠ Ignore Write Operation if data exists: When ignore mode is used, the write operation is skipped if the data/table already exists, without throwing an error or exception. If the data/table does not exist, a write with ignore mode behaves like a normal write. The examples below use CSV files, JSON files and a MySQL table, but the mode can be used with any file format or table.

Example 1: Ignore write operation on CSV data using mode() function.


df.write.mode("ignore").csv("file:///path_to_directory/csv_without_header")

Example 2: Ignore write operation on CSV data using mode parameter.


df.write.csv("file:///path_to_directory/csv_without_header",mode="ignore")

Example 3: Ignore write operation on JSON data using mode() function.


df.write.mode("ignore").json("file:///path_to_directory/json_dir")

Example 4: Ignore write operation on JSON data using mode parameter.


df.write.json("file:///path_to_directory/json_dir",mode="ignore")

Example 5: Ignore write operation on a MySQL table using mode() function. Data is written only if the table does not exist; if the table exists, nothing is written.


df.write.mode("ignore").format("jdbc").options(driver="com.mysql.cj.jdbc.Driver",user="tutorial_user",password="user_password",url="jdbc:mysql://mysql.dbmstutorials.com:3306?serverTimezone=UTC&useSSL=false",dbtable="tutorial_db.spark_department").save()

➠ Throw Error in Write Operation: When error mode is used, the write operation throws an exception if the data or table already exists. If the data/table does not exist, a write with error mode behaves like a normal write. The values error, errorifexists and default are aliases of one another. The examples below use CSV and JSON files only, but the mode can be used with any file format or table.

Example 1: Throw write exception for CSV data using mode() function.


df.write.mode("error").csv("file:///path_to_directory/csv_without_header")

Example 2: Throw write exception for CSV data using mode parameter.


df.write.csv("file:///path_to_directory/csv_without_header",mode="error")

Example 3: Throw write exception for JSON data using mode() function.


df.write.mode("error").json("file:///path_to_directory/json_dir")

Example 4: Throw write exception for JSON data using mode parameter.


df.write.json("file:///path_to_directory/json_dir",mode="error")