This tutorial explains how the mode() function or the mode parameter can be used to alter the behavior of a write operation when the target data (directory) or table already exists.
Mode value | Description |
append | Append the content of the dataframe to the existing data or table. |
overwrite | Overwrite the existing data with the content of the dataframe. |
ignore | Skip the current write operation, without raising any error, if the data or table already exists. |
error, errorifexists, default | Throw an exception if the data or table already exists. |
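The mode values listed above can be read as a small decision table. The sketch below is only a plain-Python model of that table, not Spark's implementation; the function name `plan_write` and the action labels it returns are hypothetical and chosen purely for illustration. It assumes that when the target does not exist yet, every mode simply creates it.

```python
# Plain-Python model of the save-mode table (illustrative only, not Spark code).
# plan_write and its return labels are hypothetical names.

VALID_MODES = ("append", "overwrite", "ignore", "error", "errorifexists", "default")


def plan_write(mode: str, target_exists: bool) -> str:
    """Return what a write with the given mode is expected to do.

    Possible results: 'create', 'append', 'replace', 'skip', 'fail'.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown save mode: {mode!r}")
    if not target_exists:
        # Assumption: with no existing data or table, all modes just write.
        return "create"
    if mode == "append":
        return "append"    # new rows are added next to the existing data
    if mode == "overwrite":
        return "replace"   # existing data is replaced
    if mode == "ignore":
        return "skip"      # nothing is written and no error is raised
    return "fail"          # error / errorifexists / default raise an exception


# Print the expected action for each mode when the target already exists.
for mode in VALID_MODES:
    print(mode, "->", plan_write(mode, target_exists=True))
```

The three names `error`, `errorifexists` and `default` fall through to the same branch, which mirrors the last row of the table.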
df.write.mode("error").json("file:///path_to_directory/json_dir")
Error: pyspark.sql.utils.AnalysisException: path file:/path_to_directory/json_dir already exists.;
df.write.format("jdbc").options(driver="com.mysql.cj.jdbc.Driver",user="tutorial_user",password="user_password",url="jdbc:mysql://mysql.dbmstutorials.com:3306?serverTimezone=UTC&useSSL=false",dbtable="tutorial_db.spark_department").save()
Error: pyspark.sql.utils.AnalysisException: Table or view 'tutorial_db.spark_department' already exists. SaveMode: ErrorIfExists.;
df.write.mode("overwrite").csv("file:///path_to_directory/csv_without_header")
df.write.csv("file:///path_to_directory/csv_without_header",mode="overwrite")
df.write.mode("overwrite").json("file:///path_to_directory/json_dir")
df.write.json("file:///path_to_directory/json_dir",mode="overwrite")
df.write.mode("overwrite").format("jdbc").options(driver="com.mysql.cj.jdbc.Driver",user="tutorial_user",password="user_password",url="jdbc:mysql://mysql.dbmstutorials.com:3306?serverTimezone=UTC&useSSL=false",dbtable="tutorial_db.spark_department").save()
df.write.mode("append").csv("file:///path_to_directory/csv_without_header")
df.write.csv("file:///path_to_directory/csv_without_header",mode="append")
df.write.mode("append").json("file:///path_to_directory/json_dir")
df.write.json("file:///path_to_directory/json_dir",mode="append")
df.write.mode("append").format("jdbc").options(driver="com.mysql.cj.jdbc.Driver",user="tutorial_user",password="user_password",url="jdbc:mysql://mysql.dbmstutorials.com:3306?serverTimezone=UTC&useSSL=false",dbtable="tutorial_db.spark_department").save()
df.write.mode("ignore").csv("file:///path_to_directory/csv_without_header")
df.write.csv("file:///path_to_directory/csv_without_header",mode="ignore")
df.write.mode("ignore").json("file:///path_to_directory/json_dir")
df.write.json("file:///path_to_directory/json_dir",mode="ignore")
df.write.mode("ignore").format("jdbc").options(driver="com.mysql.cj.jdbc.Driver",user="tutorial_user",password="user_password",url="jdbc:mysql://mysql.dbmstutorials.com:3306?serverTimezone=UTC&useSSL=false",dbtable="tutorial_db.spark_department").save()
df.write.mode("error").csv("file:///path_to_directory/csv_without_header")
df.write.csv("file:///path_to_directory/csv_without_header",mode="error")
df.write.mode("error").json("file:///path_to_directory/json_dir")
df.write.json("file:///path_to_directory/json_dir",mode="error")
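When the mode comes from a configuration file or a command-line argument, it may be worth validating it before the write is triggered. The following is a minimal sketch under that assumption: `save_json` and `ALLOWED_MODES` are hypothetical names, not part of PySpark, and the helper relies only on the `df.write.mode(...).json(...)` chain used in the examples above. It accepts any object exposing that writer interface, so the snippet itself does not start a Spark session.

```python
# Hypothetical helper (not part of PySpark): validate the mode string,
# then delegate to the dataframe writer chain df.write.mode(...).json(...).

ALLOWED_MODES = frozenset(
    {"append", "overwrite", "ignore", "error", "errorifexists", "default"}
)


def save_json(df, path: str, mode: str = "errorifexists") -> None:
    """Write df as JSON to path using the given save mode.

    Raises ValueError for a mode that is not in ALLOWED_MODES, before
    anything is written.
    """
    if mode not in ALLOWED_MODES:
        raise ValueError(
            f"unsupported mode {mode!r}; expected one of {sorted(ALLOWED_MODES)}"
        )
    df.write.mode(mode).json(path)


# Example call, assuming df is an existing dataframe:
# save_json(df, "file:///path_to_directory/json_dir", mode="append")
```

Keeping `errorifexists` as the helper's default matches the behavior of a write issued without any mode, so an existing directory is never replaced by accident.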