
PySpark: Dataframe Rename Columns

This tutorial explains several approaches, with examples, for renaming an existing column in a PySpark dataframe. The following topics are covered on this page:


List all Columns: The columns attribute of a dataframe returns all the column names as a list.
Rename Column using withColumnRenamed: The withColumnRenamed() function renames an existing column of a dataframe. If the dataframe schema does not contain the given column, the call does not fail and returns the dataframe unchanged.
Rename Column using select: The select function can also rename an existing column. The only downside is that the user has to specify, in the select, every column required in the final output (the full list of dataframe columns is available from df.columns).
Rename all columns of a dataframe (Prefix): A Python list comprehension is used along with the "col" function to rename all the columns of a dataframe by adding a prefix to each name. This is particularly helpful for avoiding name conflicts after joining two dataframes that have common column names.
Rename all columns of a dataframe (Suffix): A Python list comprehension is used along with the "col" function to rename all the columns of a dataframe by adding a suffix to each name. This is particularly helpful for avoiding name conflicts after joining two dataframes that have common column names.