PySpark: Dataframe Drop Columns

This tutorial explains several ways to drop one or more existing columns from a dataframe, with examples. The following topics are covered on this page:


Drop Column using drop function: The drop() function removes one or more existing columns from a dataframe. If a given column name is not in the dataframe schema, drop() does not fail; it returns a dataframe with the schema unchanged.
Drop Column using select: The select() function can also be used to drop columns indirectly. List only the columns required in the final output (the full list of column names is available through df.columns) and leave out the columns that need to be dropped.
Drop Column(s) after join: A join often leaves duplicate columns (columns with the same name from both sides) that need to be dropped. They can be removed using either of the two approaches above.
Drop Column(s) inplace: Dataframes are immutable, so dropping a column always returns a new dataframe. Assigning the result back to the same dataframe variable makes it look like an in-place operation.