PySpark: Dataframe Basic Operations

This tutorial explains some common operations that can be performed on a dataframe, such as checking the row count and restricting the number of rows. The following dataframe functions are explained with examples:


Count: The count() function returns the number of rows present in a dataframe.
isEmpty: Some operations should run only when a dataframe contains data, so it is often necessary to check whether a dataframe is empty. This can be determined using the count() function or the rdd's isEmpty() function.
Limit: The limit() function restricts the number of rows in a dataframe. It takes a number as its parameter and returns a dataframe with at most that many rows.
PrintSchema: The printSchema() function prints the schema of a dataframe to the console in a tree format.