
PySpark: Dataframe Preview (Part 2)

This tutorial will explain how to get 'n' rows from a Spark dataframe into a Python list, which can then be used to preview the data. The dataframe functions listed below are explained with examples; click on a function name in the list to jump to its respective section. A combined usage sketch for all five functions follows the list.


Head: head() can be called on a dataframe to return either the first row or the first 'n' rows as a list of Row objects. Use it only for a small number of records, because everything returned by head() is held in the driver's memory, and the driver process can crash with an OutOfMemoryError if the data volume is very high.
Tail: tail() returns the last 'n' rows of a dataframe as a list of Row objects. As with head(), request only a small number of records, since the returned data is held in the driver's memory and can crash the driver with an OutOfMemoryError when the data volume is very high.
First: Similar to head() with no argument, first() returns the first row of a dataframe.
Take: Similar to head(n), take() returns the first 'n' rows of a dataframe as a list of Row objects. Request only a small number of records, since the returned data is held in the driver's memory and can crash the driver with an OutOfMemoryError when the data volume is very high.
Collect: collect() returns all records of a dataframe as a list of Row objects. Because it brings back every row, combine it with limit() so that only a small number of records is returned; otherwise the data held in the driver's memory can crash the driver process with an OutOfMemoryError when the data volume is very high.
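The snippet below is a minimal sketch of how these five functions behave together. It assumes a local SparkSession and a small, made-up dataframe (the "id" and "name" columns and their values are purely illustrative); the sections that follow cover each function in more detail.

from pyspark.sql import SparkSession

# Hypothetical SparkSession and sample data, used only for illustration.
spark = SparkSession.builder.appName("preview-demo").getOrCreate()

df = spark.createDataFrame(
    [(1, "alpha"), (2, "beta"), (3, "gamma"), (4, "delta"), (5, "epsilon")],
    ["id", "name"],
)

# head(): first row when called without an argument, or the first n rows as a list of Row objects.
first_row = df.head()        # Row(id=1, name='alpha')
top_three = df.head(3)       # [Row(id=1, ...), Row(id=2, ...), Row(id=3, ...)]

# tail(): last n rows as a list of Row objects (available in Spark 3.0 and later).
bottom_two = df.tail(2)

# first(): the first row, equivalent to head() with no argument.
also_first = df.first()

# take(): first n rows as a list of Row objects, equivalent to head(n).
top_two = df.take(2)

# collect(): every row of the dataframe; pair it with limit() to cap what reaches the driver.
sample_rows = df.limit(3).collect()

# Each call above returns an ordinary Python list of Row objects (or a single Row),
# so the results can be previewed directly on the driver.
for row in sample_rows:
    print(row["id"], row["name"])

Note that all of these functions pull data to the driver process, which is exactly why the warnings in the list above apply: keep 'n' small, and wrap collect() with limit() when the dataframe is large.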