PySpark Dataframe Sampling

This tutorial explains how to use the sampling functions available in PySpark to extract a subset of rows from a dataframe. PySpark provides more than one dataframe function for data sampling; each is listed below with a short description.

Users can visit this page if they only want to preview the data.




Sample: the sample function draws a random sample of rows from a dataframe.

SampleBy: the sampleBy function draws a stratified sample, applying a separate sampling fraction to each value of a chosen column.