PySpark: Dataframe Split

This tutorial explains how to split a PySpark DataFrame into n smaller DataFrames with randomSplit(). The approximate share of rows that goes to each DataFrame is set by the list passed to the 'weights' parameter, and an optional 'seed' parameter makes the split repeatable.


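Example 1: Two DataFrames are returned in this example because two decimal values are passed in the list for the 'weights' parameter. The weights are approximate, so the row counts will usually be close to, but not exactly, the requested proportions. Since the 'seed' parameter is not used, each run may produce different split DataFrames.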
Example 2: With the same 'seed' value, Spark returns the same split DataFrames on every run, provided the input DataFrame and its partitioning are unchanged.