PySpark: Dataframe Partitions Part 1

This tutorial explains, with examples, how to partition a dataframe randomly or based on specified column(s) of the dataframe.


getNumPartitions: RDD function getNumPartitions can be used to get the number of partitions in a dataframe.
spark_partition_id: Column function spark_partition_id can be used to get the partition id to which each row of a dataframe belongs.
Repartition Function: repartition() function can be used to increase or decrease the number of partitions. The target dataframe after applying the repartition function is hash partitioned.