This tutorial explains the ranking window functions, which can be used to meet a variety of analytical requirements.

PySpark: Dataframe Analytical Functions Part 2


Window functions/attributes: These are the most important part of ordered analytical functions and should be understood properly in order to use them effectively.



row_number() function: This function assigns a consecutive number, starting at 1, to each row within a partition or sub-partition, in the order defined by the window's ordering columns.
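
The numbering can be illustrated without a Spark session. The plain-Python sketch below only mimics what row_number produces. The sample rows, the meaning of the columns (department, salary) and the helper name row_numbers are assumptions made for this example. The PySpark expression in the comment is the usual way to request the same numbering on a DataFrame.

```python
from collections import defaultdict

# PySpark form of the same request (not executed here; df, "dept" and
# "salary" are illustrative names):
#   from pyspark.sql import Window, functions as F
#   w = Window.partitionBy("dept").orderBy(F.desc("salary"))
#   df.withColumn("row_num", F.row_number().over(w))

def row_numbers(rows):
    """Number (dept, salary) rows 1, 2, 3... per dept, highest salary first."""
    by_dept = defaultdict(list)
    for dept, salary in rows:
        by_dept[dept].append(salary)
    numbered = []
    for dept in sorted(by_dept):
        ordered = sorted(by_dept[dept], reverse=True)
        for position, salary in enumerate(ordered, start=1):
            numbered.append((dept, salary, position))
    return numbered

# Hypothetical sample data.
sample = [("HR", 300), ("IT", 500), ("HR", 400), ("IT", 500), ("IT", 200)]
result = row_numbers(sample)
for row in result:
    print(row)
```

Note that the two IT rows with salary 500 still receive different numbers (1 and 2). In Spark, which of two tied rows comes first is not guaranteed unless the ordering columns break the tie.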

rank() function: This function assigns a rank to each row within a partition or sub-partition. Rows with the same ordering value receive the same rank, and the ranks that would have followed are skipped: if two rows tie at rank 1, the next row gets rank 3.
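
The skipped rank is easiest to see on a small list. The sketch below is a plain-Python illustration of this kind of ranking for one partition that is already sorted. The salary values and the helper name rank_values are assumptions for the example, and the commented PySpark expression uses illustrative column names.

```python
# PySpark form (not executed here; column names are illustrative):
#   F.rank().over(Window.partitionBy("dept").orderBy(F.desc("salary")))

def rank_values(sorted_values):
    """Ties share a rank; the ranks after a tie are skipped."""
    ranks = []
    for position, value in enumerate(sorted_values, start=1):
        if position > 1 and value == sorted_values[position - 2]:
            ranks.append(ranks[-1])   # tie: reuse the previous row's rank
        else:
            ranks.append(position)    # new value: rank equals the row position
    return ranks

salaries = [500, 500, 200, 100]       # hypothetical, already ordered descending
ranks = rank_values(salaries)
print(ranks)                          # [1, 1, 3, 4]
```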


dense_rank() function: This function assigns a rank to each row within a partition or sub-partition. Unlike rank(), it leaves no gaps after ties: if two rows tie at rank 1, the next row gets rank 2. This is also sometimes called class rank.
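
For comparison, here is a plain-Python sketch of gap-free ranking over one sorted partition. The input values and the helper name dense_rank_values are assumptions for the example; the comment shows the corresponding PySpark expression with illustrative column names.

```python
# PySpark form (not executed here; column names are illustrative):
#   F.dense_rank().over(Window.partitionBy("dept").orderBy(F.desc("salary")))

def dense_rank_values(sorted_values):
    """Ties share a rank; the next distinct value gets the next integer."""
    ranks = []
    for index, value in enumerate(sorted_values):
        if index == 0:
            ranks.append(1)
        elif value == sorted_values[index - 1]:
            ranks.append(ranks[-1])       # tie: same rank as the previous row
        else:
            ranks.append(ranks[-1] + 1)   # new value: no number is skipped
    return ranks

dense = dense_rank_values([500, 500, 200, 100])   # hypothetical salaries
print(dense)                                      # [1, 1, 2, 3]
```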


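percent_rank() function: This function returns the relative rank of a row within a partition or sub-partition as a fraction between 0 and 1, computed as (rank - 1) / (number of rows in the partition - 1). It is often loosely called a percentile. Because it is derived from rank(), rows with the same value share the same percent rank, and the values that would have followed are skipped.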
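
A worked example may help here. The sketch below assumes the usual definition of percent rank, (rank - 1) / (n - 1), where rank is the gap-leaving rank and n is the number of rows in the partition. The input values and the helper name percent_rank_values are made up for the illustration.

```python
# PySpark form (not executed here; column names are illustrative):
#   F.percent_rank().over(Window.partitionBy("dept").orderBy("salary"))

def percent_rank_values(sorted_values):
    """Return (rank - 1) / (n - 1) for each row of one sorted partition."""
    n = len(sorted_values)
    result = []
    rank = 0
    for position, value in enumerate(sorted_values, start=1):
        # A new value takes its row position as rank; a tie keeps the old rank.
        if position == 1 or value != sorted_values[position - 2]:
            rank = position
        # A single-row partition has no other rows to compare with: use 0.0.
        result.append((rank - 1) / (n - 1) if n > 1 else 0.0)
    return result

pr = percent_rank_values([100, 200, 200, 300, 400])   # hypothetical salaries
print(pr)                                             # [0.0, 0.25, 0.25, 0.75, 1.0]
```

The two rows with value 200 share 0.25, and 0.5 never appears because the rank that would have produced it was skipped.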


ntile() function: ntile divides the ordered rows of a partition or sub-partition into 'n' buckets, numbered 1 to 'n', of as equal a size as possible. For example, if 'n' is 2, the first half of the rows will get value 1 and the second half will get 2.
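
The bucket assignment can be sketched in plain Python. This illustration assumes the common convention, which Spark's ntile is understood to follow, that when the rows do not divide evenly the leftover rows go to the first buckets. The helper name ntile_values and the row counts are hypothetical.

```python
# PySpark form (not executed here; column names are illustrative):
#   F.ntile(2).over(Window.partitionBy("dept").orderBy("salary"))

def ntile_values(row_count, n):
    """Bucket number (1..n) for each of row_count ordered rows."""
    base, extra = divmod(row_count, n)
    buckets = []
    for bucket in range(1, n + 1):
        # Assumption: the first `extra` buckets each take one additional row.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets

even = ntile_values(6, 2)
uneven = ntile_values(7, 3)
print(even)     # [1, 1, 1, 2, 2, 2]
print(uneven)   # [1, 1, 1, 2, 2, 3, 3]
```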