This tutorial explains ordered analytical window aggregate functions, which can be used to meet a variety of analytical requirements.

PySpark: Dataframe Analytical Functions Part 1


Window functions/attributes: These are the most important part of ordered analytical functions and should be understood properly in order to use the functions effectively.



count function(): The count function returns the number of records in each group. When it is given a column, only the non-null values in that column are counted.

sum function(): The sum function calculates the sum of the column passed to it for each group. It can be applied only to numeric columns.


avg function(): The avg function calculates the average (mean) of the column passed to it for each group. It can be applied only to numeric columns.


min function(): The min function returns the minimum value of the column passed to it within each group.


max function(): The max function returns the maximum value of the column passed to it within each group.