This tutorial explains additional window (analytical) functions that can be used to meet a range of analytical requirements.

PySpark: DataFrame Analytical Functions Part 3


Window functions/attributes: These are the most important part of ordered analytical functions and should be understood properly in order to use them effectively.



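lag function(): This function returns the value from a previous row within the partition or sub-partition, based on the window ordering. By default it looks one row back, and it returns null when no such row exists unless a default value is supplied.

lead function(): This function returns the value from a following row within the partition or sub-partition, based on the window ordering. By default it looks one row ahead, and it returns null when no such row exists unless a default value is supplied.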
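To make the lag and lead semantics concrete, here is a minimal plain-Python sketch that models them on a small in-memory list, so it runs without a Spark session. The sample rows, the column names (dept, name, salary) and the helper with_lag_lead are hypothetical and exist only for illustration. The PySpark calls it is meant to mirror are shown in the comments.

```python
from itertools import groupby
from operator import itemgetter

# PySpark equivalent (not executed here), assuming a DataFrame df with
# columns dept, name, salary and `from pyspark.sql import Window, functions as F`:
#   w = Window.partitionBy("dept").orderBy("salary")
#   df.withColumn("prev_salary", F.lag("salary", 1).over(w)) \
#     .withColumn("next_salary", F.lead("salary", 1).over(w))

# Hypothetical sample rows: (dept, name, salary).
rows = [
    ("sales", "ann", 300),
    ("sales", "bob", 100),
    ("sales", "cat", 200),
    ("hr", "dan", 150),
    ("hr", "eve", 250),
]

def with_lag_lead(rows, offset=1, default=None):
    """Model lag/lead over a window partitioned by dept and ordered by salary."""
    out = []
    # Sort by the partition key first, then by the ordering column.
    ordered = sorted(rows, key=itemgetter(0, 2))
    for dept, group in groupby(ordered, key=itemgetter(0)):
        part = list(group)
        for i, (_, name, salary) in enumerate(part):
            prev_i, next_i = i - offset, i + offset
            # Rows outside the partition yield the default (None, like null).
            lag = part[prev_i][2] if 0 <= prev_i < len(part) else default
            lead = part[next_i][2] if 0 <= next_i < len(part) else default
            out.append((dept, name, salary, lag, lead))
    return out

result = with_lag_lead(rows)
for row in result:
    print(row)
```

Note that the first row of each partition has no lag value and the last row has no lead value, because neither function looks across partition boundaries.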


first function(): This function returns the value of the expression column for the first row within each partition. It is equivalent to first_value() in SQL.


last function(): This function returns the value of the expression column for the last row within the window frame. When the window has an ordering, the default frame is "UNBOUNDED PRECEDING AND CURRENT ROW", so by default last() returns the current row's value of the expression column rather than the last value in the partition. To get the last value within the partition, the frame needs to be overridden so that it extends to UNBOUNDED FOLLOWING. It is equivalent to last_value() in SQL.
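The effect of the frame on first and last can be sketched in plain Python. This is a simplified model, not Spark itself: the partition values and the helper first_last are hypothetical, and ties in the ordering column are ignored. The comments show the PySpark frame override that the second call is meant to mirror.

```python
# PySpark equivalent of the overridden frame (not executed here), assuming
# a DataFrame df with columns dept and salary and
# `from pyspark.sql import Window, functions as F`:
#   w = (Window.partitionBy("dept").orderBy("salary")
#        .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing))
#   df.withColumn("last_salary", F.last("salary").over(w))

# Hypothetical salaries of one partition, already in window order.
partition = [100, 200, 300]

def first_last(values, end="current"):
    """Return (value, first, last) per row for a frame starting at the partition start.

    end="current"   models UNBOUNDED PRECEDING AND CURRENT ROW.
    end="unbounded" models UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
    """
    out = []
    for i, value in enumerate(values):
        # The frame is the slice of the partition visible from this row.
        frame = values[: i + 1] if end == "current" else values
        out.append((value, frame[0], frame[-1]))
    return out

default_frame = first_last(partition)
full_frame = first_last(partition, end="unbounded")
print(default_frame)
print(full_frame)
```

With the default frame the "last" column simply repeats each row's own value. Once the frame extends to the end of the partition, every row sees 300 as the last value. The "first" column is 100 in both cases, since both frames start at the beginning of the partition.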


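cume_dist function(): This function returns the cumulative distribution of a value within the partition, that is, the fraction of rows that are at or before the current row in the window ordering (rows tied with the current row are included).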
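As a rough illustration of how the cume_dist value is derived: for each row it is the number of rows at or before the current row in the window ordering (ties included) divided by the number of rows in the partition. The plain-Python helper and the sample salaries below are hypothetical. In PySpark the corresponding expression would be F.cume_dist().over(...) on an ordered window, as noted in the comments.

```python
# PySpark equivalent (not executed here), assuming a DataFrame df with
# columns dept and salary and `from pyspark.sql import Window, functions as F`:
#   w = Window.partitionBy("dept").orderBy("salary")
#   df.withColumn("cume_dist", F.cume_dist().over(w))

def cume_dist(values):
    """Fraction of rows whose ordering value is <= the current row's value."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]

# Hypothetical salaries of a single partition; 200 appears twice (a tie).
salaries = [100, 200, 200, 400]
dist = cume_dist(salaries)
print(dist)  # [0.25, 0.75, 0.75, 1.0]
```

Both rows with a salary of 200 get 0.75, because three of the four rows have a salary of 200 or less. The last row in the ordering always gets 1.0.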