This tutorial explains (with examples) how to format date and timestamp datatypes using the date_format function in PySpark.

PySpark: Dataframe Format Timestamp


The table below lists most of the metacharacters that can be used to build a format string for the date_format function. All pattern characters are case sensitive. Some of them also work with fewer pattern characters, but fixed-length patterns are listed here for simplicity.

| Metacharacters | Description / Functionality |
|---|---|
| yyyy | Year in four digits (example: 1987) |
| yy | Year in two digits (example: 87) |
| MM | Month of the year as a two-digit number (example: 12) |
| MMM | Month as a three-character abbreviation (example: Jun) |
| MMMM | Full month name (example: June) |
| dd | Day of the month in two digits (example: 30) |
| d | Day of the month without zero padding (example: 5) |
| DDD | Day of the year in three digits (example: 276) |
| EEE | Day of the week as a three-character abbreviation (example: Wed) |
| EEEE | Full name of the day of the week (example: Wednesday) |
| HH | Hour in two digits, 24-hour format (example: 17) |
| hh | Hour in two digits, 12-hour format (example: 11) |
| a | AM/PM marker (example: PM) |
| mm | Minutes in two digits (example: 59) |
| ss | Seconds in two digits (example: 58) |
| SSS | Milliseconds in three digits (example: 545) |
| SSSSSS | Microseconds in six digits (example: 545333) |
| SSSSSSSSS | Fraction of a second in nine digits; Spark timestamps carry microsecond precision, so the last three digits are zero-padded (example: 545333000) |
| VV | Time zone ID (example: Asia/Calcutta) |
| z | Time zone abbreviation (example: IST) |
| zzzz | Full time zone name (example: India Standard Time) |
| O | Localized offset from GMT (example: GMT-7) |
| Z | Offset from GMT in hours and minutes (example: -0700) |
| x | Offset from GMT in hours (example: -07) |



date_format(): The date_format function formats a date or timestamp column as a string according to a format pattern. It can be imported from the PySpark SQL functions library (pyspark.sql.functions).