This tutorial explains, with examples, how to convert strings into date and timestamp datatypes using the to_date / to_timestamp functions in PySpark.

PySpark: Dataframe String to Timestamp



The table below lists most of the metacharacters that can be used to build a format string for the to_date / to_timestamp functions. Note that all pattern characters are case sensitive. Some of them also work with fewer pattern characters, but fixed-length patterns are listed here for simplicity.

| Metacharacters | Description / Functionality |
| --- | --- |
| yyyy | Year in four digits (example: 1987) |
| yy | Year in two digits (example: 87) |
| MM | Month of the year as a two-digit number (example: 12) |
| MMM | Month as a three-character abbreviation (example: Jun) |
| MMMM | Full month name (example: June) |
| dd | Day of the month in two digits (example: 30) |
| d | Day of the month in one digit (example: 5) |
| DDD | Day of the year (example: 276) |
| HH | Hour of the day in two digits, 24-hour format (example: 17) |
| hh | Hour in two digits, 12-hour format (example: 11) |
| a | AM/PM marker, used together with the 12-hour format |
| mm | Minutes in two digits (example: 59) |
| ss | Seconds in two digits (example: 58) |
| SSS | Milliseconds in three digits (example: 545) |
| SSSSSS | Microseconds in six digits (example: 545333) |
| SSSSSSSSS | Nanoseconds in nine digits (example: 545333444) |
| VV | Time zone ID (e.g. Asia/Calcutta or +05:30) |
| z | Time zone abbreviation (e.g. IST or +05:30) |
| zzzz | Full time zone name (e.g. India Standard Time) |


Note: You can visit this page to get the timezone Ids for the required TimeZones.


to_date(): The to_date function converts date strings to the date datatype. It accepts an optional format string and can be imported from the PySpark SQL functions module (pyspark.sql.functions).


to_timestamp(): The to_timestamp function converts timestamp strings to the timestamp datatype. It accepts an optional format string and can be imported from the PySpark SQL functions module (pyspark.sql.functions).