Teradata Regular Expression Functions

Pradeep

Teradata Regular Expression Functions

There are multiple Teradata Regular Expression Functions which helps to perform variety of String related operation which are not possible using regular string functions. There are five regular expression functions available in Teradata, click on the function below to go directly to that function.

regexp_substr
regexp_replace
regexp_instr
regexp_similar
regexp_split_to_table

All regexp function has regular expression string as one of the input parameter which distinguishes them from normal functions. These regexp string can be created using metacharacters. Lets see some of available metacharacters used in regexp functions.

Regexp Metacharacters: Lets understand or revise regular expression metacharacters which will be helpful in creating regular expression string used in regexp functions. These metacharacters work same way in most of the modern programming languages.

Metacharacter	Meaning in word	Description (usefulness)	Example Pattern	Comment
.	Period	Wild character which matches any character except newline character	de.r	This can match any string with “de” followed by any character and then followed by “r”
^	caret	Metacharacter which matches start of the String	^de	This can match any string starting with “de” and followed by any character string
$	dollar	Metacharacter which matches end of the String	ar$	This can match any string ending with “ar” and preceded by any character string
[]	Square Bracket	1) This is called character class 2) Characters/pattern defined within these square bracket matches only single character. 3) Multiple classes can be defined one after other to match required string	[a-zA-Z] [0-9]	[a-zA-Z] can match any string with any alphabet in any case (lower or upper) [0-9] can match any string containing any of the numerical character.

Complementary Metacharacters: Below list of metacharacters works as a supplement to previous character class or expression.

Metacharacter	Meaning in word	Description (usefulness)	Example	Comment
[^]	Square Bracket/caret	1) This will invert the characters defined in the class 2) That means it will not match those characters defined inside character class 3)^ should be inside the brackets to invert the character class. 4) ^ has no meaning unless it is the first character in the character class.	[^A-Z] [^a-z]	[^A-Z] can match any string not starting with any of the Upper case character [^a-z] can match any string not starting with any of the lower case character
*	asterisk	This means zero or more repetitions of the previous character class or expression.	[0-9]*	[0-9]* will match numeric pattern and even will return true if no numeric character is present
+	plus	This means one or more repetitions of the previous character class or expression.	[0-9]+	[0-9]* will match numeric pattern only if there are 1 or more numeric characters
?	question mark	This means zero or one repetition of the previous character class or expression.	[0-9]?	[0-9]* will match pattern only if there is either no numeric or 1 numeric character is present.
{n,m}	curley bracket	This means between n and m repetitions of previous character class or expression.	6{1,3}	1) 6{1,3} will match any of the ( 6 or 66 or 666) numeric pattern. 2) {0,1} will be equivalent to ?

Last argument for all the regular function is same and it called match parameter and below values are possible inputs for that parameter of functions.

Value	Description
i	Matching will be case insensitive.
c	Matching will be case sensitive.
n	By default Period(.) does not match newline charater and with this parameter, period character can match the newline character as well.
m	If source string has new lines characters then '^' and '$' characters apply to each line in source string instead of the entire source string.
x	Matching will ignore whitespace.
l	Matching will return NULL instead of error if source string exceeds the maximun allowed input size.

Note:

The match will be case sensitive.
A period(.) will not match the newline character.
Complete source string will be treated as a single line.

REGEXP_SUBSTR: Extracts a substring from source string that matches a regular expression string parameter.
```
Syntax: REGEXP_SUBSTR(source string, regular expression string [,start from position number, nth occurrence, match argument ])
```
Note: This function supports 2, 3, 4, or 5 parameters.

Examples:
- With two parameters: Function will assume to start from beginning of the string and return first matched occurrence.
```
SELECT regexp_substr('tractor,scooter,bike,','[s][a-z]+');

Output: scooter
```
- With three parameters: This will return first matched occurrence starting from the specified position.
```
SELECT regexp_substr('tractor,scooter,bike,scooty','[s][a-z]+',15); 

Output: scooty
```
- With four parameters: This will return nth matched occurrence. In the below 2 examples, 1st occurrence & 2nd occurrence is returned starting from position 1.
```
1st occurrence
SELECT regexp_substr('tractor,scooter,bike','[s][a-z]+',1,1);
Output: scooter

2nd occurrence
SELECT regexp_substr('tractor,scooter,bike,scooty','[s][a-z]+',1,2);
Output: scooty
```
- With five parameters: Match argument can be passed as specified in the table above. This example will do case insensitive matching.
```
SELECT regexp_substr('tractor,scooter,bike,SCOOTY','[s][a-z]+',1,2,'mi');

Output: SCOOTY
```
- Find first word of Sentence:
```
SELECT regexp_substr('tractor scooter bike','[^ ]+',1,1);

Output: tractor
```
REGEXP_REPLACE: Replaces portions of source string that match regular expression string with the replace string.
```
Syntax: REGEXP_REPLACE(source string, regular expression string [,replace string, start from position number, nth occurrence, match argument ])
```
Note: This function supports 2, 3, 4, 5 or 6 parameters.

Examples:
- With two parameters: Function will assume replace string as blank & to start from beginning of the string and replace all matched occurrence with blank and return string.
```
SELECT regexp_replace('tractor,scooter,bike','[s][a-z]+');

Output: tractor,,bike
```
- With three parameters: Function will assume to start from beginning of the string and replace all matched occurrence with replace string and return string.
```
SELECT regexp_replace('tractor,scooter,bike,summer','[s][a-z]+','scooty'); 

Output: tractor,scooty,bike,scooty
```
- With four parameters: Function will assume to start from specified position of the string (15 in the below example) and replace all matched occurrence with replace string after that position and return string.
```
SELECT regexp_replace('tractor,scooter,bike,summer','[s][a-z]+','scooty',15); 

Output: tractor,scooter,bike,scooty
```
- With five parameters: Function will assume to start from beginning position of the string and replace nth matched occurrence with replace string and return string. If "0" is passed then it will replace all occurence.
```
2nd occurrence
SELECT regexp_replace('tractor,scooter,bike,summer','[s][a-z]+','scooty',1,2);

Output: tractor,scooter,bike,scooty
```
- With six parameters:
```
SELECT regexp_replace('tractor,scooter,bike,SCOOTY','[s][a-z]+','scooty',1,2,'mi');

Output: tractor,scooter,bike,scooty
```
REGEXP_INSTR: Searches source string for a regular expression string match.
```
Syntax: REGEXP_INSTR(source string, regular expression String [start from position argument, nth occurrence, return option,  match argument ])
```
Note: This function supports 2, 3, 4, 5 or 6 parameters.

Examples:
- With two parameters: Function will assume to start from beginning of the string and return first matched occurrence start position.
```
SELECT regexp_instr('tractor,scooter,bike,','[s][a-z]+'); --start position of scooter

Output: 9
```
- With three parameters: This will return first matched occurrence starting position from the specified position.
```
SELECT regexp_instr('tractor,scooter,bike,scooty','[s][a-z]+',15);  --start position of scooty

Output: 22
```
- With four parameters: This will return nth matched occurrence starting position. In the below example, position of 2nd occurrence is returned starting from position 1.
```
2nd occurrence
SELECT regexp_instr('tractor,scooter,bike,scooty,baby','[b][a-z]+',1,2); --start position of baby

Output: 29
```
- With five parameters: Fifth parameter is whether to return start position or end position of matched string. possible values are 0 & 1, 0 specify to return start position of the string and 1 specify to end position of string.
```
SELECT regexp_instr('tractor,scooter,bike,scooty,baby','[b][a-z]+',1,2,1); --end position of baby

Output: 33
```
- With six parameters: Match argument can be passed as specified in the table above. Belowa example will do case insensitive matching and return position.
```
SELECT regexp_instr('tractor,scooter,bike,SCOOTY','[s][a-z]+',1,2,1,'mi'); --end position of SCOOTY

Output: 28
```

REGEXP_SIMILAR: Compares source string to regular expression string and returns integer value.


Syntax: REGEXP_SIMILAR(source string, regular expression String [, match argument ])

Note: This function supports 2 or 3 parameters.

Example: Function will try to match whole string with the regular expression to return 0 or 1. 1 specify that string is match and 0 specify string is not matching.


SELECT user_id,regexp_similar(user_id,'(b|p)eta','i') FROM alphanumeric_test;

Output: 
User_Id     regexp_similar(User_Id,'(b|p)eta','i')
----------  --------------------------------------
alpha                                            0
324BETA                                          0
alpha4532                                        0
PETA                                             1
beta                                             1

REGEXP_SPLIT_TO_TABLE: Splits source string into a table of strings using regular expression string as the delimiter. This function is explained separately on String to table tutorials.

Teradata Regular Expression Functions

dbmstutorials.com

Teradata Regular Expression Functions