Teradata Regular Expression Functions

Teradata Regular Expression Functions

There are multiple Teradata Regular Expression Functions which helps to perform variety of String related operation which are not possible using regular string functions. There are five regular expression functions available in Teradata, click on the function below to go directly to that function.
All regexp function has regular expression string as one of the input parameter which distinguishes them from normal functions. These regexp string can be created using metacharacters. Lets see some of available metacharacters used in regexp functions.

Regexp Metacharacters: Lets understand or revise regular expression metacharacters which will be helpful in creating regular expression string used in regexp functions. These metacharacters work same way in most of the modern programming languages.

Metacharacter

Meaning in word

Description (usefulness)

Example Pattern

Comment

.

Period

Wild character which matches any character except newline character

de.r

This can match any string with “de” followed by any character and then followed by “r”

^

caret

Metacharacter which matches start of the String

^de

This can match any string starting with “de” and followed by any character string

$

dollar

Metacharacter which matches end of the String

ar$

This can match any string ending with “ar” and preceded by any character string

[]

Square Bracket

1) This is called character class
2) Characters/pattern defined within these square bracket matches only single character.
3) Multiple classes can be defined one after other to match required string

[a-zA-Z]


[0-9]

[a-zA-Z] can match any string with any alphabet in any case (lower or upper)

[0-9] can match any string containing any of the numerical character.



Complementary Metacharacters: Below list of metacharacters works as a supplement to previous character class or expression.

Metacharacter

Meaning in word

Description (usefulness)

Example

Comment

[^]

Square Bracket/caret

1) This will invert the characters defined in the class
2) That means it will not match those characters defined inside character class

3)^ should be inside the brackets to invert the character class.

4) ^ has no meaning unless it is the first character in the character class.

[^A-Z]


[^a-z]

[^A-Z] can match any string not starting with any of the Upper case character

[^a-z] can match any string not starting with any of the lower case character

*

asterisk

This means zero or more repetitions of the previous character class or expression.

[0-9]*

[0-9]* will match numeric pattern and even will return true if no numeric character is present

+

plus

This means one or more repetitions of the previous character class or expression.

[0-9]+

[0-9]* will match numeric pattern only if there are 1 or more numeric characters

?

question mark

This means zero or one repetition of the previous character class or expression.

[0-9]?

[0-9]* will match pattern only if there is either no numeric or 1 numeric character is present.

{n,m}

curley bracket

This means between n and m repetitions of previous character class or expression.

6{1,3}

1) 6{1,3} will match any of the ( 6 or 66 or 666) numeric pattern.


2) {0,1} will be equivalent to ?



Last argument for all the regular function is same and it called match parameter and below values are possible inputs for that parameter of functions.

Value

Description

i

Matching will be case insensitive.

c

Matching will be case sensitive.

n

By default Period(.) does not match newline charater and with this parameter, period character can match the newline character as well.

m

If source string has new lines characters then '^' and '$' characters apply to each line in source string instead of the entire source string.

x

Matching will ignore whitespace.

l

Matching will return NULL instead of error if source string exceeds the maximun allowed input size.

Note: Multiple values can be passed as match parameters.

If "match parameter" is not specified then

  1. REGEXP_SUBSTR: Extracts a substring from source string that matches a regular expression string parameter.
    Syntax: REGEXP_SUBSTR(source string, regular expression string [,start from position number, nth occurrence, match argument ])
    
    Note: This function supports 2, 3, 4, or 5 parameters.

    Examples:
    • With two parameters: Function will assume to start from beginning of the string and return first matched occurrence.
      SELECT regexp_substr('tractor,scooter,bike,','[s][a-z]+');
      
      Output: scooter
      

    • With three parameters: This will return first matched occurrence starting from the specified position.
      SELECT regexp_substr('tractor,scooter,bike,scooty','[s][a-z]+',15); 
      
      Output: scooty
      

    • With four parameters: This will return nth matched occurrence. In the below 2 examples, 1st occurrence & 2nd occurrence is returned starting from position 1.
      1st occurrence
      SELECT regexp_substr('tractor,scooter,bike','[s][a-z]+',1,1);
      Output: scooter
      
      2nd occurrence
      SELECT regexp_substr('tractor,scooter,bike,scooty','[s][a-z]+',1,2);
      Output: scooty
      

    • With five parameters: Match argument can be passed as specified in the table above. This example will do case insensitive matching.
      SELECT regexp_substr('tractor,scooter,bike,SCOOTY','[s][a-z]+',1,2,'mi');
      
      Output: SCOOTY
      

    • Find first word of Sentence:
      SELECT regexp_substr('tractor scooter bike','[^ ]+',1,1);
      
      Output: tractor
      


  2. REGEXP_REPLACE: Replaces portions of source string that match regular expression string with the replace string.
    Syntax: REGEXP_REPLACE(source string, regular expression string [,replace string, start from position number, nth occurrence, match argument ])
    
    Note: This function supports 2, 3, 4, 5 or 6 parameters.

    Examples:
    • With two parameters: Function will assume replace string as blank & to start from beginning of the string and replace all matched occurrence with blank and return string.
      SELECT regexp_replace('tractor,scooter,bike','[s][a-z]+');
      
      Output: tractor,,bike
      

    • With three parameters: Function will assume to start from beginning of the string and replace all matched occurrence with replace string and return string.
      SELECT regexp_replace('tractor,scooter,bike,summer','[s][a-z]+','scooty'); 
      
      Output: tractor,scooty,bike,scooty
      

    • With four parameters: Function will assume to start from specified position of the string (15 in the below example) and replace all matched occurrence with replace string after that position and return string.
      SELECT regexp_replace('tractor,scooter,bike,summer','[s][a-z]+','scooty',15); 
      
      Output: tractor,scooter,bike,scooty
      

    • With five parameters: Function will assume to start from beginning position of the string and replace nth matched occurrence with replace string and return string. If "0" is passed then it will replace all occurence.
      2nd occurrence
      SELECT regexp_replace('tractor,scooter,bike,summer','[s][a-z]+','scooty',1,2);
      
      Output: tractor,scooter,bike,scooty
      

    • With six parameters:
      SELECT regexp_replace('tractor,scooter,bike,SCOOTY','[s][a-z]+','scooty',1,2,'mi');
      
      Output: tractor,scooter,bike,scooty
      


  3. REGEXP_INSTR: Searches source string for a regular expression string match.
    Syntax: REGEXP_INSTR(source string, regular expression String [start from position argument, nth occurrence, return option,  match argument ])
    
    Note: This function supports 2, 3, 4, 5 or 6 parameters.

    Examples:
    • With two parameters: Function will assume to start from beginning of the string and return first matched occurrence start position.
      SELECT regexp_instr('tractor,scooter,bike,','[s][a-z]+'); --start position of scooter
      
      Output: 9
      

    • With three parameters: This will return first matched occurrence starting position from the specified position.
      SELECT regexp_instr('tractor,scooter,bike,scooty','[s][a-z]+',15);  --start position of scooty
      
      Output: 22
      

    • With four parameters: This will return nth matched occurrence starting position. In the below example, position of 2nd occurrence is returned starting from position 1.
      2nd occurrence
      SELECT regexp_instr('tractor,scooter,bike,scooty,baby','[b][a-z]+',1,2); --start position of baby
      
      Output: 29
      

    • With five parameters: Fifth parameter is whether to return start position or end position of matched string. possible values are 0 & 1, 0 specify to return start position of the string and 1 specify to end position of string.
      SELECT regexp_instr('tractor,scooter,bike,scooty,baby','[b][a-z]+',1,2,1); --end position of baby
      
      Output: 33
      
    • With six parameters: Match argument can be passed as specified in the table above. Belowa example will do case insensitive matching and return position.
      SELECT regexp_instr('tractor,scooter,bike,SCOOTY','[s][a-z]+',1,2,1,'mi'); --end position of SCOOTY
      
      Output: 28
      


  4. REGEXP_SIMILAR: Compares source string to regular expression string and returns integer value.
    Syntax: REGEXP_SIMILAR(source string, regular expression String [, match argument ])
    
    Note: This function supports 2 or 3 parameters.

    Example: Function will try to match whole string with the regular expression to return 0 or 1. 1 specify that string is match and 0 specify string is not matching.
    SELECT user_id,regexp_similar(user_id,'(b|p)eta','i') from alphanumeric_test;
    
    Output: 
    User_Id     regexp_similar(User_Id,'(b|p)eta','i')
    ----------  --------------------------------------
    alpha                                            0
    324BETA                                          0
    alpha4532                                        0
    PETA                                             1
    beta                                             1
    


  5. REGEXP_SPLIT_TO_TABLE: Splits source string into a table of strings using regular expression string as the delimiter. This function is explained separately on String to table tutorials.