Teradata: Split String into table rows
There are 2 similar functions available to Split strings into a table using characters or regexp_string as the delimiter.
Note: Below examples & syntax are for Teradata 15 and above (there is different syntax in Teradata 14)
➠
STRTOK_SPLIT_TO_TABLE
Syntax : All parameters in below syntax are explained.
SELECT split_d.* FROM TABLE (STRTOK_SPLIT_TO_TABLE(input_key,source_string,delimiter)
RETURNS (output_key INTEGER, tokennum INTEGER, resultstring VARCHAR(20) CHARACTER SET LATIN) ) as split_d;
Input/output constants,variables or columns passed to this function must have the required data types
Input datatypes & description
input_key |
NUMERIC, VARCHAR |
Can be used to tie up all the splited rows |
source_string |
CHAR, VARCHAR, CLOB(max 16 Mb) |
String to split into rows |
delimiter |
CHAR, VARCHAR (max of 512 bytes) |
Can be any character or multiple delimiter |
Resultset datatypes & description
output_key |
NUMERIC, VARCHAR |
Return common tied up value(input_key) passed for the rows splited from single string |
tokennum |
INTEGER |
Sequence of number in which string is splited |
result_string |
VARCHAR |
Splited strings |
Example 1: Split the string from a column in database table.
CREATE MULTISET TABLE database_table_example(
id INTEGER,
user_id VARCHAR(50)
)
PRIMARY INDEX(id);
INS database_table_example(7,'alpha4532abc');
INS database_table_example(8,'beta4532cdf');
INS database_table_example(9,'gamma4532cdf');
SELECT split_d.* from table( STRTOK_SPLIT_TO_TABLE(database_table_example.id, database_table_example.user_id, '123456789')
returns (outkey INTEGER,token INTEGER,result_string VARCHAR(100) CHARACTER SET LATIN)) as split_d;
Output:
outkey token result_string
------ ------ ---------------
7 1 alpha
9 1 gamma
7 2 abc
9 2 cdf
8 1 beta
8 2 cdf
Example 2: Apply split function on subset of database table (data filtering).
- Direct WHERE clause will not work to filter rows while using STRTOK_SPLIT_TO_TABLE function.
SELECT split_d.* from table( STRTOK_SPLIT_TO_TABLE(database_table_example.id, database_table_example.user_id, '123456789')
returns (outkey INTEGER,token INTEGER,result_string VARCHAR(100) CHARACTER SET LATIN)) as split_d
WHERE database_table_example.id=7;
Output:
outkey token result_string
------ ------ ---------------
7 1 alpha
9 1 gamma
7 2 abc
9 2 cdf
8 1 beta
8 2 cdf
**Rows with id=7 did not got filtered
- WITH clause can be used to filter data from a table while using STRTOK_SPLIT_TO_TABLE function.
WITH tbl as (SELECT * FROM database_table_example WHERE id=7)
SELECT split_d.* from table( STRTOK_SPLIT_TO_TABLE(Stbl.id, tbl.user_id, '123456789')
returns (outkey INTEGER,token INTEGER,result_string VARCHAR(100) CHARACTER SET LATIN)) as split_d;
Output:
outkey token result_string
------ ------ ---------------
7 1 alpha
7 2 abc
Example 3: Split the string with single delimter(i.e 'Z').
SELECT split_d.* FROM TABLE (STRTOK_SPLIT_TO_TABLE(1,'PradeeepZKhatriZJaipur','Z')
RETURNS (outkey INTEGER, tokennum INTEGER, result_string VARCHAR(20) CHARACTER SET LATIN) ) as split_d order by 1,2;
Output:
outkey tokennum result_string
----------- ----------- --------------------
1 1 Pradeeep
1 2 Khatri
1 3 Jaipur
Example 4: Split the string with multiple delimters(i.e 'Z' ,'e', 'a' & space).
SELECT split_d.* FROM TABLE (STRTOK_SPLIT_TO_TABLE(1,'PradeeepZKhatriZJaipur RJ','Z ea')
RETURNS (outkey INTEGER, tokennum INTEGER, result_string VARCHAR(20) CHARACTER SET LATIN) ) as split_d order by 1,2;
Output:
outkey tokennum result_string
----------- ----------- ---------
1 1 Pr
1 2 d
1 3 p
1 4 Kh
1 5 tri
1 6 J
1 7 ipur
1 8 RJ
Example 5: Split the string with multiple delimters(order does not matters).
SELECT split_d.* FROM TABLE (STRTOK_SPLIT_TO_TABLE(1,'PradeeepZKhatriZJaipur','Zi')
RETURNS (outkey INTEGER, tokennum INTEGER, result_string VARCHAR(20) CHARACTER SET LATIN) ) as split_d order by 1,2;
Output:
outkey tokennum result_string
----------- ----------- -----------
1 1 Pradeeep
1 2 Khatr
1 3 Ja
1 4 pur
Example 6: Split the string with multiple delimiters other than alpha characters(order does not matters).
SELECT split_d.* from table( STRTOK_SPLIT_TO_TABLE(3, 'If23232you452correct5532your577mind235then6878rest131of7897life3448will078fall123into4356place.', '123456789')
returns (outkey INTEGER,token INTEGER,result_string VARCHAR(100) CHARACTER SET LATIN)) as split_d;
Output:
outkey tokennum result_string
----------- ----------- ----------
3 1 If
3 2 you
3 3 correct
3 4 your
3 5 mind
3 6 then
3 7 rest
3 8 of
3 9 life
3 10 will0
3 11 fall
3 12 into
3 13 place.
➠
REGEXP_SPLIT_TO_TABLE
Syntax : All parameters in below syntax are explained.
SELECT split_d.* FROM TABLE (REGEXP_SPLIT_TO_TABLE(input_key,source_string,regexp_string,match_argument)
RETURNS (output_key INTEGER, tokennum INTEGER, resultstring VARCHAR(20) CHARACTER SET UNICODE) ) as split_d;
Input/output constants,variables or columns passed to this function must have the required data types
Input datatypes & description
input_key |
NUMERIC, VARCHAR |
Can be used to tie up all the splited rows |
source_string |
CHAR, VARCHAR, CLOB(max 16 Mb) |
String to split into rows |
regexp_string |
CHAR, VARCHAR (max of 512 bytes) |
can be any character or regular expression pattern delimiter |
match_argument |
VARCHAR |
Acceptable values are for match_argument are listed below |
Resultset datatypes & description
output_key |
NUMERIC, VARCHAR |
Return common tied up value(input_key) passed for the rows splited from single string |
tokennum |
INTEGER |
Sequence of number in which string is splited |
result_string |
VARCHAR |
Splited strings |
Acceptable values are for match_argument:
- i --> case-insensitive matching.
- c --> case sensitive matching.
- n --> the period character (match any character) can match the newline character.
- m --> source_string is treated as multiple lines instead of as a single line. With this option the '^' and '$' characters apply to each line in source_string instead of the entire source_string.
- l --> if source_string exceeds the current maximum allowed source_string size (currently 16 MB), a NULL is returned instead of an error. This is useful for long-running queries where you do not want long strings causing an error that would make the query fail.
Other points for match_argument
If match_argument is not specified:
--> The match will be case sensitive.
--> A period does not match the newline character.
--> source_string will be treated as a single line.
If there is no match, NULL is returned.
If match_arg is not valid, an error is returned.
Example 1: Split the string with single delimter(i.e '&').
SELECT split_d.* from table(REGEXP_SPLIT_TO_TABLE('City','Jaipur&Bangalore&Bhubaneswar','&','c')
returns (outkey varchar(30), token INTEGER, result_string varchar(100))) as split_d;
Output:
outkey tokennum result_string
------------------------------ ----------- ------------
City 1 Jaipur
City 2 Bangalore
City 3 Bhubaneswar
Example 2: Split the string with single regular expression pattern.
SELECT split_d.* from table( REGEXP_SPLIT_TO_TABLE(3, 'If23232you452correct5532your577mind235then6878rest131of7897life3448will078fall123into4356place.', '[0-9]', 'c')
returns (outkey INTEGER,token INTEGER,result_string VARCHAR(100) CHARACTER SET LATIN)) as split_d;
Output:
outkey tokennum result_string
----------- ----------- ----------
3 1 If
3 2 you
3 3 correct
3 4 your
3 5 mind
3 6 and
3 7 rest
3 8 of
3 9 life
3 10 will
3 11 fall
3 12 into
3 13 place.
Example 3: Split the string with multiple regular expression pattern (3 numbers in sequence).
SELECT split_d.* from table( REGEXP_SPLIT_TO_TABLE( 3, 'If23232you452correct5532your577mind235then6878rest131of7897life3448will078fall123into4356place.', '[0-9][0-9][0-9]', 'c')
returns (outkey INTEGER,token INTEGER,result_string VARCHAR(100) CHARACTER SET LATIN)) as split_d;
Output:
outkey tokennum result_string
----------- ----------- ----------
3 1 If
3 2 32you
3 3 correct
3 4 2your
3 5 mind
3 6 then
3 7 8rest
3 8 of
3 9 7life
3 10 8will
3 11 fall
3 12 into
3 13 6place.
Example 4: Split the string with repetition of regular expression pattern.
SELECT split_d.* from table( REGEXP_SPLIT_TO_TABLE(3, 'If23232you452correct5532your577mind235then6878rest131of7897life3448will078fall123into4356place.', '[0-9]*', 'c')
returns (outkey INTEGER,token INTEGER,res VARCHAR(100) CHARACTER SET LATIN)) as split_d;
Output:
outkey tokennum res
----------- ----------- ----------
3 1 I
3 2 f
3 3 y
3 4 o
3 5 u
3 6 c
3 7 o
3 8 r
3 9 r
3 10 e
3 11 c
3 12 t
3 13 y
3 14 o
3 15 u
3 16 r
3 17 m
3 18 i
3 19 n
3 20 d
3 21 t
3 22 h
3 23 e
3 24 n
3 25 r
3 26 e
3 27 s
3 28 t
3 29 o
3 30 f
3 31 l
3 32 i
3 33 f
3 34 e
3 35 w
3 36 i
3 37 l
3 38 l
3 39 f
3 40 a
3 41 l
3 42 l
3 43 i
3 44 n
3 45 t
3 46 o
3 47 p
3 48 l
3 49 a
3 50 c
3 51 e
3 52 .