Teradata: Split String into multiple columns and Vice Versa
There are table functions available which Split strings into multiple columns or concat multiple columns into single string/column using delimiter character/s.
➠ CSVLD:
This functions takes in a delimited string (any delimiter can be passed), parses the string and returns multiple VARCHAR columns.
Syntax:
CSVLD(tablename.column_to_split, delimiter, quote_string)
Creating Table: Creating a table and inserting sample data for illustrating CSVLD function.
CREATE MULTISET VOLATILE TABLE csvld_example
(
id INTEGER,
code VARCHAR(100)
)
PRIMARY INDEX(id)
ON COMMIT PRESERVE ROWS;
INSERT INTO csvld_example(1,'A,B,C');
INSERT INTO csvld_example(2,'Spark,Hive,Flume');
INSERT INTO csvld_example(3,'Teradata,Vertica,Oracle');
- Example 1: Split the string into columns using temporary/permanent table.
SELECT * FROM TABLE (CSVLD(csvld_example.code, ',', '')
RETURNS (col1 varchar(50), col2 varchar(50), col3 varchar(50))) as T1;
Output:
col1 col2 col3
---------- ---------- -----------
Teradata Vertica Oracle
A B C
Spark Hive Flume
- Example 2: Split the string into columns using derived table and pipe (|) as delimiter.
WITH delimi as (SELECT 'Teradata|Vertica|Oracle' as param)
SELECT * FROM TABLE (CSVLD(CAST(delimi.param AS VARCHAR(100)), '|', '"')
RETURNS (col1 varchar(100) character set UNICODE, col2 varchar(100) character set UNICODE, col3 varchar(100)character set UNICODE)) as T1;
Output:
col1 col2 col3
----------- --------- ----------
Teradata Vertica Oracle
➩ Use of Quote String: If data contains the specified delimiter also as value then "Quote string" can be used to tell Teradata that data enclosed within specified quote characters is single column value and treat that column value as one in the final output.
- Example 3: column data contains comma(,) as value and it is enclosed with double quote
INSERT INTO csvld_example(1,'A,"Test,Quote",C');
SELECT * FROM TABLE (CSVLD(csvld_example.code, ',', '"')
RETURNS (col1 varchar(50), col2 varchar(50), col3 varchar(50))) as T1;
col1 col2 col3
---------- ------------ ----------
Teradata Vertica Oracle
A B C
A Test,Quote C
Spark Hive Flume
➩ Restrictions
- When input field value is not in sync with the number of columns defined as output then it will fail with below error.
WITH delimi as (SELECT 'Teradata|Vertica' as param)
SELECT * FROM TABLE (CSVLD(CAST(delimi.param AS VARCHAR(100)), '|', '"')
RETURNS (col1 varchar(100) character set UNICODE, col2 varchar(100) character set UNICODE, col3 varchar(100)character set UNICODE)) as T1;
*** Failure 9134 Input column can't be mapped to output columns.
- When input or output column are defined as CHAR (non varchar) then function will fail with below error.
SELECT * FROM TABLE (CSVLD(csvld_example.code, ',', '')
RETURNS (col1 char(50), col2 char(50), col3 char(50))) as T1;
New Version
*** Failure 9134 Unsupported data type encountered.
Older Version
*** Failure 9134 Input data column is not a varchar type.
- Don't use this function on constant without table otherwise, CSVLD is giving unexpected result (duplicate rows)
SELECT * FROM TABLE(CSVLD(CAST('cow,goat,sheep' AS VARCHAR(14)), ',', '')
RETURNS (col1 varchar(10) character set UNICODE,
col2 varchar(10) character set UNICODE,
col1 varchar(10)character set UNICODE)
) as T1;
Output:
col1 col2 col3
----------- --------- ----------
Teradata Vertica Oracle
Teradata Vertica Oracle
SELECT * FROM TABLE(CSVLD(TRANSLATE(CAST('cow,goat,sheep' AS VARCHAR(14)) USING UNICODE_TO_LATIN), ',', '')
RETURNS (col1 varchar(10),
col2 varchar(10),
col3 varchar(10))
) as T1;
Output:
col1 col2 col3
---------- ---------- ----------
Teradata Vertica Oracle
Teradata Vertica Oracle
➠ CSV:
This functions accepts multiple input columns as input and returns delimited string value merging all the columns with specified delimiter character.
Syntax:
CSV(NEW VARIANT_TYPE(tablename.colum1,tablename.column2 [,tablename.columnN]), delimiter, quote_string)
Note: Most of the datatypes are supported except CLOB
Creating Table: Creating a table and inserting sample data for illustrating CSV function.
CREATE MULTISET VOLATILE TABLE csv_example
(
id INTEGER,
col1 VARCHAR(10),
col2 VARCHAR(10),
col3 VARCHAR(10),
col4 DATE,
col5 INTEGER
)
PRIMARY INDEX(id)
ON COMMIT PRESERVE ROWS;
INSERT INTO csv_example(1,'Teradata','Vertica','Oracle',CURRENT_DATE,100);
INSERT INTO csv_example(2,'Spark','Hive','Flume',CURRENT_DATE,100);
- Example 1: Merging multiple columns into single column for unloading with characterset as "LATIN".
SELECT * FROM TABLE(CSV(NEW VARIANT_TYPE(csv_example.col1,csv_example.col2,csv_example.col3,csv_example.col4,csv_example.col5),',','"')
RETURNS (Technologies varchar(100) character set LATIN)) as t1;
Technologies
------------------------------
Teradata,Vertica,Oracle,2018-0
Spark,Hive,Flume,2018-09-14,10
- Example 2: Merging multiple columns into single column for unloading with characterset as "UNICODE" and delimiter as pipe(|).
SELECT * FROM TABLE(CSV(NEW VARIANT_TYPE(csv_example.col1,csv_example.col2,csv_example.col3,csv_example.col4,csv_example.col5),'|','')
RETURNS (Technologies varchar(100) character set UNICODE)) as t1;
Technologies
------------------------------------------------------------
Teradata|Vertica|Oracle|2018-09-14|100
Spark|Hive|Flume|2018-09-14|100
➩ Use of Quote String: If data contains the specified delimiter as value then it will enclose all strings columns with Quote string in the final output.
- Example 3: column data contains comma(,) as value and in the final output is enclosed with double quote as specified to differentiate it with the delimiter
INSERT INTO csv_example(3,'Quote,Test','Hive','Test,Quote',CURRENT_DATE,100);
SELECT * FROM TABLE(CSV(NEW VARIANT_TYPE(csv_example.col1,csv_example.col2,csv_example.col3,csv_example.col4,csv_example.col5),',','"')
RETURNS (Technologies varchar(100) character set LATIN)) as t1;
Technologies
--------------------------------------------------
"Quote,Test","Hive","Test,Quote",2018-09-14,100
"Teradata","Vertica","Oracle",2018-09-14,100
"Spark","Hive","Flume",2018-09-14,100