TPT Stream Operator
Stream operator is a data consumer operator which works similar to native Teradata Tpump utility.
Teradata advises to use TPT Stream over native Teradata Tpump utility as it is better optimised for performance over native load and also old utilities are not enhanced in new version.
TPT Stream apply locks at the row hash level and not on the entire table.
TPT Stream Characteristics:
- Import/load small volume of data from a file or Teradata database table to empty or populated Teradata table.
- Upto 127 empty/populated tables can be loaded at a time.
- Receives data from the producer operators such as 'DATACONNECTOR Consumer' etc.
- Load data in Teradata table without locking the entire table for write.
- Can be used to run multiple TPT stream on the same tables at the same time.
- Tables with Secondary Indexes and Referential integrity can also be loaded.
- Teradata DML statements like Insert,update,upsert,merge and deletes (delete works with passing schema column as filter) can be executed.
- Stream operator does not discard duplicate rows, it will be inserted into target table if target table is MULTISET and it will be inserted into error table if target table is SET.
There are five data formats of files in which data can be imported using TPT.
- Delimited
- Most commonly used format in which data fields are separated by delimiter character/s in the file.
- All the columns in the schema must be defined as 'VARCHAR'.
- Text
- Field in data file should be separated by fixed length characters in this format.
- TPT will fail with below error if all the columns in schema are not defined as 'CHAR' datatype for 'TEXT' format.
Define_Operator_Import: TPT19108 Data Format 'TEXT' requires all 'CHAR/ANSIDATE' schema.
- Unformatted This format does not have any delimiter and it is completely described by SCHEMA, this works if data file is exported from Teradata System using Unformatted format.
- Formatted When using this format field in data files must be prefixed by record length marker and followed by end of record marker.
- Binary Actual data is prefixed with 2 byte Integer in this format.
Points to be noted for TPT Stream:
- 'TYPE' of the operator should be defined as STREAM. In the 'example 1' below, 'Consumer_Table_Detail' is the place where TPT stream is defined.
- Target table can be empty or populated for TPT Stream.
- Basic attributes which are mandatory( that are UserName, UserPassword, TargetTable, TargetDatabase, LogTable and Tdpid) for functioning of TPT Stream.
- Degree of parallelism can be controlled by defining number of instances in the APPLY block ('3' in the 'example 1' below).
- Producer operator is defined in order to pull data from file which will be used by TPT Stream.
- Attribute 'TenacityHours' is the number of hours job will try before failing if the resources are not available. Default value is 4 hours.
- Attribute 'TenacitySleep' is the number of minutes after which system will try to check for required resources and it must be defined along with 'TenacityHours'. Default value is 6 minutes.
- Attribute 'SkipRows' is number of records that need to be skipped from the input file.
INTEGER SkipRows=2,
- Attribute 'SkipRowsEveryFile' can have either 'Y' & 'N' values and must be used complementary to 'SkipRows'. If multiple files are being processed and SkipRowsEveryFile is defined as 'Y' then it will skip rows defined in SkipRows attribute from all the files.
VARCHAR SkipRowsEveryFile='N'
- Attribute DateForm specify which dataformat teradata should assume for date columns. 'AnsiDate' & 'IntegerDate' are 2 options and default is IntegerDate.
- Careful! always define maximum number of sessions(using MaxSessions) that can be allocated to the job, else it may end up taking all the sessions.
Example 1: Sample TPT Stream script(tpt_stream_unformatted.tpt) for UNFORMATTED format file.
DEFINE JOB Import_Employee_Data
Description 'Test fastload script'
(
DEFINE OPERATOR Consumer_Table_Detail
TYPE STREAM
SCHEMA *
ATTRIBUTES
(
VARCHAR TdPid='dbc',
VARCHAR UserName='tutorial_user',
VARCHAR UserPassWord='tutorial_password',
VARCHAR TargetDatabase='tutorial_db',
VARCHAR TargetTable='emp_unformatted',
VARCHAR LogTable='dept_tpt_stream_log',
VARCHAR DateForm='AnsiDate',
INTEGER MaxSessions=6
);
DEFINE SCHEMA Define_Employee_Schema
(
emp_no INTEGER,
ename_name VARCHAR(50),
hire_date ANSIDATE
);
DEFINE OPERATOR Producer_File_Detail
TYPE DATACONNECTOR PRODUCER
SCHEMA Define_Employee_Schema
ATTRIBUTES
(
VARCHAR DirectoryPath='$HOME/tpt/outfile',
VARCHAR FileName='emp_unformatted.out',
VARCHAR Format='UNFORMATTED',
VARCHAR OpenMode='Read',
VARCHAR DateForm='AnsiDate'
);
APPLY
(
'INSERT INTO tutorial_db.emp_unformatted(emp_no,ename_name,hire_date) VALUES (:emp_no,:ename_name,:hire_date);'
)
TO OPERATOR(Consumer_Table_Detail)
SELECT * FROM OPERATOR (Producer_File_Detail);
);
Example 2: Sample TPT Stream script(tpt_stream_delimited.tpt) for DELIMITED format file with skipping initial records.
DEFINE JOB Import_Employee_Data
Description 'Test fastload script'
(
DEFINE OPERATOR Consumer_Table_Detail
TYPE STREAM
SCHEMA *
ATTRIBUTES
(
VARCHAR TdPid='dbc',
VARCHAR UserName='tutorial_user',
VARCHAR UserPassWord='tutorial_password',
VARCHAR TargetDatabase='tutorial_db',
VARCHAR TargetTable='emp_unformatted',
VARCHAR LogTable='dept_tpt_stream_log',
VARCHAR DateForm='AnsiDate',
INTEGER MaxSessions=6
);
DEFINE SCHEMA Define_Employee_Schema
(
emp_no VARCHAR(11),
ename_name VARCHAR(50),
hire_date VARCHAR(10)
);
DEFINE OPERATOR Producer_File_Detail
TYPE DATACONNECTOR PRODUCER
SCHEMA Define_Employee_Schema
ATTRIBUTES
(
VARCHAR DirectoryPath='$HOME/tpt/outfile',
VARCHAR FileName='tpt_load_delimited_data_file.out',
VARCHAR TextDelimiter='|~|',
VARCHAR Format='Delimited',
VARCHAR OpenMode='Read',
VARCHAR IndicatorMode='N',
VARCHAR DateForm='AnsiDate',
INTEGER SkipRows=2,
VARCHAR SkipRowsEveryFile='N'
);
APPLY
(
'INSERT INTO tutorial_db.emp_delimited(emp_no,ename_name,hire_date) VALUES (:emp_no,:ename_name,:hire_date);'
)
TO OPERATOR(Consumer_Table_Detail)
SELECT * FROM OPERATOR (Producer_File_Detail);
);
Example 3: Sample TPT Stream script(tpt_stream_formatted.tpt) loading data for FORMATTED format
DEFINE JOB Import_Employee_Data
Description 'Test fastload script'
(
DEFINE OPERATOR Consumer_Table_Detail
TYPE STREAM
SCHEMA *
ATTRIBUTES
(
VARCHAR TdPid='dbc',
VARCHAR UserName='tutorial_user',
VARCHAR UserPassWord='tutorial_password',
VARCHAR TargetDatabase='tutorial_db',
VARCHAR TargetTable='emp_unformatted',
VARCHAR LogTable='dept_tpt_stream_log',
VARCHAR DateForm='AnsiDate',
INTEGER MaxSessions=6
);
DEFINE SCHEMA Define_Employee_Schema
(
emp_no INTEGER,
ename_name VARCHAR(50),
hire_date ANSIDATE
);
DEFINE OPERATOR Producer_File_Detail
TYPE DATACONNECTOR PRODUCER
SCHEMA Define_Employee_Schema
ATTRIBUTES
(
VARCHAR DirectoryPath='$HOME/tpt/outfile',
VARCHAR FileName='emp_formatted.out',
VARCHAR Format='FORMATTED',
VARCHAR OpenMode='Read',
VARCHAR DateForm='AnsiDate'
);
APPLY
(
'INSERT INTO tutorial_db.emp_formatted(emp_no,ename_name,hire_date) VALUES (:emp_no,:ename_name,:hire_date);'
)
TO OPERATOR(Consumer_Table_Detail)
SELECT * FROM OPERATOR (Producer_File_Detail);
);
Syntax 1: Executing unformatted TPT Stream script
tbuild -f tpt_stream_unformatted.tpt
Syntax 2: Executing delimited TPT Stream script
tbuild -f tpt_stream_delimited.tpt
Syntax 3: Executing formatted TPT Stream script
tbuild -f tpt_stream_formatted.tpt
Download script files