Teradata Partitioned Primary Index(PPI)

Pradeep

Teradata Partition Primary Index(PPI)

PPI Characteristics:

Partitioning can be done on volatile,global temporary and permanent tables.
Partitioning column doesn't have to be part of non unique Primary Index.
Partitioning column must be part of unique Primary Index.
Partitioned column/s will not decide which AMP data should go,it is solely dependent on Primary Index.
Rows within AMP will be sorted by partitioned column/s and not by rowId.
PPI can be defined at multi-level i.e sub-partition within partition (called Multilevel PPI).
Teradata uses partition elimination strategy which helps to avoid Full Table Scan(FTS) for partitioned tables.
Partitioned tables may reduce I/O for range queries by reducing the number of rows to be processed .
Details about the partitions,constraints & conditions can be retrieved from dbc.PartitioningConstraintsV view.
```
select * from dbc.PartitioningConstraintsV where databasename='tutorial_db' and tablename='employee';
```

PPI disadvantages:

Partitioning can make single row (primary index) accesses to the table slower if partitioning column is not a part of the primary index column.

Rows are sents to their appropriate AMPs based on row hash. By default rows are ordered within AMP by row Id. For example,Primary index for the table employee is emp_no.

Below query will result in Full Table Scan(FTS) to satify the query since the rows are distributed across the AMPs.


LOCK ROW FOR ACCESS SELECT * FROM tutorial_db.employee where hire_date between '1981-02-01' AND '1981-02-28';

Partitioned Primary Index : Now Partitioned Primary Index is added on the hiredate while the primary index is same as before i.e emp_no. Rows will go to exactly same AMP as before

Syntax/Example :


CREATE MULTISET TABLE tutorial_db.employee
     (
      emp_no INTEGER,
      emp_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      job_title VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
      manager_id INTEGER,
      hire_date DATE FORMAT 'yyyy-mm-dd',
      salary DECIMAL(18,2),
      commission DECIMAL(18,2),
      dept_no INTEGER)
PRIMARY INDEX ( emp_no )
PARTITION BY range_n(
  hire_date BETWEEN date '1981-01-01' and '1988-12-31' Each interval '1' Month,
  NO range OR UNKNOWN );

***Only difference will be that data is sorted by hire_date and not by row Id.
Non-Partitioned Index

Below query will use partition elimination strategie to avoid Full Table Scan(FTS) as rows are grouped together in their respective partition(monthly) across the AMPs.


LOCK ROW FOR ACCESS SELECT * FROM tutorial_db.employee where hire_date between '1981-02-01' AND '1981-02-28';

 Question: How many partition on each AMP will need to be read by above query?
 Answer:  1

Teradata also support Case(CASE_N) & Range(RANGE_N) as a partitioning option.

CASE_N : Rows will be placed into the partitions derived using condition mentioned in the Case statement(It will process the first met case condition only).
Syntax/Example :


CREATE MULTISET TABLE tutorial_db.employee(
  emp_no INTEGER,
  emp_name VARCHAR(50),
  job_title VARCHAR(30),
  manager_id INTEGER,
  hire_date Date,
  salary DECIMAL(18,2),
  commission DECIMAL(18,2),
  dept_no INTEGER
)
Primary Index(emp_no)
PARTITION BY CASE_N
(dept_no = 100
,dept_no = 200
,dept_no = 300, NO CASE, UNKNOWN
);

RANGE_N : Rows will be placed into the partitions derived using the range expression. Ranges are always listed in incremental order and ranges cannot be overlapped. Value to the RANGE_N can only be numerical (i.e Date or Number)
Syntax/Example :


CREATE MULTISET TABLE tutorial_db.employee(
  emp_no INTEGER,
  emp_name VARCHAR(50),
  job_title VARCHAR(30),
  manager_id INTEGER,
  hire_date Date,
  salary DECIMAL(18,2),
  commission DECIMAL(18,2),
  dept_no INTEGER
)
Primary Index(emp_no)
PARTITION BY Range_N(hire_date BETWEEN '1981-06-01' AND '1982-06-01' EACH INTERVAL '1' DAY,
NO RANGE OR UNKNOWN );

Partitions within table

Partition number can be identified for any particular row/column using 'PARTITION' keyword for the PPI table. For Example:
```
LOCK ROW FOR ACCESS SELECT PARTITION,hire_date FROM tutorial_db.employee;
```

Partition number can be also identified for both the PPI & non PPI table using CASE_N & RANGE_N. For Example:


SELECT RANGE_N(hire_date BETWEEN date '1981-01-01' and '1988-12-31' EACH INTERVAL '1' MONTH,
          NO RANGE OR UNKNOWN ) AS Partition#,hire_date FROM tutorial_db.employee;

 Note: For non-PPI tables,above query will return only probable partition numbers where data may be stored if that particular PPI is added on the table.

PPI tables can be queried using partition number as well. For Example:
```
SELECT * FROM tutorial_db.employee WHERE PARTITION=2;
```

Handling Unknown values: Different conditions and combinations to handle unknown values and store them into separate or one partitions. Number of partition will depend on how conditions are specified while defining PPI, if 'OR' is used between clauses then only 1 partition will be created for unknown & NULL values and if comma(,) is used then 2 partitions will be created for unknown & NULL values.

NO CASE or NO RANGE : Partition to hold rows that do not fulfill any criteria for CASE_N or RANGE_N respectively.
UNKNOWN : Partition to hold rows with NULL values.
NO CASE, UNKNOWN : Two different partitions,one for NULL values and other for rows not meeting the case criteria.
NO CASE OR UNKNOWN : Only one partition for rows with NULL values and also for rows not meeting the case criteria.
NO RANGE OR UNKNOWN : Only one partition for rows with NULL values and also for rows not falling between the given range.
NO RANGE, UNKNOWN : Two different partitions,one for NULL values and other for rows not falling between the given range.

Note: If unknown values are not handled properly and users try to insert unknown values then users will get below error.


*** Failure 5728 Partitioning violation for table tutorial_db.employee.

Multilevel Partitioned Primary Index(MLPPI) : Multilevel Partitioned primary index allow each partition at a given level to be again partitioned into sub-partitions. MLPPI perform efficient searches by using partition elimination at the various levels or combinations of levels.

Syntax/Example :Below table is first partitioned by year and then within each year rows will be partitoned(sub-partition) by month.


CREATE MULTISET TABLE tutorial_db.pradeep_employee_test
 (
  emp_no INTEGER,
  emp_name VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
  job_title VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
  manager_id INTEGER,
  hire_date DATE FORMAT 'yyyy-mm-dd',
  hire_year INTEGER,
  hire_month INTEGER,
  salary DECIMAL(18,2),
  commission DECIMAL(18,2),
  dept_no INTEGER)
PRIMARY INDEX ( emp_no )
PARTITION BY (
  RANGE_N(hire_year BETWEEN 1980 AND 1988 EACH 1),
  RANGE_N(hire_month BETWEEN 1 AND 12 Each 1,
  NO RANGE OR UNKNOWN ));

Teradata Partitioned Primary Index(PPI) is option in teradata which allows it to divide the data in partition for fast retrieval

dbmstutorials.com

Teradata Partition Primary Index(PPI)