Teradata spreads data rows for each table across the AMPs with the help of Primary Index(PI)

Teradata Primary Index(PI)

Teradata spreads data rows for each table across the AMPs. Whenever the table needs to be read then each AMP has to read portion of the table stored on that AMP. If the AMPs start reading at the same time and there are an equal amount of rows on each AMP, then parallel processing works best. This parallel processing starts with the Primary Index(PI).

The Primary Index (PI) plays 3 important roles:
Primary Index characteristics:
There are two types of Primary Indexes
Unique Primary Index (UPI): It means that the values for the selected column must be unique. Example 1:

CREATE MULTISET TABLE [databasename].[tablename]
(                                   
  col_1   INTEGER,
  col_2   VARCHAR(50)
)
UNIQUE PRIMARY INDEX ( col_1 ); 


Non-Unique Primary Index (NUPI): It means that the values for the selected column can have duplicates. Example 2:

CREATE MULTISET TABLE [databasename].[tablename]
(                                   
  col_1   INTEGER,
  col_2   VARCHAR(50)
)
PRIMARY INDEX ( col_1 ); 


General rules for defining a Primary Indexes Example 3: Checking data distribution of given columns on the AMP

SELECT HASHAMP(HASHBUCKET(HASHROW([columnlist]))),COUNT(*) FROM [databasename].[tablename]
GROUP BY 1 ORDER BY 2 DESC;

Example 4: Storage of a table for each AMP

SELECT vproc, CurrentPerm
FROM DBC.TableSize
WHERE DatabaseName = '[databasename]'
AND TableName = '[tablename]'
ORDER BY 1;

Example 5: Skewness of a table

SELECT TableName,SUM(CurrentPerm) AS CurrentPerm,SUM(PeakPerm) AS PeakPerm,
       CAST((100 - (AVG(CurrentPerm)/MAX(CurrentPerm)*100)) AS INTEGER) AS SkewFactor
FROM DBC.TableSize
WHERE DatabaseName='[databasename]'
AND TableName= '[tablename]'
GROUP BY 1;
Note: If table is more skewed then user may get "no more room in database" error while inserting data even if there is space available. This is because of HOT AMP i.e. there is uneven distribution of data and more data is going to single amp which does not have enough space to handle all data.