Teradata: Group Consecutive Dates
- This tutorial will explain how to group consecutive dates and preserve gaps.
- Consecutive date means end_dt of current row should be same as start_dt of next row or start date of next row should start_dt+1 of current row.
- This collapsing of consecutive dates can be done using multiple approaches.
- Watch this video on youtube which is explaining below examples for grouping consecutive dates.
Lets try to understand how row grouping will work with the help of a simple example.
- Sample Data: Consider below data showing 2 group of Ids having multiple records over different period.
task_id group_id start_dt end_Dt
------- ----------- ---------- ----------
1 2 2018-08-01 2018-08-02
2 2 2018-08-02 2018-08-03
3 2 2018-08-03 2018-08-04
4 5 2018-08-13 2018-08-14
5 5 2018-08-14 2018-08-15
6 5 2018-08-28 2018-08-29
7 2 2018-08-30 2018-08-31
- Required Output: Grouping all rows with consecutive dates into a single row where end_dt of current row is same as start_dt of next row.
group_id start_dt end_dt
-------- ---------- ----------
2 2018-08-01 2018-08-04
5 2018-08-13 2018-08-15
5 2018-08-28 2018-08-29
2 2018-08-30 2018-08-31
Result Explanation:
- First 3 rows with group_id=2 got grouped because all these 3 row's start date is consecutive i.e Aug 1, 2, 3.
- 4th & 5th rows with group_id=5 got grouped because both row's start date is consecutive i.e Aug 13, 14.
- 6th row with group_id=5 did not got grouped with any other row because there is no consecutive date row against '2018-08-28'.
- 7th row with group_id=2 did not got grouped with any other row because there is no consecutive date row against '2018-08-30'.
Dataset: Lets convert the above dataset into a table for demonstration purpose.
CREATE MULTISET VOLATILE TABLE manrega_task
(
task_id INTEGER,
group_id INTEGER,
start_dt DATE,
end_Dt DATE
)
ON COMMIT PRESERVE ROWS;
--Data
INSERT INTO manrega_task VALUES(1,2,'2018-08-01','2018-08-02');
INSERT INTO manrega_task VALUES(2,2,'2018-08-02','2018-08-03');
INSERT INTO manrega_task VALUES(3,2,'2018-08-03','2018-08-04');
INSERT INTO manrega_task VALUES(4,5,'2018-08-13','2018-08-14');
INSERT INTO manrega_task VALUES(5,5,'2018-08-14','2018-08-15');
INSERT INTO manrega_task VALUES(6,5,'2018-08-28','2018-08-29');
INSERT INTO manrega_task VALUES(7,2,'2018-08-30','2018-08-31');
INSERT INTO manrega_task VALUES(8,4,'2018-08-01','2018-08-02');
INSERT INTO manrega_task VALUES(9,4,'2018-08-02','2018-08-03');
INSERT INTO manrega_task VALUES(10,4,'2018-08-03','2018-08-04');
Approach 1: Query to group consecutive days using row_number
--Step 2:
SELECT
group_id,
MIN(start_dt) start_dt,
MAX(end_dt) end_dt
FROM
(
--Step 1:
SELECT
group_id,
start_dt,
start_dt+1 end_dt,
start_dt - ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY start_dt) grouping_date
FROM manrega_task
)ref
GROUP BY grouping_date,group_id
ORDER BY 2;
Output:
group_id start_dt end_dt
-------- ---------- ----------
2 2018-08-01 2018-08-04
5 2018-08-13 2018-08-15
5 2018-08-28 2018-08-29
2 2018-08-30 2018-08-31
Explanation: grouping_date will hold same date value for all consecutive dates for the given grouping( partition by group_id in this case) and this is used to group consecutive dates together.
Approach 2: Query to group consecutive days using normalize
--Step 2:
SELECT
group_id,
BEGIN(period_type) start_dt,
END(period_type) end_dt
FROM
(
--Step 1:
SELECT NORMALIZE
group_id,
PERIOD(start_dt,end_dt) period_type
FROM
manrega_task
)ref
ORDER BY 2;
Output:
group_id start_dt end_dt
-------- ---------- ----------
2 2018-08-01 2018-08-04
5 2018-08-13 2018-08-15
5 2018-08-28 2018-08-29
2 2018-08-30 2018-08-31
Click
here to know more about Normalize operation.