-1
I am working on one of requirement the raw data is in following format
Requirement – Startdate should be the date when status changed to 1 and enddate should be the 1st date after the record status changed from 1 to any other number.
Customer | Status | Date |
---|---|---|
A123 | 0 | 7/2/2021 |
A123 | 0 | 7/15/2021 |
A123 | 0 | 7/22/2021 |
A123 | 1 | 8/18/2021 |
A123 | 1 | 9/8/2021 |
A123 | 0 | 12/1/2021 |
A123 | 0 | 1/21/2022 |
A123 | 1 | 3/6/2022 |
A123 | 1 | 3/7/2022 |
A123 | 0 | 3/15/2022 |
B123 | 1 | 1/1/2022 |
B123 | 0 | 1/6/2022 |
C123 | 1 | 1/2/2022 |
C123 | 2 | 1/8/2022 |
C123 | 0 | 1/9/2022 |
expected output
Customer | StartDate | EndDate |
---|---|---|
A123 | 8/18/2021 | 12/1/2021 |
A123 | 9/8/2021 | 12/1/2021 |
A123 | 3/6/2022 | 3/15/2022 |
A123 | 3/7/2022 | 3/15/2022 |
B123 | 1/1/2022 | 1/6/2022 |
C123 | 1/2/2022 | 1/8/2022 |
Query I tried to get the output is below, I am getting the output for Customer B123 and C123, but not for A123 as expected.
Query Explanation – In 1st part of query I am taking all the records with status = 1 and in next part taking only those records where status is not equal to 1, and joining these 2 datasets based on Customer and row number generated.
SELECT A.[Customer],A.StartDate,B.EndDate
from
(
SELECT [Customer],MIN(Date) AS STARTDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
FROM table1
WHERE [STATUS] = 1
GROUP BY Customer,Date,[Status]
) A
LEFT JOIN
(
SELECT [Customer],MIN(Date) AS ENDDATE,[Status],RANK() OVER (PARTITION BY [STATUS] ORDER BY Date ASC) AS ROWNUM
FROM table1
WHERE [STATUS] != 1
AND Date>(
SELECT MIN(Date) AS STARTDATE
FROM table1
WHERE [STATUS] = 1
)
GROUP BY Customer,Date,[Status]
) B
ON
(
A.[Customer] = B.[Customer]
AND A.RowNum = B.RowNum
)
ORDER BY A.Startdate
sql-servertsqljoingaps-and-islandsranking-functions
ShareFollow
32.6k55 gold badges1313 silver badges3333 bronze badges
asked Mar 31 at 12:03
69733 gold badges1313 silver badges2626 bronze badges
- 3Images of data doesn’t help us help you. Neither does no explanation of the logic. – Larnu Mar 31 at 12:05
- 1@Larnu updated the question with data instead of image and added the requirement and explanation – BIDeveloper Mar 31 at 12:21
- 1Note: this is an «Islands and Gaps» type problem that is best solved using windowing aggregate functions (usually the difference of two of them). Cross Joins and Left Joins can be made to work but are pretty inefficient (O(n^2)) compared to a differential query (O(n)). – RBarryYoung Mar 31 at 12:43
2 Answers
Sorted by:Trending sort available Highest score (default) Trending (recent votes count more) Date modified (newest first) Date created (oldest first)
1
First you list the rows where Status = 1
and then use CROSS APPLY
to get the corresponding minimum Date where the Status is not equal to 1
select s.[Customer],
StartDate = s.[Date],
EndDate = e.[Date]
from Table1 s
cross apply
(
select [Date] = min(e.[Date])
from Table1 e
where e.[Customer] = s.[Customer]
and e.[Date] > s.[Date]
and e.[Status] <> 1
) e
where s.[Status] = 1
order by s.[Customer], s.[Date]
ShareFollow
answered Mar 31 at 12:29
21.6k44 gold badges3535 silver badges3131 bronze badges
1
Here is a more efficient way to do this without a self-join.
WITH cte01only AS
( SELECT *, CASE Status WHEN 1 THEN 1 ELSE 0 END AS Status1 FROM table1 ),
cteDifference AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY Date, Status1)
- ROW_NUMBER() OVER (PARTITION BY Customer, Status1 ORDER BY Date) AS StatusGroup
FROM cte01only
),
cteGroup AS
(
SELECT Customer, StatusGroup, Status1, MIN(Date) As StartDate
FROM cteDifference
GROUP BY Customer, StatusGroup, Status1
),
cteNextDate AS
(
SELECT Customer, StatusGroup, Status1, StartDate,
LEAD(StartDate, 1, NULL) OVER (PARTITION BY Customer ORDER BY StatusGroup) AS EndDate
FROM cteGroup
)
SELECT Customer, StartDate, EndDate
FROM cteNextDate
WHERE Status1 = 1
ORDER BY Customer, StateDate
The key trick here is the second CTE which uses the difference of two ROW_NUMBER()
functions to tag the customer records (with the StatusGroup
column) into separate partitions by contiguous runs of records whose status is 1 or not 1. After that they can be grouped according to that tag to get the start dates, and then use the LEAD()
function to get the following group’s StartDate
as the current groupings EndDate
.
(There may be a more compact way to express this, but I like to layout each stage as a separate CTE.)