Clever use of the division grouping ranking increasing feature to solve the problem of merging consecutive same data rows

Time:2021-9-24

Question raised

Make some test data to illustrate the problem:


DECLARE @TestData TABLE(ID INT,Col1 VARCHAR(20),Col2 VARCHAR(20)) 
INSERT INTO @TestData(ID,Col1,Col2) 
SELECT 1,'New','Approved' UNION ALL 
SELECT 2,'Approved','Commited' UNION ALL 
SELECT 3,'Commited','In Progress' UNION ALL 
SELECT 4,'New','Approved' UNION ALL 
SELECT 5,'New','Approved' UNION ALL 
SELECT 6,'New','Approved' UNION ALL 
SELECT 7,'Approved','Removed' 
SELECT * FROM @TestData

Data Description: the ID column is continuously self incremented. Both column 1 and column 2 are the status records of PBI in TFs, that is, from what changes to what, such as new creation to approval, approval to submission to Shenma

It is now required that continuous and identical status change records be merged, and discontinuous or different status change records be retained, for example:

Taking the above figure as an example, the records with IDS 1, 4, 5 and 6 are in the approved state from new, but record 1 is not adjacent or continuous with records 4, 5 and 6, so it should be divided into two groups,

Record 1 is a group and records 4, 5 and 6 are a group. Other records are retained because their status changes are different. The final query result should look like the following figure:

You can try it yourself before continuing, which may bring new ideas for solving problems,

Problem solving ideas

The key to this problem is that group by will merge records 1, 4, 5 and 6 together, which does not meet the requirements. Only 4, 5 and 6 need to be merged. There is no such field in the source table to distinguish record 1 from records 4, 5 and 6, which is the key to solving the problem

Here, the rank function can be used together with the partition keyword. Firstly, 1456 is divided into a group, and a new field r ranking in the group is generated. This ranking R is very key and will be used later, as shown in the figure below:

Rank function does not understand the point here

Rank function takes col1 + col2 as grouping conditions, which is divided into four groups: new approved, approved committed, committed in progress and approved removed

In the new approved group, records 1, 4, 5 and 6 rank 1, 2, 3 and 4 respectively; There is only one record in other groups, ranking 1 in their groups

Now an R field is created. The R field identifies the ranking of each record in its group. The ranking increases from 1,

The ID in the source table increases and the rank r in the group increases, which is the key to problem solving,

When the same record appears continuously, its ID and its ranking r increase at the same time, then the difference is the same. It is easy to solve the problem by obtaining the difference. See the figure below:

Records 4, 5 and 6 are the same and appear continuously. Their ID increases at the same time as their ranking, and their difference remains the same. Here, records 4, 5 and 6 can be merged by using col1 + col2 + gap as the grouping condition, and then a minimum ID can be taken. The problem is solved. The complete script is as follows:

But what if the IDs are not continuous? This is not difficult. Refer to [MSSQL] row_ Number function