Expand row results based on a value in column (with iterator)

I need help writing this query. I'm running SQL Server 2005 Standard Edition.
I have a basic query that gets a subset of records from a table where Record_Count is greater than 1.
SELECT *
FROM Table_Records
WHERE Record_Count > 1
This query gives me a result set of, say:
TableRecords_ID Record_Desc Record_Count
123 XYZ 3
456 PQR 2
The above query needs to be modified so that each record appears as many times as the Record_Count and has its iteration number with it, as a value. So the new query should return results as follows:
TableRecords_ID Record_Desc Record_Count Rec_Iteration
123 XYZ 3 1
123 XYZ 3 2
123 XYZ 3 3
456 PQR 2 1
456 PQR 2 2
Could anyone help me write this query? I appreciate the help.
Clarification: the Rec_Iteration column is a sub-representation of Record_Count. Since Record_Count is 3 for the XYZ description, three rows are returned, with Rec_Iteration numbering them 1, 2 and 3 respectively.

You can use a recursive CTE for this query. Below I use a table variable @T instead of your table Table_Records.
declare @T table(TableRecords_ID int, Record_Desc varchar(3), Record_Count int)
insert into @T
select 123, 'XYZ', 3 union all
select 456, 'PQR', 2
;with cte as
(
    -- anchor member: one row per record, starting at iteration 1
    select TableRecords_ID,
           Record_Desc,
           Record_Count,
           1 as Rec_Iteration
    from @T
    where Record_Count > 1
    union all
    -- recursive member: add the next iteration until Record_Count is reached
    select TableRecords_ID,
           Record_Desc,
           Record_Count,
           Rec_Iteration + 1
    from cte
    where Rec_Iteration < Record_Count
)
select TableRecords_ID,
       Record_Desc,
       Record_Count,
       Rec_Iteration
from cte
order by TableRecords_ID,
         Rec_Iteration
-- append "option (maxrecursion 0)" if Record_Count can exceed 100, the default recursion limit
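
Because a recursive CTE stops at 100 levels by default, a non-recursive variant may be worth having when Record_Count can be large. The following is only a sketch of one possible alternative, not the approach from the answer above: it numbers the rows of sys.all_objects with ROW_NUMBER() to get a sequence and joins each record to the numbers up to its Record_Count. The @T sample data mirrors the question. Using sys.all_objects as the row source is an assumption, and it caps Record_Count at the number of rows in that catalog view.

```sql
-- Sample data mirroring the question (table variable stands in for Table_Records)
DECLARE @T TABLE (TableRecords_ID int, Record_Desc varchar(3), Record_Count int);
INSERT INTO @T
SELECT 123, 'XYZ', 3 UNION ALL
SELECT 456, 'PQR', 2;

-- Numbers: 1, 2, 3, ... one value per row of sys.all_objects (assumed row source)
WITH Numbers AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY o.object_id) AS n
    FROM sys.all_objects AS o
)
SELECT t.TableRecords_ID,
       t.Record_Desc,
       t.Record_Count,
       n.n AS Rec_Iteration
FROM @T AS t
JOIN Numbers AS n
  ON n.n <= t.Record_Count      -- repeat each record Record_Count times
WHERE t.Record_Count > 1
ORDER BY t.TableRecords_ID, n.n;
```

With the sample rows this should return three rows for 123 (Rec_Iteration 1 to 3) and two rows for 456 (Rec_Iteration 1 to 2), matching the output requested in the question.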

Related

Identifying changes over time

No doubt a similar question has come up before, but I haven't been able to locate it by searching...
I have a raw dataset with time series data including 'from' and 'to' date fields.
The problem is, when data is loaded, new records have been created ('to' date added to old record, new record 'from' load date) even where no values have changed.
I want to convert this to a table which just shows a row for each genuine change - and the from/ to dates reflecting this.
By way of example, the source data looks like this:
ID     Col1  Col2  Col3  From        To
Test1  1     1     1     01/01/2020  31/12/9999
Test2  1     2     3     01/01/2020  30/06/2020
Test2  1     2     3     01/07/2020  30/09/2020
Test2  3     2     1     01/10/2020  31/12/9999
The first two records for Test2 (rows 2 and 3) are essentially the same - there was no change when the second row was loaded on 01/07/2020. I want a single row for the period 01/01/2020 - 30/09/2020 for which there was no change:
ID     Col1  Col2  Col3  From        To
Test1  1     1     1     01/01/2020  31/12/9999
Test2  1     2     3     01/01/2020  30/09/2020
Test2  3     2     1     01/10/2020  31/12/9999
For this simplified example, I can achieve that by grouping by each column (apart from dates) and using the MIN from date/ MAX end date:
SELECT
ID, Col1, Col2, Col3, MIN([From]) AS [From], MAX([To]) AS [To]
FROM mytable
GROUP BY ID, Col1, Col2, Col3
However, this won't work if a value changes then subsequently changes back to what it was before eg
ID     Col1  Col2  Col3  From        To
Test1  1     1     1     01/01/2020  31/12/9999
Test2  1     2     3     01/01/2020  30/04/2020
Test2  1     2     3     01/05/2020  30/06/2020
Test2  3     2     1     01/07/2020  30/10/2020
Test2  1     2     3     01/11/2020  31/12/9999
Simply using MIN/ MAX in the code above would return this - so it looks like both sets of values were valid for the period from 01/07/2020 - 30/10/2020:
ID     Col1  Col2  Col3  From        To
Test1  1     1     1     01/01/2020  31/12/9999
Test2  1     2     3     01/01/2020  31/12/9999
Test2  3     2     1     01/07/2020  30/10/2020
Whereas actually the first set of values were valid before and after that period, but not during.
It should return a single row instead of two for the period 01/01/2020 - 30/06/2020, when there were no changes for this ID, then another row for the period when the values were different, and then a third row where the values reverted to the initial ones, but with a new From date:
ID     Col1  Col2  Col3  From        To
Test1  1     1     1     01/01/2020  31/12/9999
Test2  1     2     3     01/01/2020  30/06/2020
Test2  3     2     1     01/07/2020  30/10/2020
Test2  1     2     3     01/11/2020  31/12/9999
I'm struggling to conceptualise how to approach this.
I'm guessing I need to use LAG somehow but not sure how to apply it - eg rank everything in a staging table first, then use LAG to compare a concatenation of the whole row?
I'm sure I could find a fudged way eventually, but I've no doubt this problem has been solved many times before so hoping somebody can point me to a simpler/ neater solution than I'd inevitably come up with...

Advanced Gaps and Islands
I believe this is an advanced "gaps and islands" problem. Use that as a search term and you'll find plenty of literature on the subject. Only difference is normally only one column is being tracked, but you have 3.
No Gaps Assumption
One major assumption of this script is that there are no gaps between consecutive date ranges; in other words, it assumes the previous row's ToDate = the current row's FromDate - 1 day.
I'm not sure whether you need to account for gaps. If you do, it would be simple: just add a criterion to IsChanged that checks for one.
Multi-Column Gaps and Islands Solution
DROP TABLE IF EXISTS #Grouping
DROP TABLE IF EXISTS #Test
CREATE TABLE #Test (ID INT IDENTITY(1,1),TestName Varchar(10),Col1 INT,Col2 INT,Col3 INT,FromDate Date,ToDate DATE)
INSERT INTO #Test VALUES
('Test1',1,1,1,'2020-01-01','9999-12-31')
,('Test2',1,2,3,'2020-01-01','2020-04-30')
,('Test2',1,2,3,'2020-05-01','2020-06-30')
,('Test2',3,2,1,'2020-07-01','2020-10-30')
,('Test2',1,2,3,'2020-11-01','9999-12-31')
;WITH cte_Prev AS (
SELECT *
,PrevCol1 = LAG(Col1) OVER (PARTITION BY TestName ORDER BY FromDate)
,PrevCol2 = LAG(Col2) OVER (PARTITION BY TestName ORDER BY FromDate)
,PrevCol3 = LAG(Col3) OVER (PARTITION BY TestName ORDER BY FromDate)
FROM #Test
), cte_Compare AS (
SELECT *
,IsChanged = CASE
WHEN Col1 = PrevCol1
AND Col2 = PrevCol2
AND Col3 = PrevCol3
THEN 0 /*No change*/
ELSE 1 /*Iterate so new group created */
END
FROM cte_Prev
)
SELECT *,GroupID = SUM(IsChanged) OVER (PARTITION BY TestName ORDER BY FromDate ROWS UNBOUNDED PRECEDING) /*Same ordering as the LAG calls above*/
INTO #Grouping
FROM cte_Compare
/*Raw unformatted data so you can see how it works*/
SELECT *
FROM #Grouping
/*Aggregated results*/
SELECT GroupID,TestName,Col1,Col2,Col3
,FromDate = MIN(FromDate)
,ToDate = MAX(ToDate)
,NumberOfRowsCollapsedIntoOneRow = COUNT(*)
FROM #Grouping
GROUP BY GroupID,TestName,Col1,Col2,Col3
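
The gap check mentioned under "No Gaps Assumption" could be sketched roughly as follows. This is an illustration under assumptions rather than part of the original script: the PrevToDate column, the cte_Group name and the third sample row (which starts a month after the previous range ends) are made up for the example. A row only continues the previous group when all three columns match and its FromDate is exactly one day after the previous ToDate.

```sql
-- Hypothetical sample: identical values throughout, but a gap before the third row
DECLARE @Test TABLE (TestName varchar(10), Col1 int, Col2 int, Col3 int, FromDate date, ToDate date);
INSERT INTO @Test VALUES
 ('Test2',1,2,3,'2020-01-01','2020-04-30')
,('Test2',1,2,3,'2020-05-01','2020-06-30')
,('Test2',1,2,3,'2020-08-01','9999-12-31');

WITH cte_Prev AS (
    SELECT *
        ,PrevCol1   = LAG(Col1)   OVER (PARTITION BY TestName ORDER BY FromDate)
        ,PrevCol2   = LAG(Col2)   OVER (PARTITION BY TestName ORDER BY FromDate)
        ,PrevCol3   = LAG(Col3)   OVER (PARTITION BY TestName ORDER BY FromDate)
        ,PrevToDate = LAG(ToDate) OVER (PARTITION BY TestName ORDER BY FromDate)
    FROM @Test
), cte_Compare AS (
    SELECT *
        ,IsChanged = CASE
                        WHEN Col1 = PrevCol1
                         AND Col2 = PrevCol2
                         AND Col3 = PrevCol3
                         AND DATEDIFF(DAY, PrevToDate, FromDate) = 1 /*Extra criterion: no gap*/
                        THEN 0
                        ELSE 1 /*Value change, gap, or first row: start a new group*/
                     END
    FROM cte_Prev
), cte_Group AS (
    SELECT *
        ,GroupID = SUM(IsChanged) OVER (PARTITION BY TestName ORDER BY FromDate ROWS UNBOUNDED PRECEDING)
    FROM cte_Compare
)
SELECT TestName, Col1, Col2, Col3
    ,FromDate = MIN(FromDate)
    ,ToDate   = MAX(ToDate)
FROM cte_Group
GROUP BY TestName, GroupID, Col1, Col2, Col3
ORDER BY TestName, MIN(FromDate);
```

With this sample, the first two rows should collapse into one range (2020-01-01 to 2020-06-30), while the third stays separate because of the gap, even though its column values are unchanged. DATEDIFF is used instead of adding a day to PrevToDate so that a previous ToDate of 9999-12-31 cannot overflow.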

Create a view using SQL Server with repeating rows and new column

I have a table with the following columns.
EVAL_ID | GGRP_ID | GOAL_ID
1 1 1
2 2 1
2 2 2
3 1 3
I want to create a view with another column called GOAL_VERSION, which has values from 1 to 5, so that each row from the above table is duplicated 5 times, once per GOAL_VERSION number. The output should look like this:
EVAL_ID | GGRP_ID | GOAL_ID |GOAL_VERSION
1 1 1 1
1 1 1 2
1 1 1 3
1 1 1 4
1 1 1 5
2 2 1 1
2 2 1 2
2 2 1 3
2 2 1 4
2 2 1 5
How can I do that? Any help is appreciated. Thank you.

Is this what you are looking for?
DECLARE @tbl TABLE(EVAL_ID INT,GGRP_ID INT,GOAL_ID INT);
INSERT INTO @tbl VALUES
(1,1,1)
,(2,2,1)
,(2,2,2)
,(3,1,3);
SELECT tbl.*
,x.Nr
FROM @tbl AS tbl
CROSS JOIN (VALUES(1),(2),(3),(4),(5)) AS x(Nr)
EDIT: Varying count of repetition
DECLARE @tbl TABLE(EVAL_ID INT,GGRP_ID INT,GOAL_ID INT);
INSERT INTO @tbl VALUES
(1,1,1)
,(2,2,1)
,(2,2,2)
,(3,1,3);
DECLARE @tblCountOfRep TABLE(CountOfRep INT);
INSERT INTO @tblCountOfRep VALUES(3);
SELECT tbl.*
,y.Nr
FROM @tbl AS tbl
CROSS JOIN (SELECT TOP (SELECT CountOfRep FROM @tblCountOfRep) * FROM(VALUES(1),(2),(3),(4),(5) /*add the max count here*/) AS x(Nr) ORDER BY x.Nr /*make TOP deterministic*/) AS y
In this case I'd prefer a numbers table...

Take a look at CROSS JOIN. If you make a table that's got one column with the 5 rows you want you can just CROSS JOIN it to get the result you're after.
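
As a rough sketch of the CROSS JOIN suggestion, wrapped in a view as the question asks: the object names dbo.Goals, dbo.GoalVersionNumbers and dbo.vGoalVersions are hypothetical, since the question does not name the source table, and GO is the batch separator used by client tools such as SSMS and sqlcmd rather than a T-SQL statement.

```sql
-- Hypothetical source table holding the rows from the question
CREATE TABLE dbo.Goals (EVAL_ID int, GGRP_ID int, GOAL_ID int);
INSERT INTO dbo.Goals (EVAL_ID, GGRP_ID, GOAL_ID) VALUES
 (1,1,1)
,(2,2,1)
,(2,2,2)
,(3,1,3);

-- Hypothetical helper table: one column, one row per version number
CREATE TABLE dbo.GoalVersionNumbers (GOAL_VERSION int NOT NULL PRIMARY KEY);
INSERT INTO dbo.GoalVersionNumbers (GOAL_VERSION) VALUES (1),(2),(3),(4),(5);
GO

-- CREATE VIEW has to be the first statement in its batch
CREATE VIEW dbo.vGoalVersions
AS
SELECT g.EVAL_ID, g.GGRP_ID, g.GOAL_ID, n.GOAL_VERSION
FROM dbo.Goals AS g
CROSS JOIN dbo.GoalVersionNumbers AS n;  -- every goal row paired with every version
GO

SELECT EVAL_ID, GGRP_ID, GOAL_ID, GOAL_VERSION
FROM dbo.vGoalVersions
ORDER BY EVAL_ID, GGRP_ID, GOAL_ID, GOAL_VERSION;
```

Keeping the version numbers in a real table, rather than a table variable, is what lets the view reference them; changing the number of repetitions then only means adding or removing rows in the helper table.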

You can achieve this using a CTE and CROSS APPLY:
;WITH CTE AS
(
SELECT 1 AS GOAL_VERSION
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
UNION
SELECT 5
)
SELECT * FROM <your table>
CROSS APPLY CTE

Use "with" (a CTE) with a rank clause to create the view.

If you have a numbers table in your SQL database, you can cross join your table with it for the numbers between 1 and 5.
Here is my SQL solution for your requirement:
select
goals.*,
n.i as GOAL_VERSION
from goals, dbo.NumbersTable(1,5,1) n
And here is the modified version with "cross join" as suggested in the comments
select
goals.*,
n.i as GOAL_VERSION
from goals
cross join dbo.NumbersTable(1,5,1) n
As you may notice, I used a table-valued function to generate the numbers table.
Please create that function using the source code given in the referred tutorial.
I hope it helps.

Search within ColA duplicates against specific unique vals in ColB to exclude all of ColA

I apologize in advance I feel like I'm missing something really stupid simple. (and let's ignore database structure as I'm kind of locked into that).
I have, let's use customer orders - an order number can be shipped to more than one place. For the sake of ease I'm just illustrating three but it could be more than that (home, office, gift, gift2, gift 3, etc)
So my table is:
Customer orders:
OrderID MailingID
--------------------
1 1
1 2
1 3
2 1
3 1
3 3
4 1
4 2
4 3
What I need to find is OrderIDs that have been shipped to MailingID 1 but not 2 (basically what I need to find is orderID 2 and 3 above).
If it matters, I'm using Sql Express 2012.
Thanks

Maybe this could help:
create table #temp(
orderID int,
mailingID int
)
insert into #temp
select 1, 1 union all
select 1, 2 union all
select 1, 3 union all
select 2, 1 union all
select 3, 1 union all
select 3, 3 union all
select 4, 1 union all
select 4, 2 union all
select 4, 3
-- find orderIDs that have been shipped to mailingID = 1
select
distinct orderID
from #temp
where mailingID = 1
except
-- find orderIDs that have been shipped to mailingID = 2
select
orderID
from #temp
where mailingID = 2
drop table #temp

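A simple subquery with the NOT IN operator should work. The outer filter on mailingID = 1 keeps only orders that were actually shipped to mailing 1.
SELECT DISTINCT a.OrderID
FROM <tablename> a
WHERE a.mailingID = 1
AND a.orderid NOT IN (SELECT b.orderid
FROM <tablename> b
WHERE b.mailingID = 2)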
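
NOT IN returns no rows at all if its subquery yields a NULL, so NOT EXISTS is a common alternative when OrderID might be nullable. The following is a minimal sketch of that variant, not something taken from the answers above. The table variable @Orders is a stand-in for the real table, filled with the sample rows from the question.

```sql
-- Sample data from the question (table variable stands in for the real table)
DECLARE @Orders TABLE (OrderID int, MailingID int);
INSERT INTO @Orders
SELECT 1, 1 UNION ALL
SELECT 1, 2 UNION ALL
SELECT 1, 3 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 3, 3 UNION ALL
SELECT 4, 1 UNION ALL
SELECT 4, 2 UNION ALL
SELECT 4, 3;

-- Orders shipped to MailingID 1 for which no row with MailingID 2 exists
SELECT DISTINCT a.OrderID
FROM @Orders AS a
WHERE a.MailingID = 1
  AND NOT EXISTS (SELECT 1
                  FROM @Orders AS b
                  WHERE b.OrderID = a.OrderID
                    AND b.MailingID = 2);
```

For the sample rows this should return OrderID 2 and 3, the result the question asks for.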

T-SQL select rows by oldest date and unique category

I'm using Microsoft SQL. I have a table that contains information stored by two different categories and a date. For example:
ID Cat1 Cat2 Date/Time Data
1 1 A 11:00 456
2 1 B 11:01 789
3 1 A 11:01 123
4 2 A 11:05 987
5 2 B 11:06 654
6 1 A 11:06 321
I want to extract one line for each unique combination of Cat1 and Cat2 and I need the line with the oldest date. In the above I want ID = 1, 2, 4, and 5.
Thanks

Have a look at row_number() on MSDN.
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY date_time, id) rn
FROM mytable
) q
WHERE rn = 1
(run the code on SQL Fiddle)

Quassnoi's answer is fine, but I'm a bit uncomfortable with how it handles dups. It seems to return based on insertion order, but I'm not sure if even that can be guaranteed? (see these two fiddles for an example where the result changes based on insertion order: dup at the end, dup at the beginning)
Plus, I kinda like staying with old-school SQL when I can, so I would do it this way (see this fiddle for how it handles dups):
select t1.*
from my_table t1
left join my_table t2
on t1.cat1 = t2.cat1
and t1.cat2 = t2.cat2
and t1.datetime > t2.datetime
where t2.datetime is null

Select top field based on ordering of another field

Sorry if the title is hard to understand -- I'm not quite sure how to describe what I want to do. Let's say I have this table. test1 and test2 are int columns, whereas test3 is a string/varchar:
test1 | test2 | test3
1 1 one
1 2 two
1 3 three
1 4 four
2 10 ten
2 11 eleven
2 12 twelve
3 101 one hundred one
3 104 one hundred four
3 107 one hundred seven
I am trying to figure out the select query that will return the top test3 for each test1, where the ordering is done based on the value in test2. In other words, I am trying to find the query that will return this:
test1 | test3
1 four
2 twelve
3 one hundred seven
It'd be really great if the solution could work on both MS SQL Server (2005 and 2008) and MS Access (2007 and 2010).

A lowest-common-denominator answer:
SELECT yourtable.test1, yourtable.test3
FROM yourtable
INNER JOIN
(
SELECT test1, MAX(test2) AS test_2
FROM yourtable
GROUP BY test1
) t
ON t.test1 = yourtable.test1 AND t.test_2 = yourtable.test2

In SQL Server:
SELECT test1, test3
FROM (
SELECT test1, test3, ROW_NUMBER() OVER (PARTITION BY test1 ORDER BY test2 DESC) AS rn
FROM mytable
) q
WHERE rn = 1
or
SELECT test1, test3
FROM (
SELECT DISTINCT test1
FROM mytable
) md
CROSS APPLY
(
SELECT TOP 1
test3
FROM mytable mi
WHERE mi.test1 = md.test1
ORDER BY
mi.test2 DESC
) mt
Each of these methods can be more efficient than another, depending on your data distribution.
This article may be of interest to you:
SQL Server: Selecting records holding group-wise maximum