Create "Sets" within the same table based on multi-column criteria

Create "Sets" within the same table based on multi-column criteria - sql-server

For every unique combination of BoxId and Revision with a single UnitTypeId of 1 and a single UnitTypeId of 2 both having a NULL SetNumber, assign a SetNumber of 1.
Table and data setup:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[UnitTypes]') AND type in (N'U'))
Drop Table dbo.UnitTypes
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Tracking]') AND type in (N'U'))
DROP TABLE [dbo].[Tracking]
GO
CREATE TABLE dbo.UnitTypes
(
Id int NOT NULL,
Notes varchar(80)
)
GO
CREATE TABLE dbo.Tracking
(
Id int NOT NULL IDENTITY (1, 1),
BoxId int NOT NULL,
Revision int NOT NULL,
UnitValue int NULL,
UnitTypeId int NULL,
SetNumber int NULL
)
GO
ALTER TABLE dbo.Tracking ADD CONSTRAINT
PK_Tracking PRIMARY KEY CLUSTERED
(
Id
)
GO
Insert Into dbo.UnitTypes (Id, Notes) Values (1, 'X Coord'),
(2, 'Y Coord'),
(3, 'Weight'),
(4, 'Length')
Go
Insert Into dbo.Tracking (BoxId, Revision, UnitValue, UnitTypeId, SetNumber)
Values (1165, 1, 150, 1, NULL),
(1165, 1, 1477, 2, NULL),
(1165, 1, 31, 4, NULL),
(1166, 1, 425, 1, 1),
(1166, 1, 1146, 2, 1),
(1166, 1, 438, 1, NULL),
(1166, 1, 1163, 2, NULL),
(1167, 1, 560, 1, NULL),
(1167, 1, 909, 2, NULL),
(1167, 1, 12763, 3, NULL),
(1168, 1, 21, 1, NULL),
(1168, 1, 13109, 3, NULL)
The ideal results would be:
Id BoxId Revision UnitValue UnitTypeId SetNumber
1 1165 1 150 1 1
2 1165 1 1477 2 1
3 1165 1 31 4 1
4 1166 1 425 1 1
5 1166 1 1146 2 1
6 1166 1 438 1 NULL <--NULL Because there is already an existing Set
7 1166 1 1163 2 NULL <--NULL Because there is already an existing Set
8 1167 1 560 1 1
9 1167 1 909 2 1
10 1167 1 12763 3 1
11 1168 1 21 1 NULL <--NULL Because there is not exactly one UnitTypeId of 1 and exactly one UnitTypeId of 2 for this BoxId\Revision combination.
12 1168 1 13109 3 NULL <--NULL Because there is not exactly one UnitTypeId of 1 and exactly one UnitTypeId of 2 for this BoxId\Revision combination.
EDIT:
The question is how can I update the SetNumber, given the constraints above, using pure TSQL?

If I understand your question correctly, you could do this with a subquery that demands all conditions are met:
update t1
set SetNumber = 1
from dbo.Tracking t1
where SetNumber is null
and 1 =
(
select case
when count(case when t2.UnitTypeId = 1 then 1 end) <> 1 then 0
when count(case when t2.UnitTypeId = 2 then 1 end) <> 1 then 0
when count(t2.SetNumber) <> 0 then 0
else 1
end
from dbo.Tracking t2
where t1.BoxId = t2.BoxId
and t1.Revision = t2.Revision
)
The count(t2.SetNumber) is a bit tricky: this will only count rows where SetNumber is not null. So this meets the criterion that no other set with the same (BoxId, Revision) exists.

Try this out, it returns the same results that you gave. The WITH statement sets up a CTE to query from. The ROW_NUMBER() function is partitioning function that does what you want:
;WITH BoxSets AS (
SELECT
ID
,BoxId
,Revision
,UnitValue
,UnitTypeId
,CASE WHEN UnitTypeId IN (1,2) THEN 1 ELSE 0 END ValidUnit
,ROW_NUMBER() OVER (PARTITION BY BoxID,UnitTypeID ORDER BY BoxID,UnitTypeID,UnitValue ) SetNumber
FROM Tracking
)
SELECT
b.ID
,b.BoxId
,b.Revision
,b.UnitValue
,b.UnitTypeId
,CASE ISNULL(b1.ValidUnits,0) WHEN 0 THEN NULL ELSE CASE b.SetNumber WHEN 1 THEN b.SetNumber ELSE NULL END END
FROM BoxSets AS b
LEFT JOIN (SELECT
BoxID
,SUM(ValidUnit) AS ValidUnits
FROM BoxSets
GROUP BY BoxId
HAVING SUM(ValidUnit) > 1) AS b1 ON b.BoxId = b1.BoxId

Related

Fill gaps with set based on priority

Goal
For each Foo we should try the fill the time frames as much as possible with records from FooBar. It's ok to have empty time frames when no records exists in FooBar.
The records in FooBar behave as a set.
Meaning if FooId and the time frame are (exactly) the same the BarId's are all valid in this time frame.
Gaps should be filled based on the sets priority.
Table structure
CREATE TABLE Foo(
FooId INT NOT NULL,
ValidFrom DATETIME NOT NULL,
ValidUntil DATETIME NOT NULL,
)
CREATE TABLE FooBar (
FooId INT NOT NULL,
BarId INT NOT NULL,
ValidFrom DATETIME NOT NULL,
ValidUntil DATETIME NOT NULL,
Priority TINYINT NOT NULL,
)
Sample data
INSERT INTO Foo(FooId, ValidFrom, ValidUntil)
VALUES
(1, '2020-01-01', '2021-12-31')
, (2, '2020-01-01', '2021-06-30')
INSERT INTO FooBar(FooId, BarId, ValidFrom, ValidUntil, Priority)
VALUES
-- First set for FooId = 1 with prio 1
(1, 1, '2021-01-01', '2021-03-01', 1)
, (1, 2, '2021-01-01', '2021-03-01', 1)
, (1, 3, '2021-01-01', '2021-03-01', 1)
-- Second set for FooId = 1 with prio 2
, (1, 1, '2021-02-01', '2021-06-01', 2)
, (1, 2, '2021-02-01', '2021-06-01', 2)
-- Third set for FooId = 1 with prio 3
, (1, 1, '2021-01-01', '2021-12-31', 3)
, (1, 2, '2021-01-01', '2021-12-31', 3)
, (1, 3, '2021-01-01', '2021-12-31', 3)
-- Fourth set for FooId = 1 with Prio 1
, (1, 4, '2021-04-01', '2021-04-02', 1)
, (1, 5, '2021-04-01', '2021-04-02', 1)
-- First set for FooId = 2 with prio 3
, (2, 6, '2021-01-01', '2021-04-02', 3)
Expected result
Origin column to clarify, shouldn't be part of the generated result set
FooId
BarId
ValidFrom
ValidUntil
Origin
1
1
2021-01-01
2021-03-01
First set
1
2
2021-01-01
2021-03-01
First set
1
3
2021-01-01
2021-03-01
First set
1
1
2021-03-02
2021-03-31
Second set
1
2
2021-03-02
2021-03-31
Second set
1
4
2021-04-01
2021-04-02
Fourth set
1
5
2021-04-01
2021-04-02
Fourth set
1
1
2021-04-03
2021-06-01
Second set
1
2
2021-04-03
2021-06-01
Second set
1
1
2021-06-02
2021-12-31
Third set
1
2
2021-06-02
2021-12-31
Third set
1
3
2021-06-02
2021-12-31
Third set
2
6
2021-01-01
2021-12-31
First set (FooId = 2)
I know this is possible with a cursor or while loop, but I'm looking for a more performant/elegant solution.
Compatibility level is: 130

It's ok to have empty time frames when no records exists in FooBar.
Does this mean that a solution without empty frames is also acceptable?
If so, then the third set for FooId = 1 also defines a BarId = 3 for period 2021-03-02 -> 2021-03-31 for example.
Sample data
Tweaked the data model a bit to no have those timestamps (00:00:00.000) in the result.
Also added a set identifier (FooBar.SetId) for easy origin tracing.
CREATE TABLE Foo(
FooId INT NOT NULL,
ValidFrom DATE/*TIME*/ NOT NULL,
ValidUntil DATE/*TIME*/ NOT NULL
)
CREATE TABLE FooBar (
FooId INT NOT NULL,
BarId INT NOT NULL,
ValidFrom DATE/*TIME*/ NOT NULL,
ValidUntil DATE/*TIME*/ NOT NULL,
Priority TINYINT NOT NULL,
SetId nvarchar(5)
)
INSERT INTO Foo(FooId, ValidFrom, ValidUntil)
VALUES
(1, '2020-01-01', '2021-12-31')
, (2, '2020-01-01', '2021-06-30')
INSERT INTO FooBar(FooId, BarId, ValidFrom, ValidUntil, Priority, SetId)
VALUES
-- First set for FooId = 1 with prio 1
(1, 1, '2021-01-01', '2021-03-01', 1, 'Set 1')
, (1, 2, '2021-01-01', '2021-03-01', 1, 'Set 1')
, (1, 3, '2021-01-01', '2021-03-01', 1, 'Set 1')
-- Second set for FooId = 1 with prio 2
, (1, 1, '2021-02-01', '2021-06-01', 2, 'Set 2')
, (1, 2, '2021-02-01', '2021-06-01', 2, 'Set 2')
-- Third set for FooId = 1 with prio 3
, (1, 1, '2021-01-01', '2021-12-31', 3, 'Set 3')
, (1, 2, '2021-01-01', '2021-12-31', 3, 'Set 3')
, (1, 3, '2021-01-01', '2021-12-31', 3, 'Set 3')
-- Fourth set for FooId = 1 with Prio 1
, (1, 4, '2021-04-01', '2021-04-02', 1, 'Set 4')
, (1, 5, '2021-04-01', '2021-04-02', 1, 'Set 4')
-- First set for FooId = 2 with prio 3
, (2, 6, '2021-01-01', '2021-04-02', 3, 'Set 1')
Solution
The common table expressions (CTE) ValidFrom and ValidPeriod cut all period information from Foo and FooBar in the smallest individual periods.
The previous step also produces an extra trailing, incomplete period for each FooId that is remove with the exists clause.
Then for each individual period fetch the FooBar record with the first priority value (by saying that no similar record with an even smaller priority is allowed: not exists ... fb2.Priority < fb.Priority).
This gives:
with ValidFrom as
(
select f.FooId,
f.ValidFrom
from Foo f
union
select f.FooId,
dateadd(day, 1, f.ValidUntil)
from Foo f
union
select fb.FooId,
fb.ValidFrom
from FooBar fb
union
select fb.FooId,
dateadd(day, 1, fb.ValidUntil)
from Foobar fb
),
ValidPeriod as
(
select vf.FooId,
vf.ValidFrom,
dateadd(day, -1, lead(vf.ValidFrom) over(partition by vf.FooId order by vf.ValidFrom)) as ValidUntil
from ValidFrom vf
)
select vp.FooId,
fb.BarId,
vp.ValidFrom,
vp.ValidUntil,
--fb.ValidFrom,
--fb.ValidUntil,
--fb.Priority,
fb.SetId
from ValidPeriod vp
left join FooBar fb
on fb.FooId = vp.FooId
and fb.ValidFrom <= vp.ValidUntil
and fb.ValidUntil >= vp.ValidFrom
and not exists ( select 'x'
from FooBar fb2
where fb2.FooId = fb.FooId
and fb2.BarId = fb.BarId
and fb2.ValidFrom <= vp.ValidUntil
and fb2.ValidUntil >= vp.ValidFrom
and fb2.Priority < fb.Priority )
where exists ( select 'x'
from ValidPeriod vp2
where vp2.FooId = vp.FooId
and vp2.ValidFrom > vp.ValidFrom )
order by vp.FooId,
vp.ValidFrom,
fb.BarId;
Result
This result contains more period information than those you requested in your expected result. Removing the union's with Foo from the first CTE will remove the null values and limit the period to the period information available in FooBar alone (in fact this would eliminate Foo from the solution entirely).
With vp.ValidFrom and vp.ValidUntil as result periods:
FooId BarId ValidFrom ValidUntil SetId
----- ----- ---------- ---------- -----
1 null 2020-01-01 2020-12-31 null -- extra row
1 1 2021-01-01 2021-01-31 Set 1
1 2 2021-01-01 2021-01-31 Set 1
1 3 2021-01-01 2021-01-31 Set 1
1 1 2021-02-01 2021-03-01 Set 1 -- extra row
1 2 2021-02-01 2021-03-01 Set 1 -- extra row
1 3 2021-02-01 2021-03-01 Set 1 -- extra row
1 1 2021-03-02 2021-03-31 Set 2
1 2 2021-03-02 2021-03-31 Set 2
1 3 2021-03-02 2021-03-31 Set 3 -- extra row
1 1 2021-04-01 2021-04-02 Set 2 -- extra row
1 2 2021-04-01 2021-04-02 Set 2 -- extra row
1 3 2021-04-01 2021-04-02 Set 3 -- extra row
1 4 2021-04-01 2021-04-02 Set 4
1 5 2021-04-01 2021-04-02 Set 4
1 1 2021-04-03 2021-06-01 Set 2
1 2 2021-04-03 2021-06-01 Set 2
1 3 2021-04-03 2021-06-01 Set 3 -- extra row
1 1 2021-06-02 2021-12-31 Set 3
1 2 2021-06-02 2021-12-31 Set 3
1 3 2021-06-02 2021-12-31 Set 3
2 null 2020-01-01 2020-12-31 null -- extra row
2 6 2021-01-01 2021-04-02 Set 1
2 null 2021-04-03 2021-06-30 null -- extra row
With fb.ValidFrom and fb.ValidUntil as result periods:
FooId BarId ValidFrom ValidUntil SetId
----- ----- ---------- ---------- -----
1 null null null null -- extra row
1 1 2021-01-01 2021-03-01 Set 1
1 2 2021-01-01 2021-03-01 Set 1
1 3 2021-01-01 2021-03-01 Set 1
1 1 2021-01-01 2021-03-01 Set 1 -- extra row
1 2 2021-01-01 2021-03-01 Set 1 -- extra row
1 3 2021-01-01 2021-03-01 Set 1 -- extra row
1 1 2021-02-01 2021-06-01 Set 2
1 2 2021-02-01 2021-06-01 Set 2
1 3 2021-01-01 2021-12-31 Set 3 -- extra row
1 1 2021-02-01 2021-06-01 Set 2 -- extra row
1 2 2021-02-01 2021-06-01 Set 2 -- extra row
1 3 2021-01-01 2021-12-31 Set 3 -- extra row
1 4 2021-04-01 2021-04-02 Set 4
1 5 2021-04-01 2021-04-02 Set 4
1 1 2021-02-01 2021-06-01 Set 2
1 2 2021-02-01 2021-06-01 Set 2
1 3 2021-01-01 2021-12-31 Set 3 -- extra row
1 1 2021-01-01 2021-12-31 Set 3
1 2 2021-01-01 2021-12-31 Set 3
1 3 2021-01-01 2021-12-31 Set 3
2 null null null null -- extra row
2 6 2021-01-01 2021-04-02 Set 1
2 null null null null -- extra row
Fiddle to see things in action.

Create comma separated value strings using data from different tables in SQL Server

I have the following database model:
criteria table:
criteria_id criteria_name is_range
1 product_category 0
2 product_subcategory 0
3 items 1
4 criteria_4 1
evaluation_grid table:
evaluation_grid_id criteria_id start_value end_value provider property_1 property_2 property_3
1 1 3 NULL internal 1 1 1
2 1 1 NULL internal 1 1 1
3 2 1 NULL internal 1 2 1
4 3 1 100 internal 2 1 1
5 4 1 50 internal 2 2 1
6 1 2 NULL external 2 8 1
7 2 2 NULL external 2 5 1
8 3 1 150 external 2 2 2
9 3 1 100 external 2 3 1
product_category table:
id name
1 test1
2 test2
3 test3
product_subcategory table:
id name
1 producttest1
2 producttest2
3 producttest3
What I am trying to achieve is returning the values like this:
criteria start_value end_value provider property_1 property_2 property_3
product_category test3, test1 NULL internal 1 1 1
product_subcategory producttest1 NULL internal 1 2 1
items 1 100 internal 2 1 1
criteria_4 1 50 internal 2 2 1
product_category test2 NULL external 2 8 1
product_subcategory producttest2 NULL external 2 5 1
items 1 150 external 2 2 2
criteria_4 1 100 external 2 3 1
Basically keeping the order from table evaluation_grid but grouping only the criterias which are not ranges
in comma separated value strings based on start_value, end_value, provier, property_1, property_2 and property_3
I tried like this:
SELECT c.criteria_name AS criteria
,CASE WHEN c.criteria_id = 1
THEN
(IsNull(STUFF((SELECT ', ' + RTRIM(LTRIM(pc.name))
FROM product_category pc
INNER JOIN [evaluation_grid] eg ON eg.start_value=pc.id
WHERE srsg.criteria_id=c.criteria_id
FOR XML PATH('')), 1, 2, ''), ''))
WHEN c.criteria_id = 2
THEN (IsNull(STUFF((SELECT ' , ' + RTRIM(LTRIM(psc.name))
FROM product_subcategory psc
INNER JOIN [evaluation_grid] eg ON eg.start_value=psc.id
WHERE srsg.criteria_id=c.criteria_id
FOR XML PATH('')
), 1, 3, ''), ''))
ELSE
CAST(eg.start_value AS VARCHAR)
END AS start_value
,eg.end_value AS end_value
,eg.provider AS provider
,eg.property_1 AS property_1
,eg.property_2 AS property_2
,eg.property_3 AS property_3
FROM [evaluation_grid] eg
INNER JOIN criteria c ON eg.criteria_id = crs.criteria_id
GROUP BY c.criteria_name,c.criteria_id,c.is_range,eg.start_value,eg.end_value,eg.provider,eg.property_1,eg.property_2,eg.property_3
But it is returning wrong data, like this:
criteria start_value end_value provider property_1 property_2 property_3
product_category test3, test1, test2 NULL internal 1 1 1
product_category test3, test1, test2 NULL external 2 8 1
product_category test3, test1, test2 NULL internal 1 1 1
product_subcategory producttest1,producttest2 NULL internal 1 2 1
product_subcategory producttest1,producttest2 NULL external 2 5 1
items 1 100 internal 1 1 1
items 1 150 external 2 2 2
criteria_4 1 50 internal 2 2 1
criteria_4 1 100 external 2 3 1
I tried some versions with "with cte;" as well but didn't manage to find the solution yet and yes, I checked the similar questions already. :)
PS: I cannot use STRING_AGG because we have below 2017 Sql Server version.
Any suggestion will be highly appreciated, thanks !

As far as I can tell this query returns the exact output you're looking for.
with cte as (
select c.criteria_name,
eg.evaluation_grid_id,
case when c.criteria_id = 1 then pc.[name]
when c.criteria_id = 2 then psc.[name]
else null end pc_cat,
c.criteria_id,c.is_range, eg.start_value, eg.end_value,
eg.[provider], eg.property_1, eg.property_2,eg.property_3
from #evaluation_grid eg
join #criteria c ON eg.criteria_id = c.criteria_id
left join #product_category pc on eg.start_value=pc.id
left join #product_subcategory psc on eg.start_value=psc.id)
select c.criteria_name as criteria,
case when c.is_range=0 then
STUFF((SELECT ', ' + RTRIM(LTRIM(c2.pc_cat))
FROM cte c2
WHERE c2.criteria_id=c.criteria_id
and c2.is_range=c.is_range
and c2.[provider]=c.[provider]
and c2.property_1=c.property_1
and c2.property_2=c.property_2
and c2.property_3=c.property_3
FOR XML PATH('')), 1, 2, '')
else max(cast(c.start_value as varchar(50))) end as start_value,
c.end_value, c.[provider], c.property_1, c.property_2, c.property_3
from cte c
group by c.criteria_name, c.criteria_id, c.is_range, c.end_value,
c.[provider], c.property_1, c.property_2, c.property_3
order by max(c.evaluation_grid_id);
Output
criteria start_value end_value provider property_1 property_2 property_3
product_category test3, test1 NULL internal 1 1 1
product_subcategory producttest1 NULL internal 1 2 1
items 1 100 internal 2 1 1
criteria_4 1 50 internal 2 2 1
product_category test2 NULL external 2 8 1
product_subcategory producttest2 NULL external 2 5 1
items 1 150 external 2 2 2
criteria_4 1 100 external 2 3 1

It's a bit difficult to follow the requirements. Can you review the setup & results below and let us know what the desired resultset should be?
declare #criteria table (criteria_id int, criteria_name varchar(50), is_range bit)
insert into #criteria
values(1, 'product_category', 0), (2, 'product_subcategory', 0), (3, 'items', 1), (4, 'criteria_4', 1);
declare #evaluation_grid table (evaluation_grid_id int, criteria_id int, start_value int, end_value int, [provider] varchar(50), property_1 int, property_2 int, property_3 int);
insert into #evaluation_grid
values
(1, 1, 3, NULL, 'internal', 1, 1, 1),
(2, 1, 1, NULL, 'internal', 1, 1, 1),
(3, 2, 1, NULL, 'internal', 1, 2, 1),
(4, 3, 1, 100, 'internal', 2, 1, 1),
(5, 4, 1, 50, 'internal', 2, 2, 1),
(6, 1, 2, NULL, 'external', 2, 8, 1),
(7, 2, 2, NULL, 'external', 2, 5, 1),
(8, 3, 1, 150, 'external', 2, 2, 2),
(9, 4, 1, 100, 'external', 2, 3, 1)
declare #product_category table (id int, [name] varchar(50))
insert into #product_category
values (1, 'test1'), (2, 'test2'), (3, 'test3'), (4, 'test4');
declare #product_subcategory table (id int, [name] varchar(50))
insert into #product_subcategory
values (1, 'producttest1'), (2, 'producttest2'), (3, 'producttest3');
select c.criteria_name,
stuff(( select ',' + ipc.[name]
from #evaluation_grid ieg
join #product_category ipc on ieg.start_value = ipc.id
where [provider] = eg.[provider] and property_1 = eg.property_1 and property_2 = eg.property_2 and property_3 = eg.property_3
order by ieg.evaluation_grid_id
for xml path('')), 1 ,1, '') as start_value,
end_value,
[provider],
property_1,
property_2,
property_3
from #evaluation_grid eg
join #criteria c on eg.criteria_id = c.criteria_id
where c.is_range = 0
group
by c.criteria_name, end_value, [provider], property_1, property_2, property_3
union all
select c.criteria_name, cast(start_value as varchar(10)), end_value, [provider], property_1, property_2, property_3
from #evaluation_grid eg
join #criteria c on eg.criteria_id = c.criteria_id
where c.is_range = 1;

Rank or merge sequential rows

I have a log file I need to either rank (but treating sequential and equal rows as ties), or merge sequential equal rows (based on specific column). My table looks like below, The Start and Stop are all being sequential (within the same ID window)
ID Start Stop Value
1 0 1 A
1 1 2 A
1 2 3 A
1 3 4 B
1 4 5 B
1 5 6 A
2 3 4 A
I have two approches to get what I need.
Approach 1: Rank (treating sequential rows with equal values in "Value" as ties) and using ID as partition.
This should give the output below. But how do I do the special rank: Treating sequential rows with equal values in "Value" as ties.
Select *,
rank() OVER (partition by id order by start, stop) as Rank,
XXX as SpecialRank
from Table
ID Start Stop Value Rank SpecialRank
1 0 1 A 1 1
1 1 2 A 2 1
1 2 3 A 3 1
1 3 4 B 4 2
1 4 5 B 5 2
1 5 6 A 6 3
2 3 4 A 1 1
Approach 2: Merge sequential rows with equal values in "Value".
This will shall create a table like below.
ID Start Stop Value
1 0 3 A
1 3 5 B
1 5 6 A
2 3 4 A
I don't know if this helps, but I have also a nextValue column that might help in this
ID Start Stop Value NextValue
1 0 1 A A
1 1 2 A A
1 2 3 A B
1 3 4 B B
1 4 5 B A
1 5 6 A A
2 3 4 A ...
Example-table:
CREATE TABLE #Table ( id int, start int, stop int, Value char(1), NextValue char(1));
INSERT INTO #Table values (1,0, 1, 'A', 'A');
INSERT INTO #Table values (1,1, 2, 'A', 'A');
INSERT INTO #Table values (1,2, 3, 'A', 'B');
INSERT INTO #Table values (1,3, 4, 'B', 'B');
INSERT INTO #Table values (1,4, 5, 'B', 'A');
INSERT INTO #Table values (1,5, 6, 'A', 'A');
INSERT INTO #Table values (2,3, 4, 'A', null);

Use a self join to an aggregate subquery from the full set, e.g.
with rankTable (id, value) as
( select 1, 'A' union all select 1, 'A' union all select 1, 'B' union all select 2, 'A')
select t2.* from rankTable t1 join (
select id, value, rank() over (partition by id order by value) as specialRank from
(
select distinct id, value
from rankTable
) t) t2 on t2.id =t1.id and t2.value = t1.value
id value specialRank
1 A 1
1 A 1
1 B 2
2 A 1

Grouped result in recursive query (SQL Server)

I have a recursive query that is working as intended for calculating weighted average cost for inventory calculation. My problem is that I need multiple weighted average from the same query grouped by different columns. I know I can solve the issue by calculating it multiple times, one for each key-column. But because of query performance considerations, I want it to be traversed once. Sometimes I have 1M+ rows.
I have simplified the data and replaced weighted average to a simple sum to make my problem more easy to follow.
How can I get the result below using recursive cte? Remember that I have to use a recursive query to calculate weighted average cost. I am on sql server 2016.
Example data (Id is also the sort order. The Id and Key is unique together.)
Id Key1 Key2 Key3 Value
1 1 1 1 10
2 1 1 1 10
3 1 2 1 10
4 2 2 1 10
5 1 2 1 10
6 1 1 2 10
7 1 1 1 10
8 3 3 1 10
Expected result
Id Key1 Key2 Key3 Value Key1Sum Key2Sum Key3Sum
1 1 1 1 10 10 10 10
2 1 1 1 10 20 20 20
3 1 2 1 10 30 10 30
4 2 2 1 10 10 20 40
5 1 2 1 10 40 30 50
6 1 1 2 10 50 30 10
7 1 1 1 10 60 40 60
8 3 3 1 10 10 10 70
EDIT
After some well deserved criticism I have to be much better in how I make a question.
Here is an example and why I need a recursive query. In the example I get the result for Key1, but I need it for Key2 and Key3 as well in the same query. I know that I can repeat the same query three times, but that is not preferable.
DECLARE #InventoryItem AS TABLE (
IntentoryItemId INT NULL,
InventoryOrder INT,
Key1 INT NULL,
Key2 INT NULL,
Key3 INT NULL,
Quantity NUMERIC(22,9) NOT NULL,
Price NUMERIC(16,9) NOT NULL
);
INSERT INTO #InventoryItem (
IntentoryItemId,
InventoryOrder,
Key1,
Key2,
Key3,
Quantity,
Price
)
VALUES
(1, NULL, 1, 1, 1, 10, 1),
(2, NULL, 1, 1, 1, 10, 2),
(3, NULL, 1, 2, 1, 10, 2),
(4, NULL, 2, 2, 1, 10, 1),
(5, NULL, 1, 2, 1, 10, 5),
(6, NULL, 1, 1, 2, 10, 3),
(7, NULL, 1, 1, 1, 10, 3),
(8, NULL, 3, 3, 1, 10, 1);
--The steps below will give me the cost "grouped" by Key1
WITH Key1RowNumber AS (
SELECT
IntentoryItemId,
ROW_NUMBER() OVER (PARTITION BY Key1 ORDER BY IntentoryItemId) AS RowNumber
FROM #InventoryItem
)
UPDATE #InventoryItem
SET InventoryOrder = Key1RowNumber.RowNumber
FROM #InventoryItem InventoryItem
INNER JOIN Key1RowNumber
ON Key1RowNumber.IntentoryItemId = InventoryItem.IntentoryItemId;
WITH cte AS (
SELECT
IntentoryItemId,
InventoryOrder,
Key1,
Quantity,
Price,
CONVERT(NUMERIC(22,9), InventoryItem.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price) / NULLIF(InventoryItem.Quantity, 0)) AS AvgPrice
FROM #InventoryItem InventoryItem
WHERE InventoryItem.InventoryOrder = 1
UNION ALL
SELECT
Sub.IntentoryItemId,
Sub.InventoryOrder,
Sub.Key1,
Sub.Quantity,
Sub.Price,
CONVERT(NUMERIC(22,9), Main.CurrentQuantity + Sub.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9),
((Main.CurrentQuantity) * Main.AvgPrice + Sub.Quantity * Sub.price)
/
NULLIF((Main.CurrentQuantity) + Sub.Quantity, 0)
) AS AvgPrice
FROM CTE Main
INNER JOIN #InventoryItem Sub
ON Main.Key1 = Sub.Key1
AND Sub.InventoryOrder = main.InventoryOrder + 1
)
SELECT cte.IntentoryItemId, cte.AvgPrice
FROM cte
ORDER BY IntentoryItemId

Why you will want to calculate on 1M+ rows ?
Secondly I think your db design is wrong ? key1 ,key2,key3 should have been unpivoted and one column called Keys and 1 more column to identify each key group.
It will be clear to you in below example.
If I am able to optimize my query then I can think of calculating many rows else I try to limit number of rows.
Also if possible you can think of keeping calculated column of Avg Price.i.e. when table is populated then you can calculate and store it.
First let us know, if output is correct or not.
DECLARE #InventoryItem AS TABLE (
IntentoryItemId INT NULL,
InventoryOrder INT,
Key1 INT NULL,
Key2 INT NULL,
Key3 INT NULL,
Quantity NUMERIC(22,9) NOT NULL,
Price NUMERIC(16,9) NOT NULL
);
INSERT INTO #InventoryItem (
IntentoryItemId,
InventoryOrder,
Key1,
Key2,
Key3,
Quantity,
Price
)
VALUES
(1, NULL, 1, 1, 1, 10, 1),
(2, NULL, 1, 1, 1, 10, 2),
(3, NULL, 1, 2, 1, 10, 2),
(4, NULL, 2, 2, 1, 10, 1),
(5, NULL, 1, 2, 1, 10, 5),
(6, NULL, 1, 1, 2, 10, 3),
(7, NULL, 1, 1, 1, 10, 3),
(8, NULL, 3, 3, 1, 10, 1);
--select * from #InventoryItem
--return
;with cte as
(
select *
, ROW_NUMBER() OVER (PARTITION BY Key1 ORDER BY IntentoryItemId) AS rn1
, ROW_NUMBER() OVER (PARTITION BY Key2 ORDER BY IntentoryItemId) AS rn2
, ROW_NUMBER() OVER (PARTITION BY Key3 ORDER BY IntentoryItemId) AS rn3
from #InventoryItem
)
,cte1 AS (
SELECT
IntentoryItemId,
Key1 keys,
Quantity,
Price
,rn1
,rn1 rn
,1 pk
FROM cte c
union ALL
SELECT
IntentoryItemId,
Key2 keys,
Quantity,
Price
,rn1
,rn2 rn
,2 pk
FROM cte c
union ALL
SELECT
IntentoryItemId,
Key3 keys,
Quantity,
Price
,rn1
,rn3 rn
,3 pk
FROM cte c
)
, cte2 AS (
SELECT
IntentoryItemId,
rn,
Keys,
Quantity,
Price,
CONVERT(NUMERIC(22,9), InventoryItem.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price)) a,
CONVERT(NUMERIC(22,9), InventoryItem.Price) b,
CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price) / NULLIF(InventoryItem.Quantity, 0)) AS AvgPrice
,pk
FROM cte1 InventoryItem
WHERE InventoryItem.rn = 1
UNION ALL
SELECT
Sub.IntentoryItemId,
sub.rn,
Sub.Keys,
Sub.Quantity,
Sub.Price,
CONVERT(NUMERIC(22,9), Main.CurrentQuantity + Sub.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9),Main.CurrentQuantity * Main.AvgPrice),
CONVERT(NUMERIC(22,9),Sub.Quantity * Sub.price),
CONVERT(NUMERIC(22,9),
((Main.CurrentQuantity * Main.AvgPrice) + (Sub.Quantity * Sub.price))
/
NULLIF(((Main.CurrentQuantity) + Sub.Quantity), 0)
) AS AvgPrice
,sub.pk
FROM CTE2 Main
INNER JOIN cte1 Sub
ON Main.Keys = Sub.Keys and main.pk=sub.pk
AND Sub.rn = main.rn + 1
--and Sub.InventoryOrder<=2
)
select *
,(select AvgPrice from cte2 c1 where pk=2 and c1.IntentoryItemId=c.IntentoryItemId ) AvgPrice2
,(select AvgPrice from cte2 c1 where pk=2 and c1.IntentoryItemId=c.IntentoryItemId ) AvgPrice3
from cte2 c
where pk=1
ORDER BY pk,rn
Alternate Solution (for Sql 2012+) and many thanks to Jason,
SELECT *
,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
OVER(PARTITION BY Key1 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey1Price
,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
OVER(PARTITION BY Key2 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey2Price
,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
OVER(PARTITION BY Key3 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey3Price
from #InventoryItem
order by IntentoryItemId

Here's how to do it in SQL Server 2012 & later...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
Id INT,
Key1 INT,
Key2 INT,
Key3 INT,
[Value] INT
);
INSERT #TestData(Id, Key1, Key2, Key3, Value) VALUES
(1, 1, 1, 1, 10),
(2, 1, 1, 1, 10),
(3, 1, 2, 1, 10),
(4, 2, 2, 1, 10),
(5, 1, 2, 1, 10),
(6, 1, 1, 2, 10),
(7, 1, 1, 1, 10),
(8, 3, 3, 1, 10);
--=============================================================
SELECT
td.Id, td.Key1, td.Key2, td.Key3, td.Value,
Key1Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key1 ORDER BY td.Id ROWS UNBOUNDED PRECEDING),
Key2Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key2 ORDER BY td.Id ROWS UNBOUNDED PRECEDING),
Key3Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key3 ORDER BY td.Id ROWS UNBOUNDED PRECEDING)
FROM
#TestData td
ORDER BY
td.Id;
results...
Id Key1 Key2 Key3 Value Key1Sum Key2Sum Key3Sum
----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
1 1 1 1 10 10 10 10
2 1 1 1 10 20 20 20
3 1 2 1 10 30 10 30
4 2 2 1 10 10 20 40
5 1 2 1 10 40 30 50
6 1 1 2 10 50 30 10
7 1 1 1 10 60 40 60
8 3 3 1 10 10 10 70

TSQL : conditional query issue

I need to create a flag to identify all Room_IDs where the following is met:
a "Qc-" Status is present within one Hotel_ID.
the "Qc-" Statushas a corresponding non "Qc-" Status (e.g.
'qc-occupied' & 'occupied').
the "Qc-" Status has to have a to have a smaller Room_ID than the
non "Qc-" Status. (e.g. Status = 'qc-occupied' has a Room_ID=1 and Status = 'occupied' has a Room_ID= 5)
This is a simplified table (tableX) I am using as an example:
**Hotel_ID Room_Id Status**
1 1 vacant
1 2 qc-occupied
1 3 vacant
2 1 occupied
2 2 qc-vacant
2 3 vacant
3 1 qc-vacant
4 1 vacant
4 2 occupied
4 3 qc-vacant
5 1 vacant
I need the following as a result:
**Hotel_ID Room_Id Status flag**
1 1 vacant 0
1 2 qc-occupied 0
1 3 vacant 0
2 1 occupied 0
2 2 qc-vacant 1
2 3 vacant 1
3 1 qc-vacant 0
4 1 vacant 0
4 2 occupied 0
4 3 qc-vacant 0
5 1 vacant 0
Thank you in advance !

This is a literal translation of the requirements into rather inelegant code. It can certainly be improved, e.g. by removing your first requirement ("qc-" present.) since it is implicit in the other two requirements. The second requirement is implicit in the third, allowing another improvement.
-- Sample data.
declare #TableX as Table ( Hotel_Id Int, Room_Id Int, Stat VarChar(16) );
insert into #TableX ( Hotel_Id, Room_Id, Stat ) values
( 1, 1, 'vacant' ), ( 1, 2, 'qc-occupied' ), ( 1, 3, 'vacant' ),
( 2, 1, 'occupied' ), ( 2, 2, 'qc-vacant' ), ( 2, 3, 'vacant' ),
( 3, 1, 'qc-vacant' ),
( 4, 1, 'vacant' ), ( 4, 2, 'occupied' ), ( 4, 3, 'qc-vacant' ),
( 5, 1, 'vacant' );
select * from #TableX;
-- Literal translation of requirements.
declare #False as Bit = 0, #True as Bit = 1;
select Hotel_Id, Room_Id, Stat,
QC_In_Hotel, QC_And_NonQC_In_Hotel, QC_Precedes_NonQC_In_Hotel,
case when QC_In_Hotel = #True and QC_And_NonQC_In_Hotel = #True and
QC_Precedes_NonQC_In_Hotel = #True then #True else #False end as Flag
from (
select Hotel_Id, Room_Id, Stat,
-- Req: a "Qc-" Status is present within one Hotel_ID.
case when exists ( select 42 from #TableX as I
where I.Hotel_Id = O.Hotel_Id and I.Stat like 'qc-%' )
then #True else #False end as QC_In_Hotel,
-- Req: the "Qc-" Status has a corresponding non "Qc-" Status (e.g. 'qc-occupied' & 'occupied').
case when exists ( select 42 from #TableX as I
where I.Hotel_Id = O.Hotel_Id and
( ( I.Stat like 'qc-' + O.Stat ) or ( O.Stat like 'qc-' + I.Stat ) ) )
then #True else #False end as QC_And_NonQC_In_Hotel,
-- Req: the "Qc-" Status has to have a to have a smaller Room_ID than the non "Qc-" Status.
case when exists ( select 42 from #TableX as I
where I.Hotel_Id = O.Hotel_Id and
( ( I.Room_Id < O.Room_Id and I.Stat like 'qc-' + O.Stat ) or
( O.Room_Id < I.Room_Id and O.Stat like 'qc-' + I.Stat ) ) )
then #True else #False end as QC_Precedes_NonQC_In_Hotel
from #TableX as O ) as PH
order by Hotel_Id, Room_Id;

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Create "Sets" within the same table based on multi-column criteria - sql-server

Related

Fill gaps with set based on priority

Create comma separated value strings using data from different tables in SQL Server

Rank or merge sequential rows

Grouped result in recursive query (SQL Server)

TSQL : conditional query issue

Categories

Resources