SSMS 18 (SQL Server) will be used to create a time series from data spread across different tables. Each date has multiple tables with different fields. The field "id" is present in all tables (FYI: "id" is the company's id number).
I tried to find answers by searching web articles and SO suggestions (e.g., INSERT, ALTER TABLE, MERGE, COALESCE, INSERT INTO SELECT). The suggestion using FULL JOIN or UNION ALL in "Creating table from two different tables sql" is close to what is needed, but the new fields need to be appended to their corresponding "id" rather than becoming new records as shown there (Table C).
Steps:
1. Combine all the fields into one new table for a given date.
2. Create a master table with the data for all dates and all fields (note: new "id"s may be added or existing "id"s dropped across dates). The goal is to analyze the values of each field across all dates, grouped by "id" (see example below).
Questions:
What SQL statement(s) in SSMS 18 are used to perform the steps above?
Is it possible, and more efficient, to use JOINs or another SQL construct to perform Step 2?
Example:
Step 1: Append the fields in Table 2 to Table 1
Table 1
date id field1 field2
20191231 a1 4 4
20191231 b5 4 10
20191231 c9 2 9
Table 2
date id field5 field6
20191231 a1 9 5
20191231 b5 8 8
20191231 c9 9 10
Table 1 (revised)
date id field1 field2 field5 field6
20191231 a1 4 4 9 5
20191231 b5 4 10 8 8
20191231 c9 2 9 9 10
Step 2: Combine / Merge Table 1 (revised) with Table 4 (Table 4 was previously created using Step 1) to create a time series in "New Table"
Table 4
date id field1 field2 field5 field6
20190930 a1 1 7 0 7
20190930 b5 3 2 6 1
20190930 c9 5 10 4 6
20190930 d11 0 5 3 7
New Table
date id field1 field2 field5 field6
20190930 a1 1 7 0 7
20191231 a1 4 4 9 5
20190930 b5 3 2 6 1
20191231 b5 4 10 8 8
20190930 c9 5 10 4 6
20191231 c9 2 9 9 10
20190930 d11 0 5 3 7
20191231 d11 NULL NULL NULL NULL
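For reference, a literal reading of the two steps could be sketched in T-SQL as below. All table names (Table1, Table2, Table4, Table1_Revised, NewTable) are placeholders, not from the post:
--Step 1: append Table2's fields to Table1 by matching on date and id
--(a FULL JOIN keeps "id"s present in only one of the two tables)
select coalesce(t1.[date], t2.[date]) as [date],
       coalesce(t1.id, t2.id) as id,
       t1.field1, t1.field2, t2.field5, t2.field6
into Table1_Revised
from Table1 t1
full join Table2 t2 on t2.[date] = t1.[date] and t2.id = t1.id;
--Step 2: stack the per-date wide tables into one master table
select [date], id, field1, field2, field5, field6
into NewTable
from Table1_Revised
union all
select [date], id, field1, field2, field5, field6
from Table4;
Note this does not generate the 20191231/d11 all-NULL row; making every id appear on every date would additionally require cross-joining the distinct ids with the distinct dates.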
Instead of "appending fields" from Table2 to Table1, etc., and then creating a main table, the relational way would be to convert variable lists of columns into variable rows with a fixed number of columns. This means unpivoting each table directly into a normalized main table (#tTEST_Main below). The "New Table" output can then be produced by a query using conditional aggregation.
Data
drop table if exists #tTEST1;
go
select * INTO #tTEST1 from (values
('20191231', 'a1', 4, 4),
('20191231', 'b5', 4, 10),
('20191231', 'c9', 2, 9)) V(mdate, id, field1, field2);
drop table if exists #tTEST2;
go
select * INTO #tTEST2 from (values
('20191231', 'a1', 9, 5),
('20191231', 'b5', 8, 8),
('20191231', 'c9', 9, 10)) V(mdate, id, field5, field6);
drop table if exists #tTEST4;
go
select * INTO #tTEST4 from (values
('20191230', 'a1', 1, 7, 0, 7),
('20191230', 'b5', 3, 2, 6, 1),
('20191230', 'c9', 5, 10, 4, 6),
('20191230', 'd11', 0, 5, 3, 7)) V(mdate, id, field1, field2, field5, field6);
DDL of main table
drop table if exists #tTEST_Main;
go
create table #tTEST_Main(
id varchar(10) not null,
mdate date not null,
field_name varchar(100) not null,
series_val int not null,
constraint
unq_tm_id_m_fn unique(id, mdate, field_name));
Unpivoting queries to populate the #tTEST_Main table
insert #tTEST_Main(id, mdate, field_name, series_val)
select v.*
from #tTEST1 t1
cross apply
(values (id, mdate, 'field1', field1),
(id, mdate, 'field2', field2)) v(id, mdate, field_name, series_val);
insert #tTEST_Main(id, mdate, field_name, series_val)
select v.*
from #tTEST2 t2
cross apply
(values (id, mdate, 'field5', field5),
(id, mdate, 'field6', field6)) v(id, mdate, field_name, series_val);
insert #tTEST_Main(id, mdate, field_name, series_val)
select v.*
from #tTEST4 t4
cross apply
(values (id, mdate, 'field1', field1),
(id, mdate, 'field2', field2),
(id, mdate, 'field5', field5),
(id, mdate, 'field6', field6)) v(id, mdate, field_name, series_val);
Query to output "New Table" results
select id, mdate,
max(case when field_name='field1' then series_val else 0 end) field1,
max(case when field_name='field2' then series_val else 0 end) field2,
max(case when field_name='field5' then series_val else 0 end) field5,
max(case when field_name='field6' then series_val else 0 end) field6
from #tTEST_Main
group by id, mdate;
Output
id mdate field1 field2 field5 field6
a1 2019-12-30 1 7 0 7
a1 2019-12-31 4 4 9 5
b5 2019-12-30 3 2 6 1
b5 2019-12-31 4 10 8 8
c9 2019-12-30 5 10 4 6
c9 2019-12-31 2 9 9 10
d11 2019-12-30 0 5 3 7
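If NULL (as in the question's desired "New Table") is preferred over 0 for a field that never occurs for a given id and mdate, a CASE without an ELSE returns NULL; a minor variation on the query above:
select id, mdate,
max(case when field_name='field1' then series_val end) field1,
max(case when field_name='field2' then series_val end) field2,
max(case when field_name='field5' then series_val end) field5,
max(case when field_name='field6' then series_val end) field6
from #tTEST_Main
group by id, mdate;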
I am creating a running total for a specific group in a sequence. Whenever a zero value occurs within the sequence, the running total has to restart from that zero record.
select
Sno,
[Group],
[Value],
sum([Value]) over(partition by [Group] order by Sno) Cum_Value
from
[Table]
Current output:
Sno Group Value CumValue
-------------------------------
1 A 5 5
2 A 10 15
3 A 25 40
4 A 0 40
5 A 10 50
6 A 5 55
7 A 0 55
8 A 20 75
Desired output:
Sno Group Value CumValue
------------------------------
1 A 5 5
2 A 10 15
3 A 25 40
4 A 0 0 --> zero occurs [running total restarts]
5 A 10 10
6 A 5 15
7 A 0 0 --> zero occurs [running total restarts]
8 A 20 20
You may try the following approach:
Input:
CREATE TABLE #Data (
Sno int,
[Group] varchar(1),
[Value] int
)
INSERT INTO #Data
(Sno, [Group], [Value])
VALUES
(1, 'A', 5),
(2, 'A', 10),
(3, 'A', 25),
(4, 'A', 0),
(5, 'A', 10),
(6, 'A', 5),
(7, 'A', 0),
(8, 'A', 20)
Statement:
SELECT
Sno,
[Group],
[Value],
Changed,
SUM([Value]) OVER (PARTITION BY [Group], Changed ORDER BY Sno) AS Cum_Value
FROM
(
SELECT
Sno,
[Group],
[Value],
SUM (CASE
WHEN [Value] = 0 THEN 1
ELSE 0
END) OVER (PARTITION BY [Group] ORDER BY Sno) AS Changed
FROM #Data
) t
The inner query keeps a running count of zeros per [Group] (Changed); every zero starts a new segment, and partitioning the outer SUM by [Group] and Changed restarts the running total at each zero. Output:
Sno Group Value Cum_Value
1 A 5 5
2 A 10 15
3 A 25 40
4 A 0 0
5 A 10 10
6 A 5 15
7 A 0 0
8 A 20 20
Which of the two alternatives is better?
ROW_NUMBER() OVER (PARTITION BY...)
or
COUNT(1) OVER (PARTITION BY ...)
I could not find any such question.
Edit:
DBMS: SQL-SERVER (version >= 2008)
In my case the partition is over a single field:
ROW_NUMBER() OVER (PARTITION BY ELEMENT ORDER BY EMPLOYEE)
COUNT(1) OVER (PARTITION BY ELEMENT ORDER BY EMPLOYEE)
ELEMENT EMPLOYEE ROW_NUMBER COUNT
0000001 00000003 1 1
0000001 00000004 2 2
0000001 00000005 3 3
0000003 00000045 1 1
0000003 00000046 2 2
COUNT(1) behaves differently when values in the ORDER BY columns repeat (i.e., when there are ties).
The following is a SQL Server example:
IF OBJECT_ID('tempdb..#Example') IS NOT NULL
DROP TABLE #Example
CREATE TABLE #Example (
Number INT,
GroupNumber INT)
INSERT INTO #Example (
Number,
GroupNumber)
VALUES
(NULL, 1),
(100, 1),
(101, 1),
(102, 1),
(103, 1),
(NULL, 2),
(NULL, 2),
(NULL, 2),
(200, 2),
(201, 2),
(202, 2),
(300, 3),
(301, 3),
(301, 3),
(301, 3),
(302, 3)
SELECT
E.*,
RowNumber = ROW_NUMBER() OVER (PARTITION BY E.GroupNumber ORDER BY E.Number ASC),
CountOver = COUNT(1) OVER (PARTITION BY E.GroupNumber ORDER BY E.Number ASC)
FROM
#Example AS E
Result:
Number GroupNumber RowNumber CountOver
----------- ----------- -------------------- -----------
NULL 1 1 1
100 1 2 2
101 1 3 3
102 1 4 4
103 1 5 5
NULL 2 1 3 <-- here
NULL 2 2 3
NULL 2 3 3
200 2 4 4
201 2 5 5
202 2 6 6
300 3 1 1
301 3 2 4 <-- here
301 3 3 4
301 3 4 4
302 3 5 5
This is because it's a count, not a row number: with an ORDER BY in the OVER clause, the default frame is RANGE UNBOUNDED PRECEDING, which includes all peer rows (rows tied on the ORDER BY columns), so tied rows all receive the same count. ROW_NUMBER always assigns distinct sequential numbers. You should use whichever is appropriate to your needs.
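If you need COUNT to number rows one at a time even with ties, an explicit ROWS frame (SQL Server 2012 and later) counts physical rows instead of peer groups. A sketch against the #Example table above; among tied rows, which one gets which number is arbitrary, exactly as with ROW_NUMBER:
SELECT
E.*,
RowNumber = ROW_NUMBER() OVER (PARTITION BY E.GroupNumber ORDER BY E.Number ASC),
CountOver = COUNT(1) OVER (PARTITION BY E.GroupNumber ORDER BY E.Number ASC
            ROWS UNBOUNDED PRECEDING) -- the frame forces per-row counting
FROM
#Example AS E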
I have a recursive query that works as intended for calculating the weighted average cost of inventory. My problem is that I need multiple weighted averages from the same query, grouped by different columns. I know I can solve the issue by calculating it multiple times, once per key column, but for performance reasons I want the data traversed only once; sometimes I have 1M+ rows.
I have simplified the data and replaced the weighted average with a simple sum to make the problem easier to follow.
How can I get the result below using a recursive CTE? Remember that I have to use a recursive query to calculate the weighted average cost. I am on SQL Server 2016.
Example data (Id is also the sort order. Id and Key are unique together.)
Id Key1 Key2 Key3 Value
1 1 1 1 10
2 1 1 1 10
3 1 2 1 10
4 2 2 1 10
5 1 2 1 10
6 1 1 2 10
7 1 1 1 10
8 3 3 1 10
Expected result
Id Key1 Key2 Key3 Value Key1Sum Key2Sum Key3Sum
1 1 1 1 10 10 10 10
2 1 1 1 10 20 20 20
3 1 2 1 10 30 10 30
4 2 2 1 10 10 20 40
5 1 2 1 10 40 30 50
6 1 1 2 10 50 30 10
7 1 1 1 10 60 40 60
8 3 3 1 10 10 10 70
EDIT
After some well-deserved criticism, I will try to state the question more clearly.
Here is an example of why I need a recursive query. It produces the result for Key1, but I need Key2 and Key3 as well, in the same query. I know I can repeat the query three times, but that is not preferable.
CREATE TABLE #InventoryItem (
IntentoryItemId INT NULL,
InventoryOrder INT,
Key1 INT NULL,
Key2 INT NULL,
Key3 INT NULL,
Quantity NUMERIC(22,9) NOT NULL,
Price NUMERIC(16,9) NOT NULL
);
INSERT INTO #InventoryItem (
IntentoryItemId,
InventoryOrder,
Key1,
Key2,
Key3,
Quantity,
Price
)
VALUES
(1, NULL, 1, 1, 1, 10, 1),
(2, NULL, 1, 1, 1, 10, 2),
(3, NULL, 1, 2, 1, 10, 2),
(4, NULL, 2, 2, 1, 10, 1),
(5, NULL, 1, 2, 1, 10, 5),
(6, NULL, 1, 1, 2, 10, 3),
(7, NULL, 1, 1, 1, 10, 3),
(8, NULL, 3, 3, 1, 10, 1);
--The steps below will give me the cost "grouped" by Key1
WITH Key1RowNumber AS (
SELECT
IntentoryItemId,
ROW_NUMBER() OVER (PARTITION BY Key1 ORDER BY IntentoryItemId) AS RowNumber
FROM #InventoryItem
)
UPDATE #InventoryItem
SET InventoryOrder = Key1RowNumber.RowNumber
FROM #InventoryItem InventoryItem
INNER JOIN Key1RowNumber
ON Key1RowNumber.IntentoryItemId = InventoryItem.IntentoryItemId;
WITH cte AS (
SELECT
IntentoryItemId,
InventoryOrder,
Key1,
Quantity,
Price,
CONVERT(NUMERIC(22,9), InventoryItem.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price) / NULLIF(InventoryItem.Quantity, 0)) AS AvgPrice
FROM #InventoryItem InventoryItem
WHERE InventoryItem.InventoryOrder = 1
UNION ALL
SELECT
Sub.IntentoryItemId,
Sub.InventoryOrder,
Sub.Key1,
Sub.Quantity,
Sub.Price,
CONVERT(NUMERIC(22,9), Main.CurrentQuantity + Sub.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9),
((Main.CurrentQuantity) * Main.AvgPrice + Sub.Quantity * Sub.price)
/
NULLIF((Main.CurrentQuantity) + Sub.Quantity, 0)
) AS AvgPrice
FROM CTE Main
INNER JOIN #InventoryItem Sub
ON Main.Key1 = Sub.Key1
AND Sub.InventoryOrder = main.InventoryOrder + 1
)
SELECT cte.IntentoryItemId, cte.AvgPrice
FROM cte
ORDER BY IntentoryItemId
Why do you want to calculate over 1M+ rows? If I can optimize a query, I consider calculating many rows; otherwise I try to limit the number of rows.
Secondly, I think your table design is off: Key1, Key2, and Key3 should be unpivoted into a single Keys column, plus one more column to identify each key group. This should become clear in the example below.
Also, if possible, consider keeping AvgPrice as a calculated column, i.e., compute and store it when the table is populated.
First, let us know whether the output below is correct.
CREATE TABLE #InventoryItem (
IntentoryItemId INT NULL,
InventoryOrder INT,
Key1 INT NULL,
Key2 INT NULL,
Key3 INT NULL,
Quantity NUMERIC(22,9) NOT NULL,
Price NUMERIC(16,9) NOT NULL
);
INSERT INTO #InventoryItem (
IntentoryItemId,
InventoryOrder,
Key1,
Key2,
Key3,
Quantity,
Price
)
VALUES
(1, NULL, 1, 1, 1, 10, 1),
(2, NULL, 1, 1, 1, 10, 2),
(3, NULL, 1, 2, 1, 10, 2),
(4, NULL, 2, 2, 1, 10, 1),
(5, NULL, 1, 2, 1, 10, 5),
(6, NULL, 1, 1, 2, 10, 3),
(7, NULL, 1, 1, 1, 10, 3),
(8, NULL, 3, 3, 1, 10, 1);
--select * from #InventoryItem
--return
;with cte as
(
select *
, ROW_NUMBER() OVER (PARTITION BY Key1 ORDER BY IntentoryItemId) AS rn1
, ROW_NUMBER() OVER (PARTITION BY Key2 ORDER BY IntentoryItemId) AS rn2
, ROW_NUMBER() OVER (PARTITION BY Key3 ORDER BY IntentoryItemId) AS rn3
from #InventoryItem
)
,cte1 AS (
SELECT
IntentoryItemId,
Key1 keys,
Quantity,
Price
,rn1
,rn1 rn
,1 pk
FROM cte c
union ALL
SELECT
IntentoryItemId,
Key2 keys,
Quantity,
Price
,rn1
,rn2 rn
,2 pk
FROM cte c
union ALL
SELECT
IntentoryItemId,
Key3 keys,
Quantity,
Price
,rn1
,rn3 rn
,3 pk
FROM cte c
)
, cte2 AS (
SELECT
IntentoryItemId,
rn,
Keys,
Quantity,
Price,
CONVERT(NUMERIC(22,9), InventoryItem.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price)) a,
CONVERT(NUMERIC(22,9), InventoryItem.Price) b,
CONVERT(NUMERIC(22,9), (InventoryItem.Quantity * InventoryItem.Price) / NULLIF(InventoryItem.Quantity, 0)) AS AvgPrice
,pk
FROM cte1 InventoryItem
WHERE InventoryItem.rn = 1
UNION ALL
SELECT
Sub.IntentoryItemId,
sub.rn,
Sub.Keys,
Sub.Quantity,
Sub.Price,
CONVERT(NUMERIC(22,9), Main.CurrentQuantity + Sub.Quantity) AS CurrentQuantity,
CONVERT(NUMERIC(22,9),Main.CurrentQuantity * Main.AvgPrice),
CONVERT(NUMERIC(22,9),Sub.Quantity * Sub.price),
CONVERT(NUMERIC(22,9),
((Main.CurrentQuantity * Main.AvgPrice) + (Sub.Quantity * Sub.price))
/
NULLIF(((Main.CurrentQuantity) + Sub.Quantity), 0)
) AS AvgPrice
,sub.pk
FROM CTE2 Main
INNER JOIN cte1 Sub
ON Main.Keys = Sub.Keys and main.pk=sub.pk
AND Sub.rn = main.rn + 1
--and Sub.InventoryOrder<=2
)
select *
,(select AvgPrice from cte2 c1 where pk=2 and c1.IntentoryItemId=c.IntentoryItemId ) AvgPrice2
,(select AvgPrice from cte2 c1 where pk=3 and c1.IntentoryItemId=c.IntentoryItemId ) AvgPrice3
from cte2 c
where pk=1
ORDER BY pk,rn
Alternate solution (for SQL Server 2012+), with thanks to Jason:
SELECT *
,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
OVER(PARTITION BY Key1 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey1Price
,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
OVER(PARTITION BY Key2 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey2Price
,CONVERT(NUMERIC(22,9),avg((Quantity * Price) / NULLIF(Quantity, 0))
OVER(PARTITION BY Key3 ORDER by IntentoryItemId ROWS UNBOUNDED PRECEDING))AvgKey3Price
from #InventoryItem
order by IntentoryItemId
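Note that AVG(...) OVER above is a simple running average of the per-row prices; it matches the recursive result here only because every Quantity is 10. In this simplified cumulative example the recursive formula telescopes to SUM(Quantity*Price)/SUM(Quantity), so a quantity-weighted sketch (same #InventoryItem table; Key1 shown, Key2/Key3 analogous) would be:
SELECT *
,CONVERT(NUMERIC(22,9),
    SUM(Quantity * Price) OVER(PARTITION BY Key1 ORDER BY IntentoryItemId ROWS UNBOUNDED PRECEDING)
    / NULLIF(SUM(Quantity) OVER(PARTITION BY Key1 ORDER BY IntentoryItemId ROWS UNBOUNDED PRECEDING), 0)
 ) AS WAvgKey1Price
from #InventoryItem
order by IntentoryItemId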
Here's how to do it in SQL Server 2012 & later...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
Id INT,
Key1 INT,
Key2 INT,
Key3 INT,
[Value] INT
);
INSERT #TestData(Id, Key1, Key2, Key3, Value) VALUES
(1, 1, 1, 1, 10),
(2, 1, 1, 1, 10),
(3, 1, 2, 1, 10),
(4, 2, 2, 1, 10),
(5, 1, 2, 1, 10),
(6, 1, 1, 2, 10),
(7, 1, 1, 1, 10),
(8, 3, 3, 1, 10);
--=============================================================
SELECT
td.Id, td.Key1, td.Key2, td.Key3, td.Value,
Key1Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key1 ORDER BY td.Id ROWS UNBOUNDED PRECEDING),
Key2Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key2 ORDER BY td.Id ROWS UNBOUNDED PRECEDING),
Key3Sum = SUM(td.[Value]) OVER (PARTITION BY td.Key3 ORDER BY td.Id ROWS UNBOUNDED PRECEDING)
FROM
#TestData td
ORDER BY
td.Id;
results...
Id Key1 Key2 Key3 Value Key1Sum Key2Sum Key3Sum
----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
1 1 1 1 10 10 10 10
2 1 1 1 10 20 20 20
3 1 2 1 10 30 10 30
4 2 2 1 10 10 20 40
5 1 2 1 10 40 30 50
6 1 1 2 10 50 30 10
7 1 1 1 10 60 40 60
8 3 3 1 10 10 10 70
I have a problem with a query.
This is the data (ordered by Timestamp):
Data
ID Value Timestamp
1 0 2001-1-1
2 0 2002-1-1
3 1 2003-1-1
4 1 2004-1-1
5 0 2005-1-1
6 2 2006-1-1
7 2 2007-1-1
8 2 2008-1-1
I need to extract distinct values and the first occurrence date of each. The catch is that rows should be grouped only when the run is not interrupted by a different value in between.
So the data I need is:
ID Value Timestamp
1 0 2001-1-1
3 1 2003-1-1
5 0 2005-1-1
6 2 2006-1-1
I've made this work with a complicated query, but I'm sure there is an easier way to do it; I just can't think of it. Could anyone help?
This is what I started with; it could probably be built upon. It is a query that should locate where the value changes:
SELECT * FROM Data d1 join Data d2 ON d1.Timestamp < d2.Timestamp and d1.Value <> d2.Value
It could probably be done with a good use of ROW_NUMBER, but I can't manage it.
Sample data:
declare #T table (ID int, Value int, Timestamp date)
insert into #T(ID, Value, Timestamp) values
(1, 0, '20010101'),
(2, 0, '20020101'),
(3, 1, '20030101'),
(4, 1, '20040101'),
(5, 0, '20050101'),
(6, 2, '20060101'),
(7, 2, '20070101'),
(8, 2, '20080101')
Query:
;With OrderedValues as (
select *,ROW_NUMBER() OVER (ORDER By TimeStamp) as rn --TODO - specific columns better than *
from #T
), Firsts as (
select
ov1.* --TODO - specific columns better than *
from
OrderedValues ov1
left join
OrderedValues ov2
on
ov1.Value = ov2.Value and
ov1.rn = ov2.rn + 1
where
ov2.ID is null
)
select * --TODO - specific columns better than *
from Firsts
I didn't rely on the ID values being sequential and gap-free. If they are, you can omit OrderedValues (using the table and ID in place of OrderedValues and rn). The Firsts query simply finds rows that have no immediately preceding row with the same Value.
Result:
ID Value Timestamp rn
----------- ----------- ---------- --------------------
1 0 2001-01-01 1
3 1 2003-01-01 3
5 0 2005-01-01 5
6 2 2006-01-01 6
You can order by rn if you need the results in this specific order.
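Since you mentioned ROW_NUMBER: on SQL Server 2012 or later (an assumption about your version), LAG gives a shorter equivalent by comparing each row's Value to the previous one in Timestamp order. A sketch against the same #T sample data:
select ID, Value, Timestamp
from (
    select ID, Value, Timestamp,
           lag(Value) over (order by Timestamp) as PrevValue -- previous row's Value, NULL on the first row
    from #T
) t
where PrevValue is null or PrevValue <> Value -- keep only rows that start a new run
order by Timestamp;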