For the table definition
CREATE TABLE Accounts
(
AccountID INT ,
Filler CHAR(1000)
)
Containing 21 rows (7 for each of the AccountId values 4,6,7).
It has 1 root page and 4 leaf pages
index_depth page_count index_level
----------- -------------------- -----------
2 4 0
2 1 1
The root page looks like
FileId PageId ROW LEVEL ChildFieldId ChildPageId AccountId (KEY) UNIQUIFIER (KEY) KeyHashValue
----------- ----------- ----------- ----------- ------------ ----------- --------------- ---------------- ------------------------------
1 121 0 1 1 119 NULL NULL NULL
1 121 1 1 1 151 6 0 NULL
1 121 2 1 1 175 6 3 NULL
1 121 3 1 1 215 7 1 NULL
The actual distribution of AccountId records over these pages is
AccountID page_id Num
----------- ----------- -----------
4 119 7
6 151 3
6 175 4
7 175 1
7 215 6
The Query
SELECT AccountID
FROM Accounts
WHERE AccountID IN (4,6,7)
Gives the following IO stats
Table 'Accounts'. Scan count 3, logical reads 13
Why?
I thought that for each seek it would seek into the first page that might potentially contain that value and then (if necessary) continue along the linked list until it found the first row not equal to the sought value.
However, that only adds up to 10 page accesses:
4) Root Page -> Page 119 -> Page 151 (Page 151 Contains a 6 so should stop)
6) Root Page -> Page 119 -> Page 151 -> Page 175 (Page 175 Contains a 7 so should stop)
7) Root Page -> Page 175 -> Page 215 (No more pages)
So what accounts for the additional 3?
Full script to reproduce
USE tempdb
SET NOCOUNT ON;
CREATE TABLE Accounts
(
AccountID INT ,
Filler CHAR(1000)
)
CREATE CLUSTERED INDEX ix ON Accounts(AccountID)
INSERT INTO Accounts(AccountID)
SELECT C
FROM (SELECT 4 UNION ALL SELECT 6 UNION ALL SELECT 7) Vals(C)
CROSS JOIN (SELECT TOP (7) 1 FROM master..spt_values) T(X)
DECLARE @AccountID INT
SET STATISTICS IO ON
SELECT @AccountID=AccountID FROM Accounts WHERE AccountID IN (4,6,7)
SET STATISTICS IO OFF
SELECT index_depth,page_count,index_level
FROM
sys.dm_db_index_physical_stats (2,OBJECT_ID('Accounts'), DEFAULT,DEFAULT, 'DETAILED')
SELECT AccountID, P.page_id, COUNT(*) AS Num
FROM Accounts
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) P
GROUP BY AccountID, P.page_id
ORDER BY AccountID, P.page_id
DECLARE @index_info TABLE
(PageFID VARCHAR(10),
PagePID VARCHAR(10),
IAMFID TINYINT,
IAMPID INT,
ObjectID INT,
IndexID TINYINT,
PartitionNumber TINYINT,
PartitionID BIGINT,
iam_chain_type VARCHAR(30),
PageType TINYINT,
IndexLevel TINYINT,
NextPageFID TINYINT,
NextPagePID INT,
PrevPageFID TINYINT,
PrevPagePID INT,
PRIMARY KEY (PageFID, PagePID));
INSERT INTO @index_info
EXEC ('DBCC IND ( tempdb, Accounts, -1)' );
DECLARE @DynSQL NVARCHAR(MAX) = 'DBCC TRACEON (3604);'
SELECT @DynSQL = @DynSQL + '
DBCC PAGE(tempdb, ' + PageFID + ', ' + PagePID + ', 3); '
FROM @index_info
WHERE IndexLevel = 1
SET @DynSQL = @DynSQL + '
DBCC TRACEOFF(3604); '
CREATE TABLE #index_l1_info
(FileId INT,
PageId INT,
ROW INT,
LEVEL INT,
ChildFieldId INT,
ChildPageId INT,
[AccountId (KEY)] INT,
[UNIQUIFIER (KEY)] INT,
KeyHashValue VARCHAR(30));
INSERT INTO #index_l1_info
EXEC(@DynSQL)
SELECT *
FROM #index_l1_info
DROP TABLE #index_l1_info
DROP TABLE Accounts
Just to supply the answer in answer form rather than as discussion in the comments...
The additional reads arise from the read-ahead mechanism. This scans the parent pages of the leaf level in case it needs to issue asynchronous IO to bring the leaf-level pages into the buffer cache so they are ready when the range seek reaches them.
It is possible to use trace flag 652 to disable the mechanism (server-wide) and verify that the number of reads is then exactly 10, as expected.
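A minimal sketch of that verification (trace flag 652 is server-wide, so only try it on a test instance):
DBCC TRACEON (652, -1); -- disable read-ahead, server wide
SET STATISTICS IO ON
SELECT AccountID FROM Accounts WHERE AccountID IN (4,6,7)
SET STATISTICS IO OFF -- expected: Scan count 3, logical reads 10
DBCC TRACEOFF (652, -1); -- re-enable read-ahead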
From what I can see in the output of DBCC IND, there is 1 IAM page (type = 10), 1 root/index page (type = 2) and four leaf pages (type = 1), 6 pages in total.
So each scan goes IAM -> root -> leaf -> … -> final leaf, which gives 4 reads for the values 4 and 7 and 5 reads for 6, for a total of 4 + 4 + 5 = 13.
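As a quick cross-check, the page-type breakdown can be read from the @index_info table variable captured in the repro script above (run in the same batch, before the table variable goes out of scope):
SELECT PageType, COUNT(*) AS Pages
FROM @index_info
GROUP BY PageType
ORDER BY PageType
-- PageType 1 = data (leaf) page, 2 = index page, 10 = IAM page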
Related
I have a bunch of production orders and I'm trying to group them within a datetime range, then count the quantity within that range. For example, I want to group from 22:30 to 22:30 each day.
PT.ActualFinish is datetime (e.g. if PT.ActualFinish is 2020-05-25 23:52:30 then it would be counted on the 26th of May instead of the 25th).
Currently it's grouped by date (midnight to midnight) as opposed to the desired 22:30 to 22:30.
GROUP BY CAST(PT.ActualFinish AS DATE)
I've been trying to combine DATEADD with the GROUP BY without success. Is it possible?
Just add 1.5 hours (90 minutes) and then extract the date:
group by convert(date, dateadd(minute, 90, pt.actualfinish))
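A sketch of the full query, assuming the production-orders table behind the PT alias is called ProductionTasks (the real name isn't shown in the question):
SELECT CONVERT(date, DATEADD(MINUTE, 90, PT.ActualFinish)) AS FinishDate,
       COUNT(*) AS Quantity
FROM ProductionTasks AS PT -- hypothetical table name
GROUP BY CONVERT(date, DATEADD(MINUTE, 90, PT.ActualFinish))
ORDER BY FinishDate
-- 2020-05-25 23:52:30 shifts to 2020-05-26 01:22:30, so it counts on the 26th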
For this kind of thing you can use a function I created called NGroupRangeAB (code below), which creates groups over values with an upper and lower bound.
Note that this:
SELECT f.*
FROM core.NGroupRangeAB(0,1440,12) AS f
ORDER BY f.RN;
Returns:
RN GroupNumber Low High
--- ------------ ------ -------
0 1 0 120
1 2 121 240
2 3 241 360
3 4 361 480
4 5 481 600
5 6 601 720
6 7 721 840
7 8 841 960
8 9 961 1080
9 10 1081 1200
10 11 1201 1320
11 12 1321 1440
This:
SELECT
f.GroupNumber,
L = DATEADD(MINUTE,f.[Low]-SIGN(f.[Low]),CAST('00:00:00.0000000' AS TIME)),
H = DATEADD(MINUTE,f.[High]-1,CAST('00:00:00.0000000' AS TIME))
FROM core.NGroupRangeAB(0,1440,12) AS f
ORDER BY f.RN;
Returns:
GroupNumber L H
------------- ---------------- ----------------
1 00:00:00.0000000 01:59:00.0000000
2 02:00:00.0000000 03:59:00.0000000
3 04:00:00.0000000 05:59:00.0000000
4 06:00:00.0000000 07:59:00.0000000
5 08:00:00.0000000 09:59:00.0000000
6 10:00:00.0000000 11:59:00.0000000
7 12:00:00.0000000 13:59:00.0000000
8 14:00:00.0000000 15:59:00.0000000
9 16:00:00.0000000 17:59:00.0000000
10 18:00:00.0000000 19:59:00.0000000
11 20:00:00.0000000 21:59:00.0000000
12 22:00:00.0000000 23:59:00.0000000
Now for a real-life example that may help you:
-- Sample Data
DECLARE @table TABLE (tm TIME);
INSERT @table VALUES ('00:15'),('11:20'),('21:44'),('09:50'),('02:15'),('02:25'),
('02:31'),('23:31'),('23:54');
-- Solution:
SELECT
GroupNbr = f.GroupNumber,
TimeLow = f2.L,
TimeHigh = f2.H,
Total = COUNT(t.tm)
FROM core.NGroupRangeAB(0,1440,12) AS f
CROSS APPLY (VALUES(
DATEADD(MINUTE,f.[Low]-SIGN(f.[Low]),CAST('00:00:00.0000000' AS TIME)),
DATEADD(MINUTE,f.[High]-1,CAST('00:00:00.0000000' AS TIME)))) AS f2(L,H)
LEFT JOIN @table AS t
ON t.tm BETWEEN f2.L AND f2.H
GROUP BY f.GroupNumber, f2.L, f2.H;
Returns:
GroupNbr TimeLow TimeHigh Total
-------------------- ---------------- ---------------- -----------
1 00:00:00.0000000 01:59:00.0000000 1
2 02:00:00.0000000 03:59:00.0000000 3
3 04:00:00.0000000 05:59:00.0000000 0
4 06:00:00.0000000 07:59:00.0000000 0
5 08:00:00.0000000 09:59:00.0000000 1
6 10:00:00.0000000 11:59:00.0000000 1
7 12:00:00.0000000 13:59:00.0000000 0
8 14:00:00.0000000 15:59:00.0000000 0
9 16:00:00.0000000 17:59:00.0000000 0
10 18:00:00.0000000 19:59:00.0000000 0
11 20:00:00.0000000 21:59:00.0000000 1
12 22:00:00.0000000 23:59:00.0000000 2
Note that an inner join will eliminate the 0-count rows.
CREATE FUNCTION core.NGroupRangeAB
(
@min BIGINT, -- Group Number Lower boundary
@max BIGINT, -- Group Number Upper boundary
@groups BIGINT -- Number of groups required
)
/*****************************************************************************************
[Purpose]:
Creates an auxiliary table that allows for grouping over a given range of values (@min to @max)
and a requested number of groups (@groups). core.NGroupRangeAB can be thought of as a
set-based, T-SQL version of Oracle's WIDTH_BUCKET, which:
"...lets you construct equiwidth histograms, in which the histogram range is divided into
intervals that have identical size. (Compare with NTILE, which creates equiheight
histograms.)" https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions214.htm
See usage examples for more details.
[Author]:
Alan Burstein
[Compatibility]:
SQL Server 2008+
[Syntax]:
--===== Autonomous
SELECT ng.*
FROM core.NGroupRangeAB(@min,@max,@groups) AS ng;
[Parameters]:
@min = BIGINT; the lower boundary of the range of values to be grouped
@max = BIGINT; the upper boundary of the range of values to be grouped
@groups = BIGINT; requested number of groups (same as the parameter passed to NTILE)
[Returns]:
Inline Table Valued Function returns:
RN = BIGINT; a 0-based sort key
GroupNumber = BIGINT; a group number beginning with 1 and ending with @groups
Members = BIGINT; Number of possible distinct members in the group
Low = BIGINT; the lower-bound of the group's range (inclusive)
High = BIGINT; the upper-bound of the group's range (inclusive)
[Dependencies]:
core.rangeAB (iTVF)
[Developer Notes]:
1. An inline derived tally table using a CTE or subquery WILL NOT WORK. NTally requires
a correctly indexed tally table named dbo.tally; if you have or choose to use a
permanent tally table with a different name or in a different schema make sure to
change the DDL for this function accordingly. The recommended number of rows is
1,000,000; below is the recommended DDL for dbo.tally. Note the "Beginning" and "End"
of tally code. To learn more about tally tables see:
http://www.sqlservercentral.com/articles/T-SQL/62867/
2. For best results a P.O.C. index should exist on the table that you are "tiling". For
more information about P.O.C. indexes see:
http://sqlmag.com/sql-server-2012/sql-server-2012-how-write-t-sql-window-functions-part-3
3. NGroupRangeAB is deterministic; for more about deterministic and nondeterministic functions
see https://msdn.microsoft.com/en-us/library/ms178091.aspx
[Examples]:
-----------------------------------------------------------------------------------------
--===== 1. Basic illustration of the relationship between core.NGroupRangeAB and NTILE.
-- Consider this query which assigns 3 "tile groups" to 7 rows:
DECLARE @rows BIGINT = 7, @tiles BIGINT = 3;
SELECT t.N, t.TileGroup
FROM ( SELECT r.RN, NTILE(@tiles) OVER (ORDER BY r.RN)
FROM core.rangeAB(1,@rows,1,1) AS r) AS t(N,TileGroup);
Results:
N TileGroup
--- ----------
1 1
2 1
3 1
4 2
5 2
6 3
7 3
To pivot these "equiheight histograms" into "equiwidth histograms" we could do this:
DECLARE @rows BIGINT = 7, @tiles BIGINT = 3;
SELECT TileGroup = t.TileGroup,
[Low] = MIN(t.N),
[High] = MAX(t.N),
Members = COUNT(*)
FROM ( SELECT r.RN, NTILE(@tiles) OVER (ORDER BY r.RN)
FROM core.rangeAB(1,@rows,1,1) AS r) AS t(N,TileGroup)
GROUP BY t.TileGroup;
Results:
TileGroup Low High Members
---------- ---- ----- -----------
1 1 3 3
2 4 5 2
3 6 7 2
This will return the same thing at a tiny fraction of the cost:
SELECT TileGroup = ng.GroupNumber,
[Low] = ng.[Low],
[High] = ng.[High],
Members = ng.Members
FROM core.NGroupRangeAB(1,@rows,@tiles) AS ng;
--===== 2.1. Divide 25 rows into 4 groups
DECLARE @min BIGINT = 1, @max BIGINT = 25, @groups BIGINT = 4;
SELECT ng.GroupNumber, ng.Members, ng.low, ng.high
FROM core.NGroupRangeAB(@min,@max,@groups) AS ng;
--===== 2.2. Assign group membership to another table
DECLARE @min BIGINT = 1, @max BIGINT = 25, @groups BIGINT = 4;
SELECT
ng.GroupNumber, ng.low, ng.high, s.WidgetId, s.Price
FROM (VALUES('a',$12),('b',$22),('c',$9),('d',$2)) AS s(WidgetId,Price)
JOIN core.NGroupRangeAB(@min,@max,@groups) AS ng
ON s.Price BETWEEN ng.[Low] AND ng.[High]
ORDER BY ng.RN;
Results:
GroupNumber low high WidgetId Price
------------ ---- ----- --------- ---------------------
1 1 7 d 2.00
2 8 13 a 12.00
2 8 13 c 9.00
4 20 25 b 22.00
-----------------------------------------------------------------------------------------
[Revision History]:
Rev 00 - 20190128 - Initial Creation; Final Tuning - Alan Burstein
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
RN = r.RN, -- Sort Key
GroupNumber = r.N2, -- Bucket (group) number
Members = g.S-ur.N+1, -- Count of members in this group
[Low] = r.RN*g.S+rc.N+ur.N, -- Lower boundary for the group (inclusive)
[High] = r.N2*g.S+rc.N -- Upper boundary for the group (inclusive)
FROM core.rangeAB(0,@groups-1,1,0) AS r -- Range Function
CROSS APPLY (VALUES((@max-@min)/@groups,(@max-@min)%@groups)) AS g(S,U) -- Size, Underflow
CROSS APPLY (VALUES(SIGN(SIGN(r.RN-g.U)-1)+1)) AS ur(N) -- get Underflow
CROSS APPLY (VALUES(@min+r.RN-(ur.N*(r.RN-g.U)))) AS rc(N); -- Running Count
GO
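Note that core.NGroupRangeAB depends on core.rangeAB, which is not included above. A minimal stand-in is sketched below under assumed semantics (rangeAB(@low, @high, @gap, @row1) returns one row per value from @low to @high in steps of @gap, with RN counting from @row1, N1 the value itself and N2 = N1 + @gap); the author's real rangeAB is far more heavily tuned than this:
CREATE FUNCTION core.rangeAB (@low BIGINT, @high BIGINT, @gap BIGINT, @row1 BIT)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH E1(N) AS (SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) x(N)),
E3(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c), -- 1,000 rows
E6(N) AS (SELECT 1 FROM E3 a CROSS JOIN E3 b), -- 1,000,000 rows
R(N) AS (SELECT TOP (CASE WHEN @high >= @low THEN (@high-@low)/@gap + 1 ELSE 0 END)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
FROM E6)
SELECT RN = R.N + @row1, -- row number, starting at @row1 (0 or 1)
OP = (@high-@low)/@gap - R.N + @row1, -- row number in reverse order
N1 = @low + R.N*@gap, -- the value itself
N2 = @low + R.N*@gap + @gap -- the value one step ahead
FROM R;
GO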
Consider this datatable:
word wordCount documentId
---------- ------- ---------------
Ball 10 1
School 11 1
Car 4 1
Machine 3 1
House 1 2
Tree 5 2
Ball 4 2
I want to insert these data into two tables with this structure :
Table WordDictionary
(
Id int,
Word nvarchar(50),
DocumentId int
)
Table WordDetails
(
Id int,
WordId int,
WordCount int
)
FOREIGN KEY (WordId) REFERENCES WordDictionary(Id)
But because I have thousands of records in the initial table, I have to do this in just one transaction (a batch query); for example, using a bulk insert could serve this purpose.
The question here is how I can separate this data into the two tables WordDictionary and WordDetails.
For more details:
The final result must be like this:
Table WordDictionary:
Id word
---------- -------
1 Ball
2 School
3 Car
4 Machine
5 House
6 Tree
and table WordDetails :
Id wordId WordCount DocumentId
---------- ------- ----------- ------------
1 1 10 1
2 2 11 1
3 3 4 1
4 4 3 1
5 5 1 2
6 6 5 2
7 1 4 2
Notice:
The words in the source can be duplicated, so I must check for a word's existence in WordDictionary before inserting into these tables; if a word is already in WordDictionary, the Id of the existing word must be inserted into WordDetails (see the word Ball).
Finally, the million-dollar problem: this insertion must be done as fast as possible.
If you're looking to just load the tables the first time, without any updates to them over time, you could potentially do it this way (note that SELECT ... INTO creates the target tables, so they must not exist yet):
You can put all of the distinct words from the datatable into the WordDictionary table first, generating an Id for each:
SELECT Id = ROW_NUMBER() OVER (ORDER BY word),
       word
INTO WordDictionary
FROM (SELECT DISTINCT word FROM datatable) AS d;
After you populate your WordDictionary, you can use the Id values from it and the rest of the information from datatable to load your WordDetails table:
SELECT WD.Id as wordId, DT.wordCount as WordCount, DT.documentId AS DocumentId
INTO WordDetails
FROM datatable as DT
INNER JOIN WordDictionary AS WD ON WD.word = DT.word
There is a little discrepancy between the declared table schema and your example data, but it is resolved below:
1) Setup
-- this the table with the initial data
-- drop table DocumentWordData
create table DocumentWordData
(
Word NVARCHAR(50),
WordCount INT,
DocumentId INT
)
GO
-- these are the result tables with extra information (identity columns, primary key constraints, a working foreign key definition)
-- drop table WordDictionary
create table WordDictionary
(
Id int IDENTITY(1, 1) CONSTRAINT PK_WordDictionary PRIMARY KEY,
Word nvarchar(50)
)
GO
-- drop table WordDetails
create table WordDetails
(
Id int IDENTITY(1, 1) CONSTRAINT PK_WordDetails PRIMARY KEY,
WordId int CONSTRAINT FK_WordDetails_Word REFERENCES WordDictionary,
WordCount int,
DocumentId int
)
GO
2) The actual script to put data in the last two tables
begin tran
-- this is to make sure that if anything in this block fails, then everything is automatically rolled back
set xact_abort on
-- the dictionary is obtained by considering all distinct words
insert into WordDictionary (Word)
select distinct Word
from DocumentWordData
-- details are generated from the initial data, joining the word dictionary to get the word id
insert into WordDetails (WordId, WordCount, DocumentId)
SELECT W.Id, DWD.WordCount, DWD.DocumentId
FROM DocumentWordData DWD
JOIN WordDictionary W ON W.Word = DWD.Word
commit
-- just to test the results
select * from WordDictionary
select * from WordDetails
I expect this script to run very fast if you do not have a very large number of records (a few million at most).
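If the load is repeated over time rather than run once, here is a hedged variant of the same script that only inserts words not already in the dictionary (so an existing word such as Ball keeps its original id), assuming DocumentWordData holds just the new batch:
begin tran
set xact_abort on
-- add only the words that are not in the dictionary yet
insert into WordDictionary (Word)
select distinct Word
from DocumentWordData DWD
where not exists (select 1 from WordDictionary W where W.Word = DWD.Word)
-- the details join on the dictionary, so both old and new word ids are picked up
insert into WordDetails (WordId, WordCount, DocumentId)
select W.Id, DWD.WordCount, DWD.DocumentId
from DocumentWordData DWD
join WordDictionary W on W.Word = DWD.Word
commit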
This is the query; I'm using temp tables to be able to test it.
If you use the 2 CTEs below, you'll be able to generate the final result.
1. Setting up sample data for the test.
create table #original (word varchar(10), wordCount int, documentId int)
insert into #original values
('Ball', 10, 1),
('School', 11, 1),
('Car', 4, 1),
('Machine', 3, 1),
('House', 1, 2),
('Tree', 5, 2),
('Ball', 4, 2)
2. Use cte1 and cte2. In your real database, you need to replace #original with the actual table that holds all your initial records.
;with cte1 as (
select ROW_NUMBER() over (order by word) Id, word
from #original
group by word
)
select * into #WordDictionary
from cte1
;with cte2 as (
select ROW_NUMBER() over (order by #original.word) Id, #WordDictionary.Id as wordId,
#original.word, #original.wordCount, #original.documentId
from #WordDictionary
inner join #original on #original.word = #WordDictionary.word
)
select * into #WordDetails
from cte2
select * from #WordDetails
This will be the data in #WordDetails:
+----+--------+---------+-----------+------------+
| Id | wordId | word | wordCount | documentId |
+----+--------+---------+-----------+------------+
| 1 | 1 | Ball | 10 | 1 |
| 2 | 1 | Ball | 4 | 2 |
| 3 | 2 | Car | 4 | 1 |
| 4 | 3 | House | 1 | 2 |
| 5 | 4 | Machine | 3 | 1 |
| 6 | 5 | School | 11 | 1 |
| 7 | 6 | Tree | 5 | 2 |
+----+--------+---------+-----------+------------+
I'm trying to write an incremental update statement using SQL Server 2012.
Current Data:
RecNo Budget_ID Item_Code Revision
---------------------------------------
1 16 xxx 2
2 16 xxx NULL
3 16 xxx NULL
12 19 yyy 3
13 19 yyy NULL
14 19 yyy NULL
15 19 yyy NULL
Expected result:
RecNo Budget_ID Item_Code Revision
---------------------------------------
1 16 xxx 2
2 16 xxx 1
3 16 xxx 0
12 19 yyy 3
13 19 yyy 2
14 19 yyy 1
15 19 yyy 0
However, with the following approach, I ended up with the result set below.
UPDATE a
SET a.Revision = (SELECT MIN(b.Revision)
FROM [dbo].[foo] b
WHERE b.item_code = a.item_code
AND b.budget_id = a.budget_id
GROUP BY b.item_code ) -1
FROM [dbo].[foo] a
WHERE a.Revision is NULL
Result:
RecNo Budget_ID Item_Code Revision
---------------------------------------
1 16 xxx 2
2 16 xxx 1
3 16 xxx 1
12 19 yyy 3
13 19 yyy 2
14 19 yyy 2
15 19 yyy 2
Can anyone help me to get this right?
Thanks in advance!
Try this:
;with cte as
(select *, row_number() over (partition by Budget_ID order by RecNo desc) rn from dbo.foo)
update cte
set revision = rn - 1
Basically, since the revision value seems to decrease as RecNo increases, we simply use the row_number() function to get the row number of each record within the subset of all records with a particular Budget_ID, sorted in descending order of RecNo. Since the least possible value of row_number() will be 1, we subtract 1 so that the last record in the partition will have Revision set to 0 instead of 1.
I found this example via this link: https://stackoverflow.com/a/13629639/1692632
First you select the MIN value into a variable, and then you can update the table while decreasing the variable at the same time.
DECLARE @table TABLE (ID INT, SomeData VARCHAR(10))
INSERT INTO @table (SomeData, ID) SELECT 'abc', 6 ;
INSERT INTO @table (SomeData) SELECT 'def' ;
INSERT INTO @table (SomeData) SELECT 'ghi' ;
INSERT INTO @table (SomeData) SELECT 'jkl' ;
INSERT INTO @table (SomeData) SELECT 'mno' ;
INSERT INTO @table (SomeData) SELECT 'prs' ;
DECLARE @i INT = (SELECT ISNULL(MIN(ID),0) FROM @table)
UPDATE @table
SET @i = @i - 1, ID = @i -- decrement first so the NULL rows get 5, 4, 3, ... rather than repeating 6
WHERE ID IS NULL
SELECT *
FROM @table
I'm not sure if this will do the trick, but you can try this:
Update top(1) a
SET a.Revision = (Select MIN(b.Revision)
FROM [dbo].[foo] b where b.item_code = a.item_code and b.budget_id = a.budget_id
group by b.item_code ) -1
FROM [dbo].[foo] a
WHERE a.Revision is NULL
and repeat until there are no changes left:
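A sketch of that repetition as a loop; note that TOP (1) without an ORDER BY updates an arbitrary NULL row, so there is no guarantee the rows are filled in RecNo order:
WHILE EXISTS (SELECT 1 FROM [dbo].[foo] WHERE Revision IS NULL)
BEGIN
    UPDATE TOP (1) a
    SET a.Revision = (SELECT MIN(b.Revision)
                      FROM [dbo].[foo] b
                      WHERE b.item_code = a.item_code
                        AND b.budget_id = a.budget_id) - 1
    FROM [dbo].[foo] a
    WHERE a.Revision IS NULL
END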
Update Data
set Revision = x.Revision
from
(select RecNo, Budget_ID, Item_Code, case when Revision is null then ROW_NUMBER() over(partition by Budget_ID order by RecNo desc) - 1 else Revision end Revision
from Data
) x
where x.RecNo = data.RecNo
You basically use ROW_NUMBER() to count backwards for each Budget_ID, and use that row number minus 1 where Revision is null. This is basically the same as Shree's answer, just without the CTE.
Is there a way to have something like:
id Name value
--------------------
1 sex m
2 age 12
3 weight 200
4 height 200
5 rx 34
from a known table:
sex age weight height rx
--------------------------
m 12 200 200 34
If I do:
Select
[id] = ORDINAL_POSITION,
[Name] = COLUMN_NAME
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'known'
I get:
id Name
-----------
1 sex
2 age
3 weight
4 height
5 rx
How do I change the query to get:
id Name value
--------------------
1 sex m
2 age 12
3 weight 200
4 height 200
5 rx 34
If they were 2 rows:
sex age weight height rx
--------------------------
m 12 200 200 34
f 34 245 111 67
id Name value
--------------------
1 sex m
2 age 12
3 weight 200
4 height 200
5 rx 34
6 sex f
7 age 34
8 weight 245
9 height 111
10 rx 67
-----------------EDIT--------------------
Thanks for your answers, but I am wondering if this is possible instead of getting
id value
-------------------
1 m
2 12
3 200
4 200
5 34
from:
sex age weight height rx
--------------------------
m 12 200 200 34
using
Select
[id] = ORDINAL_POSITION,
[Value] ...
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'known'
The problem you're going to run into, whatever method you choose, is that all data in a particular column must be of the same type. In your case you have sex, which is a character, and a bunch of numbers (probably integers). This stops you from doing this neatly, as you can't use UNPIVOT directly. However, there is always a way...
Given this setup code:
CREATE TABLE test(sex char(1), age int, weight int, height int, rx int)
INSERT INTO test
SELECT 'm', 12 , 200, 200, 34
union select 'f',34,245,111,67
You can do this, which is just a small addition to your query:
Select
[id] = ROW_NUMBER() OVER(ORDER BY isc.ORDINAL_POSITION),
[Name] = COLUMN_NAME,
[Value] = CASE LOWER(COLUMN_NAME)
WHEN 'sex' THEN CAST(d.sex AS VARCHAR(20))
WHEN 'age' then CAST(d.age AS VARCHAR(20))
WHEN 'weight' THEN CAST(d.weight AS VARCHAR(20))
WHEN 'height' THEN CAST(d.height AS VARCHAR(20))
WHEN 'rx' THEN CAST(d.rx AS VARCHAR(20))
END
from INFORMATION_SCHEMA.COLUMNS isc
CROSS JOIN dbo.test d
where TABLE_NAME = 'test'
Output:
1 sex m
2 sex f
3 age 12
4 age 34
5 weight 200
6 weight 245
7 height 200
8 height 111
9 rx 34
10 rx 67
You'll notice this output is in a slightly different order to your own. This is because you have not described any key on your "known" table. If you did have a key on that table, you would simply change this line:
[id] = ROW_NUMBER() OVER(ORDER BY isc.ORDINAL_POSITION),
to
[id] = ROW_NUMBER() OVER(ORDER BY d.yourKeyField, isc.ORDINAL_POSITION),
You can do it with PIVOT/UNPIVOT (see Books Online).
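For completeness, a sketch of that UNPIVOT route against the dbo.test table created above; as the first answer notes, every column has to be cast to one common type first, which is why it is not as neat:
SELECT id = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       Name, Value
FROM (SELECT sex = CAST(sex AS varchar(20)),
             age = CAST(age AS varchar(20)),
             weight = CAST(weight AS varchar(20)),
             height = CAST(height AS varchar(20)),
             rx = CAST(rx AS varchar(20))
      FROM dbo.test) AS src
UNPIVOT (Value FOR Name IN (sex, age, weight, height, rx)) AS u;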
You can try this.
declare @T table
(
sex char(1),
age int,
weight int,
height int,
rx int
)
insert into @T values
('m', 12, 200, 200, 34),
('f', 34, 245, 111, 67)
select row_number() over(order by (select 1)) as ID,
T2.X.value('local-name(.)', 'varchar(128)') as Name,
T2.X.value('.', 'varchar(10)') as Value
from (select *
from @T
for xml path(''), type
) as T1(X)
cross apply T1.X.nodes('/*') as T2(X)
The order by (select 1) part makes the assignment of IDs somewhat unpredictable. If you had a primary key (ID int identity) or a datetime to use in the order by, you could change that to order by ID instead.