I have seen many examples of hierarchical ctes, but I wasn't able to get my sorting right yet.
To describe my problem, assume this Task table:
TaskId ParentTaskId Label
a10 null 10
a20 null 11
a30 a20 18
a50 a30 5
a40 a20 15
a60 null 12
The output of my query should be sorted by Label and the children's labels, like so:
Sequence TaskId Label
1 a10 10
2 a20 11
3 a40 15
4 a30 18
5 a50 5
6 a60 12
I added indentation to make it easier for you to notice the grouping. a40 and a30 are children of a20, and are ordered based on the label.
Please help. thanks!
Here's the answer:
drop table if exists #t
go
select
*
into
#t
from
(
values
('a10', null, '10'),
('a20', null, '11'),
('a30', 'a20', '18'),
('a50', 'a30', '5'),
('a40', 'a20', '15'),
('a60', null, '12')
) as T(TaskId, ParentTaskId, Label)
;
with cte as
(
select
l = 0,
p = cast('/' + Label as nvarchar(max)),
*
from
#t where ParentTaskId is null
union all
select
l = p.l + 1,
p = p.p + '/' + c.Label,
c.*
from
#t c
inner join
cte p on c.ParentTaskId = p.TaskId
)
select
*
from
cte
order by p, Label
You need to create a path from the root of your task to the current task and then use it to sort the final result.
On small dataset the above query will perform fine. On bigger (hundreds of thousands) I recommend to take a look at hierarchyid data type:
https://learn.microsoft.com/en-us/sql/t-sql/data-types/hierarchyid-data-type-method-reference?view=sql-server-2017
Quick Summary: I have a function that pulls data from table X. I'm running an UPDATE on table X, and using a CROSS APPLY on the function that is pulling data from X (during the update) and the function doesn't look to be returning updated data.
The real-world scenario is much more complicated, but here's a sample of what I'm seeing.
Table
create table BO.sampleData (id int primary key, data1 int, val int)
Function
create function BO.getPrevious(
#id int
)
returns #info table (
id int, val int
)
as
begin
declare #val int
declare #prevRow int = #id - 1
-- grab data from previous row
insert into #info
select #id, val
from BO.sampleData where id = #prevRow
-- if previous row doesn't exist, return 3*prev row id
if ##rowcount = 0
insert into #info values (#id, #prevRow * 3)
return
end
Issue
Populate some sample data:
delete BO.sampleData
insert into BO.sampleData values (10, 20, 0)
insert into BO.sampleData values (11, 22, 0)
insert into BO.sampleData values (12, 24, 0)
insert into BO.sampleData values (13, 26, 0)
insert into BO.sampleData values (14, 28, 0)
select * from BO.sampleData
id data1 val
----------- ----------- -----------
10 20 0
11 22 0
12 24 0
13 26 0
14 28 0
Update BO.sampleData using a CROSS APPLY on BO.getPrevious (which accesses data from BO.sampleData):
update t
set t.val = ca.val
from bo.sampleData t
cross apply BO.getPrevious(t.id) ca
where t.id = ca.id
Problem
I'm expecting the row with id 10 to have the value 27 (since there is no row 9, the function will return 9*3). For id 11, I assumed it would look in 10 (which just got updated with 27) and set it's val to 27 -- and this would cascade down the rest of the table. But what I get is:
id data1 val
----------- ----------- -----------
10 20 27
11 22 0
12 24 0
13 26 0
14 28 0
I'm guessing this isn't allowed/supported -- the function doesn't have access to the updated data yet? Or I've got something wrong with the syntax? In the real scenario I'm researching, the function is much more complex, does some child table look ups, aggregates, etc.. before returning a result. But this represents the basics of what I'm seeing -- the function that queries BO.sampleData doesn't seem to have access to the updated values of BO.sampleData within the CROSS APPLY during the UPDATE.
Any ideas welcomed.
Thanks to #Martin Smith for identifying the issue -- i.e. "Halloween Protection". Now that my issue has a name, I did some research and found the following article which mentions this specific scenario in SQL Server:
... update plans consist of two parts: a read cursor that identifies
the rows to be updated and a write cursor that actually performs the
updates. Logically speaking, SQL Server must execute the read cursor
and write cursor of an update plan in two separate steps or phases.
To put it another way, the actual update of rows must not affect the
selection of which rows to update.
Emphasis mine. It makes sense now. The CROSS APPLY is happening over the read cursor where all of the values are still zero.
The data is always coming from #info
For input id = 11, it will execute:
insert into #info
select #id, val --which #id = 10, val = 0
from BO.sampleData where id = 10
so from the #info, the val for id=10 is 0(which comes from BO.sampleData where id = 11), then cross apply, it deal with id = 10 from #info, which is val = 10.
everything is what it is in your UDF. And there is no update val to 27 when id = 10 from #info in your UDF, be careful that #info is the table get returned.
This question already has answers here:
SQL Server Pivot Table with multiple column aggregates
(3 answers)
Closed 6 years ago.
I have a SQL Server 2008 Express table like this:
rno gid uid dat origamt disamt
-----------------------------------------------
1 AA a 12-05-2016 200 210
2 AA b 12-05-2016 300 305
3 AA c 12-05-2016 150 116
4 BB a 12-05-2016 120 125
5 BB c 12-05-2016 130 136
6 CC a 12-05-2016 112 115
7 CC b 12-05-2016 135 136
and so on for different dates
I want to show it like this:
sno dat gid a_orig a_dis b_orig b_dis c_orig c_dis .....
1 12-05-2016 AA 200 210 300 305 150 116
2 12-05-2016 BB 120 125 0 0 130 136
3 12-05-2016 CC 112 115 135 136 0 0
NOTE: the values of uid are not fixed, they may vary dynamically, so, a_orig, a_dis, b_orig, b_dis, etc cannot be hardcoded into SQL.
NOTE: around 300 rows are expected on each date due to the cartesian product of gid and uid. and I will search datewise by implementing the LIKE clause since datatype of dat column is varchar(50).
Note: I would prefer datatype of origamt and disamt to be varchar(50) instead of Decimal(18, 0) but it is not compulsion.
I have tried to use PIVOT by taking reference from several articles posted here on stackoverflow and other website but couldn't get the work done completely.
Here is what I tried and got almost fine results with fixed uid and only fetched origamt:
select *
from
(
select gid, uid, dat, origamt
from vamounts
) as src
pivot
(
sum(origamt)
for uid IN ( a, b )
) as piv;
Kindly help me with the least bulky possible solution for this problem. I will prefer least lines of code and least complexity.
Errr, no. You can't generate your desired table using SQL. This isn't a valid pivot table.
"the values of uid are not fixed, they may vary dynamically, so,
a_orig, a_dis, b_orig, b_dis, etc cannot be hardcoded into SQL."
Sorry, this is also not possible. You must specify the exact values to be placed as the column headers. Whenever you write a SELECT statement, you must specify the names of the columns (fields) which you'll be returning. There's no way around this.
However, below are the steps required to create a "valid" SQL Server pivot table from your data:
I've got to admit, when I recently had to write my first PIVOT in SQL Server, I also Googled like mad, but didn't understand how to write it.
However, I eventually worked out what you need to do, so here's the step-by-step guide that you won't find anywhere else..!
(Readers can easily adapt these instructions, to use with your own data !)
1. Create your sample data
If you expect readers to reply to your Question, you should at least give them the SQL to create your sample data, so they have something to work off.
So, here's how I would create the data shown in your question:
CREATE TABLE tblSomething
(
[gid] nvarchar(100),
[uid] nvarchar(100),
[dat] datetime,
[origamt] int,
[disamt] int
)
GO
INSERT INTO tblSomething VALUES ('AA', 'a', '2016-05-12', 200, 210)
INSERT INTO tblSomething VALUES ('AA', 'b', '2016-05-12', 300, 305)
INSERT INTO tblSomething VALUES ('AA', 'c', '2016-05-12', 150, 116)
INSERT INTO tblSomething VALUES ('BB', 'a', '2016-05-12', 120, 125)
INSERT INTO tblSomething VALUES ('BB', 'c', '2016-05-12', 130, 136)
INSERT INTO tblSomething VALUES ('CC', 'a', '2016-05-12', 112, 115)
INSERT INTO tblSomething VALUES ('CC', 'b', '2016-05-12', 135, 136)
GO
2. Write a SQL Query which returns exactly three columns
The first column will contain the values which will appear in your PIVOT table's left-hand column.
The second column will contain the list of values which will appear on the top row.
The values in the third column will be positioned within your PIVOT table, based on the row/column headers.
Okay, here's the SQL to do this:
SELECT [gid], [uid], [origamt]
FROM tblSomething
This is the key to using a PIVOT. Your database structure can be as horribly complicated as you like, but when using a PIVOT, you can only work with exactly three values. No more, no less.
So, here's what that SQL will return. Our aim is to create a PIVOT table containing (just) these values:
3. Find a list of distinct values for the header row
Notice how, in the pivot table I'm aiming to create, I have three columns (fields) called a, b and c. These are the three unique values in your [uid] column.
So, to get a comma-concatenated list of these unique values, I can use this SQL:
DECLARE #LongString nvarchar(4000)
SELECT #LongString = COALESCE(#LongString + ', ', '') + '[' + [uid] + ']'
FROM [tblSomething]
GROUP BY [uid]
SELECT #LongString AS 'Subquery'
When I run this against your data, here's what I get:
Now, cut'n'paste this value: we'll need to place it twice in our overall SQL SELECT command to create the pivot table.
4. Put it all together
This is the tricky bit.
You need to combine your SQL command from Step 2 and the result from Step 3, into a single SELECT command.
Here's what your SQL would look like:
SELECT [gid],
-- Here's the "Subquery" from part 3
[a], [b], [c]
FROM (
-- Here's the original SQL "SELECT" statement from part 2
SELECT [gid], [uid], [origamt]
FROM tblSomething
) tmp ([gid], [uid], [origamt])
pivot (
MAX([origamt]) for [uid] in (
-- Here's the "Subquery" from part 3 again
[a], [b], [c]
)
) p
... and here's a confusing image, which shows where the components come from, and the results of running this command.
As you can see, the key to this is that SELECT statement in Step 2, and putting your three chosen fields in the correct place in this command.
And, as I said earlier, the columns (fields) in your pivot table come from the values obtained in step 3:
[a], [b], [c]
You could, of course, use a subset of these values. Perhaps you just want to see the PIVOT values for [a], [b] and ignore [c].
Phew !
So, that's how to create a pivot table out of your data.
I will prefer least lines of code and least complexity.
Yeah, good luck on that one..!!!
5. Merging two pivot tables
If you really wanted to, you could merge the contents of two such PIVOT tables to get the exact results you're looking for.
This is easy enough SQL for Shobhit to write himself.
You need dynamic SQL for this stuff.
At first create table with your data:
CREATE TABLE #temp (
rno int,
gid nvarchar(10),
[uid] nvarchar(10),
dat date,
origamt int,
disamt int
)
INSERT INTO #temp VALUES
(1, 'AA', 'a', '12-05-2016', 200, 210),
(2, 'AA', 'b', '12-05-2016', 300, 305),
(3, 'AA', 'c', '12-05-2016', 150, 116),
(4, 'BB', 'a', '12-05-2016', 120, 125),
(5, 'BB', 'c', '12-05-2016', 130, 136),
(6, 'CC', 'a', '12-05-2016', 112, 115),
(7, 'CC', 'b', '12-05-2016', 135, 136)
And then declare variables with columns:
DECLARE #columns nvarchar(max), #sql nvarchar(max), #columns1 nvarchar(max), #columnsN nvarchar(max)
--Here simple columns like [a],[b],[c] etc
SELECT #columns =STUFF((SELECT DISTINCT ','+QUOTENAME([uid]) FROM #temp FOR XML PATH('')),1,1,'')
--Here with ISNULL operation ISNULL([a],0) as [a],ISNULL([b],0) as [b],ISNULL([c],0) as [c]
SELECT #columnsN = STUFF((SELECT DISTINCT ',ISNULL('+QUOTENAME([uid])+',0) as '+QUOTENAME([uid]) FROM #temp FOR XML PATH('')),1,1,'')
--Here columns for final table orig.a as a_orig, dis.a as a_dis,orig.b as b_orig, dis.b as b_dis,orig.c as c_orig, dis.c as c_dis
SELECT #columns1 = STUFF((SELECT DISTINCT ',orig.'+[uid] + ' as ' +[uid]+ '_orig, dis.'+[uid] + ' as ' +[uid]+ '_dis' FROM #temp FOR XML PATH('')),1,1,'')
And main query:
SELECT #sql = '
SELECT orig.gid,
orig.dat,
'+#columns1+'
FROM (
SELECT gid, dat, '+#columnsN+'
FROM (
SELECT gid, [uid], LEFT(dat,10) as dat, origamt
FROM #temp
) as p
PIVOT (
SUM(origamt) FOR [uid] in ('+#columns+')
) as pvt
) as orig
LEFT JOIN (
SELECT gid, dat, '+#columnsN+'
FROM (
SELECT gid, [uid], LEFT(dat,10) as dat, disamt
FROM #temp
) as p
PIVOT (
SUM(disamt) FOR [uid] in ('+#columns+')
) as pvt
) as dis
ON dis.gid = orig.gid and dis.dat = orig.dat'
EXEC(#sql)
Output:
gid dat a_orig a_dis b_orig b_dis c_orig c_dis
AA 2016-12-05 200 210 300 305 150 116
BB 2016-12-05 120 125 0 0 130 136
CC 2016-12-05 112 115 135 136 0 0
A join might help
declare #t table (rno int, gid varchar(2), uid varchar(1), dat varchar(10), origamt int, disamt int)
insert into #t
values
(1, 'AA', 'a', '12-05-2016', 200, 210),
(2 , 'AA', 'b', '12-05-2016', 300, 305),
(3 , 'AA', 'c', '12-05-2016', 150, 116),
(4 , 'BB', 'a', '12-05-2016', 120, 125),
(5 , 'BB', 'c', '12-05-2016', 130, 136),
(6 , 'CC', 'a', '12-05-2016', 112, 115),
(7 , 'CC', 'b', '12-05-2016', 135, 136)
select -- piv.*,piv2.*
piv.gid,piv.dat
,piv.a as a_org
,piv2.a as a_dis
,piv.b as b_org
,piv2.b as b_dis
,piv.c as c_org
,piv2.c as c_dis
from
(
select gid, uid, dat, origamt
from #t
) as src
pivot
(
sum(origamt)
for uid IN ([a],[b],[c] )
) as piv
join
(select piv2.*
from
(
select gid, uid, dat, disamt
from #t
) as src
pivot
(
sum(disamt)
for uid IN ([a],[b],[c] )
) as piv2
) piv2
on piv2.gid = piv.gid and piv2.dat = piv.dat
This is a POC you would have to use dynamic sql to deal with the variable number of uids. Let me know if you don't know how to use dynamic SQL and I'll work up an example for you.
I was answering another question and ran into a strange outcome - the output of a product aggregate (without CLR) was different when used in a SELECT vs UPDATE.
This is simplified from the original question to minimally reproduce the problem:
GroupKey RowIndex A
----------- ----------- -----------
25 1 5
25 2 6
25 3 NULL
26 1 3
26 2 4
26 3 NULL
The goal is for each group key to update the A column of each row with a RowIndex = 3 to the product of the A columns of each row with RowIndex IN (1, 2), so this would produce the following changes:
GroupKey RowIndex A
----------- ----------- -----------
25 3 30
26 3 12
So this is the code I used:
UPDATE T SET
A = Products.Product
FROM #Table T
INNER JOIN (
SELECT
GroupKey,
EXP(SUM(LOG(A))) AS Product
FROM #Table
WHERE RowIndex IN (1, 2)
GROUP BY
GroupKey
) Products
ON Products.GroupKey = T.GroupKey
WHERE T.RowIndex = 3
SELECT * FROM #Table WHERE RowIndex = 3
Which then produced the off-by-one results:
GroupKey RowIndex A
----------- ----------- -----------
25 3 29
26 3 12
If I just run the sub-query, I see the correct values.
GroupKey Product
----------- ----------------------
25 30
26 12
Here's the full script to make it easy to play with. I can't figure out where the off-by-one is coming from.
DECLARE #Table TABLE (GroupKey INT, RowIndex INT, A INT)
INSERT #Table VALUES (25, 1, 5), (25, 2, 6), (25, 3, NULL), (26, 1, 3), (26, 2, 4), (26, 3, NULL)
SELECT * FROM #Table
SELECT
GroupKey,
EXP(SUM(LOG(A))) AS Product
FROM #Table
WHERE RowIndex IN (1, 2)
GROUP BY
GroupKey
UPDATE T SET
A = Products.Product
FROM #Table T
INNER JOIN (
SELECT
GroupKey,
EXP(SUM(LOG(A))) AS Product
FROM #Table
WHERE RowIndex IN (1, 2)
GROUP BY
GroupKey
) Products
ON Products.GroupKey = T.GroupKey
WHERE T.RowIndex = 3
SELECT * FROM #Table WHERE RowIndex = 3
Here are some references I came across:
Non-CLR Aggregate: http://michaeljswart.com/2011/03/the-aggregate-function-product/
Original question: Set one row fields as a multiplication of 2 others
I'd say that this cute "PRODUCT" aggregate is inherently unreliable if you want to work with ints - EXP and LOG are only defined against the float type and so we get rounding errors creeping in.
Why they're not consistently appearing, I couldn't say, except to suggest that different queries may cause changes in evaluation orders.
As a simpler example of how this can go wrong:
select CAST(EXP(LOG(5)) as int)
Can produce 4. EXP and LOG together will produce a value that is just less than 5, but of course when converting to int, SQL Server always truncates rather than applying any rounding.
I got an interesting requirement to partition a dataset using different conditions.
Say, it is not simple GROUP BY or ORDER BY I have to say at first place.
Is it a ranking? Yeah little bit closer, but the challenge here is to write a single query for that.
Well I'm still wondering and looking for a straight forward option. Let me introduce a problem.
Name ----- Age ----- MarksForMaths ---- AvgByTotal
Above is a simple sample schema where it can be a marks taken by few students for maths and all average marks.
I need to filter out this set based on following criterias.
people who got 75 > Mathsmarks > 50 should be on top
people who got Mathsmarks > 90 should be a next set
people who average > 65 should take place thereafter
Older people Age > 55 should be a last set
Yeah obviously rank and filter is an option but can we do it in a optimized query?
Tip - what I did basically is create a additional column name RANK and update the column with a index based on conditions.
Then it's just a matter or filter the data order by RANK. Piece of cake !
But the question here is , can we go for one shot query? Appreciate tips.
Thanks
Does the below query fits with your requirement :
DECLARE #BaseTable TABLE (Name VARCHAR(50), Age INT, MarksForMaths INT, AvgByTotal INT)
INSERT INTO #BaseTable (Name, Age, MarksForMaths, AvgByTotal)
SELECT 'A', 1, 65, 12 UNION ALL
SELECT 'B', 1, 5, 75 UNION ALL
SELECT 'C', 1, 95, 12 UNION ALL
SELECT 'D', 65, 65, 12 UNION ALL
SELECT 'E', 65, 5, 12
SELECT tmp.Name, tmp.TmpRank
FROM
(
SELECT
Name,
CASE
WHEN (MarksForMaths > 50 AND MarksForMaths < 75) THEN 1
WHEN (MarksForMaths > 90) THEN 2
WHEN (AvgByTotal > 65) THEN 3
WHEN (Age > 55) THEN 4
ELSE 5
END AS TmpRank
FROM #BaseTable
) tmp
ORDER BY tmp.TmpRank