Partition a dataset by multiple conditions TSQL

I have an interesting requirement: partition a dataset using several different conditions.
It is not a simple GROUP BY or ORDER BY, I should say up front.
Is it a ranking? That is a little closer, but the challenge here is to write a single query for it.
I'm still looking for a straightforward option. Let me introduce the problem.
Name ----- Age ----- MarksForMaths ---- AvgByTotal
Above is a simple sample schema: the marks a few students got for maths, plus each student's overall average.
I need to arrange this set based on the following criteria:
people with 50 < MarksForMaths < 75 should be on top
people with MarksForMaths > 90 should be the next set
people with AvgByTotal > 65 should come after that
older people (Age > 55) should be the last set
Obviously rank-and-filter is an option, but can we do it in one optimized query?
Tip: what I did was add a column named RANK and update it with an index based on the conditions.
Then it's just a matter of selecting the data ordered by RANK. Piece of cake!
But the question here is: can we do it in a one-shot query? Tips appreciated.
Thanks

Does the query below fit your requirement?
DECLARE @BaseTable TABLE (Name VARCHAR(50), Age INT, MarksForMaths INT, AvgByTotal INT)
INSERT INTO @BaseTable (Name, Age, MarksForMaths, AvgByTotal)
SELECT 'A', 1, 65, 12 UNION ALL
SELECT 'B', 1, 5, 75 UNION ALL
SELECT 'C', 1, 95, 12 UNION ALL
SELECT 'D', 65, 65, 12 UNION ALL
SELECT 'E', 65, 5, 12
SELECT tmp.Name, tmp.TmpRank
FROM
(
    SELECT
        Name,
        -- the first matching WHEN decides the rank
        CASE
            WHEN (MarksForMaths > 50 AND MarksForMaths < 75) THEN 1
            WHEN (MarksForMaths > 90) THEN 2
            WHEN (AvgByTotal > 65) THEN 3
            WHEN (Age > 55) THEN 4
            ELSE 5
        END AS TmpRank
    FROM @BaseTable
) tmp
ORDER BY tmp.TmpRank
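A minimal sketch of the same idea without the derived table: the CASE expression can sit directly in ORDER BY. So that it runs without a SQL Server instance, it uses Python's built-in sqlite3 module, which is an assumption of this sketch and not part of the answer above; the CASE expression itself is standard SQL. The table name base_table and the secondary sort on Name are illustrative choices.

```python
import sqlite3

# In-memory SQLite database standing in for the table variable (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base_table (Name TEXT, Age INT, MarksForMaths INT, AvgByTotal INT)"
)
conn.executemany(
    "INSERT INTO base_table VALUES (?, ?, ?, ?)",
    [("A", 1, 65, 12), ("B", 1, 5, 75), ("C", 1, 95, 12),
     ("D", 65, 65, 12), ("E", 65, 5, 12)],
)

# The first matching WHEN wins, so a row that satisfies several
# conditions lands in the earliest group.
ranked = conn.execute(
    """
    SELECT Name
    FROM base_table
    ORDER BY CASE
                 WHEN MarksForMaths > 50 AND MarksForMaths < 75 THEN 1
                 WHEN MarksForMaths > 90 THEN 2
                 WHEN AvgByTotal > 65 THEN 3
                 WHEN Age > 55 THEN 4
                 ELSE 5
             END,
             Name
    """
).fetchall()
names = [row[0] for row in ranked]
print(names)
```

Row D is older than 55 but also has maths marks between 50 and 75, so it sorts into the first group rather than the last.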

Related

SQL Server: Pulling updated data from a function during a CROSS APPLY

Quick Summary: I have a function that pulls data from table X. I'm running an UPDATE on table X, and using a CROSS APPLY on the function that is pulling data from X (during the update) and the function doesn't look to be returning updated data.
The real-world scenario is much more complicated, but here's a sample of what I'm seeing.
Table
create table BO.sampleData (id int primary key, data1 int, val int)
Function
create function BO.getPrevious(
    @id int
)
returns @info table (
    id int, val int
)
as
begin
    declare @val int
    declare @prevRow int = @id - 1
    -- grab data from previous row
    insert into @info
    select @id, val
    from BO.sampleData where id = @prevRow
    -- if previous row doesn't exist, return 3*prev row id
    if @@rowcount = 0
        insert into @info values (@id, @prevRow * 3)
    return
end
Issue
Populate some sample data:
delete BO.sampleData
insert into BO.sampleData values (10, 20, 0)
insert into BO.sampleData values (11, 22, 0)
insert into BO.sampleData values (12, 24, 0)
insert into BO.sampleData values (13, 26, 0)
insert into BO.sampleData values (14, 28, 0)
select * from BO.sampleData
id data1 val
----------- ----------- -----------
10 20 0
11 22 0
12 24 0
13 26 0
14 28 0
Update BO.sampleData using a CROSS APPLY on BO.getPrevious (which accesses data from BO.sampleData):
update t
set t.val = ca.val
from bo.sampleData t
cross apply BO.getPrevious(t.id) ca
where t.id = ca.id
Problem
I'm expecting the row with id 10 to have the value 27 (since there is no row 9, the function will return 9*3). For id 11, I assumed it would look in 10 (which just got updated with 27) and set its val to 27 -- and this would cascade down the rest of the table. But what I get is:
id data1 val
----------- ----------- -----------
10 20 27
11 22 0
12 24 0
13 26 0
14 28 0
I'm guessing this isn't allowed/supported -- the function doesn't have access to the updated data yet? Or I've got something wrong with the syntax? In the real scenario I'm researching, the function is much more complex, does some child table look ups, aggregates, etc.. before returning a result. But this represents the basics of what I'm seeing -- the function that queries BO.sampleData doesn't seem to have access to the updated values of BO.sampleData within the CROSS APPLY during the UPDATE.
Any ideas welcomed.

Thanks to @Martin Smith for identifying the issue -- i.e. "Halloween Protection". Now that my issue has a name, I did some research and found the following article which mentions this specific scenario in SQL Server:
... update plans consist of two parts: a read cursor that identifies
the rows to be updated and a write cursor that actually performs the
updates. Logically speaking, SQL Server must execute the read cursor
and write cursor of an update plan in two separate steps or phases.
To put it another way, the actual update of rows must not affect the
selection of which rows to update.
Emphasis mine. It makes sense now. The CROSS APPLY is happening over the read cursor where all of the values are still zero.
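The two-phase behaviour described in the quote can be sketched as a plain-Python simulation. This is only an illustration of the logic, not of SQL Server internals; the function names and the dictionary standing in for BO.sampleData are hypothetical.

```python
def get_previous(table, row_id):
    """Mimics BO.getPrevious: val of the previous row, or 3 * previous id."""
    prev_id = row_id - 1
    if prev_id in table:
        return table[prev_id]
    return prev_id * 3


def update_two_phase(table):
    """Read phase over the untouched rows; the writes happen afterwards."""
    # Every lookup sees the original values, because nothing is written yet.
    return {row_id: get_previous(table, row_id) for row_id in table}


def update_row_by_row(table):
    """What the question expected: each write is visible to the next read."""
    result = dict(table)
    for row_id in sorted(result):
        result[row_id] = get_previous(result, row_id)
    return result


sample = {10: 0, 11: 0, 12: 0, 13: 0, 14: 0}  # id -> val
two_phase = update_two_phase(sample)
cascaded = update_row_by_row(sample)
print(two_phase)
print(cascaded)
```

The two-phase version reproduces the result shown in the question (27 for id 10, 0 everywhere else), while the row-by-row loop produces the cascade the question expected.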

The data always comes from @info.
For input id = 11, the function executes:
insert into @info
select @id, val -- here @id = 11 and val = 0, read from the row with id = 10
from BO.sampleData where id = 10
So the row in @info for id = 11 carries val = 0, which comes from the BO.sampleData row with id = 10 as the function sees it. The CROSS APPLY then matches t.id = 11 to that row, so val stays 0.
Everything happens exactly as written in your UDF. Nothing in the UDF ever sees val = 27 for id = 10; keep in mind that @info is the table that gets returned.

SQL Server : show rows as columns / Pivot [duplicate]

I have a SQL Server 2008 Express table like this:
rno  gid  uid  dat         origamt  disamt
-------------------------------------------
1    AA   a    12-05-2016  200      210
2    AA   b    12-05-2016  300      305
3    AA   c    12-05-2016  150      116
4    BB   a    12-05-2016  120      125
5    BB   c    12-05-2016  130      136
6    CC   a    12-05-2016  112      115
7    CC   b    12-05-2016  135      136
and so on for different dates
I want to show it like this:
sno  dat         gid  a_orig  a_dis  b_orig  b_dis  c_orig  c_dis  .....
1    12-05-2016  AA   200     210    300     305    150     116
2    12-05-2016  BB   120     125    0       0      130     136
3    12-05-2016  CC   112     115    135     136    0       0
NOTE: the values of uid are not fixed, they may vary dynamically, so, a_orig, a_dis, b_orig, b_dis, etc cannot be hardcoded into SQL.
NOTE: around 300 rows are expected for each date because of the Cartesian product of gid and uid, and I will search by date using a LIKE clause, since the datatype of the dat column is varchar(50).
Note: I would prefer the datatype of origamt and disamt to be varchar(50) instead of Decimal(18, 0), but that is not compulsory.
I have tried to use PIVOT, taking reference from several articles posted here on Stack Overflow and on other websites, but couldn't get it working completely.
Here is what I tried; it gave almost the right results, but only with fixed uid values and only for origamt:
select *
from
(
select gid, uid, dat, origamt
from vamounts
) as src
pivot
(
sum(origamt)
for uid IN ( a, b )
) as piv;
Kindly help me with the least bulky possible solution for this problem. I will prefer least lines of code and least complexity.

Errr, no. You can't generate your desired table using SQL. This isn't a valid pivot table.
"the values of uid are not fixed, they may vary dynamically, so,
a_orig, a_dis, b_orig, b_dis, etc cannot be hardcoded into SQL."
Sorry, this is also not possible. You must specify the exact values to be placed as the column headers. Whenever you write a SELECT statement, you must specify the names of the columns (fields) which you'll be returning. There's no way around this.
However, below are the steps required to create a "valid" SQL Server pivot table from your data:
I've got to admit, when I recently had to write my first PIVOT in SQL Server, I also Googled like mad, but didn't understand how to write it.
However, I eventually worked out what you need to do, so here's the step-by-step guide that you won't find anywhere else..!
(Readers can easily adapt these instructions, to use with your own data !)
1. Create your sample data
If you expect readers to reply to your Question, you should at least give them the SQL to create your sample data, so they have something to work off.
So, here's how I would create the data shown in your question:
CREATE TABLE tblSomething
(
[gid] nvarchar(100),
[uid] nvarchar(100),
[dat] datetime,
[origamt] int,
[disamt] int
)
GO
INSERT INTO tblSomething VALUES ('AA', 'a', '2016-05-12', 200, 210)
INSERT INTO tblSomething VALUES ('AA', 'b', '2016-05-12', 300, 305)
INSERT INTO tblSomething VALUES ('AA', 'c', '2016-05-12', 150, 116)
INSERT INTO tblSomething VALUES ('BB', 'a', '2016-05-12', 120, 125)
INSERT INTO tblSomething VALUES ('BB', 'c', '2016-05-12', 130, 136)
INSERT INTO tblSomething VALUES ('CC', 'a', '2016-05-12', 112, 115)
INSERT INTO tblSomething VALUES ('CC', 'b', '2016-05-12', 135, 136)
GO
2. Write a SQL Query which returns exactly three columns
The first column will contain the values which will appear in your PIVOT table's left-hand column.
The second column will contain the list of values which will appear on the top row.
The values in the third column will be positioned within your PIVOT table, based on the row/column headers.
Okay, here's the SQL to do this:
SELECT [gid], [uid], [origamt]
FROM tblSomething
This is the key to using a PIVOT. Your database structure can be as horribly complicated as you like, but when using a PIVOT, you can only work with exactly three values. No more, no less.
So, here's what that SQL will return. Our aim is to create a PIVOT table containing (just) these values:
3. Find a list of distinct values for the header row
Notice how, in the pivot table I'm aiming to create, I have three columns (fields) called a, b and c. These are the three unique values in your [uid] column.
So, to get a comma-concatenated list of these unique values, I can use this SQL:
DECLARE @LongString nvarchar(4000)
SELECT @LongString = COALESCE(@LongString + ', ', '') + '[' + [uid] + ']'
FROM [tblSomething]
GROUP BY [uid]
SELECT @LongString AS 'Subquery'
When I run this against your data, here's what I get: [a], [b], [c]
Now, cut'n'paste this value: we'll need to place it twice in our overall SQL SELECT command to create the pivot table.
4. Put it all together
This is the tricky bit.
You need to combine your SQL command from Step 2 and the result from Step 3, into a single SELECT command.
Here's what your SQL would look like:
SELECT [gid],
-- Here's the "Subquery" from part 3
[a], [b], [c]
FROM (
-- Here's the original SQL "SELECT" statement from part 2
SELECT [gid], [uid], [origamt]
FROM tblSomething
) tmp ([gid], [uid], [origamt])
pivot (
MAX([origamt]) for [uid] in (
-- Here's the "Subquery" from part 3 again
[a], [b], [c]
)
) p
... and here's a confusing image, which shows where the components come from, and the results of running this command.
As you can see, the key to this is that SELECT statement in Step 2, and putting your three chosen fields in the correct place in this command.
And, as I said earlier, the columns (fields) in your pivot table come from the values obtained in step 3:
[a], [b], [c]
You could, of course, use a subset of these values. Perhaps you just want to see the PIVOT values for [a], [b] and ignore [c].
Phew !
So, that's how to create a pivot table out of your data.
"I will prefer least lines of code and least complexity."
Yeah, good luck on that one..!!!
5. Merging two pivot tables
If you really wanted to, you could merge the contents of two such PIVOT tables to get the exact results you're looking for.
This is easy enough SQL for Shobhit to write himself.

You need dynamic SQL for this.
First, create a table with your data:
CREATE TABLE #temp (
rno int,
gid nvarchar(10),
[uid] nvarchar(10),
dat date,
origamt int,
disamt int
)
INSERT INTO #temp VALUES
(1, 'AA', 'a', '12-05-2016', 200, 210),
(2, 'AA', 'b', '12-05-2016', 300, 305),
(3, 'AA', 'c', '12-05-2016', 150, 116),
(4, 'BB', 'a', '12-05-2016', 120, 125),
(5, 'BB', 'c', '12-05-2016', 130, 136),
(6, 'CC', 'a', '12-05-2016', 112, 115),
(7, 'CC', 'b', '12-05-2016', 135, 136)
And then declare variables with columns:
DECLARE @columns nvarchar(max), @sql nvarchar(max), @columns1 nvarchar(max), @columnsN nvarchar(max)
--Here simple columns like [a],[b],[c] etc
SELECT @columns = STUFF((SELECT DISTINCT ','+QUOTENAME([uid]) FROM #temp FOR XML PATH('')),1,1,'')
--Here with ISNULL operation ISNULL([a],0) as [a],ISNULL([b],0) as [b],ISNULL([c],0) as [c]
SELECT @columnsN = STUFF((SELECT DISTINCT ',ISNULL('+QUOTENAME([uid])+',0) as '+QUOTENAME([uid]) FROM #temp FOR XML PATH('')),1,1,'')
--Here columns for final table orig.a as a_orig, dis.a as a_dis,orig.b as b_orig, dis.b as b_dis,orig.c as c_orig, dis.c as c_dis
SELECT @columns1 = STUFF((SELECT DISTINCT ',orig.'+[uid] + ' as ' +[uid]+ '_orig, dis.'+[uid] + ' as ' +[uid]+ '_dis' FROM #temp FOR XML PATH('')),1,1,'')
And main query:
SELECT @sql = '
SELECT orig.gid,
       orig.dat,
       '+@columns1+'
FROM (
    SELECT gid, dat, '+@columnsN+'
    FROM (
        SELECT gid, [uid], LEFT(dat,10) as dat, origamt
        FROM #temp
    ) as p
    PIVOT (
        SUM(origamt) FOR [uid] in ('+@columns+')
    ) as pvt
) as orig
LEFT JOIN (
    SELECT gid, dat, '+@columnsN+'
    FROM (
        SELECT gid, [uid], LEFT(dat,10) as dat, disamt
        FROM #temp
    ) as p
    PIVOT (
        SUM(disamt) FOR [uid] in ('+@columns+')
    ) as pvt
) as dis
    ON dis.gid = orig.gid and dis.dat = orig.dat'
EXEC(@sql)
Output:
gid dat a_orig a_dis b_orig b_dis c_orig c_dis
AA 2016-12-05 200 210 300 305 150 116
BB 2016-12-05 120 125 0 0 130 136
CC 2016-12-05 112 115 135 136 0 0
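The pattern used above (discover the uid values at run time, build the column list as text, then execute the generated statement) can be sketched in a self-contained form. This sketch rests on several assumptions: it uses Python's built-in sqlite3 module instead of SQL Server, and because SQLite has no PIVOT operator it swaps in conditional aggregation (SUM over a CASE expression). The table name amounts and the helper quote_ident are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE amounts (gid TEXT, uid TEXT, dat TEXT, origamt INT, disamt INT)"
)
conn.executemany(
    "INSERT INTO amounts VALUES (?, ?, ?, ?, ?)",
    [("AA", "a", "12-05-2016", 200, 210), ("AA", "b", "12-05-2016", 300, 305),
     ("AA", "c", "12-05-2016", 150, 116), ("BB", "a", "12-05-2016", 120, 125),
     ("BB", "c", "12-05-2016", 130, 136), ("CC", "a", "12-05-2016", 112, 115),
     ("CC", "b", "12-05-2016", 135, 136)],
)


def quote_ident(name):
    """Quote a value for use as a column alias (same role as QUOTENAME)."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: discover the uid values at run time.
uids = [r[0] for r in conn.execute("SELECT DISTINCT uid FROM amounts ORDER BY uid")]

# Step 2: build one pair of output columns per uid.
# The uid being compared is passed as a parameter; only the alias is spliced in.
select_parts, params = [], []
for uid in uids:
    for source, suffix in (("origamt", "_orig"), ("disamt", "_dis")):
        alias = quote_ident(uid + suffix)
        select_parts.append(
            f"SUM(CASE WHEN uid = ? THEN {source} ELSE 0 END) AS {alias}"
        )
        params.append(uid)

sql = (
    "SELECT gid, dat, " + ", ".join(select_parts)
    + " FROM amounts GROUP BY gid, dat ORDER BY gid, dat"
)

# Step 3: run the generated statement.
cursor = conn.execute(sql, params)
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns)
for row in rows:
    print(row)
```

Because both amounts are aggregated in a single pass, this variant needs no join between two pivots; whether that is preferable to the two-PIVOT form above depends on the engine and the data.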

A join might help
declare @t table (rno int, gid varchar(2), uid varchar(1), dat varchar(10), origamt int, disamt int)
insert into @t
values
(1, 'AA', 'a', '12-05-2016', 200, 210),
(2, 'AA', 'b', '12-05-2016', 300, 305),
(3, 'AA', 'c', '12-05-2016', 150, 116),
(4, 'BB', 'a', '12-05-2016', 120, 125),
(5, 'BB', 'c', '12-05-2016', 130, 136),
(6, 'CC', 'a', '12-05-2016', 112, 115),
(7, 'CC', 'b', '12-05-2016', 135, 136)
select -- piv.*,piv2.*
    piv.gid, piv.dat
    ,piv.a as a_org
    ,piv2.a as a_dis
    ,piv.b as b_org
    ,piv2.b as b_dis
    ,piv.c as c_org
    ,piv2.c as c_dis
from
(
    select gid, uid, dat, origamt
    from @t
) as src
pivot
(
    sum(origamt)
    for uid IN ([a],[b],[c])
) as piv
join
(
    select piv2.*
    from
    (
        select gid, uid, dat, disamt
        from @t
    ) as src
    pivot
    (
        sum(disamt)
        for uid IN ([a],[b],[c])
    ) as piv2
) piv2
    on piv2.gid = piv.gid and piv2.dat = piv.dat
This is a proof of concept; you would have to use dynamic SQL to deal with the variable number of uids. Let me know if you don't know how to use dynamic SQL and I'll work up an example for you.

Find the average of a given result set

My table looks like the one below.
I take the average over the whole table and get 14, which is fine.
declare @Table table (Student Varchar(10), Score int)
insert into @Table
select 'A',10
union all
select 'B',20
union all
select 'A',10
union all
select 'C',20
union all
select 'B',10
select avg(cast(Score as float)) AvgScore from @Table
AvgScore
--------
14
select Student, avg(cast(Score as float)) AvgScore from @Table group by Grouping sets(Student,())
Student AvgScore
------------------
A 10
B 15
C 20
NULL 14
If I average the per-student values, (10+15+20)/3, I do not get 14.
How can I overcome this?
Am I doing the mathematics incorrectly?
Can anyone give me a brief explanation?
Thanks in advance.

The total average is taken over all the data, so:
(10 + 20 + 10 + 20 + 10) / 5 = 70 / 5 = 14
Everything is OK. You are trying to calculate an average of averages, (10+15+20)/3 = 15, which does not equal the overall average when the groups have different sizes.
Look at this example:
A - 1
A - 1
A - 1
A - 1
B - 20
The average is (1+1+1+1+20) / 5 = 4.8 and NOT (1+20)/2 = 10.5

The problem is that you throw away information between the two steps of the calculation. Your original calculation is a simple average over all five scores.
After the reduction you are left with only the per-student averages (A = 10, B = 15, C = 20).
These averages do not carry equal weight: two values went into A, two into B, but only one into C. That information is needed to calculate the overall average, and it is lost. To get the proper average you also have to store the weight of each average, meaning the number of source values behind it. That would be:
Student  Value  Weight
A        10     2
B        15     2
C        20     1
A weight is simply the count of values for each student. You can extract that easily in one query.
Your final average calculation should then look like this:
(10*2 + 15*2 + 20*1) / (2 + 2 + 1) = 70 / 5 = 14
Selecting the values you need should look something like this:
SELECT Student, AVG(CAST(Score as float)) AvgScore, COUNT(*) Weight
FROM @Table
GROUP BY Grouping sets(Student,())
The rest should be clear: multiply each average by its weight, add up the products, and divide by the sum of the weights.
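The remaining arithmetic can be sketched as follows. As an assumption, the sketch uses Python's built-in sqlite3 module rather than SQL Server, with a plain GROUP BY instead of GROUPING SETS; the table name scores is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (Student TEXT, Score INT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 10), ("B", 20), ("A", 10), ("C", 20), ("B", 10)],
)

# One row per student: (student, average score, number of scores).
per_student = conn.execute(
    "SELECT Student, AVG(Score), COUNT(*) FROM scores "
    "GROUP BY Student ORDER BY Student"
).fetchall()

# Unweighted mean of the three averages: treats every student equally.
plain_mean_of_means = sum(avg for _, avg, _ in per_student) / len(per_student)

# Weighted mean: each average counts once per underlying score.
weighted_mean = (
    sum(avg * weight for _, avg, weight in per_student)
    / sum(weight for _, _, weight in per_student)
)
print(per_student)
print(plain_mean_of_means, weighted_mean)
```

The unweighted figure comes out at 15, while the weighted figure matches the overall average of 14.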

Pivot Tables - using more than one column

With Pivot Tables is it possible to base your columns on two values (in my example 'code' and 'val')? By the looks of it you can't, but that seems a bit of a limitation to me, so perhaps I've just misunderstood something. For example if my data table looks like this:
code val total
---- --- -----
SI 12 90
SI 12 30
SI 24 240
CI 12 210
and the output I desire is this:
SI12 SI24 CI12
---- ---- ----
120 240 210
I'd appreciate it if someone could show me how this could be achieved whether the solution is to use Pivot Tables or something else?

Try this,
DECLARE @mytable TABLE
(
    code VARCHAR(2),
    val INT,
    total INT
)
INSERT INTO @mytable
VALUES ('SI', 12, 90),
       ('SI', 12, 30),
       ('SI', 24, 240),
       ('CI', 12, 210)
SELECT *
FROM @mytable
SELECT *
FROM (SELECT code + CONVERT(VARCHAR(2), VAL) VAL,
             total
      FROM @mytable) T
PIVOT(sum(total)
      FOR VAL IN ([SI12],
                  [SI24],
                  [CI12])) AS PIVOTTABLE
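For the "something else" the question allows for, here is a hedged sketch of conditional aggregation, which tests both columns directly instead of concatenating them. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server; the SUM over CASE expressions is standard SQL, and the table name mytable is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (code TEXT, val INT, total INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("SI", 12, 90), ("SI", 12, 30), ("SI", 24, 240), ("CI", 12, 210)],
)

# Each output column sums only the rows matching its (code, val) pair;
# non-matching rows yield NULL from the CASE and are ignored by SUM.
row = conn.execute(
    """
    SELECT SUM(CASE WHEN code = 'SI' AND val = 12 THEN total END) AS SI12,
           SUM(CASE WHEN code = 'SI' AND val = 24 THEN total END) AS SI24,
           SUM(CASE WHEN code = 'CI' AND val = 12 THEN total END) AS CI12
    FROM mytable
    """
).fetchone()
print(row)
```

As with PIVOT, the output columns still have to be listed explicitly.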

How to generate permutations in Oracle?

In Oracle, I have a table of object types.
I would like to generate all the permutations on ITEM_PURPOSE_CODE.
The table looks something like this:
ITEM_PURPOSE_CODE ITEM_CATEGORY_ID ITEM_ID
==========================================
1 101 50
2 202 94
2 202 95
What I would like then, is to generate a bunch of table types representing the permutations, for example:
ITEM_PURPOSE_CODE ITEM_CATEGORY_ID ITEM_ID
==========================================
1 101 50
2 202 94
and
ITEM_PURPOSE_CODE ITEM_CATEGORY_ID ITEM_ID
==========================================
1 101 50
2 202 95
Obviously this is a very simple case. There could be any number of item purpose codes (1 to n) and these codes could be repeated any number of times for differing item category IDs/item IDs.
Thanks for any advice.

Please find the solution to generating combinations here. It was a nice variant on a previous problem we've had in our software for real estate development.
Create and fill datamodel
First set up:
create table contents
( item_purpose_code number
, item_category_id number
, item_id number
)
/
begin
insert into contents values (1, 101, 50);
insert into contents values (2, 202, 94);
insert into contents values (2, 202, 95);
commit;
end;
/
Assisting views
First I create some views. Of course, you can also inline them or use a WITH clause.
--
-- Add to each row the consecutive number of the driver columns
-- (here only item_purpose_code) and for each different value
-- for the driver columns a consecutive number that restarts
-- when a new driver column value starts.
--
create or replace force view sequencedrows
as
select item_purpose_code
, item_category_id
, item_id
, dense_rank()
over
( order
by item_purpose_code
) driver_seq
, row_number()
over
( partition
by item_purpose_code
order
by item_category_id
, item_id
)
values_per_driver_seq
from contents
/
--
-- Generate list of combinations.
--
create or replace force view combinations
as
select sys_connect_by_path (driver_seq || '-' || values_per_driver_seq, '#') || '#' combination
from sequencedrows
where level = ( select max(driver_seq) from sequencedrows )
start
with driver_seq = 1
connect
by
nocycle driver_seq = prior driver_seq + 1
/
With these, it becomes really simple since the combination is already contained in the field combination and the rows have been numbered:
select c.combination
, s.item_purpose_code
, s.item_category_id
, s.item_id
from combinations c
join sequencedrows s
on c.combination like '%#' || to_char(s.driver_seq) || '-' || to_char(s.values_per_driver_seq) || '#%'
order
by c.combination
, s.driver_seq
, s.values_per_driver_seq
/
The results are:
#1-1#2-1# 1 101 50
#1-1#2-1# 2 202 94
#1-1#2-2# 1 101 50
#1-1#2-2# 2 202 95
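The underlying idea (pick exactly one row per item_purpose_code, in every possible way) can also be sketched outside the database. This is an illustration in Python using itertools from the standard library, not part of the Oracle solution; the tuples mirror the three sample rows.

```python
from itertools import groupby, product

# (item_purpose_code, item_category_id, item_id)
rows = [(1, 101, 50), (2, 202, 94), (2, 202, 95)]

# Group the rows by purpose code; groupby needs its input sorted by the key.
rows_sorted = sorted(rows)
groups = [list(group) for _, group in groupby(rows_sorted, key=lambda r: r[0])]

# The Cartesian product of the groups picks one row from each purpose code.
combinations = [list(combo) for combo in product(*groups)]
for combo in combinations:
    print(combo)
```

With the sample data this yields two combinations, matching the two result sets above.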
Performance
Depending on the data volume and indexes, the performance can be insufficient for interactive use. In our real estate development package we've however found that even with 50K rows generated performance is acceptable since Oracle 11g. Oracle 10g did a less optimal job on optimization.
When performance is unacceptable at your site, please list some key statistics or add a reproduction scenario.
