T-SQL grouping values on multiple columns and return row values as columns - sql-server

I have a dataset that returns a number of rows. Two columns, RoomType and FaultTypeName, should be grouped, but each combination also has a 'Result' column, and because of that column the grouping fails. To make it clearer, the result set looks as follows:
FaultTypeName always has one of the same three values: 'Methode (M)', 'Periodiek (P)' or 'Vuil (V)'. These values should be returned as new columns, each holding its corresponding Result value. So the result set above should be returned as follows:
I already tried to do something with the row number (hence the rn column), but this didn't quite work out:
select
...
from(
select MeasurementId, RoomType, FaultTypeName, Result,
row_number() over(partition by RoomType order by RoomType, FaultTypeName) rn
from vwReportData
where measurementid = 1382596
)sub
It is possible that only 2 (or fewer) of the 3 FaultTypeName values (Methode, Periodiek and Vuil) are returned, so there are fewer rows. In that case, the missing FaultTypeName(s) should still be added as columns, but with a result of 0.
Any ideas how I can get the right output?

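Try pivoting on FaultTypeName rather than on the row number, so that a missing fault type cannot shift a value into the wrong column:
select MeasurementId, RoomType,
isnull([Methode (M)], 0) as M,
isnull([Periodiek (P)], 0) as P,
isnull([Vuil (V)], 0) as V
from
(
select MeasurementId, RoomType, FaultTypeName, Result
from vwReportData
where measurementid = 1382596
) DS
PIVOT
(
MAX(Result) for FaultTypeName in ([Methode (M)], [Periodiek (P)], [Vuil (V)])
) PVT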

In the end I figured it out myself:
SELECT
MeasurementId,
RoomType,
M = ISNULL(MIN(CASE WHEN FaultTypeName = 'Methode (M)' THEN Result ELSE NULL END), 0),
P = ISNULL(MIN(CASE WHEN FaultTypeName = 'Periodiek (P)' THEN Result ELSE NULL END), 0),
V = ISNULL(MIN(CASE WHEN FaultTypeName = 'Vuil (V)' THEN Result ELSE NULL END), 0)
FROM vwReportData
WHERE MeasurementId = 1382596
GROUP BY MeasurementId, RoomType
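The conditional-aggregation pattern above can be tried without a SQL Server instance. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in engine. The table name ReportData, the room names and the result values are made up for illustration, and COALESCE is used in place of the T-SQL ISNULL function.

```python
import sqlite3

def pivot_faults(rows):
    """Pivot (MeasurementId, RoomType, FaultTypeName, Result) rows into M/P/V columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ReportData "
        "(MeasurementId INT, RoomType TEXT, FaultTypeName TEXT, Result INT)"
    )
    conn.executemany("INSERT INTO ReportData VALUES (?, ?, ?, ?)", rows)
    # One CASE per fault type; a fault type with no row yields NULL, which becomes 0.
    sql = """
        SELECT MeasurementId, RoomType,
               COALESCE(MIN(CASE WHEN FaultTypeName = 'Methode (M)'   THEN Result END), 0) AS M,
               COALESCE(MIN(CASE WHEN FaultTypeName = 'Periodiek (P)' THEN Result END), 0) AS P,
               COALESCE(MIN(CASE WHEN FaultTypeName = 'Vuil (V)'      THEN Result END), 0) AS V
        FROM ReportData
        GROUP BY MeasurementId, RoomType
        ORDER BY RoomType
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

# Hypothetical sample: the second room only has a 'Vuil (V)' row.
sample = [
    (1382596, "Kitchen", "Methode (M)", 3),
    (1382596, "Kitchen", "Periodiek (P)", 1),
    (1382596, "Kitchen", "Vuil (V)", 2),
    (1382596, "Office", "Vuil (V)", 5),
]
pivoted = pivot_faults(sample)
print(pivoted)
```

The room with a single fault type still comes back as one row, with 0 in the two columns that had no source row.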

Related

update using over order by row_number()

I found some answers to ways to update using over order by, but not anything that solved my issue. In SQL Server 2014, I have a column of DATES (with inconsistent intervals down to the millisecond) and a column of PRICE, and I would like to update the column of OFFSETPRICE with the value of PRICE from 50 rows hence (ordered by DATES). The solutions I found have the over order by in either the query or the subquery, but I think I need it in both. Or maybe I'm making it more complicated than it is.
In this simplified example, if the offset was 3 rows hence then I need to turn this:
DATES, PRICE, OFFSETPRICE
2018-01-01, 5.01, null
2018-01-03, 8.52, null
2018-02-15, 3.17, null
2018-02-24, 4.67, null
2018-03-18, 2.54, null
2018-04-09, 7.37, null
into this:
DATES, PRICE, OFFSETPRICE
2018-01-01, 5.01, 3.17
2018-01-03, 8.52, 4.67
2018-02-15, 3.17, 2.54
2018-02-24, 4.67, 7.37
2018-03-18, 2.54, null
2018-04-09, 7.37, null
This post was helpful, and so far I have this code which works as far as it goes:
select dates, price, row_number() over (order by dates asc) as row_num
from pricetable;
I haven't yet figured out how to point the update value to the future ordered row. Thanks in advance for any assistance.
LEAD is a useful window function for getting values from subsequent rows (as is LAG, which looks at preceding rows). Here's a direct answer to your question, using the 2-row offset that matches your sample output (change the 2 to 50 for your real data):
;WITH cte AS (
SELECT dates, LEAD(price, 2) OVER (ORDER BY dates) AS offsetprice
FROM pricetable
)
UPDATE pricetable SET offsetprice = cte.offsetprice
FROM pricetable
INNER JOIN cte ON pricetable.dates = cte.dates
Since you asked about ROW_NUMBER, the following does the same thing:
;WITH cte AS (
SELECT dates, price, ROW_NUMBER() OVER (ORDER BY dates ASC) AS row_num
FROM pricetable
),
cte2 AS (
SELECT dates, price, (SELECT price FROM cte AS sq_cte WHERE row_num = cte.row_num + 2) AS offsetprice
FROM cte
)
UPDATE pricetable SET offsetprice = cte2.offsetprice
FROM pricetable
INNER JOIN cte2 ON pricetable.dates = cte2.dates
So, you could use ROW_NUMBER to sort the rows and then use that result to select a value 2 rows ahead. LEAD just does that very thing directly.
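To check the LEAD approach against the sample data from the question, here is a small sketch that assumes Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) instead of SQL Server. SQLite's UPDATE syntax differs from T-SQL, so the join is written as a correlated subquery; treat this as an illustration of the idea rather than a drop-in statement.

```python
import sqlite3

OFFSET = 2  # rows ahead, matching the sample output; the real data would use 50

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pricetable (dates TEXT PRIMARY KEY, price REAL, offsetprice REAL)"
)
conn.executemany(
    "INSERT INTO pricetable (dates, price) VALUES (?, ?)",
    [
        ("2018-01-01", 5.01),
        ("2018-01-03", 8.52),
        ("2018-02-15", 3.17),
        ("2018-02-24", 4.67),
        ("2018-03-18", 2.54),
        ("2018-04-09", 7.37),
    ],
)

# LEAD looks OFFSET rows ahead in date order; rows near the end get NULL.
conn.execute(f"""
    WITH cte AS (
        SELECT dates, LEAD(price, {OFFSET}) OVER (ORDER BY dates) AS offsetprice
        FROM pricetable
    )
    UPDATE pricetable
    SET offsetprice = (SELECT cte.offsetprice FROM cte WHERE cte.dates = pricetable.dates)
""")

updated = conn.execute(
    "SELECT dates, price, offsetprice FROM pricetable ORDER BY dates"
).fetchall()
for row in updated:
    print(row)
```

The last OFFSET rows keep a NULL offsetprice because there is no row that far ahead, which matches the desired output in the question.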

SQL Server : how to divide database records into even, random groups

tblNames
OrganizationID (int)
LastName (varchar)
...
GroupNumber (int)
GroupNumber is currently NULL for all records, I need an UPDATE statement to update this column.
I need to split up records on an OrganizationID level into even, random groups.
If there are < 20,000 records for an OrganizationID, I need 2 even, random groups. So records for that OrganizationID will have a GroupNumber of 1 or 2. There will be the same (or if odd number of records difference of only 1) number of records for GroupNumber = 1 and for GroupNumber = 2, and there will be no recognizable way to tell how a person got into a GroupNumber - i.e. LastNames that start with A-L are group 1, M-Z are group 2 would not be OK.
If there are > 20,000 records for an OrganizationID, I need 4 even, random groups. So records for that OrganizationID will have a GroupNumber values of 1, 2, 3, or 4. There will be the same (or if odd number of records difference of only 1) number of records for each GroupNumber, and there will be no recognizable way to tell how a person got into a GroupNumber - i.e. LastNames that start with A-F are group 1, G-L are group 2, etc. would not be OK.
There are only about 20 organizations, so I can run an update statement 20 times, once per organizationID if needed.
I have full control of the table so I can add keys or columns, but for now this is what it is.
Would appreciate any help.
Create row numbers randomly (with ROW_NUMBER and NEWID). Then take them modulo 2 or 4, depending on the record count, to get buckets 0 to 1 or 0 to 3.
select
organizationid, lastname, ...,
case when cnt <= 20000 then rn % 2 else rn % 4 end as bucket
from
(
select
organizationid, lastname, ...,
row_number() over(partition by organizationid order by newid()) as rn,
count(*) over (partition by organizationid) as cnt
from mytable
) randomized;
UPDATE: I suppose the update statement would have to look something like this:
with randomized as
(
select
groupnumber,
row_number() over(partition by organizationid order by newid()) as rn,
count(*) over (partition by organizationid) as cnt
from mytable
)
update randomized
set groupnumber = case when cnt <= 20000 then rn % 2 else rn % 4 end + 1;
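As a rough check of the modulo idea on a per-organization basis, here is a sketch that assumes Python's built-in sqlite3 module rather than SQL Server. The id key column, the generated names and the lowered threshold of 10 are assumptions for the demo; random() plays the role of NEWID(), and the assignments are read once and then written back so the random order is only evaluated a single time.

```python
import sqlite3

def assign_groups(conn, threshold=20000):
    """Give each row a random, evenly sized GroupNumber within its organization."""
    assignments = conn.execute(
        """
        SELECT id,
               CASE WHEN cnt <= ? THEN rn % 2 ELSE rn % 4 END + 1 AS grp
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY OrganizationID ORDER BY random()) AS rn,
                   COUNT(*) OVER (PARTITION BY OrganizationID) AS cnt
            FROM tblNames
        ) AS randomized
        """,
        (threshold,),
    ).fetchall()
    # Write the materialized assignments back by key.
    conn.executemany(
        "UPDATE tblNames SET GroupNumber = ? WHERE id = ?",
        [(grp, row_id) for row_id, grp in assignments],
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblNames "
    "(id INTEGER PRIMARY KEY, OrganizationID INT, LastName TEXT, GroupNumber INT)"
)
# Hypothetical data: organization 1 has 7 rows, organization 2 has 13 rows.
rows = [(1, f"Name{i}") for i in range(7)] + [(2, f"Name{i}") for i in range(13)]
conn.executemany("INSERT INTO tblNames (OrganizationID, LastName) VALUES (?, ?)", rows)

assign_groups(conn, threshold=10)  # small threshold so both branches are exercised

group_sizes = {}
for org, grp, n in conn.execute(
    "SELECT OrganizationID, GroupNumber, COUNT(*) FROM tblNames "
    "GROUP BY OrganizationID, GroupNumber"
):
    group_sizes.setdefault(org, {})[grp] = n
print(group_sizes)
```

Which rows land in which group changes on every run, but the group sizes within an organization never differ by more than one.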
Another, slightly different approach. First, set up some fake data:
if object_id('tempdb.dbo.#Orgs') is not null drop table #Orgs
create table #Orgs
(
RID int identity(1,1) primary key clustered,
OrganizationId int,
LastName varchar(36),
GroupId int
)
insert into #Orgs (OrganizationId, LastName)
select top 40000 row_number() over (order by (select null)) % 20000, newid()
from sys.all_objects a, sys.all_objects b
Then use the rarely useful ntile() function to get groups that are as close to identical in size as possible. Sorting by newid() essentially sorts the data randomly (or as randomly as one generated GUID follows another).
declare @NumRandomGroups int = 4
update o
set GroupId = x.GroupId
from #orgs o
inner join (select RID, GroupId = ntile(@NumRandomGroups) over (order by newid())
from #orgs) x
on o.RID = x.RID
select GroupId, count(1)
from #Orgs
group by GroupId
select *
from #Orgs
order by RID
You can then set @NumRandomGroups to whatever you want it to be, based on the record count for the organization.

Identify sub-set of records based on date and rules in SQL Server

I have a dataset that looks like this:
I need to identify the rows that have Linked set to 1 but ONLY where they are together when sorted by ToDate descending as in the picture.
In other words I want to be able to identify these records (EDITED):
This is a simplified dataset, in fact there will be many more records...
The logic that defines whether a record is linked is whether its FromDate is within 8 weeks of the ToDate of the preceding record. This is test data, though, so it may not be perfect.
What's the best way to do that please?
You can use LAG() and LEAD() analytic functions:
SELECT * FROM (
SELECT t.*,
LAG(t.linked,1,0) OVER(ORDER BY t.FromDate DESC) as rnk_1, -- previous row in this ordering (later FromDate)
LEAD(t.linked,1,0) OVER(ORDER BY t.FromDate DESC) as rnk_2, -- next row in this ordering (earlier FromDate)
LEAD(t.linked,2,0) OVER(ORDER BY t.FromDate DESC) as rnk_3 -- two rows further on in this ordering
FROM YourTable t) s
WHERE ((s.rnk_1 = 1 OR s.rnk_2 = 1) AND s.linked = 1) OR
(s.rnk_2 = 1 and s.rnk_3 = 1 and s.linked = 0)
ORDER BY s.FromDate DESC
This will result in records that have linked = 1 and the previous/next record is also 1.
Using the LAG and LEAD functions, you can examine the previous and next row values for a given sort order.
You can get the required dataset with the following query:
;WITH CTE_LagLead
AS (
SELECT FromDate,
ToDate,
NoOfDays,
Weeks,
Linked,
LAG(Linked, 1, 0) OVER (ORDER BY ToDate DESC) LinkedLag,
LEAD(Linked, 1, 0) OVER (ORDER BY ToDate DESC) LinkedLead
FROM #table
)
SELECT FromDate,
ToDate,
NoOfDays,
Weeks,
Linked
FROM CTE_LagLead
WHERE Linked = 1 AND
(LinkedLag = 1 OR
LinkedLead = 1)
ORDER BY ToDate DESC;
See working example
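For readers without the original dataset, the sketch below replays the LAG/LEAD filter on invented rows, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) in place of SQL Server. The absences table and its eight rows are hypothetical and only keep the two columns the filter needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absences (ToDate TEXT, Linked INT)")
# Hypothetical rows, newest first: a run of three linked rows, one isolated
# linked row, and a run of two linked rows.
conn.executemany(
    "INSERT INTO absences VALUES (?, ?)",
    [
        ("2018-08-01", 1),
        ("2018-07-01", 1),
        ("2018-06-01", 1),
        ("2018-05-01", 0),
        ("2018-04-01", 1),
        ("2018-03-01", 0),
        ("2018-02-01", 1),
        ("2018-01-01", 1),
    ],
)

# Keep a linked row only if the row before or after it (by ToDate DESC) is linked too.
linked_runs = [
    row[0]
    for row in conn.execute("""
        WITH lag_lead AS (
            SELECT ToDate, Linked,
                   LAG(Linked, 1, 0)  OVER (ORDER BY ToDate DESC) AS LinkedLag,
                   LEAD(Linked, 1, 0) OVER (ORDER BY ToDate DESC) AS LinkedLead
            FROM absences
        )
        SELECT ToDate
        FROM lag_lead
        WHERE Linked = 1 AND (LinkedLag = 1 OR LinkedLead = 1)
        ORDER BY ToDate DESC
    """)
]
print(linked_runs)
```

Note that this filter returns every run of two or more adjacent linked rows and drops the isolated one. If only the most recent run is wanted, an extra condition would be needed.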
Here is the answer I came up with:
Select
*
from
#tmpAbsences
where
idcol between 1 AND (
Select TOP 1 idcol from #tmpAbsences where Linked=0 order by idcol)
This includes row 7 in the picture below:

unique chat records sql

I have a database table with 5 columns, as follows:
message_id
user_id_send
user_id_rec
message_date
message_details
I am looking for a SQL Server query that filters results on two columns (user_id_send, user_id_rec) for a given user ID, based on the following constraints:
Get the latest record (by date or message_id)
Only unique records (1 - 2 and 2 - 1 are the same, so only one record is returned, whichever is the latest)
Ordered descending by message_id
SQL Query
The main purpose of this query is to find out, for a given user_id, to whom that user has sent messages and from whom that user has received messages.
I have also attached the sheet for your reference.
Here is my try
WITH t
AS (SELECT *
FROM messages
WHERE user_id_sender = 1)
SELECT DISTINCT user_id_reciever,
*
FROM t;
WITH h
AS (SELECT *
FROM messages
WHERE user_id_reciever = 1)
SELECT DISTINCT user_id_sender,
*
FROM h;
;WITH tmpMsg AS (
SELECT M2.message_id
,M2.user_id_receiver
,M2.user_id_sender
,M2.message_date
,M2.message_details
,ROW_NUMBER() OVER (PARTITION BY user_id_receiver+user_id_sender ORDER BY message_date DESC) AS 'RowNum'
FROM messages M2
WHERE M2.user_id_receiver = 1
OR M2.user_id_sender = 1
)
SELECT T.message_id
,T.user_id_receiver
,T.user_id_sender
,T.message_date
,T.message_details
FROM tmpMsg T
WHERE RowNum <= 1
The above should fetch the results you are looking for when you query for a particular user_id (replace the 1 with a parameter, e.g. @p_user_id). The user_id_receiver+user_id_sender expression in the PARTITION BY clause ensures that records with user ID combinations such as 1 - 2 and 2 - 1 are not selected twice.
Hope this helps.
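A variation on the same ROW_NUMBER idea is to partition by "the other user in the conversation" instead of by the sum of the two IDs. The sketch below is an assumption-laden illustration, not the answer's exact method: it uses Python's built-in sqlite3 module instead of SQL Server, the sample messages are invented, and the column names follow the answer's spelling (user_id_sender, user_id_receiver).

```python
import sqlite3

def latest_per_conversation(conn, user_id):
    """Return the newest message_id for each user that user_id has talked to."""
    sql = """
        WITH ranked AS (
            SELECT message_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY CASE WHEN user_id_sender = :uid
                                         THEN user_id_receiver
                                         ELSE user_id_sender END
                       ORDER BY message_date DESC, message_id DESC
                   ) AS rn
            FROM messages
            WHERE user_id_sender = :uid OR user_id_receiver = :uid
        )
        SELECT message_id FROM ranked WHERE rn = 1 ORDER BY message_id DESC
    """
    return [row[0] for row in conn.execute(sql, {"uid": user_id})]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (message_id INT, user_id_sender INT, "
    "user_id_receiver INT, message_date TEXT, message_details TEXT)"
)
# Hypothetical messages; the last one does not involve user 1 at all.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 2, "2020-01-01", "a"),
        (2, 2, 1, "2020-01-02", "b"),
        (3, 1, 3, "2020-01-03", "c"),
        (4, 4, 1, "2020-01-04", "d"),
        (5, 3, 1, "2020-01-05", "e"),
        (6, 2, 3, "2020-01-06", "f"),
    ],
)
latest_for_user_1 = latest_per_conversation(conn, 1)
print(latest_for_user_1)
```

Because the partition key is the counterpart's ID, a 1 - 2 message and a 2 - 1 message fall into the same group, and only the newest of them survives the rn = 1 filter.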
select * from
(
select ROW_NUMBER() over (partition by user_id_sender order by message_date DESC) as rowno,
* from messages
where user_id_receiver = 1
--order by message_date DESC
) T where T.rowno = 1
UNION ALL
select * from
(
select ROW_NUMBER() over (partition by user_id_receiver order by message_date DESC) as rowno,
* from messages
where user_id_sender = 1
-- order by message_date DESC
) T where T.rowno = 1
Explanation: For each group of user_id_sender, it orders internally by message_date desc, and then adds row numbers, and we only want the first one (chronologically last). Then do the same for user_id_receiver, and union the results together to get 1 result set with all the desired rows. You can then add your own order by clause and additional where conditions at the end as required.
Of course, this only works for one user_id at a time (replace = 1 with @user_id).
Getting a result for all user_ids at once would be a totally different query, but I hope this helps.

SQL Calculate (time) gap between occurrences in a log

I have tables that record when certain items were sent or returned to a particular location, and I want to work out the intervals between each time a particular item is returned.
Sample data:
Item ReturnDate:
Item1, 20120101
Item1, 20120201
Item1, 20120301
Item2, 20120401
Item2, 20120601
So in this case, we can see that there was a month gap until Item 1 was returned the first time, and another month before it was returned the second time. Item 2 came back after 2 months.
My starting point is:
Select r1.Item, r1.ReturnDate, r2.Item, r2.ReturnDate, DateDiff(m, r1.ReturnDate, r2.ReturnDate)
from Returns r1
inner join Returns r2 on r2.Item = r1.Item
However, in this sample, each item is compared to every other instance where it has been returned, and not just the next one. So I need to limit this query so it will only compare adjacent returns.
One solution is to tag each return with a count (of the number of times that item has been returned):
Item ReturnDate, ReturnNo:
Item1, 20120101, 1
Item1, 20120201, 2
Item1, 20120301, 3
Item2, 20120401, 1
Item2, 20120601, 2
This would enable me to use the following T-SQL (or similar):
Select r1.Item, r1.ReturnDate, r2.Item, r2.ReturnDate, DateDiff(m, r1.ReturnDate, r2.ReturnDate)
from Returns r1
inner join Returns r2 on r2.Item = r1.Item
and (r1.ReturnNo + 1 = r2.ReturnNo)
My first question is whether this is a sensible/optimal approach or whether there is a better one.
Secondly, what is the easiest/slickest means of calculating the ReturnNo?
If you are using SQL Server 2005+, use ROW_NUMBER() to do exactly what you want:
WITH RankedReturn AS
(
SELECT Item, ReturnDate,
ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ReturnDate DESC) AS ReturnNo
FROM Returns
)
SELECT * FROM RankedReturn
Obviously, now that you have your CTE you can put whatever you need in the outer SELECT. I would use an OUTER APPLY for this:
WITH RankedReturn AS
(
SELECT Item, ReturnDate,
ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ReturnDate DESC) AS ReturnNo
FROM Returns
)
SELECT rOuter.Item, rOuter.ReturnDate, DATEDIFF(month, prev.PrevDate, ReturnDate) AS Months
FROM RankedReturn rOuter
OUTER APPLY
(
SELECT ReturnDate AS PrevDate
FROM RankedReturn rInner
WHERE rOuter.Item = rInner.Item AND rOuter.ReturnNo = rInner.ReturnNo - 1
) prev
Oops, and the SQL Fiddle is here.
Edited because the month difference calculation was backwards; fixed now
Easiest way of calculating the ReturnNo would be to use OVER:
SELECT [Item], [ReturnDate],
ROW_NUMBER() OVER (PARTITION BY [Item] ORDER BY [ReturnDate]) AS ReturnNumber
FROM Returns
http://sqlfiddle.com/#!3/e18ad/1/0
You could also attempt to make use of the techniques for calculating a running total to work out the difference between two rows.
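Putting the ReturnNo numbering together with the adjacent-row join from the question, the whole calculation can be sketched as follows. This assumes Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server, so the gap is computed in days with julianday() instead of DATEDIFF; the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Returns (Item TEXT, ReturnDate TEXT)")
conn.executemany(
    "INSERT INTO Returns VALUES (?, ?)",
    [
        ("Item1", "2012-01-01"),
        ("Item1", "2012-02-01"),
        ("Item1", "2012-03-01"),
        ("Item2", "2012-04-01"),
        ("Item2", "2012-06-01"),
    ],
)

# Number the returns per item, then join each return to the one numbered just before it.
gaps = conn.execute("""
    WITH RankedReturn AS (
        SELECT Item, ReturnDate,
               ROW_NUMBER() OVER (PARTITION BY Item ORDER BY ReturnDate) AS ReturnNo
        FROM Returns
    )
    SELECT cur.Item, prev.ReturnDate, cur.ReturnDate,
           CAST(julianday(cur.ReturnDate) - julianday(prev.ReturnDate) AS INTEGER) AS days
    FROM RankedReturn AS cur
    JOIN RankedReturn AS prev
      ON prev.Item = cur.Item AND prev.ReturnNo + 1 = cur.ReturnNo
    ORDER BY cur.Item, cur.ReturnDate
""").fetchall()
for row in gaps:
    print(row)
```

The inner join drops the first return of each item, since it has no predecessor. A LEFT JOIN would keep it with a NULL gap.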
This is how I would do it:
select itemNo,
dt,
DATEDIFF(day, previousDt, dt) as daysSince
from (select itemNo,
dt,
(select top 1 dt from testTable where itemNo = outerTbl.itemNo and dt < outerTbl.dt order by dt desc) as previousDt
from testTable as outerTbl
) as x
... and here's a bit of setup code for anybody else testing a solution to this
create table testTable(
itemNo nvarchar(20),
dt datetime)
go
insert into testTable values('Item1', '2012-01-01');
insert into testTable values('Item1', '2012-02-01');
insert into testTable values('Item1', '2012-03-01');
insert into testTable values('Item2', '2012-04-01');
insert into testTable values('Item2', '2012-05-01');
go
