I have data in a column called "Medication_Description". First I need to find everyone who received Tylenol,
then find out which of those patients also received a second drug (Ibuprofen here).
I also only want the top two results for each MRN (i.e. patient). I only want data for the past year.
Later I will be plugging this into an SSRS report where I will determine the percentage of patients that received both drugs.
I've played with a couple different ways of getting this to work but can't get it working quite right.
The data in this table looks like this:
As for desired results, I'd like to have something like this:
For MRN 654321, no Ibuprofen was administered so it returns NULL (it could also return another drug name - doesn't matter too much. I just need to be able to count the results later to determine a percentage).
For MRN 246824, only one dose of Ibuprofen was administered so the second line is NULL.
Below is my latest attempt but (as you can see) Med1 and Med2 will always reflect the same exact data - how can I make Med1 reflect one medication and Med2 reflect a second?
SELECT [MRN], [Med1], [Med2], [Row_Num], [Department_Name], [Date]
FROM
( SELECT [MRN], [Medication_Description] AS [Med1], [Medication_Description] AS [Med2],
ROW_NUMBER() OVER (PARTITION BY [MRN]
ORDER BY [Medication_Description] DESC
)
AS [Row_Num],
[Date],
[Department_Name]
FROM T_Med_Orders
WHERE [DATE] BETWEEN dateadd(year,-1,getdate()) AND getdate()
AND [Department_Name] LIKE 'ICU'
--First Med Must Match "Tylenol" but 2nd should match any result...???
AND Medication_Description LIKE '%Tylenol%'
)
tmp
WHERE
[Row_Num] <= 2
AND Med1 LIKE '%Tylenol%'
--AND Med2 LIKE '%Ibuprofen%'
AND [DATE] BETWEEN dateadd(year,-1,getdate()) AND getdate()
ORDER BY [MRN]
You have a solid start, but I think you're being too ambitious with your query. While there are ways to optimize the query, using Partitions may be overkill for the current requirements. What I ended up doing was to make two CTEs where each is filtered to the individual medicine being identified. I can LEFT JOIN those to the source table and filter it to only show results where the CTE values are NOT NULL. I can also apply the other DeptName and Date clauses, although I have not done so in my code snippet.
The main drawback is that this format would require more and more CTEs if you wanted to expand to include other medicines, swiftly reducing optimization further. But without knowing how MedicationDescription is formatted (or if it even has a standard format) I can't write that for you.
WITH Tylenol_CTE AS
(SELECT *, 'Tylenol' AS [FilteredMedicine]
FROM #Temp
WHERE Medication_Description LIKE '%Tylenol%')
,Ibuprofen_CTE AS
(SELECT *, 'Ibuprofen' AS [FilteredMedicine]
FROM #Temp
WHERE Medication_Description LIKE '%Ibuprofen%')
SELECT t.*
, Tylenol_CTE.[FilteredMedicine] AS Med1
, Ibuprofen_CTE.[FilteredMedicine] AS Med2
FROM #Temp t
LEFT JOIN Tylenol_CTE
ON t.MRN = Tylenol_CTE.MRN
AND t.Date = Tylenol_CTE.Date
LEFT JOIN Ibuprofen_CTE
ON t.MRN = Ibuprofen_CTE.MRN
AND t.Date = Ibuprofen_CTE.Date
WHERE Ibuprofen_CTE.Medication_Description IS NOT NULL
AND Tylenol_CTE.Medication_Description IS NOT NULL
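Since the end goal is a percentage, another way to look at it is one row per MRN with a flag for each drug. The following is only a minimal sketch of that idea, run through Python's sqlite3 module so it can be tried without a server; the sample rows are invented, the flag column names are my own, and the date and department filters are left out. The SELECT itself should carry over to SQL Server largely unchanged.

```python
import sqlite3

# Hypothetical sample rows: table and column names follow the question,
# the values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_Med_Orders (MRN INTEGER, Medication_Description TEXT)")
conn.executemany(
    "INSERT INTO T_Med_Orders VALUES (?, ?)",
    [
        (123456, "Tylenol 500 mg"),
        (123456, "Ibuprofen 200 mg"),
        (654321, "Tylenol 500 mg"),
        (246824, "Tylenol 500 mg"),
        (246824, "Ibuprofen 200 mg"),
        (246824, "Tylenol 500 mg"),
        (111111, "Ibuprofen 200 mg"),  # never received Tylenol, so excluded
    ],
)

# One row per patient with a 0/1 flag per drug; HAVING keeps Tylenol patients only.
per_patient = conn.execute(
    """
    SELECT MRN,
           MAX(CASE WHEN Medication_Description LIKE '%Tylenol%'   THEN 1 ELSE 0 END) AS got_tylenol,
           MAX(CASE WHEN Medication_Description LIKE '%Ibuprofen%' THEN 1 ELSE 0 END) AS got_ibuprofen
    FROM T_Med_Orders
    GROUP BY MRN
    HAVING MAX(CASE WHEN Medication_Description LIKE '%Tylenol%' THEN 1 ELSE 0 END) = 1
    ORDER BY MRN
    """
).fetchall()

# Share of Tylenol patients who also received Ibuprofen.
pct_both = 100.0 * sum(row[2] for row in per_patient) / len(per_patient)
print(per_patient)
print(round(pct_both, 1))
```

With one row per patient, the SSRS side only has to sum one column and count rows.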
Related
This is my first Stack Overflow question; I hope someone can help me out with this. I am completely lost and a newbie at SQL.
I have two tables (which I overly simplified for this question), the first one has the customer info and the car tire that they need. The second one is simply filled with a tire id, and all of the information for the tires. I am trying to input only the customer ID and return the one closest tire that matches the input along with the values of both the selected tire and the customer's tire. The matches also need to be prioritized in that order (size most important, width next most important, ratio is least important). Any suggestions on how to do this or where to start? Is there anything I can look at to help me solve this problem? I have been trying many different procedures, and some nested selects, but nothing is getting me close. Thank you.
customertable (custno, custsize, custwidth, custratio)
1,17,255,50
2,16,235,50
etc...
tirecollection (tireid, tiresize, tirewidth, tireratio)
1,15,225,40
2,16,225,50
3,17,250,55
4,17,235,30
5,18,255,40
etc...
This is not a 100% complete solution, but may work towards coming up with a solution. The approach here is combining the tyre dimensions into one value and then ranking them within a tyre size partition. You could then pass in the customer tyre dimensions to get the closest match.
with CTE
as
(
select *, TyreSize + TyreWidth as [TyreDimensions]
from tblTyres
)
select TC.CustId, C.TyreId, C.TyreSize, C.TyreWidth, C.[TyreDimensions],
rank() over(partition by C.TyreSize order by C.[TyreDimensions]) as [RNK]
from tblTyreCustomer as TC
join CTE as C
on TC.CustTyreSize = C.TyreSize
Assuming you're running SQL Server 2008 or later, this should work (this assumes you want to get a result for a single customer on a case-by-case basis):
CREATE FUNCTION udf.GetClosestTireMatch
(
@CustomerNo int
)
RETURNS TABLE
AS RETURN
SELECT custno, tireid, tiresize, tirewidth, tireratio
FROM (
-- Column aliases cannot be referenced inside OVER(), so the ABS() expressions are repeated there.
-- Size difference is the most important sort key, then width, then ratio.
SELECT ROW_NUMBER() OVER (ORDER BY ABS(c.custsize-t.tiresize), ABS(c.custwidth-t.tirewidth), ABS(c.custratio-t.tireratio)) AS rownum
, c.custno, c.custsize, c.custwidth, c.custratio, t.tireid, t.tiresize, t.tirewidth, t.tireratio
, ABS(c.custsize-t.tiresize) AS sizediff, ABS(c.custwidth-t.tirewidth) AS widthdiff, ABS(c.custratio-t.tireratio) AS ratiodiff
FROM (SELECT * FROM customertable WHERE custno = @CustomerNo) c
CROSS JOIN tirecollection t
) sub
WHERE rownum = 1
GO
Then you run the function with:
SELECT * FROM udf.GetClosestTireMatch(5)
(where 5=the customernumber you're querying).
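To see the prioritised ordering at work on the sample rows from the question, here is a rough sketch using Python's sqlite3 module instead of a SQL Server function. The `closest_tire` helper is a name I made up, and `LIMIT 1` stands in for the `rownum = 1` filter; the extra `tireid` sort key is my own addition to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customertable (custno INTEGER, custsize INTEGER, custwidth INTEGER, custratio INTEGER);
    CREATE TABLE tirecollection (tireid INTEGER, tiresize INTEGER, tirewidth INTEGER, tireratio INTEGER);
    INSERT INTO customertable VALUES (1,17,255,50),(2,16,235,50);
    INSERT INTO tirecollection VALUES (1,15,225,40),(2,16,225,50),(3,17,250,55),(4,17,235,30),(5,18,255,40);
    """
)


def closest_tire(custno):
    """Return the single best tire for one customer (hypothetical helper)."""
    return conn.execute(
        """
        SELECT c.custno, t.tireid, t.tiresize, t.tirewidth, t.tireratio
        FROM customertable c
        CROSS JOIN tirecollection t
        WHERE c.custno = ?
        -- size difference first, then width, then ratio; tireid breaks exact ties
        ORDER BY ABS(c.custsize - t.tiresize),
                 ABS(c.custwidth - t.tirewidth),
                 ABS(c.custratio - t.tireratio),
                 t.tireid
        LIMIT 1
        """,
        (custno,),
    ).fetchone()


print(closest_tire(1))
print(closest_tire(2))
```

For customer 1 (17/255/50) two tires match on size, and the smaller width difference decides between them.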
So the database I am using does not have a great way to select the most recent number by its unique ID. We have to narrow down to get the most recent record with a bunch of sub queries joining back to the original table. The original table is TBL_POL.
Ex.
Policy_ID Load_DATE ENDORSEMENT# SEQUENCE EXTRACTDATE
25276 8/16/2015 0 1 8/15/2015
25276 2/13/2016 1 2 2/12/2016
25276 9/24/2016 3 4 9/20/2016
25276 9/24/2016 3 4 9/20/2016
25276 9/24/2016 2 3 9/20/2016
So first we grab the max load date and join back to the original table, then grab the max endorsement # and join back, then grab the max sequence and join back, and finally get the max extract date to arrive at a single, unique record. Above is an example.
Is there an easier way to do this? Someone mentioned row_number() over(partition by), but I think that just returns whatever row number you would like. I am looking for a quick way to grab the most recent record with all of the above attributes in one pass. Does anyone have a better idea? These queries take a little while to run.
Thanks
@Bryant,
First, @Backs saved this post for you. When I first looked at it I thought, "Damn. If he doesn't care to spend any time making his request readable, why should I bother?" Further, if you're looking for a coded example, then it would be good to create some readily consumable test data to make it a whole lot easier for folks to help you. Also, as @Felix Pamittan suggested, you should also post what your expected return should be.
Here's one way to post readily consumable test data. I also added another Policy_ID so that I could demonstrate how to do this for a whole table instead of just one Policy_ID.
--===== If the test table already exists, drop it to make reruns in SSMS easier.
-- This is NOT a part of the solution. We're just simulating the original table
-- using a Temp Table.
IF OBJECT_ID('tempdb..#TBL_POL','U') IS NOT NULL
DROP TABLE #TBL_POL
;
--===== Create the test table (technically, a heap because no clustered index)
-- Total SWAG on the data-types because you didn't provide those, either.
CREATE TABLE #TBL_POL
(
Policy_ID INT NOT NULL
,Load_DATE DATE NOT NULL
,ENDORSEMENT# TINYINT NOT NULL
,SEQUENCE TINYINT NOT NULL
,EXTRACTDATE DATE NOT NULL
)
;
--===== Populate the test table
INSERT INTO #TBL_POL
(Policy_ID,Load_DATE,ENDORSEMENT#,SEQUENCE,EXTRACTDATE)
SELECT Policy_ID,Load_DATE,ENDORSEMENT#,SEQUENCE,EXTRACTDATE
FROM (VALUES
--===== Original values provided
(25276,'8/16/2015',0,1,'8/15/2015')
,(25276,'2/13/2016',1,2,'2/12/2016')
,(25276,'9/24/2016',3,4,'9/20/2016')
,(25276,'9/24/2016',3,4,'9/20/2016')
,(25276,'9/24/2016',2,3,'9/20/2016')
--===== Additional values to demo multiple Policy_IDs with
,(12345,'8/16/2015',0,1,'8/15/2015')
,(12345,'9/24/2016',1,5,'2/12/2016')
,(12345,'2/13/2016',1,2,'2/12/2016')
,(12345,'9/24/2016',3,4,'9/20/2016')
,(12345,'9/24/2016',3,4,'9/20/2016')
,(12345,'9/24/2016',2,3,'9/20/2016')
) v (Policy_ID,Load_DATE,ENDORSEMENT#,SEQUENCE,EXTRACTDATE)
;
--===== Show what's in the test table
SELECT *
FROM #TBL_POL
;
If you are looking to resolve your question for more than one Policy_ID at a time, then the following will work.
--===== Use a partitioned windowing function to find the latest row
-- for each Policy_ID, ignoring "dupes" in the process.
-- This assumes that the "sequence" column is king of the hill.
WITH cteEnumerate AS
(
SELECT *
,RN = ROW_NUMBER() OVER (PARTITION BY Policy_ID ORDER BY SEQUENCE DESC)
FROM #TBL_POL
)
SELECT Policy_ID,Load_DATE,ENDORSEMENT#,SEQUENCE,EXTRACTDATE
FROM cteEnumerate
WHERE RN = 1
;
If you're only looking for one Policy_ID for this, the "TOP 1" method that @ZLK suggested will work but so will adding a WHERE clause to the above. Not sure which will work faster but the same indexes will help both. Here's the solution with a WHERE clause (which could be parameterized).
--===== Use a partitioned windowing function to find the latest row
-- for each Policy_ID, ignoring "dupes" in the process.
-- This assumes that the "sequence" column is king of the hill.
WITH cteEnumerate AS
(
SELECT *
,RN = ROW_NUMBER() OVER (PARTITION BY Policy_ID ORDER BY SEQUENCE DESC)
FROM #TBL_POL
WHERE Policy_ID = 25276
)
SELECT Policy_ID,Load_DATE,ENDORSEMENT#,SEQUENCE,EXTRACTDATE
FROM cteEnumerate
WHERE RN = 1
;
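One thing worth checking is whether "sequence is king of the hill" gives the same answer as the tie-break order described in the question (load date, then endorsement, then sequence, then extract date). The sketch below is only an illustration: it runs in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later), the dates are rewritten as ISO strings so they sort correctly as text, and the lower-case column names and the `latest_rows` helper are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_pol (policy_id INTEGER, load_date TEXT, endorsement_no INTEGER,"
    " sequence_no INTEGER, extract_date TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_pol VALUES (?, ?, ?, ?, ?)",
    [
        (25276, "2015-08-16", 0, 1, "2015-08-15"),
        (25276, "2016-02-13", 1, 2, "2016-02-12"),
        (25276, "2016-09-24", 3, 4, "2016-09-20"),
        (25276, "2016-09-24", 3, 4, "2016-09-20"),
        (25276, "2016-09-24", 2, 3, "2016-09-20"),
        (12345, "2015-08-16", 0, 1, "2015-08-15"),
        (12345, "2016-09-24", 1, 5, "2016-02-12"),
        (12345, "2016-02-13", 1, 2, "2016-02-12"),
        (12345, "2016-09-24", 3, 4, "2016-09-20"),
        (12345, "2016-09-24", 3, 4, "2016-09-20"),
        (12345, "2016-09-24", 2, 3, "2016-09-20"),
    ],
)

QUERY = """
SELECT policy_id, load_date, endorsement_no, sequence_no, extract_date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY policy_id ORDER BY {order_by}) AS rn
    FROM tbl_pol
)
WHERE rn = 1
ORDER BY policy_id
"""


def latest_rows(order_by):
    """Return one row per policy; order_by is a trusted, hard-coded string."""
    return conn.execute(QUERY.format(order_by=order_by)).fetchall()


# Tie-break order as described in the question.
by_question = latest_rows(
    "load_date DESC, endorsement_no DESC, sequence_no DESC, extract_date DESC"
)
# Sequence alone, as assumed in the answer above.
by_sequence = latest_rows("sequence_no DESC")

print(by_question)
print(by_sequence)
```

For Policy_ID 12345 the two orderings pick different rows, so it is worth confirming which column really decides "latest" before choosing the ORDER BY list.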
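Maybe you could try GROUPING SETS.
Please test it with more sample data.
I am also not sure about its performance, so feedback on both the results and the performance would be welcome.
Note that each MAX here is computed independently, so the returned values do not necessarily all come from the same row.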
SELECT *
FROM (
SELECT Policy_ID
,max(Load_DATE) Load_DATE
,max(ENDORSEMENT#) ENDORSEMENT#
,max(SEQUENCE) SEQUENCE
,max(EXTRACTDATE) EXTRACTDATE
FROM #TBL_POL t
GROUP BY grouping SETS(Policy_ID, Load_DATE, ENDORSEMENT#, SEQUENCE, EXTRACTDATE)
) t4
WHERE Policy_ID IS NOT NULL
drop table #TBL_POL
I need to set a "waived" flag in my table for all but the newest result per id. I thought I had a query that will work here, but when I run a select on the query, I'm getting incorrect results - I saw one case where it selected both of the only two results for a particular id. I'm also getting multiple results with the same exact data.
What am I doing wrong here?
Here's my select statement:
select t.test_row_id, t.test_result_id, t.waived, t.pass, t.comment
from EV.Test_Result
join EV.Test_Result as t on EV.Test_Result.test_row_id = t.test_row_id and EV.Test_Result.start_time < t.start_time and t.device_id = 1219 and t.waived = 0
order by t.test_row_id
Here's the actual query I want to run:
update EV.Test_Result
set waived = 1
from EV.Test_Result
join EV.Test_Result as t on EV.Test_Result.test_row_id = t.test_row_id and EV.Test_Result.start_time < t.start_time and t.device_id = 1219 and t.waived = 0
If I understand this correctly, you are having problems because the Cardinality of the ON predicate returns all matching rows.
EV.Test_Result.test_row_id = t.test_row_id
and EV.Test_Result.start_time < t.start_time
This ON clause compares every pair of rows that share the same test_row_id and returns each pair in which start_time is earlier than t.start_time, so an older row comes back once for every newer row with the same id. Clearly, this is not what you want.
and t.device_id = 1219
and t.waived = 0
These are filter predicates (technically ON is one as well), but I would prefer to put them in a subquery/CTE, mainly because that limits the number of rows SQL Server has to retrieve and compare.
Something like the following might be what you needed:
SELECT A.test_row_id
, A.test_result_id
, A.waived
, A.pass
, A.comment
FROM EV.Test_Result A
INNER JOIN (SELECT MAX(start_time) AS start_time
, test_row_id
FROM EV.Test_Result
WHERE device_id = 1219
AND waived = 0
GROUP BY test_row_id
) AS T ON A.test_row_id = T.test_row_id
AND A.start_time < T.start_time
ORDER BY A.test_row_id
This query then returns a 1:M relationship between the values in the ON predicate, unlike the M:M query you had run.
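For the UPDATE itself, the same "compare against the newest start_time per test_row_id" idea can be written as a correlated subquery. This is only a sketch under assumptions: it runs in SQLite through Python's sqlite3 module, the table is trimmed to the columns that matter, and the rows are invented. SQL Server would accept the same WHERE logic, or the derived table from the SELECT above joined in an UPDATE ... FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_result (test_result_id INTEGER PRIMARY KEY, test_row_id INTEGER,"
    " device_id INTEGER, start_time TEXT, waived INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO test_result (test_result_id, test_row_id, device_id, start_time)"
    " VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1219, "2020-01-01 08:00"),
        (2, 10, 1219, "2020-01-02 08:00"),
        (3, 10, 1219, "2020-01-03 08:00"),  # newest for test_row_id 10
        (4, 20, 1219, "2020-01-01 09:00"),
        (5, 20, 1219, "2020-01-05 09:00"),  # newest for test_row_id 20
        (6, 30, 1219, "2020-01-01 10:00"),  # only result for test_row_id 30
        (7, 40, 9999, "2020-01-01 11:00"),  # different device, never touched
    ],
)

# Waive every row that is older than the newest unwaived result
# for the same test_row_id on device 1219.
conn.execute(
    """
    UPDATE test_result
    SET waived = 1
    WHERE start_time < (
        SELECT MAX(t.start_time)
        FROM test_result AS t
        WHERE t.test_row_id = test_result.test_row_id
          AND t.device_id = 1219
          AND t.waived = 0
    )
    """
)

waived_ids = [
    row[0]
    for row in conn.execute(
        "SELECT test_result_id FROM test_result WHERE waived = 1 ORDER BY test_result_id"
    )
]
print(waived_ids)
```

Each row is compared against one value (the per-id maximum), so it can match at most once, and the newest row can never be earlier than itself.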
UPDATE:
Since I sheepishly screwed up trying to alter my Query on SO, I'll redeem myself by explaining the physical and logical orders of basic SQL Query operators:
As you know, you write a simple SELECT statement like the following:
SELECT <grouping column>, SUM(<aggregated column>) AS Cost
FROM <table_name>
WHERE <column> = 'some_value'
GROUP BY <grouping column>
HAVING SUM(<aggregated column>) > some_value
ORDER BY <column>
Note that if you use an aggregate function, every other column in the SELECT list MUST appear in the GROUP BY or inside another aggregate function.
Now, SQL Server requires them to be written in that order although it actually processes this logically by the following order that is worth memorizing:
FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY
There are more details in the SELECT documentation on MSDN, but this is why any column in the SELECT list must either be in the GROUP BY or be inside an aggregate function (SUM, MIN, MAX, etc.), and also why my lazy code failed on your first attempt. :/
Note also that ORDER BY is last (technically the TOP operator is applied after it), and that without it the order of the result is not guaranteed. A ranking function such as DENSE_RANK has its own ORDER BY inside OVER (evaluated during the SELECT step), but that only determines the rank values, not the order of the returned rows.
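As a small illustration of that logical order, here is a sketch run through Python's sqlite3 module; the `orders` table and its rows are invented. The numbered SQL comments mark the order in which each clause is logically applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", "paid", 50),
        ("ann", "paid", 70),
        ("ann", "void", 500),  # removed by WHERE before any grouping happens
        ("bob", "paid", 30),
        ("cy", "paid", 90),
        ("cy", "paid", 40),
    ],
)

result = conn.execute(
    """
    SELECT customer, SUM(cost) AS total_cost  -- 5. compute the output columns
    FROM orders                               -- 1. pick the source rows
    WHERE status = 'paid'                     -- 2. drop individual rows
    GROUP BY customer                         -- 3. form one group per customer
    HAVING SUM(cost) > 100                    -- 4. drop whole groups
    ORDER BY total_cost DESC                  -- 6. sort; the SELECT alias exists by now
    """
).fetchall()

print(result)
```

If WHERE ran after the grouping, ann's total would include the void order; because it runs first, her total is 120 and bob's group (30) is removed by HAVING.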
Hope this helps solve the problem and better yet how SQL works. Cheers
You can try the ROW_NUMBER() function ordered by timestamp descending and filter out the rows where ROW_NUMBER is 1.
The query below should fetch all records per id except the latest one.
I tried it in Oracle with a table having the fields id, user_id, record_order and timestamp, and it worked:
select
<table_name_alias>.*
from
(
select
id,
user_id,
row_number() over (partition by id order by record_order desc) as record_number
from
<your_table_name>
) <table_name_alias>
where
record_number <>1;
If you are using Teradata, you can also try the QUALIFY clause. I'm not sure whether all databases support it.
Select
table_name.*
from table_name
QUALIFY row_number() over (partition by id order by record_order desc) <>1;
I hope you can help/guide me, I have one working CTE and I would like to add two more SQL queries to have one SQL statement and I am encountering various type of error as I try to play around with the parameters...
I have less experience in CTE so please bear with me...my purpose is to create an SSRS report using report builder and I want to combine the output of each query into a column bar...
Here is the working CTE which is my first Column Bar in my SSRS report...
WITH Cnt AS (
select Count(Distinct UserID) as Entitled_Users, DATEFROMPARTS(YEAR(t.WhenAddedToGroup),MONTH(t.WhenAddedToGroup),1) as When_Added_To_Group
from Membership t
where FirstName not like '%test%' and LastName not like '%test%' and FirstName not like '%user%' and LastName not like '%user%' and Account_Disabled not like 'YES' and Obj_Type not like 'NON_USER' and Region not like 'EMEA' and Region not like 'CCLA' and SecGroup IN ('SecurityGroup1', 'SecurityGroup2', 'SecurityGroup3', 'SecurityGroup4')and curr_member like 'yes' and ATTUID is not null and WhenAddedToGroup is not null
Group By DATEFROMPARTS(YEAR(t.WhenAddedToGroup),MONTH(t.WhenAddedToGroup),1)
)
Select When_Added_To_Group, Entitled_Users, (Select SUM(t2.Entitled_Users) as Entitled_Users
from Cnt T2
where T2.When_Added_To_Group <=T1.When_Added_To_Group) as Running_Total
from Cnt T1
Below is the second SQL query that I am hoping I can merge or join with the above CTE which will be the second column bar of my Column Report in SSRS..
SELECT c.EventType, DATEFROMPARTS(YEAR(c.Event_Date),MONTH(c.Event_Date),1) as Concurrent_Date, MAX(c.MAX_Concurrent_Users) as Peak_Concurrent_Users, c.Hub
FROM vNon_Concurrent_Users c
where c.EventType like 'Broker_Daily_Max_Users' and c.Hub like 'TOK Hub'
group by DATEFROMPARTS(YEAR(c.Event_Date),MONTH(c.Event_Date),1), c.Hub, c.EventType
below is my third SQL Query which will be the third column bar of my SSRS report...
select hs.hub, hs.NAME, DATEFROMPARTS(YEAR(hs.BOOT_TIME),MONTH(hs.BOOT_TIME),1) as Host_Boot_Time,
ROUND(CAST(hs.CPU_CORE_COUNT as FLOAT)*hs.CPU_Hz*count(cast(hs.HOSTID as BIGINT))/800000000,0) as Host_Capacity
from HVD_VPXV_HOSTS as hs WITH (NOLOCK,NOWAIT)
where hs.hub like 'TOK Hub'
group by hs.hub, hs.CPU_CORE_COUNT, hs.CPU_Hz, hs.NAME, hs.BOOT_TIME
order by hs.BOOT_TIME desc
The WHERE conditions linking the three queries are:
When_Added_To_Group=Concurrent_date,
When_Added_To_Group=Host_Boot_time
As mentioned above, I tried playing around with the parameters but I get different errors. Below is one (of many) statements I tried, which I can give as an example:
WITH Cnt AS (
select Count(Distinct UserID) as Entitled_Users, DATEFROMPARTS(YEAR(t.WhenAddedToGroup),MONTH(t.WhenAddedToGroup),1) as When_Added_To_Group
from HVDMembership t
where FirstName not like '%test%' and LastName not like '%test%' and FirstName not like '%user%' and LastName not like '%user%' and Account_Disabled not like 'YES' and Obj_Type not like 'NON_USER' and Region not like 'EMEA' and Region not like 'CCLA' and SecGroup IN ('SecurityGroup1', 'SecurityGroup2', 'SecurityGroup3', 'SecurityGroup4')and curr_member like 'yes' and ATTUID is not null and WhenAddedToGroup is not null
Group By DATEFROMPARTS(YEAR(t.WhenAddedToGroup),MONTH(t.WhenAddedToGroup),1)
)
, Concurrent
as (
SELECT c.EventType, DATEFROMPARTS(YEAR(c.Event_Date),MONTH(c.Event_Date),1) as Concurrent_Date, MAX(c.MAX_Concurrent_Users) as Peak_Concurrent_Users, c.Hub
FROM vNon_Concurrent_Users c
where c.EventType like 'Broker_Daily_Max_Users' and c.Hub like 'TOK Hub'
group by DATEFROMPARTS(YEAR(c.Event_Date),MONTH(c.Event_Date),1), c.Hub, c.EventType
)
, Capacity
as (
select hs.hub, hs.NAME, DATEFROMPARTS(YEAR(hs.BOOT_TIME),MONTH(hs.BOOT_TIME),1) as Host_Boot_Time,
ROUND(CAST(hs.CPU_CORE_COUNT as FLOAT)*hs.CPU_Hz*count(cast(hs.HOSTID as BIGINT))/800000000,0) as Host_Capacity
from HVD_VPXV_HOSTS as hs WITH (NOLOCK,NOWAIT)
where hs.hub like 'TOK Hub'
group by hs.hub, hs.CPU_CORE_COUNT, hs.CPU_Hz, hs.NAME, hs.BOOT_TIME
)
Select When_Added_To_Group, Entitled_Users, (Select SUM(t2.Entitled_Users) as Entitled_Users
from Cnt T2
where T2.When_Added_To_Group <=T1.When_Added_To_Group) as Running_Total,
Select Peak_concurrent_users, concurrent_date from Concurrent,
select SUM(Host_Capacity), Host_Boot_Time from Capacity
where When_Added_To_Group=Concurrent_date, When_Added_To_Group=Host_Boot_time
from Cnt T1
Thanks in advance.
A CTE can be referenced only by the single SQL statement that immediately follows its definition. You have multiple separate SELECTs after your CTEs. That is your first problem.
The second problem is that your CTEs don't appear to have data that is related to each other. Without knowing what you would expect as the results from sample data, it is pretty much impossible to advise you on how you should do the query. I can't make any meaning out of what you are trying to do, so I don't know what query you want to write. I could randomly suggest 3-4 different alternative queries but that would likely be less useful to you than asking you to provide information on what you are trying for and what your business rules should be and sample data. You simply cannot effectively write queries without understanding the underlying meaning of the data.
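Purely to show the shape of a valid statement, here is a sketch in which several CTEs feed one final SELECT through a join. It assumes the month really is the shared key, as the conditions listed in the question suggest; the tables, columns and rows are invented stand-ins rather than the real schema, and it runs in SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE membership (user_id INTEGER, month_added TEXT);
    CREATE TABLE concurrent_users (event_month TEXT, max_users INTEGER);
    INSERT INTO membership VALUES
        (1,'2016-01-01'),(2,'2016-01-01'),(3,'2016-02-01'),(3,'2016-02-01'),(4,'2016-03-01');
    INSERT INTO concurrent_users VALUES
        ('2016-01-01',1),('2016-01-01',2),('2016-02-01',3);
    """
)

rows = conn.execute(
    """
    WITH cnt AS (
        SELECT month_added, COUNT(DISTINCT user_id) AS entitled_users
        FROM membership
        GROUP BY month_added
    ),
    concurrent AS (
        SELECT event_month, MAX(max_users) AS peak_users
        FROM concurrent_users
        GROUP BY event_month
    )
    -- One statement follows the CTE list and may reference every CTE in it.
    SELECT t1.month_added,
           t1.entitled_users,
           (SELECT SUM(t2.entitled_users)
            FROM cnt t2
            WHERE t2.month_added <= t1.month_added) AS running_total,
           c.peak_users
    FROM cnt t1
    LEFT JOIN concurrent c ON c.event_month = t1.month_added
    ORDER BY t1.month_added
    """
).fetchall()

for row in rows:
    print(row)
```

A third CTE would be joined the same way. Whether LEFT JOIN is the right choice depends on what should happen to months that have no matching row, which is exactly the business-rule question above.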
I am looking to retrieve only the second (duplicate) record from a data set. For example in the following picture:
Inside the UnitID column there are two separate records for 105. I only want the returned data set to contain the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, although I am not that experienced, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example (I can't read your first column name, so I'm calling it JobUnitK):
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
Use RANK function. Rank the rows OVER PARTITION BY UnitId and pick the rows with rank 2 .
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
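WITH DupeCalc AS (
SELECT
-- Number the rows within each UnitID, oldest key first
DupID = Row_Number() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID),
*
FROM JobUnits
WHERE DispatchDate = '20151004'
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
-- ORDER BY is not allowed inside a CTE without TOP, so it belongs on the outer query
ORDER BY UnitID DESC
;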
This is better than a solution that uses Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case you would need Min(JobUnitKeyID) per UnitID and a join back on UnitID where JobUnitKeyID <> MinJobUnitKeyID.
Using Min or Max requires you to join back to the same data (which will be inherently slower).
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.
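To make the first point concrete, here is a small comparison sketch run in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The rows are invented, the DispatchDate filter is left out, and UnitID 107 appears three times on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE JobUnits (JobUnitKeyID INTEGER PRIMARY KEY, UnitID INTEGER)")
conn.executemany(
    "INSERT INTO JobUnits VALUES (?, ?)",
    [(1, 101), (2, 105), (3, 105), (4, 107), (5, 107), (6, 107)],
)

# ROW_NUMBER approach: every row after the first one within each UnitID.
row_number_rows = conn.execute(
    """
    SELECT JobUnitKeyID, UnitID
    FROM (
        SELECT JobUnitKeyID, UnitID,
               ROW_NUMBER() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID) AS DupID
        FROM JobUnits
    )
    WHERE DupID >= 2
    ORDER BY JobUnitKeyID
    """
).fetchall()

# MAX / GROUP BY approach: at most one row per duplicated UnitID.
max_rows = conn.execute(
    """
    SELECT MAX(JobUnitKeyID), UnitID
    FROM JobUnits
    GROUP BY UnitID
    HAVING COUNT(*) > 1
    ORDER BY UnitID
    """
).fetchall()

print(row_number_rows)
print(max_rows)
```

With three rows for unit 107, the MAX version returns only the last one, while the ROW_NUMBER version returns both extra rows. Changing the filter to DupID = 2 would return exactly the second record per unit.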