Best way to select most recent record - sql-server

So the database I am using does not have a great way to select the most recent number by its unique ID. We have to narrow down to get the most recent record with a bunch of sub queries joining back to the original table. The original table is TBL_POL.
Ex.
Policy_ID Load_DATE ENDORSEMENT# SEQUENCE EXTRACTDATE
25276 8/16/2015 0 1 8/15/2015
25276 2/13/2016 1 2 2/12/2016
25276 9/24/2016 3 4 9/20/2016
25276 9/24/2016 3 4 9/20/2016
25276 9/24/2016 2 3 9/20/2016
so first we grab the max load date and join back to the original table and then grab the max endorsement # and then join back and grab the max sequence and then join back and get the max extract date to finally get back to our final record so it will be unique. Above is an example.
Is there an easier way to do this? Someone mentioned row_number() over(partition by), but I think that just returns the whatever row number you would like. I am for a quick way to grab the most record with all these above attributes in one swipe. Does anyone have a better idea to do this, because these queries take a little while to run.
Thanks

#Bryant,
First, #Backs saved this post for you. When I first looked at it I thought "Damn. If he doesn't care to spend any time making his request readable, why should I bother"? Further, if you're looking for a coded example, then it would be good to create some readily consumable test data to make it a whole lot easier for folks to help you. Also, as #Felix Pamittan suggested, you should also post what your expected return should be.
Here's one way to post readily consumable test data. I also added another Policy_ID so that I could demonstrate how to do this for a whole table instead of just one Policy_ID.
--===== If the test table doesn't already exist, drop it to make reruns in SSMS easier.
-- This is NOT a part of the solution. We're just simulating the original table
-- using a Temp Table.
IF OBJECT_ID('tempdb..#TBL_POL','U') IS NOT NULL
DROP TABLE #TBL_POL
;
--===== Create the test table (technically, a heap because no clustered index)
-- Total SWAG on the data-types because you didn't provide those, either.
CREATE TABLE #TBL_POL
(
Policy_ID INT NOT NULL
,Load_DATE DATE NOT NULL
,ENDORSEMENT# TINYINT NOT NULL
,SEQUENCE TINYINT NOT NULL
,EXTRACTDATE DATE NOT NULL
)
;
--===== Populate the test table
INSERT INTO #TBL_POL
(Policy_ID,Load_DATE,ENDORSEMENT#,SEQUENCE,EXTRACTDATE)
SELECT Policy_ID,Load_DATE,ENDORSEMENT#,SEQUENCE,EXTRACTDATE
FROM (VALUES
--===== Original values provided
(25276,'8/16/2015',0,1,'8/15/2015')
,(25276,'2/13/2016',1,2,'2/12/2016')
,(25276,'9/24/2016',3,4,'9/20/2016')
,(25276,'9/24/2016',3,4,'9/20/2016')
,(25276,'9/24/2016',2,3,'9/20/2016')
--===== Additional values to demo multiple Policy_IDs with
,(12345,'8/16/2015',0,1,'8/15/2015')
,(12345,'9/24/2016',1,5,'2/12/2016')
,(12345,'2/13/2016',1,2,'2/12/2016')
,(12345,'9/24/2016',3,4,'9/20/2016')
,(12345,'9/24/2016',3,4,'9/20/2016')
,(12345,'9/24/2016',2,3,'9/20/2016')
) v (Policy_ID,Load_DATE,ENDORSEMENT#,SEQUENCE,EXTRACTDATE)
;
--===== Show what's in the test table
SELECT *
FROM #TBL_POL
;
If you are looking to resolve your question for more than one Policy_ID at a time, then the following will work.
--===== Use a partitioned windowing function to find the latest row
-- for each Policy_ID, ignoring "dupes" in the process.
-- This assumes that the "sequence" column is king of the hill.
WITH cteEnumerate AS
(
SELECT *
,RN = ROW_NUMBER() OVER (PARTITION BY Policy_ID ORDER BY SEQUENCE DESC)
FROM #TBL_POL
)
SELECT Policy_ID,Load_DATE,ENDORSEMENT#,SEQUENCE,EXTRACTDATE
FROM cteEnumerate
WHERE RN = 1
;
If you're only looking for one Policy_ID for this, the "TOP 1" method that #ZLK suggested will work but so will adding a WHERE clause to the above. Not sure which will work faster but the same indexes will help both. Here's the solution with a WHERE clause (which could be parameterized).
--===== Use a partitioned windowing function to find the latest row
-- for each Policy_ID, ignoring "dupes" in the process.
-- This assumes that the "sequence" column is king of the hill.
WITH cteEnumerate AS
(
SELECT *
,RN = ROW_NUMBER() OVER (PARTITION BY Policy_ID ORDER BY SEQUENCE DESC)
FROM #TBL_POL
WHERE Policy_ID = 25276
)
SELECT Policy_ID,Load_DATE,ENDORSEMENT#,SEQUENCE,EXTRACTDATE
FROM cteEnumerate
WHERE RN = 1
;

May be you should try Grouping SET
Throw another sample data.
Also i am not sure about performance.
Give Feedback but result and performance both
SELECT *
FROM (
SELECT Policy_ID
,max(Load_DATE) Load_DATE
,max(ENDORSEMENT#) ENDORSEMENT#
,max(SEQUENCE) SEQUENCE
,max(EXTRACTDATE) EXTRACTDATE
FROM #TBL_POL t
GROUP BY grouping SETS(Policy_ID, Load_DATE, ENDORSEMENT#, SEQUENCE, EXTRACTDATE)
) t4
WHERE Policy_ID IS NOT NULL
drop table #TBL_POL

Related

Splitting Data from a Column into two Columns

I have data in a column called "Medication_Description". First I need to
then find out who that received Tylenol also received a second drug (Ibuprofen here).
I also only want the top two results for each MRN (i.e. patient). I only want data for the past year.
Later I will be plugging this into an SSRS report where I will determine the percentage of patients that received both drugs.
I've played with a couple different ways of getting this to work but can't get it working quite right.
The data in this table looks like this:
As for desired results, I'd like to have something like this:
Blockquote
For MRN 654321, no Ibuprofen was administered so it returns NULL (it could also return another drug name - doesn't matter too much. I just need to be able to count the results later to determine a percentage).
For MRN 246824, only one dose of Ibuprofen was administered so the second line is NULL.
Below is my latest attempt but (as you can see) Med1 and Med2 will always reflect the same exact data - how can I make Med1 reflect one medication and Med2 reflect a second?
SELECT [MRN], [Med1], [Med2], [Row_Num], [Department_Name], [Date]
FROM
( SELECT [MRN], [Medication_Description] AS [Med1], [Medication_Description] AS [Med2],
ROW_NUMBER() OVER (PARTITION BY [MRN]
ORDER BY [Medication_Description] DESC
)
AS [Row_Num],
[Date],
[Department_Name]
FROM T_Med_Orders
WHERE [DATE] BETWEEN dateadd(year,-1,getdate()) AND getdate()
AND [Department_Name] LIKE 'ICU'
--First Med Must Match "Tylenol" but 2nd should match any result...???
AND Medication_Description LIKE '%Tylenol%'
)
tmp
WHERE
[Row_Num] <= 2
AND Med1 LIKE '%Tylenol%'
--AND Med2 LIKE '%Ibuprofen%'
AND [DATE] BETWEEN dateadd(year,-1,getdate()) AND getdate()
ORDER BY [MRN]
You have a solid start, but I think you're being too ambitious with your query. While there are ways to optimize the query, using Partitions may be overkill for the current requirements. What I ended up doing was to make two CTEs where each is filtered to the individual medicine being identified. I can LEFT JOIN those to the source table and filter it to only show results where the CTE values are NOT NULL. I can also apply the other DeptName and Date clauses, although I have not done so in my code snippet.
The main drawback is that this format would require more and more CTEs if you wanted to expand to include other medicines, swiftly reducing optimization further. But without knowing how MedicationDescription is formatted (or if it even has a standard format) I can't write that for you.
WITH Tylenol_CTE AS
(SELECT *, 'Tylenol' AS [FilteredMedicine]
FROM #Temp
WHERE Medication_Description LIKE '%Tylenol%')
,Ibuprofen_CTE AS
(SELECT *, 'Ibuprofen' AS [FilteredMedicine]
FROM #Temp
WHERE Medication_Description LIKE '%Ibuprofen%')
SELECT t.*
, Tylenol_CTE.[FilteredMedicine] AS Med1
, Ibuprofen_CTE.[FilteredMedicine] AS Med2
FROM #Temp t
LEFT JOIN Tylenol_CTE
ON t.MRN = Tylenol_CTE.MRN
AND t.Date = Tylenol_CTE.Date
LEFT JOIN Ibuprofen_CTE
ON t.MRN = Ibuprofen_CTE.MRN
AND t.Date = Ibuprofen_CTE.Date
WHERE Ibuprofen_CTE.Medication_Description IS NOT NULL
AND Tylenol_CTE.Medication_Description IS NOT NULL

Subtract one column from another and get running total SQL Server 2008 R2

I hate to ask a question that it seems like the answer already exists somewhere, but I've been working through various articles over the last couple days (https://www.sqlshack.com/running-running-totals-sql-server/ , Calculate a Running Total in SQL Server, https://learn.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-2017 etc.) with minimal progress.
It seems most of the examples given provide a way to join (a self join) or some other WHERE clause to narrow the result set? Anyway, I have a table Location that has columns for PassengersOn and PassengersOff; I would like to be able to calculate the running total of passengers on board at a given time so that my resulting table might look something like this
Which shows a running total (OnBoard) for passengers after each location.
Also, I am aware of the OVER clause available in SQL Server 2012, but unfortunately I'm using 2008 R2.
Again sorry for what may be a duplicate question, but I don't see how I can limit my result set based on a join or WHERE clause since I don't have an incrementing column in my "location" table, instead it uses a guid.
EDIT: here is a sample of the table information
CREATE TABLE Query_8v2 (
[IDStopEvent] NVARCHAR(36),
[LocationID] NVARCHAR(6),
[LocName] NVARCHAR(5),
[PassOn] INT,
[PassOff] INT
);
INSERT INTO Query_8v2 VALUES
(N'f00e6b5b-eb64-4e6b-8b87-0000a539ee36',N'guid1',N'Loc01',0,0),
(N'617cbcae-b467-4adb-b994-00015bca9bb5',N'guid1',N'Loc01',0,59),
(N'215f92bc-8114-4dd0-a1e1-00016e4f0546',N'guid1',N'Loc01',0,42),
(N'e8eaaed5-dc0c-48a9-b39b-0001fc44576e',N'guid1',N'Loc01',0,0),
(N'4c54eef6-11f3-4114-ad9d-0004b1b3849d',N'guid1',N'Loc01',0,0),
(N'a29eb925-8226-4d89-8760-00063d64067a',N'guid1',N'Loc01',69,0),
(N'b16e1b1f-d481-447e-9771-000890fe6999',N'guid1',N'Loc01',0,69),
(N'4f5894ee-a246-4c9d-bc28-0008bc1b3614',N'guid1',N'Loc01',0,44),
(N'52e447cf-f900-4e49-94ca-0008c262a173',N'guid1',N'Loc01',0,0),
(N'f120f646-17f2-4bbb-879d-00091665ec7e',N'guid1',N'Loc01',0,0),
(N'3bbe56e0-c54c-4f3c-9f29-000c914cd724',N'guid1',N'Loc01',32,0),
(N'1ddda821-23f5-43a5-a86c-000d46d4cdc9',N'guid1',N'Loc01',0,0),
(N'b58dac6b-6cac-4bf3-af47-000e67b67582',N'guid1',N'Loc01',0,0),
(N'c9d52156-cc88-4c3c-9409-00103ba9afaa',N'guid1',N'Loc01',0,0),
(N'662d3006-938d-4a66-8999-00104632991b',N'guid1',N'Loc01',0,106),
(N'598d135b-3bdb-4d4b-9464-0010ab22b9eb',N'guid1',N'Loc01',0,0),
(N'c60e2801-efb8-41c3-9dad-00110aae0f2d',N'guid1',N'Loc01',0,0),
(N'72384001-56a3-413c-a847-0011125a5e31',N'guid1',N'Loc01',0,0),
(N'081a9c68-514a-4622-ab0d-00117909d029',N'guid1',N'Loc01',0,0),
(N'afac2c83-ee2e-4b79-8d0b-0011adc313e0',N'guid1',N'Loc01',0,0),
(N'a0f65fe9-79d2-470e-9885-000acccbf82f',N'guid2',N'Loc02',0,0),
(N'bd4371c6-896a-4a4c-9168-000b6e3d2bdd',N'guid2',N'Loc02',0,34),
(N'7c747187-905d-48f5-b9fd-000e233e2986',N'guid2',N'Loc02',21,0),
(N'a3e2773a-2310-4185-9b0c-00013204c0d4',N'guid3',N'Loc03',0,206),
(N'1a8e4c21-0550-411f-91ae-00018234e33d',N'guid3',N'Loc03',323,0),
(N'66ac5d5c-ef97-4041-92cb-0009412a4cec',N'guid3',N'Loc03',0,249),
(N'5b6b2d10-70e4-4953-bf4b-00099ffbc1cd',N'guid3',N'Loc03',183,0),
(N'0107bfcb-9628-42f3-8a4d-000bd42d8cff',N'guid3',N'Loc03',0,400),
(N'f4179bce-399a-417f-bcb1-000fce5ff5b1',N'guid3',N'Loc03',319,0),
(N'f3668d7f-4338-4c15-bb65-0000f5f6af85',N'guid4',N'Loc04',25,32),
(N'dad5af74-a873-46ff-8b61-0002a122850a',N'guid4',N'Loc04',19,75),
(N'e20b705a-6416-4876-aa96-0005e8e25d94',N'guid4',N'Loc04',48,40),
(N'2e3f93d1-65fa-4b13-a8db-0007e6e47b4a',N'guid4',N'Loc04',48,37),
(N'7bc78967-ef77-4fb7-a74d-0008dd88268a',N'guid4',N'Loc04',51,42),
(N'f409014f-189e-4e24-943b-00095acd2e38',N'guid4',N'Loc04',48,71),
(N'e9a6a04d-32da-45e6-a93b-000ae35cd97b',N'guid4',N'Loc04',63,13),
(N'5d719c25-8a20-4cce-85a2-000f6be996ba',N'guid4',N'Loc04',57,69),
(N'5d5a3666-a996-4220-b943-00110f627aee',N'guid4',N'Loc04',27,63),
(N'941880b8-0873-40ee-936b-0001b711fbba',N'guid5',N'Loc05',55,182),
(N'f3f360a1-3767-443e-ac19-000878a505eb',N'guid5',N'Loc05',62,41),
(N'd03d154b-ade6-4c06-af11-000b9fbcb218',N'guid5',N'Loc05',109,86),
(N'7c296996-32a5-46c5-bafd-000e49bf18ba',N'guid5',N'Loc05',126,68),
(N'72424ac3-7b47-44f2-9ffa-0003521bf7c2',N'guid6',N'Loc06',3,3),
(N'abb66bf1-9dab-4f56-a14c-00049b102c9c',N'guid6',N'Loc06',18,38),
(N'2db22514-3a92-4781-9232-000a6d701063',N'guid6',N'Loc06',88,34),
(N'c83239ba-4467-4d8d-9bb6-000b0c802255',N'guid6',N'Loc06',13,13),
(N'32649da2-bd02-44c3-af3a-000d33087fbe',N'guid6',N'Loc06',7,18),
(N'db9f9f3b-f4f0-4300-85c4-000f09011b60',N'guid6',N'Loc06',3,39),
(N'e6aa3c22-489d-4f97-b718-0002071629f1',N'guid7',N'Loc07',55,23),
(N'e648fff9-50ed-42a3-82e4-00027f22287f',N'guid7',N'Loc07',4,28),
(N'7b157c82-1819-4990-8147-0007f4dcaed6',N'guid7',N'Loc07',8,62),
(N'3ffecbf1-bd09-4ef8-b17f-00092211960b',N'guid7',N'Loc07',55,29),
(N'16eab156-126d-440d-a01b-0009a506e922',N'guid7',N'Loc07',3,23),
(N'69af7b49-ce4e-446c-9947-000a42bffa23',N'guid7',N'Loc07',7,8),
(N'd0ba9ab8-80dc-47c9-9f61-000e15b8c049',N'guid7',N'Loc07',3,69),
(N'77749016-19be-4657-b2d5-0005f60f5b5f',N'guid8',N'Loc08',0,163),
(N'7908e6ae-71be-4f3e-aa77-00078b16dbac',N'guid8',N'Loc08',201,0),
(N'10f13d13-9a5c-4ef8-960e-00084b5fa97c',N'guid8',N'Loc08',99,1),
(N'859c00b3-c907-4d90-92de-000e2b7f95d8',N'guid8',N'Loc08',2,167),
(N'e00136e2-e71e-4aed-afbf-00005f66f1b6',N'guid9',N'Loc09',0,299),
(N'ab711e41-e6e3-45b3-ad18-000597d39430',N'guid9',N'Loc09',0,158),
(N'301fada9-f0c1-4afb-aaf2-0005a7d0b3e8',N'guid9',N'Loc09',137,0),
(N'67d1a3f1-547d-495e-98c1-00080e3309b6',N'guid9',N'Loc09',67,0),
(N'a71a4103-dffc-40da-92b8-000a987987a2',N'guid9',N'Loc09',124,0),
(N'a60f9e16-e262-404e-9947-0000732dded4',N'guid10',N'Loc10',0,103),
(N'e4aab4d3-9c58-49fb-a9d7-0001350c9e74',N'guid10',N'Loc10',0,0),
(N'5e8617c7-d2c8-4fb4-b745-0001f8eac18a',N'guid10',N'Loc10',96,0),
(N'1864b5e5-fdda-4f9b-9522-0002e2afee4c',N'guid10',N'Loc10',0,59),
(N'05a93b5f-7776-437c-87b8-000314a9202c',N'guid10',N'Loc10',0,87),
(N'f0d6c884-e906-4aa0-8d01-00034d8d0ea3',N'guid10',N'Loc10',0,0),
(N'0f8c0751-92ed-445e-9bfc-000416967ce6',N'guid10',N'Loc10',0,0),
(N'5733564d-cbeb-4072-bcb5-0004ad90ffc6',N'guid10',N'Loc10',64,0),
(N'bf3209a9-bbb4-4aa2-8463-0006702865a4',N'guid10',N'Loc10',72,0),
(N'289647e7-7de0-482c-8771-00088940f560',N'guid10',N'Loc10',0,0),
(N'1a3cb8cf-dcb1-4441-8ab5-0009bf036b74',N'guid10',N'Loc10',0,0),
(N'6a7a665d-0b4b-41a5-b01a-0009ee84e02b',N'guid10',N'Loc10',73,0),
(N'b75a7e85-f929-4cc6-bf3f-000aaaab33e2',N'guid10',N'Loc10',0,0),
(N'2341b029-55af-41a0-bfa3-000be8e71efe',N'guid10',N'Loc10',0,0),
(N'0bf9396e-99fc-4bf0-9a48-000e90dc0cd2',N'guid10',N'Loc10',0,0),
(N'948b91b3-5928-4eb8-ac1a-000f2d55be2a',N'guid10',N'Loc10',0,0),
(N'50edd548-7a29-40cf-a082-000f5793b5b9',N'guid10',N'Loc10',0,0),
(N'4ad8be92-ce5c-432e-a461-000ff002d0b5',N'guid10',N'Loc10',72,0),
(N'265b0d5b-223b-4da1-9d4f-00107b652ae5',N'guid10',N'Loc10',0,0),
(N'6670c15d-de83-43f4-a5fd-0010d56c574d',N'guid10',N'Loc10',0,0);
I feel like if you were to index the table, you could do this pretty simply. I created a numbered Index via location. If you have the same route of these 10 stops and the data repeats loc1-loc10 over and over, you wouldn't want to use this. You need a primary key, or a date system to make this significantly easier.
SELECT CONVERT(INT, RIGHT(Location, LEN(Location) - 3)) AS ID
,Location
,Passengerson
,PAssengersoff
INTO #IDtable
FROM #table
SELECT ID
,Location
,Passengerson
,Passengersoff
,(SELECT SUM(passengerson-passengersoff)
FROM #IDtable b
WHERE b.ID <= a.ID) AS Total
FROM #IDtable a
ORDER BY ID
this will give a result that looks like this:
Hopefully this helps you in some way.
I don't have access to 2008 R2 but this solution should work from 2005+. It's mid page on the second link provided.
;with MyData as(
select 'loc1' [location], 69 PassengersOn, 0 PassengersOff
union all select 'loc2',61,0
union all select 'loc3',333,0
union all select 'loc4',57,21
union all select 'loc5',49,29
)
SELECT
MyData.*
,RunningTotal.*
FROM MyData
cross apply (select
--SUM(PassengersOn) PassOn
--,SUM(PassengersOff) PassOff
SUM(PassengersOn) - SUM(PassengersOff) as RunningTot
from MyData MyDataApply
where MyDataApply.[location]<= MyData.[location]
) as RunningTotal
ORDER BY MyData.[location]

Highlight Duplicate Values in a NetSuite Saved Search

I am looking for a way to highlight duplicates in a NetSuite saved search. The duplicates are in a column called "ACCOUNT" populated with text values.
NetSuite permits adding fields (columns) to the search using a stripped down version of SQL Server. It also permits conditional highlighting of entire rows using the same code. However I don't see an obvious way to compare values between rows of data.
Although duplicates can be grouped together in a summary report and identified by a count of 2 or more, I want to show duplicate lines separately and highlight each.
The closest thing I found was a clever formula that calculates a running total here:
sum/* comment */({amount})
OVER(PARTITION BY {name}
ORDER BY {internalid}
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
I wonder if it's possible to sort results by the field being checked for duplicates and adapt this code to identify changes in the "ACCOUNT" field between a row and the previous row.
Any ideas? Thanks!
This post has been edited. I have left the progression as a learning experience about NetSuite.
Original - plain SQL way - not suitable for NetSuite
Does something like this meet your needs? The test data assumes looking for duplicates on id1 and id2. Note: This does not work in NetSuite as it supports limited SQL functions. See comments for links.
declare #table table (id1 int, id2 int, value int);
insert #table values
(1,1,11),
(1,2,12),
(1,3,13),
(2,1,21),
(2,2,22),
(2,3,23),
(1,3,1313);
--select * from #table order by id1, id2;
select t.*,
case when dups.id1 is not null then 1 else 0 end is_dup --identify dups when there is a matching dup record
from #table t
left join ( --subquery to find duplicates
select id1, id2
from #table
group by id1, id2
having count(1) > 1
) dups
on dups.id1 = t.id1
and dups.id2 = t.id2
order by t.id1, t.id2;
First Edit - NetSuite target but in SQL.
This was a SQL test based on the example available syntax provided in the question since I do not have NetSuite to test against. This will give you a value greater than 1 on each duplicate row using a similar syntax. Note: This will give the appropriate answer but not in NetSuite.
select t.*,
sum(1) over (partition by id1, id2)
from #table t
order by t.id1, t.id2;
Second Edit - Working NetSuite version
After some back and forth here is a version that works in NetSuite:
sum/* comment */(1) OVER(PARTITION BY {name})
This will also give a value greater than 1 on any row that is a duplicate.
Explanation
This works by summing the value 1 on each row included in the partition. The partition column(s) should be what you consider a duplicate. If only one column makes a duplicate (e.g. user ID) then use as above. If multiple columns make a duplicate (e.g. first name, last name, city) then use a comma-separated list in the partition. SQL will basically group the rows by the partition and add up the 1s in the sum/* comment */(1). The example provided in the question sums an actual column. By summing 1 instead we will get the value 1 when there is only 1 ID in the partition. Anything higher is a duplicate. I guess you could call this field duplicate count.

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example in the following picture:
Inside the UnitID column there is two separate records for 105. I only want the returned data set to return the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, albeit I am not that experience, and I cannot figure it out. Any help would be greatly appreciated.
You need to use GROUP BY for this.
Here's an example: (I can't read your first column name, so I'm calling it JobUnitK
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).
Use RANK function. Rank the rows OVER PARTITION BY UnitId and pick the rows with rank 2 .
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx
Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
WITH DupeCalc AS (
SELECT
DupID = Row_Number() OVER (PARTITION BY UnitID, ORDER BY JobUnitKeyID),
*
FROM JobUnits
WHERE DispatchDate = '20151004'
ORDER BY UnitID Desc
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
;
This is better than a solution that uses Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case using Min(JobUnitKeyID) in conjunction with UnitID to join back on the UnitID where the JobUnitKeyID <> MinJobUnitKeyID` is required.
Except, using Min or Max requires you to join back to the same data (which will be inherently slower).
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.

Preserving ORDER BY in SELECT INTO

I have a T-SQL query that takes data from one table and copies it into a new table but only rows meeting a certain condition:
SELECT VibeFGEvents.*
INTO VibeFGEventsAfterStudyStart
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id
The code using the table relies on its order, and the copy above does not preserve the order I expected. I.e. the rows in the new table VibeFGEventsAfterStudyStart are not monotonically increasing in the VibeFGEventsAfterStudyStart.id column copied from VibeFGEvents.id.
In T-SQL how might I preserve the ordering of the rows from VibeFGEvents in VibeFGEventsStudyStart?
I know this is a bit old, but I needed to do something similar. I wanted to insert the contents of one table into another, but in a random order. I found that I could do this by using select top n and order by newid(). Without the 'top n', order was not preserved and the second table had rows in the same order as the first. However, with 'top n', the order (random in my case) was preserved. I used a value of 'n' that was greater than the number of rows. So my query was along the lines of:
insert Table2 (T2Col1, T2Col2)
select top 10000 T1Col1, T1Col2
from Table1
order by newid()
What for?
Point is – data in a table is not ordered. In SQL Server the intrinsic storage order of a table is that of the (if defined) clustered index.
The order in which data is inserted is basically "irrelevant". It is forgotten the moment the data is written into the table.
As such, nothing is gained, even if you get this stuff. If you need an order when dealing with data, you HAVE To put an order by clause on the select that gets it. Anything else is random - i.e. the order you et data is not determined and may change.
So it makes no sense to have a specific order on the insert as you try to achieve.
SQL 101: sets have no order.
Just add top to your sql with a number that is greater than the actual number of rows:
SELECT top 25000 *
into spx_copy
from SPX
order by date
I've found a specific scenario where we want the new table to be created with a specific order in the columns' content:
Amount of rows is very big (from 200 to 2000 millions of rows), so we are using SELECT INTO instead of CREATE TABLE + INSERT because needs to be loaded as fast as possible (minimal logging). We have tested using the trace flag 610 for loading an already created empty table with a clustered index but still takes longer than the following approach.
We need the data to be ordered by specific columns for query performances, so we are creating a CLUSTERED INDEX just after the table is loaded. We discarded creating a non-clustered index because it would need another read for the data that's not included in the ordered columns from the index, and we discarded creating a full-covering non-clustered index because it would practically double the amount of space needed to hold the table.
It happens that if you manage to somehow create the table with columns already "ordered", creating the clustered index (with the same order) takes a lot less time than when the data isn't ordered. And sometimes (you will have to test your case), ordering the rows in the SELECT INTO is faster than loading without order and creating the clustered index later.
The problem is that SQL Server 2012+ will ignore the ORDER BY column list when doing INSERT INTO or when doing SELECT INTO. It will consider the ORDER BY columns if you specify an IDENTITY column on the SELECT INTO or if the inserted table has an IDENTITY column, but just to determine the identity values and not the actual storage order in the underlying table. In this case, it's likely that the sort will happen but not guaranteed as it's highly dependent on the execution plan.
A trick we have found is that doing a SELECT INTO with the result of a UNION ALL makes the engine perform a SORT (not always an explicit SORT operator, sometimes a MERGE JOIN CONCATENATION, etc.) if you have an ORDER BY list. This way the select into already creates the new table in the order we are going to create the clustered index later and thus the index takes less time to create.
So you can rewrite this query:
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
ORDER BY -- ORDER BY is ignored!
FirstColumn,
SecondColumn
to
SELECT
FirstColumn = T.FirstColumn,
SecondColumn = T.SecondColumn
INTO
#NewTable
FROM
VeryBigTable AS T
UNION ALL
-- A "fake" row to be deleted
SELECT
FirstColumn = 0,
SecondColumn = 0
ORDER BY
FirstColumn,
SecondColumn
We have used this trick a few times, but I can't guarantee it will always sort. I'm just posting this as a possible workaround in case someone has a similar scenario.
You cannot do this with ORDER BY but if you create a Clustered Index on VibeFGEvents.id after your SELECT INTO the table will be sorted on disk by VibeFGEvents.id.
I'v made a test on MS SQL 2012, and it clearly shows me, that insert into ... select ... order by makes sense. Here is what I did:
create table tmp1 (id int not null identity, name sysname);
create table tmp2 (id int not null identity, name sysname);
insert into tmp1 (name) values ('Apple');
insert into tmp1 (name) values ('Carrot');
insert into tmp1 (name) values ('Pineapple');
insert into tmp1 (name) values ('Orange');
insert into tmp1 (name) values ('Kiwi');
insert into tmp1 (name) values ('Ananas');
insert into tmp1 (name) values ('Banana');
insert into tmp1 (name) values ('Blackberry');
select * from tmp1 order by id;
And I got this list:
1 Apple
2 Carrot
3 Pineapple
4 Orange
5 Kiwi
6 Ananas
7 Banana
8 Blackberry
No surprises here. Then I made a copy from tmp1 to tmp2 this way:
insert into tmp2 (name)
select name
from tmp1
order by id;
select * from tmp2 order by id;
I got the exact response like before. Apple to Blackberry.
Now reverse the order to test it:
delete from tmp2;
insert into tmp2 (name)
select name
from tmp1
order by id desc;
select * from tmp2 order by id;
9 Blackberry
10 Banana
11 Ananas
12 Kiwi
13 Orange
14 Pineapple
15 Carrot
16 Apple
So the order in tmp2 is reversed too, so order by made sense when there is a identity column in the target table!
The reason why one would desire this (a specific order) is because you cannot define the order in a subquery, so, the idea is that, if you create a table variable, THEN make a query from that table variable, you would think you would retain the order(say, to concatenate rows that must be in order- say for XML or json), but you can't.
So, what do you do?
The answer is to force SQL to order it by using TOP in your select (just pick a number high enough to cover all your rows).
I have run into the same issue and one reason I have needed to preserve the order is when I try to use ROLLUP to get a weighted average based on the raw data and not an average of what is in that column. For instance, say I want to see the average of profit based on number of units sold by four store locations? I can do this very easily by creating the equation Profit / #Units = Avg. Now I include a ROLLUP in my GROUP BY so that I can also see the average across all locations. Now I think to myself, "This is good info but I want to see it in order of Best Average to Worse and keep the Overall at the bottom (or top) of the list)." The ROLLUP will fail you in this so you take a different approach.
Why not create row numbers based on the sequence (order) you need to preserve?
SELECT OrderBy = ROW_NUMBER() OVER(PARTITION BY 'field you want to count' ORDER BY 'field(s) you want to use ORDER BY')
, VibeFGEvents.*
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
Now you can use the OrderBy field from your table to set the order of values. I removed the ORDER BY statement from the query above since it does not affect how the data is loaded to the table.
I found this approach helpful to solve this problem:
WITH ordered as
(
SELECT TOP 1000
[Month]
FROM SourceTable
GROUP BY [Month]
ORDER BY [Month]
)
INSERT INTO DestinationTable (MonthStart)
(
SELECT * from ordered
)
Try using INSERT INTO instead of SELECT INTO
INSERT INTO VibeFGEventsAfterStudyStart
SELECT VibeFGEvents.*
FROM VibeFGEvents
LEFT OUTER JOIN VibeFGEventsStudyStart
ON
CHARINDEX(REPLACE(REPLACE(REPLACE(logName, 'MyVibe ', ''), ' new laptop', ''), ' old laptop', ''), excelFilename) > 0
AND VibeFGEventsStudyStart.MIN_TitleInstID <= VibeFGEvents.TitleInstID
AND VibeFGEventsStudyStart.MIN_WinInstId <= VibeFGEvents.WndInstID
WHERE VibeFGEventsStudyStart.excelFilename IS NOT NULL
ORDER BY VibeFGEvents.id`

Resources