SQL Server ORDER BY Multiple values in case statement - sql-server

I am trying to do an order by with several different columns. The first column has several conditions.
The base order was something like
SELECT *
FROM Table T
ORDER BY T.A, T.B, T.C
Most of the time column A is an int. Sometimes A is an int with a letter appended at the end. I want the ORDER BY to sort by the number portion. I was able to achieve that by modifying the query to the following, which has been working for months now.
SELECT *
FROM Table T
ORDER BY
CASE WHEN ISNUMERIC(T.A) = 0 THEN CAST(SUBSTRING(T.A, 1, PATINDEX('%[^0-9]%', T.A) - 1) AS INT)
ELSE CAST(T.A AS INT) END
, T.B, T.C
Recently a new requirement came up which allows the value of "OFFLINE" to exist in column A.
I want to modify the ORDER BY to keep the same logic as before with the exception of all records with "OFFLINE" are at the end.

First, I am very sorry about your requirements; it makes absolutely no sense that bizarrely mixed values are being stored in a varchar column.
Second, try this:
CASE WHEN T.A = 'OFFLINE'
THEN 2147483647 --This is the maximum signed int value
ELSE
(CASE WHEN ISNUMERIC(T.A) = 0
THEN CAST(SUBSTRING(T.A, 1, PATINDEX('%[^0-9]%', T.A) - 1) AS INT)
ELSE CAST(T.A AS INT)
END)
END
Show that to whoever is in charge of your data model, and politely ask them to store meaningful numeric data in a numeric column.
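As an alternative to the sentinel value, here is a minimal sketch that sorts on a separate flag first, so no magic number is needed. The sample values are hypothetical, and T.A stands in for the question's column; the real table and column names may differ.

```sql
-- Hypothetical sample rows; in the real query T would be the actual table.
SELECT T.A
FROM (VALUES ('10'), ('2'), ('2B'), ('OFFLINE'), ('1')) AS T(A)
ORDER BY
    -- 1) Push every OFFLINE row behind everything else.
    CASE WHEN T.A = 'OFFLINE' THEN 1 ELSE 0 END,
    -- 2) Sort the remaining rows by their leading digits.
    CASE WHEN T.A = 'OFFLINE' THEN NULL
         WHEN PATINDEX('%[^0-9]%', T.A) > 0
             THEN CAST(LEFT(T.A, PATINDEX('%[^0-9]%', T.A) - 1) AS INT)
         ELSE CAST(T.A AS INT)
    END,
    -- 3) Break ties (e.g. '2' vs '2B') on the raw value.
    T.A;
-- Expected order: 1, 2, 2B, 10, OFFLINE
```

The flag keeps the "goes last" rule independent of the numeric range, so a genuine value of 2147483647 would still sort ahead of OFFLINE.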

Related

Is there a good expression for the maximum character value in an SQL collation?

I'm building a simple SQL report that assembles possible product titles from several tables and displays unnamed products last. We have a setup that allows individual locations to override the product title from a master product table, and I thought I could do something like
SELECT a.ProdCode, COALESCE(a.ProdNameOverride, m.ProdName, '') AS ProdName
FROM ProdInventory a
INNER JOIN MasterProdTable m ON a.ProdCode = m.ProdCode
WHERE a.ProdLocation = @ReportProdLocation
ORDER BY COALESCE(a.ProdNameOverride, m.ProdName, char(255)) ASC, a.ProdCode ASC
because, hey, char(255) has to sort after all of the other possible ASCII characters, right?
Well, no. It's a diacritical Y, which in the standard (SQL_Latin1_General_CP1_CI_AS) collation gets sorted before Z.
I eventually just resorted to brute force, finding that char(254) sorted after the conventional alphanumerics, but that got me curious - is there a reliable way to assign something "the last possible value in the relevant collation"?
If you want to find the last character in a varchar collation, you can just create a table with all possible characters and sort it, e.g.:
declare @chars table
(
CodePoint binary(1) primary key,
Character char(1) collate Arabic_CI_AI_KS_WS
)
declare @codePoint binary(1) = 0x0
while (@codePoint < 255)
begin
insert into @chars(CodePoint,Character)
values (@codePoint, cast(@codePoint as char(1)));
set @codePoint += 1;
end
Select *
from @chars
order by Character
Even if it were reliable, sort by what you actually want last.
This report is assuming nobody could ever enter a product override that began with the last possible character. You can't be sure of that, even if char(254) doesn't show up on conventional keyboards.
If you want products to appear last when they have a NULL override and main product name, construct a sort using CASE WHEN around that condition, as:
SELECT a.ProdCode, COALESCE(a.ProdNameOverride, m.ProdName, '') AS ProdName
FROM ProdInventory a
INNER JOIN MasterProdTable m ON a.ProdCode = m.ProdCode
WHERE a.ProdLocation = @ReportProdLocation
ORDER BY CASE WHEN a.ProdNameOverride IS NULL AND m.ProdName IS NULL THEN 1 ELSE 0 END ASC,
COALESCE(a.ProdNameOverride, m.ProdName) ASC, a.ProdCode ASC

Order by string returns wrong order

I am using a column which is named ItemCode. ItemCode is Varchar(50) type.
Here is my query
Select * from Inventory order by ItemCode
So now my result looks like
ItemCode-1
ItemCode-10
ItemCode-2
ItemCode-20
And so on.
How can I order my string as the example below?
ItemCode-1
ItemCode-2
ItemCode-10
ItemCode-20
Should I convert my column to a number? I should also mention that some values contain no number.
You could order by the numbers as
SELECT Str
FROM
(
VALUES
('ItemCode-1'),
('ItemCode-10'),
('ItemCode-2'),
('ItemCode-20')
) T(Str)
ORDER BY CAST(RIGHT(Str, LEN(Str) - CHARINDEX('-', Str)) AS INT)
Note: Since you tagged your question with SQL Server 2008, you should upgrade as soon as possible because it's out of support.
UPDATE:
Since you didn't provide good sample data, I'm just guessing.
Here is another way that may fit your requirements
SELECT Str
FROM
(
VALUES
('ItemCode-1'),
('ItemCode-10'),
('ItemCode-2'),
('ItemCode-20'),
('Item-Code')
) T(Str)
ORDER BY CASE WHEN Str LIKE '%[0-9]' THEN CAST(RIGHT(Str, LEN(Str) - CHARINDEX('-', Str)) AS INT) ELSE 0 END
This is expected behavior, based on this:
that is lexicographic sorting which means basically the language
treats the variables as strings and compares character by character
You need to use something like this:
ORDER BY
CASE WHEN ItemCode like '%[0-9]%'
THEN Replicate('0', 100 - Len(ItemCode)) + ItemCode
ELSE ItemCode END
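Another way to get a natural order, sketched under the assumption that codes look like prefix-number and that SQL Server 2012 or later is available (TRY_CAST does not exist in 2008): sort by the text before the hyphen, then by the numeric part. The sample values are the ones used above.

```sql
SELECT Str
FROM (VALUES ('ItemCode-1'), ('ItemCode-10'), ('ItemCode-2'),
             ('ItemCode-20'), ('Item-Code')) AS T(Str)
ORDER BY
    -- Text before the first hyphen; appending '-' guards against values with no hyphen.
    LEFT(Str, CHARINDEX('-', Str + '-') - 1),
    -- Text after the first hyphen as a number; NULL when it is not numeric.
    TRY_CAST(SUBSTRING(Str, CHARINDEX('-', Str + '-') + 1, 50) AS INT),
    -- Fall back to the raw string for ties.
    Str;
-- Expected order: Item-Code, ItemCode-1, ItemCode-2, ItemCode-10, ItemCode-20
```

Unlike the zero-padding approach, this does not depend on a fixed maximum length, but it only handles a single trailing number after the first hyphen.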

Grouping data into fuzzy gaps and islands

This is essentially a gaps-and-islands problem, but an atypical one. I cut the example down to the bare minimum. I need to identify gaps that exceed a certain threshold, and duplicates can't be a problem, although this example removes them.
In any case the common solution of using ROW_NUMBER() doesn't help since gaps of even 1 can't be handled and the gap value is a parameter in 'real life'.
The code below actually works correctly. And it's super fast! But if you look at it you'll see why people are rather gun shy about relying upon it. The method was first published 9 years ago here http://www.sqlservercentral.com/articles/T-SQL/68467/ and I've read all 32 pages of comments. Nobody has successfully poked holes in it other than to say "it's not documented behavior". I've tried it on every version from 2005 to 2019 and it works.
The question is, beyond using a cursor or while loop to look at many millions of rows 1 by 1 - which takes, I don't know how long because I cancel after 30 min. - is there a 'supported' way to get the same results in a reasonable amount of time? Even 100x slower would complete 4M rows in 10 minutes and I can't find a way to come close to that!
CREATE TABLE #t (CreateDate date not null
,TufpID int not null
,Cnt int not null
,FuzzyGroup int null);
ALTER TABLE #t ADD CONSTRAINT PK_temp PRIMARY KEY CLUSTERED (CreateDate,TufpID);
-- Takes 40 seconds to write 4.4M rows from a source of 70M rows.
INSERT INTO #T
SELECT X.CreateDate
,X.TufpID
,Cnt = COUNT(*)
,FuzzyGroup = null
FROM SessionState SS
CROSS APPLY(VALUES (CAST(SS.CreateDate as date),SS.TestUser_Form_Part_id)) X(CreateDate,TufpID)
GROUP BY X.CreateDate
,X.TufpID
ORDER BY x.CreateDate,x.TufpID;
-- Takes 6 seconds to update 4.4M rows. They WILL update in clustered index order!
-- (Provided all the rules are followed - see the link above)
DECLARE @FuzzFactor int = 38
DECLARE @Prior int = -@FuzzFactor; -- Ensure 1st row has its own group
DECLARE @Group int;
DECLARE @CDate date;
UPDATE #T
SET @Group = FuzzyGroup = CASE WHEN t.TufpID - @Prior < @FuzzFactor AND t.CreateDate = @CDate
THEN @Group ELSE t.TufpID END
,@CDate = CASE WHEN @CDate = t.CreateDate THEN @CDate ELSE t.CreateDate END
,@Prior = CASE WHEN @Prior = t.TufpID-1 THEN @Prior + 1 ELSE t.TufpID END
FROM #t t WITH (TABLOCKX) OPTION(MAXDOP 1);
After the above executes, the FuzzyGroup column contains the lowest value of TufpID in the group. In other words, the first row (in clustered index order) contains the value of its own TufpID column. Thereafter every row gets the same value until the date changes or the gap size (in this case 38) is reached or exceeded. In those cases the current TufpID becomes the value put into FuzzyGroup until another change is detected. So after 6 seconds I can run queries that group by FuzzyGroup and analyze the islands.
In practice I do some running counts and totals as well in the same pass and so it takes 8 seconds not 6 but I could do those things with window functions pretty easily if I need to so I left them off.
This is the smallest table and I'll eventually need to handle 100M rows. Thus 10 minutes for 4.4M is probably not good enough but it's a place to start.
This should be reasonably efficient and avoid relying on undocumented behaviour
WITH T1
AS (SELECT *,
PrevTufpID = LAG(TufpID)
OVER (PARTITION BY CreateDate
ORDER BY TufpID)
FROM #T),
T2
AS (SELECT *,
_FuzzyGroup = MAX(CASE
WHEN PrevTufpID IS NULL
OR TufpID - PrevTufpID >= @FuzzFactor
THEN TufpID
END)
OVER (PARTITION BY CreateDate
ORDER BY TufpID ROWS UNBOUNDED PRECEDING)
FROM T1)
UPDATE T2
SET FuzzyGroup = _FuzzyGroup
The execution plan has a single ordered scan through the clustered index, with the row values then flowing through some window function operators and into the update.
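To see the LAG plus running MAX technique on a handful of rows, here is a small sketch. The Sample CTE and its values are hypothetical stand-ins for #T, and the threshold is set to 3 only to keep the demo short.

```sql
DECLARE @FuzzFactor int = 3;  -- hypothetical small threshold for the demo

WITH Sample AS (
    -- Hypothetical rows standing in for #T (CreateDate, TufpID).
    SELECT CreateDate = CAST(d AS date), TufpID = id
    FROM (VALUES ('2020-01-01', 1), ('2020-01-01', 2), ('2020-01-01', 4),
                 ('2020-01-01', 10), ('2020-01-01', 11),
                 ('2020-01-02', 12)) AS v(d, id)
),
T1 AS (
    -- Previous TufpID within the same date; NULL on the first row of each date.
    SELECT *, PrevTufpID = LAG(TufpID) OVER (PARTITION BY CreateDate ORDER BY TufpID)
    FROM Sample
)
SELECT CreateDate, TufpID,
       -- A row starts a new island when it is first in its date or the gap reaches
       -- the threshold; the running MAX carries that start value forward.
       FuzzyGroup = MAX(CASE WHEN PrevTufpID IS NULL
                               OR TufpID - PrevTufpID >= @FuzzFactor
                             THEN TufpID END)
                    OVER (PARTITION BY CreateDate ORDER BY TufpID ROWS UNBOUNDED PRECEDING)
FROM T1
ORDER BY CreateDate, TufpID;
-- Expected FuzzyGroup values, in row order: 1, 1, 1, 10, 10, 12
```

The running MAX works as a "carry forward" because TufpID is ascending within each date, so the most recent island start is always the largest one seen so far.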

TSQL query optimizer view on non-nullable ISNULL()

As part of some dynamic SQL (ick), I've implemented the 'sort NULLs last' solution described here: Sorting null-data last in database query
ORDER BY CASE column WHEN NULL THEN 1 ELSE 0 END, column
My question is: On non-nullable columns that have ISNULL() applied to them, will the query optimizer strip this out when it realises that it will never apply?
It's not clear why your question mentions the ISNULL function when that isn't in your code.
ORDER BY CASE column WHEN NULL THEN 1 ELSE 0 END, column
First of all, this code doesn't work. It is equivalent to CASE WHEN column = NULL, and a comparison with NULL never evaluates to true, so the ELSE branch is always taken. That is not what you need.
It would need to be
ORDER BY CASE WHEN column IS NULL THEN 1 ELSE 0 END, column
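A quick way to see the difference between the two forms, as a sketch with made-up values and assuming the default ANSI_NULLS ON setting:

```sql
SELECT x,
       -- Simple CASE compares x = NULL, which is never true.
       SimpleCase   = CASE x WHEN NULL THEN 1 ELSE 0 END,
       -- Searched CASE with IS NULL detects the NULL row.
       SearchedCase = CASE WHEN x IS NULL THEN 1 ELSE 0 END
FROM (VALUES (1), (NULL)) AS v(x);
-- x = 1    -> SimpleCase 0, SearchedCase 0
-- x = NULL -> SimpleCase 0, SearchedCase 1
```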
The optimisation question is easy to test.
CREATE TABLE #T
(
X INT NOT NULL PRIMARY KEY
)
SELECT *
FROM #T
ORDER BY X
SELECT *
FROM #T
ORDER BY CASE WHEN X IS NULL THEN 1 ELSE 0 END, X
DROP TABLE #T
The second query's plan contains a sort operation, indicating that the expression was not optimised out as you hoped and that the pattern is less efficient than ORDER BY X.

SELECT DISTINCT CAST (CASE (WHEN X IN)) THEN 1 ELSE 0 END AS BIT) AS INTO Returns Duplicate Records

I am trying to select a list of patients who have a diabetes diagnosis code at any point in the past ~ 1 1/2 years and return either a 1 (patient does have a diabetes diagnosis code) or 0 (patient does not have a diabetes diagnosis code) into a hash table for later use. I am getting the 1's and 0's that I wanted, however for many patients I am returning both a 1 and a 0. I've tried troubleshooting as best I can, but haven't been able to return just a single value per patient. Here is the query I've been using.
IF object_id('tempdb..#t4') IS NOT NULL DROP TABLE #t4
SELECT DISTINCT CAST(
CASE
WHEN (diag_code IN ('250.00','250.01','250.02','250.03','250.10','250.11',
'250.12','250.13','250.20','250.21','250.22','250.23','250.30','250.31',
'250.32','250.33','250.40','250.41','250.42','250.43','250.50','250.51',
'250.52','250.53','250.60','250.61','250.62','250.63','250.70','250.71',
'250.72','250.73','250.80','250.81','250.82','250.83','250.90','250.91',
'250.92','250.93','357.2','362.01','362.02','362.03','362.04','362.05',
'362.06','362.07','366.41','648.00','648.01','648.02','648.03','648.04',
'111552007','111558006','11530004','123763000','127013003','127014009',
'190321005','190328004','190330002','190331003','190336008','190353001',
'190361006','190368000','190369008','190371008','190372001','190383005',
'190389009','190390000','190392008','190406000','190407009','190410002',
'190411003','190412005','190416001','190417004','190418009','190419001',
'190422004','193184006','197605007','198609003','199223000','199227004',
'199229001','199230006','199231005','199234002','201250006','201251005',
'201252003','23045005','230572002','230577008','237599002','237600004',
'237601000','237604008','237613005','237618001','237619009','237627000',
'25907005','26298008','267379000','267380002','2751001','275918005','28032008',
'28453007','290002008','309426007','310387003','311366001','312912001','313435000',
'313436004','314537004','314771006','314772004','314893005','314902007','314903002',
'33559001','34140002','359611005','359638003','359642000','360546002','371087003',
'38542009','39058009','39181008','408539000','408540003','413183008','414890007',
'414906009','420414003','420422005','421750000','421847006','421895002','422183001',
'422228004','422275004','423263001','424736006','424989000','425159004','425442003',
'426705001','426875007','427089005','428896009','42954008','44054006','4627003',
'46635009','50620007','51002006','5368009','54181000','57886004','59079001','5969009',
'70694009','73211009','74263009','75524006','75682002','76751001','81531005','81830002',
'8801005','91352004','9859006','31','E10.36','E11.36','E11.9','E13','E13.32','E13.33',
'E13.34','E13.35','E13.43','O24.42'))
THEN 1
ELSE 0
END AS bit) as DM,
patientid
INTO #t4
FROM patient_diag
WHERE dateofservice >= CONVERT(DATETIME, '01-01-2014')
Is this something that would be better solved using EXISTS? I tried that as well, but struggled to return the desired results.
Is this something that would be better solved using EXISTS?
Yes. If a patient had both codes that are in the list and codes that aren't you'll get both 1s and 0s. EXISTS would work, but a MAX might be faster:
SELECT CAST(MAX(
CASE
WHEN (diag_code IN ...
'E13.34','E13.35','E13.43','O24.42'))
THEN 1
ELSE 0
END) AS bit) as DM, -- MAX does not accept bit, so aggregate the int and cast afterwards
patientid
INTO #t4
FROM patient_diag
WHERE dateofservice >= CONVERT(DATETIME, '01-01-2014')
GROUP BY patientid
Note that maintenance will be easier (and performance may be better) if you put all of those codes in a separate table rather than hard-coding them in a query. Or, if there's a master codes table, you could add an attribute that is used by the query.
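A rough sketch of the lookup-table idea follows. The table name #DiabetesCodes is hypothetical, only three codes from the list above are loaded for brevity, and in practice a permanent table would probably be a better home than a temp table.

```sql
-- Hypothetical lookup table; DATABASE_DEFAULT avoids a collation clash with tempdb.
CREATE TABLE #DiabetesCodes
(
    diag_code varchar(20) COLLATE DATABASE_DEFAULT NOT NULL PRIMARY KEY
);

-- Only a small subset of the codes from the query above, for illustration.
INSERT INTO #DiabetesCodes (diag_code)
VALUES ('250.00'), ('E11.9'), ('44054006');

-- One row per patient: DM = 1 if any diagnosis in the period matches a listed code.
SELECT pd.patientid,
       DM = CAST(MAX(CASE WHEN dc.diag_code IS NOT NULL THEN 1 ELSE 0 END) AS bit)
INTO #t4
FROM patient_diag AS pd
LEFT JOIN #DiabetesCodes AS dc
       ON dc.diag_code = pd.diag_code
WHERE pd.dateofservice >= '20140101'
GROUP BY pd.patientid;
```

The LEFT JOIN keeps patients with no matching code so they still get a 0, and GROUP BY guarantees a single row per patient.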

Resources