SQL Server ORDER BY date and nulls last - sql-server

I am trying to order by date. I want the most recent dates coming in first. That's easy enough, but there are many records that are null and those come before any records that have a date.
I have tried a few things with no success:
ORDER BY ISNULL(Next_Contact_Date, 0)
ORDER BY ISNULL(Next_Contact_Date, 999999999)
ORDER BY coalesce(Next_Contact_Date, 99/99/9999)
How can I order by date and have the nulls come in last? The data type is smalldatetime.

smalldatetime has range up to June 6, 2079 so you can use
ORDER BY ISNULL(Next_Contact_Date, '2079-06-05T23:59:00')
If no legitimate records will have that date.
If this is not an assumption you fancy relying on a more robust option is sorting on two columns.
ORDER BY CASE WHEN Next_Contact_Date IS NULL THEN 1 ELSE 0 END, Next_Contact_Date
Both of the above suggestions are not able to use an index to avoid a sort however and give similar looking plans.
One other possibility if such an index exists is
SELECT 1 AS Grp, Next_Contact_Date
FROM T
WHERE Next_Contact_Date IS NOT NULL
UNION ALL
SELECT 2 AS Grp, Next_Contact_Date
FROM T
WHERE Next_Contact_Date IS NULL
ORDER BY Grp, Next_Contact_Date

According to Itzik Ben-Gan, author of T-SQL Fundamentals for MS SQL Server 2012, "By default, SQL Server sorts NULL marks before non-NULL values. To get NULL marks to sort last, you can use a CASE expression that returns 1 when the" Next_Contact_Date column is NULL, "and 0 when it is not NULL. Non-NULL marks get 0 back from the expression; therefore, they sort before NULL marks (which get 1). This CASE expression is used as the first sort column." The Next_Contact_Date column "should be specified as the second sort column. This way, non-NULL marks sort correctly among themselves." Here is the solution query for your example for MS SQL Server 2012 (and SQL Server 2014):
ORDER BY
CASE
WHEN Next_Contact_Date IS NULL THEN 1
ELSE 0
END, Next_Contact_Date;
Equivalent code using IIF syntax:
ORDER BY
IIF(Next_Contact_Date IS NULL, 1, 0),
Next_Contact_Date;

order by -cast([Next_Contact_Date] as bigint) desc

If your SQL doesn't support NULLS FIRST or NULLS LAST, the simplest way to do this is to use the value IS NULL expression:
ORDER BY Next_Contact_Date IS NULL, Next_Contact_Date
to put the nulls at the end (NULLS LAST) or
ORDER BY Next_Contact_Date IS NOT NULL, Next_Contact_Date
to put the nulls at the front. This doesn't require knowing the type of the column and is easier to read than the CASE expression.
EDIT: Alas, while this works in other SQL implementations like PostgreSQL and MySQL, it doesn't work in MS SQL Server. I didn't have a SQL Server to test against and relied on Microsoft's documentation and testing with other SQL implementations. According to Microsoft, value IS NULL is an expression that should be usable just like any other expression. And ORDER BY is supposed to take expressions just like any other statement that takes an expression. But it doesn't actually work.
The best solution for SQL Server therefore appears to be the CASE expression.

A bit late, but maybe someone finds it useful.
For me, ISNULL was out of question due to the table scan. UNION ALL would need me to repeat a complex query, and due to me selecting only the TOP X it would not have been very efficient.
If you are able to change the table design, you can:
Add another field, just for sorting, such as Next_Contact_Date_Sort.
Create a trigger that fills that field with a large (or small) value, depending on what you need:
CREATE TRIGGER FILL_SORTABLE_DATE ON YOUR_TABLE AFTER INSERT,UPDATE AS
BEGIN
SET NOCOUNT ON;
IF (update(Next_Contact_Date)) BEGIN
UPDATE YOUR_TABLE SET Next_Contact_Date_Sort=IIF(YOUR_TABLE.Next_Contact_Date IS NULL, 99/99/9999, YOUR_TABLE.Next_Contact_Date_Sort) FROM inserted i WHERE YOUR_TABLE.key1=i.key1 AND YOUR_TABLE.key2=i.key2
END
END

Use desc and multiply by -1 if necessary. Example for ascending int ordering with nulls last:
select *
from
(select null v union all select 1 v union all select 2 v) t
order by -t.v desc

I know this is old but this is what worked for me
Order by Isnull(Date,'12/31/9999')

I think I found a way to show nulls in the end and still be able to use indexes for sorting.
The idea is super simple - create a calculatable column which will be based on existing column, and put an index on it.
ALTER TABLE dbo.Users
ADD [FirstNameNullLast]
AS (case when [FirstName] IS NOT NULL AND (ltrim(rtrim([FirstName]))<>N'' OR [FirstName] IS NULL) then [FirstName] else N'ZZZZZZZZZZ' end) PERSISTED
So, we are creating a persisted calculatable column in the SQL, in that column all blank and null values will be replaced by 'ZZZZZZZZ', this will mean, that if we will try to sort based on that column, we will see all the null or blank values in the end.
Now we can use it in our new index.
Like this:
CREATE NONCLUSTERED INDEX [IX_Users_FirstNameNullLast] ON [dbo].[Users]
(
[FirstNameNullLast] ASC
)
So, this is an ordinary nonclustered index. We can change it however we want, i.e. include extra columns, increase number of indexes columns, change sorting order etc.

I know this is a old thread, but in SQL Server nulls are always lower than non-null values. So it's only necessary to order by Desc
In your case Order by Next_Contact_Date Desc should be enough.
Source: order by with nulls- LearnSql

Related

How can I check and remove duplicate rows?

Have problem with quite big table, where are some null values in 3 columns - datetime2 (and 2 float columns).
Nice simple request from similar question returns only 2 rows where datetime2 is null, but nothing else (same as lot of others):
DELETE FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, allRemainingCols
FROM MyTable
GROUP BY allRemainingCols
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
Seems to work without datetime2 column having nulls ??
There is manual workaround, but is there any way to create request or procedure using TSQL only ?
SELECT id,remainingColumns
FROM table
order BY remainingColumns
Compare all columns in XL (15 in my case, placed =ROW() in first column as a check and formula next to last column + auto filter for TRUEs): =AND(B1=B2;C1=C2;D1=D2;E1=E2;F1=F2;G1=G2;H1=H2;I1=I2;J1=J2;K1=K2;L1=L2;M1=M2;N1=N2;O1=O2;P1=P2)
Or compare 3 rows like this and select all non-unique rows
=OR(
AND(B1=B2;C1=C2;D1=D2;E1=E2;F1=F2;G1=G2;H1=H2;I1=I2;J1=J2;K1=K2;L1=L2;M1=M2;N1=N2;O1=O2;P1=P2);
AND(B2=B3;C2=C3;D2=D3;E2=E3;F2=F3;G2=G3;H2=H3;I2=I3;J2=J3;K2=K3;L2=L3;M2=M3;N2=N3;O2=O3;P2=P3)
)
Quite much work to find my particular data/answer...
Most of float numbers were slightly different.
Hard to find, but simple CAST(column as binary) can show these invisible differences...
Like 96,6666666666667 vs 0x0000000000000000000000000000000000000000000040582AAAAAAAAAAD vs 0x0000000000000000000000000000000000000000000040582AAAAAAAAAAB etc.
And visible 96.6666666666667 can return something different way again:
0x0000000000000000000000000000000000000F0D0001AB6A489F2D6F0300

Using indexes when comparing datetimes

I have two tables, both of which containing millions of rows of data.
tbl_one:
purchasedtm DATETIME,
userid INT,
totalcost INT
tbl_two:
id BIGINT,
eventdtm DATETIME,
anothercol INT
The first table has a clustered index on the first two columns: CLUSTERED INDEX tbl_one_idx ON(purchasedtm, userid)
The second one has a primary key on its ID column, and also a non-clustered index on the eventdtm column.
I want to run a query which looks for rows in which purchasedtm and eventdtm are on the same day.
Originally, I wrote my query as:
WHERE CAST(tbl_one.purchasedtm AS DATE) = CAST(tbl_two.eventdtm AS DATE)
But this was not going to use either of the two indexes.
Later, I changed my query to this:
WHERE tbl_one.purchasedtm >= CAST(tbl_two.eventdtm AS DATE)
AND tbl_one.purchasedtm < DATEADD(DAY, 1, CAST(tbl_two.eventdtm AS DATE))
This way, because only one side of the comparison is wrapped in a function, the other side can still use its index. Correct?
I also have some additional questions:
I can write the query the other way around too, i.e. keeping tbl_two.eventdtm untouched and wrapping tbl_one.purchasedtm in CAST(). Would that make a difference in performance?
If the answer to the previous question is yes is it because eventdtm has its own dedicated index, while looking up purcahsedtm would only be a partial index match?
Are there other factors I can take into consideration for deciding which of the two choices is better? (For example, if there are millions of rows in tbl_one but billions of rows in tbl_two, would that impact which column I should CAST and which one I should not?)
In genera, if you compare two columns that are both indexed, would we gain any performance compared to a similar scenario in which only one of them is indexed?
And lastly, can I perform my original task without using CAST?
Note: I do not have the ability to create or modify indexes, add columns, etc.
Little. late after commenting but...
As discussed in the comments, code such as CAST(DateTimeColumn AS date) is actually SARGable. Rob Farley posted an article on some of the SARGable and non-SARGable functionality here, however, I'll cover a few things off anyway.
Firstly, applying a function to a column will normally make your query non-SARGable, and especially if it changes the order of the values or the order of them is meaningless. Take something like:
SELECT *
FROM TABLE
WHERE RIGHT(COLUMN,5) = 'value';
The order of the values in the column are utterly unhelpful here, as we're focusing on the right hand characters. Unfortunately, as Rob also discusses:
SELECT *
FROM TABLE
WHERE LEFT(COLUMN,5) = 'value';
This is also non-SARGable. However what about the following?
SELECT *
FROM TABLE
WHERE Column LIKE 'value%';
This is, as the logic isn't applied to the column and the order doesn't change. If the value wehre '%value%' then that too would be non-SARGable.
When applying logic that adds (or subtracts) what you want to find, you always want to apply that to the literal value (or function, like GETDATE()`). For example one of these expressions is SARGable the other is not:
Column + 1 = #Variable --non-SARGable
Column = #Variable - 1 --SARGable
The same applies to things like DATEADD
#DateVariable BETWEEN DateColumn AND DATEADD(DAY, 30,DateColumn) --non-SARGable
DateColumn BETWEEN DATEADD(DAY, -30, #DateVariable) AND #DateVariable --SARGable
Changing the datatype (other than to a date) rarely will keep a query SARGable. CONVERT(date,varchardate,112) will not be SARGable, even though the order of the column is unchanged. Converting an decimal to an int, however, had the same result as converting a datetime to a date, and kept SARGability:
CREATE TABLE testtab (n decimal(2,1) PRIMARY KEY CLUSTERED);
INSERT INTO testtab
VALUES(0.1),
(0.3),
(1.1),
(1.7),
(2.4);
GO
SELECT n
FROM testtab
WHERE CONVERT(int,n) = 2;
GO
DROP TABLE testtab;
Hopefully, that gives you enough to go on, but pelase do ask if you want me to add anything further.

SQL Server : Select ... From with FULL JOIN with a Default value for a column

I created a table, tblNewParts with 3 columns:
NewCustPart
AddedDate
Handled
and I am trying to FULL JOIN it to an existing table, tblPartsWorkedOn.
tblNewParts is defined to have Handled defaulted to 'N'...
SELECT *
FROM dbo.tblPartsWorkedOn AS BASE
FULL JOIN dbo.tblNewParts AS ADDON ON BASE.[CustPN] = ADDON.[NewCustPart]
WHERE ADDON.[Handled] IS NULL
ORDER BY [CustPN] DESC
And I want the field [Handled] to come back as 'N' instead of NULL when I run the query. The problem is that when there aren't any records in the new table, I get NULL's instead of 'N's.
I saw a SELECT CASE WHEN col1 IS NULL THEN defaultval ELSE col1 END as a mostly suitable answer from here. I am wondering if this will work in this instance, and how would I write that in T-SQL for SQL Server 2012? I need all of the columns from both tables, rather than just the one.
I'm making this a question, rather than a comment on the cited link, so as to not obscure the original link's question.
Thank you for helping!
Name the column (alias.column_name) in select statement and use ISNULL(alias.column,'N').
Thanks
After many iterations I found the answer, it's kind of bulky but here it is anyway. Synopsis:
Yes, the CASE statement does work, but it gives the output as an unnamed column. Also, in this instance to get all of the original columns AND the corrected column, I had to use SELECT *, CASE...END as [ColumnName].
But, here is the better solution, as it will place the information into the correct column, rather than adding a column to the end of the table and calling that column 'Unnamed Column'.
Select [ID], [Seq], [Shipped], [InternalPN], [CustPN], [Line], [Status],
CASE WHEN ADDON.[NewCustPart] IS NULL THEN BASE.[CustPN] ELSE
ADDON.[NewCustomerPart] END as [NewCustPart],
GetDate() as [AddedDate],
CASE WHEN ADDON.[Handled] IS NULL THEN 'N' ELSE ADDON.[Handled] END as [Handled]
from dbo.tblPartsWorkedOn as BASE
full join dbo.tblNewParts as AddOn ON Base.[CustPN] = AddOn.NewCustPart
where AddOn.Handled = 'N' or AddOn.Handled is null
order by [NewCustPart] desc
This sql code places the [CustPN] into [NewCustPart] if it's null, it puts a 'N' into the field [Handled] if it's null and it assigns the date to the [AddedDate] field. It also only returns records that have not been handled, so that you get the ones that need to be looked at; and it orders the resulting output by the [NewCustPart] field value.
Resulting Output looks something like this: (I shortened the DateTime for the output here.)
[ID] [SEQ] [Shipped] [InternalPN] [CustPN] [Status] [NewCustPart] [AddedDate] [Handled]
1 12 N 10012A 10012A UP 10012A 04/02/2016 N
...
Rather than with the nulls:
[ID] [SEQ] [Shipped] [InternalPN] [CustPN] [Status] [NewCustPart] [AddedDate] [Handled]
1 12 N 10012A 10012A UP NULL NULL NULL
...
I'm leaving this up, and just answering it rather than deleting it, because I am fairly sure that someone else will eventually ask this same question. I think that lots of examples showing how and why something is done, is a very helpful thing to have as not everything can be generalized. Just some thoughts and I hope that this helps someone else!

Optimize SQL in MS SQL Server that returns more than 90% of records in the table

I have the below sql
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE ( BP.IsDeleted IS NULL
OR BP.IsDeleted = 0 )
GROUP BY BP.BoundProjectId
I already have an index on the table BoundProducts on this column order (BoundProjectId, IsDeleted)
Currently this query takes around 2-3 seconds to return the result. I am trying to reduce it to zero seconds.
This query returns 25077 rows as of now.
Please provide me any ideas to improvise the query.
Looking at this in a bit different point of view, I can think that your OR condition is screwing up your query, why not to rewrite it like this?
SELECT CAST(FORMAT(SUM(COALESCE(BP.InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet
, BP.BoundProjectId AS ProjectId
FROM (
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted IS NULL
UNION ALL
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted = 0
) AS BP
GROUP BY BP.BoundProjectId;
I've had better experience with UNION ALL rather than OR.
I think it should work totally the same. On top of that, I'd create this index:
CREATE NONCLUSTERED INDEX idx_BoundProducts_IsDeleted_BoundProjectId_iInstalledSubTotal
ON dbo.BoundProducts (IsDeleted, BoundProjectId)
INCLUDE (InstalledSubTotal);
It should satisfy your query conditions and seek index quite well. I know it's not a good idea to index bit fields, but it's worth trying.
P.S. Why not to default your IsDeleted column value to 0 and make it NOT NULLABLE? By doing that, it should be enough to do a simple check WHERE IsDeleted = 0, that'd boost your query too.
If you really want to try index seek, it should be possible using query hint forceseek, but I don't think it's going to make it any faster.
The options I suggested last time are still valid, remove format and / or create an indexed view.
You should also test if the problem is the query itself or just displaying the results after that, for example trying it with "select ... into #tmp". If that's fast, then the problem is not the query.
The index name in the screenshot is not the same as in create table statement, but I assume that's just a name you changed for the question. If the scan is happening to another index, then you should include that too.

SQL with unknown purpose

I have to change some SQL queries (SQL Server 2005) done by another person and in that code I see often the following construction:
SELECT fieldA, SUM(CASE fieldB WHEN null THEN 0 ELSE fieldB END) as AliasName FROM ...
I don't understand the case statement because as far as I know, null can not be checked within a case and therefore I think that the above code does the same as:
SELECT fieldA, SUM(fieldB) as AliasName FROM ...
I have also done some tests and have not seen any differences in the result. Am I missing something, or can I replace the upper statement through the short one?
UPDATE
Only for completeness because it's not mentioned in the answers: The upper code returns the same result as the lower. The used case construction does not replace null's through zero's and therefore it can be ommited. If the purpose of the original sql was to make sure that never null will be returned, the coalesce or the isnull-operator can be used (as stated in the answers).
The output of your second statement will contain nulls (when aggregating records that only have null values for fieldB). If you don't mind that, you're ok.
If you want zeros in your output rather than null values, use this:
select fieldA, sum(isnull(fieldB, 0)) as AliasName from ...
You would achieve this more readably with
SELECT fieldA, COALESCE(fieldB, 0) as AliasName

Resources