determine order of operations in query - sql-server

Say I have a query like this:
SELECT *
FROM Foo
WHERE Name IN ('name1', 'name2')
AND (Date<'2013-01-01' AND Date>'2010-01-01')
AND Type = 1
Is there a way to force SQL Server to evaluate the expressions in the order I determine, and not the order the query optimizer chooses? For example, I want the IN clause evaluated first, the output of that filtered by Type = 1, and finally the dates, in EXACTLY that order.

Yes, it is largely possible (though there are some caveats and counterexamples, discussed in the answers here):
SELECT *
FROM Foo
WHERE 1 = CASE
              WHEN Name IN ( 'name1', 'name2' ) THEN
                  CASE
                      WHEN Type = 1 THEN
                          CASE
                              WHEN ( Date < '2013-01-01'
                                     AND Date > '2010-01-01' ) THEN 1
                          END
                  END
          END
But why bother? There are only very limited circumstances in which I can see this being useful (e.g. guarding a predicate that could throw an error, such as a divide by zero, behind an earlier predicate that filters out the offending rows).
Wrapping the predicates up like this makes the query completely unsargable and prevents index usage for any of the three (otherwise sargable) predicates. It guarantees a full scan reading all rows.
To see an example of this
CREATE TABLE Foo
(
Id INT IDENTITY PRIMARY KEY,
Name VARCHAR(10),
[Date] DATE,
[Type] TINYINT,
Filler CHAR(8000) NULL
)
CREATE NONCLUSTERED INDEX IX_Name
ON Foo(Name)
CREATE NONCLUSTERED INDEX IX_Date
ON Foo(Date)
CREATE NONCLUSTERED INDEX IX_Type
ON Foo(Type)
INSERT INTO Foo
(Name,
[Date],
[Type])
SELECT TOP (100000) 'name' + CAST(0 + CRYPT_GEN_RANDOM(1) AS VARCHAR),
DATEADD(DAY, 7 * CRYPT_GEN_RANDOM(1), '2012-01-01'),
0 + CRYPT_GEN_RANDOM(1)
FROM master..spt_values v1,
master..spt_values v2
Then running the original query in the question vs. this query gives the following plans.
Note the second query is costed as being 100% of the cost of the batch.
The query optimizer, left to its own devices, first seeks into the 414 rows matching the Type predicate and uses them as the build input for a hash table. It then seeks into the 728 rows matching the Name, checks each against the hash table, and for the 4 that match it performs a key lookup for the other columns and evaluates the Date predicate against those. Finally it returns the single matching row.
The second query just ploughs through all the rows in the table and evaluates the predicates in the desired order. The difference in number of pages read is pretty significant.
Original Query
Table 'Foo'. Scan count 3, logical reads 23,
Table 'Worktable'. Scan count 0, logical reads 0
Nested case
Table 'Foo'. Scan count 1, logical reads 100373

Short answer: NO!
You can try to use brackets, hints, study the query plan, etc.
But is it wise to mess with the engine/optimizer that way?
You will need a lot of study and experience to outsmart the optimizer; that said, please let the engine take care of those details for you.

Related

SQL Server index on optional columns

In my scenario I have a table with a lot of optional columns (20 columns in total, say from col00 to col19; every column contains a non-nullable integer).
When a column contains a 0 it's considered empty; any other value has a meaning.
Any subset of those 20 columns could be queried, so I might query for col01 = int1 and col17 = int2.
I need to improve the performance of such queries, but I don't know how to create a representative index.
Surely I could monitor the table for a while and see which column subsets are searched most often, but this is not a satisfactory solution for me (the table is regenerated every few months..and the "tags" encoded that way may change)
I think the best you'll be able to do is to index every column by itself, then use the set operator INTERSECT in a subquery of your WHERE clause.
INTERSECT returns distinct rows that are output by both the left and right input queries. So if you select the primary key of the table in the INTERSECT, you have a good subquery that can be used in a WHERE clause. This will require you to rewrite your queries, however.
Example:
SELECT *
FROM tablename
WHERE primary_key IN (
SELECT primary_key FROM tablename WHERE col01 = int1
INTERSECT
SELECT primary_key FROM tablename WHERE col17 = int2
)
That should be sargable, if col01 and col17 each have their own index.
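A minimal sketch of those prerequisite single-column indexes (index names are assumptions; repeat for each optional column you expect to filter on):
CREATE NONCLUSTERED INDEX IX_tablename_col01 ON tablename (col01);
CREATE NONCLUSTERED INDEX IX_tablename_col17 ON tablename (col17);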

Using SQLServer contains for partial words

We run many product searches against a huge catalog, matching partial barcodes.
We started with a simple like query
select * from products where barcode like '%2345%'
But that takes way too long since it requires a full table scan.
We thought a full-text search would be able to help us here, using CONTAINS:
select * from products where contains(barcode, '2345')
But it seems CONTAINS doesn't support finding words that partially contain a text; it supports only a full word match or a prefix. (And in this example we're looking for '2345' in the middle of the barcode '123456'.)
My answer is: @DenisReznik was right :)
OK, let's take a look.
I have worked with barcodes and big catalogs for many years and I was curious about this question.
So I have made some tests of my own.
I have created a table to store test data:
CREATE TABLE [like_test](
[N] [int] NOT NULL PRIMARY KEY,
[barcode] [varchar](40) NULL
)
I know that there are many types of barcodes: some contain only numbers, others also contain letters, and others can be even more complex.
Let's assume our barcode is a random string.
I have filled the table with 10 million records of random alphanumeric data:
insert into like_test
select (select count(*) from like_test)+n, REPLACE(convert(varchar(40), NEWID()), '-', '') barcode
from dbo.FN_NUMBERS(10000000)
FN_NUMBERS() is just a function I use in my DBs (a sort of tally table) to get records quickly.
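The author's FN_NUMBERS isn't shown; a possible stand-in (an assumption, not the original) is an inline table-valued function over a stacked-CTE row generator:
-- Hypothetical stand-in for FN_NUMBERS: returns N = 1..@max
CREATE FUNCTION dbo.FN_NUMBERS (@max BIGINT)
RETURNS TABLE
AS
RETURN
WITH e1(x) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(x)), -- 10 rows
     e2(x) AS (SELECT 1 FROM e1 a CROSS JOIN e1 b),  -- 100 rows
     e4(x) AS (SELECT 1 FROM e2 a CROSS JOIN e2 b),  -- 10,000 rows
     e8(x) AS (SELECT 1 FROM e4 a CROSS JOIN e4 b)   -- 100,000,000 rows
SELECT TOP (@max) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
FROM e8;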
I got 10 million records like that:
N barcode
1 1C333262C2D74E11B688281636FAF0FB
2 3680E11436FC4CBA826E684C0E96E365
3 7763D29BD09F48C58232C7D33551E6C9
Let's declare a var to search for:
declare @s varchar(20) = 'D34F15' -- a random alphanumeric string
Let's take a baseline try with LIKE, to compare the other results to:
select * from like_test where barcode like '%'+@s+'%'
On my workstation it takes 24.4 secs for a full clustered index scan. Very slow.
SSMS suggests adding an index on the barcode column:
CREATE NONCLUSTERED INDEX [ix_barcode] ON [like_test] ([barcode]) INCLUDE ([N])
500 MB of index. I retry the select; this time 24.0 secs for the nonclustered index seek.. less than 2% better, almost the same result. Very far from the 75% improvement estimated by SSMS. It seems to me this index really isn't worth it. Maybe my Samsung 840 SSD is making the difference..
For the moment I let the index active.
Let's try the CHARINDEX solution:
select * from like_test where charindex(@s, barcode) > 0
This time it took 23.5 secs to complete, not really much better than LIKE.
Now let's check @DenisReznik's suggestion that using a binary collation should speed things up:
select * from like_test
where barcode collate Latin1_General_BIN like '%'+@s+'%' collate Latin1_General_BIN
WOW, it seems to work! Only 4.5 secs, this is impressive! 5 times better..
So, what about CHARINDEX and collation together? Let's try it:
select * from like_test
where charindex(@s collate Latin1_General_BIN, barcode collate Latin1_General_BIN)>0
Unbelievable! 2.4 secs, 10 times better..
Ok, so far I have realized that CHARINDEX is better than LIKE, and that a binary collation is better than a normal string collation, so from now on I will go on only with CHARINDEX and collation.
Now, can we do anything else to get even better results? Maybe we can try reducing our very long strings.. a scan is always a scan..
First try: a logical string cut, using SUBSTRING to virtually work on barcodes of 8 chars:
select * from like_test
where charindex(
@s collate Latin1_General_BIN,
SUBSTRING(barcode, 12, 8) collate Latin1_General_BIN
)>0
Fantastic! 1.8 seconds.. I have tried both SUBSTRING(barcode, 1, 8) (head of the string) and SUBSTRING(barcode, 12, 8) (middle of the string) with same results.
Then I tried physically reducing the size of the barcode column; it made almost no difference compared to using SUBSTRING().
Finally I tried dropping the index on the barcode column and repeated ALL of the above tests...
I was very surprised to get almost the same results, with very little difference.
The index performs 3-5% better, but at the cost of 500 MB of disk space and a maintenance cost if the catalog is updated.
Naturally, for a direct key lookup like where barcode = @s, with the index it takes 20-50 millisecs; without the index we can't get below 1.1 secs using the collation syntax where barcode collate Latin1_General_BIN = @s collate Latin1_General_BIN.
This was interesting.
I hope this helps
I often use charindex and just as often have this very debate.
As it turns out, depending on your structure you may actually have a substantial performance boost.
http://cc.davelozinski.com/sql/like-vs-substring-vs-leftright-vs-charindex
A good option in your case is creating your own FTS-style index. Here is how it could be implemented:
1) Create table Terms:
CREATE TABLE Terms
(
Id int IDENTITY NOT NULL,
Term varchar(21) NOT NULL,
CONSTRAINT PK_TERMS PRIMARY KEY (Term),
CONSTRAINT UK_TERMS_ID UNIQUE (Id)
)
Note: index declaration inside the table definition is a feature of SQL Server 2014. If you have a lower version, just move it out of the CREATE TABLE statement and create it separately.
2) Cut barcodes into grams, and save each of them to the Terms table. For example, for barcode = '123456' your table should have 6 rows for it: '123456', '23456', '3456', '456', '56', '6' (a sketch of this step follows).
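A possible implementation of step 2 (an assumption, not the author's code; it presumes a Barcodes table with Id and Barcode columns, and uses master..spt_values as a tally of character positions):
-- Generate every suffix ("gram") of each barcode and keep only new terms
INSERT INTO Terms (Term)
SELECT DISTINCT SUBSTRING(b.Barcode, v.number, LEN(b.Barcode))
FROM Barcodes b
JOIN master..spt_values v
  ON v.type = 'P' AND v.number BETWEEN 1 AND LEN(b.Barcode)
WHERE NOT EXISTS (SELECT 1 FROM Terms t
                  WHERE t.Term = SUBSTRING(b.Barcode, v.number, LEN(b.Barcode)));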
3) Create table BarcodesIndex:
CREATE TABLE BarcodesIndex
(
TermId int NOT NULL,
BarcodeId int NOT NULL,
CONSTRAINT PK_BARCODESINDEX PRIMARY KEY (TermId, BarcodeId),
CONSTRAINT FK_BARCODESINDEX_TERMID FOREIGN KEY (TermId) REFERENCES Terms (Id),
CONSTRAINT FK_BARCODESINDEX_BARCODEID FOREIGN KEY (BarcodeId) REFERENCES Barcodes (Id)
)
4) Save a pair (TermId, BarcodeId) for each gram of the barcode into the BarcodesIndex table. The TermId was either generated in the second step or already exists in the Terms table. BarcodeId is the identifier of the barcode, stored in the Barcodes table (or whatever name you use for it). For the barcode '123456', there should be 6 rows in the BarcodesIndex table.
5) Select barcodes by their parts using the following query:
SELECT b.* FROM Terms t
INNER JOIN BarcodesIndex bi
ON t.Id = bi.TermId
INNER JOIN Barcodes b
ON bi.BarcodeId = b.Id
WHERE t.Term LIKE 'SomeBarcodePart%'
This solution forces all similar parts of barcodes to be stored near one another, so SQL Server will use an index range scan strategy to fetch data from the Terms table. Terms in the Terms table should be unique, to keep the table as small as possible. This can be done in the application logic (check existence, then insert the term if it doesn't exist) or by setting the IGNORE_DUP_KEY option on the clustered index of the Terms table (sketched below). The BarcodesIndex table is used to reference Terms and Barcodes.
Please note that foreign keys and constraints in this solution are the points of consideration. Personally, I prefer to have foreign keys, until they hurt me.
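If you go the IGNORE_DUP_KEY route mentioned above, a sketch of the same Terms table with the option applied to its primary key (duplicate terms are then silently discarded on insert instead of raising an error):
CREATE TABLE Terms
(
Id int IDENTITY NOT NULL,
Term varchar(21) NOT NULL,
CONSTRAINT PK_TERMS PRIMARY KEY (Term) WITH (IGNORE_DUP_KEY = ON),
CONSTRAINT UK_TERMS_ID UNIQUE (Id)
)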
After further testing and reading and talking with @DenisReznik, I think the best option could be to add computed columns to the barcode table to split the barcode.
We only need columns for start positions from the 2nd to the 5th, because for the 1st we will use the original barcode column, and the last is not useful at all (what kind of partial match is 1 char out of 6, when some 60% of records would match?):
CREATE TABLE [like_test](
[N] [int] NOT NULL PRIMARY KEY,
[barcode] [varchar](6) NOT NULL,
[BC2] AS (substring([BARCODE],(2),(5))),
[BC3] AS (substring([BARCODE],(3),(4))),
[BC4] AS (substring([BARCODE],(4),(3))),
[BC5] AS (substring([BARCODE],(5),(2)))
)
and then to add indexes on this virtual columns:
CREATE NONCLUSTERED INDEX [IX_BC2] ON [like_test] ([BC2]);
CREATE NONCLUSTERED INDEX [IX_BC3] ON [like_test] ([BC3]);
CREATE NONCLUSTERED INDEX [IX_BC4] ON [like_test] ([BC4]);
CREATE NONCLUSTERED INDEX [IX_BC5] ON [like_test] ([BC5]);
CREATE NONCLUSTERED INDEX [IX_BC6] ON [like_test] ([barcode]);
Now we can simply find partial matches with this query:
declare @s varchar(40)
declare @l int
set @s = '654'
set @l = LEN(@s)
select N from like_test
where 1=0
OR ((barcode = @s) and (@l=6)) -- to match full code (rem if not needed)
OR ((barcode like @s+'%') and (@l<6)) -- to match strings up to 5 chars from beginning
or ((BC2 like @s+'%') and (@l<6)) -- to match strings up to 5 chars from 2nd position
or ((BC3 like @s+'%') and (@l<5)) -- to match strings up to 4 chars from 3rd position
or ((BC4 like @s+'%') and (@l<4)) -- to match strings up to 3 chars from 4th position
or ((BC5 like @s+'%') and (@l<3)) -- to match strings up to 2 chars from 5th position
this is HELL fast!
for search strings of 6 chars 15-20 milliseconds (full code)
for search strings of 5 chars 25 milliseconds (20-80)
for search strings of 4 chars 50 milliseconds (40-130)
for search strings of 3 chars 65 milliseconds (50-150)
for search strings of 2 chars 200 milliseconds (190-260)
There will be no additional space used for the table, but each index will take up to 200 MB (for 1 million barcodes).
PAY ATTENTION
Tested on Microsoft SQL Server Express (64-bit) and Microsoft SQL Server Enterprise (64-bit); the optimizer of the latter is slightly better, but the main difference is this:
On Express edition you have to extract ONLY the primary key when searching your string; if you add other columns to the SELECT, the optimizer will not use the indexes anymore but will go for a full clustered index scan, so you will need something like:
;with
k as (-- extract only the primary key
select N from like_test
where 1=0
OR ((barcode = @s) and (@l=6))
OR ((barcode like @s+'%') and (@l<6))
or ((BC2 like @s+'%') and (@l<6))
or ((BC3 like @s+'%') and (@l<5))
or ((BC4 like @s+'%') and (@l<4))
or ((BC5 like @s+'%') and (@l<3))
)
select N
from like_test t
where exists (select 1 from k where k.n = t.n)
On Standard (Enterprise) edition you HAVE to go for:
select * from like_test -- take a look at the star
where 1=0
OR ((barcode = @s) and (@l=6))
OR ((barcode like @s+'%') and (@l<6))
or ((BC2 like @s+'%') and (@l<6))
or ((BC3 like @s+'%') and (@l<5))
or ((BC4 like @s+'%') and (@l<4))
or ((BC5 like @s+'%') and (@l<3))
You do not include many constraints, which means you want to search for a string within a string -- and if there were a way to optimize an index to search a string within a string, it would be built in already!
Other things that make it hard to give a specific answer:
It's not clear what "huge" and "too long" mean.
It's not clear how your application works. Are you searching in batch as you add 1,000 new products? Are you allowing a user to enter a partial barcode in a search box?
I can make some suggestions that may or may not be helpful in your case.
Speed up some of the queries
I have a database with lots of license plates; sometimes an officer wants to search by the last 3 characters of a plate. To support this I store the license plate reversed, then use LIKE ('ZYX%') to match ABCXYZ. When doing the search, they have the option of a 'contains' search (like you have), which is slow, or a 'begins/ends with' search, which is super fast because of the index. This would solve your problem some of the time (which may be good enough), especially if this is a common need.
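A minimal sketch of that trick (table and column names are hypothetical):
-- Persisted reversed copy of the plate, indexed for prefix searches
ALTER TABLE Plates ADD PlateReversed AS REVERSE(Plate) PERSISTED;
CREATE NONCLUSTERED INDEX IX_Plates_PlateReversed ON Plates (PlateReversed);
-- "Ends with XYZ" becomes a sargable prefix search on the reversed column
SELECT Plate FROM Plates WHERE PlateReversed LIKE REVERSE('XYZ') + '%';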
Parallel Queries
An index works because it organizes data; an index cannot help find a string within a string, because there is no such organization. Speed seems to be your focus of optimization, so you could store/query your data in a way that searches in parallel. Example: if it takes 10 seconds to sequentially search 10 million rows, then having 10 parallel processes (each searching 1 million rows) takes you from 10 seconds to 1 second (kinda-sorta). Think of it as scaling out. There are various options for this, within your single SQL instance (try data partitioning) or across multiple SQL Servers (if that's an option).
BONUS: If you're not on a RAID setup, getting one can help with reads, since it effectively reads in parallel.
Reduce a bottleneck
One reason searching "huge" datasets takes "too long" is that all the data needs to be read from disk, which is always slow. You can skip the disk and use memory-optimized (In-Memory OLTP) tables. Since "huge" isn't defined, this may not work.
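A minimal sketch of a memory-optimized table (assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup configured; names are hypothetical):
CREATE TABLE dbo.products_mem
(
id int NOT NULL PRIMARY KEY NONCLUSTERED,
barcode varchar(40) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);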
UPDATED:
We know from the Full-Text Search documentation on MSDN that full-text searches can be used for the following:
One or more specific words or phrases (simple term)
A word or a phrase where the words begin with specified text (prefix term)
Inflectional forms of a specific word (generation term)
A word or phrase close to another word or phrase (proximity term)
Synonymous forms of a specific word (thesaurus)
Words or phrases using weighted values (weighted term)
Are any of these fulfilled by your query requirements? If you are having to search for patterns as you described, without a consistent pattern (such as '1%'), then there may not be a way for SQL Server to use a SARG.
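Of that list, only the prefix term maps onto this kind of search; a sketch, assuming a full-text index already exists on barcode:
-- Prefix term: matches words starting with 2345, but cannot match 2345 in the middle of a word
SELECT * FROM products WHERE CONTAINS(barcode, '"2345*"');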
You could use Boolean statements
Coming from a C++ perspective, B-trees are traversed pre-order, in-order, or post-order, and the comparisons made during the traversal are Boolean tests. Processed much faster than string comparisons, Booleans offer at the least an improved performance.
We can see this in the following two options:
PATINDEX
Only if your column is not numeric, as PATINDEX is designed for strings.
Returns an integer (like CHARINDEX) which is easier to process than strings.
CHARINDEX is a solution
CHARINDEX has no problem searching INTs and again, returns a number.
It may require some extra cases built in (i.e. the first number is always ignored), but you can add them like so: CHARINDEX('200', barcode) > 1.
As proof of what I am saying, let us go back to the old [AdventureWorks2012].[Production].[TransactionHistory]. We have TransactionID, which contains the numbers of the items we want, and let's for fun assume you want every TransactionID that ends with 200.
-- WITH LIKE
SELECT TOP 1000 [TransactionID]
,[ProductID]
,[ReferenceOrderID]
,[ReferenceOrderLineID]
,[TransactionDate]
,[TransactionType]
,[Quantity]
,[ActualCost]
,[ModifiedDate]
FROM [AdventureWorks2012].[Production].[TransactionHistory]
WHERE TransactionID LIKE '%200'
-- WITH CHARINDEX(<delimiter>, <column>) > 3
SELECT TOP 1000 [TransactionID]
,[ProductID]
,[ReferenceOrderID]
,[ReferenceOrderLineID]
,[TransactionDate]
,[TransactionType]
,[Quantity]
,[ActualCost]
,[ModifiedDate]
FROM [AdventureWorks2012].[Production].[TransactionHistory]
WHERE CHARINDEX('200', TransactionID) > 3
Note that the CHARINDEX predicate excludes a value like 200200 from the search (its first occurrence of '200' is at position 1), so you may need to adjust your code appropriately. But look at the results:
Clearly, Booleans and numbers are faster comparisons.
LIKE uses string comparisons, which again are much slower to process.
I was a bit surprised at the size of the difference, but the fundamentals are the same. Integers and Boolean statements are always faster to process than string comparisons.
I'm late to the game, but here's another way to get a full-text-like index, in the spirit of @MtwStark's second answer.
This is a solution using a search table join
drop table if exists #numbers
select top 10000 row_number() over(order by t1.number) as n
into #numbers
from master..spt_values t1
cross join master..spt_values t2
drop table if exists [like_test]
create TABLE [like_test](
[N] INT IDENTITY(1,1) not null,
[barcode] [varchar](40) not null,
constraint pk_liketest primary key ([N])
)
insert into dbo.like_test (barcode)
select top (1000000) replace(convert(varchar(40), NEWID()), '-', '') barcode
from #numbers t,#numbers t2
drop table if exists barcodesearch
select distinct ps.n, trim(substring(ps.barcode,ty.n,100)) as searchstring
into barcodesearch
from like_test ps
inner join #numbers ty on ty.n < 40
where len(ps.barcode) > ty.n
create clustered index idx_barcode_search_index on barcodesearch (searchstring)
The final search should look like this:
declare @s varchar(20) = 'D34F15'
select distinct lt.* from dbo.like_test lt
inner join barcodesearch bs on bs.N = lt.N
where bs.searchstring like @s+'%'
If you have the option of full-text searching, you can speed this up even further by adding the full-text search column directly to the barcode table
drop table if exists #liketestupdates
select n, string_agg(searchstring, ' ')
within group (order by reverse(searchstring)) as searchstring
into #liketestupdates
from barcodesearch
group by n
alter table dbo.like_test add search_column varchar(559)
update lt
set search_column = searchstring
from like_test lt
inner join #liketestupdates lu on lu.n = lt.n
CREATE FULLTEXT CATALOG ftcatalog as default;
create fulltext index on dbo.like_test ( search_column )
key index pk_liketest
The final full-text search would look like this:
declare @s varchar(20) = 'D34F15'
set @s = '"*' + @s + '*"'
select n,barcode from dbo.like_test where contains(search_column, @s)
I understand that estimated costs aren't the best measure of expected performance, but the numbers aren't wildly off here.
With the search table join, the Estimated Subtree Cost is 2.13
With the full-text search, the Estimated Subtree Cost is 0.008
Full-text is aimed at longer texts, let's say texts with more than about 100 chars. You can use LIKE '%string%' (though it depends on how the barcode column is defined). Do you have an index on barcode? If not, create one and it will improve your query.
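A sketch of that index (the name is an assumption); note that, per the tests earlier in this thread, it helps equality and prefix searches far more than '%string%' scans:
CREATE NONCLUSTERED INDEX IX_products_barcode ON products (barcode);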
First, create an index on the columns you use in the WHERE clause.
Secondly, for the datatype of the columns used in the WHERE clause, use char in place of varchar, which will save you some space in the table and in the indexes that include those columns:
a varchar(1) column needs more bytes than a char(1) column.
Pull only the columns you need; try to avoid *, and be specific about the columns you wish to select.
Don't write:
select * from products
Instead, write:
Select Col1, Col2 from products with (Nolock)

Indexing and optimization of where clause based on datetime field

I have a database table with more than a million rows. When I execute this query it takes hours, mostly due to PAGEIOLATCH_SH waits. There is currently no indexing. Can you suggest the possible indexing for the WHERE clause? I believe it should be on the datetime column, as it is used in the WHERE as well as the ORDER BY; if so, which index should I use?
if(<some condition>)
BEGIN
select <some columns>
From <some tables with joins(no lock)>
WHERE
((@var2 IS NULL AND a.addr IS NOT NULL) OR
(a.addr LIKE @var2 + '%')) AND
((@var3 IS NULL AND a.ca_id IS NOT NULL) OR
(a.ca_id = @var3)) AND
b.time >= @from_datetime AND b.time <= @to_datetime AND
(
(
b.shopping_product IN ('CX12343', 'BG8945', 'GF4543') AND
b.shopping_category IN ('online', 'COD')
)
OR
(
b.shopping_product = 'LX3454' and b.sub_shopping_list in ('FF544','GT544','KK543','LK5343')
)
OR
(
b.shopping_product = 'LK434434' and b.sub_shopping_list in ('LL5435','PO89554','IO948854','OR4334','TH5444')
)
OR
(
b.shopping_product = 'AZ434434' and b.sub_shopping_list in ('LL54352','PO489554','IO9458854','OR34334','TH54344')
)
)
ORDER BY
b.time desc
ELSE
BEGIN
select <some columns>
From <some tables with joins(no lock)>
where <similar where as above with slight difference>
Okay then,
I said: "First add indexes on these: shopping_product, shopping_category, and sub_shopping_list; secondly you can try one on the date; after that, look at the execution plan (or it would be better to create a partition on the time column)."
I'm working on Oracle, but the basics are the same.
You can create 3 distinct indexes on those cols: shopping_product, shopping_category, sub_shopping_list. Or you can create 1 composite index over those 3 cols. The point is that you need to examine the execution plan to see which one is the most effective for you.
Oh, and there is the a.ca_id column (I almost forgot); you need an index for this too.
For the date column I think you would do better to create a partition instead of an index.
In summary, two ways (a partition sketch follows the list):
- create 4 distinct indexes (shopping_product, shopping_category, sub_shopping_list, ca_id), and create a range-typed partition on the date column
- create 1 composite index (shopping_product, shopping_category, sub_shopping_list) and 1 normal index (ca_id), and create a range-typed partition on the date column
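A sketch of the partitioning idea in SQL Server syntax (boundary dates and object names are assumptions):
CREATE PARTITION FUNCTION pf_time (datetime)
AS RANGE RIGHT FOR VALUES ('2016-01-01', '2017-01-01', '2018-01-01');
CREATE PARTITION SCHEME ps_time AS PARTITION pf_time ALL TO ([PRIMARY]);
-- The table (or its clustered index) is then created ON ps_time(time).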
You probably should learn about indexing if you're dealing with tables of this size. It's not a trivial process. JOIN operations are a big deal when sorting out which indexes you need. Read this. http://use-the-index-luke.com/
In the meantime, if your date range is highly selective (that is, if
b.time >= @from_datetime AND b.time <= @to_datetime
chooses a reasonably small fraction of the rows in your database), you should try the following compound index (sketched below):
b.shopping_product, b.time
If that doesn't help, try
b.time
by itself. The idea is to structure your index so the server can do a range scan. Without knowledge of your whole query, there's not much else to offer.
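A sketch of those two index suggestions (the real table name behind alias b is an assumption):
-- Compound index: equality column first, range column second
CREATE NONCLUSTERED INDEX IX_shopping_product_time ON ShoppingTable (shopping_product, [time]);
-- Fallback: a range scan on time alone
CREATE NONCLUSTERED INDEX IX_time ON ShoppingTable ([time]);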

SQL Server ORDER BY date and nulls last

I am trying to order by date, with the most recent dates coming first. That's easy enough, but there are many records where the date is null, and those come before any records that have a date.
I have tried a few things with no success:
ORDER BY ISNULL(Next_Contact_Date, 0)
ORDER BY ISNULL(Next_Contact_Date, 999999999)
ORDER BY coalesce(Next_Contact_Date, 99/99/9999)
How can I order by date and have the nulls come in last? The data type is smalldatetime.
smalldatetime has a range up to June 6, 2079, so you can use
ORDER BY ISNULL(Next_Contact_Date, '2079-06-05T23:59:00')
if no legitimate records will have that date.
If this is not an assumption you fancy relying on, a more robust option is sorting on two columns:
ORDER BY CASE WHEN Next_Contact_Date IS NULL THEN 1 ELSE 0 END, Next_Contact_Date
Neither of the above suggestions is able to use an index to avoid a sort, however, and they give similar-looking plans.
One other possibility if such an index exists is
SELECT 1 AS Grp, Next_Contact_Date
FROM T
WHERE Next_Contact_Date IS NOT NULL
UNION ALL
SELECT 2 AS Grp, Next_Contact_Date
FROM T
WHERE Next_Contact_Date IS NULL
ORDER BY Grp, Next_Contact_Date
According to Itzik Ben-Gan, author of T-SQL Fundamentals for MS SQL Server 2012, "By default, SQL Server sorts NULL marks before non-NULL values. To get NULL marks to sort last, you can use a CASE expression that returns 1 when the" Next_Contact_Date column is NULL, "and 0 when it is not NULL. Non-NULL marks get 0 back from the expression; therefore, they sort before NULL marks (which get 1). This CASE expression is used as the first sort column." The Next_Contact_Date column "should be specified as the second sort column. This way, non-NULL marks sort correctly among themselves." Here is the solution query for your example for MS SQL Server 2012 (and SQL Server 2014):
ORDER BY
CASE
WHEN Next_Contact_Date IS NULL THEN 1
ELSE 0
END, Next_Contact_Date;
Equivalent code using IIF syntax:
ORDER BY
IIF(Next_Contact_Date IS NULL, 1, 0),
Next_Contact_Date;
order by -cast([Next_Contact_Date] as float) desc
(cast to float rather than bigint, since datetime types can't be cast to bigint; with DESC, NULLs sort last, and negating the value restores ascending date order)
If your SQL doesn't support NULLS FIRST or NULLS LAST, the simplest way to do this is to use the value IS NULL expression:
ORDER BY Next_Contact_Date IS NULL, Next_Contact_Date
to put the nulls at the end (NULLS LAST) or
ORDER BY Next_Contact_Date IS NOT NULL, Next_Contact_Date
to put the nulls at the front. This doesn't require knowing the type of the column and is easier to read than the CASE expression.
EDIT: Alas, while this works in other SQL implementations like PostgreSQL and MySQL, it doesn't work in MS SQL Server. I didn't have a SQL Server to test against and relied on Microsoft's documentation and testing with other SQL implementations. According to Microsoft, value IS NULL is an expression that should be usable just like any other expression. And ORDER BY is supposed to take expressions just like any other statement that takes an expression. But it doesn't actually work.
The best solution for SQL Server therefore appears to be the CASE expression.
A bit late, but maybe someone finds it useful.
For me, ISNULL was out of the question due to the table scan. UNION ALL would have required me to repeat a complex query, and since I select only the TOP X rows it would not have been very efficient.
If you are able to change the table design, you can:
Add another field, just for sorting, such as Next_Contact_Date_Sort.
Create a trigger that fills that field with a large (or small) value, depending on what you need:
CREATE TRIGGER FILL_SORTABLE_DATE ON YOUR_TABLE AFTER INSERT,UPDATE AS
BEGIN
SET NOCOUNT ON;
IF (UPDATE(Next_Contact_Date)) BEGIN
UPDATE YOUR_TABLE
-- '9999-12-31' assumes a date/datetime column; pick a value inside your type's range
SET Next_Contact_Date_Sort = IIF(YOUR_TABLE.Next_Contact_Date IS NULL, '9999-12-31', YOUR_TABLE.Next_Contact_Date)
FROM inserted i
WHERE YOUR_TABLE.key1 = i.key1 AND YOUR_TABLE.key2 = i.key2
END
END
Use desc and multiply by -1 if necessary. Example of ascending int ordering with nulls last:
select *
from
(select null v union all select 1 v union all select 2 v) t
order by -t.v desc
I know this is old, but this is what worked for me:
Order by Isnull(Date,'12/31/9999')
I think I found a way to show nulls at the end and still be able to use an index for sorting.
The idea is super simple: create a computed column based on the existing column, and put an index on it.
ALTER TABLE dbo.Users
ADD [FirstNameNullLast]
AS (case when [FirstName] IS NOT NULL AND ltrim(rtrim([FirstName])) <> N'' then [FirstName] else N'ZZZZZZZZZZ' end) PERSISTED
So, we are creating a persisted computed column in SQL; in that column all blank and null values are replaced by 'ZZZZZZZZZZ'. This means that if we sort based on that column, we will see all the null or blank values at the end.
Now we can use it in our new index.
Like this:
CREATE NONCLUSTERED INDEX [IX_Users_FirstNameNullLast] ON [dbo].[Users]
(
[FirstNameNullLast] ASC
)
So, this is an ordinary nonclustered index. We can change it however we want, i.e. include extra columns, increase the number of indexed columns, change the sort order, etc.
I know this is an old thread, but in SQL Server nulls sort lower than non-null values, so it's only necessary to order descending.
In your case, Order by Next_Contact_Date Desc should be enough.
Source: order by with nulls- LearnSql

partition function in SQL Server 2005

In MSDN about partition function from here, $PARTITION(Transact-SQL).
I am confused about what the sample below is doing under the hood. My understanding is that this SQL statement will iterate over all rows in the table Production.TransactionHistory, and since all rows that map to the same partition yield the same value of $PARTITION.TransactionRangePF1(TransactionDate) (i.e. the partition number), all rows in, for example, partition 1 will produce one row in the returned result. Is my understanding correct?
USE AdventureWorks ;
GO
SELECT $PARTITION.TransactionRangePF1(TransactionDate) AS Partition,
COUNT(*) AS [COUNT] FROM Production.TransactionHistory
GROUP BY $PARTITION.TransactionRangePF1(TransactionDate)
ORDER BY Partition ;
GO
If your partition function is defined like this:
CREATE PARTITION FUNCTION TransactionRangePF1(DATETIME)
AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01', '2009-01-01')
, then this clause:
$PARTITION.TransactionRangePF1(TransactionDate)
is equivalent to:
CASE
WHEN TransactionDate < '2007-01-01' THEN 1
WHEN TransactionDate < '2008-01-01' THEN 2
WHEN TransactionDate < '2009-01-01' THEN 3
ELSE 4
END
If all your dates fall before '2007-01-01', then the first WHEN branch will always fire and it will always return 1.
The query you posted will return at most one row for each partition, as it groups all the rows from a partition (if any) into one group.
If there are no rows for a partition, no row for it will be returned in the result set.
It returns the number of records in each non-empty partition of the partitioned table Production.TransactionHistory, so yes, your reasoning is correct.
Have you tried generating an execution plan for the statement? That might give you some insight into what it's actually doing underneath the cover.
Press "Control-L" to generate an execution plan and post it here if you'd like some interpretation.
