SQL Server 2019 ignoring WHERE clause? - sql-server

I'm getting an error when I try to run a simple aggregating query.
SELECT MAX(CAST(someDate as datetime)) AS MAX_DT FROM #SomeTable WHERE ISDATE(someDate) = 1
ERROR: Conversion failed when converting date and/or time from character string.
Non-date entries should be removed by the WHERE clause, but that doesn't seem to be happening. I can work around it with an explicit CASE statement inside the MAX(), but I don't want to hack up the query if I can avoid it. If I use a lower COMPATIBILITY_LEVEL, it works fine. If I have fewer than 2^17 rows, it works fine.
-- SQLServer 15.0.4043.16
USE AdventureWorks;
GO
ALTER DATABASE AdventureWorks SET COMPATIBILITY_LEVEL = 150;
GO
-- delete temp table if exists
DROP TABLE IF EXISTS #SomeTable;
GO
-- create temp table
CREATE TABLE #SomeTable (
someDate varchar(20) DEFAULT GETDATE()
);
-- load data, need at least 2^17 rows with at least 1 bad date value
INSERT #SomeTable DEFAULT VALUES;
DECLARE @i int = 0;
WHILE @i < 17
BEGIN
INSERT INTO #SomeTable (someDate) SELECT someDate FROM #SomeTable;
SET @i = @i + 1;
END
END
GO
-- create invalid date row
WITH cteUpdate AS (SELECT TOP 1 * FROM #SomeTable)
UPDATE cteUpdate SET someDate='NOT_A_DATE'
-- error query
SELECT MAX(CAST(someDate as datetime)) AS MAX_DT
FROM #SomeTable
WHERE ISDATE(someDate) = 1
--ERROR: Conversion failed when converting date and/or time from character string.
-- delete temp table if exists
DROP TABLE IF EXISTS #SomeTable;
GO

I would recommend try_cast() rather than isdate():
SELECT MAX(TRY_CAST(someDate as datetime)) AS MAX_DT
FROM #SomeTable
This is a much more reliable approach: instead of relying on some heuristic to guess whether the value is convertible to a datetime (as isdate() does), try_cast actually attempts to convert, and returns null if that fails - which aggregate function max() happily ignores.
try_cast() (and its sister function try_convert()) is a very handy function that many other databases are missing.
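To see the difference in isolation, here is a minimal sketch (the literal values are made up purely for illustration):
-- TRY_CAST returns NULL instead of raising an error when the conversion fails
SELECT TRY_CAST('2021-01-01' AS datetime) AS GoodValue,  -- 2021-01-01 00:00:00.000
       TRY_CAST('NOT_A_DATE' AS datetime) AS BadValue;   -- NULL
-- MAX() simply skips those NULLs, so the bad rows drop out of the aggregate.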

I actually just encountered the same issue (except I was casting to float). It seems that the SQL Server 2019 optimizer sometimes (yes, it's not reliable) decides to evaluate the calculations in the SELECT part before it applies the WHERE.
If you set the compatibility level to a lower version, this also results in a different optimizer being used (it always uses the optimizer of the compatibility level). Older query optimizers seem to always execute the WHERE part first.
It seems lowering the compatibility level is the best solution for now, unless you want to replace every CAST with TRY_CAST (which would also mean you won't spot actual errors as easily, such as a faulty WHERE that causes your calculation to return NULL instead of the correct value).
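For completeness, the CASE workaround mentioned in the question would look roughly like this (a sketch only; it guards the conversion row by row instead of relying on the WHERE clause being applied first):
SELECT MAX(CASE WHEN ISDATE(someDate) = 1
                THEN CAST(someDate AS datetime)
           END) AS MAX_DT
FROM #SomeTable;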

Related

Concatenating Time and DateTime in where clause

I am trying to combine a SmallDateTime field and a Time value (result of a scalar-valued function) into a DateTime and I keep getting the following error:
Conversion failed when converting date and/or time from character
string.
Here are the variables used throughout:
DECLARE @STARTDATETIME AS DATETIME
DECLARE @ENDDATETIME AS DATETIME
SELECT @STARTDATETIME = '8/29/2016 12:00:00'
SELECT @ENDDATETIME = '8/30/2016 12:00:00'
Column definitions:
FT_START_DATE SmallDateTime
FT_END_DATE SmallDateTime
FT_START_TIME Int
FT_END_TIME Int
The date fields do not contain timestamps. The time fields are basically 24 hour time without the colon dividers. (Example: 142350 = 14:23:50)
Here's the function that is called in my queries:
USE [PWIN171]
GO
/****** Object: UserDefinedFunction [dbo].[dbo.IPC_Convert_Time] Script Date: 9/13/2016 4:50:49 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[dbo.IPC_Convert_Time]
(
@time int
)
RETURNS time
AS
BEGIN
DECLARE @Result time
SELECT @Result = CONVERT(time
, STUFF(
STUFF(
RIGHT('000000' + CONVERT(varchar(6), @time), 6)
, 5, 0, ':')
, 3, 0, ':')
)
RETURN @Result
END
Example 1 - Fails:
This is what I'm after in general.
SELECT * FROM FT WITH (NOLOCK)
WHERE
CAST(FT_END_DATE AS DATETIME) + DBO.[dbo.IPC_Convert_Time](FT_END_TIME) BETWEEN @STARTDATETIME AND @ENDDATETIME;
Example 2 - Works:
This one runs, but it won't get records from 8/29 because the end dates will be before 12:00:00 on 8/29.
SELECT * FROM FT WITH (NOLOCK)
WHERE
FT_END_DATE BETWEEN @STARTDATETIME AND @ENDDATETIME
AND CAST(FT_END_DATE AS DATETIME) + DBO.[dbo.IPC_Convert_Time](FT_END_TIME) BETWEEN @STARTDATETIME AND @ENDDATETIME;
I suppose I could do one where I split apart my parameters and check that the end time is between the time portion of the parameters as well, but that seems to be a step in the wrong direction. The error seems to only appear when there is no other usage of FT_START_DATE or FT_END_DATE in the where clause.
The time converting function works fine in every scenario I have created. I have even tried Example 2 with parameters that would give it 100% overlap with the data covered by Example 1 in case there was bad data causing the error, but it runs fine.
I also don't know exactly where the error is occurring, because it only references the line the select statement begins on, and not the actual location in the code.
Why does it behave like this?
UPDATE:
TIMEFROMPARTS is not available because this is on SQL Server 2008
If I understand this correctly, this can be done much simpler:
Try this:
DECLARE @d DATE=GETDATE();
DECLARE @t TIME=GETDATE();
SELECT @d;
SELECT @t;
SELECT CAST(@d AS datetime)+CAST(@t AS datetime);
A pure date and a pure time can simply be added to combine them...
UPDATE: I read your question again...
Try this
SELECT FT_END_DATE
,FT_END_TIME
,CAST(FT_END_DATE AS DATETIME) + DBO.[dbo.IPC_Convert_Time](FT_END_TIME) AS CombinedTime
,*
FROM FT
to see if your attempt is doing the right thing.
If yes, it might help to create a CTE and do the filter on the named column.
Sometimes the engine does not work in the order you would expect.
As CTEs are fully inlined it is quite possible, that this will not help...
SQL Server is well known for raising such errors, because a type check happens before the conversion takes place...
It might be an idea to use the given SELECT with INTO #tbl to push the result set into a new table and do your logic from there...
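A rough sketch of both ideas, assuming the FT columns from the question (illustrative only, not tested against your schema):
-- Variant 1: name the combined value in a CTE and filter on the alias
WITH cte AS
(
    SELECT FT.*,
           CAST(FT_END_DATE AS DATETIME) + DBO.[dbo.IPC_Convert_Time](FT_END_TIME) AS CombinedTime
    FROM FT
)
SELECT * FROM cte
WHERE CombinedTime BETWEEN @STARTDATETIME AND @ENDDATETIME;

-- Variant 2: materialize into a temp table first, then filter
SELECT FT.*,
       CAST(FT_END_DATE AS DATETIME) + DBO.[dbo.IPC_Convert_Time](FT_END_TIME) AS CombinedTime
INTO #tbl
FROM FT;

SELECT * FROM #tbl
WHERE CombinedTime BETWEEN @STARTDATETIME AND @ENDDATETIME;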

DATETIME search predicate on DATETIME column much slower than string literal predicate

I'm doing a search on a large table of about 10 million rows. I want to specify a start and end date and return all records in the table created between those dates.
It's a straight-forward query:
declare @StartDateTime datetime = '2016-06-21',
@EndDateTime datetime = '2016-06-22';
select *
FROM Archive.dbo.Order O WITH (NOLOCK)
where O.Created >= @StartDateTime
AND O.Created < @EndDateTime;
Created is a DATETIME column which has a non-clustered index.
This query took about 15 seconds to complete.
However, if I modify the query slightly, as follows, it takes only 1 second to return the same result:
declare @StartDateTime datetime = '2016-06-21',
@EndDateTime datetime = '2016-06-22';
select *
FROM Archive.dbo.Order O WITH (NOLOCK)
where O.Created >= '2016-06-21'
AND O.Created < @EndDateTime;
The only change is replacing the @StartDateTime search predicate with a string literal. Looking at the execution plan, when I used @StartDateTime it did an index scan but when I used a string literal it did an index seek and was 15 times faster.
Does anyone know why using the string literal is so much faster?
I would have thought doing a comparison between a DATETIME column and a DATETIME variable would be quicker than comparing the column to a string representation of a date. I've tried dropping and recreating the index on the Created column and it made no difference. I notice I get similar results on the production system as I do on the test system so the weird behaviour doesn't seem specific to a particular database or SQL Server instance.
All variables have a scope (an instance) within which they are recognized.
In OOP languages, we usually distinguish static/constant variables from temporary variables by using keywords, or by passing a variable into a function where, inside that scope, it is treated as a constant if the function transforms it, like the following in C++:
string MyFunction(const string& name)
// technically, `&` passes the actual location of the variable
// instead of a logical copy. The concept is the same.
In SQL Server, the Standard chose to implement it a bit differently. There are no constant data types, so instead we use literals which are either
object names (which have similar precedence in the call as system keywords)
names with an object delimiter (including ', [])
or strings with a delimiter CHAR(39) (').
This is the reason the two queries behave so differently: those variables are not constants to the optimizer, which means SQL Server will already have chosen its execution path beforehand.
If you have SSMS installed, include the Actual Execution Plan (CTRL + M), and notice in the select statement what the Estimated Rows are. This is the highlight of the execution plan. The greater the difference between the Estimated and Actual rows, the more likely your query can benefit from optimization. In your example, SQL Server had to guess how many rows would match, and ended up overshooting, losing efficiency.
The solution is one and the same, but you can still encapsulate everything if you want to. We use the AdventureWorks2012 database for this example:
1) Declare the Variable in the Procedure
CREATE PROC dbo.TEST1 (@NameStyle INT, @FirstName VARCHAR(50) )
AS
BEGIN
SELECT *
FROM Person.Person
WHERE FirstName = @FirstName
AND NameStyle = @NameStyle; --namestyle is 0
END
2) Pass the variable into Dynamic SQL
CREATE PROC dbo.TEST2 (@NameStyle INT)
AS
BEGIN
DECLARE @Name NVARCHAR(50) = N'Ken';
DECLARE @String NVARCHAR(MAX)
SET @String =
N'SELECT *
FROM Person.Person
WHERE FirstName = @Other
AND NameStyle = @NameStyle';
EXEC sp_executesql @String
, N'@Other VARCHAR(50), @NameStyle INT'
, @Other = @Name
, @NameStyle = @NameStyle
END
Both plans will produce the same results. I could have used EXEC by itself, but sp_executesql can cache the entire select statement (plus, it's more SQL injection safe).
Notice how in both cases the level of the instance allowed SQL Server to transform the variable into a constant value (meaning it entered the object with a set value), and then the Optimizer was capable of choosing the most efficient execution plan available.
-- Remove Procs
DROP PROC dbo.TEST1
DROP PROC dbo.TEST2
A great article was highlighted in the comment section of the OP, but you can see it here: Optimizing Variables and Parameters - SQLMAG

SQL Server - Implementing sequences

I have a system which requires I have IDs on my data before it goes to the database. I was using GUIDs, but found them to be too big to justify the convenience.
I'm now experimenting with implementing a sequence generator which basically reserves a range of unique ID values for a given context. The code is as follows:
ALTER PROCEDURE [dbo].[Sequence.ReserveSequence]
@Name varchar(100),
@Count int,
@FirstValue bigint OUTPUT
AS
BEGIN
SET NOCOUNT ON;
-- Ensure the parameters are valid
IF (@Name IS NULL OR @Count IS NULL OR @Count < 0)
RETURN -1;
-- Reserve the sequence
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION
-- Get the sequence ID, and the last reserved value of the sequence
DECLARE @SequenceID int;
DECLARE @LastValue bigint;
SELECT TOP 1 @SequenceID = [ID], @LastValue = [LastValue]
FROM [dbo].[Sequences]
WHERE [Name] = @Name;
-- Ensure the sequence exists
IF (@SequenceID IS NULL)
BEGIN
-- Create the new sequence
INSERT INTO [dbo].[Sequences] ([Name], [LastValue])
VALUES (@Name, @Count);
-- The first reserved value of a sequence is 1
SET @FirstValue = 1;
END
ELSE
BEGIN
-- Update the sequence
UPDATE [dbo].[Sequences]
SET [LastValue] = @LastValue + @Count
WHERE [ID] = @SequenceID;
-- The sequence start value will be the last previously reserved value + 1
SET @FirstValue = @LastValue + 1;
END
COMMIT TRANSACTION
END
The 'Sequences' table is just an ID, Name (unique), and the last allocated value of the sequence. Using this procedure I can request N values in a named sequence and use these as my identifiers.
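For reference, a minimal sketch of what that table might look like (the column names come from the procedure above; everything else is assumed):
CREATE TABLE [dbo].[Sequences]
(
    [ID] int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [Name] varchar(100) NOT NULL UNIQUE,
    [LastValue] bigint NOT NULL
);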
This works great so far - it's extremely quick since I don't have to constantly ask for individual values, I can just use up a range of values and then request more.
The problem is that at extremely high frequency, calling the procedure concurrently can sometimes result in a deadlock. I have only found this to occur when stress testing, but I'm worried it'll crop up in production. Are there any notable flaws in this procedure, and can anyone recommend any way to improve on it? It would be nice to do this without transactions, for example, but I do need this to be 'thread safe'.
MS themselves offer a solution and even they say it locks/deadlocks.
If you want to add some lock hints then you'd reduce concurrency for your high loads
Options:
You could develop against the "Denali" CTP which is the next release
Use IDENTITY and the OUTPUT clause like everyone else (see the sketch at the end of this answer)
Adopt/modify the solutions above
On DBA.SE there is "Emulate a TSQL sequence via a stored procedure": see dportas' answer which I think extends the MS solution.
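A rough sketch of the IDENTITY plus OUTPUT option (table and column names are invented for the example):
CREATE TABLE dbo.Orders
(
    OrderID int IDENTITY(1,1) PRIMARY KEY,
    CustomerName varchar(100) NOT NULL
);

-- OUTPUT hands back the identity values the engine just assigned
INSERT dbo.Orders (CustomerName)
OUTPUT INSERTED.OrderID
VALUES ('Alice'), ('Bob');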
I'd recommend sticking with the GUIDs if, as you say, this is mostly about composing data ready for a bulk insert (it's simpler than what I present below).
As an alternative, could you work with a restricted count? Say, 100 ID values at a time? In that case, you could have a table with an IDENTITY column, insert into that table, return the generated ID (say, 39), and then your code could assign all values between 3900 and 3999 (e.g. multiply up by your assumed granularity) without consulting the database server again.
Of course, this could be extended to allocating multiple IDs in a single call - provided that you're okay with some IDs potentially going unused. E.g. you need 638 IDs - so you ask the database to assign you 7 new ID values (which implies that you've allocated 700 values), use the 638 you want, and the remaining 62 never get assigned.
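A sketch of that idea, with made-up names and a granularity of 100:
CREATE TABLE dbo.IdBlocks
(
    BlockID int IDENTITY(1,1) PRIMARY KEY,
    ReservedAt datetime NOT NULL DEFAULT GETDATE()
);

-- Reserve one block of 100 IDs and derive the range locally
DECLARE @Block int;
INSERT dbo.IdBlocks DEFAULT VALUES;
SET @Block = SCOPE_IDENTITY();

SELECT @Block * 100      AS FirstId,  -- e.g. 3900 when @Block = 39
       @Block * 100 + 99 AS LastId;   -- e.g. 3999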
Can you get some kind of deadlock trace? For example, enable trace flag 1222 as shown here. Duplicate the deadlock. Then look in the SQL Server log for the deadlock trace.
Also, you might inspect what locks are taken out in your code by inserting a call to exec sp_lock or select * from sys.dm_tran_locks immediately before the COMMIT TRANSACTION.
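In concrete terms, something like this (trace flag 1222 is instance-wide, so only enable it on a test server):
-- Write deadlock graphs to the SQL Server error log
DBCC TRACEON (1222, -1);

-- Inside the procedure, just before COMMIT TRANSACTION:
SELECT resource_type, resource_description, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;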
Most likely you are observing a conversion deadlock. To avoid them, you want to make sure that your table is clustered and has a PK, but this advice is specific to 2005 and 2008 R2, and they can change the implementation, rendering this advice useless. Google up "Some heap tables may be more prone to deadlocks than identical tables with clustered indexes".
Anyway, if you observe an error during stress testing, it is likely that sooner or later it will occur in production as well.
You may want to use sp_getapplock to serialize your requests. Google up "Application Locks (or Mutexes) in SQL Server 2005". Also I described a few useful ideas here: "Developing Modifications that Survive Concurrency".
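A sketch of serializing the critical section with sp_getapplock (the resource name is arbitrary):
BEGIN TRANSACTION;

-- Only one session at a time gets past this point for the same resource name
EXEC sp_getapplock @Resource = 'ReserveSequence',
                   @LockMode = 'Exclusive',
                   @LockOwner = 'Transaction',
                   @LockTimeout = 10000;

-- ... read and update the [Sequences] row here ...

COMMIT TRANSACTION; -- committing the transaction releases the app lock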
I thought I'd share my solution. It doesn't deadlock, nor does it produce duplicate values. An important difference between this and my original procedure is that it doesn't create the sequence if it doesn't already exist:
ALTER PROCEDURE [dbo].[ReserveSequence]
(
@Name nvarchar(100),
@Count int,
@FirstValue bigint OUTPUT
)
AS
BEGIN
SET NOCOUNT ON;
IF (@Count <= 0)
BEGIN
SET @FirstValue = NULL;
RETURN -1;
END
DECLARE @Result TABLE ([LastValue] bigint)
-- Update the sequence last value, and get the previous one
UPDATE [Sequences]
SET [LastValue] = [LastValue] + @Count
OUTPUT DELETED.LastValue INTO @Result
WHERE [Name] = @Name;
-- Select the first value
SELECT TOP 1 @FirstValue = [LastValue] + 1 FROM @Result;
END

Change type of a column with numbers from varchar to int

We have two columns in a database which are currently of type varchar(16). Thing is, they contain numbers and always will contain numbers. We therefore want to change the type to integer. But the problem is that they of course already contain data.
Is there any way we can change the type of those columns from varchar to int, and not lose all the numbers that are already in there? Hopefully some sort of SQL we can just run, without having to create temporary columns and write a C# program or something to do the conversion and so forth... I imagine it could be pretty easy if SQL Server has some function for converting strings to numbers, but I am very shaky on SQL. I pretty much only work with C# and access the database through LINQ to SQL.
Note: Yes, making the column a varchar in the first place was not a very good idea, but that is unfortunately the way they did it.
The only reliable way to do this will be using a temporary table, but it will not be much SQL:
select * into #tmp from bad_table
truncate table bad_table
alter table bad_table alter column silly_column int
insert bad_table
select cast(silly_column as int), other_columns
from #tmp
drop table #tmp
The easiest way to do this is:
alter table myTable alter column vColumn int;
This will work as long as
all of the data will fit inside an int
all of the data can be converted to int (i.e. a value of "car" will fail)
there are no indexes that include vColumn. If there are indexes, you will need to include a drop and create for them to get back to where you were.
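For that last caveat, the pattern is roughly this (the index name is a placeholder):
-- Any index that references vColumn has to be dropped first
DROP INDEX IX_myTable_vColumn ON myTable;

ALTER TABLE myTable ALTER COLUMN vColumn int;

-- Then recreate it
CREATE INDEX IX_myTable_vColumn ON myTable (vColumn);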
Just change the datatype in SQL Server Management Studio.
(You may need to go to menu Tools → Options → Designers, and disable the option that prevents saving changes that re-create the table.)
I totally appreciate the previous answers, but also thought a more complete answer would be helpful to other searchers...
There are a couple of caveats that would be helpful to know if you are making the changes on a production-type table.
If you have an identity column defined on the table you will have to set IDENTITY_INSERT on and off around the re-insert of data. You will also have to use an explicit column list.
If you want to be sure of not killing data in the database, use TRANSACTIONS around the truncate/alter/reinsert process
If you have a lot of data, then trying to just make the change in SQL Server Management Studio could fail with a timeout and you could lose data.
To expand the answer that @cjk gave, look at the following:
Note: 'tuc' is just a placeholder in this script for the real tablename
begin try
begin transaction
print 'Selecting Data...'
select * into #tmp_tuc from tuc
print 'Truncating Table...'
truncate table tuc
alter table tuc alter column {someColumnName} {someDataType} [not null]
... Repeat above until done
print 'Reinserting data...'
set identity_insert tuc on
insert tuc (
<Explicit column list (all columns in table)>
)
select
<Explicit column list (all columns in table - same order as above)>
from #tmp_tuc
set identity_insert tuc off
drop table #tmp_tuc
commit
print 'Successful!'
end try
begin catch
print 'Error - Rollback'
if @@trancount > 0
rollback
declare @ErrMsg nvarchar(4000), @ErrSeverity int
select @ErrMsg = ERROR_MESSAGE(), @ErrSeverity = ERROR_SEVERITY()
set identity_insert tuc off
RAISERROR(@ErrMsg, @ErrSeverity, 1)
end catch

SQL Server 2005 error

Why can't you do this, and is there a workaround?
You get this error.
Msg 2714, Level 16, State 1, Line 13
There is already an object named '#temptable' in the database.
declare @x int
set @x = 1
if (@x = 0)
begin
select 1 as Value into #temptable
end
else
begin
select 2 as Value into #temptable
end
select * from #temptable
drop table #temptable
This is a two-part question, and while Kev Fairchild provides a good answer to the second part, he totally ignores the first: why is the error produced?
The answer lies in the way the preprocessor works. This
SELECT field-list INTO #symbol ...
is resolved into a parse-tree that is directly equivalent to
DECLARE #symbol_sessionid TABLE(field-list)
INSERT INTO #symbol_sessionid SELECT field-list ...
and this puts #symbol into the local scope's name table. The business with _sessionid is to provide each user session with a private namespace; if you specify two hashes (##symbol) this behaviour is suppressed. Munging and unmunging of the sessionid extension is (obviously) transparent.
The upshot of all this is that multiple INTO #symbol clauses produce multiple declarations in the same scope, leading to Msg 2714.
You can't do that because of deferred name resolution. You can do it with a real table; just take out the pound signs.
You could also create the temp table first, at the top, and then do a regular INSERT INTO the table.
First step... check if the table already exists... if it does, delete it. Next, explicitly create the table rather than using SELECT INTO...
You'll find it much more reliable that way.
IF OBJECT_ID('tempdb..#temptable', 'U') IS NOT NULL
BEGIN
DROP TABLE #temptable
END
CREATE TABLE #temptable (Value INT)
declare @x int
set @x = 1
if (@x = 0)
begin
INSERT INTO #temptable (Value) select 1
end
else
begin
INSERT INTO #temptable (Value) select 2
end
select * from #temptable
drop table #temptable
Also, hopefully the table and field names are simplified for your example and aren't what you really call them ;)
-- Kevin Fairchild
Deferred name resolution is also the reason you cannot be sure that sp_depends gives back correct results; check out this post I wrote a while back: Do you depend on sp_depends (no pun intended)
I am going to guess that the issue is that you haven't created the #temptable.
Sorry I can't be more detailed but since you haven't even tried to explain what you are seeing you get a less than stellar answer.
From the look of the code, it seems like you might have been prototyping this in SQL Studio or similar, right? Can I guess that you've run this a few times and had it get to the point where it's created #temptable but then failed before it got to the end and dropped the table again? Restart the SQL editing tool you're using and try again.
