I've got a staging area that I'm trying to validate data in, going through multiple iterations of validation. Currently, I'm fighting some issues with an nvarchar(50) column that I'm attempting to convert to a date.
I'm aware of the common pitfall of poorly formed strings failing date conversion, so here's what I'm doing.
SELECT *
FROM ( SELECT * FROM STAGE_TABLE WHERE ISDATE(DATE_COL) = 1) X
WHERE CAST(DATE_COL AS DATE) < GETDATE()
... this results in the standard "Conversion failed when converting date and/or time from character string."
But here's where things get weird for me. If I change the above statement to the following:
SELECT CAST(DATE_COL AS DATE)
FROM ( SELECT * FROM STAGE_TABLE WHERE ISDATE(DATE_COL) = 1) X
... all is well, and all I've done is moved the cast from the where clause to the select clause. I think I'm missing something at a fundamental level.
FWIW, if I were to pull all records from STAGE_TABLE without the WHERE ISDATE clause, I would have poorly formed date strings.
Any insights greatly appreciated!
You should find that the optimizer merges the two WHERE clauses of the first query into one and evaluates the CAST before the ISDATE check, so the CAST sees the bad rows and fails.
In the second query, the plan applies the WHERE filter first, so the CAST never sees bad data.
I have just tested and can verify:
create table STAGE_TABLE ( date_col nvarchar(50) )
insert into STAGE_TABLE select 'a'
insert into STAGE_TABLE select '20100101'
First query
SELECT *
FROM ( SELECT * FROM STAGE_TABLE WHERE ISDATE(DATE_COL) = 1) X
WHERE CAST(DATE_COL AS DATE) < GETDATE()
First plan
|--Filter(WHERE:(isdate([tempdb].[dbo].[STAGE_TABLE].[date_col])=(1)))
     |--Table Scan(OBJECT:([tempdb].[dbo].[STAGE_TABLE]), WHERE:(CONVERT(date,[tempdb].[dbo].[STAGE_TABLE].[date_col],0)<getdate()))
Second query
SELECT CAST(DATE_COL AS DATE)
FROM ( SELECT * FROM STAGE_TABLE WHERE ISDATE(DATE_COL) = 1) X
Second plan
|--Compute Scalar(DEFINE:([Expr1004]=CONVERT(date,[tempdb].[dbo].[STAGE_TABLE].[date_col],0)))
     |--Filter(WHERE:(isdate([tempdb].[dbo].[STAGE_TABLE].[date_col])=(1)))
          |--Table Scan(OBJECT:([tempdb].[dbo].[STAGE_TABLE]))
There does not seem to be a hint/option to fix the first query (since it gets rolled into one WHERE clause), but you can use the following, which checks both conditions in a single scan pass and only casts values that pass ISDATE:
SELECT *
FROM (SELECT * FROM STAGE_TABLE) X
WHERE CAST(CASE WHEN ISDATE(DATE_COL) = 1 THEN DATE_COL ELSE NULL END AS DATE) < GETDATE()
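On SQL Server 2012 or later, a shorter variant of the same idea should be possible. This is only a sketch, assuming TRY_CAST is available on your version: it returns NULL instead of raising an error when the string cannot be converted, so the order in which the predicates are evaluated no longer matters.

```sql
-- Sketch, assuming SQL Server 2012+ (TRY_CAST does not exist in earlier versions).
-- Rows whose DATE_COL cannot be converted yield NULL, and NULL < GETDATE()
-- is not true, so those rows are simply filtered out.
SELECT *
FROM STAGE_TABLE
WHERE TRY_CAST(DATE_COL AS DATE) < GETDATE();
```

Note that TRY_CAST(... AS DATE) and ISDATE() do not necessarily accept exactly the same set of strings, so the result may differ slightly from the CASE version above.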
Firstly, may I state that I'm aware of the ability to, e.g., create a new function, declare variables for rowcount1 and rowcount2, run a stored procedure that returns a subset of rows from a table, then determine the entire rowcount for that same table, assign it to the second variable and then 1 / 2 x 100....
However, is there a cleaner way to do this which doesn't result in numerous running of things like this stored procedure? Something like
select (count(*stored procedure name*) / select count(*) from table) x 100) as Percentage...
Sorry for the crap scenario!
EDIT: Someone has asked for more details. Ultimately, and to cut a very long story short, I wish to know what people would consider the quickest and most processor-concise method there would be to show the percentage of rows that are returned in the stored procedure, from ALL rows available in that table. Does that make more sense?
The code in the stored procedure is below:
SET @SQL = 'SELECT COUNT (DISTINCT c.ElementLabel), r.FirstName, r.LastName, c.LastReview,
CASE
WHEN c.LastReview < DateAdd(month, -1, GetDate()) THEN ''OUT of Date''
WHEN c.LastReview >= DateAdd(month, -1, GetDate()) THEN ''In Date''
WHEN c.LastReview is NULL THEN ''Not Yet Reviewed'' END as [Update Status]
FROM [Residents-'+@home_name+'] r
LEFT JOIN [CarePlans-'+@home_name+'] c ON r.PersonID = c.PersonID
WHERE r.Location = '''+@home_name+'''
AND CarePlanType = 0
GROUP BY r.LastName, r.FirstName, c.LastReview
HAVING COUNT(ELEMENTLABEL) >= 14'
Thanks
Ant
I could not tell from your question if you are attempting to get the count and the result set in one query. If it is ok to execute the SP and separately calculate a table count then you could store the results of the stored procedure into a temp table.
CREATE TABLE #Results(ID INT, Value INT)
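INSERT #Results EXEC myStoreProc @Parameter1, @Parameter2
-- Multiply by 100.0 before dividing; COUNT(*) / COUNT(*) alone is integer division
-- ("table" stands for your table name)
SELECT
Result = (SELECT COUNT(*) FROM #Results) * 100.0 / (select count(*) from table)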
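One thing to watch in a percentage calculation like this: both COUNT(*) values are integers, so dividing one by the other performs integer division, and a subset smaller than the full table gives 0 before the multiplication by 100 ever happens. A minimal illustration, using made-up counts (the variable names and values are hypothetical):

```sql
-- Hypothetical counts, used only to illustrate the arithmetic
DECLARE @subset INT = 3, @total INT = 4;

SELECT (@subset / @total) * 100 AS IntegerResult,  -- 3 / 4 is 0 in integer division, so this is 0
       @subset * 100.0 / @total AS DecimalResult;  -- 100.0 forces decimal arithmetic, so this is 75
```

If the table can be empty, wrapping the divisor in NULLIF(@total, 0) returns NULL instead of raising a divide-by-zero error.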
I am dealing with legacy Informix data apparently never validated properly upon input.
This means that a DATE field could contain
a proper date (12/25/2000),
a garbage date (02/22/0200),
NULL, or
the Lindbergh baby.
The following works nicely in SQL Server:
SELECT
COUNT(1) AS [Grand Total Count of Bad-Date Records],
COUNT(GOOFYDATE) AS [Count of Just NON-NULL Bad-Date Records],
SUM(IIF(GOOFYDATE IS NULL,1,0)) AS [Count of Just NULL Bad-Date Records]
FROM MyTable
WHERE ISDATE(GOOFYDATE)=0
Everything above adds up.
In Informix,
SELECT COUNT(1)
FROM MyTable
WHERE DATE(GOOFYDATE) IS NULL
gives me the Grand Total, as before. However, the following does, too:
SELECT COUNT(1)
FROM MyTable
WHERE DATE(GOOFYDATE) IS NULL
AND GOOFYDATE IS NULL
How may I implement in Informix my ISDATE goal, as accomplished above in SQL Server?
You can write a stored procedure/function to perform this task, so that it will work exactly like the SQL Server equivalent. Something like:
CREATE FUNCTION informix.isdate(str VARCHAR(50), fmt VARCHAR(50))
RETURNING SMALLINT;
DEFINE d DATE;
ON EXCEPTION IN (-1277,-1267,-1263) -- other errors are possible
RETURN 0;
END EXCEPTION;
LET d = TO_DATE(str, fmt); -- acceptable date if exception not raised
IF d < MDY(1,1,1850) THEN -- dates prior to this are "logically invalid"
RETURN 0;
END IF;
RETURN 1;
END FUNCTION;
Which you can use thus:
-- Sample data
CREATE TEMP TABLE test1 (str VARCHAR(50));
INSERT INTO test1 VALUES ("not a date");
INSERT INTO test1 VALUES ("02/25/2016");
INSERT INTO test1 VALUES ("02/30/2016");
INSERT INTO test1 VALUES ("02/01/0000");
SELECT str, ISDATE(str, "%m/%d/%Y") FROM test1;
str          (expression)

not a date              0
02/25/2016              1
02/30/2016              0
02/01/0000              0

4 row(s) retrieved.
SELECT str AS invalid_date
FROM test1
WHERE ISDATE(str, "%m/%d/%Y") = 0;
invalid_date

not a date
02/30/2016
02/01/0000

3 row(s) retrieved.
Depending on how goofy your dates are, you may find other errors crop up. Just adjust the ON EXCEPTION clause accordingly. I've written this function to be as general purpose as possible, but you could code the "accepted" date format into the routine rather than pass it as an argument. (I don't recommend that, though.)
Please see http://sqlfiddle.com/#!3/6f14c/7
Here is the schema:
CREATE TABLE tmp1 (
ts DATETIME NOT NULL,
message NVARCHAR(128) NOT NULL
)
INSERT INTO tmp1 (ts,message) VALUES
('2015-06-24 00:28:18', '121a'),
('2015-06-24 00:30:18', '28.315b')
Here is the SQL statement:
;with data as (
select ts,CONVERT(FLOAT, replace(message,'a','')) seconds from tmp1
where message LIKE '%a'
)
select * from data where seconds > 100
Running it yields "Error converting data type nvarchar to float."
Why?
SQL Server does not guarantee the order of evaluation of expressions. What is happening is that the conversion is happening before the filtering, because it is pushed in to the part of the process that reads the data.
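CTEs and subqueries do not affect this optimization. In versions before SQL Server 2012, the only way around it within a single query is to use CASE:
select ts,
(case when isnumeric(replace(message, 'a', '')) = 1
then CONVERT(FLOAT, replace(message,'a',''))
end) as seconds
from tmp1
where message LIKE '%a'
-- a column alias cannot be referenced in WHERE, so the guarded expression is repeated
and (case when isnumeric(replace(message, 'a', '')) = 1
then CONVERT(FLOAT, replace(message,'a',''))
end) > 100;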
In SQL Server 2012+, you can use try_convert() instead:
with data as (
select ts, try_convert(FLOAT, replace(message,'a','')) as seconds
from tmp1
where message LIKE '%a'
)
select * from data where seconds > 100
Gordon's answer is actually right on! Alternatively, I think you could also use trim functions in SQL Server to do this:
;WITH data
AS (
SELECT ts
,CONVERT(FLOAT, ltrim(rtrim(replace(message, 'a', '')))) seconds
FROM tmp1
WHERE message LIKE '%a'
)
SELECT *
FROM data
WHERE seconds > 100
Hi, I have a reporting application written against some third-party software. Unfortunately, it stores all values as nvarchar and does not validate data entry on the client side. As a result, I am getting the following error:
"Conversion failed when converting date and/or time from character string"
System.Data.SqlClient.SqlException was unhandled by user code
or if I try to execute the code in SSMS:
Msg 241, Level 16, State 1, Procedure settlement_list, Line 10
Conversion failed when converting date and/or time from character string.
I assume this is the result of someone entering a text value in the data field so I've tried this ISDATE code to find the bad value:
SELECT mat3_02_01, CONVERT(datetime, mat3_04_02), mat3_04_02 FROM lntmu11.matter3
WHERE ISDATE(mat3_04_02) <> 1
AND Coalesce(mat3_04_02, '') <> ''
order by mat3_04_02 desc
and I get zero rows returned. I also manually sifted through the data (it's several hundred thousand rows, so it's kind of hard) and see no bad values.
Does anyone have any suggestions?
EDIT ---
Here is the stored proc (I know where clause is ugly)
SELECT mat_no, 'index'=matter.mat1_01_06,
'insurance'=Replace(Replace(matter.mat1_03_01, 'INSURANCE COMPANY', ' '), 'COMPANY', ''),
matter.[status], 'casestage'=mat1_04_01, 'injured'=matter.MAT1_01_07, matter.client,
'terms'=mat3_04_06, 'ClmAmt'=matter.mat1_07_01,
'ClmBal'=matter.mat1_07_03, 'SetTot'=matter3.MAT3_04_09, 'By'=mat3_03_02,
'DtSttld'=mat3_04_02, 'SettlStg'=(MAT3_06_08 + ' / ' + MAT3_06_05)
FROM [lntmu11].matter3 inner join
[lntmu11].matter ON [lntmu11].matter.sysid = [lntmu11].matter3.sysid
WHERE
(DateDiff(month, convert(datetime, MAT3_04_02, 101), GETDATE()) = @range
and mat3_03_02 like @by)
or
(mat3_04_06 like @by2
and DateDiff(month, convert(datetime, MAT3_04_02, 101), GETDATE()) = @range)
ORDER BY MAT3_03_02
You can't force the order the query engine will try to process the statement without first dumping the ISDATE() = 1 rows into a #temp table. You can't guarantee the processing order or short circuiting, even though some will suggest using a CTE or subquery to filter out the bad rows first. So some might suggest:
;WITH x AS
(
SELECT mat3_02_01, mat3_04_02
FROM Intmu11.matter3
WHERE ISDATE(mat3_04_02) = 1
AND mat3_04_02 IS NOT NULL -- edited!
)
SELECT mat3_02_01, CONVERT(DATETIME, mat3_04_02), mat3_04_02
FROM x
ORDER BY mat3_04_02 DESC;
And this may even appear to work, today. But in the long term, really the only way to guarantee this processing order - in current versions of SQL Server - is:
SELECT mat3_02_01, mat3_04_02
INTO #x
FROM Intmu11.matter3
WHERE ISDATE(mat3_04_02) = 1
AND mat3_04_02 IS NOT NULL; -- edited!
SELECT mat3_02_01, CONVERT(DATETIME, mat3_04_02), mat3_04_02
FROM #x
ORDER BY mat3_04_02 DESC;
Have you thought about validating the values on input? For example, you can change where this error appears in the application by slapping them on the wrist when they enter an invalid date, instead of punishing the person who selects their bad data. If you are controlling the update/insert via a stored procedure, you can say:
IF ISDATE(@mat3_04_02) = 0
BEGIN
RAISERROR('Please enter a valid date.', 11, 1);
RETURN;
END
If you aren't controlling data manipulation via stored procedure(s), then you can add a check constraint to the table (after you've cleaned up the existing bad data).
UPDATE Intmu11.matter3 SET mat3_04_02 = NULL
WHERE ISDATE(mat3_04_02) = 0;
ALTER TABLE Intmu11.matter3 WITH NOCHECK
ADD CONSTRAINT mat3_04_02_valid_date CHECK (ISDATE(mat3_04_02)=1);
This way when the error message gets bubbled up to the user they will see the constraint name and hopefully will be able to map that to the data entry point on the front end that failed:
Msg 547, Level 16, State 0, Line 1 The INSERT statement conflicted
with the CHECK constraint "mat3_04_02_valid_date". The conflict
occurred in database "your_db", table "Intmu11.matter3", column
'mat3_04_02'. The statement has been terminated.
Or better yet, use the right data type in the first place! Again, after updating the existing bad data to be NULL, you can say:
ALTER TABLE Intmu11.matter3 ALTER COLUMN mat3_04_02 DATETIME;
Now when someone tries to enter a non-date, they'll get the same error that the users are currently getting when they try to select the bad data:
Msg 241, Level 16, State 1, Line 1 Conversion failed when
converting date and/or time from character string.
In SQL Server 2012, you'll be able to get around this with TRY_CONVERT() but you should still be trying to get the data type right from the beginning.
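As a rough sketch of that SQL Server 2012+ approach: assuming the style 101 (mm/dd/yyyy) used in the stored procedure is the format the data is supposed to follow, a query like the following lists the values that the procedure's CONVERT would fail on. It may return rows that ISDATE() accepts, because ISDATE() depends on the session's language and date-format settings rather than on style 101. The table and column names are the ones used earlier in this answer.

```sql
-- Sketch, assuming SQL Server 2012+ and that style 101 (mm/dd/yyyy) is the intended format.
-- TRY_CONVERT returns NULL instead of raising an error, so any non-NULL
-- source value that comes back as NULL is a string that cannot be converted.
SELECT mat3_02_01, mat3_04_02
FROM Intmu11.matter3
WHERE mat3_04_02 IS NOT NULL
  AND TRY_CONVERT(DATETIME, mat3_04_02, 101) IS NULL;
```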
Examine the query where
ISDATE(mat3_04_02) = 1
AND
Coalesce(mat3_04_02, '') = ''
To be a date it must have a value.
But it only matches the second condition if it has no value.
The intersection (and) of those two conditions is always false.
If you are looking for null then "mat3_04_02 is null" but it still will return 0 rows.
Try
SELECT mat3_02_01, CONVERT(datetime, mat3_04_02), mat3_04_02
FROM lntmu11.matter3
WHERE ISDATE(mat3_04_02) = 1
order by CONVERT(datetime, mat3_04_02) desc
I think you would want date sorted and not string sorted
The question started as finding valid dates and it morphed into finding invalid dates
SELECT mat3_02_01, mat3_04_02
FROM lntmu11.matter3
WHERE ISDATE(mat3_04_02) = 0
AND mat3_04_02 is not null
order by mat3_04_02 desc
I'm having some trouble with this statement, owing no doubt to my ignorance of what is returned from this select statement:
declare @myInt as INT
set @myInt = (select COUNT(*) from myTable as count)
if(@myInt <> 0)
begin
print 'there's something in the table'
end
There are records in myTable, but when I run the code above the print statement is never run. Further checks show that myInt is in fact zero after the assignment above. I'm sure I'm missing something, but I assumed that a select count would return a scalar that I could use above?
If @myInt is zero it means no rows in the table: it would be NULL if never set at all.
COUNT will always return a row, even for no rows in a table.
Edit, Apr 2012: the rules for this are described in my answer here: Does COUNT(*) always return a result?
Your count/assign is correct but could be either way:
select @myInt = COUNT(*) from myTable
set @myInt = (select COUNT(*) from myTable)
However, if you are just looking for the existence of rows, (NOT) EXISTS is more efficient:
IF NOT EXISTS (SELECT * FROM myTable)
select @myInt = COUNT(*) from myTable
Declare @MyInt int
Set @MyInt = ( Select Count(*) From MyTable )
If @MyInt > 0
Begin
Print 'There''s something in the table'
End
I'm not sure if this is your issue, but you have to escape the single quote in the print statement with a second single quote. While you can use SELECT to populate the variable, using SET as you have done here is just fine and clearer IMO. In addition, Count(*) is guaranteed never to return a negative value, so you need only check whether it is greater than zero.
[update] -- Well, my own foolishness provides the answer to this one. As it turns out, I was deleting the records from myTable before running the select COUNT statement.
How did I do that and not notice? Glad you asked. I've been testing a sql unit testing platform (tsqlunit, if you're interested) and as part of one of the tests I ran a truncate table statement, then the above. After the unit test is over everything is rolled back, and records are back in myTable. That's why I got a record count outside of my tests.
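For anyone who wants to reproduce the effect, here is a minimal sketch of what was going on. It assumes myTable has no foreign keys referencing it, since TRUNCATE TABLE would be refused otherwise:

```sql
BEGIN TRANSACTION;

TRUNCATE TABLE myTable;          -- what the unit test did

SELECT COUNT(*) FROM myTable;    -- returns 0: the rows are gone inside the transaction

ROLLBACK TRANSACTION;            -- what the test framework does when the test ends

SELECT COUNT(*) FROM myTable;    -- returns the original row count again
```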
Sorry everyone...thanks for your help.