Difference in performance in SQL - sql-server

I have a date column called dd in a SQL Server table.
dd
---------------------------
10-01-2015 00:00:00.000
22-05-2015 10:22:32.521
27-05-2015 12:30:48.310
24-12-2014 09:51:11.728
27-05-2015 02:05:40.775
....
I need to retrieve all rows where the dd value falls within the last 24 hours.
I found 3 options for filtering to get the result needed:
1. `dd >= getdate() - 1`
2. `dd >= dateadd(day, -1, getdate())`
3. `dateadd(day, 1, dd) >= getdate()`
My questions are:
Will all 3 options retrieve all the rows I need?
If so, what is the difference between them?

dd >= getdate() - 1
This is something of a hack. It works because datetime values support integer arithmetic, but it can sometimes lead to errors (http://www.devx.com/dbzone/Article/34594/0/page/2).
dd >= dateadd(day, -1, getdate())
This is the standard way of doing it.
dateadd(day, 1, dd) >= getdate()
This will also work, but there is one catch: it will not use an index, if one exists on that column, because the predicate is not a search argument (see "What makes a SQL statement sargable?"). When you apply an expression to a column, the predicate becomes non-sargable and will not use an index.
All 3 versions will produce the same result, but the first is a hack and in some cases will lead to bugs, and the third will not use an index. So it is obvious that one should stick with option 2.
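A minimal sketch of the difference between options 2 and 3, assuming a hypothetical table dbo.Events with an index on dd (the table and index names are illustrative and do not come from the question):

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE dbo.Events
(
    id int IDENTITY(1, 1) PRIMARY KEY,
    dd datetime NOT NULL
);
CREATE INDEX IX_Events_dd ON dbo.Events (dd);

-- Option 2: the column stands alone on one side of the comparison,
-- so the optimizer can seek into IX_Events_dd.
SELECT dd
FROM dbo.Events
WHERE dd >= DATEADD(DAY, -1, GETDATE());

-- Option 3: the column is wrapped in DATEADD, so the expression has to be
-- evaluated for each row before it can be compared.
SELECT dd
FROM dbo.Events
WHERE DATEADD(DAY, 1, dd) >= GETDATE();
```

Both queries should return the same rows; comparing their execution plans is the way to confirm how each one uses the index on a given system.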

The first two are exactly as Giorgi said, but with the third one your Index Seek will become an Index Scan. SQL Server will still use that index, but it is no longer able to jump to a specific record; instead it has to scan the index to find what it needs.
For the purpose of demonstration, I selected a table that had an indexed DATETIME column and selected only that column, to avoid any key lookups and to keep the plan simple.
Also take a look at the reads on the table and the estimated vs. returned row counts. As soon as you wrap the column in a function, SQL Server is not able to estimate the correct number of rows, which will cause large performance issues when queries become more complex.
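One way to check the reads yourself is SET STATISTICS IO. The sketch below assumes a hypothetical dbo.Events table with an index on its datetime column dd; those names are assumptions and are not the table used in the demonstration above.

```sql
-- Report logical and physical reads for each statement that follows.
SET STATISTICS IO ON;

-- Sargable form: the bare column is compared with a computed value.
SELECT dd FROM dbo.Events WHERE dd >= DATEADD(DAY, -1, GETDATE());

-- Non-sargable form: the column is wrapped in a function.
SELECT dd FROM dbo.Events WHERE DATEADD(DAY, 1, dd) >= GETDATE();

SET STATISTICS IO OFF;
```

In SQL Server Management Studio the read counts for each statement appear on the Messages tab, next to the actual execution plan if that option is enabled.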

Related

adding conditions makes query slow-oracle

Dear Stack Overflow Nation,
This post is about query performance.
I execute a simple query, shown below:
select wonum, siteid from workorder where workorder.siteid= 'MCT' and istask=0
and decode(workorder.pmnum,null,workorder.reportdate,workorder.targstartdate) >= '12-MAR-18' and
decode(workorder.pmnum,null,workorder.reportdate,workorder.targstartdate) <= '14-MAR-18';
It executed fine and took 6 seconds.
When I added one more condition, type = 'MAINTENANCE', the query took 28 seconds:
select wonum, siteid from workorder where workorder.siteid= 'MCT' and istask=0 and type ='MAINTENANCE'
and decode(workorder.pmnum,null,workorder.reportdate,workorder.targstartdate) >= '12-MAR-18' and
decode(workorder.pmnum,null,workorder.reportdate,workorder.targstartdate) <= '14-MAR-18'; --28.73
As far as I know, I need to create an index on the workorder table,
but I cannot figure out which column to index or how the index would make the query run faster.
(Note: there is already an index in the system, ind_1, on wonum and siteid.)
Kindly help. Apologies if this is a basic question for the pros.
Generally speaking, create indexes on the columns involved in the where clause. As you described it, indexing the type column might help.
Will it really help? Who knows ... check the explain plan. Collect statistics for the table so that the Optimizer knows what to do (i.e. chooses the best execution plan). Then you might be able to figure out what to do.
Moreover, it seems that you're forcing Oracle to perform implicit conversions. Saying that
some_date >= '12-mar-18'
means that - if some_date column's datatype is date (looks like it is; otherwise you'd get wrong result) - Oracle has to convert a string '12-mar-18' into a valid date by applying correct format mask (such as dd-mon-yy). Why would you want to do that? Provide date value yourself!
some_date >= date '2018-03-12'
or
some_date >= to_date('12-mar-18', 'dd-mon-yy')
But beware: mar means "March". This query would certainly fail in my database, which speaks Croatian, and we don't have any mar months here (it is ožu). Perhaps you'd rather stick to numerics here, i.e. 12-03-18. One more note: this value is difficult to understand; what is 12? Is it the 12th day of the month, or is it December? The same goes for 03. Therefore, always use values that cause no confusion, either by providing date literals (which are always in yyyy-mm-dd format, the one I suggested first) or by using the to_date function with an appropriate format mask.
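Putting these suggestions together, a sketch in Oracle SQL might look like the following. It assumes that '12-MAR-18' and '14-MAR-18' mean 12 and 14 March 2018, that the workorder table lives in the current schema, and that the script runs in a client such as SQL*Plus, where `/` ends a PL/SQL block.

```sql
-- Refresh optimizer statistics for the table (assumes it is in the current schema).
begin
  dbms_stats.gather_table_stats(ownname => user, tabname => 'WORKORDER');
end;
/

-- Ask the optimizer for its plan without executing the query.
-- Date literals replace the implicitly converted strings.
explain plan for
select wonum, siteid
from workorder
where siteid = 'MCT'
  and istask = 0
  and type = 'MAINTENANCE'
  and decode(pmnum, null, reportdate, targstartdate) >= date '2018-03-12'
  and decode(pmnum, null, reportdate, targstartdate) <= date '2018-03-14';

-- Show the plan that was just stored.
select * from table(dbms_xplan.display);
```

Running the same steps before and after adding a candidate index shows whether the optimizer actually picks it up.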

How to execute a "process/function" in SQL Server

Imagine I have a database table with n columns and n rows, and one of those columns is a date (YY-MM-DD hh:mm:ss).
I need the current date; I know there is a function called CURRENT_DATE.
I want to apply some "logic" to the current date and the date in every row (the date column mentioned above): compare the years and months, and if the difference between the two is X months, return that row; otherwise, do not return it.
In short, return every row in the DB that satisfies that "logic" and skip the rows that do not.
The problem is where to put that logic in a SQL query; I don't think I really can. Can I do what I want with SQL, or do I need something else?
Example Data:
So if I want the query to return only the rows where the difference between the current date and the date column is less than 3 months, for example,
it should return Google, Amazon, Twitter, YouTube and Microsoft.
Unless I'm missing something obvious here, you've just really, really over-complicated a simple where clause:
SELECT A, B, C -- Please tell me these are not your actual column names!
FROM TableName
WHERE C >= DATEADD(MONTH, -3, GETDATE())
AND C <= DATEADD(MONTH, 3, GETDATE()) -- Assuming future dates are also in the table
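The same filter can be sketched with the "X months" from the question held in a variable. The variable name @Months is an assumption; the table and column names are the placeholders from the answer.

```sql
-- Hypothetical variable standing in for the "X months" in the question.
DECLARE @Months int = 3;

SELECT A, B, C
FROM TableName
WHERE C >= DATEADD(MONTH, -@Months, GETDATE())   -- no older than X months
  AND C <= DATEADD(MONTH, @Months, GETDATE());   -- no further ahead than X months
```

Because column C is left bare on one side of each comparison, an index on C can still be used.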

How to use wildcard for datetime field

How do I use wildcards for datetime? The SubmitDate field is a datetime, but the query that I tried returns something totally different. I want records where SubmitDate begins with 2019-08.
This is the code I've tried:
select *
from INVPol
where SubmitDate like '[2019-08]%'
"How do I use wildcards for datetime?" Quite simply, you don't. Use proper date logic. For what you have, the best way would be the following:
SELECT *
FROM dbo.INVPol
WHERE SubmitDate >= '20190801'
AND SubmitDate < '20190901';
Using a lower boundary with greater than or equal to, and an upper boundary with less than, means that every row with a date in August 2019 will be returned. This is generally seen as the "best" way, as it's the most encompassing. Logic using BETWEEN can give incorrect results when using values with a time portion. That's because 2019-08-31T00:00:00.0000001 is not BETWEEN '20190801' AND '20190831' (it's 1/10000000 of a second after the end threshold); this would mean you would effectively lose a day's worth of values. Also, the date '2019-09-01T00:00:00.0000000' is BETWEEN '20190801' AND '20190901', so you could get (some) unwanted rows.
Trying to use a wildcard on a date would mean you would have to convert the value of the column to a varchar, which will cause performance issues. Leave the date as a date and time datatype and query it as one.
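The same half-open range can be written for any month. A minimal sketch, assuming SQL Server 2008 or later for the date type; @MonthStart is a hypothetical variable holding the first day of the month wanted:

```sql
-- Hypothetical variable: first day of the month to select.
DECLARE @MonthStart date = '20190801';

SELECT *
FROM dbo.INVPol
WHERE SubmitDate >= @MonthStart                     -- from the first instant of the month
  AND SubmitDate < DATEADD(MONTH, 1, @MonthStart);  -- up to, but excluding, the next month
```

The upper bound is computed rather than typed, so there is no need to know how many days the month has.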

Extract data by day from SQL Server

I need to get all the values from a SQL Server database by day (24 hours). I have a timestamp column in the TestAllData table and I want to select only the data that corresponds to a specific day.
For instance, there are timestamps of DateTime type like '2019-03-19 12:26:03.002', '2019-03-19 17:31:09.024' and '2019-04-10 14:45:12.015', so I want to load the data for the day 2019-03-19 and, separately, for the day 2019-04-10. Basically, I need to get the DateTime values that share the same date.
Is it possible to use functions like DatePart or DateDiff for that?
And how can I solve this kind of problem in general?
In this case, I do not know the exact difference in hours between a timestamp and the end of the day (because there are various timestamps for one day), so I need to extract the day itself from the timestamp. After that, I need to group the data by day, or something like that, and get it block by block. For example:
'2019-03-19' - 1200 records
'2019-04-10' - 3500 records
'2019-05-12' - 10000 records and so on
I'm looking for a more generic solution not supplying a timestamp (like '2019-03-19') as a boundary or in a where clause because the problem is not about simply filtering the data by some date!!
UPDATE: In my dataset, I have about 1,000,000 records and more than 100 unique dates. I was thinking about extracting the set of unique dates and then kind of run a query in the loop where the data would be filtered by the provided day. It would look in such a way:
select * from TestAllData where dayColumn = '2019-03-19'
select * from TestAllData where dayColumn = '2019-04-10'
select * from TestAllData where dayColumn = '2019-05-12'
...
I might use this query in my code, so I may run it in a loop from a Scala function. However, I am not sure whether, in terms of performance, it would be OK to run a separate query to extract the unique dates.
Depending on whether you want to be able to work with all the dates (rather than just a subset), one of the easiest ways to achieve this is with a cast:
;with cte as (SELECT cast(my_datetime as date) as my_date, * from TestAllData)
SELECT * FROM cte where my_date = '2019-02-14'
Note that when casting datetime to date, times are truncated, i.e. just the date part is extracted.
As I say, though, whether this is efficient depends on your needs, as all datetime values from all records will be cast to date before the data is filtered. If you want to select several dates (as opposed to just one or two), however, it may prove quicker overall, as it reads the whole table once and then gives you a column upon which you can filter much more efficiently.
If this is a permanent requirement, though, I would probably use a persisted computed column, which effectively would mean that the casting is done once initially and then only again if the corresponding value changed. For a large table I would also strongly consider an index on the computed column.
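A sketch of the persisted computed column approach follows. The column name my_datetime comes from the example above, while the dbo schema and the index name are assumptions.

```sql
-- Add a persisted computed column that holds only the date part.
ALTER TABLE dbo.TestAllData
    ADD my_date AS CAST(my_datetime AS date) PERSISTED;
-- GO is a batch separator in SSMS/sqlcmd; it makes sure the new column
-- exists before later statements refer to it.
GO

-- Index the computed column so filtering and grouping by day can use it.
CREATE INDEX IX_TestAllData_my_date ON dbo.TestAllData (my_date);
GO

-- Number of records per day, as in the question's example.
SELECT my_date, COUNT(*) AS record_count
FROM dbo.TestAllData
GROUP BY my_date
ORDER BY my_date;
```

The last query returns the list of distinct days in one pass, which may remove the need for a separate query to extract the unique dates.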

SQL Server DateTime conversion failure

I have a large table with 1 million+ records. Unfortunately, the person who created the table decided to put dates in a varchar(50) field.
I need to do a simple date comparison -
datediff(dd, convert(datetime, lastUpdate, 100), getDate()) < 31
But it fails on the convert():
Conversion failed when converting datetime from character string.
Apparently there is something in that field it doesn't like, and since there are so many records, I can't tell just by looking at it. How can I properly sanitize the entire date field so it does not fail on the convert()? Here is what I have now:
select count(*)
from MyTable
where
isdate(lastUpdate) > 0
and datediff(dd, convert(datetime, lastUpdate, 100), getDate()) < 31
@SQLMenace
I'm not concerned about performance in this case. This is going to be a one time query. Changing the table to a datetime field is not an option.
@Jon Limjap
I've tried adding the third argument, and it makes no difference.
@SQLMenace
The problem is most likely how the data is stored; there are only two safe formats: ISO YYYYMMDD and ISO 8601 yyyy-mm-ddThh:mm:ss.mmm (no spaces)
Wouldn't the isdate() check take care of this?
I don't have a need for 100% accuracy. I just want to get most of the records that are from the last 30 days.
@SQLMenace
select isdate('20080131') -- returns 1
select isdate('01312008') -- returns 0
@Brian Schkerke
Place the CASE and ISDATE inside the CONVERT() function.
Thanks! That did it.
Place the CASE and ISDATE inside the CONVERT() function.
SELECT COUNT(*) FROM MyTable
WHERE
DATEDIFF(dd, CONVERT(DATETIME, CASE IsDate(lastUpdate)
WHEN 1 THEN lastUpdate
ELSE '12-30-1899'
END), GetDate()) < 31
Replace '12-30-1899' with the default date of your choice.
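On SQL Server 2012 or later, TRY_CONVERT offers another route: it returns NULL instead of raising an error when a value cannot be converted. This is a different technique from the CASE/ISDATE approach above, sketched here with the table and column names from the question.

```sql
-- Rows whose text does not convert yield NULL, so the comparison
-- is not true for them and they are simply left out of the count.
SELECT COUNT(*)
FROM MyTable
WHERE DATEDIFF(dd, TRY_CONVERT(datetime, lastUpdate, 100), GETDATE()) < 31;

-- List the values that fail to convert, for later clean-up.
SELECT lastUpdate
FROM MyTable
WHERE lastUpdate IS NOT NULL
  AND TRY_CONVERT(datetime, lastUpdate, 100) IS NULL;
```

With this form no placeholder date is needed, and the second query doubles as a report of the bad rows.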
How about writing a cursor to loop through the contents, attempting the cast for each entry? When an error occurs, output the primary key or other identifying details for the problem record.
I can't think of a set-based way to do this.
Not totally set-based, but if only 3 rows out of 1 million are bad, it will save you a lot of time:
select * into BadDates
from Yourtable
where isdate(lastUpdate) = 0
select * into GoodDates
from Yourtable
where isdate(lastUpdate) = 1
Then just look at the BadDates table and fix those rows.
The ISDATE() would take care of the rows which were not formatted properly if it were indeed being executed first. However, if you look at the execution plan you'll probably find that the DATEDIFF predicate is being applied first - thus the cause of your pain.
If you're using SQL Server Management Studio hit CTRL+L to view the estimated execution plan for a particular query.
Remember, SQL isn't a procedural language and short circuiting logic may work, but only if you're careful in how you apply it.
How about writing a cursor to loop through the contents, attempting the cast for each entry?
When an error occurs, output the primary key or other identifying details for the problem record.
I can't think of a set-based way to do this.
Edit - ah yes, I forgot about ISDATE(). Definitely a better approach than using a cursor. +1 to SQLMenace.
In your convert call, you need to specify a third parameter, the style, i.e. the format of the datetimes that are stored as varchar, as specified in this document: CAST and CONVERT (T-SQL)
Print out the records. Give the hardcopy to the idiot who decided to use a varchar(50) and ask them to find the problem record.
Next time they might just see the point of choosing an appropriate data type.
The problem is most likely how the data is stored; there are only two safe formats:
ISO YYYYMMDD
ISO 8601 yyyy-mm-ddThh:mm:ss.mmm (no spaces)
These will work no matter what your language is.
You might need to do a SET DATEFORMAT YMD (or whatever the data is stored as) to make it work.
Wouldn't the isdate() check take care of this?
Run this to see what happens
select isdate('20080131')
select isdate('01312008')
I understand that changing the table/column might not be an option due to legacy system requirements, but have you thought about creating a view that has the date conversion logic built in? If you are using a more recent version of SQL Server, you could possibly even use an indexed view.
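A sketch of such a view follows. The view name, the dbo schema and the id key column are assumptions for illustration, since the question does not list the table's columns.

```sql
-- Hypothetical view; id is an assumed key column, not given in the question.
CREATE VIEW dbo.MyTableWithDates
AS
SELECT
    id,
    lastUpdate,
    CASE WHEN ISDATE(lastUpdate) = 1
         THEN CONVERT(datetime, lastUpdate)
    END AS lastUpdateDate   -- NULL when the text is not a valid date
FROM dbo.MyTable;
GO

-- Callers can then filter on a real datetime value.
SELECT COUNT(*)
FROM dbo.MyTableWithDates
WHERE lastUpdateDate >= DATEADD(DAY, -31, GETDATE());
```

Whether a view like this can be indexed depends on the determinism rules for indexed views, which this sketch does not attempt to satisfy.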
I would suggest cleaning up the mess and changing the column to a datetime, because doing stuff like this
WHERE datediff(dd, convert(datetime, lastUpdate), getDate()) < 31
cannot use an index, and it will be many times slower than if you had a datetime column and did
where lastUpdate > getDate() -31
You also need to take hours and seconds into account, of course.
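To make the point about hours and seconds concrete: DATEDIFF(dd, ...) counts the midnights crossed, so DATEDIFF(dd, lastUpdate, GETDATE()) < 31 matches everything from midnight 30 days ago onward. A sketch of an equivalent index-friendly filter, assuming lastUpdate has already been changed to a datetime column and SQL Server 2008 or later for the date type:

```sql
-- Assumes lastUpdate is now a datetime column.
-- CAST(GETDATE() AS date) drops the time of day, so the boundary
-- is midnight 30 days before today.
SELECT COUNT(*)
FROM MyTable
WHERE lastUpdate >= DATEADD(DAY, -30, CAST(GETDATE() AS date));
```

The column is compared directly with a value computed once, so an index on lastUpdate can be used for a seek.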
