Extract data by day from SQL Server - sql-server

I need to get all the values from a SQL Server database by day (24 hours). I have timestamps column in TestAllData table and I want to select the data which only corresponds to a specific day.
For instance, there are timestamps of DateTime type like '2019-03-19 12:26:03.002', '2019-03-19 17:31:09.024' and '2019-04-10 14:45:12.015' so I want to load the data for the day 2019-03-19 and separately for the day 2019-04-10. Basically, it is needed to get DateTime values with the same date.
Is this possible to use some functions like DatePart or DateDiff for that?
And how can I solve such problem overall?
As in this case, I do not know the exact difference in hours between a timestamp and the end of the day (because there are various timestamps for 1 day) and I need to extract the day itself from the timestamp. After that, I need to group the data by days or something like this and get block by block. For example:
'2019-03-19' - 1200 records
'2019-04-10' - 3500 records
'2019-05-12' - 10000 records and so on
I'm looking for a more generic solution not supplying a timestamp (like '2019-03-19') as a boundary or in a where clause because the problem is not about simply filtering the data by some date!!
UPDATE: In my dataset, I have about 1,000,000 records and more than 100 unique dates. I was thinking about extracting the set of unique dates and then kind of run a query in the loop where the data would be filtered by the provided day. It would look in such a way:
select * from TestAllData where dayColumn = '2019-03-19'
select * from TestAllData where dayColumn = '2019-04-10'
select * from TestAllData where dayColumn = '2019-05-12'
...
I might use this query in my code, so I may run it in the loop from Scala function. However, I am not sure that in terms of performance it would be ok to run separate unique dates extraction query.

Depending on whether you want to be able to work with all the dates (rather than just a subset), one of the easiest ways to achieve this is with a cast:
;with cte as (SELECT cast(my_datetime as date) as my_date, * from TestAllData)
SELECT * FROM cte where my_date = '2019-02-14'
Note when casting datetime to date, times are truncated, ie just the date part is extracted.
As I say though, whether this is efficient, depends on your needs, as all datetime values from all records will be cast to date, before the data is filtered. If you want to select several dates (as opposed to just one or two), however, it may prove overall quicker, as it reads the whole table once and then gives you a column upon which you can much more efficiently filter.
If this is a permanent requirement, though, I would probably use a persisted computed column, which effectively would mean that the casting is done once initially and then only again if the corresponding value changed. For a large table I would also strongly consider an index on the computed column.

Related

adding conditions makes query slow-oracle

Dear Stackoverflow Nation,
this post is related to the performance of the query.
I execute a simple query as below
select wonum, siteid from workorder where workorder.siteid= 'MCT' and istask=0
and decode(workorder.pmnum,null,workorder.reportdate,workorder.targstartdate) >= '12-MAR-18' and
decode(workorder.pmnum,null,workorder.reportdate,workorder.targstartdate) <= '14-MAR-18';
It executed perfectly , took 6 sec
as i added one more condition type ='MAINTENANCE' ,query took 28 sec
select wonum, siteid from workorder where workorder.siteid= 'MCT' and istask=0 and type ='MAINTENANCE'
and decode(workorder.pmnum,null,workorder.reportdate,workorder.targstartdate) >= '12-MAR-18' and
decode(workorder.pmnum,null,workorder.reportdate,workorder.targstartdate) <= '14-MAR-18'; --28.73
As I know ,I need to create an index on workorder table ,
but I am unable to figure out on which field ,I need to create an index and how it helps to run query fast.
(Note:there is an index (ind_1 - with attributes wonum,siteid ) already in system
kindly help.Apologize if its a basic question for PRos
Generally speaking create indexes on columns involved in where clause. As you described it, indexing the type column might help.
Will it really help? Who knows ... check explain plan. Collect statistics for the table so that Optimizer knows what to do (i.e. chooses the best execution plan). Then you might be able to figure out what to do.
Moreover, it seems that you're forcing Oracle to perform implicit conversions. Saying that
some_date >= '12-mar-18'
means that - if some_date column's datatype is date (looks like it is; otherwise you'd get wrong result) - Oracle has to convert a string '12-mar-18' into a valid date by applying correct format mask (such as dd-mon-yy). Why would you want to do that? Provide date value yourself!
some_date >= date '2018-03-12'
or
some_date >= to_date('12-mar-18', 'dd-mon-yy')
But beware; mar means "March". This query would certainly fail in my database which speaks Croatian, and we don't have any mar months here (it is ožu). Perhaps you'd rather stick to numerics here, i.e. 12-03-18. One more note: this value is difficult to understand; what is 12? Is it 12th day in the month, or is it December? The same goes for 03. Therefore, always use values that cause no confusion, either by providing date literals (which are always in yyyy-mm-dd format - the one I suggested first), or use to_date function with appropriate format mask.

How to execute a "process/function" in SQL Server

Imagine I have a database table with some columns, n columns and n rows, and one of that columns is a date (YY-MM-DD hh:mm:ss)
So I need to take the actual date, I know there is a function called CURRENT_DATE.
And I want to do some "logic" with the actual date and the date for every row in the database (there is a column in the table with a date, that one), that logic simply to compare the years and months between them and if the difference between one to the other is X months, I will return that row, and if not, I will not return it.
So, simply as return everything in the DB with the condition of that "logic" and which will not accomplish, don't return it.
The problem is, where should I put that logic in a SQL query, I don't think I really can. Can I do what I want with SQL, or it's necessary some type of stuff?
Example Data:
So if I want that the query only return that rows that the difference between the actual Date and which it's column Date, is less than 3 months for example,
it should return Google, Amazon, Twitter, YouTube and Microsoft
Unless I'm missing something obvious here, you've just really, really over-complicated a simple where clause:
SELECT A, B, C -- Please tell me these are not your actual column names!
FROM TableName
WHERE C >= DATEADD(MONTH, -3, GETDATE())
AND C <= DATEADD(MONTH, 3, GETDATE()) -- Assuming future dates are also in the table

How to use wildcard for datetime filed

How do I use wildcards for datetme? SubmitDate field is a datetime but the query that I tried returns something totally different. I want records where submitDate begins with 2019-08
This is the code I've tried:
select *
from INVPol
where SubmitDate like '[2019-08]%'
"How do I use wildcards for datetme" Quite simply, you don't. Use proper date logic. For what you have the best way would be the below
SELECT *
FROM dbo.INCPol
WHERE SubmitDate >= '20190801'
AND SubmitDate < '20190901';
Using a lower boundary with a greater or equal to, and an upper boundary with a less than will mean that every row with a date in August 2019 will be returned. This is generally seen as a the "best" way as it's the most encompassing. Logic using BETWEEN can give incorrect results when using values with a time portion. That's because 2019-08-31T00:00:00.0000001 is not BETWEEN '20190801' and '20190831' (it's 1/1000000 of a second after the end threshold); this would mean you would effective lose a days worth of values. Also the date '2019-09-01T00:00:00.0000000' is BETWEEN '20190801' AND '20190901', so you could get (some) unwanted rows.
Trying to use a wildcard on a date would mean you would have to convert the value of the column to a varchar, which will cause performance issues. Leave the date as a date and time datatype and query it as one.

How to select first record prior/after a given timestamp in KDB?

I am currently just pulling in all records 1min leading up to the timestamp (e.g. if the timestamp I'm interested in is 2014.04.14T09:30):
select from Prices where timestamp within 2014.04.14T09:29 2014.04.14T09:30, stock=`GOOG
However, this is clearly not very robust. Sometimes the previous record may be at 09:25am and then the query returns nothing. Sometimes the query may return hundreds of records if there have been a lot of price changes, even though all I need is the last record returned.
I know this can be done with an asof join, but want to avoid it for the time being as Prices is simply too big at present.
I am also interested in doing the same, but in finding the first record after a given timestamp.
Note also that Prices is a splayed table
Select last record before the given timestamp:
q)select from Price where stock=`GOOG,i=last i,timestamp<2014.04.14T09:30
Select first record after the given timestamp:
q)select from Price where stock=`GOOG,i=first i,timestamp>2014.04.14T09:30
Use asof or aj to get the performance kdb+ is known for. The bigger Prices is, the more reason for doing so.
I would question your logic for avoiding aj. aj and asof use the bin operator which is binary search and hence more performant than scanning the timestamp column.
Let's create your table and run the solution from the other answer:
Prices:([]stock:`g#1000000?`GOOG,9?`4;timestamp:asc 2014.04.14+1000000?0t;price:1000000?100f,size:1000000?100j)
q)\t do[1000;select from Prices where timestamp<2014.04.14T09:30,stock=`GOOG,i=last i]
10205
We can make this a lot better by reordering the constraints:
q)\t do[1000;select from Prices where stock=`GOOG,timestamp<2014.04.14T09:30,i=last i]
2030
But nothing will beat this:
q)\t do[1000;Prices asof `stock`timestamp!(`GOOG;2014.04.14D09:30)]
9
By the way, you are using datetime in your question, which is deprecated, so I've replaced it with timestamp. This has no impact on performance.
Few more things to remember while using aj:
in-memory prices - the table should be `g#sym and time sorted within sym
on-disk prices - `p#sym and time sorted within sym
Also in case of partitioned/splayed tables, using the where constraints (except the date in the date-partitioned table) can severely impact the performance.

strange appengine query result

What am I doing wrong in this query?
SELECT * FROM TreatmentPlanDetails
WHERE
accountId = 'ag5zfmRvbW9kZW50d2ViMnIRCxIIQWNjb3VudHMYtcjdAQw' AND
status = 'done' AND
category = 'chirurgia orale' AND
setDoneCalendarEventStartTimestamp >= [timestamp for 6 june 2012] AND
setDoneCalendarEventStartTimestamp <= [timestamp for 11 june 2012] AND
deleteStatus = 'notDeleted'
ORDER BY setDoneCalendarEventStartTimestamp ASC
I am not getting any record and I am sure there are records meeting the where clause conditions. To get the correct records I have to widen the timestamp interval by 1 millisecond. Is it normal? Furthermore, if I modify this query by removing the category filter, I am getting the correct results. This is definitely weird.
I also asked on google groups, but I got no answer. Anyway, for details:
https://groups.google.com/forum/?fromgroups#!searchin/google-appengine/query/google-appengine/ixPIvmhCS3g/d4OP91yTkrEJ
Let's talk specifically about creating timestamps to go into the query. What code are you using to create the timestamp record? Apparently that's important, because fuzzing with it a little bit affects the query. It may be relevant that in the datastore, timestamps are recorded as integers representing posix timestamps with microseconds, i.e. the number of microseconds since 1/1/1970 UTC (not counting leap seconds). It's also relevant that dates (i.e. without a time) are represented as midnight, i.e. the earliest time on that day. But please show us the exact code. (It may also be important to show the actual content of the record that you're attempting to retrieve.)
An aside that is not specific to your question: Entity property names count as part of your storage quota. If this is going to be a huge dataset, you might pay more $$ than you'd like for property names like setDoneCalendarEventStartTimestamp.
Because you write :
if I modify this query by removing the category filter, I am getting
the correct results
this probably means that the category was not indexed at the time you write the matching records to the data store. You have to re-write your records to the data store if you want them added to the newly created index.

Resources