Indexed view with data for the last two weeks - sql-server

I'm trying to create an indexed view containing only the data for the last 2 weeks.
This part works fine:
CREATE VIEW [dbo].[MainLogView]
WITH SCHEMABINDING
AS
SELECT Id, Date, System, [Function], StartTime, EndTime, Duration, ResponseIsSuccess, ResponseErrors
FROM dbo.MainLog
WHERE (Date >= DATEADD(day, - 14, GETDATE()))
But when I try to add an index:
CREATE UNIQUE CLUSTERED INDEX IDX_V1
ON MainLogView (Id);
I'm getting:
Cannot create index on view 'dbo.MainLogView'. The function 'getdate'
yields nondeterministic results. Use a deterministic system function,
or modify the user-defined function to return deterministic results.
I know why, but how can I reduce the data in the view to the last 2 weeks?
I need a small, fast, queryable portion of the data from my table.

You could (I think, but I have no real-world experience with indexed views) create a one-record table (an actual table, since a view is not allowed in an indexed view) which you fill with the current date minus 14 days. You can keep this table up to date manually, with a trigger, or by some other clever mechanism, and then join against it so that it effectively acts as your filter.
Of course, when you query the view, you have to be sure to update your 'currentDate' table first!
You'd get something like:
CREATE VIEW [dbo].[MainLogView]
WITH SCHEMABINDING
AS
SELECT ML.Id, ML.Date, ML.System, ML.[Function], ML.StartTime, ML.EndTime, ML.Duration, ML.ResponseIsSuccess, ML.ResponseErrors
FROM dbo.MainLog ML
INNER JOIN dbo.CurrentDate CD
ON ML.Date >= CD.CurrentDateMin14Days
(Totally untested, might not work... This is basically a hack, I am not at all sure the indexed view will give you any performance increase. You might be better off with a regular view.)
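For illustration, the one-row date table and its refresh could look something like this (a rough sketch; the table and column names just mirror the view above, and the refresh could equally live in a SQL Agent job or a trigger):
CREATE TABLE dbo.CurrentDate (
    CurrentDateMin14Days datetime NOT NULL
);

INSERT INTO dbo.CurrentDate (CurrentDateMin14Days)
VALUES (DATEADD(day, -14, GETDATE()));

-- Run this before querying the view so the cutoff stays current
UPDATE dbo.CurrentDate
SET CurrentDateMin14Days = DATEADD(day, -14, GETDATE());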

Related

Using indexes when comparing datetimes

I have two tables, both of which contain millions of rows of data.
tbl_one:
purchasedtm DATETIME,
userid INT,
totalcost INT
tbl_two:
id BIGINT,
eventdtm DATETIME,
anothercol INT
The first table has a clustered index on the first two columns: CLUSTERED INDEX tbl_one_idx ON(purchasedtm, userid)
The second one has a primary key on its ID column, and also a non-clustered index on the eventdtm column.
I want to run a query which looks for rows in which purchasedtm and eventdtm are on the same day.
Originally, I wrote my query as:
WHERE CAST(tbl_one.purchasedtm AS DATE) = CAST(tbl_two.eventdtm AS DATE)
But this was not going to use either of the two indexes.
Later, I changed my query to this:
WHERE tbl_one.purchasedtm >= CAST(tbl_two.eventdtm AS DATE)
AND tbl_one.purchasedtm < DATEADD(DAY, 1, CAST(tbl_two.eventdtm AS DATE))
This way, because only one side of the comparison is wrapped in a function, the other side can still use its index. Correct?
I also have some additional questions:
I can write the query the other way around too, i.e. keeping tbl_two.eventdtm untouched and wrapping tbl_one.purchasedtm in CAST(). Would that make a difference in performance?
If the answer to the previous question is yes, is it because eventdtm has its own dedicated index, while looking up purchasedtm would only be a partial index match?
Are there other factors I can take into consideration for deciding which of the two choices is better? (For example, if there are millions of rows in tbl_one but billions of rows in tbl_two, would that impact which column I should CAST and which one I should not?)
In general, if you compare two columns that are both indexed, would you gain any performance compared to a similar scenario in which only one of them is indexed?
And lastly, can I perform my original task without using CAST?
Note: I do not have the ability to create or modify indexes, add columns, etc.
A little late after commenting, but...
As discussed in the comments, code such as CAST(DateTimeColumn AS date) is actually SARGable. Rob Farley posted an article on some of the SARGable and non-SARGable functionality here; however, I'll cover a few things off anyway.
Firstly, applying a function to a column will normally make your query non-SARGable, especially if it changes the order of the values or makes their order meaningless. Take something like:
SELECT *
FROM TABLE
WHERE RIGHT(COLUMN,5) = 'value';
The order of the values in the column is utterly unhelpful here, as we're focusing on the right-hand characters. Unfortunately, as Rob also discusses:
SELECT *
FROM TABLE
WHERE LEFT(COLUMN,5) = 'value';
This is also non-SARGable. However, what about the following?
SELECT *
FROM TABLE
WHERE Column LIKE 'value%';
This is SARGable, as the logic isn't applied to the column and the order doesn't change. If the value were '%value%', then that too would be non-SARGable.
When applying logic that adds (or subtracts) something from what you want to find, you always want to apply that to the literal value (or function, like GETDATE()). For example, one of these expressions is SARGable and the other is not:
Column + 1 = @Variable --non-SARGable
Column = @Variable - 1 --SARGable
The same applies to things like DATEADD
@DateVariable BETWEEN DateColumn AND DATEADD(DAY, 30, DateColumn) --non-SARGable
DateColumn BETWEEN DATEADD(DAY, -30, @DateVariable) AND @DateVariable --SARGable
Changing the datatype (other than to a date) will rarely keep a query SARGable. CONVERT(date, varchardate, 112) will not be SARGable, even though the order of the column is unchanged. Converting a decimal to an int, however, behaves the same as converting a datetime to a date, and keeps SARGability:
CREATE TABLE testtab (n decimal(2,1) PRIMARY KEY CLUSTERED);
INSERT INTO testtab
VALUES(0.1),
(0.3),
(1.1),
(1.7),
(2.4);
GO
SELECT n
FROM testtab
WHERE CONVERT(int,n) = 2;
GO
DROP TABLE testtab;
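By contrast, a quick sketch of the string-to-date case mentioned above (the table and column names here are made up for illustration; the first form typically forces a scan even with an index on the column, while the second can seek):
-- Hypothetical table where OrderDate is stored as varchar(8) in yyyymmdd format
SELECT OrderId
FROM dbo.Orders
WHERE CONVERT(date, OrderDate, 112) = '20180131'; -- non-SARGable

SELECT OrderId
FROM dbo.Orders
WHERE OrderDate = '20180131'; -- SARGable: compares the stored string directly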
Hopefully, that gives you enough to go on, but please do ask if you want me to add anything further.

How to force reasonable execution plan for query with LIKE statement?

When creating ad-hoc queries to look for information in a table I have run into this issue over and over.
Let's say I have a table with a million records with fields id - int, createddatetime - timestamp, category - varchar(50) and content - varchar(max). I want to find all records in the last day that have a certain string in the content field. If I create a query like this...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
it may complete in a second, because in the last day there may only be 100 records, so the LIKE clause only operates on a small number of records.
However if I add one more item to the where clause...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
and category = 'testing'
then it could take many minutes to complete while locking up the table.
It appears to change from performing all the straightforward WHERE clause items first and then the LIKE on the limited set of records, to evaluating the LIKE clause first. There are even times when there are multiple LIKE statements and adding one more causes the query to go from a split second to minutes.
The only solution I've found is to generate an intermediate table (maybe temp tables would work), insert records based on the basic WHERE clause items, and then run a separate query to filter by one or more LIKE statements. I've tried various JOIN and CTE approaches, which usually show no improvement. Alternatively, CHARINDEX also appears to work, though it is difficult to use when trying to convert the logic of multiple LIKE statements.
Is there any hint or something that can be placed in the query statement to tell SQL Server to wait until records are filtered by the basic WHERE clause items before filtering by the LIKE?
I actually just tried this approach and it had the same issue...
select *
from (
select *, charindex('something', content) as found
from bounce
where createddatetime > '2018-1-31'
) t
where found > 0
While the subquery independently returns in a couple of seconds, the overall query just never returns. Why is this so bad?
Not fancy, but I've had better luck with temp tables than nested select statements... It will isolate the first data set, and then you can select just from that. If you're looking for quick and dirty, which usually serves my purposes for ad-hoc, this may help. If this is a permanent stored proc, the indexing suggestions may serve you better in the long run.
select *
into #like
from table
where createddatetime > '2018-1-31'
and content like '%something%'
select *
from #like
where category = 'testing'
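If it's specifically the LIKE that you want to postpone, the same temp-table idea can also be split the other way round, matching the approach described in the question (a sketch; the column names come from the question, the temp table name is arbitrary). The first step applies only the cheap filters, the second applies the LIKE to the reduced set:
select *
into #recent
from table
where createddatetime > '2018-1-31'
and category = 'testing'

select *
from #recent
where content like '%something%'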

Select from a view using where clause on indexed column in Oracle Database

We have an Oracle database table called StockDailyQuote which contains billions of rows. There are two indexes: one on TradeDate (a Date field) and one on StockTicker (a string field). This table is actually a partitioned table, with an individual partition for each TradeDate. So we never query against it using
SELECT * FROM StockDailyQuote
because it is impractical to get any result. Instead we always use a WHERE clause on TradeDate, and the speed is acceptable.
Now we have built a view on top of the StockDailyQuote table (joining some other tables to get useful information). Because we can't specify the TradeDate at this stage, our view looks like this (this is a simplified version):
create or replace view MyStockQuoteView as
SELECT t1.StockTicker, t2.CompanyName, t1.TradeDate, t1.ClosePrice
FROM StockDailyQuote t1
JOIN CompanyInstruction t2 ON t1.StockTicker = t2.StockTicker
And I always try to query against this view with a WHERE clause on the TradeDate column. I thought that when Oracle sees my query SELECT * FROM MyStockQuoteView WHERE TradeDate = '20170726', it would be smart enough to push the indexed WHERE clause down into the view before executing it, so the query should be as fast as SELECT * FROM StockDailyQuote WHERE TradeDate = '20170726', assuming my other joins do not take much time. But it doesn't behave like that: it takes a really long time to return values. I can only assume it evaluates the whole view and then applies the WHERE to the returned result. This makes the query impractical to use.
So how can I solve my problem? One solution is to make a procedure, run it daily, and save one day's data to a table. But are there other options?
It would be nice to see the query plan for SELECT * FROM MyStockQuoteView WHERE TradeDate = '20170726'.
Oracle is normally able to prune down to the partitions it needs to read, even inside a view.
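To check whether that pruning actually happens here, something like this should show it in the Pstart/Pstop columns of the plan (a sketch, assuming you can run EXPLAIN PLAN):
EXPLAIN PLAN FOR
SELECT * FROM MyStockQuoteView WHERE TradeDate = '20170726';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);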
You can try to explicitly specify the hint, for example:
SELECT /*+ full(MyStockQuoteView.StockDailyQuote) */ * FROM MyStockQuoteView WHERE TradeDate = '20170726'

MAX keyword taking a lot of time to select a value from a column

Well, I have a table with 40,000,000+ records, but when I try to execute a simple query, it takes ~3 min to finish. Since I am using the same query in my C# solution, which needs to execute it 100+ times, the overall performance of the solution is deeply hit.
This is the query that I am using in a proc
DECLARE @Id bigint
SELECT @Id = MAX(ExecutionID) from ExecutionLog where TestID=50881
select @Id
Any help to improve the performance would be great. Thanks.
What indexes do you have on the table? It sounds like you don't have anything even close to useful for this particular query, so I'd suggest trying to do:
CREATE INDEX IX_ExecutionLog_TestID ON ExecutionLog (TestID, ExecutionID)
...at the very least. Your query is filtering by TestID, so this needs to be the primary column in the composite index: if you have no indexes on TestID, then SQL Server will resort to scanning the entire table in order to find rows where TestID = 50881.
It may help to think of indexes on SQL tables like the hierarchical, multi-level index you'd find in the back of a big book. If you were looking for something, you'd manually look under 'T' for TestID, and then there'd be a sub-heading under TestID for ExecutionID. Without an index entry for TestID, you'd have to read through the entire book looking for TestID, then see if there's a mention of ExecutionID with it. This is effectively what SQL Server has to do.
If you don't have any indexes at all, then you'll find it useful to review all the queries that hit the table, and ensure that one of the indexes you create is a clustered index (rather than non-clustered).
Try to re-work everything into something that works in a set-based manner.
So, for instance, you could write a select statement like this:
;With OrderedLogs as (
Select ExecutionID,TestID,
ROW_NUMBER() OVER (PARTITION BY TestID ORDER By ExecutionID desc) as rn
from ExecutionLog
)
select * from OrderedLogs where rn = 1 and TestID in (50881, 50882, 50883)
This would then find the maximum ExecutionID for 3 different tests simultaneously.
You might need to store that result in a table variable/temp table, but hopefully, instead, you can continue building up a larger, single query that processes all of the results in parallel.
This is the sort of processing that SQL is meant to be good at - don't cripple the system by iterating through the TestIDs in your code.
If you need to pass many test IDs into a stored procedure for this sort of query, look at Table Valued Parameters.
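For completeness, a rough sketch of how that could look (the type, procedure, and parameter names here are made up; only ExecutionLog and its columns come from the question):
CREATE TYPE dbo.TestIdList AS TABLE (TestID int PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetLatestExecutions
    @TestIds dbo.TestIdList READONLY
AS
BEGIN
    -- One set-based lookup for all requested tests
    SELECT l.TestID, MAX(l.ExecutionID) AS LatestExecutionID
    FROM ExecutionLog AS l
    JOIN @TestIds AS t ON t.TestID = l.TestID
    GROUP BY l.TestID;
END;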

joining latest of various usermetadata tags to user rows

I have a postgres database with a user table (userid, firstname, lastname) and a usermetadata table (userid, code, content, created datetime). I store various information about each user in the usermetadata table by code and keep a full history. So, for example, a user (userid 15) has the following metadata:
15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'
15, 'QHE', '8', '2008-08-24 12:07:08.660519-04'
15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'
15, 'QHE', '10', '2008-08-24 08:47:57.672058-04'
I need to fetch a list of all my users and the most recent value of each of various usermetadata codes. I did this programmatically and it was, of course, godawfully slow. The best I could figure out in SQL was to join sub-selects, which were also slow, and I had to do one for each code.
This is actually not that hard to do in PostgreSQL because it has the "DISTINCT ON" clause in its SELECT syntax (DISTINCT ON isn't standard SQL).
SELECT DISTINCT ON (code) code, content, createtime
FROM metatable
WHERE userid = 15
ORDER BY code, createtime DESC;
That will limit the returned results to the first result per unique code, and if you sort the results by the create time descending, you'll get the newest of each.
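Since the question actually asks for the latest values across all users, the same idea extends by adding userid to the DISTINCT ON list (and you can then join the result back to your user table); a sketch using the column names above:
SELECT DISTINCT ON (userid, code) userid, code, content, createtime
FROM metatable
ORDER BY userid, code, createtime DESC;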
I suppose you're not willing to modify your schema, so I'm afraid my answer might not be of much help, but here goes...
One possible solution would be to leave the time field empty until the value is replaced by a newer one, at which point you insert the 'deprecation date'. Another way is to expand the table with an 'active' column, but that would introduce some redundancy.
The classic solution would be to have both 'Valid-From' and 'Valid-To' fields where the 'Valid-To' fields are blank until some other entry becomes valid. This can be handled easily by using triggers or similar. Using constraints to make sure there is only one item of each type that is valid will ensure data integrity.
Common to these approaches is that there is a single way of determining the set of current fields: you simply select all entries for the user in question that have a NULL 'Valid-To' (or 'deprecation date'), or whose 'active' flag is true.
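In PostgreSQL, the 'only one valid item of each type' constraint can be enforced with a partial unique index, for example (a sketch, assuming a hypothetical valid_to column on the usermetadata table):
CREATE UNIQUE INDEX one_active_row_per_user_and_code
ON usermetadata (userid, code)
WHERE valid_to IS NULL;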
You might be interested in taking a look at the Wikipedia entry on temporal databases and the article A consensus glossary of temporal database concepts.
A subselect is the standard way of doing this sort of thing. You just need a Unique Constraint on UserId, Code, and Date - and then you can run the following:
SELECT *
FROM Table
JOIN (
SELECT UserId, Code, MAX(Date) as LastDate
FROM Table
GROUP BY UserId, Code
) as Latest ON
Table.UserId = Latest.UserId
AND Table.Code = Latest.Code
AND Table.Date = Latest.Date
WHERE
Table.UserId = @userId
