SQL Server 2008 index approach when using Year()

I have a table 'customertransactions' with a column 'transactiondate' of type DateTime.
I will be querying it with the following:
SELECT SUM(balance) AS totalbal FROM customertransactions WHERE accountcode=?
AND (MONTH(GETDATE())-MONTH(transactiondate)+12*(YEAR(GETDATE())-YEAR(transactiondate)))>= 3
... obviously passing a sanitised parameter for 'accountcode'.
My question is - how do I best create an index to optimise that?
Thanks.

Primarily, I would consider indexing accountcode. Additionally, if you can rewrite your date clause so that it is sargable, then you may benefit from indexing transactiondate as well.
As always, consider the cardinality of your data, and examine the query plan when adding indexes. There are no hard and fast rules.
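For illustration, a sargable version of that date filter keeps transactiondate untouched and moves all the arithmetic to the GETDATE() side. This is a sketch assuming the intent is "transactions at least 3 calendar months old"; the DATEADD/DATEDIFF-against-zero idiom computes the first day of the month two months back and works on SQL Server 2008 (the @accountcode parameter name and the index name/INCLUDE list are illustrative assumptions):

-- Sargable rewrite: the column is compared directly, so an index seek is possible
SELECT SUM(balance) AS totalbal
FROM customertransactions
WHERE accountcode = @accountcode
  AND transactiondate < DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) - 2, 0);

-- A covering index to support the seek
CREATE NONCLUSTERED INDEX IX_customertransactions_account_date
ON customertransactions (accountcode, transactiondate)
INCLUDE (balance);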

Related

Subquery in a join is slowing down performance, looking for a better alternative

Below is a snippet of code similar to what I am using.
DECLARE @UserParam VARCHAR(MAX) = NULL -- optional parameter
SELECT
    rtrim(item) [aKey]
INTO
    #aKeyTable
FROM
    myDB.dbo.fnSplit(@UserParam, ',')
SELECT
/* Lots of columns, not important to the question */
FROM
myDB.dbo.tableB b
JOIN myDB.dbo.tableC c ON c.cKey = b.bKEY
AND (c.columnA IN
(
SELECT
aKey
FROM
#aKeyTable
)
OR @UserParam IS NULL)
My question is this: how do I remove the subquery to improve performance?
Requirements:
@UserParam is optional
@UserParam can have multiple comma-separated parameters
@UserParam has to either match columnA in tableC OR be NULL
Using a WHERE clause isn't an option either; it impacts performance just as much.
I am using SQL Server 2014
UPDATE: My entire query is very long and takes about 15-20 secs on average to run depending on parameters, but according to the execution plan this subquery accounts for 89% of the cost. I had it in a WHERE clause before this and the performance was comparable, sometimes slower.
Thanks
Hard to know for sure without seeing the query plan; that said, perhaps create an index on column aKey?
Did you consider using a TVP (table-valued parameter)? They are best suited for this purpose. See also Erland Sommarskog's articles for more details.
It is problematic to combine different cases into one execution plan. When @UserParam is empty, the situation is entirely different from when it is not, so you should have an execution plan for each case. You can introduce an IF and write two queries, as sketched below. For more parameters you would end up with dynamic SQL, as the exponential growth of combinations is not manageable otherwise.
The optimizer should estimate the number of rows in a table variable as 1, leading to index seeks. If the selectivity of the parameters is good, this should work.
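To make the IF idea concrete, here is a rough sketch using the names from the question (the SELECT list is a placeholder; each branch compiles to its own plan):

IF @UserParam IS NULL
BEGIN
    SELECT b.* -- placeholder for the real column list
    FROM myDB.dbo.tableB b
    JOIN myDB.dbo.tableC c ON c.cKey = b.bKEY;
END
ELSE
BEGIN
    SELECT b.*
    FROM myDB.dbo.tableB b
    JOIN myDB.dbo.tableC c ON c.cKey = b.bKEY
    WHERE c.columnA IN (SELECT aKey FROM #aKeyTable);
END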

How to speed up SQL like %pattern% searches?

Is there any way to speed up queries like the one below? I am looking for an option that requires minimal change to application code.
SELECT *
FROM my_table
WHERE some_column like '%my string%'
ORDER BY some_column
The table which causes most of the slowdown has 2.5 million records, and the query takes 10 seconds to execute.
Execution plan tells that 99% of the cost is index scan (NonClustered), which is understandable because of LIKE and pattern with "%" on both sides.
If there is "%" just at the end, then index seek is used and query executes in a moment.
So I am looking for something like:
- adding some kind of additional index on the table (probably not possible?)
- a way to put this table and/or index into RAM so the seek would be faster
- anything else?
I can use either MS SQL 2012 or 2014, both standard edition.
Bonus question
Is it possible that these very same queries would execute instantaneously on a DB2 database? The app was using DB2 initially but was migrated over to MS SQL.
There may not be an answer, but there is a reason. When you use a search string with a leading wildcard, such as '%string', you're forcing the optimizer to do a table scan.
You might want to revisit some of the suggestions in this thread.
Good luck!
You can try changing the column collation to some binary form:
in query:
SELECT *
FROM my_table
WHERE some_column COLLATE Latin1_General_BIN like '%my string%'
ORDER BY some_column
or change it in the table design permanently if you can.
Caveat: it's cAsE sEnSiTiVe.
Edit: you can get around case sensitivity by converting both the column and the search string to upper case for example:
SELECT *
FROM my_table
WHERE UPPER(some_column) COLLATE Latin1_General_BIN like '%MY STRING%'
ORDER BY some_column
Edit 2: back up the database before making any permanent collation changes; I'm not sure exactly how it compares, but I think doing it in the query should be OK.
Explanation article.
I'm not sure this solution is an option for you, as it stores more data in the database. It may also increase the time for updates/inserts, but it's an idea anyway. Too long to put in a comment, so don't blame me!
Add a persisted computed column for some_column with the formula REVERSE(some_column), to store the reverse of the string.
Add an index on that column.
In your query, use some_column LIKE 'my string%' OR rev_some_column LIKE REVERSE('%my string'). You'd better replace REVERSE('%my string') with a variable initialized before the query.
I think in this case both LIKEs will use an index; see the sketch below.
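A minimal sketch of that idea, assuming the illustrative table and column names from the question:

-- Persisted computed column holding the reversed string
ALTER TABLE my_table
    ADD rev_some_column AS REVERSE(some_column) PERSISTED;

CREATE NONCLUSTERED INDEX IX_my_table_rev ON my_table (rev_some_column);

-- REVERSE('%my string') yields 'gnirts ym%': the wildcard moves to the
-- end of the pattern, so the predicate becomes seekable
SELECT *
FROM my_table
WHERE rev_some_column LIKE REVERSE('%my string');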

How the IN clause works in SQL Server

I would like to know how comparisons for the IN clause work in a database. In this case, I am interested in SQL Server and Oracle.
I thought of two comparison models - binary search and hashing. Can someone tell me which method SQL Server follows?
SQL Server's IN clause is basically shorthand for a wordier WHERE clause.
...WHERE column IN (1,2,3,4)
is shorthand for
...WHERE Column = 1
OR Column = 2
OR column = 3
OR column = 4
AFAIK there is no other logic applied that would be different from a standard WHERE clause.
It depends on the query plan the optimizer chooses.
If there is a unique index on the column you're comparing against and you are providing relatively few values in the IN list in comparison to the number of rows in the table, it's likely that the optimizer would choose to probe the index to find out the handful of rows in the table that needed to be examined. If, on the other hand, the IN clause is a query that returns a relatively large number of rows in comparison to the number of rows in the table, it is likely that the optimizer would choose to do some sort of join using one of the many join methods the database engine understands. If the IN list is relatively non-selective (i.e. something like GENDER IN ('Male','Female')), the optimizer may choose to do a simple string comparison for each row as a final processing step.
And, of course, different versions of each database with different statistics may choose different query plans that result in different algorithms to evaluate the same IN list.
IN is usually the same as EXISTS in SQL Server; they will give a similar plan.
That said, IN is shorthand for OR..OR, as JNK mentioned.
For more than you possibly ever needed to know, see Quassnoi's blog entry
FYI: the OR shorthand leads to another important difference: NOT IN is very different from NOT EXISTS/OUTER JOIN, because NOT IN fails on NULLs in the list, as the example below demonstrates.
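A quick demonstration of the NULL trap (illustrative values only):

DECLARE @vals TABLE (val INT NULL);
INSERT @vals VALUES (1), (2), (NULL);

-- Returns nothing: 3 <> NULL evaluates to UNKNOWN, so the NOT IN
-- predicate can never be true once the list contains a NULL
SELECT x FROM (VALUES (3)) AS v(x)
WHERE x NOT IN (SELECT val FROM @vals);

-- Returns the row: NOT EXISTS simply finds no match and ignores the NULL
SELECT x FROM (VALUES (3)) AS v(x)
WHERE NOT EXISTS (SELECT 1 FROM @vals WHERE val = v.x);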

Sql Server predicates lazy?

I have a query:
SELECT
someFields
FROM
someTable
WHERE
cheapLookup=1
AND (CAST(someField as FLOAT)/otherField)<0.9
So, will the CAST and division be performed in the case that cheapLookup is 0? If not, how can I avoid the calculation in this case?
It depends on the query plan, which is determined by the estimated cost of each considered alternative plan that would produce correct results.
If the predicate 'cheapLookup = 1' can use an index, and it is sufficiently selective, SQL Server would likely choose to seek on that index and apply the second predicate as a residual (that is, only evaluating it on rows that are matched by the seeking operation).
On the other hand, if cheapLookup is not the leading key in an index, or if it is not very selective, SQL Server might choose to scan, applying both predicates to every row encountered.
The second predicate will not be chosen for a seeking operation, unless there happens to be an indexed computed column on the whole expression, and using that index turns out to be the cheapest way to execute the whole query. If a suitable index exists, SQL Server would seek on 'second predicate result < 0.9', and apply 'cheapLookup=1' as a residual. There is also the possibility that the indexed computed column has cheapLookup as its second key, which would result in a pure seek, with no residual.
The other thing about the second predicate is that without a computed column (whether or not indexed), SQL Server will have to guess at the selectivity of the expression. With the computed column, the server might be able to create statistics on the expression-result column, which will help the optimizer. Note that a computed column on 'CAST(someField as FLOAT)/otherField' would have to be persisted before it could be indexed or have statistics created on it, because it contains an imprecise data type.
In summary, it's not the complexity of the expression that counts so much as the estimated cost of the whole plan that uses each of the available access methods considered by the optimizer.
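As a sketch of the computed-column route this answer describes (names come from the question; the index itself is an assumption, and it presumes otherField is never zero):

-- Must be PERSISTED because the expression has an imprecise (float) type
ALTER TABLE someTable
    ADD ratio AS (CAST(someField AS FLOAT) / otherField) PERSISTED;

-- Once persisted, the column can be indexed and gets its own statistics
CREATE NONCLUSTERED INDEX IX_someTable_cheapLookup_ratio
    ON someTable (cheapLookup, ratio);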
SQL is declarative: you tell the database what you want, not how you want it done. The database is entirely free to evaluate lazily or eagerly. In fact, it can evaluate thrice in reverse order for all I know :)
In rare cases, you can improve performance by reframing your query in such a way that it avoids a specific expensive operation. For example, moving the floating point math to a separate query would force lazy evaluation:
declare @t table (id int, someField float, otherField float)
-- materialize only the rows that pass the cheap test
insert @t select id, someField, otherField from someTable
where cheapLookup = 1
-- the expensive division now runs only against those rows
delete @t where (CAST(someField as FLOAT)/otherField) >= 0.9
In your example, I would expect SQL Server to choose the best way without any hints or tricks.
What you're referring to is short-circuiting, as supported by other languages (e.g. C#).
I believe SQL Server can short-circuit, but it depends on the scenario and on what happens in the optimizer, so there is certainly no guarantee that it will. It just might.
Excellent reference on this by Remus Rusanu here: http://rusanu.com/2009/09/13/on-sql-server-boolean-operator-short-circuit/
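If you need a guaranteed order of evaluation, a CASE expression is the usual workaround, since CASE evaluates its WHEN branches in sequence. A sketch against the question's columns (the divide-by-zero guard is an added assumption, not part of the original query):

SELECT someFields
FROM someTable
WHERE 1 = CASE
              WHEN cheapLookup <> 1 THEN 0  -- cheap test first
              WHEN otherField = 0 THEN 0    -- guard the division
              WHEN CAST(someField AS FLOAT) / otherField < 0.9 THEN 1
              ELSE 0
          END;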
It depends on how SQL Server optimizes the query, you could run the Query Analyzer to see for your particular case
A sure-fire way to optimize would be to say:
WITH QueryResult AS (
SELECT
someFields
FROM
someTable
WHERE
cheapLookup=1
)
SELECT * FROM QueryResult WHERE (CAST(someField as FLOAT)/otherField)<0.9

What fields should be indexed on a given table?

I have a table with a lot of records (more than 2 million). It's a transaction table, but I need a report with a lot of joins. What's the best practice for indexing that table? It's consuming too much time.
I'm paging the table using the stored procedure paging method, but I need an index because when I want to export the report I need to run the entire query without pagination, and to get the total record count I need a select over everything.
Any help?
The SQL Server 2008 Management Studio query tool, if you turn on "Include Actual Execution Plan", will tell you what indexes a given query needs to run fast. (Assuming there's an obvious missing index that is making the query run unusually slow, that is.)
SQL Server 2008 Management Studio Query Screenshot http://img208.imageshack.us/img208/4108/image4sy8.png
We use this all the time on Stack Overflow; it's one of the best features of SQL 2008. It works against older SQL instances as well: just install the SQL 2008 tools and point them at a SQL 2005 instance. Not sure if it works on anything earlier, though.
As others have noted, you can also do this manually, but it takes a bit of trial and error. You'll want indexes on fields that are used in ORDER BY and WHERE clauses.
"key fields have to be everything in the WHERE clause???"
No, that would be overkill. Indexing a field really only works if a) your WHERE clause is selective enough (that is, it selects only about 1-2% of the values; an index on a "Gender" field which can be only one of two or three possible values is pointless), and b) your WHERE clause doesn't involve function calls or other magic.
In your case, TBL.Status might be a candidate - how many possible values are there? You select the values '1' and '2'; if there are hundreds of possible values, then it's a good choice.
On a side note:
this clause here: (TBL.Login IS NULL AND TBL.Login <> 'dev') is pretty pointless - if TBL.Login IS NULL, then it's definitely not 'dev'; in fact, comparing NULL to 'dev' yields UNKNOWN, so the combined predicate can never be true. Just the "IS NULL" will be more than sufficient.
The other field you might want to consider putting an index on is the TBL.Date, since you seem to select a range of dates here - that might be a good choice.
Also, on a general note: whenever possible, DO NOT use SELECT * FROM ... to select your fields. This causes a lot of overhead for SQL Server. SPECIFY your columns, and select only those that you really need, not just all of them for the heck of it.
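Putting those two candidates together, a hedged sketch of the kind of index this answer points toward (the column order assumes Status is the equality filter and Date the range; the index name and INCLUDE list are illustrative, not prescribed):

CREATE NONCLUSTERED INDEX IX_TABLENAME_Status_Date
    ON TABLENAME (Status, Date)
    INCLUDE (Channel, DocType, Document, Login);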
Check your queries, and find which fields are used to match them. Those are usually the best candidates!
SQL Server has a 'Database Engine Tuning Advisor' that could help you. This does not exist for SQL Server Express, but does for all other versions of SQL Server.
1. Load your query in a query window.
2. On the menu, click Query -> Analyze Query in Database Engine Tuning Advisor.
The tuning advisor will identify indexes that could be added to your table(s) to improve performance. In my experience, the tuning advisor doesn't always help, but most of the time it does. It's where I suggest you start.
OK, this is the query I'm doing:
SELECT
TBL.*
FROM
FOREINGDATABASE..TABLENAME TBL
LEFT JOIN Status S
ON TBL.Status = S.Number
WHERE
(TBL.ID = CASE @Reference WHEN 0 THEN TBL.ID ELSE @Reference END) AND
TBL.Date >= @FechaInicial AND
TBL.Date <= @FechaFinal AND
(TBL.Channel = CASE @Canal WHEN '' THEN TBL.Channel ELSE @Canal END) AND
(TBL.DocType = CASE @TipoDocumento WHEN '' THEN TBL.DocType ELSE @TipoDocumento END) AND
(TBL.Document = CASE @NumDocumento WHEN '' THEN TBL.Document ELSE @NumDocumento END) AND
(TBL.Login = CASE @Login WHEN '' THEN TBL.Login ELSE @Login END) AND
(TBL.Login IS NULL AND TBL.Login <> 'dev') AND
TBL.Status IN ('1','2')
key fields have to be everything in the WHERE clause???
If I am not mistaken (please correct me if I am), I think you should create a non-clustered index on the fields used in the conditions of the WHERE clause. Maybe this can be useful as a starting point to get some candidates for the indexes.
Good luck!
If an index scan is performed instead of a seek, the cause might be that the fields are not in the correct order in the index.
Put indexes on all columns that you're joining and filtering on.
The use of indexes is also determined by the selectivity of the indexed column.
The best way would be to show us your query so we can try to improve it.
