Index for using IN clause in where condition - sql-server

My application access data from a table in SQL Server. Consider the table name is PurchaseDetail with some other columns.
The select query has below where clauses.
1. name - name has 10000 values only.
2. createdDateTime
The actual query is
select *
from PurchaseDetail
where name in (~2000 name)
and createdDateTime = 'someDateValue';
The SQL tuning advisor gave some recommendation. I tried with those recommended indexes. The performance increased a bit but not completely.
Is there any wrong in my query? or Is there any possible to change/improve my select query?
Because I didn't use IN in where clause before. My table having more than 100 million records.
Any suggestion please?

In this case using IN for that much data is not good at all.
this best way is to use INNER JOIN instead.
It would be nicer if insert those names into a temp table and INNER JOIN it with your SELECT query.

Related

SQL DELETE performance, T-SQL or ISO-compatible query

If I have two very large tables (TableA and TableB), both with an Id column, and I would like to remove all rows from TableA that have their Ids present in TableB. Which would be the fastest? Why?
--ISO-compatible
DELETE FROM TabelA
WHERE Id IN (SELECT Id FROM TableB)
or
-- T-SQL
DELETE A FROM TabelA AS A
INNER JOIN TableB AS B
ON A.Id = B.Id
If there are indexes on each Id, they should perform equally well.
If there are not indexes on each Id, exists() or in () may perform better.
In general I prefer exists() over in () because it allows you to easily add more than one comparison when needed.
delete a
from tableA as a
where exists (
select 1
from tableB as b
where a.Id = b.Id
)
Reference:
in vs inner join - Gail Shaw
exists() vs in - Gail Shaw
As long as your Id in TableB is unique, both queries should create the same execution plan. Just include the execution plan to each queries and verify it.
Take a look at this nice post: in-vs-join-vs-exists
There's an easy way to find out, using the execution plan (press ctrl + L on SSMS).
Since we don't know the data model behind your tables (the eventual indexes etc), we can't know for sure which query will be the fastest.
By experience, I can tell you that, for very large tables (>1mil rows), the delete clause is quite slow, because of all the logging. Depending on the operation you're doing, you will want SQL Server NOT TO log the delete.
You might want to check at this question :
How to delete large data of table in SQL without log?

TSQL Operators IN vs INNER JOIN

Using SQL Server 2014:
Is there any performance difference between the following statements?
DELETE FROM MyTable where PKID IN (SELECT PKID FROM #TmpTableVar)
AND
DELETE FROM MyTable INNER JOIN #TmpTableVar t ON MyTable.PKID = t.PKID
In your given example the execution plans will be the same (most probably).
But having same execution plans doesn't mean that they are the best execution plans you can possibly have for this statement.
The problem I see in both of your queries is the use of the Table Variable.
SQL Server always assumes that there is only 1 row in the table variable. Only in SQL Server 2014 and later version this assumption has been changed to 100 rows.
So no matter how many rows you have this the table variable SQL Server will always assume you have one row in the #TmpTableVar.
You can change your code slightly to give SQL Server a better idea of how many rows there will be in that table by replacing it with a Temporary table and since it is a PK_ID Column in your table variable you can also create an index on that table, to give best chance to sql server to come up with the best possible execution plan for this query.
SELECT PKID INTO #Temp
FROM #TmpTableVar
-- Create some index on the temp table here .....
DELETE FROM MyTable
WHERE EXISTS (SELECT 1
FROM #Temp t
WHERE MyTable.PKID = t.PKID)
Note
In operator will work fine since it is a primary key column in the table variable. but if you ever use IN operator on a nullable column, the results may surprise you, The IN operator goes all pear shape as soon as it finds a NULL values in the column it is checking on.
I personally prefer Exists operator for such queries but inner joins should also work just fine but avoid IN operators if you can.

Does MS SQL Server automatically create temp table if the query contains a lot id's in 'IN CLAUSE'

I have a big query to get multiple rows by id's like
SELECT *
FROM TABLE
WHERE Id in (1001..10000)
This query runs very slow and it ends up with timeout exception.
Temp fix for it is querying with limit, break this query into 10 parts per 1000 id's.
I heard that using temp tables may help in this case but also looks like ms sql server automatically doing it underneath.
What is the best way to handle problems like this?
You could write the query as follows using a temporary table:
CREATE TABLE #ids(Id INT NOT NULL PRIMARY KEY);
INSERT INTO #ids(Id) VALUES (1001),(1002),/*add your individual Ids here*/,(10000);
SELECT
t.*
FROM
[Table] AS t
INNER JOIN #ids AS ids ON
ids.Id=t.Id;
DROP TABLE #ids;
My guess is that it will probably run faster than your original query. Lookup can be done directly using an index (if it exists on the [Table].Id column).
Your original query translates to
SELECT *
FROM [TABLE]
WHERE Id=1000 OR Id=1001 OR /*...*/ OR Id=10000;
This would require evalutation of the expression Id=1000 OR Id=1001 OR /*...*/ OR Id=10000 for every row in [Table] which probably takes longer than with a temporary table. The example with a temporary table takes each Id in #ids and looks for a corresponding Id in [Table] using an index.
This all assumes that there are gaps in the Ids between 1000 and 10000. Otherwise it would be easier to write
SELECT *
FROM [TABLE]
WHERE Id BETWEEN 1001 AND 10000;
This would also require an index on [Table].Id to speed it up.

Proper way to filter a table using values in another table in MS Access?

I have a table of transactions with some transaction IDs and Employee Numbers. I have two other tables which are basically just a column full of transactions or employees that need to be filtered out from the first.
I have been running my query like this:
SELECT * FROM TransactionMaster
Where TransactionMaster.TransID
NOT IN (SELECT TransID from BadTransactions)
AND etc...(repeat for employee numbers)
I have noticed slow performance when running these types of queries. I am wondering if there is a better way to build this query?
If you want all TransactionMaster rows which don't include a TransID match in BadTransactions, use a LEFT JOIN and ask for only those rows where BadTransactions.TransID Is Null (unmatched).
SELECT tm.*
FROM
TransactionMaster AS tm
LEFT JOIN
BadTransactions AS bt
ON tm.TransID = bt.TransID
WHERE bt.TransID Is Null;
That query should be relatively fast with TransID indexed.
If you have Access available, create a new query using the "unmatched query wizard". It will guide you through the steps to create a similar query.

SQL Server Pagination w/o row_number() or nested subqueries?

I have been fighting with this all weekend and am out of ideas. In order to have pages in my search results on my website, I need to return a subset of rows from a SQL Server 2005 Express database (i.e. start at row 20 and give me the next 20 records). In MySQL you would use the "LIMIT" keyword to choose which row to start at and how many rows to return.
In SQL Server I found ROW_NUMBER()/OVER, but when I try to use it it says "Over not supported". I am thinking this is because I am using SQL Server 2005 Express (free version). Can anyone verify if this is true or if there is some other reason an OVER clause would not be supported?
Then I found the old school version similar to:
SELECT TOP X * FROM TABLE WHERE ID NOT IN (SELECT TOP Y ID FROM TABLE ORDER BY ID) ORDER BY ID where X=number per page and Y=which record to start on.
However, my queries are a lot more complex with many outer joins and sometimes ordering by something other than what is in the main table. For example, if someone chooses to order by how many videos a user has posted, the query might need to look like this:
SELECT TOP 50 iUserID, iVideoCount FROM MyTable LEFT OUTER JOIN (SELECT count(iVideoID) AS iVideoCount, iUserID FROM VideoTable GROUP BY iUserID) as TempVidTable ON MyTable.iUserID = TempVidTable.iUserID WHERE iUserID NOT IN (SELECT TOP 100 iUserID, iVideoCount FROM MyTable LEFT OUTER JOIN (SELECT count(iVideoID) AS iVideoCount, iUserID FROM VideoTable GROUP BY iUserID) as TempVidTable ON MyTable.iUserID = TempVidTable.iUserID ORDER BY iVideoCount) ORDER BY iVideoCount
The issue is in the subquery SELECT line: TOP 100 iUserID, iVideoCount
To use the "NOT IN" clause it seems I can only have 1 column in the subquery ("SELECT TOP 100 iUserID FROM ..."). But when I don't include iVideoCount in that subquery SELECT statement then the ORDER BY iVideoCount in the subquery doesn't order correctly so my subquery is ordered differently than my parent query, making this whole thing useless. There are about 5 more tables linked in with outer joins that can play a part in the ordering.
I am at a loss! The two above methods are the only two ways I can find to get SQL Server to return a subset of rows. I am about ready to return the whole result and loop through each record in PHP but only display the ones I want. That is such an inefficient way to things it is really my last resort.
Any ideas on how I can make SQL Server mimic MySQL's LIMIT clause in the above scenario?
Unfortunately, although SQL Server 2005 Row_Number() can be used for paging and with SQL Server 2012 data paging support is enhanced with Order By Offset and Fetch Next, in case you can not use any of these solutions you require to first
create a temp table with identity column.
then insert data into temp table with ORDER BY clause
Use the temp table Identity column value just like the ROW_NUMBER() value
I hope it helps,

Resources