SQL Server - alternative to using NOT EXISTS

I have a list of about 200,000 records with EntityID column which I load into a temp table variable.
I want to insert any records from the Temp table variable if EntityID from the Temp table does not exist in the dbo.EntityRows table. The dbo.EntityRows table contains about 800,000 records.
The process has become very slow compared to when dbo.EntityRows held about 500,000 records.
My first guess is that, because of the NOT EXISTS clause, each row from the temp variable must scan the entire 800k rows of dbo.EntityRows to determine whether it exists.
QUESTION: Is there an alternative way to run this comparison check without NOT EXISTS, which incurs a hefty cost and will only get worse as dbo.EntityRows continues to grow?
EDIT: Appreciate the comments. Here is the query (I left out the part after the IF NOT EXISTS check. After that, if NOT EXISTS, I insert into 4 tables).
declare @EntityCount int, @Counter int, @ExistsCounter int, @AddedCounter int
declare @LogID int
declare @YdataInsertedEntityID int, @YdataSearchParametersID int
declare @CurrentEntityID int
declare @CurrentName nvarchar(80)
declare @CurrentSearchParametersID int, @CurrentSearchParametersIDAlreadyDone int
declare @Entities table
(
Id int identity,
EntityID int,
NameID nvarchar(80),
SearchParametersID int
)
insert into @Entities (EntityID, NameID, SearchParametersID)
select EntityID, NameID, SearchParametersID from YdataArvixe.dbo.Entity order by entityid;
set @EntityCount = (select count(*) from @Entities);
set @Counter = 1;
set @LogID = null;
set @ExistsCounter = 0;
set @AddedCounter = 0;
set @CurrentSearchParametersIDAlreadyDone = -1;
While (@EntityCount >= @Counter)
begin
    set @CurrentEntityID = (select EntityID from @Entities
                            where id = @Counter);
    set @CurrentName = (select NameID from @Entities
                        where id = @Counter);
    set @CurrentSearchParametersID = (select SearchParametersID from @Entities
                                      where id = @Counter);
    if not exists (select 1 from ydata.dbo.entity
                   where NameID = @CurrentName)
    begin
        -- I insert into 4 tables IF NOT EXISTS = true
    end
    set @Counter = @Counter + 1;
end

I am not sure, but here are some ways the check could be written (using t as the alias for the source row):
(SELECT COUNT(er.EntityID) FROM dbo.EntityRows er WHERE er.EntityID = t.EntityID) <> 0
(SELECT er.EntityID FROM dbo.EntityRows er WHERE er.EntityID = t.EntityID) IS NOT NULL
NOT EXISTS (SELECT 1 FROM dbo.EntityRows er WHERE er.EntityID = t.EntityID)
t.EntityID NOT IN (SELECT er.EntityID FROM dbo.EntityRows er)
But as per my belief, getting the count will give good performance.
Also, an index will help improve performance, as Felix Pamittan said.

As @gotqn said, start by using a temporary table. Create an index on EntityID after the table is filled. If you don't have an index on EntityID in EntityRows, create one.
I do things like this a lot, and I generally use the following pattern:
INSERT INTO EntityRows (
EntityId, ...
)
SELECT T.EntityId, ...
FROM #tempTable T
LEFT JOIN EntityRows E
ON T.EntityID = E.EntityID
WHERE E.EntityID IS NULL
Please comment if you'd like further info.

Well, the answer was pretty basic. @Felix and @TT had the right suggestion. Thanks!
I put a non-clustered index on the NameID field in ydata.dbo.entity.
if not exists (select 1 from ydata.dbo.entity
               where NameID = @CurrentName)
It can now process the NOT EXISTS check quickly using the index instead of scanning the entire ydata.dbo.entity table. It is moving fast again.
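For reference, the index that fixed it can be created like this (the index name is my own choice, not from the original post):

```sql
-- Non-clustered index so the NOT EXISTS probe on NameID becomes an index seek
-- instead of a full table scan. IX_Entity_NameID is an illustrative name.
CREATE NONCLUSTERED INDEX IX_Entity_NameID
    ON ydata.dbo.entity (NameID);
```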

Related

How to find the missing records (IDs) from an indexed [Order] table in SQL

I have a table [Order] whose records have sequential IDs (odd numbers only, i.e. 1, 3, 5, 7 ... 989, 991, 993, 995, 997, 999). A few records were accidentally deleted and should be inserted back. The first step is to find out which records are missing from the current table; there are hundreds of records in it.
I don't know how to write the query - can anyone kindly help, please?
I am thinking I may have to write a stored procedure or function, but it would be better if I can avoid them for environment reasons.
The pseudocode below is what I am thinking:
set @MaxValue = Max(numberfield)
set @TestValue = 1
open cursor on recordset ordered by numberfield
foreach numberfield
    while (numberfield != @TestValue) and (@TestValue < @MaxValue) then
        Insert @TestValue into #temp table
        set @TestValue = @TestValue + 2
    Next
Next
UPDATE:
Expected result:
Order ID = 7 should be picked up as the only missing record.
Update 2:
If I use
WHERE
o.id IS NULL;
It returns nothing:
Since I didn't get a response from you, in the comments, I've altered the script for you to fill in accordingly:
declare @id int
declare @maxid int
set @id = 1
select @maxid = max([Your ID Column Name]) from [Your Table Name]
declare @IDseq table (id int)
while @id < @maxid -- whatever your max is
begin
    insert into @IDseq values (@id)
    set @id = @id + 2 -- step by 2, since your IDs are odd numbers only
end
select
    s.id
from @IDseq s
left join [Your Table Name] t on s.id = t.[Your ID Column Name]
where t.[Your ID Column Name] is null
Where you see [Your ID Column Name], replace it with your column name, and the same goes for [Your Table Name].
I'm sure this will give you the results you seek.
We can try joining to a number table, which contains all the odd numbers which you might expect to appear in your own table.
DECLARE @start int = 1
DECLARE @end int = 1000
;WITH cte AS (
    SELECT @start num
    UNION ALL
    SELECT num + 2 FROM cte WHERE num < @end
)
SELECT num
FROM cte t
LEFT JOIN [Order] o
    ON t.num = o.numberfield
WHERE
    o.numberfield IS NULL
OPTION (MAXRECURSION 0); -- the sequence needs ~500 recursions, above the default limit of 100

How do I loop through a table, search with that data, and then return search criteria and result to new table?

I have a set of records that need to be validated (searched) in a SQL table. I will call these ValData and SearchTable respectively. A colleague created a SQL query in which a record from the ValData can be copied and pasted in to a string variable, and then it is searched in the SearchTable. The best result from the SearchTable is returned. This works very well.
I want to automate this process. I loaded the ValData to SQL in a table like so:
RowID INT, FirstName, LastName, DOB, Date1, Date2, TextDescription.
I want to loop through this set of data, by RowID, and then create a result table that is the ValData joined with the best match from the SearchTable. Again, I already have a query that does that portion. I just need the loop portion, and my SQL skills are virtually non-existent.
Pseudocode would be:
DECLARE @SearchID INT = 1
DECLARE @MaxSearchID INT = 15000
DECLARE @FName VARCHAR(50) = ''
DECLARE @LName VARCHAR(50) = ''
etc...
WHILE @SearchID <= @MaxSearchID
BEGIN
    SET @FName = (SELECT [Fname] FROM ValData WHERE [RowID] = @SearchID)
    SET @LName = (SELECT [Lname] FROM ValData WHERE [RowID] = @SearchID)
    etc...
    Do colleague's query, and then insert(?) the search criteria joined with the result from the SearchTable into a temporary result table.
END
SELECT * FROM FinalResultTable;
My biggest gap is how to create a temporary result table combining ValData's fields with SearchTable's fields, and, during the loop iterations, how to add one row at a time to that table joining the ValData row with the result from the SearchTable.
If it helps, I'm using/wanting to join all fields from ValData and all fields from SearchTable.
Wouldn't this be far easier with a query like this..?
SELECT FNAME,
LNAME
FROM ValData
WHERE (FName = #Fname
OR LName = #Lname)
AND RowID <= #MaxSearchID
ORDER BY RowID ASC;
There is literally no reason to use a WHILE other than to destroy performance of the query.
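A set-based way to keep per-row best-match logic without a WHILE loop is CROSS APPLY / OUTER APPLY. This is only a sketch: dbo.SearchTable, the matching predicate, and the SomeScore ranking are placeholders standing in for the colleague's best-match query, not details from the original post.

```sql
-- One pass over ValData; for each row, APPLY picks that row's single best match.
SELECT vd.*, bm.*
FROM ValData vd
OUTER APPLY (                        -- OUTER APPLY keeps ValData rows with no match
    SELECT TOP (1) st.*
    FROM dbo.SearchTable st          -- placeholder table name
    WHERE st.LastName = vd.LastName  -- placeholder matching logic
    ORDER BY st.SomeScore DESC       -- placeholder ranking expression
) bm;
```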
With a bit more trial and error, I was able to answer what I was looking for (which, at its core, was creating a temp table and then inserting rows in to it).
CREATE TABLE #RESULTTABLE(
[feedname] VARCHAR(100),
...
[SCORE] INT,
[Max Score] INT,
[% Score] FLOAT(4),
[RowID] SMALLINT
)
SET @SearchID = 1
SET @MaxSearchID = (SELECT MAX([RowID]) FROM ValidationData)
WHILE @SearchID <= @MaxSearchID
BEGIN
    SET @FNAME = (SELECT [Fname] FROM ValidationData WHERE [RowID] = @SearchID)
    ...
    --BEST MATCH QUERY HERE
    --Select the "top" best match (order not guaranteed) into the RESULTTABLE.
    INSERT INTO #RESULTTABLE
    SELECT TOP 1 *, @SearchID AS RowID
    --INTO #RESULTTABLE
    FROM #TABLE3
    WHERE [% Score] IN (SELECT MAX([% Score]) FROM #TABLE3)
    --Drop temp tables that were created/used during the best match query.
    DROP TABLE #TABLE1
    DROP TABLE #TABLE2
    DROP TABLE #TABLE3
    SET @SearchID = @SearchID + 1
END;
--Join the data that was validated (searched) to the results that were found.
SELECT *
FROM ValidationData vd
LEFT JOIN #RESULTTABLE rt ON rt.[RowID] = vd.[RowID]
ORDER BY vd.[RowID]
DROP TABLE #RESULTTABLE
I know this could be improved by doing a join, probably with the "BEST MATCH QUERY" as an inner query. I am just not that skilled yet. This shortens a manual process which took hours upon hours to just an hour or so.

How to fill in the new created column efficiently?

I have a table below and will create a new column named 'Amount'. The existing column 'Id' is a foreign key referencing a table 'Loan' ('Id' is the primary key of 'Loan'). The simple thing I want to do is populate each newly created Amount cell with the right amount from table 'Loan', mapped by 'Id'. I currently use a local variable and a self-created table type to look up the amount in 'Loan' case by case. Is there any more efficient way to do the same operation? Many thanks.
MyTable was as below:
My code was as followed:
ALTER TABLE MyTable
ADD Amount MONEY
CREATE TYPE ListofID AS TABLE (idx INT IDENTITY(1,1), ID VARCHAR(255))
DECLARE @Table_ID_List ListofID
INSERT @Table_ID_List (
    ID )
SELECT Id FROM MyTable
DECLARE @i INT
DECLARE @cnt INT
SELECT @i = min(idx) - 1, @cnt = max(idx) FROM @Table_ID_List
DECLARE @app VARCHAR(255)
WHILE @i < @cnt
BEGIN
    SELECT @i = @i + 1
    SELECT @app = (SELECT ID FROM @Table_ID_List WHERE idx = @i)
    UPDATE MyTable
    SET Amount = (SELECT Amount FROM Loan WHERE Id = @app)
    WHERE Id = @app
END
Depending on the size of your table, you may want to use the SWITCH statement to transfer the whole table at once to the new schema.
https://sqlperformance.com/2012/08/t-sql-queries/t-sql-tuesday-schema-switch-a-roo
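The pattern from that article is, very roughly: build a fully loaded shadow copy of the table in a staging schema, then swap the two tables with metadata-only schema transfers. A minimal sketch (the schema names staging and previous are illustrative, and the real article covers the locking and error handling this omits):

```sql
-- Assumes schemas staging and previous already exist, and staging.MyTable
-- has the new Amount column and its data fully loaded.
BEGIN TRANSACTION;
    ALTER SCHEMA previous TRANSFER dbo.MyTable;     -- move the old table out of the way
    ALTER SCHEMA dbo      TRANSFER staging.MyTable; -- the new table takes its name
COMMIT TRANSACTION;
```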
I may have missed your question, but I believe you can do this directly via an Update statement. Something along the lines of:
ALTER TABLE MyTable ADD Amount MONEY
UPDATE
t1
SET
t1.Amount = t2.Amount
FROM
MyTable t1
JOIN Loan t2
ON t1.Id = t2.Id
Wouldn't that do the trick?

SP: handling nulls

I have this Table structure:
Id int not null --PK
Title varchar(50)
ParentId int null --FK to same Table.Id
I'm writing a SP that returns a row's "brothers", here's the code
select * from Table
where Table.ParentId = (select Table.ParentId from Table where Table.id = @Id)
and Table.Id <> @Id
It works perfectly for rows having a parent, but for those whose parent is null (root records) it returns no rows. This is expected, since null = null never evaluates to true.
I'm looking for help on how to better design my SP to handle this specific case. I'm not a DBA and my TSQL knowledge is basic.
EDIT: I've updated my SQL query like this:
DECLARE @Id INT
SET @Id = 1
DECLARE @ParentId INT
SET @ParentId = (SELECT Table.ParentId FROM Table WHERE Table.Id = @Id)
SELECT * FROM Table
WHERE (
    (@ParentId IS NULL AND Table.ParentId IS NULL)
    OR (Table.ParentId = @ParentId)
)
AND Table.Id <> @Id
It does the job, but if the Id is not in the table it still returns the rows that have no parent. Going to lunch; I will continue looking at this later.
Thanks in advance,
Fabian
I'm not sure this is the best solution, but you could try the COALESCE function, mapping NULL to an id that is never valid:
select * from Table
where COALESCE(Table.ParentId, -1) = (select COALESCE(Table.ParentId, -1) from Table where Table.id = @Id)
and Table.Id <> @Id
Assuming -1 is never used as an ID
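Another NULL-safe option in SQL Server, which avoids the sentinel value entirely, is the EXISTS/INTERSECT idiom: INTERSECT compares values with set semantics, so NULL matches NULL. A sketch against the same table:

```sql
-- Returns the row's "brothers", treating NULL parents as equal to each other.
SELECT t.*
FROM [Table] t
WHERE t.Id <> @Id
  AND EXISTS (
      SELECT t.ParentId              -- INTERSECT treats NULLs as equal,
      INTERSECT                      -- unlike the = operator
      SELECT p.ParentId FROM [Table] p WHERE p.Id = @Id
  );
```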
It's possible I have not understood your problem description however, in order to return Brothers only when they exist for a given Parent, the following query should suffice:
select Brother.*
from Table Parent
inner join Table Brother on
Parent.id = Brother.ParentID
where Parent.Id = @Id and Brother.Id <> @Id

Table variable in SQL Server

I am using SQL Server 2005. I have heard that we can use a table variable to use instead of LEFT OUTER JOIN.
What I understand is that, we have to put all the values from the left table to the table variable, first. Then we have to UPDATE the table variable with the right table values. Then select from the table variable.
Has anyone come across this kind of approach? Could you please suggest a real time example (with query)?
I have not written any query for this. My question is - if someone has used a similar approach, I would like to know the scenario and how it is handled. I understand that in some cases it may be slower than the LEFT OUTER JOIN.
Please assume that we are dealing with tables that have less than 5000 records.
Thanks
It can be done, but I have no idea why you would ever want to.
This really does seem like it is being done backwards. But if you are trying this for your own learning only, here goes:
DECLARE @MainTable TABLE(
ID INT,
Val FLOAT
)
INSERT INTO @MainTable SELECT 1, 1
INSERT INTO @MainTable SELECT 2, 2
INSERT INTO @MainTable SELECT 3, 3
INSERT INTO @MainTable SELECT 4, 4
DECLARE @LeftTable TABLE(
ID INT,
MainID INT,
Val FLOAT
)
INSERT INTO @LeftTable SELECT 1, 1, 11
INSERT INTO @LeftTable SELECT 3, 3, 33
SELECT *,
mt.Val + ISNULL(lt.Val, 0)
FROM @MainTable mt LEFT JOIN
@LeftTable lt ON mt.ID = lt.MainID
DECLARE @Table TABLE(
ID INT,
Val FLOAT
)
INSERT INTO @Table
SELECT ID,
Val
FROM @MainTable
UPDATE @Table
SET Val = t.Val + lt.Val
FROM @Table t INNER JOIN
@LeftTable lt ON t.ID = lt.ID
SELECT *
FROM @Table
I don't think it's very clear from your question what you want to achieve? (What your tables look like, and what result you want). But you can certainly select data into a variable of a table datatype, and tamper with it. It's quite convenient:
DECLARE @tbl TABLE (id INT IDENTITY(1,1), userId int, foreignId int)
INSERT INTO @tbl (userId)
SELECT id FROM users
WHERE name LIKE 'a%'
UPDATE t
SET
    foreignId = (SELECT id FROM foreignTable f WHERE f.userId = t.userId)
FROM @tbl t
In that example I gave the table variable an identity column of its own, distinct from the one in the source table. I often find that useful. Adjust as you like... Again, it's not very clear what the question is, but I hope this might guide you in the right direction...?
Every scenario is different, and without full details on a specific case it's difficult to say whether it would be a good approach for you.
Having said that, I would not be looking to use the table variable approach unless I had a specific functional reason to - if the query can be fulfilled with a standard SELECT query using an OUTER JOIN, then I'd use that as I'd expect that to be most efficient.
The times where you may want to use a temp table/table variable instead, are more when you want to get an intermediary resultset and then do some processing on it before then returning it out - i.e. the kind of processing that cannot be done with a straight forward query.
Note that table variables are very handy, but take into account that they are not guaranteed to reside in memory - they can get persisted to tempdb just like standard temp tables.
Thank you, astander.
I tried the example given below. Both approaches took 19 seconds. However, I guess some tuning would help the table variable update approach become faster than the LEFT JOIN.
As I am not a master of tuning, I request your help. Any SQL expert ready to prove it?
CREATE TABLE #MainTable (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(100)
)
DECLARE @Count INT
SET @Count = 0
DECLARE @Iterator INT
SET @Iterator = 0
WHILE @Count < 8000
BEGIN
    INSERT INTO #MainTable SELECT @Count, 'Cust' + CONVERT(VARCHAR(10), @Count)
    SET @Count = @Count + 1
END
CREATE TABLE #RightTable
(
OrderID INT PRIMARY KEY,
CustomerID INT,
Product VARCHAR(100)
)
CREATE INDEX [IDX_CustomerID] ON #RightTable (CustomerID)
WHILE @Iterator < 400000
BEGIN
    IF @Iterator % 2 = 0
    BEGIN
        INSERT INTO #RightTable SELECT @Iterator, 2, 'Prod' + CONVERT(VARCHAR(10), @Iterator)
    END
    ELSE
    BEGIN
        INSERT INTO #RightTable SELECT @Iterator, 1, 'Prod' + CONVERT(VARCHAR(10), @Iterator)
    END
    SET @Iterator = @Iterator + 1
END
-- Using LEFT JOIN
SELECT mt.CustomerID,mt.FirstName,COUNT(rt.Product) [CountResult]
FROM #MainTable mt
LEFT JOIN #RightTable rt ON mt.CustomerID = rt.CustomerID
GROUP BY mt.CustomerID,mt.FirstName
---------------------------
-- Using Table variable Update
DECLARE @WorkingTableVariable TABLE
(
CustomerID INT,
FirstName VARCHAR(100),
ProductCount INT
)
INSERT
INTO @WorkingTableVariable (CustomerID, FirstName)
SELECT CustomerID, FirstName FROM #MainTable
UPDATE wt
SET ProductCount = [Count]
FROM @WorkingTableVariable wt
INNER JOIN
(SELECT CustomerID, COUNT(rt.Product) AS [Count]
FROM #RightTable rt
GROUP BY CustomerID) IV ON wt.CustomerID = IV.CustomerID
SELECT CustomerID, FirstName, ISNULL(ProductCount, 0) [CountResult] FROM @WorkingTableVariable
ORDER BY CustomerID
--------
DROP TABLE #MainTable
DROP TABLE #RightTable
Thanks
Lijo
In my opinion there is one reason to do this:
If you have a complicated query with lots of inner joins and one left join, you sometimes get into trouble because that query runs hundreds of times slower than the same query without the left join.
If you query lots of records but only very few of them end up joined by the left join, you can get faster results by materializing the intermediate result into a table variable or temp table.
But usually there is no need to actually update the data in the table variable - you can query the table variable using the left join to return the result.
... just my two cents.
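The materialize-then-join idea from the answer above, as a sketch (the table and column names are made up for illustration):

```sql
-- Step 1: materialize the expensive inner-join part into a table variable.
DECLARE @Filtered TABLE (OrderID INT PRIMARY KEY, CustomerID INT);

INSERT INTO @Filtered (OrderID, CustomerID)
SELECT o.OrderID, o.CustomerID
FROM dbo.Orders o
JOIN dbo.OrderLines ol ON ol.OrderID = o.OrderID;  -- stands in for "lots of inner joins"

-- Step 2: left join only against the small intermediate result.
SELECT f.OrderID, f.CustomerID, n.NoteText
FROM @Filtered f
LEFT JOIN dbo.OrderNotes n ON n.OrderID = f.OrderID;
```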
