Improve the query performance - sql-server

Initially there are two tables involved:
col1 - int
col2 - int
select * from table1 inner join table2 on table1.col1 = table2.col2
-- fine: this returns results in about 2 minutes
But after changing col2 to nvarchar(30):
select * from table1 inner join table2 on table1.col1 = convert(nvarchar(30), table2.col2)
-- it has now been running for more than an hour
Is there any way to optimize this query?

Joining two tables on an nvarchar(30) column will be slower than joining on an int column: the key is wider, and the convert() in the join predicate gets in the way of an index seek. I would stick with int if at all possible.
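If col2 really has to stay nvarchar(30), one option to test (assuming the values are still all numeric and you are on SQL Server 2012 or later for TRY_CONVERT) is to persist the conversion back to int as an indexed computed column, so the join compares ints again:
-- Sketch only; the computed column and index names are illustrative.
ALTER TABLE table2
    ADD col2_int AS TRY_CONVERT(int, col2) PERSISTED;

CREATE NONCLUSTERED INDEX IX_table2_col2_int ON table2 (col2_int);

SELECT *
FROM table1
INNER JOIN table2 ON table1.col1 = table2.col2_int;   -- int = int, no conversion in the join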

Related

Performance tuning on join two tables columns with patindex

Sample data:
Note:
The table tbl_test1 is a filtered table; it may have fewer records depending on earlier filtering.
The following is just sample data for illustration. The actual table tbl_test2 has 70 columns and 100 million records.
The WHERE condition is dynamic and can arrive in any combination.
The display columns are also dynamic (one or more columns).
create table tbl_test1
(
col1 varchar(100)
);
insert into tbl_test1 values('John Mak'),('Omont Boy'),('Will Smith'),('Mak John');
create table tbl_test2
(
col1 varchar(100)
);
insert into tbl_test2 values('John Mak'),('Smith Will'),('Jack Don');
Query 1: the following query takes more than 10 minutes and is still running against the 100 million records.
select t2.col1
from tbl_test2 t2
inner join tbl_test1 t1 on patindex('%' + t1.col1 + '%', t2.col1) > 0
Query 2: this also keeps running; no result after a 10-minute wait.
select t2.col1
from tbl_test2 t2
where exists
(
select * from tbl_test1 t1 where charindex(t1.col1,t2.col1) > 0
)
expected result:
col1
----------
John Mak
Smith Will

Make use of index when JOIN'ing against multiple columns

Simplified, I have two tables, contacts and donotcall
CREATE TABLE contacts
(
id int PRIMARY KEY,
phone1 varchar(20) NULL,
phone2 varchar(20) NULL,
phone3 varchar(20) NULL,
phone4 varchar(20) NULL
);
CREATE TABLE donotcall
(
list_id int NOT NULL,
phone varchar(20) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_donotcall_list_phone ON donotcall
(
list_id ASC,
phone ASC
);
I would like to see which contacts match a phone number on a specific DoNotCall list.
For faster lookup, I have indexed donotcall on list_id and phone.
When I make the following JOIN it takes a long time (e.g. 9 seconds):
SELECT DISTINCT c.id
FROM contacts c
JOIN donotcall d
ON d.list_id = 1
AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
Execution plan on Pastebin
While if I LEFT JOIN on each phone field separately it runs a lot faster (e.g. 1.5 seconds):
SELECT c.id
FROM contacts c
LEFT JOIN donotcall d1
ON d1.list_id = 1
AND d1.phone = c.phone1
LEFT JOIN donotcall d2
ON d2.list_id = 1
AND d2.phone = c.phone2
LEFT JOIN donotcall d3
ON d3.list_id = 1
AND d3.phone = c.phone3
LEFT JOIN donotcall d4
ON d4.list_id = 1
AND d4.phone = c.phone4
WHERE
d1.phone IS NOT NULL
OR d2.phone IS NOT NULL
OR d3.phone IS NOT NULL
OR d4.phone IS NOT NULL
Execution plan on Pastebin
My assumption is that the first snippet runs slowly because it doesn't utilize the index on donotcall.
So, how to do a join towards multiple columns and still have it use the index?
SQL Server might think resolving IN (c.phone1, c.phone2, c.phone3, c.phone4) using an index is too expensive.
You can test if the index would be faster with a hint:
SELECT c.*
FROM contacts c
JOIN donotcall d with (index(IX_donotcall_list_phone))
ON d.list_id = 1
AND d.phone IN (c.phone1, c.phone2, c.phone3, c.phone4)
The query plans you posted show that the first plan is estimated to produce 40k rows but actually returns only 21. The second plan estimates 1 row (and of course also returns 21).
Are your statistics up to date? Out-of-date statistics can explain the query analyzer making bad choices. Statistics should be updated automatically or in a weekly job. Check the age of your statistics with:
select object_name(ind.object_id) as TableName
, ind.name as IndexName
, stats_date(ind.object_id, ind.index_id) as StatisticsDate
from sys.indexes ind
order by
stats_date(ind.object_id, ind.index_id) desc
You can update them manually with:
EXEC sp_updatestats;
With this poor database structure, a UNION ALL query might be fastest.
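For what it's worth, a sketch of that UNION shape (wrapped in DISTINCT so each contact id still appears only once, even if several of its numbers are on the list):
SELECT DISTINCT id
FROM (
    SELECT c.id FROM contacts c JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone1
    UNION ALL
    SELECT c.id FROM contacts c JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone2
    UNION ALL
    SELECT c.id FROM contacts c JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone3
    UNION ALL
    SELECT c.id FROM contacts c JOIN donotcall d ON d.list_id = 1 AND d.phone = c.phone4
) AS matches;   -- each branch can seek IX_donotcall_list_phone (list_id, phone)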

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column. I have another table that lists all the employees in the company, also keyed by ssn, but including name and other personal data. The second table contains multiple records for each employee, at different points in time. What I need to do is select all the records in the first table for a certain building, then get the most recent name from the second table, and allow the result set to be sorted by any of the columns returned.
I have this in place, and it works fine, it is just very slow.
A very simplified version of the tables are:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours DEC(5,2)) (7,200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table 2 is a number that corresponds to a predetermined date on which these records were entered; the higher the number, the more recent the entry. It is common/expected that each employee has several records, but several may not have the most recent one (i.e. '8').
My SProc is:
@BuildingNumber CHAR(7), @SortField VARCHAR(25)
BEGIN
    DECLARE @returnValue TABLE (ssn CHAR(9), buildingNumber CHAR(7), fName VARCHAR(20), lName VARCHAR(20), rowNumber INT)
    INSERT INTO @returnValue (ssn, buildingNumber, fName, lName, rowNumber)
    SELECT ssn, buildingNumber, fName, lName, rowNumber
    FROM (SELECT ...,
                 ROW_NUMBER() OVER (PARTITION BY buildingNumber
                                    ORDER BY CASE @SortField WHEN ... THEN {sortField column} ... END) AS rowNumber
          FROM table1 a
          OUTER APPLY (SELECT TOP 1 fName, lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
          WHERE buildingNumber = @BuildingNumber) AS x
    SELECT * FROM @returnValue ORDER BY rowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me, if it were, it would be done. Are there any indexes I could create that would help speed this up?
I've looked at the execution plans, and it has a clustered index scan on table 2(18%), then a compute scalar(0%), then an eager spool(59%), then a filter(0%), then top n sort(14%).
That's 78% of the execution so I know it's in the section to get the names, just not sure of a better(faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.
I would approach this with window functions. The idea is to assign a sequence number to the records in the table with duplicates (I think table2), so that the most recent record for each ssn gets the value 1. Then just select that row as the most recent record:
select t1.*, t2.*
from table1 t1 join
     (select t2.*,
             row_number() over (partition by ssn order by sequence desc) as seqnum
      from table2 t2
     ) t2
     on t1.ssn = t2.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
    @BuildingNumber char(7)
)
returns table as
return (
    select t1.ssn, t1.buildingNumber, t2.fName, t2.lName, t2.seqnum
    from table1 t1 join
         (select t2.*,
                 row_number() over (partition by ssn order by sequence desc) as seqnum
          from table2 t2
         ) t2
         on t1.ssn = t2.ssn and t2.seqnum = 1
    where t1.buildingNumber = @BuildingNumber
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.*, t2.fName, t2.lName,
             row_number() over (partition by t2.ssn order by t2.sequence desc) as seqnum
      from table1 t1 join
           table2 t2
           on t1.ssn = t2.ssn
      where t1.buildingNumber = @BuildingNumber
     ) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*, row_number() over (partition by t2.ssn order by t2.sequence desc) as seqnum
from table1 t1 join
     table2 t2
     on t1.ssn = t2.ssn
where t1.buildingNumber = @BuildingNumber and
      t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn)
In all these cases, an index on table2(ssn, sequence) should help performance.
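A sketch of that index (the INCLUDE columns are an assumption, added so the name lookup does not need a key lookup):
CREATE NONCLUSTERED INDEX IX_table2_ssn_sequence
    ON table2 (ssn, sequence DESC)
    INCLUDE (fName, lName);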
Try using temp tables instead of the table variables. Not sure what kind of system you are working on, but I have had pretty good luck with them. Temp tables actually write to the drive, so you won't be holding and processing so much in memory. Depending on other system usage this might do the trick.
Simply define the temp table using #Tablename instead of @Tablename. Put the name-sorting subquery in a temp table before everything else fires off and join to it, as sketched below.
Just make sure to drop the table at the end. It will be dropped automatically when the SP finishes and the session disconnects, but it is a good idea to tell it to drop explicitly, to be on the safe side.
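A minimal sketch of that temp-table variant, using the simplified table and column names from the question (the clustered index on the temp table is optional):
-- Materialize the "latest name per ssn" lookup once, then join to it.
SELECT ssn, fName, lName,
       ROW_NUMBER() OVER (PARTITION BY ssn ORDER BY sequence DESC) AS seqnum
INTO #LatestNames
FROM table2;

CREATE CLUSTERED INDEX IX_LatestNames_ssn ON #LatestNames (ssn);

SELECT t1.ssn, t1.buildingNumber, n.fName, n.lName
FROM table1 t1
JOIN #LatestNames n ON n.ssn = t1.ssn AND n.seqnum = 1
WHERE t1.buildingNumber = @BuildingNumber;   -- the stored procedure's parameter

DROP TABLE #LatestNames;   -- explicit drop, as recommended above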

How to get the last parent from table for the following scenario in SQL Server 2000?

I have the following scenario of a tree in SQL Server 2000.
There is a database with two tables, figuratively speaking:
Table1 (Row_Id int, Id char(9), etc.)
AND
Table2 (Row_Id int, Parent_Id char(9), Parent_Parent_Id char(9), etc).
Parent_Id in Table2 refers to Id in Table1.
Parent_Parent_Id in Table2 refers to Id in Table1 too (thus a child can have more than one parent).
For example, let's consider the tables with some data:
Table1
Row_Id Id
1 a
2 b
3 c
4 d
5 e
6 ...
Table2
Row_Id Parent_Id Parent_Parent_Id
1 a b
2 b c
3 c d
4 d e
5 ... ...
This data shows that element 'e' has no further parents, so the last parent of the element with Id 'a' is 'e'.
In other words, I would like to write a stored procedure with an input parameter inId (any Id from Table1) that returns the last parent, i.e. the one that has no more parents.
Now I do this via a loop with
SELECT ...
FROM Table1
LEFT JOIN Table2 ON Table1.Id = Table2.Parent_Id
WHERE Table1.Id = inId
until I get NULL on the right-hand side.
How do you think is there any better way to do it?
Thank you.
In SQL Server versions after 2000 there are several ways to do this efficiently. In SQL 2000, however, you are stuck resolving it manually. There are two options:
a recursive function or procedure
a loop
If the loop is written correctly, it is most likely going to be the faster of the two. Just make sure you have appropriate indexes on the tables.
You also don't need to join to Table1. Just do something like SELECT @newParent = Parent_Parent_Id FROM dbo.Table2 WHERE Parent_Id = @currentParent; in the loop, and stop when no new parent is found.
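For reference, on SQL Server 2005 and later the same walk can be done in a single statement with a recursive CTE. A minimal sketch against the tables above, assuming @inId holds the starting Id:
-- Not usable on SQL Server 2000; shown only to illustrate the "later versions" remark.
WITH ancestors AS
(
    SELECT Parent_Id, Parent_Parent_Id
    FROM Table2
    WHERE Parent_Id = @inId
    UNION ALL
    SELECT t2.Parent_Id, t2.Parent_Parent_Id
    FROM Table2 t2
    JOIN ancestors a ON t2.Parent_Id = a.Parent_Parent_Id
)
SELECT TOP 1 a.Parent_Parent_Id AS LastParent
FROM ancestors a
WHERE NOT EXISTS (SELECT 1 FROM Table2 x WHERE x.Parent_Id = a.Parent_Parent_Id);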
I have written the loop, which solved my problem. Thank you.
USE MyDatabase
GO
DECLARE @currentParent char(9),
        @newParent char(9)
SELECT @currentParent = 'a',
       @newParent = @currentParent
WHILE (1 = 1)
BEGIN
    -- Look up the parent of the current element.
    SELECT @newParent = Parent_Parent_Id
    FROM Table2
    WHERE Parent_Id = @currentParent;
    -- If no row was found, @newParent keeps its previous value, so it equals
    -- @currentParent and we have reached the last parent.
    IF (@newParent = @currentParent)
    BEGIN
        PRINT @newParent;
        BREAK;
    END;
    SELECT @currentParent = @newParent;
END

set difference in SQL query

I'm trying to select records with this statement:
SELECT *
FROM A
WHERE
LEFT(B, 5) IN
(SELECT * FROM
(SELECT LEFT(A.B,5), COUNT(DISTINCT A.C) c_count
FROM A
GROUP BY LEFT(B,5)
) p1
WHERE p1.c_count = 1
)
AND C IN
(SELECT * FROM
(SELECT A.C , COUNT(DISTINCT LEFT(A.B,5)) b_count
FROM A
GROUP BY C
) p2
WHERE p2.b_count = 1)
which takes a long time to run (~15 sec).
Is there a better way of writing this SQL?
If you would like to represent Set Difference (A-B) in SQL, here is a solution for you.
Let's say you have two tables A and B, and you want to retrieve all records that exist only in A but not in B, where A and B have a relationship via an attribute named ID.
An efficient query for this is:
-- (A - B)
SELECT DISTINCT A.* FROM (A LEFT OUTER JOIN B on A.ID=B.ID) WHERE B.ID IS NULL
-from Jayaram Timsina's blog.
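As a side note, when both tables share the same column list, SQL Server also has a built-in EXCEPT operator that expresses the same A - B difference:
-- Assumes A and B have identical column lists; rows are compared as a whole,
-- and NULLs compare as equal for this purpose.
SELECT * FROM A
EXCEPT
SELECT * FROM B;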
You don't need to return data from the nested subqueries. I'm not sure this will make a difference without indexing, but it's easier to read.
And EXISTS/JOIN is probably nicer, IMHO, than using IN:
SELECT *
FROM
A
JOIN
(SELECT LEFT(B,5) AS b1
FROM A
GROUP BY LEFT(B,5)
HAVING COUNT(DISTINCT C) = 1
) t1 On LEFT(A.B, 5) = t1.b1
JOIN
(SELECT C AS C1
FROM A
GROUP BY C
HAVING COUNT(DISTINCT LEFT(B,5)) = 1
) t2 ON A.C = t2.c1
But you'll at least need a computed column, as marc_s said.
And 2 indexes: one on (computed, C) and another on (C, computed)
Well, not sure what you're really trying to do here - but obviously, that LEFT(B, 5) expression keeps popping up. Since you're using a function, you're giving up any chance to use an index.
What you could do in your SQL Server table is to create a computed, persisted column for that expression, and then put an index on that:
ALTER TABLE A
ADD LeftB5 AS LEFT(B, 5) PERSISTED
CREATE NONCLUSTERED INDEX IX_LeftB5 ON dbo.A(LeftB5)
Now use the new computed column LeftB5 instead of LEFT(B, 5) anywhere in your query - that should help to speed up certain lookups and GROUP BY operations.
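For example, the first grouped subquery from the question could then be written against the persisted column; a sketch, keeping the HAVING-style form shown above:
SELECT LeftB5, COUNT(DISTINCT C) AS c_count
FROM A
GROUP BY LeftB5
HAVING COUNT(DISTINCT C) = 1;   -- groups by the indexed persisted column instead of re-computing LEFT(B, 5) per row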
Also - you have a GROUP BY C in there - is that column C indexed?
If you are looking for just the set difference between table1 and table2, the query below gives the rows that are in table1 but not in table2, assuming both tables are instances of the same schema with columns named
columnone, columntwo, ...
with
col1 as (
    select columnone from table2
),
col2 as (
    select columntwo from table2
)
...
select * from table1
where (
    columnone not in (select columnone from col1)
    and columntwo not in (select columntwo from col2)
    ...
);
