tsql - joining two tables from different databases hosted on separate servers - sql-server

I have two databases sitting on different physical servers, connected as linked servers. I need to join DB1.T1 with DB2.T2 and create an ID. The problem is performance. My senior insists on using a function, which I have created below.
IF OBJECT_ID (N'dbo.getTRId', N'FN') IS NOT NULL
    DROP FUNCTION dbo.getTRId;
GO
CREATE FUNCTION dbo.getTRId (@gcPRef bigint)
RETURNS varchar (100)
WITH EXECUTE AS CALLER --may not be necessary. not sure.
AS
BEGIN
    DECLARE @TRID varchar (100);
    SELECT @TRID = CONVERT(varchar (12), hu.PropId)
                 + '_' + CONVERT(varchar (12), c.WSId)
    FROM [172.29.110.133].DB1.dbo.checks c
    join [172.29.110.133].DB1.[dbo].VHier
        ON VHier.xx = c.xx
    join [172.29.110.133].DB1.[dbo].rvc
        ON rvc.xx = VHier.xx
        AND rvc.yy = VHier.yy
    join [172.29.110.133].DB1.[dbo].HUNIT hu
        ON c.xx = hu.xx
    WHERE c.CheckId = @gcPRef;
    RETURN (@TRID);
END;
GO
I use the query below to query each checkid using the function above.
select dbo.getTRId(GC.guestCheckPRef), GC.guestCheckid from DB2.dbo.Guest_CHECKS GC
where GC.closeBusinessDate = '2014-06-25'
A couple of things you may like to know:
DB1 and DB2 are hosted on different physical servers.
I am not a DBA so please let me know if I am doing anything wrong.
Approximately 45,000 records are created daily, so that is the number of rows involved.
I have already tried joining them without involving a function, and it takes forever: in 30 seconds, only 450 records were returned. I cannot keep the tables locked for a long time.
CONSTRAINT [DB1.PK_CHECK] PRIMARY KEY CLUSTERED
CONSTRAINT [DB2.XPKGUEST_CHECKS] PRIMARY KEY NONCLUSTERED
I do not know if constraints are playing a role here. DB2.GUEST_CHECKS.guestCheckPRef is not even an FK here; guestCheckPRef is the PK in DB1.CHECK.
Performance is very poor. I need to return DB2.propid + DB2.wsid + DB1.guestCheckid.
This is all I can give for now. Any suggestion is appreciated. It does not have to be done with a function.
Thanks in advance. Regards, Oz.

Here are a few things to try or consider:
Have you checked that the query is using the best available indexes? You could try running the query through the query analyser to see if there are any indexes you could add to improve performance.
What version of SQL Server are you running? Depending on the version you might be able to replicate the table from one server to the other to alleviate the cost of running a query across your network.
I notice that several of the joins are across to the other server - could you consolidate all of those joins into a single view on that server, optimised using indexes (a sketch follows below)? That may result in less network traffic.
Try putting your function on the other server and calling it from the first server to see if there's any performance improvement.
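For example, a minimal sketch of such a view, created on the DB1 server so the joins run locally there (the view name vw_CheckTRId is an assumption; the columns mirror the function above):
CREATE VIEW dbo.vw_CheckTRId
AS
SELECT c.CheckId,
       CONVERT(varchar(12), hu.PropId) + '_' + CONVERT(varchar(12), c.WSId) AS TRID
FROM dbo.checks c
join dbo.VHier ON VHier.xx = c.xx
join dbo.rvc ON rvc.xx = VHier.xx AND rvc.yy = VHier.yy
join dbo.HUNIT hu ON c.xx = hu.xx;
GO
The query on the other server would then reference [172.29.110.133].DB1.dbo.vw_CheckTRId and join it to Guest_CHECKS; whether that actually reduces network traffic depends on how the optimizer handles the remote part of the plan.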

Doing a "select" in a function is generally considered "not a good idea". The select in the function will be repeated once for each row in the result set, which is probably why the performance is bad.
Erp. This was supposed to be a comment, not an answer. To turn this into a proper answer, rewrite the query as a join, without using the function. (I.e. take the contents of the function and integrate it into a single join.)
Your example query should look something like this:
;with getTRID as
(SELECT c.CheckId,
        CONVERT(varchar (12), hu.PropId)
      + '_' + CONVERT(varchar (12), c.WSId) AS TRID
FROM [172.29.110.133].DB1.dbo.checks c
join [172.29.110.133].DB1.[dbo].VHier
ON VHier.xx = c.xx
join [172.29.110.133].DB1.[dbo].rvc
ON rvc.xx = VHier.xx
AND rvc.yy = VHier.yy
join [172.29.110.133].DB1.[dbo].HUNIT hu
ON c.xx = hu.xx)
select getTRID.TRID, GC.guestCheckid from DB2.dbo.Guest_CHECKS GC
inner join getTRID ON getTRID.CheckId = GC.guestCheckPRef
where GC.closeBusinessDate = '2014-06-25'
N.B. I'm working from memory here so, please, no flames for syntax errors! Thx.
Steve G.

Related

Use of inserted and deleted tables for logging - is my concept sound?

I have a table with a simple identity column primary key. I have written a 'For Update' trigger that, among other things, is supposed to log the changes of certain columns to a log table. Needless to say, this is the first time I've tried this.
Essentially as follows:
Declare Cursor1 Cursor for
select a.*, b.*
from inserted a
inner join deleted b on a.OrderItemId = b.OrderItemId
(where OrderItemId is the actual name of the primary identity key).
I then do the usual open the cursor and go into a fetch next loop. With the columns I want to test, I do:
if Update(Field1)
begin
..... do some logging
end
The columns include varchars, bits, and datetimes. It works, sometimes. The problem is that the logging code writes the a (inserted) and b (deleted) values of the field to the log, and in some cases the before and after values appear to be identical.
I have three questions:
Am I using the Update function correctly?
Am I accessing the before and after values correctly?
Is there a better way?
If you are using SQL Server 2016 or higher, I would recommend skipping this trigger entirely and instead using system-versioned temporal tables.
Not only will it eliminate the need for (and performance issues around) the trigger, it'll be easier to query the historical data.
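For reference, a minimal sketch of enabling system versioning on an existing table (the table name dbo.OrderItems, the period columns, and the history table name are assumptions, not from the original post):
ALTER TABLE dbo.OrderItems
ADD ValidFrom datetime2 GENERATED ALWAYS AS ROW START HIDDEN
        CONSTRAINT DF_OrderItems_ValidFrom DEFAULT SYSUTCDATETIME(),
    ValidTo datetime2 GENERATED ALWAYS AS ROW END HIDDEN
        CONSTRAINT DF_OrderItems_ValidTo DEFAULT CONVERT(datetime2, '9999-12-31 23:59:59.9999999'),
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);

ALTER TABLE dbo.OrderItems
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrderItems_History));

-- Historical values for a single row can then be queried directly:
SELECT * FROM dbo.OrderItems
FOR SYSTEM_TIME ALL
WHERE OrderItemId = 42
ORDER BY ValidFrom;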

SQL query runs into a timeout on a sparse dataset

For sync purposes, I am trying to get a subset of the existing objects in a table.
The table has two fields, [Group] and Member, which are both stringified Guids.
All rows together may be too large to fit into a datatable; I already encountered an OutOfMemory exception. But I have to check that everything I need right now is in the datatable. So I take the Guids I want to check (they come in chunks of 1000), and query only for the related objects.
So, instead of filling my datatable once with all
SELECT * FROM Group_Membership
I am running the following SQL query against my SQL database to get related objects for one thousand Guids at a time:
SELECT *
FROM Group_Membership
WHERE
[Group] IN (@Guid0, @Guid1, @Guid2, @Guid3, @Guid4, @Guid5, ..., @Guid999)
The table in question now contains a total of 142 entries, and the query already times out (CommandTimeout = 30 seconds). On other tables, which are not as sparsely populated, similar queries don't time out.
Could someone shed some light on the logic of SQL Server here and whether/how I could hint it in the right direction?
I already tried to add a nonclustered index on the column Group, but it didn't help.
I'm not sure that WHERE IN will be able to make full use of an index on [Group], if at all. However, if you had a second table containing the GUID values, and furthermore if that column had an index, then a join might perform very fast.
Create a temporary table for the GUIDs and populate it:
CREATE TABLE #Guids (
Guid varchar(255)
)
INSERT INTO #Guids (Guid)
VALUES
(@Guid0), (@Guid1), (@Guid2), (@Guid3), (@Guid4), ...
CREATE INDEX Idx_Guid ON #Guids (Guid);
Now try rephrasing your current query using a join instead of a WHERE IN (...):
SELECT *
FROM Group_Membership t1
INNER JOIN #Guids t2
ON t1.[Group] = t2.Guid;
As a disclaimer, if this doesn't improve the performance, it could be because your table has low cardinality. In such a case, an index might not be very effective.

How can I speed up this sql server query?

-- Holds last 30 valdates
create table #valdates(
date int
)
insert into #valdates
select distinct top (30) valuation_date
from tbsm.tbl_key_rates_summary
where valuation_date <= 20150529
order by valuation_date desc
select
sum(fv_change), sc_group, valuation_date
from
(select *
from tbsm.tbl_security_scorecards_summary
where valuation_date in (select date from #valdates)) as fact
join
(select *
from tbsm.tbl_security_classification
where sc_book = 'UC' ) as dim on fact.classification_id = dim.classification_id
group by
valuation_date, sc_group
drop table #valdates
This query takes around 40 seconds to return because the fact table has almost 13 million rows. Can I do anything about this?
Given that there is no proper index that supports the fetch, adding one is probably the easiest (or only) option to really improve the performance. Most likely an index like this would improve the situation a lot:
create index idx_security_scorecards_summary_1 on
tbl_security_scorecards_summary (valuation_date, classification_id)
include (fv_change)
Everything depends, of course, on how good the selectivity of the valuation_date and classification_id fields is (that is, how big a portion of the table needs to be fetched), and the index might work better with the fields in the opposite order. The fv_change field is in the INCLUDE section so that it is part of the index structure and does not need to be fetched from the base table.
Included fields help if SQL Server has to fetch a lot of rows from the table; if the number of rows touched is small, they might not help at all. As always with indexing, this slows down inserts and updates, it is optimized for this case only, and you should of course look at the bigger picture too.
The select is written in a slightly unusual way; not sure if that makes any difference, but you could also try the more conventional form:
select
sum(fact.fv_change), dim.sc_group, fact.valuation_date
from
tbsm.tbl_security_scorecards_summary fact
join tbsm.tbl_security_classification dim
on fact.classification_id = dim.classification_id
where
fact.valuation_date in (select date from #valdates) and
dim.sc_book = 'UC'
group by
fact.valuation_date,
dim.sc_group
Looking at "statistics io" output should give you a good idea which table is causing the slowness, and looking at query plan to see if there's any strange operators might help to understand the situation better.

Tuning Select statement to obtain faster results

I have benefited from this website for a long time now. This is my first question on the site. It is regarding performance tuning a reporting query. Here it goes.
SELECT Count(b1.primkey)
from tableA b1 --WITH (NOLOCK)
join tableA b2 --WITH (NOLOCK)
on b1.email = b2.email
and DateDiff(day, b2.BookedDate , b1.BookedDate) > 1
tableA has around 7 million rows. Email is a varchar(100) field. Bookeddate is a datetime field. primkey is a primary key column that is an int.
My purpose in writing this query is to find the count of entries that have the same email IDs but have come in one day late. This query takes about 45 minutes to run, and I really want to reduce the time it takes to execute.
Since this is for reporting, I tried in vain to use the WITH (NOLOCK) option (commented out above) to improve the read time. I have a columnstore index on tableA and I know that it is being used by the SQL optimizer (I can see it in the execution plan). I am using SQL Server 2012.
Can someone tell me in such a case, what would be better? Using a nonclustered index on email or a nonclustered columnstore index on tableA?
Please help me.
Your query is relatively complex. You are essentially joining two tables that have 7 million records each on a column that is not unique.
How about the following query instead:
select Email
from TableA
group by Email
having MAX(BookedDate) > MIN(BookedDate) + 1
Also make sure you have an index with Email and BookedDate.
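A minimal sketch of such an index (the index name is an assumption):
create index IX_TableA_Email_BookedDate on TableA (Email, BookedDate);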
Hope this helps.
You have three options here:
1. Create a clustered index on the email field, at least for the larger table. But I suppose there are other queries running on these tables, and the clustered index is needed on other fields.
2. Move the emails to another table, and store email IDs in TableA and TableB; a join on an int field would be much faster than one on varchar fields.
3. Create an index on the email field with BookedDate as an included column (no need to include primkey; you can count on another field, or use count(*)). Code: create index idx_email on TableA (email) include(BookedDate) - a fuller sketch follows below.
I think the third option is the one you should go with. There's not much work to be done, and there will be a great performance gain. The only problem is that an index on a varchar field will take a lot of space and impact insert/update operations; but you said that this is a reporting DB, so I think you can allow that.
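A fuller sketch of option 3, combining the index with the original count query (whether you count on primkey or use count(*) is up to you; the index name idx_email comes from the snippet above):
create index idx_email on tableA (email) include (BookedDate);

-- with the index in place, the self-join can be answered largely from the index pages:
SELECT COUNT(*)
FROM tableA b1
JOIN tableA b2
    ON b1.email = b2.email
    AND DATEDIFF(day, b2.BookedDate, b1.BookedDate) > 1;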

MAX keyword taking a lot of time to select a value from a column

Well, I have a table with 40,000,000+ records, but when I try to execute a simple query it takes ~3 minutes to finish. Since I am using the same query in my C# solution, where it needs to execute 100+ times, the overall performance of the solution is deeply hit.
This is the query that I am using in a proc
DECLARE @Id bigint
SELECT @Id = MAX(ExecutionID) from ExecutionLog where TestID=50881
select @Id
Any help to improve the performance would be great. Thanks.
What indexes do you have on the table? It sounds like you don't have anything even close to useful for this particular query, so I'd suggest trying to do:
CREATE INDEX IX_ExecutionLog_TestID ON ExecutionLog (TestID, ExecutionID)
...at the very least. Your query is filtering by TestID, so this needs to be the primary column in the composite index: if you have no indexes on TestID, then SQL Server will resort to scanning the entire table in order to find rows where TestID = 50881.
It may help to think of indexes on SQL tables in the same way as those you'd find in the back of a big book: they are hierarchical and multi-level. If you were looking for something, you'd manually look under 'T' for TestID, and there'd be a sub-heading under TestID for ExecutionID. Without an index entry for TestID, you'd have to read through the entire book looking for TestID, then see if there's a mention of ExecutionID with it. This is effectively what SQL Server has to do.
If you don't have any indexes, then you'll find it useful to review all the queries that hit the table, and ensure that one of those indexes is a clustered index (rather than non-clustered).
Try to re-work everything into something that works in a set based manner.
So, for instance, you could write a select statement like this:
;With OrderedLogs as (
Select ExecutionID,TestID,
ROW_NUMBER() OVER (PARTITION BY TestID ORDER By ExecutionID desc) as rn
from ExecutionLog
)
select * from OrderedLogs where rn = 1 and TestID in (50881, 50882, 50883)
This would then find the maximum ExecutionID for 3 different tests simultaneously.
You might need to store that result in a table variable or temp table, but hopefully you can instead continue building up a larger, single query that processes all of the results in parallel.
This is the sort of processing that SQL is meant to be good at - don't cripple the system by iterating through the TestIDs in your code.
If you need to pass many test IDs into a stored procedure for this sort of query, look at Table Valued Parameters.
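A minimal sketch of that approach (the type and procedure names are assumptions; the body reuses the CTE above):
CREATE TYPE dbo.TestIdList AS TABLE (TestID int PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetLatestExecutions
    @TestIDs dbo.TestIdList READONLY
AS
BEGIN
    ;WITH OrderedLogs AS (
        SELECT el.ExecutionID, el.TestID,
               ROW_NUMBER() OVER (PARTITION BY el.TestID ORDER BY el.ExecutionID DESC) AS rn
        FROM ExecutionLog el
        JOIN @TestIDs t ON t.TestID = el.TestID
    )
    SELECT ExecutionID, TestID
    FROM OrderedLogs
    WHERE rn = 1;
END;
GO
From C#, the parameter can be passed as a DataTable (or IEnumerable<SqlDataRecord>) using SqlDbType.Structured.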
