I need to join a Teradata table with about 0.5 billion records and a local table with about 10,000 records. I have it working in MS Access and it takes about 15 minutes to run. I would prefer to do it in SQL Server but can't even get a join with 1 record in the local SQL table to work.
Why is MS Access able to do this, albeit slowly, whereas SQL Server chokes? What is MS Access doing differently from SQL Server?
The SQL Server query with a join that fails:
SELECT a.trk, a.wgt
FROM openquery(TERADATA, 'SELECT trk, wgt
FROM SHIPMENT_DB.pkg') a
INNER JOIN (Local_Tbl) b ON a.trk = b.Tracking_Number
A simple SQL Server query without a join that works:
SELECT *
FROM openquery(TERADATA,'SELECT trk, wgt
FROM SHIPMENT_DB.pkg
WHERE trk = ''773423067500''')
Not the answer, but I had a similar issue using OPENDATASOURCE. Performance was terrible, the query took hours to run.
The solution was to ensure all columns involved in the WHERE clause had matching datatypes. In my case the remote column was INT, but the query was passing the value as a varchar: ...'WHERE remote_table.ID = ''4'''...
Once I changed all values to the appropriate datatypes the query took seconds to run.
Look at the execution plan in SQL Server. Since it knows very little about the dataset that is going to come back from Teradata, it is making some assumptions that may be badly off.
Swapping the order of the tables in the join will help. Using an explicit INNER HASH JOIN may help (once you've switched the order).
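As a rough sketch of that suggestion, here is the query from the question with the local table first and an explicit hash join hint. It is untested against Teradata, and whether it helps depends on the plan you actually get.

```sql
-- Local table first, remote rowset second, with an explicit hash join hint.
-- A join hint also makes SQL Server keep the join order as written.
SELECT a.trk, a.wgt
FROM Local_Tbl AS b
INNER HASH JOIN OPENQUERY(TERADATA, 'SELECT trk, wgt
FROM SHIPMENT_DB.pkg') AS a
    ON a.trk = b.Tracking_Number;
```

Keep in mind that OPENQUERY still returns every row of SHIPMENT_DB.pkg to SQL Server before the join runs, so the hint changes how the join is performed, not how much data crosses the link.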
I created a SQL linked server to connect to PostgreSQL 11, and issue a query like this
SELECT TOP 1 *
FROM [LinkedServer_PostgreSQL].[DBname].[dbo].[TableName] a;
It is very slow and seems to take forever; I ended up killing the query every time. Upon investigation, I found that the query arriving at the PostgreSQL server selects all rows: the TOP N clause is not translated for PostgreSQL. That's why the query cannot finish, because the table has more than 10 million rows.
I cannot replace the TOP clause with a LIMIT clause on SQL Server, so this really beats me. Help, anyone?
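One thing that may be worth trying, as a sketch only: OPENQUERY sends its query text to the linked server as-is, so the row limit can be written in PostgreSQL's own syntax. The linked server name is taken from the question; the quoted table name and the missing schema qualifier are assumptions to adjust for your database.

```sql
-- The inner statement is executed by PostgreSQL, so LIMIT is valid there.
-- "TableName" is a placeholder; add a schema prefix if your table needs one.
SELECT *
FROM OPENQUERY([LinkedServer_PostgreSQL], 'SELECT * FROM "TableName" LIMIT 1') AS a;
```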
I have a use case for OPENDATASOURCE. However, my SQL query has multiple tables with left joins.
Most of the examples have one table only. How do I connect when I have two tables (the second one joined with a left join)?
Below is a typical example and working great:
SELECT *
FROM OPENDATASOURCE('SQLNCLI', 'Data Source=RemoteServerName;Integrated Security=SSPI').Billing.dbo.Invoices
But I need to join the Invoices table with the 'customer' table like below. I am not sure how to do that. Please help.
SELECT *
FROM OPENDATASOURCE('SQLNCLI', 'Data Source=RemoteServerName;Integrated Security=SSPI').Billing.dbo.Invoices as inv
left join Billing.dbo.customers as cust
on inv.customer = cust.customer
OPENDATASOURCE is one way to talk to a remote server using the "linked server" or "distributed query" functionality in SQL Server. However, it is probably not the best path for you here, because it does not let the SQL Server query optimizer rewrite the query and push parts of it down to the remote source (which could reduce the number of rows returned to you over a slower network connection compared with your local database). If possible, create an actual linked server instead. That lets you tell the optimizer "these two tables are from the same remote source". The optimizer can then consider plans that send a single query to the remote server, which joins the two tables, applies any filters and GROUP BY clauses, and returns only the result to the calling server.
Here's the mechanism to add a linked server:
https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-addlinkedserver-transact-sql?view=sql-server-2017
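A minimal sketch of what that setup could look like, assuming the same provider (SQLNCLI) and data source (RemoteServerName) as the OPENDATASOURCE call in the question, and using REMOTE as a made-up linked server name. The login mapping assumes integrated security is what you want; adjust it to your environment.

```sql
-- Register the remote SQL Server instance under the (hypothetical) name REMOTE.
EXEC sp_addlinkedserver
    @server     = N'REMOTE',
    @srvproduct = N'',
    @provider   = N'SQLNCLI',
    @datasrc    = N'RemoteServerName';

-- Connect with the caller's own credentials, similar to Integrated Security=SSPI.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname = N'REMOTE',
    @useself    = N'TRUE';
```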
Once you have a remote server (which I'll call "remote" here), you can write the query using the 4-part name syntax for remote servers instead of using OPENDATASOURCE.
SELECT * FROM REMOTE.Billing.dbo.Invoices AS inv
LEFT JOIN REMOTE.Billing.dbo.customers AS cust ON inv.customer = cust.customer <WHERE clause>
Here is a paper on how linked servers work under the covers which should give you a conceptual overview as to why this approach is likely better for you:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.8007
Best of luck!
I've received a database that was previously on SQL Server 2008R2 but was just put on a SQL Server 2014 instance. No maintenance tasks of any kind have been run on the database since 2014 (e.g. rebuilding indexes, updating statistics, etc.).
Once we ran update statistics as part of our regularly scheduled maintenance, the performance of some queries took a massive hit, to the point where some select statements seem to never finish.
The queries have some CASE...WHEN statements in them, but I wouldn't expect there to be such a performance hit. Does anybody have any thoughts on what might cause such issues?
I've tried updating the compatibility level to 120, since it was on 100 when the database first came in, but that didn't make any difference to the performance.
If you have only just moved the database, give the system some time to build up its execution plans and cache. Also, do your index maintenance and then something like this for the stats. Don't use sp_updatestats though, as it just uses a sample of the data, not a full scan.
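For the full-scan part, a sketch along these lines should do it. dbo.YourTable is a placeholder name, not something from the question, and you would repeat the statement for each table that matters.

```sql
-- Rebuild all statistics on one table by reading every row instead of a sample.
-- dbo.YourTable is a hypothetical name; substitute your own tables.
UPDATE STATISTICS dbo.YourTable WITH FULLSCAN;
```

A full scan reads the whole table, so on large tables it is best run in a maintenance window.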
What results do you get for this:
SELECT
[sch].[name] + '.' + [so].[name] AS [TableName] ,
[ss].[name] AS [Statistic],
[sp].[last_updated] AS [StatsLastUpdated] ,
[sp].[rows] AS [RowsInTable] ,
[sp].[rows_sampled] AS [RowsSampled] ,
[sp].[modification_counter] AS [RowModifications],
CONVERT(decimal(18,2), (CONVERT(numeric, [sp].[modification_counter]) / NULLIF(CONVERT(numeric, [sp].[rows]), 0) * 100)) AS [Percent_changed] -- NULLIF avoids divide-by-zero when rows = 0
FROM [sys].[stats] [ss]
JOIN [sys].[objects] [so] ON [ss].[object_id] = [so].[object_id]
JOIN [sys].[schemas] [sch] ON [so].[schema_id] = [sch].[schema_id]
OUTER APPLY [sys].[dm_db_stats_properties]([so].[object_id],
[ss].[stats_id]) sp
WHERE [so].[type] = 'U'
AND [sp].[modification_counter] > 0
And [sp].[last_updated] < getdate()-1
ORDER BY [Percent_changed] DESC
I'm using SQL Server 2008R2
I'd like your views on the two SQL statements below with regard to performance and best practice.
select
*
from BcpSample1
where dataloadid = (select MAX(id) from LoadControl_BcpSample1 where Status = 'completed')
And
select
a.*
from CiaBcpSample1 a
inner join (select ActiveDataLoadId = MAX(id) from LoadControl_BcpSample1 where Status = 'completed') as t
on a.DataLoadId = t.ActiveDataLoadId
I looked at the query plans in SQL Server Management Studio, but after 2 runs both statements show the same query plan.
Thanks
With "regards to performance and best practice", it depends on many things. What works well one time might not be the best the next. You have to test, measure the performance and then choose.
You say the plan generated by SQL Server is the same, so in this instance there shouldn't be any difference. Choose the query easiest to maintain and move on to the next problem.
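If you want numbers to go with the plans, one simple way to measure (a sketch, using the first statement from the question) is to switch on the I/O and timing statistics for your session and compare the output in the Messages tab for each variant:

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

select
*
from BcpSample1
where dataloadid = (select MAX(id) from LoadControl_BcpSample1 where Status = 'completed');

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```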
I use Oracle Database Link to query data from SQL Server. The query is like:
select *
from tableA@DL_SqlServer a
join tableB@DL_SqlServer b
on a.ID = b.ID
tableA and tableB are large and the result is relatively small. This query executes quickly in SQL Server since indexes are built on both tables. But it is very slow over the Oracle Database Link to SQL Server. I guess the join operation is performed on the Oracle side, not on the SQL Server side, so the indexes are not used. Since I just need the joined result, I would prefer to run the query entirely on SQL Server and get only the small result back. I know that SQL Server's linked server and OPENQUERY function can achieve this. I wonder how to do the same with an Oracle Database Link. Thanks! Btw, I have no privilege to create views on SQL Server.
You most likely need to use the DBMS_HS_PASSTHROUGH package. Something like
DECLARE
l_cursor PLS_INTEGER;
BEGIN
l_cursor := dbms_hs_passthrough.open_cursor@dblink_to_sql_server;
dbms_hs_passthrough.parse@dblink_to_sql_server( l_cursor, <<select statement>> );
while dbms_hs_passthrough.fetch_row@dblink_to_sql_server(l_cursor) > 0
loop
dbms_hs_passthrough.get_value@dblink_to_sql_server( l_cursor, 1, <<local variable for first column>> );
dbms_hs_passthrough.get_value@dblink_to_sql_server( l_cursor, 2, <<local variable for second column>> );
...
end loop;
dbms_hs_passthrough.close_cursor@dblink_to_sql_server( l_cursor );
END;