SQL performance using MAX on joined tables - sql-server

I'm using SQL Server 2008R2
I'd like your views on the two SQL statements below with regard to performance and best practice.
select *
from BcpSample1
where dataloadid = (select MAX(id) from LoadControl_BcpSample1 where Status = 'completed')
And
select a.*
from CiaBcpSample1 a
inner join (select ActiveDataLoadId = MAX(id) from LoadControl_BcpSample1 where Status = 'completed') as t
on a.DataLoadId = t.ActiveDataLoadId
I looked at the query plans in SQL Server Management Studio, but after two runs both queries are showing the same plan.
Thanks

In "regards to performance and best practice." it depends on many things. What works well one time might not be the best the next. You have to test, measure the performance and then choose.
You say the plan generated by SQL Server is the same, so in this instance there shouldn't be any difference. Choose the query that is easiest to maintain and move on to the next problem.
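If you do want to measure rather than just compare plans, one simple approach (sketched here with the two queries exactly as posted) is to run both with I/O and timing statistics switched on and compare the logical reads and CPU time reported in the Messages tab:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- variant 1: scalar subquery
select *
from BcpSample1
where dataloadid = (select MAX(id) from LoadControl_BcpSample1 where Status = 'completed');

-- variant 2: derived-table join
select a.*
from CiaBcpSample1 a
inner join (select ActiveDataLoadId = MAX(id) from LoadControl_BcpSample1 where Status = 'completed') as t
on a.DataLoadId = t.ActiveDataLoadId;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;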

Related

Performance problems after updating statistics SQL Server 2014

I've received a database that was previously on SQL Server 2008R2 but was just put on a SQL Server 2014 instance. No maintenance tasks of any kind had been run on the database since 2014 (e.g. rebuilding indexes, updating statistics, etc.).
Once we ran UPDATE STATISTICS as part of our regularly scheduled maintenance, the performance of some queries took a massive hit, to the point where some SELECT statements seem to never finish.
The queries have some CASE...WHEN statements in them, but I wouldn't expect there to be such a performance hit. Does anybody have any thoughts on what might cause such issues?
I've tried updating the compatibility level to 120, since it was on 100 when the database first came in, but that didn't make any difference to the performance.
If you have only just moved the database, give the system some time to build up its execution plans and cache. Also, do your index maintenance and then something like the query below for the stats. Don't use sp_updatestats, though, as it only uses a sample of the data, not a full scan.
What results do you get for this:
SELECT
[sch].[name] + '.' + [so].[name] AS [TableName] ,
[ss].[name] AS [Statistic],
[sp].[last_updated] AS [StatsLastUpdated] ,
[sp].[rows] AS [RowsInTable] ,
[sp].[rows_sampled] AS [RowsSampled] ,
[sp].[modification_counter] AS [RowModifications],
Convert (decimal(18,2),(convert(numeric,[sp].[modification_counter]) / convert(numeric,[sp].[rows]) * 100)) as [Percent_changed]
FROM [sys].[stats] [ss]
JOIN [sys].[objects] [so] ON [ss].[object_id] = [so].[object_id]
JOIN [sys].[schemas] [sch] ON [so].[schema_id] = [sch].[schema_id]
OUTER APPLY [sys].[dm_db_stats_properties]([so].[object_id],
[ss].[stats_id]) sp
WHERE [so].[type] = 'U'
AND [sp].[modification_counter] > 0
AND [sp].[last_updated] < GETDATE() - 1
ORDER BY [Percent_changed] DESC
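For the actual refresh, once the query above has shown which statistics are stale, something along these lines does a full-scan update rather than the sampled update you get from sp_updatestats (the table name is a placeholder, not from your database):
-- full-scan update of every statistic on one table (placeholder name)
UPDATE STATISTICS dbo.YourLargeTable WITH FULLSCAN;

-- or, for every user table in the database (sp_MSforeachtable is undocumented but widely used)
EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN';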

Why is a Teradata query faster in MS Access than in SQL Server?

I need to join a Teradata table with about 0.5 billion records and a local table with about 10,000 records. I have it working in MS Access and it takes about 15 minutes to run. I would prefer to do it in SQL Server but can't even get a join with 1 record in the local SQL table to work.
Why is MS Access able to do this, albeit slowly, whereas SQL Server chokes? What is MS Access doing differently from SQL Server?
The SQL Server query with a join that fails:
SELECT a.trk, a.wgt
FROM openquery(TERADATA, 'SELECT trk, wgt
FROM SHIPMENT_DB.pkg') a
INNER JOIN (Local_Tbl) b ON a.trk = b.Tracking_Number
A simple SQL Server query without a join that works:
SELECT *
FROM openquery(TERADATA,'SELECT trk, wgt
FROM SHIPMENT_DB.pkg
WHERE trk = ''773423067500''')
Not the answer, but I had a similar issue using OPENDATASOURCE. Performance was terrible; the query took hours to run.
The solution was to ensure all columns involved in the WHERE clause had matching datatypes. In my case the remote column was INT but in the query it was being passed as a varchar: ...'WHERE remote_table.ID = ''4'''...
Once I changed all values to the appropriate datatypes, the query took seconds to run.
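In this question's terms, the same idea would be to make sure Local_Tbl.Tracking_Number compares against the remote trk column as the same type, e.g. by casting the local column explicitly (the varchar(20) length here is an assumption; use whatever type trk actually is on the Teradata side):
SELECT a.trk, a.wgt
FROM OPENQUERY(TERADATA, 'SELECT trk, wgt
FROM SHIPMENT_DB.pkg') a
INNER JOIN Local_Tbl b
ON a.trk = CAST(b.Tracking_Number AS varchar(20));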
Look at the execution plan in SQL Server. Since it knows very little about the dataset that is going to come back from Teradata, it has to make some assumptions.
Swapping the order of the tables in the join will help. Using an explicit INNER HASH JOIN may help (once you've switched the order).
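A sketch of what that could look like with the question's tables, with the small local table listed first and the hash join made explicit (a join hint also forces the join order as written, so treat this as something to test rather than a guaranteed win):
SELECT a.trk, a.wgt
FROM Local_Tbl b
INNER HASH JOIN OPENQUERY(TERADATA, 'SELECT trk, wgt
FROM SHIPMENT_DB.pkg') a
ON b.Tracking_Number = a.trk;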

SQLAlchemy: Multiple databases (on the same server) in a single session?

I'm running MS SQL Server and am trying to perform a JOIN between two tables located in different databases (on the same server). If I connect to the server using pyodbc (without specifying a database), then the following raw SQL works fine.
SELECT * FROM DatabaseA.dbo.tableA tblA
INNER JOIN DatabaseB.dbo.tableB tblB
ON tblA.id = tblB.id
Unfortunately, I just can't seem to get the analog to work using SQLAlchemy. I've seen this topic touched on in a few places:
Is there a way to perform a join across multiple sessions in sqlalchemy?
Cross database join in sqlalchemy
How do I connect to multiple databases on the same SQL Server with sqlalchemy?
How can I use multiple databases in the same request in Cherrypy and SQLAlchemy?
Most recommend using different engines/sessions, but I crucially need to perform joins between the databases, so I don't think this approach will be helpful. Another typical suggestion is to use the schema parameter, but this does not seem to work for me. For example, the following does not work.
engine = create_engine('mssql+pyodbc://...') #Does not specify database
metadataA = MetaData(bind=engine, schema='DatabaseA.dbo', reflect=True)
tableA = Table('tableA', metadataA, autoload=True)
metadataB = MetaData(bind=engine, schema='DatabaseB.dbo', reflect=True)
tableB = Table('tableB', metadataB, autoload=True)
I've also tried variants where schema='DatabaseA' and schema='dbo'. In all cases SQLAlchemy throws a NoSuchTableError for both tables A and B. Any ideas?
If you can create a synonym in one of the databases, you can keep your query local to that single database.
USE DatabaseB;
GO
CREATE SYNONYM dbo.DbA_TblA FOR DatabaseA.dbo.tableA;
GO
Your query then becomes:
SELECT * FROM dbo.DbA_TblA tblA
INNER JOIN dbo.tableB tblB
ON tblA.id = tblB.id
I'm able to run a test just like this here, reflecting from two remote databases, and it works fine.
Are you using a recent SQLAlchemy (at least 0.8.3 is recommended)?
Turn on echo='debug' - what tables is it finding?
After the reflect, what's present in metadataA.tables and metadataB.tables?
Is the casing here exactly what's on SQL Server (e.g. tableA)? Using a case-sensitive name like that will cause it to be quoted.
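To answer the casing question from the SQL Server side, a quick check against the catalog views of both databases (database names taken from the question) shows exactly which schema and case-sensitive table name reflection has to match:
SELECT 'DatabaseA' AS db, s.name AS schema_name, t.name AS table_name
FROM DatabaseA.sys.tables t
JOIN DatabaseA.sys.schemas s ON t.schema_id = s.schema_id
UNION ALL
SELECT 'DatabaseB', s.name, t.name
FROM DatabaseB.sys.tables t
JOIN DatabaseB.sys.schemas s ON t.schema_id = s.schema_id;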

SQL Select Query Performance

I have the following query, which takes almost 1 minute to execute.
public static Func<Entities, string, IQueryable<string>> compiledInvoiceQuery =
CompiledQuery.Compile((Entities ctx, string orderNumb) =>
(from order in ctx.SOP10100
where order.ORIGNUMB == orderNumb
select order.SOPNUMBE).Union(
from order in ctx.SOP30200
where order.ORIGNUMB == orderNumb
select order.SOPNUMBE)
);
It filters on the basis of ORIGNUMB, which is not my primary key, and I cannot even put an index on it. Is there any other way to make it faster? I tested on SQL Server and found that just the query
from order in ctx.SOP10100
where order.ORIGNUMB == orderNumb
select order.SOPNUMBE
or
select SOPNUMBE
from SOP10100
where ORIGNUMB = #orderNumb
is taking more than 55 seconds. Please suggest.
If it's taking 55 seconds on the server, then it's nothing to do with LINQ.
Why can't you have an index on it? Because you need one...
The only other option is to rejig your logic to filter out records (using indexed columns) before you start searching for an order-number match.
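If it ever does become possible to add an index, the shape would be a nonclustered index on the filter column that also covers the selected column, so each branch of the query can be answered from the index alone (the index names here are made up):
CREATE NONCLUSTERED INDEX IX_SOP10100_ORIGNUMB
ON SOP10100 (ORIGNUMB)
INCLUDE (SOPNUMBE);

CREATE NONCLUSTERED INDEX IX_SOP30200_ORIGNUMB
ON SOP30200 (ORIGNUMB)
INCLUDE (SOPNUMBE);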
One of the big problems with LINQ to SQL is that you have very little control over the SQL being generated.
Since you are running a union and not a join, it should be pretty simple SQL, something like this:
SELECT *
FROM SOP10100
WHERE ORIGNUMB = 'some number'
UNION
SELECT *
FROM SOP30200
WHERE ORIGNUMB = 'some number'
You can use SQL Server Profiler to see the SQL statements that are being run against the database and check whether the SQL looks like this or something more complicated. You can then run the generated SQL in SQL Server Management Studio and turn on Include Client Statistics and Include Actual Execution Plan to see what exactly is causing the performance issue.

Microsoft SQL Server: How to improve the performance of a dumb query?

I have been asked to help with a performance issue on a SQL Server installation. I am not a SQL Server expert, but I decided to take a look. We are using a closed-source application that appears to work OK. However, after a SQL Server upgrade from 2000 to 2005, application performance has reportedly suffered considerably.
I ran SQL Profiler and caught the following query (field names changed to protect the innocent) taking about 30 seconds to run. My first thought was that I should optimize the query. But that is not possible, given that the application is closed source and the vendor is not helpful. So I am left trying to figure out how to make this query run fast without changing it.
It is also not clear to me how this query ran faster on the older SQL Server 2000 product. Perhaps there was some sort of performance tuning applied on that instance that did not carry over or does not work on the new SQL Server. DBCC PINTABLE comes to mind.
Anyway, here is the offending query:
select min(row_id) from Table1 where calendar_id = 'Test1'
and exists
(select id from Table1 where calendar_id = 'Test1' and
DATEDIFF(day, '12/30/2010 09:21', start_datetime) = 0
)
and exists
(select id from Table1 where calendar_id = 'Test1' and
DATEDIFF(day, end_datetime, '01/17/2011 09:03') = 0
);
Table1 has about 6200 entries and looks like this. I have tried creating various indices to no effect.
id                 calendar_id   start_datetime   end_datetime
int, primary key   varchar(10)   datetime         datetime
1                  Test1         2005-01-01...    2005-01-01...
2                  Test1         2005-01-02...    2005-01-02...
3                  Test1         2005-01-03...    2005-01-03...
...
I would be very grateful if somebody could help resolve this mystery.
Thanks in advance.
The one thing that should help is a covering index on calendar_id:
create index <indexname>
on Table1 (calendar_id, id)
include (start_datetime, end_datetime);
This will satisfy the calendar_id = 'Test1' predicates and the min(row_id) sort, and will provide the material to evaluate the non-SARG-able DATEDIFF predicates. If there are no other columns in the table, then this is probably the clustered index you need, and the id primary key should be a non-clustered one.
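A sketch of that restructuring, assuming the existing primary key constraint is called PK_Table1 (check the real name with sp_helpconstraint first):
-- assumed constraint name; the real one may differ
ALTER TABLE Table1 DROP CONSTRAINT PK_Table1;
-- recreate the primary key as non-clustered
ALTER TABLE Table1 ADD CONSTRAINT PK_Table1 PRIMARY KEY NONCLUSTERED (id);
-- cluster the table on calendar_id instead
CREATE CLUSTERED INDEX CIX_Table1_calendar_id ON Table1 (calendar_id, id);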
Make sure the indexes made the conversion. Then update statistics.
Check the differences between the execution plan on the old sql server and the new one. http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx
About the only other thing you can do, beyond Remus Rusanu's index suggestion, is to upgrade to Enterprise edition, which has a more advanced scan feature (on both SQL Server 2005 and 2008 Enterprise Edition) that allows multiple tasks to share full table scans.
Beyond that, I do not think there is anything you can do if you cannot change the query. The reason is that the query is doing a comparison against a function result in the Where clause. That means it will force SQL Server to do a table scan on Table1 each time it is executed.
Reading Pages (more info about Advanced Scanning)
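For reference, this is roughly the SARG-able shape the vendor would need to move towards; it is only a sketch of the equivalent logic (using the example dates from the query above), since the query itself can't be changed here:
-- same-day tests rewritten as half-open date ranges so an index on
-- start_datetime / end_datetime (or the covering index above) can seek
select min(row_id)
from Table1
where calendar_id = 'Test1'
and exists (select id from Table1 where calendar_id = 'Test1'
            and start_datetime >= '20101230' and start_datetime < '20101231')
and exists (select id from Table1 where calendar_id = 'Test1'
            and end_datetime >= '20110117' and end_datetime < '20110118');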
