SQL Query select from multiple large tables, Execution Time too long - sql-server

I have written a SQL query that does what I need (I think), but it takes far too long to execute, as the query searches through every record and column, and there are A LOT.
I have tried using joins, but my understanding of joins is limited and I did not manage to get them working.
I also tried adding a nested SELECT statement in the WHERE clause, but it didn't seem to help, or I've done it incorrectly.
SELECT DISTINCT
OCRD.[E_Mail] ,
OCPR.[E_MailL],
OCPR.[Name],
OWHS.[WhsCode]
FROM OCRD, OCPR, OWHS
WHERE OWHS.WhsCode = 'zzdb';
If possible, I would like to check that the values in OCRD.[E_Mail] and OCPR.[E_MailL] are not duplicated, but are included from both tables.
I want the query to simply return Names and Emails, where WhsCode = 'zzdb'
and not take an hour+ to execute
Thank you. Any help appreciated
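For what it's worth, a comma-separated FROM list with no join conditions is a cross join: every OCRD row is paired with every OCPR row and every OWHS row, and DISTINCT then has to work through that whole product. A minimal sketch of a join-plus-UNION shape follows. The join columns are assumptions: CardCode is assumed to link OCRD to OCPR, and since the question does not say how OWHS relates to the other tables, the hypothetical column OCRD.[WhsCode] stands in for whatever the real relationship is.

```sql
-- Sketch only: CardCode as the OCRD/OCPR link and OCRD.[WhsCode] as the
-- warehouse link are assumptions, not taken from the question.
SELECT OCPR.[Name], OCRD.[E_Mail] AS Email
FROM OCRD
INNER JOIN OCPR ON OCPR.[CardCode] = OCRD.[CardCode]
WHERE OCRD.[WhsCode] = 'zzdb'      -- hypothetical link to the warehouse
  AND OCRD.[E_Mail] IS NOT NULL
UNION                              -- UNION removes duplicate (Name, Email) rows
SELECT OCPR.[Name], OCPR.[E_MailL]
FROM OCRD
INNER JOIN OCPR ON OCPR.[CardCode] = OCRD.[CardCode]
WHERE OCRD.[WhsCode] = 'zzdb'
  AND OCPR.[E_MailL] IS NOT NULL;
```

Because each branch joins on a key instead of pairing every row with every other row, the intermediate result stays roughly the size of the matching data, and UNION takes care of the "included from both tables but not duplicated" requirement.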

Related

SQL query returns duplicate value after joining tables

I need help using SUM and GROUP BY in SQL Server.
I am generating the query based on 5 tables. I have tried in SQL Server.
Some parts of the query are working, but when I advance the query, I get the wrong results/data.
The problem is that the data is processed twice instead of once on every group by field, e.g. farmer_ID, where the farmer bears or has two or more records. 
This happens when I add more tables to the join; with one or two tables, the sum values are okay.
Hence I get farmer_sales = 200 instead of 100.
Kindly let me know how I can get some help
Thanks
David   
You can use an outer join (LEFT or RIGHT) and choose the table that has only one record for each item.
Another solution is to use the DISTINCT keyword before the column name.
Nobody here will be able to help without table definitions, the requirements & the query.
With these issues I find it helpful to first write the query so that you get the proper rows you need. It sounds like you have either dodgy data or incomplete join conditions, but it's not possible to tell that without the above. You can debug your data by finding a problem (e.g. farmer_sales) and working through the raw data and query from there. You will have an incomplete PK/FK relationship in your query or a missing constraint allowing bad data. Or you have misunderstood the requirement, or the requirement does not make sense for the data model.
Once you have the query working correctly you can add the aggregations.
One bit of general advice I can give is that adding DISTINCT is almost always the wrong approach.
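One common cause of doubled sums is joining two one-to-many tables to the same parent row: each sales row is repeated once per matching row in the other table before SUM runs. A minimal sketch of aggregating each child table first and joining afterwards is below. The schema is hypothetical, since the question gives no table definitions: Farmer, Sales, Payments and their columns are invented for illustration.

```sql
-- Hypothetical schema: Farmer(farmer_ID), Sales(farmer_ID, amount),
-- Payments(farmer_ID, amount). None of these names come from the question.
SELECT f.farmer_ID,
       s.farmer_sales,
       p.farmer_payments
FROM Farmer AS f
LEFT JOIN (
    -- One row per farmer, so joining it cannot multiply other rows
    SELECT farmer_ID, SUM(amount) AS farmer_sales
    FROM Sales
    GROUP BY farmer_ID
) AS s ON s.farmer_ID = f.farmer_ID
LEFT JOIN (
    SELECT farmer_ID, SUM(amount) AS farmer_payments
    FROM Payments
    GROUP BY farmer_ID
) AS p ON p.farmer_ID = f.farmer_ID;
```

Each derived table returns at most one row per farmer_ID, so adding further tables in the same way should not change the sums already computed.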

Using an IN clause with table variable causes my query to run MUCH slower

I am using SSRS report whereby I need to pass multiple parameters to some SQL code.
Based on this blog post, the best way to handle multiple parameters is to use a split function, so that is the road I am following.
However, I am having some bad performance after following this.
For example, the following WHERE clause will return the data in 4 seconds:
AND DimBusinessDivision.Id IN (
22
)
This will also correctly return in 4 seconds:
DECLARE @BusinessDivisionId INT = 22
AND DimBusinessDivision.Id IN (
@BusinessDivisionId
)
However, using the split function as below, it takes 2 minutes (which is the same time it takes without a WHERE clause):
AND DimBusinessDivision.Id IN (
SELECT Item FROM dbo.FuncSplit(@BusinessDivisionId, ',')
)
I've also tried creating a temp table and a table variable before the SQL statement with the results of the table but there's no difference. I have a feeling this has to do with the fact that the values are not literal values and that SQL server doesn't know what query plan to follow, or something similar. Does anyone know of any ways to increase the performance of this?
It simply doesn't like using a table to get the values, even if the table has the same number of rows.
UPDATE: I have used the table function as an inner join, which has fixed the issue. Any ideas why this made all the difference?
INNER JOIN
dbo.FuncSplit(@BusinessDivisionIds, ',') AS FilteredBusinessDivisions ON
FilteredBusinessDivisions.Item = DimBusinessDivision.Id
A few things to play with:
Try the non-performant query and add OPTION (RECOMPILE) at the end of the query. If it magically runs much faster, then yes, the issue was a bad cached query plan. For more information on this specific problem, you can Google "parameter sniffing" for a more thorough explanation.
You may also want to look at the function definition and toss a RECOMPILE in there too, and see what difference that makes.
Look at the estimated query plan and try to determine the difference.
But the root of the problem, I think, is that you are reinventing the wheel with this "split" function. You can have multi-valued parameters in SSRS and use "WHERE col IN (@param)": https://technet.microsoft.com/en-us/library/aa337396(v=sql.105).aspx
Unless there's a very specific reason you must split a comma separated list and cannot use normal parameters, just use a regular parameter that accepts multiple values.
Edit: I looked at the article you linked to. It's quite easy to have a SELECT ALL option in any reporting tool (not just SSRS), though it's not obvious. Using the "magic value" as written in the article you linked to works just fine. Can I ask what limitation is prompting you to need to do this string splitting?
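The OPTION (RECOMPILE) test suggested above could look roughly like the sketch below. FactSales and its columns are hypothetical stand-ins for the report query, which the question does not show in full; dbo.FuncSplit and DimBusinessDivision are the names from the question, and the variable is declared as a string here on the assumption that it holds a comma-separated list.

```sql
-- Sketch only: FactSales and its columns are hypothetical stand-ins
-- for the report query; dbo.FuncSplit is the function from the question.
DECLARE @BusinessDivisionId VARCHAR(100) = '22';

SELECT FactSales.Amount
FROM FactSales
INNER JOIN DimBusinessDivision
    ON DimBusinessDivision.Id = FactSales.BusinessDivisionId
WHERE DimBusinessDivision.Id IN (
    SELECT Item FROM dbo.FuncSplit(@BusinessDivisionId, ',')
)
OPTION (RECOMPILE);  -- compile a fresh plan for this execution
```

If the timing drops back to a few seconds with the hint and not without it, that points at the cached plan rather than at the split function itself.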

Sql query gets too slow

A few days ago I wrote a query that executed quickly, but nowadays it takes 1 hour.
This query runs on my SQL7 server and takes about 10 seconds.
This query exists on another SQL7 server, and until last week it took about 10 seconds.
The configuration of both servers is the same. Only the hardware is different.
Now, on the second server this query takes about 30 minutes to extract the same details, but nobody has changed any details.
If I execute this query without the WHERE clause, it shows me the details in 7 seconds.
This query still takes about same time if Where is problem
Without seeing the query and probably the data I can't do a lot other than offer tips.
Can you put more constraints on the query? If you can reduce the amount of data involved, this will speed up the query.
Look at the columns used in your joins, where and having clauses and order by. Check that the tables that the columns belong to contain indices for these columns.
Do you need to use the user defined function or can it be done another way?
Are you using subqueries? If so, can these be pulled out into separate views?
Hope this helps.
Without knowing how much data is going into your tables, and not knowing your schema, it's hard to give a definitive answer but things to look at:
Try running UPDATE STATISTICS or DBCC DBREINDEX.
Do you have any indexes on the tables? If not, try adding indexes to columns used in WHERE clauses and JOIN predicates.
Avoid cross-table OR clauses (i.e., where you do WHERE table1.col1 = @somevalue OR table2.col2 = @someothervalue). SQL can't use indexes effectively with this construct, and you may get better performance by splitting the query into two and UNION'ing the results.
What do your functions (UDFs) do and how are you using them? It's worth noting that dropping them in the columns part of a query gets expensive as the function is executed per row returned: thus if a function does a select against the database, then you end up running n + 1 queries against the database (where n = number of rows returned in the main select). Try and engineer the function out if possible.
Make sure your JOINs are correct -- where you're using a LEFT JOIN, revisit the logic and see if it needs to be a LEFT or whether it can be turned into an INNER JOIN. Sometimes people use LEFT JOINs, but when you examine the logic in the rest of the query, it can sometimes be apparent that the LEFT JOIN gives you nothing (because, for example, someone may have added a WHERE col IS NOT NULL predicate against the joined table). INNER JOINs can be faster, so it's worth reviewing all of these.
It would be a lot easier to suggest things if we could see the query.
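As a rough illustration of the OR-to-UNION rewrite mentioned above, here is a sketch. All table and column names are hypothetical (table1, table2, and the table1_id join column are invented), since the original query was not posted.

```sql
-- Hypothetical tables and columns, for illustration only.
DECLARE @somevalue INT, @someothervalue INT;
SET @somevalue = 1;
SET @someothervalue = 2;

-- Original shape: one OR spanning two tables
-- SELECT t1.id FROM table1 AS t1
-- INNER JOIN table2 AS t2 ON t2.table1_id = t1.id
-- WHERE t1.col1 = @somevalue OR t2.col2 = @someothervalue;

SELECT t1.id
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t2.table1_id = t1.id
WHERE t1.col1 = @somevalue
UNION   -- removes rows matched by both branches
SELECT t1.id
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t2.table1_id = t1.id
WHERE t2.col2 = @someothervalue;
```

Each branch has a single-table predicate that an index on col1 or col2 can serve. Note that UNION also collapses genuine duplicate rows, so this matches the OR form only when the selected columns identify a row uniquely.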

SQL Server query taking up 100% CPU and runs for hours

I have a query that has been running every day for a little over 2 years now and has typically taken less than 30 seconds to complete. All of a sudden, yesterday, the query started taking 3+ hours to complete and was using 100% CPU the entire time.
The SQL is:
SELECT
@id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(@id) beta
INNER JOIN vwSomeData alpha ON beta.id = alpha.id
alpha.id is a BIGINT type and beta.id is an INT type. dbo.fnGetStuff() is a simple SELECT statement with 2 INNER JOINs on tables in the same DB, using a WHERE id = @id. The function returns approximately 11000 results.
The view vwSomeData is a simple SELECT statement with two INNER JOINs that returns about 590000 results.
Both the view and the function will complete in less than 10 seconds when executed by themselves. Selecting the results of the function into a temporary table first and then joining on that makes the query finish in < 10 seconds.
How do I troubleshoot what's going on? I don't see any locks in the activity manager.
Look at the query plan. My guess is that there are one or more table scans in the execution plan. This will cause huge amounts of I/O for the few records you get in the result.
You could use the SQL Server Profiler tool to monitor what queries are running on SQL Server. It doesn't show the locks, but it can for instance also give you hints on how to improve your query by suggesting indexes.
If you've got a reasonably recent version of SQL Server Management Studio, it has a Database Engine Tuning Advisor as well, under Tools. It takes a trace from the Profiler and makes some, sometimes highly useful, suggestions. Make sure there are not too many queries in the trace, as it takes a long time to build advice.
I'm not an expert on it, but have had some luck with it in the past.
Do you need to use a function? Can you rewrite the entire thing as a stored procedure to which you pass @ID as a parameter?
Even if your table has indexes, they may not be used because you pass @ID as a variable to the WHERE clause, potentially greatly increasing the time the query takes to run.
The reason the indexes may not be used is that the Query Optimizer does not know the value of the variables when it selects an access method to perform the query. Because this is a batch file, only one pass is made of the Transact-SQL code, preventing the Query Optimizer from knowing what it needs to know in order to select an access method that uses the indexes.
You might want to consider an INDEX query hint if you cannot rewrite the SQL.
It might also be possible, since this just started happening, that the indexes have become fragmented and need to be rebuilt.
I've had similar problems with joining functions that return large datasets. I had to do what you've already suggested. Put the results in a temp table and join on that.
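The temp table workaround described above might look like the sketch below. It reuses the object and column names from the question; the value assigned to @id is hypothetical, and the exact columns returned by fnGetStuff are assumed from the SELECT list.

```sql
-- Sketch assuming the column list from the question.
DECLARE @id INT;
SET @id = 1;  -- hypothetical value

-- Materialize the function result once
SELECT beta.id, beta.X, beta.Y, beta.Z
INTO #beta
FROM [DifferentDatabase].dbo.fnGetStuff(@id) AS beta;

-- Join the view against the temp table instead of the function
SELECT
    @id,
    alpha.A, alpha.B, alpha.C,
    beta.X, beta.Y, beta.Z,
    alpha.P, alpha.Q
FROM #beta AS beta
INNER JOIN vwSomeData AS alpha ON beta.id = alpha.id;

DROP TABLE #beta;
```

A temp table has a known row count and can carry statistics, so the optimizer has better information when choosing the join strategy than it may have for the function call.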
Look at the estimated plan, this will probably shed some light. Typically when query cost gets orders of magnitude more expensive it is because a loop or merge join is being used where a hash join is more appropriate. If you see a loop or merge join in the estimated plan, look at the number of rows it expects to process - is it far smaller than the number of rows you know will actually be in play? You can also specify a hint to use a hash join and see if it performs much better. If so, try updating statistics and see if it goes back to a hash join without a hint.
SELECT
@id,
alpha.A, alpha.B, alpha.C,
beta.X, beta.Y, beta.Z,
alpha.P, alpha.Q
FROM
[DifferentDatabase].dbo.fnGetStuff(@id) beta
INNER HASH JOIN vwSomeData alpha ON beta.id = alpha.id
-- having no idea what type of schema is in place and just trying to throw out ideas:
Like others have said... use Profiler and find the source of pain... but I'm thinking it is the function on the other database. Since that function might be a source of pain, have you thought about a little denormalization or anything on [DifferentDatabase]. I think you'll find a bit more scalability in joining to a more flattened table with indexes than a costly function.
Run this command:
SET SHOWPLAN_ALL ON
Then run your query. It will display the execution plan, look for a "SCAN" on an index or a table. That is most likely what is happening to your query now. If that is the case, try to figure out why it is not using indexes now (refresh statistics, etc)
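A sketch of that sequence as it might be typed into a query window is below. SET SHOWPLAN_ALL has to be the only statement in its batch, hence the GO separators; the value assigned to @id is hypothetical.

```sql
SET SHOWPLAN_ALL ON;
GO
-- While SHOWPLAN_ALL is ON the statements below are not executed;
-- SQL Server returns one row per plan operator instead.
DECLARE @id INT;
SET @id = 1;  -- hypothetical value
SELECT
    @id,
    alpha.A, alpha.B, alpha.C,
    beta.X, beta.Y, beta.Z,
    alpha.P, alpha.Q
FROM [DifferentDatabase].dbo.fnGetStuff(@id) AS beta
INNER JOIN vwSomeData AS alpha ON beta.id = alpha.id;
GO
SET SHOWPLAN_ALL OFF;
GO
```

Remember to turn the setting off again, otherwise later queries in the same session will also return plans instead of results.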

Query hangs with INNER JOIN on datetime field

We've got a weird problem with joining tables from SQL Server 2005 and MS Access 2003.
There's a big table on the server and a rather small table locally in Access. The tables are joined via 3 fields, one of them a datetime field (containing a day; idea is to fetch additional data (daily) from the big server table to add data to the local table).
Up until the weekend this ran fine every day. Since yesterday we have experienced strange non-time-outs in Access with this query. Non-time-out means that the query runs forever with rather high network transfer, but no timeout occurs. Access doesn't even show the progress bar. A server trace tells us that the same query is executed over and over on the SQL Server without error but without result either. We've narrowed it down to queries against a big server table with either a JOIN or a WHERE containing a date, but we're not able to narrow it down any further. We rebuilt indices already and are currently restoring backup data, but maybe someone here has any pointers of things we could try.
Thanks, Mike.
If you join a local table in Access to a linked table in SQL Server, and the query isn't really trivial according to specific limitations of joins to linked data, it's very likely that Access will pull the whole table from SQL Server and perform the join locally against the entire set. It's a known problem.
This doesn't directly address the question you ask, but how far are you from having all the data in one place (SQL Server)? IMHO you can expect the same type of performance problems to haunt you as long as you have some data in each system.
If it were all in SQL Server a pass-through query would optimize and use available indexes, etc.
Thanks for your quick answer!
The actual query is really huge; you won't be happy with it :)
However, we've narrowed it down to a simple:
SELECT * FROM server_table INNER JOIN local_table ON server_table.date = local_table.date;
Here server_table is a big table (hard to say, we've got 1.5 million rows in it; test tables with 10 rows or so have worked) and local_table is a table with a single cell containing a date. This runs forever. It's not only slow; it just does nothing besides, it seems, causing network traffic, and there is no timeout (this is what I find so strange; normally you get a timeout, but this just keeps on running).
We've just found KB article 828169; seems to be our problem, we'll look into that. Thanks for your help!
Use the DATEDIFF function to compare the two dates as follows:
-- DATEDIFF returns 0 if dates are identical based on datepart parameter, in this case d
WHERE DATEDIFF(d,Column,OtherColumn) = 0
DATEDIFF is optimized for use with dates. Comparing the result of the CONVERT function on both sides of the equal (=) sign might result in a table scan if either of the dates is NULL.
Hope this helps,
Bill
Try another syntax ? Something like:
SELECT * FROM BigServerTable b WHERE b.DateFld in (SELECT DISTINCT s.DateFld FROM SmallLocalTable s)
The strange thing in your problem description is "Up until the weekend this ran fine every day".
That would mean the problem is really somewhere else.
Did you try creating a new blank Access db and importing everything from the old one ?
Or just refreshing all your links ?
Please post the query that is doing this. Just because you have indexes doesn't mean that they will be used. If your WHERE or JOIN clause is not sargable, then the index will not be used.
Take this for example:
WHERE CONVERT(varchar(49),Column,113) = CONVERT(varchar(49),OtherColumn,113)
that will not use an index
or this
WHERE YEAR(Column) = 2008
Functions on the left side of the operator (meaning on the column itself) will make the optimizer do an index scan instead of a seek because it doesn't know the outcome of that function
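To make the YEAR() example sargable, the usual rewrite is to leave the column bare and compare it against a date range. A minimal sketch, with hypothetical table and column names (Orders, OrderDate):

```sql
-- Hypothetical table and column names.
-- Non-sargable: the function wraps the column, so an index seek is not possible
-- SELECT * FROM Orders WHERE YEAR(OrderDate) = 2008;

-- Sargable equivalent: compare the bare column against a half-open range
SELECT *
FROM Orders
WHERE OrderDate >= '20080101'
  AND OrderDate <  '20090101';
```

The half-open range (>= start, < next start) covers every time of day in 2008 without needing to know the precision of the datetime column.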
We rebuilt indices already and are currently restoring backup data, but maybe someone here has any pointers of things we could try.
Access can kill many good things. Have you looked into blocking at all?
Run
exec sp_who2
and look at the BlkBy column to see who is blocking what.
Just an idea, but in SQL Server you can attach your Access database and use the table there. You could then create a view on the server to do the join all in SQL Server. The solution proposed in the Knowledge Base article seems problematic to me, as it's a kludge (if LIKE works, then = ought to, also).
If my suggestion works, I'd say that it's a more robust solution in terms of maintainability.
