Using Machine Learning for generating Query Execution Plan - query-optimization

I am aware of cost based and rule based optimizers ? Are there any Query engines that use Machine Learning (based on similar queries run in the past) for generating Query execution plans ?

Related

Can I use Plan Guides to optimize a slow performing Query?

A synchronization program is syncing data between our SQL server and an online database. Every 5 minutes the program runs query's on all tables, al in the format:
select max(ID) from table
After that, the program retreives information from the online database, using the max(ID) to retreive only newer records.
The query runs fast on small tables, but some tables have millions of records.
The performance can be boosted by using a Where statement:
select max(ID) from table where date >= dateadd(dd,-30,getdate())
Unfortunately it's an old program, which cannot be changed anymore.
(no supplier and no source code)
I read something about Plan Guides, which should give performance boosts to query's.
Can I use a plan guide to alter these query's so they run much faster??
I would try to stay clear of plan guides unless you have tried everything else to help SQL Server choose a better execution plan; reason, you are forcing SQL Server to choose maybe not the best execution plan and if your statistics are right, SQL Server does a heck of a job to provide you with a very good estimation plan.
Sorry if this is going to be long winded but SQL Server performance is based on statistics and if your statistics are off, your query will not perform optimally. This is where I would first start.
I would first run your query and generate an estimated execution plan and actual execution plan. If you have never worked with execution plans, here are a few sites to start you off to understand operators and how to interpret some of the more common operators; Rasanu Consulting (Joins), Microsoft (Joins), Simple Talk (Common Operatiors), Microsoft (All Operators). The execution plans are generated from statistics SQL Server stores on indexes and non-indexed columns. I am assuming your query is stored in a procedure where the initial execution plan is stored. Unfortunately, changes to the query in the stored proc can cause performance issues because of an outdated plan and the proc will need to be recompiled.
Based on the results of your execution plan, you may find a simple clustered index will help performance or find that fine tuning your where clause will result in better carnality estimates. Here are a few sites that do a really good job explaining statistics; Patrick Keisler, Itzik Ben-Gan, Microsoft
You may be wondering "what does this have to do with my question?" which would expect a simple yes or no answer but, effectively correcting query performance starts with an understanding statistics and execution plans.
Hope this helps!

Query is extremely slow due to Lazy Index Spool

On a powerful machine the SQL Server query is running too slowly.
In the execution plan I can see that most of the time spent goes to a "Lazy Index Spooling" process. In the query some aggregate functions are being used for calculation of values.
How can I speed-up the query (machine resources are enough)?
Thanks in advance.

Hive vs SQL Server performance

1) I started using hive from last 2 months. I have a same task as that in SQL. I found that Hive is slow and takes more time to execute queries while SQL executes it in very few minutes/seconds.
After executing the task in Hive when I cross check the result in both (SQL and Hive), I found some difference in results (Not all but in some tables).
e.g. : I have one table which has 2012 records, when I executed a task in Hive in the same table in Hive I got 2007 records.
Why it is happening?
2) If I think to speed up my execution in Hive then what should I do for it?
(Currently I am executing all this stuff on single cluster only. If I think to increase the clusters then how many cluster should I need it to increase the performance)
Please suggest me some solution or some good practices so that I can do it keenly.
Thanks.
Hive and SQL Server are not comparable in any way other than the similarity in the syntax of the query language.
While SQL Server is built to be able to respond in realtime from a single machine, hive is for processing large data sets that may span hundreds or thousands of machines.
Hive (via hadoop) has a lot of overhead for starting up a job.
Hive and hadoop will not cache data in memory like sql server does.
Hive has only recent added indexes so most queries end up being a table scan.
If your dataset fits on a single computer you probably want to stick with SQL Server and not hive. Hive performance tuning is mostly based in Hadoop performance tuning although depending on the types of queries you run there can be free performance from using the LazyBinarySerDe.
Hive does have some differences from regular SQL that may be effecting your query. Without more details I can't speculate as to why.
Ignore the "they aren't comparable in any way" comment. If it stores data, it is comparable to any other method of storing data.
But be aware that SQL Server, 13 years ago, had 1000+ people being paid full-time to improve their product. So while that doesn't "Prove" anything, it does increase ones confidence that more work = more results.
More importantly, look for any non-trivial benchmark done on an open source and/or non-relational method of storing data vs one of the mainstream relational databases. You won't find them. That says a lot to me. (Also, mainstream isn't necessary since the current world's fastest data engine isn't even mainstream. But if that level is needed, look at ExoSol.)
If your need is to learn to work with technology at your job and that technology is Hive, my recommendation is to find someone who is really focused on getting the most out of Hive query performance as possible. If there is a Hive query guru out there, find them. But if you need a lot more than what they can give you, you're using the wrong technology.
And if Hive isn't a requirement, I would avoid it and other technologies lacking the compelling business model that will guarantee their survival past 5 years and move them out of niche category they currently exist in (currently 20 times less popular than any mainstream data engine - https://db-engines.com/en/ranking).

SQL Server 2008 R2 - optimization issue

I'm running into a performance issue with the current schema. So I built an equivalent schema to solve the issue.
I ran some tests on both schemas and the results are hard to understand. For the record, the data is the same.
I get the following from the Profiler when executing equivalent requests on the two schemas.
Old schema:
1,300,000 reads
5,000 CPU
4 seconds execution time
New schema:
30,000 reads
3,000 CPU
6 seconds execution time
The difference seems to be in the query plan used. The old schema has parallelism in the query plan. The new schema isn't using parallelism.
Has anyone faced similar situations (less IO/CPU but more execution time). How did you solve it?
Is there a way to force parallelism? I've played with query hints(http://msdn.microsoft.com/en-us/library/ms18171). I'm able to stop parallelism on the old schema but can't seem the query on the new schema to use parallelism.
Thanks in advance.
Louis,
Currently there is no way to force parallelism in SQL Server straight out of the box but Adam Machanic did some work to do that though.
http://whoisactive.com
Coming to your first question, yes we have seen cases like that too. Note that Parallelism is cpu bound and that's why you are seeing more cpu time but overall less execution time as you have multiple threads doing the work for you.
http://www.simple-talk.com/sql/learn-sql-server/understanding-and-using-parallelism-in-sql-server/
Make sure you have proper indexes in place and also stats are updated with full scan. In the long run it is best if Query Optimizer makes the decisions by itself but if you want to overwrite the QO plans then you may have to add lot more details. Schema, data and repro.
HTH

What is the usage of Execution Plan in SQL Server?

What is the usage of Execution Plan in SQL Server? When can these plans help me?
When your queries all run fast, all is good in the world and execution plans don't really matter that much. However, when something is running slow, they are very important. They are primarily used to help tune (speed up) slow SQL. Without execution plans, you'd just be guessing at what to change to make your SQL go faster.
Here is the simplest way they can help you. Take a slow query and do the following in a SQL Server Management Studio query window:
1) run the command:
SET SHOWPLAN_ALL ON
2) run your slow query
3) your query will not run, but the execution plan will be returned.
4) look through the PhysicalOp column output for the word SCAN within any text in this column, this is usually the part of the query that is causing the slowdown. Analyze your joins and index usage in regards to this row of the output and if you can eliminate the scan, you will usually improve the query speed.
There are may useful columns (TotalSubTreeCost, etc) in the output, you will become familiar with them as you learn how to read execution plans and tune your slow queries.
When you need to perform performance profiling on a specific query.
Have a look at SQL Server Query Execution Plan Analysis
When it comes time to analyze the
performance of a specific query, one
of the best methods is to view the
query execution plan. A query
execution plan outlines how the SQL
Server query optimizer actually ran
(or will run) a specific query. This
information if very valuable when it
comes time to find out why a specific
query is running slow.
It's helpful in identifying where bottlenecks are occurring in long running queries. You can make some quite impressive performance improvements simply by knowing how the server executes your complex query.
If I remember correctly it also identifies good candidates for indexing which is another way to increase performance.
These plans describe how SQL Server goes about executing your query. It's the result of a cost-based algorithm by the SQL Server query optimiser, which comes up with a plan for how to get to the end result in the expected best possible way.
It's useful because it will show you where time is being spent in the query, whether indexes are being used or not, what type of process is being done on those indexes (scan, seek) etc.
So if you have a poorly performing query, the execution plan will highlight what the costliest parts are and allow you to see what needs optimising (e.g. may be a missing index, may be an inefficiently written query resulting in an index scan instead of a seek).

Resources