What approach do you take to creating and maintaining database indexes when using an ORM such as NHibernate/Hibernate?
Since the ORM is generating the queries, are there any tools you could recommend that could analyze query plans of those and suggest the kind of indexes that should be created?
My current approach is ... wait until something runs slowly, then find the slow query and optimize it ... but this is sort of lame, isn't it? My goal is not to end up with tens or hundreds of indexes of which nobody knows which are actually being used by the system and which aren't. So again, index maintenance.
My environment is NHibernate + SQL Server 2005.
I find that the columns that need indexing are typically "obvious". By that I mean if you create queries like "select p from Person p where p.surname = :surname" then whatever column surname refers to needs an index.
Likewise every foreign key should be indexed.
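In T-SQL terms that boils down to something like the following sketch (the table, column, and index names are made up to match the HQL example above):

    -- Index to support: select p from Person p where p.surname = :surname
    CREATE INDEX IX_Person_Surname ON dbo.Person (Surname);

    -- Index a foreign key column so joins and lookups on it don't scan the table
    CREATE INDEX IX_Order_PersonId ON dbo.[Order] (PersonId);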
And no I don't wait until performance is actually a problem. Indexes are just something I do right from the start.
Oh the other thing I wanted to add was that most (if not all) ORMs have the ability to turn on statement logging. These often aren't particularly readable (single line, table names of t0, t1, t2, etc) but this could tell you what queries were run and how often.
The standard tools you would use to analyse slow queries / poor indexing apply whether or not you are using an ORM. You can use SQL Server Profiler to examine the SQL statements that are running against your database, and then use the execution plan features in the query window in SQL Server Management Studio / SQL Query Analyzer to see the details of your query plans and get an idea of which indexes you may need to add.
You can also use the Database Engine Tuning Advisor in SQL Server Management Studio, although whether or not that tool is actually more useful than simply spending some time thinking about your database design and querying patterns is open to question.
Related
I have a database which is used by a multi-tenant application. In this database workloads are dynamic and change continuously, so I would have to allocate a DBA to continuously manage the database. But I thought of using an automated service for this task, such as Azure SQL Database Advisor - Automatic index management (the platform is not important - I am OK with using MS SQL Server, Oracle or another RDBMS).
I want to know how these automated indexes actually work. Can I replace a database administrator with these automatic indexers? I read that whenever a query execution plan is generated, it finds all the indexes that would be useful for executing that query. It then uses the indexes that really exist and caches some data about the indexes that don't exist. If an index's data is cached again and again, the SQL advisor will show it as a recommended index. But can we rely on this, and what about update and insert queries? If I have a table where records are frequently updated, will these automated indexing systems take that into account?
Note that Index Advisor is only available in SQL Database (Azure).
In the background, Index Advisor is a machine learning algorithm, a relatively simple and quite effective one. It will analyze your workload and see whether you would benefit from indexes. If it thinks you would, it will show them to you as recommendations - and if you turn automatic index creation/dropping on, it will actually create the index. To understand better how it works, take a look at Channel 9. Note that before you apply a recommendation you can see an estimated impact.
Now the algorithm can make mistakes, right? So once the recommendation is applied it can automatically be reverted based on its performance.
Also note that next to Index Advisor you can check Query Performance Insights, which will show the performance of your queries. So this can help your DBA diagnose other, non-index-related problems.
But note that Index Advisor will not drop and create new indexes for you every hour; it takes a day or two. So if your database's workload is changing very fast, then I am not sure any automatic management tool or DBA will react quickly enough for your workload.
Possibly some of you don't even know about these features, so you will learn a lot from this post, which will in fact help me to optimize better; and some of you probably use them on a daily basis, so you can help me and other less DBA-savvy users.
I'm using SQL Server 2005 Standard.
I run SQL Server Profiler a lot. Each time I find ad hoc queries or stored procedures whose execution time exceeds my limits - under 100ms for complex queries and 30ms for short ones (the numbers don't mean a thing, just to give some sense of scale) - I write the problematic queries down so I can use the Database Engine Tuning Advisor, which replays the overloaded queries against the tables and as a result gives me the indexes I need to build in order to improve performance. Each night I execute the index rebuild task from Maintenance Plans.
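(For reference, the nightly rebuild boils down to roughly the following T-SQL per table - the table name here is just an example:)

    -- Rebuild every index on a table, roughly what the Maintenance Plan rebuild task issues
    ALTER INDEX ALL ON dbo.Orders REBUILD;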
Now question time!!!
1. If Database Engine Tuning Advisor gives me 10 indexes to create while the improvement percentage is about 40%, should I take its advice or not? The better question is: what ratio of number of indexes to improvement percentage should I follow? Indexes take space and time to rebuild.
2. If I create about 5-7 indexes for each problematic query, I can end up with 500 indexes per DB. How many indexes can I build so the DB will still perform normally? Are there any limitations?
3. Is there any other way to optimize (not re-design) your DB other than using my method or going stored procedure by stored procedure with your own hands and eyes?
There's no right answer to this question as it depends heavily on your workload.
For workloads with a heavy ratio of reads (e.g. a data warehouse) it might make sense to create an index that would be positively counterproductive in an environment with a greater amount of writes.
The DTA can help in this regard by assessing the impact on the overall workload, but you would need to try to capture a representative sample (not just the poorly performing queries). SQL Profiler is quite resource intensive, so to do this with the least possible impact on your server you would need to use a server-side SQL trace with appropriate filters to only log events related to the database of interest.
To identify the poorest-performing queries in isolation: if you have at least the SQL 2005 SP1 client tools installed, you should be able to right-click the database node in Management Studio and use the Reports -> Standard Reports menu to see the plans in the cache with the highest CPU/IO.
If you are interested in this area I recommend the book SQL Server 2008 Query Performance Tuning Distilled (most of it applicable to SQL2005 as well)
You can get SQL Profiler to log to a table, so it will write the queries to a table you specify. If you can, leave it running for a few hours - Or however long it takes to cover as many queries/events as possible.
Next, use Database Engine Tuning Advisor - And get it to use this table of queries as its source input. You will find it looks at the whole pattern, and will recommend you create some indices, and remove others.
This is better than looking at queries one by one in isolation, although that still has its place.
sys.dm_exec_query_stats seems to be a very useful DMV for gathering statistics from your database, which you can use as a starting point to find queries that need to be optimized. Selecting * gives somewhat cryptic results - how do you make the results readable? What type of queries do you get from it? Are there other functions or queries you use to gain performance statistics?
To make the results useful, you need to cross-reference the information with a few other DMVs, and also concentrate your analysis and tuning efforts on the most poorly performing queries.
Here is (one I made earlier) an example of using the DMV you have mentioned to identify the most costly SQL Server queries.
How to identify the most costly SQL Server queries using DMV’s
You can easily extend this to look at other metrics too.
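By way of illustration, a sketch along those lines - cross-applying sys.dm_exec_query_stats to sys.dm_exec_sql_text and sys.dm_exec_query_plan; the TOP 10 by CPU is just one way to slice it:

    -- Top 10 cached statements by total CPU time, with their text and plan
    SELECT TOP 10
        qs.total_worker_time   AS total_cpu,
        qs.total_logical_reads AS total_reads,
        qs.execution_count,
        SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
            ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
              END - qs.statement_start_offset) / 2) + 1) AS statement_text,
        qp.query_plan
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
    ORDER BY qs.total_worker_time DESC;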
If you want to make performance tuning a breeze for yourself, you should consider installing the freely available SQL Server Performance Dashboard Reports.
These can be used to identify SQL Server Waits, the queries that consume the most I/O, the longest running queries by duration etc.
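If you'd rather query the underlying DMVs directly, a rough starting point for looking at waits is something like the following (standard sys.dm_os_wait_stats columns):

    -- Aggregate wait statistics since the last restart (or last manual clear)
    SELECT TOP 10
        wait_type,
        wait_time_ms,
        waiting_tasks_count,
        wait_time_ms - signal_wait_time_ms AS resource_wait_ms
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;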
Why don't you first use 'set pagesize 0'.
I know there have been questions in the past about SQL 2005 versus Lucene.NET, but since 2008 came out and they made a lot of changes to it, I was wondering if anyone can give me pros/cons (or a link to an article).
SQL Server FTS is going to be easier to manage for a small deployment. Since FTS is integrated with the DB, the RDBMS handles updating the index automatically. The con here is that you don't have an obvious scaling solution short of replicating DB's. So if you don't need to scale, SQL Server FTS is probably "safer". Politically, most shops are going to be more comfortable with a pure SQL Server solution.
On the Lucene side, I would favor SOLR over straight-up Lucene. With either solution you have to do more work yourself updating the index when the data changes, as well as mapping data yourself to the SOLR/Lucene index. The pros are that you can easily scale by adding additional indexes. You could run these indexes on very lean linux servers, which eliminates some license costs. If you take the Lucene/SOLR route, I would aim to put ALL the data you need directly into the index, rather than putting pointers back to the DB in the index. You can include data in the index that is not searchable, so for example you could have pre-built HTML or XML stored in the index, and serve it up as a search result. With this approach your DB could be down but you are still able to serve up search results in a disconnected mode.
I've never seen a head-to-head performance comparison between SQL Server 2008 and Lucene, but would love to see one.
I built a medium-size knowledge base (maybe 2GB of indexed text) on top of SQL Server 2005's FTS in 2006, and have now moved it to 2008's iFTS. Both situations have worked well for me, but the move from 2005 to 2008 was actually an improvement for me.
My situation was NOT like StackOverflow's in the sense that I was indexing data that was only refreshed nightly, however I was trying to join search results from multiple CONTAINSTABLE statements back in to each other and to relational tables.
In 2005's FTS, this meant each CONTAINSTABLE would have to execute its search on the index, return the full results and then have the DB engine join those results to the relational tables (this was all transparent to me, but it was happening and was expensive to the queries). 2008's iFTS improved this situation because the database integration allows the multiple CONTAINSTABLE results to become part of the query plan which made a lot of searches more efficient.
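To make that concrete, here is a sketch of the kind of query I mean - Article and Comment are hypothetical tables, each with a full-text index on a Body column:

    -- Join two full-text result sets back to the relational tables
    SELECT a.ArticleId, a.Title, ft_a.[RANK] AS article_rank, ft_c.[RANK] AS comment_rank
    FROM CONTAINSTABLE(dbo.Article, Body, 'database AND index') AS ft_a
    JOIN dbo.Article a        ON a.ArticleId = ft_a.[KEY]
    LEFT JOIN dbo.Comment c   ON c.ArticleId = a.ArticleId
    LEFT JOIN CONTAINSTABLE(dbo.Comment, Body, 'database AND index') AS ft_c
           ON ft_c.[KEY] = c.CommentId
    ORDER BY ft_a.[RANK] DESC;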
I think that both 2005 and 2008's FTS engines, as well as Lucene.NET, have architectural tradeoffs that are going to align better or worse to a lot of project circumstances - I just got lucky that the upgrade worked in my favor. I can completely see why 2008's iFTS wouldn't work in the same configuration as 2005's for the highly OLTP nature of a use case like StackOverflow.com. However, I would not discount the possibility that the 2008 iFTS could be isolated from the heavy insert transaction load... but it also sounds like it could be as much work to accomplish that as move to Lucene.NET ... and the cool factor of Lucene.NET is hard to ignore ;)
Anyway, for me, the ease and efficiency of SQL 2008's iFTS in the majority of situations probably edges out Lucene's 'cool' factor (though it is easy to use, I've never used it in a production system so I'm reserving comment on that). I would be interested in knowing how much more efficient Lucene is (has turned out to be? is it implemented now?) in StackOverflow or similar situations.
This might help:
https://blog.stackoverflow.com/2008/11/sql-2008-full-text-search-problems/
Haven't used SQL Server 2008 personally, though based on that blog entry, it looks like the full-text search functionality is slower than it was in 2005.
We use both full-text search possibilities, but in my opinion it depends on the data itself and your needs.
We scale with web servers, and therefore I like Lucene, because I don't put that much load on the SQL Server.
For starting from scratch and wanting full-text search, I would prefer the SQL Server solution, because I think it is really fast to get results; if you want Lucene you have to implement more at the start (and also acquire some know-how).
One consideration that you need to keep in mind is what kind of search constraints you have in addition to the full-text constraint. If you are doing constraints that lucene can't provide, then you will almost certainly want to use FTS. One of the nice things about 2008 is that they improved the integration of FTS with standard sql server queries so performance should be better with mixed database and FT constraints than it was in 2005.
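As an illustration, mixing a full-text predicate with ordinary relational constraints looks something like this (a hypothetical Products table with a full-text index on Description):

    -- Full-text constraint combined with regular relational filters
    SELECT ProductId, Name, Price
    FROM dbo.Products
    WHERE CONTAINS(Description, '"stainless steel"')
      AND Price < 50
      AND Discontinued = 0;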
I know this is a broad question, but I've inherited several poor performers and need to optimize them badly. I was wondering what are the most common steps involved to optimize. So, what steps do some of you guys take when faced with the same situation?
Related Question:
What generic techniques can be applied to optimize SQL queries?
Look at the execution plan in query analyzer
See what step costs the most
Optimize the step!
Return to step 1 [thx to Vinko]
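A quick way to do step 1 from a query window, before reaching for Profiler (the sample query is made up - substitute your own):

    -- Show logical reads and elapsed time per statement
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Run the suspect query (in Management Studio, also toggle "Include Actual Execution Plan")
    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = 42;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;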
In SQL Server you can look at the Query Plan in Query Analyzer or Management Studio. This will tell you the rough percentage of time spent in each batch of statements. You'll want to look for the following:
Table scans; this means you are completely missing indexes
Index scans; your query may not be using the correct indexes
The thickness of the arrows between each step in a query tells you how many rows are being produced by that step; very thick arrows mean you are processing a lot of rows, and can indicate that some joins need to be optimized.
Some other general tips:
A large amount of conditional statements, such as multiple if-else statements, can cause SQL Server to constantly rebuild the query plan. You can check for this using Profiler.
Make sure that different queries aren't blocking each other, such as an update statement blocking a select statement. This can be avoided by specifying the (nolock) hint in SQL Server select statements.
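As a sketch of that hint (a made-up table; note that NOLOCK means you accept dirty reads, so only use it where that's acceptable):

    -- Read without taking shared locks (accepts dirty reads)
    SELECT OrderId, Status
    FROM dbo.Orders WITH (NOLOCK)
    WHERE CustomerId = 42;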
As others have mentioned, try out the Performance Tuning wizard in Management Studio.
Finally, I would highly recommend creating a set of load tests (using Visual Studio 2008 Test Edition), which you can use to simulate your application's behavior when dealing with a large amount of requests. Some SQL performance bottlenecks only manifest themselves under these circumstances, and being able to reproduce them makes it a lot easier to fix.
Indexes may be a good place to start...
The low hanging fruit can be knocked down with the SQL Server Index Tuning Wizard.
I'm not sure about other databases, but for SQL Server I recommend the Execution Plan. It very clearly (albeit with lots of vertical and horizontal scrolling, unless you've got a 400" monitor!) shows what steps of your query are sucking up the time.
If you've got one step that takes a crazy 80%, then maybe an index could be added, then after tweaking the index, re-run the Execution Plan to find your next biggest step.
After a couple tweaks you may find that there really are no steps that stand out from the others i.e. they're all 1-2% each. If that is the case, then you might then need to see if there is a way you can cut down the amount of data included in your query, do those four million closed sales orders need to be included in the "Active Sales Orders" query? No, so exclude all those with STATUS='C' ... or something like that.
Another improvement you'll see from the Execution Plan is bookmark lookups: basically it finds a match in the index, but then SQL Server has to go back to the table to fetch the record you want. This operation might at times take longer than just scanning the table in the first place would have; if that is the case, do you really need that index?
With indexes, and especially with SQL Server 2005, you should look at the INCLUDE clause. This basically allows you to have a column carried in an index without it really being part of the index key, so if all the data you need for your query is in your index or is an included column, then SQL Server doesn't have to even look at the table - a big performance pickup.
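A hypothetical example of such a covering index, assuming an Orders table queried by CustomerId:

    -- Seekable key on CustomerId; OrderDate and TotalDue carried at the leaf level only
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Covering
        ON dbo.Orders (CustomerId)
        INCLUDE (OrderDate, TotalDue);

    -- This query can now be answered entirely from the index
    SELECT OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerId = 42;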
There are a couple of things you can look at to optimize your query performance.
Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.
Consider de-normalising your database to reduce joins
Avoid loops (i.e. fetch cursors), stick to set operations.
Implement the query as a stored procedure as this is pre-compiled and will execute faster.
Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.
Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.
Make sure that Auto Statistics is set to on. SQL Server needs this to help decide the optimal execution plan. See Mike Gunderloy's great post for more info: Basics of Statistics in SQL Server 2005.
Make sure your indexes are not fragmented (see Reducing SQL Server Index Fragmentation, and the sketch below).
Make sure your tables are not fragmented (see How to Detect Table Fragmentation in SQL Server 2000 and 2005).
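As mentioned above, one way to check index fragmentation on SQL Server 2005 is the sys.dm_db_index_physical_stats DMF; a rough sketch for the current database:

    -- Average fragmentation per index in the current database
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name                     AS index_name,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    ORDER BY ips.avg_fragmentation_in_percent DESC;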
The execution plan is a great start and will help you figure out what part of your query you need to tackle.
Once you figure out the where, it is time to tackle the how and why. Take a look at the type of queries you are trying to perform. Avoid loops at all costs as they are slow. Avoid cursors at all costs because they are slow. Stick to set-based queries whenever possible.
There are ways to give SQL hints on the type of joins to use if you are using joins. Be careful here though: while one hint may speed up your query once, it may slow down your query tenfold the next time through, depending on the data and parameters.
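By way of illustration (made-up tables), a join hint can be written inline on the join or as a query-level OPTION; either way it pins the optimizer's choice, for better or worse:

    -- Force a hash join on this specific join
    SELECT c.CustomerId, o.OrderId
    FROM dbo.Customers c
    INNER HASH JOIN dbo.Orders o ON o.CustomerId = c.CustomerId;

    -- Equivalent query-level hint (applies to all joins in the query)
    SELECT c.CustomerId, o.OrderId
    FROM dbo.Customers c
    JOIN dbo.Orders o ON o.CustomerId = c.CustomerId
    OPTION (HASH JOIN);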
Finally, make sure your database is well indexed. A good place to start: any field that is contained in a WHERE clause probably should have an index on it.
Look at the indexes on the tables involved in the query. An index may be needed on particular fields that participate in the WHERE clause. Also look at the fields used in the joins in the query (if joins exist). If indexes already exist, look at the type of index.
Failing that (because there are negatives to using locking hints), look at locking hints and at explicitly naming the index to use in the join. Using NOLOCK is a more obvious choice if you're getting a lot of deadlocked transactions.
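A rough example of what that looks like (table and index names are made up):

    -- Force a specific index and skip shared locks on this table
    SELECT o.OrderId, o.Status
    FROM dbo.Orders o WITH (INDEX(IX_Orders_CustomerId), NOLOCK)
    JOIN dbo.Customers c ON c.CustomerId = o.CustomerId
    WHERE c.Region = 'EMEA';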
Do what roman and Andy S mentioned first though.