Speeding up SQL query with multiple joins - sql-server

I have a .NET e-commerce solution running off a mid-sized SQL Server Express database. The system queries the order data, which involves many joins (potentially 20 tables). These queries are quite slow, particularly during periods of heavy use, and I think I have exhausted the options for indexing the tables and optimising the queries.
I now believe the best option going forward is denormalization - see https://msdn.microsoft.com/en-us/library/cc505841.aspx
What I would like to know is:
Would SQL Server columnstore indexes be a better option?
I am considering using in-memory OLTP on the denormalized tables because having the data in memory will undoubtedly make queries faster but it doesn't seem like the intended use, so should I?
Should I use something like ElasticSearch instead, and what would be the benefit over SQL Server in-memory OLTP?
Should I use SQL Server OLAP instead? Seems like overkill...

Related

Why are NoSQL equivalents of MySQL joins so much more expensive than SQL ones?

Could someone get into the technicals as to why NoSQL equivalents of MySQL joins are so much more expensive than SQL ones?
I'm not sure it is more expensive.
Joins in a non-distributed system are straightforward. The data's on the same server, and you simply go and find it.
In a distributed system - which just about every NoSQL is - the data might be local, or it might be remote. You have to do at least a calculation to determine the location of the data, then get the data from the other servers.
It is a hard problem (maybe even a fool's errand) to optimize the cluster so associated data is on the "same server" - because what is "associated" changes. Some distributed NoSQL databases, such as my company's Aerospike database, make a point of randomly distributing data so joins are all equal cost - even if slower. Some cases - like small, well-known tables with commonly read and infrequently written data - can be replicated to every server easily, but few NoSQL databases allow that (currently).
To create a comparison, you'll have to create a distributed SQL cluster, using your favorite distributed SQL technology (such as MySQL Cluster), and compare client join patterns vs the overhead of SQL parsing, optimizer runs, and then fetching from different parts of the cluster.
I've never seen such a benchmark. Most NoSQL use patterns eschew the normalize-and-join SQL patterns, so it is not well tested.
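To make the "calculate the location, then fetch from the other servers" step concrete, here is a minimal sketch of a client-side join over hash-partitioned data. Everything in it is an assumption for illustration: the Cluster class is an in-memory stand-in, node_for is a toy partitioner, and none of it reflects how Aerospike or any other specific database places data.

```python
import hashlib

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str) -> int:
    """Map a key to a node by hashing it (toy partitioner, not a real product's)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class Cluster:
    """In-memory stand-in for a distributed key-value store."""

    def __init__(self):
        self.nodes = [dict() for _ in range(NUM_NODES)]
        self.fetches = [0] * NUM_NODES  # per-node read counter

    def put(self, key, value):
        self.nodes[node_for(key)][key] = value

    def get(self, key):
        n = node_for(key)      # step 1: compute where the record lives
        self.fetches[n] += 1   # step 2: one (possibly remote) read
        return self.nodes[n].get(key)


def client_side_join(cluster, order_keys):
    """Join orders to customers in the client: two lookups per output row."""
    rows = []
    for order_key in order_keys:
        order = cluster.get(order_key)
        customer = cluster.get(order["customer_key"])
        rows.append({"order": order_key,
                     "customer": customer["name"],
                     "total": order["total"]})
    return rows


cluster = Cluster()
cluster.put("customer:1", {"name": "Ada"})
cluster.put("customer:2", {"name": "Grace"})
cluster.put("order:100", {"customer_key": "customer:1", "total": 25})
cluster.put("order:101", {"customer_key": "customer:2", "total": 40})
cluster.put("order:102", {"customer_key": "customer:1", "total": 15})

joined = client_side_join(cluster, ["order:100", "order:101", "order:102"])
print(joined)
print("reads per node:", cluster.fetches)
```

Each joined row costs two lookups, and the hash decides which node serves each one, so the cost is uniform whether or not the order and its customer happen to share a node.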

Hive vs SQL Server performance

1) I started using Hive two months ago. I have the same task implemented in both SQL Server and Hive. I found that Hive is slow and takes much longer to execute queries, while SQL Server executes them in a few minutes or seconds.
After executing the task in Hive, I cross-checked the results against SQL Server and found differences in some (not all) tables.
E.g. one table has 2012 records, but when I executed the task on the same table in Hive I got 2007 records.
Why is this happening?
2) What should I do to speed up execution in Hive?
(Currently I am running all of this on a single cluster. If I add more clusters, how many would I need to improve performance?)
Please suggest some solutions or good practices that I can follow.
Thanks.
Hive and SQL Server are not comparable in any way other than the similarity in the syntax of the query language.
While SQL Server is built to respond in real time from a single machine, Hive is for processing large data sets that may span hundreds or thousands of machines.
Hive (via Hadoop) has a lot of overhead for starting up a job.
Hive and Hadoop will not cache data in memory like SQL Server does.
Hive has only recently added indexes, so most queries end up being a table scan.
If your dataset fits on a single computer you probably want to stick with SQL Server and not hive. Hive performance tuning is mostly based in Hadoop performance tuning although depending on the types of queries you run there can be free performance from using the LazyBinarySerDe.
Hive does have some differences from regular SQL that may be affecting your query. Without more details I can't speculate as to why.
Ignore the "they aren't comparable in any way" comment. If it stores data, it is comparable to any other method of storing data.
But be aware that SQL Server, 13 years ago, had 1000+ people being paid full-time to improve the product. So while that doesn't "prove" anything, it does increase one's confidence that more work = more results.
More importantly, look for any non-trivial benchmark done on an open source and/or non-relational method of storing data vs one of the mainstream relational databases. You won't find them. That says a lot to me. (Also, mainstream isn't necessary since the current world's fastest data engine isn't even mainstream. But if that level is needed, look at ExoSol.)
If your need is to learn to work with technology at your job and that technology is Hive, my recommendation is to find someone who is really focused on getting the most out of Hive query performance as possible. If there is a Hive query guru out there, find them. But if you need a lot more than what they can give you, you're using the wrong technology.
And if Hive isn't a requirement, I would avoid it and other technologies lacking the compelling business model that will guarantee their survival past 5 years and move them out of the niche category they currently exist in (currently 20 times less popular than any mainstream data engine - https://db-engines.com/en/ranking).

How do you gather statistics from SQL Server?

sys.dm_exec_query_stats seems to be a very useful view for gathering statistics from your database, which you can use as a starting point to find queries that need to be optimized. Selecting * gives somewhat cryptic results; how do you make the results readable? What type of queries do you get from it? Are there other functions or queries you use to gain performance statistics?
To make the results useful, you need to cross-reference the information with a few other DMVs and also concentrate your analysis and tuning efforts on the most poorly performing queries.
Here is an example (one I made earlier) of using the DMV you have mentioned to identify the most costly SQL Server queries.
How to identify the most costly SQL Server queries using DMV’s
You can easily extend this to look at other metrics too.
If you want to make performance tuning a breeze for yourself, you should consider installing the freely available SQL Server Performance Dashboard Reports.
These can be used to identify SQL Server Waits, the queries that consume the most I/O, the longest running queries by duration etc.
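As a rough illustration of the cross-referencing described above (this is not the query from the linked article), the sketch below joins sys.dm_exec_query_stats to sys.dm_exec_sql_text so each row shows the statement text next to its averages. The T-SQL is held in a Python string and run through any DB-API connection. The helper name top_cpu_queries and the choice of columns are assumptions, and the query needs a real SQL Server plus permission to read server state.

```python
# T-SQL kept as a string; it only runs against SQL Server.
TOP_CPU_QUERIES = """
SELECT TOP (10)
    qs.execution_count,
    qs.total_worker_time / qs.execution_count AS avg_cpu_microseconds,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
"""


def top_cpu_queries(connection):
    """Run the DMV query on a DB-API connection and return rows as dicts.

    `connection` is assumed to be any DB-API 2.0 connection to SQL Server
    (for example one opened with pyodbc); this helper is hypothetical.
    """
    cursor = connection.cursor()
    cursor.execute(TOP_CPU_QUERIES)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Swapping the ORDER BY column for total_logical_reads or total_elapsed_time ranks by I/O or duration instead of CPU.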
Why don't you first use 'set pagesize 0'?

ORM and database indexes

What approach do you take towards creating and maintaining database indexes when using an ORM such as NHibernate/Hibernate?
Since the ORM is generating the queries, are there any tools you could recommend that could analyze query plans of those and suggest the kind of indexes that should be created?
My current approach is ... wait until something runs slowly and then find the slow query and optimize it ... but this is sort of lame, isn't it? My goal is not to end up with tens or hundreds of indexes where nobody knows which are actually being used by the system and which aren't. So, again, it comes down to index maintenance.
My environment is NHibernate + SQL Server 2005.
I find that the columns that need indexing are typically "obvious". By that I mean if you create queries like "select p from Person p where p.surname = :surname" then whatever column surname refers to needs an index.
Likewise every foreign key should be indexed.
And no I don't wait until performance is actually a problem. Indexes are just something I do right from the start.
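A small sketch of that habit, using Python's built-in sqlite3 module rather than SQL Server so it runs anywhere. The person and address tables and the index names are made up for illustration, and SQL Server's optimizer and plan output differ from SQLite's. The idea is the same, though: index the column in the WHERE clause and the foreign key, then check that the plan changes from a scan to an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, surname TEXT NOT NULL);
CREATE TABLE address (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    city TEXT
);
""")


def plan(sql, params=()):
    """Return SQLite's query plan for a statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail text


query = "SELECT id FROM person WHERE surname = ?"
before = plan(query, ("Smith",))

# Index the filtered column and the foreign key up front.
conn.execute("CREATE INDEX ix_person_surname ON person (surname)")
conn.execute("CREATE INDEX ix_address_person_id ON address (person_id)")

after = plan(query, ("Smith",))
print("before:", before)
print("after: ", after)
```

On SQL Server the equivalent check is to compare the execution plan for the ORM-generated statement before and after adding the index.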
Oh the other thing I wanted to add was that most (if not all) ORMs have the ability to turn on statement logging. These often aren't particularly readable (single line, table names of t0, t1, t2, etc) but this could tell you what queries were run and how often.
The standard tools you would use to analyse slow queries / poor indexing apply whether or not you are using an ORM. You can use SQL Server Profiler to examine the SQL statements that are running against your database and then use the execution plan features in the query window in SQL Server Management Studio / SQL Query Analyzer to see the details of your query plans and get an idea of which indexes you may need to add.
You can also use the Database Engine Tuning Advisor in SQL Server Management Studio, although whether or not that tool is actually more useful than simply spending some time thinking about your database design and querying patterns is open to question.

SQL Server 2008 Full Text Search (FTS) versus Lucene.NET

I know there have been questions in the past about SQL 2005 versus Lucene.NET, but since 2008 came out with a lot of changes to FTS, I was wondering if anyone can give me pros/cons (or a link to an article).
SQL Server FTS is going to be easier to manage for a small deployment. Since FTS is integrated with the DB, the RDBMS handles updating the index automatically. The con here is that you don't have an obvious scaling solution short of replicating DBs. So if you don't need to scale, SQL Server FTS is probably "safer". Politically, most shops are going to be more comfortable with a pure SQL Server solution.
On the Lucene side, I would favor SOLR over straight-up Lucene. With either solution you have to do more work yourself updating the index when the data changes, as well as mapping data yourself to the SOLR/Lucene index. The pros are that you can easily scale by adding additional indexes. You could run these indexes on very lean linux servers, which eliminates some license costs. If you take the Lucene/SOLR route, I would aim to put ALL the data you need directly into the index, rather than putting pointers back to the DB in the index. You can include data in the index that is not searchable, so for example you could have pre-built HTML or XML stored in the index, and serve it up as a search result. With this approach your DB could be down but you are still able to serve up search results in a disconnected mode.
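The "put all the data in the index" idea can be sketched with a toy inverted index that keeps a stored-only payload next to the searchable terms. This is a plain-Python illustration with made-up names (TinyIndex, add, search); it is not the Lucene or SOLR API, which have their own notions of indexed and stored fields.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: searchable terms plus a stored-only payload per doc."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.stored = {}                  # doc id -> pre-built payload (not searchable)

    def add(self, doc_id, searchable_text, stored_html):
        for term in searchable_text.lower().split():
            self.postings[term].add(doc_id)
        self.stored[doc_id] = stored_html

    def search(self, term):
        """Return stored payloads for matching docs, with no database lookup."""
        ids = sorted(self.postings.get(term.lower(), set()))
        return [self.stored[i] for i in ids]


index = TinyIndex()
index.add(1, "red running shoes", "<li><b>Red running shoes</b></li>")
index.add(2, "blue suede shoes", "<li><b>Blue suede shoes</b></li>")

print(index.search("shoes"))  # payloads come straight from the index
print(index.search("li"))     # markup was stored, not indexed
```

Because results are served from the stored payload, a search page built this way keeps working while the database is offline, at the cost of re-indexing whenever the source data changes.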
I've never seen a head-to-head performance comparison between SQL Server 2008 and Lucene, but would love to see one.
I built a medium-size knowledge base (maybe 2GB of indexed text) on top of SQL Server 2005's FTS in 2006, and have now moved it to 2008's iFTS. Both situations have worked well for me, but the move from 2005 to 2008 was actually an improvement for me.
My situation was NOT like StackOverflow's in the sense that I was indexing data that was only refreshed nightly, however I was trying to join search results from multiple CONTAINSTABLE statements back in to each other and to relational tables.
In 2005's FTS, this meant each CONTAINSTABLE would have to execute its search on the index, return the full results and then have the DB engine join those results to the relational tables (this was all transparent to me, but it was happening and was expensive to the queries). 2008's iFTS improved this situation because the database integration allows the multiple CONTAINSTABLE results to become part of the query plan which made a lot of searches more efficient.
I think that both 2005 and 2008's FTS engines, as well as Lucene.NET, have architectural tradeoffs that are going to align better or worse to a lot of project circumstances - I just got lucky that the upgrade worked in my favor. I can completely see why 2008's iFTS wouldn't work in the same configuration as 2005's for the highly OLTP nature of a use case like StackOverflow.com. However, I would not discount the possibility that the 2008 iFTS could be isolated from the heavy insert transaction load... but it also sounds like it could be as much work to accomplish that as move to Lucene.NET ... and the cool factor of Lucene.NET is hard to ignore ;)
Anyway, for me, the ease and efficiency of SQL 2008's iFTS in the majority of situations probably edges out Lucene's 'cool' factor (though it is easy to use, I've never used it in a production system so I'm reserving comment on that). I would be interested in knowing how much more efficient Lucene is (has turned out to be? is it implemented now?) in StackOverflow or similar situations.
This might help:
https://blog.stackoverflow.com/2008/11/sql-2008-full-text-search-problems/
Haven't used SQL Server 2008 personally, though based on that blog entry, it looks like the full-text search functionality is slower than it was in 2005.
We use both full-text search options, but in my opinion the choice depends on the data itself and your needs.
We scale with web servers, and therefore I like Lucene, because it doesn't put as much load on the SQL Server.
If you are starting from scratch and want full-text search, I would prefer the SQL Server solution, because I think it is really fast to get results; with Lucene you have to implement more at the start (and also acquire some know-how).
One consideration that you need to keep in mind is what kind of search constraints you have in addition to the full-text constraint. If you need constraints that Lucene can't provide, then you will almost certainly want to use FTS. One of the nice things about 2008 is that they improved the integration of FTS with standard SQL Server queries, so performance should be better with mixed database and FT constraints than it was in 2005.

Resources