Quick and dirty way to compare SQL server performance - sql-server

Further to my previous question about the optimal RAID setup for SQL Server, could anyone suggest a quick and dirty way of benchmarking database performance on the new and old servers to compare them? Obviously, the proper way would be to monitor our actual usage, set up all sorts of performance counters, capture the queries, etc., but we are just not at that level of sophistication yet and this isn't something we'll be able to do in a hurry. So in the meantime, I'm after something that would be a bit less accurate, but quick to do and still better than nothing. Just as long as it's not misleading, which would be worse than nothing. It should be SQL Server specific, not just a "synthetic" benchmark. It would be even better if we could use our actual database for this.

Measure the performance of your application itself with the new and old servers. It's not necessarily easy:
Set up a performance test environment that runs your application (depending on your architecture this may consist of several machines, some of which can be VMs and some of which cannot).
Create "driver" program(s) which give the application simulated work to do.
Run batches of work under the same conditions. Remember to reboot the database server between runs to nullify the effects of caching (otherwise your second and subsequent runs will probably be amazingly fast).
Ensure that the performance test environment has enough physical machines to load the database heavily. This may mean swapping out some VMs for real hardware.
Remember to use production-grade hardware in your performance test environment - even if it is expensive.
Our database performance test cluster contains six hardware machines, several of which are production-grade, one of which contains an expensive storage array. We also have about a dozen VMs on a 7th simulating other parts of the service.

You can always insert, read, and delete a couple of million rows. It's not a realistic mix of operations, but it should strain the disks nicely.
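As a rough illustration of that insert/read/delete idea, here is a minimal sketch. It uses Python's built-in sqlite3 module purely so that it runs anywhere; the table name `bench_rows`, the payload size and the function name are all made up for the example. To point it at SQL Server you would pass in a connection from whatever DB-API driver you use (assumed here, not shown) and run the same script against the old and new servers.

```python
import sqlite3
import time


def crud_benchmark(conn, n_rows):
    """Time a bulk insert, a full read and a delete of n_rows rows.

    Returns (rows_read, timings) where timings maps phase name to seconds.
    The table name and payload are hypothetical, chosen for this sketch.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE bench_rows (id INTEGER PRIMARY KEY, payload TEXT)")
    timings = {}

    # Phase 1: insert n_rows rows in one batch and commit.
    start = time.perf_counter()
    cur.executemany(
        "INSERT INTO bench_rows (id, payload) VALUES (?, ?)",
        ((i, "x" * 100) for i in range(n_rows)),
    )
    conn.commit()
    timings["insert"] = time.perf_counter() - start

    # Phase 2: read everything back.
    start = time.perf_counter()
    cur.execute("SELECT id, payload FROM bench_rows")
    rows_read = len(cur.fetchall())
    timings["read"] = time.perf_counter() - start

    # Phase 3: delete everything and commit.
    start = time.perf_counter()
    cur.execute("DELETE FROM bench_rows")
    conn.commit()
    timings["delete"] = time.perf_counter() - start

    cur.execute("DROP TABLE bench_rows")
    conn.commit()
    return rows_read, timings


if __name__ == "__main__":
    # In-memory SQLite is only a stand-in; it will not strain any disks.
    rows, result = crud_benchmark(sqlite3.connect(":memory:"), 10_000)
    for phase, seconds in result.items():
        print(f"{phase}: {seconds:.4f}s for {rows} rows")
```

Run it with the same row count on both servers and compare phase by phase; the absolute numbers mean little, only the difference between the two machines does.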

Find at least a couple of the queries that are taking some time, or at least that you suspect are taking time, insert a lot of data if you don't have it already, and run the queries having set:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SET STATISTICS PROFILE ON
Those should give you a rough idea of the resources being consumed.
You can also run SQL Server Profiler to get a general idea of what queries are taking a long time and how long they are taking plus other statistics. It outputs a lot of data so try to filter it down a little bit, possibly by long duration or one of the other performance statistics.
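If you want client-side wall-clock numbers to go with the server-side statistics, a small timing harness can help. The following is only a sketch: `time_query` and `summarize` are hypothetical helper names, and sqlite3 stands in for the real server so the code is runnable as-is. The first run is reported separately because it is the one most likely to hit a cold cache.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), runs=5):
    """Run the same query `runs` times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(sql, params)
        cur.fetchall()  # include the time needed to pull the rows to the client
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Separate the (probably cold) first run from the warm runs after it."""
    warm = durations[1:] or durations
    return {"first_run": durations[0], "warm_median": statistics.median(warm)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.executemany("INSERT INTO t (n) VALUES (?)", ((i,) for i in range(50_000)))
    print(summarize(time_query(conn, "SELECT SUM(n) FROM t WHERE n % 7 = ?", (3,))))
```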

Related

SQL query Performance testing regarding cache

The situation
We are planning to migrate our entire production DWH to a new cluster. One of the requirements is that the new cluster is at least as fast as the current cluster. This calls for performance testing with the current cluster serving as a baseline.
When conducting these tests we want both environments to be near identical in terms of behaviour.
I can already clone the user behaviour from the live production cluster and execute it on the new cluster. That leaves the cache to be tackled.
The catch
Since we are going to compare this new cluster to the live production environment, I can't simply clear the cache of both servers. Clearing the cache of the new cluster would be possible since it isn't in production yet. However, I am not going to clear the cache of the live production cluster, since it is still being used and clearing it would have a big impact on performance.
I was wondering if it would be possible to clone/mimic the cache between the two clusters.
I'm also open to an entirely different approach on this matter.
I think you are going about this the wrong way and here is why. I assume the following:
The new cluster's hardware is of the same vendor, quality, etc as the previous
The cores / CPU, RAM, etc. are as good or better on the new instance
The instance is of the same version, or an upgraded version of SQL Server. Note, upgrading doesn't mean the queries will perform better in all cases
The storage is the same, or better (SAN / NAS configurations)
The server settings are the same (MAXDOP, etc)
If these aren't true, then I don't see why you are conducting the test anyway since it wouldn't be comparing apples to apples. With that being said, I still don't see how the tests would be equal even if you could mirror the plan cache. You could create a brand new query that would be used for performance testing, and run it on both instances to compare their performance (it would use a new plan) but here's a big catch... you aren't going to kick off all the users from production instance, so your baseline query will contend for resources. Unless you have an identical mirror of your production server, which no users are using, I don't see how you're going to get an unbiased test.
With all that being said, most often you are upgrading to faster, better hardware, so one could feel safe that it would be faster, or at least not slower, assuming equal configurations. Additionally, there are tons of performance tuning blogs out there from Pinal Dave, Paul White, Brent Ozar, Paul Randal, Aaron Bertrand, etc., ranging from optimal server settings to query tuning. This alone could make a night-and-day difference in performance, along with proper DB maintenance (fixing index fragmentation, fixing queries with hundreds or thousands of plans which only get used once, fixing indexing in general, etc.).

How expensive is access to database? How often do we access to it?

I'm about to write an application for Android, and it will use MySQL.
I know that access to a DB is really expensive in terms of time, and I would like to know how often applications like instant messaging or online games access their databases.
For example, in a game we would like to save the position of a player in the world while he's moving all the time.
Is database access actually not expensive, and is there a way to stay connected to it all the time and just issue requests that are cheap?
Or is it really expensive either way, and are there techniques to access it only every X interval of time, saving the data locally in the meantime?
I know that my question is really general, and that it always depends on what we need and want.
My question came up because I made a really simple login application that connects and does 1 request to the database, and it takes 1 second (a lot!!) to get the result, so how can online applications be so fast?
Thank you
Before answering this, I would recommend simulating the process as closely as possible and benchmarking it, so you can work towards the best solution for your use case.
E.g. if I have an application submitting data to a database, I simulate the submission so I can easily run multiple submissions at the same time, see what the bottleneck is, and see how it compares when I use caching, replication, indexes, etc.
Reading company blogs can also be helpful, as they often share success stories that support the usage of a particular approach.
How expensive is access to database?
Accessing a database can be a pretty quick operation
SELECT 1; -- 0.005 secs :D
However, there are situations that can lead to poor performance (slow reads, writes and updates), but there are some relatively simple ways to combat this:
Indexes
The best way to improve the performance of SELECT operations is to
create indexes on one or more of the columns that are tested in the
query. The index entries act like pointers to the table rows, allowing
the query to quickly determine which rows match a condition in the
WHERE clause, and retrieve the other column values for those rows.
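To see the effect of an index for yourself, you can ask the database for its query plan before and after creating one. The sketch below uses SQLite through Python's sqlite3 module as a stand-in (the question is about MySQL, where you would use EXPLAIN instead); the `players` table, the index name and the helper names are invented for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


def plans_before_and_after_index():
    """Build a small hypothetical table and compare plans around CREATE INDEX."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO players (name, score) VALUES (?, ?)",
        ((f"player{i}", i % 100) for i in range(1000)),
    )
    sql = "SELECT * FROM players WHERE name = ?"

    before = query_plan(conn, sql, ("player42",))  # no index yet: full table scan
    conn.execute("CREATE INDEX idx_players_name ON players(name)")
    after = query_plan(conn, sql, ("player42",))   # index lookup on name
    return before, after


if __name__ == "__main__":
    before, after = plans_before_and_after_index()
    print("before:", before)
    print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should report a scan of the table and the second a search using the new index.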
Replication
spreading the load among multiple slaves to improve performance. In
this environment, all writes and updates must take place on the master
server. Reads, however, may take place on one or more slaves. This
model can improve the performance of writes (since the master is
dedicated to updates), while dramatically increasing read speed across
an increasing number of slaves.
How often do we access to it?
If you are solely using a database, you will access it every time you need to store a player's position and every time you need to find out their position.
This is where you would explore options that avoid hitting the database:
Memory caches such as Redis or Memcached
Replication - only read from slaves
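A memory cache in front of the database can be sketched in a few lines. In the example below a plain Python dict plays the role of Redis or Memcached, and sqlite3 plays the role of MySQL, so that the code runs without any server; the `PositionStore` class, the `positions` table and the `db_reads` counter are all hypothetical names for illustration.

```python
import sqlite3


class PositionStore:
    """Read-through / write-through cache for player positions (illustrative only)."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # stand-in for an external cache such as Redis
        self.db_reads = 0  # counts how often we really had to query the database
        conn.execute(
            "CREATE TABLE IF NOT EXISTS positions "
            "(player_id INTEGER PRIMARY KEY, x REAL, y REAL)"
        )

    def save(self, player_id, x, y):
        # INSERT OR REPLACE is SQLite syntax; other databases spell upserts differently.
        self.conn.execute(
            "INSERT OR REPLACE INTO positions (player_id, x, y) VALUES (?, ?, ?)",
            (player_id, x, y),
        )
        self.conn.commit()
        self.cache[player_id] = (x, y)

    def load(self, player_id):
        if player_id in self.cache:
            return self.cache[player_id]  # served from memory, no database access
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT x, y FROM positions WHERE player_id = ?", (player_id,)
        ).fetchone()
        if row is not None:
            self.cache[player_id] = row
        return row


if __name__ == "__main__":
    store = PositionStore(sqlite3.connect(":memory:"))
    store.save(1, 10.0, 20.0)
    for _ in range(1000):
        store.load(1)
    print("database reads:", store.db_reads)
```

Note that this version still writes to the database on every save; only the reads are absorbed by the cache.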
It depends on your design and requirement.
1) Most of the applications manage Connection Pools to minimize the initialization time.
2) Most ORM frameworks have an external cache to improve read performance. So if your application does heavy data reading, don't worry about storing the data locally. The cache will be effective in this case.
3) When you store data locally, either in a file or in some other format, that also adds a performance delay of its own.
4) If you keep the data in primary memory, then obviously game performance will be better. That's why gamers prefer high-end graphics cards and lots of RAM.
For most databases there is the option of batch insertion. Obviously, even a small overhead will accumulate if you have too many connections over time, and performing single insertions has a greater overhead than inserting in batches. The only issue is how often. You should test how often you want to insert and how much information you should store locally before doing a batch insertion.
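One way to experiment with the "how often" question is to buffer writes in memory and flush them in batches, then vary the batch size. This is a sketch under assumptions: `BatchWriter` and the `position_log` table are invented names, and sqlite3 again stands in for the real database so the example is runnable.

```python
import sqlite3


class BatchWriter:
    """Buffer rows in memory and write them in batches of `batch_size`."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []
        self.flushes = 0  # number of round trips actually made to the database
        conn.execute(
            "CREATE TABLE IF NOT EXISTS position_log (player_id INTEGER, x REAL, y REAL)"
        )

    def add(self, player_id, x, y):
        self.pending.append((player_id, x, y))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One executemany plus one commit instead of one of each per row.
        self.conn.executemany(
            "INSERT INTO position_log (player_id, x, y) VALUES (?, ?, ?)",
            self.pending,
        )
        self.conn.commit()
        self.pending.clear()
        self.flushes += 1


if __name__ == "__main__":
    writer = BatchWriter(sqlite3.connect(":memory:"), batch_size=50)
    for step in range(1000):
        writer.add(1, float(step), float(step) * 2)
    writer.flush()  # do not forget the final partial batch
    print("rows written in", writer.flushes, "batches")
```

The trade-off is that anything still sitting in `pending` is lost if the process dies, so the batch size is also a decision about how much data you can afford to lose.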

Should I keep this "GlobalConnection" or create connection for every query?

I have inherited a legacy Delphi application that uses ADO to connect to SQL Server.
The application has a notion of a "Global Connection" -- that is a single connection that it opens at the start, and then keeps open all throughout the running of the application (which can be days, weeks, or longer....)
So my question is this: Should I keep this way of doing things or should I switch to a "connect-query-disconnect" mode of doing things? Does it matter?
Switching would be a non-trivial task, but I'll do it if it means better performance, data management, etc.
Well, it depends on what you're expecting to get out of it, and what kind of application it is.
There's nothing in particular wrong with using a single long-running connection, as long as the application can gracefully handle disconnections and recover or log/notify when it can't reconnect.
The problem with a connect-query-disconnect setup is that you're adding the overhead of connecting and disconnecting on every query. That's going to slow things down, and in an interactive GUI application users may notice the additional overhead. You also have to make sure that authorization is transparently handled if it isn't already.
At the same time, there may be interactive performance gains to be had if you can push all the queries off onto background threads and asynchronously update the GUI. If contention appears because the queries are serialized, you can migrate to a connection-pool system fairly readily as well and improve things even more. This has a fairly high complexity cost to it though, so now you're balancing the gains against the work involved.
Right now, my ultimate response is "if it ain't broke, don't fix it." Changes along the lines you propose are a lot of work -- how much do the users of this application stand to gain? Are there other problems to solve that might benefit them more?
Edit: Okay, so it's broke. Well, slow at least, which is all the same to me. If you've ruled out problems with the SQL Server itself, and the queries are performing as fast as they can (i.e. DB schema is sane, the right indexes are available, queries aren't completely braindead, server has enough RAM and fast enough I/O, network isn't flaky, etc.), then yes, it's time to find ways to improve the performance of the app itself.
Simply moving to a connect-query-disconnect is going to make things worse, and the more queries you're issuing the bigger the drop off is going to be. It sounds like you're going to need to rearchitect the app so that you can run fewer queries, run them in the background, cache more aggressively on the client, or some combination of all 3.
Don't forget that making the clients perform better means that server-side performance gets more important, since the server is probably going to be handling a higher load if clients start making multiple connections and issuing multiple queries in parallel.
As Mr. Frazier said earlier, the single global connection is not bad per se.
If you intend to change, first detect WHAT is the problem. Let's see some scenarios:
1
Some screens (in other words, a set of 1..n forms that operate on a business entity) are slow. Possible causes:
Insufficient filtering, resulting in a plethora of records being pulled from the database unnecessarily.
The number of records is fine, but rendering them takes too long. Solution: faster controls or intelligent rendering (e.g. virtual list views).
Too many queries each time you open a screen. Possible solutions: use TClientDatasets (or any in-memory dataset) to hold infrequently modified lookup tables. A more sophisticated cache for larger tables, or opening those datasets in other threads, can improve response times.
Scrolling through datasets that have controls bound to them can be slow (worth remembering, because such little details are easily forgotten).
2
Whole app simply slows down. Checklist:
Are the network cards OK? A few malfunctioning network cards can wreak havoc even on well-structured networks, as they create unnecessary noise on the line.
[MSSQL DBA HAT ON] Next in the line of attack is SQL Server. Ask the DBA to trace blocks and deadlocks. Log slow queries and work on speeding them up. This relates directly to #1.1 and #1.3.
Check whether some naive developer has run SELECTs inside transactions. Under read committed isolation that is just overhead, as it creates more network traffic. Open the query, retrieve the data and close the dataset.
Review the database schema, if you can.
Are any data-bound operations on a bulk of records (say, repricing some, most, or all products) being done in the app? Write a stored procedure or refactor the operation into a single query; it will be much faster and will reduce the load on the whole server.
Extensive operations on a group of records? Learn how to do those operations at once on the server instead of record by record. See the examination of the most-used alternatives in MSSQL MVP Erland Sommarskog's article on arrays and lists in MSSQL.
Beware of queries with a WHERE clause like: WHERE SomeFunction(table1.blabla) = #SomeParam. Most of the time those will not use an index, so the entire table is read to select the desired data. If it is a big table.... Indexing a persisted computed column can work miracles... [MSSQL HAT OFF]
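The last point of that checklist (a function wrapped around a column defeating the index) can be demonstrated with query plans. The sketch below uses SQLite via Python's sqlite3 module, where an index on an expression plays roughly the role that an index on a persisted computed column plays in SQL Server; that mapping is my analogy, and the `customers` table, index names and helper functions are invented for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


def function_in_where_plans():
    """Compare plans for WHERE lower(email) = ? without and with an expression index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO customers (email) VALUES (?)",
        ((f"User{i}@example.com",) for i in range(1000)),
    )
    # An ordinary index on the raw column does not help a lookup on lower(email).
    conn.execute("CREATE INDEX idx_email ON customers(email)")
    sql = "SELECT id FROM customers WHERE lower(email) = ?"
    plain = query_plan(conn, sql, ("user42@example.com",))

    # Index the expression itself, so the lookup can seek instead of scanning.
    conn.execute("CREATE INDEX idx_email_lower ON customers(lower(email))")
    indexed = query_plan(conn, sql, ("user42@example.com",))
    return plain, indexed


if __name__ == "__main__":
    plain, indexed = function_in_where_plans()
    print("index on column only:  ", plain)
    print("index on the expression:", indexed)
```

With only the plain index the plan is still a scan over every row; once the expression itself is indexed the plan becomes a search on `idx_email_lower`.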
That's what I can think of without a little more detail... ;-)
Database connections are time-consuming resources to create, and the rule of thumb should be to create as few as possible and reuse them as much as possible. That's why some other technologies have database connection pools, which are typically established at application/service startup and then kept as long as possible and shared among threads.
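For readers who have not seen one, a connection pool can be very small. The following is a language-neutral idea sketched in Python rather than Delphi; `ConnectionPool` is a made-up class, and sqlite3 in-memory connections stand in for real server connections so that it runs as-is. Real pools usually also validate and replace broken connections, which this sketch does not.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Open `size` connections up front and hand them out for reuse."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the connection cost once, at startup

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it


def make_connection():
    # check_same_thread=False lets pooled SQLite connections move between threads.
    return sqlite3.connect(":memory:", check_same_thread=False)


if __name__ == "__main__":
    pool = ConnectionPool(make_connection, size=2)
    with pool.connection() as conn:
        print(conn.execute("SELECT 1").fetchone())
```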
From your comment, the application has performance issues, but without more details it's difficult to make any recommendation.
You should try to nail down what is slow: are all queries slow or just some specific ones?
If it is just some specific ones, is there some correlation between them?
My 2 cents.

What are the pros and cons of a distributed second level cache versus focusing on tuning database

We have a website that uses NHibernate and its 2nd level cache. We are having a debate, as one person wants to turn off the second level cache because we are moving to a multi-webserver environment (with a load balancer in front).
One argument is to get rid of the second level cache and focus on optimizing and tuning the DB. The other argument is to roll out a distributed cache as the second level cache.
I am curious to hear folks' pros and cons of DB tuning versus a distributed cache (factoring in effort involved, cost, complexity, etc.)
In a load-balancing scenario you have to use a distributed cache provider to get the best performance and consistency; that has nothing to do with optimizing your database. In any scenario you should optimize your database.
Both. You should have a distributed cache to prevent unnecessary calls to the database, and a tuned database so the initial calls are returned quickly. As an example, Facebook required a significant amount of caching to scale, but I'm sure it wouldn't do much good if the initial queries took 10 minutes. :)
Two words: measure it.
Since you already have the cache implemented, you can probably measure what the impact of turning it off would be, for benchmark purposes.
I would think that a multi-web server and a distributed second level cache can -and probably should- coexist.
First of all, if we take memcached as an example, it supports distributed object storage, so if you're not using that, you could switch to it. It works.
Secondly, I'm guessing that you're introducing the web-server farm to respond to increasing web requests, which will in turn mean increasing requests for data. If you kill your caching, it won't matter how much you optimize your database: you're going to thrash it with queries. You will improve each query's execution time, but requests will still sit waiting for the database to return the data.
This is especially true for the case that web-node 1 requests dataset A and web-node 2 requests dataset A --> you are going to do the same query twice while with second level caching you only do it once.
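The two-node case can be made concrete with a toy model. In the sketch below a dict stands in for the distributed cache (memcached or similar) and a counter stands in for the database; `Database`, `WebNode` and `queries_needed` are hypothetical names, and no real NHibernate API is involved.

```python
class Database:
    """Fake database that only counts how many queries it receives."""

    def __init__(self):
        self.query_count = 0

    def fetch_dataset(self, key):
        self.query_count += 1
        return f"rows for {key}"


class WebNode:
    """A web server that consults a cache before querying the database."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache  # a dict here; a distributed cache client in practice

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.db.fetch_dataset(key)
        return self.cache[key]


def queries_needed(shared_cache):
    """Both nodes request dataset A once; return how many queries hit the database."""
    db = Database()
    if shared_cache:
        cache = {}
        nodes = [WebNode(db, cache), WebNode(db, cache)]  # one cache for the farm
    else:
        nodes = [WebNode(db, {}), WebNode(db, {})]        # isolated per-node caches
    for node in nodes:
        node.get("A")
    return db.query_count


if __name__ == "__main__":
    print("isolated caches:", queries_needed(shared_cache=False), "queries")
    print("shared cache:   ", queries_needed(shared_cache=True), "query")
```

Isolated per-node caches also have the consistency problem that motivated the debate in the first place: node 2 cannot see that node 1 has invalidated or updated an entry.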
So my recommendation is:
Don't kill your second level cache. You have already spent resources to implement it and by disabling it you are NOT going to improve your application's performance. Even a single node of memcached is going to be faster than having none at all.
Do optimize your database operations. This means both from the database side (indexes, views, stored procedures, functions, perhaps a cluster with read-only and write-only nodes) and the application side (optimize your queries, profile lazy/eager loading, don't fetch data you don't need, combine multiple queries into single round trips via Future, MultiQuery, MultiCriteria).
Do optimize your second-level cache implementation. There are datasets that have an infinite expiration date, and thus you query the db for them only once, and there are datasets that have short expiration dates, and thus probably expensive queries are executed more frequently. By optimizing your queries and your db you are going to improve the performance for the queries but the second-level cache is going to save your skin on peak load where short-expiration date datasets will be fetched by the cache more frequently.
If textual queries are an everyday operation, use the database's full-text capabilities or, even better, use an independent service like Lucene.NET (which can be integrated with NHibernate via NHibernate.Search).
That's a very difficult topic. In either case you need proficiency. Either a very proficient DBA, or a very proficient NHibernate / Cache administrator.
Personally, I prefer having full control over my SQL and tuning the database. Since you only have multiple webservers (and not necessarily multiple database instances), you might be better off that way, too. Modern databases have very efficient caches, so usually you create more harm with badly configured second-level caches in the application, rather than just letting the database cache sql statements, cursors, data, buffers, etc. I have experienced this to work very well for around 15 weblogic servers and only one database with lots of memory.
Since you do have NHibernate already, though, moving away from it, back to SQL (maybe with LINQ?) might be quite a costly task, that's not worth the effort.
We use NHibernate's 2nd level cache in our multi-server environment using Microsoft AppFabric distributed cache framework (NHibernate Velocity Provider) with great success.
Having said that, using 2nd level cache requires deeper understanding of the framework to prevent unexpected results. In addition, before using distributed caches, it is important to measure their overhead.
So my answer is basically - before using 2nd-level cache, you should really test and see whether it is really needed.

Performance Testing - How much data should I create

I'm very new to performance engineering, so I have a very basic question.
I'm working on a client-server system that uses a SQL Server backend. The application is a huge tax-related application that requires performance testing at peak load, meaning that there should be something like 10 million tax returns in the system when we run scenarios related to creating tax returns and submitting them. A proportional number of users will also need to be created.
Now I'm hearing in meetings that we need to create 10 million records to test performance and run scenarios with 5000 users and I just don't think it is feasible.
When one talks about creating a smaller dataset and extrapolating from it for performance planning, a very common answer I hear is that we need 10 million records because we cannot tell from a smaller data set how the database or network will behave.
So how does one plan capacity and test performance on a large enterprise application without creating a peak level of data or running a peak number of scenarios?
Thanks.
Personally, I would throw as much data and traffic at it as you can. Forget what traffic you "think you need to handle". And just see how much traffic you CAN handle and go from there. Knowing the limits of your system is more valuable than simply knowing it can handle 10 million records.
Maybe it does handle 10 million, but at 11 million it dies a horrible death. Or maybe it's well written and will scale to 100 million before it dies. There's a very distinct difference between the two even though both pass the "10 million test"
Now I'm hearing in meetings that we need to create 10 million records to test performance and run scenarios with 5000 users and I just don't think it is feasible.
Why do you think so?
Of course you can (and should) test with limited amounts of data, but you also really, really need to test with a realistic load, which means testing with the amount (and type) of data that you will use in production.
This is just a special case of a general rule: for system or integration testing, you need to test in a scenario that is as close as possible to production; ideally you just copy/clone a live production system, data, config and all, and use that for testing. That is actually what we do (if we technically can and the client agrees). We just run a few SQL scripts to randomize personal data in the test data set, to prevent privacy concerns.
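The answer does not show its scrubbing scripts, so the following is only a guess at the general shape of such a step: overwrite the identifying columns of a cloned database with synthetic values while leaving the rest of each row alone. The `taxpayers` table, its columns and the function name are all invented, sqlite3 is used so the sketch is runnable, and a real project would need its own review of which columns count as personal data.

```python
import sqlite3


def scrub_personal_data(conn):
    """Replace name and email with synthetic values derived from the row id.

    Returns the number of rows rewritten. Non-personal columns are untouched,
    so data volume and value distributions stay close to production.
    """
    ids = [row[0] for row in conn.execute("SELECT id FROM taxpayers").fetchall()]
    conn.executemany(
        "UPDATE taxpayers SET name = ?, email = ? WHERE id = ?",
        [(f"Person {i}", f"person{i}@example.invalid", i) for i in ids],
    )
    conn.commit()
    return len(ids)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxpayers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, income REAL)"
    )
    conn.execute(
        "INSERT INTO taxpayers (name, email, income) VALUES (?, ?, ?)",
        ("Jane Doe", "jane@example.com", 52000.0),
    )
    print("rows scrubbed:", scrub_personal_data(conn))
    print(conn.execute("SELECT * FROM taxpayers").fetchall())
```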
There are always issues that crop up because production data is somehow different from what you tested on, and this is the only way to prevent (or at least limit) these problems.
I've planned and implemented reporting and imports, and they invariably break or misbehave the first time they're exposed to real data, because there are always special cases or scaling problems you didn't expect. You want that breakage to happen during development, not in production :-).
In short:
Bite the bullet, and (after having done all the tests with "toy data"), get a realistic dataset to test on. If you don't have the hardware to handle that, then you don't have the right hardware for your tests :-).
I would take a look at Redgate's SQL Data Generator. It does a good job of generating representative data.
Have a peek at "The art of application performance testing / Ian Molyneaux, O’Reilly, 2009".
Your test data is ideally a realistic variety of records. But for first approximations you could have just a few unique records, and duplicate them until you have the desired size. Then use ApacheBench to roughly approximate the traffic.
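The "duplicate a few records until you reach the desired size" approach might look like this. It is a sketch only: the `tax_returns` table, the three seed rows and the function name are invented, and sqlite3 stands in for SQL Server so that the code runs anywhere.

```python
import itertools
import sqlite3

# A handful of hypothetical seed records that get repeated to reach the target size.
SEED_RETURNS = [
    ("single", 42000.0),
    ("married", 88000.5),
    ("business", 1250000.0),
]


def load_test_data(conn, target_rows, batch_size=10_000):
    """Fill tax_returns with target_rows rows by cycling through the seed records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tax_returns "
        "(id INTEGER PRIMARY KEY, filing_type TEXT, income REAL)"
    )
    source = itertools.islice(itertools.cycle(SEED_RETURNS), target_rows)
    while True:
        batch = list(itertools.islice(source, batch_size))  # insert in chunks
        if not batch:
            break
        conn.executemany(
            "INSERT INTO tax_returns (filing_type, income) VALUES (?, ?)", batch
        )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tax_returns").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print("rows loaded:", load_test_data(conn, 100_000))
```

Keep in mind that data made of a few repeated rows has very unrealistic value distributions, so index selectivity and query plans can differ from production; that is why it is only good for first approximations.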
To help generate data, look at Ruby's Faker and Perl's Data::Faker. I have had good luck with them in generating large data sets for testing. Redgate's SQL Data Generator is good too.
