Slow XA transactions in JBoss

We are running JBoss 4.2.2 with SQL Server 2005 (sqljdbc driver 1.2).
We have recently installed New Relic and can see a large bottleneck with our transactions.
Generally, for any one web request, the bottleneck sits on one of these:
master..xp_sqljdbc_xa_start
master..xp_sqljdbc_xa_commit
org.jboss.resource.adapter.jdbc.WrapperDataSource.getConnection()
master..xp_sqljdbc_xa_end
Several hundred ms are spent on one of these items (in some cases several seconds). Cumulatively most of the response time is spent on these items.
I'm trying to identify whether it's any of the following:
Will moving away from XA transactions help?
Is there a larger problem in my database that I don't have visibility over?
Can I upgrade my SQL driver to help with this?
Or is this an indication that there are just a lot of queries, and we should start by looking at our code, and trying to lower the number of queries overall?

XA transactions are necessary if you are performing work against more than one resource in a single transaction; if you need consistency across those resources, then you need XA. However, you are talking in terms of "queries", which might imply that you are mostly doing read-only activities, and so XA may be overkill. Furthermore, you don't mention using multiple databases or other transactional resources, so do you really need XA at all?
So, first step: understand the requirements. Do you need transaction scopes that span several database interactions? If you are just doing a single query then XA is not needed. If you have a mixture of activities needing XA and simple queries not needing XA, then use two different connections, one with XA and one without - this clarifies your intention. However, I would expect XA drivers to use single-resource optimisations so that if XA is not needed you don't pay the overhead, so I suspect something more is going on here. (Disclaimer: I don't use JBoss, so my intuition is suspect.)
So look to see whether your transaction scopes are appropriate, isolation levels are sensible and so on. Are you getting contention because transactions are unreasonably long? For example, are transactions held over user think time?
Next, those multi-second waits: that implies contention (or some bizarre network issue). The only reason I can think of for an xa_start being slow is that writing a transaction log is taking unreasonably long. Are your logs perhaps on some slow network device? Waits for getConnection() might simply imply that your connection pool is too small (or that you're holding connections for too long). If xa_commit and xa_end are taking a long time, I'd want to know what the resource managers are doing. Can you get any info from the database server?
My overall position: if you truly need XA then you will pay some logging and network message overheads, but these should not be costing you hundreds or thousands of milliseconds. Most business systems need XA in a small subset of their overall resource accesses, typically when updating two otherwise independent systems, and almost never in read-only scenarios - absolute consistency across distributed systems is pretty much meaningless, so using XA for queries is almost certainly overkill.

Related

Should I keep this "GlobalConnection" or create connection for every query?

I have inherited a legacy Delphi application that uses ADO to connect to SQL Server.
The application has a notion of a "Global Connection" -- that is a single connection that it opens at the start, and then keeps open all throughout the running of the application (which can be days, weeks, or longer....)
So my question is this: Should I keep this way of doing things or should I switch to a "connect-query-disconnect" mode of doing things? Does it matter?
Switching would be a non-trivial task, but I'll do it if it means better performance, data management, etc.

Well, it depends on what you're expecting to get out of it, and what kind of application it is.
There's nothing in particular wrong with using a single long-running connection, as long as the application can gracefully handle disconnections and recover or log/notify when it can't reconnect.
The problem with a connect-query-disconnect setup is that you're adding the overhead of connecting and disconnecting on every query. That's going to slow things down, and in an interactive GUI application users may notice the additional overhead. You also have to make sure that authorization is transparently handled if it isn't already.
At the same time, there may be interactive performance gains to be had if you can push all the queries off onto background threads and asynchronously update the GUI. If contention appears because the queries are serialized, you can migrate to a connection-pool system fairly readily as well and improve things even more. This has a fairly high complexity cost though, so now you're looking at balancing the gains against the work involved.
Right now, my ultimate response is "if it ain't broke, don't fix it." Changes along the lines you propose are a lot of work -- how much do the users of this application stand to gain? Are there other problems to solve that might benefit them more?
Edit: Okay, so it's broke. Well, slow at least, which is all the same to me. If you've ruled out problems with the SQL Server itself, and the queries are performing as fast as they can (i.e. DB schema is sane, the right indexes are available, queries aren't completely braindead, server has enough RAM and fast enough I/O, network isn't flaky, etc.), then yes, it's time to find ways to improve the performance of the app itself.
Simply moving to a connect-query-disconnect is going to make things worse, and the more queries you're issuing the bigger the drop off is going to be. It sounds like you're going to need to rearchitect the app so that you can run fewer queries, run them in the background, cache more aggressively on the client, or some combination of all 3.
Don't forget that making the clients perform better means that server-side performance gets more important, since the server is probably going to be handling a higher load if clients start making multiple connections and issuing multiple queries in parallel.

As Mr. Frazier said before, one global connection is not bad per se.
If you intend to change, first detect WHAT the problem is. Let's look at some scenarios:
1
Some screens (in other words, a set of 1..n forms that operate on a business entity) are slow. Possible causes:
Insufficient filtering, resulting in a plethora of records being pulled from the database unnecessarily.
The number of records is OK, but they take too long to render. Solution: faster controls or intelligent rendering (e.g. virtual list views).
Too many queries each time you open a screen. Possible solutions: use TClientDataSet (or any in-memory dataset) to hold infrequently modified lookup tables. A more sophisticated cache for larger tables, or opening those datasets in other threads, can improve response times.
Scrolling on datasets with bound controls can be slow (just a reminder, because those little details are easily forgotten).
2
The whole app simply slows down. Checklist:
Are the network cards OK? A few malfunctioning network cards can wreak havoc even on well-structured networks, as they create unnecessary noise on the line.
[MSSQL DBA HAT ON] Next in the line of attack is SQL Server. Ask the DBA to trace blocks and deadlocks. Log slow queries and work on speeding them up. This relates directly to #1.1 and #1.3.
Detect whether some naive developer has run SELECTs inside transactions. Under read committed isolation it's just overhead, as it creates more network traffic. Open the query, retrieve the data and close the dataset.
Review the database schema, if you can.
Are any data-bound operations on a bulk of records (say, repricing some/most/all products) being done in the app? Write a stored procedure or refactor the operation into a single query; it will be much faster and will reduce the load on the entire server.
Extensive operations on a group of records? Learn how to do those operations at once on the server instead of record by record. See the examination of the most-used alternatives in SQL Server MVP Erland Sommarskog's article on arrays and lists in SQL Server.
Beware of queries with a WHERE clause like: WHERE SomeFunction(table1.blabla) = @SomeParam. Most of the time these will not use an index, so the entire table is read to select the desired data. If it is a big table.... Indexing a persisted computed column can work miracles... [MSSQL HAT OFF]
That's what I can think of without a little more detail... ;-)
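To make the last point concrete, here is a small sketch that asks the query planner what it does with a function-wrapped column versus a bare column. It uses Python's built-in sqlite3 module as a stand-in, not SQL Server, so the plan wording differs, and SQLite's index on an expression is only an analogue of an indexed persisted computed column. The table, column and index names are made up for illustration.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan detail text SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [(f"item{i}", float(i)) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_products_name ON products (name)")

# Function wrapped around the column: the index on name cannot be used for a lookup.
plan_wrapped = query_plan(
    conn, "SELECT price FROM products WHERE upper(name) = ?", ("ITEM42",)
)

# Bare column compared with the parameter: the index on name can be used.
plan_bare = query_plan(
    conn, "SELECT price FROM products WHERE name = ?", ("item42",)
)

# Index on the expression itself (SQLite's rough analogue of an indexed computed column).
conn.execute("CREATE INDEX idx_products_upper_name ON products (upper(name))")
plan_expr_index = query_plan(
    conn, "SELECT price FROM products WHERE upper(name) = ?", ("ITEM42",)
)

for label, plan in [("wrapped", plan_wrapped), ("bare", plan_bare), ("expr index", plan_expr_index)]:
    print(f"{label}: {plan}")
```

On SQL Server the equivalent check would be to compare the actual execution plans (scan versus seek) for the two forms of the predicate.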

Database connections are time-consuming resources to create, and the rule of thumb should be to create as few as possible and reuse them as much as possible. That's why some other technologies have database connection pools, which are typically established at application/service startup, then kept as long as possible and shared among threads.
From your comment, the application has performance issues, but without more details it's difficult to make any recommendation.
You should try to nail down what is slow - are all queries slow, or just some specific ones?
If it's just some specific ones, is there some correlation between them?
My 2 cents.
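For readers who haven't seen one, a connection pool can be sketched in a few lines. This is a minimal illustration in Python with sqlite3 standing in for the real driver, not how ADO or any particular framework implements pooling; the class name and the `opened` and `last_wait` attributes are invented for the example.

```python
import queue
import sqlite3
import time
from contextlib import contextmanager

class ConnectionPool:
    """Fixed-size pool: connections are opened once at startup and reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        self.opened = 0       # how many physical connections were created
        self.last_wait = 0.0  # seconds the most recent caller waited for a connection
        for _ in range(size):
            self._idle.put(factory())
            self.opened += 1

    @contextmanager
    def connection(self, timeout=5.0):
        start = time.perf_counter()
        # Blocks while every connection is checked out; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        self.last_wait = time.perf_counter() - start
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back instead of closing it

# check_same_thread=False lets sqlite3 connections be shared between threads.
pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
)

results = []
for _ in range(10):
    with pool.connection() as conn:
        results.append(conn.execute("SELECT 1").fetchone()[0])

print(f"{len(results)} queries served by {pool.opened} connections")
```

A long `last_wait` would indicate that the pool is too small or that callers hold connections for too long.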

Which built-in Postgres replication fits my Django-based use-case best?

I've noticed that Postgres now has built-in replication, including synchronous replication, streaming replication and some other variants. It even provides the ability to control synchrony for specific operations at the application level (e.g., use synchronous for important stuff like money transfers, but maybe not for less critical things like user comments, etc.).
I'm working on software using Django 1.5 (i.e., dev) and will possibly need synchronous replication (there will be commerce-related transactions going on).
Do you think that the built-in tools are best for the job in most cases, and do you have any thoughts on one variant of the built-in replication vs another, ease of use related, quality, etc.?
One final thing; Slony and PGPool II seem to be pretty popular (Slony, in particular) for replication. Is there A) a particular, technical reason for their popularity over built-in replication or B) is it just because a lot of people are using versions that don't have built-in replication, or C) have I been under a rock and PG built-in replication is already quite popular?
Update (more details)
I have only 2 physical servers, and they're located in the same rack. My intention is to provide a slave which can automatically turn into the master, should something go catastrophically wrong in one machine (or even something simple like a double power supply failure, etc.). I don't mind if my clients experience downtime during an automatic failover, so long as the downtime is a few minutes or so, not an hour or something.
I would like zero data loss, and am willing to sacrifice more time in the failover process for that. Is there a way to make this trade-off without going for synchronous replication (e.g., streaming logs without write-back confirmation or something)?
What strategy or variant of replication would you recommend?

I think you misunderstand the benefits and cost of synchronous commits on replication. In PostgreSQL, replication works by recovering slaves up to the master, using the standard crash recovery features of PostgreSQL. In the event that, for example, the power goes out, you can be sure that the write-ahead log segments will be run on both master and slave. With asynchronous commit, the commit is written to the WAL, the application is notified and the slave is notified more or less all at the same time depending on network characteristics, etc.
Where synchronous commit comes in handy is where two things are true:
You have more than one slave (this is critical!) and
You need durability guarantees that asynchronous commits can't offer you.
With synchronous commit, the database waits until it hears back from a configurable number of slaves to tell the application that the commit has happened. This offers durability guarantees in a few cases where asynchronous commits are unable to work.
For example, suppose your master server takes a bullet through a RAID array and immediately crashes (sorry, I couldn't think of any better examples with good hardware). Or suppose someone trips on a power cord and not only powers off the server but corrupts the software RAID device. In this case it is possible that a couple of transactions may not have been replicated and your WAL is unrecoverable, so those transactions are lost. With synchronous commit, the application would have waited until durability guarantees were met.
One thing this means is that if you do synchronous commit with only one slave, your availability cannot outlast a crash on either master or slave, so your availability will be worse than it would have been with just one server. It also means that if your slave is geographically removed, you introduce significant latency in your application's requests to commit transactions.
Switching from async to sync commit is not a big change, but generally I think that you get the most out of sync commit when you have already done as much as you can, assurance- and availability-wise, on your hardware. Start with async and move up when you can't further optimize your setup as async.

Re: "Slony and PGPool II seem to be pretty popular (Slony, in particular) for replication. Is there A) a particular, technical reason for their popularity over built-in replication or B) is it just because a lot of people are using versions that don't have built-in replication, or C) have I been under a rock and PG built-in replication is already quite popular?"
Slony is popular because it has been around for quite a long time, and the built-in PostgreSQL replication is relatively new. Cascading replication built in to PostgreSQL is even newer, and is something that Slony-I was built with.
There are two main advantages to Slony-I. First, you can replicate between differing versions of PostgreSQL, whereas the built-in replication system not only must use the same version, but the two servers must also be binary compatible. The other advantage is that you can replicate only certain tables with Slony-I instead of the whole database cluster. The disadvantages of Slony-I are numerous, and include poor user-friendliness, no synchronous commits, and difficult DDL (schema) changes. I believe that use of the built-in replication in Postgres will quickly exceed the Slony-I user base if it hasn't already done so.
As far as I remember, PGPool II uses statement-based replication (like what MySQL has had built-in), and is definitely the least desirable, in my opinion.
I would use the built-in hot standby/streaming replication in PostgreSQL. You can start with synchronous commit turned on and turn it off if you don't need it or the penalty is too high, or vice versa. Over a LAN, asynchronous mode seems to reach the slave in the order of a hundred milliseconds or so (from my own experience).
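The per-operation control mentioned in the question comes down to setting synchronous_commit inside the transaction. The sketch below assumes a DB-API driver such as psycopg2 in its default (non-autocommit) mode, where the first execute opens the transaction; the helper names and the statements passed in are hypothetical.

```python
# Values PostgreSQL accepts for synchronous_commit. Which ones are available
# depends on the server version (remote_write and remote_apply are later additions).
VALID_MODES = {"on", "off", "local", "remote_write", "remote_apply"}

def synchronous_commit_statement(mode):
    """Build the statement that sets synchronous_commit for the current transaction only."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported synchronous_commit mode: {mode!r}")
    return f"SET LOCAL synchronous_commit TO {mode}"

def run_transaction(conn, statements, mode="on"):
    """Run (sql, params) pairs in one transaction with the given commit mode.

    conn is assumed to be a DB-API connection that starts a transaction
    implicitly on the first execute, as psycopg2 does by default.
    """
    cur = conn.cursor()
    try:
        cur.execute(synchronous_commit_statement(mode))  # lasts until COMMIT/ROLLBACK
        for sql, params in statements:
            cur.execute(sql, params)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
```

With this, a money transfer could call run_transaction(conn, statements, mode="on") and wait for the standby, while a user comment could pass mode="off" and return as soon as the commit is queued locally.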

Is a relational database appropriate for a soft real-time system?

I'm working on a real-time video analysis system which processes the video stream frame by frame. At each frame it can generate several events which should be recorded and some delivered to another system via network. The system is soft real-time, i.e. message latencies higher than 25ms are highly undesirable, but not fatal.
Are relational databases (specifically, MySQL and Postgres) appropriate as the datastore for such a system?
Can I expect the DB to work well when it is installed on its own server and has ~50 25fps streams of single-row SQL inserts coming in over the network?
EDIT: I think in general performance would not be a problem, but I worry about the latency variance. If it will occasionally delay for 1000 ms, that would be very bad.
Oh, and the system runs 24/7 so the DB could grow arbitrarily big. Does that degrade the insert latency?

I wouldn't worry too much about performance when choosing a relational database over another type of datastore; choose the solution that best meets your requirements for accessing that data later. However, if you do choose not only an RDBMS but one accessed over the network, then you might want to consider buffering events to a local disk briefly on their way over to the DB. Use a separate thread or process to push events into the DB so that the real-time system stays unaffected.
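A rough sketch of the separate-writer idea, in Python with sqlite3 standing in for the networked database: the frame loop only enqueues, and a background thread drains the queue in batches. The in-memory queue here replaces the local-disk buffer mentioned above, and the table layout, batch size and queue size are assumptions made for the example.

```python
import os
import queue
import sqlite3
import tempfile
import threading

_STOP = object()  # sentinel: tells the writer to flush what it has and exit

def writer_loop(db_path, events, batch_size=100):
    """Drain the queue and insert events in batches, off the real-time thread."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (stream_id INTEGER, frame INTEGER, label TEXT)"
    )
    running = True
    while running:
        batch = [events.get()]  # block until at least one item arrives
        while len(batch) < batch_size:
            try:
                batch.append(events.get_nowait())
            except queue.Empty:
                break
        if batch[-1] is _STOP:  # the producer enqueues the sentinel last
            batch.pop()
            running = False
        with conn:  # one transaction per batch
            conn.executemany("INSERT INTO events VALUES (?, ?, ?)", batch)
    conn.close()

db_path = os.path.join(tempfile.mkdtemp(), "events.db")
events = queue.Queue(maxsize=10_000)
writer = threading.Thread(target=writer_loop, args=(db_path, events), daemon=True)
writer.start()

# The frame-processing loop never waits on the database. put_nowait raises
# queue.Full if the writer falls too far behind, which the caller must handle.
for frame in range(250):
    events.put_nowait((1, frame, "motion"))

events.put(_STOP)
writer.join()

check = sqlite3.connect(db_path)
stored = check.execute("SELECT COUNT(*) FROM events").fetchone()[0]
check.close()
print(f"stored {stored} events")
```

The bounded queue is a deliberate choice: it makes a stalled database visible as queue.Full instead of as unbounded memory growth.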

The biggest problems are how unpredictable the latency will be and how it never goes down, always up. But modern hardware comes to the rescue: specify a machine with enough CPU cores. You can count on at least two, and getting four is easy. So you can spin up a thread and dedicate one core to the database updates, isolating them from your soft real-time code. Now you don't care about the variability in the delays, at least as long as the database updates don't take so long that you generate data faster than the database can consume it.
Set up a database server and load it up with fake data, double the amount you think it will ever need to store. Test continuously while you develop, and add the instrumenting code you need to measure how it is doing at an early stage in the project.

As I've written, if you queue the rows that need to be saved and save them in an async way (so not to stop the "main" thread) there shouldn't be any problem... BUT!!!
You want to save them in a DB... So someone else will read the rows AT THE SAME TIME they are being written. Sadly it's normally quite difficult to tell a DB "this work is very high priority, everything else can be stalled but not this". So if someone does:
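BEGIN TRANSACTION
SELECT COUNT(*) FROM SomeTable WITH (TABLOCK, HOLDLOCK)
WAITFOR DELAY '01:00:00'
(I'm using T-SQL here, but I think it's quite clear: ask for the COUNT(*) of the table with hints that keep a shared table lock until the transaction ends, then WAITFOR an hour. Without such hints, the default READ COMMITTED level releases the shared locks as soon as the SELECT finishes.)
then the writes could be stalled and time out. In general, if you configure everyone but the app to be able to do only reads, and to avoid holding locks like this, these problems shouldn't be present.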

What are the pros and cons of a distributed second level cache versus focusing on tuning database

We have a website that uses NHibernate and its second-level cache. We are having a debate, as one person wants to turn off the second-level cache because we are moving to a multi-webserver environment (with a load balancer in front).
One argument is to get rid of the second-level cache and focus on optimizing and tuning the DB. The other argument is to roll out a distributed cache as the second-level cache.
I am curious to hear folks' pros and cons of DB tuning versus a distributed cache (factoring in effort involved, cost, complexity, etc.).

In a load-balancing scenario you have to use a distributed cache provider to get the best performance and consistency; that has nothing to do with optimizing your database. In any scenario you should optimize your database.

Both. You should have a distributed cache to prevent unnecessary calls to the database, and a tuned database so the initial calls are quickly returned. As an example, Facebook required a significant amount of caching to scale, but I'm sure it wouldn't do much good if the initial queries took 10 minutes. :)

Two words: measure it.
Since you already have the cache implemented, you can probably measure what the impact of turning it off would be, for benchmark purposes.

I would think that a multi-web server and a distributed second level cache can -and probably should- coexist.
First of all, if we take memcached as an example, it supports distributed object storage, so if you're not using that, you could switch to it. It works.
Secondly, I'm guessing that you're introducing the web-server farm to respond to increasing web requests, which will in turn mean increasing requests for data. If you kill your caching, it won't matter how much you optimize your database: you are going to thrash it with queries. You may improve query execution time, but requests will still spend their time waiting for the database to return data.
This is especially true for the case that web-node 1 requests dataset A and web-node 2 requests dataset A --> you are going to do the same query twice while with second level caching you only do it once.
So my recommendation is:
Don't kill your second level cache. You have already spent resources to implement it and by disabling it you are NOT going to improve your application's performance. Even a single node of memcached is going to be faster than having none at all.
Do optimize your database operations. This means both on the database side (indexes, views, stored procedures, functions, perhaps a cluster with read-only and write-only nodes) and on the application side (optimize your queries, profile lazy/eager loading, don't fetch data you don't need, combine multiple queries into single round trips via Future, MultiQuery, MultiCriteria).
Do optimize your second-level cache implementation. There are datasets that never expire, so you query the DB for them only once, and there are datasets that have short expiration times, so probably expensive queries are executed more frequently. By optimizing your queries and your DB you are going to improve the performance of the queries, but the second-level cache is going to save your skin at peak load, when short-expiration datasets will be fetched from the cache more frequently.
If textual queries are an everyday operation, use the database's full-text capabilities or, even better, use an independent service like Lucene.NET (which can be integrated with NHibernate via NHibernate.Search).
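To illustrate the "same query twice versus once" point and the role of expiration times, here is a toy read-through cache in Python. It is an in-process dictionary, so it only stands in for a shared store such as memcached, and it says nothing about how NHibernate's second-level cache is implemented or about invalidation on writes. The class name, the loader and the 30-second expiry are assumptions for the example; the clock is injectable so the expiry can be shown without sleeping.

```python
import time

class ReadThroughCache:
    """Return cached values until they expire; otherwise call the loader (the database)."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.loads = 0      # how many times the database was actually hit

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # still fresh: no database round trip
        value = self._loader(key)      # missing or expired: query the database
        self.loads += 1
        self._entries[key] = (now + self._ttl, value)
        return value

fake_now = [0.0]  # controllable clock for the demonstration
cache = ReadThroughCache(
    loader=lambda key: f"rows for {key}",
    ttl_seconds=30,
    clock=lambda: fake_now[0],
)

first = cache.get("dataset A")   # web node 1: goes to the database
second = cache.get("dataset A")  # web node 2: served from the shared cache
fake_now[0] = 31.0
third = cache.get("dataset A")   # entry has expired: loaded again

print(f"3 requests, {cache.loads} database loads")
```

Counting loads against requests is also a cheap way to measure whether a given expiration time is pulling its weight.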

That's a very difficult topic. In either case you need proficiency. Either a very proficient DBA, or a very proficient NHibernate / Cache administrator.
Personally, I prefer having full control over my SQL and tuning the database. Since you only have multiple webservers (and not necessarily multiple database instances), you might be better off that way, too. Modern databases have very efficient caches, so you usually do more harm with a badly configured second-level cache in the application than by just letting the database cache SQL statements, cursors, data, buffers, etc. I have seen this work very well for around 15 WebLogic servers and only one database with lots of memory.
Since you do have NHibernate already, though, moving away from it, back to SQL (maybe with LINQ?) might be quite a costly task, that's not worth the effort.

We use NHibernate's 2nd level cache in our multi-server environment using Microsoft AppFabric distributed cache framework (NHibernate Velocity Provider) with great success.
Having said that, using 2nd level cache requires deeper understanding of the framework to prevent unexpected results. In addition, before using distributed caches, it is important to measure their overhead.
So my answer is basically - before using 2nd-level cache, you should really test and see whether it is really needed.

Quick and dirty way to compare SQL server performance

Further to my previous question about the Optimal RAID setup for SQL server, could anyone suggest a quick and dirty way of benchmarking the database performance on the new and old servers to compare them? Obviously, the proper way would be to monitor our actual usage and set up all sorts of performance counters and capture the queries, etc., but we are just not at that level of sophistication yet and this isn't something we'll be able to do in a hurry. So in the meanwhile, I'm after something that would be a bit less accurate, but quick to do and still better than nothing. Just as long as it's not misleading, which would be worse than nothing. It should be SQL Server specific, not just a "synthetic" benchmark. It would be even better if we could use our actual database for this.

Measure the performance of your application itself with the new and old servers. It's not necessarily easy:
Set up a performance test environment with your application on it (depending on your architecture this may consist of several machines, some of which can be VMs and some of which cannot).
Create "driver" program(s) which give the application simulated work to do.
Run batches of work under the same conditions - remember to reboot the database server between runs to nullify the effects of caching (otherwise your 2nd and subsequent runs will probably be amazingly fast).
Ensure that the performance test environment has enough physical machines to load the database heavily - this may mean swapping out some VMs for real hardware.
Remember to use production-grade hardware in your performance test environment - even if it is expensive.
Our database performance test cluster contains six hardware machines, several of which are production-grade, one of which contains an expensive storage array. We also have about a dozen VMs on a 7th simulating other parts of the service.

You can always insert, read, and delete a couple of million rows - it's not a realistic mix of operations but it should strain the disks nicely...
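A quick sketch of that insert/read/delete run, timing each phase. It is written in Python against the DB-API and demonstrated on sqlite3 so that it is self-contained; pointing it at SQL Server would need a driver connection instead (for example pyodbc, which also uses `?` placeholders), and that part is an assumption that has not been tried here. The scratch table name and row count are arbitrary.

```python
import sqlite3
import time

def crud_benchmark(conn, rows=100_000):
    """Time bulk insert, full read and delete on a scratch table.

    Returns (timings, rows_read), where timings maps phase name to seconds.
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE bench_scratch (id INTEGER PRIMARY KEY, payload VARCHAR(100))")
    timings = {}

    start = time.perf_counter()
    cur.executemany(
        "INSERT INTO bench_scratch (id, payload) VALUES (?, ?)",
        ((i, "x" * 100) for i in range(rows)),
    )
    conn.commit()
    timings["insert"] = time.perf_counter() - start

    start = time.perf_counter()
    cur.execute("SELECT id, payload FROM bench_scratch")
    rows_read = len(cur.fetchall())
    timings["read"] = time.perf_counter() - start

    start = time.perf_counter()
    cur.execute("DELETE FROM bench_scratch")
    conn.commit()
    timings["delete"] = time.perf_counter() - start

    cur.execute("DROP TABLE bench_scratch")  # leave nothing behind
    conn.commit()
    return timings, rows_read

timings, rows_read = crud_benchmark(sqlite3.connect(":memory:"), rows=10_000)
for phase, seconds in timings.items():
    print(f"{phase}: {seconds:.3f}s")
```

Run it with the same row count on the old and new servers, several times each, and compare the spread as well as the averages.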

Find at least a couple of the queries that are taking some time, or at least that you suspect are taking time, insert a lot of data if you don't have it already, and run the queries having set:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SET STATISTICS PROFILE ON
Those should give you a rough idea of the resources being consumed.
You can also run SQL Server Profiler to get a general idea of what queries are taking a long time and how long they are taking plus other statistics. It outputs a lot of data so try to filter it down a little bit, possibly by long duration or one of the other performance statistics.