Use cases for BMC and BMT in EJB 3.1

I am analyzing the possible use cases in which Bean-Managed Transactions (BMT) and Bean-Managed Concurrency (BMC) can be used. The following is the result of my groundwork on BMT:
Transactions that may run for an unpredictable time
The container imposes a timeout on transactions (even though it is configurable).
For some transactions (e.g. FTPing files to a third-party vendor, persisting the results of a JMS queue distribution, or committing across two different data sources), transaction times can be quite unpredictable.
In such cases, the bean provider may take control of transaction management using BMT.
Sticking with CMT in such cases might result in a lot of exceptions and transaction failures.
Multiple transactions within a single bean method
CMT allows only a single transaction per bean method.
If we need to use many transactions, we must use BMT. Of course it may be possible, in most cases, to find a workaround in CMT, but the workaround may not always be optimal.
Single transaction that spans multiple EJB method calls
If multiple methods must be called within the scope of the same transaction, BMT can be used.
However, I have not found any interesting use case for BMC. The only information I was able to get is from Enterprise JavaBeans 3.1 (6th edition), which says:
" ... container-managed concurrency does not cover the full breadth of concerns that multithreaded code must address. In these cases the specification makes available the full power of the Java languageā€™s concurrent tools by offering a bean-managed concurrency mode."
It would be really nice if you could share your experiences with specific use cases where you have used BMT or BMC.

BMT is also useful if you want to begin/commit transactions multiple times from within the same bean method. It's also useful if you want precise control over exceptions thrown during beforeCompletion; for example, JPA optimistic locking can throw an exception during commit that can only be handled from within the EJB if you're using BMT.
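For illustration, a minimal hedged sketch of the begin/commit-multiple-times pattern with the standard UserTransaction API (the bean name, batch type and persist() helper are made up):

import java.util.List;
import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.ejb.TransactionManagement;
import javax.ejb.TransactionManagementType;
import javax.transaction.UserTransaction;

@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class BatchImporter {

    @Resource
    private UserTransaction utx;

    // Commit each batch in its own transaction so one long-running import
    // is not bound by a single container-imposed transaction timeout.
    public void importAll(List<List<String>> batches) throws Exception {
        for (List<String> batch : batches) {
            utx.begin();
            try {
                persist(batch); // hypothetical per-batch work
                utx.commit();
            } catch (Exception e) {
                utx.rollback();
                throw e;
            }
        }
    }

    private void persist(List<String> batch) {
        // ... write the batch to the data store of your choice
    }
}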
BMC is useful if your methods are delegating to another service that already handles concurrency. For example, if you have a singleton that is managing data using a ConcurrentHashMap, then you don't need the container to perform additional synchronization.
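A minimal sketch of that situation, assuming EJB 3.1 annotations (the bean name is made up); the map itself does the synchronization, so container locking would only add contention:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.ejb.ConcurrencyManagement;
import javax.ejb.ConcurrencyManagementType;
import javax.ejb.Singleton;

@Singleton
@ConcurrencyManagement(ConcurrencyManagementType.BEAN)
public class SettingsCache {

    // Thread safety is delegated entirely to the ConcurrentHashMap,
    // so the container's read/write locks are not needed.
    private final ConcurrentMap<String, String> cache =
            new ConcurrentHashMap<String, String>();

    public String get(String key) {
        return cache.get(key);
    }

    public void put(String key, String value) {
        cache.put(key, value);
    }
}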

Related

Distributed transactions - why do we save tranlogs to file system?

All transaction managers (Atomikos, Bitronix, IBM WebSphere TM, etc.) save "transaction logs" into a 'tranlogs' folder on the file system.
When something terrible happens and the server goes down, the tranlogs sometimes become corrupted.
They then require a manual recovery procedure.
I've been told that by simply clearing the broken tranlogs folder I risk ending up with an inconsistent state across the resources that participated in the transactions.
As a "dumb" developer I feel more comfortable with simple concepts. I want to think that distributed transaction management should work like regular transaction management:
If something goes wrong at any party (network, app error, timeout), I expect the whole multi-resource transaction not to be committed in any part. All leftovers should be cleaned up sooner or later automatically.
If the transaction manager fails (file system fault, power supply fault), I expect all the transactions under this TM to be rolled back (apparently, at the DB timeout level).
File storage for tranlogs should be optional if I don't want any automatic TX recovery (whatever that would mean).
Questions
Why can't I think like this? What's so complicated about 2PC?
What are the exact risks when I clear broken tranlogs?
If I am wrong and I really do need all this mess with 2PC file system state: don't you feel sick about the fact that the TX manager can actually break storage state in such an easy and ugly manner?
When I was first confronted with two-phase commit in real life in 1994 (initially in a larger Oracle7 environment), I had a similar initial reaction. What a bloody shame that it is not generally possible to make it simple. But looking back at my university algorithm books, it became clear that there is no general solution for 2PC.
See, for instance, how to come to consensus in a distributed environment.
Of course, there are many specific cases where a 2PC commit of a transaction can be resolved more easily, either completing or rolling back entirely and with less impact. But the general problem remains and cannot be solved.
In this case, a transaction manager has to decide at some point what to do; a transaction cannot remain open forever. Therefore, as the ultimate fallback, it will always need to go back to its own transaction logs, since one or more of the other parties may not be able to reliably communicate status now or in the near future. Some transaction managers might be more advanced and know how to resolve certain cases more easily, but the need for an ultimate fallback stays.
I am sorry for you. Fixing this in general seems to be equivalent to "falsity implies anything" in binary logic.
Summarizing
On Why can't I think like this? and What's so complicated about 2PC?: see above. This algorithmic problem can't be solved universally.
On What are the exact risks when I clear broken tranlogs?: the transaction manager has some database backing it. Deleting tranlogs poses the same problem as in general relational database software: you lose information on the transactions in progress. Some DB platforms can still end up with somewhat or largely intact files. For background and some database theory, see Wikipedia.
On Don't you feel sick about the fact that the TX manager can actually break storage state in an easy and ugly manner?: yes, sometimes when I have to get a lot of work done by the team, I really hate it. But well, it keeps me in a job :-)
Addition: to 2PC or not
From your addition I understand that you are weighing whether or not to include 2PC in your projects.
In my opinion, your mileage may vary. Our company's policy on 2PC is: avoid it whenever possible. However, in some environments, especially with legacy systems and in complex environments such as those found in banking, you cannot work around it. The customer requires it, and they may not be willing to let you make a major change to other infrastructural components.
When you must do 2PC: do it well. I like a clean architecture of the software and infrastructure, something so simple that even five years from now it is clear how it works.
For all other cases, we stay away from two-phase commit. We have our own framework (Invantive Producer) spanning client, application server and database backend. In this framework we have chosen to sacrifice elements of ACID when working in a distributed environment. The application developer must take care of, for instance, atomicity himself. Often that is possible with little effort, or doesn't even require much thought. For instance, all software must be safe for restart. Even with atomic transactions this requires some thinking to get right in a massively multi-user environment (for instance, locking issues).
In general this stupid approach is very easy to understand and maintain. In the cases where we have been required to do two-phase commit, we have been able to just replace some plug-ins in the framework and make some changes to client-side code.
So my advice would be:
Try to avoid 2PC.
But encapsulate your transaction logic nicely.
This allows 2PC to be added later without a complete rebuild, changing things only where needed.
I hope this helps you. If you can tell me more about your typical environments (number of tables, GB of persistent data, number of concurrent users, typical transaction management software and platform), maybe I can make some additions or improvements.
Addition: Email and avoiding message loss in 2PC
Regarding the suggestion of combining the DB with JMS: no, combining the DB with JMS is normally of little use; the JMS implementation will itself already have some DB backing it, hence the original question about transaction logs.
Regarding your business case: I understand that per event an email is sent from a template and that the outgoing mail is registered as an event in the database.
This is a hard nut to crack; I've enjoyed doing security audits, and one of the easiest security issues to score was checking the use of email.
Email, besides being neither confidential nor tamper-safe in most situations (much like a postcard), comes with no guarantees of delivery and/or reading without additional measures. For instance, even when email is delivered directly between your mail transfer agent and the recipient, data loss can occur without the transaction monitor being informed. That only gets worse when multiple hops are involved. For instance, each MTA has its own queueing mechanism on which a "bomb can be dropped", leading to data loss. But you can also think of spam measures, bad configuration, mail loops, deleting a file by accident, etc. Even if you can register the sending of the email without any loss of transaction information using 2PC, this gives absolutely no clue as to whether the email will arrive at all or even make it across the first hop.
The company I work for sells a large software package for project-driven businesses. This package has an integrated queueing mechanism, which also handles email events, typically combined with Exchange in most implementations nowadays. A few months ago we had a nice problem: transaction started, mail channel opened, mail delivered to Exchange as the MTA, "mail was handled" registered... and then the transaction aborted, since an Oracle tablespace was full. On the next run, the mail was delivered to Exchange again, the transaction aborted again, and so on. The algorithm has since been enhanced, but from this simple example you can see that you need all endpoints to cooperate in your 2PC, even when some of those endpoints are far away in an organisation receiving and displaying your email.
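The enhancement is not spelled out above, so the following is only a hedged sketch of one common fix, assuming a mail_log table with a unique constraint on event_id (the class, table and column names are all made up): record the hand-off to the MTA on its own immediately committed connection, so that a retry of the aborted business transaction can see that the mail already left.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class MailSendGuard {
    private final DataSource ds;

    public MailSendGuard(DataSource ds) {
        this.ds = ds;
    }

    // Returns true exactly once per event id; the row commits independently of
    // the caller's transaction, so an abort there cannot undo this record.
    public boolean markSentOnce(long eventId) throws Exception {
        Connection c = ds.getConnection();
        try {
            c.setAutoCommit(true); // commit separately from the business transaction
            PreparedStatement check =
                c.prepareStatement("SELECT 1 FROM mail_log WHERE event_id = ?");
            check.setLong(1, eventId);
            ResultSet rs = check.executeQuery();
            if (rs.next()) {
                return false; // already handed to the MTA on an earlier run
            }
            PreparedStatement ins =
                c.prepareStatement("INSERT INTO mail_log (event_id) VALUES (?)");
            ins.setLong(1, eventId);
            ins.executeUpdate(); // unique constraint closes the check/insert race
            return true;         // safe to hand the mail to the MTA now
        } finally {
            c.close();
        }
    }
}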
If you need measures to ensure that an email is delivered or read, you will need to supplement it with additional measures; pick from the application controls, user controls and process controls described in the literature.

Does an additional Solr search handler have any impact on performance?

I have three custom-defined search handlers for Solr 4. They work fine; however, I want to know whether they have any impact on RAM and CPU utilization and on overall performance, considering an index size of 10 GB with a replication setup.
I cannot find any documentation on this. Any ideas would be great.
Or do you recommend sticking to the default handlers, or using a single handler? Why?
Just by defining more request handlers, no, you should not see a performance impact. As quoted from the Reference Docs:
A request handler is a Solr plug-in that defines the logic to be used when Solr processes a request.
So a request handler is the blueprint of the processing logic to perform when a request comes in. Merely defining more ways a request can be handled does not carry a performance impact. Of course, this could become a matter of numbers: if you defined hundreds or thousands of handlers, that could have an impact, but I have never seen such a configuration.
The operations that consume CPU and RAM are the ones that interact with the index, like searching and indexing. When your clients start using the new request handlers, you will see increased resource consumption, but that is due to the usage by the clients, not the mere definition of the way the index is consumed.
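To make that concrete, here is a minimal SolrJ sketch, assuming the Solr 4-era HttpSolrServer client and a hypothetical handler named /myCustomHandler registered in solrconfig.xml; routing to the custom handler is just a different request path, and nothing is paid until a query actually runs:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CustomHandlerQuery {
    public static void main(String[] args) throws Exception {
        // Solr 4-era client; the core URL is a placeholder.
        HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("title:foo");
        // Route to the custom handler instead of the default /select.
        q.setRequestHandler("/myCustomHandler");
        QueryResponse rsp = server.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
        server.shutdown();
    }
}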

GAE transaction failure and idempotency

The Google App Engine documentation contains this paragraph:
Note: If your application receives an exception when committing a transaction, it does not always mean that the transaction failed. You can receive DatastoreTimeoutException, ConcurrentModificationException, or DatastoreFailureException exceptions in cases where transactions have been committed and eventually will be applied successfully. Whenever possible, make your Datastore transactions idempotent so that if you repeat a transaction, the end result will be the same.
Wait, what? It seems like there's a very important class of transactions that just simply cannot be made idempotent because they depend on current datastore state. For example, a simple counter, as in a like button. The transaction needs to read the current count, increment it, and write out the count again. If the transaction appears to "fail" but doesn't REALLY fail, and there's no way for me to tell that on the client side, then I need to try again, which will result in one click generating two "likes." Surely there is some way to prevent this with GAE?
Edit:
It seems that this is a problem inherent in distributed systems, as per none other than Guido van Rossum; see this link:
app engine datastore transaction exception
So it looks like designing idempotent transactions is pretty much a must if you want a high degree of reliability.
I was wondering if it is possible to implement a global system across a whole app for ensuring idempotency. The key would be to maintain a transaction log in the datastore. The client would generate a GUID and include it with the request (the same GUID would be re-sent on retries of the same request). On the server, at the start of each transaction, it would look in the datastore for a record in the Transactions entity group with that ID. If it found one, this is a repeated transaction, so it would return without doing anything.
Of course this would require enabling cross-group transactions, or having a separate transaction log as a child of each entity group. Also, there would be a performance hit if failed entity-key lookups are slow, because almost every transaction would include a failed lookup, since most GUIDs would be new.
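A minimal sketch of that scheme using the low-level datastore API, assuming cross-group (XG) transactions are enabled; the TxnLog kind and the IdempotentRunner class are hypothetical names:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;
import com.google.appengine.api.datastore.TransactionOptions;

public class IdempotentRunner {
    private final DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

    // Runs the work at most once per client-supplied GUID.
    public void runOnce(String guid, Runnable work) {
        // XG transaction so the log entity and the business entities
        // may live in different entity groups.
        Transaction txn = ds.beginTransaction(TransactionOptions.Builder.withXG(true));
        try {
            Key logKey = KeyFactory.createKey("TxnLog", guid);
            try {
                ds.get(txn, logKey); // found: this GUID was already applied
                txn.rollback();
                return;
            } catch (EntityNotFoundException e) {
                // first time we see this GUID; fall through and do the work
            }
            work.run();                       // the actual datastore mutations
            ds.put(txn, new Entity(logKey));  // record the GUID in the same txn
            txn.commit();
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}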
In terms of additional datastore interactions (and thus additional cost), this would probably still be cheaper than making every transaction individually idempotent, since that would require a lot of checking of what's in the datastore at every level.
Dan Wilkerson, Simon Goldsmith, et al. designed a thorough global transaction system on top of App Engine's local (per-entity-group) transactions. At a high level, it uses techniques similar to the GUID one you describe. Dan dealt with "submarine writes," i.e. the transactions you describe that report failure but later surface as succeeded, as well as many other theoretical and practical details of the datastore. Erick Armbrust implemented Dan's design in tapioca-orm.
I don't necessarily recommend that you implement his design or use tapioca-orm, but you'd definitely be interested in the research.
In response to your questions: plenty of people implement GAE apps that use the datastore without idempotency. It's only important when you need transactions with certain kinds of guarantees, like the ones you describe. It's definitely important to understand when you do need them, but you often don't.
The datastore is implemented on top of Megastore, which is described in depth in this paper. In short, it uses multi-version concurrency control within each entity group and Paxos for replication across datacenters, both of which can contribute to submarine writes. I don't know if there are public numbers on submarine-write frequency in the datastore, but if there are, searches with these terms and on the datastore mailing lists should find them.
Amazon's S3 isn't really a comparable system; it's more of a CDN than a distributed database. Amazon's SimpleDB is comparable. It originally provided only eventual consistency, and later added a very limited kind of transaction called conditional writes, but it doesn't have true transactions. Other NoSQL databases (Redis, Mongo, CouchDB, etc.) have different variations on transactions and consistency.
Basically, there's always a tradeoff in distributed databases between scale, transaction breadth, and strength of consistency guarantees. This is best captured by Eric Brewer's CAP theorem, which says the three axes of the tradeoff are consistency, availability, and partition tolerance.
The best way I came up with to make counters idempotent is to use a set instead of an integer to count. Thus, when a person "likes" something, instead of incrementing a counter I add the like to the thing, like this:
import java.util.HashSet;
import java.util.Set;

class Thing {
    // Each user appears at most once, so repeated "likes" are idempotent.
    Set<User> likes = new HashSet<User>();

    public void like(User u) {
        likes.add(u);
    }

    public Integer getLikeCount() {
        return likes.size();
    }
}
This is in Java, but I hope you get my point even if you are using Python.
This method is idempotent: you can add a single user as many times as you like, and it will only be counted once. Of course, it has the penalty of storing a huge set instead of a simple counter. But hey, don't you need to keep track of the likes anyway? If you don't want to bloat the Thing object, create another object, ThingLikes, and cache the like count on the Thing object.
Another option worth looking into is App Engine's built-in cross-group transaction support, which lets you operate on up to five entity groups in a single datastore transaction.
If you prefer reading on Stack Overflow, this SO question has more details.

Slow XA transactions in JBoss

We are running JBoss 4.2.2 with SQL Server 2005 (sqljdbc driver 1.2).
We have recently installed New Relic and can see a large bottleneck in our transactions.
Generally, for any one web request the bottleneck sits on one of these:
master..xp_sqljdbc_xa_start
master..xp_sqljdbc_xa_commit
org.jboss.resource.adapter.jdbc.WrapperDataSource.getConnection()
master..xp_sqljdbc_xa_end
Several hundred milliseconds are spent on one of these items (in some cases several seconds). Cumulatively, most of the response time is spent on these items.
I'm trying to identify whether it's any of the following:
Will moving away from XA transactions help?
Is there a larger problem with my database that I don't have visibility over?
Can I upgrade my SQL driver to help with this?
Or is this an indication that there are just a lot of queries, and we should start by looking at our code, and trying to lower the number of queries overall?
XA transactions are necessary if you are performing work against more than one resource in a single transaction; if you need consistency, then you need XA. However, you are talking in terms of "queries", which might imply that you are mostly doing read-only activities, in which case XA may be overkill. Furthermore, you don't mention using multiple databases or other transactional resources, so do you really need XA at all?
So, first step: understand the requirements. Do you need transaction scopes that span several database interactions? If you are just doing a single query, then XA is not needed. If you have a mixture of activities needing XA and simple queries not needing it, then use two different connections, one with XA and one without; this clarifies your intention. However, I would expect XA drivers to use single-resource optimisations so that you don't pay the overhead when XA is not needed, so I suspect something more is going on here. (Disclaimer: I don't use JBoss, so my intuition is suspect.)
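A hedged sketch of the two-connection idea, assuming a second, plain (local-tx) datasource has been configured in JBoss under a hypothetical JNDI name; the point is that simple reads skip the xa_start/xa_end round trips entirely:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class ReadOnlyQueries {
    // "java:/ReportsDS" is a made-up name for a non-XA datasource
    // configured alongside the XA one.
    public long countOrders() throws Exception {
        InitialContext ctx = new InitialContext();
        DataSource plain = (DataSource) ctx.lookup("java:/ReportsDS");
        Connection c = plain.getConnection();
        try {
            Statement s = c.createStatement();
            ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM orders");
            rs.next();
            return rs.getLong(1);
        } finally {
            c.close(); // returns the connection to the pool
        }
    }
}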
So, look at whether your transaction scopes are appropriate, whether isolation levels are sensible, and so on. Are you getting contention because transactions are unreasonably long; for example, are transactions held over user think time?
Next, those multi-second waits: that implies contention (or some bizarre network issue). The only reason I can think of for an xa_start being slow is that writing a transaction log is taking unreasonably long; are your logs perhaps on some slow network device? Waits for getConnection() might simply imply that your connection pool is too small (or that you're holding connections for too long). If xa_commit and xa_end are taking a long time, I'd want to know what the resource managers are doing; can you get any information from the database server?
My overall position: if you truly need XA, then you will pay some logging and network-message overhead, but this should not be costing you hundreds or thousands of milliseconds. Most business systems need XA for a small subset of their overall resource accesses, typically when updating two otherwise-independent systems, and almost never in read-only scenarios; absolute consistency across distributed systems is pretty much meaningless, so using XA for queries is almost certainly overkill.

Why do major DB vendors not provide truly asynchronous APIs?

I work with Oracle and MySQL, and I struggle to understand why the APIs are not written so that I can issue a call, go away and do something else, and then come back and pick up the result later (as with NIO); instead I am forced to dedicate a thread to waiting for data. It seems that the SQL interfaces are the only place where synchronous I/O is still forced, which means tying up a thread waiting for the DB.
Can anybody explain the reasons for this? Is there something fundamental that makes this difficult?
It would be great to be able to use one or two threads to manage my DB query issuing and result fetching, rather than using worker threads to retrieve data.
I do note that there are two experimental attempts (e.g. adbcj) at implementing an async API, but neither seems ready for production use.
Database servers should be able to handle thousands of clients. To provide an asynchronous interface, the DB server would need to keep the result set from the query in memory so that you could pick it up at a later stage. It would quickly run out of resources.
A considerable problem with async is that many libraries use ThreadLocal for transactions.
For example, in Java, much of the JDBC specification relies on synchronous behavior to achieve a single thread per transaction; that is, you write your transaction in procedural order.
To do it right, transactions would have to be done through callbacks, but they are not. Node.js is the only platform I know of that does this, and it is unclear whether it is really async.
Of course, even if you do go async, I'm not sure it will really improve performance, as the database itself is probably doing the work synchronously.
There are lots of ways to avoid thread over-population in Java:
Is asynchronous jdbc call possible?
Personally, to get around this issue I use a message bus like RabbitMQ.
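As one illustration of the usual workaround from the linked question, here is a minimal sketch that wraps the blocking JDBC call in a small bounded executor so that callers get a future instead of tying up their own thread (the URL and table name are placeholders, and this assumes Java 8's CompletableFuture):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncJdbc {
    // A small, bounded pool: the blocking still happens, but on these
    // two threads rather than on every caller's thread.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    public CompletableFuture<Integer> countRows(String url, String table) {
        return CompletableFuture.supplyAsync(() -> {
            try (Connection c = DriverManager.getConnection(url);
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM " + table)) {
                rs.next();
                return rs.getInt(1);
            } catch (SQLException e) {
                throw new CompletionException(e);
            }
        }, pool);
    }
}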
