In my application I read from JMS and write to a database. To handle this situation, the ActiveMQ documentation suggests the following:
onMessage:
    try {
        if (I have not processed this message successfully before) {
            // do some stuff in the database / with EJBs etc.
            jdbc.commit();  // unless auto-commit is enabled on the JDBC connection
        }
        jms.commit();
    } catch (Exception e) {
        jms.rollback();
    }
My question is: suppose we hit an issue while doing jms.commit(). We then roll back the JMS session, but our DB commit has already happened. Since we rolled back the JMS session, the queue will redeliver that message to the consumer, which will result in duplicate data in the database. We have experienced this in a failover scenario on an ActiveMQ Artemis queue. Is there an alternative way I can handle this without causing duplicates in the DB or data loss?
Typically this situation would be handled using XA transactions (which use a 2-phase commit protocol). This is extremely common in Java EE where you have an MDB consuming a message and then working with a database or another JMS provider. All the work is done atomically so if any one part fails (e.g. MDB consuming the message, database work, sending message to another provider) then they all fail so that all the data across systems remains consistent.
You appear to be eschewing XA transactions (not sure why) and manually committing or rolling back individual JMS and JDBC transactions. There's always going to be a (fairly high) risk of data inconsistency with this approach. How you deal with it will depend on your specific constraints.
To be clear, the pseudo-code from the ActiveMQ documentation which you cited is not using 2-phase commit.
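For illustration, here is a minimal sketch of the container-managed XA approach, assuming a Java EE MDB with an XA-capable data source; the queue name, JNDI lookups, and table are hypothetical:

import javax.annotation.Resource;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "jms/ordersQueue")
})
public class OrderMdb implements MessageListener {

    @Resource(lookup = "jdbc/ordersXaDataSource") // must be an XA-capable DataSource
    private DataSource dataSource;

    @Override
    public void onMessage(Message message) {
        // The container starts a JTA transaction before onMessage() and
        // enlists both the JMS session and the JDBC connection in it.
        try (Connection c = dataSource.getConnection();
             PreparedStatement ps = c.prepareStatement("INSERT INTO orders (id) VALUES (?)")) {
            ps.setString(1, message.getStringProperty("orderId"));
            ps.executeUpdate();
            // No manual jms.commit() / jdbc.commit(): the JTA transaction
            // manager drives a 2-phase commit across both resources.
        } catch (Exception e) {
            // Throwing a runtime exception rolls back the message consumption
            // and the database work together.
            throw new RuntimeException(e);
        }
    }
}

The key point is that there is no window where one resource is committed and the other is not: the transaction manager prepares both resources before committing either.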
I have an application where I need to store some data in a database (MySQL, for instance) and then publish some data to a message queue. My problem is: if the application crashes after the write to the database, my data will never be written to the message queue and will be lost (thus the eventual consistency of my system will not be guaranteed).
How can I solve this problem?
In this particular case, the answer is to load the queue data from the database.
That is, you write the messages that need to be queued to the database, in the same transaction that you use to write the data. Then, asynchronously, you read that data from the database, and write it to the queue.
See Reliable Messaging without Distributed Transactions, by Udi Dahan.
If the application crashes, recovery is simple -- during restart, you query the database for all unacknowledged messages, and send them again.
Note that this design really expects the consumers of the messages to be designed for at least once delivery.
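A minimal sketch of this pattern, assuming a hypothetical outbox table with (id, payload, sent) columns and a generic queue client (both are assumptions, not part of the original answer):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class OutboxRelay {

    interface QueueClient { void send(String payload); } // hypothetical queue API

    // Business data and the outgoing message commit in the same transaction.
    public void writeWithOutbox(Connection c, String id, String payload) throws Exception {
        c.setAutoCommit(false);
        try (PreparedStatement data = c.prepareStatement(
                 "INSERT INTO orders (id, payload) VALUES (?, ?)");
             PreparedStatement outbox = c.prepareStatement(
                 "INSERT INTO outbox (id, payload, sent) VALUES (?, ?, FALSE)")) {
            data.setString(1, id);
            data.setString(2, payload);
            data.executeUpdate();
            outbox.setString(1, id);
            outbox.setString(2, payload);
            outbox.executeUpdate();
            c.commit(); // either both rows exist or neither does
        } catch (Exception e) {
            c.rollback();
            throw e;
        }
    }

    // Runs asynchronously; after a crash it simply resends unsent rows,
    // which is why consumers must tolerate at-least-once delivery.
    public void relay(Connection c, QueueClient queue) throws Exception {
        List<String[]> pending = new ArrayList<>();
        try (PreparedStatement ps = c.prepareStatement(
                 "SELECT id, payload FROM outbox WHERE sent = FALSE");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) pending.add(new String[] { rs.getString(1), rs.getString(2) });
        }
        for (String[] row : pending) {
            queue.send(row[1]);
            try (PreparedStatement mark = c.prepareStatement(
                     "UPDATE outbox SET sent = TRUE WHERE id = ?")) {
                mark.setString(1, row[0]);
                mark.executeUpdate();
            }
        }
    }
}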
I am assuming that you have a lossless message queue: once you get a confirmation for writing data, the queue is guaranteed to have the record.
Basically, you need a loop with either a transaction that can roll back or a status in the database. The pseudocode for the transactional approach is:
Begin transaction
Insert into database
Write to message queue
When message queue confirms, commit transaction
Personally, I would probably do this with a status:
Insert into database with a status of "pending" (or something like that)
Write to message queue
When message confirms, change status to "committed" (or something like that)
In the case of recovery from failure, you may need to check the message queue to see if any "pending" records were actually written to the queue.
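A sketch of the status-based variant, with a hypothetical events table and a queue client that blocks until the broker confirms the write:

import java.sql.Connection;
import java.sql.PreparedStatement;

public class PendingPublisher {

    interface QueueClient { void sendAndConfirm(String payload); } // hypothetical; blocks until confirmed

    private final QueueClient queue;

    public PendingPublisher(QueueClient queue) { this.queue = queue; }

    public void save(Connection c, String id, String payload) throws Exception {
        // 1. Insert with a status of 'pending'
        try (PreparedStatement insert = c.prepareStatement(
                "INSERT INTO events (id, payload, status) VALUES (?, ?, 'pending')")) {
            insert.setString(1, id);
            insert.setString(2, payload);
            insert.executeUpdate();
        }
        // 2. Write to the message queue
        queue.sendAndConfirm(payload);
        // 3. Once the queue confirms, flip the status
        try (PreparedStatement update = c.prepareStatement(
                "UPDATE events SET status = 'committed' WHERE id = ?")) {
            update.setString(1, id);
            update.executeUpdate();
        }
        // Recovery after a crash: check whether 'pending' rows actually
        // made it to the queue before resending them.
    }
}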
I'm afraid the other answers (VoiceOfUnreason, Udi Dahan) just sweep the problem under the carpet. The problem under the carpet is: how should the movement of data from the database to the queue be designed so that the message is posted just once (without XA)? If you solve this, then you can easily extend that concept with any additional business logic.
The CAP theorem tells you the limits clearly.
XA transactions are not a 100% bulletproof solution, but they seem to me the best of all the alternatives I have seen.
Adding to what @Gordon Linoff said: assuming durable messaging (something like MSMQ?), the method/handler is going to be transactional, so if it's all successful, the message will be written to the queue and the data to your view model; if it fails, all will fail...
To mitigate the ID issue you will need to use GUIDs instead of DB-generated keys (if you are using messaging you will need to remove your referential integrity anyway and introduce GUIDs as keys).
One more suggestion: don't update the database, but insert only/upsert (the pending row and then the completed row) and have the reader do the projection of the data based on the latest row, for example.
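A rough sketch of that insert-only idea; the order_events table and its columns are made up for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.UUID;

public class InsertOnlyWriter {

    // No DB-generated keys: the GUID is created on the client so the same
    // identifier can travel in the message and in the database row.
    public void recordStatus(Connection c, UUID aggregateId, String status) throws Exception {
        try (PreparedStatement ps = c.prepareStatement(
                "INSERT INTO order_events (event_id, aggregate_id, status, created_at) " +
                "VALUES (?, ?, ?, CURRENT_TIMESTAMP)")) {
            ps.setString(1, UUID.randomUUID().toString());
            ps.setString(2, aggregateId.toString());
            ps.setString(3, status); // first a 'pending' row, later a 'completed' row
            ps.executeUpdate();
        }
    }

    // The reader projects the current state from the latest row, e.g.:
    //   SELECT status FROM order_events WHERE aggregate_id = ?
    //   ORDER BY created_at DESC LIMIT 1
}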
Writing the message as part of the transaction is a good idea, but it has multiple drawbacks:
a. your database/language may not support transactions;
b. transactions are time-consuming operations;
c. you cannot afford to wait for the queue's response while responding to your service call;
d. if your database is already under stress, writing the messages will exacerbate the impact of the higher workload.
In these cases the best practice is to use database streams. Most modern databases support streams (DynamoDB, MongoDB, Oracle, etc.). You run a consumer of the database stream which reads from the stream and writes to the queue, invalidates caches, feeds the search indexer, and so on. Once all of these have succeeded, you mark the stream item as processed; see the sketch after the lists below.
Pros of this approach:
It works in the case of a multi-region deployment where there is a regional failure (you read from the regional stream and hydrate all the regional data stores).
No overhead of writing extra records, and no performance bottlenecks from the queues.
You can use this pattern for other data sinks as well, such as caching, queuing, and searching.
Cons:
You may need to call multiple services to construct the appropriate message.
One database stream might not be sufficient to construct the appropriate message.
You must ensure the reliability of your streams (a Redis stream, for example, is not reliable).
NOTE: this approach also does not guarantee exactly-once semantics. The consumer logic should be idempotent and able to handle duplicate messages.
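To make the shape of this concrete, here is a sketch of such a consumer loop; StreamClient, StreamRecord, and QueuePublisher are hypothetical interfaces standing in for your database's actual stream API (DynamoDB Streams, MongoDB change streams, etc.):

import java.util.List;

public class StreamRelay {

    interface StreamRecord { String payload(); }               // hypothetical
    interface StreamClient {                                   // hypothetical stream API
        List<StreamRecord> poll();
        void markProcessed(StreamRecord record);
    }
    interface QueuePublisher { void publish(String payload); } // hypothetical

    private final StreamClient stream;
    private final QueuePublisher queue;

    public StreamRelay(StreamClient stream, QueuePublisher queue) {
        this.stream = stream;
        this.queue = queue;
    }

    public void run() {
        while (true) {
            for (StreamRecord record : stream.poll()) {
                // At-least-once: a crash after publish() but before
                // markProcessed() re-delivers the record, so the downstream
                // consumer must be idempotent.
                queue.publish(record.payload());
                stream.markProcessed(record);
            }
        }
    }
}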
I'm trying to create a router to integrate a number of JMS topics & Queues. I am constrained by the fact the client I am working for can't change the JMS implementation (TibCo EMS with some custom client libraries) and the fact that they have written their own XA transaction manager which doesn't quite conform with the JTA spec. It is very important that message delivery is guaranteed.
I've done a lot of reading and experimenting with Camel and I've realised that I probably need to write my own JMS component, as the standard JMS component is not going to integrate with the JMS client libraries or TM I have.
I need to be able to put hooks into the route lifecycle at the following points:
During the route startup, I need to identify all JMS connections and enlist them as XA resources with the TM implementation
When a message is received at the consumer, I need to start a transaction including all the JMS connections in the route
When a routing decision is made, I need to send the message to the producer and commit the transaction
Given the above, I think I can implement a very simplified version of the camel-jms component which strips out all the Spring parts and only contains the bare minimum required to interact with my JMS libraries.
Where would be the best place to initialise the transaction manager? I've been looking at DefaultCamelContext, RoutePolicy and RouteContext but I can't find a place where all the endpoints are resolved and initialised.
I solved this problem by implementing the UserTransaction and TransactionManager interfaces and creating a PlatformTransactionManager which the Camel JMS component uses to create the DefaultMessageListenerContainer.
One important point to note is that the transacted property on the Camel JMSComponent refers to local transactions, not XA transactions. If you set this property to true after passing a PlatformTransactionManager to the component, the DMLC will effectively try to commit your transaction twice, which won't work.
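For reference, a sketch of that wiring; the ut/tm parameters stand for the custom UserTransaction/TransactionManager implementations described above, and the connection factory is the TibCo EMS one:

import javax.jms.ConnectionFactory;
import javax.transaction.TransactionManager;
import javax.transaction.UserTransaction;
import org.apache.camel.CamelContext;
import org.apache.camel.component.jms.JmsComponent;
import org.apache.camel.impl.DefaultCamelContext;
import org.springframework.transaction.jta.JtaTransactionManager;

public class RouterSetup {

    public static CamelContext createContext(ConnectionFactory cf,
                                             UserTransaction ut,
                                             TransactionManager tm) throws Exception {
        // Adapt the custom TM to Spring's PlatformTransactionManager, which
        // the Camel JMS component hands to the DefaultMessageListenerContainer.
        JtaTransactionManager ptm = new JtaTransactionManager(ut, tm);

        JmsComponent jms = new JmsComponent();
        jms.setConnectionFactory(cf);
        jms.setTransactionManager(ptm);
        // Leave 'transacted' false: it controls *local* JMS transactions, and
        // combined with a PlatformTransactionManager the DMLC would try to
        // commit the transaction twice.
        jms.setTransacted(false);

        CamelContext context = new DefaultCamelContext();
        context.addComponent("jms", jms);
        return context;
    }
}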
This leaves me with a nice working example consuming from one JMS broker and producing to another, but it is very slow: ~5 messages per second. Unfortunately, Spring JMS does not support batching, so it seems the best solution here is to adjust the JMS topic configurations so that routing only takes place between topics on the same broker.
I am implementing a callback in Java to store messages in a database. I have a client subscribing to '#'. The problem is that when this '#' client disconnects and reconnects, it adds duplicate entries of the retained messages to the database. If I search for previous entries, bigger tables will be expensive in computing power. So should I allot a separate table for each sensor, or one per broker? I would really appreciate it if you could suggest better designs.
Subscribing to a wildcard with a single client is definitely an anti-pattern. The reasons for that are:
Wildcard subscribers get all messages of the MQTT broker. Most client libraries can't handle that load, especially not when transforming / persisting messages.
If your wildcard subscriber dies, you will lose messages (unless the broker queues endlessly for you, which also doesn't work).
You essentially have a single point of failure in your system. Use MQTT brokers which are hardened for production use; these are much more robust single points of failure than your hand-written clients. (You can overcome the SPOF through clustering and load balancing, though.)
So to solve the problem, I suggest the following:
Use a broker which can handle shared subscriptions (like HiveMQ or MessageSight), so you can balance all messages between many clients
Use a custom plugin for doing the persistence at the broker instead of the client.
You can also read more about that topic here: http://www.hivemq.com/blog/mqtt-sql-database
Also consider using QoS = 2 for all messages to make sure each one is delivered exactly once (QoS 2 is the highest level MQTT offers; there is no QoS 3). You may also consider time-stamping each message to avoid inserting duplicates if the QoS requirement is not met.
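A sketch with the Eclipse Paho client (the library choice, broker URL, and topics are assumptions, not part of the question):

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;

public class SensorSubscriber {
    public static void main(String[] args) throws Exception {
        MqttClient client = new MqttClient("tcp://broker.example.com:1883", "db-writer-1");
        MqttConnectOptions opts = new MqttConnectOptions();
        opts.setCleanSession(false); // keep the session so QoS 1/2 messages queued while offline survive a reconnect
        client.connect(opts);

        // "$share/group1/..." is a shared subscription (MQTT 5 syntax, also
        // supported by some MQTT 3 brokers); drop the prefix for a plain one.
        client.subscribe("$share/group1/sensors/+/temperature", 2, (topic, msg) -> {
            // QoS 2 gives exactly-once between broker and client, but the
            // database write should still be idempotent (e.g. a unique key
            // on a message id or timestamp) in case the application layer
            // sees a duplicate anyway.
            System.out.printf("%s -> %s%n", topic, new String(msg.getPayload()));
        });
    }
}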
Imagine a Java ecosystem where three separate Spring web applications are running in separate JVMs and on separate machines (no application server involved, just simple servlet containers). Two of these applications use their own database accessed via JPA. Now the third application (a coordinator) provides services to the outside world, and some service functions execute remote operations which require participation from the other two apps in a transactional manner; that is, if one of the applications fails to do the data manipulation in its database, the other should be rolled back as well. The problem is: how can this be achieved using Spring?
Currently we are using REST to communicate between the applications. Clearly this cannot support transactions, even though there are efforts to make this happen.
I've found JTA, which is capable of organizing global transactions. JTA involves creating XAResource instances which participate in the globally managed transactions. If I understood correctly, these XAResource instances can reside in separate JVMs. Initialization, commit, and rollback of resources happen via JMS communication, which means a message broker is required to transfer messages between participants. Various JTA implementations exist; I've found Atomikos, which seems to be the most widely used.
Now the thing I don't see is how this all comes together if I have a Spring application on each side. I've not found any example projects yet which do JTA over a network. Also, I don't understand what XAResources represent. If I use JPA, and say I have an Account object in an application which stores a user's balance, and I have to decrease the balance from the coordinator, should I create an XAResource implementation which allows decreasing the balance? Or is XAResource implemented by a lower-level component like the JDBC driver or Spring Data JPA? In the latter case, how can I provide high-level CRUD operations for the transaction coordinator?
XAResource is a lower-level API. You could write your own for the coordinator, but I don't think that's necessary. Instead, leverage JMS + JTA on the coordinator and JTA on the app servers.
In the normal case, you'd have this:
Coordinator receives request and starts JTA transaction
Coordinator calls app 1 over JMS
App 1 receives JMS message
App 1 calls DB 1 using JTA
Coordinator calls app 2 over JMS
App 2 receives JMS message
App 2 calls DB 2 using JTA
Coordinator commits tx
Note that JTA is used for all the transactions - this will be a global TX that's shared across all the servers. If any of these steps fail, then they will be rolled back.
Spring should be able to make this transparent once you get it all set up. Just make sure your DAO & service calls are transactional. Atomikos will need to be configured so that each server uses the same JTA tx manager.
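A sketch of the coordinator side, assuming a Spring context where an Atomikos-backed JtaTransactionManager and an XA-capable JmsTemplate are already configured (queue names are made up):

import org.springframework.jms.core.JmsTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class CoordinatorService {

    private final JmsTemplate jmsTemplate;

    public CoordinatorService(JmsTemplate jmsTemplate) {
        this.jmsTemplate = jmsTemplate;
    }

    @Transactional // runs inside the global JTA transaction
    public void decreaseBalance(String accountId, long amount) {
        // Both sends enlist the XA-capable JMS connection in the same
        // global transaction: if anything throws before this method
        // returns, neither message is ever delivered.
        jmsTemplate.convertAndSend("app1.requests", accountId + ":" + amount);
        jmsTemplate.convertAndSend("app2.requests", accountId + ":" + amount);
    }
}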
REST does support transactions now, via the Atomikos TCC implementation available from www.atomikos.com - it is the actual implementation of the design in the talk you are referring to...
HTH
This answer is a summary of a more detailed post:
How would you tune Distributed ( XA ) transaction for performance?
The diagram in that post depicts the communication flow between the transaction coordinator and the transaction participant.
In your particular case the transaction coordinator will be Atomikos, Bitronix, or some other provider. Everything in the flow below end(XID) is completely invisible to the developer and is performed only by the transaction coordinator. The first points, start(XID) and end(XID), are within the application scope.
Now, to your question: you cannot have a distributed transaction between applications; you can only have a distributed transaction between pieces of infrastructure that support them. If you want transactions between application components separated by the network, you are better off using transaction compensation, and that is a whole different topic.
What you can do with a distributed transaction is, from one application, one service, one component, enlist multiple databases or XA-capable resources and then execute some transaction against them.
I see another answer here stating that Atomikos has some infrastructure supporting XA for REST. In general, the classic algorithm for transaction compensation, such as the Try-Cancel/Confirm (TCC) pattern, is very close to a 2-phase commit protocol. Without looking into the details, my guess is that they have implemented something along those lines.
I have a very simple scenario involving a database and JMS in an application server (Glassfish). The scenario is dead simple:
1. an EJB inserts a row in the database and sends a message.
2. when the message is delivered with an MDB, the row is read and updated.
The problem is that sometimes the message is delivered before the insert has been committed in the database. This is actually understandable if we consider the 2 phase commit protocol:
1. prepare JMS
2. prepare database
3. commit JMS
4. (tiny gap where the message can be delivered before the insert has been committed)
5. commit database
I've discussed this problem with others, but the answer was always: "Strange, it should work out of the box".
My questions are then:
How could it work out of the box?
My scenario sounds fairly simple; why aren't more people running into similar trouble?
Am I doing something wrong? Is there a way to solve this issue correctly?
Here are a few more details about my understanding of the problem:
This timing issue exists only if the participants are treated in this order. If the 2PC treated the participants in the reverse order (database first, then message broker), everything would be fine. The problem happened randomly but was completely reproducible.
I found no way to control the order of the participants in distributed transactions in the JTA, JCA, and JPA specifications, nor in the Glassfish documentation. We could assume they are enlisted in the distributed transaction in the order in which they are used, but with an ORM such as JPA it is difficult to know when the data is flushed and when the database connection is really used. Any ideas?
You are experiencing the classic XA 2-PC race condition. It does happen in production environments.
There are three things that come to mind:
1. Last-agent optimization, where JDBC is the non-XA resource. (Loses recovery semantics.)
2. Set a JMS time-to-deliver delay. (Deliberately sacrifices real-time delivery.)
3. Build retries into the JDBC code. (Least effect on functionality; see the sketch below.)
WebLogic has an LLR (Logging Last Resource) optimization that avoids this problem and gives you all the XA guarantees.
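A sketch of the third option, retrying in the consumer until the producer's insert becomes visible; the table, attempt count, and delays are arbitrary choices:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class RowReader {

    private final DataSource dataSource;

    public RowReader(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // The 2-PC gap is tiny, so a short bounded retry usually absorbs it.
    // Throwing afterwards rolls back the MDB transaction and lets the
    // broker redeliver the message.
    public String readRow(long id) throws Exception {
        for (int attempt = 1; attempt <= 5; attempt++) {
            try (Connection c = dataSource.getConnection();
                 PreparedStatement ps = c.prepareStatement(
                     "SELECT payload FROM events WHERE id = ?")) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        return rs.getString(1);
                    }
                }
            }
            Thread.sleep(100L * attempt); // back off before the next attempt
        }
        throw new IllegalStateException("Row " + id + " not visible; forcing redelivery");
    }
}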