Distributed transactions across Spring web applications - database

Imagine a Java ecosystem where three separate Spring web applications are running in separate JVMs on separate machines (no application server involved, just simple servlet containers). Two of these applications use their own database, accessed via JPA. The third application (a coordinator) provides services to the outside world, and some of its service methods execute remote operations that require participation from the other two apps in a transactional manner: if one of the applications fails to perform its data manipulation in the database, the other should be rolled back as well. The problem is: how can this be achieved using Spring?
Currently we are using REST to communicate between the applications. Clearly this cannot support transactions, even though there are efforts to make this happen.
I've found JTA, which is capable of organizing global transactions. JTA involves creating XAResource instances which participate in the globally managed transactions. If I understood correctly, these XAResource instances can reside in separate JVMs. Initialization, commit and rollback of resources happens via JMS communication, which means it requires a message broker to transfer messages between participants. Various JTA implementations exist; I've found Atomikos, which seems to be the most widely used.
Now the thing I don't see is how this all comes together if I have a Spring application on each side. I haven't found any example projects yet that do JTA over a network. Also, I don't understand what XAResources represent. If I use JPA, and say I have an Account object in an application which stores a user's balance, and I have to decrease the balance from the coordinator, should I create an XAResource implementation that allows decreasing the balance? Or is XAResource implemented by something lower level, like the JDBC driver or Spring Data JPA? In the latter case, how can I provide high-level CRUD operations to the transaction coordinator?

XAResource is a lower-level API. You could write your own for the coordinator, but I think that isn't necessary. Instead, leverage JMS + JTA on the coordinator and JTA on the app servers.
In the normal case, you'd have this:
Coordinator receives request and starts JTA transaction
Coordinator calls app 1 over JMS
App 1 receives JMS message
App 1 calls DB 1 using JTA
Coordinator calls app 2 over JMS
App 2 receives JMS message
App 2 calls DB 2 using JTA
Coordinator commits tx
Note that JTA is used for all the transactions - this will be a global TX that's shared across all the servers. If any of these steps fails, they will all be rolled back.
Spring should be able to make this transparent once you get it all set up. Just make sure your DAO & service calls are transactional. Atomikos will need to be configured so that each server uses the same JTA tx manager.
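For the Spring side, a rough sketch of wiring Atomikos in as the JTA transaction manager could look like the following (the bean names and timeout are illustrative; each participating application would carry similar configuration, with its XA data source and XA JMS connection factory registered with Atomikos):

import javax.transaction.SystemException;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.annotation.EnableTransactionManagement;
import org.springframework.transaction.jta.JtaTransactionManager;
import com.atomikos.icatch.jta.UserTransactionImp;
import com.atomikos.icatch.jta.UserTransactionManager;

@Configuration
@EnableTransactionManagement
public class JtaConfig {

    // The Atomikos transaction service; init() starts it, close() shuts it down.
    @Bean(initMethod = "init", destroyMethod = "close")
    public UserTransactionManager atomikosTransactionManager() {
        UserTransactionManager utm = new UserTransactionManager();
        utm.setForceShutdown(false);
        return utm;
    }

    // Expose Atomikos to Spring so that @Transactional service and DAO
    // methods enlist in the global JTA transaction.
    @Bean
    public JtaTransactionManager transactionManager() throws SystemException {
        UserTransactionImp userTransaction = new UserTransactionImp();
        userTransaction.setTransactionTimeout(300); // seconds, illustrative
        return new JtaTransactionManager(userTransaction, atomikosTransactionManager());
    }
}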

REST does support transactions now, via the Atomikos TCC implementation available from www.atomikos.com - it is the actual implementation of the design in the talk you are referring to...
HTH

This answer is a summary of a more detailed post:
How would you tune Distributed ( XA ) transaction for performance?
A diagram in that post depicts the communication flow between the transaction coordinator and the transaction participant.
In your particular case the transaction coordinator will be Atomikos, Bitronix or some other provider. Everything in the flow below end(XID) is completely invisible to a developer and is performed only by the transaction coordinator. The first steps, start(XID) and end(XID), are within the application's scope.
Now, based on your question: you cannot have a distributed transaction between applications. You can have distributed transactions between pieces of infrastructure that support them. If you want transactions between application components separated by the network, you'd better use transaction compensation, and that is a whole different topic.
What you can do with a distributed transaction is, from one application, one service, one component, enlist multiple databases or other XA-capable resources and then execute a transaction spanning them.
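To make that concrete, here is a rough sketch of what enlisting two XA-capable databases in one global transaction looks like at the raw XAResource level (the table, ids and amounts are made up); this is essentially the bookkeeping a coordinator like Atomikos or Bitronix performs for you, plus transaction logging and crash recovery:

import java.sql.Connection;
import javax.sql.XAConnection;
import javax.sql.XADataSource;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class TwoPhaseCommitSketch {

    // Trivial Xid implementation, for illustration only; a real coordinator
    // generates and logs these ids so it can recover after a crash.
    static final class SimpleXid implements Xid {
        private final byte[] gtrid, bqual;
        SimpleXid(byte[] gtrid, byte[] bqual) { this.gtrid = gtrid; this.bqual = bqual; }
        public int getFormatId() { return 1; }
        public byte[] getGlobalTransactionId() { return gtrid; }
        public byte[] getBranchQualifier() { return bqual; }
    }

    static void transfer(XADataSource ds1, XADataSource ds2) throws Exception {
        XAConnection xc1 = ds1.getXAConnection();
        XAConnection xc2 = ds2.getXAConnection();
        XAResource xa1 = xc1.getXAResource();
        XAResource xa2 = xc2.getXAResource();

        byte[] gtrid = "demo-global-tx".getBytes();         // shared global id
        Xid xid1 = new SimpleXid(gtrid, new byte[] { 1 });  // one branch per resource
        Xid xid2 = new SimpleXid(gtrid, new byte[] { 2 });

        xa1.start(xid1, XAResource.TMNOFLAGS);
        xa2.start(xid2, XAResource.TMNOFLAGS);

        Connection c1 = xc1.getConnection();
        Connection c2 = xc2.getConnection();
        c1.createStatement().executeUpdate("UPDATE account SET balance = balance - 10 WHERE id = 1");
        c2.createStatement().executeUpdate("UPDATE account SET balance = balance + 10 WHERE id = 1");

        xa1.end(xid1, XAResource.TMSUCCESS);
        xa2.end(xid2, XAResource.TMSUCCESS);

        // Phase 1: both resources vote. Phase 2: commit only if both voted OK.
        if (xa1.prepare(xid1) == XAResource.XA_OK && xa2.prepare(xid2) == XAResource.XA_OK) {
            xa1.commit(xid1, false);
            xa2.commit(xid2, false);
        } else {
            xa1.rollback(xid1);
            xa2.rollback(xid2);
        }
        xc1.close();
        xc2.close();
    }
}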
I see another post here stating that Atomikos has some sort of infrastructure supporting XA for REST. In general, the classic algorithms for transaction compensation, such as the Try-Confirm-Cancel (TCC) pattern, are very close to a two-phase commit protocol. Without looking into the details, my guess is that they have implemented something along these lines.

Related

spring boot different instances of a microservice and data integrity

If several instances of the same microservice each contain their own database (for scalability), how do you update all the databases when a create, update or delete operation is made? Which tool, compatible with Eureka and Zuul, does Spring propose for that?
I would suggest you use RabbitMQ.
The basic architecture of a message queue is simple: there are client applications called producers that create messages and deliver them to the broker (the message queue). Other applications, called consumers, connect to the queue and subscribe to the messages to be processed. An application can be a producer, a consumer, or both. Messages placed onto the queue are stored until a consumer retrieves them. A short Spring AMQP sketch follows the links below.
Why use RabbitMQ?
https://www.cloudamqp.com/blog/2015-05-18-part1-rabbitmq-for-beginners-what-is-rabbitmq.html
Official documentation for RabbitMQ:
https://www.rabbitmq.com/
How to install RabbitMQ:
https://www.journaldev.com/11655/spring-rabbitmq
Configuration in a Spring Boot application:
https://spring.io/guides/gs/messaging-rabbitmq/
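As a concrete illustration of the producer/consumer model described above, a minimal Spring AMQP sketch might look like this (the exchange and queue names are made up; each instance would bind its own queue to a fanout exchange so every replica receives a copy of each event):

import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Component;

// Producer: the instance that handled the write publishes an event.
@Component
public class AccountEventPublisher {

    private final RabbitTemplate rabbitTemplate;

    public AccountEventPublisher(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    public void publishUpdated(long accountId) {
        // "account-events" is assumed to be a fanout exchange; the routing
        // key is ignored by fanout exchanges, hence the empty string.
        rabbitTemplate.convertAndSend("account-events", "", "UPDATED:" + accountId);
    }
}

// Consumer: every other instance applies the change to its own database.
@Component
class AccountEventListener {

    @RabbitListener(queues = "account-events.instance-1") // per-instance queue
    public void onEvent(String event) {
        // parse the event and apply the create/update/delete locally
    }
}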
I would suggest you use an event-based architecture, where once a service has done its work it produces an event, and the other services that subscribe to that event then start their own work.
You can use a Kafka queue for this. Also, read about Distributed Sagas for Microservices.
One more thing: for inter-service communication, use UDP instead of TCP.
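A hedged sketch of the event-publishing side with Kafka (the topic name, key and payload are made up):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Publish a domain event; subscribed services react to it and, in a
        // saga, publish their own success or compensation events in turn.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("order-events", "order-42", "ORDER_CREATED"));
        }
    }
}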
Most databases offer replication these days with near-zero latency. Unless you are mixing different database products, you can let the database do the synchronization for you.

Resuming Camel Processing after power failure

I'm currently developing a Camel integration app in which resuming from a previous state of processing is important. When there's a power outage, for instance, it's important that all previously processed messages are not re-processed. The processing should resume from where it left off before the outage.
I've gone through a number of possible solutions, including Terracotta and Apache Shiro. I'm not sure how to use either, as documentation on their integration with Apache Camel is scarce. I haven't settled on the two, however.
I'm looking for suggestions on the potential alternatives I can use or a pointer to some tutorial to get me started.
The difficulty in surviving outages lies primarily in state, and what to do with in-flight messages.
Usually, when you're talking about state within routes, the solution is to flush it to disk, or to other nodes in the cluster. Taking the aggregator pattern as an example, aggregated state is persisted in an aggregation repository. The default implementation is in-memory, so if the power goes out, all of that state is lost. However, there are other implementations, including one for JDBC, and another using Hazelcast (a lightweight in-memory data grid). I haven't used Hazelcast myself, but the JDBC one does a synchronous write to disk. The aggregator pattern thus allows you to resume from where you left off. A similar solution exists for idempotent consumption.
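For instance, a route using the JDBC-backed aggregation repository from camel-sql might look roughly like this (the queue names, correlation header and completion size are assumptions, and the repository expects its backing tables to already exist):

import javax.sql.DataSource;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.GroupedExchangeAggregationStrategy;
import org.apache.camel.processor.aggregate.jdbc.JdbcAggregationRepository;
import org.springframework.transaction.PlatformTransactionManager;

public class ResumableAggregationRoute extends RouteBuilder {

    private final PlatformTransactionManager txManager;
    private final DataSource dataSource;

    public ResumableAggregationRoute(PlatformTransactionManager txManager,
                                     DataSource dataSource) {
        this.txManager = txManager;
        this.dataSource = dataSource;
    }

    @Override
    public void configure() {
        // Aggregated state is written synchronously to the repository's
        // tables, so a restart resumes from whatever was already persisted.
        JdbcAggregationRepository repo =
                new JdbcAggregationRepository(txManager, "aggregation", dataSource);

        from("jms:queue:orders?transacted=true")
            .aggregate(header("orderId"), new GroupedExchangeAggregationStrategy())
                .completionSize(10)
                .aggregationRepository(repo)
            .to("jms:queue:batchedOrders");
    }
}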
The second question, around in-flight messages, is a little more complicated, and largely depends on where you are consuming from. If you're in the middle of handling a web service request and the power goes out, does it matter if you have lost the message? The user can simply retry. Any effects on external systems can be wrapped in a transaction, or handled by an idempotent consumer with a JDBC idempotent repository.
If you are building out integrations based on messaging, you should consume within a transaction, so that if your server goes down, the messages go back into the broker and can be replayed to another consumer.
Be careful when using seda: or threads blocks: these use an in-memory queue to pass exchanges between threads, and any messages flowing down these sorts of routes will be lost if someone trips over the power cable. If you can't afford message loss and need this sort of processing model, consider using a JMS queue as the endpoint between the two routes (with transactions to ensure you pick up where you left off).
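A small sketch of that substitution (queue names are made up): two stages that might otherwise be linked by seda: are instead linked by a transacted JMS queue, so exchanges in flight between them survive a crash:

import org.apache.camel.builder.RouteBuilder;

public class DurableHandoffRoutes extends RouteBuilder {

    @Override
    public void configure() {
        // Stage 1: instead of handing off via an in-memory seda: queue,
        // publish to a durable JMS queue inside a transaction.
        from("jms:queue:incoming?transacted=true")
            .log("validated ${body}")
            .to("jms:queue:stageTwo");

        // Stage 2: consumes transactionally, so after a crash any
        // unacknowledged messages are redelivered and processing picks
        // up where it left off.
        from("jms:queue:stageTwo?transacted=true")
            .log("processing ${body}")
            .to("jms:queue:completed");
    }
}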

distributed system: sql server broker implementation

I have a distributed application consisting of 8 servers, all running .NET windows services. Each service polls the database for work packages that are available.
The polling mechanism is important for other reasons (too boring to go into right now).
I'm thinking that this polling mechanism would be best implemented as a queue, as the .NET services will all be polling the database regularly, and when under load I don't want deadlocks.
I'm thinking I would want each .NET service to put a message into an input queue. The database server would pop each message off the input queue one at a time, process it, and put a reply message on another queue.
The issue I'm having is that most examples of SQL Server Service Broker (SSB) are between database services, not initiated from a .NET client. I'm wondering if Service Broker is just the wrong tool for this job. I see that the Broker T-SQL DML is available from .NET, but the way I think this should work doesn't seem to fit with SSB.
I think that I would need a single SSB service with 2 queues (in and out) and a single activation stored procedure.
This doesn't seem to be the way SSB works; am I missing something?
You've got the picture pretty much right, but there are some missing puzzle pieces in it:
SSB is primarily a communication technology, designed to deliver messages across the network with exactly-once-in-order semantics (EOIO), in a fully transactional fashion. It handles network connectivity (authentication, traffic confidentiality and integrity) and acknowledgement and retry logic for transmission.
Internal Activation is a unique technology in that it eliminates the requirement for a resident service to poll the queue. Polling can never achieve the dynamic balance needed for low latency and low resource consumption under light load: it forces either high latency (infrequent polling to save resources) or high resource utilization (frequent polling required to provide low latency). Internal activation also has a self-tuning capability: it ramps up more processors to respond to spikes in load (via max_queue_readers), while still being capable of winding processing down under low load by deactivating processors. One of the often overlooked advantages of the Internal Activation mechanism is the fact that it is fully contained within the database, i.e. it fails over with a cluster or database mirroring failover, and it travels with the database in backups and copy-attach operations. There is also an External Activation mechanism, but in general I much prefer the internal one for anything that fits an internal context (e.g. not an HTTP request, which must be handled outside the engine process...).
Conversation Group Locking is again unique, and is a means of providing exclusive access to correlated message processing. Applications can take advantage of it by using the conversation_group_id as a business logic key, and this pretty much completely eliminates deadlocks, even under heavy multithreading.
There is also one thing you got wrong about Service Broker: the need to put a response into a separate queue. Unlike most queueing products you may be familiar with, SSB's primitive is not the message but the 'conversation'. A conversation is a fully duplex, bidirectional communication channel; you can think of it much like a TCP socket. An SSB service does not need to 'put responses in a queue'; instead it can simply send a response on the conversation handle of the received message (much like how you would respond in a TCP socket server by issuing a send on the same socket you got the request from, not by opening a new socket and sending a response). Furthermore, SSB takes care of the inherent message correlation, so the sender will know exactly which request a response belongs to, since the response comes back on the same conversation handle the request was sent on (much as, in the TCP socket case, the client receives the response from the server on the same socket it sent the request on). The conversation handle is important again when it comes to the correlated locking of related conversation groups; see the link above.
You can embed .NET logic into Service Broker processing via SQLCLR. Almost all SSB applications have a non-SSB service at at least one end, directly or indirectly (e.g. via a trigger); a distributed application entirely contained in the database is of little use.

Database Trigger And JMS

Could I create a trigger that sends a record as a message to JMS? If yes, how can I do it?
Thanks in advance!
I would summarize your options as follows:
Databases Supporting JMS
Oracle is the only database that I am aware of that supports JMS natively, in the form of Oracle Advanced Queuing. If your message receiver is not too keen on that JMS implementation, it is usually possible to find some sort of messaging bridge that will transform and forward messages from one JMS implementation to another. For example:
Apache ActiveMQ
JBoss Messaging
Databases Supporting Java
Some databases, such as Oracle and DB2, have a built-in Java Virtual Machine and support the loading of third-party libraries (jars) and custom classes that can be invoked by trigger code. Depending on the requirements of your JMS client, this may be an issue on account of the version of Java supported (say, if you need Java 5+ but the DB's embedded JVM only supports an older version). Also keep in mind that threading in some of these embedded JVMs is not what you might expect it to be, but one might also expect the sending of JMS messages to be more forgiving of this than the receiving of the same.
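As a rough illustration (the broker URL and queue name are made up, and how the class is loaded and invoked from trigger code differs per database), the trigger-invoked Java could be as simple as a static method doing a plain JMS send:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class TriggerJmsBridge {

    // Static entry point suitable for registration as a database-side
    // Java stored procedure that a trigger can call.
    public static void send(String payload) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://broker-host:61616");
        Connection conn = factory.createConnection();
        try {
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue("dbEvents"));
            producer.send(session.createTextMessage(payload));
        } finally {
            conn.close();
        }
    }
}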
Databases Supporting External Invocations (but not in Java)
Several databases support different means of triggering asynchronous events out to connected clients which can in turn forward JMS messages built from the payload of the event:
Oracle: DBMS_ALERT (synchronous), DBMS_PIPE, DCN
Postgres: SQLNotify
Some databases (all of the above, and also SQL Server) allow you to send SMTP messages from database procedural code (which can be invoked by triggers). While it is not JMS, a mail listener could potentially listen for mail messages (which might conveniently have a JSON or XML message body) and forward their content as JMS messages.
A variant of this is database packages that allow HTTP posts to call out to external sources, where you might have a servlet listening and forwarding the submitted content as a JMS message.
Other databases, such as Postgres, support non-Java languages such as Perl, Python and Tcl, where you might employ some clever scripting to send a message to an external message transformer that will forward it as JMS. ActiveMQ (and therefore its message bridge) supports multi-language JMS clients, including Python and Perl (and many others).
Lowest Common Denominator
Short of all that, your trigger can write an event to a table, and an external client can poll the contents of the table, looking for new data and forwarding JMS messages when it finds some. The JMS message can either include the content, or simply indicate that content exists and what the PK is, and the consumer can come and get it.
This is a technique widely supported in Apache Camel, which has adapters (technically called components) specifically for polling databases:
Hibernate
JDBC
SQL
iBatis
Events read from a database table by Camel can then be transformed and routed to a variety of destinations, including a JMS server (in the form of a JMS message). Implementing this in Camel is fairly straightforward and well documented, so this is not a bad way to go.
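A minimal Camel sketch of that polling pattern using the SQL component (the table, columns, queue name and the sent flag used to mark forwarded rows are all assumptions):

import org.apache.camel.builder.RouteBuilder;

public class TableToJmsRoute extends RouteBuilder {

    @Override
    public void configure() {
        // Poll unsent rows written by the trigger, forward each one as a
        // JMS message, then mark the row as sent (onConsume runs per row).
        from("sql:SELECT id, payload FROM trigger_events WHERE sent = 0"
                + "?onConsume=UPDATE trigger_events SET sent = 1 WHERE id = :#id"
                + "&delay=5000")
            .setBody(simple("${body[payload]}"))
            .to("jms:queue:databaseEvents");
    }
}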
I hope this was helpful.
IBM z and i series servers have DB2 with native SQL functions such as MQSEND to send to IBM MQ and MQREAD to read from IBM MQ. On these servers you can also create triggers that call a program.

Good Strategy for Message Queuing?

I'm currently designing an application which I will ultimately want to move to Windows Azure. In the short term, however, it will be running on a server which I will host myself.
The application involves a number of separate web applications - some of these are essentially WCF services which receive data, and some are sites for users to manage data. In addition, there will need to be a worker service running in the background which will process data in various ways.
I'm very keen to use a decoupled architecture for this. Ideally I'm wanting the components (i.e. web apps and worker service) to know as little as possible about each other. It seems like using a message queue will be the best solution here - the web apps can enqueue messages with work units into the queue and the worker service can pick them out and process them as needed.
However, I want to work out a good set of technologies for doing this, bearing in mind that I'll ultimately be moving to Azure and want to minimise the amount of re-work I'll need to do when I migrate to the cloud. Azure has a Queue component built in which looks ideal for my needs. What I'd like to do is create something myself which will mimic this as closely as possible.
It looks like there are several options (I'm using .NET on Windows, with a SQL Server 2005 back end) - the ones I've found so far are:
MSMQ
SQL Server service broker
Rolling my own using a database table and some stored procs
I was wondering if anyone has any suggestions for this - or if anyone has done anything similar and has advice on things to do/to avoid. I realise that every situation is different, but in this case I think my queuing requirements are pretty generic so I'd love to hear anyone else's thoughts about the best way to do this.
Thanks in advance,
John
If you have Azure in mind, perhaps you should start straight on Azure, as the APIs and semantics are significantly different between Azure queues and either MSMQ or SSB.
A quick 3048-meter comparison of MSMQ vs. SSB (I'll leave a custom table-as-queue out of the comparison, as it really depends on how you implement it...):
Deployment: MSMQ is a Windows component, SSB is a SQL Server component. SSB requires a SQL instance to store any message, so disconnected clients need access to an instance (which can be Express). MSMQ requires deployment of MSMQ on the client (part of the OS, but an optional install).
Programmability: MSMQ offers a fully fledged, supported, WCF channel. SSB offers only an experimental WCF channel at http://ssbwcf.codeplex.com
Performance: SSB will be significantly faster than MSMQ in transacted mode. MSMQ will be faster if left to operate in untransacted mode (best effort, unordered delivery).
Queriability: SSB queues can be SELECT-ed upon (view any message, full SQL JOIN/WHERE/ORDER/GROUP power); MSMQ queues can only be peeked (next message only).
Recoverability: SSB queues are integrated into the database, so they are backed up and restored with the database, keeping a consistent state with the application state. MSMQ queues are backed up by the NT file backup subsystem, so to keep the backup in sync (coherent), the queue and the database have to be suspended.
Transactions (since every enqueue/dequeue is always accompanied by a database update): SSB is fully integrated into SQL Server, so dequeueing and enqueueing are local transaction operations. MSMQ is a separate TM (Transaction Manager), so enqueue/dequeue has to be a distributed transaction operation to enroll both SQL Server and MSMQ in the transaction.
Management and Monitoring: both are equally bad. No tools whatsoever.
Correlated message processing: SSB can block processing of correlated messages by concurrent threads via built-in Conversation Group Locking.
Event-driven: SSB has Activation to launch stored procedures; MSMQ uses the Windows Activation Service. Similar. SSB, though, has self-load-balancing capabilities due to the way WAITFOR(RECEIVE) and MAX_QUEUE_READERS interact.
Availability: SSB piggybacks on the SQL Server high-availability story; it can work in either a clustered or a database mirroring environment. MSMQ rides the Windows clustering story only. Database mirroring is much cheaper than clustering as an HA solution.
In addition, I'd add that SSB and MSMQ differ significantly at the level of the primitive they offer: SSB's primitive is the conversation, while MSMQ's primitive is the message. Think TCP vs. UDP semantics.
Pick a queue back end that works for you, or that is better suited to your environment. @Remus has given a great comparison between MSMQ and SSB. MSMQ is going to be the easier one to implement, but it has some notable limitations, while SSB is going to feel very heavy, as it's at the other end of the spectrum.
Have It Your Way
To minimize the rework in your applications, abstract the queue access behind an interface, and then provide an implementation for the queue transport you ultimately decide to go with. When it's time to move to Azure, or to another queue transport, you just provide a new implementation of your interface.
You get to control the semantics of how you want to interact with the queue to give a consistent usable API from your applications.
A rough idea might be:
public interface IQueuedTransport
{
    void SendMessage(XmlDocument message);
    XmlDocument ReceiveMessage();
}

public class MSMQTransport : IQueuedTransport
{
    public void SendMessage(XmlDocument message) { /* enqueue via System.Messaging */ }
    public XmlDocument ReceiveMessage() { /* dequeue via System.Messaging */ return null; }
}

public class AzureQueueTransport : IQueuedTransport
{
    public void SendMessage(XmlDocument message) { /* enqueue via the Azure queue API */ }
    public XmlDocument ReceiveMessage() { /* dequeue via the Azure queue API */ return null; }
}
You may not be building the be-all queuing transport, just what meets your needs. If you work with XML, pass XML. If you work with byte arrays, pass byte arrays. :)
Good luck!
Z
Use Win32 Mailslots. They will be reliable on a single server, are easy to implement, and do not require any extra software.
