Webservice Orchestration for database sync

I have two webservices, each with its own database; one is the master (A) and the other is the slave (B). If a call is made to service B, it also calls A to sync A's database.
If for some reason A is not available, B needs to bring A up to date with its data at a later time.
Any suggestion on what mechanism can be used for out of process data synchronization?

It sounds like the command pattern could be useful to you - store the "missed" transactions and apply them later. You may have to do some trickery to work out which of the last few calls you made to A happened, and which didn't.
If A is updated from another source and you lose the link (rather than A going down completely), you may have a battle on your hands to resolve any conflicts. I'd recommend a temporal database of some sort to help manage that.
Alternatively, have you thought of using a messaging system such as MSMQ?
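As a rough illustration of the "store the missed transactions and apply them later" idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `PendingCommandLog` class, the SQLite table layout, and the `send` callable standing in for the call from B to A. Each command gets a unique ID so that A could, in principle, recognise a call it has already applied, which is one way to handle the uncertainty about which of the last few calls actually went through.

```python
import json
import sqlite3
import uuid


class PendingCommandLog:
    """Hypothetical store on service B for commands that have not reached A yet."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "command_id TEXT UNIQUE, "
            "payload TEXT)"
        )

    def record(self, payload):
        """Persist a command and return its ID (usable by A for de-duplication)."""
        command_id = str(uuid.uuid4())
        with self.db:
            self.db.execute(
                "INSERT INTO pending (command_id, payload) VALUES (?, ?)",
                (command_id, json.dumps(payload)),
            )
        return command_id

    def replay(self, send):
        """Send pending commands in their original order.

        `send(command_id, payload)` is assumed to raise ConnectionError while
        A is unreachable. Replay stops at the first failure so that ordering
        is preserved. Returns the number of commands delivered.
        """
        delivered = 0
        rows = self.db.execute(
            "SELECT seq, command_id, payload FROM pending ORDER BY seq"
        ).fetchall()
        for seq, command_id, payload in rows:
            try:
                send(command_id, json.loads(payload))
            except ConnectionError:
                break
            with self.db:
                self.db.execute("DELETE FROM pending WHERE seq = ?", (seq,))
            delivered += 1
        return delivered

    def pending_count(self):
        return self.db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]
```

B would call `record` before (or instead of) calling A directly, and run `replay` periodically. A command is deleted only after `send` returns, so a crash between the two steps results in a duplicate delivery rather than a lost one, which is why the command ID matters.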

You can set up master-master replication between the two databases. We do this with MySQL.

Related

Persistent job queue?

The internet says using a database for queues is an anti-pattern, and that you should use RabbitMQ, Beanstalkd, or something similar.
But I want all requests stored. So I can later lookup how long they took, any failed attempts or errors or notes logged, who requested it and with what metadata, what was the end result, etc.
It looks like all the queue libraries don't have this option. You can't persist the data to allow you to query it later.
I want what those queues do, but with a "persist to database" option. Does this not exist? How do people deal with this? Do you use a queue library and copy over all request information into your database when the request finishes?
(the language/database I'm using is anything, whatever works best for this)
If you want to log requests, and meta-data about how long they took etc, then do so - log it to the database when you know the relevant results, and run your analytic queries as you would expect to.
The reason not to use the database as a temporary store is that under high traffic, searching for and locking unprocessed jobs, and then updating or deleting them when they are complete, can take a great deal of effort. That is especially true if you don't remove jobs from the active table, and so have to search through ever more completed jobs to find those that have yet to be done.
You can implement a task queue yourself using a persistent backend (such as a database) to store the queued tasks. The problem is that it may not scale well, and it is generally better to use a proven implementation than to reinvent the wheel. These are tough problems to solve, and it is better to use the existing frameworks.
For instance, if you are implementing in Python, the typical choice is Celery with a Redis or RabbitMQ broker.
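The "use a queue for the work, log the outcome to the database when it is known" approach from the first answer might look roughly like this. It is only a sketch: Python's in-process `queue.Queue` stands in for a real broker such as RabbitMQ, and the `job_log` table, its columns, and the job dictionary keys are all made up for the example.

```python
import queue
import sqlite3
import time


def make_log(path=":memory:"):
    """Create the (hypothetical) table that keeps one row per finished job."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS job_log ("
        "job_id TEXT, requested_by TEXT, status TEXT, "
        "error TEXT, duration_s REAL)"
    )
    return db


def run_jobs(jobs, handler, db):
    """Drain the queue, writing a log row only once each job's result is known."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        started = time.monotonic()
        status, error = "ok", None
        try:
            handler(job)
        except Exception as exc:  # record the failure instead of losing it
            status, error = "failed", str(exc)
        duration = time.monotonic() - started
        with db:
            db.execute(
                "INSERT INTO job_log VALUES (?, ?, ?, ?, ?)",
                (job["id"], job["requested_by"], status, error, duration),
            )
```

The database here is append-only history, never the thing workers poll or lock, so the analytic queries (how long jobs took, who requested them, which failed) do not compete with job dispatch.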

"Real Time" data change detection in SQL Server

We have a requirement for notifying external systems of changes in data in various tables in a SQL Server database. The choice of which data to monitor is somewhat under the control of the user (gets to choose from a list of what we support). The recipients of the notifications may be on a locally connected network (i.e., in the same data center) or they may be remote.
We currently handle this by application code within our data access layer that detects changes and queues notifications on a Service Broker queue which is monitored by a Windows service that performs the actual notification. Not quite real time but close enough.
This has proven to have some maintenance problems so we are looking at using one of the change detection mechanisms that are built into SQL Server. Unfortunately none of the ones I have looked at (I think I looked at them all) seem to fit very well:
Change Data Capture and Change Tracking: Major problem is that they require polling the captured information to determine changes that are to be passed on to recipients. I suspect that will introduce too much overhead.
Notification Services: Essentially uses SQL Server as a web server, which is a horrible waste of licenses. It also requires access through at least two firewalls in the network, which is unacceptable from a security perspective.
Query Notification: Seems the most likely candidate but does not seem to lend itself particularly well to dynamically choosing the data elements to watch. The need to re-register the query after each notification is sent means that we would keep SQL Server busy managing the registrations.
Event Notification: Designed to notify on database or instance level events, not really applicable to data change detection.
About the best idea I have come up with is to use CDC and put insert triggers on the change data tables. The triggers would queue something to a Service Broker queue that would be handled by some other code to perform the notifications. This is essentially what we do now except using a SQL Server feature to do the change detection. I'm not even sure that you can add triggers to those tables but I thought I'd get feedback before spending a lot of time with a POC.
That seems like an awful roundabout way to get the job done. Is there something I have missed that will make the job easier or have I misinterpreted one of these features?
Thanks and I apologize for the length of this question.
Why don't you use update and insert triggers? A trigger can execute CLR code.

Is RabbitMQ, ZeroMQ, Service Broker or something similar an appropriate solution for creating a high availability database webservice?

I have a CRUD webservice, and have been tasked with trying to figure out a way to ensure that we don't lose data when the database goes down. Everyone is aware that if the database goes down we won't be able to get "reads" but for a specific subset of the operations we want to make sure that we don't lose data.
I've been given the impression that this is something that is covered by services like 0MQ, RabbitMQ, or one of the Microsoft MQ services. Although after a few days of reading and research, I'm not even certain that the messages we're talking about in MQ services include database operations. I am however 100% certain that I can queue up as many hello worlds as I could ever hope for.
If I can use a message queue for adding a layer of protection to the database, I'd lean towards Rabbit (because it appears to persist through crashes), but since the target is a Microsoft SQL Server database, perhaps one of their solutions (such as SQL Service Broker or MSMQ) is more appropriate.
The real fundamental question that I'm not yet sure of though is whether I'm even playing with the right deck of cards (so to speak).
With the desire for a high-availability webservice that continues to function if the database goes down, does it make sense to put a RabbitMQ instance "between" the webservice and the database? Maybe the right link in the chain is to have RabbitMQ send messages to the webserver?
Or is there some other solution for achieving this? There are a number of loose ideas at the moment around finding a way to roll up weblogs in the event of database outage or something... but we're still in early enough stages that (at least I) have no idea what I'm going to do.
Is message queue the right solution?
Introducing message queuing in between a service and its database operations is certainly one way of improving service availability. Writing to a local temporary queue in a store-and-forward scenario will always be more available than writing to a remote database server, simply by being a local operation.
Additionally by using queuing you gain greater control over the volume and nature of database traffic your database has to handle at peak. Database writes can be queued, routed, and even committed in a different order.
However, in order to do this you need to be aware that when a database write is performed it is processed off-line. Even under conditions where this happens almost instantaneously, you are losing a benefit that the synchronous nature of your current service gives you, which is that your service consumers can always know if the database write operation is successful or not.
I have written about this subject before here. The user posting the question had similar concerns to you. Whether you do this or not is a decision you have to make based on whether this is something your consumers care about or not.
As for the technology stacks you are thinking of, this off-line model can be implemented with pretty much any of them, with the possible exception of Service Broker, which doesn't integrate well with code (see my answer here: https://stackoverflow.com/a/45690344/569662).
If you're using Windows and unlikely to need to migrate, I would go for MSMQ (which supports durable messaging via transactional queues) as it's lightweight and part of Windows.
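To make the trade-off described above concrete (the caller learns that a write was accepted, not that it was committed), here is a small Python sketch. The `QueuedWriter` class, the ticket numbers, the status strings, and the `apply_write` callable are all hypothetical. The in-memory `deque` is only a stand-in for a durable queue such as a transactional MSMQ queue; on its own it would not survive a process crash.

```python
import collections


class QueuedWriter:
    """Hypothetical store-and-forward wrapper around a database write."""

    def __init__(self, apply_write):
        # apply_write(operation) is assumed to perform the real database
        # write and to raise ConnectionError while the database is down.
        self._apply_write = apply_write
        self._queue = collections.deque()  # NOT durable; illustration only
        self._status = {}
        self._next_ticket = 1

    def submit(self, operation):
        """Queue a write and return a ticket immediately."""
        ticket = self._next_ticket
        self._next_ticket += 1
        self._queue.append((ticket, operation))
        self._status[ticket] = "accepted"
        return ticket

    def drain(self):
        """Forward queued writes in order; stop as soon as the database is down."""
        while self._queue:
            ticket, operation = self._queue[0]
            try:
                self._apply_write(operation)
            except ConnectionError:
                return  # leave this and later writes queued for the next attempt
            except Exception:
                self._status[ticket] = "rejected"  # e.g. a constraint violation
            else:
                self._status[ticket] = "committed"
            self._queue.popleft()

    def status(self, ticket):
        return self._status[ticket]
```

A consumer that cares about the outcome has to ask later via `status(ticket)` or be notified some other way. That extra step is the cost of no longer having the synchronous success or failure answer.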

CQRS Design Pattern Updates

I was looking to implement the CQRS pattern. For the process of updating the read database, is it best to use a Windows service, or to update the view at the time of creating a new record in the update database? Is it best to use triggers, or some other process? I've seen a couple of approaches and haven't made up my mind what is the best approach to achieve this.
Thanks.
Personally I love to use messaging to solve this kind of problem.
Your commands result in events when they are processed, and if you use messaging to publish the events, one or more downstream read services can subscribe to the events and process them to update the read models.
The reason why messaging is nice in this case is that it allows you to decouple the write and read side from each other. Also, it allows you to easily have several subscribers if you find a need for it. Additionally, messaging using a persistent queuing system like MSMQ enables retrying of failed messages. It also means that you can take a read model offline (for updates etc) and when it comes back up it can then process all the events in the queue.
I'm no friend of triggers in relational databases, and I imagine they must be pretty hard to test. Triggers would also introduce routing logic where it doesn't belong. Could it also be that if the trigger action fails, the entire write transaction rolls back? Triggers are probably the least beneficial solution.
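The messaging approach described above can be sketched in a few lines of Python. This is an in-process, synchronous stand-in for a real persistent queue such as MSMQ, so it shows only the decoupling, not the retry or offline catch-up behaviour. The `EventBus`, `OrderSummaryReadModel`, `place_order`, and `"OrderPlaced"` names are invented for the example.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe hub (stand-in for a real message queue)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, event):
        for handler in self._subscribers[event_type]:
            handler(event)


class OrderSummaryReadModel:
    """Denormalized view: running order total per customer."""

    def __init__(self):
        self.totals = {}

    def on_order_placed(self, event):
        customer = event["customer"]
        self.totals[customer] = self.totals.get(customer, 0) + event["amount"]


def place_order(bus, write_store, customer, amount):
    """Command handler: update the write side, then publish the resulting event."""
    write_store.append({"customer": customer, "amount": amount})
    bus.publish("OrderPlaced", {"customer": customer, "amount": amount})
```

The write side knows nothing about which read models exist; adding a second subscriber is one more `subscribe` call. With a persistent queue in place of this bus, a read model that was offline could process the backlog of events when it comes back up.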
It depends on how tolerant your application must be with regards to eventual consistency.
If your app has no problem with read data being 5 minutes old, there's no need to denormalize upon every write data change. In that case, a background service that kicks in every n minutes or that kicks in only when the CPU consumption is below a certain threshold, for instance, can be a good solution.
If, on the other hand, your app is time-sensitive, such as in the case of frequently changing statuses, machine monitoring, stock exchange data etc., then you will want to keep the lag as low as possible and denormalize on the spot -- that is, in-process or at least in real-time. So in this case you may choose to run the denormalizers in a constantly-running process or to add them to the chain of event handlers straight in your code.
Your call.

Why do major DB vendors not provide truly asynchronous APIs?

I work with Oracle and MySQL, and I struggle to understand why the APIs are not written such that I can issue a call, go away and do something else, and then come back and pick it up later (as with NIO, for example). Instead I am forced to dedicate a thread to waiting for data. It seems that the SQL interfaces are the only place where sync IO is still forced, which means tying up a thread waiting for the DB.
Can anybody explain the reasons for this? Is there something fundamental that makes this difficult?
It would be great to be able to use 1-2 threads to manage my DB query issue and result fetch, rather than use worker threads to retrieve data.
I do note that there are two experimental attempts at implementing an async API (e.g. adbcj), but neither seems to be ready for production use.
Database servers should be able to handle thousands of clients. To provide an asynchronous interface, the DB server would need to keep the result set from the query in memory so you can pick it up at a later stage. It would quickly run out of resources.
A considerable problem with async is that many libraries use thread-local storage for transactions.
For example, in Java much of the JDBC specification relies on synchronous behavior to achieve a single thread per transaction. That is, you write your transaction in procedural order.
To do it right, transactions would have to be done through callbacks, but they are not. I know of only node.js that does this, but it's unclear if it's really async.
Of course, even if you do async, I'm not sure it will really improve performance, as the database itself is probably doing the work synchronously.
There are lots of ways to avoid thread over-population in Java:
Is asynchronous jdbc call possible?
Personally to get around this issue I use a Message Bus like RabbitMQ.
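One common workaround, sketched here in Python rather than Java, is to confine the blocking database calls to a small, bounded thread pool and await the results from otherwise non-blocking code. This does not make the driver asynchronous; it only caps how many threads sit waiting on the database. The function names, the trivial query, and the pool size of 2 are assumptions for the example, and SQLite is used only because it ships with Python.

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def blocking_query(n):
    """A synchronous database call; it blocks the thread it runs on."""
    # Each call opens its own connection, because sqlite3 connections are
    # not meant to be shared across threads by default.
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT ? * 2", (n,)).fetchone()[0]
    finally:
        conn.close()


async def run_queries(values, max_db_threads=2):
    """Run blocking queries on at most `max_db_threads` threads and await them."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_db_threads) as pool:
        futures = [loop.run_in_executor(pool, blocking_query, v) for v in values]
        # gather() returns the results in the same order as `values`
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(run_queries([1, 2, 3])))
```

The event loop thread stays free while the queries run, and the number of threads tied up waiting for the database never exceeds `max_db_threads`, however many queries are submitted.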

Resources