SQL table queue - run procedure on data receipt - sql-server

I'm optimizing a legacy application and only have access to the database, not the UI code.
There is a specific table insert occurring that I need to catch, so I can do some additional processing, but without adding any obvious lag.
I'm thinking of adding an INSERT trigger which places an entry on a queue, so that processing can continue after returning.
Service Broker queues seem to require a two-way conversation, and therefore two queues. I don't need the reply, so another possibility could be to use a table as a queue.
But if I use a table, is there a way for entries to be processed very soon after they arrive, as they would be with a Service Broker queue? Is the only option to have a check running on a schedule? That would be a pain if I need to add more of these.
A trigger on that "Queue" table would also not be a great idea, as it would add to the performance lag.
Of course I could always just ignore the response queue, but that doesn't feel right.
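For what it's worth, the reply side of Service Broker does not have to carry any application traffic: the trigger can SEND on a new dialog, and the initiator queue will then only ever see system EndDialog/Error messages, which a trivial activation procedure can drain. Below is a minimal sketch of that shape; every object name (NewRow, SourceTable, the queues and services) is a placeholder invented for illustration, not anything from the original setup.

-- One-time setup (hypothetical names throughout).
CREATE MESSAGE TYPE [//demo/NewRow] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//demo/NewRowContract] ([//demo/NewRow] SENT BY INITIATOR);
CREATE QUEUE dbo.NewRowInitiatorQueue;
CREATE QUEUE dbo.NewRowTargetQueue;
CREATE SERVICE [//demo/NewRowInitiator] ON QUEUE dbo.NewRowInitiatorQueue;
CREATE SERVICE [//demo/NewRowTarget] ON QUEUE dbo.NewRowTargetQueue ([//demo/NewRowContract]);
GO

-- Trigger on the watched table: just enqueue the new rows and return.
CREATE TRIGGER dbo.trg_SourceTable_Enqueue ON dbo.SourceTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @h UNIQUEIDENTIFIER,
            @body XML = (SELECT * FROM inserted FOR XML PATH('row'), ROOT('rows'));
    IF @body IS NULL RETURN;   -- nothing was actually inserted

    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE [//demo/NewRowInitiator]
        TO SERVICE '//demo/NewRowTarget'
        ON CONTRACT [//demo/NewRowContract]
        WITH ENCRYPTION = OFF;

    SEND ON CONVERSATION @h MESSAGE TYPE [//demo/NewRow] (@body);
    -- The slow processing lives in an activation procedure attached to
    -- dbo.NewRowTargetQueue; the initiator queue only ever receives EndDialog
    -- messages, which a second, trivial activation procedure can END CONVERSATION on.
END;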

Related

Eventual consistency with both database and message queue records

I have an application where I need to store some data in a database (MySQL, for instance) and then publish some data in a message queue. My problem is: if the application crashes after the storage in the database, my data will never be written to the message queue and will be lost (thus the eventual consistency of my system will not be guaranteed).
How can I solve this problem?
In this particular case, the answer is to load the queue data from the database.
That is, you write the messages that need to be queued to the database, in the same transaction that you use to write the data. Then, asynchronously, you read that data from the database, and write it to the queue.
See Reliable Messaging without Distributed Transactions, by Udi Dahan.
If the application crashes, recovery is simple -- during restart, you query the database for all unacknowledged messages, and send them again.
Note that this design really expects the consumers of the messages to be designed for at least once delivery.
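A minimal T-SQL sketch of that outbox idea, assuming hypothetical dbo.Orders and dbo.Outbox tables and a separate relay job; none of these names come from the answer above.

-- In the business transaction: write the data and the outgoing message together.
DECLARE @OrderId INT = 42, @CustomerId INT = 7, @Total MONEY = 99.95;
BEGIN TRANSACTION;
    INSERT INTO dbo.Orders (OrderId, CustomerId, Total)
    VALUES (@OrderId, @CustomerId, @Total);

    INSERT INTO dbo.Outbox (MessageId, Payload, CreatedAt, SentAt)
    VALUES (NEWID(), N'{"orderId": 42, "total": 99.95}', SYSUTCDATETIME(), NULL);
COMMIT TRANSACTION;

-- In the relay (asynchronously): read unacknowledged messages, publish them,
-- and only then mark them as sent.  A crash between publish and UPDATE just
-- means the row is picked up again, hence at-least-once delivery.
DECLARE @SomeMessageId UNIQUEIDENTIFIER;   -- each MessageId returned by the SELECT below
SELECT MessageId, Payload FROM dbo.Outbox WHERE SentAt IS NULL ORDER BY CreatedAt;
-- ... publish each row to the message broker here ...
UPDATE dbo.Outbox SET SentAt = SYSUTCDATETIME() WHERE MessageId = @SomeMessageId;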
I am assuming that you have a lossless message queue, where once you get a confirmation that the data was written, the queue is guaranteed to have the record.
Basically, you need either a transaction that can roll back, or a status in the database. The pseudo-code for the transactional approach is:
Begin transaction
Insert into database
Write to message queue
When message queue confirms, commit transaction
Personally, I would probably do this with a status:
Insert into database with a status of "pending" (or something like that)
Write to message queue
When message confirms, change status to "committed" (or something like that)
In the case of recovery from failure, you may need to check the message queue to see if any "pending" records were actually written to the queue.
I'm afraid those answers (VoiceOfUnreason, Udi Dahan) just sweep the problem under the carpet. The problem under the carpet is: how should the movement of data from the database to the queue be designed so that the message is posted exactly once (without XA)? If you solve this, you can easily extend that concept with any additional business logic.
The CAP theorem tells you the limits clearly.
XA transactions are not a 100% bulletproof solution, but they seem to me the best of all the others I have seen.
Adding to what @Gordon Linoff said: assuming durable messaging (something like MSMQ?), the method/handler is going to be transactional, so if it all succeeds, the message will be written to the queue and the data to your view model; if it fails, both will fail...
To mitigate the ID issue you will need to use GUIDs instead of DB-generated keys (if you are using messaging you will need to remove your referential integrity anyway and introduce GUIDs as keys).
One more suggestion: don't update the database; insert only / upsert (the pending row and then the completed row) and have the reader do the projection of the data based on the latest row (for example).
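A sketch of that "reader does the projection" idea, using a hypothetical insert-only dbo.OrderEvents table (one row per state change) made up for illustration:

SELECT OrderId, Status, CreatedAt
FROM (
    SELECT OrderId, Status, CreatedAt,
           ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY CreatedAt DESC) AS rn
    FROM dbo.OrderEvents
) AS latest
WHERE rn = 1;   -- keep only the most recent row per order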
Writing the message as part of the transaction is a good idea, but it has multiple drawbacks if:
a. your database/language does not support transactions;
b. transactions are time-consuming operations;
c. you cannot afford to wait for the queue's response while responding to your service call;
d. your database is already under stress, so writing the messages would exacerbate the impact of the higher workload.
In that case, the best practice is to use database streams. Most modern databases support streams (DynamoDB, MongoDB, Oracle, etc.). You run a consumer of the database stream which reads from the stream and writes to the queue, invalidates the cache, adds to a search indexer, and so on. Once all of those succeed, you mark the stream item as processed.
Pros of this approach
It works in the case of a multi-region deployment where there is a regional failure (you read from the regional stream and hydrate all the regional data stores).
No overhead of writing extra records, and no performance bottlenecks from queues.
You can use this pattern for other destinations as well, such as caching, queuing and searching.
Cons
You may need to call multiple services to construct the appropriate message.
One database stream might not be sufficient to construct the appropriate message.
You have to ensure the reliability of your streams; a Redis stream, for example, is not reliable.
NOTE: this approach also does not guarantee exactly-once semantics. The consumer logic should be idempotent and able to handle duplicate messages.
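This answer is written in terms of DynamoDB/MongoDB-style streams; the closest SQL Server analogue would be Change Tracking (or CDC). A rough sketch of a stream consumer on that basis, assuming a hypothetical dbo.Orders table and a consumer that persists the last version it processed:

-- One-time setup (assumed, not from the answer):
-- ALTER DATABASE CURRENT SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
-- ALTER TABLE dbo.Orders ENABLE CHANGE_TRACKING;

DECLARE @last_version BIGINT = 0;                                   -- persisted by the consumer between runs
DECLARE @this_version BIGINT = CHANGE_TRACKING_CURRENT_VERSION();   -- where this run will stop

SELECT ct.OrderId, ct.SYS_CHANGE_OPERATION, o.CustomerId, o.Total
FROM CHANGETABLE(CHANGES dbo.Orders, @last_version) AS ct
LEFT JOIN dbo.Orders AS o ON o.OrderId = ct.OrderId
ORDER BY ct.SYS_CHANGE_VERSION;

-- ... publish each change to the queue / cache invalidator / search indexer ...

-- Only after every downstream write succeeds, persist @this_version as the new
-- @last_version.  As the answer notes, this is still at-least-once, not exactly-once.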

Simple way to continuously look up in a very busy table without blocking it

I have a table that receives thousands of inserts per minute and it's important that the inserts are done extremely quickly as it would otherwise cause performance problems elsewhere in my application. That's why this table does not contain any indexes at all. This works out well and the insert performance is satisfactory.
However, I would also like to look up in this table very often, like every second, with complex queries. I can't do it on the table as it is, because without indexes it would perform far too slowly.
So I would like to continuously move the data from this table to another table that does contain indexes. However, if I move it with a simple SQL script every minute, I am afraid I risk blocking inserts to the table while I'm moving records (insert + delete in a transaction), and that would be a problem for the performance of my application.
So what would be a smart (but/and simple) way to accomplish this?
Here are my best ideas so far:
1. Use SQL Server Service Broker
I would put a trigger on the table and add new items to a Service Broker queue, which I think should be asynchronous and not cause performance problems. Then I need another job to read from the queue. I haven't done this before and I'm not sure how good this solution is.
2. Use "Replication"
I do not like this solution because of the complex setup of replication so I mention it here to say that I would be happy to not receive suggestions on this.
3. Just do it
Maybe I'm overthinking it. Should I run this every minute?
BEGIN Transaction
-- Insert all rows in other table
-- Delete all rows from table
END Transaction
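If you do go with option 3, one way to avoid the race between the INSERT and the DELETE (rows arriving in between would be deleted without ever having been copied) is to move the rows with a single DELETE ... OUTPUT statement, which is atomic on its own. A sketch with hypothetical table names, assuming the two tables have compatible columns:

-- Move up to 5000 rows per run; schedule as often as needed.
DELETE TOP (5000) FROM dbo.BusyTable WITH (ROWLOCK, READPAST)
OUTPUT deleted.*
INTO dbo.IndexedTable;
-- READPAST skips rows still locked by in-flight inserts instead of blocking on them.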
Note that the target table will also be quite busy, so perhaps this job would have to wait for heavy queries to complete.
Looking forward to hearing your suggestions on how to approach this challenge.
This is an interesting problem. For sure, replication on a cluster with reads on the secondary nodes is probably the cleanest solution, but if you cannot put that infrastructure in place or cannot afford the Enterprise license, then I would go with the broker. You would need to test both of your solutions, but my guess is that the "Just do it" solution will encounter many deadlocks and lead to slowness problems.
I know this is an old question, so I was wondering which solution you chose in the end? My vote goes to the broker.
Since your read queries are complex, you could do a sort of "big data" architecture where the broker doesn't simply copy data to another table but also pre-calculates some of the things you need in your heavy read queries. For example, if you need the average or sum of some data over a date range, you could pre-calculate those with the broker so that your read queries are easier to run. That way your other queries would also be lighter on the system.
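If you went the broker route with pre-calculation, the activation procedure might look roughly like the sketch below. The queue name, message type, XML shape and dbo.DailySummary table are all assumptions for illustration, and the summary row for the current day is assumed to already exist (a real version would upsert).

CREATE PROCEDURE dbo.ProcessBusyTableQueue
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @h UNIQUEIDENTIFIER, @type SYSNAME, @body XML;

    WHILE 1 = 1
    BEGIN
        WAITFOR (
            RECEIVE TOP (1)
                @h    = conversation_handle,
                @type = message_type_name,
                @body = CAST(message_body AS XML)
            FROM dbo.BusyTableTargetQueue
        ), TIMEOUT 5000;

        IF @@ROWCOUNT = 0 BREAK;   -- queue drained, let the activated task exit

        IF @type = '//demo/BusyTableRow'
        BEGIN
            -- Fold the new row into a pre-calculated summary so the heavy
            -- read queries can hit dbo.DailySummary instead of the raw table.
            UPDATE dbo.DailySummary
            SET    RowsSeen = RowsSeen + 1,
                   Total    = Total + @body.value('(/rows/row/Amount)[1]', 'decimal(18,2)')
            WHERE  SummaryDate = CAST(SYSUTCDATETIME() AS DATE);
        END

        END CONVERSATION @h;   -- one message per dialog in this sketch
    END
END;
GO

ALTER QUEUE dbo.BusyTableTargetQueue
WITH ACTIVATION (STATUS = ON, PROCEDURE_NAME = dbo.ProcessBusyTableQueue,
                 MAX_QUEUE_READERS = 1, EXECUTE AS OWNER);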

Very long camel redelivery policy

I am using Camel and I have a business problem. We consume order messages from an ActiveMQ queue. The first thing we do is check our DB to see if the customer exists. If the customer doesn't exist, then a support team needs to populate the customer in a different system. Sometimes this can take 10 hours or even until the following day.
My question is how to handle this. It seems to me that, at a high level, I can either dequeue these messages, store them in our DB and re-run them at intervals (a custom-coded solution), or I could note the error in our DB and then return the messages to the ActiveMQ queue with a long redelivery policy and expiration, say redeliver every 2 hours for 48 hours.
This would save a lot of code, but my question is whether approach 2 is sound, or whether it could lead to resource issues or to not knowing where messages are.
This is a pretty common scenario. If you want insight into how the jobs are progressing, then it's best to use a database for this.
Your queue consumption should be really simple: consume the message, check if the customer exists; if so process, otherwise write a record in a TODO table.
Set up a separate route to run on a timer - every X minutes. It should pull out the TODO records, and for each record check if the customer exists; if so process, otherwise update the record with the current timestamp (the last time the record was retried).
This allows you to have a clear view of the state of the system, that you can then integrate into a console to see what the state of the outstanding jobs is.
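The database side of that approach might look something like the sketch below; the table and column names are made up for illustration, and the Camel routes themselves are not shown.

CREATE TABLE dbo.PendingOrders (
    OrderId       BIGINT        NOT NULL PRIMARY KEY,
    CustomerRef   NVARCHAR(50)  NOT NULL,
    OrderBody     NVARCHAR(MAX) NOT NULL,   -- the original message payload
    ReceivedAt    DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
    LastCheckedAt DATETIME2     NULL        -- last time the timer route retried this record
);

-- What the timer route would run every X minutes: pick up records that have
-- not been checked recently, see whether the customer now exists, then either
-- process the order or stamp LastCheckedAt with the current time.
SELECT p.OrderId, p.CustomerRef, p.OrderBody
FROM dbo.PendingOrders AS p
WHERE p.LastCheckedAt IS NULL
   OR p.LastCheckedAt < DATEADD(MINUTE, -30, SYSUTCDATETIME());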
There are a couple of downsides with your Option 2:
you're relying on the ActiveMQ scheduler, which uses a KahaDB variant sitting alongside your regular store, and may not be compatible with your H/A setup (you need a shared file system)
you can't see the messages themselves without scanning through the queue, which is an antipattern (using a queue as a database); you may as well use a database, especially if you anticipate ever needing to selectively remove a particular message.

Reliable asynchronous processing in SQL Server

Some of the services are provided to our customers by a 3rd party. The data which is created on their remote servers is replicated to an on-premises SQL server.
I need to perform some work on that 3rd-party server, whose database is not directly accessible to me. They expose a set of APIs for that purpose. The work is performed on a linked SQL server by a SQL Server Agent job.
Business scenario: customers can receive "badges". A badge can be given to a customer by calling the UpdateCustomerBadgeInfo web method on the 3rd-party server.
So a typical requirement for an automated task would look like this:
"Find all customers who logged in more than 50 times during theday, give them the [has-no-life] badge and send them an SMS notification"
The algorithm would be:
- Select all the matching accounts into a #TempTable
- For each customer record:
  - Call the UpdateCustomerBadgeInfo() method (via CLR)
  - If the badge info was updated successfully -> enqueue an SMS message (queue table)
  - Log successful actions (so that the record will not be picked up next time)
The biggest problem with the way it works now is that it takes a lot of time to process large datasets in a WHILE loop.
So the 3rd party provider created a solution to perform batch updates of the customer data. They created a table on the on-premises SQL server to which batch update requests are submitted and later picked up by their service for validation and processing.
The question is :
How should the above algorithm be changed to fit into this asynchronous model?
This answer is valid only if I understood the situation correctly:
the 3rd-party server used to expose a web method to update customers one by one
now they expect to get this information from a SQL Server table that is available to you for INSERT/UPDATE/DELETE
you can just put your customer-related requests into this table and they will be processed some time later
when the customer-related info gets updated, you have to perform some additional local actions (queue SMS, log activity)
Generally, I don't see any significant changes to the algorithm, but I will try to explain what I would do in this case.
Select all the matching accounts into a #TempTable
This may not be necessary, because you already have a table to put your requests in: the 3rd-party table. The only problem would be synchronizing requests, but to analyze that you would have to provide more details (are multiple requests for the same customer allowed? is there protection against re-issuing the same request?).
for each customer record...
This should be the only change in your implementation. It now means: for each customer record that is asynchronously processed on the 3rd-party side. Of course, your 3rd party must give you some clue that they really did process your customer request, or you have no idea what to work with. So, when they validate and process the data, they can provide, e.g., nullable 'success_time' and 'error_time' columns to tell you what has been done and when. If there is success, you continue with the processing. If not, you can probably do something about that as well.
But how do you react when you get the async information back (e.g. success_time IS NOT NULL)? Well, there are multiple ways to do that. Personally, I try to avoid triggers because they can make your life complicated (their visibility sucks, they can cause problems with replication, they can cause problems with transactions...). I use them only if I really need first-class, immediate responsiveness. Another possibility is using async queues with custom activation, which means Service Broker. However, a lot of people avoid using SB technology: it's different from the rest of SQL Server, it has its specifics, debugging is not as easy as with plain old SQL statements, etc.
Another possibility would be batch-processing the async responses on your side using an agent job. Since you are already using a job, you should be fine with that. Basically, the table acts as a synchronization point: you fill in your requests (INSERT), the 3rd party processes them (SELECT). After the requests get processed, they mark them as such (UPDATE success_time or error_time), and at the end you process those responses (SELECT) in your agent job task. Your processing includes the SMS message and the logging, maybe even DELETEing from the 3rd-party table.
Another thing to mention is that you need synchronization methods here. First, don't do anything without transactions, or you may end up processing ghost responses and/or skipping valid waiting responses. Second, when you SELECT responses (rows that have been processed on the 3rd-party side), you could get some improvement by using the READPAST hint (skip what is locked). However, if you need to update/delete from the 3rd-party table after processing a response, you may use SELECT with UPDLOCK to stop the other side from tampering with the data between your SELECT and your UPDATE. Or use no locking hints at all if you are not completely sure what goes on with the table in question.
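A sketch of that agent-job step, using the hints mentioned above. The success_time and error_time columns come from this answer; everything else (BadgeRequests, processed_time, SmsQueue, BadgeLog) is invented for illustration.

BEGIN TRANSACTION;

-- Pick up responses the 3rd party has finished with, skipping rows they may
-- still be touching (READPAST) and locking the ones we take (UPDLOCK).
SELECT r.RequestId, r.CustomerId, r.success_time, r.error_time
INTO #Done
FROM dbo.BadgeRequests AS r WITH (UPDLOCK, READPAST, ROWLOCK)
WHERE (r.success_time IS NOT NULL OR r.error_time IS NOT NULL)
  AND r.processed_time IS NULL;

-- Enqueue the SMS messages for the successful ones and log everything.
INSERT INTO dbo.SmsQueue (CustomerId, MessageText)
SELECT d.CustomerId, N'You earned the [has-no-life] badge!'
FROM #Done AS d
WHERE d.success_time IS NOT NULL;

INSERT INTO dbo.BadgeLog (CustomerId, LoggedAt)
SELECT d.CustomerId, SYSUTCDATETIME()
FROM #Done AS d;

-- Mark the requests so they are not picked up on the next run.
UPDATE r
SET    r.processed_time = SYSUTCDATETIME()
FROM   dbo.BadgeRequests AS r
JOIN   #Done AS d ON d.RequestId = r.RequestId;

COMMIT TRANSACTION;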
Hope it helps.

Message Queue or DataBase insert and select

I am designing an application and I have two ideas in mind (below). I have a process that collects approximately 30 KB of data; this data will be collected every 5 minutes and needs to be updated on the client (web side; 100 users at any given time). The information collected does not need to be stored for future use.
Options:
I can get the data and insert it into the database every 5 minutes. Then a client call will be made to the DB to retrieve the data and update the UI.
Collect the data and put it into a topic or queue. Multiple clients (consumers) can then go to the queue and obtain the data.
I am leaning towards option 2 as the better solution because it is faster (no DB calls) and there is no redundant storage.
Can anyone suggest which would be the ideal solution, and why?
I don't really understand the difference. The data has to be temporarily stored somewhere until the next update, right?
But all users can see it, not just the first person to get there, right? So a queue is not really an appropriate data structure from my interpretation of your system.
Whether the data is written to something persistent like a database or something less persistent like part of the web server or application server may be relevant here.
Also, you have tagged this as real-time, but I don't see how the web clients are getting updates in real time without some kind of push/long-poll or whatever.
It seems to me that you need to use a queue and the publisher/subscriber pattern.
This is an article about RabbitMQ and the publish/subscribe pattern.
I can get the data and insert it into the database every 5 minutes. Then a client call will be made to the DB to retrieve the data and update the UI.
You can program your application to be event-oriented. For instance, raise domain events and publish your messages to your subscribers.
When you use a queue, the subscriber will dequeue the messages addressed to it, of course obeying the order (FIFO). In addition, there will be a guarantee of delivery, unlike with a database, where a record can be deleted even though not every 'subscriber' has gotten the message.
The pitfalls of using the database to accomplish this are (a sketch of what this implies follows the list):
Creating indexes makes querying faster, but inserts slower;
You will have to track the delivery guarantee for every subscriber;
You'll need a TTL (time-to-live) strategy for purging records (taking the delivery guarantee into account).
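To make those pitfalls concrete, this is roughly the bookkeeping the database approach would require; all of these names are hypothetical.

CREATE TABLE dbo.Snapshot (
    SnapshotId  BIGINT IDENTITY PRIMARY KEY,
    Payload     NVARCHAR(MAX) NOT NULL,
    CollectedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Per-subscriber delivery tracking (second pitfall).
CREATE TABLE dbo.SnapshotDelivery (
    SnapshotId   BIGINT    NOT NULL REFERENCES dbo.Snapshot (SnapshotId) ON DELETE CASCADE,
    SubscriberId INT       NOT NULL,
    DeliveredAt  DATETIME2 NULL,
    PRIMARY KEY (SnapshotId, SubscriberId)
);

-- TTL purge job (third pitfall): only delete snapshots that every subscriber
-- has already received and that are older than, say, 15 minutes.
DELETE s
FROM dbo.Snapshot AS s
WHERE s.CollectedAt < DATEADD(MINUTE, -15, SYSUTCDATETIME())
  AND NOT EXISTS (SELECT 1
                  FROM dbo.SnapshotDelivery AS d
                  WHERE d.SnapshotId = s.SnapshotId
                    AND d.DeliveredAt IS NULL);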

Resources