Generating events from SQL Server - sql-server

I am looking for a best practice or example of how I might generate events for all update operations on a given SQL Server 2008 R2 database. To be more descriptive, I am working on a POC where I would essentially publish update events to a queue (RabbitMQ in my case) that could then be consumed by various consumers. This would be the first part of implementing a CQRS query-only data model via event sourcing. By placing the events on the queue, anybody could then subscribe to them for replication into any number of query-only data models. This part is clear and fairly well-defined. The problem I am having is determining the best approach for generating the events out of SQL Server. I have been given a few ideas, such as monitoring the transaction log and SSIS, but I'm not entirely sure if these options are advisable or even feasible.
Does anybody have any experience with this sort of thing, or have any notions on how to go about such an adventure? Any help or guidance would be greatly appreciated.

You cannot monitor the log because, even if you were able to understand it, you have the problem of the log being recycled before you have a chance to read it. Unless the log is somehow marked not to be truncated, it will be reused. For instance, when transactional replication is enabled, the log is pinned until it is read by the replication agent, and only then truncated.
SSIS is a very broad concept, and saying 'use SSIS to detect changes' is akin to saying 'I'll use a programming language to solve my problem'. The question is how you would use SSIS. There is no way, with or without SSIS, to reliably detect data changes on an arbitrary schema. Even data models specifically designed to allow for detecting changes have issues, especially at detecting deletes.
However, there are viable alternatives. You can deploy Change Data Capture and delegate the tracking of changes to the engine itself. Consuming these detected changes and publishing them to consumers (via RabbitMQ if that's your fancy) is something SSIS would be good at. But you have to understand that SSIS does not fare well with continuous, real-time tasks. It is designed to run periodically on batches, so your change notification consumers will be notified in spikes, with long delays (minutes), whenever the SSIS jobs run.
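To make that concrete, here is a minimal sketch of enabling CDC and polling the detected changes. The database and table names (MyAppDb, dbo.Orders) are hypothetical, and CDC additionally needs SQL Server Agent running and an edition that supports it:

    USE MyAppDb;
    GO
    -- Enable CDC for the database (requires sysadmin).
    EXEC sys.sp_cdc_enable_db;
    GO
    -- Track changes on one table; the engine logs them to a change
    -- table that is read through a generated table-valued function.
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Orders',
        @role_name     = NULL;   -- no gating role in this sketch
    GO
    -- A periodic consumer (an SSIS package, say) then polls for changes:
    DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Orders'),
            @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');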
For a real-time approach, a better solution is Service Broker. One possibility is to SEND Service Broker messages from triggers, but I would not recommend it. A better design is to have the application itself publish the changes by SEND-ing the message explicitly when it does the data modification. With SQL Server 2012 it is possible to multicast Service Broker messages to other SQL Server consumers (including SQL Server Express). SSB message delivery is fully transactional (no message gets sent if the transaction rolls back) and does not require a two-phase commit with a message-store resource manager. But to broadcast via RabbitMQ you would need to bridge the communication, i.e. RECEIVE the SSB messages and transform them into RabbitMQ notifications.
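As a rough sketch of what 'the application publishes the change' could look like (every service, contract, table, and parameter name below is made up for illustration):

    -- One-time plumbing:
    CREATE MESSAGE TYPE [//MyApp/RowChanged] VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT [//MyApp/ChangeContract]
        ([//MyApp/RowChanged] SENT BY INITIATOR);
    CREATE QUEUE dbo.ChangeQueue;
    CREATE SERVICE [//MyApp/ChangeService] ON QUEUE dbo.ChangeQueue
        ([//MyApp/ChangeContract]);
    GO
    -- Inside the procedure that modifies the data (@OrderId is a parameter):
    BEGIN TRANSACTION;
    UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderId = @OrderId;

    DECLARE @h uniqueidentifier,
            @body xml = (SELECT @OrderId AS OrderId, 'Update' AS Op FOR XML RAW);
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE [//MyApp/ChangeService]
        TO SERVICE   '//MyApp/ChangeService'
        ON CONTRACT  [//MyApp/ChangeContract]
        WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @h MESSAGE TYPE [//MyApp/RowChanged] (@body);
    COMMIT;  -- a rollback would also discard the SEND

A bridge process would then RECEIVE from such a queue and republish each message as a RabbitMQ notification.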

Related

Can someone tell me if SQL Server Service Broker is needed for this scenario?

My first ever question on Stack Overflow, so please go easy. I have a long-running Windows application that continually processes SQL Server commands. I also have a web front end that users occasionally use to update the same db. I've noticed that sometimes (depending on what the Windows application is processing at the time), if a user submits something to the db, I receive out-of-memory exceptions on the server. I realise I need to dig around a bit more and optimise the code. However, I cannot afford for the server to go down, and I expect that in the future I'll be allowing more and more users on the front end. What I really need is a system that will queue the users' requests (they are not time critical) and process them when the db is ready.
I'm using SQL Server 2012 Express.
Is SQL Server Service Broker the best solution? I've also looked into MSMQ.
If so, can someone point me in the right direction? It would be appreciated. In my search I'm just finding a lot of things it does that I don't think I need.
Cheers
It depends on where you're doing the persistence work and/or calculations. If you're doing the hard work in your Windows application, then using a Service Broker queue won't be worthwhile: all you would be doing is receiving a message from the Service Broker queue in your Windows application, doing your calculations and/or queries from the Windows application, and then persisting the results to the database. As your database is already under memory pressure, this seems like an unnecessary extra load, since you could just as easily queue and retrieve the message from MSMQ (or any other queueing technology).
If, however, you are doing all the work in the database and your Windows application just acts as a marshalling service - e.g. taking the request and palming it off to a stored procedure for actioning - then Service Broker queues may be worth using: because they already operate within the context of the database, they can be very efficient at persisting and querying data.
You would also want to take failure modes into account, depending on whether or not you can afford to lose any messages. To ensure message persistence in MSMQ you have to use transactional messaging. Service Broker is more efficient at transactional queue processing than MSMQ (because it has transaction support built in, unlike MSMQ, which has to use DTC, and that adds overhead) - but if your volume of messages is low, this may not be an issue.
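If the work does end up living in the database, the usual shape is a queue with an internal activation procedure that drains requests as they arrive. A sketch under those assumptions, with all object names invented (end-dialog and error messages are ignored here):

    CREATE QUEUE dbo.UserRequestQueue;
    GO
    CREATE PROCEDURE dbo.ProcessUserRequest
    AS
    BEGIN
        DECLARE @h uniqueidentifier, @body xml;
        WHILE (1 = 1)
        BEGIN
            BEGIN TRANSACTION;
            -- Dequeue and processing share one transaction; a rollback
            -- puts the request back on the queue.
            WAITFOR (RECEIVE TOP (1)
                        @h    = conversation_handle,
                        @body = CAST(message_body AS xml)
                     FROM dbo.UserRequestQueue), TIMEOUT 1000;
            IF @@ROWCOUNT = 0
            BEGIN
                COMMIT;
                BREAK;
            END
            EXEC dbo.ApplyRequest @body;  -- hypothetical worker procedure
            COMMIT;
        END
    END
    GO
    ALTER QUEUE dbo.UserRequestQueue WITH ACTIVATION (
        STATUS = ON,
        PROCEDURE_NAME = dbo.ProcessUserRequest,
        MAX_QUEUE_READERS = 1,   -- throttle: one reader at a time
        EXECUTE AS OWNER);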

Using MSMQ when you already have SQL Server and BizTalk

Simple question: Is there any good reason to add MSMQ to an existing messaging framework which already has multiple BizTalk and SQL Server nodes?
Here's the background: We have a messaging framework to process bills, the load is rather low right now (at most 10,000 a day), but it's ramping up. We use BizTalk and SQL Server for all the processing, and we started noticing a few timeouts when inserting (synchronously) into one of the databases (NOT the BizTalk message box). One of our senior programmers suggested we use MSMQ to save (asynchronously) the data that causes the timeout and process it later; the solution he designed works and it's about to be deployed, but I'm still wondering if that was the right decision, considering that we could have used BizTalk itself or SQL Server Service Broker (SSSB). There's a lot of discussions about those three technologies, but they're usually about having to choose one of them over the others, I haven't seen any case of anyone who already had BizTalk and SSSB and decided to add MSMQ to the mix. In our case I think it's an unnecessary addition to our technology stack, but that may be my own bias (and ignorance too), since I know SSSB better and never did anything big with MSMQ. What do you think?
It sounds like you should figure out why your inserts are taking so long, and fix that instead. 10,000 / day is nothing for a decent box running SQL Server.
EDIT:
Adding any sort of asynchronous processing is a form of kicking the can down the road. Assume your inserts take one minute (I realize they probably don't, but for argument's sake). If you make your inserts asynchronous, you can still only handle 1,440 inserts per day (24 hours × 60 minutes) before you start falling behind. You are always going to need to speed up your inserts eventually.
Now with that said, I don't think that there is any compelling benefit in this case of using MSMQ over SSSB (or vice-versa). It could be argued that with MSMQ you need to hand-code a listener daemon that does your inserts, whereas with SSSB you have that automatically within the database. On the other hand, with MSMQ you are offloading the storage of the messages to another server, potentially offloading some of the immediate stress from your SQL Server.
I would argue that if you just wanted to take the database calls off-line then you could do that with BizTalk (for example, by creating an "offline" host - thereby creating a new host queue).
Where MSMQ really excels is on the inbound side of BizTalk. Systems can send to BizTalk without caring about the availability of BizTalk itself. The messages will just hang around until BizTalk is available again.
I'm with Hugh - we've used MSMQ (and IBM MQ Series) successfully with BizTalk for asynchronous, transactional traffic (mostly financial transactions, where the need for traceable, reliable, ACID-type message delivery outweighs any concern about transaction latency).
We've found the benefits of MSMQ to be:
Transactional delivery - messages can be pulled off by the destination system and inserted into SQL under a two-phase unit of work.
Hugh's point about delivery decoupled from system availability (and you still have the Dead Letter Queue if the target system is down for an unreasonable period of time)
Load balancing / throttling - a destination system can protect against overzealous message delivery by pulling messages off the queue at a more even pace.
Auditing - using the journalling on MSMQ allows an additional layer of tracing.
Also note that there is a WCF adapter for MSMQ - no requirement for custom listeners.
We generally stay away from calling SQL directly from BizTalk:
For reading, this equates to polling the database in the hope that there are messages ready to be sent. This can create issues relating to the frequency of calling (i.e. redundancy, induced latency, and load on SQL) and contention (e.g. polling while data is being added to the tables by an app). We would rather have each app decide when to submit messages to BizTalk / ESB.
For write operations, unless data is offloaded into a staging area for processing by destination apps, it can lead to much of the 'business' processing moving into BizTalk (i.e. validation, applying business rules etc.) - IMHO this is too fine-grained for BizTalk. And as you've found, it can be hard to control the rate of message delivery into SQL (e.g. unless you start using singleton orchestrations etc.), which again causes locking / contention issues.

MSMQ as buffer for SQL Server Inserts

I'm learning about MSMQ and am successfully using it to queue email and text messages from a consumer-facing ASP.NET MVC website, to be handled by a separate client application.
In the event of a missing SQL Server database - perhaps while swapping drives, or after a broken database deploy - would it make sense to queue non-time-critical inserts in a local MSMQ queue to improve up-time?
Theoretically, I can then pause/resume queue processing (persistence) while making database changes. Has anyone tried this or is there a better way?
If you're looking at higher availability by queueing locally, then you should consider Service Broker deployed on SQL Express instances collocated with your IIS/ASP instance. The advantage of using SSB over MSMQ is that you have consistency between your message store and your data store (one consistent backup/restore, one consistent failover unit). It scales much better than MSMQ under load, and it does not require two-phase-commit DTC to coordinate the MSMQ dequeue with the DB insert (one local DB transaction can cover both the dequeue and the insert). It offers queryability of the pending messages (SELECT .. FROM queue), is integrated with the DB HA/DR solution (cluster failover/mirroring), you get database-contained activation, and it all works from the familiar T-SQL programming environment. MSMQ's main advantage is its client-side C#/.NET API.
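The 'one local DB transaction to dequeue/insert' point looks roughly like this (queue, table, and payload shape are all hypothetical):

    BEGIN TRANSACTION;

    DECLARE @body xml;
    WAITFOR (RECEIVE TOP (1) @body = CAST(message_body AS xml)
             FROM dbo.InsertQueue), TIMEOUT 5000;

    IF @body IS NOT NULL
        INSERT INTO dbo.Orders (OrderId, Payload)
        VALUES (@body.value('(/row/@OrderId)[1]', 'int'), @body);

    COMMIT;  -- rolling back would put the message back on the queue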
I was on a team that implemented this for purposes of guaranteed delivery. We used MSMQ to forward the insert requests to the database server, which had its own service running that dequeued the requests and ran the inserts, then acknowledged the message (to ensure delivery). It's been running for over a year now, and we've never been asked to come figure out why it isn't working...seems pretty solid to me.
This is very subjective because it depends on what your application does and how. Generally, something like MSMQ is not used for this purpose; rather, you want to set up some kind of high-availability clustering on your database of choice. A database going completely down is rare in most cases, and for most LOB applications it is a much bigger problem than just needing somewhere to store data entered while the DB is down for whatever reason.
There's also overhead to think about. An INSERT operation into a database is relatively quick (in the larger scheme of things); writing a serialized something into a queue and having something else pick it up and do that insert is going to add a large amount of lag to your application, not to mention that you'll now have to account for everything being asynchronous.
That said, MSMQ can be used to ensure delivery of stuff from one end of an application to another, so I suppose there are instances where this scenario might be desirable. Most of the time, though, you're better off trusting your DB and using MSMQ to enable asynchronous processing and to perform interprocess and intermachine communication.

SQL Server Service Broker -- Suggestion for Handling Two-Phase Commit Between SQL Server Instances

We're exploring different approaches for communicating between two different SQL Server instances. One of the desired workflows is to send a message of some sort to the "remote" side requesting, let's say, deletion of a record. When that system completes the deletion, it holds its transaction open and sends a response back to the initiator, who then deletes its corresponding record, commits its transaction, and then sends a message back to the "remote" side, telling it, finally, to commit the deletion on its side as well.
It's a poor man's approximation of two-phase commit. There's a religious debate going on in the department as to whether SQL Server Service Broker can or can't handle this type of scenario reasonably cleanly. Can anyone shed light on whether it can? Any experience in similar types of workflows? Is there a better mechanism I should be looking at to accomplish this, considering that the SQL Server instances are on separate, non-domain machines?
Edit: To clarify, we can't use distributed transactions due to the fact that network security is both tight and somewhat arbitrary. We're not allowed the configuration that would make that possible.
Unless I'm misunderstanding the requirements, I'd say it's a perfect job for Service Broker. Service Broker frees you from the need to use distributed transactions and 2PC. What you do with Service Broker is reduce the problem to local transactions and transactional messaging between the servers.
In your particular case, one of the servers would delete its record and then (as part of the same transaction) send a message to the other server requesting deletion of the corresponding record. After enqueuing the message, the first server can commit the transaction and forget the whole thing without waiting for synchronization with the second server. Service Broker guarantees that once enqueuing of a message is committed, the message will be transactionally delivered to the destination, which can then delete its record as part of the same transaction in which it received the message, thus making sure the message processing and data changes are atomic.
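A sketch of the initiator side of that pattern, with invented service, contract, and table names:

    -- Initiator side: the local delete and the request to the remote
    -- side commit or roll back together.
    BEGIN TRANSACTION;
    DELETE FROM dbo.Customers WHERE CustomerId = @CustomerId;

    DECLARE @h uniqueidentifier,
            @body xml = (SELECT @CustomerId AS CustomerId FOR XML RAW);
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE [//Local/DeleteService]
        TO SERVICE   '//Remote/DeleteService'
        ON CONTRACT  [//Shared/DeleteContract]
        WITH ENCRYPTION = OFF;
    SEND ON CONVERSATION @h MESSAGE TYPE [//Shared/DeleteRequest] (@body);
    COMMIT;
    -- Target side: its activated procedure RECEIVEs the message and
    -- deletes the corresponding record in the same transaction.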
Have you tried using a distributed transaction?
It will do everything you need, but each server will need to be set up as a linked server on the other.
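For completeness, the linked-server route would look something like the following (hypothetical server and table names). Note that it requires MSDTC connectivity between the machines, which the question's edit appears to rule out:

    -- @CustomerId is assumed to be declared by the caller.
    BEGIN DISTRIBUTED TRANSACTION;
    DELETE FROM dbo.Customers WHERE CustomerId = @CustomerId;
    DELETE FROM RemoteSrv.RemoteDb.dbo.Customers WHERE CustomerId = @CustomerId;
    COMMIT TRANSACTION;  -- MSDTC coordinates the two-phase commit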

Scheduled execution of code to conduct database operations in SQL Server

If I want to conduct some database operations on a scheduled basis, I could:
Use SQL Server Agent (if SQL Server) to periodically call the stored procedure and/or execute the T-SQL
Run some external process (scheduled by the operating system's task scheduler for example) which executes the database operation
etc.
Two questions:
What are some other means of accomplishing this?
What decision criteria should one use to decide the best approach?
Thank you.
Another possibility is to have a queue of tasks somewhere, and when applications that otherwise use the database perform some operation, they also do some tasks out of the queue. Wikipedia does something like this with its job queue. The scheduling isn't as certain as with the other methods, but you can e.g. put off doing housekeeping work when your server happens to be heavily loaded.
Edit:
It's not necessarily better or worse than the other techniques. It's suitable for tasks that do not have to be performed by any specific deadline, but should be done "every now and then", or "soon, but not necessarily right now".
Advantages
You don't need to write a separate application or set up SQL Server Agent.
You can use any criteria you can program to decide whether to run a task or not: immediately, once a certain time has passed, or only if the server is not under heavy load.
If the scheduled tasks are ones like optimising indices, then you can do them less frequently when they are less necessary (e.g. when updates are rare), and more frequently when updates are common.
Disadvantages
You might need to modify multiple applications to cooperate correctly.
You need to ensure that the queue doesn't build up too much.
You can't reliably ensure that a task runs before a certain time.
You might have long periods (e.g. at night) where you get no requests and deferred/scheduled tasks could get done, but don't. You could combine this with one of the other ideas - have a special program that just does the jobs in the queue - but at that point you could skip the queue entirely.
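For what it's worth, a common way to implement the task queue described above is a plain table drained opportunistically with READPAST locking hints, so concurrent workers skip over each other; a sketch with made-up names:

    CREATE TABLE dbo.TaskQueue (
        TaskId    int IDENTITY PRIMARY KEY,
        TaskType  varchar(50) NOT NULL,
        NotBefore datetime NOT NULL DEFAULT GETDATE());
    GO
    -- Called opportunistically by the applications: pop at most one due
    -- task, skipping rows locked by other workers.
    DECLARE @TaskId int, @TaskType varchar(50);
    BEGIN TRANSACTION;
    SELECT TOP (1) @TaskId = TaskId, @TaskType = TaskType
    FROM dbo.TaskQueue WITH (UPDLOCK, READPAST, ROWLOCK)
    WHERE NotBefore <= GETDATE()
    ORDER BY TaskId;
    IF @TaskId IS NOT NULL
    BEGIN
        -- ... run the housekeeping task indicated by @TaskType ...
        DELETE FROM dbo.TaskQueue WHERE TaskId = @TaskId;
    END
    COMMIT;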
You can't really rely on external processes. All 'OS'-based solutions I've seen failed to deliver in the real world: a database is way more than just the data, primarily because of the backup/restore strategy, the high-availability strategy, the disaster-recoverability strategy, and all the other 'ities' you pay for in your SQL Server license. An OS-scheduler-based solution will be an external component completely unaware of, and unintegrated with, any of them. I.e. you cannot back up/restore your schedule with your data, it will not fail over with your database, and you cannot ship it to a remote disaster-recovery site through your SQL data shipping channel.
If you have Agent (i.e. not Express edition), then use Agent. It has a long history of use, and the know-how around it is significant. The only problem with Agent is its dependence on msdb, which disconnects it from the application database, and thus it does not play well with mirroring-based availability and recoverability solutions.
For Express editions (i.e. no Agent), the best option is to roll your own scheduler based on conversation timers (at least in SQL 2005 and forward). You use conversations to schedule yourself messages at the desired moment and rely on activated procedures to run the tasks. They are transactional and integrated with your database, so you can rely on them being there after a restore and after a mirroring or clustering failover. Unfortunately, the know-how around them is fairly slim; I have several articles about the subject on my site, rusanu.com. I've seen systems replicate a fair amount of the Agent API on Express relying entirely on conversation timers.
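A minimal sketch of the conversation-timer pattern (object names invented; see the rusanu.com articles mentioned above for a fuller treatment):

    CREATE QUEUE dbo.SchedulerQueue;
    GO
    CREATE PROCEDURE dbo.RunScheduledTask
    AS
    BEGIN
        DECLARE @h uniqueidentifier, @mt sysname;
        BEGIN TRANSACTION;
        WAITFOR (RECEIVE TOP (1)
                    @h  = conversation_handle,
                    @mt = message_type_name
                 FROM dbo.SchedulerQueue), TIMEOUT 1000;
        IF @mt = N'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer'
        BEGIN
            EXEC dbo.DoHousekeeping;   -- the actual scheduled work
            -- Re-arm the timer so the task fires again in an hour.
            BEGIN CONVERSATION TIMER (@h) TIMEOUT = 3600;
        END
        COMMIT;
    END
    GO
    CREATE SERVICE SchedulerService ON QUEUE dbo.SchedulerQueue;
    GO
    ALTER QUEUE dbo.SchedulerQueue WITH ACTIVATION (
        STATUS = ON, PROCEDURE_NAME = dbo.RunScheduledTask,
        MAX_QUEUE_READERS = 1, EXECUTE AS OWNER);
    GO
    -- Bootstrap once: open a self-addressed dialog and arm the first timer.
    DECLARE @h uniqueidentifier;
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE SchedulerService TO SERVICE 'SchedulerService'
        WITH ENCRYPTION = OFF;
    BEGIN CONVERSATION TIMER (@h) TIMEOUT = 3600;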
I generally go with the operating system's scheduling method (Task Scheduler for Windows, cron for Unix).
I deal with multiple database platforms (SQL Server, Oracle, Informix) and want to keep the task scheduling as generic as possible.
Also, in our production environment we have to get a DBA involved for any troubleshooting / restarting of jobs that are running in the database. We have better access to the application servers with the scheduled tasks on them.
I think the best decision criterion is what the job is. If it's a completely internal SQL Server task, or a set of tasks that does not relate to the outside world, I would say a SQL Agent job is the best bet. If, on the other hand, you are retrieving data and then doing something with it that is inherently outside SQL Server, very difficult to do in T-SQL, or time-consuming, perhaps the external service is the best bet.
I'd go with SQL Server Agent. It's well integrated with SQL Server; various SQL Server features use Agent (Log Shipping, for instance). You can create an Agent job to run one or more SSIS packages, for instance.
It's also integrated with operator notification, and can be scripted, or else executed through SMO.
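For reference, an Agent job can be created entirely in T-SQL through the documented msdb procedures; the job name and command here are made up:

    USE msdb;
    GO
    EXEC dbo.sp_add_job      @job_name = N'NightlyHousekeeping';
    EXEC dbo.sp_add_jobstep  @job_name  = N'NightlyHousekeeping',
                             @step_name = N'Run proc',
                             @subsystem = N'TSQL',
                             @command   = N'EXEC MyAppDb.dbo.DoHousekeeping;';
    EXEC dbo.sp_add_jobschedule @job_name          = N'NightlyHousekeeping',
                                @name              = N'Nightly at 2am',
                                @freq_type         = 4,      -- daily
                                @freq_interval     = 1,      -- every 1 day
                                @active_start_time = 20000;  -- 02:00:00
    EXEC dbo.sp_add_jobserver @job_name = N'NightlyHousekeeping';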
