MSMQ as buffer for SQL Server Inserts

I'm learning about MSMQ and am successfully using it to queue email and text messages from a consumer-facing ASP.NET MVC website, to be handled by a separate client application.
In the event of a missing SQL Server database, perhaps while swapping drives or after a broken database deployment, would it make sense to queue non-time-critical inserts in a local MSMQ queue to improve uptime?
Theoretically, I can then pause/resume queue processing (persistence) while making database changes. Has anyone tried this or is there a better way?
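A minimal sketch of the buffered-insert idea in the question, assuming SQLite as the database and a `collections.deque` as an in-memory stand-in for a local MSMQ queue (both are assumptions; `BufferedWriter` and the `events` table are hypothetical). Inserts go straight to the database when it is reachable and are parked in the queue otherwise; `resume()` replays them in order after maintenance.

```python
import sqlite3
from collections import deque


class BufferedWriter:
    """Insert into the database, or park the row in a local queue if it is unavailable.

    The deque is a hypothetical, in-memory stand-in for a local MSMQ queue;
    a real deployment needs a durable (recoverable) queue.
    """

    def __init__(self, conn):
        self.conn = conn
        self.pending = deque()
        self.paused = False  # set True during maintenance to force queueing

    def _insert(self, body):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT INTO events(body) VALUES (?)", (body,))

    def insert(self, body):
        # Queue while paused or while older rows are still waiting, to keep order.
        if self.paused or self.pending:
            self.pending.append(body)
            return
        try:
            self._insert(body)
        except sqlite3.Error:  # e.g. database or table missing
            self.pending.append(body)

    def resume(self):
        """Replay queued rows in arrival order once the database is back."""
        self.paused = False
        while self.pending:
            self._insert(self.pending[0])  # raises (and keeps the row) if still down
            self.pending.popleft()         # dequeue only after the insert committed


conn = sqlite3.connect(":memory:")
writer = BufferedWriter(conn)
writer.insert("a")                      # table missing, so "a" is queued
conn.execute("CREATE TABLE events(body TEXT)")
writer.insert("b")                      # queued behind "a" to preserve order
writer.resume()                         # drains both rows
```

Rows leave the queue only after their insert commits, so a failed replay leaves them queued. An in-memory queue loses its contents on restart, which is why the real version would need MSMQ's recoverable messages or similar.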

If you're looking for higher availability by queueing locally, then you should consider Service Broker deployed on SQL Express instances co-located with your IIS/ASP.NET instance. The advantages of using SSB over MSMQ are that you have consistency between your message store and your data store (one consistent backup/restore, one consistent failover unit); it scales much better than MSMQ under load; it does not require two-phase-commit DTC to coordinate the MSMQ dequeue with the DB insert (you can use one local DB transaction to dequeue and insert); it makes pending messages queryable (SELECT ... FROM queue); it is integrated with the DB HA/DR solution (cluster failover/mirroring); you get database-contained activation; and it all works from the familiar T-SQL programming environment. MSMQ's main advantage is its support for a client-side C#/.NET API.
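To illustrate the "one local DB transaction to dequeue/insert" point, here is a hedged sketch that uses SQLite and an ordinary table as a stand-in for a Service Broker queue. This is not SSB's `RECEIVE` syntax, and the table names are hypothetical. Because the queue and the data live in the same database, a failed insert also undoes the dequeue.

```python
import sqlite3

# One SQLite database holds both the "queue" table and the data table
# (a hypothetical stand-in for an SSB queue living in the same database).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pending_msgs(id INTEGER PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE orders(id INTEGER PRIMARY KEY, body TEXT NOT NULL UNIQUE);
""")


def receive_and_insert(conn):
    """Dequeue one message and apply it inside a single local transaction.

    If the insert fails, the DELETE is rolled back as well, so the message
    stays queued; no distributed (two-phase) commit is involved.
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            row = conn.execute(
                "SELECT id, body FROM pending_msgs ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return False
            msg_id, body = row
            conn.execute("DELETE FROM pending_msgs WHERE id = ?", (msg_id,))
            conn.execute("INSERT INTO orders(body) VALUES (?)", (body,))
        return True
    except sqlite3.IntegrityError:
        return False  # message is back in pending_msgs


conn.executemany("INSERT INTO pending_msgs(body) VALUES (?)", [("A",), ("A",)])
conn.commit()
first = receive_and_insert(conn)   # moves the first "A" into orders
second = receive_and_insert(conn)  # duplicate violates UNIQUE: both steps roll back
```

A real consumer also needs poison-message handling, since a message that always fails would otherwise be retried forever.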

I was on a team that implemented this for the purpose of guaranteed delivery. We used MSMQ to forward the insert requests to the database server, which had its own service running that dequeued the requests and ran the inserts, then acknowledged the message (to ensure delivery). It's been running for over a year now, and we've never been asked to come and figure out why it isn't working, so it seems pretty solid to me.
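The dequeue, insert, then acknowledge flow described above can be sketched as follows, assuming a `deque` in place of MSMQ and SQLite in place of the target database (hypothetical stand-ins). Acknowledging only after the commit gives at-least-once delivery, so the insert is keyed on the message id to make a redelivered message harmless.

```python
import sqlite3
from collections import deque

# Hypothetical stand-ins: the deque plays the role of the MSMQ queue,
# SQLite plays the role of the target database.
queue = deque([(1, "bill-1"), (2, "bill-2")])  # (message_id, payload)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bills(msg_id INTEGER PRIMARY KEY, payload TEXT)")


def process_one(crash_before_ack=False):
    """Peek, insert, then acknowledge. Returns False if the queue is empty."""
    if not queue:
        return False
    msg_id, payload = queue[0]  # peek: the message stays queued for now
    with db:
        # Keyed on the message id, so a redelivered message is ignored.
        db.execute("INSERT OR IGNORE INTO bills VALUES (?, ?)", (msg_id, payload))
    if crash_before_ack:
        return True  # simulated crash: insert committed, ack never sent
    queue.popleft()  # acknowledge only after the insert has committed
    return True


process_one(crash_before_ack=True)  # "bill-1" stored but still queued
while process_one():               # redelivery of "bill-1" is a no-op, then "bill-2"
    pass
```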

This is very subjective because it depends on what your application does and how. Generally, something like MSMQ is not used for this purpose; rather, you want to set up some kind of high-availability clustering on your database of choice. A database going completely down is rare in most cases, and for most LOB applications it is a bigger problem than just needing somewhere to store data entered while the DB is down for whatever reason.
There's also overhead to think about. An INSERT into a database is relatively quick (in the larger scheme of things); serializing something into a queue and having another process pick it up and perform the insert will add noticeable latency to your application, not to mention that you'll have to account for everything now being asynchronous.
That said, MSMQ can be used to ensure delivery of stuff from one end of an application to another, so I suppose there are instances where this scenario might be desirable. Most of the time though you're just better off trusting your DB and using MSMQ to enable asynchronous processing and performing interprocess and intermachine communication.

Related

Can someone tell me if SQL Server Service Broker is needed for this scenario?

This is my first ever question on Stack Overflow, so please go easy. I have a long-running Windows application that continually processes SQL Server commands. I also have a web front end that users occasionally use to update the same DB. I've noticed that sometimes (depending on what the Windows application is processing at the time), if a user submits something to the DB, I receive out-of-memory exceptions on the server. I realise I need to dig around a bit more and optimise the code. However, I cannot afford for the server to go down, and I expect that in the future I'll be allowing more and more users on the front end. What I really need is a system that will queue the users' requests (they are not time-critical) and process them when the DB is ready.
I'm using SQL Server 2012 Express.
Is SQL Server Service Broker the best solution? I've also looked into MSMQ.
If so, could someone point me in the right direction? It would be appreciated. In my search I'm just finding a lot of things it does that I don't think I need.
Cheers

It depends on where you're doing the persistence work, and / or calculations. If you're doing the hard work in your Windows Application, then using a Service Broker queue won't be worthwhile, as all you will be doing is receiving a message from the Service Broker queue in your Windows Application, doing your calculations and / or queries from the Windows Application, and then persisting the results to the database: as your database is already under memory pressure, this seems like an unnecessary extra load as you could just as easily queue and retrieve the message from MSMQ (or any other queueing technology).
If, however, you are doing all the work in the database and your Windows Application just acts as a marshalling service (e.g. taking the request and palming it off to a stored procedure for actioning), then Service Broker queues may be worth using: because they already operate within the context of the database, they can be very efficient at persisting and querying data.
You would also want to take failure modes into account, depending on whether or not you can afford to lose any messages. To ensure message persistence in MSMQ you have to use transactional messaging: Service Broker is more efficient at transactional queue processing than MSMQ (because it has transaction support built in, unlike MSMQ, which has to use DTC to coordinate with the database, adding overhead), but if your message volume is low, this may not be an issue.

Generating events from SQL server

I am looking for a best practice or example of how I might generate events for all updates on a given SQL Server 2008 R2 database. To be more descriptive, I am working on a POC where I would publish update events to a queue (RabbitMQ in my case) that could then be consumed by various consumers. This would be the first part of implementing a CQRS query-only data model via event sourcing. With the events placed on the queue, anybody could subscribe to them for replication into any number of query-only data models. This part is clear and fairly well defined. The problem I am having is determining the best approach for generating the events out of SQL Server. I have been given a few ideas, such as monitoring the transaction log and SSIS. However, I'm not entirely sure whether these options are advisable or even feasible.
Does anybody have experience with this sort of thing, or any notions on how to go about such an adventure? Any help or guidance would be greatly appreciated.

You cannot monitor the log because, even if you were able to understand it, you have the problem of the log being recycled before you have a chance to read it. Unless the log is somehow marked not to be truncated, it will be reused. For instance, when transactional replication is enabled, the log is pinned until it is read by the replication agent, and only then truncated.
SSIS is a very broad concept, and saying "I'll use SSIS to detect changes" is akin to saying "I'll use a programming language to solve my problem". The detail is how you would use SSIS. There is no way, with or without SSIS, to reliably detect data changes on an arbitrary schema. Even data models specifically designed to allow change detection have issues, especially at detecting deletes.
However, there are viable alternatives. You can deploy Change Data Capture and delegate change tracking to the engine itself. Consuming these detected changes and publishing them to consumers (via RabbitMQ, if that's your fancy) is something SSIS would be good at. But you have to understand that SSIS is not well suited to continuous, real-time tasks. It is designed to run periodically on batches, so your change notification consumers will be notified in spikes, with long delays (minutes), whenever the SSIS jobs run.
For a real-time approach, a better solution is Service Broker. One possibility is to SEND Service Broker messages from triggers, but I would not recommend it. A better design is to have the application itself publish the changes by SEND-ing the message explicitly when it does the data modification. With SQL Server 2012 it is possible to multicast Service Broker messages to other SQL Server consumers (including SQL Server Express). SSB message delivery is fully transactional (no message gets sent if the transaction rolls back) and does not require two-phase commit with a message-store resource manager. But to broadcast via RabbitMQ you would need to bridge the communication, i.e. RECEIVE the SSB messages and transform them into RabbitMQ notifications.
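A minimal sketch of the batch-style consumer described above, assuming a plain SQLite table stands in for the CDC change table and a Python list stands in for RabbitMQ (both hypothetical). Each run publishes every change newer than a stored watermark, which is why consumers see changes in bursts whenever the job runs. In real CDC the reads would go through the `cdc.fn_cdc_get_all_changes_<capture_instance>` functions over an LSN range.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical stand-in for a CDC change table: one row per captured change.
db.execute("CREATE TABLE changes(lsn INTEGER PRIMARY KEY, op TEXT, row_key INTEGER)")
published = []  # stands in for a RabbitMQ exchange
watermark = 0   # last LSN already published; persist this in practice


def poll_once():
    """One batch run: publish every change newer than the watermark."""
    global watermark
    batch = db.execute(
        "SELECT lsn, op, row_key FROM changes WHERE lsn > ? ORDER BY lsn",
        (watermark,),
    ).fetchall()
    for lsn, op, row_key in batch:
        published.append({"lsn": lsn, "op": op, "key": row_key})
        watermark = lsn  # advance only after the publish succeeded
    return len(batch)


db.executemany("INSERT INTO changes VALUES (?, ?, ?)", [(1, "insert", 10), (2, "update", 10)])
first_run = poll_once()   # publishes both pending changes at once
db.execute("INSERT INTO changes VALUES (3, 'delete', 10)")
second_run = poll_once()  # publishes only the new change
```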
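The "publish from the application, in the same transaction" design can be sketched with an outbox table. This is a swapped-in pattern used here only as an analogue of SSB's transactional SEND; SQLite, the `outbox` table and the list standing in for RabbitMQ are all assumptions. The event row and the data change commit or roll back together, and a separate bridge step forwards committed events, much like the RECEIVE-and-transform bridge mentioned above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE accounts(id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0));
CREATE TABLE outbox(id INTEGER PRIMARY KEY, event TEXT NOT NULL);
INSERT INTO accounts VALUES (1, 100);
""")


def withdraw(amount):
    """Record the change event and modify the data in one local transaction."""
    try:
        with db:
            db.execute("INSERT INTO outbox(event) VALUES (?)", (f"withdraw:{amount}",))
            db.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # rolled back: neither the update nor the event survives


def bridge(publish):
    """Forward committed events to another broker, then remove them."""
    with db:
        for msg_id, event in db.execute("SELECT id, event FROM outbox ORDER BY id").fetchall():
            publish(event)
            db.execute("DELETE FROM outbox WHERE id = ?", (msg_id,))


rabbit = []           # stands in for a RabbitMQ exchange
ok1 = withdraw(30)
ok2 = withdraw(500)   # violates the CHECK constraint, so no event is emitted
bridge(rabbit.append)
```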

Using MSMQ when you already have SQL Server and BizTalk

Simple question: Is there any good reason to add MSMQ to an existing messaging framework which already has multiple BizTalk and SQL Server nodes?
Here's the background: we have a messaging framework to process bills. The load is rather low right now (at most 10,000 a day), but it's ramping up. We use BizTalk and SQL Server for all the processing, and we started noticing a few timeouts when inserting (synchronously) into one of the databases (NOT the BizTalk message box). One of our senior programmers suggested we use MSMQ to save (asynchronously) the data that causes the timeout and process it later; the solution he designed works and is about to be deployed, but I'm still wondering whether that was the right decision, considering that we could have used BizTalk itself or SQL Server Service Broker (SSSB). There are a lot of discussions about these three technologies, but they're usually about choosing one of them over the others; I haven't seen any case of anyone who already had BizTalk and SSSB and decided to add MSMQ to the mix. In our case I think it's an unnecessary addition to our technology stack, but that may be my own bias (and ignorance too), since I know SSSB better and have never done anything big with MSMQ. What do you think?

It sounds like you should figure out why your inserts are taking so long, and fix that instead. 10,000 / day is nothing for a decent box running SQL Server.
EDIT:
Adding any sort of asynchronous processing is a form of kicking the can down the road. Assume your inserts take one minute (I realize they probably don't, but for argument's sake). If you make your inserts asynchronous, you can still only handle 1440 inserts per day until you start falling behind. You are always going to need to speed up your inserts eventually.
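The arithmetic above generalizes to a one-line capacity check. A sketch, assuming inserts are processed serially by a fixed number of workers (the `workers` parameter is an illustrative addition):

```python
def daily_capacity(arrivals_per_day, seconds_per_insert, workers=1):
    """Return (inserts handled per day, backlog added per day)."""
    capacity = workers * 86_400 // seconds_per_insert  # 86,400 seconds in a day
    return capacity, max(0, arrivals_per_day - capacity)


print(daily_capacity(10_000, 60))  # (1440, 8560): the queue grows every day
print(daily_capacity(10_000, 1))   # (86400, 0): fast inserts keep up
```

Queueing changes when the work happens, not how much of it there is.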
Now with that said, I don't think that there is any compelling benefit in this case of using MSMQ over SSSB (or vice-versa). It could be argued that with MSMQ you need to hand-code a listener daemon that does your inserts, whereas with SSSB you have that automatically within the database. On the other hand, with MSMQ you are offloading the storage of the messages to another server, potentially offloading some of the immediate stress from your SQL Server.

I would argue that if you just wanted to take the database calls offline, you could do that with BizTalk (for example, by creating an "offline" host, thereby creating a new host queue).
Where MSMQ really excels is on the inbound side of BizTalk. Systems can send messages to BizTalk without caring about the availability of BizTalk itself. The messages will just wait in the queue until BizTalk is available again.

I'm with Hugh - we've used MSMQ (and IBM MQ Series) successfully with BizTalk for asynchronous, transactional traffic (mostly financial transactions, where the need for traceable, reliable, ACID type message delivery outweighs any need for transaction latency).
We've found the benefits of MSMQ to be:
Transactional delivery - messages can be pulled off by the destination system and inserted into SQL under a two-phase unit of work (UOW).
Hugh's point about delivery decoupled from system availability (and you still have the Dead Letter Queue if the target system is down for an unreasonable period of time)
Load balancing / throttling - a destination system can protect against overzealous message delivery by pulling messages off the queue at a more even pace.
Auditing - using the journalling on MSMQ allows an additional layer of tracing.
Also note that there is a WCF adapter for MSMQ - no requirement for custom listeners.
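A sketch of the throttling point, assuming the destination pulls at most a fixed number of messages per time interval (`drain_throttled` and its parameters are hypothetical; a real consumer would sleep between ticks instead of looping immediately):

```python
from collections import deque


def drain_throttled(queue, handle, max_per_tick, ticks):
    """Pull at most max_per_tick messages per tick; return the count handled per tick."""
    per_tick = []
    for _ in range(ticks):
        handled = 0
        while queue and handled < max_per_tick:
            handle(queue.popleft())
            handled += 1
        per_tick.append(handled)
    return per_tick


burst = deque(range(25))  # a burst of 25 messages arrives at once
seen = []
profile = drain_throttled(burst, seen.append, max_per_tick=10, ticks=4)
```

The burst is absorbed by the queue and reaches the database as an even stream of at most 10 messages per interval.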
We generally stay away from calling SQL directly from BizTalk:
For reading, this equates to polling the database in the hope that there are messages ready to be sent (this can create issues relating to the frequency of calling, i.e. redundancy, induced latency and load on SQL, and contention, e.g. polling while data is being added to the tables by an app). We would rather have each app decide when to submit messages to BizTalk / ESB.
For write operations, unless data is offloaded into a staging area for processing by destination apps, it can lead to much of the "business" processing moving into BizTalk (i.e. validation, applying business rules etc.); IMHO this is too fine-grained for BizTalk. And as you've found, it can be hard to control the rate of message delivery into SQL (e.g. unless you start using Singleton Orchestrations), which again causes locking / contention issues.

Caching to a local SQL instance on a web server

I run a very high-traffic (10M impressions a day), high-revenue web site built with .NET. The core metadata is stored on a SQL Server. My team and I have a unique caching strategy that involves querying the database for new metadata at regular intervals from a middle-tier server, serializing the data to files and sending those to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory to use for real-time requests.
The advantage of this model is that it:
Allows the web nodes to cache all data in memory and not incur any IO overhead querying a database.
If the database ever goes down either unexpectedly or for maintenance windows, the web servers will continue to run and generate revenue. You can even fire up a web server without having to retrieve its initial data from the DB because all the data it needs are in files on its own disks.
Allows us to be completely horizontally scalable. If throughput suffers, we can just add a web server.
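The file-based snapshot flow described above might look roughly like this, assuming JSON files instead of serialized .NET objects and hypothetical function names. Writing to a temporary file and renaming it means a web node never loads a half-written snapshot.

```python
import json
import os
import tempfile


def publish_snapshot(rows, directory):
    """Middle tier: write the metadata snapshot atomically (temp file, then rename)."""
    final_path = os.path.join(directory, "metadata.json")
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(rows, f)
    os.replace(tmp_path, final_path)  # readers never see a partial file
    return final_path


def load_cache(path):
    """Web node: build an in-memory lookup from the local file, with no database call."""
    with open(path) as f:
        return {row["id"]: row for row in json.load(f)}


workdir = tempfile.mkdtemp()
path = publish_snapshot([{"id": 1, "name": "widget"}, {"id": 2, "name": "gadget"}], workdir)
cache = load_cache(path)
```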
The disadvantage is that these caching and persistence layers add complexity in the code that queries the database, packages the data and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years and there are probably better ways to tackle this.
One strategy I have been considering is using replication to replicate our master SQL Server database to local database instances installed on each web server. The web application would then use normal SQL/ORM techniques to instantiate objects. This way we could still sustain a master database outage, and instead of writing specialized caching code we could use NHibernate to handle the persistence.
This seems like a more elegant solution, and I would like to see what others think or whether anyone has alternatives to suggest.

I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.
First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.
Second, implement database mirroring to protect from a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in realtime or a few minutes behind. If you do it in realtime, you can fail over to the mirror automatically inside your app - SQL Server 2005 & above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whatever server's live.
Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.
My favorite starting point for scaling is using three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.
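A minimal sketch of the three-connection-string approach, with hypothetical server and database names. The realtime string uses the SqlClient `Failover Partner` keyword mentioned for mirroring; the delayed-reporting string points at the same read pool for now, as suggested above, so it can be repointed later without touching query code.

```python
from enum import Enum


class Freshness(Enum):
    REALTIME = "realtime"            # writes and mission-critical reads
    NEAR_REALTIME = "near_realtime"  # replicated / log-shipped read pool
    DELAYED = "delayed"              # reporting, search, history


# Hypothetical server and database names.
CONNECTION_STRINGS = {
    Freshness.REALTIME: "Server=sql-primary;Failover Partner=sql-mirror;Database=Shop;Integrated Security=True",
    Freshness.NEAR_REALTIME: "Server=sql-read-pool;Database=Shop;Integrated Security=True",
    Freshness.DELAYED: "Server=sql-read-pool;Database=Shop;Integrated Security=True",
}


def connection_string_for(freshness, is_write=False):
    """Pick a connection string by how fresh the data must be."""
    if is_write:
        return CONNECTION_STRINGS[Freshness.REALTIME]  # all writes go to the master
    return CONNECTION_STRINGS[freshness]
```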

Have you considered memcached? Since it:
is in memory
can run locally
is fully horizontally scalable
removes the need to re-cache on each web server
It may fit the bill. Check out Google for lots of details and usage stories.
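A cache-aside sketch of how memcached is typically used, assuming a dict in place of a real memcached client and a fake clock for determinism (`CacheAside` and `load_row` are hypothetical names). With real memcached the store is shared by all web servers rather than held per process.

```python
import time


class CacheAside:
    """Cache-aside reads with a time-to-live; a dict stands in for memcached."""

    def __init__(self, load_from_db, ttl_seconds=300, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # hit: no database round trip
        value = self._load(key)  # miss or expired: load once and cache
        self._store[key] = (self._clock() + self._ttl, value)
        return value


db_calls = []
now = [0.0]  # fake clock so the example is deterministic


def load_row(key):
    db_calls.append(key)
    return f"row-{key}"


cache = CacheAside(load_row, ttl_seconds=60, clock=lambda: now[0])
first = cache.get(7)   # miss: loads from the "database"
second = cache.get(7)  # hit: served from memory
now[0] = 61.0
third = cache.get(7)   # expired: reloads
```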

Just an addition to what RickNZ proposed above.
Since the master data you are currently caching won't change frequently, and probably only during a maintenance window, here is what you should do first on the database side:
Create a SNAPSHOT replication for the master tables that you want to cache. Adding new entities will be equally easy.
On all the web servers, install SQL Express and subscribe to this publication.
Since this data does not change frequently, you can rest assured that server resource usage won't be much of an issue, and you avoid network trips for master data.
All the caching that was available via the previous mechanism is still available, minus all the headaches that come when you add new entities.
Next, you can leverage .NET mechanisms as suggested above. You won't face a memcached cluster failure unless your web server itself goes down. There is a lot available in .NET that a .NET pro can point out after this stage.

It seems to me that Windows Server AppFabric (AKA "Velocity") is exactly what you are looking for. From the introductory documentation:
Windows Server AppFabric provides a distributed in-memory application cache platform for developing scalable, available, and high-performance applications. AppFabric fuses memory across multiple computers to give a single unified cache view to applications. Applications can store any serializable CLR object without worrying about where the object gets stored. Scalability can be achieved by simply adding more computers on demand. The cache also allows for copies of data to be stored across the cluster, thus protecting data against failures. It runs as a service accessed over the network. In addition, Windows Server AppFabric provides seamless integration with ASP.NET that enables ASP.NET session objects to be stored in the distributed cache without having to write to databases. This increases both the performance and scalability of ASP.NET applications.

Have you considered using SqlDependency caching?
You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
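A simulated sketch of notification-driven invalidation instead of polling. `on_change` is where a handler for SqlDependency's `OnChange` event would go; everything else (class name, loader) is hypothetical. Query notifications fire once per subscription, so a real implementation re-subscribes each time it reloads.

```python
class NotifiedCache:
    """Cache that is invalidated by a change callback instead of by polling."""

    def __init__(self, load):
        self._load = load
        self._value = None
        self._valid = False

    def on_change(self):
        # A SqlDependency OnChange handler would call this.
        self._valid = False

    def get(self):
        if not self._valid:
            self._value = self._load()  # reload only after a notification
            self._valid = True
        return self._value


source = {"value": "v1"}  # hypothetical table contents
cache = NotifiedCache(lambda: source["value"])
before = cache.get()      # loads "v1"
source["value"] = "v2"
stale = cache.get()       # still "v1": no notification yet
cache.on_change()
after = cache.get()       # reloads "v2"
```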
In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.
If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.
In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).

Pattern for very slow DB Server

I am building an ASP.NET MVC site where I have a fast dedicated server for the web app, but the database is stored in a very busy MS SQL Server used by many other applications.
Even though the web server is very fast, the application response time is slow, mainly because of the slow response from the DB server.
I cannot change the db server as all data entered in the web application needs to arrive there at the end (for backup reasons).
The database is used only from the webapp and I would like to find a cache mechanism where all the data is cached on the web server and the updates are sent to the db asynchronously.
It is not important for me to have immediate consistency between the data read and the data inserted: think of reading questions on StackOverflow, where newly inserted questions do not necessarily need to show up immediately after insertion.
I thought of building an intermediate WCF service that would exchange and sync the data between the slow DB server and a local one (maybe SQLite or SQL Express).
What would be the best pattern for this problem?

What is your bottleneck? Reading data or Writing data?
If you are concerned about reading data, a memory-based data caching mechanism like memcached would be a performance booster, as used by most of the mainstream and biggest web sites. "Scaling facebook hi5 with memcached" is a good read. Also, implementing application-side page caches would reduce the queries made by the application, leading to lower DB load and better response time. But this will not have much effect on the database server's load, as your database has other heavy users.
If writing data is the bottleneck, implementing some kind of asynchronous middleware storage service seems like a necessity. If you want fast-responding data storage on the front-end server, going with a lightweight database like MySQL or PostgreSQL (maybe not that lightweight ;) ) and using your real database as a slave replication server for your site is a good choice for you.
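A minimal write-behind sketch of that idea, with two in-memory SQLite databases standing in for the fast local store and the slow shared SQL Server (both assumptions); a batched copy replaces real replication, and the table and function names are hypothetical.

```python
import sqlite3

local = sqlite3.connect(":memory:")   # fast store on the web server
remote = sqlite3.connect(":memory:")  # stands in for the slow shared SQL Server
for db in (local, remote):
    db.execute("CREATE TABLE questions(id INTEGER PRIMARY KEY, title TEXT)")
local.execute("CREATE TABLE to_sync(id INTEGER PRIMARY KEY)")


def add_question(qid, title):
    """Write locally and mark the row for sync, in one local transaction."""
    with local:
        local.execute("INSERT INTO questions VALUES (?, ?)", (qid, title))
        local.execute("INSERT INTO to_sync VALUES (?)", (qid,))


def flush(batch_size=100):
    """Push one batch of pending rows to the slow server; return how many were sent."""
    rows = local.execute(
        "SELECT q.id, q.title FROM to_sync s JOIN questions q ON q.id = s.id "
        "ORDER BY s.id LIMIT ?",
        (batch_size,),
    ).fetchall()
    if not rows:
        return 0
    with remote:
        # INSERT OR REPLACE keeps a retried batch idempotent.
        remote.executemany("INSERT OR REPLACE INTO questions VALUES (?, ?)", rows)
    with local:
        local.executemany("DELETE FROM to_sync WHERE id = ?", [(r[0],) for r in rows])
    return len(rows)


for qid, title in [(1, "q1"), (2, "q2"), (3, "q3")]:
    add_question(qid, title)
flushed = [flush(batch_size=2), flush(batch_size=2), flush(batch_size=2)]
```

Reads are served from the local store immediately, while the slow server receives the same rows later in batches.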

I would do what you are already considering. Use another database for the application and only use the current one for backup-purposes.

I had this problem once, and we decided to go for a combination of data warehousing (i.e. pulling data from the database every once in a while and storing this in a separate read-only database) and message queuing via a Windows service (for the updates.)
This worked surprisingly well, because MSMQ ensured reliable message delivery (updates weren't lost) and the data warehousing made sure that data was available in a local database.
It still will depend on a few factors though. If you have tons of data to transfer to your web application it might take some time to rebuild the warehouse and you might need to consider data replication or transaction log shipping. Also, changes are not visible until the warehouse is rebuilt and the messages are processed.
On the other hand, this solution is scalable and can be relatively easy to implement. (You can use integration services to pull the data to the warehouse for example and use a BL layer for processing changes.)

There are many replication techniques that should give you proper results. By installing a SQL Server instance on the 'web' side of your configuration, you'll have the choice between:
Making snapshot replications from the web side (publisher) to the database-server side (subscriber). You'll need a paid version of SQL Server on the web server. I have never worked with this kind of configuration, but it might use a lot of the web server's resources at scheduled synchronization times.
Making merge (or, if required, transactional) replication between the database-server side (publisher) and the web side (subscriber). You can then use the free version of MS SQL Server and schedule the synchronization process to run according to your tolerance for potential data loss if the web server goes down.

I wonder if you could improve it by adding an MDF file on your web side instead of dealing with the server at another IP...
Just add a SQL Server 2008 Express Edition file and try it. As long as you don't exceed 4 GB of data you will be OK; of course there are more restrictions, but just for the speed of it, why not try?

You should also consider the network switches involved. If the DB server is talking to a number of web servers, it may be constrained by the network connection speed. If they are only connected via a 100 Mbit/s network switch, you may want to look at upgrading that too.

The WCF service would be a very poor engineering solution to this problem: why build your own when you can use the standard SQL Server connectivity mechanisms to ensure data is transferred correctly? Log shipping will send the data across at selected intervals.
This way, you get the fast local sql server, and the data is preserved correctly in the slow backup server.
You should investigate the slow sql server though, the performance problem could be nothing to do with its load, and more to do with the queries and indexes you're asking it to work with.
