Simple question: Is there any good reason to add MSMQ to an existing messaging framework which already has multiple BizTalk and SQL Server nodes?
Here's the background: we have a messaging framework to process bills. The load is rather low right now (at most 10,000 a day), but it's ramping up. We use BizTalk and SQL Server for all the processing, and we started noticing a few timeouts when inserting (synchronously) into one of the databases (NOT the BizTalk message box). One of our senior programmers suggested we use MSMQ to save (asynchronously) the data that causes the timeouts and process it later; the solution he designed works and is about to be deployed, but I'm still wondering whether that was the right decision, considering that we could have used BizTalk itself or SQL Server Service Broker (SSSB). There are a lot of discussions about those three technologies, but they're usually about having to choose one of them over the others; I haven't seen any case of someone who already had BizTalk and SSSB and decided to add MSMQ to the mix. In our case I think it's an unnecessary addition to our technology stack, but that may be my own bias (and ignorance too), since I know SSSB better and have never done anything big with MSMQ. What do you think?
It sounds like you should figure out why your inserts are taking so long, and fix that instead. 10,000 / day is nothing for a decent box running SQL Server.
EDIT:
Adding any sort of asynchronous processing is a form of kicking the can down the road. Assume your inserts take one minute (I realize they probably don't, but for argument's sake). If you make your inserts asynchronous, you can still only handle 1,440 inserts per day (24 hours × 60 minutes) before you start falling behind. You are always going to need to speed up your inserts eventually.
Now, with that said, I don't think there is any compelling benefit to using MSMQ over SSSB (or vice versa) in this case. It could be argued that with MSMQ you need to hand-code a listener daemon that does your inserts, whereas with SSSB you get that automatically within the database. On the other hand, with MSMQ you are offloading the storage of the messages to another server, potentially taking some of the immediate stress off your SQL Server.
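Purely for illustration, a hand-coded MSMQ listener of the kind mentioned above boils down to something like the sketch below: a transactional receive and the SQL insert enlisted in one ambient TransactionScope (which is what drags in the DTC). The queue path, table, and connection string are invented.

```csharp
using System;
using System.Data.SqlClient;
using System.Messaging;
using System.Transactions;

// Sketch only: a transactional MSMQ listener that moves one message into SQL Server.
// The queue path, table and connection string are placeholders.
class MsmqListener
{
    const string QueuePath = @".\Private$\billing";
    const string ConnString = "Data Source=.;Initial Catalog=Billing;Integrated Security=SSPI";

    static void ProcessOneMessage()
    {
        using (var queue = new MessageQueue(QueuePath))
        using (var scope = new TransactionScope())   // spans MSMQ + SQL, so it promotes to DTC
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });

            // Throws MessageQueueException if nothing arrives within the timeout.
            Message msg = queue.Receive(TimeSpan.FromSeconds(5), MessageQueueTransactionType.Automatic);
            var payload = (string)msg.Body;

            using (var conn = new SqlConnection(ConnString))
            using (var cmd = new SqlCommand("INSERT INTO dbo.BillStaging (Payload) VALUES (@p)", conn))
            {
                cmd.Parameters.AddWithValue("@p", payload);
                conn.Open();            // enlists in the same distributed transaction
                cmd.ExecuteNonQuery();
            }

            scope.Complete();           // the message leaves the queue only if the insert commits
        }
    }
}
```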
I would argue that if you just wanted to take the database calls off-line then you could do that with BizTalk (for example, by creating an "offline" host - thereby creating a new host queue).
Where MSMQ really excels is on the inbound side of BizTalk. Systems can send to BizTalk without caring about the availability of BizTalk itself; the messages will just sit in the queue until BizTalk is available again.
I'm with Hugh - we've used MSMQ (and IBM MQ Series) successfully with BizTalk for asynchronous, transactional traffic (mostly financial transactions, where the need for traceable, reliable, ACID type message delivery outweighs any need for transaction latency).
We've found the benefits of MSMQ to be:
Transactional delivery - messages can be pulled off by the destination system and inserted into SQL under a two-phase unit of work (UOW).
Hugh's point about delivery decoupled from system availability (and you still have the Dead Letter Queue if the target system is down for an unreasonable period of time)
Load balancing / throttling - a destination system can protect against overzealous message delivery by pulling messages off the queue at a more even pace.
Auditing - using the journalling on MSMQ allows an additional layer of tracing.
Also note that there is a WCF adapter for MSMQ - no requirement for custom listeners.
We generally stay away from calling SQL directly from BizTalk:
For reading, this equates to polling the database in the hope that there are messages ready to be sent. This can create issues around the frequency of polling (redundancy, induced latency, and load on SQL) and contention, e.g. polling while an app is adding data to the tables. We would rather have each app decide when to submit messages to BizTalk / ESB.
For write operations, unless data is offloaded into a staging area for processing by destination apps, it can lead to much of the 'business' processing (validation, applying business rules, etc.) moving into BizTalk - IMHO this is too fine-grained for BizTalk. And as you've found, it can be hard to control the rate of message delivery into SQL (unless you start using singleton Orchestrations, etc.), which again causes locking / contention issues.
Related
My first ever question on Stack Overflow, so please go easy. I have a long-running Windows application that continually processes SQL Server commands. I also have a web front end that users occasionally use to update the same db. I've noticed that sometimes (depending on what the Windows application is processing at the time), if a user submits something to the db, I receive out-of-memory exceptions on the server. I realise I need to dig around a bit more and optimise the code. However, I cannot afford for the server to go down, and I expect that in the future I'll be allowing more and more users on the front end. What I really need is a system that will queue the users' requests (they are not time critical) and process them when the db is ready.
I'm using SQL 2012 express.
Is SQL Service Broker the best solution? I've also looked into MSMQ.
If so, can someone point me in the right direction? It would be appreciated. In my search I'm just finding a lot of things it does that I don't think I need.
Cheers
It depends on where you're doing the persistence work, and / or calculations. If you're doing the hard work in your Windows Application, then using a Service Broker queue won't be worthwhile, as all you will be doing is receiving a message from the Service Broker queue in your Windows Application, doing your calculations and / or queries from the Windows Application, and then persisting the results to the database: as your database is already under memory pressure, this seems like an unnecessary extra load as you could just as easily queue and retrieve the message from MSMQ (or any other queueing technology).
If, however, you are doing all the work in the database and your Windows Application just acts as a marshalling service - e.g. taking the request and palming it off to a stored procedure for actioning - then Service Broker queues may be worth using: because they already operate within the context of the database, they can be very efficient at persisting and querying data.
You would also want to take failure modes into account, depending on whether or not you can afford to lose any messages. To ensure message persistence in MSMQ you have to use transactional messaging. Service Broker is more efficient at transactional queue processing than MSMQ (because it has transaction support built in, unlike MSMQ, which has to use the DTC and so adds overhead) - but if your volume of messages is low, this may not be an issue.
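To make that comparison concrete, here is a rough sketch of the Service Broker variant driven from the Windows application: the dequeue and the database work sit in one local SqlTransaction, no DTC involved. The queue name, stored procedure, and connection string are invented, and the queue/service/contract setup is assumed to already exist.

```csharp
using System;
using System.Data.SqlClient;

// Sketch only: receive one Service Broker message and process it in the same local transaction.
// dbo.RequestQueue, dbo.ProcessRequest and the connection string are placeholder names.
class BrokerConsumer
{
    const string ConnString = "Data Source=.;Initial Catalog=AppDb;Integrated Security=SSPI";

    public static void ProcessOne()
    {
        using (var conn = new SqlConnection(ConnString))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                var receive = new SqlCommand(@"
                    WAITFOR (
                        RECEIVE TOP (1) CAST(message_body AS NVARCHAR(MAX)) AS body
                        FROM dbo.RequestQueue
                    ), TIMEOUT 5000;", conn, tx);

                object body = receive.ExecuteScalar();       // null if the wait timed out
                if (body != null && body != DBNull.Value)
                {
                    // A real consumer would also handle END DIALOG / error message types here.
                    var work = new SqlCommand("EXEC dbo.ProcessRequest @payload;", conn, tx);
                    work.Parameters.AddWithValue("@payload", (string)body);
                    work.ExecuteNonQuery();
                }

                tx.Commit();   // dequeue and processing commit (or roll back) together
            }
        }
    }
}
```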
I am looking for a best practice or example of how I might be able to generate events for all update events on a given SQL Server 2008 R2 db. To be more descriptive, I am working on a POC where I would essentially publish update events to a queue (RabbitMQ in my case) that could then be consumed by various consumers. This would be the first part of implementing a CQRS query-only data model via event sourcing. By placing the events on the queue, anybody could then subscribe to them for replication into any number of query-only data models. This part is clear and fairly well defined. The problem I am having is determining the best approach for generating the events out of SQL Server. I have been given a few ideas, such as monitoring the transaction log and SSIS. However, I'm not entirely sure if these options are advisable or even feasible.
Does anybody have any experience with this sort of thing, or have any notions on how to go about such an adventure? Any help or guidance would be greatly appreciated.
You cannot monitor the log because, even if you were able to understand it, you have the problem of the log being recycled before you had a chance to read it. Unless the log is somehow marked not to be truncated, it will be reused. For instance, when transactional replication is enabled, the log is pinned until it is read by the replication agent and only then truncated.
SSIS is a very broad concept, and saying 'use SSIS to detect changes' is akin to saying 'I'll use a programming language to solve my problem'. The detail is in how you would use SSIS. There is no way, with or without SSIS, to reliably detect data changes on an arbitrary schema. Even data models specifically designed to allow for detecting changes have issues, especially at detecting deletes.
However, there are viable alternatives. You can deploy Change Data Capture and delegate the tracking of changes to the engine itself. Consuming these detected changes and publishing them to consumers (via RabbitMQ if that's your fancy) is something SSIS would be good at. But you have to understand that SSIS does not fare well at continuous, real-time tasks. It is designed to run periodically on batches, so your change notification consumers will be notified in spikes, with long delays (minutes), when the SSIS jobs run.
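If you go the CDC route, the consuming side (whether SSIS or a small service) essentially polls the change functions between two LSNs. A rough sketch follows, with an invented dbo_Orders capture instance and connection string; enabling CDC itself is a one-time sys.sp_cdc_enable_db / sys.sp_cdc_enable_table step, omitted here.

```csharp
using System;
using System.Data.SqlClient;

// Sketch only: one CDC polling pass. The dbo_Orders capture instance, columns and
// connection string are invented; publishing (e.g. to RabbitMQ) is left as a callback.
class CdcPoller
{
    const string ConnString = "Data Source=.;Initial Catalog=App;Integrated Security=SSPI";

    public static void PollOnce(byte[] lastLsn, Action<string> publish)
    {
        // Assumes @from <= @to; production code should validate the LSN range and
        // seed lastLsn from sys.fn_cdc_get_min_lsn('dbo_Orders') on first run.
        const string query = @"
            DECLARE @from BINARY(10) = sys.fn_cdc_increment_lsn(@lastLsn);
            DECLARE @to   BINARY(10) = sys.fn_cdc_get_max_lsn();
            SELECT __$operation, OrderId, Amount
            FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from, @to, N'all');";

        using (var conn = new SqlConnection(ConnString))
        using (var cmd = new SqlCommand(query, conn))
        {
            cmd.Parameters.AddWithValue("@lastLsn", lastLsn);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // __$operation with the 'all' filter: 1 = delete, 2 = insert, 4 = update (new values)
                    publish("op=" + reader.GetInt32(0) + ";OrderId=" + reader[1] + ";Amount=" + reader[2]);
                }
            }
            // Persist the upper LSN somewhere durable as the new lastLsn (omitted here).
        }
    }
}
```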
For a real-time approach a better solution is Service Broker. One possibility is to SEND Service Broker messages from triggers, but I would not recommend it. A better design is to have the application itself publish the changes by SEND-ing the message explicitly when it makes the data modification. With SQL Server 2012 it is possible to multicast Service Broker messages to other SQL Server consumers (including SQL Server Express). SSB message delivery is fully transactional (no message gets sent if the transaction rolls back) and does not require two-phase commit with a message store resource manager. But to broadcast via RabbitMQ you would need to bridge the communication, i.e. RECEIVE the SSB messages and transform them into RabbitMQ notifications.
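A sketch of the "application SENDs the message explicitly" pattern from .NET follows; the update and the SEND share one transaction, so rolling back the change also discards the notification. Every Service Broker object name below is made up, and the services and contract are assumed to exist already.

```csharp
using System.Data.SqlClient;

// Sketch only: perform the data change and publish the Service Broker message atomically.
// Table, service, contract and message type names are placeholders.
class ChangePublisher
{
    public static void UpdateAndNotify(string connString, int orderId, decimal amount)
    {
        const string batch = @"
            BEGIN TRANSACTION;

            UPDATE dbo.Orders SET Amount = @amount WHERE OrderId = @orderId;

            DECLARE @h UNIQUEIDENTIFIER;
            BEGIN DIALOG CONVERSATION @h
                FROM SERVICE [//App/ChangePublisher]
                TO SERVICE   '//App/ChangeConsumer'
                ON CONTRACT  [//App/ChangeContract]
                WITH ENCRYPTION = OFF;

            -- The receiving end is expected to END CONVERSATION; omitted here.
            SEND ON CONVERSATION @h
                MESSAGE TYPE [//App/OrderChanged]
                (CAST((SELECT @orderId AS OrderId, @amount AS Amount FOR XML PATH('Change')) AS VARBINARY(MAX)));

            COMMIT;";

        using (var conn = new SqlConnection(connString))
        using (var cmd = new SqlCommand(batch, conn))
        {
            cmd.Parameters.AddWithValue("@orderId", orderId);
            cmd.Parameters.AddWithValue("@amount", amount);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```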
I have one central database and 25 client databases, and all have the same schema.
I want any changes made to certain tables in the central database to flow down to the client databases.
The databases are SQL Express, so I cannot use replication.
The solution I have today is to keep track of the changes in the central database; a program then writes these changes to a text file and sends it down to the client databases. Another program reads these text files and updates the client databases.
There are three problems with this:
1. The files get lost or arrive in jumbled order, which messes up the client data.
2. The process is slow.
3. The programs are sometimes shut down, so the whole sync flow stops.
Is there a reliable alternative that is fast and secure?
I wonder how banking software is made... it never loses transactions and it is fast.
Add an UpdateDate column to all the entities that need to be replicated. At each client add a linked server to the central repository. Now, every 5 minutes or so, poll your central repository for changes using the last UpdateDate of a client entity and grab the delta.
Then use MERGE, or INSERT plus UPDATE, to merge the data on the client. That's a very reliable way of doing homebrew replication. To keep track of deleted elements you would either want to mark them as deleted or have another table that keeps track of the entity kind and its reference, again combined with UpdateDate for replication.
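A minimal sketch of that polling pull, as each client might run it every few minutes; the linked server name (CENTRAL), database, table, and columns are all placeholders.

```csharp
using System.Data.SqlClient;

// Sketch only: pull the delta from the central repository over a linked server and
// MERGE it into the local client database. All object names are invented.
class PullDelta
{
    public static void SyncCustomers(string clientConnString)
    {
        const string batch = @"
            DECLARE @lastSync DATETIME2 =
                (SELECT ISNULL(MAX(UpdateDate), '1900-01-01') FROM dbo.Customer);

            MERGE dbo.Customer AS target
            USING (SELECT CustomerId, Name, UpdateDate
                   FROM CENTRAL.CentralDb.dbo.Customer
                   WHERE UpdateDate > @lastSync) AS src
                ON target.CustomerId = src.CustomerId
            WHEN MATCHED THEN
                UPDATE SET target.Name = src.Name,
                           target.UpdateDate = src.UpdateDate
            WHEN NOT MATCHED BY TARGET THEN
                INSERT (CustomerId, Name, UpdateDate)
                VALUES (src.CustomerId, src.Name, src.UpdateDate);";

        using (var conn = new SqlConnection(clientConnString))
        using (var cmd = new SqlCommand(batch, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();   // run this on a timer, e.g. every 5 minutes
        }
    }
}
```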
Update
Then you mention transactions and banking software. When you do your replication via files, we ain't talkin' about no transactional replication here, not by a long shot.
If you need transactional consistency you need to subscribe to the transaction flow of the data warehouse.
I don't want to be unhelpful and you haven't given any background about your business needs, but you have to decide if your priority is really "fast and secure" or if it's actually "cheap". Replicating changes between multiple databases in a reliable, consistent way is not easy (as you know) and it's highly unlikely that you will be able to develop a solution yourself that has the features, stability and performance of SQL Server replication.
SQL Express can be a replication subscriber, by the way, so it's not clear why it doesn't meet your needs. But if it doesn't, you should estimate the cost to your business (or customer) of dealing with issues caused by an unreliable solution: your time, business downtime, finding and correcting incorrect data, customer complaints, lost business etc. Then compare that to the cost of 25 SQL Server licenses (you should certainly be able to get a good discount when you order that volume), additional hardware (if any) and the costs of training, consulting and/or learning how to use replication. Then extrapolate those costs over 5 years or so. You may find that it's cheaper just to buy the solution you need. And of course buying the full SQL Server edition means you get a lot of other new features that might be useful to you.
If you (or your boss) is really determined to get something for nothing, you might want to investigate PostgreSQL or MySQL. They both have free replication solutions that seem to be widely enough used to be reliable for many companies. Of course, you then need to calculate the costs of switching to a new database platform.
If you have one central database and 25 clients, you can easily do it with one (yes, only one) SQL Server licence for the main database. Subscribers to this database can run SQL Express. As long as users only access the client databases, you are not even obliged to buy SQL CALs.
Back to banking software: be sure that they are paying good money for their server licenses! So don't be surprised that those systems are reliable and fast...
I'm learning about MSMQ and am successfully using it to queue email and text messages from a consumer-facing ASP.NET MVC website, to be handled by a separate client application.
In the event of a missing SQL Server database, perhaps while swapping drives or a broken database deploy, would it make sense to queue non time-critical inserts in a local MSMQ queue to improve up-time?
Theoretically, I can then pause/resume queue processing (persistence) while making database changes. Has anyone tried this or is there a better way?
If you're looking at higher availability by queueing locally, then you should consider Service Broker deployed on SQL Express instances collocated with your IIS/ASP instances. The advantages of using SSB over MSMQ are:
Consistency between your message store and your data store (one consistent backup/restore, one consistent failover unit).
It scales much better than MSMQ under load.
It does not require two-phase-commit DTC to coordinate the dequeue with the DB insert (one local DB transaction can dequeue and insert).
It offers queryability of the pending messages (SELECT ... FROM queue).
It is integrated with the DB HA/DR solution (cluster failover / mirroring).
You get DB-contained activation, and it all works from the familiar T-SQL programming environment.
MSMQ's main advantage is its client-side C#/.NET API.
I was on a team that implemented this for purposes of guaranteed delivery. We used MSMQ to forward the insert requests to the database server, which had its own service running that dequeued the requests and ran the inserts, then acknowledged the message (to ensure delivery). It's been running for over a year now, and we've never been asked to come figure out why it isn't working...seems pretty solid to me.
This is very subjective because it depends on what your application does and how. Generally, something like MSMQ is not used for this purpose, rather you want to set up some kind of high-availability clustering on your database of choice. The occurrence of a database going completely down is rare in most cases, and generally a bigger problem for most LOB applications than just having somewhere to store data entered while the DB is down for whatever reason.
There's also overhead to think about. An INSERT operation to a database is relatively quick (in the larger scheme of things); writing a serialized something into a queue and having something pick it up and do that insert operation is going to add a large amount of lag to your application, not to mention that you'll now have to account for everything being asynchronous.
That said, MSMQ can be used to ensure delivery of stuff from one end of an application to another, so I suppose there are instances where this scenario might be desirable. Most of the time though you're just better off trusting your DB and using MSMQ to enable asynchronous processing and performing interprocess and intermachine communication.
I run a very high traffic(10m impressions a day)/high revenue generating web site built with .net. The core meta data is stored on a SQL server. My team and I have a unique caching strategy that involves querying the database for new meta data at regular intervals from a middle tier server, serializing the data to files and sending those to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory to use for real time requests.
The advantage of this model is that it:
Allows the web nodes to cache all data in memory and not incur any IO overhead querying a database.
If the database ever goes down either unexpectedly or for maintenance windows, the web servers will continue to run and generate revenue. You can even fire up a web server without having to retrieve its initial data from the DB because all the data it needs are in files on its own disks.
Allows us to be completely horizontally scalable. If throughput suffers, we can just add a web server.
The disadvantage is that this caching and persistence layer adds complexity to the code that queries the database, packages the data, and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years, and there are probably better ways to tackle this.
One strategy I have been considering is using replication to replicate our master sql server database to local database instances installed on each web server. The web server application would use normal sql/ORM techniques to instantiate objects. Here, we can still sustain a master database outage and we would not have to code up specialized caching code and could instead use nHibernate to handle the persistence.
This seems like a more elegant solution, and I would like to see what others think, or whether anyone else has alternatives to suggest.
I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.
First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.
Second, implement database mirroring to protect from a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in realtime or a few minutes behind. If you do it in realtime, you can fail over to the mirror automatically inside your app - SQL Server 2005 & above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whatever server's live.
Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.
My favorite starting point for scaling is using three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.
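As a rough sketch of what that looks like in the app (every server name below, including the mirroring Failover Partner, is invented):

```csharp
using System.Configuration;
using System.Data.SqlClient;

// Sketch only: three named connection strings, one per tier described above.
// Example config values (all server names are placeholders):
//   Realtime       "Data Source=SQLMASTER;Failover Partner=SQLMIRROR;Initial Catalog=App;Integrated Security=SSPI"
//   NearRealtime   "Data Source=SQLREAD-VIP;Initial Catalog=App;Integrated Security=SSPI"
//   DelayedReport  "Data Source=SQLREPORT;Initial Catalog=App;Integrated Security=SSPI"
static class Connections
{
    public static SqlConnection Open(string purpose)   // "Realtime", "NearRealtime" or "DelayedReport"
    {
        string cs = ConfigurationManager.ConnectionStrings[purpose].ConnectionString;
        var conn = new SqlConnection(cs);
        conn.Open();
        return conn;
    }
}
```

Writes and mission-critical reads call Open("Realtime"); everything else picks one of the other two strings.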
Have you considered memcached? Since it is:
in memory
can run locally
fully scalable horizontally
prevents the need to re-cache on each web server
It may fit the bill. Check out Google for lots of details and usage stories.
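For example, with a .NET client such as Enyim.Caching (my assumption - any memcached client follows the same get-or-load pattern), the cache-aside code is short. The key name and the ProductList type below are made up.

```csharp
using Enyim.Caching;
using Enyim.Caching.Memcached;

// Sketch only, assuming the Enyim client is configured in app.config with the memcached
// server addresses. "meta:products" and LoadFromDb are invented names.
class ProductCache
{
    static readonly MemcachedClient Cache = new MemcachedClient();

    public static ProductList GetProducts()
    {
        var cached = Cache.Get<ProductList>("meta:products");
        if (cached != null)
            return cached;

        ProductList fresh = LoadFromDb();                       // hit SQL only on a cache miss
        Cache.Store(StoreMode.Set, "meta:products", fresh);     // every web node now shares it
        return fresh;
    }

    static ProductList LoadFromDb() { /* ... query SQL Server ... */ return new ProductList(); }
}

[System.Serializable]   // cached objects must be serializable for the default transcoder
class ProductList { }
```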
Just some additions to what RickNZ proposed above.
Since the master data you are caching currently won't change frequently (and probably only during some maintenance window), here is what you should do first on the database side:
Set up SNAPSHOT replication for the master tables which you want to cache. Adding new entities will be equally easy.
On all the web servers, install SQL Express and subscribe to this publication.
Since this is not frequently changing data, you can rest assured there won't be much server resource usage, apart from the network trips for the master data.
All the caching that was available via the previous mechanism is still available, minus the headache that comes when you add new entities.
Next, you can leverage the .NET mechanisms suggested above. You won't face a memcached cluster failure unless your web server itself goes down. There is a lot available in .NET that a .NET pro can point out after this stage.
It seems to me that Windows Server AppFabric (AKA "Velocity") is exactly what you are looking for. From the introductory documentation:
Windows Server AppFabric provides a distributed in-memory application cache platform for developing scalable, available, and high-performance applications. AppFabric fuses memory across multiple computers to give a single unified cache view to applications. Applications can store any serializable CLR object without worrying about where the object gets stored. Scalability can be achieved by simply adding more computers on demand. The cache also allows for copies of data to be stored across the cluster, thus protecting data against failures. It runs as a service accessed over the network. In addition, Windows Server AppFabric provides seamless integration with ASP.NET that enables ASP.NET session objects to be stored in the distributed cache without having to write to databases. This increases both the performance and scalability of ASP.NET applications.
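For what it's worth, the client-side API is only a few lines; the cache cluster hosts come from configuration, and "default" below is just an assumed cache name.

```csharp
using Microsoft.ApplicationServer.Caching;

// Sketch only: basic AppFabric ("Velocity") cache usage. Hosts are assumed to be
// configured elsewhere; "default" and the key are placeholder names.
class AppFabricSketch
{
    static readonly DataCacheFactory Factory = new DataCacheFactory();

    public static void Demo()
    {
        DataCache cache = Factory.GetCache("default");

        cache.Put("meta:bill:42", "any serializable CLR object");   // store
        object value = cache.Get("meta:bill:42");                   // retrieve from any node
    }
}
```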
Have you considered using SqlDependency caching?
You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
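A minimal SqlDependency sketch of that idea follows (table, columns, and connection string are placeholders; the database needs Service Broker enabled, which query notifications use under the covers).

```csharp
using System;
using System.Data.SqlClient;

// Sketch only: re-query and re-subscribe whenever the watched result set changes.
// dbo.Product and the connection string are invented names.
class MetaDataCache
{
    const string ConnString = "Data Source=.;Initial Catalog=Meta;Integrated Security=SSPI";

    public static void Start()
    {
        SqlDependency.Start(ConnString);   // start the notification listener once per app domain
        LoadAndSubscribe();
    }

    static void LoadAndSubscribe()
    {
        using (var conn = new SqlConnection(ConnString))
        using (var cmd = new SqlCommand("SELECT ProductId, Name FROM dbo.Product", conn))
        {
            var dependency = new SqlDependency(cmd);   // must be attached before the command runs
            dependency.OnChange += (s, e) =>
            {
                // Fires once when the result set changes; reload the cache and re-subscribe.
                LoadAndSubscribe();
            };

            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // ... refresh the in-memory cache here ...
                }
            }
        }
    }
}
```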
In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.
If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.
In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).