Same ZMQ_IDENTITY for multiple Subscribers - uuid

I am building a zeromq PUB-SUB pattern with lots of subscribers to one publisher.
I want to build durable Subscribers, so I found out about the ZMQ_IDENTITY socket option.
What should I take into account when choosing the value for the Identity?
And can I take the same for all subscribers?
Since they are all the same type but on different machines, that should be a problem, right?
Furthermore, are UUIDs only generated via an inproc service, as explained in the zguide example?

Don't use the IDENTITY option. Durable subscribers were removed from ZeroMQ/3.x so your application would be stuck on the 2.x version.
Durable subscribers were bad enough that we removed them because they consume arbitrary amounts of memory on the publisher (the messages have to stay somewhere). That leads too easily to out-of-memory, which is the fastest way to kill your pubsub system for all subscribers.
If you want a durable subscriber model, you can construct it quite easily on top of ZeroMQ, and there's a full worked example in the Guide (Clone pattern).
I'm not sure if anyone's built a durable pubsub broker over ZeroMQ; it would be a fun exercise and not so difficult.
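To make the memory concern concrete, here is a minimal in-process sketch of one way to bound what the publisher retains: number every message, keep only the last few, and let a returning subscriber ask for everything after the last sequence number it saw. This is not ZeroMQ code and not the Guide's Clone pattern implementation; ReplayPublisher, replay_after and the capacity value are hypothetical names chosen for illustration.

```python
from collections import deque


class ReplayPublisher:
    """Numbers messages and retains only the most recent `capacity` of them."""

    def __init__(self, capacity):
        self.next_seq = 1
        # A bounded deque drops the oldest message once it is full,
        # so publisher memory stays fixed no matter how slow a subscriber is.
        self.history = deque(maxlen=capacity)

    def publish(self, body):
        msg = (self.next_seq, body)
        self.history.append(msg)
        self.next_seq += 1
        return msg

    def replay_after(self, last_seen):
        """Return (messages newer than last_seen, True if some were already dropped)."""
        oldest = self.history[0][0] if self.history else self.next_seq
        gap = last_seen + 1 < oldest
        return [m for m in self.history if m[0] > last_seen], gap


pub = ReplayPublisher(capacity=3)
for body in ["a", "b", "c", "d", "e"]:
    pub.publish(body)

# A subscriber that last saw message 3 reconnects and catches up.
missed, gap = pub.replay_after(last_seen=3)
```

When the gap flag is true, the subscriber has been away longer than the buffer covers and would need a full state snapshot rather than a replay.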

Related

Message queue with conditional processing and consensus between workers

I'm implementing a workflow engine, where a job request is received first and executed later by a pool of workers. Sounds like a typical message queue use case.
However, there are some restrictions for parallel processing. For example, it's not allowed to run concurrent jobs for the same customer. In other words, there must be some sort of consensus between workers.
I'm currently using a database table with business identifiers, status flags, row locking and conditional queries to store and poll available jobs according to spec. It works, but using a database for asynchronous processing feels counterintuitive. Do messaging systems support my requirement of conditional processing?
As an author of a few workflow engines, I believe that the persistence component for maintaining state is essential. I cannot imagine a workflow engine that only uses queues.
Unless you are doing it just for fun, implementing your own is a weird idea. A fully featured workflow engine is an extremely complex piece of software, comparable to a database. I would recommend looking into existing ones instead of building your own if it is for production use. You can start from my open source project temporal.io :). It is used by thousands of companies for mission-critical applications and can scale to almost any rate given enough DB capacity.
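For reference, the table-based approach described in the question (status flags plus a conditional query) can be sketched roughly as follows. It uses SQLite from the Python standard library purely as a stand-in; the jobs table, its columns and claim_next_job are hypothetical, and a production engine would also need leases, retries and crash recovery.

```python
import sqlite3

# isolation_level=None means autocommit; transactions are managed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE jobs ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, "
    "status TEXT NOT NULL DEFAULT 'pending')"
)
conn.executemany(
    "INSERT INTO jobs (customer) VALUES (?)",
    [("acme",), ("acme",), ("globex",)],
)


def claim_next_job(conn):
    """Mark the oldest eligible job as running and return its id, or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            """
            SELECT j.id FROM jobs AS j
            WHERE j.status = 'pending'
              AND NOT EXISTS (
                  SELECT 1 FROM jobs AS r
                  WHERE r.customer = j.customer AND r.status = 'running'
              )
            ORDER BY j.id LIMIT 1
            """
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[0] if row else None


first = claim_next_job(conn)   # job 1 (acme)
second = claim_next_job(conn)  # job 3 (globex); job 2 waits because acme is busy
third = claim_next_job(conn)   # None; the only pending job belongs to acme
```

The per-customer rule lives entirely in the NOT EXISTS clause, which is the kind of state-dependent selection a plain queue does not express on its own.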

SalesForce Notifications - Reliable Integration

I need to develop a system that listens for changes to SalesForce objects and transfers them to my end.
Initially I considered the SalesForce Streaming API, which allows exactly that: create a push topic that subscribes to object notifications, and later have a set of clients that read them using long polling.
However, such an approach doesn't guarantee durability and reliable delivery of notifications, which I need.
What architecture would allow me to implement the same functionality in a reliable way?
One approach I have in mind is to create a Force.com application that uses SalesForce triggers to subscribe to notifications and later just sends them using HTTPS to the cloud or my Data Server. Would this be a valid option, or are there any better ones?
There are two very good questions on salesforce.stackexchange.com covering this very topic in detail:
https://salesforce.stackexchange.com/questions/16587/integrating-a-real-time-notification-application-with-salesforce
https://salesforce.stackexchange.com/questions/20600/best-approach-for-a-package-to-respond-to-dml-events-dynamically-without-object

Is RabbitMQ, ZeroMQ, Service Broker or something similar an appropriate solution for creating a high availability database webservice?

I have a CRUD webservice, and have been tasked with trying to figure out a way to ensure that we don't lose data when the database goes down. Everyone is aware that if the database goes down we won't be able to get "reads" but for a specific subset of the operations we want to make sure that we don't lose data.
I've been given the impression that this is something that is covered by services like 0MQ, RabbitMQ, or one of the Microsoft MQ services. Although after a few days of reading and research, I'm not even certain that the messages we're talking about in MQ services include database operations. I am however 100% certain that I can queue up as many hello worlds as I could ever hope for.
If I can use a message queue for adding a layer of protection to the database, I'd lean towards Rabbit (because it appears to persist through crashes) but since the target is a Microsoft SQL Server database, perhaps one of their solutions (such as SQL Service Broker, or MSMQ) is more appropriate.
The real fundamental question that I'm not yet sure of though is whether I'm even playing with the right deck of cards (so to speak).
With the desire for a high-availability webservice that continues to function if the database goes down, does it make sense to put a RabbitMQ instance "between" the webservice and the database? Maybe the right link in the chain is to have RabbitMQ send messages to the webserver?
Or is there some other solution for achieving this? There are a number of loose ideas at the moment around finding a way to roll up weblogs in the event of database outage or something... but we're still in early enough stages that (at least I) have no idea what I'm going to do.
Is message queue the right solution?
Introducing message queuing in between a service and its database operations is certainly one way of improving service availability. Writing to a local temporary queue in a store-and-forward scenario will always be more available than writing to a remote database server, simply by being a local operation.
Additionally by using queuing you gain greater control over the volume and nature of database traffic your database has to handle at peak. Database writes can be queued, routed, and even committed in a different order.
However, in order to do this you need to be aware that when a database write is performed it is processed off-line. Even under conditions where this happens almost instantaneously, you are losing a benefit that the synchronous nature of your current service gives you, which is that your service consumers can always know if the database write operation is successful or not.
I have written about this subject before here. The user posting the question had similar concerns to you. Whether you do this or not is a decision you have to make based on whether this is something your consumers care about or not.
As for the technology stacks you are thinking of, this off-line model is implementable with pretty much any of them, with the possible exception of Service Broker, which doesn't integrate well with code (see my answer here: https://stackoverflow.com/a/45690344/569662).
If you're using Windows and unlikely to need to migrate, I would go for MSMQ (which supports durable messaging via transactional queues) as it's lightweight and part of Windows.
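The store-and-forward idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not MSMQ or RabbitMQ code: it uses a local SQLite table as the durable queue, and Outbox, forward and the write_to_database callback are hypothetical names. Delivery here is at-least-once, so the real database writes would need to be idempotent.

```python
import json
import sqlite3


class Outbox:
    """Local durable queue: writes land here first and are forwarded later."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox "
            "(id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, operation):
        # Commit locally before acknowledging the caller.
        with self.db:
            self.db.execute(
                "INSERT INTO outbox (payload) VALUES (?)", (json.dumps(operation),)
            )

    def forward(self, write_to_database):
        """Replay queued operations in order; stop at the first failure."""
        sent = 0
        rows = self.db.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
        for row_id, payload in rows:
            try:
                write_to_database(json.loads(payload))
            except ConnectionError:
                break  # database still down; keep the rest for the next attempt
            with self.db:
                self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
        return sent

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]


def database_down(operation):
    raise ConnectionError("database unavailable")


outbox = Outbox()
outbox.enqueue({"op": "insert", "id": 1})
outbox.enqueue({"op": "insert", "id": 2})

sent_while_down = outbox.forward(database_down)  # nothing leaves the queue
applied = []
sent_after_recovery = outbox.forward(applied.append)  # queue drains in order
```

Note that enqueue returns as soon as the local commit succeeds, which is exactly the trade-off mentioned above: the caller no longer learns whether the eventual database write succeeded.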

Real-time synchronization of database data across all the clients

What's the best strategy to keep all the clients of a database server synchronized?
The scenario involves a database server and a dynamic number of clients that connect to it, viewing and modifying the data.
I need real-time synchronization of the data across all the clients - if data is added, deleted, or updated, I want all the clients to see the changes in real-time without putting too much strain on the database engine by continuous polling for changes in tables with a couple of million rows.
Now I am using a Firebird database server, but I'm willing to adopt the best technology for the job, so I want to know if there is any kind of already existing framework for this kind of scenario, what database engine does it use and what does it involve?
Firebird has a feature called EVENT that you may be able to use to notify clients of changes to the database. The idea is that when data in a table is changed, a trigger posts an event. Firebird takes care of notifying all clients who have registered an interest in the event by name. Once notified, each client is responsible for refreshing its own data by querying the database.
The client can't get info from the event about the new or old values. This is by design, because there's no way to resolve this with transaction isolation. Nor can your client register for events using wildcards. So you have to design your server-to-client notification pretty broadly, and let the client update to see what exactly changed.
See http://www.firebirdsql.org/doc/whitepapers/events_paper.pdf
You don't mention what client platform or language you're using, so I can't advise on the specific API you would use. I suggest you google for instance "firebird event java" or "firebird event php" or similar, based on the language you're using.
Since you say in a comment that you're using WPF, here's a link to a code sample of some .NET application code registering for notification of an event:
http://www.firebirdsql.org/index.php?op=devel&sub=netprovider&id=examples#3
Re your comment: Yes, the Firebird event mechanism is limited in its ability to carry information. This is necessary because any information it might carry could be canceled or rolled back. For instance, a trigger may post an event, but then the operation that spawned the trigger violates a constraint, canceling the operation but not the event. So events can only be a kind of "hint" that something of interest may have happened. The other clients need to refresh their data at that time, but they aren't told what to look for. This is at least better than polling.
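The "hint" model can be illustrated with a small in-process sketch. It does not use the Firebird API; EventHub, Client and the event name orders_changed are hypothetical stand-ins, and a plain dictionary plays the role of the shared table. The point is only that the notification carries a name and nothing else, so each client re-queries to find out what changed.

```python
from collections import defaultdict


class EventHub:
    """Delivers event names only, never row data."""

    def __init__(self):
        self.listeners = defaultdict(list)

    def register(self, event_name, callback):
        self.listeners[event_name].append(callback)

    def post(self, event_name):
        for callback in self.listeners[event_name]:
            callback()  # no payload: the event is just a hint


class Client:
    def __init__(self, hub, table):
        self.table = table       # stands in for the shared database table
        self.view = dict(table)  # this client's local copy of the data
        hub.register("orders_changed", self.refresh)

    def refresh(self):
        # Re-query everything; the event did not say what changed.
        self.view = dict(self.table)


orders = {1: "new"}
hub = EventHub()
first_client = Client(hub, orders)
second_client = Client(hub, orders)

# Some writer changes the table, then the trigger equivalent posts the event.
orders[2] = "new"
del orders[1]
stale_view = dict(first_client.view)  # still the old data until the hint arrives
hub.post("orders_changed")
```

Because the refresh is a full re-query, a spurious event costs one extra read but never leaves a client with wrong data.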
So you're basically describing a publish/subscribe mechanism -- a message queue. I'm not sure I'd use an RDBMS to implement a message queue. It can be done, but you're basically reinventing the wheel.
Here are a few message queue products that are well-regarded:
Microsoft MSMQ (seems to be part of Windows Professional and Server editions)
RabbitMQ (free open-source)
Apache ActiveMQ (free open-source)
IBM WebSphere MQ (probably overkill in your case)
This means that when one client modifies data in a way that others may need to know about, that client also has to post a message to the message queue. When consumer clients see the message they're interested in, they know to refresh their copy of some data.
SQL Server 2005 and higher support notification-based cache expiry for data sources.

Good Strategy for Message Queuing?

I'm currently designing an application which I will ultimately want to move to Windows Azure. In the short term, however, it will be running on a server which I will host myself.
The application involves a number of separate web applications - some of these are essentially WCF services which receive data, and some are sites for users to manage data. In addition, there will need to be a worker service running in the background which will process data in various ways.
I'm very keen to use a decoupled architecture for this. Ideally I'm wanting the components (i.e. web apps and worker service) to know as little as possible about each other. It seems like using a message queue will be the best solution here - the web apps can enqueue messages with work units into the queue and the worker service can pick them out and process them as needed.
However, I want to work out a good set of technologies for doing this, bearing in mind that I'll ultimately be moving to Azure and want to minimise the amount of re-work I'll need to do when I migrate to the cloud. Azure has a Queue component built in which looks ideal for my needs. What I'd like to do is create something myself which will mimic this as closely as possible.
It looks like there are several options (I'm using .NET on Windows, with a SQL Server 2005 back end) - the ones I've found so far are:
MSMQ
SQL Server service broker
Rolling my own using a database table and some stored procs
I was wondering if anyone has any suggestions for this - or if anyone has done anything similar and has advice on things to do/to avoid. I realise that every situation is different, but in this case I think my queuing requirements are pretty generic so I'd love to hear anyone else's thoughts about the best way to do this.
Thanks in advance,
John
If you have Azure in mind, perhaps you should start straight on Azure, as the APIs and semantics are significantly different between Azure queues and any of MSMQ or SSB.
A quick 10,000-foot comparison of MSMQ vs. SSB (I'll leave a custom table-as-queue out of the comparison as it really depends on how you implement it...)
Deployment: MSMQ is a Windows component, SSB is a SQL component. SSB requires a SQL instance to store any message, so disconnected clients need access to an instance (can be Express). MSMQ requires deployment of MSMQ on the client (part of the OS, but an optional install).
Programmability: MSMQ offers a fully fledged, supported, WCF channel. SSB offers only an experimental WCF channel at http://ssbwcf.codeplex.com
Performance: SSB will be significantly faster than MSMQ in transacted mode. MSMQ will be faster if allowed to operate in untransacted mode (best effort, unordered delivery).
Queriability: SSB queues can be SELECT-ed upon (view any message, full SQL JOIN/WHERE/ORDER/GROUP power), MSMQ queues can be peeked (only next message).
Recoverability: SSB queues are integrated in the database, so they are backed up and restored with the database, keeping a consistent state with the application state. MSMQ queues are backed up in the NT file backup subsystem, so to keep the backup in sync (coherent) the queue and database have to be suspended.
Transactions (since every enqueue/dequeue is always accompanied by a database update): SSB is fully integrated in SQL so dequeueing and enqueueing are local transaction operations. MSMQ is a separate TM (Transaction Manager) so queue/dequeue has to be a Distributed Transaction operation to enroll both SQL and MSMQ in the transaction.
Management and Monitoring: both equally bad. No tools whatsoever.
Correlated Messages processing: SSB can block processing of correlated messages by concurrent threads via built-in Conversation Group Locking.
Event Driven: SSB has Activation to launch stored procedures, MSMQ uses Windows Activation Service. Similar. SSB though has self load-balancing capabilities due to the way WAITFOR(RECEIVE) and MAX_QUEUE_READERS interact.
Availability: SSB piggybacks on the SQL Server High Availability story; it can work either in a clustered or in a database mirroring environment. MSMQ rides the Windows clustering story only. Database Mirroring is much cheaper than clustering as an HA solution.
In addition I'd add that SSB and MSMQ differ significantly at the level of the primitive they offer: the SSB primitive is a conversation, while the MSMQ primitive is a message. Think TCP vs. UDP semantics.
Pick a queue back end that works for you, or that is better suited to your environment. @Remus has given a great comparison between MSMQ and SSB. MSMQ is going to be the easier one to implement, but has some notable limitations, while SSB is going to feel very heavy as it's at the other end of the spectrum.
Have It Your Way
To minimize the rework in your applications, abstract queue access behind an interface, and then provide an implementation for the queue transport you ultimately decide to go with. When it's time to move to Azure, or another queue transport, you just provide a new implementation of your interface.
You get to control the semantics of how you want to interact with the queue to give a consistent usable API from your applications.
A rough idea might be:
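using System.Xml;

// Transport-agnostic contract that the web apps and worker service code against.
interface IQueuedTransport
{
    void SendMessage(XmlDocument message);
    XmlDocument ReceiveMessage();
}

// One implementation per queue back end; bodies omitted in this rough sketch.
public class MSMQTransport : IQueuedTransport { /* MSMQ-backed SendMessage/ReceiveMessage */ }
public class AzureQueueTransport : IQueuedTransport { /* Azure Queue-backed SendMessage/ReceiveMessage */ }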
You may not be building the be-all queuing transport, just what meets your needs. If you work with Xml, pass xml. If you work in byte arrays, pass byte arrays. :)
Good luck!
Z
Use Win32 Mailslots. They will be reliable on a single server, are easy to implement, and do not require any extra software.

Resources