Messaging Middleware vs. RPC and Distributed Databases

I would like to know your opinions on the advantages and disadvantages of using
messaging middleware vs. RPC vs. distributed databases in a distributed application.

These three are completely different things:
Message-Oriented Middleware (MOM): A subsystem providing (arbitrary) message delivery services between interested systems, usually with the ability to transform message content, route messages, log them, guarantee delivery, etc.
Remote Procedure Call (RPC): A rather generic term denoting a method of invoking a procedure / method / service residing in a remote process.
Distributed database: seems quite self-explanatory to me; refer to Wikipedia.
Hence it's hard to name specific (dis)advantages without knowing the actual distributed application better. You could be comparing RPC and MOM; in that case, MOM is usually a complete message delivery solution, while RPC is just a technical means of inter-process communication.

Related

What APIs to use for a program working across computers

I want to try programming something which can do things across multiple endpoints, so that when things occur on one computer, events can occur on others. Obviously the problem here is in sending commands to the other endpoints.
I'm just not sure what I would use to do this; I'm guessing it would have to be an API built on some kind of client-server model. I expect there are tools people use to do this, but I don't know what they are called.
How would I go about doing this? Are there common APIs which allow people to do this?
There are (at least) two types to distinguish between: RPC APIs and message queues (MQ).
An RPC-style API can be imagined as a remotely callable interface; it typically gives you one response per request. Apache Thrift 1) is one of the frameworks designed for this purpose: an easy-to-use cross-platform, cross-language RPC framework. (And oh yes, it also supports Erlang, just in case ...). There are a few others around, like Google's Protocol Buffers, Apache Avro, and a few more.
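To make the request/response shape concrete, here is a minimal sketch of such a remotely callable interface using plain Java RMI (the service and method names are made up for illustration):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// A hypothetical RPC-style contract: each call yields exactly one response.
// With RMI, the framework generates the client-side proxy; the caller blocks
// until the remote process returns a value (or the call fails).
public interface TemperatureService extends Remote {
    double readCelsius(String sensorId) throws RemoteException;
}
```

Frameworks like Thrift work the same way conceptually: you declare the interface once, and the framework generates the client and server stubs.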
Message queuing systems are more suitable in cases where looser coupling is desired or acceptable. In contrast to an RPC-style framework and API, a message queue decouples request and response a bit more. For example, an MQ system is more suitable for distributing work to multiple handlers, or for distributing one event to multiple recipients via producer/consumer or publish/subscribe patterns. Typical candidates are MSMQ, Apache ActiveMQ, or RabbitMQ.
Although this can be achieved with RPC as well, it is much more complicated and involves more work, as you are operating on a somewhat lower abstraction level. RPC shines when you need the request/response style and value performance over the comfort of an MQ.
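As a rough illustration of that looser coupling, here is a minimal JMS publish/subscribe sketch (broker setup and JNDI lookups omitted; any JMS provider such as ActiveMQ supplies the concrete ConnectionFactory):

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.Topic;

public class EventPublisher {
    // Publishes one event; every subscriber attached to the topic gets a copy.
    // The publisher neither knows its recipients nor waits for a response.
    static void publish(ConnectionFactory factory, Topic topic, String payload)
            throws JMSException {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(topic);
            producer.send(session.createTextMessage(payload));
        } finally {
            connection.close();
        }
    }
}
```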
On top of MQ systems there are more sophisticated service bus systems, like for example NServiceBus. A service bus operates on an even higher level of abstraction. These also have their pros and cons, but can be helpful too. In the end, it depends on your use case.
1) Disclaimer: I am actively involved in that project.
Without more information, I can just suggest you look at Erlang. It is probably the easiest language in which to learn distributed systems, since message passing is built into the language, and from the language's point of view it is irrelevant whether the message is sent within the same PC, across a LAN, or over the internet to a different machine.

Observer pattern in Oracle

Can I set a hook on rows being changed or added in a table and get notified somehow when such an event is raised? I searched the web and only came up with pipes, but there is no way to get a pipe message immediately when it is sent; you can only periodically try to receive it.
Implementing an Observer pattern from a database should generally be avoided.
Why? It relies on vendor proprietary (non-standard) technology, promotes database vendor lock-in and support risk, and causes a bit of bloat. From an enterprise perspective, if not done in a controlled way, it can look like "skunkworks" - implementing in an unusual way behaviour commonly covered by application and integration patterns and tools. If implemented at a fine-grained level, it can result in tight-coupling to tiny data changes with a huge amount of unpredicted communication and processing, affecting performance. An extra cog in the machine can be an extra breaking point - it might be sensitive to O/S, network, and security configuration or there may be security vulnerabilities in vendor technology.
If you're observing transactional data managed by your app:
implement the Observer pattern in your app. E.g. in Java, the CDI and JavaBeans specs support this directly, and OO custom design as per the Gang of Four book is a perfect solution (see the sketch after this list).
optionally send messages to other apps. Filters/interceptors, MDB messages, CDI events and web services are also useful for notification.
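As a rough sketch of that in-app Observer pattern with CDI events (the MasterDataChanged event type is a made-up example):

```java
import javax.enterprise.event.Event;
import javax.enterprise.event.Observes;
import javax.inject.Inject;

// A made-up event payload describing what changed.
class MasterDataChanged {
    final String tableName;
    MasterDataChanged(String tableName) { this.tableName = tableName; }
}

public class MasterDataService {
    @Inject
    private Event<MasterDataChanged> changed;

    public void update(String tableName) {
        // ... persist the change through the app's own data layer ...
        changed.fire(new MasterDataChanged(tableName)); // notify observers in-process
    }
}

class CacheRefresher {
    // The CDI container invokes this whenever the event is fired.
    void onChange(@Observes MasterDataChanged change) {
        // refresh caches, send messages to other apps, etc.
    }
}
```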
If users are directly modifying master data within the database, then either:
provide a single admin page within your app to control master data refresh OR
provide a separate master data management app and send messages to dependent apps OR
(best approach) manage master data edits in terms of quality (reviews, testing, etc.) and timing (treat the same as a code change), promote through environments, deploy, and refresh data / restart the app on a managed schedule
If you're observing transactional data managed by another app (shared database integration) OR you use data-level integration such as ETL to provide your application with data:
try to have data entities written by just one app (read-only by others)
poll a staging/ETL control table to understand what/when changes occurred OR
use a JDBC/ODBC-level proprietary extension for notification or polling, as mentioned in Alex Poole's answer OR
refactor overlapping data operations from the 2 apps into a shared SOA service, which can either avoid the observation requirement or lift it from a data operation to a higher-level SOA/app message
use an ESB or a database adapter to invoke your application for notification or a WS endpoint for bulk data transfer (e.g. Apache Camel, Apache ServiceMix, Mule ESB, Openadaptor)
avoid use of database extension infrastructure such as pipes or advanced queuing
If you use messaging (send or receive), do so from your application(s). Messaging from the DB is a bit of an antipattern. As a last resort, it is possible to use triggers which invoke web services (http://www.oracle.com/technetwork/developer-tools/jdev/dbcalloutws-howto-084195.html), but great care is required to do this in a very coarse fashion, invoking a business (sub)process when a set of data changes, rather than crunching fine-grained CRUD-type operations. It is best to trigger a job and have the job call the web service outside the transaction.
In addition to the other answers, you can look at database change notification. If your application is Java-based there is specific documentation covering JDBC, and similar for .NET here and here; and there's another article here.
You can also look at continuous query notification, which can be used from OCI.
I know link-only answers aren't good but I don't have the experience to write anything up (I have to confess I haven't used either, but I've been meaning to look into DCN for a while now...) and this is too long for a comment *8-)
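That said, based on the linked documentation, the JDBC registration flow looks roughly like the following untested sketch (the table name is made up, and the exact options vary by driver version):

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

import oracle.jdbc.OracleConnection;
import oracle.jdbc.OracleStatement;
import oracle.jdbc.dcn.DatabaseChangeEvent;
import oracle.jdbc.dcn.DatabaseChangeListener;
import oracle.jdbc.dcn.DatabaseChangeRegistration;

public class DcnSketch {
    static void register(OracleConnection conn) throws SQLException {
        Properties props = new Properties();
        props.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");
        DatabaseChangeRegistration reg = conn.registerDatabaseChangeNotification(props);
        reg.addListener(new DatabaseChangeListener() {
            public void onDatabaseChangeNotification(DatabaseChangeEvent event) {
                // called asynchronously when a registered table changes
                System.out.println("Change: " + event);
            }
        });
        // A query run while the registration is attached adds its tables
        // to the watched set.
        Statement stmt = conn.createStatement();
        ((OracleStatement) stmt).setDatabaseChangeRegistration(reg);
        ResultSet rs = stmt.executeQuery("SELECT id FROM monitored_table");
        while (rs.next()) { /* drain to complete the registration */ }
        rs.close();
        stmt.close();
    }
}
```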
Within the database itself triggers are what you need. You can run arbitrary PL/SQL when data is inserted, deleted, updated, or any combination thereof.
If you need to have the event propagate outside the database you would need a way to call out to your external application from your PL/SQL trigger. Some possible options are:
DBMS_PIPES - Pipes in Oracle are similar to Unix pipes. One session can write and a separate session can read to transfer information. Also, they are not transactional, so you get the message immediately. One drawback is that the API is poll-based, so I suggest option #2.
Java - PL/SQL can invoke arbitrary Java (assuming you load your class into your database). This opens the door to just about any type of messaging you'd like, including using JMS to push messages to a message queue. Depending on how you implement this, you can even have it transactionally tied to the INSERT/UPDATE/DELETE statement itself. The listening application would then just listen to the JMS queue (as sketched below) and wouldn't be tied to the DB publishing the event at all.
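A minimal sketch of that listening application in plain JMS (the connection factory and queue would come from the broker's client library or JNDI):

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

public class DbEventListener {
    static void listen(ConnectionFactory factory, Queue queue) throws JMSException {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        session.createConsumer(queue).setMessageListener(new MessageListener() {
            public void onMessage(Message message) {
                try {
                    // react to the row change the database-side code reported
                    System.out.println(((TextMessage) message).getText());
                } catch (JMSException e) {
                    e.printStackTrace();
                }
            }
        });
        connection.start(); // begin asynchronous delivery; keep the connection open
    }
}
```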
Depending on your requirements, use triggers or auditing.
Look at DBMS_ALERT, DBMS_PIPE, or (preferably) AQ (Advanced Queuing), Oracle's internal messaging system. Oracle's AQ has its own API, but it can also be treated as a Java JMS provider.
There are also techniques like Streams (or XStream), but those are quite complex.
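For example, obtaining a standard JMS connection factory over AQ might look like this sketch, assuming Oracle's aqapi client library (the available factory overloads vary by version):

```java
import javax.jms.JMSException;
import javax.jms.QueueConnectionFactory;
import javax.sql.DataSource;

import oracle.jms.AQjmsFactory;

public class AqAsJms {
    // The DataSource points at the database hosting the queue; from here on,
    // producing and consuming follow the standard JMS API.
    static QueueConnectionFactory factoryFor(DataSource ds) throws JMSException {
        return AQjmsFactory.getQueueConnectionFactory(ds);
    }
}
```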

What are the pros and cons of using database for IPC to share data instead of message passing?

To be more specific for my application: the shared data are mostly persistent data such as monitoring status and configurations - no more than a few hundred items - updated and read frequently, but at no more than 1 or 2 Hz. The processes are local to each other on the same machine.
EDIT 1: more info - processes are expected to poll the set of data they are interested in (i.e. monitoring). Most of the data are persistent for the lifetime of the program, but some (e.g. configs) must be restored after a software restart. Data are updated only by their owner (assume one owner for each datum). The number of processes is small too (no more than 10).
Although using a database is notably more reliable and scalable, it has always seemed to me to be overkill, or too heavyweight, when all I do with it is share data within an application. Message passing with e.g. JMS also has a middleware part, but it is more lightweight and has a more natural and flexible communication API. Implementing event notification and the command pattern is also easier with messaging, I think.
It would be of great help if anyone could give me an example of the circumstances under which one would be preferable to the other.
For example, I know we can more readily share persistent data between processes using a database, although it is also doable with messaging by distributing the data across processes and/or storing it in XML files.
According to http://en.wikipedia.org/wiki/Database-as-IPC and http://tripatlas.com/Database_as_an_IPC, it is an anti-pattern when used in place of message passing, but neither elaborates on, e.g., how bad the performance hit of using a database can be compared to message passing.
I have gone through several previous posts that asked a similar question, but I am hoping for an answer that focuses on design justification. From the questions I have read so far, I can see that a lot of people did use a database for IPC (or implemented a message queue with a database).
Thanks!
I once wrote a data acquisition system that ran on about 20 lab computers. All the communication between the machines took place through small tables on a MySQL database. This scheme worked great, and as the years went by and the system expanded everything scaled well and remained very responsive. It was easy to add features and fix bugs. You could debug it easily because all the message passing was easy to tap into.
What the database does for you is provide a fast, well-debugged way of maintaining concurrency while the network traffic gets very busy. Someone at MySQL spent a lot of time making the network code work well under high load, and if you write your own system using TCP/IP sockets you're going to be re-inventing those wheels at great expense of time and effort.
Another advantage is that if you're using the database anyway for actual data storage then you don't have to add anything else to your system. You keep the possible points of failure to a minimum.
So all these people who say IPC through databases is bad aren't always right.
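To make the table-as-mailbox idea concrete, a plain JDBC polling sketch might look like this (the table and column names are made up):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TablePoller {
    private long lastSeenId = 0;

    // Called on a timer by each process; the database's own locking
    // provides the concurrency control between writers and readers.
    void poll(Connection db, String recipient) throws SQLException {
        PreparedStatement ps = db.prepareStatement(
            "SELECT id, payload FROM ipc_messages WHERE recipient = ? AND id > ? ORDER BY id");
        ps.setString(1, recipient);
        ps.setLong(2, lastSeenId);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            lastSeenId = rs.getLong("id");
            handle(rs.getString("payload"));
        }
        rs.close();
        ps.close();
    }

    private void handle(String payload) { /* application-specific */ }
}
```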
Taking into account that a DBMS is a way to store information and messages are a way to transport information, your decision should be based on the answer to the question: "do I need the data to persist over time, or is it consumed by the recipient?"

At what level should I implement communication between nodes on a distributed system?

I'm building a web application that from day one will be at the limits of what a single server can handle, so I'm considering adopting a distributed architecture with several identical nodes. The goal is to provide scalability (add servers to accommodate more users) and fault tolerance. The nodes need to share some state between them, therefore some communication between them is required. I believe I have the following alternatives for implementing this communication in Java:
Implement it using sockets and a custom protocol.
Use RMI
Use web services (each node can send and receive/parse HTTP requests).
Use JMS
Use another high-level framework like Terracotta or Hazelcast
I would like to know how these technologies compare to each other:
When the number of nodes increases
When the amount of communication between the nodes increases (1000s of messages per second and/or messages up to 100KB etc)
On a practical level (eg ease of implementation, available documentation, license issues etc)
I'm also interested to know what technologies are people using in real production projects (as opposed to experimental or academic ones).
Don't forget Jini.
It gives you automatic service discovery, service leasing, and downloadable proxies so that the actual client/server communication protocol is up to you and not enforced by the framework (e.g. you can choose HTTP/RMI/whatever).
The framework is built around acknowledgement of the 8 Fallacies of Distributed Computing and recovery-oriented computing. i.e. you will have network problems, and the architecture is built to help you recover and maintain a service.
If you also use JavaSpaces, it's trivial to implement workflows and producer/consumer architectures. Producers write into the JavaSpace, and one or more consumers take that work from the space (under a transaction) and work with it. You scale simply by providing more consumers.
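A rough sketch of that producer/consumer flow against the JavaSpace API (the Task entry type is made up; Jini entries are public classes with public fields and a no-arg constructor):

```java
import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

public class Task implements Entry {
    public String payload;
    public Task() {}                       // required no-arg constructor
    public Task(String p) { payload = p; }
}

class Producer {
    void produce(JavaSpace space, String work) throws Exception {
        space.write(new Task(work), null, Lease.FOREVER); // null = no transaction
    }
}

class Consumer {
    void consume(JavaSpace space) throws Exception {
        Task template = new Task();        // null fields act as wildcards
        while (true) {
            // take() removes the entry atomically, so each task goes to
            // exactly one consumer; add consumers to scale out.
            Task task = (Task) space.take(template, null, Long.MAX_VALUE);
            // ... process task.payload ...
        }
    }
}
```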

Exchange Data between two apps across PC on LAN

I have a need of implementing two apps that will exchange data with each other. Both apps will be running on separate PCs which are part of a LAN.
How can we do this in Delphi?
Is there any free component which will make it easy to exchange data between apps across PCs?
If I'm writing it myself, I (almost) always use sockets to exchange data between apps.
It's lightweight, it works well on the same machine, across the local network, or over the Internet with no changes, and it lets you communicate between apps with different permissions, like services (Windows messages cause problems here).
It might not be a requirement for you, but I'm also a fan of platform-independent transports, like TCP/IP.
There are lots of free choices for Delphi. Here are a few that I know of. If you like blocking libraries, look at Indy or Synapse. If you prefer non-blocking, check out ICS.
Before you choose a technique, you should characterize the communication according to its throughput, granularity, latency, and criticality.
Throughput -- how much data per unit time will you need to move? The range of possible values is so wide that the lowest-rate and highest-rate applications have almost nothing in common.
Granularity -- how big are the messages? How much data does the receiving application need before it can use the message?
Latency -- when one application sends a message, how soon must the other application see it? How quickly do you want the receiving application to react to the sending application?
Criticality -- how long can a received message be left unattended before it is overrun by a later message? (This is usually not important unless the throughput is high and the message storage is limited.)
Once you have these questions answered, you can begin to ask about the best technology for your particular situation.
-Al.
I used to use Mailslots if I needed to communicate with more than one PC at a time ("broadcast") over a network, although there is the caveat that mailslot delivery is not guaranteed.
For 1-to-1, Named Pipes are a Windows way of doing this sort of thing; you basically open a communication channel between 2 PCs and then write messages into the pipe. Not straightforward to start with, but very reliable and the recommended way for things like Windows services.
MS offer Named Pipes as an alternative way of communicating with an SQL Server (other than TCP/IP).
But as Bruce said, TCP/IP is standard and platform independent, and very reliable.
DCOM used to be a good method of interprocess communication. This was also one of Delphi's strong points. Today I would strongly advise against using it.
Depending on the nature of your project I'd choose either
using a SQL server
socket communication
Look at solutions that use "Remote Procedure Call" type interfaces. I use RemObjects SDK for this sort of thing, but there are open source versions of RealThinClient which would do just as well.
Both of these allow you to create a connection that for most of your code is "transparent", and you just call an interface which sends the data over the wire and gets results back. You can then program how you usually do, and forget the details of sockets etc.
This is one of those cases where there really isn't a "best" answer as just about any of the technologies already discussed can be used to accurately communicate between two applications. The choice of which method to use will really come down to the critical nature of your communication, as well as how much data must be transfered from one workstation to another.
If your communication is not time-sensitive or critical, then a simple poll of a database or file at regular intervals might be sufficient. If your communication is critical and time-sensitive, then placing a TCP/IP server in each client might be worth pursuing. If it is just time-sensitive, then mailslots make a good choice; if critical but not time-sensitive, then named pipes.
I've used the Indy library's Multicast components (IdIPMCastClient/Server) for this type of thing many times. The apps just send XML to each other. Quick and easy with minimal connection requirements.
Probably the easiest way is to read and write a file (or possibly one file per direction). It also has the advantage that it is easy to simulate and trace. It's not the fastest option, though (and it definitely sounds lame ;-) ).
A possibility could be to "share" objects across the network.
It is possible with a Client-Server ORM like our little mORMot.
This open source library works from Delphi 6 up to XE2 and uses JSON for transmission. Some security features are included (involving a RESTful authentication mechanism), and it can use any database - or no database at all.
See in particular the first four samples provided, and the associated documentation.
For Delphi application integration, a message-oriented middleware might be an option. Message brokers offer guaranteed delivery, load balancing, and different communication models, and they work cross-platform and cross-language. Open source message brokers include:
Apache ActiveMQ and ActiveMQ Apollo
Open Message Queue (OpenMQ)
HornetQ
RabbitMQ
(Disclaimer - I am the author of Delphi / Free Pascal client libraries for these servers)
