I have a simple server written in C. It's main purpose is to communicate with some business partners over a proprietary protocol. For that reason and a few others, it must be written in C. I have a number of other processes, however, written in other languages (e.g. Python) that must communicate with the server (locally, on the same Linux server).
What are the best options for cross-language IPC in this scenario? Specifically, I think I have a handle on transport technologies: Unix domain sockets, named pipes, shared memory, ZeroMQ (Crossroads). I'm more interested in the best way to implement the protocol, in order to keep the C code small and maintainable, while still allowing communication from other languages.
Edit: there seems to be some confusion. I'm not interested in discussion of pros/cons of domain sockets, shared memory et. al. I am interested in msgpack (thanks unwind), and other technologies/approaches for implementing the wire protocol.
It's hard to optimize (=select the "best") when the requirements are unknown. You do state that your goal is to keep the C code "small and maintainable", which seems to imply that you should look for a library. Perhaps msgpack over a local socket?
Also, your basic premise that the server must be written in C because you have a proprietary protocol seems ... weird, at least.
Edit: What you need is a "serialization framework", i.e. something can turn a memory structure into a byte stream. The best candidates are:
Protocol Buffers
MessagePack
JSON
Pros/cons:
Protocol Buffers
+ Fast
+ Easy to version (which you'll start to love very much when you need to make a change to your message format for the first time and which you will curse to hell before that)
- Solves many problems which you don't know about, yet. That makes the API a bit "strange". I assure you, there are very good reasons for what and how they do it but you will feel confused sometimes.
I don't know much about MessagePack.
Lastly:
JSON
+ Any language out there can read and write JSON data.
+ Human readable without tools
- somewhat slow
- the format is very flexible but if you need to make big changes, you need to find a strategy to figure out what format (= which fields) a message has when you read it.
As for the transport layer:
Pros/cons:
Shared memory
+ Fastest option
- You need a second channel (like a semaphore) to tell the other process that the data is now ready
- gets really ugly when you try to connect more then two processes
- OS specific
Named pipes
+ Very easy to set up
+ Fairly fast
- Only allows two processes to talk ... or rather one process to talk to another in a single direction. If you need bi-directional communication, you need several pipes
Sockets
+ Pretty easy to set up
+ Available for all and any languages
+ Allows remote access (not all processes need to be on the same machine)
+ Two-way communication with one server and several processes
- Slower than shmem and pipes
ZeroMQ
+ Like sockets but better
+ Modern API (not that old IPC/socket junk)
+ Support for many languages...
- ...but not all
If you can I'd suggest to try ZeroMQ because it's a modern framework that solves many of the problems that you'll encounter with the older technologies.
If that fails, I'd try sockets next. They are easy, well supported and docile.
Related
Since i read a lot text and code about socket programming i decided to go like that:
TCP Server:
Socket multiplexing
asynchronious I/O
I want to be able to handle 800-1200 client connections at the same time. How do i handle client buffers? Every single example i read worked with just one solely buffer. Why dont people use something like:
typedef struct my_socket_tag {
socket sock;
char* buffer;
} client_data;
Now I am able to give the buffer away from a receiver-thread to a dispatch-request-thread and the receiving could go on on another socket while the first client specific buffer is processed.
Is that common practice? Am I missing the point?
Please give some hints, how to improve my question next time, thank you!
The examples are usually oversimplified. Scalability is a serious issue, and I suggest it would be better to begin with simpler applications; handling a thousand of client connections is possible but in most applications it will require quite a careful development. Socket programming may get tricky.
There are different kinds of server applications; there is no single approach that would fit all tasks perfectly. There are lots of details to consider (is it a stream or datagram oriented service? are the connections, if any, persistent? does it involve lots of small data transfers, or few huge transfers, or lots of huge transfers? Et cetera, et cetera). This is why you are not likely to see any common examples in books.
If you choose threading approach, be careful not to create too many threads; one thread per client is usually (but not always) a bad choice. In some cases, you can even handle everything in a single thread (using async IO) without sacrificing any performance.
Having said that, I would recommend learning C++ and boost asio (or a similar framework). It takes care of many scalability-related problems, so there's no point in reinventing the wheel.
You may study the Architecture of Open Source Applications book (freely available). There are quite a few relevant examples that you may find useful.
I have been asked this question in some recent interviews,Whats the advantages and disadvantages of using Socket in IPC when there are other ways to perform IPC.Have not found exact answer .
Any help would be much appreciated.
Compared to pipes, IPC sockets differ by being bidirectional, that is, reads and writes can be done on the same descriptor. Pipes, unlike sockets, are unidirectional. You have to keep a pair of descriptors if you want to do both reads and writes.
Pipes, on the other hand, guarantee atomicity when reading or writing under a certain amount of bytes. Writing something less than PIPE_BUF bytes at once is guaranteed to be delivered in one chunk and never observed partial. Sockets do require more care from the programmer in that respect.
Shared memory, when used for IPC, requires explicit synchronisation from the programmer. It may be the most efficient and most flexible mechanism, but that comes at an increased complexity cost.
Another point in favour of sockets: an app using sockets can be easily distributed - ie. it can be run on one host or spread across several hosts with little effort. This depends of course on the nature of the app.
Perhaps this is too simplified an answer, yet it is an important detail. Sockets are not supported on all OS's. Recently, I have been aware of a project that used sockets for IPC all over the place only to find that they were forced to change from Linux to a proprietary OS which was POSIX, but did not support sockets the same way as Linux.
Sockets allow you a few benefits...
You can connect a simple client to them for testing (manually enter data, see the response).
This is very useful for debugging, simulating and blackbox testing.
You can run the processes on different machines. This can be useful for scalability and is very helpful in debugging / testing if you work in embedded software.
It becomes very easy to expose your process as a service
But there are drawbacks as well
Overhead is greater than IPC optimized for a single machine. Shared memory in particular is better if you need the performance, and you know your processes are all on the same machine.
Security - if your client apps can connect so can anyone else, if you're not careful about authentication. Data can also be sniffed if you're not encrypting, and modified if you're not at least signing data sent over the wire.
Using a true message queue tends to leave you with fixed sized messages. If you have a large number of messages of wildly varying sizes this can become a performance problem. Using a socket can be a way around this, though you're then left trying to wrap this functionality to become identical to a queue, which is tricky to get the detail right on, particularly aspects like blocking/non-blocking and atomicity.
Shared memory is quick but requires management (you end up writing a version of malloc to manage the SHM) plus you have to synchronise and lock it in some way. Though you can use libraries to help with this the availability depends on your environment and language.
Queues are easy but have the downsides listed as pros to my socket discussion.
Pipes have been covered by Blagovests answer to this question.
As is ever the case with this kind of stuff I would suggest reading the W. Richard Stevens books on IPC and sockets. There is no better explanation than his! :-)
I am writing my first single-threaded, single-process server in C using kqueue() / epoll() to handle asynchronous event dispatch. As one would expect, it is quite a lot harder to follow the flow of control than in a blocking server.
Is there a common pattern (maybe even with a name) that people use to avoid the callback-driven protocol implementation becoming a gigantic tangled hairball?
Alternately, are there any nonblocking servers written in C for which the source code is a pleasure to read?
Any input would be much appreciated!
More thoughts:
A lot of the cruft seems to come from the need to deal with the buffering of IO. There is no necessary correspondence between a buffer filling/draining and a single state transition. A buffer fill/drain might correspond to [0, N] state transitions.)
I've looked at libev (docs here) and it looks like a great tool, and libevent, which looks less exciting but still useful, but neither of them really answers the question: how do I manage the flow of control in a way that isn't horrendously opaque.
You can try using something like State Threads or GNU Portable Threads, which allows you to write as though you were using a single thread per connection (The implementation uses Fibers).
Alternately, you can build your protocol implementation using a state machine generator (such as Ragel).
I'm currently writing an HTTP server in C so that I'll learn about C, network programming and HTTP. I've implemented most of the simple stuff, but I'm only handling one connection at a time. Currently, I'm thinking about how to efficiently add multitasking to my project. Here are some of the options I thought about:
Use one thread per connection. Simple but can't handle many connections.
Use non-blocking API calls only and handle everything in one thread. Sounds interesting but using select()s and such excessively is said to be quite slow.
Some other multithreading model, e.g. something complex like lighttpd uses. (Probably) the best solution, but (probably) too difficult to implement.
Any thoughts on this?
There is no single best model for writing multi-tasked network servers. Different platforms have different solutions for high performance (I/O completion ports, epoll, kqueues). Be careful about going for maximum portability: some features are mimicked on other platforms (i.e. select() is available on Windows) and yield very poor performance because they are simply mapped onto some other native model.
Also, there are other models not covered in your list. In particular, the classic UNIX "pre-fork" model.
In all cases, use any form of asynchronous I/O when available. If it isn't, look into non-blocking synchronous I/O. Design your HTTP library around asynchronous streaming of data, but keep the I/O bit out of it. This is much harder than it sounds. It usually implies writing state machines for your protocol interpreter.
That last bit is most important because it will allow you to experiment with different representations. It might even allow you to write a compact core for each platform local, high-performance tools and swap this core from one platform to the other.
Yea, do the one that's interesting to you. When you're done with it, if you're not utterly sick of the project, benchmark it, profile it, and try one of the other techniques. Or, even more interesting, abandon the work, take the learnings, and move on to something completely different.
You could use an event loop as in node.js:
Source code of node (c, c++, javascript)
https://github.com/joyent/node
Ryan Dahl (the creator of node) outlines the reasoning behind the design of node.js, non-blocking io and the event loop as an alternative to multithreading in a webserver.
http://www.yuiblog.com/blog/2010/05/20/video-dahl/
Douglas Crockford discusses the event loop in Scene 6: Loopage (Friday, August 27, 2010)
http://www.yuiblog.com/blog/2010/08/30/yui-theater-douglas-crockford-crockford-on-javascript-scene-6-loopage-52-min/
An index of Douglas Crockford's above talk (if further background information is needed). Doesn't really apply to your question though.
http://yuiblog.com/crockford/
Look at your platforms most efficient socket polling model - epoll (linux), kqueue (freebsd), WSAEventSelect (Windows). Perhaps combine with a thread pool, handle N connections per thread. You could always start with select then replace with a more efficient model once it works.
A simple solution might be having multiple processes: have one process accept connections, and as soon as the connection is established fork and handle the connection in that child process.
An interesting variant of this technique is used by SER/OpenSER/Kamailio SIP proxy: there's one main process that accepts the connections and multiple child worker processes, connected via pipes. The parent sends the new filedescriptor through the socket. See this book excerpt at 17.4.2. Passing File Descriptors over UNIX Domain Sockets. The OpenSER/Kamailio SIP proxies are used for heavy-duty SIP processing where performance is a huge issue and they do very well with this technique (plus shared memory for information sharing). Multi-threading is probably easier to implement, though.
I am developing some experimental setup in C.
I am exploring a scenario as follows and I need help to understand it.
I have a system A which has a lot of Applications using cryptographic algorithms.
But these crypto calls(openssl calls) should be sent to another System B which takes care of cryptography.
Therefore, I have to send any calls to cryptographic (openssl) engines via socket to a remote system(B) which has openssl support.
My plan is to have a small socket prog on System A which forwards these calls to system B.
What I'm still unclear at this moment is how I handle the received commands at System B.
Do I actually get these commands and translate them into corresponding calls to openssl locally in my system? This means I have to program whatever is done on System A right?
Or is there a way to tunnel/send these raw lines of code to the openssl libs directly and just received the result and then resend to System A
How do you think I should go about the problem?
PS: Oh by the way, the calls to cryptography(like EngineUpdate, VerifyFinal etc or Digest on System A can be either on Java or C.. I already wrote a Java/C program to send these commands to System B via sockets...
The problem is only on System B and how I have to handle..
You could use sockets on B, but that means you need to define a protocol for that. Or you use RPC (remote procedure calls).
Examples for socket programming can be found here.
RPC is explained here.
The easiest (not to say "the easy", but still) way I can imagine would be to:
Write wrapper (proxy) versions of the libraries you want to make remote.
Write a server program that listens to calls, performs them using the real local libraries, and sends the result back.
Preload the proxy library before running any application where you want to do this.
Of course, there are many many problems with this approach:
It's not exactly trivial to define a serializing protocol for generic C function calls.
It's not exactly trivial to write the server, either.
Applications will slow a lot, since the proxy call needs to be synchronous.
What about security of the data on the network?
UPDATE:
As requested in a comment, I'll try to expand a bit. By "wrapper" I mean a new library, that has the same API as another one, but does not in fact contain the same code. Instead, the wrapper library will contain code to serialize the arguments, call the server, wait for a response, de-serialize the result(s), and present them to the calling program as if nothing happened.
Since this involves a lot of tedious, repetitive and error-prone code, it's probably best to abstract it by making it code-driven. The best would be to use the original library's header file to define the serialization needed, but that (of course) requires quite heavy C parsing. Failing that, you might start bottom-up and make a custom language to describe the calls, and then use that to generate the serialization, de-serialization, and proxy code.
On Linux systems, you can control the dynamic linker so that it loads your proxy library instead of the "real" library. You could of course also replace (on disk) the real library with the proxy, but that will break all applications that use it if the server is not working, which seems very risky.
So you basically have two choices, each outlined by unwind and ammoQ respectively:
(1) Write a server and do the socket/protocol work etc., yourself. You can minimize some of the pain by using solutions like Google's protocol buffers.
(2) use an existing middleware solution like (a) message queues or (b) an RPC mechanism like CORBA and its many alternatives
Either is probably more work than you anticipated. So really you have to answer this yourself. How serious is your project? How varied is your hardware? How likely is the hardware and software configuration to change in the future?
If this is more than a learning or pet project you are going to be bored with in a month or two then an existing middleware solution is probably the way to go. The downside is there is a somewhat intimidating learning curve.
You can go the RPC route with CORBA, ICE, or whatever the Java solutions are these days (RMI? EJB?), and a bunch of others. This is an elegant solution since your calls to the remote encryption machine appear to your SystemA as simple function calls and the middleware handles the data issues and sockets. But you aren't going to learn them in a weekend.
Personally I would look to see if a message queue solution like AMQP would work for you first. There is less of a learning curve than RPC.