Epoll vs libevent for a BitTorrent-like application - C

I am implementing BitTorrent-style P2P file sharing. Let's say there are a maximum of 100 peers sharing simultaneously. TCP connections are set up between each peer and every other peer. Initially, one peer has the whole file and starts sharing pieces; subsequently, all peers share their pieces.
Typically, the piece size is 50 kB - 1 MB. I am wondering what the best approach is for writing such an application in C. Threads with epoll, or libevent?
Can anybody please give the positives/negatives of the different possible approaches?

If we're only talking about 100 peer connections at any given moment, the traditional approach of using select or poll on a group of TCP sockets will work out just fine.
epoll solves the problem of scaling to thousands of long-running connections. Read up on the C10K problem for more details.
I've heard good things about libevent. I believe it's an abstraction on top of epoll and other readiness mechanisms that provides a few nice things. If it makes your programming easier, then by all means use it. But you probably don't need it for performance.
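For a concrete starting point, here is a minimal sketch of that traditional approach using poll(); handle_peer_data() is a hypothetical placeholder for your piece-handling logic, and error handling is mostly omitted:

    #include <poll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAX_PEERS 100

    void handle_peer_data(int fd);   /* hypothetical: read and process piece data */

    void event_loop(int listen_fd)
    {
        struct pollfd fds[MAX_PEERS + 1];
        int nfds = 1;

        fds[0].fd = listen_fd;       /* slot 0 watches for new peers */
        fds[0].events = POLLIN;

        for (;;) {
            if (poll(fds, nfds, -1) < 0)
                continue;                         /* interrupted; retry */

            if (fds[0].revents & POLLIN && nfds < MAX_PEERS + 1) {
                int peer = accept(listen_fd, NULL, NULL);
                if (peer >= 0) {
                    fds[nfds].fd = peer;
                    fds[nfds].events = POLLIN;
                    nfds++;
                }
            }
            for (int i = 1; i < nfds; i++)
                if (fds[i].revents & POLLIN)
                    handle_peer_data(fds[i].fd);  /* hypothetical handler */
        }
    }

A real peer would also watch POLLOUT on sockets that have pending piece data to send, and remove closed descriptors from the array.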

libevent is essentially a wrapper around epoll (among other backends), mostly used for writing portable code. Since it's a wrapper, the drawbacks of epoll are retained, and it does not add much from a performance perspective. If portability is not a concern, epoll alone will work fine. Even better, if the connection volume is considerably lower, one can still use poll.
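If you do go with plain epoll, a minimal level-triggered sketch looks like this (Linux-only; handle_peer_data() is again a hypothetical placeholder, and error handling is omitted):

    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    void handle_peer_data(int fd);   /* hypothetical piece handler */

    void epoll_loop(int listen_fd)
    {
        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            struct epoll_event events[MAX_EVENTS];
            int n = epoll_wait(ep, events, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {
                    /* register each newly accepted peer with the epoll set */
                    int peer = accept(listen_fd, NULL, NULL);
                    if (peer >= 0) {
                        struct epoll_event pev = { .events = EPOLLIN, .data.fd = peer };
                        epoll_ctl(ep, EPOLL_CTL_ADD, peer, &pev);
                    }
                } else {
                    handle_peer_data(fd);   /* hypothetical */
                }
            }
        }
    }

The structure is deliberately identical to the poll() version; epoll only changes how readiness is reported, which is why it matters mainly at much higher connection counts.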

Related

Handle client buffers in a TCP server

Since I have read a lot of text and code about socket programming, I have decided to go with the following:
TCP server:
Socket multiplexing
Asynchronous I/O
I want to be able to handle 800-1200 client connections at the same time. How do I handle the client buffers? Every single example I have read works with just one buffer. Why don't people use something like:
typedef struct my_socket_tag {
    int   sock;      /* socket descriptor (SOCKET on Winsock) */
    char *buffer;    /* per-client receive buffer */
} client_data;
Now I am able to hand the buffer from a receiver thread over to a dispatch/request thread, and receiving can continue on another socket while the first client-specific buffer is being processed.
Is that common practice? Am I missing the point?
Please give me some hints on how to improve my question next time, thank you!
The examples are usually oversimplified. Scalability is a serious issue, and I suggest it would be better to begin with simpler applications; handling a thousand client connections is possible, but in most applications it requires quite careful development. Socket programming can get tricky.
There are different kinds of server applications; there is no single approach that fits all tasks perfectly. There are lots of details to consider (is it a stream- or datagram-oriented service? Are the connections, if any, persistent? Does it involve lots of small data transfers, a few huge transfers, or lots of huge transfers? Et cetera, et cetera). This is why you are not likely to see any general-purpose examples in books.
If you choose threading approach, be careful not to create too many threads; one thread per client is usually (but not always) a bad choice. In some cases, you can even handle everything in a single thread (using async IO) without sacrificing any performance.
Having said that, I would recommend learning C++ and Boost.Asio (or a similar framework). It takes care of many scalability-related problems, so there's no point in reinventing the wheel.
You may study the Architecture of Open Source Applications book (freely available). There are quite a few relevant examples that you may find useful.
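To the concrete question about per-client buffers: yes, keeping a buffer per connection is common practice. A minimal sketch of how the struct from the question might be used with readiness notifications (try_parse_messages() is a hypothetical helper that consumes complete messages and shifts leftover bytes to the front; error handling is abbreviated):

    #include <sys/socket.h>
    #include <sys/types.h>

    #define BUF_SIZE 4096

    typedef struct client_data {
        int    sock;
        char  *buffer;   /* points at a BUF_SIZE allocation made at accept() time */
        size_t used;     /* bytes accumulated so far */
    } client_data;

    void try_parse_messages(client_data *c);    /* hypothetical */

    /* Called when poll()/select() reports the socket readable.
     * Data accumulates in the per-client buffer, so a partial
     * message survives until the next readiness notification. */
    int on_readable(client_data *c)
    {
        if (c->used == BUF_SIZE)
            return -1;                   /* client overflowed its buffer */
        ssize_t n = recv(c->sock, c->buffer + c->used, BUF_SIZE - c->used, 0);
        if (n <= 0)
            return -1;                   /* closed or error: drop client */
        c->used += (size_t)n;
        try_parse_messages(c);
        return 0;
    }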

What are the advantages and disadvantages of using sockets for IPC?

I have been asked this question in some recent interviews: what are the advantages and disadvantages of using sockets for IPC when there are other ways to perform it? I have not found an exact answer.
Any help would be much appreciated.
Compared to pipes, IPC sockets are bidirectional: reads and writes can be done on the same descriptor. Pipes, unlike sockets, are unidirectional; you have to keep a pair of descriptors if you want to do both reads and writes.
Pipes, on the other hand, guarantee atomicity when reading or writing below a certain number of bytes: writing less than PIPE_BUF bytes at once is guaranteed to be delivered in one chunk and never observed partially. Sockets require more care from the programmer in that respect.
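Returning to the bidirectionality point, here is a minimal sketch using socketpair(): parent and child each read and write on a single descriptor, something a single pipe cannot do (error handling omitted):

    #include <sys/socket.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            return 1;

        if (fork() == 0) {                        /* child uses sv[1] */
            char buf[64];
            ssize_t n = read(sv[1], buf, sizeof buf - 1);
            if (n > 0)
                write(sv[1], "pong", 4);          /* reply on the same descriptor */
            _exit(0);
        }
        /* parent: write and read on the same descriptor, sv[0] */
        write(sv[0], "ping", 4);
        char buf[64];
        ssize_t n = read(sv[0], buf, sizeof buf - 1);
        if (n > 0) { buf[n] = '\0'; printf("got: %s\n", buf); }
        return 0;
    }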
Shared memory, when used for IPC, requires explicit synchronisation from the programmer. It may be the most efficient and most flexible mechanism, but that comes at an increased complexity cost.
Another point in favour of sockets: an app using sockets can be easily distributed - ie. it can be run on one host or spread across several hosts with little effort. This depends of course on the nature of the app.
Perhaps this is too simplified an answer, yet it is an important detail: sockets are not supported on all OSes. Recently, I became aware of a project that used sockets for IPC all over the place, only to find that it was forced to move from Linux to a proprietary OS which was POSIX-compliant but did not support sockets the same way as Linux.
Sockets offer you a few benefits...
You can connect a simple client to them for testing (manually enter data, see the response).
This is very useful for debugging, simulating and blackbox testing.
You can run the processes on different machines. This can be useful for scalability and is very helpful in debugging / testing if you work in embedded software.
It becomes very easy to expose your process as a service.
But there are drawbacks as well:
Overhead is greater than IPC optimized for a single machine. Shared memory in particular is better if you need the performance, and you know your processes are all on the same machine.
Security - if your client apps can connect, so can anyone else if you're not careful about authentication. Data can also be sniffed if you're not encrypting, and modified if you're not at least signing data sent over the wire.
Using a true message queue tends to leave you with fixed-size messages. If you have a large number of messages of wildly varying sizes, this can become a performance problem. Using a socket can be a way around this, though you're then left trying to wrap this functionality to behave exactly like a queue, and it's tricky to get the details right, particularly aspects like blocking/non-blocking behaviour and atomicity.
Shared memory is quick but requires management (you end up writing a version of malloc to manage the SHM), plus you have to synchronise and lock it in some way. You can use libraries to help with this, but their availability depends on your environment and language.
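For illustration, a minimal sketch of that setup using POSIX shm_open() and a process-shared mutex; the segment name "/demo_shm" is an assumption, error handling is omitted, and on Linux you'd link with -lrt -lpthread:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <pthread.h>
    #include <unistd.h>

    typedef struct shared_region {
        pthread_mutex_t lock;   /* must be initialised process-shared */
        int             counter;
    } shared_region;

    /* Creator side: set up the segment and a process-shared mutex. */
    shared_region *create_region(void)
    {
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(shared_region));
        shared_region *r = mmap(NULL, sizeof(shared_region),
                                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&r->lock, &attr);
        r->counter = 0;
        return r;
    }

    /* Any process that maps the same name can then do: */
    void bump(shared_region *r)
    {
        pthread_mutex_lock(&r->lock);
        r->counter++;                    /* critical section */
        pthread_mutex_unlock(&r->lock);
    }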
Queues are easy, but their downsides are the mirror image of the socket pros listed above.
Pipes have been covered by Blagovest's answer to this question.
As is ever the case with this kind of stuff, I would suggest reading the W. Richard Stevens books on IPC and sockets. There is no better explanation than his! :-)

Raw sockets: bypassing TCP/IP headers

I have two programs that are communicating via sockets on the same computer.
Currently, transferring 1.6 million bytes takes about 7 seconds using TCP/IP.
I need to make it faster.
If I use a raw socket instead and skip the TCP/IP headers, should this increase the speed? Is there anything else I can do to increase speed? Is the SOCK_RAW option a straight copy, or does it do anything else?
1.6MB shouldn't take 7 seconds using "normal" TCP/IP - certainly not on the same machine! That suggests you've got inefficient code somewhere. I'd address that before trying to do anything "special" in terms of the networking.
EDIT: I've just written a short C# program on a netbook, and that transfers 2MB (generating random data as it goes) in 279ms. That's with no optimization. Unless you're running on a machine from the 1980s, you should definitely be getting better performance than that...
Try using Unix Domain Sockets instead.
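A minimal sketch of the client end of that (the socket path is an assumption of your own naming scheme; the server side binds and listens on the same path; error paths abbreviated):

    #include <sys/socket.h>
    #include <sys/un.h>
    #include <string.h>
    #include <unistd.h>

    /* Connect to a local server, e.g. connect_local("/tmp/myapp.sock"). */
    int connect_local(const char *path)
    {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_un addr;
        memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);

        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            close(fd);
            return -1;
        }
        return fd;   /* use like any TCP socket, minus TCP/IP overhead */
    }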
To get performance that poor, you must be doing something very inefficient. Perhaps the I/O operations are a single byte at a time?
Changing to raw sockets is a bad idea. To get reliable communication, you'd then have to add some sort of data checking, sequencing, etc., etc.: everything that TCP does for reliability.
If the purpose is to transfer data from one process to another on the same machine, use shared memory and a mutex to synchronize access. Of course this is not a good solution if the programs will eventually have to run on separate machines.
No, using raw IP sockets is definitely not a good idea. Using a unix-domain socket might be marginally more efficient, but I doubt it's going to solve your problem. You clearly have another problem. Perhaps it is your application-level protocol which is inefficient?

Best approach to a non-blocking server/listening socket in a multi-threaded application on Windows?

I'm writing a TCP server/client application on Windows to become familiar with the Winsock API. I come from a UNIX background and would like to know which of these could be the best approach to implement the application.
First, the specification:
Must scale well on multiprocessor and single-processor systems.
No hard-set limit on the number of connections.
The application can both listen for connections (acting as a server) and act as a client.
Multi-threaded.
First approach:
Non-blocking select-like socket for listening, in the 'server' thread.
For each connecting client, we spawn a separate thread.
Second approach:
Blocking socket for listening, in the 'server' thread.
For each connecting client, we spawn a separate thread.
Third approach:
Non-blocking select-like socket for listening, in the 'server' thread.
No separate thread for each incoming connection; the protocol would need state information kept across sessions, I suppose.
I wonder which is the most efficient and scalable approach, and especially whether it can work with a UDP socket too.
Note: I'm writing the application in plain old C. No .NET or C++ involved; C++ exceptions are disabled too.
As Gary says, I/O Completion Ports are the most efficient way to manage multiple network connections in a non-blocking/async manner on Windows platforms.
With IOCP you get notified when your networking operations complete, and you can process these completions with a small number of threads. You decide how many threads to allocate for processing completions, and the kernel decides when to use the threads you've provided. It uses them in LIFO order to reduce context switching, so at any point you're only using the minimal number of threads required, and you're reusing the same threads rather than cycling through all of the threads you have available.
The asynchronous nature of IOCP programming can be a little confusing to start with, but once you get the hang of it it's fairly straight forward.
I have some free IOCP server code which demonstrates the basics and provides some example servers that are pretty easy to build on. You can find the code here: http://www.serverframework.com/products---the-free-framework.html. That page also links to some articles that I wrote to explain the code.
Relating this to the details of your question: you should be looking at a variation on your third approach. Use AcceptEx() to accept new connections; this can be done in an asynchronous manner, so you don't need a separate thread for connection acceptance and can use the same threads that are processing your overlapped/async read and write operations.
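A minimal sketch of the IOCP worker-loop pattern in plain C with Winsock2 (the io_context layout is one common convention rather than a requirement, and error handling is abbreviated; link with ws2_32.lib):

    #include <winsock2.h>
    #include <windows.h>

    typedef struct io_context {
        OVERLAPPED ov;        /* must be first, so we can cast back from OVERLAPPED* */
        WSABUF     wsabuf;
        char       buf[4096];
    } io_context;

    /* Worker thread: blocks until any overlapped operation on any
     * socket associated with the port completes. Create the port with
     * CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0). */
    DWORD WINAPI worker(LPVOID param)
    {
        HANDLE iocp = (HANDLE)param;
        for (;;) {
            DWORD bytes;
            ULONG_PTR key;            /* per-socket context registered below */
            OVERLAPPED *ov;
            if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE))
                continue;             /* failed or aborted operation */
            io_context *ctx = (io_context *)ov;
            (void)key;
            /* process ctx->buf[0..bytes), then post the next WSARecv() */
        }
        return 0;
    }

    /* Associate a connected socket with the port; 'ctx' comes back
     * as the completion key on every operation for this socket. */
    void register_socket(HANDLE iocp, SOCKET s, void *ctx)
    {
        CreateIoCompletionPort((HANDLE)s, iocp, (ULONG_PTR)ctx, 0);
    }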
I've written an asynchronous client which does not use blocking sockets, so if you're interested in that approach, then take a look at my client: http://codesprout.blogspot.com/2011/04/asynchronous-http-client.html
It's an HTTP client, but I've shown very little HTTP protocol processing in there; it's all just .NET sockets. A server would work in a similar way: you can take advantage of the *Async methods, such as AcceptAsync.
Under Windows, the best performance is achieved by using I/O completion ports.
This is because the list and queuing mechanisms are implemented in the kernel, away from the heavy user-mode overhead (which drags your code down if you dare to do the hard work yourself).
Unfortunately, Windows I/O completion ports need many threads to scale, and this quickly kills performance (as compared to Linux's epoll, which can scale independently of the number of worker threads you decide to involve in the task).
Recently, I discovered http://gwan.com/, a web server which started on Windows and was then ported to Linux. Its authors describe the problem in detail on their forum.

C HTTP server - multithreading model?

I'm currently writing an HTTP server in C to learn about C, network programming, and HTTP. I've implemented most of the simple stuff, but I'm only handling one connection at a time. Currently, I'm thinking about how to efficiently add multitasking to my project. Here are some of the options I thought about:
Use one thread per connection. Simple, but it can't handle many connections.
Use non-blocking API calls only and handle everything in one thread. Sounds interesting, but excessive use of select() and the like is said to be quite slow.
Some other multithreading model, e.g. something complex like what lighttpd uses. (Probably) the best solution, but (probably) too difficult to implement.
Any thoughts on this?
There is no single best model for writing multi-tasked network servers. Different platforms have different solutions for high performance (I/O completion ports, epoll, kqueues). Be careful about going for maximum portability: some features are mimicked on other platforms (i.e. select() is available on Windows) and yield very poor performance because they are simply mapped onto some other native model.
Also, there are other models not covered in your list. In particular, the classic UNIX "pre-fork" model.
In all cases, use any form of asynchronous I/O when available. If it isn't, look into non-blocking synchronous I/O. Design your HTTP library around asynchronous streaming of data, but keep the I/O bit out of it. This is much harder than it sounds. It usually implies writing state machines for your protocol interpreter.
That last bit is most important, because it will allow you to experiment with different representations. It might even allow you to keep a compact, portable core with platform-local, high-performance I/O layers, and swap that core from one platform to the other.
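To illustrate the state-machine idea, here is a minimal, incremental scanner that detects the end of an HTTP header block ("\r\n\r\n") across partial reads; it is a deliberately tiny example, not a full parser:

    #include <stddef.h>

    /* Feed bytes as they arrive from recv(); the state survives across
     * partial reads, which is the essence of the state-machine approach. */
    typedef enum { S_ANY, S_CR, S_CRLF, S_CRLFCR, S_DONE } hdr_state;

    hdr_state hdr_feed(hdr_state s, const char *buf, size_t len, size_t *consumed)
    {
        size_t i;
        for (i = 0; i < len && s != S_DONE; i++) {
            char c = buf[i];
            switch (s) {
            case S_ANY:    s = (c == '\r') ? S_CR     : S_ANY; break;
            case S_CR:     s = (c == '\n') ? S_CRLF   : (c == '\r' ? S_CR : S_ANY); break;
            case S_CRLF:   s = (c == '\r') ? S_CRLFCR : S_ANY; break;
            case S_CRLFCR: s = (c == '\n') ? S_DONE   : (c == '\r' ? S_CR : S_ANY); break;
            default: break;
            }
        }
        *consumed = i;
        return s;
    }

Call it on each chunk you receive; once it returns S_DONE, the request head is complete and the remaining bytes (from *consumed onward) belong to the body.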
Yea, do the one that's interesting to you. When you're done with it, if you're not utterly sick of the project, benchmark it, profile it, and try one of the other techniques. Or, even more interesting, abandon the work, take the learnings, and move on to something completely different.
You could use an event loop as in node.js:
Source code of node (C, C++, JavaScript)
https://github.com/joyent/node
Ryan Dahl (the creator of node) outlines the reasoning behind the design of node.js, non-blocking I/O, and the event loop as an alternative to multithreading in a web server.
http://www.yuiblog.com/blog/2010/05/20/video-dahl/
Douglas Crockford discusses the event loop in Scene 6: Loopage (Friday, August 27, 2010)
http://www.yuiblog.com/blog/2010/08/30/yui-theater-douglas-crockford-crockford-on-javascript-scene-6-loopage-52-min/
An index of Douglas Crockford's above talk (if further background information is needed). Doesn't really apply to your question though.
http://yuiblog.com/crockford/
Look at your platform's most efficient socket polling model - epoll (Linux), kqueue (FreeBSD), WSAEventSelect (Windows). Perhaps combine it with a thread pool, handling N connections per thread. You could always start with select and then replace it with a more efficient model once it works.
A simple solution might be having multiple processes: have one process accept connections and, as soon as a connection is established, fork and handle the connection in the child process.
An interesting variant of this technique is used by the SER/OpenSER/Kamailio SIP proxy: there's one main process that accepts the connections and multiple child worker processes, connected via UNIX domain sockets. The parent sends the new file descriptor through the socket, as sketched below. See the book excerpt at 17.4.2, Passing File Descriptors over UNIX Domain Sockets. The OpenSER/Kamailio SIP proxies are used for heavy-duty SIP processing where performance is a huge issue, and they do very well with this technique (plus shared memory for information sharing). Multi-threading is probably easier to implement, though.
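The descriptor-passing trick itself is small. A minimal sketch of the sending side using sendmsg() with an SCM_RIGHTS control message (the receiving side mirrors this with recvmsg(); error handling abbreviated):

    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <string.h>

    /* Send file descriptor 'fd' over the UNIX domain socket 'chan'. */
    int send_fd(int chan, int fd)
    {
        char dummy = 'x';                 /* at least one data byte must travel */
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union {                           /* ensures cmsghdr alignment */
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;
        struct msghdr msg = { 0 };

        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof u.buf;

        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type  = SCM_RIGHTS;      /* kernel duplicates the fd for the peer */
        cm->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &fd, sizeof(int));

        return sendmsg(chan, &msg, 0) < 0 ? -1 : 0;
    }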
