I've gotten ideas for multiple projects recently that all involve reading IP addresses from a file. Since they are all supposed to handle a large number of hosts, I've attempted to implement multi-threading, or to create a pool of sockets and select() on them, in order to achieve some form of concurrency for better performance. On multiple occasions, reading from the file seems to be the bottleneck. The way I understand it, reading from a file with fgets or similar is a synchronous, blocking operation. So even if I successfully implemented a client that connects to multiple hosts asynchronously, the operation would still be synchronous, because I can only read one address at a time from the file.
/* partially pseudo code */
/* getaddrinfo() stuff here */
while (fgets(ip, sizeof(ip), file)) {
FD_ZERO(&readfds);
/* create n sockets here in a for loop */
for (i = 0; i < socket_num; i++) {
if (fd[i] > fdmax) fdmax = fd[i]; /* track the highest fd for select() */
FD_SET(fd[i], &readfds);
}
/* here's where I think I should connect n sockets to n addresses from file
* but I'm only getting one IP at a time from file, so I'm not sure how to connect to
* n addresses at once with fgets
*/
for (j = 0; j < socket_num; j++) {
    if (connect(fd[j], ai->ai_addr, ai->ai_addrlen) == -1)
        // error
    else {
        freeaddrinfo(ai);
        FD_SET(fd[j], &master);
        if (fd[j] > fdmax) fdmax = fd[j];
        if (select(fdmax+1, &master, NULL, NULL, &tv) == -1)
            // error
        if ((recvd = read(fd[j], banner, RECVD)) <= 0)
            // error
        if (FD_ISSET(fd[j], &master))
            // print success
    }
/* clear sets and close sockets and stuff */
}
I've pointed out my issues with comments, but just to clarify: I'm not sure how to perform asynchronous I/O operations on multiple target servers read from a file, since reading entries from the file seems to be strictly synchronous. I've run into similar issues with multithreading, with a marginally better degree of success.
void *function_passed_to_pthread_create(void *opts)
{
while (fgets(ip_addr, sizeof(ip_addr), opts->file)) {
/* speak to ip_addr and get response */
}
}
main()
{
/* necessary stuff */
for (i = 0; i < thread_num; i++) {
pthread_create(&tasks[i], NULL, above_function, opts);
}
for (j = 0; j < thread_num; j++)
/* join threads */
return 0;
}
This seems to work, but since multiple threads are all processing the same file the results aren't always accurate. I imagine it's because multiple threads may process the same address from file at the same time.
I've considered loading all the entries from the file into an array in memory, but if the file were particularly large I imagine that could cause memory issues. On top of that, I'm not sure that it even makes sense to do anyway.
As a final note: if the file I'm reading from happens to be a particularly large file with a huge number of IPs, then I do not believe either solution scales well. Anything is possible with C though, so I imagine there is some way to achieve what I'm hoping for.
To sum this post up: I'd like to find a way to improve a client-side application's performance using asynchronous I/O or multi-threading when reading entries from a file.
Several people have hinted at a good solution to this in their comments, but it's probably worth spelling it out in more detail. The full solution has quite a lot of details and is pretty complicated code, so I'm going to use pseudocode to explain what I'd recommend.
What you have here is really a variation on a classic producer/consumer problem: You have a single thing producing data, and many things trying to consume that data. In your case, it must be a "single thing" producing that data, because the lengths of each line of the source file are unknown: You can't just jump forward 'n' bytes and somehow be at the next IP. There can only be one actor at a time moving the read pointer toward the next unknown position of the \n, so you by definition have a single producer.
There are three general ways to attack this:
Solution A involves having each thread pull a little more out of a shared file buffer, and kicking off an asynchronous (nonblocking) read every time the last read completes. There is a whole host of headaches getting this solution right, as it's very sensitive to timing differences between the filesystem and the work being performed: if the file reads are slow, the workers will all stall waiting for the file; if the workers are slow, the file reader will either stall or fill up memory waiting for them to consume the data. This solution is likely the absolute fastest, but it's also incredibly difficult synchronization code to get right, with about a zillion caveats. Unless you're an expert in threading (or in extremely clever abuse of epoll_wait()), you probably don't want to go this route.
Solution B has a "master" thread, responsible for reading the file, and populating some kind of thread-safe queue with the data it reads, with one IP address (one string) per queue entry. Each of the worker threads just consumes queue entries as fast as it can, querying the remote server and then requesting another queue entry. This requires a little care to get right, but is generally a lot safer than Solution A, especially if you use somebody else's queue implementation.
Solution C is pretty hacktastic, but you shouldn't dismiss it out-of-hand, depending on what you're doing. This solution just involves using something like the Un*x sed command (see Get a range of lines from a file given the start and end line numbers) to slice your source file into a bunch of "chunky" source files in advance — say, twenty of them. Then you just run twenty copies of a really simple single-thread program in parallel using &, each on a different "slice" of file. Mushed together with a little shell script to automate it, this can be a "good enough" solution for a lot of needs.
Let's take a closer look at Solution B — a master thread with a thread-safe queue. I'm going to cheat and assume you can construct a working queue implementation (if not, there are StackOverflow articles on implementing a thread-safe queue using pthreads: pthread synchronized blocking queue).
In pseudocode, this solution is then something like this:
main()
{
/* Create a queue. */
queue = create_queue();
/* Kick off the master thread to read the file, and give it the queue. */
master_thread = pthread_create(master, queue);
/* Kick off a bunch of workers with access to the queue. */
for (i = 0; i < 20; i++) {
worker_thread[i] = pthread_create(worker, queue);
}
/* Wait for everybody to finish. */
pthread_join(master_thread);
for (i = 0; i < 20; i++) {
pthread_join(worker_thread[i]);
}
}
void master(queue q)
{
FILE *fp = fopen("ips.txt", "r");
char buffer[BIGGER_THAN_ANY_IP];
/* Inhale the file as fast as we can, and push each line we
read onto the queue. */
while (fgets(buffer, sizeof(buffer), fp) != NULL) {
char *next_ip = strdup(buffer);
enqueue(q, next_ip);
}
/* Add some final messages in the queue to let the workers
know that we're out of data. There are *much* better ways
of notifying them that we're "done", but in this case,
pushing a bunch of NULLs equal to the number of threads is
simple and probably good enough. */
for (i = 0; i < 20; i++) {
enqueue(q, NULL);
}
}
void worker(queue q)
{
char *ip;
/* Inhale messages off the queue as fast as we can until
we get a "NULL", which means that it's time to stop.
The call to dequeue() *must* block if there's nothing
in the queue; the call should only return NULL if the
queue actually had NULL pushed into it. */
while ((ip = dequeue(q)) != NULL) {
/* Insert code to actually do the work here. */
connect_and_send_and_receive_to(ip);
}
}
There are plenty of caveats and details in a real implementation (like: how do we implement the queue, ring buffers or a linked list? what if the text isn't all IPs? what if the char buffer isn't big enough? how many threads is enough? how do we deal with file or network errors? will malloc performance become a bottleneck? what if the queue gets too big? can we do better to overlap the network I/O?).
But, caveats and details aside, the pseudocode I presented above is a good enough starting point that you likely can expand it into a working solution.
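Since the thread-safe queue is the one piece the pseudocode above takes for granted, here is a minimal sketch of one built on a pthread mutex and two condition variables. The bounded capacity, the ring-buffer layout, and the function names are my own illustrative choices, not something prescribed above:

#include <pthread.h>

#define QUEUE_CAP 1024            /* bounded, so the reader can't get arbitrarily far ahead */

typedef struct {
    void *items[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} queue_t;

void queue_init(queue_t *q)
{
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Blocks while the queue is full. */
void enqueue(queue_t *q, void *item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks while the queue is empty; returns whatever was pushed,
   including the NULL "we're done" markers. */
void *dequeue(queue_t *q)
{
    void *item;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return item;
}

The bounded capacity also answers the "what if the queue gets too big?" caveat: the master simply blocks in enqueue() until the workers catch up.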
Read IPs from the file in one place, have worker threads, and keep handing IPs to the worker threads; let all socket communication happen in the worker threads. Also, if the IPv4 addresses are stored in hex format instead of ASCII, you can probably read several of them in a single shot, and it would be faster.
If you just want to read without blocking you can use getch() from ncurses with the delay set to 0. ncurses is available practically everywhere, so you don't need any unusual dependencies. There is also unlocked_stdio(3).
On the other hand, I have to wonder why fgets() is a bottleneck. As long as you have data in the file it should not block. And even if the data is huge (say 1 MB, or 100k IP addresses), reading it into a list at startup should take less than a second.
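To make that concrete, here is a sketch of the "load it all at startup" approach using a growable array; the buffer sizes and names are illustrative and error handling is mostly omitted:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read every line of the file into a growable array of strings.
   Returns the number of lines read and leaves the array in *out. */
size_t load_ips(const char *path, char ***out)
{
    FILE *fp = fopen(path, "r");
    if (!fp)
        return 0;

    char **ips = NULL;
    size_t count = 0, cap = 0;
    char line[64];                            /* plenty for an IPv4/IPv6 address */

    while (fgets(line, sizeof(line), fp)) {
        line[strcspn(line, "\r\n")] = '\0';   /* strip the newline */
        if (count == cap) {
            cap = cap ? cap * 2 : 1024;
            ips = realloc(ips, cap * sizeof(*ips));
        }
        ips[count++] = strdup(line);
    }
    fclose(fp);
    *out = ips;
    return count;
}

Even a file with hundreds of thousands of addresses is only a few megabytes, so memory is rarely the real problem here.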
And why are you opening sockets_num connections to every IP in the list? You end up with sockets_num multiplied by the number of IP addresses open at the same time. Since every socket is a file on Linux, you will hit system limits when you try to open more than several thousand files (see ulimit -Sn). Can you confirm the issue is not in connect() in that case?
If I am using select() to monitor three file descriptor sets:
if (select(fdmax+1, &read_fds, &write_fds, &except_fds, NULL) == -1) {
perror("select()");
exit(1);
} else {
...
}
Can a particular file descriptor be ready for reading AND writing AND exception handling simultaneously?
Beej's popular networking page shows a select() example in which he tests the members of the read fd_set using a for loop. Since the loop increments by one each iteration, it will necessarily test some integers that don't happen to be existing file descriptors:
for(i = 0; i <= fdmax; i++) {
    if (FD_ISSET(i, &read_fds)) { // we got one!!
        ...
    }
}
I believe he's doing this for the sake of keeping the example code simple. Might/should one only test existing file descriptors?
Expanding a little bit, with examples, on #user207421's comment:
Can a particular file descriptor be ready for reading AND writing AND exception handling simultaneously?
A good example would be a socket, which will (almost) always be ready for writing, and will be ready for reading when data is available. It is not common to have exceptions; they are used for exceptional situations, for example the availability of an out-of-band message on a TCP connection, and most applications do not use those features.
Note that 'normal' errors will be indicated in readfds (for example, socket shutdown).
See also: *nix select and exceptfds/errorfds semantics.
Beej's popular networking page shows a select() example in which he tests the members of the read fd_set using a for loop. Since the loop increments by one each iteration, it will necessarily test some integers that don't happen to be existing file descriptors:
I believe that in this case it is done to simplify the code examples, and it is a reasonable approach for most lightweight programs. It works well if the number of non-listen connections is very small.
It is worth mentioning that fd_set is implemented on Linux as a set of bits, but on Windows (Winsock) as an array of fd values. A full scan over all FDs will be O(n) on Linux and O(n*n) on Windows. This can be a big performance hit for large n in a Windows app.
In large-scale applications, where a server listens to hundreds (or more) open connections, each requiring different actions and potentially multiple states, the common practice is to keep a list of active connections and use a callback to invoke the handling function. This is usually implemented as an 'event loop'. Examples include X11, RPC servers, etc.
See Also: https://en.wikipedia.org/wiki/Event_loop
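As a rough illustration of that callback pattern, here is a small select()-based event loop that keeps a table of active connections, each with its own handler. The struct layout and names are just one possible arrangement; note that it only tests registered descriptors, which also speaks to the second question above:

#include <sys/select.h>

typedef struct connection {
    int fd;
    void (*on_readable)(struct connection *c);   /* callback invoked when fd is readable */
    int active;
} connection;

#define MAX_CONN 64
static connection conns[MAX_CONN];

static void event_loop(void)
{
    for (;;) {
        fd_set read_fds;
        int i, fdmax = -1;

        FD_ZERO(&read_fds);
        for (i = 0; i < MAX_CONN; i++) {
            if (!conns[i].active)
                continue;
            FD_SET(conns[i].fd, &read_fds);
            if (conns[i].fd > fdmax)
                fdmax = conns[i].fd;
        }
        if (fdmax < 0)
            break;                               /* nothing left to watch */

        if (select(fdmax + 1, &read_fds, NULL, NULL, NULL) == -1)
            continue;                            /* real code should distinguish EINTR from real errors */

        /* Only registered connections are tested, not every integer up to fdmax. */
        for (i = 0; i < MAX_CONN; i++)
            if (conns[i].active && FD_ISSET(conns[i].fd, &read_fds))
                conns[i].on_readable(&conns[i]);
    }
}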
Your question: why would you use select() when you only have one socket?
When select() is used and you do not want it to block other processing, make use of the timeout parameter.
That way, even with only one file descriptor open, the program will not block forever because that one file descriptor has no data, as it would if you used read() or a similar blocking call.
This is a very good method when, for instance, listening to a serial port, which only has data when some external event occurs.
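A minimal sketch of that single-descriptor case, assuming fd is an already-opened serial port (or any other descriptor):

#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to 5 seconds for data on a single descriptor. */
int wait_for_data(int fd)
{
    fd_set read_fds;
    struct timeval tv;
    int rc;

    FD_ZERO(&read_fds);
    FD_SET(fd, &read_fds);
    tv.tv_sec = 5;
    tv.tv_usec = 0;

    rc = select(fd + 1, &read_fds, NULL, NULL, &tv);
    if (rc == -1) {
        perror("select");
    } else if (rc == 0) {
        printf("timed out, do other processing\n");
    } else {
        char buf[256];
        ssize_t n = read(fd, buf, sizeof(buf));  /* will not block: data is ready */
        if (n > 0) {
            /* ... handle n bytes from buf ... */
        }
    }
    return rc;
}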
So, I have an MMO game server listening for connections (up to 1100, a hardcoded limit because of the game design) and needing to log everything without blocking my main thread (which handles all the virtual-world logic), including but not limited to: what NPCs give to players, when a monster/player dies, every single chat message, when players level up, when players drop an item on the ground, etc.
I get a thread from a pool for each connection, and each thread can log to a file at any time (it may actually never happen, but the server must be prepared enough to handle it appropriately).
I thought about an independent logging thread that constantly checks an order queue and drains it whenever it is not empty.
while Game is Running
Check the queue; if it's empty, sleep.
File I/O: open the plain-text file, log, and gracefully close it.
Repeat.
Is a linked list enough to organize the queue, protecting it with just a mutex like this?
struct LogThreadOrder {
    char FileName[4096];
    char LogMessage[4096];
    LogThreadOrder* NextOrder;
};
LogThreadOrder Queue[1024];
LogThreadOrder *FirstThreadOrder;
int NumberOfOrders;
void Log(const char* FileName, const char* LogMessage) {
    LogMutex.lock();
    LogThreadOrder* Order = &Queue[NumberOfOrders++];
    strncpy(Order->FileName, FileName, sizeof(Order->FileName) - 1);
    strncpy(Order->LogMessage, LogMessage, sizeof(Order->LogMessage) - 1);
    Order->NextOrder = FirstThreadOrder;
    FirstThreadOrder = Order;
    LogMutex.release();
}
int LogThreadLoop() {
// Loop through the Queue until it's empty.
}
So, my question is:
Is this a smart design?
Is it thread-safe?
Is there any better way of accomplishing what I'm trying to do?
Would it be better/more-efficient to have multiple logger threads and multiple queues?
Your pseudocode could be made to work, but it's difficult in C, and mutex synchronization is inefficient. In my experience it's always better to use existing, well-established asynchronous communication mechanisms between threads, such as (named) pipes. Even then there are many pitfalls, and when dependencies aren't a problem, using something like ZeroMQ might be the best idea. Specifically something like this. I don't think having multiple logger threads is a good idea, as the file you want to write to will be a contention point.
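To make the pipe idea concrete, here is a minimal sketch: game threads format a line and write() it to an ordinary pipe, and a single logger thread drains the pipe and appends to a file. It relies on the POSIX guarantee that a write() of at most PIPE_BUF bytes to a pipe is atomic, so the producers need no mutex. The single log file, the buffer sizes, and the names are simplifying assumptions, not your actual protocol:

#include <limits.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int log_pipe[2];                  /* [0] = read end, [1] = write end */

/* Called from any game thread. One atomic write per line (keep lines <= PIPE_BUF). */
void log_line(const char *msg)
{
    char buf[PIPE_BUF];
    snprintf(buf, sizeof(buf), "%s\n", msg);
    write(log_pipe[1], buf, strlen(buf));
}

/* The single logger thread: drain the pipe and append to one log file. */
void *logger_thread(void *arg)
{
    FILE *out = fopen("game.log", "a");
    char buf[PIPE_BUF];
    ssize_t n;

    (void)arg;                           /* unused */
    while ((n = read(log_pipe[0], buf, sizeof(buf))) > 0) {
        fwrite(buf, 1, (size_t)n, out);
        fflush(out);                     /* don't lose lines if the server dies */
    }
    fclose(out);
    return NULL;
}

/* Setup, e.g. in main():
 *     pipe(log_pipe);
 *     pthread_t t;
 *     pthread_create(&t, NULL, logger_thread, NULL);
 */

With a named pipe the same idea works across processes, and ZeroMQ's PUSH/PULL sockets are essentially a more robust version of this pattern.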
I'm a Linux programmer and recently I've been involved in porting an epoll-based client with two file descriptors, written in C, to Windows.
As you know, in Linux, using epoll or select (I know Windows supports select, but it's not efficient at all) you can block on file descriptors until one of them is ready, and you can tell when it is ready for writing and when for reading.
I have taken a look at Windows IOCP, and it seems OK for overlapped I/O in the Microsoft world.
But in all the samples it is used for a multi-client server in which each client's socket is independent of the other sockets.
Using completion ports, it can be done by creating a completionKey structure for each client, putting a variable in the struct that is set to READ when invoking WSARecv and to WRITE when invoking WSASend, plus another variable holding the socket, and retrieving them from GetQueuedCompletionStatus to know what to do: if a write is done for a socket, do a read, and vice versa.
But in my case, the file descriptors (fds) really are intertwined: reading from one fd triggers a read and a write on the other fd, and that makes it hard to know what operation really happened for each fd in the GetQueuedCompletionStatus result, because there is one completionKey associated with each fd. To be clear, please consider this:
There are two handles called fd1 and fd2; completionKey1 holds the handle and status for fd1 and completionKey2 for fd2, and the completionKey variable is used for retrieving completions from GetQueuedCompletionStatus.
GetQueuedCompletionStatus(port_handle, &completionKey->bufflen, (PULONG_PTR)&completionKey, (LPOVERLAPPED *)&ovl, INFINITE);
switch (completionKey->status)
{
case READ:
if(completionKey->handle == fd1)
{
fd1_read_is_done(completionKey->buffer, completionKey->bufflen);
completionKey->status = WRITE;
do_fd1_write(completionKey);
completionKey2->status = WRITE;
completionKey2->buffer = "somedata";
do_fd2_write(completionKey2);
}
else if(completionKey->handle == fd2)
{
fd2_read_is_done(completionKey->buffer, completionKey->bufflen);
completionKey->status = WRITE;
do_fd2_write(completionKey);
completionKey1->status = WRITE;
completionKey1->buffer = "somedata";
do_fd1_write(completionKey1);
}
break;
case WRITE:
if(completionKey->handle == fd1)
{
fd1_write_is_done(completionKey->bufflen);
completionKey->status = READ;
do_fd1_read(completionKey);
completionKey2->status = READ;
do_fd2_read(completionKey2);
}
else if(completionKey->handle == fd2)
{
fd2_write_is_done(completionKey->bufflen);
completionKey->status = READ;
do_fd2_read(completionKey);
completionKey1->status = READ;
do_fd1_read(completionKey1);
}
break;
}
In the above code, a situation arises where altering the completionKeys overrides pending reads or writes, and the resulting completionKey->status will be wrong (it will report a read instead of a write, for instance); worse, the buffer gets overwritten. If I use locking on the completionKeys, it leads to deadlock situations.
After looking at WSASend and WSARecv, I noticed there is an overlapped parameter that can be set for every send or receive, but it leads to two major problems. According to the WSAOVERLAPPED structure:
typedef struct _WSAOVERLAPPED {
ULONG_PTR Internal;
ULONG_PTR InternalHigh;
union {
struct {
DWORD Offset;
DWORD OffsetHigh;
};
PVOID Pointer;
};
HANDLE hEvent;
} WSAOVERLAPPED, *LPWSAOVERLAPPED;
First, there is no place in it to put the status and the appropriate buffer, and most of its fields are reserved.
Second, even if I could work around the first problem, I would need to check whether there are any overlapped structures left (they could all be in use by pending operations) and allocate a new one for every read and write; because the client is going to be so busy, this could happen a lot, and besides, managing those overlapped pools is a headache.
So am I missing something, or has Microsoft screwed this up?
And since I don't need multithreading, is there another way to solve my problem?
Thanks in advance.
Edit
As I guessed, the first problem I mentioned with using the overlapped struct has an answer: I just need to create another struct with all the buffers, status, etc., and put the OVERLAPPED as its first field.
Now help me solve the others ;)
You're really asking two different questions here. I can't answer the first one, as I've never used IO completion ports, but from everything I've read they're best avoided by everyone but experts. (I will point out an obvious solution to the problem I think you're describing: rather than actually writing the data to the other socket while another write is still pending, put the data in a queue and write it later. You still have to deal with two simultaneous operations on a given socket - one read and one write - but that shouldn't be a problem.)
However, it's easy to use OVERLAPPED (or WSAOVERLAPPED) structures to track the status of overlapped requests. All you do is embed the OVERLAPPED structure as the first element in a larger structure:
typedef struct _MyOverlapped
{
WSAOVERLAPPED overlapped;
... your data goes here ...
} MyOverlapped, *lpMyOverlapped;
then cast the LPWSAOVERLAPPED sent to the completion routine to lpMyOverlapped to access your context data.
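For illustration only, recovering the enclosing structure in the completion loop might look roughly like this; the status and buffer fields are hypothetical additions to MyOverlapped, not part of the Windows API:

DWORD bytes;
ULONG_PTR key;
LPOVERLAPPED ovl;

if (GetQueuedCompletionStatus(port_handle, &bytes, &key, &ovl, INFINITE)) {
    /* The OVERLAPPED we got back is the first member of our own struct,
       so a simple cast recovers the whole thing. (CONTAINING_RECORD works
       even when it is not the first member.) */
    lpMyOverlapped ctx = (lpMyOverlapped)ovl;

    switch (ctx->status) {
    case READ:
        /* 'bytes' bytes of received data are in ctx->buffer ... */
        break;
    case WRITE:
        /* the write tracked by ctx completed; queue the next operation ... */
        break;
    }
}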
Alternatively, if you are using a completion routine, the hEvent member of WSAOVERLAPPED is guaranteed to be unused, so you can set this to a pointer to a structure of your choice.
I don't see why you think that managing the pool of overlapped structures is going to be a problem. There's exactly one overlapped structure per active buffer, so every time you allocate a buffer, allocate a corresponding overlapped structure.
I'm trying to write a server able to handle multiple (more than a thousand) client connections concurrently in C language. Every connection is meant to accomplish three things:
Send data to the server
The server processes the data
The server returns data to the client
I am using non-blocking sockets and epoll() for handling all the connections, but my problem arises right at the moment when the server has received the data from one client and has to call a function that spends several seconds processing the data before it returns the result that has to be sent back to the client before closing the connection.
My question is, what paradigm can I use in order to be able to keep handling more connections while the data of one client "is cooking"?
I've been researching a bit about the possibility of doing it by creating a thread or a process every time I need to call the computing function, but I'm not sure if this would be feasible given the number of possible concurrent connections; that's why I came here hoping that someone more experienced than me in the matter could shed some light on my ignorance.
Code snippet:
while (1)
{
ssize_t count;
char buf[512];
count = read (events[i].data.fd, buf, sizeof buf); // read the data
if (count == -1)
{
/* If errno == EAGAIN, that means we have read all
data. So go back to the main loop. */
if (errno != EAGAIN)
{
perror ("read");
done = 1;
}
/* Here is where I should call the processing function before
exiting the loop and closing the actual connection */
answer = proc_function(buf);
count = write (events[i].data.fd, answer, strlen(answer)); // send the answer to the client
break;
}
...
Thanks in advance.
It seems sensible to multi-thread or multi-process to some degree to accomplish this. The degree to which you multi-thread or multi-process is the question.
1) You could dump the polling system entirely and use a thread/process per connection; a rough sketch of this appears after this list. That thread can then stall as long as it wants while working on the processing for that connection. You'd then have to decide between creating/killing a thread/process each time (probably easiest) and having a pool of threads/processes (probably fastest).
2) You could have a thread/process for the networky bits and hand off the processing to one other thread. This is less parallel, but it does mean you can at least keep handling network connections whilst you're chopping through the list of work. This gives you control of what processing is being handled at least. It would be easy to prioritise incoming connections this way, whereas option 1 might not.
3) (sort of a blend of 1 & 2) You could use asynchronous I/O to multiplex your connections. You still have to handle the processing in the same way as in 1 & 2 above.
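A rough sketch of option 1, creating a detached thread per accepted connection; handle_client stands in for whatever request/processing/response logic you already have:

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

void handle_client(int fd);              /* placeholder for your per-connection logic */

struct job { int fd; };

static void *client_thread(void *arg)
{
    struct job *j = arg;
    handle_client(j->fd);                /* may block for several seconds; only this thread waits */
    close(j->fd);
    free(j);
    return NULL;
}

/* Called from the accept loop instead of registering the fd with epoll. */
void spawn_client_thread(int client_fd)
{
    pthread_t tid;
    struct job *j = malloc(sizeof(*j));
    if (!j)
        return;
    j->fd = client_fd;
    pthread_create(&tid, NULL, client_thread, j);
    pthread_detach(tid);                 /* no join needed; the thread cleans up after itself */
}

With more than a thousand clients you may prefer a fixed-size pool of such worker threads fed from a queue, which is the pool variant mentioned in option 1.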
You also have the question of threads vs processes. Threads are probably quicker to get going but it's more difficult to ensure data integrity. Processes are going to be more resilient but require more interfacing between them.
You also have to decide on a way to pass data between the threads/processes. This is less of an issue for option 1 as you only have to pass off the connection to the thread. Option 2 may (depending on what your data is) be more of a problem. You could use a message queue for passing the messages about but if you have a lot of data to send shared memory is more appropriate. Shared memory is a pain to engineer for processes but easy with threads (as all threads share the same memory space).
There are performance issues as you get to this scale too. It's worth investigating the performance characteristics of these things: the differences in how calls like select and poll scale are significant when you're dealing with a lot of connections.
Without knowledge of what data is being sent and received it's hard to give solid recommendations.
Incidentally, this isn't a new problem. Dan Kegel had a good article about it a few years back. It's now out-of-date, but the overview is still good. You should research the current state of the art for the concepts he discusses though.
First of all, I've never worked with C before (mostly Java, which is why you'll find me writing some naive C code). I am writing a simple command interpreter in C. I have something like this:
//Initialization code
if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
perror("Select dead");
exit(EXIT_FAILURE);
}
....
....
//Loop through connections to see who has the data ready
//If the data is ready
if ((nbytes = recv(i, buf, sizeof(buf), 0)) > 0) {
//Do something with the message in the buffer
}
Now if I'm looking at something like a long paragraph of commands, it is obvious that a 256 byte buffer will not be able to get the entire command. For the time being, I'm using a 2056 byte buffer to get the entire command. But if I want to use the 256 byte buffer, how would I go about doing this? Do I keep track of which client gave me what data and append it to some buffer? I mean, use something like two dimensional arrays and such?
Yes, the usual approach is to have a buffer of "data I've received but not processed" for each client, large enough to hold the biggest protocol message.
You read into that buffer (always keeping track of how much data is currently in the buffer), and after each read, check to see if you have a complete message (or message(s), since you might get two at once!). If you do, you process the message, remove it from the buffer and shift any remaining data up to the start of the buffer.
Something roughly along the lines of:
for (i = 0; i < nclients; i++)
{
if (!FD_ISSET(client[i].fd, &read_fds))
continue;
nbytes = recv(client[i].fd, client[i].buf + client[i].bytes, sizeof(client[i].buf) - client[i].bytes, 0);
if (nbytes > 0)
{
client[i].bytes += nbytes;
while (check_for_message(client[i]))
{
size_t message_len;
message_len = process_message(client[i]);
client[i].bytes -= message_len;
memmove(client[i].buf, client[i].buf + message_len, client[i].bytes);
}
}
else
/* Handle client close or error */
}
By the way, you should check for errno == EINTR if select() returns -1, and just loop around again - that's not a fatal error.
I would keep a structure around for each client. Each structure contains a pointer to a buffer where the command is read in. Maybe you free the buffers when they're not used, or maybe you keep them around. The structure could also contain the client's fd in it as well. Then you just need one array (or list) of clients which you loop over.
The other reason you'd want to do this, besides the fact that 256 bytes might not be enough, is that recv doesn't always fill the buffer. Some of the data might still be in transit over the network.
If you keep around buffers for each client, however, you can run into the "slowloris" attack, where a single client keeps sending little bits of data and takes up all your memory.
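A possible shape for that per-client structure, with an append helper that enforces a hard cap so a slowloris-style client can't eat all your memory; the names and limits are only suggestions:

#include <stdlib.h>
#include <string.h>

#define MAX_BUFFER 8192                   /* per-client limit: largest command you'll accept */

struct client {
    int     fd;                           /* the client's socket */
    char   *buf;                          /* malloc'd storage for unprocessed data (NULL when idle) */
    size_t  len;                          /* bytes currently stored in buf */
    size_t  cap;                          /* allocated size of buf */
};

/* Append newly received bytes, growing the buffer as needed but never past MAX_BUFFER.
   Returns 0 on success, -1 if the client should be dropped. */
int client_append(struct client *c, const char *data, size_t n)
{
    if (c->len + n > MAX_BUFFER)
        return -1;                        /* misbehaving (or slowloris) client */
    if (c->len + n > c->cap) {
        size_t newcap = c->cap ? c->cap * 2 : 512;
        while (newcap < c->len + n)
            newcap *= 2;
        char *p = realloc(c->buf, newcap);
        if (!p)
            return -1;
        c->buf = p;
        c->cap = newcap;
    }
    memcpy(c->buf + c->len, data, n);
    c->len += n;
    return 0;
}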
It can be a serious pain when you get tons of data like that over a network. There is a constant trade-off between allocating one huge array and doing multiple reads with data moves. You should consider using a ready-made linked list of buffers, then traversing the list as you read the buffers in each node. That way it scales gracefully and you can quickly delete what you've processed. I think that's the best approach; it's also how Boost.Asio implements buffered reads.
If you're dealing with multiple clients, a common approach is to fork/exec for each connection. Your server would listen for incoming connections, and when one is made it would fork and exec a child version of itself that would then handle the "command interpreter" portion of the problem.
This way you're letting the OS manage the client processes--that is, you don't have to have a data structure in your program to manage them. You will still need to clean up child processes in your server as they terminate.
As for managing the buffer...How much data do you expect before you post a response? You may need to be prepared to dynamically adjust the size of your buffer.
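A minimal sketch of the fork-per-connection pattern; for brevity it forks without exec, ignores SIGCHLD so finished children don't become zombies, and run_command_interpreter is a placeholder for your per-client logic:

#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

void run_command_interpreter(int fd);    /* placeholder: talks to one client until done */

void serve(int listen_fd)
{
    signal(SIGCHLD, SIG_IGN);            /* let the kernel reap finished children */

    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;

        pid_t pid = fork();
        if (pid == 0) {                  /* child: owns this one connection */
            close(listen_fd);
            run_command_interpreter(client_fd);
            close(client_fd);
            _exit(0);
        }
        close(client_fd);                /* parent: go back to accepting */
    }
}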