I started writing a C web server a while ago (on Windows 8), and I tried handling connections using only threads, without using select().
This is my main loop, and I'm opening each new thread like this:
uintptr_t new_thread;
/* one struct so _beginthread can pass both values through its single void* argument */
struct conn_args { unsigned id; SOCKET sock; };

while (1)
{
    client_sock = accept(server->sock, (struct sockaddr *)&client_info, &size);
    /* SOCKET is unsigned on Windows: compare against INVALID_SOCKET, not <= 0 */
    if (client_sock == INVALID_SOCKET) quit();
    printf("\n[***] : Got a connection from localhost on port %d\n", ntohs(client_info.sin_port));
    code = init_connection(client_sock);
    if (code)
    {
        struct conn_args *args = malloc(sizeof *args);
        args->id = ID++;
        args->sock = client_sock;
        /* _beginthread takes (function, stack_size, one void* argument);
           handle_connection should free(args) when it is done with it */
        new_thread = _beginthread(handle_connection, 0, args);
        if (new_thread == (uintptr_t)-1)
        {
            /* _beginthread reports errors via errno, not GetLastError() */
            fprintf(stderr, "Could not create thread for sending data: %s\n", strerror(errno));
            free(args);
            closesocket(client_sock);
            quit();
        }
    }
    else
    {
        debug("Failed to init connection");
        closesocket(client_sock);
        debug("Connection to client ended");
    }
}
First of all, I would love to hear how I can make this code better.
Testing this program by browsing to localhost from Chrome, I see that no more data is sent (after receiving one HTTP request).
My question is: what is the best way for the program to act then? Should it close the thread and open a new one when another request is made? If so, how do I close that thread? If not, when should I close it?
Normally, when implementing a server that forks separate processes, I would make the child process stay alive to serve a predefined number of requests (e.g. 100) and then kill itself. This is to reduce the overhead created by forking and, on the other hand, to recover from possible memory leaks or other problems in the process. Threads are lighter than processes, so it may make sense to close them faster.
I think you should compare the benefits and drawbacks. Measure the overhead of thread creation and closing compared to keeping them alive. In any case you must make sure that there is a limit on the number of threads you have alive at one time.
For the Windows specifics of creating and closing threads, you could look at e.g. this response.
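To make that concrete, here is a minimal sketch of a per-connection thread that serves a bounded number of requests and then ends itself simply by returning (it reuses the conn_args struct from the question's accept loop; the recv loop and the limit of 100 are illustrative stand-ins for real request handling):
#include <winsock2.h>
#include <process.h>
#include <stdlib.h>

#define REQUESTS_PER_THREAD 100   /* arbitrary illustrative limit */

struct conn_args { unsigned id; SOCKET sock; };

void __cdecl handle_connection(void *arg)
{
    struct conn_args *conn = arg;   /* heap-allocated by the accept loop */
    char buf[4096];
    int served = 0;

    while (served < REQUESTS_PER_THREAD)
    {
        int n = recv(conn->sock, buf, sizeof buf, 0);
        if (n <= 0)                 /* client closed the connection, or error */
            break;
        /* ... parse the request and send() a response here ... */
        served++;
    }

    closesocket(conn->sock);
    free(conn);
    /* Returning ends the thread; _beginthread closes its handle automatically. */
}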
Related
I'm trying to understand the different practices when it comes to socket programming and handling multiple connections.
In particular when a server needs to serve multiple clients.
I have looked at some code examples: some use fd_set and others use the fork() system call.
Roughly:
FD_SET
//Variables
fd_set fds, readfds;
int i;
socklen_t size;
struct sockaddr_in clientname;
//bind(...)
//listen(...)
FD_ZERO(&fds);
FD_SET(request_socket, &fds);
while (1) {
    readfds = fds;
    if (select(FD_SETSIZE, &readfds, NULL, NULL, NULL) < 0) {
        //Something went wrong
    }
    //Service all sockets with input pending
    for (i = 0; i < FD_SETSIZE; i++) {
        if (FD_ISSET(i, &readfds)) {
            if (i == request_socket) {
                /* Connection request on original socket. */
                int new;
                size = sizeof(clientname);
                new = accept(request_socket, (struct sockaddr *)&clientname, &size);
                if (new < 0) {
                    //Error
                }
                fprintf(stderr, "Server: connect from host %s, port %hd.\n",
                        inet_ntoa(clientname.sin_addr), ntohs(clientname.sin_port));
                FD_SET(new, &fds);
            }
            else {
                /* Data arriving on an already-connected socket. */
                if (read_from_client(i) < 0) { //handles queries
                    close(i);
                    FD_CLR(i, &fds);
                }
            }
        }
    }
}
fork()
//bind()
//listen()
while (1) {
    //Connection establishment
    new_socket = accept(request_socket, (struct sockaddr *)&clientaddr, &client_addr_length);
    if (new_socket < 0) {
        error("Error on accepting");
    }
    if ((pid = fork()) < 0) {
        error("Error on fork");
    }
    if (pid == 0) {   /* fork only once; calling fork() a second time here would be a bug */
        close(request_socket);
        read_from_client(new_socket);
        close(new_socket);
        exit(0);
    }
    else {
        close(new_socket);
    }
}
My question is then: what is the difference between the two practices (fd_set and fork)? Is one more suitable than the other?
You would choose one of the two approaches, select() or fork(), based on the nature of the IO operations you have to do once you receive a connection from a client.
Many IO system calls are blocking. While a thread is blocked on IO performed for one client (e.g. connecting to a database or server, reading a file on disk, reading from the network, etc.), it cannot serve the other clients' requests. If you create a new process with fork(), then each process can block independently without impeding progress on the other connections. Although it may seem advantageous to start a process for each client, it has drawbacks: multiple processes are harder to coordinate, and consume more resources. There is no right or wrong approach, it is all about trade-offs.
You may read about "events vs threads" to understand the various tradeoffs to consider: See: Event Loop vs Multithread blocking IO
The select() system call approach (which you've called the FD_SET approach), would generally classify as a polling approach. Using this, a process can wait on multiple file descriptor events at once, sleep there, and be woken up when activity arises on at least one of the file descriptors specified in the FD_SET. You may read the man page on select for details (man 2 select). This will allow the server process to read from the multiple clients bit by bit (but still one at a time), as soon as new data arrives on any socket of interest.
Trying to call read() on a socket that has no data available would block; select just makes sure you only do it on those that have data available. It is generally called in a loop so that the process comes back for the next piece of work. Writing the program in that style often forces one to handle requests iteratively, and carefully, because you want to avoid blocking in your single process.
fork() (man 2 fork) creates a child process. Child processes are created with a copy of the file descriptors open in the parent, which explains all the fd-closing business when the system call returns. Once you have a child process to take care of the client's socket, then you can write straightforward linear code with blocking calls without affecting the other connections (because those would be handled in parallel by other child processes of the server).
The main difference between the two practices is the number of processes used to handle multiple connections. With select, a single process (in fact a single thread) can handle concurrent connections from multiple clients. When we use the fork-based approach, a new process is created for every new connection. So if there are N concurrent client connections, there will be N processes to handle those connections.
When we use select, we don't need to worry about shared memory or synchronization, as everything happens within the same thread of execution.
On the other hand, when we use select, we need to be more careful while coding, as the same thread of execution is going to handle multiple clients. In the fork-based approach, the child process has to handle only a single client, so it tends to be a bit easier to implement.
When we use the fork-based approach, we end up using more system resources as a result of creating more processes.
The choice of approach depends on the application - expected number of connections, the nature of connections (persistent or short duration), whether there is a need to share data among connection handlers etc.
I am writing a two-daemon application: a client and a server. It is a very basic version of a distributed shell. Clients connect to the server, and the server issues commands that are propagated to every client.
I don't know how to create the socket logic on the server side. For now I am accepting connections in a loop, and for every incoming connection I fork a child to process it:
while (1) {
    clisockfd = accept(sockfd, (struct sockaddr *) &cliaddr, &clilen);
    if (clisockfd < 0) {
        log_err("err on opening client socket");
        exit(EXIT_FAILURE);
    }
    /* create a new child to process the connection */
    if ((pid = fork()) < 0) {
        log_err("err on forking, something is really broken!");
        exit(EXIT_FAILURE);
    }
    if (!pid) {
        /* here we are in the forked process, so we don't need the sockfd */
        close(sockfd);
        /* function that handles the connection */
        handle_connection(clisockfd);
        exit(EXIT_SUCCESS);  /* the child handled its connection normally */
    } else {
        close(clisockfd);
    }
}
However, what I have now has some disadvantages: I can accept a connection, do something with it, and return to the main process (the forked process has to return, and then execution in the main process resumes). I would like to keep every socket fd somewhere (a list?) and be able to choose one of them (or all of them) and send it a command that I want to issue on my client(s). I assume that I can't do it in the traditional accept -> fork -> return to main process manner.
So it should probably look like this:
client connects -> server sets up a new socket fd and saves it somewhere -> server drops to a shell where I can choose one of the sockets and send it a command -> somewhere in the whole process it should also wait for the next incoming client connection. But where?
Could someone give me an idea of what mechanisms I should use to create the logic that I need? Maybe it would be better to initiate the connection from server to client, not from client to server.
Regards,
Krzysztof
I assume that I can't do it in the traditional accept -> fork -> return to main process manner.
You could but it will be hard to design/maintain.
The best solution is to use select() (POSIX), epoll() (Linux), kqueue() (BSD) or I/O Completion Ports (Windows) depending on your platform.
There are good examples and explanations of select() in Beej's network programming guide.
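To make the pattern concrete, here is a minimal select()-based sketch of what the question describes: a single process keeps all client fds in an fd_set and also watches stdin as the "shell". The serve() function and its behavior are illustrative assumptions, not a complete implementation; a real server would also watch the client fds for data and EOF, and FD_CLR() disconnected ones.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>

void serve(int sockfd)   /* sockfd: an already bound + listening socket */
{
    fd_set master, readfds;
    int maxfd = sockfd, fd;
    char line[512];

    FD_ZERO(&master);
    FD_SET(sockfd, &master);        /* new connections arrive here */
    FD_SET(STDIN_FILENO, &master);  /* the "shell": commands typed on stdin */

    for (;;) {
        readfds = master;           /* select() modifies the set, so copy it */
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
            break;

        if (FD_ISSET(sockfd, &readfds)) {        /* new client connecting */
            int cli = accept(sockfd, NULL, NULL);
            if (cli >= 0) {
                FD_SET(cli, &master);
                if (cli > maxfd) maxfd = cli;
            }
        }

        if (FD_ISSET(STDIN_FILENO, &readfds) &&  /* a command was typed */
            fgets(line, sizeof line, stdin) != NULL) {
            for (fd = 0; fd <= maxfd; fd++)      /* broadcast it to all clients */
                if (fd != sockfd && fd != STDIN_FILENO && FD_ISSET(fd, &master))
                    send(fd, line, strlen(line), 0);
        }
    }
}
This keeps every client socket in one place (the fd_set), so sending to one client or to all of them is just a matter of which fds you loop over, and the same loop naturally waits for new connections at the same time.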
I am implementing a server that serves multiple clients, and I need some of the server's variables to be in shared memory, so that a client actually sees what another client may have edited.
I tried looking around, but I didn't understand whether there is any way to achieve this using fork(), or whether I need to totally change the way I handle clients. In particular, I don't know if I should implement piped processes or threads. Also, which is the simpler way?
This is my code, after declaring int var in main():
while (1) {
    printf("Waiting connection...\n");
    if ((connfd = accept(listenfd, (struct sockaddr *) NULL, NULL)) < 0) {
        perror("Accept Error");
        exit(1);
    }
    if ((pid = fork()) == 0) {
        close(listenfd);
        printf("Variable: %d\n", var); // var = 0
        var = 1;
        printf("Variable: %d\n", var); // var = 1
        exit(0);
    }
    close(connfd);
}
When I connect with another client, I again see var = 0, because fork gives the child a copy of the parent process.
I tried using static variables and declaring global variables outside of main(), but as I understood, that has no effect.
fork does not duplicate just a few variables but the entire address space (by definition of fork) of the invoking process.
You might want to use some shared memory, but then you should care about synchronization. Read shm_overview(7) and sem_overview(7) (you could share some memory using mmap(2), but you need to synchronize anyway).
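For instance, here is a minimal sketch (Linux; error handling kept short, synchronization deliberately omitted) of sharing a single int across fork() via an anonymous shared mapping, the mmap(2) route mentioned above:
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* create the shared mapping BEFORE fork(), so both processes see it */
    int *var = mmap(NULL, sizeof *var, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (var == MAP_FAILED) { perror("mmap"); exit(1); }
    *var = 0;

    if (fork() == 0) {          /* child: this write is visible to the parent */
        *var = 1;
        _exit(0);
    }
    wait(NULL);
    printf("Variable: %d\n", *var);   /* prints 1, not 0 */
    return 0;
}
With MAP_SHARED, the child's write lands in the same physical pages the parent reads, which is exactly what the plain copy-on-write fork in the question does not give you. A real server would also protect the shared variable, e.g. with a process-shared semaphore (sem_init with pshared = 1).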
Maybe you don't need to fork, but just want to have several threads sharing the same address space. Read pthreads(7) and a good pthread tutorial. Then you'll also need to care about synchronization, probably using mutexes.
You could also (and instead) use some other form of IPC. Read Advanced Linux Programming, consider using pipes (see pipe(7)), and have some event loop around a multiplexing syscall like poll(2).
In a server/client scenario, you might have some request and protocol to query (from inside clients) some state variable (inside the server).
PS. The main issue is not sharing data, it is synchronization and deadlock avoidance.
I was working on a simple client server program, with the intention of creating a chat program. I am new to socket programming in C. I have learnt that, to serve multiple clients, the server needs to fork a new process each time a client connects. Each time a client requests a connection, the accept() function returns a descriptor, and after the fork() the parent closes that descriptor.
Instead, I didn't close the descriptor, so each new client gets a new one when accept() is invoked:
nsockfd = accept(lsockfd, (struct sockaddr *) &cli_addr, &cli_len);
Now this is stored in one of two variables:
if (client1 < 0)
{
    client1 = nsockfd;
    printf("if loop %d\n", nsockfd);
}
else
{
    client2 = nsockfd;
    printf("else loop %d\n", nsockfd);
}
Now the rest of the code is:
snprintf(buf, sizeof(buf), "Hi client1 %s. Nice to meet you.", inet_ntoa(cli_addr.sin_addr));
ret = send(client1, buf, strlen(buf), 0);
if (ret == -1) {
    perror("Error sending message");
    exit(1);
}
printf("SRV - %s\n", buf);
strcpy(buf, "");
snprintf(buf, sizeof(buf), "Hi client2 %s. Nice to meet you.", inet_ntoa(cli_addr.sin_addr));
if (client2 > 0)
{
    ret = send(client2, buf, strlen(buf), 0);
    if (ret == -1) {
        perror("Error sending message");
        exit(1);
    }
    printf("SRV - %s\n", buf);
    strcpy(buf, "");
}
Here the code is working as intended; each client prints only one of the statements.
If this is a flawless method, why is it taught that fork() should be used to serve each client?
I am working on localhost. Is that the reason this code is working for me?
It isn't a concurrent server if you don't either fork() or process the connection in a (new?) thread. That's the definition of a concurrent server.
If I'm reading your code correctly, what you've got is a simple sequential server. It can only process one connection at a time. That's fine if the computation required for each response is minimal, as in your example. It's not so good if the computation involves a lot of effort, such as accessing a disk or a database.
Note that a sequential server design is completely legitimate. So too is a concurrent server design. They should be applied to different workloads. Generally, though, a concurrent server will handle large traffic volumes better than a sequential server. Imagine if Google used sequential servers for responding to search requests!
Another design uses a thread pool or process pool, with one thread or process farming out the work to the other threads or processes. These are trickier to write in a way that works well; a sketch of the process-pool variant follows.
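As a rough illustration of the pool idea, here is a minimal pre-forked process pool sketch (POSIX; run_pool, worker_loop, and handle_connection are illustrative names, and all restart and shutdown logic is omitted). Every child blocks in accept() on the shared listening socket, so the kernel hands each incoming connection to one idle worker.
#include <unistd.h>
#include <sys/socket.h>
#include <sys/wait.h>

#define NUM_WORKERS 4

/* stub: a real server would read the request and write a response */
static void handle_connection(int fd) { (void)fd; }

static void worker_loop(int listenfd)
{
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd < 0)
            continue;               /* e.g. interrupted by a signal */
        handle_connection(connfd);
        close(connfd);
    }
}

void run_pool(int listenfd)
{
    /* fork the workers up front, instead of one per connection */
    for (int i = 0; i < NUM_WORKERS; i++)
        if (fork() == 0) {
            worker_loop(listenfd);
            _exit(0);               /* not reached */
        }
    while (wait(NULL) > 0)          /* parent just supervises its children */
        ;
}
This avoids paying the fork() cost on every connection, at the price of a fixed upper bound on how many clients are served concurrently.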
As I am currently doing this project in C only, I've up until this point used my web server as a single-threaded application. However, I don't want that anymore! So I have the following code that handles my work:
void BeginListen()
{
    CreateSocket();
    BindSocket();
    ListenOnSocket();
    while (1)
    {
        ProcessConnections();
    }
}
Now I've added fork() at the start of ProcessConnections(), which helps me allow multiple connections! However, when I add the code for daemonizing the application found in this answer, I've encountered a little problem: using fork() creates a copy of my whole running app, which is the purpose of fork(). So, I'd like to solve this problem.
My ProcessConnections() looks like this:
void ProcessConnections()
{
    fork();
    addr_size = sizeof(connector);
    connecting_socket = accept(current_socket, (struct sockaddr *)&connector, &addr_size);
    if (connecting_socket < 0)
    {
        perror("Accepting sockets");
        exit(-1);
    }
    HandleCurrentConnection(connecting_socket);
    DisposeCurrentConnection();
}
What would I do to simply add a couple of lines above or after connecting_socket = accept(...) in order to make it accept more than one connection at a time? Can I use fork()? When it comes down to DisposeCurrentConnection(), I want to kill that process and have just the parent process running.
I'm not 100% sure what it is that you're trying to do, but off the top of my head, I'd prefer to do the fork after the accept, and simply exit() when you're done. Keep in mind, though, that you need to react to the SIGCHLD signal when the child process exits, otherwise you'll have a ton of zombie processes hanging around, waiting to deliver their exit status to the parent process. C-pseudo-code:
for (;;) {
    connecting_socket = accept(server_socket, NULL, NULL);
    if (connecting_socket < 0)
    {
        if (errno == EINTR)
            continue;
        else
        {
            // handle error
            break;
        }
    }

    if (!(child_pid = fork()))
    {
        // child process, do work with the connecting socket, then
        exit(0);
    }
    else if (child_pid > 0)
    {
        // parent process: close(connecting_socket) here (the child has
        // its own copy) and keep track of child_pid if necessary
    }
    else
    {
        // fork failed, unable to service request, send 503 or equivalent
    }
}
The child_pid is needed (as already mentioned) to kill the child process, but also if you wish to use waitpid to collect the exit status.
Concerning the zombie processes, if you're not interested in what happened to the process, you could install a signal handler for SIGCHLD and just loop on waitpid with -1 until there are no more exited child processes, like this:
while (waitpid(-1, NULL, WNOHANG) > 0)
    /* no loop body */ ;
The waitpid function will return the pid of the child that exited, so if you wish you can correlate this to some other information about the connection (if you did keep track of the pid). Keep in mind that accept will probably return with errno set to EINTR, without a valid connection, if a SIGCHLD is caught, so remember to check for this on accept's return.
EDIT:
Don't forget to check for error conditions, i.e. fork returns -1.
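Putting the reaping advice together, a minimal sketch of installing the handler with sigaction() could look like this (install_sigchld_handler is an illustrative name; with SA_RESTART set, accept() is typically restarted by the kernel instead of failing with EINTR, so choose the flag according to which behavior you want):
#include <errno.h>
#include <signal.h>
#include <sys/wait.h>

static void reap_children(int sig)
{
    int saved_errno = errno;        /* waitpid() may clobber errno */
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        /* reap every child that has exited so far */ ;
    errno = saved_errno;
}

int install_sigchld_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}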
Talking about fork() and threads on unix is not strictly correct. Fork creates a whole new process, which has no shared address space with the parent.
I think you are trying to achieve a process-per-request model, much like a traditional unix web server such as NCSA httpd or Apache 1.x, or possibly build a multi-threaded server with shared global memory:
Process-per-request servers:
When you call fork(), the system creates a clone of the parent process, including file descriptors. This means that you can accept the socket request and then fork. The child process has the socket request, which it can reply to and then terminate.
This is relatively efficient on unix, as the memory of the process is not physically copied; the pages are shared between the processes. The system uses a mechanism called copy-on-write to make copies on a page-by-page basis when the child process writes to memory. Thus, the overhead of a process-per-request server on unix is not that great, and many systems use this architecture.
It is better to use the select() function, which enables you to listen for and accept connections from different clients in one program. It avoids blocking on any single client, whereas forking creates a new address space for a copy of the program, which leads to memory inefficiency.
select(max_descr, read_set, write_set, exception_set, time_out);
For example (listen_sock is assumed to be a bound, listening socket):
fd_set read_set;
struct timeval time_out;

while (1)
{
    FD_ZERO(&read_set);
    FD_SET(listen_sock, &read_set);   /* re-arm the set on every iteration */
    time_out.tv_sec = 5;              /* select() may modify the timeout */
    time_out.tv_usec = 0;
    if (select(listen_sock + 1, &read_set, NULL, NULL, &time_out) > 0)
    {
        int conn = accept(listen_sock, NULL, NULL);
        /* hand conn off to a worker, e.g. with pthread_create() */
    }
    else
    {
        /* timeout or error */
    }
}
Check the return value of fork(). If it is zero, you are the child process, and you can exit() after doing your work. If it is a positive number then it's the process ID of the newly created process. This can let you kill() the child processes if they are hanging around too long for some reason.
As per my comment, this server is not really multi-threaded, it is multi-process.
If you want a simple way to make it accept multiple connections (and you don't care too much about performance) then you can make it work with inetd. This leaves the work of spawning the processes and being a daemon to inetd, and you just need to write a program that handles and processes a single connection. Edit: or, if this is a programming exercise for you, you could grab the source of inetd and see how it does it.
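To show how little such a program has to do, here is a minimal sketch (the handler name, the canned response, and the inetd.conf line are illustrative assumptions): inetd accepts the connection and runs the program with the socket already wired to stdin/stdout.
/* An /etc/inetd.conf entry for it could look like:
 *   www stream tcp nowait nobody /usr/local/bin/myhandler myhandler
 */
#include <stdio.h>

int main(void)
{
    char line[1024];
    if (fgets(line, sizeof line, stdin) != NULL)   /* read one request line */
        printf("HTTP/1.0 200 OK\r\n\r\nYou sent: %s", line);
    return 0;   /* exiting closes the connection */
}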
You can also do what you want to do without either threads or new processes, using select.
Here's an article that explains how to use select (pretty low overhead compared to fork or threads - here's an example of a lightweight web server written this way)
Also if you're not wedded to doing this in C, and C++ is OK, you might consider porting your code to use ACE. That is also a good place to look for design patterns of how to do this as I believe it supports pretty much any connection handling model and is very portable.