How to avoid busy looping in specific case - c

I have a problem which I don't really know how to solve. I have a program that multiplexes multiple connections. Those connections are receiving streaming data all at the same time. I had to configure non-blocking sockets as the streams have different bitrates. What I did is keep those sockets in an array, loop through them, use select() to detect whether there is data to read, and proceed to the next element in the array if not.
It works very well except for the fact that the CPU is always at 100%. If at some point there is nothing to read from any socket, it still loops. I don't know how to block the loop whenever no data is available on any socket and resume as soon as there is data. I think this may be the solution, but I don't really see how to do it. The program has to be very responsive, though, as it is a UDP stream recorder, and if it blocks for too long this will produce lags in the file.
I thank you a lot.
PS.: Just for info I am still learning so please don't blame me even if the solution may be obvious.
EDIT:
here's some pseudo code:
When a recording request comes in, I create a new connection and connect to the stream address. If it succeeds, I build my fdset using following function:
build_fdset()
{
int ii;
/* */
FD_ZERO(&fdset);
/* */
for (ii = 0; ii < max; ii++)
{
if (astRecorder[ii].bUsed != FALSE && astRecorder[ii].socket != INVALID_SOCKET)
{
FD_SET(astRecorder[ii].socket,&fdset);
/* */
if (astRecorder[ii].socket > maxSocket)
maxSocket = astRecorder[ii].socket;
}
}
}
Then the loop handling the connections:
main_loop()
{
struct timeval timeout;
/* */
timeout.tv_sec = 1;
timeout.tv_usec = 0;
/* */
for (;;)
{
memcpy(&fdset_cpy,&fdset,sizeof(fdset));
int iSelectRet = select((maxSocket + 1), &fdset_cpy, NULL, NULL, &timeout);
if (iSelectRet <= 0)
continue;
else
{
int ii;
for(ii = 0; ii < max; ii++)
{
if ((astRecorder[ii].bUsed) && (FD_ISSET(astRecorder[ii].socket, &fdset_cpy)))
{
/* receive from socket */
/* handle received data */
}
}
}
}
}
PROBLEM: When I set the timeout to timeout.tv_sec = 1 and timeout.tv_usec = 0, everything works fine, BUT I get 100% CPU usage! When I pass NULL as the timeout, the program blocks in select() although there is data on the sockets.
SOLUTION:
Well, I finally found the error! In the code above I set the timeout values only once, before the main loop. The problem is that, just like fdset, the timeout structure is modified by select(). After the first select() call that runs into the timeout, the structure is left at 0, so every later call in the loop is given a zero timeout and returns immediately. That is what turned the loop into a busy loop!
Thanks a lot to those who tried to help! I appreciate it =)
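To make the fix concrete, here is a minimal sketch of a wait helper that rebuilds both the descriptor set and the timeout on every call. The names wait_readable and demo_wait_readable are made up for illustration, and a pipe stands in for a stream socket; this is not the original recorder code.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for any descriptor in *master
 * to become readable. Returns what select() returns (>0 ready, 0 timeout,
 * -1 error); *ready receives the readable set. */
int wait_readable(const fd_set *master, int max_fd, long timeout_ms,
                  fd_set *ready)
{
    struct timeval tv;

    /* Rebuild both arguments on every call: select() overwrites the
     * descriptor set and may overwrite the timeout as well. */
    *ready = *master;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    return select(max_fd + 1, ready, NULL, NULL, &tv);
}

/* Usage sketch with a pipe standing in for a stream socket.
 * Returns 1 if the calls behaved as described, 0 otherwise. */
int demo_wait_readable(void)
{
    int p[2];
    fd_set master, ready;
    int ok;

    if (pipe(p) != 0)
        return 0;
    FD_ZERO(&master);
    FD_SET(p[0], &master);

    /* Nothing written yet: both calls should time out (return 0). */
    ok = wait_readable(&master, p[0], 20, &ready) == 0
      && wait_readable(&master, p[0], 20, &ready) == 0;

    /* After one byte arrives, the read end is reported ready. */
    ok = ok && write(p[1], "x", 1) == 1
            && wait_readable(&master, p[0], 20, &ready) == 1
            && FD_ISSET(p[0], &ready);

    close(p[0]);
    close(p[1]);
    return ok;
}
```

Because the timeval lives inside the helper, the second call waits the full interval again instead of returning immediately.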

The timeout argument of select() can be NULL, which means wait forever.

You could also use sleep after you check all your streams to give up CPU at the end of your loop. This way you don't depend on a single stream to have incoming data sometime in the near future, at the risk of not servicing your other streams.

Related

Is there a size limit of write() for a socket fd?

I am writing a little web server which uses epoll and multiple threads. For small and short HTTP/1.1 requests and responses, it works as expected. But when serving large file downloads, it is always interrupted by the timer I devised. I expire the timers with a fixed timeout value, but I also have an if statement to check whether the response was sent successfully.
static void
_expire_timers(list_t *timers, long timeout)
{
httpconn_t *conn;
int sockfd;
node_t *timer;
long cur_time;
long stamp;
timer = list_first(timers);
if (timer) {
cur_time = mstime();
do {
stamp = list_node_stamp(timer);
conn = (httpconn_t *)list_node_data(timer);
if ((cur_time - stamp >= timeout) && httpconn_close(conn)) {
sockfd = httpconn_sockfd(conn);
DEBSI("[CONN] socket closed, server disconnected", sockfd);
close(sockfd);
list_del(timers, stamp);
}
timer = list_next(timers);
} while (timer);
}
}
I realized that in a non-blocking environment, the write() function might be interrupted during the request-response communication. I wonder how long write() can block or how much data write() can send, so I can tweak the timeout setting in my code.
This is the code which involves write(),
void
http_rep_get(int clifd, void *cache, char *path, void *req)
{
httpmsg_t *rep;
int len_msg;
char *bytes;
rep = _get_rep_msg((list_t *)cache, path, req);
bytes = msg_create_rep(rep, &len_msg);
/* send msg */
DEBSI("[REP] Sending reply msg...", clifd);
write(clifd, bytes, len_msg);
/* send body */
DEBSI("[REP] Sending body...", clifd);
write(clifd, msg_body_start(rep), msg_body_len(rep));
free(bytes);
msg_destroy(rep, 0);
}
And the following is the epoll loop I use to process the incoming requests,
do {
nevents = epoll_wait(epfd, events, MAXEVENTS, HTTP_KEEPALIVE_TIME);
if (nevents == -1) perror("epoll_wait()");
/* expire the timers */
_expire_timers(timers, HTTP_KEEPALIVE_TIME);
/* loop through events */
for (i = 0; i < nevents; i++) {
conn = (httpconn_t *)events[i].data.ptr;
sockfd = httpconn_sockfd(conn);
/* error case */
if ((events[i].events & EPOLLERR) || (events[i].events & EPOLLHUP) ||
(!(events[i].events & EPOLLIN))) {
perror("EPOLL ERR|HUP");
list_update(timers, conn, mstime());
break;
}
else if (sockfd == srvfd) {
_receive_conn(srvfd, epfd, cache, timers);
}
else {
/* client socket; read client data and process it */
thpool_add_task(taskpool, httpconn_task, conn);
}
}
} while (svc_running);
The http_rep_get() is executed by the threadpool handler httpconn_task(), HTTP_KEEPALIVE_TIME is the fixed timeout. The handler httpconn_task() will add a timer to the timers once a request arrives. Since the write() is executed in http_rep_get(), I think it might be interrupted by the timers. I guess I can change the way to write to the clients, but I need to make sure how much the write() can do.
If you are interested, you may browse my project to help me with this.
https://github.com/grassroot72/Maestro
Cheers,
Edward
Is there a size limit of write() for a socket fd?
It depends on what you mean by a limit.
As the comments explain, a write call may write fewer bytes than you ask it to. Furthermore, this is expected behavior if you perform a large write to a socket. However, there is no reliable way to determine (or predict) how many bytes will be written before you call write.
The correct way to deal with this is to check how many bytes were actually written each time, and use a loop to ensure that all bytes are written (or until you get a failure).
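A minimal sketch of such a loop follows. The helper name write_all is an assumption for illustration and is not part of the project above. On a non-blocking socket, write() can fail with EAGAIN when the send buffer is full; this sketch simply reports that as an error, and a real server would typically wait for EPOLLOUT and then resume from the unsent offset.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are sent.
 * Returns 0 on success, -1 on error (errno is left as set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;         /* includes EAGAIN on non-blocking fds */
        }
        p += n;                /* skip what was actually written */
        len -= (size_t)n;
    }
    return 0;
}

/* Usage sketch with a pipe standing in for the client socket.
 * Returns 1 if the bytes arrive intact, 0 otherwise. */
int demo_write_all(void)
{
    int p[2];
    char back[5];
    int ok;

    if (pipe(p) != 0)
        return 0;
    ok = write_all(p[1], "hello", 5) == 0
      && read(p[0], back, sizeof back) == 5
      && memcmp(back, "hello", 5) == 0;
    close(p[0]);
    close(p[1]);
    return ok;
}
```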

Is it OK to loop over recv / read to read all data from socket

I'm building a multi-client<->server messaging application over TCP.
I created a non blocking server using epoll to multiplex linux file descriptors.
When a fd receives data, I read() /or/ recv() into buf.
I know that I need to either specify a data length* at the start of the transmission, or use a delimiter** at the end of the transmission to segregate the messages.
*using a data length:
char *buffer_ptr = buffer;
do {
switch (recvd_bytes = recv(new_socket, buffer_ptr, rem_bytes, 0)) {
case -1: return SOCKET_ERR;
case 0: return CLOSE_SOCKET;
default: break;
}
buffer_ptr += recvd_bytes;
rem_bytes -= recvd_bytes;
} while (rem_bytes != 0);
**using a delimiter:
void get_all_buf(int sock, std::string & inStr)
{
int n = 1, total = 0, found = 0;
char c;
char temp[1024*1024];
// Keep reading up to a '\n'
while (!found) {
n = recv(sock, &temp[total], sizeof(temp) - total - 1, 0);
if (n == -1) {
/* Error, check 'errno' for more details */
break;
}
total += n;
temp[total] = '\0';
found = (strchr(temp, '\n') != 0);
}
inStr = temp;
}
My question: Is it OK to loop over recv() until one of those conditions is met? What if a client sends a bogus message length or no delimiter, or there is packet loss? Won't I be stuck looping on recv() in my program forever?
Is it OK to loop over recv() until one of those conditions is met?
Probably not, at least not for production-quality code. As you suggested, the problem with looping until you get the full message is that it leaves your thread at the mercy of the client -- if a client decides to only send part of the message and then wait for a long time (or even forever) without sending the last part, then your thread will be blocked (or looping) indefinitely and unable to serve any other purpose -- usually not what you want.
What if a client sends a bogus message length
Then you're in trouble (although if you've chosen a maximum-message-size you can detect obviously bogus message-lengths that are larger than that size, and defend yourself by e.g. forcibly closing the connection)
or there is packet loss?
If there is a reasonably small amount of packet loss, the TCP layer will automatically retransmit the data, so your program won't notice the difference (other than the message officially "arriving" a bit later than it otherwise would have). If there is really bad packet loss (e.g. someone pulled the Ethernet cable out of the wall for 5 minutes), then the rest of the message might be delayed for several minutes or more (until connectivity recovers, or the TCP layer gives up and closes the TCP connection), trapping your thread in the loop.
So what is the industrial-grade, evil-client-and-awful-network-proof solution to this dilemma, so that your server can remain responsive to other clients even when a particular client is not behaving itself?
The answer is this: don't depend on receiving the entire message all at once. Instead, you need to set up a simple state-machine for each client, such that you can recv() as many (or as few) bytes from that client's TCP socket as it cares to send to you at any particular time, and save those bytes to a local (per-client) buffer that is associated with that client, and then go back to your normal event loop even though you haven't received the entire message yet. Keep careful track of how many valid received-bytes-of-data you currently have on-hand from each client, and after each recv() call has returned, check to see if the associated per-client incoming-data-buffer contains an entire message yet, or not -- if it does, parse the message, act on it, then remove it from the buffer. Lather, rinse, and repeat.
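The per-client buffer described above could be sketched as follows. Everything here is an assumption made for illustration: a 4-byte big-endian length prefix, a MAX_MSG limit of 4096 bytes, and the names client_buf, cb_append and cb_next_message. The event loop would call cb_append() with whatever recv() returned and then drain complete messages with cb_next_message() before going back to epoll.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MSG 4096   /* assumed upper bound on one message body */

struct client_buf {
    unsigned char data[4 + MAX_MSG]; /* 4-byte length header + body */
    size_t used;                     /* valid bytes currently stored */
};

/* Append bytes just returned by recv().
 * Returns 0 on success, -1 if they do not fit. */
int cb_append(struct client_buf *cb, const void *bytes, size_t n)
{
    if (n > sizeof cb->data - cb->used)
        return -1;
    memcpy(cb->data + cb->used, bytes, n);
    cb->used += n;
    return 0;
}

/* Try to extract one complete message into out (room for MAX_MSG bytes).
 * Returns 1 if a message was removed from the buffer, 0 if more bytes are
 * needed, -1 if the announced length is larger than MAX_MSG. */
int cb_next_message(struct client_buf *cb, unsigned char *out, size_t *out_len)
{
    uint32_t len;

    if (cb->used < 4)
        return 0;                    /* header not complete yet */
    len = ((uint32_t)cb->data[0] << 24) | ((uint32_t)cb->data[1] << 16) |
          ((uint32_t)cb->data[2] << 8)  |  (uint32_t)cb->data[3];
    if (len > MAX_MSG)
        return -1;                   /* bogus length: caller should close */
    if (cb->used < 4 + (size_t)len)
        return 0;                    /* body not complete yet */

    memcpy(out, cb->data + 4, len);
    *out_len = len;
    /* Shift bytes that belong to the next message to the front. */
    memmove(cb->data, cb->data + 4 + len, cb->used - 4 - len);
    cb->used -= 4 + len;
    return 1;
}
```

A return value of 0 just means "go back to the event loop", so a slow or silent client never holds up the thread.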

SSL_read blocks indefinitely

I am trying to read data off an Openssl linked socket using SSL_read. I perform Openssl operations in client mode that sends command and receives data from a real-world server. I used two threads where one thread handles all Openssl operations like connect, write and close. I perform the SSL_read in a separate thread. I am able to read data properly when I issue SSL_read once.
But I ran into problems when I tried to perform multiple connect, write, close sequences. Ideally I should terminate the thread performing the SSL_read in response to close. This is because for the next connect we would get a new ssl pointer and so we do not want to perform read on old ssl pointer. But problem is when I do SSL_read, I am stuck until there is data available in SSL buffer. It gets blocked on the SSL pointer, even when I have closed the SSL connection in the other thread.
while(1) {
memset(sbuf, 0, sizeof(uint8_t) * TLS_READ_RCVBUF_MAX_LEN);
read_data_len = SSL_read(con, sbuf, TLS_READ_RCVBUF_MAX_LEN);
switch (SSL_get_error(con, read_data_len)) {
case SSL_ERROR_NONE:
.
.
.
}
I tried all possible solutions to the problem but none works. Mostly I tried to get an indication that there might be data in the SSL buffer, but none of the approaches returns a proper indication.
I tried:
- Doing SSL_pending first to know if there is data in SSL buffer. But this always returns zero
- Doing select on the Openssl socket to see if it returns value bigger than zero. But it always returns zero.
- Making the socket non-blocking and trying the select, but it doesn't seem to work. I am not sure if I got the code right.
An example of where I used select for blocking socket is as follows. But select always returns zero.
while(1) {
// The use of Select here is to timeout
// while waiting for data to read on SSL.
// The timeout is set to 1 second
i = select(width, &readfds, NULL,
NULL, &tv);
if (i < 0) {
// Select Error. Take appropriate action for this error
}
// Check if there is data to be read
if (i > 0) {
if (FD_ISSET(SSL_get_fd(con), &readfds)) {
// TODO: We have data in the SSL buffer. But are we
// sure that the data is from read buffer? If not,
// SSL_read can be stuck indefinitely.
// Maybe we can do SSL_read(con, sbuf, 0) followed
// by SSL_pending to find out?
memset(sbuf, 0, sizeof(uint8_t) * TLS_READ_RCVBUF_MAX_LEN);
read_data_len = SSL_read(con, sbuf, TLS_READ_RCVBUF_MAX_LEN);
error = SSL_get_error(con, read_data_len);
switch (error) {
.
.
}
So as you can see, I have tried a number of ways to get the thread performing SSL_read to terminate in response to close, but I didn't get it to work as I expected. Did anybody get SSL_read to work properly? Is a non-blocking socket the only solution to my problem? For a blocking socket, how do you solve the problem of quitting from SSL_read if you never get a response to a command? Can you give an example of a working solution for a non-blocking socket with read?
I can point you to a working example of non-blocking client socket with SSL ... https://github.com/darrenjs/openssl_examples
It uses non-blocking sockets with standard linux IO (based on poll event loop). Raw data is read from the socket and then fed into SSL memory BIO's, which then perform the decryption.
The approach I used was single threaded. A single thread performs the connect, write, and read. This means there cannot be any problems associated with one thread closing a socket, while another thread is trying to use that socket. Also, as noted by the SSL FAQ, "an SSL connection cannot be used concurrently by multiple threads" (https://www.openssl.org/docs/faq.html#PROG1), so single threaded approach avoids problems with concurrent SSL write & read.
The challenge with the single threaded approach is that you then need to create some kind of synchronized queue and signalling mechanism for submitting and holding data pending for outbound (e.g. the commands that you want to send from client to server), and get the socket event loop to detect when there is data pending for write and pull it from the queue. For that I would look at standard std::list, std::mutex etc., and either pipe2 or eventfd for signalling the event loop.
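The signalling part can be sketched in plain C with poll(). This is only an illustration under some assumptions: it uses an ordinary pipe() instead of pipe2 or eventfd, the names wait_socket_or_wake, EV_SOCKET and EV_WAKE are made up, and the queue itself is left out. Another thread writes one byte to the pipe after queuing a command (or before asking the loop to shut the connection down), which wakes the event loop even when the socket is silent.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

#define EV_SOCKET 1   /* socket is readable (or hung up / in error) */
#define EV_WAKE   2   /* another thread signalled via the wake pipe */

/* Hypothetical helper: block until sock_fd is readable, the wake pipe is
 * written to, or timeout_ms elapses. Returns a bitmask of EV_* flags,
 * 0 on timeout, -1 on poll() error. */
int wait_socket_or_wake(int sock_fd, int wake_rd, int timeout_ms)
{
    struct pollfd fds[2];
    int events = 0;

    fds[0].fd = sock_fd; fds[0].events = POLLIN; fds[0].revents = 0;
    fds[1].fd = wake_rd; fds[1].events = POLLIN; fds[1].revents = 0;

    if (poll(fds, 2, timeout_ms) < 0)
        return -1;

    if (fds[0].revents & (POLLIN | POLLHUP | POLLERR))
        events |= EV_SOCKET;
    if (fds[1].revents & POLLIN) {
        char drain[64];
        ssize_t r = read(wake_rd, drain, sizeof drain); /* consume wake bytes */
        (void)r;
        events |= EV_WAKE;
    }
    return events;
}

/* Usage sketch: two pipes stand in for the socket and the wake channel.
 * Returns 1 if every step behaved as described, 0 otherwise. */
int demo_wake(void)
{
    int sock[2], wake[2];
    int ok;

    if (pipe(sock) != 0 || pipe(wake) != 0)
        return 0;
    ok = wait_socket_or_wake(sock[0], wake[0], 20) == 0;          /* idle */
    ok = ok && write(wake[1], "w", 1) == 1
            && wait_socket_or_wake(sock[0], wake[0], 20) == EV_WAKE;
    ok = ok && wait_socket_or_wake(sock[0], wake[0], 20) == 0;    /* drained */
    ok = ok && write(sock[1], "d", 1) == 1
            && wait_socket_or_wake(sock[0], wake[0], 20) == EV_SOCKET;
    close(sock[0]); close(sock[1]);
    close(wake[0]); close(wake[1]);
    return ok;
}
```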
OpenSSL calls recv() which in turn obeys the SOCKET's timeout, which by default is infinite. You can change the timeout thusly:
void socket_timeout_receive_set(SOCKET handle, dword milliseconds)
{
if(handle==SOCKET_HANDLE_NULL)
return;
struct timeval tv = { long(milliseconds / 1000), (milliseconds % 1000) * 1000 };
setsockopt(handle, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv, sizeof(tv));
}
Unfortunately, SSL_get_error() returns SSL_ERROR_SYSCALL, which it returns in other situations too, so it's not easy to determine that it timed out. But this function will help you determine if the connection is lost:
bool socket_dropped(SOCKET handle)
{
// Special thanks: "Detecting and terminating aborted TCP/IP connections" by Vinayak Gadkari
if(handle==SOCKET_HANDLE_NULL)
return true;
// create a socket set containing just this socket
fd_set socket_set;
FD_ZERO(&socket_set);
FD_SET(handle, &socket_set);
// if the connection is unreadable, it is not dropped (strange but true)
static struct timeval timeout = { 0, 0 };
int count = select(0, &socket_set, NULL, NULL, &timeout);
if(count <= 0) {
// problem: count==0 on a connection that was cut off ungracefully, presumably by a busy router
// for connections that are open for a long time but may not talk much, call keepalive_set()
return false;
}
if(!FD_ISSET(handle, &socket_set)) // creates a dependency on __WSAFDIsSet()
return false;
// peek at the next character
// recv() returns 0 if the connection was dropped
char dummy;
count = recv(handle, &dummy, 1, MSG_PEEK);
if(count > 0)
return false;
if(count==0)
return true;
int sec = WSAGetLastError(); // error code left by the failed recv()
return sec==WSAECONNRESET || sec==WSAECONNABORTED || sec==WSAENETRESET || sec==WSAEINVAL;
}

Using select and recv to obtain a file from a web server through a socket

I'm having trouble receiving "large" files from a web server using C sockets; namely when these files (or so I suspect) are larger than the size of the buffer I'm using to receive them. If I attempt to ask (through a GET request) for a simple index.html that's not bigger than a few bytes, I get it fine, but anything else fails. I'm assuming that my lack of knowledge on select() or recv() is what's failing me. See here:
fd_set read_fd_set;
FD_ZERO(&read_fd_set);
FD_SET((unsigned int)socketId, &read_fd_set);
/* Initialize the timeout data structure. */
struct timeval timeout;
timeout.tv_sec = 2;
timeout.tv_usec = 0;
// Receives reply from the server
int headerReceived = 0;
do {
select(socketId+1, &read_fd_set, NULL, NULL, &timeout);
if (!(FD_ISSET(socketId, &read_fd_set))) {
break;
}
byteSize = recv(socketId, buffer, sizeof buffer, 0);
if (byteSize == 0 || (byteSize < BUFFER_SIZE && headerReceived)) {
break;
}
headerReceived = 1;
} while(1);
This code runs right after sending the GET request to the web server. I'm pretty sure the server is getting the request just fine, and GET requests from any other client (like any web browser) are working as intended.
Thanks in advance, any help is greatly appreciated.
if (byteSize == 0 || (byteSize < BUFFER_SIZE && headerReceived))
{
break;
}
headerReceived is set to true after the first read. It is entirely possible, and likely, that subsequent recv() calls will return fewer than BUFFER_SIZE bytes, and at that point you break out of the read loop. recv() returns whatever number of bytes is available to read, not necessarily as many as you request.
Also either stick with BUFFER_SIZE or sizeof(buffer). Mixing and matching is just asking for a bug somewhere down the road.
One thing that I spot is that you don't reinitialize the selection during the loop. This is probably why you get small files successfully; they are received in one go and the loop doesn't have to be iterated.
I suggest you put the:
FD_ZERO(&read_fd_set);
FD_SET((unsigned int)socketId, &read_fd_set);
timeout.tv_sec = 2;
timeout.tv_usec = 0;
inside the loop (before you invoke select), and it might just work.
You did not say what O/S you are using, but according to the POSIX spec:
Upon successful completion, the select() function may modify the
object pointed to by the timeout argument.
(And I believe Linux, for example, does precisely this.)
So it is very possible that later invocations of your loop have the timeout set to zero, which will cause select to return immediately with no descriptors ready.
I would suggest re-initializing the timeout structure immediately before calling select every time through the loop.
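Putting the suggestions from these answers together, a receive loop might look like the sketch below. The function name recv_until_closed and its parameters are assumptions for illustration, and a socketpair() stands in for the web server connection. The loop re-arms the descriptor set and the timeout before every select(), and it treats only a return value of 0 from recv() as the end of the data, never a short read.

```c
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from sock until the peer closes, out is full,
 * or no data arrives for timeout_sec seconds.
 * Returns the number of bytes stored, or -1 on error. */
long recv_until_closed(int sock, char *out, size_t cap, int timeout_sec)
{
    size_t total = 0;

    while (total < cap) {
        fd_set rfds;
        struct timeval tv;
        ssize_t n;
        int ready;

        /* Re-arm both on every pass; select() modifies them. */
        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        ready = select(sock + 1, &rfds, NULL, NULL, &tv);
        if (ready < 0)
            return -1;
        if (ready == 0)
            break;                   /* idle timeout */

        n = recv(sock, out + total, cap - total, 0);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                   /* orderly shutdown by the peer */
        total += (size_t)n;          /* short reads are normal: keep going */
    }
    return (long)total;
}

/* Usage sketch: the "server" end sends two chunks and closes.
 * Returns 1 if all 7 bytes were collected, 0 otherwise. */
int demo_recv_until_closed(void)
{
    int sv[2];
    char buf[16];
    long got;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;
    if (write(sv[1], "abc", 3) != 3 || write(sv[1], "defg", 4) != 4)
        return 0;
    close(sv[1]);
    got = recv_until_closed(sv[0], buf, sizeof buf, 1);
    close(sv[0]);
    return got == 7 && memcmp(buf, "abcdefg", 7) == 0;
}
```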

Network receipt timer to ms resolution

My scenario: I'm collecting network packets, and if a packet matches a network filter I want to record the time difference between consecutive packets. This last part is the part that doesn't work. My problem is that I can't get accurate sub-second measurements no matter what C timer function I use. I've tried gettimeofday(), clock_gettime(), and clock().
I'm looking for assistance to figure out why my timing code isn't working properly.
I'm running on a cygwin environment.
Compile Options: gcc -Wall capture.c -o capture -lwpcap -lrt
Code snippet :
/*globals*/
int first_time = 0;
struct timespec start, end;
double sec_diff = 0;
main() {
pcap_t *adhandle;
const struct pcap_pkthdr header;
const u_char *packet;
int sockfd = socket(PF_INET, SOCK_STREAM, 0);
.... (previous I create socket/connect - works fine)
save_attr = tty_set_raw();
while (1) {
packet = pcap_next(adhandle, &header); // Receive a packet? Process it
if (packet != NULL) {
got_packet(&header, packet, adhandle);
}
if (linux_kbhit()) { // User types message to channel
kb_char = linux_getch(); // Get user-supplied character
if (kb_char == 0x03) // Stop loop (exit channel) if user hits Ctrl+C
break;
}
}
tty_restore(save_attr);
close(sockfd);
pcap_close(adhandle);
printf("\nCapture complete.\n");
}
In got_packet:
got_packet(const struct pcap_pkthdr *header, const u_char *packet, pcap_t * p){ ... {
....do some packet filtering to only handle my packets, set match = 1
if (match == 1) {
if (first_time == 0) {
clock_gettime( CLOCK_MONOTONIC, &start );
first_time++;
}
else {
clock_gettime( CLOCK_MONOTONIC, &end );
sec_diff = (end.tv_sec - start.tv_sec) + ((end.tv_nsec - start.tv_nsec)/1000000000.0); // Packet difference in seconds
printf("sec_diff: %ld,\tstart_nsec: %ld,\tend_nsec: %ld\n", (end.tv_sec - start.tv_sec), start.tv_nsec, end.tv_nsec);
printf("sec_diffcalc: %ld,\tstart_sec: %ld,\tend_sec: %ld\n", sec_diff, start.tv_sec, end.tv_sec);
start = end; // Set the current to the start for next match
}
}
}
I record all packets with Wireshark to compare, so I expect the difference in my timer to be the same as Wireshark's, however that is never the case. My output for tv_sec will be correct, however tv_nsec is not even close. Say there is a 0.5 second difference in wireshark, my timer will say there is a 1.999989728 second difference.
Basically, you will want to use a timer with a higher resolution.
Also, I did not check in libpcap, but I am pretty sure that libpcap can give you the time at which each packet was received. In which case, it will be closest that you can get to what Wireshark displays.
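For reference, struct pcap_pkthdr carries the capture time in its ts field, which is a struct timeval. Below is a minimal sketch of turning two such timestamps into a difference in seconds; the helper name timeval_diff_sec is an assumption, and the code only needs sys/time.h, so it can be tried without libpcap.

```c
#include <sys/time.h>

/* Hypothetical helper: seconds elapsed from *earlier to *later.
 * With libpcap, the two arguments would be the ts fields of the headers
 * of two consecutive matching packets. */
double timeval_diff_sec(const struct timeval *earlier,
                        const struct timeval *later)
{
    /* The microsecond part may be negative when a second boundary was
     * crossed; adding it to the whole-second part gives the right result. */
    return (double)(later->tv_sec - earlier->tv_sec)
         + (double)(later->tv_usec - earlier->tv_usec) / 1e6;
}
```

The result is a double, so it should be printed with %f rather than %ld.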
I don't think that it is the clocks that are your problem, but the way that you are waiting on new data. You should use a polling function to see when you have new data from either the socket or from the keyboard. This will allow your program to sleep when there is no new data for it to process. This is likely to make the operating system be nicer to your program when it does have data to process and schedule it quicker. This also allows you to quit the program without having to wait for the next packet to come in. Alternately you could attempt to run your program at really high or real time priority.
You should consider taking the current time at the first opportunity after you get a packet, in case the filtering takes long. You may also want to consider multiple threads for this program if you are trying to capture data on a fast and busy network, especially if you have more than one processor, and since you are doing some printfs, which may block. I noticed you had a function to set a tty to raw mode, which I assume is the standard output tty. If you are actually using a serial terminal that could slow things down a lot, but standard output to an xterm can also be slow. You may want to consider setting stdout to fully buffered rather than line buffered. This should speed up the output. (man setvbuf)

Resources