Is there a size limit of write() for a socket fd? - c

I am writing a little web server that uses epoll and multithreading. For small, short HTTP/1.1 requests and responses, it works as expected. But large file downloads are always interrupted by the timer I devised. I expire the timers with a fixed timeout value, but I also have an if statement to check whether the response was sent successfully.
static void
_expire_timers(list_t *timers, long timeout)
{
    httpconn_t *conn;
    int sockfd;
    node_t *timer;
    long cur_time;
    long stamp;
    timer = list_first(timers);
    if (timer) {
        cur_time = mstime();
        do {
            stamp = list_node_stamp(timer);
            conn = (httpconn_t *)list_node_data(timer);
            if ((cur_time - stamp >= timeout) && httpconn_close(conn)) {
                sockfd = httpconn_sockfd(conn);
                DEBSI("[CONN] socket closed, server disconnected", sockfd);
                close(sockfd);
                list_del(timers, stamp);
            }
            timer = list_next(timers);
        } while (timer);
    }
}
I realized that in a non-blocking environment, write() might be interrupted during the request-response exchange. I wonder how long write() can block, or how much data a single write() can send, so I can tweak the timeout setting in my code.
This is the code that calls write():
void
http_rep_get(int clifd, void *cache, char *path, void *req)
{
    httpmsg_t *rep;
    int len_msg;
    char *bytes;
    rep = _get_rep_msg((list_t *)cache, path, req);
    bytes = msg_create_rep(rep, &len_msg);
    /* send msg */
    DEBSI("[REP] Sending reply msg...", clifd);
    write(clifd, bytes, len_msg);
    /* send body */
    DEBSI("[REP] Sending body...", clifd);
    write(clifd, msg_body_start(rep), msg_body_len(rep));
    free(bytes);
    msg_destroy(rep, 0);
}
And this is the epoll loop I use to process incoming requests:
do {
    nevents = epoll_wait(epfd, events, MAXEVENTS, HTTP_KEEPALIVE_TIME);
    if (nevents == -1) perror("epoll_wait()");
    /* expire the timers */
    _expire_timers(timers, HTTP_KEEPALIVE_TIME);
    /* loop through events */
    for (i = 0; i < nevents; i++) {
        conn = (httpconn_t *)events[i].data.ptr;
        sockfd = httpconn_sockfd(conn);
        /* error case */
        if ((events[i].events & EPOLLERR) || (events[i].events & EPOLLHUP) ||
            (!(events[i].events & EPOLLIN))) {
            perror("EPOLL ERR|HUP");
            list_update(timers, conn, mstime());
            break;
        }
        else if (sockfd == srvfd) {
            _receive_conn(srvfd, epfd, cache, timers);
        }
        else {
            /* client socket; read client data and process it */
            thpool_add_task(taskpool, httpconn_task, conn);
        }
    }
} while (svc_running);
http_rep_get() is executed by the thread-pool handler httpconn_task(), and HTTP_KEEPALIVE_TIME is the fixed timeout. The handler httpconn_task() adds a timer to the timers list once a request arrives. Since write() is executed in http_rep_get(), I think it might be interrupted by the timers. I guess I can change the way I write to the clients, but first I need to know how much a single write() can do.
If you are interested, you can browse my project to help me with this.
https://github.com/grassroot72/Maestro
Cheers,
Edward

Is there a size limit of write() for a socket fd?
It depends on what you mean by a limit.
As the comments explain, a write call may write fewer bytes than you ask it to. Furthermore, this is expected behavior if you perform a large write to a socket. However, there is no reliable way to determine (or predict) how many bytes will be written before you call write.
The correct way to deal with this is to check how many bytes were actually written each time, and use a loop to ensure that all bytes are written (or until you get a failure).
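Here is a minimal sketch of that loop, written for the non-blocking epoll setup described in the question. That setup is an assumption, because the question does not show how the socket is configured. write_all() and set_nonblocking() are hypothetical helpers and are not part of the Maestro project. Instead of spinning on EAGAIN, the function records in *off how far it got, so the caller can wait for EPOLLOUT and resume:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode, as an epoll-driven server normally does. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Write buf[*off .. len) to fd, advancing *off past every byte the
 * kernel accepts.
 *   returns  1  everything has been written
 *   returns  0  non-blocking fd whose send buffer is full: keep buf and
 *               *off, wait for EPOLLOUT, then call again
 *   returns -1  real error (errno set, e.g. EPIPE or ECONNRESET) */
int write_all(int fd, const char *buf, size_t len, size_t *off)
{
    while (*off < len) {
        ssize_t n = write(fd, buf + *off, len - *off);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;               /* would block: resume later */
            return -1;
        }
        *off += (size_t)n;              /* short writes are normal */
    }
    return 1;
}
```

One possible way to combine this with the question's timers is to keep buf, len and off per connection and refresh the connection's timer stamp each time write_all() makes progress. That is a suggestion, not something the answer prescribes. With it, _expire_timers() only closes connections that are really idle, not ones still streaming a large file.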

Related

Read chardevice with libevent

I wrote a character device (chardevice) that passes messages received from the network to a user-space application. The user-space application has to both read the chardevice and send/receive messages via TCP sockets to other user-space applications. Both reading and receiving should be blocking.
Since libevent can handle multiple events at the same time, I thought registering an event for the file created by the chardevice and an event for a socket would just work, but I was wrong.
A chardevice creates a "character special file", and libevent seems unable to block on it. If I implement a blocking mechanism inside the chardevice, i.e. a mutex or semaphore, then the socket event blocks too, and the application cannot receive messages.
The user-space application has to accept outside connections at any time.
Do you know how to make it work? Another library would also be fine; I just want blocking behaviour for both the socket and the file reader.
Thank you in advance.
Update: Thanks to Ahmed Masud for the help. This is what I've done:
Kernel module chardevice:
Implement a poll function that waits until new data is available
struct file_operations fops = {
    ...
    .read = kdev_read,
    .poll = kdev_poll,
};
I have a global flag that tells whether the user space has to stop, and a wait queue:
static int working = 1;
static wait_queue_head_t access_wait;
This is the read function. It returns a negative error code if copy_to_user fails (-EFAULT) or if the user buffer is too small for one message (-EINVAL). It returns > 0 if everything went well, and 0 if the module has to stop. used_buf is atomic because it tracks the fill level of a buffer that is written by the kernel module and read by the user application.
ssize_t
kdev_read(struct file* filep, char __user* buffer, size_t len, loff_t* offset)
{
    int error_count;
    size_t llen;
    if (signal_pending(current) || !working) { // user called sigint
        return 0;
    }
    llen = sizeof(struct user_msg) + msg_buf[first_buf]->size;
    if (len < llen) // the user buffer cannot hold one whole message
        return -EINVAL;
    atomic_dec(&used_buf);
    error_count = copy_to_user(buffer, (char*)msg_buf[first_buf], llen);
    if (error_count != 0) { // copy_to_user() returns the number of bytes NOT copied
        atomic_inc(&used_buf);
        paxerr("send fewer characters to the user");
        return -EFAULT;
    }
    first_buf = (first_buf + 1) % BUFFER_SIZE;
    return llen;
}
When there is data to read, I simply increment used_buf and call wake_up_interruptible(&access_wait).
This is the poll function. It registers the caller on the wait queue and reports the device as readable once used_buf is > 0:
unsigned int
kdev_poll(struct file* file, poll_table* wait)
{
    poll_wait(file, &access_wait, wait);
    if (atomic_read(&used_buf) > 0)
        return POLLIN | POLLRDNORM;
    return 0;
}
The problem now is that if I unload the module while the user-space application is waiting, the application stays blocked and can't be stopped. That's why I wake up the application when the module is unloaded:
void
kdevchar_exit(void)
{
    working = 0;
    atomic_inc(&used_buf); // pretend there is data so the application is woken up
    wake_up_interruptible(&access_wait); // wake up application, but this time read will return 0 since working = 0;
    ... // unregister everything
}
User space application
Libevent uses polling by default, so it is enough to create an event_base and a read event:
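base = event_base_new();
filep = open(fname, O_RDWR | O_NONBLOCK, 0);
evread = event_new(base, filep, EV_READ | EV_PERSIST,
                   on_read_file, base);
event_add(evread, NULL);     // the event is not monitored until it is added
event_base_dispatch(base);   // run the loop until event_base_loopbreak()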
where on_read_file simply reads the file. No poll call is needed here, because libevent handles that:
static void
on_read_file(evutil_socket_t fd, short event, void* arg)
{
    struct event_base* base = arg;
    int len = read(...);
    if (len < 0)
        return;
    if (len == 0) {
        printf("Stopped by kernel module\n");
        event_base_loopbreak(base);
        return;
    }
    ... // handle message
}
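Libevent ultimately waits on both descriptors through the kernel's poll machinery, which is what ends up calling kdev_poll(). To test the driver without libevent, the same idea of "block until either the device or the socket is readable" can be written with plain poll(2). This is a sketch under assumptions: wait_dev_or_sock() and its bitmask return value are hypothetical. fd_dev and fd_sock stand for the opened character device and the TCP socket.

```c
#include <poll.h>
#include <unistd.h>   /* read()/write()/pipe() for trying it out */

/* Block until fd_dev and/or fd_sock is readable, or timeout_ms passes
 * (-1 waits forever). Returns 1 = device, 2 = socket, 3 = both,
 * 0 on timeout, -1 on error (errno set). */
int wait_dev_or_sock(int fd_dev, int fd_sock, int timeout_ms)
{
    struct pollfd p[2] = {
        { .fd = fd_dev,  .events = POLLIN, .revents = 0 },
        { .fd = fd_sock, .events = POLLIN, .revents = 0 },
    };
    /* Any of these means a read() on that fd will not block: it either
     * returns data or reports the condition (EOF, error, EBADF). */
    const short ready = POLLIN | POLLHUP | POLLERR | POLLNVAL;
    int n = poll(p, 2, timeout_ms);
    if (n <= 0)
        return n;              /* 0 = timeout, -1 = error */
    int mask = 0;
    if (p[0].revents & ready)
        mask |= 1;             /* kdev_poll() reported POLLIN */
    if (p[1].revents & ready)
        mask |= 2;             /* socket has data, EOF or an error */
    return mask;
}
```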

Why does TCP socket slow down if done in multiple system calls?

Why is the following code slow? And by slow I mean 100x-1000x slow. It just repeatedly performs read/write directly on a TCP socket. The curious part is that it remains slow only if I use two function calls for both read AND write as shown below. If I change either the server or the client code to use a single function call (as in the comments), it becomes super fast.
Code snippet:
int main(...) {
    int sock = ...; // open TCP socket
    int i;
    char buf[100000];
    for (i = 0; i < 2000; ++i) {
        if (amServer) {
            write(sock, buf, 10);
            // read(sock, buf, 20);
            read(sock, buf, 10);
            read(sock, buf, 10);
        } else {
            read(sock, buf, 10);
            // write(sock, buf, 20);
            write(sock, buf, 10);
            write(sock, buf, 10);
        }
    }
    close(sock);
}
We stumbled on this in a larger program that was actually using stdio buffering. It mysteriously became sluggish the moment the payload size exceeded the buffer size by a small margin. I did some digging with strace and finally boiled the problem down to this. I can solve it by fooling around with the buffering strategy, but I'd really like to know what on earth is going on here. On my machine (tested both locally and between remote machines), it goes from over a minute to 0.030 s when I change the two read calls to a single call.
These tests were done on various Linux distros, and various kernel versions. Same result.
Fully runnable code with networking boilerplate:
#include <netdb.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/ip.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
static int getsockaddr(const char* name, const char* port, struct sockaddr* res)
{
    struct addrinfo *list, *p;
    if (getaddrinfo(name, port, NULL, &list) != 0) return -1;
    // use the first IPv4 address, but keep the list head for freeaddrinfo()
    for (p = list; p != NULL && p->ai_family != AF_INET; p = p->ai_next);
    if (!p) { freeaddrinfo(list); return -1; }
    memcpy(res, p->ai_addr, p->ai_addrlen);
    freeaddrinfo(list);
    return 0;
}
// used as sock=tcpConnect(...); ...; close(sock);
static int tcpConnect(struct sockaddr_in* sa)
{
    int outsock;
    if ((outsock = socket(AF_INET, SOCK_STREAM, 0)) < 0) return -1;
    if (connect(outsock, (struct sockaddr*)sa, sizeof(*sa)) < 0) { close(outsock); return -1; }
    return outsock;
}
int tcpConnectTo(const char* server, const char* port)
{
    struct sockaddr_in sa;
    if (getsockaddr(server, port, (struct sockaddr*)&sa) < 0) return -1;
    int sock = tcpConnect(&sa); if (sock < 0) return -1;
    return sock;
}
int tcpListenAny(const char* portn)
{
    in_port_t port;
    int outsock;
    if (sscanf(portn, "%hu", &port) < 1) return -1;
    if ((outsock = socket(AF_INET, SOCK_STREAM, 0)) < 0) return -1;
    int reuse = 1;
    if (setsockopt(outsock, SOL_SOCKET, SO_REUSEADDR,
                   (const char*)&reuse, sizeof(reuse)) < 0) return fprintf(stderr, "setsockopt() failed\n"), -1;
    struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(port),
                              .sin_addr = {INADDR_ANY} };
    if (bind(outsock, (struct sockaddr*)&sa, sizeof(sa)) < 0) return fprintf(stderr, "Bind failed\n"), -1;
    if (listen(outsock, SOMAXCONN) < 0) return fprintf(stderr, "Listen failed\n"), -1;
    return outsock;
}
int tcpAccept(const char* port)
{
    int listenSock, sock;
    listenSock = tcpListenAny(port);
    if ((sock = accept(listenSock, 0, 0)) < 0) return fprintf(stderr, "Accept failed\n"), -1;
    close(listenSock);
    return sock;
}
void writeLoop(int fd, const char* buf, size_t n)
{
    // Don't even bother incrementing buffer pointer
    while (n) n -= write(fd, buf, n);
}
void readLoop(int fd, char* buf, size_t n)
{
    while (n) n -= read(fd, buf, n);
}
int main(int argc, char* argv[])
{
    if (argc < 3) {
        fprintf(stderr, "Usage: round {server_addr|--} port\n");
        return -1;
    }
    bool amServer = (strcmp("--", argv[1]) == 0);
    int sock;
    if (amServer) sock = tcpAccept(argv[2]);
    else sock = tcpConnectTo(argv[1], argv[2]);
    if (sock < 0) { fprintf(stderr, "Connection failed\n"); return -1; }
    int i;
    char buf[100000] = { 0 };
    for (i = 0; i < 4000; ++i) {
        if (amServer) {
            writeLoop(sock, buf, 10);
            readLoop(sock, buf, 20);
            //readLoop(sock, buf, 10);
            //readLoop(sock, buf, 10);
        } else {
            readLoop(sock, buf, 10);
            writeLoop(sock, buf, 20);
            //writeLoop(sock, buf, 10);
            //writeLoop(sock, buf, 10);
        }
    }
    close(sock);
    return 0;
}
EDIT: This version is slightly different from the other snippet in that it reads/writes in a loop. So in this version, two separate writes automatically cause two separate read() calls, even if readLoop is called only once. Otherwise the problem remains the same.
Interesting. You are a victim of Nagle's algorithm combined with TCP delayed acknowledgements.
Nagle's algorithm is a mechanism used in TCP to defer transmission of small segments until enough data has accumulated to make it worth building and sending a segment over the network. From the Wikipedia article:
Nagle's algorithm works by combining a number of small outgoing
messages, and sending them all at once. Specifically, as long as there
is a sent packet for which the sender has received no acknowledgment,
the sender should keep buffering its output until it has a full
packet's worth of output, so that output can be sent all at once.
At the same time, TCP typically uses delayed acknowledgements: the receiver accumulates a batch of ACK replies and sends them as one (possible because TCP ACKs are cumulative), to reduce network traffic.
The Wikipedia article further mentions this:
With both algorithms enabled, applications that do two successive
writes to a TCP connection, followed by a read that will not be
fulfilled until after the data from the second write has reached the
destination, experience a constant delay of up to 500 milliseconds,
the "ACK delay".
In your specific case, since the server doesn't send more data before reading the reply, the client is causing the delay: if the client writes twice, the second write will be delayed.
If Nagle's algorithm is being used by the sending party, data will be
queued by the sender until an ACK is received. If the sender does not
send enough data to fill the maximum segment size (for example, if it
performs two small writes followed by a blocking read) then the
transfer will pause up to the ACK delay timeout.
So, when the client makes 2 write calls, this is what happens:
1. The client issues the first write.
2. The server receives some data. It doesn't acknowledge it, in the hope that more data will arrive (so it can batch up a bunch of ACKs in one single ACK).
3. The client issues the second write. The previous write has not been acknowledged, so Nagle's algorithm defers transmission until more data arrives (until enough data has been collected to make a segment) or the previous write is ACKed.
4. The server gets tired of waiting and acknowledges the segment when its delayed-ACK timer expires (up to 500 ms, as quoted above).
5. The client finally completes the second write.
With 1 write, this is what happens:
1. The client issues the first write.
2. The server receives some data. It doesn't acknowledge it, in the hope that more data will arrive (so it can batch up a bunch of ACKs in one single ACK).
3. The server writes to the socket. An ACK is part of the TCP header, so if it is writing anyway, it might as well acknowledge the previous segment at no extra cost, and it does.
4. Meanwhile, the client wrote only once, so it was already waiting on the next read. There was no second write waiting for the server's ACK.
The algorithm's author himself recommends avoiding the pattern at the application level:
The user-level solution is to avoid write-write-read sequences on
sockets. write-read-write-read is fine. write-write-write is fine. But
write-write-read is a killer. So, if you can, buffer up your little
writes to TCP and send them all at once. Using the standard UNIX I/O
package and flushing write before each read usually works.
(See the citation on Wikipedia)
If you want to keep writing twice on the client side, you need to disable Nagle's algorithm instead. As David Schwartz mentioned in the comments, this may not be the greatest idea for various reasons, but it illustrates the point and shows that this is indeed causing the delay.
To disable it, set the TCP_NODELAY option on the sockets with setsockopt(2).
This can be done in tcpConnectTo() for the client:
int tcpConnectTo(const char* server, const char* port)
{
    struct sockaddr_in sa;
    if (getsockaddr(server, port, (struct sockaddr*)&sa) < 0) return -1;
    int sock = tcpConnect(&sa); if (sock < 0) return -1;
    int val = 1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val)) < 0)
        perror("setsockopt(2) error");
    return sock;
}
And in tcpAccept() for the server:
int tcpAccept(const char* port)
{
    int listenSock, sock;
    listenSock = tcpListenAny(port);
    if ((sock = accept(listenSock, 0, 0)) < 0) return fprintf(stderr, "Accept failed\n"), -1;
    close(listenSock);
    int val = 1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val)) < 0)
        perror("setsockopt(2) error");
    return sock;
}
It's interesting to see the huge difference this makes.
If you'd rather not mess with the socket options, it's enough to ensure that the client writes once - and only once - before the next read. You can still have the server read twice:
for (i = 0; i < 4000; ++i) {
    if (amServer) {
        writeLoop(sock, buf, 10);
        //readLoop(sock, buf, 20);
        readLoop(sock, buf, 10);
        readLoop(sock, buf, 10);
    } else {
        readLoop(sock, buf, 10);
        writeLoop(sock, buf, 20);
        //writeLoop(sock, buf, 10);
        //writeLoop(sock, buf, 10);
    }
}
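Nagle's own advice is to buffer small writes and send them all at once. Following it, the client's two 10-byte writes can also go to the kernel in a single system call, without first copying them into one buffer, by using writev(2). The sketch below assumes a POSIX system. writev_all() is a hypothetical helper that, like writeLoop(), keeps going after short writes:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send every byte described by iov[0..iovcnt) with as few system calls
 * as possible. Returns the total written, or -1 on error (errno set).
 * Note: the iov array is modified to track progress. */
ssize_t writev_all(int fd, struct iovec *iov, int iovcnt)
{
    ssize_t total = 0;
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: retry */
            return -1;
        }
        total += n;
        /* drop the buffers that were written completely */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* advance into a buffer that was written only partly */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return total;
}
```

In the test program, the client branch could then become something like `struct iovec iov[2] = {{buf, 10}, {buf, 10}}; writev_all(sock, iov, 2);`. That gives a single write per iteration instead of write-write-read.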

close() is not closing socket properly

I have a multi-threaded server (thread pool) that handles a large number of requests (up to 500/sec for one node) using 20 threads. A listener thread accepts incoming connections and queues them for the handler threads to process. Once the response is ready, the threads write it out to the client and close the socket. All seemed to be fine until recently, when a test client program started hanging randomly after reading the response. After a lot of digging, it seems that the close() from the server is not actually disconnecting the socket. I've added some debugging prints with the file descriptor number, and I get this type of output:
Processing request for 21
Writing to 21
Closing 21
The return value of close() is 0; otherwise another debug statement would have been printed. When a client hangs after this output, lsof shows an established connection:
SERVER 8160 root 21u IPv4 32754237 TCP localhost:9980->localhost:47530 (ESTABLISHED)
CLIENT 17747 root 12u IPv4 32754228 TCP localhost:47530->localhost:9980 (ESTABLISHED)
It's as if the server never sends the shutdown sequence to the client. This state persists until the client is killed, which leaves the server side in the CLOSE_WAIT state:
SERVER 8160 root 21u IPv4 32754237 TCP localhost:9980->localhost:47530 (CLOSE_WAIT)
Also, if the client has a timeout specified, it times out instead of hanging. I can also manually run
call close(21)
in the server from gdb, and the client will then disconnect. This happens maybe once in 50,000 requests, but might not happen for extended periods.
Linux version: 2.6.21.7-2.fc8xen
CentOS version: 5.4 (Final)
The socket handling is as follows.
SERVER:
int client_socket;
struct sockaddr_in client_addr;
socklen_t client_len = sizeof(client_addr);
while(true) {
    client_socket = accept(incoming_socket, (struct sockaddr *)&client_addr, &client_len);
    if (client_socket == -1)
        continue;
    /* insert into queue here for threads to process */
}
Then the thread picks up the socket and builds the response.
/* get client_socket from queue */
/* processing request here */
/* now set to blocking for write; was previously set to non-blocking for reading */
int flags = fcntl(client_socket, F_GETFL);
if (flags < 0)
    abort();
if (fcntl(client_socket, F_SETFL, flags & ~O_NONBLOCK) < 0) /* clear O_NONBLOCK */
    abort();
server_write(client_socket, response_buf, response_length);
server_close(client_socket);
server_write() and server_close():
void server_write( int fd, char const *buf, ssize_t len ) {
    printf("Writing to %d\n", fd);
    while(len > 0) {
        ssize_t n = write(fd, buf, len);
        if(n <= 0)
            return;// I don't really care what error happened, we'll just drop the connection
        len -= n;
        buf += n;
    }
}
void server_close( int fd ) {
    for(uint32_t i=0; i<10; i++) {
        int n = close(fd);
        if(!n) {//closed successfully
            return;
        }
        usleep(100);
    }
    printf("Close failed for %d\n", fd);
}
CLIENT:
Client side is using libcurl v 7.27.0
CURL *curl = curl_easy_init();
CURLcode res;
curl_easy_setopt( curl, CURLOPT_URL, url);
curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, write_callback );
curl_easy_setopt( curl, CURLOPT_WRITEDATA, write_tag );
res = curl_easy_perform(curl);
Nothing fancy, just a basic curl connection. The client hangs in transfer.c (in libcurl) because the socket is not perceived as closed; it is waiting for more data from the server.
Things I've tried so far:
Shutdown before close:
shutdown(fd, SHUT_WR);
char buf[64];
while (read(fd, buf, 64) > 0);
/* then close */
Setting SO_LINGER to close forcibly after 1 second:
struct linger l;
l.l_onoff = 1;
l.l_linger = 1;
if (setsockopt(client_socket, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) == -1)
    abort();
These have made no difference. Any ideas would be greatly appreciated.
EDIT -- This ended up being a thread-safety issue inside a queue library causing the socket to be handled inappropriately by multiple threads.
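The EDIT does not show the queue library, so the following only illustrates the property it was missing: every accepted descriptor must go to exactly one worker. It is a minimal ring buffer protected by a mutex and two condition variables. struct fd_queue, FDQ_CAP and the fdq_*() functions are hypothetical names, and the fixed capacity is an assumption. If two workers could pop the same descriptor, one might close it while the other is still writing. It might also close a number that accept() has already reused for a new connection. That kind of mix-up can leave some connection open even though the log says it was closed.

```c
#include <pthread.h>
#include <stddef.h>

#define FDQ_CAP 256   /* hypothetical fixed capacity */

struct fd_queue {
    int fds[FDQ_CAP];
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void fdq_init(struct fd_queue *q)
{
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Called by the listener thread after accept(). */
void fdq_push(struct fd_queue *q, int fd)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == FDQ_CAP)          /* wait while the queue is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->fds[(q->head + q->count) % FDQ_CAP] = fd;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Called by worker threads; the fd is removed under the lock, so no
 * other worker can ever receive the same entry. */
int fdq_pop(struct fd_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                /* wait while the queue is empty */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int fd = q->fds[q->head];
    q->head = (q->head + 1) % FDQ_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return fd;
}
```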
Here is some code I've used on many Unix-like systems (e.g. SunOS 4, SGI IRIX, HPUX 10.20, CentOS 5, Cygwin) to close a socket:
int getSO_ERROR(int fd) {
    int err = 1;
    socklen_t len = sizeof err;
    if (-1 == getsockopt(fd, SOL_SOCKET, SO_ERROR, (char *)&err, &len))
        FatalError("getSO_ERROR");
    if (err)
        errno = err; // set errno to the socket SO_ERROR
    return err;
}
void closeSocket(int fd) { // *not* the Windows closesocket()
    if (fd >= 0) {
        getSO_ERROR(fd); // first clear any errors, which can cause close to fail
        if (shutdown(fd, SHUT_RDWR) < 0) // secondly, terminate the 'reliable' delivery
            if (errno != ENOTCONN && errno != EINVAL) // SGI causes EINVAL
                Perror("shutdown");
        if (close(fd) < 0) // finally call close()
            Perror("close");
    }
}
But the above does not guarantee that any buffered writes are sent.
Graceful close: It took me about 10 years to figure out how to close a socket. But for another 10 years I just lazily called usleep(20000) for a slight delay to 'ensure' that the write buffer was flushed before the close. This obviously is not very clever, because:
The delay was too long most of the time.
The delay was too short some of the time--maybe!
A signal such as SIGCHLD could interrupt usleep() (I usually called usleep() twice to handle this case, which is a hack).
There was no indication whether this works. But this is perhaps not important if a) hard resets are perfectly ok, and/or b) you have control over both sides of the link.
But doing a proper flush is surprisingly hard. Using SO_LINGER is apparently not the way to go; see for example:
http://msdn.microsoft.com/en-us/library/ms740481%28v=vs.85%29.aspx
https://www.google.ca/#q=the-ultimate-so_linger-page
And SIOCOUTQ appears to be Linux-specific.
Note shutdown(fd, SHUT_WR) doesn't stop writing, contrary to its name, and maybe contrary to man 2 shutdown.
This code flushSocketBeforeClose() waits until a read of zero bytes, or until the timer expires. The function haveInput() is a simple wrapper for select(2), and is set to block for up to 1/100th of a second.
bool haveInput(int fd, double timeout) {
    int status;
    fd_set fds;
    struct timeval tv;
    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    tv.tv_sec = (long)timeout; // cast needed for C++
    tv.tv_usec = (long)((timeout - tv.tv_sec) * 1000000); // 'suseconds_t'
    while (1) {
        if (!(status = select(fd + 1, &fds, 0, 0, &tv)))
            return FALSE;
        else if (status > 0 && FD_ISSET(fd, &fds))
            return TRUE;
        else if (status > 0)
            FatalError("I am confused");
        else if (errno != EINTR)
            FatalError("select"); // tbd EBADF: man page "an error has occurred"
    }
}
bool flushSocketBeforeClose(int fd, double timeout) {
    const double start = getWallTimeEpoch();
    char discard[99];
    ASSERT(SHUT_WR == 1);
    if (shutdown(fd, 1) != -1)
        while (getWallTimeEpoch() < start + timeout)
            while (haveInput(fd, 0.01)) // can block for 0.01 secs
                if (!read(fd, discard, sizeof discard))
                    return TRUE; // success!
    return FALSE;
}
Example of use:
if (!flushSocketBeforeClose(fd, 2.0)) // can block for 2s
    printf("Warning: Cannot gracefully close socket\n");
closeSocket(fd);
In the above, my getWallTimeEpoch() is similar to time(), and Perror() is a wrapper for perror().
Edit: Some comments:
My first admission is a bit embarrassing. The OP and Nemo challenged the need to clear the internal so_error before close, but I cannot now find any reference for this. The system in question was HPUX 10.20. After a failed connect(), just calling close() did not release the file descriptor, because the system wished to deliver an outstanding error to me. But I, like most people, never bothered to check the return value of close. So I eventually ran out of file descriptors (ulimit -n), which finally got my attention.
(A very minor point.) One commenter objected to the hard-coded numerical arguments to shutdown(), rather than e.g. SHUT_WR for 1. The simplest answer is that Windows uses different #defines/enums, e.g. SD_SEND. And many other writers (e.g. Beej) use constants, as do many legacy systems.
Also, I always, always, set FD_CLOEXEC on all my sockets, since in my applications I never want them passed to a child and, more importantly, I don't want a hung child to impact me.
Sample code to set CLOEXEC:
static void setFD_CLOEXEC(int fd) {
    int status = fcntl(fd, F_GETFD, 0);
    if (status >= 0)
        status = fcntl(fd, F_SETFD, status | FD_CLOEXEC);
    if (status < 0)
        Perror("Error getting/setting socket FD_CLOEXEC flags");
}
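On Linux (2.6.27 and later) and several BSDs, there is also an atomic way to get the same effect: pass SOCK_CLOEXEC in the type argument of socket(). accept4() takes the same flag. This matters in multi-threaded programs, because another thread may fork() and exec() between socket() and a separate fcntl(), and the child then inherits the socket. The sketch below uses the flag when the headers define it and falls back to the fcntl() approach above otherwise. socket_cloexec() is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Like socket(), but the descriptor always has FD_CLOEXEC set. */
int socket_cloexec(int domain, int type, int protocol)
{
    int fd;
#ifdef SOCK_CLOEXEC
    /* flag applied atomically at creation time */
    fd = socket(domain, type | SOCK_CLOEXEC, protocol);
    if (fd >= 0 || errno != EINVAL)
        return fd;
    /* EINVAL: possibly a kernel too old to know SOCK_CLOEXEC; fall back */
#endif
    fd = socket(domain, type, protocol);
    if (fd < 0)
        return -1;
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```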
Great answer from Joseph Quinsey. I have comments on the haveInput function. I wonder how likely it is that select returns an fd you did not include in your set. That would be a major OS bug IMHO; it's the kind of thing I would check if I wrote unit tests for the select function, not in an ordinary app.
if (!(status = select(fd + 1, &fds, 0, 0, &tv)))
    return FALSE;
else if (status > 0 && FD_ISSET(fd, &fds))
    return TRUE;
else if (status > 0)
    FatalError("I am confused"); // <--- fd unknown to function
My other comment pertains to the handling of EINTR. In theory, you could get stuck in an infinite loop if select kept returning EINTR, because that error makes the loop start over. Given the very short timeout (0.01), this appears highly unlikely. However, I think the appropriate way to deal with it would be to return errors to the caller (flushSocketBeforeClose). The caller can keep calling haveInput as long as its timeout hasn't expired, and declare failure for other errors.
ADDITION #1
flushSocketBeforeClose will not exit quickly if read returns an error; it keeps looping until the timeout expires. You can't rely on the select inside haveInput to anticipate all errors, because read has errors of its own (e.g. EIO).
while (haveInput(fd, 0.01))
    if (!read(fd, discard, sizeof discard)) // <-- -1 does not end loop
        return TRUE;
This sounds to me like a bug in your Linux distribution.
The GNU C library documentation says:
When you have finished using a socket, you can simply close its file
descriptor with close
Nothing about clearing any error flags or waiting for the data to be flushed or any such thing.
Your code is fine; your O/S has a bug.
Make sure you include:
#include <unistd.h>
which declares close(). This should help solve the close() problem.

how to read and write data on serial port using threads

I am creating a serial port application with two threads: a WRITER THREAD, which writes data to the serial port, and a READER THREAD, which reads data from it. I know how to open, configure, read and write data on a serial port, but not how to do it using threads.
I am using Linux (Ubuntu), programming in C, and trying to open the ttyS0 port.
The way I have done this in the past is to set up the port for asynchronous I/O using a VMIN of 0 and a VTIME of, say, 5 deciseconds. The purpose of this was to allow the thread to notice when it was time for the application to shut down, as it could try to read, time out, check for a quit flag, and then try to read some more.
Here is an example read function:
ssize_t myread(char *buf, size_t len) { // fd: the already-opened serial port
    size_t total = 0;
    while (len > 0) {
        ssize_t bytes = read(fd, buf, len);
        if (bytes == -1) {
            if (errno != EAGAIN && errno != EINTR) {
                // A real error, not something that trying again will fix
                if (total > 0) {
                    return total;
                }
                else {
                    return -1;
                }
            }
        }
        else if (bytes == 0) {
            // EOF, or (with VMIN = 0) the VTIME timeout expired
            return total;
        }
        else {
            total += bytes;
            buf += bytes;
            len -= bytes;
        }
    }
    return total;
}
The write function would look as you would expect.
In your setup function, make sure to set:
struct termios tios;
...
tios.c_lflag &= ~ICANON; // ICANON is a local-mode (c_lflag) flag, not a c_cflag one
tios.c_cc[VMIN] = 0;
tios.c_cc[VTIME] = 5; // You may want to tweak this; 5 = 1/2 second, 10 = 1 second, ...
...
Using a serial port from two threads is simple if only one thread reads and the other thread only writes.
You should use one file descriptor for the serial port.
Open and initialize it in one thread using the usual functions (open(), tcsetattr(), etc.).
Then hand the file descriptor to the other thread(s).
Now the reader thread can use read() and the writer thread can use write() without any extra synchronization. You can also use select() in both threads.
Closing the file descriptor needs attention: do it in one thread only, once the other thread no longer uses it, to avoid problems.
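Here is a sketch of that arrangement with POSIX threads. It assumes one thread has already opened and configured the descriptor (open(), then tcsetattr() with VMIN = 0 and VTIME > 0 as described in the previous answer). struct serial_io, reader_thread(), writer_thread() and run_serial_io() are hypothetical names. Both threads use the same fd without a lock, and nobody closes the descriptor until both threads have been joined:

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>   /* socketpair(), handy for trying this without a tty */
#include <sys/types.h>
#include <unistd.h>

struct serial_io {
    int fd;                                  /* shared by both threads */
    const char *out;  size_t out_len;        /* data for the writer */
    char *in;         size_t in_len, in_got; /* buffer for the reader */
};

static void *writer_thread(void *arg)
{
    struct serial_io *s = arg;
    size_t off = 0;
    while (off < s->out_len) {
        ssize_t n = write(s->fd, s->out + off, s->out_len - off);
        if (n <= 0)
            break;                 /* a real program would inspect errno */
        off += (size_t)n;
    }
    return NULL;
}

static void *reader_thread(void *arg)
{
    struct serial_io *s = arg;
    while (s->in_got < s->in_len) {
        ssize_t n = read(s->fd, s->in + s->in_got, s->in_len - s->in_got);
        /* n == 0 is the VTIME timeout (or EOF): this sketch stops there;
         * a real reader would check a quit flag and read again */
        if (n <= 0)
            break;
        s->in_got += (size_t)n;
    }
    return NULL;
}

/* The writer is started first: if creating the reader fails, the writer
 * can still be joined because it never waits for input. */
int run_serial_io(struct serial_io *s)
{
    pthread_t w, r;
    if (pthread_create(&w, NULL, writer_thread, s) != 0)
        return -1;
    if (pthread_create(&r, NULL, reader_thread, s) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    /* both threads are done with s->fd; now one thread may close it */
    return 0;
}
```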

Flush kernel's TCP buffer for `MSG_MORE`-flagged packets

send()'s man page documents the MSG_MORE flag, which is said to act like TCP_CORK. I have a wrapper function around send():
int SocketConnection_Write(SocketConnection *this, void *buf, int len) {
    errno = 0;
    int sent = send(this->fd, buf, len, MSG_NOSIGNAL);
    if (errno == EPIPE || errno == ENOTCONN) {
        throw(exc, &SocketConnection_NotConnectedException);
    } else if (errno == ECONNRESET) {
        throw(exc, &SocketConnection_ConnectionResetException);
    } else if (sent != len) {
        throw(exc, &SocketConnection_LengthMismatchException);
    }
    return sent;
}
Assuming I want to use the kernel buffer, I could go with TCP_CORK: enable it whenever necessary and then disable it to flush the buffer. On the other hand, that requires additional system calls. Using MSG_MORE therefore seems more appropriate to me. I'd simply change the above send() line to:
int sent = send(this->fd, buf, len, MSG_NOSIGNAL | MSG_MORE);
According to LWN.net, packets will be flushed automatically if they are large enough:
If an application sets that option on
a socket, the kernel will not send out
short packets. Instead, it will wait
until enough data has shown up to fill
a maximum-size packet, then send it.
When TCP_CORK is turned off, any
remaining data will go out on the
wire.
But this section only refers to TCP_CORK. Now, what is the proper way to flush MSG_MORE packets?
I can only think of two possibilities:
Call send() with an empty buffer and without MSG_MORE being set
Re-apply the TCP_CORK option as described on this page
Unfortunately the whole topic is very poorly documented and I couldn't find much on the Internet.
I am also wondering how to check that everything works as expected? Obviously running the server through strace is not an option. So the simplest way would be to use netcat and then look at its strace output? Or will the kernel handle traffic transmitted over a loopback interface differently?
I have taken a look at the kernel source to check both assumptions. The following code extracts are from net/ipv4/tcp.c (2.6.33.1).
static inline void tcp_push(struct sock *sk, int flags, int mss_now,
                            int nonagle)
{
    struct tcp_sock *tp = tcp_sk(sk);
    if (tcp_send_head(sk)) {
        struct sk_buff *skb = tcp_write_queue_tail(sk);
        if (!(flags & MSG_MORE) || forced_push(tp))
            tcp_mark_push(tp, skb);
        tcp_mark_urg(tp, flags, skb);
        __tcp_push_pending_frames(sk, mss_now,
                                  (flags & MSG_MORE) ? TCP_NAGLE_CORK : nonagle);
    }
}
Hence, if the flag is not set, the pending frames will definitely be flushed. But this only happens when the buffer is not empty:
static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffset,
                                size_t psize, int flags)
{
    (...)
    ssize_t copied;
    (...)
    copied = 0;
    while (psize > 0) {
        (...)
        if (forced_push(tp)) {
            tcp_mark_push(tp, skb);
            __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_PUSH);
        } else if (skb == tcp_send_head(sk))
            tcp_push_one(sk, mss_now);
        continue;
wait_for_sndbuf:
        set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
wait_for_memory:
        if (copied)
            tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
        if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
            goto do_error;
        mss_now = tcp_send_mss(sk, &size_goal, flags);
    }
out:
    if (copied)
        tcp_push(sk, flags, mss_now, tp->nonagle);
    return copied;
do_error:
    if (copied)
        goto out;
out_err:
    return sk_stream_error(sk, flags, err);
}
The while loop's body is never executed because psize is not greater than 0. The out section offers one more chance for tcp_push() to be called. But copied still has its initial value of 0, so that call is skipped as well.
So sending a packet with the length 0 will never result in a flush.
The next theory was to re-apply TCP_CORK. Let's take a look at the code first:
static int do_tcp_setsockopt(struct sock *sk, int level,
                             int optname, char __user *optval, unsigned int optlen)
{
    (...)
    switch (optname) {
    (...)
    case TCP_NODELAY:
        if (val) {
            /* TCP_NODELAY is weaker than TCP_CORK, so that
             * this option on corked socket is remembered, but
             * it is not activated until cork is cleared.
             *
             * However, when TCP_NODELAY is set we make
             * an explicit push, which overrides even TCP_CORK
             * for currently queued segments.
             */
            tp->nonagle |= TCP_NAGLE_OFF|TCP_NAGLE_PUSH;
            tcp_push_pending_frames(sk);
        } else {
            tp->nonagle &= ~TCP_NAGLE_OFF;
        }
        break;
    case TCP_CORK:
        /* When set indicates to always queue non-full frames.
         * Later the user clears this option and we transmit
         * any pending partial frames in the queue. This is
         * meant to be used alongside sendfile() to get properly
         * filled frames when the user (for example) must write
         * out headers with a write() call first and then use
         * sendfile to send out the data parts.
         *
         * TCP_CORK can be set together with TCP_NODELAY and it is
         * stronger than TCP_NODELAY.
         */
        if (val) {
            tp->nonagle |= TCP_NAGLE_CORK;
        } else {
            tp->nonagle &= ~TCP_NAGLE_CORK;
            if (tp->nonagle&TCP_NAGLE_OFF)
                tp->nonagle |= TCP_NAGLE_PUSH;
            tcp_push_pending_frames(sk);
        }
        break;
    (...)
As you can see, there are two ways to flush: set TCP_NODELAY to 1, or set TCP_CORK to 0. Luckily, neither checks whether the flag is already set. My initial plan to re-apply the TCP_CORK flag can therefore be simplified to just disabling it, even if it's currently not set.
I hope this helps someone with similar issues.
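Based on the do_tcp_setsockopt() excerpt above, a flush helper can be as small as clearing TCP_CORK. The excerpt is from 2.6.33.1; that other kernel versions behave the same is an assumption. This is a Linux-only sketch. send_more() and tcp_flush() are hypothetical wrappers, and MSG_NOSIGNAL is kept to match the question's wrapper.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Queue data and tell the kernel more is coming (Linux MSG_MORE). */
ssize_t send_more(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, MSG_MORE | MSG_NOSIGNAL);
}

/* Push out whatever MSG_MORE (or TCP_CORK) is holding back. Per the
 * excerpt above, clearing TCP_CORK calls tcp_push_pending_frames()
 * even if the option was never set. Returns 0 on success, -1 on error. */
int tcp_flush(int fd)
{
    int off = 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof off);
}
```

If the last chunk is known in advance, sending it without MSG_MORE (as the next answer does) flushes everything in the same call and saves the extra setsockopt().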
That's a lot of research... all I can offer is this empirical post note:
If you send a bunch of packets with MSG_MORE set, followed by one packet without MSG_MORE, the whole lot goes out. It works a treat for something like this:
for (i=0; i<mg_live.length; i++) {
    // [...]
    if ((n = pth_send(sock, query, len, MSG_MORE | MSG_NOSIGNAL)) < len) {
        printf("error writing to socket (sent %i bytes of %i)\n", n, len);
        exit(1);
    }
}
pth_send(sock, "END\n", 4, MSG_NOSIGNAL);
That is, when you're sending out all the packets at once, and have a clearly defined end... AND you are only using one socket.
If you tried writing to another socket in the middle of the above loop, you may find that Linux releases the previously held packets. At least that appears to be the trouble I'm having right now. But it might be an easy solution for you.
