linux c socket error: Input/output error - c

I get an "Input/output error" when I try to send data to a TCP server. What does this mean in terms of sockets? It's basically the same code I have always used, and it worked fine before. I was hoping someone could tell me what causes an input/output error when sending over a socket, and how I could check for and fix it. Any help is appreciated.
struct SOCKETSTUS {
    int sendSockFd;
    int recvSockFd;
    short status;
    long heartBeatSendTime;
    long heartBeatRecTime;
    long loginPackSendTime;
};
struct SOCKETSTUS sockArr[128];
if (tv.tv_sec - sockArr[i].heartBeatSendTime >= beatTim)
{
    if (send(sockArr[i].sendSockFd, szBuffer, packetSize, 0) != packetSize)
    {
        fprintf(stderr, "Heartbeat package send failed:[%d][%s]\n", errno, strerror(errno));
        if (errno == EBADF || errno == ECONNRESET || errno == ENOTCONN || errno == EPIPE)
        {
            Debug("link lose connection\n");
            Reconn(i);
            continue;
        }
    }
    else
    {
        sockArr[i].heartBeatSendTime = tv.tv_sec;
        if (sockArr[i].status == SOCK_IN_FLY)
            sockArr[i].heartBeatRecTime = tv.tv_sec;
    }
}
The error occurred in the send() call.

Your error check is incorrect. send() returns the number of bytes sent, or -1 on error. You check only that the return value equals packetSize, not that it indicates an error. Sometimes send() on a stream socket will return fewer bytes than requested.
In that case send() has not failed, so it does not set errno, and the value you print is whatever an earlier call left behind. So, some previous syscall (perhaps a harmlessly failed tty manipulation? a dodgy signal handler?) set errno to EIO.
Change your code to treat -1 differently from a "short" send.
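A minimal sketch of what that separation could look like. The helper name send_all is hypothetical (it is not in the question's code), and it assumes a blocking stream socket; on a non-blocking socket it would also return -1 with EAGAIN when the send buffer is full.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keeps calling send() until len bytes are written.
 * Returns 0 on success, or -1 with errno set by the failing send(). */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* errno is meaningful only on this path */
        }
        /* Short send: not an error, just advance and send the rest. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With something like this, the heartbeat code would inspect errno (EPIPE, ECONNRESET and so on) only when the helper returns -1, and a short send would never reach the error branch.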

Related

How to know which case happens when nonblocking recv returns 0?

I have a simple TCP server that runs with non-blocking sockets.
Quote from the man page of recv:
When a stream socket peer has performed an orderly shutdown, the return value will be 0 (the traditional "end-of-file" return).
The value 0 may also be returned if the requested number of bytes to receive from a stream socket was 0.
When socket is readable I read it with this code:
uint8_t buf[2048];
ssize_t rlen;
while(1){
    rlen = recv(fd, buf, sizeof(buf), 0);
    if(rlen < 0){
        /* some error came, let's close socket... */
    }
    else if(rlen == 0){
        /* is there no bytes to read? do we need break; in here? */
        /* is socket closed by peer? do we need to close socket? */
    }
    /* some code that process buf and rlen... */
}
How can we know which case happens when recv returns 0?
When recv returns 0 it means that the socket has been gracefully closed by the peer, so you can close it on your side as well. When no data is available, -1 is returned and errno is set to EAGAIN or EWOULDBLOCK, and the socket does not have to be closed. For non-blocking sockets this means that no data was immediately available when recv was called. For blocking sockets it means that no data arrived before the timeout (SO_RCVTIMEO) previously set with setsockopt() expired.
Finally, when -1 is returned and errno is set to a value other than EWOULDBLOCK or EAGAIN, the socket has to be closed, because some unrecoverable error occurred. (EINTR is the usual exception: the call was merely interrupted by a signal and can be retried.)
As you correctly quoted in your last edit, 0 can be returned from recv also if the requested size is 0.
How can we know which case happens when recv returns 0?
Just test the provided recv size (in this case it is the size of a constant array, so it doesn't make much sense; but in case it is a variable coming from elsewhere...):
size_t bufSize = sizeof(buf);
/* further code that, who knows, might affect bufSize */
rlen = read(fd, buf, bufSize);
if (rlen < 0) {
    if (errno != EAGAIN && errno != EWOULDBLOCK) {
        /* some error came, let's close socket... */
    }
}
else if (rlen == 0) {
    if (bufSize != 0) {
        /* peer closed the connection: close socket */
    }
}

EAGAIN on a blocking read system call on regular file

So, this is a weird case that I am seeing sometimes and not able to figure out a reason.
We have a C program that reads from a regular file. And there are other processes which write into the same file. The application is based on the fact that the writes are atomic in Linux for write size up to 4096 bytes.
The file is NOT opened with non blocking flag, so my assumption is that reads would be blocking.
But sometimes during the startup, we see "Resource temporarily unavailable" error set in errno. And the size returned by read != -1 but some partially read size.
An error message would look something like:
2018-08-07T06:40:52.991141Z, Invalid message size, log_s.bin, fd 670, Resource temporarily unavailable, read size 285, expected size 525
My questions are:
Why are we getting EAGAIN on blocking file read?
Why is the return value not -1?
This happens only during the initial time when it is started. It works fine thereafter. What are some edge cases that can get us in such situation?
Why are we getting EAGAIN on blocking file read ?
You aren't (see below).
Why is the return value not -1 ?
Because the operation did not fail.
The value of errno only carries a sane value if the call to read() failed. A call to read() failed if and only if -1 is returned.
From the Linux man-page for read():
RETURN VALUE
On success, the number of bytes read is returned (zero indicates end
of file), and the file position is advanced by this number. It is
not an error if this number is smaller than the number of bytes
requested;
[...]
On error, -1 is returned, and errno is set appropriately.
A common pattern for calling read() would be:
char buffer[BUFFER_MAX];
char * p = buffer;
size_t to_read = ...; /* not larger than BUFFER_MAX! */
while (to_read > 0)
{
    ssize_t result = read(..., p, to_read);
    if (-1 == result)
    {
        if (EAGAIN == errno || EWOULDBLOCK == errno)
        {
            continue;
        }
        if (EINTR == errno)
        {
            continue; /* or break depending on application design. */
        }
        perror("read() failed");
        exit(EXIT_FAILURE);
    }
    else if (0 < result)
    {
        to_read -= (size_t) result;
        p += (size_t) result;
    }
    else if (0 == result) /* end of file / connection shut down for reading */
    {
        break;
    }
    else
    {
        fprintf(stderr, "read() returned the unexpected value of %zd. You probably hit a (kernel) bug ... :-/\n", result);
        exit(EXIT_FAILURE);
    }
}
if (0 < to_read)
{
    fprintf(stderr, "Encountered early end of stream. %zu bytes not read.\n", to_read);
}

system call return values and errno

I am using following system calls in my program:
recvfrom
sendto
sendmsg
For each system call mentioned above, I check whether it completed without interruption, and if it was interrupted, I retry.
Ex:
recvagain:
len = recvfrom(fd, response, MSGSIZE, MSG_WAITALL, (struct sockaddr *)&from, &fromlen);
if (errno == EINTR) {
syslog(LOG_NOTICE, "recvfrom interrupted: %s", strerror(errno));
goto recvagain;
}
The problem is: do I need to reset errno to 0 every time it fails? Or does recvfrom() reset errno to 0 when it succeeds?
recvfrom() man page says:
Upon successful completion, recvfrom() returns the length of the message in bytes. If no messages are available to be received and the
peer has performed an orderly shutdown, recvfrom() returns 0.
Otherwise the function returns -1 and sets errno to indicate the
error.
The same applies to sendto and sendmsg.
I can't really check this now as I don't have access to a server-client setup.
Any idea?
Thanks
recvfrom returns -1 if it is interrupted (and sets errno to EINTR). Therefore, you should just check len:
if(len == -1) {
    if(errno == EINTR) {
        syslog(LOG_NOTICE, "recvfrom interrupted");
        goto recvagain;
    } else {
        /* some other error occurred... */
    }
}
The errno pseudo "variable" is not necessarily changed by successful syscalls, so a stale value from an earlier failure can remain. You could therefore clear it either before calling recvfrom, or after len<0 once you have tested its value.
See errno(3) man page for more.
Actually, as Robert Xiao (nneonneo) commented, you should not write to errno at all; just test it when the syscall has failed (in that case, the C function wrapping that syscall, e.g. recvfrom, will have written errno before returning -1).
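The same idea without goto and without ever writing to errno could be sketched as a small wrapper. The name recvfrom_retry is hypothetical; the only assumption beyond the question's code is that a retry after EINTR is always what the application wants.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wrapper: repeats recvfrom() while it fails with EINTR.
 * errno is read only after a -1 return and is never assigned here. */
static ssize_t recvfrom_retry(int fd, void *buf, size_t size, int flags,
                              struct sockaddr *from, socklen_t *fromlen)
{
    /* fromlen is a value-result argument, so remember its initial value. */
    socklen_t cap = fromlen ? *fromlen : 0;
    ssize_t n;

    do {
        if (fromlen)
            *fromlen = cap;     /* reset before every attempt */
        n = recvfrom(fd, buf, size, flags, from, fromlen);
    } while (n == -1 && errno == EINTR);

    return n;                   /* -1 here means a real error, errno is valid */
}
```

The call from the question would then become len = recvfrom_retry(fd, response, MSGSIZE, MSG_WAITALL, (struct sockaddr *)&from, &fromlen); followed by a single check for len == -1.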

socket select fails with operation in progress - Non blocking mode

Our application uses non-blocking sockets with connect and select operations (C code).
The pseudo code is as below:
unsigned int ConnectToServer(struct sockaddr_in *pSelfAddr,struct sockaddr_in *pDestAddr)
{
int sktConnect = -1;
sktConnect = socket(AF_INET,SOCK_STREAM,0);
if(sktConnect == INVALID_SOCKET)
return -1;
fcntl(sktConnect,F_SETFL,fcntl(sktConnect,F_GETFL) | O_NONBLOCK);
if(pSelfAddr != 0)
{
if(bind(sktConnect,(const struct sockaddr*)(void *)pSelfAddr,sizeof(*pSelfAddr)) != 0)
{
closesocket(sktConnect);
return -1;
}
}
errno = 0;
int nRc = connect(sktConnect,(const struct sockaddr*)(void *)pDestAddr, sizeof(*pDestAddr));
if(nRc != -1)
{
return sktConnect;
}
if(errno != EINPROGRESS)
{
int savedError = errno;
closesocket(sktConnect);
return -1;
}
fd_set scanSet;
FD_ZERO(&scanSet);
FD_SET(sktConnect,&scanSet);
struct timeval waitTime;
waitTime.tv_sec = 2;
waitTime.tv_usec = 0;
int tmp;
tmp = select(sktConnect +1, (fd_set*)0, &scanSet, (fd_set*)0,&waitTime);
if(tmp == -1 || !FD_ISSET(sktConnect,&scanSet))
{
int savedErrorNo = errno;
writeLog("Connect %s failed after select, cause %d, error %s",inet_ntoa(pDestAddr->sin_addr),savedErrorNo,strerror(savedErrorNo));
closesocket(sktConnect);
return -1;
}
.
.
.
.
.}
Problem statement
In the above code, select fails with error code 115, which is "Operation in progress". I do not see any documentation on select failing with errno 115.
a. When does select fail with error code 115 on a non-blocking socket? Under what scenario?
b. Are there any system logs that hint at this problem? My only concern is that I could not find any documentation describing such a problem.
PS : We are using SUSE Linux 11 Enterprise Edition.
The errno EINPROGRESS isn't from select(), it is left over from the prior connect() operation. You enter the block that reports it if either select() returned -1 or the FD isn't set. All this means is that the connection is still in progress. errno is never cleared, only set.
Some thoughts on your code:
I think the condition after the select can be changed to check only whether select returned a value greater than 0. If it did, call getsockopt on the socket with the SOL_SOCKET and SO_ERROR options (getsockopt(...,SOL_SOCKET, SO_ERROR,...,...)) to see whether connect has failed.
I am not very sure that select will always report the socket as writable when the connection succeeds. So, in your case, it may (only may) be that the tmp variable is not -1 and the errno being shown is the errno of the previous connect call.
Additional Reasons:
Another possible reason is that the destination address you are connecting to is either not reachable, or doesn't have a server waiting at the specified address + port combination. In that case, you can try once with a blocking socket to see if that connects.
As far as I understand, you are trying to make a connection with timeout.
If so, there is a error in your code. After connect() call but before select() you should remove O_NONBLOCK option using fcntl(). Otherwise the select() will always return at once because the operations with your socket (which has O_NONBLOCK) would not block.
The EINPROGRESS which you read is probably generated not by select() but by previous connect() call.
You also should not use bind() call here because connect() implicitly binds your address to socket.

Connection refused after some time on threaded process in tcp socket requests (c/linux)

I'm trying to write a process that takes a number of requests each second; for each request a new thread is created. Each thread then opens a socket connection to an address (HTTP port), sends a HEAD request, gets the response and closes the socket.
The problem comes when I make more than 3 requests per second: after some time I get an error in the send() part of the function, and I keep getting Connection Refused. If I make more requests per second, I get errors earlier. If I make only 2 requests per second, I don't get errors at all. I suspect that I'm running out of some resource, but I can't find which.
Here is basic structure of code
//declarations
socketfd = socket(servinfo->ai_family,servinfo->ai_socktype,servinfo->ai_protocol);
arg = fcntl(socketfd, F_GETFL, NULL);
arg |= O_NONBLOCK;
fcntl(socketfd, F_SETFL, arg);
if((conn = connect(socketfd, servinfo->ai_addr, servinfo->ai_addrlen)) < 0)
{
if(errno == EINPROGRESS)
{
do
{
tv.tv_sec = CONNECT_TIMEOUT;
tv.tv_usec = 0;
FD_ZERO(&myset);
FD_SET(socketfd, &myset);
if((res = select(socketfd+1, NULL, &myset, NULL, &tv) > 0))
{
if( (arg = fcntl(socketfd, F_GETFL, NULL)) < 0) {
perror("fcntl get 2");
}
arg &= (~O_NONBLOCK);
if( fcntl(socketfd, F_SETFL, arg) < 0) {
perror("fcntl set 2");
}
char szBuf[4096];
std::string htmlreq = "HEAD / HTTP/1.1\r\nHost:";
htmlreq += info->hostName;
htmlreq += "\r\n\r\n";
if((conn = send(socketfd,htmlreq.c_str(),htmlreq.size(),0)) == -1 && errno != EINTR)
{
perror("send");
close(socketfd);
return;
}
if((conn = recv(socketfd,szBuf,sizeof(szBuf)+1,0)) < 0 && errno != EINTR)
{
perror("recv");
close(socketfd);
return ;
}
close(socketfd);
// do stuff with data
break;
}
else
{
//timeout
break;
}
}while(1);
}
else
{
perror("connect");
close(socketfd);
return;
}
}
I removed some error checking from the start. What I get as output is "Send: Connection Refused" after some time. I'd appreciate some pointers to what part could be causing problems; the platform is Ubuntu Linux. I'd also be glad to post other parts of the code if needed. Thanks in advance.
The resource you're probably running out of is on the server you're connecting to. The connection is being refused by the computer you're connecting to because it's either:
Configured to limit the number of connections per second (based on some criteria)
Or the server you're connecting to is under too much load for some reason and can't take any more connections.
Since you always get the error on the third connection it could be that the server you're connecting to limits the number of connections on a per IP basis.
Edit1
You're trying to do a non-blocking connect? Now that I look at it closer it sounds like your problem is with the select, as in select is returning that the socket is readable before it's actually connected and then you're calling send. One of the things to watch out for on non-blocking connects is that the socket becomes both readable and writeable on error. Which means you need to check for both after select returns otherwise you may be missing whatever the actual error is and seeing the send error instead.
This is from Stevens UNP:
FD_ZERO(&rset);
FD_SET(sockfd, &rset);
wset = rset;
tval.tv_sec = nsec;
tval.tv_usec = 0;
if ( (n = Select(sockfd+1, &rset, &wset, NULL,
nsec ? &tval : NULL)) == 0) {
close(sockfd); /* timeout */
errno = ETIMEDOUT;
return(-1);
}
if (FD_ISSET(sockfd, &rset) || FD_ISSET(sockfd, &wset)) {
len = sizeof(error);
if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &len) < 0)
return(-1); /* Solaris pending error */
} else
err_quit("select error: sockfd not set");
done:
Fcntl(sockfd, F_SETFL, flags); /* restore file status flags */
if (error) {
close(sockfd); /* just in case */
errno = error;
return(-1);
}
return(0);
There are quite a few problems in your code.
First you set the socket to non blocking. I don't understand why you do this. The connect function has an internal timeout and so won't block.
Another problem of your code is that the first if statement will skip the instruction block if the connection immediately succeeds ! Which may happen.
You apparently want to first send the HEAD message. There is no real need to make this one non-blocking unless you expect the remote server or the network to be very slow and want a timeout on it. In that case the select with a non-blocking socket would make sense.
Once you send the HEAD message, you expect some data in response, which you collect with the recv function. Be aware that this function call may return before all of the data sent has been received. You need an independent way to determine that all the data sent has been received. Would the server close the connection? This would be detected by the recv function returning 0.
So the recv should be wrapped in a loop where you append the received data to some buffer or a file and quit when recv returns 0. Use a non-blocking socket if you want to add a timeout on this recv operation, which may indeed block.
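A rough sketch of such a receive loop, assuming a blocking socket and assuming the server really does close the connection after its response (otherwise the loop only ends when the buffer is full). The name recv_until_eof is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: appends received data to buf until the peer
 * closes (recv returns 0) or cap bytes are stored.
 * Returns the number of bytes stored, or -1 with errno set on error. */
static ssize_t recv_until_eof(int fd, char *buf, size_t cap)
{
    size_t used = 0;

    while (used < cap) {
        ssize_t n = recv(fd, buf + used, cap - used, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* orderly shutdown by the peer */
        used += (size_t)n;
    }
    return (ssize_t)used;
}
```

Note that the size passed to recv is always the space still free in the buffer, never more than the buffer holds.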
But first try without timeouts to be sure it works at full speed without blocking as your current version.
I suspect the initial connect is slow because of name and IP address resolution, and gets faster in subsequent calls because the data is cached.

Resources