I have a problem with a server socket under Linux. For some reason unknown to me, the server socket vanishes and I get a Bad file descriptor error in the select call that waits for an incoming connection. This problem always occurs when I close an unrelated socket connection in a different thread. This happens on an embedded Linux with a 2.6.36 kernel.
Does anyone know why this would happen? Is it normal that a server socket can simply vanish resulting in Bad file descriptor?
edit:
The other socket code implements a VNC Server and runs in a completely different thread. The only thing special in that other code is the use of setjmp/longjmp but that should not be a problem.
The code that creates the server socket is the following:
int server_socket = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
struct sockaddr_in saddr;
memset(&saddr, 0, sizeof(saddr));
saddr.sin_family = AF_INET;
saddr.sin_addr.s_addr = htonl(INADDR_ANY);
saddr.sin_port = htons(1234);
const int optionval = 1;
setsockopt(server_socket, SOL_SOCKET, SO_REUSEADDR, &optionval, sizeof(optionval));
if (bind(server_socket, (struct sockaddr *) &saddr, sizeof(saddr)) < 0) {
perror("bind");
return 0;
}
if (listen(server_socket, 1) < 0) {
perror("listen");
return 0;
}
I wait for an incoming connection using the code below:
static int WaitForConnection(int server_socket, struct timeval *timeout)
{
fd_set read_fds;
FD_ZERO(&read_fds);
int max_sd = server_socket;
FD_SET(server_socket, &read_fds);
// This select fails with EBADF ('Bad file descriptor') in the error case.
// Even though the server socket was not closed with 'close'.
int res = select(max_sd + 1, &read_fds, NULL, NULL, timeout);
if (res > 0) {
struct sockaddr_in caddr;
socklen_t clen = sizeof(caddr);
return accept(server_socket, (struct sockaddr *) &caddr, &clen);
}
return -1;
}
edit:
When the problem occurs, I currently just restart the server, but I don't understand why the server socket descriptor should suddenly become an invalid file descriptor:
int error = 0;
socklen_t len = sizeof (error);
int retval = getsockopt (server_socket, SOL_SOCKET, SO_ERROR, &error, &len );
if (retval < 0) {
close(server_socket);
goto server_start;
}
Sockets (file descriptors) usually suffer from the same management issues as raw pointers in C. Whenever you close a socket, do not forget to assign -1 to the variable that keeps the descriptor value:
close(socket);
socket = -1;
Just as you would do with a pointer in C:
free(buffer);
buffer = NULL;
If you forget to do this, you can later close the socket twice, just as you would free() memory twice if it were a pointer.
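The close-and-invalidate habit described above can be wrapped in a small helper. This is a minimal sketch; `close_fd` is a hypothetical name, not something from the original code:

```c
#include <unistd.h>

/* Close *fd if it currently holds a descriptor, then mark it invalid so
 * that a second call is a harmless no-op. Returns the result of close(),
 * or 0 if there was nothing to close. (close_fd is a hypothetical helper.) */
int close_fd(int *fd)
{
    int rc = 0;
    if (*fd >= 0) {
        rc = close(*fd);
        *fd = -1;   /* same idea as setting a freed pointer to NULL */
    }
    return rc;
}
```

Calling `close_fd(&server_socket)` twice closes the descriptor only once, so a descriptor number that the kernel has already handed out again (for example to a listening socket in another thread) is not closed by accident.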
The other issue might be related to a fact that people often forget: file descriptors in a UNIX environment start from 0. If somewhere in the code you have
struct FooData {
int foo;
int socket;
...
}
// Either
FooData my_data_1 = {0};
// Or
FooData my_data_2;
memset(&my_data_2, 0, sizeof(my_data_2));
In both cases my_data_1 and my_data_2 hold a valid descriptor value (socket == 0). Later, some piece of code responsible for freeing the FooData structure may blindly close() this descriptor, which may happen to be your server's listening socket (0).
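One way to avoid the zero-initialization trap is to give the structure an explicit initializer that stores -1, and to close only non-negative descriptors. The following is a sketch under that assumption; `foo_data_init` and `foo_data_destroy` are hypothetical names:

```c
#include <unistd.h>

struct foo_data {
    int foo;
    int socket;   /* -1 means "no descriptor owned" */
};

/* Initialize explicitly instead of relying on memset() or {0}. */
void foo_data_init(struct foo_data *d)
{
    d->foo = 0;
    d->socket = -1;   /* not 0: descriptor 0 is a real, valid descriptor */
}

/* Returns 1 if a descriptor was closed, 0 if there was none to close. */
int foo_data_destroy(struct foo_data *d)
{
    if (d->socket < 0)
        return 0;
    close(d->socket);
    d->socket = -1;
    return 1;
}
```

With this convention, destroying a structure that never opened a socket does not touch descriptor 0.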
1- close your socket:
close(sockfd);
2- clear your socket file descriptor from select set:
FD_CLR(sockfd,&master); //opposite of FD_SET
You don't distinguish the two error cases in your code: both select and accept can fail. My guess is that you just have a timeout and that select returns 0.
Print the return value and errno in an else branch.
Investigate the return value of accept separately.
Ensure that errno is reset to 0 before each of the system calls.
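The checks listed above could look roughly like this. It is a sketch, not the asker's code: the result codes and the name `wait_for_connection` are assumptions, and instead of resetting errno beforehand it reads errno only immediately after a call has reported failure, which is the only time its value is meaningful:

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical result codes; any non-negative return is the accepted fd. */
enum { WAIT_TIMEOUT = -1, WAIT_SELECT_ERROR = -2, WAIT_ACCEPT_ERROR = -3 };

int wait_for_connection(int server_socket, struct timeval *timeout)
{
    fd_set read_fds;
    FD_ZERO(&read_fds);
    FD_SET(server_socket, &read_fds);

    int res = select(server_socket + 1, &read_fds, NULL, NULL, timeout);
    if (res == 0)
        return WAIT_TIMEOUT;            /* not an error: nobody connected */
    if (res < 0) {
        fprintf(stderr, "select: %s (errno %d)\n", strerror(errno), errno);
        return WAIT_SELECT_ERROR;       /* e.g. EBADF or EINTR */
    }

    struct sockaddr_in caddr;
    socklen_t clen = sizeof(caddr);
    int client = accept(server_socket, (struct sockaddr *)&caddr, &clen);
    if (client < 0) {
        fprintf(stderr, "accept: %s (errno %d)\n", strerror(errno), errno);
        return WAIT_ACCEPT_ERROR;
    }
    return client;
}
```

A caller can then restart the server only on `WAIT_SELECT_ERROR` and simply loop again on `WAIT_TIMEOUT`.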
On Linux, once a connection has been created and closed, you have to wait for some time before making a new connection,
because the socket does not release the port number as soon as you close it.
OR
Reuse the socket; then the Bad file descriptor error won't occur.
I have a while(1) loop that uses recvfrom to get data that has been sent to a domain socket from another process (P2).
The while loop needs to do 2 things, firstly listen for incoming data from P2, and secondly run another function checkVoltage().
So it runs a little something like this:
while(true)
{
listenOnSocket(); /* listens for 100 microseconds */
checkVoltage();
}
My issue is this: the listenOnSocket() function uses the recvfrom function to check for an input from another process. It spends 100usecs listening, then times out and proceeds to run the checkVoltage() function. So it spends like 99% of the time in the listenOnSocket() function. My issue is that if P2 sends information to the socket during the checkVoltage() function, then it will result in an error, stating: sending datagram message: No such file or directory.
Is there a way to have this loop check for any data that has been sent to the socket previously? That way if P2 sends data during the checkVoltage() function, it will not result in an error.
Thanks.
EDIT:
So the listenOnSocket() function creates a socket with the name FireControl. When I run P1 (the program that receives data from P2), the FireControl file vanishes for a split second and then reappears. If P2 sends data to P1 during this short period, it results in the error mentioned above.
So I guess this means I should separate the creation of the socket from the recvfrom function, because the short period where the new socket is created it does not exist - if that makes sense.
I'm a dope, I should've separated them in the first place!
EDIT2: Here is listenOnSocket():
command listenOnSocket(int timeout, float utimeout) /*Returns null payload when no input is detected*/
{
command payload;
int sock;
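socklen_t length = sizeof(struct sockaddr_un); /* recvfrom() expects a socklen_t *, initialized to the address buffer size */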
struct sockaddr_un name;
char buf[1024];
struct timeval tv;
tv.tv_sec = timeout;
tv.tv_usec = utimeout;
/* Create socket from which to read. */
sock = socket(AF_UNIX, SOCK_DGRAM, 0);
if (sock < 0)
{
perror("opening datagram socket");
payload = nullPayload;
}
/* Create name. */
name.sun_family = AF_UNIX;
strcpy(name.sun_path, NAME);
unlink(name.sun_path);
/* Bind the UNIX domain address to the created socket */
if (bind(sock, (struct sockaddr *) &name, sizeof(struct sockaddr_un)))
{
perror("binding name to datagram socket\n");
payload = nullPayload;
}
/*Socket has been created at NAME*/
if (timeout != 0 || utimeout != 0)
{
setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv, sizeof(struct timeval));
}
else
{
tv.tv_sec = 0;
tv.tv_usec = 0;
setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv, sizeof(struct timeval));
}
/* Read from the socket */
if (recvfrom(sock, &payload, sizeof(command), 0, (struct sockaddr *)&name, &length) < 0) /*Less than zero results from a timeout*/
{
payload = nullPayload;
}
unlink(NAME);
return payload;
}
and here is the loop that calls it:
while (1)
{
buffer = getADCValue();
checkVoltage();
temp = listenOnSocket(0, 100); /*Look for a new command*/
doStuffWithTempIfItHasChanged();
}
}
I guess this means I should separate the creation of the socket from the recvfrom function, because the short period where the new socket is created it does not exist
That is correct. If you open and close the socket every time in your listenOnSocket() function, (a) you will lose any queued datagrams that you hadn't read yet, and (b) sends while the socket is closed will fail, of course, because there is nothing for them to send to.
Once you've bound the socket, the datagrams will accumulate in a buffer and can be read later using recvfrom. That said, if the buffer overflows, messages may be discarded.
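A rough sketch of that separation follows: the socket is created and bound once, and the loop only polls it. The names `open_command_socket` and `poll_command` are hypothetical, and `MSG_DONTWAIT` is used here in place of the original `SO_RCVTIMEO` timeout:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Create and bind the datagram socket ONCE, at startup.
 * Returns the descriptor, or -1 on failure. */
int open_command_socket(const char *path)
{
    struct sockaddr_un name;
    int sock = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (sock < 0) {
        perror("opening datagram socket");
        return -1;
    }
    memset(&name, 0, sizeof(name));
    name.sun_family = AF_UNIX;
    strncpy(name.sun_path, path, sizeof(name.sun_path) - 1);
    unlink(path);   /* remove a stale socket file left by a previous run */
    if (bind(sock, (struct sockaddr *)&name, sizeof(name)) < 0) {
        perror("binding name to datagram socket");
        close(sock);
        return -1;
    }
    return sock;
}

/* Non-blocking check for a queued datagram.
 * Returns the datagram length, 0 if nothing is queued, -1 on error.
 * (A zero-length datagram looks the same as "nothing queued" here.) */
ssize_t poll_command(int sock, void *buf, size_t len)
{
    ssize_t n = recv(sock, buf, len, MSG_DONTWAIT);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}
```

The main loop would call `open_command_socket()` once before `while (1)` and `poll_command()` on each pass, so a datagram sent during `checkVoltage()` simply waits in the socket's receive buffer until the next poll.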
I wrote a server/client program that uses select to check a socket. But when the client closes its socket (the TCP state on the server becomes CLOSE_WAIT), select always returns 1 and errno is 0.
Why does select return 1? The TCP socket has nothing to read now!
server:
int sock = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in addr;
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
addr.sin_port = htons(6999);
socklen_t socklen = sizeof(struct sockaddr_in);
bind(sock, (struct sockaddr *)&addr, socklen);
listen(sock, 0);
int clisock;
clisock = accept(sock, NULL, NULL);
fd_set backset, rcvset;
struct timeval timeout;
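timeout.tv_sec = 3;
timeout.tv_usec = 0; // both fields of the timeout must be initialized
int maxfd = clisock+1;
FD_ZERO(&rcvset); // an fd_set must be cleared before its first use
FD_SET(clisock, &rcvset);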
backset = rcvset;
int ret;
while(1) {
rcvset = backset;
timeout.tv_sec = 3;
ret = select(maxfd, &rcvset, NULL, NULL, &timeout);
if(ret <= 0)
continue;
sleep(1);
printf("ret:%d, %s\n",
ret, strerror(errno));
}
client:
int sock = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in addr;
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = inet_addr("127.0.0.1");
addr.sin_port = htons(6999);
socklen_t socklen = sizeof(struct sockaddr_in);
connect(sock, (struct sockaddr *)&addr, socklen);
sleep(3);
close(sock);
sleep(100);
output:
./server
ret:1, Success
ret:1, Success
ret:1, Success
The socket is readable because the peer has closed it: when you read from it you will get end of stream (recv() returns 0) rather than nothing.
CLOSE_WAIT means that TCP is waiting for you to close the socket. So close it.
Select returns because there is an event on one of the sockets it monitors. The documentation uses the term "readable". In this case it is somewhat misleading, since the socket on the other end has been closed and there are no bytes to read from it. The reason the documentation is phrased like that is that select works on any kind of file descriptor. The "file" could be a socket, a pipe or a normal file. They didn't want to get entangled in the specifics of the different types of file descriptor.
That the socket on the other end is closed is normal, so select should not return an error in this case. When you actually read from your socket, you will first get any data that is still buffered; once that is consumed, recv() returns 0 (end of stream) if the connection was closed gracefully on the other end, or fails with an error such as ECONNRESET if it was reset.
Since select can monitor several file descriptors at once, and uses a single bit for each file descriptor it would be impossible to differentiate between "data has arrived" and "the socket on the other end has been closed". Both events will flag the socket as "readable".
The same goes when monitoring for writing. If the other side closes its endpoint the socket will be flagged as "writable" as far as select is concerned. You won't get the error until you actually try writing to the socket.
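To make the "readable means data or end of stream" point concrete, here is a minimal sketch of a receive loop that tells the two apart by looking at the return value of recv(). The function name `read_until_peer_closes` is hypothetical, and the received data is only counted, not processed:

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait on fd with select() and read until the peer closes the connection.
 * Returns the total number of bytes read, or -1 on error.
 * The descriptor is closed before returning. */
long read_until_peer_closes(int fd)
{
    char buf[256];
    long total = 0;

    for (;;) {
        fd_set rset;
        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        if (select(fd + 1, &rset, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: just retry */
            close(fd);
            return -1;
        }
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0) {           /* "readable" with 0 bytes: peer closed */
            close(fd);          /* closing is what ends CLOSE_WAIT on TCP */
            return total;
        }
        total += n;             /* real data arrived */
    }
}
```

In the server from the question, the missing step is the recv() call: without it, select keeps reporting the same end-of-stream condition on every iteration.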
I have developed a TCP server on one of my embedded devices using lwIP + FreeRTOS.
This is the flow for how my device (let's call it lwipDevice) communicates with the other device (which runs Linux; let's call it LinuxDevice):
1. lwipDevice sends a UDP packet to LinuxDevice to announce that it is ready to share its information.
2. LinuxDevice recognises this message successfully and sends one TCP packet (containing a command) to lwipDevice, telling it to send its information.
3. On the lwipDevice side this message is not received, so it does not send any response to LinuxDevice. Steps 1-3 then repeat again and again.
This is the lwipDevice code for the TCP server:
long server_sock=-1;
#define FAIL 1
#define PASS 0
int CreateTcpServerSocket(long *pSock, int port)
{
struct sockaddr_in sin;
int addrlen = sizeof(sin);
int e;
struct linger linger;
linger.l_linger=0;
int i = 1;
*pSock = socket(AF_INET, SOCK_STREAM, 0);
if (*pSock == -1)
{
printf("*** CreateTcpServerSocket:open sock error,port %d\n",port);
return FAIL;
}
memset((char *)&sin, 0, sizeof(sin));
sin.sin_family = AF_INET;
sin.sin_len = sizeof(sin);
sin.sin_addr.s_addr = htonl(INADDR_ANY); /* wildcard IP address */
sin.sin_port = htons(port);
e = bind(*pSock, (struct sockaddr*)&sin, addrlen);
if (e != 0)
{
printf("error %d binding tcp listen on port %d\n", e, port);
closesocket(*pSock);
*pSock = -1;
return FAIL;
}
lwip_ioctl(*pSock, FIONBIO, &i); //Set Non blocking mode
e = listen(*pSock, 2);
if (e != 0)
{
printf("error :listen on TCP server\n");
closesocket(*pSock);
*pSock = -1;
return FAIL;
}
return PASS;
}
void vTCPTask(void *parm)
{
struct sockaddr client; /* for BSDish accept() call */
socklen_t clientsize = sizeof(client); /* accept() reads this as the size of the address buffer */
long sock;
if(CreateTcpServerSocket(&server_sock, 8000) == FAIL) //Here server created successfully
{
printf("Fail to create server!!!!!\n");
server_sock=-1;
}
while(1)
{
// some code for other stuff
sock= accept(server_sock, &client, &clientsize); //This line always fails and returns -1
if(sock != -1)
{
printf("accepted socket:\n\n");
//...now receive data from client....
// send some data to client
}
// some code for other stuff
//sleep for 15 seconds
}
}
int main()
{
//...initialization of lwip stack
//....some more code
//...................
xTaskCreate(vTCPTask, (signed char const *) "tcptask",
356, NULL, 3, (xTaskHandle *)&xNotifierServiceTaskHandle);
/* Start the scheduler */
vTaskStartScheduler();
return 1;
}
I have checked lwip_accept function and it will return from this condition:
if (netconn_is_nonblocking(sock->conn) && (sock->rcvevent <= 0))
{
LWIP_DEBUGF(SOCKETS_DEBUG, ("lwip_accept(%d): returning EWOULDBLOCK\n", s));
sock_set_errno(sock, EWOULDBLOCK);
return -1;
}
EDIT:
I know that the netconn_is_nonblocking(sock->conn) condition will always be true because I have set the socket to non-blocking mode. But why is sock->rcvevent always zero even though LinuxDevice has already sent a packet to it?
EDIT:
For testing purposes I commented out all the other code in the task (see //some code for other stuff). The socket is then accepted successfully and I try to receive the packet, but now it gets stuck in the lwip_recvfrom function (note: LinuxDevice continues to send packets). Debugging further, I found that it is stuck in the function sys_arch_mbox_fetch (call flow: lwip_recvfrom\netconn_recv\netconn_recv_data\sys_arch_mbox_fetch).
Does anyone have an idea what is wrong with it?
You have configured the socket as non-blocking, so the accept() call will never block. If there is no incoming connection pending it will return the EWOULDBLOCK error code, which is what you see.
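A non-blocking accept therefore has three outcomes, not two. The sketch below shows one way to separate them; it assumes POSIX sockets on Linux rather than lwIP (where the question uses `lwip_ioctl` with `FIONBIO`), and `set_nonblocking`, `try_accept` and the result codes are hypothetical names:

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes; any non-negative return is the accepted fd. */
enum { TRY_ACCEPT_NONE = -1, TRY_ACCEPT_ERROR = -2 };

/* Put a descriptor into non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Try to accept one connection without blocking. */
int try_accept(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);
    if (fd >= 0)
        return fd;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return TRY_ACCEPT_NONE;     /* no connection pending yet: retry later */
    return TRY_ACCEPT_ERROR;        /* a real failure */
}
```

Only `TRY_ACCEPT_ERROR` indicates a problem; `TRY_ACCEPT_NONE` just means the loop should come back later, ideally sooner than the 15 second sleep in the task.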
I finally figured out the cause of the issue.
In the lwipopt.h file there is a macro like this:
/* Non-static memory, used with DMA pool */
#ifdef __CODE_RED
#define MEM_SIZE (6 * 1024)
#else
#define MEM_SIZE (24 * 1024)
#endif
I have defined __CODE_RED, so MEM_SIZE is (6 * 1024). When I change that memory size to (16 * 1024), everything works fine.
Now the connection is accepted every time, and after that I am able to send and receive TCP packets successfully.
Where do you set rcvevent? Your code doesn't reveal it. I suppose it's the result of recv (or read). Reading from a non-blocking socket that has no data available yet returns -1 with errno set to EAGAIN, which satisfies your rcvevent <= 0 condition. You have to check for these specific error codes explicitly.
But why sock->rcvevent always zero even LinuxDevice already send packet to it?
Have you tried sending data with telnet or netcat to be sure the error is in your server and not in your client? Maybe your client is not sending to the correct destination, or something else.
I have a problem with a program. The program should be triggered by UDP messages, that's why I open a nonblocking UDP socket, because I want to use it later again. After that the program should open a file, copy out a certain amount of bytes and send it to a browser.
The problem occurs when I want to open a file: I get a "Resource temporarily unavailable" error.
Here is a simple program, which creates the same fault.
main part:
udp_openPort(9999);
for(;;){
if(udp_receiveData(temp, 32) > 0){
printf("Received Message: %d\n",atoi(temp));
break;
}
}
filefd = open("test.txt",O_RDONLY);
printf("File Open: %s\n",strerror(errno));
read(filefd,buff,sizeof(buff));
printf("Daten: %s",buff);
close(filefd);
udp_closePort();
udp_receiveData():
int udp_receiveData(void* data, int size){
socklen_t dummy = sizeof(NetworkAddr);
NetworkAddr sender;
return recvfrom(sockfd, data, size, MSG_DONTWAIT, (struct sockaddr*) (&sender), &dummy);
}
When I open the socket as a blocking socket, there is no problem with opening the file, but I need a nonblocking socket for my purpose.
Did I make a mistake in coding the program, or did I make a mistake when I planned the program?
Best regards,
PG
EDIT: Here is the udp_openPort() function:
int udp_openPort(int portNr){
//Create handle to socket
sockfd = socket(AF_INET, SOCK_DGRAM, 0);
if (sockfd == -1){
return 0;
}
// Make sure that we don't receive our own packets.
char loop = 0;
if (setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop)) == -1){
}
// Bind to the port where we receive UDP messages.
NetworkAddr addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(portNr);
addr.sin_addr.s_addr = htonl(INADDR_ANY);
if (bind(sockfd, (struct sockaddr*) &addr, sizeof(addr)) == -1){
return 0;
}
return 1;
}
Maybe someone sees a problem in here.
Looking at the manpage of open, the errno is set if and only if the error occurred, i.e. when returned descriptor is -1. Hence, your error handling is incorrect. You should have checked the value of filefd in order to determine if the file has been opened or not.
Because the file in fact was opened correctly, errno hasn't been modified and your error message was set by udp_receiveData; in this case, as your socket is non-blocking, there was no data on UDP socket (resource temporarily unavailable). You don't experience this with a blocking socket, as your program is then sleeping waiting for a message to arrive.
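The corrected check could look like the sketch below, which consults errno only when open() has returned -1. `open_readonly` is a hypothetical wrapper name:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Open a file read-only and report an error only if open() actually failed.
 * errno is meaningful only when the return value is -1; after a successful
 * call it may still hold a stale value such as EAGAIN from an earlier
 * non-blocking recvfrom(). */
int open_readonly(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;      /* fprintf() itself may change errno */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        errno = saved;
        return -1;
    }
    return fd;
}
```

With this wrapper the program from the question would print nothing for test.txt, because the open succeeds and the leftover EAGAIN is never inspected.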
Are you sure that the error you see after open does not come from udp_openPort? Perhaps udp_openPort does something wrong and sets the errno variable, which you later misinterpret as a file open error.
It seems that when I use the recv() function (in a TCP file transfer program) like this
while((count = recv(socketConnection, buff, 100000, 0))>0)
myfile.write(buff,count);
the recv() function just waits until all the data has arrived and exits the loop when it is no longer receiving any data, but in a similar UDP program
while((n = recvfrom(sockfd,mesg,1024,0,(struct sockaddr *)&cliaddr,&len))>0)
myfile.write(mesg,n);
the recvfrom() function just blocks and does not exit the loop for some reason. As far as I know, both recv() and recvfrom() are blocking, right? Then why the difference? Does it have something to do with the functions themselves, or with the nature of TCP and UDP (which I guess is not the reason)?
P.S. Please help me understand this guys, I'm a newbie to socket programming and networking.
EDIT: full server program for both TCP and UDP
UDP server (with recvfrom() )
int i=0;
int sockfd,n;
struct sockaddr_in servaddr,cliaddr;
socklen_t len = sizeof(cliaddr); // recvfrom() reads len as the size of the address buffer
char mesg[1024];
sockfd=socket(AF_INET,SOCK_DGRAM,0);
bzero(&servaddr,sizeof(servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr=htonl(INADDR_ANY);
servaddr.sin_port=htons(32000);
bind(sockfd,(struct sockaddr *)&servaddr,sizeof(servaddr));
ofstream myfile;
// fcntl(sockfd,F_SETFL,O_NONBLOCK);
myfile.open("2gb",ios::out);
while((n = recvfrom(sockfd,mesg,1024,0,(struct sockaddr *)&cliaddr,&len))>0)
myfile.write(mesg,n);
TCP (recv() ) server program
struct sockaddr_in socketInfo;
char sysHost[MAXHOSTNAME+1]; // Hostname of this computer we are running on
struct hostent *hPtr;
int socketHandle;
int portNumber = 8070;
//queue<char*> my_queue;
bzero(&socketInfo, sizeof(sockaddr_in)); // Clear structure memory
gethostname(sysHost, MAXHOSTNAME); // Get the name of this computer we are running on
if((hPtr = gethostbyname(sysHost)) == NULL)
{
cerr << "System hostname misconfigured." << endl;
exit(EXIT_FAILURE);
}
if((socketHandle = socket(AF_INET, SOCK_STREAM, 0)) < 0)
{
close(socketHandle);
exit(EXIT_FAILURE);
}
// std::cout<<"hi starting server";
socklen_t optlen;
int rcvbuff=262144;
optlen = sizeof(rcvbuff);
socketInfo.sin_family = AF_INET;
socketInfo.sin_addr.s_addr = htonl(INADDR_ANY);
socketInfo.sin_port = htons(portNumber); // Set port number
if( bind(socketHandle, (struct sockaddr *) &socketInfo, sizeof(socketInfo)) < 0)
{
close(socketHandle);
perror("bind");
exit(EXIT_FAILURE);
}
listen(socketHandle, 1);
int socketConnection;
if( (socketConnection = accept(socketHandle, NULL, NULL)) < 0)
{
exit(EXIT_FAILURE);
}
close(socketHandle);
time_start(boost::posix_time::microsec_clock::local_time());
int rc = 0; // Actual number of bytes read
int count=0;
char *buff;
int a=100000;
buff=new char[a];
ofstream myfile;
myfile.open("345kb.doc",ios::out|ios::app);
if(myfile.is_open())
{
long i=0;
while((count = recv(socketConnection, buff, 100000, 0))>0)
{
myfile.write(buff,count);
}}
the function recv() just waits until the whole data comes and exits the loop when it is no more receiving any data
recv() on a TCP connection returns 0 when the sending side has closed the connection and this is the condition for your loop to terminate.
for a UDP program the recvfrom() function just blocks and does not exit the loop for some reason,
Because UDP is a connectionless protocol, there is no special return value from recvfrom() that signals a closed connection. The call only returns 0 if someone sends you a zero-length datagram.
recv() ends the loop because the socket was closed on the other side, so recv() returns 0 (socket gracefully closed). recvfrom() has no such signal: the socket is unconnected, so it knows nothing about the sender closing. It stays blocked until it receives a packet or times out. With UDP you need your own way to tell the receiver that the communication is finished.
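One possible convention, sketched below, is for the sender to finish with a zero-length datagram and for the receiver to treat that as the end-of-transfer marker. This is an assumption about the application protocol, not something UDP provides, and `receive_until_marker` is a hypothetical name; since UDP may drop or reorder datagrams, a real transfer would also need a timeout or sequence numbers:

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Receive datagrams and append them to `out` until a zero-length datagram
 * arrives, which this sketch treats as the sender's end-of-transfer marker.
 * Returns the total payload bytes received, or -1 on a receive error. */
long receive_until_marker(int sockfd, FILE *out)
{
    char mesg[1024];
    long total = 0;

    for (;;) {
        struct sockaddr_in cliaddr;
        socklen_t len = sizeof(cliaddr);   /* must be set before each call */
        ssize_t n = recvfrom(sockfd, mesg, sizeof(mesg), 0,
                             (struct sockaddr *)&cliaddr, &len);
        if (n < 0)
            return -1;
        if (n == 0)
            return total;                  /* zero-length datagram: done */
        fwrite(mesg, 1, (size_t)n, out);
        total += n;
    }
}
```

The sender signals completion with `sendto(sock, "", 0, 0, ...)` after its last data datagram.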