Our software had problems connecting to a SIEMENS PLC. We created a socket and repeatedly called connect() on it, always receiving WSAETIMEDOUT. Telnetting to the PLC on the exact same IP and port worked. Pseudocode below:
// Does not work
SOCKET reconnect(char* ip) {
SOCKET sock = socket(PF_INET,SOCK_STREAM,0);
struct sockaddr_in addr = make_addr();
int err;
while(1) {
err = connect(sock,(struct sockaddr FAR*) &addr,sizeof(addr));
if( err==SOCKET_ERROR ) {
log(); // WSAETIMEDOUT logged here
continue;
}
return sock;
}
}
After changing the code to create a new socket for each connect() call, it seems to work...
// Works
SOCKET reconnect(char* ip) {
struct sockaddr_in addr = make_addr();
int err;
while(1) {
SOCKET sock = socket(PF_INET,SOCK_STREAM,0);
err = connect(sock,(struct sockaddr FAR*) &addr,sizeof(addr));
if( err==SOCKET_ERROR ) {
log(); // WSAETIMEDOUT logged here
closesocket(sock);
continue;
}
return sock;
}
}
The first snippet has been running successfully in production for ~20 years, across multiple versions of Windows. I suspect it doesn't follow the specs though... Have there been any changes/updates to Windows Server 2012 R2 (which is what the customer is running) that change this behavior?
EDIT
According to the docs,
If the error code returned indicates the connection attempt failed
(that is, WSAECONNREFUSED, WSAENETUNREACH, WSAETIMEDOUT) the
application can call connect again for the same socket.
... which makes this even more puzzling.
Your code was always wrong. A failed connect() always hoses the socket. You were lucky it worked so long.
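For illustration, here is a minimal sketch of the "new socket per attempt" pattern with a bounded number of retries. It is written against POSIX sockets rather than Winsock so that it compiles on its own; under Winsock, closesocket() would take the place of close(). The helper name connect_with_fresh_sockets and the max_attempts parameter are assumptions made for this example, not part of the original code.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to connect up to max_attempts times, using a
 * brand-new socket for every attempt. Returns a connected descriptor, or -1
 * with errno set by the last failed call. */
int connect_with_fresh_sockets(const struct sockaddr_in *addr, int max_attempts)
{
    assert(addr != NULL && max_attempts > 0);

    int last_errno = 0;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;               /* errno already set by socket() */

        if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
            return fd;               /* success: the caller owns the descriptor */

        last_errno = errno;          /* remember why this attempt failed */
        close(fd);                   /* discard the socket instead of reusing it */
    }
    errno = last_errno;
    return -1;
}
```

A real reconnect loop would probably also sleep between attempts; that is left out here to keep the sketch short.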
Related
I expect the following code* to fail since the server address hasn't been set with a valid value (verified in debugger - the whole struct is indeed initialized to 0, making address family AF_UNSPEC).
* incomplete illustrative snippet
static struct sockaddr_in g_server_addr;
int main(void)
{
int hdl;
hdl = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (-1 == hdl)
{
printf("Client Socket creation failed.");
return -1;
}
if(-1 == connect(hdl, (struct sockaddr *) &g_server_addr, sizeof(g_server_addr)))
{
printf("Connect() on socket failed.");
return -1;
}
return 0;
}
I need connect() to fail when called with incorrect input.
(This code is being tested on an Ubuntu machine.)
From the connect manpage:
Connectionless sockets may dissolve the association by connecting to an address with the sa_family member of sockaddr set to AF_UNSPEC (supported on Linux since kernel 2.2).
The manpage is a bit outdated: this works on any socket that can be disconnected at all, for example TCP sockets.
In practice, there is no error when trying to dissolve the association on a stream socket that is not yet connected. This is why you don't get an error.
If you need to get an error, initialize the address family with an invalid family:
static struct sockaddr_in g_server_addr = { -1 };
connect() will then return -1 with errno set to EAFNOSUPPORT (Address family not supported by protocol).
See also net/ipv4/af_inet.c of a recent linux kernel:
int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr,
int addr_len, int flags) {
...
if (uaddr->sa_family == AF_UNSPEC) {
err = sk->sk_prot->disconnect(sk, flags);
sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED;
goto out;
}
...
out:
return err;
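To see the difference between the two families on your own machine, a small probe like the following may help. It is only a sketch: the function name connect_errno_for_family is made up for this example, and it reports whatever errno connect() sets on a fresh, unconnected TCP socket.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: call connect() on a fresh, unconnected TCP socket with
 * an otherwise zeroed sockaddr_in whose family is `family`.
 * Returns 0 if connect() succeeded, otherwise the errno it set. */
int connect_errno_for_family(sa_family_t family)
{
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    assert(fd >= 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = family;        /* everything else stays zero */

    int result = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        result = errno;

    close(fd);
    return result;
}
```

Given the Linux behaviour described above, one would expect connect_errno_for_family(AF_UNSPEC) to return 0 and connect_errno_for_family((sa_family_t)-1) to return EAFNOSUPPORT; other systems may treat AF_UNSPEC differently.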
I am trying to read from a UDP port, from a local (loopback) application, using IOCP. IOCP works fine for TCP/IP, but I am unable to open the socket properly for UDP.
This is what I am doing:
// same as for tcp/ip
struct sockaddr_in remoteAddr = { 0 };
remoteAddr.sin_addr.s_addr = LOOPBACK_ADDRESS;
remoteAddr.sin_family = AF_INET;
remoteAddr.sin_port = htons(portNumber);
// using SOCK_DGRAM here
SOCKET sock = INVALID_SOCKET;
sock = WSASocketW(AF_INET, SOCK_DGRAM, IPPROTO_IP,
NULL, 0, WSA_FLAG_OVERLAPPED);
if( sock == INVALID_SOCKET ) {
LOG("WSASocketW failed: %d", WSAGetLastError());
return;
}
nRet = WSAConnect(sock, (const struct sockaddr*)&remoteAddr, sizeof(remoteAddr),
NULL, NULL, NULL, NULL);
if (nRet == SOCKET_ERROR)
{
LOG("WSAConnect failed: %d", WSAGetLastError());
return;
}
nRet = WSARecv(sock, &wsabuf, 1, NULL, &flags, &overlapped, NULL);
if (nRet == SOCKET_ERROR && (ERROR_IO_PENDING != WSAGetLastError()))
{
LOG("WSARecv failed: %d", WSAGetLastError());
return;
}
// no errors of any kind
LOG("seems good so far");
Everything passes without errors, but GetQueuedCompletionStatus inside the worker loop thread never returns. If I do the same thing to connect to a TCP socket (just replace SOCK_DGRAM with SOCK_STREAM basically), I get data inside the loop.
Am I doing something obviously wrong?
(Btw) I know I could use WSARecvFrom, but I would like to reuse as much code as possible from the TCP socket. I.e. hopefully, set everything up and then post WSARecv calls inside the worker thread regardless of the type of the socket (WSARecv is supposed to work with UDP properly, AFAIK).
Managed to get it to work, thanks to the comment by @WouterHuysentruit.
Basically, if I want to receive UDP packets using WSARecv, I need to bind. If I want to send UDP packets using WSASend, I need to connect. So the following works:
if (port_type == incoming_packets)
{
// bind to port
ret = bind(*sock, (const struct sockaddr*)&remoteAddr, sizeof(remoteAddr));
...
WSARecv(...);
}
else
{
// this can send to a loopback udp port which is bound in
// a different application
ret = WSAConnect(*sock, (const struct sockaddr*)&remoteAddr, sizeof(remoteAddr), ...);
...
WSASend(...);
}
As others have pointed out, WSARecvFrom/WSASendTo are usually a better choice for UDP, but in this case I can support multiple port types using IOCP transparently.
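The bind-to-receive / connect-to-send split is not specific to IOCP. As a rough illustration, here is a POSIX analogue that sends one datagram over loopback and reads it back; it uses plain send() and recv() instead of the WSA* calls, and the function name udp_loopback_roundtrip and the 2-second receive timeout are assumptions made for this sketch.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: send one datagram over loopback and read it back.
 * The receiving socket is bound; the sending socket is connected.
 * Returns the number of bytes received, or -1 on any failure. */
ssize_t udp_loopback_roundtrip(const char *msg, char *out, size_t out_len)
{
    assert(msg != NULL && out != NULL && out_len > 0);

    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct timeval timeout = { .tv_sec = 2, .tv_usec = 0 };
    ssize_t received = -1;

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;

    /* Receiver: bind to loopback; port 0 lets the kernel pick a free port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) != 0)
        goto done;
    if (getsockname(rx, (struct sockaddr *)&addr, &len) != 0)
        goto done;

    /* Do not wait forever if the datagram never arrives. */
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));

    /* Sender: connect() only fixes the default destination for send(). */
    if (connect(tx, (struct sockaddr *)&addr, sizeof(addr)) != 0)
        goto done;
    if (send(tx, msg, strlen(msg), 0) < 0)
        goto done;

    received = recv(rx, out, out_len, 0);

done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return received;
}
```

The receiver never calls connect() and the sender never calls bind(), which mirrors the two branches of the snippet above.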
I am facing one of the strangest programming problems in my life.
I've built a few servers in the past and the clients would connect normally, without any problems.
Now I'm creating one which is basically a web server. However, I'm facing a VERY strange situation (at least to me).
Suppose that you connect to localhost:8080, accept() accepts your connection, and the code then processes your request in a separate thread. (The idea is to have multiple forks, with threads in each child. That is temporarily implemented in another file, but I see this issue in that setup as well, so it is better to keep it simple first.) So your request gets processed but then after being processed and the socket being closed AND you see the output on your browser, accept() accepts a connection again - but no one connects of course because only one connection was created.
errno = 0 (Success) after recv (that's where the program blows up)
recv returns 0 though - so no bytes read (of course, because the connection was not supposed to exist)
int main(int argc, char * argv[]){
int sock;
int fd_list[2];
int fork_id;
/* Socket */
sock=create_socket(PORT);
int i, active_n=0;
pthread_t tvec;
char address[BUFFSIZE];
thread_buffer t_buffer;
int msgsock;
conf = read_config("./www.config");
if(conf == NULL)
{
conf = (config*)malloc(sizeof(config));
if(conf == NULL)
{
perror("\nError allocating configuration:");
exit(-1);
}
// Set defaults
sprintf(conf->httpdocs, DOCUMENT_ROOT);
sprintf(conf->cgibin, CGI_ROOT);
}
while(cicle) {
printf("\tWaiting for connections\n");
// Waits for a client
msgsock = wait_connection(sock, address);
printf("\nSocket: %d\n", msgsock);
t_buffer.msg = &address;
t_buffer.sock = msgsock;
t_buffer.conf = conf;
/* Send socket to thread */
if (pthread_create(&tvec, NULL, thread_func, (void*)&t_buffer) != 0)
{
perror("Error creating thread: ");
exit(-1);
}
}
free(conf);
return 0;
}
Here are two important functions used:
int create_socket(int port) {
struct sockaddr_in server, remote;
char buffer[BUFF];
int sock;
sock = socket(AF_INET, SOCK_STREAM, 0);
if (sock < 0) {
perror("opening stream socket");
exit(1);
}
server.sin_family = AF_INET;
server.sin_port = htons(port);
server.sin_addr.s_addr = htonl(INADDR_ANY);
if (bind(sock, (struct sockaddr *) &server, sizeof(struct sockaddr_in))) {
perror("binding stream socket");
exit(1);
}
gethostname(buffer, BUFF);
printf("\n\tServer waiting for connections.\n");
printf("\tUse the address %s:%d\n\n", buffer,port);
if (listen(sock, MAXPENDING) < 0) {
perror("Unable to create the socket. The server will exit.\n");
exit(1);
}
return(sock);
}
int wait_connection(int serversock, char *remote_address){
socklen_t clientlen; /* accept() expects a socklen_t *, not an int * */
int clientsock;
struct sockaddr_in echoclient;
clientlen = sizeof(echoclient);
/* Wait for client connection */
if ((clientsock = accept(serversock, (struct sockaddr *) &echoclient, &clientlen)) < 0)
{
perror("Unable to establish a connection to the client. The server will exit.\n");
exit(-1);
}
printf("\n11111111111111Received request - %d\n", clientsock);
sprintf(remote_address, "%s", inet_ntoa(echoclient.sin_addr));
return clientsock;
}
So basically you'd see:
11111111111111Received request - D
D is different both times so the fd is different definitely.
Twice! The second one shows up after the first has been processed, and then the program blows up after recv in the thread function. Sometimes it takes a while for the second one to be processed and shown, but it does appear after a few seconds. Now, this doesn't always happen. Sometimes it does, sometimes it doesn't.
It's so weird...
I've ruled out the possibility of a browser addon causing it to reconnect or something, because Apache's ab tool causes the same issue after a few requests.
I'd like to note that even if I don't run a thread for the client and simply close the socket, it happens as well! I've considered the possibility that the headers are not fully read and the browser therefore sends another request. But the browser receives the data back properly, otherwise it wouldn't show the result correctly. And if it shows the result correctly, the connection must have been closed cleanly, otherwise a connection reset should appear.
Any tips? I appreciate your help.
EDIT:
If I take out the start thread part of the code, sometimes the connection is accepted 4, 5, 6 times...
EDIT 2: Note that I know that the program blows up after recv failing, I exit on purpose.
This is certainly a bug waiting to happen:
pthread_create(&tvec, NULL, thread_func, (void*)&t_buffer)
You're passing t_buffer, a local variable, to your thread. The next time you accept a client, which can happen before another client has finished, you'll pass the same variable to that thread too, leading to a lot of very nondeterministic behavior (e.g. 2 threads reading from the same connection, double close() on a descriptor and other oddities).
Instead of passing the same local variable to every thread, dynamically allocate a new t_buffer for each new client.
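A minimal sketch of that idea follows. The struct client_ctx, its fields, and the spawn_client helper are hypothetical names chosen for this example (the real t_buffer has other members); the point is only that each thread receives its own heap block and frees it when done.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-client context; the field names are illustrative only. */
struct client_ctx {
    int  sock;        /* descriptor returned by accept() */
    int *handled;     /* slot the worker marks once it has run */
};

static void *client_thread(void *arg)
{
    struct client_ctx *ctx = arg;   /* this thread owns ctx exclusively */
    *ctx->handled = ctx->sock;      /* stand-in for real request handling */
    free(ctx);                      /* the worker releases its own context */
    return NULL;
}

/* Start one worker with its own heap-allocated context.
 * Returns 0 on success, -1 on failure. */
int spawn_client(pthread_t *tid, int sock, int *handled)
{
    assert(tid != NULL && handled != NULL);

    struct client_ctx *ctx = malloc(sizeof(*ctx));
    if (ctx == NULL)
        return -1;
    ctx->sock = sock;
    ctx->handled = handled;

    if (pthread_create(tid, NULL, client_thread, ctx) != 0) {
        free(ctx);                  /* thread never started, so we still own ctx */
        return -1;
    }
    return 0;
}
```

In an accept loop that never joins its workers, one would typically also call pthread_detach() on the new thread so its resources are reclaimed when it exits.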
Suppose ... after being processed and the socket being closed AND you see the output on your browser, accept() accepts a connection again - but no one connects of course because only one connection was created.
So if no-one connects, there is nothing to accept(), so this never happens.
So whatever you're seeing, that isn't it.
This question is similar to https://stackoverflow.com/questions/11650328/using-reliable-multicast-pragmatic-general-multicast-not-returning-from-accept, but my code is slightly different from the code there, so it may result in a different answer.
I am attempting to get a reliable-multicast server/client proof of concept setup.
The solution itself is a server/client connection. The client connects to the server via TCP/IP. The server then opens up a reliable multicast socket for the client to listen on. The client sends messages via TCP, and the server echoes it back via IPPROTO_RM. The end goal is to have many clients connected to the server, all receiving every echoed message.
The example code is based off of this page.
I have set up my RM sockets similarly (see listings below). The TCP sockets are working fine. The problem is in the RM sockets. The server opens up the multicast socket, then binds and connects to the multicast address properly. The client, however, listens correctly, but the call to accept blocks forever.
Both client and server processes are running on the same host.
I have checked, and Multicasting support is installed on the host (Server 2008).
Update: I've noticed that sometimes the accept will return if I send some data down the socket from the sender's side first. This is not ideal, nor is it reliable.
Update: The signs are pointing to the switch. Seems like a little hub won't cut it. We had a hilarious incident which resulted in lost comms for the whole building.
Listings
Server creates and connects Multicast sender
short
Port = 0;
const char
*Address = "234.5.6.7";
SOCKET
RMSocket;
SOCKADDR_IN
LocalAddr,
SessionAddr;
RMSocket = socket(AF_INET, SOCK_RDM, IPPROTO_RM);
if (RMSocket == INVALID_SOCKET)
{
return Failed;
}
LocalAddr.sin_family = AF_INET;
LocalAddr.sin_port = htons(0);
LocalAddr.sin_addr.s_addr = htonl(INADDR_ANY);
if ( bind( RMSocket, (SOCKADDR*)&LocalAddr, sizeof(LocalAddr)) == SOCKET_ERROR )
{
return Failed;
}
SessionAddr.sin_family = AF_INET;
SessionAddr.sin_port = htons( Port );
SessionAddr.sin_addr.s_addr = inet_addr( Address );
if ( connect( RMSocket, (SOCKADDR*)&SessionAddr, sizeof(SessionAddr)) == SOCKET_ERROR )
{
return Failed;
}
return Success;
Client creates and accepts Multicast reader
short
Port = 0;
const char
*Address = "234.5.6.7";
SOCKADDR_IN
LocalAddr;
SOCKET
RMListener,
RMSocket;
RMListener = socket( AF_INET, SOCK_RDM, IPPROTO_RM );
if ( RMListener == INVALID_SOCKET )
{
return Failed;
}
LocalAddr.sin_family = AF_INET;
LocalAddr.sin_port = htons( Port );
LocalAddr.sin_addr.s_addr = inet_addr( Address );
if ( bind( RMListener, (SOCKADDR*)&LocalAddr, sizeof(LocalAddr) ) )
{
return Failed;
}
if ( listen( RMListener, SOMAXCONN ) )
{
return Failed;
}
// BLOCKS HERE
RMSocket = accept( RMListener, NULL, NULL);
if ( RMSocket == INVALID_SOCKET )
{
return Failed;
}
return Success;
Do you have MSMQ (Microsoft Message Queuing) installed? It is required for IPPROTO_RM to work on Microsoft-based computers. Also, it only works on Windows XP / Windows Server 2003 or later.
Edit: I saw that you already checked it.
When I make 100 non-blocking socket connections in 1 thread, it is very slow (the number of connections increases one by one), but if I make blocking socket connections in 100 parallel threads (one connect per thread), it is very fast (it gets done immediately).
sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (fcntl(sock, F_SETFL,O_NONBLOCK)!=0)
{
perror("fcntl nonblock");
return -1;
}
if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR,&reuseAddr, sizeof(reuseAddr))!=0)
{
perror("reuse addr");
return -1;
}
sAddr.sin_addr.s_addr = inet_addr(SRV_ADDR);
sAddr.sin_port = htons(1972);
if ((n=connect(sock, (const struct sockaddr *) &sAddr, sizeof(sAddr))) < 0)
{
if (errno !=EINPROGRESS) {
perror("client connect error");
return -1;
}
}
else if (n>=0)
{
printf("#%d connected\n",sock);
}
return sock;
Awesome question :-). Here's why I think this is happening. The standard says this:
If the connection cannot be established immediately and O_NONBLOCK is
set for the file descriptor for the socket, connect() shall fail and
set errno to [EINPROGRESS]
The question of course is what "immediately" means. I believe that "immediately" is actually some small amount of time that allows the SYN, SYN-ACK, ACK exchange to happen. If it didn't wait at all, it would have zero chance of actually succeeding.
So basically:
The client sends a SYN
Waits (blocks) for a small period of time ("immediately") for a SYN-ACK.
Completes the connection
In doing so it returns successfully instead of failing with EINPROGRESS.
Now, when using threads, each thread does this so nobody waits. They all just connect(2) and context switching allows everybody to do it almost simultaneously.
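One way to check whether the connects in a single thread really overlap is to start all of them first and only then wait for completion. The sketch below does that with poll() and SO_ERROR; this waiting step is not shown in the question, so the helper name connect_many_nonblocking, the MAX_CONNS limit and the timeout_ms parameter are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CONNS 128

/* Hypothetical helper: start n non-blocking connects to addr, then wait for
 * all of them together. Returns how many completed successfully; every
 * socket is closed before returning, since this is only a demonstration. */
int connect_many_nonblocking(const struct sockaddr_in *addr, int n, int timeout_ms)
{
    assert(addr != NULL && n > 0 && n <= MAX_CONNS);

    struct pollfd fds[MAX_CONNS];
    int connected = 0, pending = 0;

    /* Phase 1: fire off every connect() without waiting for any of them. */
    for (int i = 0; i < n; ++i) {
        fds[i].fd = -1;             /* poll() ignores negative descriptors */
        fds[i].events = POLLOUT;
        fds[i].revents = 0;

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            continue;
        int fl = fcntl(fd, F_GETFL, 0);
        if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) != 0) {
            close(fd);
            continue;
        }
        if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0) {
            ++connected;            /* completed at once */
            close(fd);
        } else if (errno == EINPROGRESS) {
            fds[i].fd = fd;         /* handshake still running */
            ++pending;
        } else {
            close(fd);              /* immediate failure */
        }
    }

    /* Phase 2: a socket turns writable once its handshake has finished;
     * SO_ERROR then tells whether it succeeded. */
    while (pending > 0) {
        int ready = poll(fds, (nfds_t)n, timeout_ms);
        if (ready <= 0)
            break;                  /* timeout or poll() error */
        for (int i = 0; i < n; ++i) {
            if (fds[i].fd < 0 || fds[i].revents == 0)
                continue;
            int err = 0;
            socklen_t len = sizeof(err);
            if (getsockopt(fds[i].fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
                ++connected;
            close(fds[i].fd);
            fds[i].fd = -1;
            --pending;
        }
    }

    /* Close anything still pending after a timeout. */
    for (int i = 0; i < n; ++i)
        if (fds[i].fd >= 0)
            close(fds[i].fd);
    return connected;
}
```

If the connects still complete one by one with this structure, the delay is more likely on the server side (for example a short listen backlog or a slow accept loop) than in the client's use of O_NONBLOCK.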