Why does GDB break when writing to the network?

I get this error every time my program reaches a write() call. The program will continue, but stops again at the next write() call. When I run this program outside of gdb, it runs properly.
Program received signal SIGPIPE, Broken pipe.
0x00007ffff794b340 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
I've been told that this happens when the socket is closed from the remote end, but how would that be happening?
Note: The server and client are both running on the same machine, and the server was prebuilt for me, so I don't have access to its code.

SIGPIPE is generated when the other side closes the connection. And there are good reasons for its existence.
By default gdb catches SIGPIPE.
If you aren't interested in it, and chances are you aren't, simply disable it:
handle SIGPIPE nostop noprint pass
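If you also want the program itself to survive a peer that disappears mid-write (outside gdb, too), you can ignore SIGPIPE in the process; write() then fails with errno set to EPIPE instead of killing the process. A minimal sketch, demonstrated with a pipe since it behaves the same way as a socket whose peer has closed:

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    signal(SIGPIPE, SIG_IGN);  /* or pass MSG_NOSIGNAL to each send() */

    int fds[2];
    pipe(fds);
    close(fds[0]);             /* simulate the peer closing its end */

    /* Without SIG_IGN this write would raise SIGPIPE and kill us;
       now it just fails with EPIPE. */
    if (write(fds[1], "x", 1) < 0)
        fprintf(stderr, "write failed: %s\n", strerror(errno));
    return 0;
}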
I've been told that this happens when the socket is closed from the remote end, but how would that be happening?
You mean why? Since you don't have the source, we can only guess.
Perhaps it already sent all the data it wanted and closed the connection, because there's no point keeping it open. Remember, connections can be half-closed (that is, closed from one side only). The server doesn't want to read any further and is just waiting for you to read the data and close your side. Probably nothing went wrong, but you have to decide that yourself, as only you know what the application protocol is.
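For illustration, here is roughly what a deliberate half-close looks like from your side: stop writing with shutdown(), then keep reading until the peer finishes. This is a sketch, not taken from the question's code:

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Half-close our side, then drain whatever the peer still sends.
   read() returning 0 means the peer has closed its side too. */
void drain_after_half_close(int sock)
{
    char buf[4096];
    ssize_t n;

    shutdown(sock, SHUT_WR);                /* "I'm done writing" */
    while ((n = read(sock, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);  /* consume remaining data */
    close(sock);                            /* now release the descriptor */
}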

Related

Client-server application using FIFOs

I'm trying to write a client-server application in C using two FIFOs (client_to_server and server_to_client).
A version of the app where the client writes a command that the server reads works well, but when I add lines in the client to read the answer from the server, it doesn't work anymore: the server gets blocked reading the command from the client (as if there were nothing in the client_to_server FIFO, although the client has written to it). What could be the problem in this case?
You are using fputs to send data to the server. That means the data can sit in a local buffer until the buffer is full or you explicitly flush it. When you don't wait for the answer but exit from the client, the FIFO is implicitly flushed and closed, so the server receives something. But if you start waiting in the client without a prior flush, you end up with a deadlock.
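A sketch of the fix on the client side (the function and stream names here are made up for illustration):

#include <stdio.h>

/* Flush stdio's buffer so the command actually reaches the FIFO
   before we block waiting for the server's answer. */
void send_command(FILE *to_server, const char *cmd)
{
    fputs(cmd, to_server);
    fflush(to_server);   /* push the data out now, not at exit */
}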
But remember: pipes were invented for one-way communication. If you want two-way communication with acknowledgements and/or synchronization, you should consider using sockets.

Any cases where close() is preferred to shutdown()?

I am a developer on an open source project, and I have been having some problems with the server thinking it has answered a socket completely (meaning it has either sent a reply or closed its end in response to a failure) while the client is stuck in poll(). After some research, I found that close() doesn't always generate a POLLHUP event, but shutdown(sock, 2) does.
In light of that, I'm considering adding a shutdown(sock, 2) to the error handling (in addition to the close() call). Does anyone know of reasons this would cause problems? Am I barking up the wrong tree? I'm thinking that if the server believes the socket is closed, the client should definitely not attempt anything else with that socket, and I can't think of a reason not to add this, but I haven't been working with TCP connections for that long and would love some advice.
You need to figure out why closing the socket isn't causing it to shut down. The most likely reason is that there is another descriptor that accesses the same endpoint; only closing the last descriptor causes an implicit shutdown.
Do you ever dup the file descriptor? Do you make sure it is closed in all child processes? If the socket existed in a parent process before it forked this process, did the parent close its copy?
POLLHUP is not the right way to test for a closed connection. You should be testing for the file descriptor becoming readable and subsequently returning a zero-length read. This is the definition of end-of-file.
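A rough sketch of that test, assuming a poll()-based loop:

#include <poll.h>
#include <unistd.h>

/* Returns 1 if the peer closed the connection (end-of-file),
   0 if there is ordinary data or nothing conclusive, -1 on error. */
int peer_closed(int sock)
{
    struct pollfd pfd = { .fd = sock, .events = POLLIN };
    char buf[4096];

    if (poll(&pfd, 1, -1) < 0)
        return -1;
    if (pfd.revents & POLLIN) {
        ssize_t n = read(sock, buf, sizeof buf);
        if (n == 0)
            return 1;   /* zero-length read: the peer closed */
        /* n > 0: real data (now consumed!); n < 0: check errno */
    }
    return 0;
}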

Going from listen and fork to xinetd

I have a piece of C network software that currently works in listen-and-fork mode: it listens on a server socket, accepts incoming connections, and then calls the core server function, providing the newly accepted socket.
Now I'm trying to make that software also work behind xinetd (depending on some runtime parameter). I tried to directly call the core server function providing file descriptor 0 instead of an accepted socket, but this method just isn't working: the program immediately stops with a SIGPIPE.
Is there any obvious reason for such behavior? My core function performs some low-level socket calls and signal handling. Is that supposed to work behind xinetd?
I'm not absolutely certain, but not everything you can do on a socket handle also works on an ordinary file handle. For a start, you can't write to stdin. Also, some system calls probably need a socket, e.g. recv().
Edit
Another possibility: does your server process close stdin as part of its start up?
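One way to support both modes in a single binary is to check at startup whether fd 0 is actually a socket before handing it to the core function. A sketch (how you combine this with your runtime parameter is an assumption left to you):

#include <sys/socket.h>
#include <sys/types.h>

/* Under xinetd, fd 0 is the accepted connection. getsockopt()
   fails with ENOTSOCK when fd 0 is a terminal or regular file. */
int stdin_is_socket(void)
{
    int type;
    socklen_t len = sizeof type;
    return getsockopt(0, SOL_SOCKET, SO_TYPE, &type, &len) == 0;
}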

How to find where a process is stuck using DDD

I have a TCP server process written in C and running on CentOS 5.5. It acts as a TCP server for external clients and also does some IPC communication with other processes in the system, using Unix domain sockets it has established. It's not a multi-threaded process; it does one task at a time. I use epoll_wait() to listen for requests on either the TCP socket or any of the IPC sockets it has established with internal processes. When epoll_wait() returns, I process the request, whoever it came from, and then go back into epoll_wait().
I have a TCP client that connects to this process from outside (not IPC). It connects successfully, sends a request message, and gets a response back. I've put this in an infinite loop just to test out its robustness, etc.
After a while, the TCP server stops responding to requests coming from the TCP client. The TCP client connects successfully and sends a request message, but it doesn't get any response back from the TCP server.
So I reckon the TCP server is stuck somewhere else, trying to do something, and has not returned to epoll_wait() to process other requests coming in. I've tried to figure it out using logs, but that's not helping me understand where exactly the process is stuck.
So I want to use a debugger that can give me some information (a function name would be great) about what the process is doing. Setting breakpoints is overwhelming, because the TCP server process has tons of files and functions.
I'm trying to use DDD on CentOS 5.5 to figure out what's going on. I attach to the process successfully, then I click on the "Step", "Stepi", or "Next" button, but nothing happens.
By the way, when I use Eclipse for debugging and attach to this process (or any process), I always get "__kernel_vsyscall()". Does this mean the program breaks by default at whatever it's doing? If that's the case, how do I come out of the __kernel_vsyscall() call to continue within my program? If I press F8, it comes out, but then I don't know where it was, since I lose the stack trace. Like I said earlier, since I can't figure out where it was, I don't know where to put a breakpoint.
In summary, I want to figure out where my process is stuck, or what it's doing, and try to debug from that point on.
How do I go about this?
Thanks
Amit
1) Attaching to a C process can often cause problems in itself. Is there any way for you to start the process in the debugger?
2) Using the step functions of DDD needs to be done after you've set a breakpoint and the program is stopped on a command. From reading your question, I'm not sure you've done that. You may not want to set many breakpoints, but is setting one or two in critical sections of the code possible?
In summary, what I wanted to accomplish was to find where my program is stuck when it hangs. I figured it out, and it was simple: create a configuration in Eclipse via "Debug Configurations -> C/C++ Attach to Application".
Let the process run normally from the shell (preferably with a terminal attached). When it hangs, open Eclipse, click on the debug icon, and run the configured process. It'll ask you to attach to a process; look for your process name and attach to it.
Now just look at the entire stack trace. You'll see some of your own function calls mixed with kernel function calls, and that tells you where the program is stuck.
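If you'd rather stay on the command line, plain gdb gets you the same stack trace (replace 1234 with your server's PID):

gdb -p 1234
(gdb) bt      # where is the process blocked right now?
(gdb) detach
(gdb) quit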

Why does my Perl TCP server script hang with many TCP connections?

I've got a strange issue with a server accepting TCP connections. Even though there are normally some processes waiting, at some volume of connections it hangs.
Long version:
The server is written in Perl and binds a $srv socket with the reuse flag and listen == 5. Afterwards, it forks into 10 processes with a loop of $clt=$srv->accept(); do_processing($clt); $clt->shutdown(2);
The client, written in C, is also very simple: it sends some lines, then receives all available lines and does a shutdown(sockfd, 2). There's nothing async going on, and at the end both the send and receive queues are empty (as reported by netstat).
Connections last only ~20 ms. All clients behave the same way, are the same implementation, etc. Now let's say I'm accepting X connections from client 1 and another X from client 2. The processes still report that they're idle all the time. If I add another X connections from client 3, suddenly the server processes start hanging just after accepting: the first blocking thing they do after accept() is while (<$clt>) ..., but they don't get any data (already on the first try). Suddenly all 10 processes are in this state and do not stop waiting. Under strace, the server processes seem to hang on read(), which makes sense.
There are loads of connections in TIME_WAIT state belonging to that server (~100 when the problem starts to manifest), but this might be a red herring.
What could be happening here?
After some more analysis, it turned out that the client was at fault: it wasn't closing previous connections properly before trying the next one, so the servers at the beginning of the load-balancing list were left with stale connections.
This probably isn't the solution to your problem, but it might solve a problem you'll experience in the future: don't forget to close() the sockets when you're done! shutdown() will disconnect the stream, but it'll still eat a file descriptor.
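In other words, pair them up. A minimal sketch:

#include <sys/socket.h>
#include <unistd.h>

/* shutdown() ends the conversation; close() gives the fd back. */
void finish_connection(int sock)
{
    shutdown(sock, SHUT_RDWR);
    close(sock);
}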
Since you said strace shows the processes stuck in read(), your problem seems to be that the client isn't sending the data you expect it to send. You should either fix your client, or add an alarm() to your server processes so that they can survive dead clients.
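alarm() is one option; a receive timeout on the socket itself is another. A sketch of the latter in C (the server in the question is Perl, so this is just to show the idea):

#include <sys/socket.h>
#include <sys/time.h>

/* Make blocking reads on sock give up after 5 seconds instead of
   hanging forever on a client that never sends. */
int set_read_timeout(int sock)
{
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}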
Does it surge, then pause a long time (circa two minutes or so), and then surge again? If so, you may not have your system's max open files limit set high enough.
