Drop an open TCP connection without sending RST - c

Looking into "nginx: ignore some requests without proper Host header" got me thinking that it does not seem possible to close(2) a TCP connection without the OS properly terminating the underlying TCP connection by sending an RST (and/or FIN) to the other end.
One workaround would be to use something like tcpdrop(8). However, as can be seen from usr.sbin/tcpdrop/tcpdrop.c on OpenBSD and FreeBSD, it's implemented through a sysctl-based interface and may have portability issues outside the BSDs. (In fact, it looks like even the sysctl-based implementation may differ enough between OpenBSD and FreeBSD to require a porting layer -- OpenBSD uses the tcp_ident_mapping structure (which, in turn, contains two sockaddr_storage elements plus some other info), whereas FreeBSD, DragonFly and NetBSD use an array of two sockaddr_storage elements directly.)
It turns out that OpenBSD's tcpdrop does appear to send the R packet as per tcpdump(8), which can be confirmed by looking at /sys/netinet/tcp_subr.c :: tcp_drop(), which calls tcp_close() in the end (and tcp_close() is confirmed to send RST elsewhere on SO). So it appears that it wouldn't work either.
If I'm establishing the connection myself through C, is there a way to subsequently drop it without an acknowledgement to the other side, i.e., without sending an RST?

If I'm establishing the connection myself through C, is there a way to subsequently drop it without an acknowledgement to the other side, i.e., without sending an RST?
No. Even if there were, if the peer subsequently sent anything it would be answered by an RST.
NB Normal TCP termination uses a FIN, not an RST.

Cheating an attacker in this way could be a good idea. Of course, in this case you are already reserving server resources for the established connection. In the most basic form you can use netfilter to drop any outgoing TCP segment with the RST or FIN flag set. These iptables rules could serve as an example:
sudo iptables -A OUTPUT -p tcp --tcp-flags RST RST -j DROP
sudo iptables -A OUTPUT -p tcp --tcp-flags FIN FIN -j DROP
Of course, these rules will affect all your TCP connections. I wrote them just to give you a lead on how you can do it. See https://www.netfilter.org/ for more ideas for your solution. Basically, you should be able to do the same only for selected connections.
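For instance (as a hedged illustration, with 203.0.113.7 standing in for whichever peer you decide to target), the drop could be limited to traffic toward that one address:
sudo iptables -A OUTPUT -p tcp -d 203.0.113.7 --tcp-flags RST RST -j DROP
sudo iptables -A OUTPUT -p tcp -d 203.0.113.7 --tcp-flags FIN FIN -j DROP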
Because of how TCP works, if you're able to implement it, the client (or attacker) will keep the connection open for a long time. To understand the effect it would have on a client, read this: "TCP, recv function hanging despite KEEPALIVE", where I provide the results of a test in which the other side doesn't return any TCP segment (not even an ACK). In my configuration, it takes 13 minutes for the socket to enter an error state (that depends on Linux parameters like tcp_retries1 and tcp_retries2).
Just consider that a DoS attack will usually involve connections from thousands of different devices, not necessarily many connections from the same device. The latter is very easy to detect and block in the firewall. So it's very improbable that you are going to cause resource exhaustion on the client. Also, this solution will not work for the case of a half-open connection attack.

Related

detecting connection state in epoll linux

There are many threads about how to detect whether a socket is connected or not, using various methods like getpeername / getsockopt with SO_ERROR. https://man7.org/linux/man-pages/man2/getpeername.2.html would be a good way for me to detect whether a socket is connected or not. The problem is, it does not say anything about whether the connection is in progress... So if I call connect(), the connection is in progress, and then I call getpeername(), will it report an error (-1) even though the connection is still in progress?
If it does, I can implement a counter-like system that will eventually kill the socket if it is still in progress after x seconds.
Short Answer
I think that, if getpeername() returns ENOTCONN, that simply means that the tcp connection request has not yet succeeded. For it to not return ENOTCONN, I think the client end needs to have received the syn+ack from the server and sent its own ack, and the server end needs to have received the client's ack.
Thereafter all bets are off. The connection might subsequently be interrupted, but getpeername() has no way of knowing this has happened.
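As a minimal sketch of the short answer (assuming fd is a socket on which connect() has been attempted), the check could look like this:

#include <errno.h>
#include <sys/socket.h>

/* Returns 1 if the TCP handshake has completed, 0 if it has not (yet), -1 on other errors. */
int handshake_completed(int fd)
{
    struct sockaddr_storage peer;
    socklen_t len = sizeof(peer);

    if (getpeername(fd, (struct sockaddr *)&peer, &len) == 0)
        return 1;          /* peer address is known: the connection was established */
    if (errno == ENOTCONN)
        return 0;          /* connect() has not succeeded (yet) */
    return -1;             /* some other problem with the descriptor */
}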
Long Answer
A lot of it depends on how fussy and short-term one wants to be about knowing if the connection is up.
Strictly Speaking...
Strictly speaking with maximum fussiness, one cannot know. In a packet switched network there is nothing in the network that knows (at any single point in time) for sure that there is a possible connection between peers. It's a "try it and see" thing.
This contrasts to a circuit switched network (e.g. a plain old telephone call), where there is a live circuit for exclusive use between peers (telephones); provided current is flowing, you know the circuit is complete even if the person at the other end of the phone call is silent.
Note that if the two computers were connected by a single Ethernet cable (no router, no switches, just a cable between NICs), that is effectively a fixed circuit (not even a circuit-switched network).
Relaxing a Little...
Focusing on what one can know about a connection in a packet switched network: as others have already said, the answer is that, really, one has to send and receive packets constantly to know whether the network can still connect the two peers.
Such an exchange of packets occurs with a tcp socket connect() - the connecting peer sends a special packet to say "please can I connect to you", the serving peer replies "yes", and the client then says "thank you!" (syn->, <-syn+ack, ack->). But thereafter the packets flow between peers only if the applications send and receive data, or one of them elects to close the connection (fin).
Calling something like getpeername() I think is somewhat misleading, depending on your requirements. It's fine, if you trust the network infrastructure and remote computer and its application to not break, and not crash.
It's possible for the connect() to succeed, then something breaks somewhere in the network (e.g. the peer's network connection is unplugged, or the peer crashes), and there is no knowledge at your end of the network that that has happened.
The first thing you can know about it is if you send some traffic and fail to get a response. The response is, initially, the tcp acks (which allows your network stack to clear out some of its buffers), and then possibly an actual message back from the peer application. If you keep sending data out into the void, the network will quite happily route packets as far as it can, but your tcp stack's buffers will fill up due to the lack of acks coming back from the peer. Eventually, your network socket blocks on a call to write(), because the local buffers are full.
Various Options...
If you're writing both applications (server and client), you can write the application to "ping pong" the connection periodically; just send a message that means nothing other than "tell me you heard this". Successful ping-ponging means that, at least within the last few seconds, the connection was OK.
Use a library like ZeroMQ. This library solves many issues with using network connections, and also includes (in modern versions) socket heartbeats (i.e. a ping pong). It's neat, because ZeroMQ looks after the messy business of making, restoring and monitoring connections with a heartbeat, and can notify the application whenever the connection state changes. Again, you need to be writing both client and server applications, because ZeroMQ has its own protocol on top of tcp that is not compatible with just a plain old socket. If you're interested in this approach, the words to look for in the API documentation are socket monitor and ZMQ_HEARTBEAT_IVL;
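As a rough illustration of those heartbeat options (this assumes libzmq 4.2 or newer; the socket type and endpoint below are just placeholders):

#include <zmq.h>

int main(void)
{
    void *ctx  = zmq_ctx_new();
    void *sock = zmq_socket(ctx, ZMQ_DEALER);      /* socket type chosen for illustration */

    int ivl = 2000;        /* send a heartbeat PING every 2 seconds */
    int timeout = 5000;    /* treat the peer as gone if no reply within 5 seconds */
    zmq_setsockopt(sock, ZMQ_HEARTBEAT_IVL, &ivl, sizeof(ivl));
    zmq_setsockopt(sock, ZMQ_HEARTBEAT_TIMEOUT, &timeout, sizeof(timeout));

    zmq_connect(sock, "tcp://192.0.2.1:5555");     /* placeholder endpoint */

    /* ... zmq_send()/zmq_recv() as usual; the library exchanges the heartbeats for you ... */

    zmq_close(sock);
    zmq_ctx_term(ctx);
    return 0;
}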
If, really, only one end needs to know the connection is still available, that can be accomplished by having the other end just send out "pings". That might fit a situation where you're not writing the software at both ends. For example, a server application might be configured (rather than re-written) to stream out data regardless of whether the client wants it or not, and the client ignores most of it. However, if the client is receiving data, it knows there is a connection. The server does not know (it's just blindly sending out data, up until its writes() eventually block), but may not need to know.
Ping ponging is also good in that it gives some indication of the performance of the network. If one end is expecting a pong within 5 seconds of sending a ping but doesn't get it, that indicates that all is not as expected (even if packets are eventually turning up).
This allows discrimination between networks that are usefully working, and networks that are delivering packets but too slowly to be useful. The latter is still technically "connected" and is probably represented as connected by other tests (e.g. calling getpeername()), but it may as well not be.
Limited Local Knowledge...
There are limited things one can do locally at a peer. A peer can know whether its connection to the network exists (e.g. the NIC reports a live connection), but that's about it.
My Opinion
Personally speaking, I default to ZeroMQ these days if at all possible. Even if it means a software re-write, that's not as bad as it seems. This is because one is generally replacing code such as connect() with zmq_connect(), and recv() with zmq_recv(), etc. There's often a lot of code removal too. ZeroMQ is message orientated; a tcp socket is stream orientated. Quite a lot of applications have to adapt tcp into a message orientation, and ZeroMQ replaces all the code that does that.
ZeroMQ is also well supported across numerous languages, either with bindings and/or re-implementations.
man connect
If the initiating socket is connection-mode, .... If the connection cannot be established immediately and O_NONBLOCK is not set for the file descriptor for the socket, connect() shall block for up to an unspecified timeout interval until the connection is established. If the timeout interval expires before the connection is established, connect() shall fail and the connection attempt shall be aborted.
If connect() is interrupted by a signal that is caught while blocked waiting to establish a connection, connect() shall fail and set errno to [EINTR], but the connection request shall not be aborted, and the connection shall be established asynchronously.
If the connection cannot be established immediately and O_NONBLOCK is set for the file descriptor for the socket, connect() shall fail and set errno to [EINPROGRESS], but the connection request shall not be aborted, and the connection shall be established asynchronously.
When the connection has been established asynchronously, select() and poll() shall indicate that the file descriptor for the socket is ready for writing.
If the socket is in blocking mode, connect will block while the connection is in progress. After connect returns, you'll know if a connection has been established (or not).
A signal could interrupt the (blocking/waiting) process; the connection establishment then continues asynchronously.
If the socket is in non-blocking mode (O_NONBLOCK) and the connection cannot be established immediately, connect will fail with the error EINPROGRESS and, as above, establishment continues asynchronously. That means you'll have to use select or poll to figure out when the socket becomes ready for writing (which indicates an established connection).
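Putting that together, a minimal non-blocking connect() sketch could look like this (the address 192.0.2.1, port 80 and the 5-second timeout are arbitrary placeholders):

#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);   /* switch to non-blocking mode */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);
    inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        printf("connected immediately\n");
    } else if (errno == EINPROGRESS) {
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };     /* our own timeout */

        if (select(fd + 1, NULL, &wfds, NULL, &tv) > 0) {
            int err = 0;
            socklen_t len = sizeof(err);
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);  /* did the handshake succeed? */
            if (err == 0)
                printf("connected\n");
            else
                printf("connect failed: %s\n", strerror(err));
        } else {
            printf("timed out waiting for the connection\n");
        }
    } else {
        perror("connect");
    }
    close(fd);
    return 0;
}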

Deny a client's TCP connect request before accept()

I'm trying to code a TCP server in C. I just noticed that accept() returns only once a connection is already established.
Some clients keep flooding the server with random data, and some send random data just once. After that I want to close their current connection and refuse future connections from them for a few minutes (or more, depending on how much load the program is under).
I can save the bad clients' IP addresses in an array, and I can save timestamps too, but I can't find any function to abort the current connection or to deny future connections from bad clients.
I found a function for Windows called WSAAccept that lets you deny connections as you choose, but I don't use Windows.
I tried coding a raw TCP server, which gives you access to the TCP packets from the start, including all TCP headers, and doesn't accept connections automatically. I handled the connections on the program side, including SYN, ACK and the other TCP signals. It worked, but then I noticed the raw TCP server receives all packets arriving on my network interface, so when other programs generate a lot of traffic my program gets laggy too.
I tried using libnetfilter, which lets you filter all the traffic on your network interface. It works too, but like the raw TCP server it receives the whole interface's packets, which makes it slow when there is a lot of traffic. I also compared libnetfilter with iptables; libnetfilter is slower than iptables.
So, in summary, how can I abort a client's current and future connections without hurting other clients' connections?
I'm on Linux, Debian 10.
Once you do blacklisting at the packet level, you can very quickly become vulnerable to trivial attacks based on IP spoofing. With a very basic implementation, an attacker could abuse your packet-level blacklisting to blacklist anyone he wants just by sending you many packets with a forged source IP address. Usually you don't want to touch this kind of filtering (unless you really know what you are doing) and you just trust your firewall etc.
So I really recommend just closing the file descriptor immediately after getting it from accept().
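A minimal sketch of that approach could look like the following; is_blacklisted() is a placeholder for whatever bookkeeping (IP array plus timestamps) you implement yourself:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Placeholder: return nonzero if this address is currently banned. */
static int is_blacklisted(const struct in_addr *a)
{
    (void)a;
    return 0;
}

/* One iteration of your accept loop; listen_fd is already bound and listening. */
void serve_one(int listen_fd)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);
    int client_fd = accept(listen_fd, (struct sockaddr *)&peer, &len);
    if (client_fd < 0)
        return;

    if (is_blacklisted(&peer.sin_addr)) {
        close(client_fd);                  /* drop the misbehaving client right away */
        return;
    }

    char ip[INET_ADDRSTRLEN];
    inet_ntop(AF_INET, &peer.sin_addr, ip, sizeof(ip));
    printf("accepted connection from %s\n", ip);
    /* ... handle the client; add it to the blacklist if it misbehaves ... */
    close(client_fd);
}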

Assign a new socket to client after receiving request from 8080 in server code

C language TCP server/client. I want to assign a new socket to a particular client which contacted my server on 8080 (let's say the new socket is 8081) to receive its further requests, and I want to free the previous socket (8080) so that other clients can reach my server on 8080. Is there any way of doing this in C? (OS: Ubuntu) Thanks
Your problem statement is incorrect. You can't do this even if you wanted to. The way that TCP sockets work is that accept() gives you a new socket for the incoming client connection, on the same port you are listening to. That's all you need and it's all you can get. You can't 'allocate a new socket' to the client on a new port without engaging in another TCP handshake with him, which would be nothing but a complete waste of time when you already have a connection to him. This does not preclude another connection being accepted while this one is open. You need to read a TCP Sockets networking tutorial.
Mat and EJP have said the pertinent things above, but I thought it might help others to describe the situation more verbosely.
A TCP/IP connection is identified by a four-tuple: target IP address, target TCP port number, source IP address, and source TCP port number. The kernel will keep track of established connections based on these four things. A single server port (and IP address) can be connected to thousands of clients at the same time, limited in practice only by the resources available.
When you have a listening TCP socket, it is bound to some IP address (or wildcard address) and TCP port. Such a socket does not receive data, only new connections. When accept() is called, the server notes the new four-tuple of the connection, and hands off the file descriptor that represents that connection (as the accept() return value). The original socket is free to accept new connections. Heck, you can even have more than one thread accepting new connections if you want to, although establishing new connections in Linux is so fast you shouldn't bother; it's just too insignificant to worry about.
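A bare-bones illustration of that flow (error handling omitted, port 8080 as in the question):

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);

    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_fd, SOMAXCONN);

    for (;;) {
        /* accept() returns a brand-new descriptor for this one client; the
           connection is still on local port 8080.  listen_fd stays open and
           keeps accepting further clients in parallel. */
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0)
            continue;

        /* ... read()/write() on client_fd to talk to this client ... */
        close(client_fd);                  /* closes only this client's connection */
    }
}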
If establishing the connection at application level is resource-intensive -- this is true, for example, for encrypted connections, where agreeing on an encryption scheme and preparing the needed data structures typically takes several orders of magnitude more CPU resources than a simple TCP connection -- then it is natural to wish to avoid that overhead. Let's assume this is the point of the OP's question: to avoid unnecessary application-level connection establishment when a recent client needs another connection.
The preferred solution is connection multiplexing. Simply put, the application-level protocol is designed to allow multiple data streams via a single TCP connection.
The OP noted that it would be necessary/preferable to keep the existing application protocol intact, i.e. that the optimization should be completely on the server side, transparent to the clients.
This turns the recommended solution to a completely new direction. We should not talk about application protocols, but how to efficiently implement the existing one.
Before we get to that, let's take a small detour.
Technically, it is possible to use the kernel packet filtering facilities to modify incoming packets to use a different port based on the source IP address, redirecting requests from specific IP addresses to separate ports, and making those separate ports otherwise inaccessible. Technically possible, but quite complex to implement, and with very questionable benefits.
So, let's ignore the direction OP assumed would bring the desired benefits, and look at the alternatives. Or, actually, the common approach used.
Structurally, your application has
- A piece of code accepting new connections
- A piece of code establishing the application-level resources needed for that connection
- A piece of code doing the communication with the client (serving the response to the client, per the client's request)
There is no reason for these three pieces to be consecutive, or even part of the same code flow. Use data structures to your advantage.
Instead of treating new incoming connections (accept()ed) as equal, they can be simply thrown into separate pools based on their source IP addresses. (Or, if you are up to it, have a data structure which clusters source IP addresses together, but otherwise keeps them in the order they were received.)
Whenever a worker completes a request by a client, it checks if that same client has new incoming connections. If yes, it can avoid most if not all of the application-level connection establishment by checking that the new connection matches the application-level parameters of the old one. (You see, it is possible that even if the source IP address is the same, it could be a completely different client, for example if the clients are under the same VPN or NATted subnet.)
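A very rough sketch of such per-source pooling is given below; the names, array sizes and behaviour are made up purely for illustration, and a real server would also need locking and eviction:

#include <netinet/in.h>
#include <unistd.h>

#define MAX_POOLS    256    /* arbitrary illustration sizes */
#define MAX_PER_POOL  32

struct conn_pool {
    struct in_addr src;               /* source IP this pool groups */
    int fds[MAX_PER_POOL];            /* accepted, not-yet-served connections */
    int count;
};

static struct conn_pool pools[MAX_POOLS];
static int npools;

/* Park a freshly accept()ed connection in the pool for its source address. */
void pool_add(struct in_addr src, int fd)
{
    for (int i = 0; i < npools; i++) {
        if (pools[i].src.s_addr == src.s_addr) {
            if (pools[i].count < MAX_PER_POOL)
                pools[i].fds[pools[i].count++] = fd;
            else
                close(fd);            /* pool full; a real server would do better */
            return;
        }
    }
    if (npools < MAX_POOLS) {
        pools[npools].src = src;
        pools[npools].fds[0] = fd;
        pools[npools].count = 1;
        npools++;
    } else {
        close(fd);                    /* out of pools; again, illustration only */
    }
}

/* When a worker finishes with a client, ask for another pending connection from
   the same source, so the application-level setup can potentially be reused. */
int pool_take_same_source(struct in_addr src)
{
    for (int i = 0; i < npools; i++) {
        if (pools[i].src.s_addr == src.s_addr && pools[i].count > 0)
            return pools[i].fds[--pools[i].count];
    }
    return -1;                        /* nothing pending from this source */
}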
There are quite a few warts to take care of, for example how to keep the priorities, and avoid starving new IP addresses if known clients try to hog the service.
For protocols like HTTP, where the client sends the request information as soon as the server accepts the connection, there is an even better pattern to apply: instead of connection pools, have request pools. A single thread or a thread pool can receive the requests (they may span multiple packets in most protocols) without acting on them, only detecting when the request itself is complete. (A careful server will limit the number of pending requests and the number of incomplete requests, to avoid vulnerability to DoS.)
When the requests are complete, they are grouped, so that the same "worker" who serves one request, can serve another similar request with minimal overhead. Again, some careful thought is needed to avoid the situation where a prolific client hogs the server resources by sending a lot of requests, but it's nothing some careful thought and testing won't resolve.
One question remains:
Do you need to do this?
I'd wager you do not. Apache, which is one of the best HTTP servers, does not do any of the above. The performance benefits are not considered worth the extra code complexity. Could you write a new HTTP server (or a server for whatever protocol you're working with), and use a scheme similar to above, to make sure you can use your hardware as efficiently as possible? Sure. You don't even need to be a wizard, just do some research and careful planning, and avoid getting caught in minute details, keeping the big picture in mind at all times.
I firmly believe that code maintainability and security is more important than efficiency, especially when writing an initial implementation. The information gained from the first implementation has thus far always changed how I perceive the actual "problem"; similar to opening new eyes. It has always been worth it to create a robust, easy to develop and maintain, but not necessarily terribly efficient implementation, for the first generation. If there is someone willing to support the development of the next generation, you not only have the first generation implementation to compare (and verify and debug) against, but also all the practical knowledge gained.
That is also the reason old hands warn so often against premature optimization. In short, you end up optimizing resource waste and personal pain, not the implementation you're developing.
If I may, I'd recommend the OP back up a few steps, and actually describe what they intend to implement, what the observed problem with the implementation is, and suggestions on how to fix and avoid the problem. The current question is like asking how to better freeze a banana, as it keeps shattering when you hammer nails with it.

How to use SO_KEEPALIVE option properly to detect that the client at the other end is down?

I was trying to learn the usage of the SO_KEEPALIVE option in socket programming in C under Linux.
I created a server socket and used my browser to connect to it. It was successful and I was able to read the GET request, but I got stuck on the usage of SO_KEEPALIVE.
I checked this link keepalive_description#tldg.org but I could not find any example which shows how to use it.
As soon as I detect the client's connection via accept(), I set the SO_KEEPALIVE option to 1 on the client socket. Now I don't know how to check whether the client is down, how to change the time interval between the probes sent, etc.
I mean, how will I get the signal that the client is down? (Without reading from or writing to the client; I thought I would get some signal when the probes are not answered by the client.) And how should I program it after setting the SO_KEEPALIVE option?
Also, suppose the probes are sent every 3 seconds and the client goes down in between; I will not get to know that the client is down, and I may get SIGPIPE.
Anyway, most importantly, I want to know how to use SO_KEEPALIVE in the code.
To modify the number of probes or the probe intervals, you write values to the /proc filesystem like
echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes
Note that these values are global for all keepalive-enabled sockets on the system. You can also override these settings on a per-socket basis with setsockopt(); see section 4.2 of the document you linked.
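For reference, the per-socket overrides on Linux can be set with the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options, roughly like this (values chosen to mirror the sysctls above):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable keepalive on one TCP socket, scoped to this socket only. */
int set_keepalive(int fd)
{
    int enable = 1, idle = 600, intvl = 60, probes = 20;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable)) < 0)
        return -1;
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,   sizeof(idle));    /* idle seconds before the first probe */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,  sizeof(intvl));   /* seconds between probes */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &probes, sizeof(probes));  /* failed probes before the socket errors out */
    return 0;
}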
You can't "check" the status of the socket from userspace with keepalive. Instead, the kernel is simply more aggressive about forcing the remote end to acknowledge packets, and about determining whether the socket has gone bad. When you attempt to write to the socket, you will get a SIGPIPE if keepalive has determined that the remote end is down.
You'll get the same result whether or not you enable SO_KEEPALIVE: typically you'll find the socket reported as ready and then get an error when you read from it.
You can set the keepalive timeout on a per-socket basis under Linux (this may be a Linux-specific feature). I'd recommend this rather than changing the system-wide setting. See the man page for tcp for more info.
Finally, if your client is a web browser, it's quite likely that it will close the socket fairly quickly anyway; most of them will only hold keepalive (HTTP 1.1) connections open for a relatively short time (30 s, 1 min, etc.). Of course, if the client machine has disappeared or the network is down (which is what SO_KEEPALIVE is really useful for detecting), then it won't be able to actively close the socket.
As already discussed, SO_KEEPALIVE makes the kernel more aggressive about continually verifying the connection even when you're not doing anything, but does not change or enhance the way the information is delivered to you. You'll find out when you try to actually do something (for example "write"), and you'll find out right away since the kernel is now just reporting the status of a previously set flag, rather than having to wait a few seconds (or much longer in some cases) for network activity to fail. The exact same code logic you had for handling the "other side went away unexpectedly" condition will still be used; what changes is the timing (not the method).
Virtually every "practical" sockets program in some way provides non-blocking access to the sockets during the data phase (maybe with select()/poll(), or maybe with fcntl()/O_NONBLOCK/EINPROGRESS&EWOULDBLOCK, or if your kernel supports it maybe with MSG_DONTWAIT). Assuming this is already done for other reasons, it's trivial (sometimes requiring no code at all) to in addition find out right away about a connection dropping. But if the data phase does not already somehow provide non-blocking access to the sockets, you won't find out about the connection dropping until the next time you try to do something.
(A TCP socket connection without some sort of non-blocking behaviour during the data phase is notoriously fragile, as if the wrong packet encounters a network problem it's very easy for the program to then "hang" indefinitely, and there's not a whole lot you can do about it.)
Short answer: add
int flags = 1;
if (setsockopt(sfd, SOL_SOCKET, SO_KEEPALIVE, &flags, sizeof(flags)) < 0) {
    perror("ERROR: setsockopt(), SO_KEEPALIVE");
    exit(1);
}
on the server side, and read() will be unblocked when the client is down.
A full explanation can be found here.

sockets - discover firewalled ports

I was reading the nmap source code because I'd like to find out how it discovers that certain ports are filtered or firewalled. I have some experience with sockets in C and I've built simple port scanners; that's easy: if the connection succeeds, the port is open, otherwise it's closed (because of the RST returned). But in the case of firewalled ports, no RST packet comes back, and my port scanner just "waits" forever.
If someone has experience with this topic, please point me to the parts of the nmap code where the actual scanning and port-state determination occur, or at least tell me if there is any other code available that deals with this problem.
Use asynchronous socket API calls (i.e. don't wait for the connection to be established, and instead try the next port/address in parallel) and define a reasonable timeout (e.g. if the connection isn't established after a minute you can consider it filtered).
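As a sketch of that classification step (assuming fd is a non-blocking socket with a connect() already pending): success means open, ECONNREFUSED means an RST came back so the port is closed, and a timeout with no answer at all suggests filtered.

#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Classify a pending non-blocking connect() on fd.
   Returns 1 = open, 0 = closed (RST received), -1 = filtered / no answer. */
int classify_port(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT };

    if (poll(&p, 1, timeout_ms) <= 0)
        return -1;                    /* nothing came back at all: likely filtered */

    int err = 0;
    socklen_t len = sizeof(err);
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);

    if (err == 0)
        return 1;                     /* handshake completed: port open */
    if (err == ECONNREFUSED)
        return 0;                     /* RST received: port closed */
    return -1;                        /* unreachable or other error: treat as filtered */
}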

Resources