I have used up to 1000 sockets with epoll. Is it possible to use a million sockets in a single epoll instance? Is it efficient?
500,000 TCP connections from a single server is the gold standard these days. The record is over a million. It does require kernel tuning. See, for example, Linux Kernel Tuning for C500k (https://news.ycombinator.com/item?id=1740823).
Unlike with select(), there is no intrinsic limit on the number of sockets managed by epoll(). So long as you don't hit any extrinsic limits on sockets in general, such as the maximum number of file descriptors in the system or kernel memory, you can use as many sockets as you want with epoll().
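For a sense of scale, here is a minimal sketch (the one-million figure and the minimal error handling are illustrative assumptions) showing that the practical ceiling is the file-descriptor limit, not epoll itself:

    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/epoll.h>

    int main(void)
    {
        /* Ask for room for ~1M descriptors; needs a matching hard
         * limit or the right privileges. */
        struct rlimit rl = { .rlim_cur = 1048576, .rlim_max = 1048576 };
        if (setrlimit(RLIMIT_NOFILE, &rl) == -1)
            perror("setrlimit");

        /* epoll takes no size at all: epoll_create1() has no size
         * argument, and the old epoll_create() size hint has been
         * ignored since Linux 2.6.8. */
        int efd = epoll_create1(0);
        if (efd == -1) {
            perror("epoll_create1");
            return 1;
        }
        /* ... register up to the fd limit's worth of sockets
         * with epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev) ... */
        return 0;
    }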
If I want to sniff packets in Linux without setting any filters, I see two options:
Use libpcap
Use raw sockets myself, as in https://www.binarytides.com/packet-sniffer-code-in-c-using-linux-sockets-bsd-part-2/
Why is libpcap better than using raw sockets myself?
Three reasons:
1) It's way easier to correctly set up.
2) It's portable, even to Windows, which uses a quite similar yet different API for sockets.
3) It's MUCH faster.
1 and 2, IMO, don't need much explanation. I'll dive into 3.
To understand why libpcap is (generally) faster, we need to understand the bottlenecks in the socket API.
The two biggest bottlenecks that libpcap tends to avoid are syscalls and copies.
How it does so is platform-specific.
I'll tell the story for Linux.
Linux, since 2.0 IIRC, implements what it calls the AF_PACKET socket family, and later added PACKET_MMAP on top of it. I don't exactly recall the benefits of the former on its own, but the latter is critical to avoiding both copies from kernel to userspace (there are still a few copies kernel-side) and syscalls.
With PACKET_MMAP you allocate a big ring buffer in userspace and associate it with an AF_PACKET socket. This ring buffer contains a bit of metadata (most importantly, a marker that says whether a region is ready for user processing) and the packet contents.
When a packet arrives on a relevant interface (generally one you bind your socket to), the kernel copies it into the ring buffer and marks that location as ready for userspace*.
If the application was waiting on the socket, it gets notified*.
So, why is this better than raw sockets? Because you can get by with few or no syscalls after setting up the socket, depending on whether you want to busy-poll the buffer itself or wait with poll until a few packets are ready, and because you don't need the copy from the socket's internal RX buffer into your user buffers, since that buffer is shared with you.
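To make that concrete, here is a minimal sketch of a PACKET_MMAP RX ring using the default TPACKET_V1 format (the ring dimensions are arbitrary assumptions, error handling is minimal, and it needs root or CAP_NET_RAW):

    #include <stdio.h>
    #include <poll.h>
    #include <sys/socket.h>
    #include <sys/mman.h>
    #include <arpa/inet.h>
    #include <linux/if_packet.h>
    #include <linux/if_ether.h>

    int main(void)
    {
        /* Raw AF_PACKET socket that sees every protocol. */
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd == -1) { perror("socket"); return 1; }

        /* Describe the ring: 16 blocks of 4 KiB, holding 32 2-KiB frames. */
        struct tpacket_req req = {
            .tp_block_size = 4096,
            .tp_block_nr   = 16,
            .tp_frame_size = 2048,
            .tp_frame_nr   = 32,   /* block_size * block_nr / frame_size */
        };
        if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof req) == -1) {
            perror("setsockopt"); return 1;
        }

        /* Map the ring: the kernel writes packets straight into this
         * memory, so no recv() copy to userspace is needed. */
        size_t len = (size_t)req.tp_block_size * req.tp_block_nr;
        unsigned char *ring = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
        if (ring == MAP_FAILED) { perror("mmap"); return 1; }

        /* Walk the frames; tp_status is the "ready" marker mentioned above. */
        unsigned frame = 0;
        for (;;) {
            struct tpacket_hdr *hdr = (struct tpacket_hdr *)
                (ring + (size_t)frame * req.tp_frame_size);
            if (!(hdr->tp_status & TP_STATUS_USER)) {
                struct pollfd pfd = { .fd = fd, .events = POLLIN };
                poll(&pfd, 1, -1);              /* sleep until a frame is ready */
                continue;
            }
            printf("captured %u bytes\n", hdr->tp_len);
            hdr->tp_status = TP_STATUS_KERNEL;  /* hand the frame back */
            frame = (frame + 1) % req.tp_frame_nr;
        }
    }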
libpcap does all of that for you. And it does so on Mac, *BSD, and pretty much any platform that provides faster capture methods.
*It's a bit more complex with version 3 (TPACKET_V3), where the granularity is in "blocks" instead of packets.
I am building a UDP port scanner in C.
This is the outline of the code:
Create a socket
Build a raw UDP packet with port i
Send the packet and wait n milliseconds for a reply
I need to perform those tasks X times, depending on the number of ports to be scanned. It may be up to 65535 times.
My goal is to optimize resources, considering an i386 machine running under a 3.5.0-17-generic Linux kernel.
How many threads should be created?
How many packets should be sent inside a single thread?
Thanks for your attention.
One thread, using select, epoll or similar.
All of them. Remember to rate limit, since that doesn't happen automatically with UDP.
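A minimal single-threaded sketch along those lines (the target address, probe payload, and 1 ms poll timeout are illustrative assumptions; detecting closed ports via ICMP port-unreachable needs extra work, such as IP_RECVERR or a raw socket):

    #include <stdio.h>
    #include <unistd.h>
    #include <poll.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd == -1) { perror("socket"); return 1; }

        struct sockaddr_in dst = { .sin_family = AF_INET };
        inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);   /* example target */

        for (int port = 1; port <= 65535; port++) {
            dst.sin_port = htons(port);
            sendto(fd, "probe", 5, 0, (struct sockaddr *)&dst, sizeof dst);

            /* Wait up to 1 ms for anything to come back before moving
             * on; this also acts as a crude rate limit (~1000 probes/s). */
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            if (poll(&pfd, 1, 1) > 0) {
                char buf[512];
                struct sockaddr_in src;
                socklen_t sl = sizeof src;
                ssize_t n = recvfrom(fd, buf, sizeof buf, 0,
                                     (struct sockaddr *)&src, &sl);
                if (n >= 0)
                    printf("reply from port %d\n", ntohs(src.sin_port));
            }
        }
        close(fd);
        return 0;
    }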
Even though a lot has been said on this topic, I am still stumped.
I am experimenting with a monster Linux server capable of handling serious load ramps, presumably many thousands of connections per second. Now, if I check the default listen() queue:
# cat /proc/sys/net/core/somaxconn
128
which can't be the actual queue size at all. I suspect it might be legacy, and that the actual size is given by this:
# cat /proc/sys/net/ipv4/tcp_max_syn_backlog
2048
However, man tcp says the latter is the number of connection requests still awaiting an ACK from the client, which is different from the total number of connections that have not yet been accepted, which is what the listen() backlog is.
So my question is: how can I increase the listen() backlog, and how do I get/set its upper limit (short of recompiling the kernel)?
somaxconn is the number of complete connections waiting.
tcp_max_syn_backlog is the number of incomplete connections waiting.
They aren't the same thing. It's all described in the man page.
You increase it by following these instructions: https://serverfault.com/questions/271380/how-can-i-increase-the-value-of-somaxconn - basically by using sysctl.
And yes, somaxconn is the cap on listen backlog.
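To illustrate the capping behaviour, a minimal sketch (port 8080 and the backlog of 4096 are arbitrary choices):

    #include <stdio.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {
            .sin_family      = AF_INET,
            .sin_addr.s_addr = htonl(INADDR_ANY),
            .sin_port        = htons(8080),
        };
        if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1)
            perror("bind");

        /* Asking for 4096 only takes effect if somaxconn >= 4096;
         * otherwise the kernel silently caps the backlog at somaxconn
         * (the 128 you saw). Raise it first with, e.g.:
         *   sysctl -w net.core.somaxconn=4096 */
        if (listen(fd, 4096) == -1)
            perror("listen");
        return 0;
    }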
I'm using 64-bit Linux (Linux scv 3.2.0-39-generic #62-Ubuntu SMP Thu Feb 28 00:28:53 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux) and have two processes using sockets which run on the same physical host.
One process (A) sends the following pieces of data on a TCP/IP socket (effectively a local socket, given the host is the same):
276 bytes
16 bytes
This is done in 0.000023 seconds from process A, by calling the send socket API twice.
Another process (B) receives the data via epoll, using epoll_wait(efd, events, 10, 5). Data is received as follows (time is taken with clock_gettime(CLOCK_REALTIME, &cur_ts); what matters is the relative difference):
Read data from socket buffer at 8051.177743 (276)
Call to epoll again at 8051.177763
Read data from socket buffer at 8051.216250 (16)
This leaves the receiving process lagging by 0.038507 seconds. Basically, while the sending process A takes less than a millisecond, on the receiving side epoll adds an additional lag of approximately 0.038 s before the data is received.
Is this expected? What am I doing wrong?
Or how can I improve the situation?
Thanks
Is this expected? ...
Yes. I would expect that. Here's why:
What am I doing wrong? ...
epoll was designed to be used in situations where large numbers of file descriptors need to be watched. That's what it's suitable for, and it seems to me that the situation you're using it for isn't that situation.
... how can I improve the situation?
If you want to improve the performance, use the right tool for the job. Don't use epoll for a single socket. Just use plain-old vanilla recv. If you're handling two or three sockets, consider using poll or select. If you're venturing into hundreds, then you might want to consider using epoll or kqueue.
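As an illustration of the plain-recv route, a minimal sketch for a single, already-connected TCP socket (the fd setup and the stand-in handle() processing are assumed):

    #include <stdio.h>
    #include <sys/socket.h>

    static void handle(const char *data, size_t len)
    {
        fwrite(data, 1, len, stdout);   /* stand-in for real processing */
    }

    void recv_loop(int fd)
    {
        char buf[4096];
        for (;;) {
            /* One blocking call per chunk: wakes as soon as data
             * arrives, with no readiness-notification round trip. */
            ssize_t n = recv(fd, buf, sizeof buf, 0);
            if (n == 0)
                break;                  /* peer closed the connection */
            if (n == -1) {
                perror("recv");
                break;
            }
            handle(buf, (size_t)n);
        }
    }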
I need to perform data filtering based on the source unicast IPv4 address of datagrams arriving at a Linux UDP socket.
Of course, it is always possible to perform the filtering manually, based on the information provided by recvfrom, but I am wondering whether there is a more intelligent/efficient approach (if possible, not using libpcap).
Any ideas?
If it's a single source you need to allow, then just use connect(2) and the kernel will do the filtering for you. As a bonus, connected UDP sockets are more efficient. This, of course, does not work for more than one source.
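A minimal sketch of that idea (the local port and the peer address 192.0.2.7:5000 are illustrative assumptions):

    #include <stdio.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        struct sockaddr_in local = {
            .sin_family      = AF_INET,
            .sin_addr.s_addr = htonl(INADDR_ANY),
            .sin_port        = htons(5000),
        };
        if (bind(fd, (struct sockaddr *)&local, sizeof local) == -1)
            perror("bind");

        /* After connect(), only datagrams from 192.0.2.7:5000 are
         * delivered; the kernel discards everything else. */
        struct sockaddr_in peer = { .sin_family = AF_INET,
                                    .sin_port   = htons(5000) };
        inet_pton(AF_INET, "192.0.2.7", &peer.sin_addr);
        if (connect(fd, (struct sockaddr *)&peer, sizeof peer) == -1)
            perror("connect");

        char buf[1500];
        ssize_t n = recv(fd, buf, sizeof buf, 0);  /* plain recv() now suffices */
        if (n >= 0)
            printf("got %zd bytes from the allowed peer\n", n);
        close(fd);
        return 0;
    }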
As already stated, NetFilter (the Linux firewall) can help you here.
You could also use the UDP options of xinetd and tcpd to perform filtering.
What proportion of datagrams are you expecting to discard? If it is very high, then you may want to review your application design (for example, to make the senders not send so many datagrams which are to be discarded). If it is not very high, then you don't really care about how much effort you spend discarding them.
Suppose discarding a packet takes the same amount of (runtime) effort as processing it normally; if you discard 1% of packets, you will only be spending 1% of time discarding. However, realistically, discarding is likely to be much easier than processing messages.