Thousands of IP Addresses/Interfaces vs. slow program performance - c

I have a CentOS 5.9 machine set up with 5000+ IP addresses (secondary) for eth2.
My program only uses 2 of them, for 2 UDP sockets (1 RX, 1 TX).
When I run the application, the CPU usage is almost 100% all the time.
When I drop the number of IP addresses down (to 10), everything goes back to normal: barely 1% CPU usage.
The program is basically a client-server application. It uses non-blocking reads/writes and epoll_wait()
for event waiting.
Can someone explain why the CPU usage is so high for a binary that only uses a small portion
of the configured addresses?

I don't think the question is about the number of sockets, but rather the number of addresses on the interface. It does seem strange that your program's CPU usage climbs this high at that count, but in general the number of configured addresses affects how much work the IP stack does for incoming and outgoing packets. For example, when you call send and your socket is not bound, the kernel must choose a source IP address for the packet based on the destination address, and if that takes time it will show up in your process's context.
Still, this does not explain much; profiling with gprof would be a good idea. If source-address selection does turn out to be the cost, a quick experiment is to bind the TX socket up front, as sketched below.
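A minimal sketch of that experiment, assuming IPv4; the local address 10.0.0.1 is a placeholder, not something from the question, and error handling is omitted:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* Create a UDP TX socket bound to one fixed local address so the
   kernel does not have to pick a source address on every send. */
int make_bound_udp_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in local;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(0);                        /* any local port */
    inet_pton(AF_INET, "10.0.0.1", &local.sin_addr);  /* placeholder IP */
    bind(fd, (struct sockaddr *)&local, sizeof(local));
    return fd;
}

If the CPU usage drops once the socket is bound, the per-send source-address selection was the likely culprit.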

Handling thousands of sockets takes specialized software. Most network programmers naively use select() and expect it to scale up to thousands of sockets well... which it definitely does not. A more event-driven model scales much better, the events being a new connection, data on a socket, and so on.
For Linux and Windows I use libevent. It's a socket wrapper, not very hard to use, and it scales nicely to tens of thousands of sockets.
http://libevent.org/
Look at the website and you can see the logarithmic graph showing tens of thousands of sockets performing as though they were 100. Of course, if the sockets are all very busy, you are right back to low performance, but most sockets in the world are mostly quiet, and this is where libevent shines. There are other libraries as well, such as ZeroMQ (which has C#/Mono bindings, among others), libev, and Boost.Asio.
http://zeromq.org/
http://libev.schmorp.de/bench.html
http://www.boost.org/doc/libs/1_36_0/doc/html/boost_asio.html
Here is my working, super-simple sample. You'll need to add threading protections, but with less than an hour's work you could easily support a few thousand simultaneous connections.
http://pastebin.com/g02S2RTi
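In case that sample becomes unavailable, here is a separate minimal sketch of the event-driven pattern using the libevent 2.x API (this is not the linked sample; the port number is an arbitrary placeholder). Compile with -levent:

#include <event2/event.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <stdio.h>

/* Called by the event loop whenever the socket becomes readable. */
static void on_readable(evutil_socket_t fd, short events, void *arg)
{
    char buf[1500];
    struct sockaddr_in src;
    socklen_t srclen = sizeof(src);
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                         (struct sockaddr *)&src, &srclen);
    if (n > 0)
        printf("got %zd bytes\n", n);
}

int main(void)
{
    struct event_base *base = event_base_new();
    evutil_socket_t fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9999);              /* example port */
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    evutil_make_socket_nonblocking(fd);

    /* EV_PERSIST keeps the event armed after each callback. */
    struct event *ev = event_new(base, fd, EV_READ | EV_PERSIST,
                                 on_readable, NULL);
    event_add(ev, NULL);
    event_base_dispatch(base);                /* run the loop */

    event_free(ev);
    event_base_free(base);
    return 0;
}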

Related

Upload in a restricted country is too slow

I'm a C programmer and write Linux programs that connect machines over the internet. After finding that speedtest.net couldn't upload, or had very poor upload speed, I decided to test a plain TCP socket connection and see whether it was really that slow, and found that yes, it really is. I've rented a VPS outside my country. I don't know what the government does to the infrastructure, how packets are routed, or how they're restricted. Reproducing what I saw on speedtest.net with a bare socket connection convinced me I don't stand a chance: when the traffic is shaped like this, there's no way through. It also shows the restriction isn't specific to HTTPS or any other application-layer protocol, since even a plain TCP socket connection can't reach a reasonable speed. The speed is below 10 kilobytes per second! Damn!
In contrast, after I had all but given up, I tried some circumvention tools such as the CyberGhost extension for Chrome. To my surprise, it got past the barrier and raised the upload speed to about 200 kilobytes per second! How?! These tools can't be using anything closer to the hardware than sockets.
Now I've come here to ask what ideas you may have about this, so that I can write a program, or change my existing one, based on them.
Thank you

zero copy udp socket using sendfile instead of sendto

I'm working with UDP sockets in a real-time environment. I currently use the standard sendto() call, which takes a relatively long time. I've read that it's possible to use zero copy, which, if I understand correctly, avoids the overhead of copying data between user space and kernel space. However, sendfile() appears to only copy from one file descriptor to another, and I can't see how to use that to send UDP packets, which in my case come from a buffer. So my questions are:
Is it even possible to use sendfile() to send UDP packets?
If so, what is the correct way of doing it?
Edit
I am working on a real-time platform where several operations, plus the send over the socket, must all complete within 1 ms. I tried three machines: the first has 4 cores at 3.4 GHz, the second 8 cores at 2.3 GHz, and the last 4 cores at 1.4 GHz. On the first it takes less than 1 µs to send a 720-byte packet, while on the other two it takes between 6 and 9 µs. I'm using a Linux low-latency kernel and have disabled all CPU power-management features, so all the CPUs run at maximum frequency.
I noticed that if sendto() takes longer than 6 µs, the platform simply does not work. One more detail: I have several threads running in parallel, so maybe the CPU is just servicing other threads while sendto() has not yet completed. I'm wondering whether it's possible to suspend sendto() mid-call to do something else.
This is why I was looking for optimizations elsewhere, and I thought that sendfile() would shave off the extra copying time.
I am not sure whether sendfile() works with UDP sockets; however, memfd_create() creates a file descriptor backed by memory and could, in theory, let you bypass the copy from user space into the kernel.
Even so, when sending, the kernel still has to copy the data into the kernel socket buffer, because it needs to prepend the UDP, IP, and Ethernet headers to the user data, which cannot be done in place. That copy cannot be avoided even with sendfile().
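For what it's worth, here is a heavily hedged sketch of that memfd_create() idea. Whether sendfile() accepts a UDP socket as the output descriptor depends on the kernel version, so treat this purely as an experiment and be prepared for it to fail with EINVAL:

#define _GNU_SOURCE
#include <sys/mman.h>      /* memfd_create(): Linux >= 3.17, glibc >= 2.27 */
#include <sys/sendfile.h>
#include <unistd.h>

/* Stage the payload in a memory-backed fd, then hand it to sendfile().
   udp_fd must already be connect()ed so the kernel knows the destination. */
ssize_t send_via_memfd(int udp_fd, const void *buf, size_t len)
{
    int mfd = memfd_create("payload", 0);
    if (mfd < 0)
        return -1;
    if (write(mfd, buf, len) != (ssize_t)len) {
        close(mfd);
        return -1;
    }
    off_t off = 0;
    ssize_t n = sendfile(udp_fd, mfd, &off, len);  /* may fail with EINVAL */
    close(mfd);
    return n;
}

Note that the write() into the memfd is itself a copy, so this only pays off if sendfile() then avoids a second one.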
To do real zero-copy networking you may like to have a look at PF_RING ZC (Zero Copy) drivers:
On-Demand Kernel Bypass with PF_RING Aware Drivers
PF_RING™ ZC comes with a new generation of PF_RING™ aware drivers that can be used both in kernel or bypass mode. Once installed, the drivers operate as standard Linux drivers where you can do normal networking (e.g. ping or SSH). When used from PF_RING™ they are quicker than vanilla drivers, as they interact directly with it. If you open a device using a PF_RING-aware driver in zero copy (e.g. pfcount -i zc:eth1) the device becomes unavailable to standard networking as it is accessed in zero-copy through kernel bypass, as happened with the predecessor DNA. Once the application accessing the device is closed, standard networking activities can take place again.

Sending UDP and TCP packets on the same network line - how to prevent UDP drops? [duplicate]

Consider the prototypical multiplayer game server.
Clients connecting to the server are allowed to download maps and scripts. It is straightforward to create a TCP connection to accomplish this.
However, the server must continue to be responsive to the rest of the clients via UDP. If TCP download connections are allowed to saturate available bandwidth, UDP traffic will suffer severely from packet loss.
What might be the best way to deal with this issue? It definitely seems like a good idea to "throttle" the TCP upload connection somehow by keeping track of time, and send() on a regular time interval. This way, if UDP packet loss starts to occur more frequently the TCP connections may be throttled further. Will the OS tend to still bunch the data together rather than sending it off in a steady stream? How often would I want to be calling send()? I imagine doing it too often would cause the data to be buffered together first rendering the method ineffective, and doing it too infrequently would provide insufficient (and inefficient use of) bandwidth. Similar considerations exist with regard to how much data to send each time.
It sounds a lot like you're solving a problem the wrong way:
If you're worried about losing UDP packets, you should consider not using UDP.
If you're worried about sharing bandwidth between two functions, you should consider having separate pipes (bandwidth) for them.
Traffic shaping (which is what this sounds like) is typically addressed in the OS. You should look in that direction before making strange changes to your application.
If you haven't already gotten the application working and experienced this problem, you are probably prematurely optimizing.
To avoid saturating the bandwidth, you need to apply some sort of rate limiting. TCP actually already does this, but it might not be effective in some cases. For example, it has no idea whether you consider the TCP or the UDP traffic to be more important.
To implement any form of rate limiting involving UDP, you will first need to measure the UDP loss rate. UDP packets will need to carry sequence numbers; the client then counts how many unique packets it actually received and sends this information back to the server. This gives you the packet loss rate, as sketched below. The server should monitor it, and if packet loss jumps after a file transfer starts, lower the transfer rate until the loss becomes acceptable. (You will probably need to do this for UDP anyway, since UDP has no congestion control.)
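A minimal sketch of that bookkeeping, assuming a 32-bit sequence number starting at 0 in each datagram; the names are illustrative, and wraparound and duplicate filtering are left out:

#include <stdint.h>

struct loss_tracker {
    uint32_t highest_seq;  /* highest sequence number seen so far */
    uint32_t received;     /* packets received (duplicates assumed filtered) */
};

/* Call on every arriving packet; returns the loss rate in [0, 1]. */
static double on_packet(struct loss_tracker *t, uint32_t seq)
{
    if (seq > t->highest_seq)
        t->highest_seq = seq;
    t->received++;
    uint32_t expected = t->highest_seq + 1;  /* sequences start at 0 */
    return 1.0 - (double)t->received / (double)expected;
}

The receiver would report this rate back to the sender periodically, for example piggybacked on its regular traffic.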
Note that while I say "server" above, this could really be done in either direction, or both, depending on who needs to send what. Imagine a game with player-created maps that are transferred over peer-to-peer connections.
While lowering the transfer rate can be as simple as calling your send function less frequently, attempting to control TCP this way will no doubt conflict with the rate control TCP already has. As suggested in another answer, you might consider looking into more comprehensive ways to control TCP. (If you do rate-limit the UDP side yourself, a token bucket is the usual shape; see the sketch below.)
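A minimal token-bucket sketch for pacing the UDP side; the rate and burst values are tuning knobs, not recommendations, and the bucket's last timestamp must be initialized before first use:

#include <stddef.h>
#include <time.h>

struct bucket {
    double tokens;    /* bytes currently allowed        */
    double rate_bps;  /* refill rate, bytes per second  */
    double burst;     /* bucket capacity in bytes       */
    struct timespec last;
};

/* Returns 1 if a packet of len bytes may be sent now, 0 if the
   caller should wait and retry. */
static int bucket_allow(struct bucket *b, size_t len)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    double dt = (now.tv_sec - b->last.tv_sec)
              + (now.tv_nsec - b->last.tv_nsec) / 1e9;
    b->last = now;
    b->tokens += dt * b->rate_bps;
    if (b->tokens > b->burst)
        b->tokens = b->burst;
    if (b->tokens < (double)len)
        return 0;
    b->tokens -= (double)len;
    return 1;
}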
In this particular case, I doubt it would be an issue, unless you really need to send lots of UDP information while the clients are transferring files.
I would expect most games to just show a loading screen or a lobby while this is happening. Neither should require much UDP traffic unless your game has its own VoIP.
Here is an excellent article series that explains some of the possible uses of both TCP and UDP, specifically in the context of network games. TCP vs. UDP
In a later article in the series, he even explains a way to make UDP 'almost' as reliable as TCP (with code examples).
And as always: measure your results. You have no way of knowing whether your code is making the connections faster or slower unless you measure.
"# If you're worried about losing UDP packets, you should consider not using UDP."
Right on. UDP means no guarantee of packet delivery, especially over the internet. Check TCP's speed, which is quite acceptable on modern internet connections for most users playing games.

UDP vs TCP in cluster communication

I'm working on software that will run on HP blades under Linux; each blade will host multiple programs, and I'm considering UDP for the IPC. Messages between the blades/programs won't be bigger than 400 bytes.
I've used TCP before but have no experience with UDP, so the question is this: based on your experience, is using UDP for cluster communication wise?
It depends on your requirements for reliability. As you know UDP provides no delivery guarantees or even ordering guarantees (packets may arrive out-of-order). If your application is tolerant of this, or if you can make it tolerant with relatively simple code, UDP is definitely a better choice - it is lower latency, lower overhead, and programmatically simpler to deal with.
If reliability is an absolute requirement, then unless you're really hard core and trying to squeeze every last ounce of performance out of your cluster, just use TCP. Otherwise you'll simply find yourself trying to reinvent the mechanisms TCP uses to guarantee reliability, and you probably won't do as good a job as TCP does (it's had decades of tweaking and tuning).
Also note that on small LANs, despite the lack of any guarantee, UDP is quite reliable, though even in a perfect setup you still have to expect the occasional dropped packet. The more complex your cluster's network gets and the higher the system's bandwidth utilization, the less reliable it will be.

Maximizing performance on udp

I'm working on a project with two clients, one sending and the other receiving UDP datagrams, between two machines wired directly to each other.
Each datagram is 1024 bytes in size and is sent using Winsock (blocking).
Both run on very fast, separate machines with 16 GB RAM, 8 CPUs, and RAID 0 drives.
I'm looking for tips to maximize my throughput. Tips should be at the Winsock level, but if you have other tips, that would be great too.
Currently I'm getting 250-400 Mbit/s transfer speed; I'm looking for more.
Thanks.
Since I don't know what else your applications do besides sending and receiving, it's difficult to say what might be limiting them, but here are a few things to try. I'm assuming you're using IPv4, and I'm not a Windows programmer.
Maximize the packet size you send. For 100 Mb/s Ethernet the maximum frame is 1518 bytes; Ethernet uses 18 of that, IPv4 uses 20-60 (usually 20, though), and UDP uses 8 bytes. That means you can typically fit 1472 bytes of UDP payload in each packet (1518 - 18 - 20 - 8 = 1472).
If you are using gigabit Ethernet equipment that supports jumbo frames, the frame size increases to 9000 bytes, so sending something closer to that size should speed things up.
If your listener sends any acknowledgments back to the sender, make sure they are sent rarely and can acknowledge more than one packet at a time. Try to keep the listener from having to say much, and the sender from having to wait on the listener for permission to keep sending.
On the computer where the sender application lives, consider setting up a static ARP entry for the receiver. Without one there may be a pause every few seconds while a new ARP request is made to keep the ARP cache up to date. Some ARP implementations refresh well before the entry expires, which reduces the impact, but some do not.
Turn off as many other users of the network as possible. If you are using an Ethernet switch, concentrate on anything that introduces traffic to or from the computers and network devices your applications use (this includes broadcast messages, like many ARP requests). If it's a hub, you may want to quiet down the entire network. Windows tends to send a constant stream of chatter onto the network which in many cases isn't useful.
There may be limits on how much network bandwidth one application or user can consume, or on how much the OS lets itself use. If such limits exist, they can probably be changed in the registry.
It is not uncommon for network interface chips not to sustain the full bandwidth of the network at all times. Some chips miss packets because they are busy handling a previous one, and some simply cannot send packets as close together as the Ethernet specification allows. And even if the NIC can keep up, the rest of the system might not.
Some things to look at:
Connected UDP sockets (some info) shortcut several operations in the kernel, so they are faster (see Stevens' UNP book for details).
Socket send and receive buffers: play with the SO_SNDBUF and SO_RCVBUF socket options to absorb spikes and reduce packet drops. (A sketch of these first two points follows this list.)
See if you can bump up the link MTU and use jumbo frames.
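A sketch combining the first two suggestions: connect() the UDP socket so per-send lookups are shortcut, and enlarge the buffers. The 4 MB size is an arbitrary starting point to tune, and the (const char *) casts are there only so the same code compiles against Winsock:

#include <sys/socket.h>
#include <netinet/in.h>

int tune_udp_socket(int fd, const struct sockaddr_in *peer)
{
    int bufsize = 4 * 1024 * 1024;  /* 4 MB, tune to your traffic */
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
               (const char *)&bufsize, sizeof(bufsize));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
               (const char *)&bufsize, sizeof(bufsize));
    /* After connect(), plain send()/recv() can be used and the
       kernel caches the route to the peer. */
    return connect(fd, (const struct sockaddr *)peer, sizeof(*peer));
}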
Use a 1 Gbps network and upgrade your network hardware...
Test the packet limit of your hardware with an already proven piece of code such as iperf:
http://www.noc.ucf.edu/Tools/Iperf/
I'm linking a Windows build, it might be a good idea to boot off a Linux LiveCD and try a Linux build for comparison of IP stacks.
More likely your NIC isn't performing well, try an Intel Gigabit Server Adapter:
http://www.intel.com/network/connectivity/products/server_adapters.htm
For TCP connections it has been shown that using multiple parallel connections makes better use of the link. I'm not sure whether that applies to UDP, but it might help with some of the latency issues of packet processing.
So you might want to try multiple threads making blocking calls.
In addition to Nikolai's suggestion of tuning the send and receive buffers: if you can, switch to overlapped I/O and keep many receives pending. This also helps minimise the number of datagrams dropped by the stack due to lack of buffer space.
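A sketch of that pattern with Winsock 2 (link with ws2_32.lib). Completion harvesting, via events or an I/O completion port, is omitted, and the buffer count is a placeholder:

#include <winsock2.h>
#include <string.h>
#include <stdio.h>

#define DGRAM_SIZE 1024

typedef struct {
    WSAOVERLAPPED      ov;
    WSABUF             wsabuf;
    char               data[DGRAM_SIZE];
    struct sockaddr_in from;
    int                fromlen;
} RecvCtx;

/* Post one overlapped receive; keep e.g. 16 of these outstanding and
   re-post each one from its completion handler. */
static void post_recv(SOCKET s, RecvCtx *ctx)
{
    DWORD flags = 0;
    ctx->wsabuf.buf = ctx->data;
    ctx->wsabuf.len = DGRAM_SIZE;
    ctx->fromlen = sizeof(ctx->from);
    memset(&ctx->ov, 0, sizeof(ctx->ov));
    /* WSA_IO_PENDING is the normal result for an overlapped call. */
    if (WSARecvFrom(s, &ctx->wsabuf, 1, NULL, &flags,
                    (struct sockaddr *)&ctx->from, &ctx->fromlen,
                    &ctx->ov, NULL) == SOCKET_ERROR
        && WSAGetLastError() != WSA_IO_PENDING)
        fprintf(stderr, "WSARecvFrom: %d\n", WSAGetLastError());
}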
If you're looking for reliable data transfer, consider UDT.
