force UDP broadcast via the network (disable loopback) - c

I want to send a UDP broadcast datagram to multiple devices on the network, including the sender device itself. The goal is to have all devices receive the data at the EXACT same time (well, +/- 5ms is OK).
The problem is that the network interface on the sending device is looping the data back, so it is received immediately (in contrast to the other devices where network latency comes into play - quite a bit for Wifi for instance)
Any idea how I can disable my network interface to loop the data back directly?
Another idea I had: Is it possible to create a virtual network interface to send the broadcast packet and listen on another interface which only receives it via the network?
I am trying to do that in C on a Linux machine. Any help would be greatly appreciated!

UDP datagrams are sent as IP payload. Routing IP packets is the job of the IP stack, which decides how a packet is transferred to its destination. When your IP stack detects that the destination is the local host, it enqueues the packet in the receive queue and the packet is available immediately. If your adapter's send queues are full, you will have an additional delay on the outgoing copies. So you can't achieve synchronization with this approach.
If you need hard synchronization, you should use NTP or SNTP to synchronize the clocks and define a common start time for your desired common operation.
Edit:
The (S)NTP protocol is designed to synchronize at the millisecond level. You will get a precision that you can't achieve with any transmission of UDP packets, for the reason I described above.
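A minimal sketch of the "common start time" idea, assuming every device's clock is already kept in sync by NTP or SNTP. The helper names `timespec_add_ms` and `wait_until` are made up for illustration. The sender would put an absolute start time into the broadcast payload, and every device (the sender included) sleeps until that instant instead of acting when the packet arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Return t + ms milliseconds, keeping tv_nsec in the range [0, 1e9). */
struct timespec timespec_add_ms(struct timespec t, long ms)
{
    assert(ms >= 0);
    t.tv_sec  += ms / 1000;
    t.tv_nsec += (ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_sec  += 1;
        t.tv_nsec -= 1000000000L;
    }
    return t;
}

/* Sleep until the absolute wall-clock time `start`.
 * Returns 0 on success or an error number (clock_nanosleep does not use errno). */
int wait_until(struct timespec start)
{
    int rc;
    do {
        rc = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &start, NULL);
    } while (rc == EINTR);   /* restart if a signal interrupted the sleep */
    return rc;
}
```

In use, the sender would read the current time with `clock_gettime(CLOCK_REALTIME, ...)`, add a margin larger than the worst network latency you expect (the exact margin is something you would have to measure), broadcast that timestamp, and then call `wait_until` itself like every receiver does.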

Related

lwIP echo server error sending data

I am working with the ethernet communication under echo server lwIP. I would like to capture samples from DMA to the HOST by ethernet. The system captures samples via UART.
I am not able to make lwIP send more than 2 packets larger than 1500 bytes without waiting for an ACK. My application sends packets continuously to the client. The client receives each packet without any delay, but it sends the ACK after 200 ms (see attached Wireshark capture image). lwIP always gets stuck waiting for the ACK before it sends the next packet. It sends no more than 2 TCP segments and then waits for an ACK. The network delay degrades performance.
Is there any configuration that makes lwIP send packets without waiting for the ACK? Do you have any suggestions?
If you don't want to wait how about using UDP instead of TCP? TCP is a stream protocol and is going to ensure that everything arrives and is in-order (so long as there aren't errors). echo usually makes me think of a situation where you don't care about ordering, only whether a particular packet makes it or not and how long it took.

Injecting an incoming packet to a network interface

I want to be able to simulate an incoming packet on a certain physical network interface.
Specifically, given an array of bytes and an interface name, I want to be able to make that interface think a packet containing those bytes arrived from another interface (most likely on another machine).
I've implemented the code that prepares the packet, but I'm unsure what the next step is.
I should point out that I actually need to feed the interface with my bytes, and not use a workaround that might produce a similar results in other machines (I've seen answers to other questions mentioning the loopback interface and external tools). This code is supposed to simulate traffic on a machine that's expecting to receive traffic from certain sources via specific interfaces. Anything else will be ignored by the machine.
I'm going to stick my neck out and say this is not possible without kernel modifications, and possibly driver modifications. Note that:
There are plenty of ways of generating egress packets through a particular interface, including libpcap. But you want to generate ingress packets.
There are plenty of ways of generating ingress packets that are not through a physical interface - this is what tap/tun devices are for.
If you modify the kernel to allow direct injection of packets into a device's receive queue, that may have unexpected effects, and is still not going to be an accurate simulation of the packets arriving in hardware (e.g. they will not be constrained to the same MTU etc). Perhaps you can build an iptables extension that fools the kernel into thinking the packet came from a different interface; I'm not sure that will do what you need though.
If all you need is simulation (and you are happy with a complete simulation), build a tap/tun driver, and rename the tap interface to eth0 or similar.
Depending on which network layer you're trying to simulate, there may be a work-around.
I have had success getting ip packets into the ingress queue with an ethernet 'hairpin'. That is, by setting the source and destination MAC address to the local interface, sending the packet results in it first appearing as an egress packet, then being 'hairpinned' and also appearing as an ingress packet.
This at least works under Linux using PcapPlusPlus (libpcap under the hood), with my wireless interface. Your mileage may vary.
This will obviously only suit your needs if you're OK with modifying the ethernet header, ie only simulating a higher layer.
Here is a snippet of c++ where I spoof a rst tcp packet for a local socket:
//always use the actual device source MAC, even if we're spoofing the remote rst
// this produces a 'hairpin' from the egress to the ingress on the interface so the tcp stack actually processes the packet
// required because the tcp stack doesn't process egress packets (at least on a linux wireless interface)
pcpp::EthLayer eth(localMAC,localMAC);
pcpp::IPv4Layer ip(remoteIP, localIP);
pcpp::TcpLayer tcp(remotePort, localPort);
pcpp::Packet pac(60);
ip.getIPv4Header()->timeToLive = 255;
tcp.getTcpHeader()->rstFlag = 1;
tcp.getTcpHeader()->ackFlag = 1;
tcp.getTcpHeader()->ackNumber = pcpp::hostToNet32(src.Ack);
tcp.getTcpHeader()->sequenceNumber = pcpp::hostToNet32(src.Seq);
pac.addLayer(&eth);
pac.addLayer(&ip);
pac.addLayer(&tcp);
pac.computeCalculateFields();
dev->sendPacket(&pac);
EDIT: the same code works on windows on an ethernet interface. It doesn't seem to do the same 'hairpin' judging from wireshark, but the tcp stack does process the packets.
Another solution is to create a new dummy network device driver with the same functionality as the loopback interface (i.e. it will be a dummy). After that, you can build a simple TCP packet and use the addresses of the two network devices as its source and destination addresses.
It sounds a little hard, but it's worth trying: you'll learn a lot about networking and the TCP/IP stack in Linux.

How to implement an ethernet modem

Okay, what I want to do, as a training exercise, is to implement something like this
client --ethernet--> Modem1 --GPIO--> Modem2 --ethernet--> My Home Router
Where the client connects to Modem1 using an ethernet cable.
Modem1 is a Raspberry Pi, converting the signal and relaying it via the GPIO
Modem2 is a Raspberry Pi, which receives the data from the GPIO and sends it via the ethernet cable to my home router
I want to implement the Modems, but have little idea where to start.
I have read up a little on ethernet programming, but still can't find answers to the "simple stuff" like:
How do I implement Modem1 so that when it's connected to the client, the client discovers it as an internet connection?
On the Modem2 end, how do I make "My Home Router" send packets meant for the "client" to Modem2, so that Modem2 may forward them?
and possibly things I haven't thought of....
So, how, concretely, can I implement this? preferably in c.
I'd venture to say you might be able to write some sort of custom GPIO intermediate layer.
Read Ethernet->Encapsulate->Write GPIO->|->Read GPIO->Decapsulate->Write Ethernet
(and vice versa)
The problem then becomes: How can both modems act as "Ethernet proxies"?
Modem1 acts as a proxy for the router. Modem2 acts as a proxy for the client. If your Raspberry Pi can spoof MAC addresses, you might be able to fool Ethernet peers into communicating with your modems' Ethernet port. The reason why you need to spoof MAC addresses is that in TCP/IP networking, there is the ARP table, which maps remote IP addresses to the MAC address that can route IP packets to/from them. This is what allows your client to communicate to your router over TCP/IP.
Another potential pitfall is where your modem communication introduces delays that interfere with the Ethernet layer's handling of the protocol. For example, the Ethernet protocol may have real-time constraints that could be shattered if you introduce delays...
But let's assume anything is possible in a perfect world...
You'll need to write code for reading/writing Ethernet messages (I've seen open source code for reading/writing Ethernet packets over raw sockets in Linux)
You'll need to write a custom driver for your GPIO comms.
This means implementing a carefully thought-out protocol to manage pins state, start-of-message, end-of-message, data-payload, checksum, whatever...
Finally, you'll need to write a top-level communications layer that implements:
Ethernet-to-GPIO process:
a) read from Ethernet port, encapsulates Ethernet packet into a custom message (or message fragments)
b) communicate this custom message, using your custom GPIO protocol driver, to the external GPIO peer
GPIO-to-Ethernet process:
a) Read from GPIO, using your custom driver code
b) Decapsulate Ethernet packet
c) Write Ethernet packet to Ethernet port.
these two processes run forever...
Again, it all hinges on whether or not your modems can insert themselves in a peer-to-peer connection without disturbing the natural flow of the Ethernet protocol...
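To make the encapsulate/decapsulate steps a bit more concrete, here is one possible framing sketch. The layout (a start byte, a 16-bit big-endian length, the payload, a one-byte XOR checksum) is purely an assumption for illustration, as are the names `frame_encode` and `frame_decode`. A real link would probably want a CRC and some byte stuffing so the start byte cannot be confused with payload data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SOF      0x7E  /* start-of-message marker (arbitrary choice) */
#define FRAME_OVERHEAD 4     /* SOF + 2 length bytes + 1 checksum byte */

static uint8_t xor_checksum(const uint8_t *data, size_t len)
{
    uint8_t c = 0;
    for (size_t i = 0; i < len; i++)
        c ^= data[i];
    return c;
}

/* Wrap `payload` (e.g. one Ethernet frame) for the GPIO link.
 * Returns the number of bytes written to `out`, or 0 if it does not fit. */
size_t frame_encode(uint8_t *out, size_t cap,
                    const uint8_t *payload, size_t len)
{
    assert(out != NULL && payload != NULL);
    if (len > 0xFFFF || cap < len + FRAME_OVERHEAD)
        return 0;
    out[0] = FRAME_SOF;
    out[1] = (uint8_t)(len >> 8);
    out[2] = (uint8_t)(len & 0xFF);
    memcpy(out + 3, payload, len);
    out[3 + len] = xor_checksum(payload, len);
    return len + FRAME_OVERHEAD;
}

/* Unwrap a message read from the GPIO link.
 * Returns the payload length, or -1 if the message is malformed or corrupt. */
long frame_decode(const uint8_t *in, size_t n,
                  uint8_t *payload, size_t cap)
{
    assert(in != NULL && payload != NULL);
    if (n < FRAME_OVERHEAD || in[0] != FRAME_SOF)
        return -1;
    size_t len = ((size_t)in[1] << 8) | in[2];
    if (n < len + FRAME_OVERHEAD || cap < len)
        return -1;
    if (xor_checksum(in + 3, len) != in[3 + len])
        return -1;
    memcpy(payload, in + 3, len);
    return (long)len;
}
```

The Ethernet-to-GPIO process would call `frame_encode` on each frame it reads and push the result out through the GPIO driver. The GPIO-to-Ethernet process would do the reverse with `frame_decode` and drop anything that returns -1.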
As for the 'C' part...
If you use open source libraries (or code snippets) for reading/writing raw Ethernet via raw sockets, that is most likely written in C.
Your GPIO code will read and write the GPIO pins in one of two ways: from a memory-mapped H/W address, or using ioport calls on that H/W address.
Receive raw Ethernet frames in Linux
Send a raw Ethernet frame in Linux
Good luck

In linux, why do I lose UDP packets if I call send() as fast as possible?

The implicit question is: If Linux blocks the send() call when the socket's send buffer is full, why should there be any lost packets?
More details:
I wrote a little utility in C to send UDP packets as fast as possible to a unicast address and port. I send a UDP payload of 1450 bytes each time, and the first bytes are a counter which increments by 1 for every packet. I run it on a Fedora 20 inside VirtualBox on a desktop PC with a 1Gb nic (=quite slow).
Then I wrote a little utility to read UDP packets from a given port which checks the packet's counter against its own counter and prints a message if they are different (i.e. 1 or more packets have been lost). I run it on a Fedora 20 bi-xeon server with a 1Gb ethernet nic (=super fast). It does show many lost packets.
Both machines are on a local network. I don't know exactly the number of hops between them, but I don't think there are more than 2 routers between them.
Things I tried:
Add a delay after each send(). If I set a delay of 1ms, then no packets are lost any more. A delay of 100us will start losing packets.
Increase the receiving socket buffer size to 4MiB using setsockopt(). That does not make any difference...
Please enlighten me!
For UDP the SO_SNDBUF socket option only limits the size of the datagram you can send. There is no explicit throttling send socket buffer as with TCP. There is, of course, in-kernel queuing of frames to the network card.
In other words, send(2) might drop your datagram without returning an error (check out description of ENOBUFS at the bottom of the manual page).
Then the packet might be dropped pretty much anywhere on the path:
sending network card does not have free hardware resources to service the request, frame is discarded,
intermediate routing device has no available buffer space or implements some congestion avoidance algorithm, drops the packet,
receiving network card cannot accept ethernet frames at given rate, some frames are just ignored.
reader application does not have enough socket receive buffer space to accommodate traffic spikes, kernel drops datagrams.
From what you said though, it sounds very probable that the VM is not able to send the packets at a high rate. Sniff the wire with tcpdump(1) or wireshark(1) as close to the source as possible, and check your sequence numbers - that would tell you if it's the sender that is to blame.
Even if send() blocks when the send buffer is full (provided that you didn't set SOCK_NONBLOCK on the socket to put it in non-blocking mode) the receiver must still be fast enough to handle all incoming packets. If the receiver or any intermediate system is slower than the sender, packets will get lost when using UDP. Note that slower does not only apply to the speed of the network interface but to the whole network stack plus the userspace application.
In your case it is quite possible that the receiver is receiving all packets but can't handle them fast enough in userspace. You can check that by recording and analyzing your traffic via tcpdump or wireshark.
If you don't want to lose packets then switch to TCP.
Any of the two routers you mentioned might drop packets if there is an overload,
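One thing worth checking on the receiver is whether a larger receive buffer was actually granted. On Linux a request above the `net.core.rmem_max` sysctl is silently capped, and the kernel doubles the value it stores, so reading the option back is the only way to know what you got. A small sketch (the helper name `set_rcvbuf` is made up):

```c
#include <assert.h>
#include <sys/socket.h>

/* Request a receive buffer of `bytes` and return the size the kernel
 * reports afterwards, or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof actual;

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        return -1;
    return actual;
}
```

If the returned value is far below what was requested, the request was probably capped and the sysctl limit would need raising before a bigger buffer can help with traffic spikes.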
and the receiving PC might drop or miss packets as well under certain circumstances such as overload.
As one of the above posters said, UDP is a simple datagram protocol that does not guarantee delivery, whether because of the local machine, equipment on the network, etc. That is the reason why many developers will recommend switching to TCP if you want reliability. However, if you really want to stick to UDP, and there are many valid reasons to do that, you will need to find a library that helps you guarantee delivery. Look for SS7 projects, especially in telephony APIs where UDP is used to transmit voice, data and signalling information. For your single-purpose app, may I suggest the ENet UDP library: http://enet.bespin.org/

Maximizing performance on udp

I'm working on a project with two clients, one for sending and the other for receiving UDP datagrams, between 2 machines wired directly to each other.
Each datagram is 1024 bytes in size, and it is sent using Winsock (blocking).
They are both running on very fast (separate) machines, with 16 GB RAM, 8 CPUs, and RAID 0 drives.
I'm looking for tips to maximize my throughput. Tips should be at the Winsock level, but if you have other tips, that would be great too.
Currently I'm getting 250-400 Mbit/s transfer speed. I'm looking for more.
thanks.
Since I don't know what else besides sending and receiving that your applications do it's difficult to know what else might be limiting it, but here's a few things to try. I'm assuming that you're using IPv4, and I'm not a Windows programmer.
Maximize the packet size that you are sending when you are using a reliable connection. For 100 Mbit/s Ethernet the maximum frame is 1518 bytes; Ethernet uses 18 of that, IPv4 uses 20-60 (usually 20, though), and UDP uses 8 bytes. That means that typically you should be able to send 1472 bytes of UDP payload per packet.
If you are using gigabit Ethernet equipment that supports it, your packet size can increase to 9000 bytes (jumbo frames), so sending something closer to that size should speed things up.
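The arithmetic above as a tiny helper, in case you want to compute the payload size for a given link. The 1518-byte frame is 14 bytes of Ethernet header plus 4 bytes of frame check sequence around a 1500-byte MTU, so the function below (name is mine) starts from the MTU:

```c
#include <assert.h>

enum { IPV4_HEADER_MIN = 20, UDP_HEADER = 8 };

/* Largest UDP payload that fits in one unfragmented IPv4 packet. */
int max_udp_payload(int mtu, int ip_header_len)
{
    assert(ip_header_len >= IPV4_HEADER_MIN);
    return mtu - ip_header_len - UDP_HEADER;
}
```

With the usual 20-byte IPv4 header this gives 1472 bytes for a 1500-byte MTU and 8972 bytes for a 9000-byte jumbo MTU. The 1024-byte datagrams in the question leave part of each standard frame unused.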
If you are sending any acknowledgments from your listener to your sender then try to make sure that they are sent rarely and can acknowledge more than just one packet at a time. Try to keep the listener from having to say much, and try to keep the sender from having to wait on the listener for permission to keep sending.
On the computer that the sender application lives on consider setting up a static ARP entry for the computer that the receiver lives on. Without this every few seconds there may be a pause while a new ARP request is made to make sure that the ARP cache is up to date. Some ARP implementations may do this request well before the ARP entry expires, which would decrease the impact, but some do not.
Turn off as many users of the network as possible. If you are using an Ethernet switch then you should concentrate on the things that introduce traffic to/from the computers and network devices that your applications run on or use (this includes broadcast messages, like many ARP requests). If it's a hub then you may want to quiet down the entire network. Windows tends to send out a constant stream of junk to networks which in many cases isn't useful.
There may be limits set on how much of the network bandwidth one application or user can have. Or there may be limits on how much network bandwidth the OS will let itself use. These can probably be changed in the registry if they exist.
It is not uncommon for network interface chips to not actually support the maximum bandwidth of the network all the time. There are chips which may miss packets because they are busy handling a previous packet as well as some which just can't send packets as close together as Ethernet specifications would allow. Additionally the rest of the system might not be able to keep up even if it is.
Some things to look at:
Connected UDP sockets (some info) shortcut several operations in the kernel, so are faster (see Stevens UnP book for details).
Socket send and receive buffers - play with SO_SNDBUF and SO_RCVBUF socket options to balance out spikes and packet drop
See if you can bump up link MTU and use jumbo frames.
use 1Gbps network and upgrade your network hardware...
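A sketch of the first two points using POSIX sockets: a UDP socket that is connected to its peer and has a larger send buffer requested. The Winsock calls (`connect`, `send`, `setsockopt`) have the same shape, but this was written against the BSD socket API and the function name `udp_connected_socket` is invented for the example.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Create a UDP socket connected to ip:port. If sndbuf > 0, also request
 * a send buffer of that many bytes (best effort; the OS may cap it).
 * Returns the socket descriptor, or -1 on error. */
int udp_connected_socket(const char *ip, unsigned short port, int sndbuf)
{
    struct sockaddr_in dst;
    int fd;

    assert(ip != NULL);
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    if (sndbuf > 0)
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof sndbuf);

    /* connect() on a UDP socket only fixes the peer address, so later
     * sends can use send() instead of sendto() with an address each time. */
    if (connect(fd, (struct sockaddr *)&dst, sizeof dst) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After this, the send loop is just `send(fd, buf, len, 0)`. Whether it makes a measurable difference depends on the stack, so it is worth benchmarking against the unconnected `sendto` version.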
Test the packet limit of your hardware with an already proven piece of code such as iperf:
http://www.noc.ucf.edu/Tools/Iperf/
I'm linking a Windows build, it might be a good idea to boot off a Linux LiveCD and try a Linux build for comparison of IP stacks.
More likely your NIC isn't performing well, try an Intel Gigabit Server Adapter:
http://www.intel.com/network/connectivity/products/server_adapters.htm
For TCP connections it has been shown that using multiple parallel connections will better utilize the data connection. I'm not sure if that applies to UDP, but it might help with some of the latency issues of packet processing.
So you might want to try multiple threads of blocking calls.
As well as Nikolai's suggestion of send and recv buffers, if you can, switch to overlapped I/O and have many recvs pending, this also helps to minimise the number of datagrams that are dropped by the stack due to lack of buffer space.
If you're looking for reliable data transfer, consider UDT.

Resources