Broadcasting UDP packets using multiple NICs

I'm building an embedded system for a camera controller in Linux (not real-time). I'm having a problem getting the networking to do what I want it to do. The system has 3 NICs, 1 100base-T and 2 gigabit ports. I hook the slower one up to the camera (that's all it supports) and the faster ones are point-to-point connections to other machines. What I am attempting to do is get an image from the camera, do a little processing, then broadcast it using UDP to each of the other NICs.
Here is my network configuration:
eth0: addr: 192.168.1.200 Bcast 192.168.1.255 Mask: 255.255.255.0 (this is the 100base-t)
eth1: addr: 192.168.2.100 Bcast 192.168.2.255 Mask: 255.255.255.0
eth2: addr: 192.168.3.100 Bcast 192.168.3.255 Mask: 255.255.255.0
The image is coming in off eth0 in a proprietary protocol, so it's a raw socket. I can broadcast it to eth1 or eth2 just fine. But when I try to broadcast it to both, one after the other, I get lots of network hiccups and errors on eth0.
I initialize the UDP sockets like this:
sock2=socket(AF_INET,SOCK_DGRAM,IPPROTO_UDP); // Or sock3
sa.sin_family=AF_INET;
sa.sin_port=htons(8000);
inet_aton("192.168.2.255",&sa.sin_addr); // Or 192.168.3.255
setsockopt(sock2, SOL_SOCKET, SO_BROADCAST, &broadcast, sizeof(broadcast));
bind(sock2,(struct sockaddr*)&sa,sizeof(sa));
sendto(sock2,&data,sizeof(data),0,(struct sockaddr*)&sa,sizeof(sa)); // sizeof(data)<1100 bytes
I do this for each socket separately, and call sendto separately. When I do one or the other, it's fine. When I try to send on both, eth0 starts getting bad packets.
Any ideas on why this is happening? Is it a configuration error, is there a better way to do this?
EDIT:
Thanks for all the help, I've been trying some things and looking into this more. The issue does not appear to be broadcasting, strictly speaking. I replaced the broadcast code with a unicast command and it has the same behavior. I think I understand the behavior better, but not how to fix it.
Here is what is happening. On eth0 I am supposed to get an image every 50ms. When I send out an image on eth1 (or 2) it takes about 1.5ms to send the image. When I try to send on both eth1 and eth2 at the same time it takes about 45ms, occasionally jumping to 90ms. When this goes beyond the 50ms window, eth0's buffer starts to build. I lose packets when the buffer gets full, of course.
So my revised question. Why would it go from 1.5ms to 45ms just by going from one ethernet port to two?
Here is my initialization code:
sock[i]=socket(AF_INET,SOCK_DGRAM,IPPROTO_UDP);
sa[i].sin_family=AF_INET;
sa[i].sin_port=htons(8000);
inet_aton(ip,&sa[i].sin_addr);
//If Broadcasting
char buffer[]="eth1"; // or eth2
setsockopt(sock[i],SOL_SOCKET,SO_BINDTODEVICE,buffer,5);
int b=1;
setsockopt(sock[i],SOL_SOCKET,SO_BROADCAST,&b,sizeof(b));
Here is my sending code:
for(i=0;i<65;i++) {
sendto(sock[0],&data[i],sizeof(data[i]),0,(struct sockaddr*)&sa[0],sizeof(sa[0]));
sendto(sock[1],&data[i],sizeof(data[i]),0,(struct sockaddr*)&sa[1],sizeof(sa[1]));
}
It's pretty basic.
Any ideas? Thanks for all your great help!
Paul

Maybe your UDP stack runs out of memory?
(1) Check /proc/sys/net/ipv4/udp_mem (see man 7 udp for details). Make sure that the first number is at least 8 times the image size (the values are counted in pages, not bytes, so convert accordingly). This sets the memory for all UDP sockets in the system.
(2) Make sure your per-socket send buffer is big enough. Use setsockopt(sock2, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)), with int size = image_size*2, to set the send buffer on both sockets. You might need to increase the maximum allowed value in /proc/sys/net/core/wmem_max. See man 7 socket for details.
(3) You might as well increase RX buffer for receiving socket. Write a big number to .../rmem_max, then use SO_RCVBUF to increase the receiving buffer size.
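As a rough sketch of point (2), the helper below requests a send buffer size and then reads back what the kernel actually granted. The name set_send_buffer is made up for illustration. On Linux the granted value is normally double the request and is capped by wmem_max, so it is worth checking rather than assuming.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: ask for a send buffer of `bytes` bytes on `fd` and
 * return the size the kernel actually granted, or -1 on error.
 * On Linux the granted value is usually twice the request (bookkeeping
 * overhead) and is limited by /proc/sys/net/core/wmem_max. */
static int set_send_buffer(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) < 0)
        return -1;
    return granted;
}
```

Calling set_send_buffer(sock2, image_size*2) on each sending socket and logging the result shows whether wmem_max is clipping the request.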

A workaround until this issue is actually solved may be to create a bridge for eth1+eth2 and send the packet to that bridge.
Thus it's only mapped to kernel-memory once and not twice per image.

It's been a long time, but I found the answer to my question, so I thought I would put it here in case anyone else ever finds it.
The two Gigabit Ethernet ports were actually on a PCI bridge off the PCI-express bus. The PCI-express bus was internal to the motherboard, but it was a PCI bus going to the cards. The bridge and the bus did not have enough bandwidth to actually send out the images that fast. With only one NIC enabled the data was sent to the buffer and it looked very quick to me, but it took much longer to actually get through the bus, out the card, and on to the wire. The second NIC was slower because the buffer was full. Although changing the buffer size masked the problem, it did not actually send the data out any faster and I was still getting dropped packets on the third NIC.
In the end, the 100Base-T card was actually built onto the motherboard and therefore had a faster bus to it, resulting in overall faster bandwidth than the gigabit ports. By switching the camera to a gigabit line and one of the gigabit lines to the 100Base-T line I was able to meet the requirements.
Strange.
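For anyone chasing a similar symptom: a sendto() that returns quickly only means the data was copied into the socket's send queue, not that it reached the wire. A minimal, Linux-specific sketch for peeking at what is still queued uses the SIOCOUTQ ioctl described in man 7 udp. The helper name unsent_bytes is made up, and the value is only an approximate indicator of backlog.

```c
#include <linux/sockios.h>   /* SIOCOUTQ */
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper: return the amount of data still sitting in the
 * local send queue of socket `fd`, or -1 on error (Linux only). */
static int unsent_bytes(int fd)
{
    int pending = 0;

    if (ioctl(fd, SIOCOUTQ, &pending) < 0)
        return -1;
    return pending;
}
```

Sampling this right after each sendto() in the send loop would show the queue growing when the bus or NIC cannot drain it as fast as the application fills it.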

Related

Can a linux socket return data less than the underlying packet? [duplicate]

When will a TCP packet be fragmented at the application layer? When a TCP packet is sent from an application, will the recipient at the application layer ever receive the packet in two or more packets? If so, what conditions cause the packet to be divided. It seems like a packet won't be fragmented until it reaches the Ethernet (at the network layer) limit of 1500 bytes. But, that fragmentation will be transparent to the recipient at the application layer since the network layer will reassemble the fragments before sending the packet up to the next layer, right?

It will be split when it hits a network device with a lower MTU than the packet's size. Most ethernet devices are 1500, but it can often be smaller, like 1492 if that ethernet is going over PPPoE (DSL) because of the extra routing information, even lower if a second layer is added like Windows Internet Connection Sharing. And dialup is normally 576!
In general though you should remember that TCP is not a packet protocol. It uses packets at the lowest level to transmit over IP, but as far as the interface for any TCP stack is concerned, it is a stream protocol and has no requirement to provide you with a 1:1 relationship to the physical packets sent or received (for example most stacks will hold messages until a certain period of time has expired, or there are enough messages to maximize the size of the IP packet for the given MTU).
As an example if you sent two "packets" (call your send function twice), the receiving program might only receive 1 "packet" (the receiving TCP stack might combine them together). If you are implementing a message type protocol over TCP, you should include a header at the beginning of each message (or some other header/footer mechanism) so that the receiving side can split the TCP stream back into individual messages, either when a message is received in two parts, or when several messages are received as a chunk.
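One possible shape for such a header, as a sketch only: a 4-byte big-endian length in front of every message. The header layout and the names frame_message and next_message are assumptions for illustration, not a standard. The receiver keeps appending whatever recv() returns to a buffer and calls next_message until it reports that more bytes are needed.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

#define HDR_LEN 4  /* assumed header: 4-byte big-endian payload length */

/* Write header + payload into `out`. Returns the total bytes written,
 * or 0 if `out` is too small. */
static size_t frame_message(const void *payload, uint32_t len,
                            uint8_t *out, size_t out_cap)
{
    if (out_cap < HDR_LEN || len > out_cap - HDR_LEN)
        return 0;
    uint32_t be = htonl(len);
    memcpy(out, &be, HDR_LEN);
    memcpy(out + HDR_LEN, payload, len);
    return HDR_LEN + (size_t)len;
}

/* Try to extract one complete message from the first `avail` bytes of
 * `stream`. On success sets *payload and *len and returns the number of
 * bytes consumed. Returns 0 if the message is not complete yet. */
static size_t next_message(const uint8_t *stream, size_t avail,
                           const uint8_t **payload, uint32_t *len)
{
    uint32_t be;

    if (avail < HDR_LEN)
        return 0;                 /* header itself is still incomplete */
    memcpy(&be, stream, HDR_LEN);
    uint32_t n = ntohl(be);
    if (avail - HDR_LEN < n)
        return 0;                 /* body is still incomplete */
    *payload = stream + HDR_LEN;
    *len = n;
    return HDR_LEN + (size_t)n;
}
```

This handles both cases mentioned above: a message arriving in two parts (next_message returns 0 until the rest shows up) and several messages arriving as one chunk (call it repeatedly, advancing by the returned count).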

Fragmentation should be transparent to a TCP application. Keep in mind that TCP is a stream protocol: you get a stream of data, not packets! If you are building your application based on the idea of complete data packets then you will have problems unless you add an abstraction layer to assemble whole packets from the stream and then pass the packets up to the application.

The question makes an assumption that is not true -- TCP does not deliver packets to its endpoints, rather, it sends a stream of bytes (octets). If an application writes two strings into TCP, it may be delivered as one string on the other end; likewise, one string may be delivered as two (or more) strings on the other end.
RFC 793, Section 1.5:
"The TCP is able to transfer a
continuous stream of octets in each
direction between its users by
packaging some number of octets into
segments for transmission through the
internet system."
The key words being continuous stream of octets (bytes).
RFC 793, Section 2.8:
"There is no necessary relationship
between push functions and segment
boundaries. The data in any particular
segment may be the result of a single
SEND call, in whole or part, or of
multiple SEND calls."
The entirety of section 2.8 is relevant.

At the application layer there are any number of reasons why the whole 1500 bytes may not show up in one read. Various factors in the internal operating system and TCP stack may cause the application to get some bytes in one read call, and some in the next. Yes, the TCP stack has to re-assemble the packet before sending it up, but that doesn't mean your app is going to get it all in one shot (it LIKELY will get it in one read, but it's not GUARANTEED to get it in one read).
TCP tries to guarantee in-order delivery of bytes, with error checking, automatic re-sends, etc happening behind your back. Think of it as a pipe at the app layer and don't get too bogged down in how the stack actually sends it over the network.

This page is a good source of information about some of the issues that others have brought up, namely the need for data encapsulation on an application protocol by application protocol basis. Not quite authoritative in the sense you describe, but it has examples and is sourced to some pretty big names in network programming.

If a packet exceeds the maximum MTU of a network device it will be broken up into multiple packets. (Note most equipment is set to 1500 bytes, but this is not a necessity.)
The reconstruction of the packet should be entirely transparent to the applications.
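To put numbers on the splitting, here is a small sketch that counts how many IPv4 packets a datagram turns into at a given MTU. It assumes a 20-byte IPv4 header with no options, and the function name is made up. Every fragment carries its own header, and every fragment except the last must carry a payload that is a multiple of 8 bytes. (TCP normally sizes its segments to avoid this kind of IP fragmentation in the first place.)

```c
#include <stddef.h>

#define IPV4_HDR 20  /* assumption: IPv4 header without options */

/* Number of IPv4 packets on the wire for a datagram of `total_len` bytes
 * (header included) crossing a link with the given MTU.
 * Returns 0 if the inputs make no sense. */
static size_t ipv4_fragment_count(size_t total_len, size_t mtu)
{
    if (total_len < IPV4_HDR || mtu < IPV4_HDR + 8)
        return 0;
    if (total_len <= mtu)
        return 1;                                  /* fits as-is */

    size_t payload  = total_len - IPV4_HDR;
    size_t per_frag = ((mtu - IPV4_HDR) / 8) * 8;  /* offsets are in 8-byte units */
    return (payload + per_frag - 1) / per_frag;    /* round up */
}
```

With an MTU of 1500 each fragment can carry 1480 payload bytes, so a datagram just over the MTU already needs two packets, and the count grows from there.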

Different network segments can have different MTU values. In that case fragmentation can occur. For more information see TCP Maximum segment size
This (de)fragmentation happens in the TCP layer. In the application layer there are no more packets. TCP presents a contiguous data stream to the application.

At the "application layer" a TCP packet (well, segment really; TCP at its own layer doesn't know from packets) is never fragmented, since it doesn't exist. The application layer is where you see the data as a stream of bytes, delivered reliably and in order.
If you're thinking about it otherwise, you're probably approaching something in the wrong way. However, this is not to say that there might not be a layer above this, say, a sequence of messages delivered over this reliable, in-order bytestream.

Correct - the most informative way to see this is using Wireshark, an invaluable tool. Take the time to figure it out - has saved me several times, and gives a good reality check

If a 3000 byte packet enters an Ethernet network with a default MTU size of 1500 (for ethernet), it will be fragmented into multiple packets, each at most 1500 bytes in length. That is the only time I can think of.
Wireshark is your best bet for checking this. I have been using it for a while and am totally impressed.

Additional header to IPv4 packet can be segmented with GSO?

I'm having trouble with packet segmentation. I've already read from many sources about GSO, which is a generalized way of segmenting a packet whose size is greater than the Ethernet MTU (1500 B). However, I have not found an answer to the doubts I have in mind.
If we add a new set of bytes (e.g. a new header by the name 'NH') between the L2 and L3 layers, the kernel must be able to skip over NH and adjust the sk_buff pointer to the beginning of L3 in order to offload the packet according to the 'policy' of the L3 protocol type (e.g. IPv4 fragmentation). My thought was to modify the skb_network_protocol() function. This function, if I'm not wrong, enables skb_mac_gso_segment() to properly call the GSO function for different types of L3 protocol. However, I'm not able to segment my packets properly.
I have a kernel module that forwards packets through the network (OVS, Open vSwitch). In the tests which I've been running (h1 --ping-- h2), the host generates large ICMP packets and then sends packets which are less than or equal to the MTU size. Those packets are received by the first switch, which attaches the new header NH, so if a packet had 1500B, it becomes 1500B + NH length. Here is the problem: the switch has already received a fragmented packet from the host, and the switch adds more bytes to the packet (much as VLAN does).
Therefore, at first, I tried to ping large packets, but it didn't work. In OVS, before calling dev_queue_xmit(), a packet can be segmented by calling skb_gso_segment(). However, the packet needs to go through a condition checked by netif_needs_gso(). But I'm not sure if I have to use skb_gso_segment() to properly segment the packet.
I also noticed that, for the needs_gso_segment() function to be true, skb_shinfo(skb)->gso_size has to be nonzero. However, gso_size always has a zero value for all the received packets. So, I made a test by assigning an arbitrary value to gso_size (e.g. 1448B). Now, in my tests, I was able to ping from h1 to h2, but the first 2 packets were lost. In another test, TCP had extremely poor performance. And since then, I've been getting a kernel warning: "[ 5212.694418] [c1642e50] ? skb_warn_bad_offload+0xd0/0xd8"
For small packets (< MTU) I got no trouble and ping works fine. TCP works fine, but for small window size.
Does anyone have any idea what's happening? Should I always use GSO when I get large packets? Is it possible to fragment an already fragmented IPv4 packet?
As the new header lies between L2 and L3, I guess the enlargement of an IPv4 packet due to the additional header is similar to what happens with VLAN. How does VLAN handle the segmentation problem?
Thanks in advance,

Packets incapsulation for own simple VPN

I want to do my own very simple implementation of VPN in C on Linux. For that purpose I'm going to capture IP packets, modify them and send them forward. The modification consists of encryption, authentication and other stuff like in IPSec. My question is: should I handle the size of the packets somehow, or will this be handled automatically? I know the maximum size is 65535 - 20 (for the header), but according to the MTU it is smaller. I think it's because the encrypted payload "encapsulated into UDP" for NAT-T is much bigger than just the "normal payload" of the IP packet.

Well, I found that there are actually 2 ways to handle that problem:
1) We can send big packets with the DF (Don't Fragment) flag cleared, which allows them to be fragmented along the path. But in this case packets can be lost, because not all devices support packet fragmentation.
2) We can automatically calculate the maximum MTU between hosts, split the packets and send them. On the other side we put all these pieces together and restore the original packets. This can be done by implementing our own "system" for this purpose.
More about IP packets fragmentation and reassembly you can read here
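For option 2, the size budget can be sketched like this. It assumes the tunnel runs over UDP on IPv4 with no IP options, and the overhead parameter stands for whatever the VPN itself adds per datagram (its own header, IV, authentication tag and so on), which is a hypothetical figure that depends on your design. The function names are made up as well.

```c
#include <stddef.h>

#define IPV4_HDR 20  /* assumption: outer IPv4 header without options */
#define UDP_HDR   8  /* outer UDP header */

/* Largest piece of an inner packet that fits into one outer UDP datagram
 * without exceeding `path_mtu`. `overhead` is the number of bytes the
 * tunnel itself adds per datagram. Returns 0 if nothing fits. */
static size_t tunnel_chunk_size(size_t path_mtu, size_t overhead)
{
    size_t fixed = IPV4_HDR + UDP_HDR + overhead;

    return path_mtu > fixed ? path_mtu - fixed : 0;
}

/* Number of outer datagrams needed to carry `packet_len` bytes when each
 * one holds at most `chunk` bytes. Returns 0 if `chunk` is 0. */
static size_t tunnel_chunk_count(size_t packet_len, size_t chunk)
{
    if (chunk == 0)
        return 0;
    return (packet_len + chunk - 1) / chunk;  /* round up */
}
```

The receiving side then needs enough information in the tunnel header (for example a packet id and a chunk index) to put the pieces back together, which is the "own system" mentioned above.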

Path of the UDP packet from kernel to user-space in Linux

I'm maintaining a network driver and I've got some problems with loss of data. The effect is that when I send, for example, an ICMP or UDP ping using ping or nping, some of the UDP/ICMP packets are lost.
I'm sure that on ping/nping side of the transmission the ping reply is received by my driver and the kernel (tcpdump shows incoming udp or icmp packets as a reply).
But application ping/nping shows sometimes that for example 80% packets are lost. I suspect that those packets are lost somewhere between kernel and user space.
I know that for UDP there is procedure udp_rcv() for maintenance of UDP packets, but I don't know which procedure is next in the path of delivering of the packet to user space application.
Linux kernel is in version 3.3.8.
My question is - how to trace the whole path of transition of the packet from my driver to user space socket buffer?

udp_rcv() is a callback that is passed to struct net_protocol as a .handler.
You may either look at usage of this handler field in the structure, or you can also see if some error occurs. There is a callback err_handler. Maybe packet loss happens here and error handler will be called.
P. S. Remember that UDP does not guarantee 100% transmit success, and one lost packet out of 100 might be expected behavior. (:
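One cheap check before instrumenting the kernel, offered as a sketch rather than a full tracing method: the "Udp:" lines in /proc/net/snmp carry counters such as InErrors and RcvbufErrors, and if those climb while the application reports loss, datagrams are being dropped on the way into the socket buffer. The helper names below are made up; the parser matches column names from the header line to values on the following line. This only covers UDP, not ICMP.

```c
#include <stdio.h>
#include <string.h>

/* Look up one counter by name in a "Udp:" header/value line pair as found
 * in /proc/net/snmp. Returns 0 and stores the value on success, -1 if the
 * name is not present. */
static int udp_counter(const char *names, const char *values,
                       const char *want, unsigned long *out)
{
    char name[64], value[64];
    int nused, vused;

    /* Walk both lines token by token; column i of the header line
     * describes column i of the value line. */
    while (sscanf(names, "%63s%n", name, &nused) == 1 &&
           sscanf(values, "%63s%n", value, &vused) == 1) {
        if (strcmp(name, want) == 0)
            return sscanf(value, "%lu", out) == 1 ? 0 : -1;
        names += nused;
        values += vused;
    }
    return -1;
}

/* Read the live counter from /proc/net/snmp. Returns 0 on success. */
static int read_udp_counter(const char *want, unsigned long *out)
{
    char names[512], values[512];
    int rc = -1;
    FILE *f = fopen("/proc/net/snmp", "r");

    if (!f)
        return -1;
    while (fgets(names, sizeof(names), f)) {
        if (strncmp(names, "Udp:", 4) == 0 &&
            fgets(values, sizeof(values), f)) {
            rc = udp_counter(names, values, want, out);
            break;
        }
    }
    fclose(f);
    return rc;
}
```

Reading read_udp_counter("RcvbufErrors", &n) before and after a test run and comparing the two values narrows down whether the receive buffer is where the replies disappear.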

C: Detecting how much data was written to a tap

I am working on a program where I'm reading from a Tap. The only issue is, I have no clue how to detect the end of one transmission to the tap and the start of another.
Does reading from the tap act the same way as a SOCK_STREAM ?

Tun/tap tries to look like a regular ethernet controller, but the tap device itself is accessed just like any other file descriptor.
Since it pretends to be an ethernet controller, you have to know in advance how big the ethernet frame itself was that was transmitted - this comes either from the software bridge that the tap device was attached to or the "length" field in the raw ethernet frame.
This, of course can only be the maximum of the MTU size of the tap device, which typically defaults to 1500 bytes.
So, before you do a read() on the file descriptor for the tap device, you've gotta figure out how big the ethernet frame actually is.
