Where does the source MAC come from before encapsulating frames in Layer 2? - arp

A packet arrives from the application -> ... -> Network layer (IP encapsulation added here per the IP configuration) -> goes down to the Data Link layer, where framing is done and the source MAC and destination MAC are added for LAN switching. Is the source MAC extracted from the host NIC every time and encapsulated into the packet before it is sent out the interface, or is there some configuration file it reads from?
I am assuming /etc/network/interfaces is empty and has no hw-address entry to change the MAC (the 'ifconfig eth0 hw ether AA:BB:CC:...' command, with the MAC we want to set). Where does it get its own MAC?
Does it do a lookup every time, something like 'ifconfig eth0 | grep HWaddr', fetching its own MAC via a system call? That would add huge overhead, querying the NIC chipset every time. Or does it maintain a file, read from it, and simply encapsulate the packet coming from the upper layer and send it out on the wire?

None of the above. The MAC (the Media Access Controller on the NIC) inserts its own address into Ethernet frames on the way out; the software doesn't have to add it.
On occasion, though, it is useful for the driver to know the physical address of the chip it's driving. This doesn't require querying the NIC every time or "maintaining a file": 6 bytes of RAM in the driver's data structure do the job just fine, and that is where the value displayed by ifconfig comes from.
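For the curious, userspace reads that cached copy with the SIOCGIFHWADDR ioctl (the same call ifconfig makes); the kernel answers from the driver's stored bytes, so the NIC itself is never queried. A minimal sketch, assuming an interface named eth0:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0); /* any socket serves as an ioctl handle */
    struct ifreq ifr;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);

    if (ioctl(fd, SIOCGIFHWADDR, &ifr) == 0) { /* answered from the driver's cached copy */
        unsigned char *mac = (unsigned char *)ifr.ifr_hwaddr.sa_data;
        printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
               mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    }
    close(fd);
    return 0;
}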

Related

Duplicate Ethernet frame on virtual VLAN interface via RAW socket

I am working with raw Ethernet frames. I have an Ethernet interface eth0 and a virtual VLAN interface eth0.100 on my Linux machine. My raw socket is bound to the virtual interface eth0.100. The problem is that when a VLAN-tagged (VLAN ID = 100) frame arrives on this interface from outside, my application gets two copies of the same Ethernet frame. From the application I cannot tell the difference between these frames; the content of the payload is exactly the same. My interface is NOT operating in promiscuous mode.
I used tcpdump to capture the frames, and below are the results:
eth0: gets one frame, which IS VLAN tagged (ID 100).
eth0.100: gets one frame, which is NOT VLAN tagged.
If I bind to eth0, I still get two copies of the frame. But if I delete eth0.100 and bind to eth0, I get just one copy. Is my application getting two copies of the frame, one from eth0 and one from eth0.100, even though I am bound ONLY to eth0.100?
I tried to use BPF, but I am not sure what filter to use on eth0.100.
I tried to use BPF; when I generated a filtering rule using tcpdump, I could see that the byte code generated was the same for both the eth0 and eth0.100 interfaces, so I dropped the idea. The solution below worked for me.
I deleted the virtual interface eth0.100 and bound my raw socket to the base interface eth0. When sending a frame, I VLAN-tag it manually. I don't have to do anything special on reception, as the base interface gets all the frames. Now I no longer see the duplicate frames I was seeing earlier.
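For reference, "manually VLAN tagging" just means splicing the 4-byte 802.1Q header (the 0x8100 TPID followed by the TCI carrying the VLAN ID) in after the source MAC before writing the frame to the raw socket. A minimal sketch; vlan_tag_frame is a hypothetical helper, not code from the original post:

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Copy an untagged frame into out, inserting an 802.1Q tag (VLAN 100, priority 0).
   out must have room for len + 4 bytes; returns the tagged frame's length. */
size_t vlan_tag_frame(const uint8_t *frame, size_t len, uint8_t *out)
{
    uint16_t tpid = htons(0x8100); /* 802.1Q tag protocol identifier */
    uint16_t tci  = htons(100);    /* PCP=0, DEI=0, VLAN ID=100 */

    memcpy(out, frame, 12);                 /* destination MAC + source MAC */
    memcpy(out + 12, &tpid, 2);             /* tag sits right after the source MAC */
    memcpy(out + 14, &tci, 2);
    memcpy(out + 16, frame + 12, len - 12); /* original EtherType + payload */
    return len + 4;
}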
This still does not explain why I saw duplicate frames. Will investigate further and post when I get an answer.

Can an additional header on an IPv4 packet be segmented with GSO?

I'm having trouble with packet segmentation. I've already read from many sources about GSO, which is a generalized way of segmenting packets larger than the Ethernet MTU (1500 B). However, I have not found an answer to the doubts I have in mind.
If we add a new set of bytes (say, a new header named 'NH') between the L2 and L3 layers, the kernel must be able to step over NH and adjust the sk_buff pointer to the beginning of L3 in order to offload the packet according to the 'policy' of the L3 protocol type (e.g. IPv4 fragmentation). My thought was to modify the skb_network_protocol() function, which, if I'm not wrong, enables skb_mac_gso_segment() to call the proper GSO function for each L3 protocol type. However, I'm not able to segment my packets properly.
I have a kernel module that forwards packets through the network (OVS, Open vSwitch). In the tests I've been running (h1 --ping-- h2), the host generates large ICMP packets and then sends packets that are less than or equal to the MTU size. Those packets are received by the first switch, which attaches the new header NH, so a packet that was 1500 B becomes 1500 B + NH length. Here is the problem: the switch has already received a fragmented packet from the host, and it then adds more bytes to the packet (much as a VLAN tag does).
So at first I tried pinging with large packets, but it didn't work. In OVS, before calling dev_queue_xmit(), a packet can be segmented by calling skb_gso_segment(). However, the packet first has to pass a condition checked by netif_needs_gso(), and I'm not sure whether skb_gso_segment() is the right way to segment the packet.
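For reference, the usual in-kernel pattern around those two calls is to let skb_gso_segment() build a list of sub-MTU skbs and transmit each one, falling back to the original skb when no segmentation was needed. A rough sketch of that pattern (not OVS's actual code; xmit_with_gso is a hypothetical helper):

#include <linux/err.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical helper: segment skb if required, then hand the pieces to the driver. */
static int xmit_with_gso(struct sk_buff *skb, netdev_features_t features)
{
    struct sk_buff *segs, *next;

    segs = skb_gso_segment(skb, features); /* builds a list of sub-MTU skbs */
    if (IS_ERR(segs))
        return PTR_ERR(segs);
    if (!segs)                             /* no segmentation needed: send as-is */
        return dev_queue_xmit(skb);

    consume_skb(skb);                      /* the original has been replaced by segs */
    for (; segs; segs = next) {            /* walk the singly linked segment list */
        next = segs->next;
        segs->next = NULL;
        dev_queue_xmit(segs);
    }
    return 0;
}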
I also noticed that, for netif_needs_gso() to return true, skb_shinfo(skb)->gso_size has to be non-zero. However, gso_size is always zero for all the received packets. So I ran a test assigning an arbitrary value to gso_size (e.g. 1448 B). Now I was able to ping from h1 to h2, but the first 2 packets were lost, and in another test TCP had extremely poor performance. Since then, I've been getting a kernel warning: "[ 5212.694418] [c1642e50] ? skb_warn_bad_offload+0xd0/0xd8"
For small packets (< MTU) I have no trouble and ping works fine. TCP works too, but only with a small window size.
Does anyone have an idea of what's happening? Should I always use GSO when I get large packets? Is it possible to fragment an already-fragmented IPv4 packet?
Since the new header lies between L2 and L3, I guess enlarging an IPv4 packet with the additional header is similar to what happens with VLAN. How does VLAN handle the segmentation problem?
Thanks in advance,

Raw socket: filter only packets coming from a certain IP address? (Linux C)

I have a C Linux program that uses raw sockets to read incoming TCP/UDP packets.
I would like to apply a filter so that only packets sent from a certain IP address reach my socket.
Is it possible, or am I necessarily supposed to receive every packet and then check the source address field? I'm a bit concerned about CPU usage in the latter case.
Thank you.
It also depends on the processor you are using. Some processors have built-in hardware to filter packets based on rules involving input port, source IP address, and protocol type (TCP/UDP/etc.). This in turn can reduce the load, since hardware filtering performs better than software filtering.
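Short of hardware support, the standard Linux mechanism is a classic BPF filter attached with SO_ATTACH_FILTER, so the kernel discards non-matching packets before they ever cross into your process. A minimal sketch, assuming an AF_INET raw socket (where the data you receive starts at the IP header, putting the source address at offset 12) and using 10.0.0.1 as a placeholder address:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/filter.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_RAW, IPPROTO_TCP); /* needs root */

    struct sock_filter code[] = {
        /* A = 32-bit word at offset 12: the IP source address */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, 12),
        /* if A == 10.0.0.1, fall through to accept; else jump to drop */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ntohl(inet_addr("10.0.0.1")), 0, 1),
        BPF_STMT(BPF_RET | BPF_K, 0xFFFF), /* accept (keep up to 0xFFFF bytes) */
        BPF_STMT(BPF_RET | BPF_K, 0),      /* drop */
    };
    struct sock_fprog prog = {
        .len    = sizeof(code) / sizeof(code[0]),
        .filter = code,
    };

    if (setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0)
        perror("SO_ATTACH_FILTER");
    /* recvfrom() on sock now only ever sees packets from 10.0.0.1 */
    return 0;
}

(With an AF_PACKET socket the 14-byte Ethernet header comes first, so the load offset would be 26 instead of 12.)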

C: Detecting how much data was written to a tap

I am working on a program where I'm reading from a tap. The only issue is that I have no clue how to detect where one transmission to the tap ends and the next one starts.
Does reading from the tap act the same way as a SOCK_STREAM?
Tun/tap tries to look like a regular Ethernet controller, but the tap device itself is accessed just like any other file descriptor.
Since it pretends to be an Ethernet controller, you have to know in advance how big the transmitted Ethernet frame was; this comes either from the software bridge that the tap device is attached to or from the "length" field in the raw Ethernet frame.
This, of course, can be at most the MTU of the tap device, which typically defaults to 1500 bytes.
So, before you do a read() on the tap device's file descriptor, you have to figure out how big the Ethernet frame actually is.
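That said, the tap fd delimits frames for you: it behaves like a datagram source, not a SOCK_STREAM, so each read() returns exactly one complete frame and the return value is that frame's length, as long as the buffer is at least MTU + Ethernet header bytes. A minimal sketch, assuming a tap created with IFF_NO_PI and named tap0:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int main(void)
{
    int fd = open("/dev/net/tun", O_RDWR);
    struct ifreq ifr;
    unsigned char frame[1518]; /* 1500 MTU + 14 Ethernet header + 4 VLAN tag */

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI; /* raw frames, no prepended packet info */
    strncpy(ifr.ifr_name, "tap0", IFNAMSIZ - 1);
    ioctl(fd, TUNSETIFF, &ifr);

    for (;;) {
        ssize_t n = read(fd, frame, sizeof(frame)); /* one full frame per read() */
        if (n <= 0)
            break;
        printf("got a %zd-byte frame\n", n);
    }
    close(fd);
    return 0;
}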

Broadcasting UDP packets using multiple NICs

I'm building an embedded system for a camera controller in Linux (not real-time). I'm having a problem getting the networking to do what I want it to do. The system has 3 NICs: one 100Base-T port and two gigabit ports. I hook the slower one up to the camera (that's all it supports), and the faster ones are point-to-point connections to other machines. What I am attempting to do is grab an image from the camera, do a little processing, then broadcast it over UDP out each of the other NICs.
Here is my network configuration:
eth0: addr: 192.168.1.200 Bcast 192.168.1.255 Mask: 255.255.255.0 (this is the 100base-t)
eth1: addr: 192.168.2.100 Bcast 192.168.2.255 Mask: 255.255.255.0
eth2: addr: 192.168.3.100 Bcast 192.168.3.255 Mask: 255.255.255.0
The image comes in off eth0 in a proprietary protocol, so I read it with a raw socket. I can broadcast it to eth1 or eth2 just fine. But when I try to broadcast to both, one after the other, I get lots of network hiccups and errors on eth0.
I initialize the UDP sockets like this:
int sock2=socket(AF_INET,SOCK_DGRAM,IPPROTO_UDP); // or sock3
struct sockaddr_in sa;
memset(&sa,0,sizeof(sa));
sa.sin_family=AF_INET;
sa.sin_port=htons(8000);
inet_aton("192.168.2.255",&sa.sin_addr); // or 192.168.3.255
int broadcast=1;
setsockopt(sock2,SOL_SOCKET,SO_BROADCAST,&broadcast,sizeof(broadcast));
bind(sock2,(struct sockaddr*)&sa,sizeof(sa));
sendto(sock2,&data,sizeof(data),0,(struct sockaddr*)&sa,sizeof(sa)); // sizeof(data) < 1100 bytes
I do this for each socket separately, and call sendto separately. When I do one or the other, it's fine. When I try to send on both, eth0 starts getting bad packets.
Any ideas on why this is happening? Is it a configuration error, or is there a better way to do this?
EDIT:
Thanks for all the help; I've been trying some things and looking into this more. The issue does not appear to be broadcasting, strictly speaking: I replaced the broadcast code with unicast sends and got the same behavior. I think I understand the behavior better now, but not how to fix it.
Here is what is happening. On eth0 I am supposed to get an image every 50 ms. Sending an image out on eth1 (or eth2) takes about 1.5 ms. When I try to send on both eth1 and eth2 at the same time, it takes about 45 ms, occasionally jumping to 90 ms. When this exceeds the 50 ms window, eth0's receive buffer starts to fill, and of course I lose packets once it's full.
So my revised question: why would it go from 1.5 ms to 45 ms just by going from one Ethernet port to two?
Here is my initialization code:
sock[i]=socket(AF_INET,SOCK_DGRAM,IPPROTO_UDP);
sa[i].sin_family=AF_INET;
sa[i].sin_port=htons(8000);
inet_aton(ip,&sa[i].sin_addr);
// If broadcasting:
char buffer[]="eth1"; // or "eth2"
setsockopt(sock[i],SOL_SOCKET,SO_BINDTODEVICE,buffer,sizeof(buffer));
int b=1;
setsockopt(sock[i],SOL_SOCKET,SO_BROADCAST,&b,sizeof(b));
Here is my sending code:
for(i=0;i<65;i++) {
    sendto(sock[0],&data[i],sizeof(data[i]),0,(struct sockaddr*)&sa[0],sizeof(sa[0])); // one chunk per datagram
    sendto(sock[1],&data[i],sizeof(data[i]),0,(struct sockaddr*)&sa[1],sizeof(sa[1]));
}
It's pretty basic.
Any ideas? Thanks for all your great help!
Paul
Maybe your UDP stack is running out of memory?
(1) Check /proc/sys/net/ipv4/udp_mem (see man 7 udp for details). Make sure the first number is at least 8x the image size; it sets the memory available to all UDP sockets in the system.
(2) Make sure the per-socket send buffer is big enough: use setsockopt() with SO_SNDBUF to set it to roughly twice the image size on both sending sockets. You might need to increase the maximum allowed value in /proc/sys/net/core/wmem_max first. See man 7 socket for details.
(3) You might as well increase the RX buffer of the receiving socket: write a big number to .../rmem_max, then use SO_RCVBUF to increase the receive buffer size.
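To make point (2) concrete, here is a small sketch; grow_sndbuf is a hypothetical helper, and want_bytes would be roughly twice your image size:

#include <stdio.h>
#include <sys/socket.h>

/* Request a larger send buffer and report what the kernel actually granted. */
static void grow_sndbuf(int sock, int want_bytes)
{
    int val = want_bytes;
    socklen_t len = sizeof(val);

    /* The kernel silently caps the request at net.core.wmem_max. */
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val));

    /* Read back the effective size (the kernel reports double the accounted
       value), so you can tell whether wmem_max needs raising. */
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &val, &len);
    printf("effective SO_SNDBUF: %d bytes\n", val);
}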
A workaround, until the issue is actually solved, may be to create a bridge over eth1+eth2 and send the packet to that bridge.
That way the image is only mapped into kernel memory once, not twice per image.
It's been a long time, but I found the answer to my question, so I thought I would put it here in case anyone else ever finds it.
The two gigabit Ethernet ports were actually on a PCI bridge hanging off the PCI-Express bus. The PCI-Express bus was internal to the motherboard, but it was a PCI bus going to the cards. The bridge and the bus did not have enough bandwidth to actually send out the images that fast. With only one NIC enabled, the data was copied to the buffer, which looked very quick to me, but it took much longer to actually get through the bus, out the card, and onto the wire. The second NIC was slower because the buffer was already full. Although changing the buffer size masked the problem, it did not actually send the data out any faster, and I was still getting dropped packets on the third NIC.
In the end, the 100Base-T port was actually built onto the motherboard and therefore had a faster bus to it, giving it more usable bandwidth overall than the gigabit ports. By switching the camera to a gigabit line and one of the gigabit lines to the 100Base-T port, I was able to meet the requirements.
Strange.
