How does the AF_PACKET socket work in Linux?

I am trying to write a C sniffer for Linux and to understand what happens in the kernel while sniffing.
I am having trouble finding an answer to the following question:
If I initialize my socket in the following way:
sock_raw = socket(AF_PACKET , SOCK_RAW , htons(ETH_P_ALL));
What happens in the kernel? How am I able to see all the incoming and outgoing packets without "hijacking" them? What I have understood so far is that when the kernel receives a packet, it sends it to the relevant protocol handler function. So what I can't understand is: does the kernel clone the packet and send it to the socket I opened in addition to the usual handler?

What happens in the kernel?
The kernel simply duplicates each packet as soon as it receives it from the network device (for incoming packets) or just before handing it to the device for transmission (for outgoing packets). One copy of each packet is sent to your socket. ETH_P_ALL selects all protocols; because the socket is not bound, you receive packets from all interfaces, but you could also bind(2) it to a particular one. After a copy is sent to your socket, the other copy continues being processed as it normally would (e.g. identifying and decoding the protocol, checking firewall rules, etc.).
How am I seeing all the incoming and outgoing packets, but not "hijacking" them?
In order for hijacking to happen, you would need to write data to the socket injecting new packets (accurately crafted depending on the protocol you want to hijack). If you only read incoming packets, you are merely sniffing, without hijacking anything.
does the kernel clone the packet and sends it in addition to the socket I opened?
Yes, that's basically what happens.
man 7 packet also describes this:
Packet sockets are used to receive or send raw packets at the device driver (OSI Layer 2) level. They allow the user to implement protocol modules in user space on top of the physical layer.
The socket_type is either SOCK_RAW for raw packets including the link-level header or SOCK_DGRAM for cooked packets with the link-level header removed. The link-level header information is available in a common format in a sockaddr_ll structure. protocol is the IEEE 802.3 protocol number in network byte order. See the <linux/if_ether.h> include file for a list of allowed protocols. When protocol is set to htons(ETH_P_ALL), then all protocols are received. All incoming packets of that protocol type will be passed to the packet socket before they are passed to the protocols implemented in the kernel.

Related

Working of Raw Sockets in the Linux kernel

I'm working on integrating the traffic control layer of the Linux kernel with a custom user-level network stack, and I'm using raw sockets to do so. My question is: if we use raw sockets with AF_PACKET, SOCK_RAW, and IPPROTO_RAW, will dev_queue_xmit (the function which, as far as I've read, is the entry point of the queueing layer) be called? Or does the socket interface call the network card driver directly?
SOCK_RAW indicates that the userspace program should receive the L2 (link-layer) header in the message.
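IPPROTO_RAW does the same for the L3 (IP) header: on an AF_INET raw socket it implies IP_HDRINCL, so the userspace program supplies the IP header itself.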
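A userspace program uses SOCK_RAW or IPPROTO_RAW to parse and/or compose protocol headers manually. The kernel then leaves the corresponding layer's header essentially as supplied on the way to or from userspace (for IP_HDRINCL, raw(7) notes that the kernel still fills in the IP checksum and total length, and the source address and packet ID when they are zero). The raw socket doesn't change the way the packet is received or transmitted: packets are queued as usual, so a packet socket's transmissions go through dev_queue_xmit() and the qdisc layer unless the PACKET_QDISC_BYPASS socket option is set. From the network driver's perspective, it doesn't matter who set the headers: userspace (raw sockets) or the kernel (e.g., SOCK_DGRAM).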
Keep in mind that getting raw packets requires the CAP_NET_RAW capability; usually the program needs to run with superuser privileges.

Read UDP/TCP payload with divert socket

I need to intercept/redirect TCP and UDP packets whose payloads match some regex patterns, and also get the original destination address and port.
I can't just redirect TCP and UDP packets to my application using DSTNAT on the firewall (and splice() them through if they don't match the patterns), because that wouldn't let me get their original destination addresses and ports from before they were translated.
So I read about divert sockets, and they look promising. However, I'm in doubt, as I couldn't find anywhere how much of a packet received on a divert socket an application can read. Is it possible to read the entire packet (including the TCP and UDP payload) or just its headers? Is the entire packet sent to the divert socket, or just the first received fragment (perhaps limited by the MTU/MRU, or by how much send() could push in a single call on the other end)?
If it matters, the firewall I'm going to use for diverting packets is ipfw.

C RAW socket communication with custom ETH type

So I have two userspace applications (let's say app A and app B) running on a Linux 2.6 kernel.
App A sends raw packets with a custom Ethernet type (ETH_FOO) using the socket below:
socket(PF_PACKET, SOCK_RAW, htons(ETH_FOO));
If app B opens a raw socket with ETH_P_ALL and listens on all interfaces without binding, it successfully receives the packets sent by A with type ETH_FOO.
But if B opens the socket with type ETH_FOO, no packet is observed. I just want to capture ETH_FOO pkts. What may be the problem?
This is my first question here, so pardon my mistakes if there are any. Also, I cannot post the entire code, since it's not mine and is somewhat proprietary.
When you use ETH_P_ALL, you are listening to all packets, both incoming and outgoing.
If you are on the same machine, using the same network interface, sending a packet produces no incoming packet. Using ETH_P_ALL will get you the outgoing packet.
When you specify a value other than ETH_P_ALL, only incoming packets are delivered, so you get nothing when sender and receiver use the same Ethernet interface on the same machine.
You have two options here:
use different machines
on the same machine, use the loopback adapter (which creates an incoming packet for every outgoing packet). The loopback adapter is listed together with the Ethernet adapter when you type ip a.
It took me some time to learn this, and I did it here, where you can learn a bit more.

Debugging multicast reception

I'd like to debug multicast reception by the Linux kernel, because I'm not receiving any packets. Let me be more specific: I'm building a flexible userland transport-mode network daemon. One of the options for running it is using UDP sockets. One of the use cases is to transport UDP packets that go to multicast addresses.
So I end up with UDP packets to a multicast destination, transported by a UDP packet to the same multicast destination. That's asking for trouble, I know, but I get away with it: using SO_BINDTODEVICE, I can pretty much cheat my way through the routing table and packets are sent out as I intended.
On the receiving side, I'm not so lucky. Linux does not give my receiving socket the multicast packets. It just won't see them, although tcpdump proves that they arrive at the interface. Note that unicast - using the very same sockets - is not a problem at all. I can send and receive them to my heart's content. Not so with multicast.
So I'd like to know what the Linux kernel 'thinks' in that bit between receiving the packet (which it obviously does), and giving it to my process' UDP server socket (which it doesn't do). Any thoughts?

raw socket listener

This is a quick question about Linux C programming with raw sockets. If I just want to listen on any interface with a raw socket, must I actually bind to an IP address or interface to see traffic? From what I understand, I should be able to just call socket() and then start receiving traffic with recvfrom(). Maybe I'm wrong, but I've seen some programs that don't call bind().
You are right, the only thing you will need to do is call socket() and then recvfrom(). Nevertheless be aware of the fact that there are some limitations with listening using SOCK_RAW.
If you're not using raw sockets on a "send-and-forget" basis, you will be interested in reading the reply packet(s) for your raw packet(s). The decision logic for whether a packet will be delivered to a raw socket can be enumerated as follows:
TCP and UDP packets are never delivered to raw sockets; they are always handled by the kernel protocol stack.
Copies of ICMP packets are delivered to a matching raw socket. For some ICMP types (ICMP echo request, ICMP timestamp request, mask request) the kernel may, at the same time, do some processing and generate replies.
All IGMP packets are delivered to raw sockets.
All other packets destined for protocols that are not processed by a kernel subsystem (e.g. OSPF packets) are delivered to raw sockets.
The fact that you're dealing with a protocol whose reply packets are delivered to your raw socket does not necessarily mean that you'll get the reply packet. For this you may also need to consider:
setting the protocol accordingly when creating your socket via the socket(2) system call. For instance, if you're sending an ICMP echo-request packet and want to receive the ICMP echo-reply, you can set the protocol argument (3rd argument) to IPPROTO_ICMP.
setting the protocol argument in socket(2) to 0, so any protocol number in the received packet header will match (this does not carry over to Linux, where socket(AF_INET, SOCK_RAW, 0) fails with EPROTONOSUPPORT).
defining a local address for your socket (via e.g. bind(2)), so that if the destination address matches the socket's local address, the packet will also be delivered to your application.
For more details you can read e.g. this.
If you meant to capture the traffic on an interface, you can use libpcap.
