What are the internal mechanics of the socket() function?

I am trying to use the BlueZ HCI function:
int hci_open_dev(int dev_id) {...}
which internally tries to create a socket like this:
socket(AF_BLUETOOTH, SOCK_RAW | SOCK_CLOEXEC, BTPROTO_HCI);
I tried to understand the linux kernel code for socket() but feel lost.
I'd like to know what exactly it means to create a socket for the given domain (AF_BLUETOOTH), data transmission type (SOCK_RAW) and protocol (BTPROTO_HCI).
The man page just states that it takes these parameters, creates a socket and returns a file descriptor.
But I'd like to understand what exactly happens and the exact kernel steps involved in creating a socket.

Here is a very broad description (I hope it helps in understanding the overall scheme).
Kernel developers will probably be horrified...
A socket is a common abstract interface for many different means of communication.
It provides many generic operations, such as closing, sending/receiving data, setting/retrieving options, which can be used on almost any kind of socket.
Creating a socket implies specifying the exact properties of this communication means.
It's a bit like the instantiation of a concrete type implementing an interface.
These properties are first organised by protocol families; this is the first argument to the socket() call.
For example:
PF_INET is used for communications relying on IPv4,
PF_INET6 is used for communications relying on IPv6,
PF_LOCAL is used for inter-process communication inside the system (kind of pipe),
PF_NETLINK is used for communication with the OS kernel,
PF_PACKET is used for direct communication with network interfaces,
... (there exist many of them)
Once a protocol family is chosen, you have to specify which protocol (the socket type) you want to use amongst those provided by this family; this is the second argument to the socket() call.
For example:
SOCK_DGRAM is used for UDP over IPv4 or IPv6, or distinct messages in PF_LOCAL,
SOCK_STREAM is used for TCP over IPv4 or IPv6, or a continuous byte stream in PF_LOCAL,
SOCK_RAW directly accesses the raw underlying protocol in the family, if any (IPv4 or IPv6, for example),
... (each family can provide many of them)
Some protocols can accept some variants or some restrictions; this is the third argument to the socket() call.
Often 0 is sufficient, but for example we can find:
PF_PACKET, SOCK_RAW, htons(ETH_P_ALL) to capture any kind of network packet received on a network interface,
PF_PACKET, SOCK_RAW, htons(ETH_P_ARP) to capture only ARP frames,
When we ask for the creation of a socket with these three arguments, the operating system creates an internal resource associated with the socket handle that is returned.
Of course, the exact structure of this resource depends on the chosen family/protocol/variant, and it is associated with kernel callbacks which are specific to it.
Each time an operation is invoked on this socket (through a system call), the specific callback will be called.

Please look here: it's a good high-level description of the BlueZ Linux implementation of the Bluetooth stack:
Linux Without Wires The Basics of Bluetooth. Specifically, it gives you a good overview of these BlueZ kernel drivers:
bluetooth.ko, which contains core infrastructure of BlueZ. It exports sockets of the Bluetooth family AF_BLUETOOTH. All BlueZ modules utilise its services.
Bluetooth HCI packets are transported over UART or USB. The corresponding BlueZ HCI implementation is hci_uart.ko and hci_usb.ko.
The L2CAP layer of Bluetooth, which is responsible for segmentation, reassembly and protocol multiplexing, is implemented by l2cap.ko.
With the help of bnep.ko, TCP/IP applications can run over Bluetooth. This emulates an Ethernet port over the L2CAP layer. The kernel thread named kbnepd is responsible for BNEP connections.
rfcomm.ko is responsible for running serial port applications like the terminal. This emulates serial ports over the L2CAP layer. The kernel thread named krfcommd is responsible for RFCOMM connections.
hidp.ko implements the HID (human interface device) layer. The user mode daemon hidd allows BlueZ to handle input devices like Bluetooth mice.
sco.ko implements the synchronous connection oriented (SCO) layer to handle audio. SCO connections do not specify a channel to connect to a remote host; only the host address is specified.
Another excellent resource is the BlueZ project page:
http://www.bluez.org/

Related

How does the AF_PACKET socket work in Linux?

I am trying to write a C sniffer for Linux, and understand the actions happening in the kernel while sniffing.
I am having troubles finding an answer for the following question:
If I initialize my socket in the following way:
sock_raw = socket(AF_PACKET , SOCK_RAW , htons(ETH_P_ALL));
What happens in the kernel? How am I seeing all the incoming and outgoing packets, but not "hijacking" them? What I have understood so far is that when the kernel receives a packet, it sends it to the relevant protocol handler function. Therefore I can't understand: does the kernel clone the packet and also send a copy to the socket I opened?

What happens in the kernel?
The kernel simply duplicates the packets as soon as it receives them from the physical layer (for incoming packets) or just before sending them out to the physical layer (for outgoing packets). One copy of each packet is sent to your socket (if you use ETH_P_ALL then you are listening on all interfaces, but you could also bind(2) to a particular one). After a copy is sent to your socket, the other copy then continues being processed like it normally would (e.g. identifying and decoding the protocol, checking firewall rules, etc).
How am I seeing all the incoming and outgoing packets, but not "hijacking" them?
In order for hijacking to happen, you would need to write data to the socket injecting new packets (accurately crafted depending on the protocol you want to hijack). If you only read incoming packets, you are merely sniffing, without hijacking anything.
Does the kernel clone the packet and also send a copy to the socket I opened?
Yes, that's basically what happens.
man 7 packet also describes this:
Packet sockets are used to receive or send raw packets at the device driver (OSI Layer 2) level. They allow the user to implement protocol modules in user space on top of the physical layer.
The socket_type is either SOCK_RAW for raw packets including the link-level header or SOCK_DGRAM for cooked packets with the link-level header removed. The link-level header information is available in a common format in a sockaddr_ll structure. protocol is the IEEE 802.3 protocol number in network byte order. See the <linux/if_ether.h> include file for a list of allowed protocols. When protocol is set to htons(ETH_P_ALL), then all protocols are received. All incoming packets of that protocol type will be passed to the packet socket before they are passed to the protocols implemented in the kernel.

Working of Raw Sockets in the Linux kernel

I'm working on integrating the traffic control layer of the Linux kernel with a custom user-level network stack, and I'm using raw sockets to do so. My question is: if we use raw sockets with AF_PACKET, SOCK_RAW, and IPPROTO_RAW, will dev_queue_xmit (the function which is the starting point of the queueing layer, as far as I've read) be called? Or does the sockets interface directly call the network card driver?

SOCK_RAW indicates that the userspace program should receive the L2 (link-layer) header in the message.
IPPROTO_RAW applies the same for the L3 (IP) header.
A userspace program sets SOCK_RAW, IPPROTO_RAW to manually parse and/or compose protocol headers of a packet. It guarantees that the kernel doesn't modify the corresponding layer header on the way to/from the userspace. The raw socket doesn't change the way the packet gets received or transmitted - those are queued as usual. From the network driver perspective, it doesn't matter who set the headers - the userspace (raw sockets) or the kernel (e.g., SOCK_DGRAM).
Keep in mind that getting raw packets requires CAP_NET_RAW capability - usually, the program needs to run with superuser privileges.

How bind works internally in kernel space?

Can anyone help me trace the bind() system call in socket programming? I would like to know what happens in kernel space when bind() is called: which structures it updates and which functions are invoked at the lower level.

The bind(2) system call just configures the local side's address parameters that a socket will use once you call connect(2) or sendto(2). If you don't use it, the kernel selects defaults for it, depending on the underlying protocol.
The exact procedure bind(2) follows depends on the protocol family you are working on, as bind will behave differently depending if you are using PF_UNIX, PF_INET, PF_PACKET, PF_XNS, etc.
For example, in Unix sockets, you'll get your socket associated with an inode in the filesystem (an inode that supports Unix sockets, of course), so clients have a path to connect to (in Unix sockets, addresses are paths in the filesystem). In TCP/IP sockets, you can fix the local IP address or the local port your socket listens on (to accept connections), or you can force an IP address and/or port to connect from when contacting a server.
For a deeper understanding of networking socket internals, I recommend reading the excellent book from W.R. Stevens, "TCP/IP Illustrated Vol 2. The implementation," describing the implementation of BSD sockets in NET2. It's old, but still the best explanation ever made. For a good introduction to using the BSD socket system calls, there's also an excellent book (for a long time it was also the best system call reference for BSD Unix) by W.R. Stevens: "UNIX network programming, Vol 1 (2ND Ed): The sockets API." Both are jewels everyone should have available at work.

When using a PF_PACKET type of socket, what does PACKET_ADD_MEMBERSHIP do?

When using a PF_PACKET type of socket with protocol type ETH_P_IP, the man packet documentation talks about a socket option for multicast. The socket option is PACKET_ADD_MEMBERSHIP.
Assuming you use the PACKET_ADD_MEMBERSHIP socket option on a PF_PACKET socket correctly, what features, benefits and use cases is this socket option intended for?
Right now I receive all incoming IP packets so I look at each packet to see if it has the correct IP dst-address and UDP dst-port and I skip over all the other packets. Would using PACKET_ADD_MEMBERSHIP socket option mean I don't need to do my own filter because the kernel or driver would filter for me?
I dug into the linux-kernel source and traced down the code a little bit. I found that the ethernet-mac-address you pass in via setsockopt() is added to a list of ethernet-mac-addresses. And then the list is sent to the network-device hardware to do something... but I can't find any authoritative documentation telling me what happens next.
My educated guess is that the ethernet-mac-address list is used by the hardware to filter at the layer-2 ethernet protocol (i.e. the hardware only accepts packets that have a destination ethernet-address that matches one on the list). If there is some good documentation I would welcome that.
(I'm more familiar with TCP/UDP sockets and so this looks very similar to AF_INET type of socket's IP_ADD_MEMBERSHIP socket option... so I was expecting IGMP reports to be generated which would start multicast traffic from the router... but I found out experimentally that no IGMP reports are generated when you use this socket option.)

Your guess is correct. PACKET_ADD_MEMBERSHIP should add addresses to the NIC's hardware filter. As you've surmised, it's intended to allow you to receive multicasts for a number of different addresses without incurring the load(*) of full promiscuous mode.
(* With modern full duplex ethernet, there's generally not a lot of traffic coming to the NIC that it wouldn't want to receive anyway, unless it's in a virtualized environment.)
Note that there is also a separate PACKET_MR_UNICAST which does not appear in the packet(7) man page but works analogously. I would use the appropriate one (unicast vs multicast) for the type of address you're filtering on, as it's conceivable (though unlikely) that a driver would refuse to put a unicast address into the multicast filtering table.
All that being said, you'll still need to keep your software filtering as backup. There are some older drivers that don't implement MAC filtering at all (particularly for multiple unicast addresses). The core kernel or the driver handles this by simply turning on promiscuous mode if the feature isn't available.
As for the relationship with IP_ADD_MEMBERSHIP, the IP_ADD_MEMBERSHIP code will automatically construct the appropriate multicast MAC address and add it to the interface. See ip_mc_filter_add.

WINAPI: CreateFile to Network Adapter to Read Raw Bytes

Is it possible to read a Network Adapter similar to a Serial Port? I know that Serial Ports can be read with CreateFile WINAPI Function. Is there a similar way to read raw bytes from a Network Adapter?
I am aware of the WiFi/Network Functions but the WiFi Examples are fairly sparse.

You can pass the SOCK_RAW flag when you create the socket using WSASocket() (or socket(), as your tastes run). This is described in further detail under TCP/IP Raw Sockets on MSDN.
From that page --
Once an application creates a socket of type SOCK_RAW, this socket may be used to send and receive data. All packets sent or received on a socket of type SOCK_RAW are treated as datagrams on an unconnected socket.
Of note, Microsoft crippled their raw sockets implementation after Windows XP SP2; the details are described on the MSDN page in the section Limitations on Raw Sockets:
TCP data cannot be sent over raw sockets.
UDP datagrams with an invalid source address cannot be sent over raw sockets.
A call to the bind function with a raw socket is not allowed.
If these limitations are too restrictive, you can fall back to the previously recommended WinPcap library.

If you want to capture raw packets, you need a support driver like WinPcap to do that.
