How does bind() work internally in kernel space? - c

Can anyone help me trace the bind() system call used in socket programming? I would like to know what happens in kernel space when bind() is called: which structures it updates and which functions are invoked at the lower level.

The bind(2) system call just configures the local address that a socket will use once you connect(2) it or send through it with sendto(2). If you don't call it, the kernel selects defaults, depending on the underlying protocol.
The exact procedure bind(2) follows depends on the protocol family you are working with, as bind behaves differently depending on whether you are using PF_UNIX, PF_INET, PF_PACKET, PF_XNS, etc.
For example, with Unix sockets, your socket gets associated with an inode in the filesystem (an inode that supports Unix sockets, of course), so clients have a path to connect to (for Unix sockets, addresses are paths in the filesystem). With TCP/IP sockets, you can fix the local IP address and/or the local port your socket listens on (to accept connections), or you can force the IP address and/or port that an outgoing connection to a server originates from.
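As a minimal user-space sketch of the TCP/IP case (it shows only what a program can observe, not the kernel internals), the hypothetical helper bind_loopback_udp() below binds a UDP socket to 127.0.0.1 and then asks the kernel which local address it recorded. Passing port 0 is assumed here to mean "let the kernel pick", which is the usual behaviour for PF_INET.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a UDP socket to 127.0.0.1 on the given port
 * (0 lets the kernel choose) and return the port actually recorded for
 * the socket, in host byte order, or -1 on error. */
int bind_loopback_udp(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                    /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* Read back the local address that bind() configured. */
    struct sockaddr_in bound;
    socklen_t len = sizeof bound;
    int rc = getsockname(fd, (struct sockaddr *)&bound, &len);
    close(fd);
    return rc < 0 ? -1 : ntohs(bound.sin_port);
}
```

Calling bind_loopback_udp(0) should return some non-zero port chosen by the kernel; calling it with a specific free port should return that same port.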
For a deeper understanding of networking socket internals, I recommend the excellent book by Gary R. Wright and W. R. Stevens, "TCP/IP Illustrated, Vol. 2: The Implementation," which describes the implementation of BSD sockets in 4.4BSD-Lite. It's old, but still the best explanation ever written. For a good introduction to using the BSD socket system calls, there's also an excellent book by W. R. Stevens (for a long time it was also the best reference for BSD Unix system calls): "UNIX Network Programming, Vol. 1 (2nd ed.): The Sockets API." Both are jewels everyone should have available at work.

Related

What is the internal mechanics of socket() function?

I am trying to use the BlueZ HCI function:
int hci_open_dev(int dev_id) {...}
which internally tries to create a socket like this:
socket(AF_BLUETOOTH, SOCK_RAW | SOCK_CLOEXEC, BTPROTO_HCI);
I tried to understand the Linux kernel code for socket() but feel lost.
I'd like to know what exactly it means to create a socket for the given domain (AF_BLUETOOTH), data transmission type (SOCK_RAW) and protocol (BTPROTO_HCI).
The man page just states that it takes these parameters, creates a socket and returns a file descriptor.
But I'd like to understand what exactly happens and the exact kernel steps involved in creating a socket.
Here is a very broad description (hope that helps understanding the main scheme).
Kernel developers will probably be horrified...
A socket is a common abstract interface for many different means of communication.
It provides many generic operations, such as closing, sending/receiving data, setting/retrieving options, which can be used on almost any kind of socket.
Creating a socket implies specifying the exact properties of this communication means.
It's a bit like the instantiation of a concrete type implementing an interface.
These properties are first organised by protocol families; this is the first argument to the socket() call.
For example:
PF_INET is used for communications relying on IPv4,
PF_INET6 is used for communications relying on IPv6,
PF_LOCAL is used for inter-process communication inside the system (kind of pipe),
PF_NETLINK is used for communication with the OS kernel,
PF_PACKET is used for direct communication with network interfaces,
... (there exist many of them)
Once a protocol family is chosen, you have to specify which protocol (formally, the socket type) you want to use amongst those provided by this family; this is the second argument to the socket() call.
For example:
SOCK_DGRAM is used for UDP over IPv4 or IPv6, or distinct messages in PF_LOCAL,
SOCK_STREAM is used for TCP over IPv4 or IPv6, or a continuous byte stream in PF_LOCAL,
SOCK_RAW gives direct access to the raw underlying protocol in the family, if any (IPv4 or IPv6, for example),
... (each family can provide many of them)
Some protocols can accept some variants or some restrictions; this is the third argument to the socket() call.
Often 0 is sufficient, but for example we can find:
PF_PACKET, SOCK_RAW, htons(ETH_P_ALL) to capture any kind of network packet received on a network interface,
PF_PACKET, SOCK_RAW, htons(ETH_P_ARP) to capture only ARP frames,
When we ask for the creation of a socket with these three arguments, the operating system creates an internal resource associated with the socket handle which will be obtained.
Of course, the exact structure of this resource depends on the chosen family/protocol/variant, and it is associated to kernel callbacks which are specific to it.
Each time an operation is invoked on this socket (through a system call), the specific callback will be called.
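The "interface plus family-specific callbacks" idea can be sketched in plain C with a table of function pointers. This is a toy model only: every name below (toy_socket, toy_ops, toy_bind and so on) is invented for illustration and is not one of the kernel's real structures or functions.

```c
#include <stdio.h>

/* Toy model: invented names, not the kernel's real data structures. */
struct toy_socket;

/* The "interface": one callback per generic operation. */
struct toy_ops {
    const char *name;
    int (*bind)(struct toy_socket *sk, const char *addr);
};

struct toy_socket {
    const struct toy_ops *ops;  /* chosen once, when the socket is created */
    char local[64];             /* whatever "address" means for this family */
};

static int toy_inet_bind(struct toy_socket *sk, const char *addr)
{
    snprintf(sk->local, sizeof sk->local, "inet:%s", addr);
    return 0;
}

static int toy_unix_bind(struct toy_socket *sk, const char *addr)
{
    snprintf(sk->local, sizeof sk->local, "unix:%s", addr);
    return 0;
}

static const struct toy_ops toy_inet_ops = { "inet", toy_inet_bind };
static const struct toy_ops toy_unix_ops = { "unix", toy_unix_bind };

enum toy_family { TOY_INET, TOY_UNIX };

/* "socket()": pick the concrete implementation from the family. */
int toy_socket_create(struct toy_socket *sk, enum toy_family family)
{
    switch (family) {
    case TOY_INET: sk->ops = &toy_inet_ops; break;
    case TOY_UNIX: sk->ops = &toy_unix_ops; break;
    default:       return -1;
    }
    sk->local[0] = '\0';
    return 0;
}

/* "bind()": the generic entry point only dispatches to the callback. */
int toy_bind(struct toy_socket *sk, const char *addr)
{
    return sk->ops->bind(sk, addr);
}
```

The generic toy_bind() never needs to know which family it is dealing with; that knowledge lives entirely in the table selected at creation time, which is the design choice the description above is pointing at.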
For a good high-level description of the BlueZ Linux implementation of the Bluetooth stack, see "Linux Without Wires: The Basics of Bluetooth". Specifically, it gives you a good overview of these BlueZ kernel drivers:
bluetooth.ko, which contains the core infrastructure of BlueZ. It exports sockets of the Bluetooth family AF_BLUETOOTH. All BlueZ modules utilise its services.
Bluetooth HCI packets are transported over UART or USB. The corresponding BlueZ HCI implementations are hci_uart.ko and hci_usb.ko.
The L2CAP layer of Bluetooth, which is responsible for segmentation, reassembly and protocol multiplexing, is implemented by l2cap.ko.
With the help of bnep.ko, TCP/IP applications can run over Bluetooth. This emulates an Ethernet port over the L2CAP layer. The kernel thread named kbnepd is responsible for BNEP connections.
rfcomm.ko is responsible for running serial port applications like the terminal. This emulates serial ports over the L2CAP layer. The kernel thread named krfcommd is responsible for RFCOMM connections.
hidp.ko implements the HID (human interface device) layer. The user mode daemon hidd allows BlueZ to handle input devices like Bluetooth mice.
sco.ko implements the synchronous connection oriented (SCO) layer to handle audio. SCO connections do not specify a channel to connect to a remote host; only the host address is specified.
Another excellent resource is the BlueZ project page:
http://www.bluez.org/

DNS security and port randomization

I've been reading lately about DNS cache-poisoning attacks. Essentially they are possible simply because an attacker can guess the DNS message transaction ID, since it is only a 16-bit integer. Even if the integer is random, it's still possible for a flurry of DNS packets to coincidentally match 1 of 2^16 packets in a short time window.
So a second security measure is port randomization. If the UDP source port is random, an attacker would have to guess both the source port and the transaction ID in a short time window, which is usually not feasible. But I read that older versions of DNS software such as BIND versions before 9 did NOT perform port randomization, and are therefore vulnerable.
This brings me to the question: don't most UNIX OSes like Linux and BSD automatically assign random ports when a SOCK_DGRAM socket is used without a prior call to bind? I thought that was the whole idea with ephemeral ports. Why does an application (like BIND) have to go out of its way to perform port randomization?
My understanding is that, essentially, an OS like Linux will have a RANGE of ephemeral ports available for use with each process. A process can call bind() to bind a UDP socket to a specific port. But if a UDP socket is used (i.e. send is called) without first calling bind, the OS will lazily assign a random ephemeral port to the socket. So, why were older versions of BIND not performing port randomization automatically?
This brings me to the question: don't most UNIX OS's like Linux and BSD automatically assign random ports when a SOCK_DGRAM is used without a prior call to bind? I thought that was the whole idea with ephemeral ports.
The main idea of ephemeral ports is not to be random in a secure way, but just to pick some unused port fast. Different OSes use different strategies: some pick ports in a weakly random way, some use a stronger random generator, and some simply assign ports sequentially.
This means that ephemeral ports are not unpredictable enough for use with DNS on every OS.
For more details I would recommend studying RFC 6056, "Recommendations for Transport-Protocol Port Randomization", and the overview of port selection strategies at https://www.cymru.com/jtk/misc/ephemeralports.html.
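To see what a particular OS does, one can simply observe which port it hands out. The sketch below (ephemeral_udp_port() is a hypothetical helper name) creates a UDP socket, never calls bind(), and relies on connect() to make the kernel assign a local port, which it then reads back with getsockname(). The destination 127.0.0.1:53 is an arbitrary assumption; nothing needs to be listening there because connect() on a UDP socket sends no packet.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the local port the kernel picks for an
 * unbound UDP socket, in host byte order, or -1 on error. */
int ephemeral_udp_port(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(53);                      /* arbitrary destination */
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* No bind(): connect() forces the kernel to choose a local port. */
    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in local;
    socklen_t len = sizeof local;
    int rc = getsockname(fd, (struct sockaddr *)&local, &len);
    close(fd);
    return rc < 0 ? -1 : ntohs(local.sin_port);
}
```

Calling this in a loop and printing the results gives a rough impression of whether the ports look sequential or scattered on a given system, although a handful of samples proves nothing about how predictable the generator really is.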

Unix Sockets : AF_LOCAL vs AF_INET

I'm just starting with socket programming in UNIX and I was reading the man pages for the socket system call. I'm a bit confused about the AF_LOCAL argument and when it is used. The manual just says local communication. Wouldn't an AF_INET format also work for local communication?
AF_LOCAL uses UNIX domain sockets, which are addressed through the local filesystem and can only be used for communication between processes on the same machine. AF_INET is an IP socket; it also works locally (over the loopback address), but AF_LOCAL avoids some of the performance penalties of sending data through the IP stack. See this old but very nice discussion of the topic.
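A minimal sketch of local communication without any IP addressing, assuming a POSIX system: socketpair() creates two connected UNIX domain sockets, and a message sent on one end is read from the other. The helper name local_roundtrip() is made up for this example, and AF_UNIX is used because it is the POSIX spelling of AF_LOCAL.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send msg through a UNIX domain socket pair and
 * receive it on the other end. Returns the number of bytes received,
 * or -1 on error. */
ssize_t local_roundtrip(const char *msg, char *out, size_t outlen)
{
    int sv[2];

    /* SOCK_DGRAM keeps message boundaries: one send, one recv. */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    ssize_t n = -1;
    if (send(sv[0], msg, strlen(msg), 0) >= 0)
        n = recv(sv[1], out, outlen, 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

In a real program the two ends would normally live in different processes, for example after a fork(), or the server would bind() to a filesystem path that clients connect() to.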

User-mode TCP stack for retransmits over lossy serial link

I believe that my question is:
Is there a simple user-mode TCP stack on PC operating systems that could be used to exchange data over a lossy serial link with a Linux-based device?
Here is more context:
I have a Linux-based device connected via a serial link to a PC. The serial link is lossy so data being sent between the two devices sometimes needs to be retransmitted. Currently the system uses a custom protocol that includes framing, addressing (for routing to different processes within the Linux device), and a not-so-robust retransmission algorithm.
On the Linux device side, it would be convenient to replace the custom protocol, implement SLIP over the serial link and use TCP for all communications. The problem is that on the PC-side, we're not sure how to use the host's TCP stack without pulling in general IP routing that we don't need. If there were a user-mode TCP stack available, it seems like I could integrate that in the PC app. The only TCP stacks that I've found so far are for microcontrollers. They could be ported, but it would be nice if there were something more ready-to-go. Or is there some special way to use the OS's built in TCP stack without needing administrative privileges or risking IP address conflicts with the real Ethernet interfaces.
Lastly, just to keep the solution focused on TCP, yes, there are other solutions to this problem such as using HDLC or just fixing our custom protocol. However, we wanted to explore the TCP route further in case it was an option.
It appears that the comments have already answered your question, but to clarify: no, you cannot use TCP without using IP. TCP is built on top of IP, and it isn't going to work any other way.
PPP is a good way of establishing an IP connection over a serial link, but if you do not have administrative access on both sides of the link this could be difficult. The 10.x, 172.16.x to 172.31.x, and 192.168.x ranges are reserved for private networks, so you should be able to find a set of IP addresses that does not interfere with the network operation of the local computer.
From the point of view of no configuration and no dependencies, coming up with your own framing / retransmit protocol should not be too hard, and is probably your best choice if you don't need interoperability. That being said, kermit or {x,y,z}modem would provide both better performance and a standard to code against.
Lastly, you may be able to use something like socat to do protocol translation. I.e. connect a serial stream to a TCP port. That wouldn't address data reliability / re-transmission, but it may be the interface you are looking to program against.
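For the framing part alone, the SLIP encoding mentioned in the question is small enough to sketch. The function below follows the byte-stuffing rules of RFC 1055 (END = 0xC0, ESC = 0xDB); the function name, the buffer-capacity convention and the choice to emit a leading END byte as well as a trailing one are assumptions of this example. Note that SLIP only delimits packets; it provides no checksums or retransmission, so reliability would still have to come from a layer above it.

```c
#include <stddef.h>

enum {
    SLIP_END     = 0xC0,  /* frame delimiter */
    SLIP_ESC     = 0xDB,  /* escape byte */
    SLIP_ESC_END = 0xDC,  /* ESC + this stands for a literal 0xC0 */
    SLIP_ESC_ESC = 0xDD   /* ESC + this stands for a literal 0xDB */
};

/* Encode n bytes from in[] as one SLIP frame in out[] (capacity cap).
 * Returns the number of bytes written, or 0 if out[] is too small. */
size_t slip_encode(const unsigned char *in, size_t n,
                   unsigned char *out, size_t cap)
{
    size_t o = 0;

    if (cap < 2)
        return 0;
    out[o++] = SLIP_END;              /* flush any line noise before the frame */

    for (size_t i = 0; i < n; i++) {
        unsigned char b = in[i];
        if (b == SLIP_END || b == SLIP_ESC) {
            if (cap - o < 3)          /* two bytes now plus the final END */
                return 0;
            out[o++] = SLIP_ESC;
            out[o++] = (b == SLIP_END) ? SLIP_ESC_END : SLIP_ESC_ESC;
        } else {
            if (cap - o < 2)          /* one byte now plus the final END */
                return 0;
            out[o++] = b;
        }
    }

    out[o++] = SLIP_END;
    return o;
}
```

For the input bytes 01 C0 DB this produces the frame C0 01 DB DC DB DD C0.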

Socket programming at low level

I am unable to understand, or rather grasp, what it means to program at a lower layer in socket programming. I am used to working with TCP/UDP/filesystem sockets. These are all wrapped around their own protocol specifications, which, as I understand it, means working at the application layer of the stack.
In the project I am on, I have seen some files which are "named" LinkLayer, TransportLayer... but I don't see any calls other than the standard socket calls: send/recv/select...
Does the fact that you are setting socket options mean you are programming at a lower level? Is it restricted to just that? Or are there other APIs which grant you access to the representation in the kernel?
Typically this refers to using SOCK_RAW sockets, which requires you to assemble your own packet headers, calculate checksums, etc. You still use send/recv/etc. but now you are responsible for making sure every bit is in the right place.
You can use SOCK_RAW sockets to implement protocols other than TCP or UDP, or to do things with the Internet protocols that higher-level interfaces don't accommodate (like tweaking the TTL of your packets to implement something like traceroute).
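As one concrete example of the bookkeeping that becomes your job with raw sockets, here is a sketch of the 16-bit one's-complement "Internet checksum" described in RFC 1071, which IP, ICMP, UDP and TCP headers use. The function name is made up for this example, and it assumes packet-sized buffers (well under 128 KiB), so the 32-bit accumulator cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071) over len bytes. The result is returned
 * as a host-order number; store it in the header in network byte order.
 * Assumes packet-sized input so the 32-bit sum cannot overflow. */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    /* Add the data as a sequence of big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                     /* odd trailing byte, padded with zero */
        sum += (uint32_t)p[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the sample bytes 00 01 f2 03 f4 f5 f6 f7 used in RFC 1071, the folded sum is 0xddf2, so the function returns its complement, 0x220d.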
This usually means working at a lower OSI layer: for example, not directly sending TCP streams or UDP packets, but crafting your own IP packets, Ethernet frames, or other low-layer protocol units which would normally be handled by the operating system.
This can be done via specific socket types and options which let you receive or send data at any layer, even layer 2 (Data Link).
