What does dev_net_set do in Linux? - c

I am writing a simple net device driver based on the loopback driver and want to register my net_device structure. This and that page on writing a net device say to just call register_netdev. But they're writing fancy drivers with PCI express and other complicated things.
So, if I just want something like the loopback driver, I should presumably base my code on loopback.c. My question is, what does the first line of this code in loopback_net_init do:
dev_net_set(dev, net);
err = register_netdev(dev);
Apparently net is determined by this code in net_namespace.c:
register_pernet_device(ops) ...
__register_pernet_operations(list, ops)
for_each_net(net) ...
What is this looping for? What might go wrong if I skip the dev_net_set call? Why are others not using it?

AFAIK, net is a structure that will allow the kernel to interact with the device. You need it to register the device and remove it in the module cleanup function. Please review the code under linux/net/8021q/ for examples.

AFAIK, looping happens at the level of sockets (layer 5-7), whereas net_dev is used as the kernel component that immediately interacts with the driver, when you actually want to use a say, ethernet card, or SLIP,PLIP for transmitting frames (layer 2-0). Loopback happens at the level of the network subsystem of the kernel, and lies way above the drivers which interact with the hardware. So I don't see why you would need a driver to use the loopback feature. However, there is also a provision for registering a dummy device with net_dev, though I don't know if that is what you are looking for.
That said, if your intention is to simply use some driver that simulates an actual physical device without one and say, reflects the packets that it recieves, that is possible too. Basically till the net_dev layer, the kernel does all the protocol stuff (TCP/IP), and finally passes off the packet to some handle that the device driver registers with the net_dev or something similar. Similarly on receiving stuff, the device triggers an interrupt, the driver does a DMA operation, and the kernel takes over from there. Hence instead of the code for doing the DMA operation, you can make a module that simply pass over a static packet, that is compatible with ethernet/TCP/IP . In a vast majority of cases, all these aspects (the network and other subsystems) are agnostic to the underlying bus details, i.e. it shouldn't matter whether the ethernet card is connected to PCI or ISA but there can be exceptions. Thus, IMHO, you are trying to do something that should only be attempted after having a thorough understanding of the network subsystem, and a good enough understanding of the kernel as a whole. Till then you will be shooting in the dark. Sometimes you may hit, but often-times you will miss.

http://man7.org/linux/man-pages/man8/ip-netns.8.html
A network namespace is logically another copy of the network stack,
with its own routes, firewall rules, and network devices.
So for_each_net is looping over these namespaces and creating a copy of all "per net" network devices in each one.
Use ip netns list to determine whether you are using network namespaces. Often they are not used, so drivers do not necessarily need to use dev_net_set.

Related

Is accessing a device driver in Linux only possible using its device file?

Let's say I have a webcam, and I installed the device driver for this webcam in my Linux OS, now a device file will be created for the device driver (for example: /dev/video0).
Now say I want to create a program in C that wants to access this webcam. How can my program access the device driver for the webcam, should my program use the device file (/dev/video0) to access the device driver, or is there another way?
You asked a general question, and then gave a specific example. I'll try to address both.
When you load a driver, the way to communicate with it from user space is by whatever means this driver defined. Typically, this is through a /dev device created for the driver. If that's the case, yes, that's the only way to communicate with it.
This is not universally true. Many drivers also have entries under the /sys sysfs pseudo file system, and some aspects can be modified through there. In fact, there are whole classes of drivers that are only accessible through the /sys fs. Prominent examples are GPIO and Led devices, that can be turned on and off via access to /sys/class/gpio and similar paths.
Another option, considered deprecated but still sometimes used, is to use the /proc pseudo file system. Again, this is up to the driver to define its communication method. As the user, you will have to follow whatever protocol the driver defined.
Also, some drivers don't have any file system presence at all. The most obvious standard example are network interfaces. The only way to communicate with them is via the networking system calls.
In the particular example you provided, you talked about a video camera that appears as /dev/video0. Such a camera is, usually, a Video4Linux (or v4l) camera, and those are accessed via their character devices.
With that said, the protocol for communicating with the camera might have wrappers that makes life easier. If you open the actual device, you might have to implement a rather complicated handshake with it. Instead, you can use the v4l library to wrap the details of the access.
Make no mistake. You're still talking to the character device in /dev. It's just that it's not your code that does it, but the library's.

Linux device driver for a RS232 device in embedded system

I have recently started learning to write Linux device drivers for a specific project that I am working on. Previously most of the work I have done has been with devices running no OS so Linux drivers and development is somewhat new to me.
For the project I am working on I have an embedded system running a Linux based operating system. I have an external device with is controlled via RS232 that I need to write a driver for.
Questions:
1) Is there a way to access serial ports from withing kernel space (and possibly use serial.h, serial_core.h, etc.), how is this usually done, any good examples?
2) From what I found it seems like it would be much easier to access the serial ports in user space by just opening dev/ttyS* and writing to it. When writing a driver for a device like this (RS232 device) is it preferred to do it in user space or is there a way to write a kernel module? How does one decide to write a driver as a kernel module over user space or vise versa?
Are drivers only for generic devices such as UART/serial and then above that is userspace or should this driver be written as a kernel module? I appreciate the help, I have been unable to find much information to answer my questions.
There are a few times when a module that communicates over a serial port may be in the kernel. The pppd (point to point protocol daemon) is one example as Linux has some kernel code devoted to that since it is a high traffic use of serial and it also needs to turn around and put the IP packets into kernel space.
Most other uses would work better from user space since you have a good API that already takes care of a lot of the errors that can happen. This also lessens the chance that your errors will result in massive system failure.
Doing things like this from user space does result in some latency. Reads and writes are buffered, and it's often difficult to tell where in the write operations the hardware actually is, and canceling an already succeeded write call isn't really doable from user space, even if the hardware hasn't yet received the bytes.
I would suggest attempting to do it from user space first and then move to OS driver if necessary. Even if it is necessary to move this into an OS level driver, you'll likely be able to get some progress made from user space.

Kernel bypass for UDP and TCP on Linux- what does it involve?

Per http://www.solacesystems.com/blog/kernel-bypass-revving-up-linux-networking:
[...]a network driver called OpenOnload that use “kernel bypass” techniques to run the application and network driver together in user space and, well, bypass the kernel. This allows the application side of the connection to process many more messages per second with lower and more consistent latency.
[...]
If you’re a developer or architect who has fought with context switching for years kernel bypass may feel like cheating, but fortunately it’s completely within the rules.
What are the functions needed to do such kernel bypassing?
A TCP offload engine will "just work", no special application programming needed. It doesn't bypass the whole kernel, it just moves some of the TCP/IP stack from the kernel to the network card, so the driver is slightly higher level. The kernel API is the same.
TCP offload engine is supported by most modern gigabit interfaces.
Alternatively, if you mean "running code on a SolarFlare network adapter's embedded processor/FPGA 'Application Onload Engine'", then... that's card-specific. You're basically writing code for an embedded system, so you need to say which kind of card you're using.
Okay, so the question is not straight forward to answer without knowing how the kernel handles the network stack.
In generel the network stack is made up of a lot of layers, with the lowest one being the actual hardware, typically this hardware is supported by means of drivers (one for each network interface), the nic's typically provide very simple interfaces, think recieve and send raw data.
On top of this physical connection, with the ability to recieve and send data is a lot of protocols, which are layered as well, near the bottem is the ip protocol, which basically allows you to specify the reciever of your information, while at the top you'll find TCP which supports stable connections.
So in order to answer your question, you most first figure out which part of the network stack you'll need to replace, and what you'll need to do. From my understanding of your question it seems like you'll want to keep the original network stack, and then just sometimes use your own, and in that case you should really just implement the strategy pattern, and make it possible to state which packets should be handled by which toplevel of the network stack.
Depending on how the network stack is implemented in linux, you may or may not be able to achieve this, without kernel changes. In a microkernel architecture, where each part of the network stack is implemented in its own service, this would be trivial, as you would simply pipe your lower parts of the network stack to your strategy pattern, and have this pipe the input to the required network toplevel layers.
Do you perhaps want to send and recieve raw IP packets?
Basically you will need to fill in headers and data in a ip-packet.
There are some examples here on how to send raw ethernet packets:
:http://austinmarton.wordpress.com/2011/09/14/sending-raw-ethernet-packets-from-a-specific-interface-in-c-on-linux/
To handle TCP/IP on your own, i think that you might need to disable the TCP driver in a custom kernel, and then write your own user space server that reads raw ip.
It's probably not that efficient though...

How a wireless NIC works on the hardware level and the capabilities?

I couldn't think of a better title name, but to the point.
I wanted to know how the wireless NIC/NIC operates with the system it is connected to (not over a connection, but an internal one or a pci, or usb, or any other peripheral), and what can the card itself do with the system (besides connecting to some router or AP, or anything not dealing with the hardware internals), or what it can communicate to the local system?
I'm not sure if these are defined at the assembly level or in the user-space level, so I would also like to know that as well, if possible.
External wireless NICs attach over USB; internal ones typically use PCI or PCIe.
The details of how these devices communicate with the host are all device-specific. In many cases, the NIC runs a firmware which the host must upload to the device at startup. The details of what this firmware must contain are basically never documented. A few wireless NICs (typically the older ones) actually implement hardware commands to perform operations like associating with an AP, but most do not.
There are no standards here. Every device is a little bit different. There is also almost never any documentation. If you want to learn more, your best bet is to find the source code for a Linux driver for the device you're interested in, dig into it, and hope it's well-commented.

how is tcp(kernel) bypass implemented?

Assuming I would like to avoid the overhead of the linux kernel in handling incoming packets and instead would like to grab the packet directly from user space. I have googled around a bit and it seems that all that needs to happen is one would use raw sockets with some socket options. Is this the case? Or is it more involved than this? And if so, what can I google for or reference in order to implement something like this?
There are many techniques for networking with kernel bypass.
First, if you are sending messages to another process on the same machine, you can do so through a shared memory region with no jumps into the kernel.
Passing packets over a network without involving the kernel gets more interesting, and involves specialized hardware that gets direct access to user memory. This idea is called RDMA.
Here's one way it can work (this is what InfiniBand hardware does). The application registers a memory buffer with the RDMA hardware. This buffer is pinned in physical memory, since swapping it out would obviously be bad (since the hardware will keep writing to the physical memory region). A control region is also mapped into userspace memory. When an application is ready to use the buffer to send or receive a message, it writes a command to the control region. The hardware takes the data from a registered buffer on one end, and places it into another registered buffer at the other end.
Clearly, this is too low level, so there are abstractions that make programming RDMA hardware easier. OFED verbs are one such abstraction.
The InfiniBand software stack has one extra interesting bit: the Sockets Direct Protocol (SDP) that is used for compatibility with existing applications. It works by inserting an LD_PRELOAD shim that translates standard socket API calls into IB verbs.
InfiniBand is just what I'm most familiar with. RoCE/iWARP hardware is very similar from the programmer's perspective, but uses a different transport than InfiniBand (TCP using an offload engine in iWarp, Ethernet in RoCE). There are/were also other approaches to RDMA (Quadrics, for example).

Resources