PCIe bus latency when using ioctl vs read? - c

I've got a hardware client1 who's line of data acquisition cards I've written a Linux PCI kernel driver for.
The card can only communicate 1-4 bytes at a time depending on how the user specifies to utilize it, given this, I utilize ioctl for some of the functionality, but also make use of the file_operations structure to treat the card as a basic character device to give the user of the card the ability to just use read or write if they just want simple 1-byte communication with the card.
After discussing the driver with the client, one of their developers is in the understanding that treating the card as a character device by using open/read/write will introduce latency on the PCI bus, versus using open/ioctl.
Given that the driver makes no distinction how it's opened and the ioctl and read/write functions call the same code, is there any validity to this concern?
If so, how would I test the bus latency from my driver code? Are there kernel functions I can call to test this?
Lastly, wouldn't my test of the bus only be valid for my specific setup (kernel settings, platform, memory timing, CPU, etc.)?
1: they only have 2 other developers, neither of which have ever used Linux

I suspect the client's developer is slightly confused. He's thinking that the distinction between using read or write versus ioctl corresponds to the type of operation performed on the bus. If you explain to him that this is just a software API difference and either option performs the exact same operation on the bus, that should satisfy them.

Related

What does dev_net_set do in Linux?

I am writing a simple net device driver based on the loopback driver and want to register my net_device structure. This and that page on writing a net device say to just call register_netdev. But they're writing fancy drivers with PCI express and other complicated things.
So, if I just want something like the loopback driver, I should presumably base my code on loopback.c. My question is, what does the first line of this code in loopback_net_init do:
dev_net_set(dev, net);
err = register_netdev(dev);
Apparently net is determined by this code in net_namespace.c:
register_pernet_device(ops) ...
__register_pernet_operations(list, ops)
for_each_net(net) ...
What is this looping for? What might go wrong if I skip the dev_net_set call? Why are others not using it?
AFAIK, net is a structure that will allow the kernel to interact with the device. You need it to register the device and remove it in the module cleanup function. Please review the code under linux/net/8021q/ for examples.
AFAIK, looping happens at the level of sockets (layer 5-7), whereas net_dev is used as the kernel component that immediately interacts with the driver, when you actually want to use a say, ethernet card, or SLIP,PLIP for transmitting frames (layer 2-0). Loopback happens at the level of the network subsystem of the kernel, and lies way above the drivers which interact with the hardware. So I don't see why you would need a driver to use the loopback feature. However, there is also a provision for registering a dummy device with net_dev, though I don't know if that is what you are looking for.
That said, if your intention is to simply use some driver that simulates an actual physical device without one and say, reflects the packets that it recieves, that is possible too. Basically till the net_dev layer, the kernel does all the protocol stuff (TCP/IP), and finally passes off the packet to some handle that the device driver registers with the net_dev or something similar. Similarly on receiving stuff, the device triggers an interrupt, the driver does a DMA operation, and the kernel takes over from there. Hence instead of the code for doing the DMA operation, you can make a module that simply pass over a static packet, that is compatible with ethernet/TCP/IP . In a vast majority of cases, all these aspects (the network and other subsystems) are agnostic to the underlying bus details, i.e. it shouldn't matter whether the ethernet card is connected to PCI or ISA but there can be exceptions. Thus, IMHO, you are trying to do something that should only be attempted after having a thorough understanding of the network subsystem, and a good enough understanding of the kernel as a whole. Till then you will be shooting in the dark. Sometimes you may hit, but often-times you will miss.
http://man7.org/linux/man-pages/man8/ip-netns.8.html
A network namespace is logically another copy of the network stack,
with its own routes, firewall rules, and network devices.
So for_each_net is looping over these namespaces and creating a copy of all "per net" network devices in each one.
Use ip netns list to determine whether you are using network namespaces. Often they are not used, so drivers do not necessarily need to use dev_net_set.

Linux Device Tree: How to make the device file?

On my ARM system (Tegra based), I'm running the mainline linux kernel. It uses the device tree system.
I have enabled a hardware driver for the General-Memory-Bus (part of the SoC) in the .dts file by setting its status="okay". Recompiled the dtb and booted the kernel. But no device (/dev/xx) appears.
The driver is compiled into the kernel and can be seen by
cat /lib/modules/$(uname -r)/modules.builtin
The command
cat /sys/firmware/devicetree/base/<path to device>/status
returns "okay".
Do I need to make some kind of "mknod"?
What else is nessesary?
The traditional UNIX "stream of bytes" device model is a pretty high-level abstraction of most modern hardware, and as such there are plenty of drivers which do not create /dev entries for the devices they control largely because they don't fit that model. Bus drivers in particular are very much a case of that - they exist, but only for the sake of discovering and allowing access to the devices behind them; there is no /dev/sata that lets you interact with the actual host controller, sending out raw commands on any old port regardless of what's connected or not; there is no /dev/usb that lets you attempt arbitrary transfers to arbitrary endpoints which may or may not exist.
Furthermore, your typical 'external interface' controller as in this case is orders of magnitude less complex than an interface like SATA or USB - the 'device' itself is often little more than a register block controlling some clocks and a chip-select multiplexer. Even if the driver did create something you could interact with directly, there's not exactly much you could do with it.
The correct way to proceed in this situation is to describe your FPGA device in the DT as a child of the GMI bus, accurately reflecting the hardware, no less, then develop your own driver for that. The bus driver itself just sits transparently in the middle. And if you do want a quick and dirty way to get started by just reading and writing bus addresses directly, well, it's behind a memory-mapped I/O region; that's exactly what /dev/mem exists for.

Linux device driver for a RS232 device in embedded system

I have recently started learning to write Linux device drivers for a specific project that I am working on. Previously most of the work I have done has been with devices running no OS so Linux drivers and development is somewhat new to me.
For the project I am working on I have an embedded system running a Linux based operating system. I have an external device with is controlled via RS232 that I need to write a driver for.
Questions:
1) Is there a way to access serial ports from withing kernel space (and possibly use serial.h, serial_core.h, etc.), how is this usually done, any good examples?
2) From what I found it seems like it would be much easier to access the serial ports in user space by just opening dev/ttyS* and writing to it. When writing a driver for a device like this (RS232 device) is it preferred to do it in user space or is there a way to write a kernel module? How does one decide to write a driver as a kernel module over user space or vise versa?
Are drivers only for generic devices such as UART/serial and then above that is userspace or should this driver be written as a kernel module? I appreciate the help, I have been unable to find much information to answer my questions.
There are a few times when a module that communicates over a serial port may be in the kernel. The pppd (point to point protocol daemon) is one example as Linux has some kernel code devoted to that since it is a high traffic use of serial and it also needs to turn around and put the IP packets into kernel space.
Most other uses would work better from user space since you have a good API that already takes care of a lot of the errors that can happen. This also lessens the chance that your errors will result in massive system failure.
Doing things like this from user space does result in some latency. Reads and writes are buffered, and it's often difficult to tell where in the write operations the hardware actually is, and canceling an already succeeded write call isn't really doable from user space, even if the hardware hasn't yet received the bytes.
I would suggest attempting to do it from user space first and then move to OS driver if necessary. Even if it is necessary to move this into an OS level driver, you'll likely be able to get some progress made from user space.

how is tcp(kernel) bypass implemented?

Assuming I would like to avoid the overhead of the linux kernel in handling incoming packets and instead would like to grab the packet directly from user space. I have googled around a bit and it seems that all that needs to happen is one would use raw sockets with some socket options. Is this the case? Or is it more involved than this? And if so, what can I google for or reference in order to implement something like this?
There are many techniques for networking with kernel bypass.
First, if you are sending messages to another process on the same machine, you can do so through a shared memory region with no jumps into the kernel.
Passing packets over a network without involving the kernel gets more interesting, and involves specialized hardware that gets direct access to user memory. This idea is called RDMA.
Here's one way it can work (this is what InfiniBand hardware does). The application registers a memory buffer with the RDMA hardware. This buffer is pinned in physical memory, since swapping it out would obviously be bad (since the hardware will keep writing to the physical memory region). A control region is also mapped into userspace memory. When an application is ready to use the buffer to send or receive a message, it writes a command to the control region. The hardware takes the data from a registered buffer on one end, and places it into another registered buffer at the other end.
Clearly, this is too low level, so there are abstractions that make programming RDMA hardware easier. OFED verbs are one such abstraction.
The InfiniBand software stack has one extra interesting bit: the Sockets Direct Protocol (SDP) that is used for compatibility with existing applications. It works by inserting an LD_PRELOAD shim that translates standard socket API calls into IB verbs.
InfiniBand is just what I'm most familiar with. RoCE/iWARP hardware is very similar from the programmer's perspective, but uses a different transport than InfiniBand (TCP using an offload engine in iWarp, Ethernet in RoCE). There are/were also other approaches to RDMA (Quadrics, for example).

What does the machine code for networking look like?

At the end of the day every piece of code we write eventually gets turned into assembler and then machine language.
If you were writing assembler and wanting to perform a simple connection between two computers, how would you know which memory addresses to use (let alone offsets) within the assembler? Would you need to know specific addresses relating to the operating system?
I'm just wondering how somebody would write a really "clean" and "efficient" message passing library/compiler- the thing which is getting me is what on earth would network communications/IPC look like in assembler?
I think part of this answer could lie with querying known addresses relating to the OS? For example 0x4545456 to 0x 60000000 contains the Linux kernel data for communications X etc.
The addresses are not specific to your OS. They are specific to your hardware/system. Accessing those has nothing to do with assembler vs. another programming language (e.g. C), in fact most device driver code (the code that actually interacts with the networking hardware) is typically written in C.
Here's just one random sample of a network (ethernet) controller:
IntelĀ® 82580EB/82580DB GbE Controller: Datasheet
There are a bunch of registers that your software, either in assembler, or in another language, has to program to get this thing to actually communicate over ethernet. It's probably easier to start with a simpler example, something like a serial port. Let's build a hypothetical, fixed baud rate, serial port controller, mapped to memory:
Address Meaning
0 RX status (reads 0 when no data to read, 1 a byte is available)
1 RX buffer
2 TX status (reads 0 when ready to send, 1 when busy)
3 TX buffer
Now your software, either in assembler or any other language, can transmit data to another computer, by monitoring (polling) address 2 until it's ready, writing the next byte to address 3. We can also received data from another computer by monitoring (polling) address 0 to see when data is ready and reading the byte from address 1 when the data is there.
In a modern operating system/OS those are all physical addresses which need to be somehow mapped into virtual addresses.
Real world hardware, such as the one I linked to, will typically use interrupts, so you don't need to poll. It will usually have DMA, so the hardware can access your data directly rather than you feeding it byte by byte. It will handle various protocols and will have registers for checking and setting various aspects of this protocol.
In a modern OS the actual interaction with the hardware is implemented in a device driver and user software can exchange data with the device driver through some API. Again, this user code may be written in assembler or any other language. The API will vary depending on the OS. Communication/networking is generally built as a "stack" with higher level protocols implemented over the lower level ones. Which part of this stack is in a user library or part of the OS will vary between different operating systems.
For the hypothetical device I described above the API may consist of two single byte blocking calls, read() and write(). You then use some sort of system call mechanism from either assembler or a higher level language to call these and pass parameters/retrieve the output. In some operating systems device I/O may look like file I/O so you would use the generic file read/write to perform operations on the device and the OS will dispatch those to the right device driver. Furthermore, in a typical OS the actual system call will be available through some sort of library, which again you may call from various programming languages.
There are two pieces of code for doing networking in assembly - the kernel code used by the operating system to actually do the networking, and client code that wants to tell the OS what data to send over the network.
Typically, the hardware in a machine has certain memory addresses dedicated to communicating with the network hardware. The machine code for the OS can then write the appropriate values into this memory to control the hardware that ends up sending and receiving bytes. These memory addresses would be hardcoded into the machine code.
In the case of user code that does networking (say, Mozilla Firefox), the process is different. There is typically a machine instruction or set of instructions that are used for user code to tell the operating system to perform some task (in MIPS, for example, this is syscall, while I think x86 uses the int instruction). Client code would work by setting up some buffers with the appropriate data to send to the network, then would use one of the assembly instructions above to tell the OS that it should send the data. The hardware then invokes the OS, which reads the user data and then uses its own machine code (described above) to actually control the network device appropriately. In this way, the OS can guard direct access to the network device by blocking access to the physical addresses controlling the device and moderating access through system calls. It also means that you don't need to know any memory addresses when writing user code to do networking. The OS handles these details, and all you need to know about is what instruction to execute to trigger the system call.
Hope this helps!

Resources