How to unsafely remove blockdevice driver in Linux - c

I am writing a block device driver for Linux.
It is crucial to support unsafe removal (like a USB unplug). In other words, I want to be able to shut down the block device without creating memory leaks or crashes, even while applications hold open files or perform IO on my device, or while it is mounted with a file system.
Unsafe removal may of course corrupt the data stored on the device, but that is something the customers are willing to accept.
Here are the basic steps I have implemented:
Upon unsafe removal, the block device spawns a zombie which automatically fails all new IO requests, ioctls, etc. The zombie substitutes the make_request function and changes other function pointers so that the kernel no longer needs the original block device.
The block device waits for all IO that is currently running (and using my internal resources) to complete.
It calls del_gendisk(); however, this does not really free the kernel resources because they are still in use.
The block device frees itself.
The zombie keeps track of the number of open() and close() calls on the block device, and when the last close() occurs it automatically frees itself.
Result: I am not leaking the block device, request queue, gendisk, etc.
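The open()/close() bookkeeping in the last step is essentially reference counting: the driver holds one reference, every open() takes another, and whoever drops the last one frees the object. Below is a minimal user-space sketch in C11 of that idea. The names `zombie`, `zombie_get`, `zombie_put` and `zombie_count_release` are hypothetical and chosen only for illustration; this is not kernel code, although the kernel's `kref` helpers follow a similar shape.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the "zombie" object that outlives the driver. */
struct zombie {
    atomic_int refs;                     /* driver's reference + one per open() */
    void (*release)(struct zombie *z);   /* runs exactly once, on the last put */
};

void zombie_init(struct zombie *z, void (*release)(struct zombie *))
{
    atomic_init(&z->refs, 1);            /* the driver itself holds one reference */
    z->release = release;
}

/* Take a reference, e.g. from the open() handler. */
void zombie_get(struct zombie *z)
{
    int old = atomic_fetch_add(&z->refs, 1);
    assert(old > 0);                     /* must not revive a released object */
    (void)old;
}

/* Drop a reference, e.g. from close() or from the removal path. */
void zombie_put(struct zombie *z)
{
    if (atomic_fetch_sub(&z->refs, 1) == 1)
        z->release(z);                   /* last reference gone: free resources */
}

/* Example release callback that only counts how often it ran. */
int zombie_release_calls;

void zombie_count_release(struct zombie *z)
{
    (void)z;
    zombie_release_calls++;
}
```

With this shape, unsafe removal is just the driver calling `zombie_put()` on its own reference; the object then survives until the last opener closes it, whichever order those events happen in.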
However, this is a very complicated mechanism which requires a lot of code and is extremely prone to race conditions. I am still struggling with corner cases, per_cpu counting of IOs, and occasional crashes.
My question: is there a mechanism in the kernel which already does this? I searched manuals, literature, and countless source code examples of block device drivers, RAM disks and USB drivers, but could not find a solution. I am sure that I am not the first one to encounter this problem.
I learned about the hotplug mechanism from the answer below by Dave S, but it does not help me. I need a way to safely shut down the driver, not a way to notify the kernel that the driver was shut down.
Example of one problem:
blk_queue_make_request() registers a function through which my block device serves IO. In that function I increment per_cpu counters to know how many IOs are in flight on each CPU. However, there is a race condition: the function may have been called while the counter has not yet been incremented, so my device thinks there are 0 IOs in flight and releases its resources, and then the IO arrives and crashes the system. As far as I understand, hotplug will not help me with this problem.
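The race described above is a check-then-act problem: the teardown path reads the counter before the IO path has incremented it. One common way to close that window is to have the IO path increment first and only then check a "dying" flag, backing out if the flag is set, while the teardown path sets the flag first and only then reads the counter. The following is a user-space sketch of that ordering in C11 with sequentially consistent atomics. The names `io_gate`, `io_gate_enter` and the rest are hypothetical, a single shared counter replaces the per-CPU counters for brevity, and this is not the kernel's API (the kernel's `percpu_ref`, with `percpu_ref_tryget_live()` and `percpu_ref_kill()`, may be worth a look for a similar tryget/kill shape).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate guarding the driver's internal resources.
 * The gate object itself must stay valid until it reports idle. */
struct io_gate {
    atomic_long inflight;   /* IOs currently inside the driver */
    atomic_bool dying;      /* set once by the removal path */
};

void io_gate_init(struct io_gate *g)
{
    atomic_init(&g->inflight, 0);
    atomic_init(&g->dying, false);
}

/* IO path: count ourselves in FIRST, then check whether removal started. */
bool io_gate_enter(struct io_gate *g)
{
    atomic_fetch_add(&g->inflight, 1);
    if (atomic_load(&g->dying)) {
        atomic_fetch_sub(&g->inflight, 1);  /* back out, fail the IO */
        return false;
    }
    return true;                            /* resources are safe to use */
}

/* IO path: called when the IO has completed. */
void io_gate_exit(struct io_gate *g)
{
    long old = atomic_fetch_sub(&g->inflight, 1);
    assert(old > 0);
    (void)old;
}

/* Removal path: set the flag FIRST, then wait for the gate to go idle. */
void io_gate_kill(struct io_gate *g)
{
    atomic_store(&g->dying, true);
}

/* True once removal has started and no IO is inside the driver. */
bool io_gate_idle(struct io_gate *g)
{
    return atomic_load(&g->dying) && atomic_load(&g->inflight) == 0;
}
```

Because both sides write before they read, at least one of them observes the other: either the IO sees `dying` and backs out, or the removal path sees a non-zero count and keeps waiting. A failed `io_gate_enter()` may bump the counter briefly, which only makes the removal path wait slightly longer.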

About a decade ago I used hotplugging on a software driver project to safely add and remove an external USB disk drive attached to an embedded Linux set-top box.
For your project you will also need to write a hotplug program. A hotplug is a program which the kernel uses to notify user-mode software when some significant (usually hardware-related) event takes place, for example when a USB device has just been plugged in or removed.
From the Linux 2.6 kernel onwards, hotplugging has been integrated with the driver model core so that any bus or class can report hotplug events when devices are added or removed.
In the kernel tree, /usr/src/linux/Documentation/usb/hotplug.txt has basic information about USB device driver API support for hotplugging.
It is also worth searching online for further examples and documentation.
Another very helpful document discusses hotplugging with block devices and gives a good example of hotplug event handling.
Below are the main variables you should be aware of.
Hotplug event variables:
Every hotplug event should provide at least the following variables:
ACTION: The current hotplug action: "add" to add the device, "remove" to remove it. The 2.6.22 kernel can also generate "change", "online", "offline", and "move" actions.
DEVPATH: Path under /sys at which this device's sysfs directory can be found.
SUBSYSTEM: If this is "block", it's a block device. Any other subsystem is either a char device or does not have an associated device node.
The following variables are also provided for some devices:
MAJOR and MINOR: If these are present, a device node can be created in /dev for this device. Some devices (such as network cards) don't generate a /dev node.
DRIVER: If present, a suggested driver (module) for handling this device. No relation to whether or not a driver is currently handling the device.
INTERFACE and IFINDEX: When SUBSYSTEM=net, these variables indicate the name of the interface and a unique integer for the interface. (Note that "INTERFACE=eth0" could be paired with "IFINDEX=2" because eth0 isn't guaranteed to come before lo and the count doesn't start at 0.)
FIRMWARE: The system is requesting firmware for the device.
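As a rough illustration of how a hotplug helper might consume these variables, here is a small C sketch that classifies an event from its environment. It assumes the conventional variable names ACTION and SUBSYSTEM; the enum and function names (`hotplug_kind`, `hotplug_classify`, `hotplug_classify_env`) are hypothetical, and a real handler would go on to use DEVPATH and the other variables.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: what this helper decides to do with an event. */
enum hotplug_kind { HP_IGNORE, HP_BLOCK_ADD, HP_BLOCK_REMOVE };

/* Classify an event from the values of ACTION and SUBSYSTEM.
 * Either argument may be NULL when the variable is not set. */
enum hotplug_kind hotplug_classify(const char *action, const char *subsystem)
{
    if (action == NULL || subsystem == NULL)
        return HP_IGNORE;
    if (strcmp(subsystem, "block") != 0)
        return HP_IGNORE;               /* only block devices are of interest */
    if (strcmp(action, "add") == 0)
        return HP_BLOCK_ADD;
    if (strcmp(action, "remove") == 0)
        return HP_BLOCK_REMOVE;
    return HP_IGNORE;                   /* "change", "move", ... are ignored here */
}

/* Same decision, reading the variables from the process environment,
 * which is where a hotplug helper receives them. */
enum hotplug_kind hotplug_classify_env(void)
{
    return hotplug_classify(getenv("ACTION"), getenv("SUBSYSTEM"));
}
```

Keeping the decision in a function that takes plain strings makes it easy to exercise without generating real kernel events.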

If the driver creates a device, it may be possible to delete it abruptly through sysfs:
echo 1 > /sys/block/device-name/device/delete, where device-name may be sde, for example, or
echo 1 > /sys/class/scsi_device/h:c:t:l/device/delete, where h is the HBA number, c is the channel on the HBA, t is the SCSI target ID, and l is the LUN.
In my case, it perfectly simulates scenarios with crashing writes and recovery of data from the journal.
Normally, safely removing a device takes more steps, so deleting the device like this is a pretty drastic event for the data, which makes it useful for testing :)
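For test harnesses that trigger this from a program rather than a shell, the same write can be done in C. The sketch below only builds the `h:c:t:l` path shown above and writes "1" to it; the function names `scsi_delete_path` and `sysfs_write_one` are hypothetical, and actually running the write requires root and really does remove the device.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build /sys/class/scsi_device/h:c:t:l/device/delete into buf.
 * Returns 0 on success, -1 if the buffer is too small. */
int scsi_delete_path(char *buf, size_t len,
                     unsigned h, unsigned c, unsigned t, unsigned l)
{
    int n = snprintf(buf, len,
                     "/sys/class/scsi_device/%u:%u:%u:%u/device/delete",
                     h, c, t, l);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Equivalent of `echo 1 > path`. Returns 0 on success, -1 on any error. */
int sysfs_write_one(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs("1\n", f) >= 0;
    if (fclose(f) != 0)         /* sysfs errors may only surface on flush */
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would typically do `scsi_delete_path(path, sizeof path, 2, 0, 1, 0)` followed by `sysfs_write_one(path)`, where the 2:0:1:0 address is just an example value.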


How do I create a text wrapper for the kernel? [closed]

I studied the content from the website, built a compiler, and launched a test kernel. Now I wonder: how can I create a primitive text shell for the kernel with commands? Could someone explain with an example how to implement this? There is nothing helpful on the site itself; there is an article, but it is useless for me. I'm a beginner, if that.
how can I create a primitive text shell for the kernel with commands?
The correct way is:
Write enough kernel code to manage various resources (memory, IRQs, IO ports, DMA channels, ...). This should include managing time (a scheduler), and should also include some kind of inter-process communication (so that the scheduler can be told "Don't give this task any more CPU time until/unless it receives data from inter-process communication").
Enumerate devices, determine each device's resources, and start drivers for whichever devices you find. Note that this is hierarchical. For example, if you enumerate PCI buses and find 2 USB controllers attached to PCI buses and start their device drivers, then you'll need to enumerate each of the USB controllers to find any USB devices attached to USB buses and might find 3 USB hubs, and then you'll need to enumerate all 3 USB hubs to see what is plugged into them. All of this should be coordinated by some kind of "device manager" which keeps track of a hierarchical tree of devices, so that (e.g.) if a device is unplugged or sent to a power saving state (or if its device driver crashes) you can inform drivers that depended on that device (e.g. if a USB hub is unplugged you can inform all drivers for devices that were attached to that hub).
Write keyboard device driver/s. These should decode the data from the device (likely using tables and other information describing a "keyboard layout" that's loaded from the file system) and send packets of data using the kernel's inter-process communication (so that any task can say "don't give me any CPU time until I receive data from the keyboard driver"). This will involve designing a standard way that all keyboard drivers (and all software emulating a keyboard, e.g. an "on screen keyboard" for people using touchscreens) will behave (e.g. the format of the data packets they send); and probably should involve creating a formal "keyboard device driver interface" specification for your OS to describe whatever you designed (in addition to designing a file format for "keyboard layout files").
Write video device driver/s. This will also include designing a suitable video driver interface for your OS (and should include writing a formal specification describing it). However, video is complex and you can cheat by only designing part of the video driver interface and leaving the rest of it (video mode setting, 3D, GPGPU, ...) until later. The same applies to the video driver itself: you will want to start with a "generic raw frame buffer driver" (that just uses a frame buffer configured by the boot loader) and probably won't write actual drivers for specific video cards.
(Optional) Write some kind of upper layer to control which task is the main task for each set of user input/output devices. This allows the user to have multiple virtual consoles and switch between them (e.g. maybe with "control+alt+F1" to "control+alt+F12"), possibly allowing some virtual consoles to be associated with terminals and others to be associated with different GUIs. It can also make it easy to support "multiple seat" (e.g. if there are 2 keyboards and 2 monitors, then you can have 2 completely separate users with each user having one keyboard and one monitor).
Create a task with a simple main loop that uses inter-process communication ("don't give me any more CPU time until/unless I receive data from the keyboard") and processes the data it received to build up a current command string, then (if/when the user presses the enter key) parses the command string and does whatever the command says. Note that if you get this far it's tempting to do a tiny little bit of extra work to support user-space, and make it a normal process instead of a kernel task.
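The last step above (accumulate key data into a command string, then parse it on Enter) can be sketched in a few lines of freestanding-style C. This is only an illustration under stated assumptions: keys arrive already decoded as ASCII characters, the IPC wait is left to the caller, and the names `shell`, `shell_feed` and the two commands `help` and `echo` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Result of feeding one key to the shell. */
enum shell_cmd { SHELL_PENDING, SHELL_EMPTY, SHELL_HELP, SHELL_ECHO, SHELL_UNKNOWN };

struct shell {
    char line[64];      /* current command string being built */
    size_t len;
};

void shell_init(struct shell *s)
{
    s->len = 0;
    s->line[0] = '\0';
}

/* Split "name args" at the first space and look the name up. */
static enum shell_cmd shell_parse(struct shell *s, const char **arg)
{
    char *sp = strchr(s->line, ' ');
    if (sp != NULL) {
        *sp = '\0';
        *arg = sp + 1;
    }
    if (s->line[0] == '\0')
        return SHELL_EMPTY;
    if (strcmp(s->line, "help") == 0)
        return SHELL_HELP;
    if (strcmp(s->line, "echo") == 0)
        return SHELL_ECHO;
    return SHELL_UNKNOWN;
}

/* Feed one decoded key. Returns SHELL_PENDING until Enter is pressed;
 * on Enter, returns the parsed command and points *arg at its argument
 * text (valid until the next call). */
enum shell_cmd shell_feed(struct shell *s, char key, const char **arg)
{
    assert(s != NULL && arg != NULL);
    *arg = "";
    if (key == '\n') {
        s->line[s->len] = '\0';
        s->len = 0;                      /* next key starts a new line */
        return shell_parse(s, arg);
    }
    if (key == '\b') {                   /* backspace: drop the last character */
        if (s->len > 0)
            s->len--;
        return SHELL_PENDING;
    }
    if (s->len < sizeof s->line - 1)     /* silently drop keys when full */
        s->line[s->len++] = key;
    return SHELL_PENDING;
}
```

The task's main loop would block on the keyboard IPC, call `shell_feed()` for each key it receives, and act whenever the result is not `SHELL_PENDING`.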
The incorrect way is:
don't have a kernel that supports some/most of the things that device drivers and other code must rely on
don't do any kind of device enumeration. Instead, make wild assumptions about which devices are present and which resources they use.
don't put any thought into device driver interfaces. Just slap together whatever seemed convenient (and continually break everything whenever you change any device driver).
don't use tasks (or inter-process communication). Instead, build the "shell" into the keyboard driver's IRQ handler to make sure the entire OS "pauses" when someone enters any time consuming command.
don't continue working on the OS after you get the shell to "work". This will be necessary because the code will be too inflexible and too fragile (any attempt to do anything else will cause you to have to rewrite everything).
Note: In my experience people that ask questions like "how do I write a kernel shell" are likely to have skipped everything that matters (because they'd know how to write a shell if they've done everything that a shell depends on); and are so focused on having a shell that they're very tempted to do it all the incorrect way (and then get stuck later and regret it).

What does dev_net_set do in Linux?

I am writing a simple net device driver based on the loopback driver and want to register my net_device structure. This and that page on writing a net device say to just call register_netdev. But they're writing fancy drivers with PCI express and other complicated things.
So, if I just want something like the loopback driver, I should presumably base my code on loopback.c. My question is, what does the first line of this code in loopback_net_init do:
dev_net_set(dev, net);
err = register_netdev(dev);
Apparently net is determined by this code in net_namespace.c:
register_pernet_device(ops) ...
__register_pernet_operations(list, ops)
for_each_net(net) ...
What is this looping for? What might go wrong if I skip the dev_net_set call? Why are others not using it?
AFAIK, net is a structure that will allow the kernel to interact with the device. You need it to register the device and remove it in the module cleanup function. Please review the code under linux/net/8021q/ for examples.
AFAIK, looping happens at the level of sockets (layer 5-7), whereas net_dev is the kernel component that interacts directly with the driver when you actually want to use, say, an Ethernet card, or SLIP or PLIP, for transmitting frames (layer 2-0). Loopback happens at the level of the network subsystem of the kernel, and lies way above the drivers which interact with the hardware. So I don't see why you would need a driver to use the loopback feature. However, there is also a provision for registering a dummy device with net_dev, though I don't know if that is what you are looking for.
That said, if your intention is simply to use a driver that simulates a physical device without actually having one and, say, reflects the packets that it receives, that is possible too. Basically, down to the net_dev layer the kernel does all the protocol work (TCP/IP), and finally passes the packet off to some handler that the device driver registers with net_dev or something similar. Similarly, on the receive side, the device triggers an interrupt, the driver does a DMA operation, and the kernel takes over from there. Hence, instead of the code for doing the DMA operation, you can write a module that simply passes up a static packet that is compatible with Ethernet/TCP/IP. In the vast majority of cases, all these aspects (the network and other subsystems) are agnostic to the underlying bus details, i.e. it shouldn't matter whether the Ethernet card is connected to PCI or ISA, but there can be exceptions. Thus, IMHO, you are trying to do something that should only be attempted after gaining a thorough understanding of the network subsystem, and a good enough understanding of the kernel as a whole. Until then you will be shooting in the dark. Sometimes you may hit, but often you will miss.
A network namespace is logically another copy of the network stack,
with its own routes, firewall rules, and network devices.
So for_each_net is looping over these namespaces and creating a copy of all "per net" network devices in each one.
Use ip netns list to determine whether you are using network namespaces. Often they are not used, so drivers do not necessarily need to use dev_net_set.
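To make the looping concrete, here is a toy user-space model of the idea in C: registering "per net" operations runs their init hook once for every namespace that already exists, and again for each namespace created later. Everything here is hypothetical (the `toy_` names, the fixed-size tables, the device counter); it only mimics the shape of `register_pernet_device()` and `for_each_net()` and is not the kernel implementation. In the real loopback code, the init hook is where `dev_net_set(dev, net)` tags the new device with the namespace being initialised.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NETS 8
#define TOY_MAX_OPS  4

/* Toy network namespace: just counts the devices created inside it. */
struct toy_net {
    int id;
    int ndevices;
};

/* Toy per-namespace operations: init runs once per namespace. */
struct toy_pernet_ops {
    int (*init)(struct toy_net *net);
};

static struct toy_net toy_nets[TOY_MAX_NETS];
static size_t toy_nnets;
static const struct toy_pernet_ops *toy_ops[TOY_MAX_OPS];
static size_t toy_nops;

/* Create a namespace and run every already-registered init hook in it. */
struct toy_net *toy_net_create(void)
{
    if (toy_nnets == TOY_MAX_NETS)
        return NULL;
    struct toy_net *net = &toy_nets[toy_nnets];
    net->id = (int)toy_nnets;
    net->ndevices = 0;
    toy_nnets++;
    for (size_t i = 0; i < toy_nops; i++)
        if (toy_ops[i]->init(net) != 0)
            return NULL;
    return net;
}

/* Register ops and run init in every existing namespace
 * (the part that corresponds to the for_each_net loop). */
int toy_register_pernet(const struct toy_pernet_ops *ops)
{
    if (toy_nops == TOY_MAX_OPS)
        return -1;
    toy_ops[toy_nops++] = ops;
    for (size_t i = 0; i < toy_nnets; i++) {
        int err = ops->init(&toy_nets[i]);
        if (err != 0)
            return err;
    }
    return 0;
}

/* Stand-in for loopback_net_init: one loopback-like device per namespace. */
int toy_loopback_init(struct toy_net *net)
{
    net->ndevices++;    /* real code: dev_net_set(dev, net); register_netdev(dev); */
    return 0;
}
```

In this model a driver that registers a single device directly, without per-namespace operations, would simply never enter the loop, which matches the observation that many drivers do not call `dev_net_set` at all.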

Linux Device Tree: How to make the device file?

On my ARM system (Tegra based), I'm running the mainline linux kernel. It uses the device tree system.
I have enabled a hardware driver for the General-Memory-Bus (part of the SoC) in the .dts file by setting its status="okay". Recompiled the dtb and booted the kernel. But no device (/dev/xx) appears.
The driver is compiled into the kernel and can be seen by
cat /lib/modules/$(uname -r)/modules.builtin
The command
cat /sys/firmware/devicetree/base/<path to device>/status
returns "okay".
Do I need to make some kind of "mknod"?
What else is necessary?
The traditional UNIX "stream of bytes" device model is a pretty high-level abstraction of most modern hardware, and as such there are plenty of drivers which do not create /dev entries for the devices they control largely because they don't fit that model. Bus drivers in particular are very much a case of that - they exist, but only for the sake of discovering and allowing access to the devices behind them; there is no /dev/sata that lets you interact with the actual host controller, sending out raw commands on any old port regardless of what's connected or not; there is no /dev/usb that lets you attempt arbitrary transfers to arbitrary endpoints which may or may not exist.
Furthermore, your typical 'external interface' controller as in this case is orders of magnitude less complex than an interface like SATA or USB - the 'device' itself is often little more than a register block controlling some clocks and a chip-select multiplexer. Even if the driver did create something you could interact with directly, there's not exactly much you could do with it.
The correct way to proceed in this situation is to describe your FPGA device in the DT as a child of the GMI bus, accurately reflecting the hardware, no less, then develop your own driver for that. The bus driver itself just sits transparently in the middle. And if you do want a quick and dirty way to get started by just reading and writing bus addresses directly, well, it's behind a memory-mapped I/O region; that's exactly what /dev/mem exists for.
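For the quick-and-dirty /dev/mem route, the usual pattern is to mmap the page containing the physical address and then add the in-page offset. The sketch below shows that pattern in C. The function names are hypothetical, the physical address is whatever your bus window actually uses (none is assumed here), the address is assumed to be 4-byte aligned, and kernels built with restrictions on /dev/mem may refuse the mapping.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Round a physical address down to the start of its page.
 * page_size must be a power of two. */
uint64_t phys_page_base(uint64_t phys, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return phys & ~(page_size - 1);
}

/* Offset of a physical address within its page. */
uint64_t phys_page_offset(uint64_t phys, uint64_t page_size)
{
    return phys & (page_size - 1);
}

/* Map len bytes starting at physical address phys through /dev/mem.
 * On success returns a pointer to the first register and stores the raw
 * mapping in *map_out / *map_len so the caller can munmap() it later.
 * Returns NULL on failure (typically: not root, or access restricted). */
volatile uint32_t *map_phys(uint64_t phys, size_t len,
                            void **map_out, size_t *map_len)
{
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t base = phys_page_base(phys, page);
    size_t off = (size_t)phys_page_offset(phys, page);
    size_t span = off + len;

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *m = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)base);
    close(fd);                      /* the mapping stays valid after close */
    if (m == MAP_FAILED)
        return NULL;

    *map_out = m;
    *map_len = span;
    return (volatile uint32_t *)((char *)m + off);
}
```

Reads and writes through the returned `volatile` pointer then go straight to the bus, which is fine for bring-up experiments but no substitute for describing the device in the DT and writing a proper driver.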

Linux device driver for a RS232 device in embedded system

I have recently started learning to write Linux device drivers for a specific project that I am working on. Previously most of the work I have done has been with devices running no OS so Linux drivers and development is somewhat new to me.
For the project I am working on I have an embedded system running a Linux-based operating system. I have an external device which is controlled via RS232 that I need to write a driver for.
1) Is there a way to access serial ports from within kernel space (and possibly use serial.h, serial_core.h, etc.)? How is this usually done, and are there any good examples?
2) From what I found, it seems like it would be much easier to access the serial ports in user space by just opening /dev/ttyS* and writing to it. When writing a driver for a device like this (an RS232 device), is it preferred to do it in user space, or is there a way to write a kernel module? How does one decide to write a driver as a kernel module rather than in user space, or vice versa?
Are drivers only for generic devices such as UART/serial and then above that is userspace or should this driver be written as a kernel module? I appreciate the help, I have been unable to find much information to answer my questions.
There are a few times when a module that communicates over a serial port may be in the kernel. The pppd (point to point protocol daemon) is one example as Linux has some kernel code devoted to that since it is a high traffic use of serial and it also needs to turn around and put the IP packets into kernel space.
Most other uses would work better from user space since you have a good API that already takes care of a lot of the errors that can happen. This also lessens the chance that your errors will result in massive system failure.
Doing things like this from user space does result in some latency. Reads and writes are buffered, and it's often difficult to tell where in the write operations the hardware actually is, and canceling an already succeeded write call isn't really doable from user space, even if the hardware hasn't yet received the bytes.
I would suggest attempting to do it from user space first and then move to OS driver if necessary. Even if it is necessary to move this into an OS level driver, you'll likely be able to get some progress made from user space.
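As a starting point for the user-space route, the sketch below opens a serial device and puts it into raw 8N1 mode using the standard POSIX termios calls. The function names `serial_configure_raw` and `serial_open` are hypothetical, and the device path and baud rate in the usage note are assumptions; the right settings depend on what the external device expects.

```c
#include <assert.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Put an already opened tty into raw 8N1 mode at the given speed
 * (e.g. B9600). Returns 0 on success, -1 on error. */
int serial_configure_raw(int fd, speed_t speed)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) != 0)
        return -1;

    /* No input/output translation, no echo, no line editing, no signals. */
    tio.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP |
                     INLCR | IGNCR | ICRNL | IXON);
    tio.c_oflag &= ~OPOST;
    tio.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);

    /* 8 data bits, no parity, 1 stop bit; ignore modem lines, enable RX. */
    tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
    tio.c_cflag |= CS8 | CLOCAL | CREAD;

    /* read() blocks until at least one byte is available. */
    tio.c_cc[VMIN] = 1;
    tio.c_cc[VTIME] = 0;

    if (cfsetispeed(&tio, speed) != 0 || cfsetospeed(&tio, speed) != 0)
        return -1;
    return tcsetattr(fd, TCSANOW, &tio);
}

/* Open a serial device and configure it. Returns the fd, or -1 on error. */
int serial_open(const char *path, speed_t speed)
{
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;
    if (serial_configure_raw(fd, speed) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A call such as `serial_open("/dev/ttyS0", B9600)` then yields a descriptor that plain `read()` and `write()` can use to talk to the device.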

Does USB mass-storage class requires re-enumeration after timeout?

This might be a stupid question.
I was debugging a USB storage device on an ARM Cortex-M4 platform (STM32F4 series) which runs embedded Linux. The ARM is working as the USB host, and tries to communicate with a thumb drive at USB full speed (12 Mb/s).
Now here is the problem. After successful enumeration and several SCSI commands through BULK transfers, the capacity and everything else can be read correctly. However, after about 15 seconds, when I try to send these SCSI commands again (under the same conditions), the USB host controller just returns 'Transaction Error', which looks like the device is not responding to BULK transfers anymore (not ACKing) and the host controller times out. The question is: is there any timeout mechanism in the USB mass-storage class or the SCSI system such that, after a timeout, the device must be re-enumerated or re-probed, otherwise it won't respond anymore?
I understand this might be due to a stupid error in my program, or due to some limitations on the specific hardware. However when I used usbmon module in Linux on a PC to capture the transfers on the very same thumb drive, I can see the operating system actually sends a sequence probing command (Read-max-Lun followed by Test-unit-ready) every 5 sec, which could be the reason why the thumb drive doesn't fail on my PC.
Thanks! I'm looking forward to any replies.
I think you're on the right track with the Test Unit Ready commands. I am in the middle of writing a mass storage device driver for an embedded device, and when testing on OS X, after the initial SCSI queries, my device receives a Test Unit Ready command about once every second when no other activity is occurring. Since your post is quite old, I recommend you post your own solution if you've since solved your problem.
Otherwise, try adding periodic Test Unit Ready commands from the host side when there is no other activity. You could set and restart a timer whenever USB activity occurs. If the timer fires, you can send a Test Unit Ready command. Rinse and repeat.
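The idle-timer idea can be sketched as a tiny piece of host-side logic in C: remember when the bus was last active, and report that a Test Unit Ready is due once the idle time reaches the chosen interval. The names `tur_keepalive`, `tur_poll` and so on are hypothetical, the millisecond tick source is assumed to be provided by the platform, and the interval is a tuning choice, not something the source specifies. The command block itself is the standard 6-byte TEST UNIT READY CDB (opcode 0x00, all remaining bytes zero).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical keepalive state, driven by a millisecond tick counter. */
struct tur_keepalive {
    uint32_t last_activity_ms;  /* time of the last transfer or keepalive */
    uint32_t interval_ms;       /* idle time after which a TUR is sent */
};

void tur_init(struct tur_keepalive *k, uint32_t now_ms, uint32_t interval_ms)
{
    assert(interval_ms > 0);
    k->last_activity_ms = now_ms;
    k->interval_ms = interval_ms;
}

/* Call whenever any other USB transfer to the device takes place. */
void tur_note_activity(struct tur_keepalive *k, uint32_t now_ms)
{
    k->last_activity_ms = now_ms;
}

/* Call periodically. Returns true when a Test Unit Ready should be sent,
 * and restarts the idle timer. Unsigned subtraction keeps this correct
 * across tick counter wrap-around. */
bool tur_poll(struct tur_keepalive *k, uint32_t now_ms)
{
    if ((uint32_t)(now_ms - k->last_activity_ms) >= k->interval_ms) {
        k->last_activity_ms = now_ms;
        return true;
    }
    return false;
}

/* Fill in the 6-byte TEST UNIT READY command block: opcode 0x00,
 * reserved bytes and control byte all zero. */
void tur_build_cdb(uint8_t cdb[6])
{
    memset(cdb, 0, 6);
}
```

When `tur_poll()` returns true, the host would wrap the CDB in its usual bulk-only command transport and send it like any other SCSI command.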
