Call traffic control (tc) from within the Linux kernel

There is a userspace utility, tc(8), for traffic shaping, e.g.:
tc qdisc add dev eth0 root tbf rate 10mbit latency 100ms burst 5000
Internally, tc uses netlink to send specific messages to the kernel, which in turn changes the configuration accordingly.
However, there is no public in-kernel interface for this specific procedure, i.e. no public API like tc_qdisc_add(x,y,z), because everything depends on the data in the netlink message itself.
So, is there a trick to simplify the process and simulate such a message from within the kernel? Is there a way to bypass the userspace call to tc and get the same outcome purely from kernel context?

is there a trick to simplify the process and simulate a message from the kernel?
I don't see any way to make it simple.
Without going into the implementation details of specific tc commands, and just surveying the existing in-kernel API, we can see that all the code related to the netlink communication and to adding qdiscs is located in the /net/sched subdirectory. The main function for registering a qdisc is register_qdisc(), located in /net/sched/sch_api.c. The registration of the basic qdiscs and of the netlink operations can be seen in pktsched_init().
Qdisc operations are described by struct Qdisc_ops and include callbacks such as init, enqueue, change, etc.
Knowing this, we can look at how this is implemented in the tbf module (net/sched/sch_tbf.c). Its operations are described by tbf_qdisc_ops. Among them is the change operation, which is normally invoked via tc_modify_qdisc() -> qdisc_change() -> tbf_change().
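To make the ops-table dispatch concrete, here is a userspace model of it. The names (model_qdisc_ops, model_tbf_change, model_qdisc_change) are hypothetical, and the options are a plain struct rather than netlink attributes; it only illustrates how a generic layer like qdisc_change() reaches a qdisc-specific callback like tbf_change() through the ops table, not the real kernel signatures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the tbf parameters; the real ones arrive as netlink attributes. */
struct model_tbf_opts {
    unsigned long rate_bps;   /* bytes per second */
    unsigned int  burst;      /* bytes */
};

struct model_qdisc;

/* Simplified stand-in for struct Qdisc_ops (init, enqueue, ... omitted). */
struct model_qdisc_ops {
    const char *id;           /* e.g. "tbf" */
    int (*change)(struct model_qdisc *q, const struct model_tbf_opts *opt);
};

struct model_qdisc {
    const struct model_qdisc_ops *ops;
    struct model_tbf_opts cur;   /* qdisc-private state */
};

/* Qdisc-specific callback, playing the role of tbf_change(). */
static int model_tbf_change(struct model_qdisc *q, const struct model_tbf_opts *opt)
{
    if (opt->rate_bps == 0 || opt->burst == 0)
        return -EINVAL;          /* reject invalid parameters, keep old state */
    q->cur = *opt;
    return 0;
}

static const struct model_qdisc_ops model_tbf_ops = {
    .id     = "tbf",
    .change = model_tbf_change,
};

/* Generic layer, playing the role of qdisc_change(): knows nothing about tbf,
 * it only dispatches through the ops table. */
static int model_qdisc_change(struct model_qdisc *q, const struct model_tbf_opts *opt)
{
    if (!q->ops->change)
        return -EINVAL;          /* this qdisc does not support change */
    return q->ops->change(q, opt);
}
```

Any qdisc that fills in its own ops table is reached the same way, which is why calling ->change() directly from kernel code mainly comes down to supplying the options in the form the callback expects.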
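So, depending on what exactly you want to tune, you could somehow obtain the particular qdisc, build the appropriate netlink data (what userspace sends inside a struct nlmsghdr message) and invoke, e.g., the qdisc's ->change(...) method. Note that ->change() receives only the TCA_OPTIONS attribute (a struct nlattr *), not the whole message.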
The answer does not claim to be correct. I just tried to clarify the situation a little bit.
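For comparison, a minimal userspace sketch of the request that the tc command above starts with: an RTM_NEWQDISC message made of a struct nlmsghdr, a struct tcmsg and a TCA_KIND attribute. It only builds the message in a buffer (sending it would need a NETLINK_ROUTE socket), and it assumes the tbf-specific nested TCA_OPTIONS payload is appended separately; without those options the kernel would most likely reject the request. The helper names add_rtattr() and build_qdisc_add() are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>        /* AF_UNSPEC */
#include <linux/netlink.h>     /* struct nlmsghdr, NLMSG_*, NLM_F_* */
#include <linux/rtnetlink.h>   /* RTM_NEWQDISC, struct tcmsg, TCA_KIND, RTA_* */
#include <linux/pkt_sched.h>   /* TC_H_ROOT */

/* Append one rtattr (type + payload) at the aligned end of the message. */
static int add_rtattr(struct nlmsghdr *n, size_t maxlen, unsigned short type,
                      const void *data, size_t len)
{
    size_t off = NLMSG_ALIGN(n->nlmsg_len);
    size_t alen = RTA_LENGTH(len);

    if (off + RTA_ALIGN(alen) > maxlen)
        return -1;
    struct rtattr *rta = (struct rtattr *)((char *)n + off);
    rta->rta_type = type;
    rta->rta_len = (unsigned short)alen;
    memcpy(RTA_DATA(rta), data, len);
    n->nlmsg_len = (unsigned int)(off + RTA_ALIGN(alen));
    return 0;
}

/* Build "add root qdisc <kind> on <ifindex>". Returns message length, 0 on error. */
static size_t build_qdisc_add(void *buf, size_t buflen, int ifindex, const char *kind)
{
    struct nlmsghdr *n = buf;
    struct tcmsg *t;

    if (buflen < NLMSG_LENGTH(sizeof(*t)))
        return 0;
    memset(buf, 0, buflen);
    n->nlmsg_len = NLMSG_LENGTH(sizeof(*t));
    n->nlmsg_type = RTM_NEWQDISC;
    n->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
    n->nlmsg_seq = 1;

    t = NLMSG_DATA(n);
    t->tcm_family = AF_UNSPEC;
    t->tcm_ifindex = ifindex;
    t->tcm_parent = TC_H_ROOT;   /* "root" */
    t->tcm_handle = 0;           /* let the kernel choose a handle */

    if (add_rtattr(n, buflen, TCA_KIND, kind, strlen(kind) + 1) != 0)
        return 0;
    /* The tbf parameters (nested TCA_OPTIONS) would be appended here. */
    return n->nlmsg_len;
}
```

The buffer should be suitably aligned, e.g. a union with struct nlmsghdr.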

Related

Implementing a Linux char driver with multiple parameters to access

I am writing a simple char driver which accesses a PCI card. It is registered in sysfs with the help of a new class and is accessible as /dev/foodev. Using standard file operations, I can perform simple read and write operations on the device.
My problem: I have multiple parameters stored at different addresses on the card (version, status, control, ...) which I would like to access independently. Since I currently have only one read and one write function, I have to change the address in the driver code every time.
Obviously there is a more convenient way to implement this. I stumbled upon the following two approaches and was wondering which is better in terms of sustainability and user accessibility:
Using ioctl commands to set the address/parameter before an access.
Having the device already nicely set up in sysfs with multiple attributes (device_create_file()), which the user can then simply read from and write to as different "files":
/dev/foodev
../version
../status
../control
I think you should take a look at the PCI framework to implement your driver.
Don't (mis)use ioctls; you'll have race conditions (another process can change the selected address between your ioctl and your read or write). Use the attributes as files. That scheme is already used in sysfs; e.g., look at GPIO LEDs and keys. – sawdust
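A userspace model of the attributes-as-files scheme, assuming hypothetical register indices: each parameter gets its own attribute with a show (read) and store (write) handler bound to a fixed register, so there is no shared "current address" that concurrent users could change under each other. In a real driver these would be sysfs show/store callbacks (for example declared with DEVICE_ATTR_RO()/DEVICE_ATTR_RW() and registered with device_create_file()), and regs[] would be the card's mapped BAR; the names below are illustrative only.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the card's registers; indices are hypothetical. */
static uint32_t regs[16];

struct model_attr {
    const char  *name;      /* becomes the file name */
    unsigned int reg;       /* fixed register this attribute maps to */
    int          writable;
};

static const struct model_attr attrs[] = {
    { "version", 0, 0 },
    { "status",  1, 0 },
    { "control", 2, 1 },
};

static const struct model_attr *find_attr(const char *name)
{
    for (size_t i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++)
        if (strcmp(attrs[i].name, name) == 0)
            return &attrs[i];
    return NULL;
}

/* Like a sysfs show(): format the register value, return bytes written. */
static int attr_show(const char *name, char *buf, size_t len)
{
    const struct model_attr *a = find_attr(name);

    if (!a)
        return -ENOENT;
    return snprintf(buf, len, "%u\n", (unsigned)regs[a->reg]);
}

/* Like a sysfs store(): parse the text, write the register, return bytes consumed. */
static int attr_store(const char *name, const char *buf, size_t count)
{
    const struct model_attr *a = find_attr(name);
    char *end;
    unsigned long v;

    if (!a)
        return -ENOENT;
    if (!a->writable)
        return -EACCES;
    errno = 0;
    v = strtoul(buf, &end, 0);
    if (errno || end == buf || v > UINT32_MAX)
        return -EINVAL;
    regs[a->reg] = (uint32_t)v;
    return (int)count;
}
```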

How is a callback maintained from userspace to kernel space?

I am learning about drivers and looking into the watchdog driver code, where some value is written to /sys/devices/virtual/wdc_per. I guess this is how the driver gets its value from userspace, and the file exposed in userspace is
"/sys/devices/virtual/wdc_per".
But how does this value from wdc_per actually reach the driver? There must be some callback.
In my case it's a GPIO-based watchdog driver, and gpio_wdt.c may contain this callback.
But I really could not figure out how it actually happens.
Can anybody help me find this userspace-to-kernel-space link?
First of all, this driver, gpio_wdt.c, doesn't seem to exist in the mainline kernel as of this date, so it's hard to comment on it.
Sysfs (usually mounted at /sys) is actually very easy to use. This is a great example of how to create Sysfs attributes. Basically, you create attributes (which become the Sysfs file names) and register them with two defined operations (callbacks), show and store, which are the equivalents of read and write, respectively. The show callback is called every time the Sysfs file (attribute) is read, and store every time it is written.
When writing a device driver that belongs to an existing class (most likely your situation), you will rarely need to do that yourself. This is because the standard Linux device classes already have a working set of Sysfs attributes that your driver will use more or less indirectly.
For example, the leds class (LED devices), whose devices you will find in /sys/class/leds, has a bunch of Sysfs attributes per LED so that a user may read/modify them from userspace (brightness, maximum brightness, trigger, etc.). Now, if you look at the LED-specific drivers in /drivers/leds, you won't find manual creation of Sysfs attributes. You will find, however, a call to led_classdev_register() when the driver is probed, which takes a struct led_classdev * as a parameter. This structure has a brightness_set callback member that the specific driver needs to provide. When a user writes to /sys/class/leds/whatever-led/brightness, the leds class's store Sysfs callback gets called, which in turn calls the specific driver's brightness_set callback.
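The division of labour described above can be modeled in userspace. The struct is a simplified, hypothetical stand-in for struct led_classdev, and fake_gpio_level stands in for real hardware; the point is that the class code owns the show/store callbacks and the input validation, while the driver only supplies brightness_set.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for struct led_classdev. */
struct model_led_classdev {
    const char *name;
    int brightness;
    int max_brightness;
    void (*brightness_set)(struct model_led_classdev *led, int value); /* from the driver */
};

/* ---- class code, shared by all LED drivers ---- */

/* Plays the role of the class's "brightness" store() callback. */
static int class_brightness_store(struct model_led_classdev *led, const char *buf, size_t count)
{
    char *end;
    long v = strtol(buf, &end, 10);

    if (end == buf || v < 0)
        return -EINVAL;
    if (v > led->max_brightness)
        v = led->max_brightness;        /* cap at the device maximum */
    led->brightness = (int)v;
    led->brightness_set(led, (int)v);   /* hand off to the specific driver */
    return (int)count;
}

/* Plays the role of the class's "brightness" show() callback. */
static int class_brightness_show(const struct model_led_classdev *led, char *buf, size_t len)
{
    return snprintf(buf, len, "%d\n", led->brightness);
}

/* ---- driver code: only the hardware-specific part ---- */

static int fake_gpio_level = -1;        /* stands in for a GPIO line */

static void my_led_brightness_set(struct model_led_classdev *led, int value)
{
    (void)led;
    fake_gpio_level = value ? 1 : 0;    /* on/off LED: any nonzero value is "on" */
}
```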
My point is: make sure you really know your device class before manually adding Sysfs attributes. Anyway, when submitting your driver to the LKML, you will know fast enough if it was a good decision.

State machine event generation in multi-processor architecture

I'm having a small architecture argument with a coworker at the moment. I was hoping some of you could help settle it by strongly suggesting one approach over another.
We have a DSP and Cortex-M3 coupled together with shared memory. The DSP receives requests from the external world and some of these requests are to execute certain wireless test functionality which can only be done on the CM3. The DSP writes to shared memory, then signals the CM3 via an interrupt. The shared memory indicates what the request is along with any necessary data required to perform the request (channel to tune to, register of RF chip to read, etc.).
My preference is to generate, in the interrupt, a unique event ID for each request that can occur, and then, before leaving the interrupt, pass the event on to the state machine's event queue, where it gets handled in the thread devoted to RF activity.
My coworker would instead like to pass a single event ID (generic RF command) to the state machine and have the shared memory area parsed after the state machine receives this event ID. After parsing, you would know the specific command you need to act on.
I dislike this approach because you will be doing the parsing of shared memory in whatever state you happen to be in. You can make this a function, but it's still processing that should be state-independent. She doesn't like the idea of parsing shared memory in the interrupt.
Any comments on the better approach? If it helps, we're using the QP framework from Miro Samek for state machine implementation.
EDIT: moved statechart to ftp://hiddenoaks.asuscomm.com/Statechart.bmp
Here's a compromise:
pass a single event ID (generic RF command) to the state machine from the interrupt
create an action_function that "parses" the shared memory and returns a specific command
guard RF_EVENT transitions in the statechart with [parser_action_func() == RF_CMD_1] etc.
The statechart code generator should be smart enough to execute parser_action_func() only once per RF_EVENT. (Dunno if the QP framework is that smart.)
This has the same statechart semantics of your "unique event ID for each request," and avoids parsing the shared memory in the interrupt handler.
ADDENDUM
The difference in the statechart is N transitions labeled
----RF_EVT_CMD_1---->
----RF_EVT_CMD_2---->
...
----RF_EVT_CMD_N---->
versus
----RF_EVT[cmd()==CMD_1]---->
----RF_EVT[cmd()==CMD_2]---->
...
----RF_EVT[cmd()==CMD_N]---->
where cmd() is the parsing action function.
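A sketch of the compromise in plain C, assuming a hypothetical mailbox layout and command codes (QP's own event and queue APIs are not used; a flag stands in for the event queue). The interrupt handler only posts the generic RF_EVT; the RF thread parses the shared memory once per event, and each state's guards then test the cached result, which corresponds to the RF_EVT[cmd()==CMD_n] transitions.

```c
#include <stdint.h>

/* Hypothetical layout of the shared-memory mailbox written by the DSP. */
struct rf_mailbox {
    volatile uint32_t cmd;   /* which request */
    volatile uint32_t arg;   /* e.g. channel number or RF register */
};

enum rf_cmd { RF_CMD_NONE = 0, RF_CMD_TUNE = 1, RF_CMD_READ_REG = 2 };
enum sig    { RF_EVT = 1 };
enum state  { ST_IDLE, ST_TUNED };

struct rf_sm {
    enum state state;
    uint32_t channel;
    uint32_t last_reg;
    unsigned parse_count;               /* instrumentation for the example */
    const struct rf_mailbox *mb;
};

static volatile int rf_evt_pending;     /* stands in for the event queue */

/* ISR: no parsing, only post the generic event. */
static void dsp_irq_handler(void) { rf_evt_pending = 1; }

/* The parsing action function, i.e. cmd() in the guards. */
static enum rf_cmd parse_mailbox(struct rf_sm *sm)
{
    sm->parse_count++;
    switch (sm->mb->cmd) {
    case RF_CMD_TUNE:     return RF_CMD_TUNE;
    case RF_CMD_READ_REG: return RF_CMD_READ_REG;
    default:              return RF_CMD_NONE;
    }
}

/* RF thread: parse once per RF_EVT, then evaluate the guards of the current state. */
static void rf_dispatch(struct rf_sm *sm, enum sig e)
{
    if (e != RF_EVT)
        return;
    enum rf_cmd cmd = parse_mailbox(sm);

    switch (sm->state) {
    case ST_IDLE:
        if (cmd == RF_CMD_TUNE) {       /* RF_EVT[cmd()==TUNE] -> TUNED */
            sm->channel = sm->mb->arg;
            sm->state = ST_TUNED;
        }                               /* READ_REG is not handled in IDLE */
        break;
    case ST_TUNED:
        if (cmd == RF_CMD_TUNE)
            sm->channel = sm->mb->arg;  /* retune */
        else if (cmd == RF_CMD_READ_REG)
            sm->last_reg = sm->mb->arg;
        break;
    }
}

static void rf_thread_poll(struct rf_sm *sm)
{
    if (rf_evt_pending) {
        rf_evt_pending = 0;
        rf_dispatch(sm, RF_EVT);
    }
}
```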

protocol handler using dev_add_pack consumes cpu

I wrote a kernel module and used dev_add_pack to get all the incoming packets.
If a packet matches the given filter rules, I forward it to user space.
When I load this kernel module and send UDP traffic using sipp,
the ksoftirqd process appears and starts consuming CPU (I am checking this with top).
Is there any way to save CPU?
I guess you used the ETH_P_ALL type when registering your packet_type structure with the protocol stack. And I think your packet_type->func is the bottleneck: either it consumes a lot of CPU itself, or it breaks the existing protocol stack model and causes other registered packet_type functions to consume CPU. So the only way to save CPU is to optimize your packet_type->func. If your function is too complicated, you should consider splitting it into several parts: use the simple part as the packet_type->func, which runs in softirq context (hence ksoftirqd), while the complicated parts should be moved to another kernel thread context (you can create a new thread in your kernel module if needed).
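A single-threaded userspace model of that split, assuming a hypothetical UDP-port filter (5060 is used only as an example port). fast_rx() does a cheap test and copies a little metadata into a ring, which is roughly what packet_type->func should be reduced to; slow_drain() is where the expensive work would go in the separate kernel thread. In the kernel, the hand-off would additionally need locking and a wakeup of the thread (for instance an sk_buff_head queue filled with skb_queue_tail()), which this model leaves out.

```c
#include <stdint.h>

#define RING_SIZE 8u   /* power of two, so index wraparound stays consistent */

struct pkt_meta { uint16_t dport; uint16_t len; };

struct ring {
    struct pkt_meta slot[RING_SIZE];
    unsigned head, tail;   /* head: next write, tail: next read */
    unsigned dropped;
};

/* Hot path: cheap filter, copy minimal metadata, return immediately. */
static int fast_rx(struct ring *r, uint16_t dport, uint16_t len, uint16_t wanted_port)
{
    if (dport != wanted_port)
        return 0;                          /* not ours */
    if (r->head - r->tail == RING_SIZE) {
        r->dropped++;                      /* full: drop rather than block */
        return 0;
    }
    r->slot[r->head % RING_SIZE] = (struct pkt_meta){ dport, len };
    r->head++;
    return 1;
}

/* Slow path: drain the ring and do the expensive work. */
static unsigned slow_drain(struct ring *r, unsigned long *bytes)
{
    unsigned n = 0;

    while (r->tail != r->head) {
        const struct pkt_meta *m = &r->slot[r->tail % RING_SIZE];
        *bytes += m->len;                  /* placeholder for heavy processing */
        r->tail++;
        n++;
    }
    return n;
}
```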

Linux device model: Same device but different drivers

I'm customising Linux for an ARM9 Atmel AT91SAM9260 board.
In the device file, Atmel gave all the USARTs the same name, atmel_usart, distinguished of course by an id:
static struct platform_device at91sam9260_uart0_device = {
    .name = "atmel_usart",
    .id   = 1,
    .dev  = { ... },
};
According to the Linux device model, all these devices (5 UARTs on a SAM9260) would be bound to the driver named atmel_usart.
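The name-based binding can be illustrated with a small userspace model. The structs are simplified, hypothetical stand-ins for struct platform_device and struct platform_driver, and the match is only the name comparison; the real platform_match() also tries other matching methods first. A device renamed to the hypothetical "lon_usart" would therefore be picked up by a driver of that name instead of by atmel_usart.

```c
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for platform device/driver. */
struct model_pdev { const char *name; int id; };   /* id -1: no ".<id>" suffix */
struct model_pdrv { const char *name; };

/* Device name as the platform bus forms it: "<name>.<id>", or "<name>" for id -1. */
static void model_dev_name(const struct model_pdev *d, char *buf, size_t len)
{
    if (d->id == -1)
        snprintf(buf, len, "%s", d->name);
    else
        snprintf(buf, len, "%s.%d", d->name, d->id);
}

/* Name-only match: the driver binds every device whose name equals its own. */
static int model_match(const struct model_pdev *d, const struct model_pdrv *drv)
{
    return strcmp(d->name, drv->name) == 0;
}
```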
I don't want a ttyS driver on all the UARTs that will be registered. I have several drivers of my own which serve different specialised purposes (LON, RS-485, etc.), and I want to control which driver serves a certain USART. So what could I do?
The Atmel device files are unsatisfactory and I can do better. So I rename (patch) the devices in the device file. However, if I want a ttyS driver on UART4, I would be in trouble.
I manipulate (patch) the device file so that I'm able to access the platform_device structures. I could change their names before I register them. But as far as I understand the idea of the Linux driver model, devices should be registered early during boot-up, while the binding to a driver follows... later.
I could write a driver which has an alias name and which would be bound to a specific bus_id -> atmel_usart.4. Can I really?
What other solutions exist? I want to touch a minimal set of kernel files, but I want all the freedom possible.
Addendum, what freedom means to me: I can specify at runtime how the UARTs are used:
with the Atmel-Serial driver (ttyS)
with my own drivers
It also means that changes to the kernel source are minimal.
I built my own line discipline drivers. You can build them as kernel modules and attach them to the UARTs at runtime. No changes to the Linux source are necessary.
Any funny timing or control stuff can be done through ioctl(). Specifically, I implemented a timing-sensitive RS-485 protocol in this way.
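Attaching a line discipline from userspace goes through the TIOCSETD ioctl on the open tty. A minimal sketch; the device path and discipline number are whatever your setup uses (N_TTY appears in the check only because it always exists), and a custom discipline must already be available in the kernel, e.g. registered by its module. The helper names are hypothetical. Keep the returned descriptor open, since the discipline generally stays attached only while the tty is held open.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/tty.h>   /* N_TTY and the other N_* line discipline numbers */

/* Open a tty and switch it to line discipline `ldisc`.
 * Returns the open fd, or -1 on error (errno set). */
static int open_with_ldisc(const char *path, int ldisc)
{
    int fd = open(path, O_RDWR | O_NOCTTY);

    if (fd < 0)
        return -1;
    if (ioctl(fd, TIOCSETD, &ldisc) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read back the current line discipline number, or -1 on error. */
static int current_ldisc(int fd)
{
    int ldisc;

    return ioctl(fd, TIOCGETD, &ldisc) < 0 ? -1 : ldisc;
}
```

Timing or protocol control specific to your discipline would then be further ioctl() calls on the same descriptor.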
When I did this (Linux 2.6.17) there was no dynamic registration mechanism, so I overwrote the existing line disciplines. The Linux code is (was) pretty straightforward, and I was satisfied that this would be a safe thing to do.
Your problem is quite easily solved. The 5 UART devices are presently registered at kernel startup and their function is locked. This is not how it normally works for PCI or USB devices, right? So what you need to do is pull the device registration out of the startup code and register the devices dynamically. You can even register/unregister them as needed.
at91_register_uart() is called from your board file for every UART that needs to be registered. at91_add_device_serial() will then platform_device_register() all the ones you have set up. One solution is to let at91_register_uart() be called for all 5 UARTs, but then remove the call to at91_add_device_serial() from your board file. You can then make it an exported function that can be called by your loadable drivers. You can even add an (int) argument to it so that, instead of looping over all UARTs, you can select which ones to register individually. You can also mirror this function with one that unregisters the devices.
NOTE: I think you'll need to always leave one UART dedicated as your console, if you are using one that way. You could probably hide that in the exported function by only allowing indices 0-3 as input and then mapping 0-3 onto 0-4, skipping the UART that you use as the console.
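The index mapping from the note, together with a register/unregister pair, sketched as plain C. demo_register_uart() and demo_unregister_uart() are hypothetical and only record state; the real exported function would call platform_device_register() on the chosen device, and the kernel-side device lifetime handling is left out.

```c
#include <stdbool.h>

#define NR_UARTS 5   /* USARTs on the SAM9260, per the question */

/* Map a caller-visible index 0..NR_UARTS-2 onto a physical UART 0..NR_UARTS-1,
 * skipping the one reserved for the console. Returns -1 for bad input. */
static int map_uart_index(int idx, int console)
{
    if (console < 0 || console >= NR_UARTS || idx < 0 || idx > NR_UARTS - 2)
        return -1;
    return idx >= console ? idx + 1 : idx;
}

static bool uart_registered[NR_UARTS];

/* Hypothetical per-UART replacement for at91_add_device_serial(). */
static int demo_register_uart(int idx, int console)
{
    int phys = map_uart_index(idx, console);

    if (phys < 0 || uart_registered[phys])
        return -1;                      /* bad index or already registered */
    uart_registered[phys] = true;       /* real code: platform_device_register() */
    return phys;
}

/* Hypothetical mirror function that unregisters one UART. */
static int demo_unregister_uart(int idx, int console)
{
    int phys = map_uart_index(idx, console);

    if (phys < 0 || !uart_registered[phys])
        return -1;
    uart_registered[phys] = false;
    return phys;
}
```

With the console on UART0, indices 0-3 map to UARTs 1-4; with the console on UART2, they map to 0, 1, 3, 4.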
