How to know Darwin kernel scheduler time slice?


On Linux, sched.h contains the definition of
int sched_rr_get_interval(pid_t pid, struct timespec * tp);
to get the time slice of a process. However the file shipping with OS X El Capitan doesn't hold that definition.
Is there an alternative for this on OS X?
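For reference, the Linux call mentioned above can be wrapped as in the sketch below. The helper name rr_interval_ns is hypothetical, and the block assumes a Linux/glibc system; it is not expected to build on OS X, which is the point of the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: store the round-robin interval of `pid` in
 * nanoseconds. A pid of 0 means the calling process.
 * Returns 0 on success, -1 on failure (errno is set by the call). */
int rr_interval_ns(pid_t pid, long long *out_ns)
{
    struct timespec ts;

    assert(out_ns != NULL);
    if (sched_rr_get_interval(pid, &ts) != 0)
        return -1;
    *out_ns = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return 0;
}
```

Calling rr_interval_ns(0, &ns) queries the current process; the value reported for a process that is not under SCHED_RR depends on the kernel.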

The APIs related to this stuff are pretty byzantine and poorly documented, but here's what I've found.
First, the datatypes related to RR scheduling seem to be in /usr/include/mach/policy.h, around line 155. There's this struct:
struct policy_rr_info {
...
integer_t quantum;
....
};
The quantum is, I think, the timeslice (I'm not sure of the units). Grepping around for this and related types defined in the same place, I found the file /usr/include/mach/mach_types.def, which says on line 203 that the type struct thread_policy_t contains a field policy_rr_info_t.
Next, I found in /usr/include/mach/thread_act.h the public function thread_policy_get, which can retrieve information about a thread's policy into a struct thread_policy_t *.
So, working backwards, I think (but haven't tried at all) that you can:
Use the thread_policy_get() routine to return information about the thread's scheduling state into a thread_policy_t.
That struct seems to have a policy_rr_info_t sub-structure.
That sub-structure should have a quantum field.
That field appears to be the timeslice, but I don't know the units.
There are no man pages for this part of the API, but this Apple Developer page explains at least a little bit about how to use this API.
Note that this is all gleaned from just grepping the various kernel headers, and I've definitely not tried to use any of these APIs in any actual code.

Related

Proper way of getting the address of non-exported kernel symbols in a Linux kernel module

I'm currently working on a Linux kernel module to intercept some syscalls for printing statistics about them system-wide.
I've come across different ways of getting the address of the sys_call_table symbol, but have yet to find a way that works on recent kernels (e.g. 5.11). On older kernels, wouldn't we have used kallsyms_lookup_name? It looks like that symbol is no longer exported.
I could just look at /proc/kallsyms, but this seems like a bad idea and not generalizable. What are other alternatives?

Disclaimer: using non-exported symbols is in general not a good idea, so you should only do it for educational purposes, not for production-ready modules/drivers.
Before Linux v5.7, you indeed would have used kallsyms_lookup_name() to look-up non-exported kernel symbols from a module. See How do I access any kernel symbol in a kernel module? if you want to know how.
However, the symbol stopped being exported in v5.7 because nobody was using it outside of core kernel code, and it was just there to be abused by modules to find and use other non-exported symbols. Here's also a relevant LWN article on this. Nowadays there isn't really a "proper way" to work around this problem, but there are a number of different "hacks" that you could consider.
The following approaches cover both kernel functions and global objects (i.e. global variables):
If you are already compiling the kernel, you can add EXPORT_SYMBOL() after the definition of the symbol(s) you are interested in. This is the simplest option given you are willing to modify the kernel and build a custom one. You could also export kallsyms_lookup_name() in kernel/kallsyms.c and then use that, if you really want.
You can use an unsigned long module parameter, passing the needed symbol address (taken from /proc/kallsyms) when loading the module, and then cast it to the appropriate type:
static unsigned long addr;
module_param_named(addr, addr, ulong, 0);
MODULE_PARM_DESC(addr, "Address of the `foo` symbol");
static <type_of_foo_here> *foo_ptr;
// Examples:
// int foo(char *) -> int (*foo_ptr)(char *)
// unsigned long foo -> unsigned long *foo_ptr
static int __init mymodule_init(void)
{
foo_ptr = (typeof(foo_ptr))addr;
// ...
return 0;
}
Then you'd be able to do something like this:
sudo insmod mymodule.ko addr=0x$(sudo grep ' some_symbol_name' /proc/kallsyms | cut -d' ' -f1)
If your kernel supports kprobes, you can [ab]use a kprobe to make the kernel look up a symbol for you through register_kprobe(). This approach is detailed in this other answer. Due to the intended usage of kprobes, this will only work for functions; however, you can simply find kallsyms_lookup_name() first, and then use that to look up any other symbol.
In order for this to work, your kernel needs to be configured with CONFIG_KPROBES=y as well as CONFIG_KALLSYMS=y (and possibly also CONFIG_KALLSYMS_ALL=y), since register_kprobe() uses exactly kallsyms_lookup_name() under the hood. Automatic symbol address resolution for kprobes has been supported since Linux v2.6.16.
For functions only, you can also consider re-implementing the functionality in your module. For example, task_statm() implemented in fs/proc/task_mmu.c is a rather small function that only uses other exported functions, so "borrowing" it for use in your module would be rather straightforward.
Chances are that you want to call some non-exported function for a more specific purpose than what it was designed for. In such case, a good idea would be to look at the kernel source to understand how it works, and only re-implement the bare minimum needed for your module.
Finally, you could technically open and read /proc/kallsyms from kernel space using filp_open() + kernel_read() from <linux/fs.h>, though this would probably be the objectively worst solution overall.
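The /proc/kallsyms lookup that the module-parameter approach does with grep and cut can also be sketched in userspace C. The helper names kallsyms_line_match and kallsyms_file_lookup are hypothetical, and the sketch assumes the usual "<hex address> <type> <name>" line layout. Unlike the grep pattern, it compares the whole symbol name, so a symbol that merely starts with the requested name is not matched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `line` (one line in /proc/kallsyms format,
 * "<hex address> <type> <name>") describes exactly `name`, store its
 * address in *addr and return 1; otherwise return 0. */
int kallsyms_line_match(const char *line, const char *name, unsigned long *addr)
{
    unsigned long value;
    char type;
    char sym[256];

    assert(line != NULL && name != NULL && addr != NULL);
    if (sscanf(line, "%lx %c %255s", &value, &type, sym) != 3)
        return 0;
    if (strcmp(sym, name) != 0)
        return 0;
    *addr = value;
    return 1;
}

/* Scan a kallsyms-format file and return the address of `name`,
 * or 0 if the file cannot be opened or the symbol is not found. */
unsigned long kallsyms_file_lookup(const char *path, const char *name)
{
    char line[512];
    unsigned long addr = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (kallsyms_line_match(line, name, &addr))
            break;
    }
    fclose(fp);
    return addr;
}
```

Without sufficient privileges the kernel typically reports every address as zero, so a result of 0 from kallsyms_file_lookup("/proc/kallsyms", ...) can mean either "not found" or "not allowed to see it".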

We could also find the address of the kallsyms_lookup_name function using kprobes.
Quotes taken from here (kprobes)
Kprobes enables you to dynamically break into any kernel routine and collect debugging and performance information non-disruptively. You can trap at almost any kernel code address
To register a kprobe, first a kprobe struct needs to be initialized with the name of the symbol that needs to be trapped. We can do that by setting the symbol_name in the kprobe struct.
#include <linux/kprobes.h>
static struct kprobe kp = {
.symbol_name = "kallsyms_lookup_name"
};
The kprobe struct has the following elements within it (shortened for brevity):
struct kprobe {
...
/* location of the probe point */
kprobe_opcode_t *addr;
/* Allow user to indicate symbol name of the probe point */
const char *symbol_name;
...
};
With the introduction of the “symbol_name” field to struct kprobe, the probepoint address resolution will now be taken care of by the kernel.
Once the symbol_name is set, the address of the probe point is determined by the kernel.
So, now all that's left to do is to register the probe, extract the probepoint address and then unregister it:
typedef unsigned long (*kallsyms_lookup_name_t)(const char *name);
kallsyms_lookup_name_t kallsyms_lookup_name;
register_kprobe(&kp);
kallsyms_lookup_name = (kallsyms_lookup_name_t) kp.addr;
unregister_kprobe(&kp);
We now have the kallsyms_lookup_name address. Using that we can find the sys_call_table address the old-fashioned way:
kallsyms_lookup_name("sys_call_table");
Source for kprobe struct
Source for kprobe technique

How to get the struct info of a struct

Is it possible to get the struct info (that is, the keys) of any struct? Or is it required that you go to the manual page to read up on what the actual structure is for that object? Take the following example:
struct stat stats;
stat(filepath, &stats);
printf("Size: %lld\n", stats.st_size);
Is it possible to do something like stats.keys(), or whatever a potentially equivalent operation would be to see the inner structure of a struct ?

You can read the man page, or you can read the header; there are no built-in introspection facilities in the C language that will inspect structs for their attributes in any convenient way.
In theory, if you compile the executable with debugging symbols a debugger might be able to tell you some of this (after loading and parsing the executable and its symbols), but that's generally going to be less convenient than just reading the docs.
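To make the "no built-in introspection" point concrete, here is a minimal sketch of the usual workaround: a hand-written table of member descriptors built with offsetof and sizeof. The struct point, the FIELD macro and the helper names are all hypothetical; the compiler generates none of this for you, and every member still has to be listed by name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical example struct; any struct whose definition is visible works. */
struct point {
    int x;
    int y;
    double weight;
};

/* One hand-written descriptor per member. */
struct field_info {
    const char *name;
    size_t offset;
    size_t size;
};

/* sizeof does not evaluate its operand, so the null pointer is never used. */
#define FIELD(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

static const struct field_info point_fields[] = {
    FIELD(struct point, x),
    FIELD(struct point, y),
    FIELD(struct point, weight),
};

#define POINT_FIELD_COUNT (sizeof point_fields / sizeof point_fields[0])

/* Return the descriptor for `name`, or NULL if there is no such member. */
const struct field_info *find_field(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < POINT_FIELD_COUNT; i++)
        if (strcmp(point_fields[i].name, name) == 0)
            return &point_fields[i];
    return NULL;
}

/* Print name, offset and size for every described member. */
void print_fields(void)
{
    for (size_t i = 0; i < POINT_FIELD_COUNT; i++)
        printf("%-8s offset=%zu size=%zu\n", point_fields[i].name,
               point_fields[i].offset, point_fields[i].size);
}
```

The table has to be kept in sync with the struct by hand, which is why reading the header or man page is normally the simpler route.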

Rebuild a dynamic library upon argument typedef change

Let's assume, I have a C structure, DynApiArg_t.
typedef struct DynApiArg_s {
uint32_t m1;
...
uint32_t mx;
} DynApiArg_t;
The pointer of this struct is passed as an arg to a function say
void DynLibApi(DynApiArg_t *arg)
{
arg->m1 = 0;
another_fn_in_the_lib(arg->mold); /* May crash here. (1) */
}
which is present in a dynamic library, libdyn.so. This API is invoked from an executable via a dlopen/dlsym procedure of invocation.
In case this dynamic library is updated to version 2, where DynApiArg_t now has new member, say m2, as below:
typedef struct DynApiArg_s {
uint32_t m1;
OldMbr_t *mold;
...
uint32_t mx;
uint32_t m2;
NewMbr *mnew;
} DynApiArg_t;
Without a complete rebuild of the executable or other libs that call this API via dlopen/dlsym, every time this API is invoked I see the process crashing, due to the dereference of some member in the struct. I understand accessing m2 may be a problem, but even access to the member mold, as in the call below, is seen causing crashes.
typedef void (*fnPtr_t)(DynApiArg_t*);
void DynApiCaller(DynApiArg_t *arg)
{
void *libhdl = dlopen("libdyn.so", RTLD_LAZY | RTLD_GLOBAL);
fnPtr_t fnptr = dlsym(libhdl, "DynLibApi");
fnptr(arg); /* actual call to the dynamically loaded API (2) */
}
In the call to the API via fnptr, at the line marked (2), when an old/existing member (one present in v1 of the lib, when DynApiCaller was initially compiled) is accessed at (1), it turns out to hold a garbage value, or even NULL at times.
What is the right way to handle such updates without a complete recompilation of the executable every time the dependent libs are updated?
I've seen libs being named via symlinks with version numbers, like libsolid.so.4. Is there something related to this versioning system that can help me? If so, can you point me to the right documentation for it, if any?

There are a number of approaches to solve this problem:
Include the API version in the dynamic library name.
Instead of dlopen("libfoo.so"), you use dlopen("libfoo.so.4"). Different major versions of the library are essentially separate, and can coexist on the same system; so, the package name for that library would be e.g. libfoo-4. You can have libfoo.so.4 and libfoo.so.5 installed at the same time. Minor versions, say libfoo-4.2, install libfoo.so.4.2, and symlink libfoo.so.4 to libfoo.so.4.2.
Initially define the structures with zero padding (required to be zero in earlier versions of the library), and have the later versions reuse the padding fields, but keeping the structures the same size.
Use versioned symbol names. This is a Linux extension, using dlvsym(). A single shared library binary can implement several versions of the same dynamic symbol.
Use resolver functions to determine the symbols at load time. This allows e.g. hardware architecture-optimized variants of functions to be selected at run time, but is less useful with a dlopen()-based approach.
Use a structure to describe the library API, and a versioned function to obtain/initialize that API.
For example, version 4 of your library could implement
struct libfoo_api {
int (*func1)(int arg1, int arg2);
double *data;
void (*func2)(void);
/* ... */
};
and only export one symbol,
int libfoo_init(struct libfoo_api *const api, const int version);
Calling that function would initialize the api structure with the symbols supported, with the assumption that the structure corresponds to the specified version. A single shared library can support multiple versions. If a version is not supported, it can return a failure.
This is especially useful for plugin-type interfaces (although then the _init function is more likely to call application-provided functionality registering functions, rather than fill in a structure), as a single file can contain optimized functionality for a number of versions, optimized for a number of compatible hardware architectures (for example, AMD/Intel architectures with different SSE/AVX/AVX2/AVX512 support).
Note that the above implementation details can be "hidden" in a header file, making actual C code using the shared library much simpler. It also helps making the same API work across a number of OSes, simply by changing the header file to use the approach that works best on that OS, while keeping the actual C interface the same.
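The last approach (an API structure plus a versioned init function) can be sketched as follows. This is only an illustration under stated assumptions: the implementations foo_add and foo_tick, the data array and the LIBFOO_API_VERSION constant are hypothetical, and a real library would be built as a shared object with libfoo_init as its only exported symbol.

```c
#include <assert.h>
#include <stddef.h>

/* API table from the text: the caller asks for a version and the
 * library fills in the members it supports. */
struct libfoo_api {
    int (*func1)(int arg1, int arg2);
    double *data;
    void (*func2)(void);
};

/* Hypothetical implementations living inside the library. */
static double foo_data[4];
static int foo_calls;
static int foo_add(int arg1, int arg2) { return arg1 + arg2; }
static void foo_tick(void) { foo_calls++; }

/* Assumed: the only structure layout this build of the library knows. */
#define LIBFOO_API_VERSION 4

/* The single exported symbol. Returns 0 on success, or -1 if the
 * requested version is not supported (api is left untouched). */
int libfoo_init(struct libfoo_api *const api, const int version)
{
    assert(api != NULL);
    if (version != LIBFOO_API_VERSION)
        return -1;
    api->func1 = foo_add;
    api->data = foo_data;
    api->func2 = foo_tick;
    return 0;
}
```

A caller would obtain libfoo_init through dlsym(), pass the version it was compiled against, and refuse to continue if the call fails, instead of crashing later on a mismatched structure layout.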

what is do_nanosleep() in C

I came across a function named do_nanosleep() in C, and I don't understand how it is used. One thing I know is that it has to do with suspending the execution of the calling thread, but that task is handled by nanosleep() in C. If that's true, then why is do_nanosleep() needed here, and how is it different from nanosleep()?
For reference, this is what it does.
/* arguments are seconds and nanoseconds */
inline void
do_nanosleep(time_t tv_sec, long tv_nsec)
{
struct timespec req;
req.tv_sec = tv_sec;
req.tv_nsec = tv_nsec;
nanosleep(&req, NULL);
}

Since do_nanosleep() is not a standard function, you will have to track it in your source code, or in the manuals for your system, to see what it does. It might be a portability wrapper which uses nanosleep() when it is available, and something else (usleep() or even sleep()) when it is not. It might do something completely unrelated to sleeping, too — but it probably does do what its name suggests.
Google has not (yet — 5 minutes after it was asked) indexed your question, and it does not know anything about do_nanosleep(). That suggests the code should be in your source somewhere, rather than in a system manual.
With the function definition in the question, we can see that instead of requiring the user to create a struct timespec, they can call do_nanosleep() with two arguments, the first for the seconds and the second for the fractions of a second (0..999,999,999 measured in nanoseconds). It then calls nanosleep(). So, in the minds of the people who wrote the software, do_nanosleep() presents a slightly more convenient interface to the underlying nanosleep() function. Since it is inline, the declarations for struct timespec must still be in scope, so I'm not convinced I agree with the authors, but it is not automatically a wrong decision.

It looks like it's just a simplified (and crippled) wrapper around POSIX nanosleep.
The first parameter is the number of seconds, and the second is the number of nanoseconds.
Like, do_nanosleep(3, 500000000) would (hopefully) sleep for 3 and a half seconds.
Since the function completely ignores return values... Your mileage may vary.
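For comparison, here is a sketch of a wrapper that does not ignore the return value. The name do_nanosleep_checked is hypothetical; it relies on the POSIX behaviour that nanosleep() returns -1 with errno set to EINTR when a signal interrupts the sleep, and writes the unslept time to its second argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical variant of do_nanosleep() that reports failures and
 * resumes sleeping after a signal interruption.
 * Returns 0 on success, -1 on error (errno is set by nanosleep). */
int do_nanosleep_checked(time_t tv_sec, long tv_nsec)
{
    struct timespec req, rem;

    /* A nanosecond count outside this range is a programming error. */
    assert(tv_nsec >= 0 && tv_nsec <= 999999999L);
    req.tv_sec = tv_sec;
    req.tv_nsec = tv_nsec;
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;  /* e.g. EINVAL for a negative tv_sec */
        req = rem;      /* interrupted: sleep for the time still remaining */
    }
    return 0;
}
```

With this version, do_nanosleep_checked(3, 500000000) either sleeps for the full three and a half seconds or returns -1, rather than silently waking up early.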

Returning a pair of items from a function

I know all the normal ways of doing this, what I'm looking for is an article I read a long long time ago that showed how to write some crazy helper functions using inline assembly and __naked functions to return a pair of items.
I've tried googling for it endlessly, but to no avail, I'm hoping someone else knows the article I'm talking about and has a link to it.

No assembly necessary
struct PairOfItems {int item1;double item2;};
struct PairOfItems crazyhelperfunction(void){
struct PairOfItems x = {42, -0.42};
return x;
}
I don't know about the article.

The __naked that I know of is
#define __naked __attribute__((naked))
Which is a GCC attribute that is documented here.
It is not supported on all targets. Doing a google code search will turn up some uses of it, though.
The reasons that it is done as a macro is so that it can be defined as empty for targets that don't support it or if the headers are being used with some other compiler.
I do remember (I think) seeing some examples of this for the division and modulo (divmod) functions in the avr_gcc headers. These returned a struct that had both return values in it, but stored the whole thing in registers rather than on the stack.
I don't know if __naked had anything to do with the ability to return both parts of the result (which were in a struct) in registers (rather than on the stack), but it did allow the inline function to consist entirely of a call to one helper function.
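A divmod-style pair can be written in plain C without any attributes, as in the sketch below. The names int_divmod and divmod_int are hypothetical; the standard library already offers the same idea as div() from <stdlib.h>, which returns a div_t holding quot and rem. Whether a small struct like this comes back in registers or in memory depends on the target ABI and is not something the C code controls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pair type holding both results of one integer division. */
struct int_divmod {
    int quot;
    int rem;
};

/* Return quotient and remainder together, by value. */
struct int_divmod divmod_int(int numer, int denom)
{
    struct int_divmod result;

    assert(denom != 0);  /* division by zero is undefined behaviour */
    result.quot = numer / denom;  /* truncates toward zero (C99 and later) */
    result.rem = numer % denom;   /* same sign as numer, or zero */
    return result;
}
```

The caller reads both values from the returned struct, for example struct int_divmod r = divmod_int(17, 5); followed by r.quot and r.rem.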
