Quite by chance stumbled upon some code in kernel jungles and was a bit confused. There are two implementations of kzalloc(): in tools/virtio/linux/kernel.h and the main one in linux/slab.h. Obviously, in most cases the second one is used. But sometimes the "virtio" kzalloc() is used.
"virtio" kzalloc() looks like this:
static inline void *kzalloc(size_t s, gfp_t gfp)
{
void *p = kmalloc(s, gfp);
memset(p, 0, s);
return p;
}
My confusion is that "fake" kmalloc() used inside "tools" directory can return NULL-pointer. Also it looks like the memset() implementation doesn't check NULL-pointers so there could be NULL-pointer dereference.
Is it a bug or am I missing something?
Yes, that definitely looks like a bug.
The tools/ subdirectory is a collection of user space tools (as the name suggests). You can also see this by the fact that several C standard library headers are included. So this of course is not a kernel bug (that would have been very bad), just a minor oversight in the virtio testing tool.
That virtio testing tool seems to re-define some kernel APIs to mock their behavior in userspace. That function though doesn't seem to be ever used in practice, just merely defined.
marco:~/git/linux/tools/virtio$ grep -r kzalloc
linux/kernel.h:static inline void *kzalloc(size_t s, gfp_t gfp)
ringtest/ptr_ring.c:static inline void *kzalloc(unsigned size, gfp_t flags)
marco:~/git/linux/tools/virtio$
It's probably meant to be used by someone who wishes to test some virtio kernel code in userspace.
In any case, you could try reporting the bug. The get_mantainer.pl script suggests:
$ perl scripts/get_maintainer.pl -f tools/virtio/linux/kernel.h
Bad divisor in main::vcs_assign: 0
"Michael S. Tsirkin" <mst#redhat.com> (maintainer:VIRTIO CORE AND NET DRIVERS)
Jason Wang <jasowang#redhat.com> (maintainer:VIRTIO CORE AND NET DRIVERS)
virtualization#lists.linux-foundation.org (open list:VIRTIO CORE AND NET DRIVERS)
linux-kernel#vger.kernel.org (open list)
The header is mainly used for userspace testing, such as virtio_test.
From the git-log of tools/virtio/virtio_test.c:
This is the userspace part of the tool: it includes a bunch of stubs for
linux APIs, somewhat simular to linuxsched. This makes it possible to
recompile the ring code in userspace.
A small test example is implemented combining this with vhost_test
module.
So yes, the code is a bit unsafe (clean coding would test for a NULL pointer prior to memset() and bail out with an appropriate error message), but since it is just a testing tool, it seems to have been considered uncritical to skip this test.
Related
I am looking into programming the ESP8266 serial-wifi chip. In its SDK examples it makes extensive use of a function called os_zalloc where I would expect malloc.
Occasionally though, os_malloc is used as well. So they do not appear to be identical in function.
Unfortunately there is no documentation. Can anybody make an educated guess from the following header file?
#ifndef __MEM_H__
#define __MEM_H__
//void *pvPortMalloc( size_t xWantedSize );
//void vPortFree( void *pv );
//void *pvPortZalloc(size_t size);
#define os_malloc pvPortMalloc
#define os_free vPortFree
#define os_zalloc pvPortZalloc
#endif
Since os_zalloc is a macro, and the definition is given in mem.h, a better question to ask would be about what pvPortZalloc does.
Given the function names pvPortMalloc, vPortFree and pvPortZalloc it would appear that the OS in use is FreeRTOS (or it's commercially licensed equivalent OpenRTOS), which is documented - although not specifically pvPortZalloc, but it would be strange if it was not simply allocate and zero initialise - that is for example what it means here. The functions are part of the target porting layer for FreeRTOS, and are not normally called by the application level, but I imagine here the macro wrapper is used to access the porting layer code for application user rather than write it twice.
In an RTOS kernel RTOS aware dynamic memory allocation functions are required to ensure thread safety, although some standard library implementations include thread safety stubs that you implement using the RTOS mutex calls, which is a better method since existing libaries and C++ new/delete can be more easily used.
I would say "allocate memory and fill with zeros"
We are in the process of modularizing our code for an embedded device, trying to get from $%#!$ spaghetti code to something we can actually maintain. There are several versions of this device, which mostly differ in the amount of peripherals they have, some have an sd card, some have ethernet and so and on. Those peripherals need intialization code.
What I'm looking for is a way to execute the specific initialization code of each component just by putting the .h/.c files into the project (or not). In C++ right now I would be tempted to put a global object into each component that registers the neccessary functions/methods during its initialization. Is there something similiar for plain C code, preferably some pre-processor/compile-time magic?
It should look something like this
main.c
int main(void) {
initAllComponents();
}
sdio.c
MAGIC_REGISTER_COMPONENT(sdio_init)
STATUS_T sdio_init() {
...
}
ethernet.c
MAGIC_REGISTER_COMPONENT(ethernet_init)
STATUS_T ethernet_init() {
...
}
and just by putting the sdio.c/ethernet.c (and/or .h) into the project initAllComponents() would call the respective *_init() functions.
TIA
There is nothing in plain C that does this. There are however compiler extensions and linker magic that you can do.
GCC (and some others that try to be compatible) has __attribute__((constructor)) that you can use on functions. This requires support from the runtime (libc or ld.so or equivalent).
For the linker magic (that I personally prefer since it gives you more control on when things actually happen) look at this question and my answer. I have a functional implementation of it on github. This also requires gcc and a linker that creates the right symbols for us, but doesn't require any support from the runtime.
I was looking for a good general-purpose library for C on top of the standard C library, and have seen several suggestions to use glib. How 'obtrusive' is it in your code? To explain what I mean by obtrusiveness, the first thing I noticed in the reference manual is the basic types section, thinking to myself, "what, am I going to start using gint, gchar, and gprefixing geverything gin gmy gcode gnow?"
More generally, can you use it only locally without other functions or files in your code having to be aware of its use? Does it force certain assumptions on your code, or constraints on your compilation/linking process? Does it take up a lot of memory in runtime for global data structures? etc.
The most obtrustive thing about glib is that any program or library using it is non-robust against resource exhaustion. It unconditionally calls abort when malloc fails and there's nothing you can do to fix this, as the entire library is designed around the concept that their internal allocation function g_malloc "can't fail"
As for the ugly "g" types, you definitely don't need any casts. The types are 100% equivalent to the standard types, and are basically just cruft from the early (mis)design of glib. Unfortunately the glib developers lack much understanding of C, as evidenced by this FAQ:
Why use g_print, g_malloc, g_strdup and fellow glib functions?
"Regarding g_malloc(), g_free() and siblings, these functions are much safer than their libc equivalents. For example, g_free() just returns if called with NULL.
(Source: https://developer.gnome.org/gtk-faq/stable/x908.html)
FYI, free(NULL) is perfectly valid C, and does the exact same thing: it just returns.
I have used GLib professionally for over 6 years, and have nothing but praise for it. It is very light-weight, with lots of great utilities like lists, hashtables, rand-functions, io-libraries, threads/mutexes/conditionals and even GObject. All done in a portable way. In fact, we have compiled the same GLib-code on Windows, OSX, Linux, Solaris, iOS, Android and Arm-Linux without any hiccups on the GLib side.
In terms of obtrusiveness, I have definitely "bought into the g", and there is no doubt in my mind that this has been extremely beneficial in producing stable, portable code at great speed. Maybe specially when it comes to writing advanced tests.
And if g_malloc don't suit your purpose, simply use malloc instead, which of course goes for all of it.
Of course you can "forget about it elsewhere", unless of course those other places somehow interact with glib code, then there's a connection (and, arguable, you're not really "elsewhere").
You don't have to use the types that are just regular types with a prepended g (gchar, gint and so on); they're guaranteed to be the same as char, int and so on. You never need to cast to/from gint for instance.
I think the intention is that application code should never use gint, it's just included so that the glib code can be more consistent.
Is there a way to programmatically check if a single C source file is potentially harmful?
I know that no check will yield 100% accuracy -- but am interested at least to do some basic checks that will raise a red flag if some expressions / keywords are found. Any ideas of what to look for?
Note: the files I will be inspecting are relatively small in size (few 100s of lines at most), implementing numerical analysis functions that all operate in memory. No external libraries (except math.h) shall be used in the code. Also, no I/O should be used (functions will be run with in-memory arrays).
Given the above, are there some programmatic checks I could do to at least try to detect harmful code?
Note: since I don't expect any I/O, if the code does I/O -- it is considered harmful.
Yes, there are programmatic ways to detect the conditions that concern you.
It seems to me you ideally want a static analysis tool to verify that the preprocessed version of the code:
Doesn't call any functions except those it defines and non I/O functions in the standard library,
Doesn't do any bad stuff with pointers.
By preprocessing, you get rid of the problem of detecting macros, possibly-bad-macro content, and actual use of macros. Besides, you don't want to wade through all the macro definitions in standard C headers; they'll hurt your soul because of all the historical cruft they contain.
If the code only calls its own functions and trusted functions in the standard library, it isn't calling anything nasty. (Note: It might be calling some function through a pointer, so this check either requires a function-points-to analysis or the agreement that indirect function calls are verboten, which is actually probably reasonable for code doing numerical analysis).
The purpose of checking for bad stuff with pointers is so that it doesn't abuse pointers to manufacture nasty code and pass control to it. This first means, "no casts to pointers from ints" because you don't know where the int has been :-}
For the who-does-it-call check, you need to parse the code and name/type resolve every symbol, and then check call sites to see where they go. If you allow pointers/function pointers, you'll need a full points-to analysis.
One of the standard static analyzer tool companies (Coverity, Klocwork) likely provide some kind of method of restricting what functions a code block may call. If that doesn't work, you'll have to fall back on more general analysis machinery like our DMS Software Reengineering Toolkit
with its C Front End. DMS provides customizable machinery to build arbitrary static analyzers, for the a language description provided to it as a front end. DMS can be configured to do exactly the test 1) including the preprocessing step; it also has full points-to, and function-points-to analyzers that could be used to the points-to checking.
For 2) "doesn't use pointers maliciously", again the standard static analysis tool companies provide some pointer checking. However, here they have a much harder problem because they are statically trying to reason about a Turing machine. Their solution is either miss cases or report false positives. Our CheckPointer tool is a dynamic analysis, that is, it watches the code as it runs and if there is any attempt to misuse a pointer CheckPointer will report the offending location immediately. Oh, yes, CheckPointer outlaws casts from ints to pointers :-} So CheckPointer won't provide a static diagnostic "this code can cheat", but you will get a diagnostic if it actually attempts to cheat. CheckPointer has rather high overhead (all that checking costs something) so you probably want to run you code with it for awhile to gain some faith that nothing bad is going to happen, and then stop using it.
EDIT: Another poster says There's not a lot you can do about buffer overwrites for statically defined buffers. CheckPointer will do those tests and more.
If you want to make sure it's not calling anything not allowed, then compile the piece of code and examine what it's linking to (say via nm). Since you're hung up on doing this by a "programmatic" method, just use python/perl/bash to compile then scan the name list of the object file.
There's not a lot you can do about buffer overwrites for statically defined buffers, but you could link against an electric-fence type memory allocator to prevent dynamically allocated buffer overruns.
You could also compile and link the C-file in question against a driver which would feed it typical data while running under valgrind which could help detect poorly or maliciously written code.
In the end, however, you're always going to run up against the "does this routine terminate" question, which is famous for being undecidable. A practical way around this would be to compile your program and run it from a driver which would alarm-out after a set period of reasonable time.
EDIT: Example showing use of nm:
Create a C snippet defining function foo which calls fopen:
#include <stdio.h>
foo() {
FILE *fp = fopen("/etc/passwd", "r");
}
Compile with -c, and then look at the resulting object file:
$ gcc -c foo.c
$ nm foo.o
0000000000000000 T foo
U fopen
Here you'll see that there are two symbols in the foo.o object file. One is defined, foo, the name of the subroutine we wrote. And one is undefined, fopen, which will be linked to its definition when the object file is linked together with the other C-files and necessary libraries. Using this method, you can see immediately if the compiled object is referencing anything outside of its own definition, and by your rules, can considered to be "bad".
You could do some obvious checks for "bad" function calls like network IO or assembly blocks. Beyond that, I can't think of anything you can do with just a C file.
Given the nature of C you're just about going to have to compile to even get started. Macros and such make static analysis of C code pretty difficult.
I'm looking for a way to load generated object code directly from memory.
I understand that if I write it to a file, I can call dlopen to dynamically load its symbols and link them. However, this seems a bit of a roundabout way, considering that it starts off in memory, is written to disk, and then is reloaded in memory by dlopen. I'm wondering if there is some way to dynamically link object code that exists in memory. From what I can tell there might be a few different ways to do this:
Trick dlopen into thinking that your memory location is a file, even though it never leaves memory.
Find some other system call which does what I'm looking for (I don't think this exists).
Find some dynamic linking library which can link code directly in memory. Obviously, this one is a bit hard to google for, as "dynamic linking library" turns up information on how to dynamically link libraries, not on libraries which perform the task of dynamically linking.
Abstract some API from a linker and create a new library out its codebase. (obviously this is the least desirable option for me).
So which ones of these are possible? feasible? Could you point me to any of the things I hypothesized existed? Is there another way I haven't even thought of?
I needed a solution to this because I have a scriptable system that has no filesystem (using blobs from a database) and needs to load binary plugins to support some scripts. This is the solution I came up with which works on FreeBSD but may not be portable.
void *dlblob(const void *blob, size_t len) {
/* Create shared-memory file descriptor */
int fd = shm_open(SHM_ANON, O_RDWR, 0);
ftruncate(fd, len);
/* MemMap file descriptor, and load data */
void *mem = mmap(NULL, len, PROT_WRITE, MAP_SHARED, fd, 0);
memcpy(mem, blob, len);
munmap(mem, len);
/* Open Dynamic Library from SHM file descriptor */
void *so = fdlopen(fd,RTLD_LAZY);
close(fd);
return so;
}
Obviously the code lacks any kind of error checking etc, but this is the core functionality.
ETA: My initial assumption that fdlopen is POSIX was wrong, this appears to be a FreeBSD-ism.
I don't see why you'd be considering dlopen, since that will require a lot more nonportable code to generate the right object format on disk (e.g. ELF) for loading. If you already know how to generate machine code for your architecture, just mmap memory with PROT_READ|PROT_WRITE|PROT_EXEC and put your code there, then assign the address to a function pointer and call it. Very simple.
There is no standard way to do it other than writing out the file and then loading it again with dlopen().
You may find some alternative method on your current specific platform. It will up to you to decide whether that is better than using the 'standard and (relatively) portable' approach.
Since generating the object code in the first place is rather platform specific, additional platform-specific techniques may not matter to you. But it is a judgement call - and in any case depends on there being a non-standard technique, which is relatively improbable.
We implemented a way to do this at Google. Unfortunately upstream glibc has failed to comprehend the need so it was never accepted. The feature request with patches has stalled. It's known as dlopen_from_offset.
The dlopen_with_offset glibc code is available in the glibc google/grte* branches. But nobody should enjoy modifying their own glibc.
You don't need to load the code generated in memory, since it is already in memory!
However, you can -in a non portable way- generate machine code in memory (provided it is in a memory segment mmap-ed with PROT_EXEC flag).
(in that case, no "linking" or relocation step is required, since you generate machine code with definitive absolute or relative addresses, in particular to call external functions)
Some libraries exist which do that: On GNU/Linux under x86 or x86-64, I know of GNU Lightning (which generates quickly machine code which runs slowly), DotGNU LibJIT (which generates medium quality code), and LLVM & GCCJIT (which is able to generate quite optimized code in memory, but takes time to emit it). And LuaJit has some similar facility too. Since 2015 GCC 5 has a gccjit library.
And of course, you can still generate C code in a file, fork a compiler to compile it into a shared object, and dlopen that shared object file. I'm doing that in GCC MELT , a domain specific language to extend GCC. It does work quite well in practice.
addenda
If performance of writing the generated C file is a concern (it should not be, since compiling a C file is much slower than writing it) consider using some tmpfs file system for that (perhaps in /tmp/ which is often a tmpfs filesystem on Linux)