Storing struct array in kernel space, Linux - c

I believe I may be over-thinking this problem a bit... I've got a text file located on my filesystem which I am parsing at boot and storing the results into an array of structs. I need to copy this array from user space into kernel space (copy_from_user), and must have this data accessible by the kernel at any time. The data in kernel space will need to be accessed by the Sockets.c file. Is there a special place to store an array within kernel space, or can I simply add a reference to the array in Sockets.c? My C is a bit rusty...
Thanks for any advice.

I believe there are two main parts in your problem:
Passing the data from userspace to kernelspace
Storing the data in the kernelspace
For the first issue, I would suggest using a Netlink socket, rather than the more traditional system call (read/write/ioctl) interface. Netlink sockets allow configuration data to be passed to the kernel using a socket-like interface, which is significantly simpler and safer to use.
Your program should perform all the input parsing and validation and then pass the data to the kernel, preferably in a more structured form (e.g. entry-by-entry) than a massive data blob.
Unless you are interested in high throughput (megabytes of data per second), the netlink interface is fine. The following links provide an explanation, as well as an example:
http://en.wikipedia.org/wiki/Netlink
http://www.linuxjournal.com/article/7356
http://linux-net.osdl.org/index.php/Generic_Netlink_HOWTO
http://www.kernel.org/doc/Documentation/connector/
As far as the array storage goes, if you plan on storing more than 128KB of data you will have to use vmalloc() to allocate the space, otherwise kmalloc() is preferred. You should read the related chapter of the Linux Device Drivers book:
http://lwn.net/images/pdf/LDD3/ch08.pdf
Please note that buffers allocated with vmalloc() are not suitable for DMA to/from devices, since the memory pages are not contiguous. You might also want to consider a more complex data structure like a list if you do not know how many entries you will have beforehand.
As for accessing the storage globally, you can do it as with any C program:
In a header file included by all .c files that you need to access the data put something like:
extern struct my_struct *unique_name_that_will_not_conflict_with_other_symbols;
The extern keyword indicates that this declares a variable that is implemented at another source file. This will make this pointer accesible to all C files that include this header.
Then in a C file, preferrably the one with the rest of your code - if one exists:
struct my_struct *unique_name_that_will_not_conflict_with_other_symbols = NULL;
Which is the actual implementation of the variable declared in the header file.
PS: If you are going to work with the Linux kernel, you really need to brush up on your C. Otherwise you will be in for some very frustrating moments and you WILL end up sorry and sore.
PS2: You will also save a lot of time if you at least skim through the whole Linux Device Drivers book. Despite its name and its relative age, it has a lot of information that is both current and important when writing any code for the Linux Kernel.

You can just define an extern pointer somewhere in the kernel (say, in the sockets.c file where you're going to use it). Initialise it to NULL, and include a declaration for it in some appropriate header file.
In the part of the code that does the copy_from_user(), allocate space for the array using kmalloc() and store the address in the pointer. Copy the data into it. You'll also want a mutex to be locked around access to the array.
The memory allocated by kmalloc() will persist until freed with kfree().

Your question is basic and vague enough that I recommend you work through some of the exercises in this book. The whole of chapter 8 is dedicated to allocating kernel memory.

Initializing the Array as a global variable in your Kernel Module will make it accessible forever until the kernel is running i.e. until your system is running.

Related

What is the best way to allocate an array when reading big files

I have a file around 100 MB, which needs to be processed.
After I get the dimensions of that file (h & w), I should read the data into an array. I am thinking of several ways how to do that:
1. Static (automatic)
int matrix[h][w];
2. Dynamic
// similar to above, but using malloc
I am worried about the limitations (and freeing the memory).
Also, would a static array be freed whet it's scope is over?
In my situation, the solution is using dynamic allocation.
It seems that int matrix[h][w]; puts data in stack, which is limited (small), whereas using malloc() puts data in heap, which is as big as 75% of the virtual memory (in Linux).
It depends on the application, but one good approach most of the time is to map the file to memory. On a modern OS, this will return a pointer to the file contents and make a file on a hard drive work like the swap file: the application will just see its contents as memory, but the OS will only load the pages into memory when (it expects that) you access them. This could save you a lot of time and complication if you only need to read a small part of the file.
This is how the glib library does file I/O, or it is available through mmap() on Unix/Linux and a different set of functions in the Windows API.

Why all system calls in Linux passes arguments to kernel using "call by reference"?

If we look at the syscalls.h file in Linux kernel, we can see that all most all the arguments of the system calls are passed by reference. For example
asmlinkage long sys_open_by_handle_at(int mountdirfd,
struct file_handle __user *handle,
int flags);
Here, file_handle is passed as a pointer. Why not simple the value is not passed to kernel?
Efficiency.
Many (most?) systems implement function calls by pushing argument values onto a stack. If you pass a struct or any other complex data type by value, you'd need to copy it to the stack. There's no reason to do this, since the kernel has access to the entire memory space of the process. Aside from the copy cost, you'd also increase the stack space needed.
In addition, the kernel will need to copy any data it needs to retain into the kernel memory space. The kernel can't rely on user space code behavior. (It's also not going to free anything obtained from user space, which eliminates some any concerns over mixing up responsibility for reclaiming memory.)
Finally, realistically, coders working in the kernel need to be very comfortable with working with pointers. There's really no advantage to passing by value once you're completely comfortable with pointers.
This part is a bit more of an opinion, but I think there's also a strong legacy effect. The Unix kernel and C developed somewhat in tandem. See https://en.wikipedia.org/wiki/C_(programming_language) for some of the history. It's been a long time, but if I recall correctly, older versions of C wouldn't allow you to pass a struct by value. Regardless, working with pointers was highly idiomatic in C (and I would say still is). In other words, this is just how things have always been done.
The memory space for user mode and kernel mode are different. When you make a system call, the MMU of the Linux subsystem makes sure that proper memory mapping of the user space process running in their own Virtual address space is done to the Physical address space of the kernel.
Variables in the user mode stay in the process' virtual address space. They can't just be passed in system calls and expected to get mapped in the physical address space .
This is what my understanding is. Would love to discuss and clarify if needed.
Principally I understand that the struct file_handle parameter of the function sys_open_by_handle_at(() is an "in" parameter, i.e. it is not modified by the function. Therefore it could as well be passed by value. I see about three reasons why this is not done. All reasons are surely valid for this particular function; at least the last argument (K&R) applies to all struct arguments, in all system calls.
The struct can have a size of e.g. 128 bytes which would be slow to copy to the stack.
Passing a pointer obviates the need to know the struct definition on the caller side. The struct is an "opaque handle" filled by a previous call to [sys_]name_to_handle_at(). The caller doesn't want to and actually shouldn't be burdened with the details of the struct's contents. (Leaving the caller innocent obviates the need to recompile the program because the struct's layout changes. I can also imagine that the contents differs between file system types.)
Unix and even its open source complement Linux is older than C99. I suppose that for the longest time K&R C was the smallest common denominator C standard the kernel sources adhered to. In K&R C it is simply not possible to pass structs by value.

Pointer Array shared between OpenCL kernels

Is it possible to share an array of pointers between multiple kernels in OpenCL. If so, how would I go about implementing it? If I am not completely mistaken - which may though be the case - the only way of sharing things between kernels would be a shared cl_mem, however I also think these cannot contain pointers.
This is not possible in OpenCL 1.x because host and device have completely separate memory spaces, so a buffer containing host pointers makes no sense on the device side.
However, OpenCL 2.0 supports Shared Virtual Memory (SVM) and so memory containing pointers is legal because the host and device share an address space. There are three different levels of granularity though, which will limit what you can have those pointers point to. In the coarsest case they can only refer to locations within the same buffer or other SVM buffers currently owned by the device. Yes, cl_mem is still the way to pass in a buffer to a kernel, but in OpenCL 2.0 with SVM that buffer may contain pointers.
Edit/Addition: OP points out they just want to share pointers between kernels. If these are just device pointers, then you can store them in the buffer in one kernel and read them from the buffer in another kernel. They can only refer to __global, not __local memory. And without SVM they can't be used on the host. The host will of course need to allocate the buffer and pass it to both kernels for their use. As far as the host is concerned, it's just opaque memory. Only the kernels know they are __global pointers.
I ran into a similar problem, but I managed to get around it by using a simple pointer structure.I have doubts about the fact that someone says that buffers change their position in memory,perhaps this is true for some special cases.But this definitely cannot happen while the kernel is working with it. I have not tested it on different video cards, but on nvidia(cl 1.2) it works perfectly, so I can access data from an array that was not even passed as an argument into the kernel.
typedef struct
{
__global volatile point_dataT* point;//pointer to another struct in different buffer
} pointerBufT;
__kernel void tester(__global pointerBufT * pointer_buf){
printf("Test id: %u\n",pointer_buf[coord.x+coord.y*img_width].point->id);//Retrieving information from an array not passed to the kernel
}
I know that this is a late reply, but for some reason I have only come across negative answers to similar questions, or a suggestion to use indexes instead of pointers. While a structure with a pointer inside works great.

How to use a mmap file mapping for variables

I'm currently experimenting with IPC via mmap on Unix.
So far, I'm able to map a sparse-file of 10MB into the RAM and access it reading and writing from two separated processes. Awesome :)
Now, I'm currently typecasting the address of the memory-segment returned by mmap to char*, so I can use it as a plain old cstring.
Now, my real question digs a bit further. I've quite a lot experience with higher levels of programming (ruby, java), but never did larger projects in C or ASM.
I want to use the mapped memory as an address-space for variable-allocation. I dont't wether this is possible or does make any sense at all. Im thinking of some kind of a hash-map-like data structure, that lives purely in the shared segment. This would allow some interesting experiments with IPC, even with other languages like ruby over FFI.
Now, a regular implementation of a hash would use pretty often something like malloc. But this woult allocate memory out of the shared space.
I hope, you understand my thoughts, although my english is not the best!
Thank you in advance
Jakob
By and large, you can treat the memory returned by mmap like memory returned by malloc. However, since the memory may be shared between multiple "unrelated" processes, with independent calls to mmap, the starting address for each may be different. Thus, any data structure you build inside the shared memory should not use direct pointers.
Instead of pointers, offsets from the initial map address should be used instead. The data structure would then compute the right pointer value by adding the offset to the starting address of the mmamp region.
The data structure would be built from the single call to mmap. If you need to grow the data structure, you have to extend the mmap region itself. The could be done with mremap or by manually munmap and mmap again after the backing file has been extended.

Where is the FILE struct allocated?

In C, when opening a file with
FILE *fin;
fin=fopen("file.bin","rb");
I only have a pointer to a structure of FILE. Where is the actual FILE struct allocated on Windows machine? And does it contain all the necessary information for accessing the file?
My aim is to dump the whole data segment to disk and then to reload the dumped file back to the beginning of the data segment. The code that reloads the dumped file is placed in a separate function. This way, the fin pointer is local and is on the stack, thus is not being overwritten on reload. But the FILE struct itself is not local. I take care not to overwrite the memory region of size sizeof(FILE) that starts at the address fin.
The
fread(DataSegStart,1,szTillFin,fin);
fread(dummy,1,sizeof(FILE),fin);
fread(DataSegAfterFin,1,szFinTillEnd,fin);
operations completes successfully, but I get an assertion failure on
fclose(fin)
Do I overwrite some other necessary file data other than in the FILE struct?
The actual instance of the FILE structure exists within the standard library. Typically the standard library allocates some number of FILE structures, which may or may not be a fixed number of them. When you call fopen(), it returns a pointer to one of those structures.
The data within the FILE structure likely contains pointers to other things such as buffers. You're unlikely to be able to save and restore those structures to disk without some really deep integration with your standard library implementation.
You may be interested in something like CryoPID which does process save and restore at a different level.
It seems like you're trying to do something dangerous, unlikely to work.
fopen allocates a FILE structure and initializes it. fclose releases it. How it allocates it and what it puts in it is implementation dependent. It could contain a pointer to another piece of memory, which is also allocated somewhere (since it's buffered I/O, I guess it does allocate a buffer somewhere).
Writing code that relies on the internals of fopen is dangerous, most likely won't work, and surely won't be stable and portable.
Well, you have a pointer to a FILE object, so technically you know where it is but you should be aware that FILE is deliberately an opaque type. You shouldn't need to know what it contains, you just need to know that you can pass it to functions that know about it to perform certain actions. Additionally, FILE may not be a complete type so sizeof(FILE) might not be correct and, additionally, the object might contain pointers to other structures. Simply avoiding overwriting the FILE object is not likely to be sufficient for you to avoid corrupting the program by writing over most of its memory.
FILE is defined in stdio.h. It contains all the information about the file but, looking at the code you show, I think you don't understand its purpose. It is created and run through the operating system with the C library which fills FILE with information about the file but it is not contained in the file itself.

Resources