Parsing files in generic kernel extensions - c

I need to parse a file from inside a Generic Kernel Extension built with Xcode.
For example, I want to read the contents of the file A.txt and store them in a variable, just as you would with FILE, fopen and EOF in C.
However, a Generic Kernel Extension cannot include stdio.h, so the build fails with a "use of undeclared identifier" error.
Is there a way to parse a file in a Generic Kernel Extension the way you would in C?
(The following is the kind of code I would like to be able to use in a Kernel Extension.)
FILE *f;
int c; /* int rather than char, so that EOF can be told apart from a valid byte */
int index = 0;
f = fopen(filepath, "rt");
while((c = fgetc(f)) != EOF){
fileContent[index] = c;
index++;
}
fileContent[index] = '\0';

It is certainly possible. You'll need to do the following:
Open the file with vnode_open(). This will turn your path into a vnode_t reference. You'll need a VFS authorisation context; you can obtain the current thread's context (i.e. open the file as the user in whose process's context the kernel is currently running) with vfs_context_create() if you don't already have one.
Perform I/O with vn_rdwr(). (Reads & writes use the same function, just pass UIO_READ or UIO_WRITE as the second argument.)
Close the file and drop references to the vnode with vnode_close(). Possibly dispose of a created VFS context using vfs_context_rele().
You'll want to look at the headerdocs for all of those functions. They're defined in <sys/vnode.h> in the Kernel.framework, and explaining every parameter exceeds the scope of a SO question/answer.
Note: As a commenter has already pointed out however, you'll want to make sure that opening files is really what needs to be done to solve whatever your problem is, particularly if you're newish to kernel programming. If at all unsure, I suggest you post a question along the lines of "I'm trying to do X, is reading the file in a kext really the best way forward?" where X is sufficiently high level, not "I need the contents of a file in the kernel" but why, and why a file specifically?
In various kernel execution contexts, file I/O may not be safe (i.e. may sometimes hang the system). If your kext loads early during boot, there might not be a file system yet. File I/O causes a lot to happen in the system, and can take a very long time in kernel terms, especially if you consider network file systems (including netboot environments!). If you're not careful, you might cause a bad user experience if the user is trying to eject a volume with a file your kext has open: the user has no way of resolving this, because the OS can only suggest specific apps to close and can't reach deep into your kext. Plus, there are the usual warnings about kernel programming in general: just because it can be done in the kernel doesn't mean it should be. It's more the opposite: only if it can't be done any other way should it be done in a kext.

Related

Linux- C program

I recently (yesterday) started trying to learn Linux and to program on this OS. One interesting and probably easy problem I came across while surfing the net was something like this:
Consider a C program that takes a directory as a command-line argument and calculates the sum of the sizes of all the files in that directory's tree.
Because I've been doing a lot of reading and researching in a short amount of time, all my knowledge is piled up in my brain, creating a cloud of confusion. If anyone could help me with the code, I'd be really thankful.
What you are asking is a basic task. It can be done on Linux, but it can also be done on Microsoft Windows with minor code tweaks if you are writing the program in C or C++. You would be writing code at a somewhat lower level than other ways of doing it.
However, you don't need to write a C program, which then has to be compiled into an executable. Because this is a basic task, you might be able to do it with a bash shell script, which would be Linux specific. If you wanted to do this on Windows, you would write either a .bat file (the DOS scripting language) or a Windows PowerShell script. I am not that familiar with Windows; I only mention it to help give you a general understanding for "all the knowledge piled up in your brain creating a cloud of confusion".
There is the windirstat program, which runs under Microsoft Windows. You can get it free from SourceForge, and I think it does mostly what you are asking. I am not sure if you can get the source code for it.
For Linux there is kdirstat, whose source code you can get from http://kdirstat.cvs.sourceforge.net/viewvc/kdirstat/ (you can download it as a GNU tarball).
Look at how that program is written. It is C++, as you'll see from the bunch of .cpp files. That would be a good template to work from, and you can see which libraries they use for file system functions. There are 21 .cpp files; look at kdirstatmain.cpp first.
For C/C++ code, execution starts at the function int main(int argc, char *argv[]).
Regarding accomplishing this task with a bash shell script on Linux, the best I can tell you is to do a web search on bash shell scripting for Linux.
And on Linux, you can quickly calculate the total size of all the files in a directory tree at the prompt with the du -sh . command. Run man du at the prompt to read about the disk usage command. Then consider looking for the source code of du (search for "linux du command source code"), use it as a template, and learn from how they implemented it before modifying it to meet your needs.
Use opendir(3) to "open" the directory. Since you are interested in learning how to program in GNU/Linux, start by typing man opendir in the terminal to read how this function works. The (3) in opendir(3) means that the help for this function can be found in the section 3 of the manpages. Notice, at the top of the page, that the manpage tells you which #includes you'll need.
If everything goes right, opendir(3) will return a DIR* object. To know which files or subdirectories it contains, you use this object with readdir(3). This should return a pointer of type struct dirent*. You can check the manual pages for details on the fields of this structure, but the most important for you will probably be d_type and d_name. A second call to this function will return the next entry. When it returns NULL, that either means you have read all files or an error occurred. To know which happened, you should check errno.
Here's a short example that lists all entries in /tmp:
#include <stdio.h>
#include <dirent.h>
#include <sys/types.h>

int main(void)
{
    DIR *dir;
    struct dirent *entry;

    dir = opendir("/tmp");
    if (dir == NULL) {
        /* opendir failed, e.g. missing directory or no permission */
        perror("opendir");
        return 1;
    }
    while ((entry = readdir(dir)) != NULL) {
        printf("Found %s\n", entry->d_name);
    }
    /* You may want to check errno here to see if readdir returned
     * NULL because all files were read or because of some error;
     * but this is beyond the purposes of my example.
     */
    closedir(dir);
    return 0;
}
Now you have to process each entry. If it is a directory, you have to descend into it and read its contents. A recursive function will probably help you here. If it is a file, then you have at least two options:
Open it with fopen(3), use fseek(3) to seek to the end of the file, then call ftell(3) to read the current position, which is the size of the file in bytes (fseek(3) itself only returns 0 or -1);
Use stat(2) to get a structure with information on the file. Do not confuse it with stat(1). If you simply type man stat, you'll get information about the latter. To force man to read from section 2, type man 2 stat in the command line.
The first approach is certainly simpler. The second will require you to do a bit of reading on how stat(2) works. My advice: you should do it. Not only because it's more in line with how things are done on Linux, but also because it gives you information that the fseek(3) approach doesn't. For instance, you can use stat(2) to see not only how many bytes the file contains, but also how many bytes it occupies on disk (like du does).
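To make the two options concrete, here is a minimal sketch of both. The function names size_via_ftell and size_via_stat are made up for this illustration; the on-disk figure relies on st_blocks being counted in 512-byte units, as the Linux stat(2) man page describes.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Option 1: seek to the end of the stream and ask for the position.
 * Returns the size in bytes, or -1 on error. */
long size_via_ftell(const char *path)
{
    FILE *f = fopen(path, "rb");
    long size = -1;

    if (f == NULL)
        return -1;
    if (fseek(f, 0L, SEEK_END) == 0)
        size = ftell(f);   /* fseek only reports success; ftell gives the offset */
    fclose(f);
    return size;
}

/* Option 2: ask the file system via stat(2).
 * Stores the size in bytes and the space allocated on disk.
 * Returns 0 on success, -1 on error. */
int size_via_stat(const char *path, long long *bytes, long long *on_disk)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *bytes = (long long)st.st_size;
    *on_disk = (long long)st.st_blocks * 512;   /* st_blocks is in 512-byte units */
    return 0;
}
```

For a regular file both functions should report the same byte count; only the stat(2) version can also tell you the allocated space.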
While reading the directory, you may stumble on entries that are neither files nor directories. stat(2) will probably help you figure out the sizes of those as well, but you may want to simply ignore them for now.
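Putting the pieces together, a recursive walk could look roughly like the sketch below. The function name dir_total_size is hypothetical, and a few choices are assumptions of mine rather than part of the answer: it uses lstat(2) so symbolic links are not followed, it counts only regular files, and it silently skips anything it cannot read.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Returns the sum of st_size over all regular files below path,
 * or -1 if path itself cannot be opened as a directory. */
long long dir_total_size(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *entry;
    long long total = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        char child[PATH_MAX];
        struct stat st;

        /* Skip "." and ".." to avoid recursing forever. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        /* Build "path/name"; skip the entry if it does not fit. */
        if (snprintf(child, sizeof child, "%s/%s", path, entry->d_name)
                >= (int)sizeof child)
            continue;
        if (lstat(child, &st) != 0)
            continue;

        if (S_ISDIR(st.st_mode)) {
            long long sub = dir_total_size(child);
            if (sub > 0)
                total += sub;
        } else if (S_ISREG(st.st_mode)) {
            total += st.st_size;
        }
        /* Other entry types (links, devices, sockets...) are ignored. */
    }
    closedir(dir);
    return total;
}
```

A main() that calls dir_total_size(argv[1]) and prints the result would complete the program from the question. Replacing st.st_size with st.st_blocks * 512 would give the du-style on-disk figure instead.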

How can a C shared library function learn the executable's path

I am writing a C shared library in Linux in which a function would like to discover the path to the currently running executable. It does NOT have access to argv[0] in main(), and I don't want to require the program accessing the library to pass that in.
How can a function like this, outside main() and in the wild, get to the path of the running executable? So far I've thought of 2 rather unportable, unreliable ways: 1) try to read /proc/getpid()/exe and 2) try to climb the stack to __libc_start_main() and read the stack params. I worry about all machines having /proc mounted.
Can you think of another way? Is there something buried anywhere in dlopen(NULL, 0) ? Can I get a reliable proc image of self from the kernel??
Thanks for any thoughts.
/proc is your best chance, as "path of the executable" is not that well-defined a concept in Linux (you can even delete the executable while the program is running).
To get the breakdown of loaded modules (with the main executable usually being the first entry) you should look at /proc/<pid>/maps. It's a text formatted file which will allow you to associate executable and library paths with load addresses (if the former are known and still valid).
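As a rough illustration of reading that file, the sketch below scans /proc/self/maps and returns the pathname of the first file-backed mapping, which is usually (not always) the main executable. The function name first_mapped_path and the buffer sizes are assumptions for this example; the field layout (address, perms, offset, dev, inode, optional pathname) follows the proc(5) man page.

```c
#include <stdio.h>

/* Copies the pathname of the first file-backed mapping listed in
 * /proc/self/maps into out (NUL-terminated, at most outlen bytes).
 * Returns 0 on success, -1 if nothing suitable was found. */
int first_mapped_path(char *out, size_t outlen)
{
    char line[4400];
    char path[4097];
    int found = -1;
    FILE *maps;

    if (outlen == 0)
        return -1;
    maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    while (found != 0 && fgets(line, sizeof line, maps) != NULL) {
        /* Skip the five leading fields, then take the rest of the line.
         * Anonymous mappings have no pathname, and pseudo-entries such
         * as [heap] or [stack] do not start with '/'. */
        if (sscanf(line, "%*s %*s %*s %*s %*s %4096[^\n]", path) == 1
                && path[0] == '/') {
            snprintf(out, outlen, "%s", path);
            found = 0;
        }
    }
    fclose(maps);
    return found;
}
```

Note that the kernel appends " (deleted)" to the pathname if the file has been unlinked, so the result is not guaranteed to name an existing file.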
Unless you are writing software that may be used very early in system startup, you can safely assume that /proc will always be mounted on a Linux system. It contains quite a bit of data that is not accessible any other way, and thus must be mounted for a system to function properly. As such, you can pretty easily obtain a path to your executable using:
readlink("/proc/self/exe", buf, sizeof(buf));
If for some reason you want to avoid this, it's also possible to read it from the process's auxiliary vector:
#include <sys/auxv.h>
#include <elf.h>
const char *execpath = (const char *) getauxval(AT_EXECFN);
Note that this will require a recent version of glibc (2.16 or later). It'll also return the path that was used to execute your application (e.g. possibly something like ./binary), rather than its absolute path.
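One detail worth spelling out for the readlink approach: readlink(2) does not NUL-terminate the buffer, and it silently truncates if the buffer is too small. A slightly more careful wrapper might look like this sketch (the name executable_path is made up here, and treating a completely filled buffer as an error is my own conservative choice):

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <unistd.h>

/* Stores the path of the running executable in buf as a
 * NUL-terminated string. Returns the length of the path, or -1 on
 * error or if the path may not have fit into the buffer. */
ssize_t executable_path(char *buf, size_t size)
{
    ssize_t len;

    if (size == 0)
        return -1;
    len = readlink("/proc/self/exe", buf, size - 1);
    if (len < 0)
        return -1;
    if ((size_t)len == size - 1)
        return -1;        /* buffer filled completely: possibly truncated */
    buf[len] = '\0';      /* readlink(2) does not append a terminator */
    return len;
}
```

A library function could call it with a PATH_MAX-sized buffer, for example char path[4096]; if (executable_path(path, sizeof path) > 0) { /* use path */ }.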

Moving a file on Linux in C

Platform: Debian Wheezy 3.2.0-4-686-pae
Compiler: GCC (Debian 4.7.2-5) 4.7.2 (Code::Blocks)
I want to move a file from one location to another. Nothing complex like moving to different drives or to different file systems. I know the "standard" way to do this would be simply copying the file and then removing the original, but I want some way of preserving the file's ownership, mode, last access/modification times, etc. I am assuming that I will have to copy the file and then edit the new file's ownership, mode, etc. afterwards, but I have no idea how to do this.
The usual way to move a file in C is to use rename(2), which can sometimes fail.
If you cannot use the rename(2) syscall (e.g. because source and target are on different filesystems), you have to query the size, permissions and other metadata of the source file with stat(2); copy the data using open(2), a loop on read(2) and write(2) (with a buffer of several kilobytes) and close(2); and then copy the metadata using chmod(2), chown(2) and utime(2). You might also care about copying extended attributes using getxattr(2), setxattr(2), listxattr(2). You could also in some cases use sendfile(2), as commented by David C. Rankin.
And if the source and target are on different filesystems, there is no way to make the move atomic and avoid race conditions (so using rename(2) is preferable when possible, because it is atomic according to its man page). The source file can always be modified (by another process) during the move operations...
So a practical way to move files is to first try doing a rename(2), and if that fails with EXDEV (when oldpath and newpath are not on the same mounted filesystem), then you need to copy bytes and metadata. Several libraries provide functions doing that, e.g. Qt QFile::rename.
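A minimal sketch of that rename-then-copy strategy follows. The names move_file, copy_file and write_all are hypothetical. As a deliberate swap, it uses the descriptor-based fchmod(2), fchown(2) and futimens(2) rather than the path-based chmod/chown/utime mentioned above, so the metadata is applied to the file that was just written. Extended attributes, EINTR handling and sendfile(2) are left out, and ownership is copied only on a best-effort basis since it usually needs privileges.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Writes all len bytes, retrying after partial writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0)
            return -1;
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Copies data, mode, owner (best effort) and timestamps from src to dst.
 * Returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst)
{
    char buf[64 * 1024];
    struct stat st;
    struct timespec times[2];
    ssize_t n = 0;
    int in, out, ok = 0;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    if (fstat(in, &st) != 0) {
        close(in);
        return -1;
    }
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }

    while (ok == 0 && (n = read(in, buf, sizeof buf)) > 0)
        ok = write_all(out, buf, (size_t)n);
    if (ok == 0 && n < 0)
        ok = -1;

    /* Owner first, because changing it can clear set-user-ID bits. */
    if (fchown(out, st.st_uid, st.st_gid) != 0) {
        /* best effort: ignored when we lack the privileges */
    }
    if (ok == 0 && fchmod(out, st.st_mode & 07777) != 0)
        ok = -1;
    times[0] = st.st_atim;   /* last access */
    times[1] = st.st_mtim;   /* last modification */
    if (ok == 0 && futimens(out, times) != 0)
        ok = -1;

    close(in);
    if (close(out) != 0)
        ok = -1;
    return ok;
}

/* Tries rename(2) first; falls back to copy + unlink on EXDEV. */
int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    if (copy_file(src, dst) != 0) {
        unlink(dst);         /* do not leave a partial copy behind */
        return -1;
    }
    return unlink(src);
}
```

Only the rename(2) path is atomic. In the fallback path another process can still see both files, or a half-written target, while the copy is in progress.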
Read Advanced Linux Programming - and see syscalls(2) - for more (and also try to strace some mv command to understand what it is doing). That book is freely and legally downloadable (so you could find several copies on the Web).
The /bin/mv command (see mv(1)) is part of GNU coreutils which is free software. You could either study its source code, or use strace(1) to understand what that command does (in terms of syscalls(2)). In some open source Unix shells like sash or busybox, mv might be a shell builtin. See also path_resolution(7) and glob(7).
There are subtle corner cases (imagine another process or pthread doing some file operations on the same filesystem, directory, or files). Read some operating system textbook for more.
Using a mix of snprintf(3), system(3), mv(1) could be tricky if the file name contains weird characters such as tabs or newlines, or starts with an initial -. See errno(3).
If the original and new location for the file are on the same filesystem then a "move" is conceptually identical to a "rename."
#include <stdio.h>
int rename(const char *oldname, const char *newname);

Finding file type in Linux programmatically

I am trying to find the file type of a file (.pdf, .doc, .docx etc.) programmatically, without using a shell command. I have to make an application which blocks access to files of a particular extension. I have already hooked sys_call_table in an LKM, and now I want my LKM to check the file type whenever an open/read system call is triggered.
I know that the current pointer gives access to the current process structure, and that it can be used to find the file name stored in the dentry structure. I also know that in Linux a file type is identified by a magic number stored in the first bytes of the file. But I don't know how to find the file type, or exactly where it is stored.
Linux doesn't "store" the file type for its files (unlike Mac OS' resource fork, which I think is the most well-known platform to do this). Files are just named streams of bytes, they have no structure implied by the operating system.
Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.
There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.
In other words, I mean you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as large a set of types as possible, it might not be a very good idea. :/
Quoting the question: "Actually i have to make an application which blocks access to files of a particular extension."
That's a flawed requirement. If you check by file extension, you'll miss files that don't use the extension, which is quite common on Linux since it does not rely on file extensions.
The officially sanctioned way of detecting a file's type in Linux is by its magic number. The shell command file is basically just a wrapper for libmagic, so you have the option of linking to that library.
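If linking to libmagic is not an option and only a handful of types matter, the magic-number test can be re-created by hand. The sketch below is user-space C with made-up names (file_kind, classify_header, classify_file); it only recognises three well-known signatures: "%PDF-" for PDF, "PK\x03\x04" for ZIP containers (which .docx uses) and the OLE2 compound-file header (which legacy .doc uses). It identifies the container, not the document type, so a .jar or .xlsx would also be reported as ZIP.

```c
#include <stdio.h>
#include <string.h>

enum file_kind { KIND_UNKNOWN, KIND_PDF, KIND_ZIP, KIND_OLE2 };

/* Classifies a buffer holding the first len bytes of a file. */
enum file_kind classify_header(const unsigned char *buf, size_t len)
{
    static const unsigned char pdf[]  = { '%', 'P', 'D', 'F', '-' };
    static const unsigned char zip[]  = { 'P', 'K', 0x03, 0x04 };
    static const unsigned char ole2[] = { 0xD0, 0xCF, 0x11, 0xE0,
                                          0xA1, 0xB1, 0x1A, 0xE1 };

    if (len >= sizeof pdf && memcmp(buf, pdf, sizeof pdf) == 0)
        return KIND_PDF;
    if (len >= sizeof zip && memcmp(buf, zip, sizeof zip) == 0)
        return KIND_ZIP;     /* container used by .docx, among many others */
    if (len >= sizeof ole2 && memcmp(buf, ole2, sizeof ole2) == 0)
        return KIND_OLE2;    /* container used by legacy .doc, among others */
    return KIND_UNKNOWN;
}

/* Reads the first bytes of path and classifies them.
 * Unreadable files are reported as KIND_UNKNOWN. */
enum file_kind classify_file(const char *path)
{
    unsigned char head[8];
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return KIND_UNKNOWN;
    n = fread(head, 1, sizeof head, f);
    fclose(f);
    return classify_header(head, n);
}
```

classify_header works on a plain byte buffer, so the comparison itself does not depend on stdio; how the first bytes are obtained inside a kernel module is a separate problem that this sketch does not address.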

Execute C program at bootloader level via Assembler

I wrote a custom (VERY basic "Hello world!") bootloader in Assembler and I would like to execute a C program in that. Would the C program work, or fail due to a lost stdio.h file? And how could I bundle the C program along with the bootloader into a single .bin file to dd to a flash drive/CD?
I'm not sure what you mean by "lost stdio.h", but many C runtime functions, including those prototyped in stdio.h, are implemented using system calls. Without an OS running, those system calls won't work.
It is possible to write C code that runs without an OS; for example, most common bootloaders have just a tiny amount of assembler and mostly C code. The trick is to avoid using runtime libraries. Alternatives to syscalls, e.g. for display, are BIOS calls and hardware-specific I/O.
To take just one example, in addition to dynamic allocation, fopen in read mode needs the following low-level operations:
Reading a block of data from storage
Reading the file system metadata (often, superblock and root directory)
Processing file system metadata to find out where the file content is stored
Creating a FILE object that contains enough information for fread and fgetc to find the data on disk
You don't have an OS to help with any of that, so your C code will need to implement a driver (possibly calling the BIOS) for block reads, and implement the behavior of the other steps.
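To give a feel for what C without runtime libraries looks like, here is a small sketch of the "hardware-specific I/O" route for display. It assumes a PC in the standard 80x25 colour text mode, where each screen cell is a 16-bit value (character in the low byte, attribute in the high byte). The function name text_write is made up, and it takes the buffer as a parameter so the same code can be exercised against an ordinary array.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25

/* Writes a NUL-terminated string into a text-mode cell buffer,
 * starting at cell index start, using attribute byte attr.
 * Returns the index just past the last cell written.
 * Uses only freestanding headers: no stdio, no system calls. */
size_t text_write(volatile uint16_t *cells, size_t start,
                  const char *s, uint8_t attr)
{
    size_t i = start;

    while (*s != '\0' && i < (size_t)VGA_COLS * VGA_ROWS) {
        /* high byte: colour attribute, low byte: character code */
        cells[i++] = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)*s++);
    }
    return i;
}
```

In a bootloader you would pass the memory-mapped text buffer, e.g. text_write((volatile uint16_t *)0xB8000, 0, "Hello world!", 0x07); where 0x07 is light grey on black. Getting the C code loaded, setting up a stack and linking it into the .bin alongside the assembler stub are separate steps that this sketch does not cover.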
