Finding file type in Linux programmatically - c

I am trying to find the file type of a file (.pdf, .doc, .docx, etc.) programmatically, without using a shell command. I actually have to write an application that blocks access to files of a particular extension. I have already hooked sys_call_table in an LKM, and now I want my LKM to check the file type whenever an open/read system call is triggered.
I know that the current pointer gives access to the current process structure, and that we can use it to find the file name stored in the dentry structure. I also know that in Linux a file type is identified by a magic number stored in the first bytes of the file. But I don't know how to find the file type, or exactly where it is stored.

Linux doesn't "store" the file type for its files (unlike Mac OS' resource fork, which I think is the most well-known platform to do this). Files are just named streams of bytes, they have no structure implied by the operating system.
Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.
There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.
In other words, I mean you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as large a set of types as possible, it might not be a very good idea. :/

Actually I have to make an application which blocks access to files of a particular extension.
That's a flawed requirement. If you check by file extension, you'll miss files that don't carry the extension, which is quite common on Linux, since Linux does not rely on file extensions.
The conventional way of detecting a file's type on Linux is by its magic number. The shell command file is basically just a wrapper around libmagic, so you have the option of linking against that library.

Related

Compile binary data into C program and use them like a file

I have a C library which uses a set of binary data files (read only). One of these files, let's call it f1.dat, is used in 99% of applications which use the library, while the other 59 files f2.dat .. f60.dat are used only rarely.
I would like to compile the data of f1.dat directly into the library. The users of the library who never wish to use the data in files f2.dat .. f60.dat would not have to carry an extra data file around, the compiled library .dll or .so would work without extra resources for those users.
The most convenient solution would be if the memory area with the data could be accessed with the same function calls fseek, ftell, read as the data in a file. For the application it should make no difference whether it reads an external file or this memory "file".
Is there a portable solution for this?

unistd_64.h in Ubuntu

As I understand it (and my understanding is limited), unistd_64.h contains the system call numbers. When I search for the file from a terminal, it shows up under several different directories, as below:
/usr/include/x86_64-linux-gnu/asm/unistd_64.h
/usr/src/linux-headers-3.5.0-23/arch/sh/include/asm/unistd_64.h
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/.unistd_64.h.cmd
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/unistd_64.h
I don't understand the difference between these files or what each one is used for. Also, the third file ends in .cmd; what does that mean?
If you are writing an ordinary C program that needs to know system call numbers, you should not use any of those headers. Instead, you should use <sys/syscall.h>. Your C program does not need to know the full pathname of this header; #include <sys/syscall.h> is all that is necessary. However, if you want to read it, it will be found somewhere in /usr/include, probably either /usr/include/sys/syscall.h or /usr/include/x86_64-linux-gnu/sys/syscall.h.
Now, I will explain the files you found:
/usr/include/x86_64-linux-gnu/asm/unistd_64.h: This is a header file that may be used internally by sys/syscall.h. You can read it, but do not include it directly in your program. It probably defines a whole bunch of names that begin with __NR_. Those names should never be used in an ordinary, "userspace" program: always use the names beginning with SYS_ instead.
/usr/src/linux-headers-3.5.0-23/arch/sh/include/asm/unistd_64.h and /usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/unistd_64.h: These are private kernel headers. They exist for the sake of people trying to build kernel modules that are developed separately from the kernel proper. It's possible that one of them is textually the same as /usr/include/x86_64-linux-gnu/asm/unistd_64.h, but that is not something you should rely on.
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/.unistd_64.h.cmd: This is not a header file at all; it is used by the Linux kernel's build system.
The first file, which resides in /usr/include (the system include directory) is the one you would include.
The others reside in /usr/src, which is a source code directory that should not be referenced.

pstatus_t not found in procfs.h (LINUX)

I am reading the /proc/PID/status file from my C program, and I want to use the pstatus_t struct to read the values from the file directly into this struct. However, my compiler reports that this struct is not present in procfs.h.
I have checked a few examples on the internet that use the same header file, but in my case it is not working.
When you say "reading /proc/PID/status", I'm assuming that you are running in userspace (as opposed to in the kernel). In this case, the pstatus_t structure is of no use to you. pstatus_t comes from Solaris, where /proc/PID/status is a binary file holding that structure; Linux does not define it. On Linux, most files under /proc, including status, are a text-formatted representation of kernel data structures, so you have to parse the text rather than read it straight into a struct.

Moving a file on Linux in C

Platform: Debian Wheezy 3.2.0-4-686-pae
Compiler: GCC (Debian 4.7.2-5) 4.7.2 (Code::Blocks)
I want to move a file from one location to another. Nothing complex like moving to different drives or to different file systems. I know the "standard" way to do this would be simply copying the file and then removing the original. But I want some way of preserving the file's ownership, mode, last access/modification times, etc. I am assuming that I will have to copy the file and then edit the new file's ownership, mode, etc. afterwards, but I have no idea how to do this.
The usual way to move a file in C is to use rename(2), which can fail in some cases.
If you cannot use the rename(2) syscall (e.g. because source and target are on different filesystems), you have to do the work yourself: query the size, permissions and other metadata of the source file with stat(2); copy the data with open(2), a loop of read(2) and write(2) (using a buffer of several kilobytes), and close(2); then restore the metadata with chmod(2), chown(2) and utime(2). You might also care about copying extended attributes using listxattr(2), getxattr(2) and setxattr(2). In some cases you could also use sendfile(2), as commented by David C. Rankin.
And if the source and target are on different filesystems, there is no way to make the move atomic and avoid race conditions, so using rename(2) is preferable when possible, because it is atomic according to its man page. The source file can always be modified (by another process) during the move operation.
So a practical way to move files is to first try doing a rename(2), and if that fails with EXDEV (when oldpath and newpath are not on the same mounted filesystem), then you need to copy bytes and metadata. Several libraries provide functions doing that, e.g. Qt QFile::rename.
Read Advanced Linux Programming - and see syscalls(2) - for more (and also try to strace some mv command to understand what it is doing). That book is freely and legally downloadable (so you could find several copies on the Web).
The /bin/mv command (see mv(1)) is part of GNU coreutils which is free software. You could either study its source code, or use strace(1) to understand what that command does (in terms of syscalls(2)). In some open source Unix shells like sash or busybox, mv might be a shell builtin. See also path_resolution(7) and glob(7).
There are subtle corner cases (imagine another process or pthread doing some file operations on the same filesystem, directory, or files). Read some operating system textbook for more.
Using a mix of snprintf(3), system(3) and mv(1) could be tricky if the file name contains weird characters such as tabs or newlines, or starts with an initial -. See errno(3).
If the original and new location for the file are on the same filesystem then a "move" is conceptually identical to a "rename."
#include <stdio.h>
int rename (const char *oldname, const char *newname);

C code to Sync Flash Drives

Just as a learning experience, I'm trying to code the following problem in C.
If two flash drives are inserted, each having a folder (say, Course_Notes), then they get synced. That is, data is copied from one to the other, and if a file already exists, then the newer one is retained.
I would do this in bash by:
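#!/bin/bash
# Poll forever; -u copies a file only when the source is newer than the
# destination or the destination is missing.
while true
do
    cp -ur /media/PD_1/Course_Notes/. /media/PD_2/Course_Notes/
    cp -ur /media/PD_2/Course_Notes/. /media/PD_1/Course_Notes/
done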
How do I do this in C without too many system calls?
In the real world, you'd probably use something like rsync for this.
You're probably looking for the stat() family of functions to get size and modification date, along with opendir()/readdir()/closedir() for directory listings. The standard way for copying a file itself is to open the source and destination, and write to the latter what you read from the former with the usual read and write functions.
Note also that the source code for the GNU and BSD versions of the standard UNIX utilities, such as cp, is freely available (likewise for rsync). You may want to read some of that source code to see what you may have missed in your own approach.
Since this is a programming Q&A site: you would normally use the usual programming tools for this (instead of rewriting them yourself).
This sounds like a perfect application for a Distributed Version Control System. Some of the most popular choices today are Git and Mercurial.
