Copy directory recursively in pure C on Linux/UNIX - c

Can someone guide me on a possible solution? I don't want to use /bin/cp or any other foreign apps. I want my program to be independent. Also I know that every system is quite specific, so I'm interested in UNIX/Linux compatibility.
How can I solve it? Should I just walk down the source directory, creating new directories in the target one and copying files into them, or is there a better solution?
BTW my goal is: copy all first-level subdirs recursively into the target dir if they are not present there.

You really need some kind of recursive descent into the directory tree. Doing this, you can actually make this very portable (using opendir/readdir on Linux and FindFirstFile/FindNextFile on Windows). The problem that remains is the actual copying. You can use the C standard library for that with the following algorithm:
Open source file
Open target file
In a loop, fread a block of constant size from the source, then fwrite it to the target. Stop if the source file contains no more data
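The three steps above can be sketched roughly as follows. The helper name copy_file and the 8192-byte block size are assumptions made for illustration; the sketch copies file contents only and does not preserve permissions or timestamps.

```c
#include <stdio.h>

/* Copy src to dst in fixed-size blocks.
   Returns 0 on success, -1 on any error.
   copy_file is a hypothetical helper name, not a library function. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[8192];
    size_t n;
    int status = 0;

    /* fread returns 0 at end of file or on a read error */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            status = -1;
            break;
        }
    }
    if (ferror(in))
        status = -1;

    fclose(in);
    /* fclose flushes buffered data, so its result matters for the target */
    if (fclose(out) != 0)
        status = -1;
    return status;
}
```

A call such as copy_file("a.txt", "b.txt") overwrites b.txt if it already exists, so a caller that must not clobber existing files needs to check for them first.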
Hope this helps :)

Use the POSIX nftw(3) function to walk the tree you want to copy. You supply this function with a callback function that gets called on the path of each file/directory. Define a callback that copies the file/dir it gets called on into the destination tree. The fourth callback argument of type struct FTW * can be used to compute the relative path.
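A minimal sketch of that approach might look like this. The names copy_tree, copy_entry and copy_bytes are hypothetical, and for brevity the relative path is derived from the length of the source root rather than from the struct FTW argument. It assumes the source path has no trailing slash and that the destination does not lie inside the source tree; symlinks and special files are skipped.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <ftw.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* nftw passes no user pointer to the callback, so the two roots are
   kept in file-scope variables (hypothetical names). */
static const char *src_root;
static const char *dst_root;

/* Plain stdio block copy; returns 0 on success, -1 on error. */
static int copy_bytes(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    char buf[8192];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/* Called by nftw once for every entry under src_root. */
static int copy_entry(const char *path, const struct stat *sb,
                      int type, struct FTW *ftw)
{
    (void)sb;
    (void)ftw;
    char target[PATH_MAX];
    /* Assumption: path starts with src_root (no trailing slash),
       so the remainder is the path relative to the root. */
    const char *rel = path + strlen(src_root);
    if (snprintf(target, sizeof target, "%s%s", dst_root, rel)
            >= (int)sizeof target)
        return -1;

    if (type == FTW_D) {
        /* Without FTW_DEPTH a directory is reported before its contents. */
        if (mkdir(target, 0755) != 0 && errno != EEXIST)
            return -1;
        return 0;
    }
    if (type == FTW_F)
        return copy_bytes(path, target);
    return 0; /* symlinks and other entry types are skipped in this sketch */
}

/* Returns 0 on success; a non-zero callback result stops the walk. */
int copy_tree(const char *src, const char *dst)
{
    src_root = src;
    dst_root = dst;
    return nftw(src, copy_entry, 16, FTW_PHYS);
}
```

Because the roots are stored in static variables, this sketch is not reentrant or thread-safe. The third nftw argument (16 here) caps how many directory file descriptors the walk keeps open at once.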

If you want to use only C, you could use dirent.h to follow the directory structure recursively. You could then open each file in binary mode and write its contents to the desired location through an output stream.

Related

unistd_64.h in Ubuntu

As far as I understand (and my understanding is limited), unistd_64.h contains the system call numbers. When I search for the file from the terminal, it shows several results in different directories:
/usr/include/x86_64-linux-gnu/asm/unistd_64.h
/usr/src/linux-headers-3.5.0-23/arch/sh/include/asm/unistd_64.h
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/.unistd_64.h.cmd
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/unistd_64.h
I don't understand the difference between these files or what each one is used for. Also, the third file ends in .cmd; what does that mean?
If you are writing an ordinary C program that needs to know system call numbers, you should not use any of those headers. Instead, you should use <sys/syscall.h>. Your C program does not need to know the full pathname of this header; #include <sys/syscall.h> is all that is necessary. However, if you want to read it, it will be found somewhere in /usr/include, probably either /usr/include/sys/syscall.h or /usr/include/x86_64-linux-gnu/sys/syscall.h.
Now, I will explain the files you found:
/usr/include/x86_64-linux-gnu/asm/unistd_64.h: This is a header file that may be used internally by sys/syscall.h. You can read it, but do not include it directly in your program. It probably defines a whole bunch of names that begin with __NR_. Those names should never be used in an ordinary, "userspace" program: always use the names beginning with SYS_ instead.
/usr/src/linux-headers-3.5.0-23/arch/sh/include/asm/unistd_64.h and /usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/unistd_64.h: These are private kernel headers. They exist for the sake of people trying to build kernel modules that are developed separately from the kernel proper. It's possible that one of them is textually the same as /usr/include/x86_64-linux-gnu/asm/unistd_64.h, but that is not something you should rely on.
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/.unistd_64.h.cmd: This is not a header file at all; it is used by the Linux kernel's build system.
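As a small illustration of the advice above, a userspace program can include <sys/syscall.h> and pass a SYS_ name to syscall(2). The wrapper name raw_getpid is made up for this sketch, and the code is Linux-specific.

```c
#define _DEFAULT_SOURCE
#include <sys/syscall.h> /* SYS_* system call names */
#include <unistd.h>      /* syscall(), getpid() */

/* Ask the kernel for the process ID through the raw system call number.
   raw_getpid is a hypothetical name used only for this example. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The result should match what the ordinary getpid() wrapper returns, and no unistd_64.h path appears anywhere in the program.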
The first file, which resides in /usr/include (the system include directory) is the one you would include.
The others reside in /usr/src, which is a source code directory that should not be referenced.

Finding file type in Linux programmatically

I am trying to determine the type of a file (.pdf, .doc, .docx, etc.) programmatically, without using a shell command. I have to build an application that blocks access to files of a particular extension. I have already hooked sys_call_table in an LKM, and now I want the LKM to check the file type whenever an open/read system call is triggered.
I know that the current pointer gives access to the current process structure, that we can use it to find the file name stored in the dentry structure, and that on Linux a file type is identified by a magic number stored in the first bytes of the file. But I don't know how to find the file type or exactly where it is stored.
Linux doesn't "store" the file type for its files (unlike Mac OS' resource fork, which I think is the most well-known platform to do this). Files are just named streams of bytes, they have no structure implied by the operating system.
Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.
There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.
In other words, I mean you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as large a set of types as possible, it might not be a very good idea. :/
"Actually I have to make an application which blocks access to files of a particular extension."
That's a flawed requirement. If you check by file extension, you'll miss files that don't use the extension, which is quite common on Linux since it does not rely on file extensions.
The conventional way of detecting a file's type on Linux is by its magic number. The shell command file is basically just a wrapper around libmagic, so you have the option of linking against that library.
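To make the magic-number idea concrete, here is a minimal userspace sketch that tests a single type by hand: PDF files begin with the bytes "%PDF-". The function name has_pdf_magic is hypothetical, and this is ordinary stdio code, not kernel code; an LKM would have to read the leading bytes by other means.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file starts with the PDF signature "%PDF-",
   0 if it does not, and -1 if the file cannot be opened.
   has_pdf_magic is a hypothetical helper name. */
int has_pdf_magic(const char *path)
{
    static const char magic[] = "%PDF-";
    const size_t len = sizeof magic - 1; /* exclude the terminating NUL */
    char head[8];

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(head, 1, len, f);
    fclose(f);

    /* A file shorter than the signature cannot match. */
    return n == len && memcmp(head, magic, len) == 0;
}
```

The check ignores the file name entirely, so a PDF renamed to report.txt is still detected, while an empty file called report.pdf is not.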

Traverse Directory Depth First

I need to traverse a directory depth first without using boost, but I have not been able to find a good tutorial on how to do this. I know how to list the files of a directory, but I'm not sure how to go about this one. This lists the files of a directory:
Use the ftw or nftw functions if your system has them. Or, grab the fts_* functions from, e.g., the OpenBSD source tree and study those, or use them directly. This problem is harder than you might think, because you can run out of file descriptors when recursing through deep filesystem hierarchies.
Make sure you understand recursion.
I assume you have a function walk(dir_path) which can list all files (and directories) in the dir_path directory. You need to modify it so that it calls itself (recursively) for each directory you find. That's it.
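A recursive walk(dir_path) along those lines could be sketched with opendir/readdir as below. The return value (a count of visited entries) is an addition made for this example, and entries whose full path would exceed PATH_MAX are simply skipped.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Depth-first walk: print each entry and descend into a subdirectory
   before moving on to its next sibling. Returns the number of entries
   visited, or -1 if dir_path cannot be opened. */
long walk(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Skip the self and parent links, or the recursion never ends. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        char path[PATH_MAX];
        if (snprintf(path, sizeof path, "%s/%s", dir_path, ent->d_name)
                >= (int)sizeof path)
            continue; /* path too long for this sketch */

        puts(path);
        count++;

        struct stat st;
        /* lstat does not follow symlinks, which avoids link cycles. */
        if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode)) {
            long sub = walk(path);
            if (sub > 0)
                count += sub;
        }
    }
    closedir(dir);
    return count;
}
```

Each level of recursion keeps one DIR handle open until its subtree is finished, which is exactly how a very deep hierarchy can exhaust file descriptors; nftw avoids this by letting the caller cap the number of descriptors it uses.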

safely reading directory contents

Is it safe to read directory entries via readdir() or scandir() while files are being created or deleted in this directory? Should I prefer one over the other?
EDIT: When I say "safe" I mean entries returned by these functions are valid and can be operated without crashing the program.
Thanks.
It depends on what you mean by "safe". They are safe in the sense that they should not crash your program. However, if you are creating/deleting files as you are reading/scanning that directory, the set of files you get back might not be up-to-date.
When reading/scanning a directory for directory entries, the file pointer (a directory is just a special type of file), moves forward. However, depending upon the file system, there may be nothing to prevent new files from being created in an empty directory entry slot behind your file pointer. Consequently, newly added directory entries may not be immediately detected by readdir()/scandir(). Similar reasoning applies for file deletion / directory entry removal.
Hope this helps.
What's your definition of safety? You won't crash the system, and readdir/scandir won't crash your program. Although they might give you data that is immediately out of date.
The usual semantics for reading a directory are that if you read the directory from beginning to end, you will see all of the files that didn't change during that time exactly once, and you will see files that were created or deleted during that time at most once.
On UNIX-like systems readdir() and scandir() are library functions implemented on top of the same underlying system call (getdents() in Linux, getdirentries() in BSD). So there shouldn't be much difference in their behavior in this regard. I think readdir() is a bit more standard, and therefore will be more portable.

Suppress warning: the use of `mktemp' is dangerous

How can I suppress the following warning from the GCC linker:
warning: the use of 'mktemp' is dangerous, better use 'mkstemp'
I do know that it's better to use mkstemp() but for some reason I have to use mktemp() function.
I guess you need the path because you pass it to a library that only accepts path names as argument and not file descriptors or FILE pointers. If so you can create a temp dir with mkdtemp and place your file there, the actual name is then unimportant because the path is already unique because of the directory.
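A rough sketch of the mkdtemp approach follows. The function name make_temp_path, the /tmp location and the file name "data" are all assumptions chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Create a private directory under /tmp and build a file path inside it.
   On success, dir holds the directory name, path holds the file name,
   and 0 is returned. make_temp_path is a hypothetical helper name. */
int make_temp_path(char *dir, size_t dir_size, char *path, size_t path_size)
{
    /* mkdtemp replaces the trailing XXXXXX in place, so dir must be
       a writable buffer, never a string literal. */
    if (snprintf(dir, dir_size, "/tmp/exampleXXXXXX") >= (int)dir_size)
        return -1;
    if (mkdtemp(dir) == NULL)
        return -1;

    /* The directory is unique, so a fixed file name inside it is fine. */
    if (snprintf(path, path_size, "%s/data", dir) >= (int)path_size)
        return -1;
    return 0;
}
```

The resulting path can then be handed to the library that insists on a file name. The caller remains responsible for removing both the file and the directory afterwards.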
If you have to use mktemp then there is not anything you can do to suppress that warning short of removing the section that uses mktemp from libc.so.6.
Why do you have to use mktemp?
Two things:
mktemp is not a standard function
the warning is a special one: libc carries the message in a .gnu.warning.mktemp section, and the linker prints it whenever mktemp is linked in
Use a native OS API if you really need to write to the disk. Or mkstemp() as suggested.
If you are statically linking the runtime, then the other option is to write your own version of mktemp in an object file. The linker should prefer your version over the runtime version.
Edit: Thanks to Jason Coco for pointing out a major misunderstanding that I had in mktemp and its relatives. This one is a little easier to solve now. Since the linker will prefer a version in an object file, you just need to write mktemp in terms of mkstemp.
The only difficulties are cleaning up the file descriptors that mkstemp will return to you and making everything thread safe. You could use a static array of descriptors and an atexit-registered function for cleanup if you can put a cap on how many temporary files you need. If not, just use a linked list instead.
Use mkstemp:
int fd = mkstemp(template);
After this call, template will be replaced with the actual file name. You will have the file descriptor and the file's path.
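For a slightly fuller picture, here is a sketch that creates the file, writes to it and closes the descriptor. The helper name write_temp is hypothetical. Note that the template must be a writable character array ending in XXXXXX; passing a string literal would make mkstemp write into read-only memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a temporary file from tmpl, write len bytes of msg to it and
   close it. Returns 0 on success, -1 on error. On success tmpl holds
   the real file name and the caller is expected to unlink it later.
   write_temp is a hypothetical helper name. */
int write_temp(char *tmpl, const char *msg, size_t len)
{
    int fd = mkstemp(tmpl); /* tmpl now holds the actual file name */
    if (fd == -1)
        return -1;

    ssize_t written = write(fd, msg, len);
    if (close(fd) != 0 || written != (ssize_t)len) {
        unlink(tmpl); /* do not leave a partial file behind */
        return -1;
    }
    return 0;
}
```

A typical call declares the template as an array, for example char name[] = "/tmp/exampleXXXXXX";, and then passes name to write_temp.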
