Recursively copying a directory using native C code - c

I was looking to implement the behavior of the Linux command cp -Rf <src_dir>/* <dst_dir>, which copies everything inside 'src_dir' into 'dst_dir' recursively.
I looked online for help and found a few solutions, but they all did one of the following things that I do not want to do:
Using rename(src_dir, dst_dir), which essentially 'moves' the contents rather than copying them.
I need to keep the contents of 'src_dir' intact.
Opening and reading each file in 'src_dir'.
I would like to do this without opening the files and reading their contents.
Can the above be achieved with C without using system("cp -Rf <src_dir>/* <dst_dir>")?
(Edited the original question after realising from a few initial comments that the original wording was confusing.)

I was looking to implement the behavior of the Linux system commands:
1) cp, which copies everything inside 'src_dir' into 'dst_dir' recursively.
Can the above be achieved using C (and no Linux system calls)?
Of course not; remember that a system call is the main way an application can interact with the Linux kernel. Look at syscalls(2) for an exhaustive list of system calls (you are likely to need stat(2)...). Do not confuse system calls with calls to the C library system(3) function (which uses fork(2), execve(2), waitpid(2), etc.).
However, you might be interested in nftw(3) (which of course is implemented above several system calls). Look also into opendir(3) & readdir(3).
You cannot change anything on the file systems without going through system calls.
Addenda
(the question has changed to)
Can the above be achieved with C without using system("cp -Rf <src_dir>/* <dst_dir>")?
Yes: use nftw(3) first to scan the file tree (or work with opendir(3)/readdir(3), etc.), detect the directories that need to be made, and create them with mkdir(2); then explicitly copy the contents of each file (e.g. using the stdio(3) functions, or the open(2), read(2), write(2), close(2) system calls). Notice that open(2) (called by fopen(3)) is required to read or write any content of a file, even when using mmap(2). Of course, to copy a file you need to open the source, open (create) its fresh copy, read from one and write to the other in a loop, and close both source and destination files once the copy is done. BTW, a given process can have several hundreds (or thousands) of open file descriptors (see setrlimit(2)...), and each opendir(3) also consumes one.
(Your number of open file descriptors (most of them being directories, via opendir(3)) is bounded by the file tree depth, so in practice it is likely to be small, less than a hundred; and with clever caching around nftw(3) you could even avoid that.)
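For the directory part, here is a minimal sketch along those lines (the srcroot/dstroot paths and the visit callback name are just illustrative, not a complete cp): nftw(3) walks the source tree in pre-order, each directory is recreated under the destination with mkdir(2), and regular files are merely noted; the actual copy loop is sketched further below.

/* Sketch: recreate the directory structure of srcroot under dstroot. */
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static const char *srcroot = "src_dir";
static const char *dstroot = "dst_dir";

static int visit(const char *path, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    char target[4096];
    (void)ftwbuf;
    /* map src_dir/sub/file -> dst_dir/sub/file */
    snprintf(target, sizeof target, "%s%s", dstroot, path + strlen(srcroot));

    if (typeflag == FTW_D)              /* directory: recreate it */
        mkdir(target, sb->st_mode & 07777);
    else if (typeflag == FTW_F)         /* regular file: needs its contents copied */
        printf("would copy %s -> %s\n", path, target);  /* see the copy loop below */
    return 0;                           /* keep walking */
}

int main(void)
{
    return nftw(srcroot, visit, 64, FTW_PHYS) == 0 ? 0 : 1;
}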
BTW, cp is part of GNU coreutils which is free software, so you can study its source code. And you might also use strace(1) to understand what cp is doing.
Read Advanced Linux Programming and also Operating Systems: Three Easy Pieces (both are freely downloadable, as chapters in PDF).
There is no way to copy the content of one file with a single system call; you need a loop. But look also into the Linux-specific copy_file_range(2), which I don't recommend; better to loop on POSIX read(2) and write(2).
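As an illustration of that loop (a sketch only, not the coreutils implementation; the copy_file name and the 64 KiB buffer size are arbitrary choices):

#include <fcntl.h>
#include <unistd.h>

static int copy_file(const char *src, const char *dst, mode_t mode)
{
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, mode);
    if (out < 0) { close(in); return -1; }

    char buf[65536];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                        /* write(2) may be partial */
            ssize_t w = write(out, buf + off, n - off);
            if (w < 0) { close(in); close(out); return -1; }
            off += w;
        }
    }
    close(in);                                   /* close both once done */
    close(out);
    return n < 0 ? -1 : 0;                       /* n < 0 means read(2) failed */
}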

Related

Is it possible to fake a file stream, such as stdin, in C?

I am working on an embedded system with no filesystem, and I need to execute programs that take input data from files specified via command-line arguments or directly from stdin.
I know it is possible to bake in the file data with the binary using the method from this answer: C/C++ with GCC: Statically add resource files to executable/library, but then I would need to rewrite all the programs to access the data in a new way.
Is it possible to bake-in a text file, for example, and access it using a fake file pointer to stdin when running the program?
If your system is an OS-less bare-metal system, then your C library will have "retargeting" stubs or hooks that you need to implement to hook the library into the platform. These will typically include low-level I/O functions such as open(), read(), write(), seek(), etc. You can implement these as you wish to provide the basic stdin, stdout and stderr streams (in POSIX and most other implementations they have the fixed file descriptors 0, 1 and 2 respectively, and do not need to be explicitly opened), file I/O, and in this case the management of an arbitrary memory block.
open(), for example, will be passed a file or device name (the string may be interpreted any way you wish) and will return a file descriptor. You might perhaps recognise "cfgdata:" as a device name to access your "memory file", and you would return a unique descriptor that is then passed into read(). You use the descriptor to reference data for managing the stream; probably little more than an index that is incremented by the number of characters read. The same index may be set directly by the seek() implementation.
Once you have implemented these functions, the higher level stdio functions or even C++ iostreams will work normally for the devices or filesystems you have supported in your low level implementation.
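As a rough illustration (an assumption-laden sketch: it presumes a newlib-style library where the low-level read hook is called _read; the exact stub names and signatures depend on your toolchain, and the cfg_data buffer is made up for the example):

#include <string.h>

static const char cfg_data[] = "param1=42\nparam2=7\n";  /* baked-in "file" contents */
static size_t cfg_pos;                                   /* current read position    */

int _read(int fd, char *ptr, int len)
{
    if (fd != 0)                          /* only stdin is backed by memory here */
        return -1;
    size_t left = sizeof cfg_data - 1 - cfg_pos;
    size_t n = (size_t)len < left ? (size_t)len : left;
    memcpy(ptr, cfg_data + cfg_pos, n);
    cfg_pos += n;
    return (int)n;                        /* 0 means end of "file" */
}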
As commented, you could use the POSIX fmemopen function. You'll need a libc providing it, e.g. musl-libc or possibly glibc. BTW for benchmarking purposes you might install some tiny Linux-like OS on your hardware, e.g. uclinux
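A minimal sketch of the fmemopen approach (the blob contents are just an example):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

static const char blob[] = "line one\nline two\n";

int main(void)
{
    FILE *in = fmemopen((void *)blob, sizeof blob - 1, "r");
    if (!in) return 1;

    char line[128];
    while (fgets(line, sizeof line, in))   /* the same code you would use on a file */
        fputs(line, stdout);
    fclose(in);
    return 0;
}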

How to write a tail -f like C program

I want to implement a C program in Linux (Ubuntu distro) that mimics tail -f. Note that I do not want to actually call tail -f from my C code, rather implement its behaviour. At the moment I can think of two ways to implement it.
When the program is called, I seek to the end of the file. Afterwards, I read to the end of the file periodically and print whatever I read, if it is not empty.
The second method, which can potentially be more efficient, is to again seek to the end of the file, but this time "somehow" listen for changes to that file and read to the end of the file only if it has changed.
With that being said, my question is how to implement the second approach, and whether it is worth the effort. Also, are these the only two options?
NOTE: Thanks for the comments, the question is changed based on them.
There is no standardized mechanism for monitoring changes to a file, so you'll need to implement a "polling" solution anyway (that is, when you hit the end of file, wait a short amount of time and try again.)
On Linux, you can use the inotify family of system calls, but be aware that it won't always work. It doesn't work for special files or remote filesystems, for example, and it may not work for some local filesystems. It is complicated in the case of symlinks. And so on. There is a Windows equivalent, but I believe it suffers from some of the same issues.
So even if you use a notification system, you'll need the polling solution as a backup, and since OS notifications are not guaranteed to be reliable (that is, if the system is under load, notifications might be dropped), you'll need to poll on timeout even if you are using a notification system.
You might want to take a look at the implementation of the GNU tail utility (http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c) to see how the special cases are handled.
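A rough sketch of the notification approach on Linux, using the inotify API (error handling and the polling fallback discussed above are omitted for brevity):

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/inotify.h>

int main(int argc, char **argv)
{
    if (argc != 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    int in = inotify_init();
    if (fd < 0 || in < 0) { perror("open/inotify_init"); return 1; }

    lseek(fd, 0, SEEK_END);                          /* start at the end of the file */
    inotify_add_watch(in, argv[1], IN_MODIFY);

    char events[4096], buf[4096];
    for (;;) {
        /* blocks until the watched file is modified */
        if (read(in, events, sizeof events) <= 0)
            break;
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)  /* print the newly appended data */
            fwrite(buf, 1, n, stdout);
        fflush(stdout);
    }
    return 0;
}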
You can implement the requirement with the following steps:
1) fopen the file in 'a+' mode;
2) select on the opened file descriptor (you need to convert from FILE * to a file descriptor, e.g. with fileno()) and do the read.
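For comparison, here is a minimal sketch of the plain polling approach (the first option described in the question): seek to the end once, then periodically try to read and print whatever has been appended.

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) return 1;
    FILE *fp = fopen(argv[1], "r");
    if (!fp) { perror("fopen"); return 1; }

    fseek(fp, 0, SEEK_END);                    /* skip the existing contents */
    char buf[4096];
    for (;;) {
        size_t n = fread(buf, 1, sizeof buf, fp);
        if (n > 0) {
            fwrite(buf, 1, n, stdout);         /* print whatever was appended */
            fflush(stdout);
        } else {
            clearerr(fp);                      /* clear EOF so the next fread retries */
            sleep(1);                          /* poll interval; tune as needed */
        }
    }
}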

What is the call for the "lp filename" command in linux in a c program?

I want to use the above command in a C program on Linux.
From my searching so far, I know there are system() calls and exec() calls that one can make from code. Is there any other way besides using exec or system?
Using system() isn't ideal for a multi-threaded server; what do you suggest?
First make sure you have lp installed at that path (use which lp in the terminal).
You may want to understand the lp command. It's a classic unix command to send data to the "line printer", but it works with e.g. .pdf files too nowadays, depending on your printer system. However, it isn't necessarily installed. Sometimes, lpr may work better, too.
See also: http://en.wikipedia.org/wiki/Lp_%28Unix%29
The second part is about executing unix commands. system is the easiest (also the easiest to introduce a security issue into your program!), using fork and execve is one of a number of alternatives (have a look at man execve).
Yes, this code is OK. It will print the file named filename, provided that lp is found at /usr/bin and the file filename exists. You can add checks for that if you want your program to report when something goes wrong; other than that, it will do exactly what you expect.
Doing system("lp filename"); would work if you don't mind your program blocking on that system() call until lp finishes.
You could also use popen(3) (instead of system(3)). But you always need to fork a process (both system and popen are calling fork(2)). BTW, if you have a CUPS server you might use some HTTP client protocol library like libcurl but that is probably inconvenient. Better popen or system an lp (or lpr) command.
BTW, printing is a relatively slow and complex operation, so the overhead of forking a process is negligible (I believe you could do that in a server; after all people usually don't print millions of pages). Some libraries might give you some API (e.g. QPrinter in Qt).
Notice that the lp (or lpr) command is not actually doing the printing, it is simply interacting with some print daemon (cupsd, lpd ...) and its spooling system. See e.g. CUPS. So running the lp or lpr command is reasonably fast (much faster than the printing itself), generally a few milliseconds (certainly compatible with a multi-threaded or server application).
Quite often, the command passed to popen or system is constructed (e.g. with snprintf(3) etc...), e.g.
char cmdbuf[128];
snprintf (cmdbuf, sizeof(cmdbuf), "lp %s", filename);
but beware of code injection (think about filename containing foo; rm -rf $HOME) and of buffer overflow
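One way to sidestep both problems is to avoid the shell entirely: fork(2) and exec the lp binary directly, passing the file name as a separate argument so it is never subject to shell expansion. A sketch (assuming lp lives at /usr/bin/lp, as discussed above):

#include <unistd.h>
#include <sys/wait.h>

static int print_file(const char *filename)
{
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                              /* child: become lp */
        execl("/usr/bin/lp", "lp", filename, (char *)NULL);
        _exit(127);                              /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1; /* parent: wait for lp to finish */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}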
Of course, notice that library functions like system, popen, fopen are generally built above existing syscalls(2). Read Advanced Linux Programming

Using `read` system call on a directory

I was looking at an example in K&R 2 (8.6 Example - Listing Directories). It is a stripped-down version of the Linux command ls or Windows' dir. The example shows an implementation of functions like opendir and readdir. I typed the code in word for word, but it still doesn't work. All it does is print a dot (for the current directory) and exit.
One interesting thing I found in the code (in the implementation of readdir) was that it calls system calls like open and read on a directory. Something like this:
int fd, n;
char buf[1000], *bufp = buf;

fd = open("dirname", O_RDONLY, 0);       /* open the directory itself */
n = read(fd, bufp, sizeof(buf));         /* read its raw bytes */
write(STDOUT_FILENO, bufp, n);           /* print whatever was read */
When I run this code I get no output even when the folder name "dirname" has some files in it.
Also, the book says that the implementation is for Version 7 and System V UNIX systems. Is that the reason why it is not working on Linux?
Here is the code: http://ideone.com/tw8ouX
So does Linux not allow read system calls on directories? Or something else is causing this?
In Version 7 UNIX, there was only one unix filesystem, and its directories had a simple on-disk format: array of struct direct. Reading it and interpreting the result was trivial. A syscall would have been redundant.
In modern times there are many kinds of filesystems that can be mounted by Linux and other unix-like systems (ext4, ZFS, NTFS!), some of which have complex directory formats. You can't do anything sensible with the raw bytes of an arbitrary directory. So the kernel has taken on the responsibility of providing a generic interface to directories as abstract objects. readdir is the central piece of that interface.
Some modern unices still allow read() on a directory, because it's part of their history. Linux history began in the 90's, when it was already obvious that read() on a directory was never going to be useful, so Linux has never allowed it.
Linux does provide a readdir syscall, but it's not used very much anymore, because something better has come along: getdents. readdir only returns one directory entry at a time, so if you use the readdir syscall in a loop to get a list of files in a directory, you enter the kernel on every loop iteration. getdents returns multiple entries into a buffer.
readdir is, however, the standard interface, so glibc provides a readdir function that calls the getdents syscall instead of the readdir syscall. In an ordinary program you'll see readdir in the source code, but getdents in the strace. The C library is helping performance by buffering, just like it does in stdio for regular files when you call getchar() and it does a read() of a few kilobytes at a time instead of a bunch of single-byte read()s.
You'll never use the original unbuffered readdir syscall on a modern Linux system unless you run an executable that was compiled a long time ago, or go out of your way to bypass the C library.
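For reference, the portable way to enumerate a directory today is the opendir(3)/readdir(3) interface from the C library (which, on Linux, sits on top of getdents). A minimal listing program:

#include <stdio.h>
#include <dirent.h>

int main(int argc, char **argv)
{
    DIR *dp = opendir(argc > 1 ? argv[1] : ".");
    if (!dp) { perror("opendir"); return 1; }

    struct dirent *de;
    while ((de = readdir(dp)) != NULL)   /* one entry per call, buffered by the C library */
        puts(de->d_name);

    closedir(dp);
    return 0;
}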
In fact, Linux doesn't allow read() on directories. See the man page and search for the errno value EISDIR; you will find:
The read() and pread() functions shall fail if ...
The fildes argument refers to a directory and the implementation does not allow the directory to be read using read() or pread(). The readdir() function should be used instead.
Other UNIXes allow it nevertheless.

How to know if a file is being copied?

I am currently trying to check whether the copy of a file from one directory to another is done.
I would like to know if the target file is still being copied.
So I would like to get the number of file descriptors opened on this file.
I am using the C language and can't really find a way to solve this problem.
If you have control of it, I would recommend using the copy-move idiom on the program doing the copying:
cp file1 otherdir/.file1.tmp
mv otherdir/.file1.tmp otherdir/file1
The mv just changes some filesystem entries and is atomic and very fast compared to the copy.
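The same idiom can be applied directly from C if you control the copier (a sketch; the publish name and the temporary-path convention are just for illustration):

#include <stdio.h>

/* Copy the data into tmp_path first (e.g. with a read/write loop), then
 * atomically move it into place; readers never see a half-written file. */
int publish(const char *tmp_path, const char *final_path)
{
    /* ... fill tmp_path here ... */
    return rename(tmp_path, final_path);   /* atomic on the same filesystem */
}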
If you're able to open the file for writing, there's a good chance that the OS has finished the copy and has released its lock on it. Different operating systems may behave differently for this, however.
Another approach is to open both the source and destination files for reading and compare their sizes. If they're of identical size, the copy has very likely finished. You can use fseek() and ftell() to determine the size of a file in C:
fseek(fp, 0L, SEEK_END);   /* move to the end of the file */
long sz = ftell(fp);       /* the offset here equals the file size in bytes */
On Linux, try the lsof command, which lists all of the open files on your system.
edit 1: The only C-level facility that comes to mind is the fstat function. You might be able to use that with the struct's st_mtime (last modification time) field; once that value stops changing (for, say, a period of 10 seconds), you could assume that the file copy operation has stopped.
edit 2: Also, on Linux, you can traverse /proc/[pid]/fd to see which files are open. The files in there are symlinks, but C's readlink() function can tell you the path each one points to, so you can see whether your file is still open. Using getpid(), you would know the process ID of your program (if you are doing a file copy from within your program) to know where to look in /proc.
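A rough sketch of that /proc approach (Linux-specific; the pid_has_open name and buffer sizes are arbitrary):

#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <unistd.h>

int pid_has_open(pid_t pid, const char *path)
{
    char dir[64], link[128], target[4096];
    snprintf(dir, sizeof dir, "/proc/%d/fd", (int)pid);

    DIR *dp = opendir(dir);
    if (!dp) return 0;                       /* no such process, or no access */

    struct dirent *de;
    int found = 0;
    while (!found && (de = readdir(dp)) != NULL) {
        if (de->d_name[0] == '.') continue;  /* skip "." and ".." */
        snprintf(link, sizeof link, "%s/%s", dir, de->d_name);
        ssize_t n = readlink(link, target, sizeof target - 1);
        if (n > 0) {
            target[n] = '\0';                /* readlink does not NUL-terminate */
            if (strcmp(target, path) == 0)
                found = 1;
        }
    }
    closedir(dp);
    return found;
}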
I think your basic mistake is trying to synchronize a C program with a shell tool/external program that's not intended for synchronization. If you have some degree of control over the program/script doing the copying, you should modify it to perform advisory locking of some sort (preferably fcntl-based) on the target file. Then your other program can simply block on acquiring the lock.
If you don't have any control over the program performing the copy, the only solutions depend on non-portable hacks like lsof or Linux inotify API.
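If both sides cooperate, the fcntl-based idea can be as simple as the following sketch: the copier holds a write lock (F_WRLCK) on the target while copying, and the consumer blocks on F_SETLKW until it can take a read lock. The wait_until_copied name is just illustrative.

#include <fcntl.h>
#include <unistd.h>

int wait_until_copied(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct flock fl = {0};
    fl.l_type   = F_RDLCK;        /* read lock: conflicts with the writer's F_WRLCK */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;              /* 0 means "the whole file" */

    int rc = fcntl(fd, F_SETLKW, &fl);   /* W = wait until the lock can be taken */
    close(fd);                           /* closing also drops the lock */
    return rc;
}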
(This answer makes the big, big assumption that this will be running on Linux.)
The C source code of lsof, a tool that tells which programs currently have an open file descriptor to a specific file, is freely available. However, just to warn you, I couldn't make any sense out of it. There are references to reading kernel memory, so to me it's either voodoo or black magic.
That said, nothing prevents you from running lsof through your own program. Running third-party programs from your own program is normally something you try to avoid for several reasons, like security (if a rogue user changes lsof for a malicious program, it will run with your program's privileges, with potentially catastrophic consequences) but inspecting the lsof source code, I came to the conclusion that there's no public API to determine which program has which file open. If you're not afraid of people changing programs in /usr/sbin, you might consider this.
#define _GNU_SOURCE   /* for asprintf */
#include <stdio.h>
#include <stdlib.h>

int isOpen(const char* file)
{
    char* command;
    // BE AWARE THAT THIS WILL NOT WORK IF THE FILE NAME CONTAINS A DOUBLE QUOTE
    // OR IF IT CAN SOMEHOW BE ALTERED THROUGH SHELL EXPANSION.
    // You should either try to fix it yourself, or use a function of the `exec`
    // family that won't trigger shell expansion.
    // It would be an EXTREMELY BAD idea to call `lsof` without an absolute path
    // since it could result in another program being run. If this is not where
    // `lsof` resides on your system, change it to the appropriate absolute path.
    if (asprintf(&command, "/usr/sbin/lsof \"%s\"", file) < 0)
        return -1;
    int result = system(command);
    free(command);
    return result;
}
If you also need to know which program has your file open (presumably cp?), you can use popen to read the output of lsof in a similar fashion. popen descriptors behave like fopen descriptors, so all you need to do is fread them and see if you can find your program's name. On my machine, lsof output looks like this:
$ lsof document.pdf
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
SomeApp 873 felix txt REG 14,3 303260 5165763 document.pdf
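A hedged sketch of that popen variant (it assumes lsof at /usr/sbin/lsof and that the copier's command name is literally "cp"; the same quoting caveats as in the function above apply):

#include <stdio.h>
#include <string.h>

int copier_still_running(const char *file)
{
    char command[4200];
    snprintf(command, sizeof command, "/usr/sbin/lsof \"%s\"", file);

    FILE *p = popen(command, "r");
    if (!p) return -1;

    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, p))
        if (strncmp(line, "cp ", 3) == 0)   /* a line whose COMMAND column is "cp" */
            found = 1;
    pclose(p);
    return found;
}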
As poundifdef mentioned, the fstat() function can give you the current modification time. But fstat also gives you the size of the file.
Back in the dim dark ages of C, when I was monitoring files being copied by various programs I had no control over, I always:
Waited until the target file size was >= the source size, and
Waited until the target modification time was at least N seconds older than the current time, N being a number such as 5, set larger if experience showed that was necessary. Yes, 5 seconds seems extreme, but it is safe.
If you don't know what the source file is, then the only real choice you have is #2, but use a larger N to allow for worst-case network and local CPU delays, with a healthy safety factor.
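A sketch of that second heuristic using stat(2) (the function name and the one-second polling interval are arbitrary; N is the "quiet" period discussed above):

#include <unistd.h>
#include <sys/stat.h>

void wait_for_quiet_file(const char *path, int quiet_seconds)
{
    struct stat prev = {0}, cur;
    int stable = 0;

    while (stable < quiet_seconds) {
        if (stat(path, &cur) != 0) {          /* not there yet: keep waiting */
            stable = 0;
        } else if (cur.st_size == prev.st_size && cur.st_mtime == prev.st_mtime) {
            stable++;                         /* unchanged for one more second */
        } else {
            stable = 0;                       /* still changing */
            prev = cur;
        }
        sleep(1);
    }
}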
Using the Boost libraries will solve the issue:
boost::filesystem::fstream fileStream(filePath, std::ios_base::in | std::ios_base::binary);
if (fileStream.is_open()) {
    // not being copied (the open succeeded)
} else {
    // wait, the file is still being copied
}
