Moving a file on Linux in C

Platform: Debian Wheezy 3.2.0-4-686-pae
Compiler: GCC (Debian 4.7.2-5) 4.7.2 (Code::Blocks)
I want to move a file from one location to another. Nothing complex like moving to different drives or to different file systems. I know the "standard" way to do this would be simply copying the file and then removing the original. But I want some way of preserving the file's ownership, mode, last access/modification times, etc. I am assuming that I will have to copy the file and then edit the new file's ownership, mode, etc. afterwards, but I have no idea how to do this.

The usual way to move a file in C is to use rename(2), which can sometimes fail.
If you cannot use the rename(2) syscall (e.g. because source and target are on different filesystems), you have to query the size, permissions and other metadata of the source file with stat(2); copy the data using open(2), a loop of read(2) and write(2) (with a buffer of several kilobytes), and close(2); and then copy the metadata using chmod(2), chown(2), utime(2). You might also care about copying extended attributes using getxattr(2), setxattr(2), listxattr(2). You could also in some cases use sendfile(2), as commented by David C. Rankin.
And if the source and target are on different filesystems, there is no way to make the move atomic and avoid race conditions (So using rename(2) is preferable when possible, because it is atomic according to its man page). The source file can always be modified (by another process) during the move operations...
So a practical way to move files is to first try doing a rename(2), and if that fails with EXDEV (when oldpath and newpath are not on the same mounted filesystem), then you need to copy bytes and metadata. Several libraries provide functions doing that, e.g. Qt QFile::rename.
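As a rough illustration of the "rename first, copy on EXDEV" approach described above, here is a minimal sketch. The names move_file and copy_with_metadata are made up for this example, extended attributes are not copied, and failure of fchown(2) with EPERM is tolerated on the assumption that the caller may not be root. Treat it as a starting point, not as what mv(1) actually does.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy the bytes of src into dst, then apply the
 * owner, mode and timestamps that fstat(2) reported for src. */
static int copy_with_metadata(const char *src, const char *dst)
{
    char buf[64 * 1024];
    struct stat st;
    struct timespec times[2];
    int in, out, rc = -1;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            break;                      /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the read */
            goto done;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted: retry the write */
                goto done;
            }
            off += w;                   /* write(2) may be partial */
        }
    }
    /* chown before chmod, since chown may clear set-user-ID bits.
     * Assumption: EPERM (caller is not privileged) is acceptable. */
    if (fchown(out, st.st_uid, st.st_gid) < 0 && errno != EPERM)
        goto done;
    if (fchmod(out, st.st_mode & 07777) < 0)
        goto done;
    times[0] = st.st_atim;              /* last access */
    times[1] = st.st_mtim;              /* last modification */
    if (futimens(out, times) < 0)
        goto done;
    rc = 0;
done:
    close(in);
    if (close(out) < 0)
        rc = -1;
    if (rc < 0)
        unlink(dst);                    /* do not leave a partial copy */
    return rc;
}

/* Try the atomic rename(2) first; copy and delete only on EXDEV. */
int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    if (copy_with_metadata(src, dst) < 0)
        return -1;
    return unlink(src);
}
```

A caller would write something like move_file("/tmp/a.txt", "/mnt/usb/a.txt") and inspect errno when it returns -1. Note that the fallback path is not atomic, exactly as explained above.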
Read Advanced Linux Programming - and see syscalls(2) - for more (and also try to strace some mv command to understand what it is doing). That book is freely and legally downloadable (so you could find several copies on the Web).
The /bin/mv command (see mv(1)) is part of GNU coreutils which is free software. You could either study its source code, or use strace(1) to understand what that command does (in terms of syscalls(2)). In some open source Unix shells like sash or busybox, mv might be a shell builtin. See also path_resolution(7) and glob(7).
There are subtle corner cases (imagine another process or pthread doing some file operations on the same filesystem, directory, or files). Read some operating system textbook for more.
Using a mix of snprintf(3), system(3), mv(1) could be tricky if the file name contains weird characters such as tabs or newlines, or starts with an initial -. See errno(3).

If the original and new location for the file are on the same filesystem then a "move" is conceptually identical to a "rename."
#include <stdio.h>
int rename(const char *oldname, const char *newname);

Related

What is a 'whiteout' (S_IFWHT) in Unix?

One of the possible file types that can be obtained using stat(2) is S_IFWHT, also called a whiteout. What is it?
The official Linux kernel contains no such thing. On UNIX systems where it does exist, and possibly in some unofficial patches for Linux, it's a type of file that stops further lookup for a file but reports that it doesn't exist. It's useful with union and overlay filesystems, to be able to remove files that exist in the base image. The Linux kernel's overlayfs does have whiteouts, but they're S_IFCHR files with major and minor number 0, not S_IFWHT.

unistd_64.h in Ubuntu

As far as I understand (and my knowledge here is limited), unistd_64.h contains the system call numbers. When I search for the file from the terminal, it shows more than one result under different directories, as below:
/usr/include/x86_64-linux-gnu/asm/unistd_64.h
/usr/src/linux-headers-3.5.0-23/arch/sh/include/asm/unistd_64.h
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/.unistd_64.h.cmd
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/unistd_64.h
I don't understand the difference between these files and the use of each file. And the file number 3 has .cmd, what does it mean?
If you are writing an ordinary C program that needs to know system call numbers, you should not use any of those headers. Instead, you should use <sys/syscall.h>. Your C program does not need to know the full pathname of this header; #include <sys/syscall.h> is all that is necessary. However, if you want to read it, it will be found somewhere in /usr/include, probably either /usr/include/sys/syscall.h or /usr/include/x86_64-linux-gnu/sys/syscall.h.
Now, I will explain the files you found:
/usr/include/x86_64-linux-gnu/asm/unistd_64.h: This is a header file that may be used internally by sys/syscall.h. You can read it, but do not include it directly in your program. It probably defines a whole bunch of names that begin with __NR_. Those names should never be used in an ordinary, "userspace" program: always use the names beginning with SYS_ instead.
/usr/src/linux-headers-3.5.0-23/arch/sh/include/asm/unistd_64.h and /usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/unistd_64.h: These are private kernel headers. They exist for the sake of people trying to build kernel modules that are developed separately from the kernel proper. It's possible that one of them is textually the same as /usr/include/x86_64-linux-gnu/asm/unistd_64.h, but that is not something you should rely on.
/usr/src/linux-headers-3.5.0-23-generic/arch/x86/include/generated/asm/.unistd_64.h.cmd: This is not a header file at all; it is used by the Linux kernel's build system.
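To make the advice about <sys/syscall.h> and the SYS_ names concrete, here is a small sketch that invokes a system call by number through syscall(2). The function name raw_getpid is made up for the example, and getpid was picked only because it takes no arguments.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Call getpid by its system call number, using the SYS_ name from
 * <sys/syscall.h> rather than an __NR_ name from a kernel header. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

The result should be the same value that the ordinary getpid(2) wrapper returns.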
The first file, which resides in /usr/include (the system include directory) is the one you would include.
The others reside in /usr/src, which is a source code directory that should not be referenced.

Linux- C program

I have recently (yesterday) started trying to learn Linux and to program on this OS. One interesting and probably easy problem I came across while surfing the net was something like this:
Consider a C program that takes a directory as an argument in the command line and calculates the sum of all the files' dimensions that are in the directory's tree.
Now, because I've been doing a lot of reading and researching in a short amount of time, all my knowledge is piled up in my brain, creating a cloud of confusion. If anyone could help me with the code, I'd be really thankful.
What you are asking is a basic task. It can be done in Linux, but it can also be done in Microsoft Windows with minor code tweaks if you are writing a program in C or C++. You would be writing code at a lower level compared to other ways of accomplishing what you want.
However, you don't need to write a program in C, which then requires you to compile it into an executable. Because what you are asking is a basic task, you might be able to do it with a bash shell script, which would be Linux-specific. If you wanted to do this in Windows, you would write a .bat file (the DOS scripting language) or use Windows PowerShell. I am not that familiar with Windows; I only mention it to help give you a general understanding for "all the knowledge piled up in your brain creating a cloud of confusion".
There is the windirstat program, which runs under Microsoft Windows. You can get it free from SourceForge, and I think it does mostly what you are asking. I am not sure if you can get source code for it.
For Linux there is kdirstat, whose source code you can get from
http://kdirstat.cvs.sourceforge.net/viewvc/kdirstat/
You can download it as a GNU tarball.
Look at how that program is written. It is C++, as you'll see from the .cpp files. It would be a good template to work from, and you can see what libraries it uses for file system functions. There are 21 .cpp files; look at kdirstatmain.cpp first.
For C/C++ code the start of execution is with the function int main(int argc, char *argv[]).
Regarding accomplishing this task with a bash shell script in Linux, the best I can tell you is to do a web search on bash shell scripting for Linux.
In Linux, we can quickly calculate the sum of all the files' dimensions in the directory's tree at the prompt with the du -sh . command. At the prompt, run man du to read about the disk usage command. Then consider looking for the source code of du to use as a template: study how du is implemented, and modify that approach to meet your needs.
linux du command source code
Use opendir(3) to "open" the directory. Since you are interested in learning how to program in GNU/Linux, start by typing man opendir in the terminal to read how this function works. The (3) in opendir(3) means that the help for this function can be found in the section 3 of the manpages. Notice, at the top of the page, that the manpage tells you which #includes you'll need.
If everything goes right, opendir(3) will return a DIR* object. To know which files or subdirectories it contains, you use this object with readdir(3). This should return a pointer of type struct dirent*. You can check the manual pages for details on the fields of this structure, but the most important for you will probably be d_type and d_name. A second call to this function will return the next entry. When it returns NULL, that either means you have read all files or an error occurred. To know which happened, you should check errno.
Here's a short example that lists all entries in /tmp:
#include <stdio.h>
#include <dirent.h>
#include <sys/types.h>

int main(void)
{
    DIR *dir;
    struct dirent *entry;

    dir = opendir("/tmp");
    if (dir == NULL) {
        perror("opendir");
        return 1;
    }
    while ((entry = readdir(dir)) != NULL) {
        printf("Found %s\n", entry->d_name);
    }
    /* You may want to check errno here to see if readdir returned
     * NULL because all files were read or because of some error;
     * but this is beyond the purposes of my example.
     */
    closedir(dir);
    return 0;
}
Now you have to process each entry. If it is a directory, you have to descend into it and read its contents. A recursive function will probably help you here. If it is a file, then you have at least two options:
Open it with fopen(3), then use fseek(3) to seek to the end of the file and ftell(3) to read the resulting offset, which is the size of the file in bytes;
Use stat(2) to get a structure with information on the file. Do not confuse it with stat(1). If you simply type man stat, you'll get information about the latter. To force man to read from section 2, type man 2 stat in the command line.
The first approach is certainly simpler. The second will require you to do a bit of reading on how stat(2) works. My advice: you should do it. Not only because it's more in the lines of Linux, but also because it gives you information that fseek(3) doesn't give. For instance, you can use stat(2) to see not only how many bytes the file contains, but how many bytes it occupies in the disk (like du does).
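To illustrate the difference between the two sizes that stat(2) reports, here is a small sketch. The helper name file_sizes is invented for this example; it relies on st_blocks being counted in 512-byte units, which is what the Linux man page documents.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: report the apparent size of a file (bytes of
 * content) and the space it occupies on disk (what du counts).
 * Returns 0 on success, -1 if stat(2) fails. */
int file_sizes(const char *path, off_t *apparent, off_t *on_disk)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    *apparent = st.st_size;
    *on_disk = (off_t)st.st_blocks * 512;   /* st_blocks is in 512-byte units */
    return 0;
}
```

For a sparse file the second number can be much smaller than the first; for a tiny file it is usually larger, because space is allocated in whole blocks.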
While reading the directory, you may stumble on other types of entries other than files and directories. stat(2) will probably help you figure the sizes of them as well. But you may want to simply ignore them for now.
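Putting the pieces above together, a recursive walk might look like the sketch below. The function name tree_size is made up; it uses lstat(2) so that symbolic links are not followed, adds up st_size for regular files only, and silently skips entries it cannot examine. Those are simplifying assumptions of this example, not requirements of the task.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: total size in bytes of all regular files under
 * path. Returns -1 if path itself cannot be opened as a directory. */
long long tree_size(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *entry;
    long long total = 0;

    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        char child[PATH_MAX];
        struct stat st;

        /* Skip the directory itself and its parent. */
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        if (snprintf(child, sizeof child, "%s/%s", path,
                     entry->d_name) >= (int)sizeof child)
            continue;               /* path too long: skipped here */
        if (lstat(child, &st) < 0)
            continue;               /* unreadable entry: skipped here */
        if (S_ISDIR(st.st_mode)) {
            long long sub = tree_size(child);   /* descend */
            if (sub > 0)
                total += sub;
        } else if (S_ISREG(st.st_mode)) {
            total += st.st_size;
        }
    }
    closedir(dir);
    return total;
}
```

A main function would pass argv[1] to tree_size and print the result. To report disk usage the way du does, st_size could be replaced by st_blocks * 512.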

Finding file type in Linux programmatically

I am trying to find the file type of a file, like .pdf, .doc, .docx etc., programmatically and without using a shell command. I have to make an application which blocks access to files of a particular extension. I have already hooked sys_call_table in an LKM, and now I want my LKM to check the file type when an open/read system call is triggered.
I know that we have a current pointer which gives access to the current process structure, and that we can use it to find the file name stored in the dentry structure. I also know that in Linux a file type is identified by a magic number stored in the starting bytes of the file. But I don't know how to find the file type, or exactly where it is stored.
Linux doesn't "store" the file type for its files (unlike Mac OS' resource fork, which I think is the most well-known platform to do this). Files are just named streams of bytes, they have no structure implied by the operating system.
Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.
There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.
In other words, I mean you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as large a set of types as possible, it might not be a very good idea. :/
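As an idea of what re-implementing one such test could look like, here is a userspace sketch that checks for the signature PDF files normally begin with, the five bytes %PDF-. The function name is invented for the example, and it uses stdio, which is not available inside a kernel module; there you would have to read the first bytes through kernel facilities instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the file starts with "%PDF-",
 * 0 if it does not, and -1 if the file cannot be opened. */
int looks_like_pdf(const char *path)
{
    static const char magic[] = "%PDF-";
    char head[sizeof magic - 1];        /* signature without the NUL */
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(head, 1, sizeof head, f);
    fclose(f);
    return n == sizeof head && memcmp(head, magic, sizeof head) == 0;
}
```

Each additional file type needs its own signature check, which is why doing this for a large set of types gets complicated quickly.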
Actually i have to make an application which blocks access to files of a particular extension.
That's a flawed requirement. If you check by file extension, then you'll miss files that don't use the extension, which is quite common in Linux since it does not rely on file extensions.
The officially sanctioned way of detecting file type in Linux is by the file's magic number. The shell command file is basically just a wrapper for libmagic, so you have the option of linking to that library.

C code to Sync Flash Drives

Just as a learning experience, I'm trying to code the following problem in C.
If two flash drives are inserted each having a folder (say,
Course_Notes), then they get synced. That is, data is copied from one
to the other and, if a file already exists, then the newer one is
retained.
I would do this in bash by:
#!/bin/bash
while $1
do
cp -ur /media/PD_1/Course_Notes /media/PD_2/Course_Notes
cp -r /media/PD_2/Course_Notes /media/PD_1/Course_Notes
done
How do I do this in C without too many system calls?
In the real world, you'd probably use something like rsync for this.
You're probably looking for the stat() family of functions to get size and modification date, along with opendir()/readdir()/closedir() for directory listings. The standard way for copying a file itself is to open the source and destination, and write to the latter what you read from the former with the usual read and write functions.
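For the "keep the newer one" part, the decision can be made from the modification times that stat() returns. A minimal sketch follows, with made-up function names, assuming that "newer" means a later st_mtim and that a missing destination should always be copied.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if a was modified more recently than b, else 0. */
int mtime_newer(const struct stat *a, const struct stat *b)
{
    if (a->st_mtim.tv_sec != b->st_mtim.tv_sec)
        return a->st_mtim.tv_sec > b->st_mtim.tv_sec;
    return a->st_mtim.tv_nsec > b->st_mtim.tv_nsec;
}

/* Hypothetical helper: 1 if src should be copied over dst (dst is
 * missing or older), 0 if dst is up to date, -1 on error. */
int should_copy(const char *src, const char *dst)
{
    struct stat s, d;

    if (stat(src, &s) < 0)
        return -1;
    if (stat(dst, &d) < 0)
        return errno == ENOENT ? 1 : -1;
    return mtime_newer(&s, &d);
}
```

A sync loop would call should_copy for every entry found with readdir() in one folder, copy when it returns 1, and then repeat in the other direction.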
Note also that the source code for GNU and BSD versions of the standard UNIX utilities, such as cp, is freely available (likewise for rsync). You may want to read some of that source code to see what you may have missed in your own approach.
Since this is a programming Q&A site, you would use usual programming tools (instead of rewriting them yourself).
This sounds like a perfect application for a Distributed Version Control System. Some of the most popular choices today are Git and Mercurial.

Resources