I am writing a C program using some external binaries to achieve a planned goal. I need to run one command which gives me an output, which in turn I need to process, then feed into another program as input. I am using popen, but wonder if that is the same as using a KornShell (ksh) temporary file instead.
For example:
touch myfile && chmod 700
cat myfile > /tmp/tempfile
process_file < /tmp/tempfile && rm /tmp/tempfile
Since that creates a temporary file which can be readable by root, would it be the same if one used popen in C, knowing that pipes are also files? Or is it safe to assume that the Operating System (OS) will not allow any other process to read your pipe?
You say "that creates a temporary file which can be readable by root", which implies that you are attempting to transfer the data in a way in which the root user cannot read it. That's impossible; in general, the root user has total control of the system, and can thus read any data that is on the system, whether it's in a temporary file or not. Even within a single process, the root user can read the memory of that process.
If you use popen(), there will not be an entry for the file on a filesystem; it creates a pipe, which acts like a file, but doesn't actually write that data to disk, instead it just passes it between two programs.
There will be a file descriptor for it; depending on the system, it may be easier or harder to intercept that data, but it will always be possible to do so. For instance, on Linux, you can just look in /proc/<pid>/fd/ to find all of the open file descriptors and manipulate them (read from or write to them).
Related
Let me explain clearly.
The following is my requirement:
Let's say there is a command which has an option specified as '-f' that takes a filename as argument.
Now I have 5 files and I want to create a new file merging those 5 files and give the new filename as argument for the above command.
But there is a difference between
reading a single file and
merging all files & reading the merged file.
There is more IO (read from 5 files + write to the merged file + any IO our command does with the given file) generated in the second case than IO (any IO our command does with the given file) generated in the first case.
Can we reduce this unwanted IO?
In the end, I really don't want the merged file at all. I only create this merged file just to let the command read the merged files content.
And to say, I also don't want this implementation. The file sizes are not so big and it is okay to have that extra negligible IO. But, I am just curious to know if this can be done.
So in order to implement this, I have following understanding/questions:
Generally what all the commands (that takes the filename argument) does is it reads the file.
In our case, the filename(filepath) is not ready, it's just an virtual/imaginary filename that exists (as the mergation of all files).
So, can we create such virtual filename?
What is a filename? It's an indirect inode entry for a storage location.
In our case, the individual files have different inode entries and all inode entries have different storage locations. And our virtual/imaginary file has in fact no inode and even if we could create an imaginary inode, that can only point to a storage in memory (as there is no reference to the storage location of another file from a storage location of one file in disk)
But, let's say using advanced programming, we are able to create an imaginary filepath with imaginary inode, that points to a storage in memory.
Now, when we give that imaginary filename as argument and when the command tries to open that imaginary file, it finds that it's inode entry is referring to a storage in memory. But the actual content is there in disk and not in the memory. So, the data is not loaded into memory yet, unless we read it explicitly. Hence, again we would need to read the data first.
Simply saying, as there is no continuity or references at storage in disk to the next file data, the merged data needs to be loaded to memory first.
So, with my deduction, it seems we would at least need to put the data in memory. However, as the command itself would need the file to be read (if not the whole file, at least a part of it until the commands's operation is done - let it be parsing or whatever). So, using this method, we could save some significant IO, if it's really a big file.
So, how can we create that virtual file?
My first answer is to write the merged file to tmpfs and refer to that file. But is it the only option or can we actually point to a storage location in memory, other than tmpfs? tmpfs is not option because, my script can be run from any server and we need to have a solution that work from all servers. If I mention to create merged file at /dev/shm in my script, it may fail in the server where it doesn't have /dev/shm. So I should be able to load to memory directly. But I think normal user will not have access to memory and so, it seems can not be done without shm.
Please let me know your comments and also kindly correct me if my understanding anywhere is wrong. Even if it is complicated for my level, kindly post your answer. At least, I might understand it after few months.
Create a fifo (named pipe) and provide its name as an argument to your program. The process that combines the five input files writes to this fifo
mkfifo wtf
cat file1 file2 file3 file4 file5 > wtf # this will block...
[from another terminal] cp wtf omg
Here I used cp as your program, and cat as the program combining the five files. You will see that omg will contain the output of your program (here: cp) and that the first terminal will unblock after the program is done.
Your program (here:cp) is not even aware that its 1st argument wtf refers to a fifo; it just opens it and reads from it like it would do with an ordinary file. (this will fail if the program attempts to seek in the file; seek() is not implemented for pipes and fifos)
What I want:
void printFname(FILE * f)
{
char buf[255];
MagicFunction(f,buf);
printf("File name: %s",buf);
}
So, all I need is "MagicFunction", but unfortunatelly I haven't found such ...
Is there any way to implement using an OS library? (windows.h , cocoa.h, posix.h etc.)
There is no such function. There may be no filename, or more than one filename that correspond with the FILE *. On Unix, a program can continue to have a reference to a file after it has been renamed or deleted, which could mean that you have a FILE * with no name. Or more hard links may be made to the file, which means a file can have multiple names; which one would you choose? To further confuse things, a file can be temporarily hidden, by mounting a filesystem over a directory containing that file. The file will still be on disk, at its original pathname, but the file will be inaccessible at that path because the mount is obscuring it.
It's also possible that the FILE * never corresponded to a file on the filesystem at all; while they usually do, you can create one from any file descriptor using fdopen(), and that file descriptor may be a pipe, socket, or other file-like object that has never had a path on the disk. In some versions of the C library, you can open a string stream (for instance, fmemopen() in glibc), so the FILE * actually just corresponds to a memory buffer.
If you care about the name, it's best to just keep track of what it was named when you opened the file.
There are some hacky ways to approximate getting the filename; if you're just using this for debugging or informational purposes, then they may be sufficient. Most of these will require operating on the file descriptor rather than the FILE *, as the file descriptor is the lower level way of referring to a file. To get the file descriptor, run fileno() on the FILE *, and remember to check for errors in case there is no file descriptor associated with that FILE *.
On Linux, you can do readlink() on "/proc/self/fd/fileno" where fileno is the file descriptor. That will show you what filename the file had when the file was opened, or a string indicating what other kind of file descriptor it is, like a socket or inotify handle. FreeBSD and NetBSD have Linux emulation layers, which include emulation of Linux-style procfs; you may be able to do this on those if you mount a Linux-compatible procfs, though I don't have them available for testing.
On Mac OS X, you don't have /proc/self/fd. If you don't care about finding the original filename, but some other filename that refers to the file would work (such that you could pass it to another program), you can construct one: /.vol/deviceid/inode. For example, /.vol/234881030/281363. To get those values, run fstat() on the file descriptor, and use st_dev and st_ino on the resulting struct stat.
On Windows, files and the filesystem work quite differently than Unix. Apparently it's possible to map a file back to its name on Windows. As of Windows Vista, you can simply call GetFinalPathNameByHandle(). This takes a HANDLE; to get the HANDLE from the file descriptor, call _get_osfhandle(). Prior to Windows Vista, you need to do a little more work, as described in this article. Note that on Windows fileno() is named _fileno(), though the former may work with a warning.
Going even further into hacky territory, there are a few more techniques that you could use. You could shell out to lsof, or you could extract the code it uses to resolve pathnames. lsof actually looks directly in kernel memory, extracting information from the kernel's name cache. This has several limitations, outlined in the lsof FAQ. And of course, you need root or equivalent privileges to do this, either directly or with an suid/sgid binary.
And finally, for a portable but slow solution for finding one or more filenames matching an open file, you could find the device and inode number using fstat() on the file descriptor, and then recursively traverse the filesystem stat()ing every file, until you find a file with matching device and inode number. Remember the caveats I mention above; you may find no matching files, more than one matching file, and even if you don't find any matching files, the file might still be there, but hidden by a mount point. And of course, there may be race conditions; something may rename the file in such a way that you never see it while traversing the hierarchy.
There is no such standard function.
Do you fopen() yourself? If then, maintain FILE * to filename hash table yourself.
Otherwise, it's not possible in general.
I don't think that there is such function even at windows.h,coca.h or unistd.h.
Most probably you write it yourself. Just make a
struct myFile {
FILE *fh;
char *filename;
}
and hold such structures into array of struct myFile and in MagicFunction(f,b) walk on the array looking for the address equal to f.
I am working in the kernel and I am trying to make a system call that takes a partition as input (i.e. /dev/sda1) and then prints every file on the filesystem using printk().
I enter a partition (i.e. /dev/sda1) and I put a printk() inside this system call to print.
First, I tried to do this with a process, because if I am right each process is represented by a task_struct and I tried to access the files with the files_struct. But the problem is that I only have the file descriptors of the opened files and not all the files.
So, what I want to do is that I pass the name of the partition and I printk() the names of all the files.
For example:
I enter the path /dev/sda1 as an argument and let's suppose I have the file a.txt and b.txt inside this partition , so the system call should print a.txt and b.txt.
The signature will be like this:
asmlinkage long sys_acall(char *partition_name);
There is a few things that needs to be discussed.
The partition_name parameter of your syscall should have the __user tag.
If you want to, strictly speaking, read files from a partition you will have to implement filesystem recognition (is that partition ext3, reiserfs, ntfs, ...?) and then implement the driver for that kind of filesystem. As Christ pointed out, partitions doesn't contain files but filesystems does. Another option is use the drivers already implemented for the filesystem on that partition. This option is just horrible.
If you want to read files from a filesystem your work gets easier, you can use the VFS interface to access it, but you will need that filesystem to be mounted (you can do it on-the-fly though).
My final opinion, I would change "implement a system call that prints every file in a partition" for "implement a system call that prints every file in a directory". The signature for that system call would be:
asmlinkage long sys_crazyness(__user const char *dir);
We don't care if the directory passed is the root of a filesystem or just a folder in any depth-level of a filesystem.
If you can change your problem to this one it would be much easier ;)
Can a process 'foo' write to file descriptor 3, for example, in such a way that inside a bash shell one can do
foo 1>f1 2>f2 3>f3
and if so how would you write it (in C)?
You can start your command with:
./foo 2>/dev/null 3>file1 4>file2
Then if you ls -l /proc/_pid_of_foo_/fd you will see that file descriptors are created, and you can write to them via eg.:
write(3,"abc\n",4);
It would be less hacky perhaps if you checked the file descriptor first (with fcntl?).
The shell opens the file descriptors for your program before executing it. Simply use them like you would any other file descriptor, e.g. write(3, buf, len); etc. You may want to do error checking to make sure they were actually opened (attempting to dup them then closing the duplicate would be one easy check).
No.
The file descriptors are opened by the shell and the child process inherits them. It is not the child process which opens these command-line accessible file descriptors, it is the bash process.
There might be a way to convince bash to open additional file descriptors on behalf of the process. That wouldn't be portable to other shells, and I'm not sure if a mechanism even exists -- I am just speculating.
The point is that you can't do this from coding the child process in a special way. The shell would have to abide your desires.
Can a process 'foo' write to file descriptor 3, for example, in such a way that inside a bash shell one can do [...] and if so how would you write it (in C)?
I'm not sure what you are precisely after, but whatever it is, starting point going to be the man dup/man dup2 - this is how the shells make out of a random file descriptor a file descriptor with given number.
But obviously, the process foo has to know somehow that it can write to the file descriptor 3. POSIX only specifies 0, 1 and 2: shell ensures that whatever is started gets the file descriptors and libc in application's context also expects them to be the stdin/stdout/stderr. Starting from 3 and beyond - is up to application developer.
How can I tell if a file is open in C? I think the more technical question would be how can I retrieve the number of references to a existing file and determine with that info if it is safe to open.
The idea I am implementing is a file queue. You dump some files, my code processes the files. I don't want to start processing until the producer closes the file descriptor.
Everything is being done in linux.
Thanks,
Chenz
Digging out that info is a lot of work(you'd have to search thorugh /proc/*/fd
You'd be better off with any of:
Save to temp then rename. Either write your files to a temporary filename or directory, when you're done writinh, rename it into the directory where your app reads them. Renaming is atomic, so when the file is present you know it's safe to read.
Maybe a variant of the above , when you're done writing the file foo you create an empty file named foo.finished. You look for the presence of *.finished when processing files.
Lock the files while writing, that way reading the file will just block until the writer unlocks it. See the flock/lockf functions, they're advisory locks though so both the reader and writer have to lock , and honor the locks.
I don't think there is any way to do this in pure C (it wouldn't be cross platform).
If you know what files you are using ahead of time, you can use inotify to be notified when they open.
Use the lsof command. (List Open Files).
C has facilities for handling files, but not much for getting information on them. In portable C, about the only thing you can do is try to open the file in the desired way and see if it works.
generally you can't do that for variuos reasons (e.g. you cannot say if the file is opened with another user).
If you can control the processes that open the file and you are try to avoid collisions by locking the file (there are many libraries on linux in order do that)
If you are in control of both producer and consumer, you could use lockf() of flock() to lock the file.
there is lsof command on most distros, which shows all currently open files, you can ofcourse grep its output if your files are in the same directory or have some recognizable name pattern.