I am using ar.h to define the struct. How would I go about getting information about a file and putting it into the fields of this struct?
struct ar_hdr {
char ar_name[16]; /* name of this member */
char ar_date[12]; /* file mtime */
char ar_uid[6]; /* owner uid; printed as decimal */
char ar_gid[6]; /* owner gid; printed as decimal */
char ar_mode[8]; /* file mode, printed as octal */
char ar_size[10]; /* file size, printed as decimal */
char ar_fmag[2]; /* should contain ARFMAG */
};
Using the struct defined above, how would I get the information about a file that ls -la shows, for example:
-rw-rw----. 1 clean-unix upg40883 368 Oct 29 15:17 testar
You're looking for stat() (see man 2 stat, or man 3p stat for the POSIX description).
In order to emulate the behavior of ls -la you need a combination of readdir and stat. Do a man 3 readdir and a man 2 stat to get information on how to use them.
Capturing the output of ls -la is possible, but not such a good idea. People might expect that of a shell script, but not of a C or C++ program. It's even sort of the wrong thing to do in Python or Perl if you can help it.
You will have to construct your structure yourself from the data available to you. strftime can be used for formatting the time in a manner you like.
For collecting data about a single file into an archive header entry, the primary answer is stat(); in other contexts (such as ls -la), you might also need to use lstat() and readlink(). (Beware: readlink() does not null terminate its return string!)
With ls -la, you would probably use the opendir() family of functions (readdir() and closedir() too) to read the contents of a directory.
If you needed to handle a recursive search, then you'd be looking at nftw(). (There's also a less capable ftw(), but you'd probably be better off using nftw().)
Related
I'm trying to get the file permissions for a file or directory using the function stat(). I can get the correct information, such as st_nlink for the number of hard links and st_mode for the mode of the file, which is what I am looking for. But the value stored in st_mode prints as an octal number. How do I extract just the owner permissions?
For example, st_mode might store 42755, which means the owner has read, write and execute permissions, but I don't know how to extract the 7 from the number. If this is confusing, maybe my code below will clarify things.
CODE:
DIR *dirp;
struct dirent *dp;
struct stat buf;
dirp = opendir(".");
while ((dp = readdir(dirp)) != NULL){
stat(dp->d_name, &buf);
//now here I have the octal number for the file permissions
//If I put a print statement here like so:
printf(">> %o %s\n", buf.st_mode, dp->d_name);
}
So, as some of you may see, I am trying to do what ls -l does on a Unix system. Instead of printing out the octal number for the mode, I want to convert it to something like:
drwxr-xr-x for the value stored in st_mode: 42755
My professor recommended using a mask and performing a bitwise operation on it. I understand what he means, but I tried something like:
mode_t owner = 0000100 & st_mode;
But when I print out owner I get the value of 100.
printf(">> owner permission: %o\n", owner);
OUTPUT:
owner permission: 100
So I am confused on how to do this. Does anyone know how to solve this problem?
By the way, in case anyone is wondering, I use mode_t as the type for owner because, according to the man page for stat (man 2 stat), the member st_mode of the stat structure is of type mode_t. I figure this is just like a long int or something.
Use the macros defined in sys/stat.h to resolve the mode bits.
Refer to:
http://www.johnloomis.org/ece537/notes/Files/Examples/ls2.html
function mode_to_letters() for implementation details.
You should consider using the defined macros rather than trying to "parse" the permissions manually. Say you want to check whether the file's owner has write permission; that could be done like this:
int wpo = buf.st_mode & S_IWUSR;
if (wpo) {
    printf("Owner has write permission\n");
} else {
    printf("Owner doesn't have write permission\n");
}
You will find more useful macros in documentation: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html
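The mask must be 0700, which is 111 000 000 in binary, to get the owner's rwx bits. (The mask 0000100 in the question selects only the owner's execute bit, which is why it prints 100.) To turn the owner bits into a single octal digit from 0 to 7, shift the masked value right by 6:
mode_t owner = (buf.st_mode & 0700) >> 6;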
In a terminal I can call ls -d */. Now I want a C program to do that for me, like this:
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>
int main( void )
{
int status;
char *args[] = { "/bin/ls", "-l", NULL };
if ( fork() == 0 )
execv( args[0], args );
else
wait( &status );
return 0;
}
This will ls -l everything. However, when I am trying:
char *args[] = { "/bin/ls", "-d", "*/", NULL };
I will get a runtime error:
ls: */: No such file or directory
The lowest-level way to do this is with the same Linux system calls ls uses.
So look at the output of strace -efile,getdents ls:
execve("/bin/ls", ["ls"], [/* 72 vars */]) = 0
...
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 23 entries */, 32768) = 840
getdents(3, /* 0 entries */, 32768) = 0
...
getdents is a Linux-specific system call. The man page says that it's used under the hood by libc's readdir(3) POSIX API function.
The lowest-level portable way (portable to POSIX systems) is to use the libc functions to open a directory and read its entries. Unlike for non-directory files, POSIX doesn't specify the system-call interface for reading directories.
These functions:
DIR *opendir(const char *name);
struct dirent *readdir(DIR *dirp);
can be used like this:
// print all directories, and symlinks to directories, in the CWD.
// like sh -c 'ls -1UF -d */' (single-column output, no sorting, append a / to dir names)
// tested and works on Linux, with / without working d_type
#define _GNU_SOURCE // includes _BSD_SOURCE for DT_UNKNOWN etc.
#include <dirent.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    DIR *dirhandle = opendir("."); // POSIX doesn't require this to be a plain file descriptor. Linux uses open(".", O_DIRECTORY); to implement this
    if (!dirhandle) {
        perror("opendir");
        return EXIT_FAILURE;
    }
    struct dirent *de;
    while ((de = readdir(dirhandle))) { // NULL means end of directory (or a read error)
        if (de->d_name[0] == '.')
            continue; // the */ glob doesn't match hidden names, including . and ..
        _Bool is_dir;
#ifdef _DIRENT_HAVE_D_TYPE
        if (de->d_type != DT_UNKNOWN && de->d_type != DT_LNK) {
            // don't have to stat if we have d_type info, unless it's a symlink (since we stat, not lstat)
            is_dir = (de->d_type == DT_DIR);
        } else
#endif
        { // the only method if d_type isn't available,
          // otherwise this is a fallback for FSes where the kernel leaves it DT_UNKNOWN.
            struct stat stbuf;
            // stat follows symlinks, lstat doesn't.
            if (stat(de->d_name, &stbuf) != 0)
                continue; // e.g. a dangling symlink, or the entry was removed meanwhile
            is_dir = S_ISDIR(stbuf.st_mode);
        }
        if (is_dir) {
            printf("%s/\n", de->d_name);
        }
    }
    closedir(dirhandle);
    return 0;
}
There's also a fully compilable example of reading directory entries and printing file info in the Linux stat(3posix) man page. (not the Linux stat(2) man page; it has a different example).
The man page for readdir(3) says the Linux declaration of struct dirent is:
struct dirent {
ino_t d_ino; /* inode number */
off_t d_off; /* not an offset; see NOTES */
unsigned short d_reclen; /* length of this record */
unsigned char d_type; /* type of file; not supported
by all filesystem types */
char d_name[256]; /* filename */
};
d_type is either DT_UNKNOWN, in which case you need to stat() the entry to learn whether it is itself a directory, or it is DT_DIR or some other type, in which case you can be sure whether or not it is a directory without having to stat it.
Some filesystems, like EXT4 I think, and very recent XFS (with the new metadata version), keep type info in the directory, so it can be returned without having to load the inode from disk. This is a huge speedup for find -name: it doesn't have to stat anything to recurse through subdirs. But for filesystems that don't do this, d_type will always be DT_UNKNOWN, because filling it in would require reading all the inodes (which might not even be loaded from disk).
Sometimes you're just matching on filenames, and don't need type info, so it would be bad if the kernel spent a lot of extra CPU time (or especially I/O time) filling in d_type when it's not cheap. d_type is just a performance shortcut; you always need a fallback (except maybe when writing for an embedded system where you know what FS you're using and that it always fills in d_type, and that you have some way to detect the breakage when someone in the future tries to use this code on another FS type.)
Unfortunately, all solutions based on shell expansion are limited by the maximum command-line length, which varies (run true | xargs --show-limits to find out); on my system, it is about two megabytes. Yes, many will argue that it suffices, as Bill Gates reputedly did about 640 kilobytes, once.
(When running certain parallel simulations on non-shared filesystems, I do occasionally have tens of thousands of files in the same directory, during the collection phase. Yes, I could do that differently, but that happens to be the easiest and most robust way to collect the data. Very few POSIX utilities are actually silly enough to assume "X is sufficient for everybody".)
Fortunately, there are several solutions. One is to use find instead:
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d");
You can also format the output as you wish, not depending on locale (-printf is a GNU find extension):
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\n'");
If you want to sort the output, use \0 as the separator (since filenames are allowed to contain newlines), and -z so that sort uses \0 as the separator, too. tr will convert them to newlines for you. Inside a C string literal the backslashes must be doubled, otherwise the string ends at the first \0:
system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\\0' | sort -z | tr -s '\\0' '\\n'");
If you want the names in an array, use the glob() function instead.
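Finally, as I like to harp on every now and then, one can use the POSIX nftw() function to implement this internally. (The FTW_ACTIONRETVAL flag and the FTW_SKIP_SUBTREE return value used below are GNU extensions, which is why _GNU_SOURCE is defined.)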
#define _GNU_SOURCE
#include <stdio.h>
#include <ftw.h>
#define NUM_FDS 17
int myfunc(const char *path,
const struct stat *fileinfo,
int typeflag,
struct FTW *ftwinfo)
{
const char *file = path + ftwinfo->base;
const int depth = ftwinfo->level;
/* We are only interested in first-level directories.
Note that depth==0 is the directory itself specified as a parameter.
*/
if (depth != 1 || (typeflag != FTW_D && typeflag != FTW_DNR))
return 0;
/* Don't list names starting with a . */
if (file[0] != '.')
printf("%s/\n", path);
/* Do not recurse. */
return FTW_SKIP_SUBTREE;
}
and the nftw() call to use the above is obviously something like
if (nftw(".", myfunc, NUM_FDS, FTW_ACTIONRETVAL)) {
/* An error occurred. */
}
The only "issue" in using nftw() is to choose a good number of file descriptors the function may use (NUM_FDS). POSIX says a process must always be able to have at least 20 open file descriptors. If we subtract the standard ones (input, output, and error), that leaves 17. The above is unlikely to use more than 3, though.
You can find the actual limit using sysconf(_SC_OPEN_MAX), and subtracting the number of descriptors your process may use at the same time. In current Linux systems, it is typically limited to 1024 per process.
The good thing is, as long as that number is at least 4 or 5 or so, it only affects the performance: it just determines how deep nftw() can go in the directory tree structure, before it has to use workarounds.
If you want to create a test directory with lots of subdirectories, use something like the following Bash:
mkdir lots-of-subdirs
cd lots-of-subdirs
for ((i=0; i<100000; i++)); do mkdir directory-$i-has-a-long-name-since-command-line-length-is-limited ; done
On my system, running
ls -d */
in that directory yields the error bash: /bin/ls: Argument list too long, while the find command and the nftw()-based program both run just fine.
You also cannot remove the directories using rmdir directory-*/ for the same reason. Use
find . -name 'directory-*' -type d -print0 | xargs -r0 rmdir
instead. Or just remove the entire directory and its subdirectories:
cd ..
rm -rf lots-of-subdirs
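Just call system(). On Unix, globs such as */ are expanded by the shell, not by ls: execv() passes "*/" to ls literally, which is why ls reports "No such file or directory". system() runs the command through /bin/sh, which expands the glob first.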
You can avoid the whole fork-exec thing by doing the glob(3) yourself:
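#include <glob.h>
#include <stdio.h>

int ec;
glob_t gbuf;
if (0 == (ec = glob("*/", 0, NULL, &gbuf))) {
    for (char **p = gbuf.gl_pathv; *p; p++)   /* gl_pathv is NULL-terminated */
        printf("%s\n", *p);
    globfree(&gbuf);   /* release the memory glob() allocated */
} else if (ec == GLOB_NOMATCH) {
    /* no subdirectories matched */
} else {
    /* handle glob error (GLOB_NOSPACE or GLOB_ABORTED) */
}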
You could pass the results to a spawned ls, but there's hardly a point in doing that.
(If you do want to do fork and exec, you should start with a template that does proper error checking -- each of those calls may fail.)
If you are looking for a simple way to get a list of folders into your program, I'd suggest the spawnless way: don't call an external program, but use the standard POSIX opendir/readdir functions.
It's almost as short as your program, but has several additional advantages:
you get to pick folders and files at will by checking the d_type
you can discard system entries and (semi)hidden entries early by testing whether the first character of the name is a .
you can immediately print out the result, or store it in memory for later use
you can do additional operations on the list in memory, such as sorting and removing other entries that don't need to be included.
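#define _DEFAULT_SOURCE   /* DT_DIR is not visible on glibc with strict compiler flags otherwise */
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main( void )
{
    DIR *dirp;
    struct dirent *dp;

    dirp = opendir(".");
    if (dirp == NULL) {
        perror("opendir");
        return 1;
    }
    while ((dp = readdir(dirp)) != NULL)
    {
        /* d_type is a value, not a bit mask, so compare with ==.
           Some filesystems report DT_UNKNOWN; such entries would need stat(). */
        if (dp->d_type == DT_DIR)
        {
            /* exclude common system entries and (semi)hidden names */
            if (dp->d_name[0] != '.')
                printf ("%s\n", dp->d_name);
        }
    }
    closedir(dirp);
    return 0;
}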
Another less low-level approach, with system():
#include <stdlib.h>
int main(void)
{
system("/bin/ls -d */");
return 0;
}
Notice with system(), you don't need to fork(). However, I recall that we should avoid using system() when possible!
As Nominal Animal said, this will fail when the number of subdirectories is too big! See his answer for more...
I am trying to figure out the file type of a file, without using external libs or the "file" command.
I have viewed a number of posts and threads, and they point to using the stat() function (unix man stat) and playing with the "st_mode" from the stat struct.
But I have no idea how to do this, nor am I able to find a good example of doing it.
For example, the program takes in a file F, and I want to be able to read F, like the program below does, and give similar output, even though F is a PDF without the .pdf extension.
FURTHER EXAMPLE: if I have foo.pdf but change its extension to .png (foo.png), I can pass my program "foo.png" and it should report that it is in fact a PDF file.
Many file formats start with a "magic number"; for example, PDF files start with "%PDF" (hex 25 50 44 46).
How can I use the magic number to figure out the file type?
I understand that I will need to make some kind of table of signatures on my end. I only need to support a small handful of types (fewer than 10).
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void errorInput(void)
{
    fprintf(stderr, "\nYou have received this message due to an error. \n");
    fprintf(stderr, "Please type 'filetype <file>' to properly execute the program.\n");
    fprintf(stderr, "Thank you and have a fine day! \n\n");
    exit(EXIT_FAILURE);
}
int main(int argc, char *argv[])
{
    char command[128];
    if (argc == 2)
    {
        /* snprintf instead of strcpy/strcat, so a long file name cannot overflow command */
        if (snprintf(command, sizeof command, "file %s", argv[1]) >= (int)sizeof command)
            errorInput();
        system(command);
    }
    else
    {
        errorInput();
    }
    return 0;
}
Thank You in advance!
As Jonathon Reinhart pointed out, don't try to reinvent the wheel; use libmagic:
#include <stdio.h>
#include <magic.h>
int main(void) {
    magic_t magic = magic_open(MAGIC_MIME|MAGIC_CHECK);
    if (magic == NULL)
        return 1;
    if (magic_load(magic, NULL) != 0) {   /* NULL = default magic database */
        fprintf(stderr, "magic_load: %s\n", magic_error(magic));
        magic_close(magic);
        return 1;
    }
    const char *files[] = { "ValgrindOut.xml", "program", "Chapter9.pdf", "test.txt",
                            "linux-3.17.tar.xz", "gcc-5.2.0.tar.gz", "/home/michi" };
    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++) {
        const char *type = magic_file(magic, files[i]);   /* NULL on error */
        printf("Output%zu: '%s'\n", i + 1, type ? type : magic_error(magic));
    }
    magic_close(magic);
    return 0;
}
Compile:
gcc -o program program.c -lmagic
Output:
Output1: 'application/xml; charset=us-ascii'
Output2: 'application/x-executable; charset=binary'
Output3: 'application/pdf; charset=binary'
Output4: 'text/plain; charset=utf-8'
Output5: 'application/x-xz; charset=binary'
Output6: 'application/gzip; charset=binary'
Output7: 'inode/directory; charset=binary'
First, you need to include sys/stat.h
Next, you need to declare a struct stat in your code:
struct stat s;
Next, you pass a pointer to your stat structure along with the file/object name:
returnval = stat("filename", &s);
Check the return value: stat() returns -1 on error. If there is no error, the object/file exists, and we can use the macros to determine the file type:
if (S_ISREG(s.st_mode)) {
    /* Regular file... */
} else if (S_ISDIR(s.st_mode)) {
    /* Is a directory.... */
}
I suggest you have a look at the man page (man 3 stat): it lists all of the file types that st_mode can indicate (it can be used to identify regular files, directories, block devices, symlinks, etc.).
Another very useful member of the stat struct is st_size, which gives you a file's size in bytes.
ETA: the stat() system call won't tell you whether a file is a PDF or anything like that. Normally we'd use the extension; if there is no extension and you're trying to identify specific file formats, then stat() won't be of much use to you.
Most files have a portion called the header (or metadata). This part of the file contains details about the file itself, and it usually also contains a signature that identifies the file type. Be aware that most of these signatures are documented as hex byte sequences.
Example
PDF Signature - 25 50 44 46(In Hex) or %PDF
JPEG Signature - Start FF D8 and end of file FF D9
So, basically, you need to open the file in binary mode, read the bytes where the signature lives (usually the very start of the file), and compare them with the signatures of the file types you define in your program. For example, to check whether it's a PDF file, open the file in binary mode, read the first four bytes and compare them with 25 50 44 46. Use the C fopen() function in binary mode, i.e. "rb".
On POSIX systems there is no difference between text and binary mode, so you can also open the file normally and read it byte by byte, like this:
int data;
data = fgetc(pfile); /* an unsigned char value, or EOF at end of file */
You might want to look into these for further details:
Magic Number
File Signatures
I have two file paths; both point to a file, say 'abc.txt' and 'folder/cde.txt'
How can I make it so that abc.txt has the same access time as the other file?
I believe I can use stat() and utime() but I tried and failed.
Here's my code.
int myLink(const char *oldfile, const char *newfile)
{
int result = link(oldfile, newfile);
int ret; /* return value */
struct stat buf; /* struct to hold file stats */
ret = stat(oldfile, &buf);
if (ret != 0) {
perror("Failed:");
exit(ret);
}
struct utimbuf puttime;
puttime.modtime = buf.st_mtime;
printf("\tatime: %d\n", buf.st_mtime);
if (utime(newfile, &puttime))
perror("utime");
else
{
if (utime(extName, NULL)) /* set to current time */
perror("utime");
}
return result;
}
Assuming you have two file names, you don't want to use the link() system call. If for some reason you do want to link the files, you need to check its return value (it reports an error if the second file already exists; you have to unlink() the new file name first). Once the files are linked, they are two references to the same inode and inevitably have the same access time.
You then need to decide whether you want the first file to have the access time of the second file or vice versa, or whether you want both of them set to some other time (such as now, or some time in the past, or even the future!).
Assuming you want the second file's access time to be the same as the first file's access time (but the modification time of the second file to be unchanged), then you need to:
Collect the times of the first file.
Collect the times of the second file.
Create an appropriate struct utimbuf structure.
Call utime().
Alternatively, for steps 3 and 4, you can create an appropriate array of two struct timeval values and use utimes().
I'm intrigued (even puzzled) to see that struct stat in POSIX 2008 no longer lists the members st_mtime, st_atime and st_ctime (of type time_t): instead, it has st_mtim, st_atim and st_ctim of type struct timespec. These allow for sub-second resolution on the timestamps. For backwards compatibility, POSIX 2008 requires st_atime, st_mtime and st_ctime to be defined as macros for st_atim.tv_sec, st_mtim.tv_sec and st_ctim.tv_sec, so older code still compiles.
I am going to assume st_mtime and st_atime and utime() (and no linking). This leads to revised code:
#include <sys/stat.h>
#include <utime.h>

int myLink(const char *oldfile, const char *newfile)
{
    struct stat buf1;
    struct stat buf2;
    if (stat(oldfile, &buf1) != 0)
        return(-1);
    if (stat(newfile, &buf2) != 0)
        return(-1);
    struct utimbuf puttime;
    puttime.modtime = buf2.st_mtime;   /* keep newfile's modification time */
    puttime.actime = buf1.st_atime;    /* copy oldfile's access time */
    return utime(newfile, &puttime);
}
If you want diagnostic printing, you can easily add it. In general, library functions should not exit the program; it makes them unusable. Diagnostic printing is also problematic - maybe you should not be writing to stderr, for example.
If you create a hard link (using the link syscall) then there is only one file, and it has only one modification time and access time. It can't be different from itself.
$ touch A
$ ln A B
$ ls -l A B
-rw-r--r-- 1 user group 0 Nov 2 0:00 A
-rw-r--r-- 1 user group 0 Nov 2 0:00 B
$ sleep 60
$ touch B
$ ls -l A B
-rw-r--r-- 1 user group 0 Nov 2 0:01 A
-rw-r--r-- 1 user group 0 Nov 2 0:01 B
Note in the above example, there is only one file. Both A and B are the same file. The ln command just calls link.
You are setting modtime instead of actime (note that the member is spelled actime). You probably want:
puttime.actime = buf.st_atim.tv_sec;
How can I get the path where the binary that is executing resides in a C program?
I'm looking for something similar to __FILE__ in ruby/perl/PHP (but of course, the __FILE__ macro in C is determined at compile time).
dirname(argv[0]) will give me what I want in all cases unless the binary is in the user's $PATH... then I do not get the information I want at all, but rather "" or "."
Totally non-portable Linux solution:
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    char buffer[BUFSIZ];
    ssize_t len = readlink("/proc/self/exe", buffer, sizeof buffer - 1);
    if (len == -1)
        return 1;
    buffer[len] = '\0';   /* readlink() does not NUL-terminate */
    printf("%s\n", buffer);
    return 0;
}
This uses the "/proc/self" trick, which points to the process that is running. That way it saves faffing about looking up the PID. Fuller error handling is left as an exercise to the wary.
The non-portable Windows solution:
WCHAR path[MAX_PATH];
GetModuleFileNameW(NULL, path, ARRAYSIZE(path));
Here's an example that might be helpful for Linux systems:
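#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*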
* getexename - Get the filename of the currently running executable
*
* The getexename() function copies an absolute filename of the currently
* running executable to the array pointed to by buf, which is of length size.
*
* If the filename would require a buffer longer than size elements, NULL is
* returned, and errno is set to ERANGE; an application should check for this
* error, and allocate a larger buffer if necessary.
*
* Return value:
* NULL on failure, with errno set accordingly, and buf on success. The
* contents of the array pointed to by buf is undefined on error.
*
* Notes:
* This function is tested on Linux only. It relies on information supplied by
* the /proc file system.
* The returned filename points to the final executable loaded by the execve()
* system call. In the case of scripts, the filename points to the script
* handler, not to the script.
* The filename returned points to the actual executable and not a symlink.
*
*/
char* getexename(char* buf, size_t size)
{
char linkname[64]; /* /proc/<pid>/exe */
pid_t pid;
ssize_t ret; /* readlink() returns ssize_t, not int */
/* Get our PID and build the name of the link in /proc */
pid = getpid();
if (snprintf(linkname, sizeof(linkname), "/proc/%i/exe", pid) < 0)
{
/* This should only happen on large word systems. I'm not sure
what the proper response is here.
Since it really is an assert-like condition, aborting the
program seems to be in order. */
abort();
}
/* Now read the symbolic link */
ret = readlink(linkname, buf, size);
/* In case of an error, leave the handling up to the caller */
if (ret == -1)
return NULL;
/* Report insufficient buffer size */
if ((size_t)ret >= size)
{
errno = ERANGE;
return NULL;
}
/* Ensure proper NUL termination */
buf[ret] = 0;
return buf;
}
Essentially, you use getpid() to find your PID, then figure out where the symbolic link at /proc/<pid>/exe points to.
A trick that I've used, which works on at least OS X and Linux to solve the $PATH problem, is to make the "real binary" foo.exe instead of foo: the file foo, which is what the user actually calls, is a stub shell script that calls the real binary with its original arguments.
#!/bin/sh
exec "$0.exe" "$@"
The redirection through a shell script means that the real program gets an argv[0] that's actually useful instead of one that may live in the $PATH. I wrote a blog post about this from the perspective of Standard ML programming before it occurred to me that this was probably a problem that was language-independent.
argv[0] isn't reliable: it may contain an alias defined by the user via his or her shell, or anything else the program that started yours chose to pass.
Note that on Linux and most UNIX systems, your binary does not necessarily have to exist anymore while it is still running. Also, the binary could have been replaced. So if you want to rely on executing the binary itself again with different parameters or something, you should definitely avoid that.
It would be easier to give advice if you told us why you need the path to the binary itself.
Yet another non-portable solution, for MacOS X:
#include <CoreFoundation/CoreFoundation.h>
#include <limits.h>   /* PATH_MAX */
CFBundleRef mainBundle = CFBundleGetMainBundle();
CFURLRef execURL = CFBundleCopyExecutableURL(mainBundle);
char path[PATH_MAX];
if (!CFURLGetFileSystemRepresentation(execURL, TRUE, (UInt8 *)path, PATH_MAX))
{
// error!
}
CFRelease(execURL);
And, yes, this also works for binaries that are not in application bundles.
Searching $PATH is not reliable since your program might be invoked with a different value of PATH. e.g.
$ /usr/bin/env | grep PATH
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
$ PATH=/tmp /usr/bin/env | grep PATH
PATH=/tmp
Note that if I run a program like this, argv[0] is worse than useless:
#include <unistd.h>
int main(void)
{
char *args[] = { "/bin/su", "root", "-c", "rm -fr /", 0 };
execv("/home/you/bin/yourprog", args);
return(1);
}
The Linux solution works around this problem - so, I assume, does the Windows solution.