I'm writing a program in C to scan the files in a directory, get each file's inode number and hard link count, and print out the hard links. To print the hard links, I search files from the root and match the files with the same inode. However, when I set the path for the inode search to the root, it does not show any files. On the other hand, if I set the path to the same directory I scanned initially, it displays one file as a hard link. I'm open to any other way to display the hard links of an inode.
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include<sys/dir.h>
#include<stdlib.h>
void listFilesRecursively(char *path);
void filter(char *basePath, long inode);
void get_hardLinks(long inode);
int main()
{
// Directory path to list files
char path[100]="../";
listFilesRecursively(path);
return 0;
}
void listFilesRecursively(char *basePath)
{
char path[1000];
struct dirent *dp;
struct stat sb;
DIR *dir = opendir(basePath);
struct dirent **namelist = NULL;
// Unable to open directory stream
if (!dir)
return;
while ((dp = readdir(dir)) != NULL)
{
stat(dp->d_name, &sb);
if (strcmp(dp->d_name, ".") != 0 && strcmp(dp->d_name, "..") != 0)
{
printf("Inode:%lu | %s |%ld \n",(unsigned long)dp-> d_ino, dp->d_name,(long) sb.st_nlink);
strcpy(path, basePath);
strcat(path, "/");
strcat(path, dp->d_name);
listFilesRecursively(path);
get_hardLinks((unsigned long)dp-> d_ino);
}
}
closedir(dir);
}
void filter(char *basePath, long inode){
char path[1000];
struct dirent *dp;
struct stat sb;
DIR *dir = opendir(basePath);
while ((dp = readdir(dir)) != NULL)
{
stat(dp->d_name, &sb);
if (strcmp(dp->d_name, ".") != 0 && strcmp(dp->d_name, "..") != 0 && (unsigned long)dp->d_ino==inode && ((long)sb.st_nlink<=10))
{
printf("----HardLink: Inode:%lu | %s \n",(unsigned long)dp-> d_ino, dp->d_name);
strcpy(path, basePath);
strcat(path, "/");
strcat(path, dp->d_name);
filter("/",inode);
}
}
closedir(dir);
}
void get_hardLinks(long inode){
filter("../",inode);
}
The ask seems to be:
List all files (inode, name, link count), in DFS order
For each file, show all hard links
It does not explicitly state how symlinks should be handled. This answer assumes they can be ignored (including symlinks to other directories, which may result in infinite loops).
The current solution has a few problems. The first group causes incorrect results when scanning over multiple directories; the second may cause serious performance issues as the number of files goes up.
Implementation issues:
Missing path: the stat calls in listFilesRecursively and filter pass only the file name, so the lookup happens in the current working directory, while the file actually lives in basePath. The code needs to form the full pathname by joining the two with a '/'.
Error checking on all system calls: not testing the results of stat, opendir, etc. means any error can have a serious impact on the program: a crash, partial results, or infinite loops.
Performance:
The function listFilesRecursively(path) is called for every entry. It should be called only for directory entries (S_ISDIR on st_mode).
The function get_hardLinks is called for every file, but it should be called only for files with more than one hard link (st_nlink > 1).
The filter function should restrict its recursive calls to directories, same as listFilesRecursively above.
Starting point: the question indicates the search should start at the root, but the program starts every search at the parent folder.
Also, if this is going to run on a large file system in production, consider the following performance tuning:
If the total number of files in the scanned tree is X, the current performance is O(X*X) in IO. Even with the above improvements, the performance will be O(K*X), where K is the number of files with more than one hard link, each of which triggers a scan.
An alternative implementation would:
Perform a single scan (O(n)), remembering the inodes of all files with st_nlink > 1 in a dynamically resized array
Sort the array by inode
Print the files with identical inodes from memory
This will be O(n) for IO and O(n log n) for sorting; practically O(n), because IO will dominate. Much faster than the current logic.
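The scan-then-sort idea above can be sketched roughly as follows. This is only a sketch under some assumptions: the names LinkRec, LinkVec, vec_push, collect and report_hardlinks are made up for illustration, paths longer than the fixed buffer are skipped, symlinks are not followed (lstat), and entries are compared by device as well as inode, since inode numbers are only unique within one file system.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* One record per path whose inode has more than one link (hypothetical types). */
typedef struct { dev_t dev; ino_t ino; char *path; } LinkRec;
typedef struct { LinkRec *v; size_t n, cap; } LinkVec;

/* Append a record, doubling the array when it is full. Returns 0 on success. */
static int vec_push(LinkVec *lv, dev_t dev, ino_t ino, const char *path)
{
    if (lv->n == lv->cap) {
        size_t cap = lv->cap ? lv->cap * 2 : 64;
        LinkRec *p = realloc(lv->v, cap * sizeof *p);
        if (!p) return -1;
        lv->v = p;
        lv->cap = cap;
    }
    char *copy = strdup(path);
    if (!copy) return -1;
    lv->v[lv->n++] = (LinkRec){ dev, ino, copy };
    return 0;
}

/* Single pass over the tree; only directories are recursed into. */
static void collect(const char *base, LinkVec *lv)
{
    DIR *dir = opendir(base);
    if (!dir) { perror(base); return; }
    struct dirent *dp;
    while ((dp = readdir(dir)) != NULL) {
        if (!strcmp(dp->d_name, ".") || !strcmp(dp->d_name, "..")) continue;
        char path[4096];
        int len = snprintf(path, sizeof path, "%s/%s", base, dp->d_name);
        if (len < 0 || (size_t)len >= sizeof path) continue; /* too long: skip */
        struct stat sb;
        if (lstat(path, &sb) != 0) { perror(path); continue; }
        if (S_ISDIR(sb.st_mode))
            collect(path, lv);
        else if (sb.st_nlink > 1 && vec_push(lv, sb.st_dev, sb.st_ino, path) != 0)
            fprintf(stderr, "out of memory at %s\n", path);
    }
    closedir(dir);
}

/* Order by (device, inode) so that links to one file become adjacent. */
static int by_inode(const void *a, const void *b)
{
    const LinkRec *x = a, *y = b;
    if (x->dev != y->dev) return x->dev < y->dev ? -1 : 1;
    if (x->ino != y->ino) return x->ino < y->ino ? -1 : 1;
    return strcmp(x->path, y->path);
}

/* Scan root once, sort, print each group; returns the number of paths recorded. */
size_t report_hardlinks(const char *root)
{
    LinkVec lv = {0};
    collect(root, &lv);
    if (lv.n > 1)
        qsort(lv.v, lv.n, sizeof *lv.v, by_inode);
    for (size_t i = 0; i < lv.n; i++) {
        int first = i == 0 || lv.v[i].ino != lv.v[i - 1].ino
                           || lv.v[i].dev != lv.v[i - 1].dev;
        if (first) printf("Inode %lu:\n", (unsigned long)lv.v[i].ino);
        printf("    %s\n", lv.v[i].path);
        free(lv.v[i].path);
    }
    size_t n = lv.n;
    free(lv.v);
    return n;
}
```

Calling report_hardlinks("/") from main would cover the whole system, at the cost of walking pseudo file systems such as /proc as well. A group may show fewer names than st_nlink says if some links live outside the scanned tree or in unreadable directories.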
I have a problem opening the same directory on a second call.
For example, I first open folder1/folder2; then, if I call the function I'm using on folder1, it says it cannot open it. I thought I should close all directories in the path and tried to do so, but with no results.
This is my code:
void scanDir(char *dir, int depth, char type, char *path, long gtsize, int attrib)
{
DIR *dp;
struct dirent *entry;
struct stat statbuf;
char newPath[strlen(path)+strlen(dir)];
if((dp = opendir(dir)) == NULL) {
fprintf(stderr,"Cannot open directory %s\n because of e", dir);
exit(10);
return;
}
strcpy(newPath, path);
strcat(newPath, dir);
if (type!='f' && testAttrib(attrib, dir))
printf("%s\n", newPath);
strcat(newPath, "/");
chdir(dir);
while((entry = readdir(dp)) != NULL) {
stat(entry->d_name,&statbuf);
if(S_ISDIR(statbuf.st_mode) && testAttrib(attrib, entry->d_name)) {
if(!strcmp(".",entry->d_name) || !strcmp("..",entry->d_name))
continue; // ignore . and ..
if (depth>1 || depth<=-1)
scanDir(entry->d_name,depth-1,type,newPath,gtsize,attrib);
}
if(S_ISREG(statbuf.st_mode) && type!='d' && testAttrib(attrib, entry->d_name)) {
off_t sizeF = statbuf.st_size;
char filePath[100];
strcpy(filePath, newPath);
strcat(filePath, entry->d_name);
if(sizeF>=gtsize)
printf("%s \n", filePath);
}
}
chdir("..");
closedir(dp);
}
char newPath[strlen(path)+strlen(dir)]; //WRONG!
is certainly wrong. You need to reserve one extra byte for the terminating 0 and you are adding a /. So it should be
char newPath[strlen(path)+strlen(dir)+2];
BTW, consider using snprintf(3) or asprintf(3) instead of your strcat calls.
I am not sure that calling chdir(2) is a wise idea, and you certainly should check that it went well. See perror(3), errno(3), strerror(3).
Look also into nftw(3).
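As a small illustration of the snprintf(3) suggestion, here is a sketch of a path-joining helper. The name join_path is made up for this example; the point is that snprintf never writes past the buffer and its return value tells you whether the result was truncated.

```c
#include <stddef.h>
#include <stdio.h>

/* Join dir and name with a single '/' into buf (hypothetical helper).
 * Returns 0 on success, -1 if the result would not fit in size bytes. */
int join_path(char *buf, size_t size, const char *dir, const char *name)
{
    int n = snprintf(buf, size, "%s/%s", dir, name);
    /* snprintf returns the length it wanted to write, excluding the
     * terminating 0; a value >= size means the output was truncated. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With this, a fixed buffer such as char newPath[PATH_MAX] can be filled safely, and the caller can skip or report entries whose path does not fit instead of overflowing.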
In struct dirent, the member d_name contains the name without any path. This means the argument passed to opendir() does not contain the path to the file or directory, so the call fails with ENOENT.
Let's say you have the directory /home/usr/folder1/folder2 and you call
scanDir("/home/usr/folder1/", 2, type, ...) // I understood only the first two parameters.
This call seems to work, but when the function calls itself recursively to search /home/usr/folder1/folder2
if (depth>1 || depth<=-1)
scanDir(entry->d_name,depth-1,type,newPath,gtsize,attrib);
the first parameter passed to scanDir this time is "folder2", not "/home/usr/folder1/folder2", so opendir(dir) fails with ENOENT.
One more thing to be careful about: readdir() is not required to be re-entrant, so calling it carelessly may produce unexpected errors. In your code the function appears to give the result you want, but how it works may be different from what you think. If the code becomes more complicated, consider readdir_r(), the re-entrant version of readdir(); note, though, that glibc has deprecated readdir_r() since version 2.24 and recommends plain readdir() with a separate DIR stream per directory.
According to the man page of readdir:
On success, readdir() returns a pointer to a dirent structure. (This
structure may be statically allocated; do not attempt to free(3) it.)
After the recursive call to scanDir() returns, it is safest not to rely on the contents of entry in the caller, since the structure may be statically allocated.
Another suggestion is to use the nftw() or scandir() functions offered by Linux. nftw() in particular is really powerful and does most of what you want.
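To give an idea of what the nftw() route looks like, here is a minimal sketch. The callback name visit, the counter g_regular_files and the wrapper list_regular_files are made up for this example, and the descriptor limit of 16 is an arbitrary choice. nftw() passes the full path to the callback, so there is no need for chdir() or manual path building.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

static long g_regular_files; /* hypothetical counter, reset on every walk */

/* Called by nftw() once per entry; fpath is the path including the root. */
static int visit(const char *fpath, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    if (typeflag == FTW_F && S_ISREG(sb->st_mode)) {
        /* Indent by depth and print only the last path component. */
        printf("%*s%s\n", 2 * ftwbuf->level, "", fpath + ftwbuf->base);
        g_regular_files++;
    }
    return 0; /* returning non-zero would stop the walk */
}

/* Walk root without following symlinks (FTW_PHYS).
 * Returns the number of regular files seen, or -1 on error. */
long list_regular_files(const char *root)
{
    g_regular_files = 0;
    if (nftw(root, visit, 16, FTW_PHYS) == -1) {
        perror(root);
        return -1;
    }
    return g_regular_files;
}
```

The depth, size and attribute filters from the question would go into the callback: ftwbuf->level gives the depth and sb->st_size the size.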
I need to list all hard links of a given file in "pure" C, that is, without the help of bash commands.
I googled for hours but couldn't find anything useful.
My current idea is to get the inode number, then loop through the directory tree to find files with the same inode.
In bash something like
sudo find / -inum 199053
Any better suggestion?
Thanks
To get the inode number of a single file, invoke the stat function and reference the st_ino value of the returned struct.
int result;
struct stat s;
result = stat("filename.txt", &s);
if ((result == 0) && (s.st_ino == 199053))
{
// match
}
You could build a solution with the stat function using opendir, readdir, and closedir to recursively scan a directory hierarchy to look for matching inode values.
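A rough sketch of that recursive scan, roughly what find -inum does, could look like the following. The function name find_links is made up here, unreadable directories and overlong paths are silently skipped, and the device number is compared along with the inode because inode numbers repeat across file systems.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Print every path under base that refers to the same file as target.
 * Returns the number of matches found in this subtree. */
long find_links(const char *base, const struct stat *target)
{
    long found = 0;
    DIR *dir = opendir(base);
    if (!dir)
        return 0; /* unreadable directory: skip it */

    struct dirent *dp;
    while ((dp = readdir(dir)) != NULL) {
        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
            continue;

        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", base, dp->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue; /* path too long for this sketch */

        struct stat sb;
        if (lstat(path, &sb) != 0)
            continue; /* entry vanished or is not accessible */

        if (S_ISDIR(sb.st_mode)) {
            found += find_links(path, target); /* lstat: symlinks not followed */
        } else if (sb.st_ino == target->st_ino && sb.st_dev == target->st_dev) {
            printf("%s\n", path);
            found++;
        }
    }
    closedir(dir);
    return found;
}
```

You would stat() the file of interest once, pass the resulting struct to find_links() together with the mount point of its file system, and compare the return value with st_nlink to see whether every link was found.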
You could also use scandir to scan an entire directory:
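#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Keep only the entries whose inode number matches. */
int filter(const struct dirent* entry)
{
    return (entry->d_ino == 199053) ? 1 : 0;
}
int main()
{
    struct dirent **namelist = NULL;
    int n = scandir("/home/selbie", &namelist, filter, NULL);
    for (int i = 0; i < n; i++) {
        printf("%s\n", namelist[i]->d_name);
        free(namelist[i]); /* each entry is allocated separately */
    }
    free(namelist);
    return 0;
}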
My goal is to count the number of files in a directory. After searching around, I found a piece of code which iterates over each file in a directory. But the issue is that it's looping extra times, 2 times extra to be more precise.
So far I have:
int main(void)
{
DIR *d;
struct dirent *dir;
char *ary[10000];
char fullpath[256];
d = opendir("D:\\frames\\");
if (d)
{
int count = 1;
while ((dir = readdir(d)) != NULL)
{
snprintf(fullpath, sizeof(fullpath), "%s%d%s", "D:\\frames\\", count, ".jpg");
int fs = fsize(fullpath);
printf("%s\t%d\n", fullpath, fs); // using this line just for output purposes
count++;
}
closedir(d);
}
getchar();
return(0);
}
My folder contains 500 files, but the output runs up to 502.
UPDATE
I modified the code to read as
struct stat buf;
if ( S_ISREG(buf.st_mode) ) // <-- I'm assuming this says "if it is a file"
{
snprintf(fullpath, sizeof(fullpath), "%s%d%s", "D:\\frames\\", count, ".jpg");
int fs = fsize(fullpath);
printf("%s\t%d\n", fullpath, fs);
}
But I'm getting the error: storage size of "buf" isn't known. I also tried struct stat buf[100], but that didn't help either.
As pointed out in comments, you're also getting the two directories named . and .., which skews your count.
In Linux, you can use the d_type field of the struct dirent to filter them out, but the documentation says:
The only fields in the dirent structure that are mandated by POSIX.1 are: d_name[], of unspecified size, with at most NAME_MAX characters preceding the terminating null byte; and (as an XSI extension) d_ino. The other fields are unstandardized, and not present on all systems; see NOTES below for some further details.
So, assuming you're on Windows you probably don't have d_type. Then you can use some other call instead, for instance stat(). You can of course filter out based on name too, but if you want to skip directories anyway that is a more robust and general solution.
You need to call _stat()/stat() on the file name you want info for.
#include <sys/types.h>
#include <sys/stat.h>
#ifdef _WIN32
# define STAT _stat
#else
# define STAT stat
#endif
...
char * filename = ... /* let it point to some file's name */
struct STAT buffer = {0};
if (STAT(filename, &buffer))
... /* error */
else
{
if (S_ISREG(buffer.st_mode))
{
... /* getting here means `filename` refers to an ordinary file */
}
}
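Applied to the counting problem from the question, the idea might look like the sketch below. It assumes a POSIX-like system (on Windows you would swap in _stat as shown above), and the function name count_regular_files is made up. The important detail is that stat() is given the directory joined with d_name, not d_name alone.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Count the regular files directly inside dirpath.
 * Returns -1 if the directory cannot be opened. */
long count_regular_files(const char *dirpath)
{
    DIR *d = opendir(dirpath);
    if (!d)
        return -1;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        char full[4096];
        int n = snprintf(full, sizeof full, "%s/%s", dirpath, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof full)
            continue; /* name too long for this sketch */

        struct stat st;
        /* "." and ".." are directories, so S_ISREG filters them out too. */
        if (stat(full, &st) == 0 && S_ISREG(st.st_mode))
            count++;
    }
    closedir(d);
    return count;
}
```

For a folder holding 500 image files and nothing else, this should report 500 rather than 502, because the two extra entries are directories.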
I have a problem, in that I need to get a list of the files in a Directory. Using this previous StackOverflow question as a base, I've currently got this code:
void get_files(int maxfiles) {
int count = 0;
DIR *dir;
struct dirent *ent;
dir = opendir(DIRECTORY);
if (dir != NULL) {
/* get all the files and directories within directory */
while ((ent = readdir(dir)) != NULL) {
if (count++ > maxfiles) break;
printf("%s\n", ent->d_name);
}
closedir(dir);
} else {
/* could not open directory */
printf("ERROR: Could not open directory");
exit(EXIT_FAILURE);
}
}
Now it works almost exactly how I want it to, but the problem is that it's also listing directories in with the files, and I only want file entries. Is there an easy modification I can make to do this?
You can filter directories using code similar to this one
POSIX defines fstat (and stat, which takes a path instead of a file descriptor), which can be used to check whether a file is a directory. The S_ISDIR macro simplifies the check.
http://linux.die.net/man/2/fstat
Note that for Windows you may have to use windows API here.
If your struct dirent contains the nonstandard-but-widely-available d_type member, you can use this to filter out directories. Worth having an option to use it and only falling back to stat on systems that don't, since using d_type rather than stat will possibly make your directory listing tens or hundreds of times faster.
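A sketch of that "d_type first, stat as fallback" approach is below. The helper names entry_is_dir and list_non_directories are made up, and the code assumes that a platform providing d_type also defines the DT_DIR, DT_UNKNOWN and DT_LNK macros (glibc does when _DEFAULT_SOURCE is in effect). Symlinks are handed to stat() so that a link to a directory counts as a directory.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if ent (found in dirpath) is a directory, 0 if it is not,
 * and -1 if that cannot be determined. */
static int entry_is_dir(const char *dirpath, const struct dirent *ent)
{
#if defined(DT_DIR) && defined(DT_UNKNOWN) && defined(DT_LNK)
    /* Fast path: the file system already reported the type. */
    if (ent->d_type == DT_DIR)
        return 1;
    if (ent->d_type != DT_UNKNOWN && ent->d_type != DT_LNK)
        return 0;
#endif
    /* Slow path: d_type is missing, unknown, or a symlink; ask stat(). */
    char full[4096];
    int n = snprintf(full, sizeof full, "%s/%s", dirpath, ent->d_name);
    if (n < 0 || (size_t)n >= sizeof full)
        return -1;
    struct stat st;
    if (stat(full, &st) != 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;
}

/* Print every non-directory entry of dirpath. Returns how many were
 * printed, or -1 if the directory cannot be opened. */
long list_non_directories(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    if (!dir)
        return -1;
    long printed = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (entry_is_dir(dirpath, ent) == 0) {
            printf("%s\n", ent->d_name);
            printed++;
        }
    }
    closedir(dir);
    return printed;
}
```

Some file systems always report DT_UNKNOWN, which is why the fallback has to stay even on systems that have d_type.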
I've run into the need to be able to refer to a directory by path given its file descriptor in Linux. The path doesn't have to be canonical, it just has to be functional so that I can pass it to other functions. So, taking the same parameters as passed to a function like fstatat(), I need to be able to call a function like getxattr() which doesn't have a f-XYZ-at() variant.
So far I've come up with these solutions; though none are particularly elegant.
The simplest solution is to avoid the problem by calling openat() and then using a function like fgetxattr(). This works, but not in every situation. So another method is needed to fill the gaps.
The next solution involves looking up the information in proc:
if (!access("/proc/self/fd",X_OK)) {
sprintf(path,"/proc/self/fd/%i/",fd);
}
This, of course, totally breaks on systems without proc, including some chroot environments.
The last option, a more portable but potentially-race-condition-prone solution, looks like this:
DIR* save = opendir(".");
fchdir(fd);
getcwd(path,PATH_MAX);
fchdir(dirfd(save));
closedir(save);
The obvious problem here is that in a multithreaded app, changing the working directory around could have side effects.
However, the fact that it works is compelling: if I can get the path of a directory by calling fchdir() followed by getcwd(), why shouldn't I be able to just get the information directly: fgetcwd() or something. Clearly the kernel is tracking the necessary information.
So how do I get to it?
Answer
The way Linux implements getcwd in the kernel is this: it starts at the directory entry in question and prepends the name of the parent of that directory to the path string, and repeats that process until it reaches the root. This same mechanism can be theoretically implemented in user-space.
Thanks to Jonathan Leffler for pointing this algorithm out. Here is a link to the kernel implementation of this function: https://github.com/torvalds/linux/blob/v3.4/fs/dcache.c#L2577
The kernel thinks of directories differently from the way you do - it thinks in terms of inode numbers. It keeps a record of the inode number (and device number) for the directory, and that is all it needs as the current directory. The fact that you sometimes specify a name to it means it goes and tracks down the inode number corresponding to that name, but it preserves only the inode number because that's all it needs.
So, you will have to code a suitable function. You can open a directory directly with open() precisely to get a file descriptor that can be used by fchdir(); you can't do anything else with it on many modern systems. You can also fail to open the current directory; you should be testing that result. The circumstances where this happens are rare, but not non-existent. (A SUID program might chdir() to a directory that the SUID privileges permit, but then drop the SUID privileges leaving the process unable to read the directory; the getcwd() call will fail in such circumstances too - so you must error check that, too!) Also, if a directory is removed while your (possibly long-running) process has it open, then a subsequent getcwd() will fail.
Always check results from system calls; there are usually circumstances where they can fail, even though it is dreadfully inconvenient of them to do so. There are exceptions - getpid() is the canonical example - but they are few and far between. (OK: not all that far between - getppid() is another example, and it is pretty darn close to getpid() in the manual; and getuid() and relatives are also not far off in the manual.)
Multi-threaded applications are a problem; using chdir() is not a good idea in those. You might have to fork() and have the child evaluate the directory name, and then somehow communicate that back to the parent.
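For a single-threaded program, the fchdir()/getcwd() approach with the error checks described above might be sketched like this. The helper name path_of_dirfd is made up for this example, and it is deliberately not thread-safe, since it changes the process-wide working directory for a moment.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write the path of directory descriptor fd into buf.
 * Not thread-safe: it temporarily changes the working directory.
 * Returns 0 on success, -1 on any failure. */
int path_of_dirfd(int fd, char *buf, size_t size)
{
    /* Remember where we are; this open can fail, e.g. without read permission. */
    int saved = open(".", O_RDONLY);
    if (saved < 0)
        return -1;

    int rc = -1;
    if (fchdir(fd) == 0) {
        if (getcwd(buf, size) != NULL) /* fails if buf is too small, etc. */
            rc = 0;
        if (fchdir(saved) != 0)        /* could not get back: report failure */
            rc = -1;
    }
    close(saved);
    return rc;
}
```

If the final fchdir() fails, the process is left in the wrong directory, so a real program would probably want to treat that case as fatal rather than just returning -1.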
bignose asks:
This is interesting, but seems to go against the querent's reported experience: that getcwd knows how to get the path from the fd. That indicates that the system knows how to go from fd to path in at least some situations; can you edit your answer to address this?
For this, it helps to understand how - or at least one mechanism by which - the getcwd() function can be written. Ignoring the issue of 'no permission', the basic mechanism by which it works is:
Use stat on the root directory '/' (so you know when to stop going upwards).
Use stat on the current directory '.' (so you know where you are); this gives you a current inode.
Until you reach the root directory:
Scan the parent directory '..' until you find the entry with the same inode as the current inode; this gives you the next component name of the directory path.
And then change the current inode to the inode of '.' in the parent directory.
When you reach root, you can build the path.
Here is an implementation of that algorithm. It is old code (originally 1986; the last non-cosmetic changes were in 1998) and doesn't make use of fchdir() as it should. It also works horribly if you have NFS automounted file systems to traverse - which is why I don't use it any more. However, this is roughly equivalent to the basic scheme used by getcwd(). (Ooh; I see an 18-character string ("../123456789.abcd") - well, back when it was written, the machines I worked on only had the very old 14-character-only filenames - not the modern flex names. Like I said, it is old code! I haven't seen one of those file systems in what, 15 years or so - maybe longer. There is also some code to mess with longer names. Be cautious using this.)
/*
#(#)File: $RCSfile: getpwd.c,v $
#(#)Version: $Revision: 2.5 $
#(#)Last changed: $Date: 2008/02/11 08:44:50 $
#(#)Purpose: Evaluate present working directory
#(#)Author: J Leffler
#(#)Copyright: (C) JLSS 1987-91,1997-98,2005,2008
#(#)Product: :PRODUCT:
*/
/*TABSTOP=4*/
#define _POSIX_SOURCE 1
#include "getpwd.h"
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#if defined(_POSIX_SOURCE) || defined(USG_DIRENT)
#include "dirent.h"
#elif defined(BSD_DIRENT)
#include <sys/dir.h>
#define dirent direct
#else
What type of directory handling do you have?
#endif
#define DIRSIZ 256
typedef struct stat Stat;
static Stat root;
#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
const char jlss_id_getpwd_c[] = "#(#)$Id: getpwd.c,v 2.5 2008/02/11 08:44:50 jleffler Exp $";
#endif /* lint */
/* -- Routine: inode_number */
static ino_t inode_number(char *path, char *name)
{
ino_t inode;
Stat st;
char buff[DIRSIZ + 6];
strcpy(buff, path);
strcat(buff, "/");
strcat(buff, name);
if (stat(buff, &st))
inode = 0;
else
inode = st.st_ino;
return(inode);
}
/*
-- Routine: finddir
Purpose: Find name of present working directory
Given:
In: Inode of current directory
In: Device for current directory
Out: pathname of current directory
In: Length of buffer for pathname
Maintenance Log
---------------
10/11/86 JL Original version stabilised
25/09/88 JL Rewritten to use opendir/readdir/closedir
25/09/90 JL Modified to pay attention to length
10/11/98 JL Convert to prototypes
*/
static int finddir(ino_t inode, dev_t device, char *path, size_t plen)
{
register char *src;
register char *dst;
char *end;
DIR *dp;
struct dirent *d_entry;
Stat dotdot;
Stat file;
ino_t d_inode;
int status;
static char name[] = "../123456789.abcd";
char d_name[DIRSIZ + 1];
if (stat("..", &dotdot) || (dp = opendir("..")) == 0)
return(-1);
/* Skip over "." and ".." */
if ((d_entry = readdir(dp)) == 0 ||
(d_entry = readdir(dp)) == 0)
{
/* Should never happen */
closedir(dp);
return(-1);
}
status = 1;
while (status)
{
if ((d_entry = readdir(dp)) == 0)
{
/* Got to end of directory without finding what we wanted */
/* Probably a corrupt file system */
closedir(dp);
return(-1);
}
else if ((d_inode = inode_number("..", d_entry->d_name)) != 0 &&
(dotdot.st_dev != device))
{
/* Mounted file system */
dst = &name[3];
src = d_entry->d_name;
while ((*dst++ = *src++) != '\0')
;
if (stat(name, &file))
{
/* Can't stat this file */
continue;
}
status = (file.st_ino != inode || file.st_dev != device);
}
else
{
/* Ordinary directory hierarchy */
status = (d_inode != inode);
}
}
strncpy(d_name, d_entry->d_name, DIRSIZ);
closedir(dp);
/**
** NB: we have closed the directory we are reading before we move out of it.
** This means that we should only be using one extra file descriptor.
** It also means that the space d_entry points to is now invalid.
*/
src = d_name;
dst = path;
end = path + plen;
if (dotdot.st_ino == root.st_ino && dotdot.st_dev == root.st_dev)
{
/* Found root */
status = 0;
if (dst < end)
*dst++ = '/';
while (dst < end && (*dst++ = *src++) != '\0')
;
}
else if (chdir(".."))
status = -1;
else
{
/* RECURSE */
status = finddir(dotdot.st_ino, dotdot.st_dev, path, plen);
(void)chdir(d_name); /* We've been here before */
if (status == 0)
{
while (*dst)
dst++;
if (dst < end)
*dst++ = '/';
while (dst < end && (*dst++ = *src++) != '\0')
;
}
}
if (dst >= end)
status = -1;
return(status);
}
/*
-- Routine: getpwd
Purpose: Evaluate name of current directory
Maintenance Log
---------------
10/11/86 JL Original version stabilised
25/09/88 JL Short circuit if pwd = /
25/09/90 JL Revise interface; check length
10/11/98 JL Convert to prototypes
Known Bugs
----------
1. Uses chdir() and could possibly get lost in some other directory
2. Can be very slow on NFS with automounts enabled.
*/
char *getpwd(char *pwd, size_t plen)
{
int status;
Stat here;
if (pwd == 0)
pwd = malloc(plen);
if (pwd == 0)
return (pwd);
if (stat("/", &root) || stat(".", &here))
status = -1;
else if (root.st_ino == here.st_ino && root.st_dev == here.st_dev)
{
strcpy(pwd, "/");
status = 0;
}
else
status = finddir(here.st_ino, here.st_dev, pwd, plen);
if (status != 0)
pwd = 0;
return (pwd);
}
#ifdef TEST
#include <stdio.h>
/*
-- Routine: main
Purpose: Test getpwd()
Maintenance Log
---------------
10/11/86 JL Original version stabilised
25/09/90 JL Modified interface; use GETCWD to check result
*/
int main(void)
{
char pwd[512];
int pwd_len;
if (getpwd(pwd, sizeof(pwd)) == 0)
printf("GETPWD failed to evaluate pwd\n");
else
printf("GETPWD: %s\n", pwd);
if (getcwd(pwd, sizeof(pwd)) == 0)
printf("GETCWD failed to evaluate pwd\n");
else
printf("GETCWD: %s\n", pwd);
pwd_len = strlen(pwd);
if (getpwd(pwd, pwd_len - 1) == 0)
printf("GETPWD failed to evaluate pwd (buffer is 1 char short)\n");
else
printf("GETPWD: %s (but should have failed!!!)\n", pwd);
return(0);
}
#endif /* TEST */
Jonathan's answer does a fine job of showing how it works. But it doesn't show a workaround for the situation you describe.
I would also use something like what you describe:
DIR* save = opendir(".");
fchdir(fd);
getcwd(path,PATH_MAX);
fchdir(dirfd(save));
closedir(save);
but, in order to avoid race conditions with threads, fork another process to do that.
That might sound expensive, but if you don't do that too often, it should be ok.
The idea is something like this (no runnable code, just a raw idea):
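int pipefd[2]; // named pipefd so it does not clash with the directory descriptor fd
char path[PATH_MAX+1]; // declared before fork() so the parent can use it too
pipe(pipefd);
pid_t pid;
if ((pid = fork()) == 0) {
    // child; here we do the chdir etc. stuff
    close(pipefd[0]); // read end
    DIR* save = opendir(".");
    fchdir(fd); // fd is the directory descriptor whose path we want
    getcwd(path,PATH_MAX);
    fchdir(dirfd(save));
    closedir(save);
    write(pipefd[1], path, strlen(path));
    close(pipefd[1]);
    _exit(EXIT_SUCCESS);
} else {
    // parent; pid is our child
    close(pipefd[1]); // write end
    ssize_t r;
    int cursor=0;
    // read into path at the current offset, never past PATH_MAX bytes
    while (cursor < PATH_MAX && (r=read(pipefd[0], path+cursor, PATH_MAX-cursor)) > 0) {
        cursor += r;
    }
    path[cursor]='\0'; // make it 0-terminated
    close(pipefd[0]);
    wait(NULL);
}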
I am not sure if this will resolve all issues, and I also do not do any error checking, so that's something you should add.