How are dirent entries ordered?


I am at a loss as to how dirent entries are ordered. For example, given the code
DIR* dir = opendir("/some/directory");
struct dirent* entry;
while ((entry = readdir(dir)) != NULL)
    printf("%s\n", entry->d_name);
closedir(dir);
This may output something like the following:
abcdef
example3
..
.
123456789
example2
example1
As you can see, this output is not in alphabetical order. So how exactly are dirent entries ordered? What causes some entries to come before others?

They are not ordered alphabetically; they are retrieved in the order that the filesystem maintains them.
The directory "file" simply contains a list of filenames and inode numbers. Some filesystem types prefer not to split a filename/inode entry across blocks, so as files are added to or removed from the list, a new entry may be placed in whatever free space the filesystem finds in an existing block, which need not be at the end of the list. Other schemes (such as keeping frequently used filenames earlier in the list) are possible.
A list sorted by filename depends on how names are compared, and that can be locale-dependent. (The filesystem does not know or care about your locale settings.) So that decision is left to applications rather than to the filesystem itself.
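If you want a sorted listing, your program has to do the sorting. Below is a minimal sketch (the helper name print_sorted is hypothetical) that uses POSIX scandir() with alphasort(). POSIX specifies that alphasort() compares names with strcoll(), so the resulting order follows the current locale's LC_COLLATE setting, which is exactly the locale dependence the filesystem stays out of.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Print the entries of `path` in collation order (hypothetical helper).
 * scandir() reads every entry, then sorts the array with alphasort(),
 * which compares names using strcoll(). Returns the entry count, or -1. */
int print_sorted(const char *path)
{
    struct dirent **names;
    int n = scandir(path, &names, NULL, alphasort);
    if (n < 0) {
        perror("scandir");
        return -1;
    }
    for (int i = 0; i < n; i++) {
        printf("%s\n", names[i]->d_name);
        free(names[i]);            /* each entry is malloc'd by scandir() */
    }
    free(names);                   /* ...and so is the array itself */
    return n;
}
```

Calling setlocale(LC_ALL, "") first makes the order follow the user's environment; without it the program runs in the "C" locale, where strcoll() behaves like strcmp() and names sort bytewise.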
For additional comments, see
What determines the order directory entries are returned by getdents?
What is the “directory order” of files in a directory (used by ls -U)?
In Bash, are wildcard expansions guaranteed to be in order?

They are not ordered in any relevant way. It's up to the implementation to retrieve and return directory entries in whatever order is most convenient.
Advanced Programming in the UNIX Environment, 3rd ed., goes a little further and even says that the order is usually not alphabetical (Chapter 4, Section 4.22):
Note that the ordering of entries within the directory is implementation dependent and is usually not alphabetical.
If you're wondering, the output of ls is sorted because ls sorts it.

Related

How to print the file tree of only the found files in a recursive function?

I have a recursive function that searches a path for a given file name. What I am trying to do is to print the files that match, along with their parent directories.
So for a file tree like this:
mydir
  mysubdir
    mysubsubdir
      file1
      file2
    file1
  mysubdir2
    file2
I want to print this when I search for file1:
mydir
  mysubdir
    mysubsubdir
      file1
    file1
I can see the path of each matching file, so I thought of building a new tree from those paths and then printing that tree, but it seems to me that there must be a much simpler way.

Your function needs the path from the root to the directory it is currently processing. For example, pass it via a const char ** argument and append to it each time you descend into a directory (use a linked list if you don't like realloc, or ensure a sufficiently large size up front). When there is a match, you can print the path starting from the root (but see below).
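A minimal sketch of this idea, under a few assumptions: instead of the const char ** stack, it keeps the current path in one writable char buffer, appending "/name" when descending and truncating it again when coming back up; find_file is a hypothetical name; and entries are visited in readdir() order, so they are not sorted (scandir() with alphasort() would give the sorted order that the refined print rule needs). lstat() is used so that symbolic links to directories are not followed.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Depth-first search below `path` for entries named `target`. `path` is a
 * writable buffer of `cap` bytes holding the current directory; it always
 * contains the path from the root to the entry being examined.
 * Returns the number of matches (unreadable directories are skipped). */
int find_file(char *path, size_t cap, const char *target)
{
    int found = 0;
    size_t len = strlen(path);
    DIR *d = opendir(path);
    if (d == NULL)
        return 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        /* Descend: append "/name" so that path names the current entry. */
        int w = snprintf(path + len, cap - len, "/%s", e->d_name);
        if (w < 0 || (size_t)w >= cap - len) {
            path[len] = '\0';           /* too long: undo and skip */
            continue;
        }
        if (strcmp(e->d_name, target) == 0) {
            printf("%s\n", path);       /* full path from the root */
            found++;
        }
        struct stat st;
        /* lstat() so that symbolic links to directories are not followed. */
        if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode))
            found += find_file(path, cap, target);
        path[len] = '\0';               /* ascend: drop "/name" again */
    }
    closedir(d);
    return found;
}
```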
To get the short-cut behavior of mydir/file1, you also need the path of the previous match, which could be another const char ** argument. The print rule is then refined: indent by as many levels as the previous and current matches have path elements in common, then print the remaining, unique part of the current match's path. This assumes a depth-first search in which sub-directories are visited in sorted order.
Instead of printing as you go, you can also record each match in a const char *** or in a tree, as suggested by @wildplasser, and then loop over (or walk) the result using the same refined print algorithm (with a tree you only need to know the level, not the prefix path). With the array approach, if your search is not depth-first you can simply sort the array first; with a tree, walking it depth-first gives the right order.
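The refined print rule can be sketched over an already collected, sorted array of match paths. This is an illustration under assumptions, not the answerer's code: paths are relative, '/'-separated strings such as "mydir/mysubdir/file1", each level is indented by two spaces, and print_match_tree is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Length of the path component starting at s (up to the next '/'). */
static size_t comp_len(const char *s) { return strcspn(s, "/"); }

/* Print '/'-separated match paths as an indented tree. Components shared
 * with the previous match are not printed again; each remaining component
 * is indented two spaces per level. Assumes `paths` is already in
 * depth-first (e.g. sorted) order, with no leading '/'. */
void print_match_tree(FILE *out, const char *const *paths, size_t n)
{
    const char *prev = NULL;
    for (size_t i = 0; i < n; i++) {
        const char *cur = paths[i];
        const char *p = prev;
        int depth = 0;
        /* Skip the leading components shared with the previous match. */
        while (p != NULL && *cur != '\0') {
            size_t len = comp_len(cur);
            if (comp_len(p) != len || strncmp(p, cur, len) != 0)
                break;
            cur += len;
            p += len;
            if (*cur == '/') cur++;
            if (*p == '/') p++;
            depth++;
        }
        /* Print the rest of this match, one component per line. */
        while (*cur != '\0') {
            size_t len = comp_len(cur);
            fprintf(out, "%*s%.*s\n", 2 * depth, "", (int)len, cur);
            cur += len;
            if (*cur == '/') cur++;
            depth++;
        }
        prev = paths[i];
    }
}
```

For example, given the sorted matches "mydir/mysubdir/file1" and "mydir/mysubdir/mysubsubdir/file1", it prints mydir and mysubdir once, then file1 and mysubsubdir at the third level, and the second file1 one level deeper.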

File opening with only base name in C

In C, how can I open a file by considering only its base name? The suffix part of the file name may be anything, but the base name stays the same, for example Unit_123, Unit_245, Unit_658.
I want to give only the base name, irrespective of the suffix (123, 245, 658), and have the file open.
In a Linux shell script, this can be achieved by following the file name with an asterisk (*): for example, Unit* matches the file whatever its suffix. How can I achieve this in C?

There is no standard way in C to do this. It is operating system dependent.

You need to iterate over the files in a directory with wildcards. The C standard doesn't provide any function for this, but there are, of course, platform-dependent solutions:
On Linux or other POSIX systems, you can use glob(3), which accepts wildcards like those understood by the shell.
On Windows, there are FindFirstFile and FindNextFile, which accept at least asterisks and question marks as wildcards.
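On a POSIX system, a minimal sketch with glob(3) might look like this. open_by_prefix is a hypothetical helper; it assumes that, when several files match (e.g. Unit_123 and Unit_245), opening the first one in glob()'s sorted order is acceptable, and it only warns about the others.

```c
#define _XOPEN_SOURCE 700
#include <glob.h>
#include <stdio.h>

/* Open the first file matching a shell-style pattern such as "Unit_*".
 * Returns NULL if nothing matches or the file cannot be opened. */
FILE *open_by_prefix(const char *pattern, const char *mode)
{
    glob_t g;
    FILE *f = NULL;
    /* glob() returns 0 on success and GLOB_NOMATCH if nothing matched;
     * matches come back sorted unless GLOB_NOSORT is given. */
    if (glob(pattern, 0, NULL, &g) == 0) {
        if (g.gl_pathc > 1)
            fprintf(stderr, "warning: %zu files match %s, using %s\n",
                    g.gl_pathc, pattern, g.gl_pathv[0]);
        f = fopen(g.gl_pathv[0], mode);
    }
    globfree(&g);                  /* releases gl_pathv */
    return f;
}
```

For example, open_by_prefix("Unit_*", "r") opens the first Unit_ file in the current directory.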

To achieve what you want in C, the standard way would involve getting a listing of the directory holding the files of interest. If the files are located in a single directory, the function scandir will build an array of struct dirent entries for the filenames in that directory. scandir takes as its 3rd argument a filter function of the type:
int (*filter)(const struct dirent *)
This allows you to match only the filenames that satisfy the criteria you provide in the filter function.
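A sketch of the scandir() approach, assuming the base name is a plain prefix such as "Unit_". Because the filter receives only the struct dirent, the prefix is passed through a file-scope variable; wanted_prefix, has_prefix and find_by_prefix are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* scandir() passes only the entry to the filter, so the wanted base name is
 * kept in a file-scope variable (not thread-safe; fine for a sketch). */
static const char *wanted_prefix = "";

/* Filter: keep entries whose name starts with wanted_prefix. */
static int has_prefix(const struct dirent *e)
{
    return strncmp(e->d_name, wanted_prefix, strlen(wanted_prefix)) == 0;
}

/* Copy the first matching name (in alphasort() order) into out.
 * Returns the number of matches, or -1 if the directory can't be read. */
int find_by_prefix(const char *dir, const char *prefix, char *out, size_t outsz)
{
    struct dirent **names;
    wanted_prefix = prefix;
    int n = scandir(dir, &names, has_prefix, alphasort);
    if (n < 0)
        return -1;
    if (n > 0 && outsz > 0)
        snprintf(out, outsz, "%s", names[0]->d_name);
    for (int i = 0; i < n; i++)
        free(names[i]);
    free(names);
    return n;
}
```

The name it returns is relative to dir, so join the two (e.g. with snprintf) before calling fopen.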
If you need to search a directory tree for files/sub-directories, then the functions you want are ftw and nftw. Both walk the tree and call a function you supply for each file and/or sub-directory found (nftw's behavior can be adjusted with its flags), so that function can check each name for a match. Take a look at all of them and decide which fits your needs best.
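For a whole tree, a sketch with nftw() could look like this. The names find_in_tree and visit, and the choice to report only regular files whose name starts with the given prefix, are assumptions for illustration. FTW_PHYS makes nftw() report symbolic links instead of following them, and ftwbuf->base is the offset of the last path component within the path.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static const char *nftw_prefix;   /* base name to look for */
static int nftw_matches;          /* regular files matched so far */

/* Called by nftw() once per entry in the tree. */
static int visit(const char *path, const struct stat *sb, int type,
                 struct FTW *ftwbuf)
{
    (void)sb;
    const char *name = path + ftwbuf->base;   /* last path component */
    if (type == FTW_F && strncmp(name, nftw_prefix, strlen(nftw_prefix)) == 0) {
        printf("%s\n", path);
        nftw_matches++;
    }
    return 0;                     /* 0 = keep walking */
}

/* Returns the number of matches under root, or -1 if the walk failed. */
int find_in_tree(const char *root, const char *prefix)
{
    nftw_prefix = prefix;
    nftw_matches = 0;
    /* 16 = maximum number of directory descriptors nftw() may hold open. */
    if (nftw(root, visit, 16, FTW_PHYS) == -1)
        return -1;
    return nftw_matches;
}
```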
None of these functions represent the only way to obtain and parse file listings in C. They are simply the general functions that come to mind to do what you describe.

Open every file but not links to other directories when using scandir()

I want to recursively copy one directory into another (like cp -R) using POSIX scandir().
The problem is that when I copy a directory like /sys/bus/, which contains links to higher levels (for example: foo/foo1/foo2/foo/foo1/foo2/foo/...), the program gets stuck in a loop and keeps copying the directories "in the middle" forever...
How can I check if the file I'm opening with dirent is a link or not?

Look at this: How to check whether two file names point to the same physical file
You need to store a list of the inodes (together with their device numbers) that you have visited, to make sure that you don't get any duplicates. If you have two hard links to the same file, there is no "one" canonical name. One possibility is to first store all the files and then recurse through all the filenames. You can store the path structure separately from the inodes and file contents.
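A minimal sketch of both checks, assuming directories are identified by their (st_dev, st_ino) pair rather than by the inode number alone, since inode numbers are only unique within one filesystem. lstat() answers the "is this a link?" question directly, and visit_once() lets a copier skip a directory it has already entered. The struct and function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/stat.h>

/* Set of (st_dev, st_ino) pairs for directories already entered. A linear
 * scan keeps the sketch short; a large tree would want a hash set. */
struct visited {
    struct { dev_t dev; ino_t ino; } *items;
    size_t len, cap;
};

/* 1 if path itself is a symbolic link, 0 if not, -1 on error.
 * lstat() reports on the link instead of following it. */
int is_symlink(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}

/* Record the directory path refers to. Returns 1 if it is new, 0 if it was
 * already visited (recursing again would loop), -1 on error. */
int visit_once(struct visited *v, const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    for (size_t i = 0; i < v->len; i++)
        if (v->items[i].dev == st.st_dev && v->items[i].ino == st.st_ino)
            return 0;
    if (v->len == v->cap) {                  /* grow the array */
        size_t ncap = v->cap ? 2 * v->cap : 16;
        void *p = realloc(v->items, ncap * sizeof *v->items);
        if (p == NULL)
            return -1;
        v->items = p;
        v->cap = ncap;
    }
    v->items[v->len].dev = st.st_dev;
    v->items[v->len].ino = st.st_ino;
    v->len++;
    return 1;
}
```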

How to determine if a path is inside a directory? (POSIX)

In C, using POSIX calls, how can I determine if a path is inside a target directory?
For example, a web server has its root directory in /srv, which is also the daemon's current working directory (getcwd()).
When parsing a request for /index.html, it returns the contents of /srv/index.html.
How can I filter out requests for paths outside of /srv?
/../etc/passwd,
/valid/../../etc/passwd,
etc.
Splitting the path at / and rejecting any array containing .. would break valid accesses such as /srv/valid/../index.html.
Is there a canonical way to do this with system calls? Or do I need to manually walk the path and count directory depth?

There's always realpath:
The realpath() function shall derive, from the pathname pointed to by *file_name*, an absolute pathname that resolves to the same directory entry, whose resolution does not involve '.' , '..' , or symbolic links.
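Then compare what realpath gives you with your desired root directory: the path is inside the root only if the result equals the root or starts with the root followed by a "/" (a plain prefix test would also accept /srvfoo).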
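A sketch of that comparison, under these assumptions: the requested path has already been joined to the root (e.g. "/srv/valid/../index.html"), the file must exist because realpath() fails otherwise, and is_inside is a hypothetical name. Passing NULL as realpath()'s second argument asks it to allocate the result (POSIX.1-2008 behavior).

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Returns 1 if path resolves to root itself or to something beneath it,
 * 0 if it resolves elsewhere, -1 if either path cannot be resolved. */
int is_inside(const char *root, const char *path)
{
    char *r = realpath(root, NULL);   /* NULL: realpath() mallocs the result */
    char *p = realpath(path, NULL);
    int result = -1;
    if (r != NULL && p != NULL) {
        size_t n = strlen(r);
        if (strcmp(r, "/") == 0)
            result = 1;               /* everything is inside "/" */
        else                          /* match on a component boundary */
            result = strncmp(p, r, n) == 0 && (p[n] == '\0' || p[n] == '/');
    }
    free(r);
    free(p);
    return result;
}
```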
You could also clean up the filename by hand by expanding the double-dots before you prepend the "/srv". Split the incoming path on slashes and walk through it piece by piece. If you get a ".", remove it and move on; if you get a "..", remove it and the previous component (taking care not to go past the first entry in your list); if you get anything else, just move on to the next component. Then paste what's left back together with slashes between the components and prepend your "/srv/". So if someone gives you "/valid/../../etc/passwd", you'll end up with "/srv/etc/passwd", and "/where/is/../pancakes/house" will end up as "/srv/where/pancakes/house".
That way you can't get outside "/srv" (except through symbolic links, of course), and an incoming "/../.." will be the same as "/" (just like in a normal file system). But you'd still want to use realpath if you're worried about symbolic links under "/srv".
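The component-by-component cleanup described above could be sketched as follows. map_request is a hypothetical name, the root is assumed to be given without a trailing slash, and, as noted, symbolic links are not resolved.

```c
#include <stdio.h>
#include <string.h>

/* Lexically normalise `req` and prefix it with `root`, writing the result
 * into `out`. "." and empty components are dropped; ".." removes the
 * previous component but never climbs above the root.
 * Returns 0 on success, -1 if the result does not fit in outsz bytes. */
int map_request(const char *root, const char *req, char *out, size_t outsz)
{
    size_t len = (size_t)snprintf(out, outsz, "%s", root);
    size_t base = len;                  /* never truncate below the root */
    if (len >= outsz)
        return -1;
    while (*req != '\0') {
        while (*req == '/')             /* skip runs of slashes */
            req++;
        size_t n = strcspn(req, "/");   /* length of this component */
        if (n == 2 && req[0] == '.' && req[1] == '.') {
            while (len > base && out[len - 1] != '/')
                len--;                  /* drop the last component... */
            if (len > base)
                len--;                  /* ...and the slash before it */
            out[len] = '\0';
        } else if (n > 0 && !(n == 1 && req[0] == '.')) {
            if (len + 1 + n >= outsz)
                return -1;
            out[len++] = '/';
            memcpy(out + len, req, n);
            len += n;
            out[len] = '\0';
        }
        req += n;
    }
    return 0;
}
```

With root "/srv", the request "/valid/../../etc/passwd" maps to "/srv/etc/passwd", "/where/is/../pancakes/house" to "/srv/where/pancakes/house", and "/../.." to "/srv".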
Working with the path name component by component would also allow you to break the connection between the layout you present to the outside world and the actual file system layout; there's no need for "/this/that/other/thing" to map to an actual "/srv/this/that/other/thing" file anywhere, the path could just be a key in some sort of database or some sort of namespace path to a function call.

To determine if a file F is within a directory D, first stat D to determine its device number and inode number (members st_dev and st_ino of struct stat).
Then stat F to determine whether it is a directory. If not, call dirname to determine the name of the directory containing it, and set G to that name. If F is already a directory, set G=F.
Now, F is within D if and only if G is within D. Next we have a loop.
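while (1) {
    int r = samefile(d_statinfo.st_dev, d_statinfo.st_ino, G);
    if (r < 0) {
        return -1; // stat() failed on G
    } else if (r) {
        return 1; // F was within D
    } else if (0 == strcmp("/", G)) {
        return 0; // reached the root: F was not within D
    }
    G = dirname(G); // dirname() may modify G, so G must be a writable copy
}
The samefile function is simple:
#include <libgen.h>   // dirname()
#include <string.h>   // strcmp()
#include <sys/stat.h> // stat(), struct stat

// Returns 1 if path names the file identified by (ddev, dino),
// 0 if it names a different file, and -1 if stat() fails.
int samefile(dev_t ddev, ino_t dino, const char *path) {
    struct stat st;
    if (0 != stat(path, &st))
        return -1;
    return ddev == st.st_dev && dino == st.st_ino;
}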
This will work on POSIX filesystems. But many filesystems are not POSIX. Problems to look out for include:
Filesystems where the device/inode are not unique. Some FUSE filesystems are examples of this; they sometimes make up inode numbers when the underlying filesystems don't have them. They shouldn't re-use inode numbers, but some FUSE filesystems have bugs.
Broken NFS implementations. On some systems all NFS filesystems have the same device number. If they pass through the inode number as it exists on the server, this could cause a problem (though I've never seen it happen in practice).
Linux bind mount points. If /a is a bind mount of /b, then /a/1 correctly appears to be inside /a, but with the implementation above, /b/1 also appears to be inside /a. I think that's probably the correct answer. If it is not the result you prefer, this is easily fixed by changing the return 1 case to also compare the path names with strcmp(). For this to work, however, you will need to start by calling realpath on both F and D, and the realpath call can be quite expensive (since it may need to hit the disk a number of times).
The special path //foo/bar. POSIX allows a pathname beginning with exactly two slashes to be interpreted in an implementation-defined manner. I forget the precise guarantees POSIX makes here, but I think it allows //foo/bar and //baz/ugh to refer to the same file. The device/inode check should still do the right thing for you, but you may find that it does not (i.e. //foo/bar and //baz/ugh may refer to the same file yet have different device/inode numbers).
This answer assumes that we start with an absolute path for both F and D. If this is not guaranteed you may need to do some conversion using realpath() and getcwd(). This will be a problem if the name of the current directory is longer than PATH_MAX (which can certainly happen).

You should simply process .. yourself and remove the previous path component when it's found, so that there are no occurrences of .. in the final string you use for opening files.

POSIX seekdir() and telldir() behaviour after target folder modification

Consider the following task:
1) Read a target directory's contents, pass each dirent structure found to some filter function, and somehow remember the filtered elements for later processing.
2) Some time later, iterate through the filtered elements and process them (do some I/O).
The most obvious way is to save the names of the sub-directories.
However, I want to keep memory usage to a minimum and to avoid additional I/O.
According to the POSIX manuals, I can save the position of each directory entry using telldir() and restore it later using seekdir(). To keep these positions valid, I have to keep the target directory open and not call rewinddir().
Keeping a directory stream open and storing a list of directory positions (long ints) seems to be an appropriate solution.
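The telldir()/seekdir() scheme described above might be sketched as follows. The prefix test stands in for the real filter function, and remember and revisit are hypothetical names. telldir() is called before readdir(), because seekdir() sets the position of the next readdir(). The sketch does not answer the modification questions: what a stored location means after entries are added or removed is exactly what is in doubt.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Pass 1: read the whole directory and store the telldir() location of every
 * entry whose name starts with `prefix` (a stand-in for the real filter).
 * Returns a malloc'd array and sets *count; NULL if nothing matched or on
 * allocation failure. */
long *remember(DIR *d, const char *prefix, size_t *count)
{
    long *pos = NULL;
    size_t n = 0, cap = 0, plen = strlen(prefix);
    *count = 0;
    for (;;) {
        long here = telldir(d);           /* location of the next entry */
        struct dirent *e = readdir(d);
        if (e == NULL)
            break;
        if (strncmp(e->d_name, prefix, plen) != 0)
            continue;                     /* filtered out */
        if (n == cap) {                   /* grow the array geometrically */
            size_t ncap = cap ? 2 * cap : 16;
            long *p = realloc(pos, ncap * sizeof *pos);
            if (p == NULL) {
                free(pos);
                return NULL;
            }
            pos = p;
            cap = ncap;
        }
        pos[n++] = here;
    }
    *count = n;
    return pos;
}

/* Pass 2: jump back to one remembered entry. The stream must still be open
 * and rewinddir() must not have been called since the telldir(). */
struct dirent *revisit(DIR *d, long loc)
{
    seekdir(d, loc);
    return readdir(d);
}
```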
However, it is unclear whether stored positions remain valid after the directory is modified. I didn't find any comments on these conditions in the POSIX standard.
1) Do stored positions remain valid when directory entries are only added or removed?
2) Do the stored positions of unmodified directory entries remain valid if some of the filtered directory entries were removed?
3) Can a stored position point to a different directory entry after the directory is modified?
It is easy to test and find out the answers to these questions for a particular system, but I would like to know what the standards say on this topic.
Thank you

Until you call rewinddir or close and reopen the directory, your view of the directory contents should not change. Sorry I don't have the reference handy. I'll find it later if you need it.
