K&R chapter 8, readdir function - c

I'm stuck at this function (found in fsize() example in K&R chapter 8):
#include <sys/dir.h> /* local directory structure */
/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
struct direct dirbuf; /* local directory structure */
static Dirent d; /* return: portable structure */
while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf)) == sizeof(dirbuf)) {
if (dirbuf.d_ino == 0) /* slot not in use */
continue;
d.ino = dirbuf.d_ino;
strncpy(d.name, dirbuf.d_name, DIRSIZ);
d.name[DIRSIZ] = '\0'; /* ensure termination */
return &d;
}
return NULL;
}
In this function Dirent and DIR are custom structs written by K&R (not the one found in dirent.h):
typedef struct { /* portable directory entry */
long ino; /* inode number */
char name[NAME_MAX+1]; /* name + '\0' terminator */
} Dirent;
typedef struct {
int fd;
Dirent d;
} DIR;
When I use the code from the book, it runs fine, but I have two problems (questions):
1. The file listing does not happen recursively. It is applied only once, to the current directory.
2. I can't understand the line with the read() call above:
a) If dp->fd refers to a directory, read() fails with errno 21 (EISDIR, "Is a directory").
b) How can a read() like that fill in the dirbuf structure? Isn't it supposed to read only characters/bytes of some sort?
Thanks.

Think for a moment about the costs of a recursive structure.
You would need a list of subdirectories for each dirent. That drastically increases your memory requirements and complicates your memory-allocation code (you can no longer use stack-allocated structs; you must use malloc/free).
For this reason, I'd say #1 is invalid.
I'm not entirely sure whether this is homework, but I cannot reproduce #2, so I'll leave it alone for now.

Calling the function once returns the "next" directory entry. It is intended to be called repeatedly - once for each directory entry.
The read syscall (declared in unistd.h) cannot be given a directory file descriptor. This is most likely a different "read" function. dirbuf is declared in the function, so it isn't read only.
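To make the "called repeatedly" point concrete, here is a minimal sketch that uses the POSIX opendir()/readdir() from dirent.h rather than the K&R versions. The function name walk and the 4096-byte path buffer are my own choices for illustration. Note that the recursion lives in the caller, not in readdir() itself: readdir() only hands back one entry per call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* walk: print every entry under dir, recursing into subdirectories.
 * Returns the number of entries visited, not counting "." and "..".
 * (Hypothetical helper name; not from K&R.) */
long walk(const char *dir)
{
    assert(dir != NULL);

    DIR *dfd = opendir(dir);
    if (dfd == NULL)
        return 0;                       /* unreadable or missing: skip it */

    long count = 0;
    struct dirent *dp;
    while ((dp = readdir(dfd)) != NULL) {       /* one entry per call */
        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
            continue;                   /* skip self and parent, or we never stop */

        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, dp->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                   /* path too long for this sketch */

        printf("%s\n", path);
        count++;

        struct stat st;
        if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode))
            count += walk(path);        /* recurse from the caller's side */
    }
    closedir(dfd);
    return count;
}
```

Calling walk(".") lists the current directory and everything below it. lstat() is used instead of stat() so that symbolic links to directories are not followed.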

Related

cannot read `struct direct` as shown in K&R2

An example of implementing own readdir as shown in K&R2 here:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <string.h>
#define NAME_MAX 14 /*longest filenames component; system-dependent */
#ifndef DIRSIZ
#define DIRSIZ 14
#endif
typedef struct {
long ino; /*inode number */
char name[NAME_MAX+1]; /*name + '\0' terminator */
} my_dirent;
typedef struct {
int fd; /* file descriptor for directory */
my_dirent d; /*the directory entry */
} MY_DIR;
/*
* opendir: open a directory for readdir calls
*/
MY_DIR *my_opendir(char *dirname)
{
int fd;
struct stat stbuf;
MY_DIR *dp;
if((fd = open(dirname, O_RDONLY, 0)) == -1
|| fstat(fd, &stbuf) == -1
|| (stbuf.st_mode & S_IFMT) != S_IFDIR
|| (dp = malloc(sizeof(MY_DIR))) == NULL)
return NULL;
dp->fd = fd;
return dp;
}
/*
* closedir: close directory opened by opendir
*/
void my_closedir(MY_DIR *dp)
{
if(dp) {
close(dp->fd);
free(dp);
}
}
#include <sys/dir.h>
/*
* readdir: read directory entries in sequence
*/
my_dirent *my_readdir(MY_DIR *dp)
{
struct direct dirbuf; /* local directory structure */
static my_dirent d; /* portable structure */
// HERE BELOW: the body of while loop never executes (I have no idea why) so NULL is returned and causes segfault when dereferencing in printf
while(read(dp->fd, (char*) &dirbuf, sizeof(dirbuf)) == sizeof(dirbuf)) {
if(dirbuf.d_ino == 0) /* slot not in use */
continue;
d.ino = dirbuf.d_ino;
strncpy(d.name, dirbuf.d_name, DIRSIZ);
d.name[DIRSIZ] = '\0';
return &d;
}
return NULL;
}
int main()
{
MY_DIR *dp = my_opendir(".");
my_dirent *dent = my_readdir(dp);
printf("directory info:\nname: %s; fd: %d; ino: %ld\n", dent->name, dp->fd, dent->ino);
}
I debugged it, so I know where it goes wrong. As noted in the comments, the while header
while(read(dp->fd, (char*) &dirbuf, sizeof(dirbuf)) == sizeof(dirbuf)) {
...
}
fails, so the function returns NULL, which is then dereferenced in printf. So the question is how to read that struct. I found this in dir.h:
#define direct dirent
So that structure is in effect dirent, which has the following definition in dirent.h:
struct dirent
{
#ifndef __USE_FILE_OFFSET64
__ino_t d_ino;
__off_t d_off;
#else
__ino64_t d_ino;
__off64_t d_off;
#endif
unsigned short int d_reclen;
unsigned char d_type;
char d_name[256]; /* We must not include limits.h! */
};
But that should not matter since, in the read(2) call, I am using sizeof, which will give the proper size. So why does the while condition fail?
Remember that K&R 2 was written almost 35 years ago. Besides discussing ANSI C aka C89, of which most (but not quite all) is still applicable to modern C, K&R also discuss many features that are not standardized but are specific to UNIX, or rather, to the UNIX of its day.
In former times, one would access directory entries as they do: by open()ing the directory like a file, and reading data from it in some specified format. K&R themselves say that the format they use is specific to Version 7 and System V UNIX, so there is no reason at all to expect it to work with other versions of UNIX, much less with Linux which evolved completely independently. In fact, the whole idea of using read() to get directory entries is now obsolete and generally not available. The business of getting directory entries off the disk is done within the kernel, and it provides this data to user-space through more standardized APIs like readdir or getdents.
K&R is a classic and there is much to be learned from it, but its age does show and you cannot be surprised when some of what they say is not applicable to the present day.
Note that when a system call fails, your first step in determining the reason should be to check the value of errno, perhaps using perror(). Had you done so, you would have seen that it was EISDIR ("Is a directory"). The read(2) man page says that this error occurs when "fd refers to a directory", implying that read() from a directory is generally not allowed. That would at least have helped you shift your focus from "How to read that struct" to "since I can't read the struct, by what method should I be getting directory entries instead?"
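As a rough illustration of that debugging step, the sketch below opens a path, attempts a raw read(), and reports the errno it fails with. The helper name try_read_dir is made up for this example, and the EISDIR result is what I would expect on Linux; other systems may behave differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* try_read_dir: attempt a raw read() on path (hypothetical helper).
 * Returns 0 if the read succeeded, otherwise the errno of the
 * call that failed. */
int try_read_dir(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int err = errno;        /* save it: perror() may change errno */
        perror("open");
        return err;
    }

    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf);
    int err = 0;
    if (n == -1) {
        err = errno;
        perror("read");         /* on Linux: "read: Is a directory" */
    }
    close(fd);
    return err;
}
```

On a Linux system, try_read_dir(".") should return EISDIR, which is exactly the hint that the directory has to be read through readdir() or getdents() instead.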

How to make getdents() behave like read() on directories [K&R section 8.6]

I’m new to programming and C and I'm currently working through K&R. Apologies in advance if this isn't the most succinct way of characterizing the problem.
For context, in section 8.6 of K&R (not the exercises but the actual chapter) they implement the function fsize() that prints out the size of files in a directory and its sub-directories recursively. The code in the book uses the syscall read() to implement a basic version of readdir(), which returns a pointer to the next entry in a directory.
Up until this section of K&R, all source code has worked fine on my machine, however, the code in this chapter relies on using the read() function on directories to get their contents, which according to a source I found here [1], doesn’t work on Linux and many modern systems.
However, there is a syscall, getdents(), which seems to do roughly the same thing [2]. So as an exercise I tried to re-implement readdir() and came across the following problems:
read() on directories seems to know in advance the size of each entry, so it's able to return one entry at a time and let read() handle the issue of "remembering" the location of the next entry every time it is called.
getdents() on the other hand doesn't know the size of each entry in advance, so I have to read the entire buffer first and then loop through it in the readdir() using the member d_reclen (I copied from the example at the bottom of man getdents), meaning now my readdir() function has to handle the issue of "remembering" the location of the next entry in the stream every time readdir() is called.
So my questions are as follows:
1. Am I correct in my understanding that getdents() cannot be made to behave like read(), in the sense that it reads one entry at a time and handles the "remembering of the next position" itself?
2. If it is true that getdents() cannot behave like read(), what is the best way to implement "remembering the position", in particular if getdents() needs to be called multiple times on several sub-directories? I've shown an excerpt of what I tried below: using the file descriptor assigned by the system as an index into an array holding the results of getdents(). However, this attempt seems to fail given how opendir() and closedir() are implemented: the system reuses file descriptors once closedir() has been called and opendir() is called on the next subdirectory (and this information is not available to readdir()).
Last Note: I want my implementation of read_dir() to behave exactly like that of readdir() in K&R. Meaning I wouldn't have to change any of the other functions or structures to make it work.
// NTD: _direct's structure needs to match how system implements directory
// entries. After reading from file descriptor into _direct, we then
// copy only the relevant elements (d_ino and d_name) to Dirent
struct _direct { // directory entry
long d_ino; // inode number
off_t d_off; // Not included in K&R
unsigned short d_reclen; // Not included in K&R
char d_name[]; // long name does not have '\0'
};
#define BUFSIZE 4096 // Size of buffer when reading from getdents()
#define MAXFILES 1024 // Max files that read_dir() can open
struct _streamdents {
int pos;
int nread;
char *buf;
};
// read_dir: read directory entries in sequence
Dirent *read_dir(_dir *dp)
{
struct _direct *dirbuf; // local directory structure
static Dirent d; // return: portable structure
static struct _streamdents *readdents[MAXFILES];
if (dp->fd > MAXFILES - 1) {
printf("Error in read_dir: Cannot continue reading, too many directories\n");
return NULL;
}
// Check if directory has already been read; if not, create stream.
// Important if fxn is called for a sub-directory and then needs
// to return to a parent directory and continue reading.
if (readdents[dp->fd] == NULL) {
char *buf = malloc(BUFSIZE);
int nread = syscall(SYS_getdents, dp->fd, buf, BUFSIZE);
int pos = 0;
struct _streamdents *newdent = malloc(sizeof(struct _streamdents));
newdent->buf = buf;
newdent->pos = pos;
newdent->nread = nread;
readdents[dp->fd] = newdent;
}
struct _streamdents *curdent = readdents[dp->fd];
int pos = curdent->pos;
int nread = curdent->nread;
char *buf = curdent->buf;
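while (curdent->pos < nread) {
dirbuf = (struct _direct *) (buf + curdent->pos);
curdent->pos += dirbuf->d_reclen; // advance first, so a skipped slot cannot loop forever
if (dirbuf->d_ino == 0) // slot not in use
continue;
d.ino = dirbuf->d_ino;
strncpy(d.name, dirbuf->d_name, DIRSIZ); // Dirent's member is name, not d_name
d.name[DIRSIZ] = '\0'; // ensure termination
return &d;
}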
if (nread == -1) {
printf("Error in getdents(): %s\n", strerror(errno));
}
return NULL;
}
Thank you
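One possible way to handle the "remembering" is sketched below, under assumptions of my own: instead of a static table indexed by file descriptor, the buffer and cursor are stored next to the descriptor in the stream structure, so the state disappears together with the stream. This does change the DIR-like structure, which the question wanted to avoid, so treat it as an illustration of the idea rather than a drop-in replacement. The names dirstream, dirstream_open and next_name are hypothetical; the record layout is the linux_dirent64 layout described in getdents(2), and the code is Linux-only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout for SYS_getdents64, as described in getdents(2). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;   /* size of this record in bytes */
    unsigned char  d_type;
    char           d_name[];   /* null-terminated name */
};

/* Hypothetical stream type: the cursor travels with the descriptor. */
typedef struct {
    int  fd;
    long nread;                /* bytes currently in buf */
    long pos;                  /* offset of the next record in buf */
    char buf[4096];
} dirstream;

int dirstream_open(dirstream *s, const char *path)
{
    assert(s != NULL && path != NULL);
    s->fd = open(path, O_RDONLY | O_DIRECTORY);
    s->nread = 0;
    s->pos = 0;
    return s->fd == -1 ? -1 : 0;
}

/* next_name: return the next entry name, or NULL at the end or on error. */
const char *next_name(dirstream *s)
{
    assert(s != NULL);
    if (s->pos >= s->nread) {  /* buffer used up: ask the kernel for more */
        s->nread = syscall(SYS_getdents64, s->fd, s->buf, sizeof s->buf);
        s->pos = 0;
        if (s->nread <= 0)
            return NULL;
    }
    struct linux_dirent64 *e = (struct linux_dirent64 *)(s->buf + s->pos);
    s->pos += e->d_reclen;     /* remember where the next record starts */
    return e->d_name;
}
```

Because each dirstream carries its own buffer, reading a subdirectory in the middle of a parent directory does not disturb the parent's position, and a reused file descriptor number cannot pick up stale data.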

How does readdir return a pointer to information for the NEXT file?

I'm learning C, and I'm confused about the function readdir. In K&R, the function dirwalk includes the following:
while ((dp = readdir(dfd)) != NULL){
if (strcmp(dp->name, ".") == 0
//...code...
Based on my understanding, each time the while loop runs, dp (the directory entry) is advanced one step, so the next directory entry (which is associated with a file) can be processed (while dp != NULL).
My question is: how does readdir return a new directory entry each time it's called? Where does it show that? Please don't use too much jargon, as I have just started learning about this. Here's the code for readdir. Thanks.
#include <sys/dir.h>
Dirent *readdir(DIR *dp)
{
struct direct dirbuf; /* local directory structure */
static Dirent d;
while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
== sizeof(dirbuf)) {
if (dirbuf.d_ino == 0) /* slot not in use */
continue;
d.ino = dirbuf.d_ino;
strncpy(d.name, dirbuf.d_name, DIRSIZ);
d.name[DIRSIZ] = '\0'; /* ensure termination */
return &d;
}
return NULL;
}
First, this is not the code that the POSIX readdir would use on any relevant operating system...
How the code works is really simple, assuming that you know how files work. On the UNIX systems this code was written for, a directory is just as readable as any other file: it appears as a binary file of directory records, and in this implementation the dirbuf structure is one such record. Therefore reading sizeof dirbuf bytes from the file descriptor gives you the next entry from the directory, that is, the filename and its associated inode number. Each read() also advances the file offset of the descriptor, which is why the next call picks up where the previous one stopped.
If a file is deleted, its entry might be marked unused by setting the inode number to 0; the code skips such entries.
When the next used entry is found, its filename and inode number are copied to the Dirent d, which has static storage duration. This means that only one Dirent is allocated for use by readdir for the entire duration of the program. readdir returns the same pointer over and over again, pointing to the same structure, but the contents of the structure change.
Finally, when all entries in the directory have been read, the last call to readdir executes a read that does not return sizeof (dirbuf) bytes, so the loop ends and a NULL pointer is returned.
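The static-storage point is easy to see in isolation. In the small sketch below, Entry and next_entry are stand-ins I made up for Dirent and readdir: every call writes into the same static object and returns its address, so a caller who needs to keep an entry must copy it before calling again.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for K&R's Dirent. */
typedef struct {
    long ino;
    char name[15];
} Entry;

/* next_entry: fill the single static Entry and return its address,
 * the same way the K&R readdir fills its static Dirent. */
Entry *next_entry(long ino, const char *name)
{
    static Entry e;             /* one object for the whole program run */

    assert(name != NULL);
    e.ino = ino;
    strncpy(e.name, name, sizeof e.name - 1);
    e.name[sizeof e.name - 1] = '\0';   /* ensure termination */
    return &e;
}
```

If you call next_entry(1, "first") and then next_entry(2, "second"), both calls return the same address, and the data from the first call is gone unless you copied the struct in between.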

Is it possible to implement readdir() in Ubuntu 12.10 (kernel 3.5)?

In 8.6 of K & R, the authors implemented a simple version of readdir(). The code is as follows:
#include <sys/dir.h> /* local directory structure */
/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dp)
{
struct direct dirbuf; /* local directory structure */
static Dirent d; /* return: portable structure */
while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf))
== sizeof(dirbuf)) {
if (dirbuf.d_ino == 0) /* slot not in use */
continue;
d.ino = dirbuf.d_ino;
strncpy(d.name, dirbuf.d_name, DIRSIZ);
d.name[DIRSIZ] = '\0'; /* ensure termination */
return &d;
}
return NULL;
}
As I understand it, in the line with read(), dp->fd is the file descriptor of the directory. The authors used read() to get struct direct directly from the directory file.
However, in Ubuntu, it is not possible to read a directory file. When I tried to read a directory, I just got something strange.
I read in APUE that on some systems this action is not allowed. So are there any other ways to implement my own readdir()?
You are looking at code from 40 years ago. Directories are simply not implemented like that on any modern platform. Read the documentation for your filesystem (ext4 if you are on Linux) if you really need to write code to manipulate it.

readdir looping more times than number of files present in the directory [duplicate]

My goal is to count the number of files in a directory. After searching around, I found a piece of code which iterates over each file in a directory. But the issue is that it's looping extra times, 2 times extra to be more precise.
Here is the code:
int main(void)
{
DIR *d;
struct dirent *dir;
char *ary[10000];
char fullpath[256];
d = opendir("D:\\frames\\");
if (d)
{
int count = 1;
while ((dir = readdir(d)) != NULL)
{
snprintf(fullpath, sizeof(fullpath), "%s%d%s", "D:\\frames\\", count, ".jpg");
int fs = fsize(fullpath);
printf("%s\t%d\n", fullpath, fs); // using this line just for output purposes
count++;
}
closedir(d);
}
getchar();
return(0);
}
My folder contains 500 files, but the output runs up to 502.
UPDATE
I modified the code to read as
struct stat buf;
if ( S_ISREG(buf.st_mode) ) // <-- I'm assuming this says "if it is a file"
{
snprintf(fullpath, sizeof(fullpath), "%s%d%s", "D:\\frames\\", count, ".jpg");
int fs = fsize(fullpath);
printf("%s\t%d\n", fullpath, fs);
}
But I'm getting the error "storage size of 'buf' isn't known". I also tried declaring struct stat buf[100], but that didn't help either.
As pointed out in comments, you're also getting the two directories named . and .., which skews your count.
In Linux, you can use the d_type field of the struct dirent to filter them out, but the documentation says:
The only fields in the dirent structure that are mandated by POSIX.1 are: d_name[], of unspecified size, with at most NAME_MAX characters preceding the terminating null byte; and (as an XSI extension) d_ino. The other fields are unstandardized, and not present on all systems; see NOTES below for some further details.
So, assuming you're on Windows you probably don't have d_type. Then you can use some other call instead, for instance stat(). You can of course filter out based on name too, but if you want to skip directories anyway that is a more robust and general solution.
You need to call _stat()/stat() on the file name you want info for.
#include <sys/types.h>
#include <sys/stat.h>
#ifdef WINDOWS
# define STAT _stat
#else
# define STAT stat
#endif
...
char * filename = ... /* let it point to some file's name */
struct STAT buffer = {0};
if (STAT(filename, &buffer))
... /* error */
else
{
if (S_ISREG(buffer.st_mode))
{
... /* getting here means `filename` refers to an ordinary file */
}
}
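Putting the two answers together, a sketch of the counting loop might look like the following. It assumes a POSIX-style environment (dirent.h, stat() and S_ISREG available, "/" accepted as a path separator); the function name count_regular and the buffer size are my own choices. The key difference from the code in the question is that stat() is called on the path built from each entry's d_name before st_mode is tested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* count_regular: number of regular files directly inside dir,
 * or -1 if the directory cannot be opened. (Hypothetical helper.) */
long count_regular(const char *dir)
{
    assert(dir != NULL);

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;               /* path too long for this sketch */

        struct stat st;             /* filled in by stat() before use */
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
            count++;                /* "." and ".." are directories, so they are not counted */
    }
    closedir(d);
    return count;
}
```

With this filter there is no need to special-case "." and "..", and any real subdirectories are skipped as well.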

Resources