C: recursively opening sub-directories and creating new files - c

I'm writing something that recursively finds .c and .h files and deletes all comments (just as a learning exercise). For every .c/.h file found, this program creates an additional file which is equal to the original file without the comments. So for example, "helloworld.c" would result in an additional file "__helloworld.c".
The problem I am encountering is this:
I have a loop which iterates over all entries in a directory, and keeps going until it stops finding files with .c or .h extensions. However, the loop never actually ends, since each time a file is found, another is created. So I have this recursive situation where "__helloworld.c" becomes "____helloworld.c" which becomes "______helloworld.c", etc.
(In case anyone suggests otherwise: yes, it is necessary for the new files to have a .c extension.)
One possible solution may be to keep track of the inode numbers so we know to iterate only over the original files. However, this requires several iterations of the loop: once to count the directory entries (and use this number to initialise an array for the inode numbers), a second time to store the inode numbers, and finally a third time to do the work.
Can anybody share any ideas that could achieve this in a single pass of the loop?
The code is split across two files, so I have posted only the main recursive routine.
consume_comments() takes a single file as its argument and creates a new file with the comments omitted.
My main routine pretty much just does some argument handling; the routine posted below is where the real problems are.
/*
opens a directory stream of the dir pointed to by 'dirname',
looks for .c .h files, consumes comments. If 'rc' == 1, find()
calls itself when it encounters a sub-directory.
*/
int find (const char * dirname)
{
int count = 3;
DIR * dh;
struct dirent * dent;
struct stat buf;
const char * fnext;
int filecount = 0;
chdir(dirname);
if ((dh = opendir(".")) == NULL)
{
printf("Error opening directory \"%s\"\n", dirname);
exit(-1);
}
while ((dent = readdir(dh)) != NULL)
{
if (count) count--;
if (!count)
{
if (lstat(dent->d_name, &buf) == -1)
{
printf("Error opening file \"%s\" for lstat()\n", dent->d_name);
exit(EXIT_FAILURE);
}
if (S_ISDIR(buf.st_mode) && rc)
{
find(dent->d_name);
chdir("..");
//when this find() completes, it will be one level down:
//so we must come back up again.
}
if (S_ISREG(buf.st_mode))
{
fnext = fnextension(dent->d_name);
if (*fnext == 'c' || *fnext == 'h')
{
consume_comments(dent->d_name);
printf("Comments consumed:%20s\n", dent->d_name);
}
}
}
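}
closedir(dh); /* release the directory stream; otherwise every recursion level leaks one */
return 0;
}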

You can use one of these three solutions:
1. As suggested in a comment by @Theolodis, ignore files starting with __.
2. Split your algorithm into two parts. In the first part, prepare a list of all the .c and .h files (recursively). In the second part, go through the list and generate the stripped versions of the files (non-recursively).
3. Prepare the stripped .c and .h files in some temp directory (/tmp on Linux or %TEMP% on Windows) and move them into the folder once all the .c and .h files of that folder have been processed. Then scan the sub-folders.
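The second solution can be sketched roughly as follows. This is only a minimal illustration, not the asker's code: snapshot_sources(), free_names() and is_source_name() are hypothetical helper names, and the sketch handles a single directory rather than the whole tree. The idea is that the directory stream is closed before any new file is created, so readdir() never gets to see the generated files.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if name ends in ".c" or ".h", otherwise 0. */
static int is_source_name(const char *name)
{
    const char *dot = strrchr(name, '.');
    return dot != NULL && (strcmp(dot, ".c") == 0 || strcmp(dot, ".h") == 0);
}

/* Frees a list produced by snapshot_sources(). */
void free_names(char **names, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(names[i]);
    free(names);
}

/*
 * Phase 1: copy the matching names out of the directory, then close it.
 * On success returns 0 and stores a malloc'd list in *out and its length
 * in *count. Returns -1 if the directory cannot be opened or memory runs out.
 */
int snapshot_sources(const char *dirpath, char ***out, size_t *count)
{
    DIR *dh = opendir(dirpath);
    struct dirent *dent;
    char **names = NULL;
    size_t n = 0, cap = 0;

    if (dh == NULL)
        return -1;
    while ((dent = readdir(dh)) != NULL) {
        if (!is_source_name(dent->d_name))
            continue;
        if (n == cap) {                     /* grow the list when it is full */
            size_t newcap = cap ? cap * 2 : 16;
            char **tmp = realloc(names, newcap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            names = tmp;
            cap = newcap;
        }
        names[n] = malloc(strlen(dent->d_name) + 1);
        if (names[n] == NULL)
            goto fail;
        strcpy(names[n], dent->d_name);
        n++;
    }
    closedir(dh);
    *out = names;
    *count = n;
    return 0;

fail:
    closedir(dh);
    free_names(names, n);
    return -1;
}
```

Phase 2 would then loop over the returned names and call consume_comments() on each one. Because the list is fixed at that point, files created during phase 2 cannot extend the loop. The caller still has to check with lstat() that each name is a regular file.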

I see multiple solutions to your problem. In any case, you need to check whether the file you are about to create already exists; otherwise you could overwrite existing files.
(Example: with file.c and __file.c in your directory, you process __file.c and generate ____file.c, then you process file.c and overwrite __file.c.)
1. Ignore files that begin with your chosen prefix.
   Advantage: easy to implement.
   Disadvantage: you might miss original files that happen to start with your prefix.
2. While going through the directory, build a set of the unique filenames you have created. Before converting any file, check whether it was created by yourself.
   Advantage: you don't miss files that begin with your prefix.
   Disadvantage: if you have a very long list of files, memory usage might explode.
Edit: the second and third solutions of Mohit Jain look pretty good too!
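Both points (skipping prefixed files, and refusing to overwrite an existing file) can be sketched with standard calls. The helper names has_prefix() and create_new_only() are made up for this illustration, and the 0644 permission mode is an arbitrary choice. open() with O_CREAT | O_EXCL fails with EEXIST instead of truncating a file that is already there.

```c
#include <errno.h>   /* EEXIST, for callers that inspect errno */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if name starts with prefix, otherwise 0. */
int has_prefix(const char *name, const char *prefix)
{
    return strncmp(name, prefix, strlen(prefix)) == 0;
}

/*
 * Creates path for writing only if it does not exist yet.
 * Returns a file descriptor, or -1 with errno == EEXIST when the
 * file is already there (nothing is overwritten in that case).
 */
int create_new_only(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

In the directory loop you would skip every entry for which has_prefix(dent->d_name, "__") is true, and write the stripped output through the descriptor returned by create_new_only(), for example after wrapping it with fdopen().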

New implementation, using a routine chk_prefix() to match the prefix of filenames.
char * prefix = "__nmc_";
int chk_prefix (char * name)
{
int nsize = strlen(name);
int fsize = strlen(prefix);
int i;
if (nsize < fsize) return 1;
for (i = 0; i < fsize; i++)
{
if (name[i] != prefix[i]) return 1;
}
return 0;
}
int find (const char * dirname)
{
int count = 3;
DIR * dh;
struct dirent * dent;
struct stat buf;
const char * fnext;
int filecount = 0;
chdir(dirname);
if ((dh = opendir(".")) == NULL)
{
printf("Error opening directory \"%s\"\n", dirname);
exit(-1);
}
while ((dent = readdir(dh)) != NULL)
{
if (count) count--;
if (!count)
{
if (lstat(dent->d_name, &buf) == -1)
{
printf("Error opening file \"%s\" for lstat()\n", dent->d_name);
exit(EXIT_FAILURE);
}
if (S_ISDIR(buf.st_mode) && rc)
{
find(dent->d_name);
chdir("..");
//when this find() completes, it will be one level down:
//so we must come back up again.
}
if (S_ISREG(buf.st_mode))
{
fnext = fnextension(dent->d_name);
if ((*fnext == 'c' || *fnext == 'h') && chk_prefix(dent->d_name)) /* parentheses needed: && binds tighter than || */
{
consume_comments(dent->d_name);
printf("Comments consumed:%20s\n", dent->d_name);
}
}
}
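}
closedir(dh); /* release the directory stream; otherwise every recursion level leaks one */
return 0;
}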

Related

Count the number of files in, and below a directory in Linux C recursively

I wrote a function to count the number of files in and below a directory (including files in sub-directories).
However, when I test the code on a directory that has a sub-directory, it always reports the error "fail to open dir: No such file or directory".
Is there anything I could do to make it work?
int countfiles(char *root, bool a_flag) // a_flag decides whether hidden files are included
{
DIR *dir;
struct dirent * ptr;
int total = 0;
char path[MAXPATHLEN];
dir = opendir(root); //open root directory
if(dir == NULL)
{
perror("fail to open dir");
exit(1);
}
errno = 0;
while((ptr = readdir(dir)) != NULL)
{
//read every entry in dir
//skip ".." and "."
if(strcmp(ptr->d_name,".") == 0 || strcmp(ptr->d_name,"..") == 0)
{
continue;
}
//If it is a directory, recurse
if(ptr->d_type == DT_DIR)
{
sprintf(path,"%s%s/",root,ptr->d_name);
//printf("%s/n",path);
total += countfiles(path, a_flag);
}
if(ptr->d_type == DT_REG)
{
if(a_flag == 1){
total++;
}
else if (a_flag == 0){
if (isHidden(ptr->d_name) == 0){
total++;
}
}
}
}
if(errno != 0)
{
printf("fail to read dir");
exit(1);
}
closedir(dir);
return total;
}
Is there anything I could do to make it work?
Sure, lots. Personally, I'd start by using the correct interface for this stuff, which in Linux and POSIXy systems would be nftw(). This would lead to a program that was not only shorter and more effective, but would not as easily get confused if someone renames a directory or file in the tree being scanned at the same time.
Programmers almost never implement opendir()/readdir()/closedir() as robustly and as efficiently as nftw(), scandir(), glob(), or the fts family of functions do. Why teachers still insist on using the archaic *dir() functions in this day and age, puzzles me to no end.
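To give an idea of what the nftw() route looks like, here is a minimal sketch. The names count_entry(), count_regular_files() and file_total are invented for the example, the descriptor limit of 16 is an arbitrary choice, and the hidden-file flag from the question is left out. Since nftw() offers no way to pass user data to the callback, the running total is kept in a file-scope variable.

```c
#define _XOPEN_SOURCE 700   /* makes glibc declare nftw() */
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Running total; nftw() has no user-data argument for the callback. */
static long file_total;

/* Called by nftw() once for every entry in the tree. */
static int count_entry(const char *path, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
{
    (void)path;
    (void)sb;
    (void)ftwbuf;
    if (typeflag == FTW_F)      /* regular file */
        file_total++;
    return 0;                   /* 0 tells nftw() to keep walking */
}

/* Returns the number of regular files in and below root, or -1 on error. */
long count_regular_files(const char *root)
{
    file_total = 0;
    /* FTW_PHYS: do not follow symbolic links; at most 16 open descriptors. */
    if (nftw(root, count_entry, 16, FTW_PHYS) == -1)
        return -1;
    return file_total;
}
```

The path passed to the callback is already the full path relative to root, so no manual string concatenation is needed.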
If you have to use the *dir functions because your teacher does not know POSIX and wants you to use interfaces you should not use in real life, then look at how you construct the path to the new directory: the sprintf() line. Perhaps even print it (path) out, and you'll probably find the fix on your own.
Even then, sprintf() is not something that is allowed in real life programs (because it will cause a silent buffer overrun when the arguments are longer than expected; and that CAN happen in Linux, because there actually isn't a fixed limit on the length of a path). You should use at minimum snprintf() and check its return value for overruns, or in Linux, asprintf() which allocates the resulting string dynamically.
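As a rough sketch of the snprintf() approach (join_path() is a hypothetical helper, not something from the question): it inserts the missing "/" between the directory and the entry name only when the directory does not already end in one, and it reports truncation instead of silently overrunning the buffer.

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes "dir/name" into out (outsize bytes available).
 * Returns 0 on success, -1 if the result did not fit or formatting failed.
 */
int join_path(char *out, size_t outsize, const char *dir, const char *name)
{
    size_t len = strlen(dir);
    /* Add a separator unless dir already ends with one. */
    const char *sep = (len > 0 && dir[len - 1] == '/') ? "" : "/";
    int n = snprintf(out, outsize, "%s%s%s", dir, sep, name);

    if (n < 0 || (size_t)n >= outsize)
        return -1;              /* output was truncated (or an error occurred) */
    return 0;
}
```

In countfiles() this would replace the sprintf() line, with the recursion skipped (or an error reported) when join_path() returns -1.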

How to properly use S_ISREG function

EDIT: After some help from the forum it was made clear that this issue was not in the use of S_ISREG() but in my use of lstat(). Sorry for the misleading question.
I'm looking in a directory and trying to tell the difference between regular files and sub-directories.
I've looked through other people's issues with this problem, and while some were similar, none were answered clearly enough to fix my code.
int find(char *argv)
{
DIR *pointerToDir;
struct dirent *pointerToDirent;
struct stat status;
int mode;
pointerToDir = opendir(argv);
if(pointerToDir == NULL)
{
printf("Can't open that directory (or it doesn't exist)\n");
/* opendir() failed, so there is no directory stream to close here */
return 0;
}
else
{
while((pointerToDirent = readdir(pointerToDir)) != NULL)
{
lstat(pointerToDirent->d_name, &status);
mode = S_ISREG(status.st_mode);
if(mode != 0)
printf("%s must be a file\n", pointerToDirent->d_name);
else
printf("%s must be a dir\n", pointerToDirent->d_name);
}
closedir(pointerToDir);
return 0;
}
}
I pass the program a test directory that has 2 sub-directories and 2 regular files. The layout would be something like this:
dir1
sub1
sub2
dir1.txt
test.c
Now when I run my program and pass "dir1" as the argument, I would expect it to return the following:
. must be a dir
.. must be a dir
sub1 must be a dir
dir1.txt must be a file
sub2 must be a dir
test.c must be a file
But instead, it returns that they are all "dirs". What am I missing?
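Regarding the lstat() problem mentioned in the EDIT: readdir() returns bare entry names, and lstat() resolves a relative name against the current working directory, not against the directory that was passed to opendir(). Unless the program happens to run inside dir1, lstat("sub1", ...) fails, its return value is ignored, and status is never filled in. A minimal sketch of one way around this, assuming a hypothetical helper is_regular_in() and a fixed 4096-byte path buffer:

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Returns 1 if dir/name is a regular file, 0 if it is something else,
 * and -1 if the path is too long or lstat() fails.
 */
int is_regular_in(const char *dir, const char *name)
{
    char path[4096];
    struct stat st;
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;              /* path did not fit into the buffer */
    if (lstat(path, &st) == -1)
        return -1;              /* always check lstat() before using st */
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

Inside the readdir() loop, the call would be is_regular_in(argv, pointerToDirent->d_name), with -1 reported as an error rather than as "must be a dir".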

Reading all files in two directories at the same time

I have a problem with a task. I have two directory paths. I can read all files from the first path in argv[1], but I can't open the files from the second folder in argv[2]. Both folders contain the same number of files. Writing the file names into an array up front did not work out, because there are a few hundred of them. Below is an example of how I try to read the files. I need help. Thanks!
#include "stdafx.h"
#include "windows.h"
int main(int argc, char* argv[])
{
FILE *fp = 0;
uchar tmpl1[BUFFER_SIZE] = { 0 };
uchar tmpl2[BUFFER_SIZE] = { 0 };
size_t size;
size_t n;
FILE *Fl = 0;
if (argc != 3 || argv[1] == NULL || argv[2] == NULL)
{
printf("Error", argv[0]);
return -1;
}
char Fn[255];
HANDLE hFind;
WIN32_FIND_DATA ff;
char Fn1[255];
HANDLE hFind1;
WIN32_FIND_DATA ff1;
sprintf_s(Fn, 255, "%s\\*", argv[1]);
sprintf_s(Fn1, 255, "%s\\*", argv[2]);
if ((hFind = FindFirstFile(Fn, &ff)) != INVALID_HANDLE_VALUE)
{
if ((hFind1 = FindFirstFile(Fn1, &ff1)) != INVALID_HANDLE_VALUE)
{
do
{
if (ff.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) continue;
ff1.dwFileAttributes;
sprintf_s(Fn, "%s\\%s", argv[1], ff.cFileName);
sprintf_s(Fn1, "%s\\%s", argv[2], ff1.cFileName);
// here I can't read file's name from second folder
printf("%s\n", Fn);   // print the path itself; never pass it as the format string
printf("%s\n", Fn1);
if (fopen_s(&fp, Fn, "rb") != 0)
{
printf("Error reading\nUsage: %s <tmpl1>\n", argv[1]);
return -1;
}
size = _filelength(_fileno(fp));
n = fread(tmpl1, size, 1, fp);
fclose(fp);
fp = 0;
} while (FindNextFile(hFind, &ff));
// also I have a problem how read next file in second directory
FindClose(hFind);
}
}
return 0;
}
I didn't read why you want to scan two directories concurrently.
When I saw "at the same time" in the title I thought "concurrently". Afterwards, I saw the presented code and realized it shall be done rather "interleaved" instead of "concurrently" but that's not essential.
I assume you want to associate the file names in the first directory somehow to the file names in the second directory. This might be comparing the file names, read data from a file of first directory and read other data from an associated file of second directory, or may be something completely different.
However, based on this assumption, you have to consider that:
You should not assume that you get the file names in any useful order when scanning with FindFirstFile()/FindNextFile(). These functions return the files in their "physical order", i.e. in the order in which they are listed internally. (At best, you always get . and .. as the first entries, but I wouldn't count even on this.)
Considering this, I would recommend the following procedure:
1. read file names from first directory and store them in an array names1
2. read file names from second directory and store them in an array names2
3. sort arrays names1 and names2 with an appropriate criterion (e.g. lexicographically)
4. process the arrays names1 and names2.
As you see, the "read file names from directory and store them in an array" could be implemented as function and re-used as well as the sorting.
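The sorting and processing steps can be sketched in portable C. This is an illustration only: sort_names() and count_common() are hypothetical helpers, the arrays are assumed to be filled already by the two directory scans, and "processing" is reduced to counting the names that occur in both directories.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sorts the array of names lexicographically, in place. */
void sort_names(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], cmp_names);
}

/*
 * Walks two sorted arrays in step and counts the names present in both.
 * A real program would process each matching pair where common is incremented.
 */
size_t count_common(const char **names1, size_t n1,
                    const char **names2, size_t n2)
{
    size_t i = 0, j = 0, common = 0;

    while (i < n1 && j < n2) {
        int c = strcmp(names1[i], names2[j]);
        if (c < 0) {
            i++;                /* names1[i] has no partner */
        } else if (c > 0) {
            j++;                /* names2[j] has no partner */
        } else {
            common++;           /* same name in both directories */
            i++;
            j++;
        }
    }
    return common;
}
```

Note that strcmp() is case-sensitive, while Windows file names are typically treated case-insensitively, so a case-insensitive comparison may be the more appropriate criterion there.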
This said, finally, the answer for how to interleave two directory scans:
HANDLE hFind1 = FindFirstFile(Fn1, &ff1);
HANDLE hFind2 = FindFirstFile(Fn2, &ff2);
while (hFind1 != INVALID_HANDLE_VALUE || hFind2 != INVALID_HANDLE_VALUE) {
    if (hFind1 != INVALID_HANDLE_VALUE) {
        /** @todo process ff1 somehow */
    }
    if (hFind2 != INVALID_HANDLE_VALUE) {
        /** @todo process ff2 somehow */
    }
    /* iterate: */
    if (!FindNextFile(hFind1, &ff1)) {
        FindClose(hFind1); hFind1 = INVALID_HANDLE_VALUE;
    }
    if (!FindNextFile(hFind2, &ff2)) {
        FindClose(hFind2); hFind2 = INVALID_HANDLE_VALUE;
    }
}
Please note that I "abuse" the handles hFind1 and hFind2 themselves for the loop condition. Thus, I do not need the extra ifs. (I like things like that.)
By the way, this loop iterates until both directories are scanned completely (even if they don't contain the same number of entries).
If you instead want to stop as soon as one of the directories has been scanned completely, simply change the while condition to:
while (hFind1 != INVALID_HANDLE_VALUE && hFind2 != INVALID_HANDLE_VALUE) {
This terminates the loop as soon as either directory scan runs out of entries or fails.
At last, a little story out of my own past (where I learnt a useful lesson regarding this):
I had just finished my degree (in computer science) and was working at home on a freshly installed Windows NT when I started to copy a large directory from a CD drive to the hard disk. The estimated time was roughly 1 hour and I thought: 'Hey. It does multi-tasking!' So I started a second File Manager to copy another directory from the same CD drive concurrently. When I hit the OK button, the noises that promptly came from the CD drive alarmed me, as did the estimated time, which "exploded" to multiple hours. After that, I behaved as you would expect: tapped my forehead and mumbled something like "unshareable resources"... (and, of course, stopped the second copy and went for a coffee instead.)

Using ftw() properly in c

I have the following in my code: (Coding in c)
ftw(argv[2], parseFile, 100)
argv[2] is a local directory path. For instance, argv[2] = "TestCases", and there is a TestCases folder in the same directory as my .o file.
My understanding is that this should traverse the directory TestCases and send every file it finds to the function parseFile.
What actually happens is that it simply sends my argument to the function parseFile and that is all. What am I doing wrong? How am I supposed to use this properly?
EDIT: This is parseFile:
int parseFile(const char * ftw_filePath,const struct stat * ptr, int flags){
FILE * file;
TokenizerT * currFile;
char fileString[1000], * currWord, * fileName;
fileName = strdup(ftw_filePath);
if( fileName == NULL || strlen(fileName) <= 0){
free(fileName);
return -1;
}
printf("\n%s\n",fileName);
if(strcmp(fileName,"-h")== 0){
printf("To run this program(wordstats) type './wordstat.c' followed by a space followed by the file's directory location. (e.g. Desktop/CS211/Assignment1/test.txt )");
free(fileName);
return 1;
}
else{
file=fopen(fileName,"r");
}
if(!file){
fprintf(stderr,"Error: File Does not Exist in designated location. Please restart the program and try again.\n");
free(fileName);
return 0;
}
memset(fileString, '\0', 1000);
while(fscanf(file,"%999s", fileString) != EOF){ /* reads the file one whitespace-delimited word at a time; %999s keeps fscanf within fileString[1000] */
stringToLower(fileString);
currFile = TKCreate("alphanum",fileString);
while((currWord = TKGetNextToken(currFile)) != NULL) {
insert_List(currWord, words,fileName);
}
free(currFile->delimiters);
free(currFile->copied_string);
free(currFile);
memset(fileString, '\0', 1000);
}
fclose(file);
free(fileName);
return 1;
}
It will work if I input TestCases/big.txt for my argv[2] but not if I put TestCases
As described in the man page, a non-zero return value from the function that ftw is calling tells ftw to stop running.
Your code has various return statements, but the only one that returns 0 is an error condition.
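To illustrate the return-value convention, here is a minimal sketch of a callback that lets the walk continue. The names visit(), walk_tree() and files_seen are invented for the example, and the real parsing is replaced by a printf(). The callback returns 0 for every entry, and directories are skipped by checking the type flag.

```c
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Number of regular files visited during the current walk. */
static int files_seen;

/* ftw() callback: handle regular files, ignore everything else. */
static int visit(const char *path, const struct stat *sb, int typeflag)
{
    (void)sb;
    if (typeflag == FTW_F) {    /* regular file; directories arrive as FTW_D */
        printf("would parse: %s\n", path);
        files_seen++;
    }
    return 0;                   /* 0 means "continue"; non-zero stops ftw() */
}

/* Returns the number of regular files visited, or -1 if the walk failed. */
int walk_tree(const char *root)
{
    files_seen = 0;
    if (ftw(root, visit, 16) != 0)
        return -1;
    return files_seen;
}
```

With the posted parseFile, the first entry passed to the callback is the directory TestCases itself; whatever happens there, the function never returns 0 on a successful parse, so the walk ends after the first call.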
A properly designed C callback interface has a void* argument that you can use to pass arbitrary data from the surrounding code into the callback. [n]ftw does not have such an argument, so you're kinda up a creek.
If your compiler supports thread-local variables (the __thread storage specifier) you can use them instead of globals; this will work but is not really that much tidier than globals.
If your C library has the fts family of functions, use those instead. They are available on most modern Unixes (including Linux, OSX, and recent *BSD)
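For comparison, a small sketch of the fts variant, assuming a C library that provides <fts.h> (count_files_fts() is a hypothetical name). Because fts_read() is called from an ordinary loop rather than through a callback, all state can live in local variables and no globals are needed.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stddef.h>

/* Returns the number of regular files in and below root, or -1 on error. */
long count_files_fts(const char *root)
{
    /* fts_open() expects a NULL-terminated array of start paths. */
    char *paths[] = { (char *)root, NULL };
    FTS *tree = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    FTSENT *ent;
    long total = 0;

    if (tree == NULL)
        return -1;
    while ((ent = fts_read(tree)) != NULL) {
        if (ent->fts_info == FTS_F)     /* regular file */
            total++;
    }
    fts_close(tree);
    return total;
}
```

In the word-statistics program, the loop body is where each file (ent->fts_path) would be opened and tokenized, with the word list passed around as a normal local or parameter.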

Reading in .txt file with different extension in C

At the moment my program has no problem reading in a .txt file, but my program needs to read in a text file with a different file extension (.emu is the requirement). When simply changing the same file's extension to .emu, the variable 'file' is NULL and therefore the file isn't opened, can anyone help?
Had a little look around and haven't been able to find a solution so any help is much appreciated
here's the source code:
void handleArgs (const char *filename, int trace, int before, int after) {
FILE *file = fopen(filename, "r");
char *address = malloc(MAX_ADD_LENGTH * sizeof(char));
char *instruction = malloc(MAX_INS_LENGTH * sizeof(char));
long int addressDecoded;
if (file == NULL || file == 0) {
fprintf(stderr, "Error: Could not open file");
}
else {
if (ferror(file) == 0) {
while (fscanf(file, "%s %s", address, instruction) != EOF) {
if (strlen(address) == 8 && strlen(instruction) == 8) {
addressDecoded = strtol(address, NULL, 16);
printf("%ld\n", addressDecoded);
//instruction = decodeInstruction(instruction);
}
else {
fprintf(stderr, "Error: particular line is of wrong length");
}
}
}
fclose(file); /* close only when fopen() succeeded; fclose(NULL) is undefined behaviour */
}
}
argument 'filename' when executing is simply '/foopath/test.emu'
There's nothing special to C about the file extension. Reread your code for simple errors like changing the filename in one place, but not the other. If you're passing in the filename, pass the whole name, not just the part to the left of the period.
Files are data, and have names. What comes before the dot in a name, is just as much a part of it as what comes after -- the extensions were created just as hints as to what the file contains, but they are NOT required to be strictly related to the file's contents.
The file may not exist, or your privileges may not be sufficient to open it. Or maybe there's some other kind of error. How can you diagnose this?
When a system call or library function fails, it usually sets a variable called errno, declared in errno.h (#include <errno.h>), to a number that identifies the reason for the failure. There is a long list of symbolic constants that give names to these values (ENOENT, EACCES, and so on); you can look them up.
For example, if you try to open a file and the returned pointer is NULL, you can check errno to see whether the file does not exist, whether you are exceeding the system limit on open files, etc.
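A minimal sketch of such a check, assuming a POSIX-like system where fopen() sets errno on failure (open_or_report() is a made-up wrapper name): strerror() turns the error number into a readable message, so the output says why the file could not be opened.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Opens filename for reading. On failure, prints the reason to stderr
 * and returns NULL with errno still set to the original error code.
 */
FILE *open_or_report(const char *filename)
{
    FILE *file = fopen(filename, "r");

    if (file == NULL) {
        int saved = errno;      /* fprintf() may change errno, so keep a copy */
        fprintf(stderr, "Error: could not open \"%s\": %s\n",
                filename, strerror(saved));
        errno = saved;
    }
    return file;
}
```

For a path that does not exist, the message ends in "No such file or directory" (ENOENT); for a permissions problem it ends in "Permission denied" (EACCES). That is usually enough to tell a mistyped file name from an access problem.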
