How to recursively traverse directories in C on Windows

Ultimately I want to traverse a folder's files and subdirectories and write something to every file I find that has a certain extension (.wav in my case). When looping, how do I tell whether the item I am at is a directory?

Here is how you do it (this is all from memory so there may be errors):
void FindFilesRecursively(LPCTSTR lpFolder, LPCTSTR lpFilePattern)
{
TCHAR szFullPattern[MAX_PATH];
WIN32_FIND_DATA FindFileData;
HANDLE hFindFile;
// first we are going to process any subdirectories
PathCombine(szFullPattern, lpFolder, _T("*"));
hFindFile = FindFirstFile(szFullPattern, &FindFileData);
if(hFindFile != INVALID_HANDLE_VALUE)
{
do
{
if(FindFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
{
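// found a subdirectory; recurse into it
// (as written, this also recurses into "." and "..", which must be skipped to avoid endless recursion)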
PathCombine(szFullPattern, lpFolder, FindFileData.cFileName);
FindFilesRecursively(szFullPattern, lpFilePattern);
}
} while(FindNextFile(hFindFile, &FindFileData));
FindClose(hFindFile);
}
// Now we are going to look for the matching files
PathCombine(szFullPattern, lpFolder, lpFilePattern);
hFindFile = FindFirstFile(szFullPattern, &FindFileData);
if(hFindFile != INVALID_HANDLE_VALUE)
{
do
{
if(!(FindFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
{
// found a file; do something with it
PathCombine(szFullPattern, lpFolder, FindFileData.cFileName);
_tprintf_s(_T("%s\n"), szFullPattern);
}
} while(FindNextFile(hFindFile, &FindFileData));
FindClose(hFindFile);
}
}
So you could call this like
FindFilesRecursively(_T("C:\\WINDOWS"), _T("*.wav"));
to find all the *.wav files in C:\WINDOWS and its subdirectories.
Technically you don't have to make two FindFirstFile() calls, but I find that the pattern matching functions Microsoft provides (e.g. PathMatchSpec) aren't as capable as FindFirstFile(). For "*.wav", though, it would probably be fine.
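For a fixed extension such as .wav, the second FindFirstFile() pass could be replaced by a plain string test on each name seen in the first loop. The following is a minimal sketch of such a test, not part of the original answer: has_extension is a hypothetical helper name, and it assumes narrow (char) strings and ASCII-only case folding, so a TCHAR/Unicode build would need the wide-character equivalents.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name ends with ext (e.g. ".wav"),
   ignoring ASCII case, and 0 otherwise. */
int has_extension(const char *name, const char *ext)
{
    size_t name_len = strlen(name);
    size_t ext_len = strlen(ext);
    size_t i;

    if (ext_len == 0 || name_len < ext_len)
        return 0;

    name += name_len - ext_len;   /* point at the candidate suffix */
    for (i = 0; i < ext_len; i++) {
        if (tolower((unsigned char)name[i]) != tolower((unsigned char)ext[i]))
            return 0;
    }
    return 1;
}
```

In a non-Unicode build this could be called as has_extension(FindFileData.cFileName, ".wav") for every entry that is not a directory. The comparison ignores case because Windows file names are normally matched case-insensitively.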

Based on your mention of .wav, I'm going to guess you're writing code for Windows (that seems to be where *.wav files are most common). In this case, you use FindFirstFile and FindNextFile to traverse directories. These use a WIN32_FIND_DATA structure, which has a member dwFileAttributes that contains flags describing the attributes of the file. If dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY is non-zero, you have the name of a directory.
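The reason for testing with & rather than == is that the attribute value is a set of bit flags, so a directory can carry other bits (hidden, read-only) at the same time. Here is a small sketch of that test. The DEMO_ATTR_* names and demo_is_directory are made up for illustration so the snippet builds without windows.h; their values mirror the documented FILE_ATTRIBUTE_* constants.

```c
/* Stand-ins for the Win32 constants, so the sketch builds anywhere. */
#define DEMO_ATTR_READONLY  0x00000001UL  /* FILE_ATTRIBUTE_READONLY  */
#define DEMO_ATTR_HIDDEN    0x00000002UL  /* FILE_ATTRIBUTE_HIDDEN    */
#define DEMO_ATTR_DIRECTORY 0x00000010UL  /* FILE_ATTRIBUTE_DIRECTORY */
#define DEMO_ATTR_ARCHIVE   0x00000020UL  /* FILE_ATTRIBUTE_ARCHIVE   */

/* Returns 1 if the directory bit is set, whatever other bits are present. */
int demo_is_directory(unsigned long attrs)
{
    return (attrs & DEMO_ATTR_DIRECTORY) != 0;
}
```

A hidden directory has the value DEMO_ATTR_DIRECTORY | DEMO_ATTR_HIDDEN, which an equality comparison against the directory constant alone would miss.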

Very helpful.
However, I got a stack overflow, because it kept appending "." to the path and returning to the same directory, which is an endless loop.
Adding this check before the recursion solved it:
// found a subdirectory; skip "." and ".." before recursing into it
if (FindFileData.cFileName[0] == '.') continue;
PathCombine(szFullPattern, lpFolder, FindFileData.cFileName);
FindFilesRecursively(szFullPattern, lpFilePattern);
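A check on the first character alone, as in cFileName[0] == '.', also skips real directories whose names merely start with a dot (for example ".git"). If those should still be searched, an exact comparison is safer. This is a minimal sketch under the assumption of narrow (char) strings; is_dot_entry is a hypothetical helper name, and a TCHAR build would use _tcscmp instead of strcmp.

```c
#include <string.h>

/* Hypothetical helper: returns 1 only for the two pseudo-entries
   "." (current directory) and ".." (parent directory). */
int is_dot_entry(const char *name)
{
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}
```

Inside the loop, the guard would then read if (is_dot_entry(FindFileData.cFileName)) continue; placed before the recursive call.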

opendir and readdir (on unix), here's an example:
http://opengroup.org/onlinepubs/007908775/xsh/readdir.html
or FindFirstFile on windows
you could also use the shell pretty easily:
find . -name "*.wav"
or
ls **/*.wav (in zsh and newer bashes)
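For the opendir/readdir route, a recursive walk might look roughly like the sketch below. It is an illustration rather than a reference implementation: walk and ends_with are hypothetical names, the 4096-byte path buffer is an arbitrary choice, the suffix match is case-sensitive, and lstat is used so that symbolic links to directories are not followed.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns 1 if name ends with suffix (case-sensitive), else 0. */
int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Walks dir recursively, prints every regular file whose name ends with
   suffix, and returns how many such files were found. */
int walk(const char *dir, const char *suffix)
{
    int found = 0;
    struct dirent *e;
    DIR *d = opendir(dir);

    if (d == NULL)
        return 0;                       /* missing or unreadable: skip it */

    while ((e = readdir(d)) != NULL) {
        char path[4096];
        struct stat st;
        int n;

        /* "." and ".." would send the recursion in circles */
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                   /* path too long for the buffer */

        if (lstat(path, &st) != 0)
            continue;                   /* entry vanished or is unreadable */

        if (S_ISDIR(st.st_mode)) {
            found += walk(path, suffix);
        } else if (S_ISREG(st.st_mode) && ends_with(e->d_name, suffix)) {
            puts(path);
            found++;
        }
    }
    closedir(d);
    return found;
}
```

Calling walk(".", ".wav") would print the same files as the find command above, apart from ordering and case sensitivity.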

Related

Why am I seeing hidden files such as .DS_Store using fts(3) for traverse?

When I traverse subdirectories, hidden files such as .DS_Store are printed along with regular files. I cannot understand why.
As far as I understand, the FTS_F flag is for regular files, not hidden files.
Also from documentation:
By default, unless they are specified as path arguments
to fts_open(), any files named "." or ".." encountered
in the file hierarchy are ignored.
Here is my code:
int traverse(char *dirName)
{
FTS *ftsp;
FTSENT *p, *chp;
int fts_options = FTS_COMFOLLOW | FTS_LOGICAL | FTS_NOCHDIR;
if ((ftsp = fts_open(&dirName, fts_options, NULL)) == NULL) {
printf("Open failed.");
return 1;
}
/* get all children directories */
chp = fts_children(ftsp, 0);
if (chp == NULL) {
return 0; /* no files to traverse */
}
while ((p = fts_read(ftsp)) != NULL) {
switch (p->fts_info) {
case FTS_D:
printf("d %s\n", p->fts_path);
break;
case FTS_F:
//if(!isHidden(p->fts_path))
printf("f %s\n", p->fts_path);
break;
default:
break;
}
}
fts_close(ftsp);
return 0;
}
There is no such thing as a hidden file. Hiding (not displaying) files whose names begin with a dot is purely a convention. If you want to skip processing them, you can do so yourself.
I suspect the source of your confusion is the text you quoted:
By default, unless they are specified as path arguments to fts_open(), any files named "." or ".." encountered in the file hierarchy are ignored.
This text is referring to files (actually directories) with the names . (self) and .. (parent), not files whose names begin with dots.
Also note that the fts.h functions are non-standard, and the versions provided on GNU/Linux (glibc-based) systems have historically not been safe to use because they were not compatible with 64-bit file sizes and inode numbers; glibc added large-file-aware variants only in version 2.23. If you need to support older systems and want to use fts, get a portable version from one of the BSDs or gnulib and include it in your program's source tree rather than relying on the system one.
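Since skipping dot files is left to the caller, the commented-out isHidden call in the question could be backed by something like the sketch below. The function name comes from that comment; its body is an assumption about what was intended, namely "the last path component starts with a dot", with "." and ".." excluded so that a traversal rooted at "." is not itself treated as hidden.

```c
#include <string.h>

/* Returns 1 if the last component of path starts with a dot,
   except for the special names "." and "..". */
int isHidden(const char *path)
{
    const char *base = strrchr(path, '/');

    base = (base != NULL) ? base + 1 : path;   /* step past the last '/' */

    if (strcmp(base, ".") == 0 || strcmp(base, "..") == 0)
        return 0;
    return base[0] == '.';
}
```

In the FTS_F case this would be used as if (!isHidden(p->fts_path)) before the printf. FTSENT also carries the bare file name in fts_name, so testing p->fts_name[0] directly is an alternative that avoids the search for the last slash.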

windows c code for listing the file names recursively inside a directory with desired extension

I want to recursively list the names of files with a given extension inside a directory, using the Windows API.
I have tried the code below, but the PathCombine function from Shlwapi.h does not seem to work. Could you please let me know whether this code works at all?
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <strsafe.h>
#include "Shlwapi.h"
#pragma comment(lib, "User32.lib")
void FindFilesRecursively(LPCTSTR lpFolder, LPCTSTR lpFilePattern)
{
TCHAR szFullPattern[MAX_PATH];
WIN32_FIND_DATA FindFileData;
HANDLE hFindFile;
// first we are going to process any subdirectories
PathCombine(szFullPattern, lpFolder,_T("*"));
hFindFile = FindFirstFile(szFullPattern, &FindFileData);
if(hFindFile != INVALID_HANDLE_VALUE)
{
do
{
if(FindFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
{
// found a subdirectory; recurse into it
PathCombine(szFullPattern, lpFolder, FindFileData.cFileName);
FindFilesRecursively(szFullPattern, lpFilePattern);
}
} while(FindNextFile(hFindFile, &FindFileData));
FindClose(hFindFile);
}
// now we are going to look for the matching files
PathCombine(szFullPattern, lpFolder, lpFilePattern);
hFindFile = FindFirstFile(szFullPattern, &FindFileData);
if(hFindFile != INVALID_HANDLE_VALUE)
{
do
{
if(!(FindFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
{
// found a file; do something with it
PathCombine(szFullPattern, lpFolder, FindFileData.cFileName);
_tprintf_s(_T("%s\n"), szFullPattern);
}
} while(FindNextFile(hFindFile, &FindFileData));
FindClose(hFindFile);
}
}
int main()
{
FindFilesRecursively(_T("E:\\Logstotest"), _T("*.log"));
return 0;
}
yup, this is a linking error: 1>task2.obj : error LNK2001: unresolved external symbol __imp_PathCombineW
http://msdn.microsoft.com/en-us/library/windows/desktop/bb773571%28v=vs.85%29.aspx says you need to link against it:
put #pragma comment(lib, "shlwapi.lib") in your source code.
Your code works fine if you exclude the directories named "." and ".." from the search.
The body of your first while loop should look like this:
if(FindFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
{
// Exclude "." and ".." directories
if (_tcscmp(FindFileData.cFileName, _T(".")) != 0 &&
_tcscmp(FindFileData.cFileName, _T("..")) != 0)
{
// found a subdirectory; recurse into it
PathCombine(szFullPattern, lpFolder, FindFileData.cFileName);
FindFilesRecursively(szFullPattern, lpFilePattern);
}
}
The "." directory is the current directory and if you recurse into that you will never get out of recursion, because you will scan the same directory over and over again until the stack is full.
The ".." directory is the directory "above" the current directory and if you scan that you will also run into an infinite recursion for the same reason as stated above.
BTW you can see those directories by using the dir command in a cmd window.

problems with searching in files and directories .. windows programming

I'm studying from this book (Addison Wesley, Windows System Programming, 4th Edition) and I think it's useless. I'm working on search code that supports recursion so it can descend into files and directories. The code works (I guess), with no syntax errors, but the output is not what I want. The output of the search looks like this:
not found
Now, here are the folders:
not found
Searching in d:\iust\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\..\e-books\.\.\.\.\E-BOOKS
The file name is: d:\iust\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\..\e-books\.\.\.\.\E-BOOKS\*Test*
not found
Now, here are the folders:
not found
Searching in d:\iust\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\..\e-books\.\.\.\..
The file name is: d:\iust\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\..\e-books\.\.\.\..\*Test*
not found
Now, here are the folders:
First, I noticed that whatever I do, it does not search only inside the folder I specified but across the whole drive. The second annoying problem is the dots: the . and .. entries that appear in each folder. How can I avoid this? As I said, I'm using the book mentioned above, but I just don't like what I wrote. Is there a better way to structure my code?
the code :
#include "stdafx.h"
#include <windows.h>
void SearchForFile(TCHAR *folder, TCHAR *file){
_tprintf(L"Searching in %s\n",folder); //just to show the state
TCHAR temp[1000];
_stprintf(temp,L"%s\\%s",folder,file); // here wrote into temp the location as folder/file
_tprintf(L"The file name is: %s\n",temp);
HANDLE f;
WIN32_FIND_DATA data;
f=FindFirstFile(temp,&data);
if(f==INVALID_HANDLE_VALUE){
_tprintf(L"not found\n");
}
else{
_tprintf(L"found this file: %s\n",data.cFileName);
while(FindNextFile(f,&data)){
_tprintf(L"found this file: %s\n",data.cFileName);
}
FindClose(f);
}
_stprintf(temp,L"%s\\*",folder); // "d:\*" for example
_tprintf(L"Now, here are the folders:\n");
f=FindFirstFile(temp,&data);
TCHAR temp2[1000];
if(f==INVALID_HANDLE_VALUE){
_tprintf(L"not found\n");
}
else{
if((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
{
//_tprintf(L"found this directory: %s\n",data.cFileName);
_stprintf(temp2,L"%s\\%s",folder,data.cFileName);
SearchForFile(temp2,file);
}
while(FindNextFile(f,&data)){// _tprintf(L"%d %d\n",data.dwFileAttributes,FILE_ATTRIBUTE_DIRECTORY);
if((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
// _tprintf(L"found this directory: %s\n",data.cFileName);
{
_stprintf(temp2,L"%s\\%s",folder,data.cFileName);
SearchForFile(temp2,file);
}
}
FindClose(f);
}
}
int _tmain(int argc, _TCHAR* argv[])
{
SearchForFile(L"d:\\test", L"*Test*");
return 0;
}
You have to filter out the . and .. pseudo-folders found in every folder.
Roughly, in your recursive branch:
if((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0
&& _tcscmp(data.cFileName, _T(".")) != 0
&& _tcscmp(data.cFileName, _T("..")) != 0)
In general, you should skip the "." and ".." directories; they are synonyms for the current and parent directory.
Pretty much no matter how you find the contents of a directory on Windows, the first matches will usually be '.' (the current directory) and '..' (the parent directory). You probably want to ignore both of them.
Usually you explicitly test for and skip the "." and ".." subdirectories that are present in all directories (but the root). The code you're using searches subdirectories recursively, and since you're not ignoring the ".." directory, it'll search that, which will eventually lead to the root directory, and search all subdirectories from there -- meaning it'll search the whole disk.

Searching files in C on Windows

How would one search for files on a computer?
Maybe looking for certain extensions.
I need to iterate through all the files and examine file names.
Say I wanted to find all files with an .code extension.
For Windows, you would want to look into the FindFirstFile() and FindNextFile() functions. If you want to implement a recursive search, you can check for FILE_ATTRIBUTE_DIRECTORY, either in the dwFileAttributes member of the WIN32_FIND_DATA structure that these functions fill in or by calling GetFileAttributes(). If the entry is actually a directory, continue into it with your search.
A nice wrapper for FindFirstFile is dirent.h for windows (google dirent.h Toni Ronkko)
#define S_ISREG(B) ((B)&_S_IFREG)
#define S_ISDIR(B) ((B)&_S_IFDIR)
static void
scan_dir(DirScan *d, const char *adir, BOOL recurse_dir)
{
DIR *dirfile;
int adir_len = strlen(adir);
if ((dirfile = opendir(adir)) != NULL) {
struct dirent *entry;
char path[MAX_PATH + 1];
char *file;
while ((entry = readdir(dirfile)) != NULL)
{
struct stat buf;
if(!strcmp(".",entry->d_name) || !strcmp("..",entry->d_name))
continue;
sprintf(path,"%s/%.*s", adir, MAX_PATH-2-adir_len, entry->d_name);
if (stat(path,&buf) != 0)
continue;
file = entry->d_name;
if (recurse_dir && S_ISDIR(buf.st_mode) )
scan_dir(d, path, recurse_dir);
else if (match_extension(path) && _access(path, R_OK) == 0) // e.g. match .code
strs_find_add_str(&d->files,&d->n_files,_strdup(path));
}
closedir(dirfile);
}
return;
}
Use the FindFirstFile() and FindNextFile() functions and a recursive algorithm to traverse sub-folders.
FindFirstFile()/FindNextFile() will do the job of finding the list of files in the directory. To search recursively through the sub-directories, you might use _splitpath to split the path into directory and file name, and then use the resulting directory to do a recursive directory search.

How to ignore hidden files with opendir and readdir in C library

Here is some simple code:
DIR* pd = opendir(xxxx);
struct dirent *cur;
while (cur = readdir(pd)) puts(cur->d_name);
What I get is kind of messy: including dot (.), dot-dot (..) and file names that end with ~.
I want to do exactly the same thing as the command ls. How do I fix this, please?
This is normal. If you run ls -a (which shows all files; ls -A shows all files except . and ..), you will see the same output.
. is a link referring to the directory it is in: foo/bar/. is the same thing as foo/bar.
.. is a link referring to the parent directory of the directory it is in: foo/bar/.. is the same thing as foo.
Any other files beginning with . are hidden files (by convention, it is not really enforced by anything; this is different from Windows, where there is a real, official hidden attribute). Files ending with ~ are probably backup files created by your text editor (again, this is convention, these really could be anything).
If you don't want to show these types of files, you have to explicitly check for them and ignore them.
Eliminating hidden files:
DIR* pd = opendir(xxxx);
struct dirent *cur;
while (cur = readdir(pd)) {
if (cur->d_name[0] != '.') {
puts(cur->d_name);
}
}
Eliminating hidden files and files ending in "~":
DIR* pd = opendir(xxxx);
struct dirent *cur;
while (cur = readdir(pd)) {
if (cur->d_name[0] != '.' && cur->d_name[strlen(cur->d_name)-1] != '~') {
puts(cur->d_name);
}
}
Stick an if (cur->d_name[0] != '.') before you process the name.
The UNIX hidden file standard is the leading dot, which . and .. also match.
The trailing ~ is the standard for backup files. It's a little more work to ignore those, but a multi-gigahertz CPU can manage it. Use something like if (cur->d_name[strlen(cur->d_name)-1] == '~')
This behavior is exactly like what ls -a does. If you want filtering then you'll need to do it after the fact.
