Searching files in C on Windows - c

How would one search for files on a computer?
Maybe looking for certain extensions.
I need to iterate through all the files and examine file names.
Say I wanted to find all files with a .code extension.

For Windows, you would want to look into the FindFirstFile() and FindNextFile() functions. If you want to implement a recursive search, you can use GetFileAttributes() to check for FILE_ATTRIBUTE_DIRECTORY. If the file is actually a directory, continue into it with your search.

A nice wrapper for FindFirstFile is dirent.h for windows (google dirent.h Toni Ronkko)
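/* DirScan, match_extension() and strs_find_add_str() are helpers that are not shown here. */
#include <windows.h>
#include <io.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include "dirent.h"
/* Mask out the file-type bits before comparing. */
#define S_ISREG(B) (((B) & _S_IFMT) == _S_IFREG)
#define S_ISDIR(B) (((B) & _S_IFMT) == _S_IFDIR)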
static void
scan_dir(DirScan *d, const char *adir, BOOL recurse_dir)
{
DIR *dirfile;
int adir_len = strlen(adir);
if ((dirfile = opendir(adir)) != NULL) {
struct dirent *entry;
char path[MAX_PATH + 1];
char *file;
while ((entry = readdir(dirfile)) != NULL)
{
struct stat buf;
if(!strcmp(".",entry->d_name) || !strcmp("..",entry->d_name))
continue;
sprintf(path,"%s/%.*s", adir, MAX_PATH-2-adir_len, entry->d_name);
if (stat(path,&buf) != 0)
continue;
file = entry->d_name;
if (recurse_dir && S_ISDIR(buf.st_mode) )
scan_dir(d, path, recurse_dir);
else if (match_extension(path) && _access(path, R_OK) == 0) // e.g. match .code
strs_find_add_str(&d->files,&d->n_files,_strdup(path));
}
closedir(dirfile);
}
return;
}
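
The match_extension() helper called above is not shown in the answer. A minimal sketch of what it might look like, assuming the goal is to match the .code extension from the question and that the comparison should ignore case (file names on Windows are normally case-insensitive); the body is a hypothetical stand-in, not the answerer's code:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for match_extension(): returns 1 if path ends in
   ".code", compared case-insensitively, and 0 otherwise. */
static int match_extension(const char *path)
{
    static const char wanted[] = ".code";   /* lower-case on purpose */
    const size_t wlen = sizeof wanted - 1;
    const size_t plen = strlen(path);
    size_t i;

    if (plen < wlen)                        /* too short to end in ".code" */
        return 0;
    for (i = 0; i < wlen; i++) {
        /* Cast to unsigned char so tolower() never sees a negative value. */
        unsigned char c = (unsigned char)path[plen - wlen + i];
        if (tolower(c) != wanted[i])
            return 0;
    }
    return 1;
}
```

With this version, match_extension("C:/src/main.code") and match_extension("MAIN.CODE") both return 1, while match_extension("main.c") returns 0.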

Use the FindFirstFile() and FindNextFile() functions together with a recursive algorithm to traverse sub-folders.

FindFirstFile() and FindNextFile() will list the files in a directory. To search sub-directories recursively, you might use _splitpath to split a path into its directory and file-name parts, and then use the resulting directory part to drive the recursive search.

Related

In C, how can I find all the file names in a directory and store them in an array of strings?

I wrote a recursive C program that finds all the file names in a directory; when it encounters a subdirectory, it descends into it as well. The traversal itself works: I printed each name as the program read it, and it finds every file without repeating any. The problem is that I also store each file name in an array of strings, so that main ends up with an array holding all the names, and when I print that array it does not contain all the files, only a few repeated names. The ultimate goal is not to print the file names but to keep them all in the array. I cannot see the error; if someone can tell me what I am doing wrong, I will be grateful.
void findfiles(char *root,char *p[],int *tam){
DIR * dir;
struct dirent *entrada;
struct stat stt;
dir = opendir(root);
char *aux;
char nombre[BUFFER_TAM];
char buf[30];
if (dir == NULL) {
printf("hola4\n");
return;
}
while ((entrada = readdir(dir)) != NULL) {
if (strcmp(entrada->d_name,".")==0 || strcmp(entrada->d_name,"..")==0);
else {
if (entrada->d_type == DT_DIR){
strcpy(nombre,root);
strcat(nombre,"/");
strcat(nombre,entrada->d_name);
findfiles(nombre,p,tam);
}
else {
strcpy(nombre,root);
strcat(nombre,"/");
strcat(nombre,entrada->d_name);
p[*tam]=malloc(strlen(nombre)+1);
p[*tam]=nombre;
*tam = *tam +1;
}
}
}
}
void main(){
char *archivos[BUFFER_TAM];
char root[BUFFER_TAM]="/home/jesusmolina/Documentos";
int i=0,tam=0;
findfiles(root,archivos,&tam);
for (i;i<tam;i++)
printf("%s\n",archivos[i]);
}
p[*tam]=malloc(strlen(nombre)+1);
p[*tam]=nombre;
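You allocate a chunk of memory, then immediately lose the pointer to that memory and leak it. Every entry ends up pointing at the local buffer nombre, which is overwritten on each iteration and no longer exists once the function returns; that is why you see repeated names. You probably wanted: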
p[*tam]=malloc(strlen(nombre)+1);
strcpy(p[*tam], nombre);
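
The same fix can be wrapped in a small helper that also checks for a full array and a failed allocation. This is only a sketch; add_name and its capacity parameter are hypothetical names that do not appear in the question:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stores a heap-allocated copy of name in list[*count]
   and increments *count. Returns 0 on success, -1 if the list is already
   full or malloc fails. The caller frees each stored string. */
static int add_name(char *list[], int *count, int capacity, const char *name)
{
    char *copy;

    if (*count >= capacity)
        return -1;
    copy = malloc(strlen(name) + 1);   /* +1 for the terminating '\0' */
    if (copy == NULL)
        return -1;
    strcpy(copy, name);                /* copy the characters, not the pointer */
    list[*count] = copy;
    *count += 1;
    return 0;
}
```

In findfiles, the three lines that fill p[*tam] would then become a call such as add_name(p, tam, BUFFER_TAM, nombre), assuming the array passed in really has BUFFER_TAM slots, as it does in the main shown above.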

Why am I seeing hidden files such as .DS_Store using fts(3) for traverse?

When I go through subdirectories I print hidden files such as .DS_Store along with usual files. I cannot understand why.
As far as I understand FTS_F flag is for usual files, not hidden files.
Also from documentation:
By default, unless they are specified as path arguments
to fts_open(), any files named "." or ".." encountered
in the file hierarchy are ignored.
Here is my code:
int traverse(char *dirName)
{
FTS *ftsp;
FTSENT *p, *chp;
int fts_options = FTS_COMFOLLOW | FTS_LOGICAL | FTS_NOCHDIR;
if ((ftsp = fts_open(&dirName, fts_options, NULL)) == NULL) {
printf("Open failed.");
return 1;
}
/* get all children directories */
chp = fts_children(ftsp, 0);
if (chp == NULL) {
return 0; /* no files to traverse */
}
while ((p = fts_read(ftsp)) != NULL) {
switch (p->fts_info) {
case FTS_D:
printf("d %s\n", p->fts_path);
break;
case FTS_F:
//if(!isHidden(p->fts_path))
printf("f %s\n", p->fts_path);
break;
default:
break;
}
}
fts_close(ftsp);
return 0;
}
There is no such thing as a hidden file. Hiding (not displaying) files whose names begin with a dot is purely a convention. If you want to skip them, you have to do so yourself.
I suspect the source of your confusion is the text you quoted:
By default, unless they are specified as path arguments to fts_open(), any files named "." or ".." encountered in the file hierarchy are ignored.
This text is referring to files (actually directories) with the names . (self) and .. (parent), not files whose names begin with dots.
Also note that the fts.h functions are non-standard, and the versions provided on GNU/Linux (glibc-based) systems are not safe to use because they're not compatible with 64-bit file sizes and inode numbers. If you want to use fts you should get a portable version from one of the BSDs or gnulib to include in your program's source tree rather than using the system one.
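
Skipping dot files yourself can be as simple as looking at the last path component. The sketch below is one possible body for the isHidden() call that is commented out in the question; the name is_hidden and the assumption that paths use '/' separators are mine:

```c
#include <string.h>

/* Hypothetical isHidden()-style helper: returns 1 if the last component of
   path begins with a dot, 0 otherwise. Assumes '/' as the path separator. */
static int is_hidden(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;   /* text after the last '/' */

    return name[0] == '.';
}
```

In the FTS_F case this could be used as if (!is_hidden(p->fts_path)). The FTSENT structure also carries the bare file name in fts_name, so testing p->fts_name[0] directly should work as well.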

Check if a file is a specific type in C

I'm writing my first C program, though I come from a C++ background.
I need to iterate through a directory of files and check to see if the file is a header file, and then return the count.
My code is as follows, it's pretty rudimentary I think:
static int CountHeaders( const char* dirname ) {
int header_count = 0;
DIR* dir_ptr;
struct dirent* entry;
dir_ptr = opendir( dirname );
while( ( entry = readdir( dir_ptr ) ) )
{
if ( entry->d_type == DT_REG )
{
//second if statement to verify the file is a header file should be???
++header_count;
}
}
closedir( dir_ptr );
return header_count;
}
What would be a good if statement to check to see if the file is a header?
Simply check if the file extension is .h, something like:
const char *ext = strrchr (entry->d_name, '.');
if ((ext != NULL) && (!strcmp (ext+1, "h"))) {
// header file
}
Of course, this assumes that all your header files have a .h extension, which may or may not be true; the C standard does not mandate that header files have a .h extension.
Each dirent structure has a d_name containing the name of the file, so I'd be looking to see if that followed some pattern, like ending in .h or .hpp.
That would be code along the lines of:
size_t len = strlen (entry->d_name);
if ((len >= 2) && (strcmp (&(entry->d_name[len - 2]), ".h") == 0))
header_count++;
if ((len >= 4) && (strcmp (&(entry->d_name[len - 4]), ".hpp") == 0))
header_count++;
Of course, that won't stop truly evil people from calling their executables ha_ha_fooled_you.hpp, but thankfully they're in the minority.
You may even want to consider an endsWith() function to make your life easier:
int endsWith (const char *str, const char *end) {
size_t slen = strlen (str);
size_t elen = strlen (end);
if (slen < elen)
return 0;
return (strcmp (&(str[slen-elen]), end) == 0);
}
:
if (endsWith (entry->d_name, ".h")) header_count++;
if (endsWith (entry->d_name, ".hpp")) header_count++;
There are some much better methods than checking the file extension.
Wikipedia has a good article here and here. The latter idea is called the magic number database, which essentially means that if a file contains a given byte sequence, then it is the matching type listed in the database. Sometimes the number is restricted to a particular location in the file and sometimes it isn't. This method is, IMO, more accurate, albeit slower, than file extension detection.
But then again, for something as simple as checking whether it's a header, this may be a bit of overkill XD
You could check if the last few characters are one of the header-file extensions, .h, .hpp, etc. Use the dirent struct's d_name for the name of the file.
Or, you could run the 'file' command and parse its result.
You probably just want to check the file extension. Using dirent, you would want to look at d_name.
That's up to you.
The easiest way is to just look at the filename (d_name), and check whether it ends with something like ".h" or ".hpp" or whatever.
Opening the file and actually reading it to see if it's valid c/c++, on the other hand, will be A LOT more complex... you could run it through a compiler, but not every header works on its own, so that test will give you a lot of false negatives.
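
The extension check that several answers above describe can be folded into one helper that tries a list of suffixes. This is a sketch; is_header_name and the particular extension list (.h, .hpp, .hh, .hxx) are assumptions, so adjust the list to whatever your project uses:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if name ends with suffix, 0 otherwise. */
static int ends_with(const char *name, const char *suffix)
{
    size_t nlen = strlen(name);
    size_t slen = strlen(suffix);

    return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}

/* Hypothetical helper: returns 1 if name ends in one of the listed
   header extensions. The list itself is an assumption. */
static int is_header_name(const char *name)
{
    static const char *const exts[] = { ".h", ".hpp", ".hh", ".hxx" };
    size_t i;

    for (i = 0; i < sizeof exts / sizeof exts[0]; i++)
        if (ends_with(name, exts[i]))
            return 1;
    return 0;
}
```

Inside the loop from the question, the missing second test would then read if (is_header_name(entry->d_name)) ++header_count;.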

problems with searching in files and directories .. windows programming

I'm studying from this book (Addison Wesley, Windows System Programming, 4th Edition) and I'm finding it useless. I'm working on search code that recurses, so it can go deep into files and directories. The code works (I guess), with no syntax errors, but the output is not what I want. The output of the search looks like this:
not found
Now, here are the folders:
not found
Searching in d:\iust\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\..\e-books\.\.\.\.\E-BOOKS
The file name is: d:\iust\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\..\e-books\.\.\.\.\E-BOOKS\*Test*
not found
Now, here are the folders:
not found
Searching in d:\iust\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.
\.\.\.\.\.\..\e-books\.\.\.\..
The file name is: d:\iust\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\
.\.\.\.\.\.\.\.\..\e-books\.\.\.\..\*Test*
not found
Now, here are the folders:
First, I noticed that whatever I do, it does not search only inside the folder I specified but across the whole drive. The second annoying problem is the dots: the . and .. entries that appear in every folder. How can I avoid this? As I said, I'm following the book I mentioned, but I don't like what I did. Is there a better way to structure my code?
The code:
#include "stdafx.h"
#include <windows.h>
void SearchForFile(TCHAR *folder, TCHAR *file){
_tprintf(L"Searching in %s\n",folder); //just to show the state
TCHAR temp[1000];
_stprintf(temp,L"%s\\%s",folder,file); // here wrote into temp the location as folder/file
_tprintf(L"The file name is: %s\n",temp);
HANDLE f;
WIN32_FIND_DATA data;
f=FindFirstFile(temp,&data);
if(f==INVALID_HANDLE_VALUE){
_tprintf(L"not found\n");
}
else{
_tprintf(L"found this file: %s\n",data.cFileName);
while(FindNextFile(f,&data)){
_tprintf(L"found this file: %s\n",data.cFileName);
}
FindClose(f);
}
_stprintf(temp,L"%s\\*",folder); // "d:\*" for example
_tprintf(L"Now, here are the folders:\n");
f=FindFirstFile(temp,&data);
TCHAR temp2[1000];
if(f==INVALID_HANDLE_VALUE){
_tprintf(L"not found\n");
}
else{
if((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
{
//_tprintf(L"found this directory: %s\n",data.cFileName);
_stprintf(temp2,L"%s\\%s",folder,data.cFileName);
SearchForFile(temp2,file);
}
while(FindNextFile(f,&data)){// _tprintf(L"%d %d\n",data.dwFileAttributes,FILE_ATTRIBUTE_DIRECTORY);
if((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
// _tprintf(L"found this directory: %s\n",data.cFileName);
{
_stprintf(temp2,L"%s\\%s",folder,data.cFileName);
SearchForFile(temp2,file);
}
}
FindClose(f);
}
}
int _tmain(int argc, _TCHAR* argv[])
{
SearchForFile(L"d:\\test", L"*Test*");
return 0;
}
You have to filter out the . and .. pseudo-folders found in every folder.
Roughly, in your recursive branch:
if((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0
&& _tcscmp(data.cFileName, _T(".")) != 0
&& _tcscmp(data.cFileName, _T("..")) != 0)
In general, you should skip "." and ".." directories, they are synonyms for "current" and "parent" directory.
Pretty much no matter how you find the contents of a directory on Windows the first matches will be '.' (the current directory) and '..' (the parent directory). You probably want to ignore both of them.
Usually you explicitly test for and skip the "." and ".." subdirectories that are present in all directories (but the root). The code you're using searches subdirectories recursively, and since you're not ignoring the ".." directory, it'll search that, which will eventually lead to the root directory, and search all subdirectories from there -- meaning it'll search the whole disk.
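
Note that in C the names have to be compared with a string-comparison function; applying == or != to a character array and a string literal compares addresses, not contents. A minimal narrow-character sketch of the test is below. The name is_dot_or_dotdot is hypothetical, and a TCHAR build like the one in the question would use _tcscmp with _T(".") and _T("..") instead of strcmp:

```c
#include <string.h>

/* Returns 1 for the "." and ".." pseudo-entries, 0 for any other name.
   Ordinary names that merely start with a dot (".git") are not matched. */
static int is_dot_or_dotdot(const char *name)
{
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}
```

Testing the full name rather than only the first character keeps directories such as .git in the search.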

How to recursively traverse directories in C on Windows

Ultimately I want to walk through a folder's files and subdirectories and write something to every file I find that has a certain extension (.wav in my case). When looping, how do I tell whether the item I am at is a directory?
Here is how you do it (this is all from memory so there may be errors):
void FindFilesRecursively(LPCTSTR lpFolder, LPCTSTR lpFilePattern)
{
TCHAR szFullPattern[MAX_PATH];
WIN32_FIND_DATA FindFileData;
HANDLE hFindFile;
// first we are going to process any subdirectories
PathCombine(szFullPattern, lpFolder, _T("*"));
hFindFile = FindFirstFile(szFullPattern, &FindFileData);
if(hFindFile != INVALID_HANDLE_VALUE)
{
do
{
if(FindFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
{
// found a subdirectory; recurse into it
PathCombine(szFullPattern, lpFolder, FindFileData.cFileName);
FindFilesRecursively(szFullPattern, lpFilePattern);
}
} while(FindNextFile(hFindFile, &FindFileData));
FindClose(hFindFile);
}
// Now we are going to look for the matching files
PathCombine(szFullPattern, lpFolder, lpFilePattern);
hFindFile = FindFirstFile(szFullPattern, &FindFileData);
if(hFindFile != INVALID_HANDLE_VALUE)
{
do
{
if(!(FindFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
{
// found a file; do something with it
PathCombine(szFullPattern, lpFolder, FindFileData.cFileName);
_tprintf_s(_T("%s\n"), szFullPattern);
}
} while(FindNextFile(hFindFile, &FindFileData));
FindClose(hFindFile);
}
}
So you could call this like
FindFilesRecursively(_T("C:\\WINDOWS"), _T("*.wav"));
to find all the *.wav files in C:\WINDOWS and its subdirectories.
Technically you don't have to do two FindFirstFile() calls, but I find the pattern matching functions Microsoft provides (i.e. PathMatchFileSpec or whatever) aren't as capable as FindFirstFile(). Though for "*.wav" it would probably be fine.
Based on your mention of .wav, I'm going to guess you're writing code for Windows (that seems to be where *.wav files are most common). In this case, you use FindFirstFile and FindNextFile to traverse directories. These use a WIN32_FIND_DATA structure, which has a member dwFileAttributes that contains flags telling the attributes of the file. If dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY is non-zero, you have the name of a directory.
Very helpful.
However, I got a stack overflow: the code kept combining "." with the path and returning to the same directory, which is an endless recursion.
Skipping such entries before recursing solved it:
// found a subdirectory; skip "." and ".." (and any other name starting with '.'), then recurse into it
if (FindFileData.cFileName[0] == '.') continue;
PathCombine(szFullPattern, lpFolder, FindFileData.cFileName);
FindFilesRecursively(szFullPattern, lpFilePattern);
opendir and readdir (on unix), here's an example:
http://opengroup.org/onlinepubs/007908775/xsh/readdir.html
or FindFirstFile on windows
you could also use the shell pretty easily:
find . -name "*.wav"
or
ls **/*.wav (in zsh, or in bash 4 and later with shopt -s globstar)

Resources