Traverse Directory Depth First

I need to traverse a directory depth first without using Boost, but I have not been able to find a good tutorial on how to do this. I know how to list the files of a directory, but I am not sure how to go about this one. This lists the files of a directory:

Use the ftw or nftw functions if your system has them. Or, grab the fts_* functions from, e.g., the OpenBSD source tree and study those, or use them directly. This problem is harder than you might think, because you can run out of file descriptors when recursing through deep filesystem hierarchies.
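A minimal sketch of the nftw approach, assuming a POSIX system that provides <ftw.h>. The names print_entry and walk_tree are made up for this illustration, and the descriptor limit of 16 is an arbitrary choice. FTW_DEPTH asks for a post-order walk, so a directory is reported after everything inside it.

```c
#define _XOPEN_SOURCE 700   /* must precede the includes to expose nftw() */
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical callback: prints each entry's name, indented by its depth. */
static int print_entry(const char *path, const struct stat *sb,
                       int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    /* ftwbuf->base is the offset of the file name within path;
       FTW_DP marks a directory whose contents were already visited. */
    printf("%*s%s%s\n", ftwbuf->level * 2, "", path + ftwbuf->base,
           typeflag == FTW_DP ? "/" : "");
    return 0;               /* a non-zero return would stop the walk */
}

/* Hypothetical wrapper: returns 0 on success, -1 on error (see errno). */
int walk_tree(const char *root)
{
    /* FTW_DEPTH: report directories after their contents.
       FTW_PHYS:  do not follow symbolic links.
       16:        upper bound on descriptors nftw() keeps open at once. */
    return nftw(root, print_entry, 16, FTW_DEPTH | FTW_PHYS);
}
```

Calling walk_tree(".") would list the current directory tree. The third argument of nftw is what addresses the file descriptor concern: nftw limits how many directories it holds open at the same time.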

Make sure you understand recursion.
I assume you have a function walk(dir_path) which can list all files (and directories) in the dir_path directory. You need to modify it so that it calls itself (recursively) for each directory you find. That's it.
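A rough sketch of such a recursive walk(), assuming POSIX opendir/readdir/lstat. The return value (a count of visited entries) and the 4096-byte path buffer are choices made for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Visits everything under dir_path depth first and prints each path.
   Returns the number of entries visited, or -1 if dir_path cannot be opened. */
long walk(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip "." and "..", otherwise the recursion never ends. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        char child[4096];   /* assumed large enough; longer paths are skipped */
        int n = snprintf(child, sizeof child, "%s/%s", dir_path, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof child)
            continue;

        struct stat st;
        if (lstat(child, &st) != 0)   /* entry may have vanished meanwhile */
            continue;

        if (S_ISDIR(st.st_mode)) {    /* lstat: symlinks are not followed */
            long sub = walk(child);   /* descend before printing the directory */
            if (sub > 0)
                count += sub;
        }
        puts(child);
        count++;
    }
    closedir(dir);
    return count;
}
```

Note that each level of recursion keeps one directory stream open, so a very deep tree can exhaust file descriptors, which is the problem the other answer warns about.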

Related

Determine which binary will run via execlp in advance

Edit #1
The "Possible duplicates" so far are not duplicates. They test for the existence of $FILE in $PATH, rather than providing the full path to the first valid result; and the top answer uses bash command line commands, not pure C.
Original Question
Of all the exec family functions, there are a few which do $PATH lookups rather than requiring an absolute path to the binary to execute.
From man exec:
The execlp(), execvp(), and execvpe() functions duplicate the actions of the shell in searching for an executable file if the specified filename does not contain a slash (/) character. The file is sought in the colon-separated list of directory pathnames specified in the PATH environment variable. If this variable isn't defined, the path list defaults to the current directory followed by the list of directories returned by confstr(_CS_PATH). (This confstr(3) call typically returns the value "/bin:/usr/bin".)
Is there a simple, straightforward way, to test what the first "full path to execute" will evaluate to, without having to manually iterate through all the elements in the $PATH environment variable, and appending the binary name to the end of the path? I would like to use a "de facto standard" approach to estimating the binary to be run, rather than re-writing a task that has likely already been implemented several times over in the past.
I realize that this won't be a guarantee, since someone could potentially invalidate this check via a buggy script, TOCTOU attacks, etc. I just need a decent approximation for testing purposes.
Thank you.

Is there a simple, straightforward way, to test what the first "full path to execute" will evaluate to, without having to manually iterate through all the elements in the $PATH environment variable
No, you need to iterate through $PATH (i.e. getenv("PATH") in C code). Some (non-standard) libraries provide a way to do that, but it is so simple that you should not bother. You can use strchr(3) to find the next occurrence of the colon (:), so coding that loop is easy. As Jonathan Leffler commented, there are subtleties (e.g. permissions, dangling symbolic links, some other process adding a new executable to a directory mentioned in your $PATH), but most programs ignore them.
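As an illustration, the loop could look roughly like this. It is only an approximation of what execvp does: find_in_path and is_executable_file are names invented here, the fallback list "/bin:/usr/bin" is the typical value quoted from the man page rather than a real confstr(_CS_PATH) call, and access(2) checks permissions against the real user ID.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if path names a regular file that we may execute. */
static int is_executable_file(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode)
        && access(path, X_OK) == 0;
}

/* Writes the first PATH match for name into out.
   Returns 0 on success, -1 if nothing suitable was found. */
int find_in_path(const char *name, char *out, size_t out_size)
{
    /* A name containing a slash is used as given, with no PATH search. */
    if (strchr(name, '/') != NULL) {
        int n = snprintf(out, out_size, "%s", name);
        return (n >= 0 && (size_t)n < out_size && is_executable_file(out)) ? 0 : -1;
    }

    const char *p = getenv("PATH");
    if (p == NULL)
        p = "/bin:/usr/bin";        /* assumed fallback, see lead-in */

    for (;;) {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t)(colon - p) : strlen(p);

        /* An empty element stands for the current directory. */
        int n = (len == 0)
            ? snprintf(out, out_size, "./%s", name)
            : snprintf(out, out_size, "%.*s/%s", (int)len, p, name);

        if (n >= 0 && (size_t)n < out_size && is_executable_file(out))
            return 0;
        if (colon == NULL)
            return -1;              /* that was the last element */
        p = colon + 1;
    }
}
```

A caller would pass a buffer, for example char buf[4096], and print buf when find_in_path("sh", buf, sizeof buf) returns 0.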
And what is really relevant is the PATH value just before running execvp. In practice, that is the value of PATH when your program started (because outside processes cannot change it). You just need to be sure that your program does not change PATH itself, which is very likely the case (the difficult corner case would be some other thread of the same process changing the PATH environment variable with putenv(3) or setenv(3)).
In practice the PATH won't change (unless you have some code explicitly changing it). Even if you use proprietary libraries and don't have time to check their source code, you can expect PATH to stay the same in practice during execution of your process.
If you need something more precise, and assuming you use the exec*p functions (execlp, execvp) on program names which are compile-time constants, or at least constant after your program has initialized and read its configuration files, you could do what many shells do: cache the result of searching the PATH in a hash table, and use execve on that. Still, you cannot avoid the issue of some other process adding or removing files in directories mentioned in your PATH; but most programs don't care (and are written with the implicit hypothesis that this does not happen, or that your program is notified when it does: look at the rehash builtin of zsh as an example).
But you always need to test against failure of exec (including execlp(3) & execve(2)) and fork functions. They could fail for many reasons, even if the PATH has not changed and directories and files mentioned in it have not been changed.
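A small sketch of that failure checking, assuming a POSIX system. The function name run_program and the choice of exit status 127 for a failed exec (the convention shells use) are assumptions of this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (looked up in PATH) and waits for it.
   Returns the child's exit status, 127 if exec failed in the child,
   or -1 if fork/waitpid failed or the child did not exit normally. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {                  /* fork itself can fail */
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        /* execvp only returns on failure */
        perror(argv[0]);
        _exit(127);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {       /* retry only when interrupted by a signal */
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with char *args[] = {"ls", "-l", NULL}, run_program(args) returns the exit status of ls, or 127 when no ls could be executed.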

Storing folder's paths

Where can I store folder's paths, which can be accessed from every function/variable in a C program?
For example, I have an executable called do_input.exe in the path c:\tests\myprog\bin\do_input.exe,
another one in C:\tools\degreesToDms.exe, etc. How and where should I store these?
I stored them as strings in a header file which I included in every file of the project, but someone discouraged me from doing this. Are they right?

I stored them as strings in a header file which I included in every file of the project, but someone discouraged me from doing this. Are they right?
Yes, they are absolutely right: "baking" installation-specific file system paths into compiled code is not a good decision, because you must recompile simply to change the locations of some key files. This limits the flexibility of other members of your team to run your tests, and may prevent your tests from being run automatically in an automated testing environment.
A better solution would use a plain text configuration file with the locations of the key directories, and functions that read that file and produce correct locations at run-time.
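One possible shape for such a function, as a sketch only. It assumes a made-up file format with one key=value pair per line; the name config_lookup and the keys used below are illustrative, not part of any existing library.

```c
#include <stdio.h>
#include <string.h>

/* Looks up key in a text file of "key=value" lines and copies the value
   into out. Returns 0 if the key was found and fits, -1 otherwise. */
int config_lookup(const char *config_file, const char *key,
                  char *out, size_t out_size)
{
    FILE *fp = fopen(config_file, "r");
    if (fp == NULL)
        return -1;

    char line[1024];                /* assumed maximum line length */
    size_t key_len = strlen(key);
    int found = -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';     /* strip the line ending */
        if (strncmp(line, key, key_len) == 0 && line[key_len] == '=') {
            const char *value = line + key_len + 1;
            if (strlen(value) < out_size) {
                strcpy(out, value);
                found = 0;
            }
            break;
        }
    }
    fclose(fp);
    return found;
}
```

With a file containing the line converter=C:\tools\degreesToDms.exe, a call such as config_lookup("paths.conf", "converter", buf, sizeof buf) would fill buf with that path at run time, so moving the tool only means editing the text file.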
Alternatively, you could provide locations of key directories as command-line parameters to your program. This way, users who run your program would be able to set correct locations without recompiling.

If they stay the same, then I don't see any problem defining these paths in a ".h" header file included in all the various .c files that reference the paths. But every computer this thing will be running on may have different paths ("Tests" instead of "test"), so this is super risky programming and probably only safe if you're running it on a single machine or a set of machines that you control directly.
If the paths will change, then you need to create a storage place for these paths (e.g. static character array, etc.) and then have methods to allow these to be fetched and possibly reset dynamically (e.g. instead of writing output files to "results", maybe the user wants to change things to write files to "/tmp"). Totally depends on what you are doing in your code and what the tools you're writing will be doing.
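For instance, the storage place and its accessors might look like the following sketch. The names results_dir, get_results_dir and set_results_dir, the default value "results", and the 512-byte limit are all assumptions chosen for illustration.

```c
#include <string.h>

/* The path lives in one translation unit; other files see only the accessors. */
static char results_dir[512] = "results";

/* Returns the directory currently used for output files. */
const char *get_results_dir(void)
{
    return results_dir;
}

/* Replaces the output directory; returns 0 on success,
   -1 if the new path does not fit in the buffer. */
int set_results_dir(const char *path)
{
    if (strlen(path) >= sizeof results_dir)
        return -1;
    strcpy(results_dir, path);
    return 0;
}
```

Code elsewhere would call get_results_dir() whenever it builds an output file name, and something like set_results_dir("/tmp") when the user asks for a different location.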

Copy directory recursively in pure C on Linux/UNIX

Can someone guide me on a possible solution? I don't want to use /bin/cp or any other foreign apps. I want my program to be independent. Also I know that every system is quite specific, so I'm interested in UNIX/Linux compatibility.
How can I solve it? Should I just go down the source directory, creating new directories in the target one and copying files into them, or is there a better solution?
BTW my goal is: copy all first level subdirs recursively into target dir if they are not present there

You really need some kind of recursive descent into the directory tree. Doing this, you can actually make this very portable (using opendir/readdir on Linux and FindFirstFile/FindNextFile on Windows). The problem that remains is the actual copying. You can use the C standard library for that with the following algorithm:
Open source file
Open target file
In a loop, fread a block of constant size from the source, then fwrite it to the target. Stop if the source file contains no more data
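The steps above can be sketched with the C standard library alone. The function name copy_file and the 8192-byte block size are arbitrary choices for this example.

```c
#include <stdio.h>

/* Copies src_path to dst_path in fixed-size blocks.
   Returns 0 on success, -1 on any open, read or write error. */
int copy_file(const char *src_path, const char *dst_path)
{
    FILE *src = fopen(src_path, "rb");
    if (src == NULL)
        return -1;

    FILE *dst = fopen(dst_path, "wb");
    if (dst == NULL) {
        fclose(src);
        return -1;
    }

    char buf[8192];
    size_t n;
    int rc = 0;

    /* fread returns 0 at end of file or on error; ferror tells them apart. */
    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(src))
        rc = -1;

    fclose(src);
    if (fclose(dst) != 0)           /* flushing buffered data can still fail */
        rc = -1;
    return rc;
}
```

This copies only the file contents; permissions, ownership and timestamps are not preserved.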
Hope this helps :)

Use the POSIX nftw(3) function to walk the tree you want to copy. You supply this function with a callback function that gets called on the path of each file/directory. Define a callback that copies the file/dir it gets called on into the destination tree. The fourth callback argument of type struct FTW * can be used to compute the relative path.
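A sketch of such a callback, under several assumptions: the names copy_tree, copy_entry and copy_regular_file are invented here, both roots are given without a trailing slash, and only directories and regular files are copied (symbolic links and special files are skipped). For brevity it derives the relative path from the length of the source root and uses the struct FTW argument only to recognize the root itself.

```c
#define _XOPEN_SOURCE 700   /* must precede the includes to expose nftw() */
#include <errno.h>
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* nftw() passes no user argument to the callback, so the two roots live
   in file-scope variables (which makes copy_tree non-reentrant). */
static const char *src_root;
static const char *dst_root;

/* Copies one regular file in fixed-size blocks; 0 on success, -1 on error. */
static int copy_regular_file(const char *from, const char *to)
{
    FILE *in = fopen(from, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(to, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    char buf[8192];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/* Called by nftw() for every entry; a non-zero return aborts the walk. */
static int copy_entry(const char *path, const struct stat *sb,
                      int typeflag, struct FTW *ftwbuf)
{
    char target[4096];
    /* Level 0 is the source root itself; deeper paths are the root
       followed by "/..." (assumes no trailing slash on src_root). */
    int n = (ftwbuf->level == 0)
        ? snprintf(target, sizeof target, "%s", dst_root)
        : snprintf(target, sizeof target, "%s%s", dst_root,
                   path + strlen(src_root));
    if (n < 0 || (size_t)n >= sizeof target)
        return -1;

    if (typeflag == FTW_D) {
        /* Keep the source mode, but make sure we can write into the copy. */
        if (mkdir(target, (sb->st_mode & 0777) | S_IRWXU) != 0 && errno != EEXIST)
            return -1;
        return 0;
    }
    if (typeflag == FTW_F)
        return copy_regular_file(path, target);
    return 0;                       /* everything else is skipped */
}

/* Returns 0 on success, non-zero if the walk or a copy failed. */
int copy_tree(const char *src, const char *dst)
{
    src_root = src;
    dst_root = dst;
    /* No FTW_DEPTH: directories are reported before their contents,
       so each target directory exists before files are copied into it. */
    return nftw(src, copy_entry, 16, FTW_PHYS);
}
```

A call such as copy_tree("project", "backup") would then recreate the tree under backup. Existing target directories are reused and existing target files are overwritten.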

If you want to use only C, you could use dirent.h. Using this, you can recursively follow the directory structure. Then you could open the files in binary mode, and write them to the desired location through an output stream.

safely reading directory contents

Is it safe to read directory entries via readdir() or scandir() while files are being created or deleted in this directory? Should I prefer one over the other?
EDIT: When I say "safe" I mean entries returned by these functions are valid and can be operated without crashing the program.
Thanks.

It depends on what you mean by "safe". They are safe in the sense that they should not crash your program. However, if you are creating/deleting files as you are reading/scanning that directory, the set of files you get back might not be up-to-date.
When reading/scanning a directory for directory entries, the file pointer (a directory is just a special type of file), moves forward. However, depending upon the file system, there may be nothing to prevent new files from being created in an empty directory entry slot behind your file pointer. Consequently, newly added directory entries may not be immediately detected by readdir()/scandir(). Similar reasoning applies for file deletion / directory entry removal.
Hope this helps.

What's your definition of safety? You won't crash the system, and readdir/scandir won't crash your program, although they might give you data that is immediately out of date.
The usual semantics for reading a directory are that if you read the directory from beginning to end, you will see all of the files that didn't change during that time exactly once, and you will see files that were created or deleted during that time at most once.
On UNIX-like systems readdir() and scandir() are library functions implemented on top of the same underlying system call (getdents() in Linux, getdirentries() in BSD). So there shouldn't be much difference in their behavior in this regard. I think readdir() is a bit more standard, and therefore will be more portable.
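A minimal readdir() loop along these lines, assuming POSIX; count_entries is a name made up for this sketch. Because readdir() returns NULL both at the end of the directory and on error, errno is cleared before each call so the two cases can be told apart.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Counts the entries of dir_path, excluding "." and "..".
   Returns the count, or -1 if the directory cannot be opened or read. */
long count_entries(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    long count = 0;
    for (;;) {
        errno = 0;                  /* so that NULL + errno != 0 means error */
        struct dirent *entry = readdir(dir);
        if (entry == NULL)
            break;
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }

    int failed = (errno != 0);
    closedir(dir);
    return failed ? -1 : count;
}
```

If the directory is being modified concurrently, a name returned here may already be gone by the time you open or stat it, so a later ENOENT on such an entry is best treated as a normal outcome rather than a fatal error.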

Moving libraries and headers

I have some C code which provides libfoo.so and libfoo.a along with the header file foo.h. A large number of clients currently use these libraries from the /old_location/lib and /old_location/include directories, which is where they are distributed.
Now I want to move this code to /new_location. Yet I am not in a position to inform the clients about this change. I would want the old clients to continue accessing the libs and headers from the /old_location.
For this, will creating symlinks to the libs/headers to the new locations work?
/old_location/lib/libfoo.so -> /new_location/lib/libnewfoo.so
/old_location/lib/libfoo.a -> /new_location/lib/libnewfoo.a
/old_location/include/foo.h -> /new_location/include/foo.h
[Note that I need to name the new lib as libnewfoo and not libfoo due to some constraints. Can this renaming cause any problem? Yet the C code that generates these has not changed.]
It seems to work for the few simple cases I tried. But can there be cases where clients are using the libs and headers in a way which may break as a result of this change? Please let me know what kind of intricacies can be involved in this. Sorry if this seems to be a novice question; I've hardly worked with C before and am a Java person.

You have to differentiate between compile time and run time.
For compile time, clients need to update their Makefile and / or configure logic.
For run time, you simply tell ld.so via ld.so.conf about where to find the .so library (or tell your clients to adjust LD_LIBRARY_PATH, a second best choice). The static library does not matter as its code is already built into the executable.
And yes, by providing symbolic links you can make the move 'disappear' as well and provide all files via the old location.
And all this is pretty testable from your end before roll-out.

I don't see any reason why this would break, this is more a question about symlinks than C. To an unsuspecting user program (one which doesn't have special code to detect symlinks and complain), a symlink is transparent.
If you do experience errors feel free to post them and we'll do our best to advise. However I see nothing off the top of my head that would cause issues.

The only problem with the symlinks could be if some clients mount the new location with a different path, which is possible in a networked unix type environment. For example, you could have the location as:
/var/stuff/new_location/include/...
and the client could be mounting that as:
/auto/var/stuff/new_location/include/..
In which case a relative symlink might work better, i.e.:
old_location/include/foo.h -> ../new_location/include/foo.h
Another thing to consider is to replace old_location/foo.h with:
/*
* Please note that this library has moved to a new location...
*/
#include "new_location/include/foo.h"

The symlinks will work on any operating system and file system that supports symlinks.
