C libraries for directory access - c

I know that standard C doesn't give me any ability to do anything with folders, but I would like a fairly portable and cross-platform way to access folders. At this time, all I need to do is make a folder, check if a folder exists, and possibly delete a folder. I can foresee needing to read files from a folder in the near future, but that's not a very pressing need.
Anyway, I was wondering if there was a good cross-platform C library for working with directories. In an absolute pinch I can probably roll my own to work on POSIX and Windows, but I was wondering if there were any good ones already out there. I've been considering GLib or the Apache Portable Runtime, but both of those come with a lot more stuff than I really need, and I'd like to keep this fairly lightweight. I've also considered using the internals of a popular scripting language, like Perl or Python, but that also seems like a lot of overhead just for directory functions.
If anyone has anything to add to this list that I should look into, or wants to make a good case for one of the options I've already listed, please tell me. I don't want to sound like I'm asking for code, but if you posted a simple function like int direxist(char *dirname) that returned true if the directory exists and false otherwise, just to illustrate the API for your library of choice, that would be really awesome, and I imagine not too hard. If you want to advocate using POSIX/rolling my own, do that too, because I'm a sucker for learning new stuff like this by doing it myself.
Just to make sure, I want C, not C++. I'm sure boost is good, but I'm not interested in C++ solutions.

I would jump on the APR bandwagon. It does give you a lot more than directory access, but it is the best multi-platform C library that I've used. Chances are that you will find yourself needing some of the other components of it in the future anyway, so you might as well have them handy.
The other option is to implement the POSIX API set over Win32 and just write everything in POSIX. The bonus here is that Windows is quickly becoming the only modern OS that does not include a POSIX runtime implementation.

I've been considering GLib or the Apache Portable Runtime, but both of those come with a lot more stuff than I really need, and I'd like to keep this fairly lightweight.
It's quite probable that GLib will already be installed (on GNU/Linux, at least). The only problem is that it will add a dependency.
I've also considered using the internals of a popular scripting language, like Perl or Python, but that also seems like a lot of overhead just for directory functions.
I would rather use Python in the first place, and possibly use C for specific parts of code.
>>> import os
>>> def direxist(dirname):
...     return os.path.isdir(dirname)
...
>>> direxist('/')
True
>>> direxist('/home')
True
>>> direxist('/home/bastien/Petites leçons de typographie.pdf')
False
About writing your own C function, it would go something like this:
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#endif
int direxist(const char* dirname)
{
#ifdef _WIN32
    /* ... */
#else
    struct stat fileinfo;
    int ret = -1;
    if (stat(dirname, &fileinfo) == -1)
    {
        perror("direxist");
    }
    else
    {
        if (S_ISDIR(fileinfo.st_mode))
        {
            ret = 1;
        }
        else
        {
            ret = 0;
        }
    }
    return ret;
#endif
}
int
main (void)
{
    printf("%d\n", direxist("/"));
    return 0;
}
I don't know how to do it with Win32, so you'll have to find that yourself.
However, I would strongly recommend using an external library. You don't go far with just the C library or by reinventing the wheel.
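For anyone who does want to roll their own, here is a minimal sketch of the three operations the question asks about (exists, make, delete) with both a POSIX and a Win32 branch. The function names dir_exists, dir_make and dir_remove are made up for illustration, and the Win32 side assumes the ANSI variants of GetFileAttributes, CreateDirectory and RemoveDirectory are good enough (no Unicode paths). Unlike direxist above, dir_exists folds "error" into "no".

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Hypothetical helper: 1 if path is a directory, 0 otherwise
 * (a missing path or any other error also yields 0). */
int dir_exists(const char *path)
{
#ifdef _WIN32
    DWORD attrs = GetFileAttributesA(path);
    return attrs != INVALID_FILE_ATTRIBUTES &&
           (attrs & FILE_ATTRIBUTE_DIRECTORY) != 0;
#else
    struct stat info;
    return stat(path, &info) == 0 && S_ISDIR(info.st_mode);
#endif
}

/* Hypothetical helper: 0 on success, -1 on failure. */
int dir_make(const char *path)
{
#ifdef _WIN32
    return CreateDirectoryA(path, NULL) ? 0 : -1;
#else
    return mkdir(path, 0777); /* final mode is still subject to the umask */
#endif
}

/* Hypothetical helper: removes an EMPTY directory; 0 on success, -1 on failure. */
int dir_remove(const char *path)
{
#ifdef _WIN32
    return RemoveDirectoryA(path) ? 0 : -1;
#else
    return rmdir(path);
#endif
}
```

Keeping the #ifdef blocks inside three small functions means the rest of the program never has to mention the platform.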

I think you should use APR or something in the same vein. Designing a POSIX API and then implementing it for Windows does not work well in my experience (which is quite limited, I must confess).
File IO and related semantics are just too different, so you have to design your API to deal with Windows right away. There are also different constraints on what can be put in the public API. One example of such a problem is the Python C API for file handling. It was clearly designed from a POSIX point of view, and it is very difficult to use on Windows because of things like sharing C runtime objects.

Related

Multiplatform support, preprocessor or linking with individual libraries

I'm working on a homebrew game for the GBA, and was thinking about porting it to the PC (likely using SDL) as well.
I haven't dealt with the problem of multiplatform support before, so I don't really have any experience.
I came up with two possible ways of going about it, but both have drawbacks, and I don't know if there is a way better solution I'm missing out on.
First would use the preprocessor. A header file would be included in all files which would #define GBA, and based on whether it is defined, the appropriate headers will be included and the appropriate platform specific code will be compiled for the platform.
I would implement it something like
/* GBA definition is in platform.h */
/* Example.c */
void example()
#ifdef GBA
{
/* GBA specific implementation goes here */
}
#else
{
/* PC specific implementation goes here */
}
#endif
The drawback I see here is for a large project, this can get very messy and is frankly kind of ugly and difficult to read.
The other option I can think of is creating static libraries for each platform. Therefore the main source code for both platforms will be the same, increasing ease of simultaneous development, and when building for GBA or PC, the appropriate libraries and settings will be specified and that's it.
The obvious drawback here is that if there needs to be a change in the implementation of something in the library, if something needs to be added, or anything really regarding the library, it needs to be maintained and rebuilt constantly, along with the main actual program.
If there is a better way to approach this, what would it be?
If the ways I mentioned are the standard way of doing it, which is more common / better for long term development?
Here's what I would do [and have done]. Doing [a lot of] #ifdef/#else/#endif sequences is hard to maintain. Trust me, I've done it, until I found better ways. Below is a way I've used in the past. There are other similar approaches.
Here is the generic code:
// example.c -- generic code
#ifdef _USE_GBA_
#include "example_gba.c"
#endif
#ifdef _USE_SDL_
#include "example_sdl.c"
#endif
void
example(void)
{
    // NOTE: this tail call will get optimized into a jump, or
    // example_dep will get inlined here
    example_dep();
}
Here is the GBA specific code:
// example_gba.c -- GBA specific code
static void
example_dep(void)
{
// ...
}
Here is the SDL code:
// example_sdl.c -- SDL specific code
static void
example_dep(void)
{
// ...
}
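A related variant, closer to the "one library per platform" idea from the question, is to hide each platform behind a small table of function pointers so the shared game code never sees an #ifdef. The sketch below squeezes everything into one file for brevity; in practice the struct would live in a shared header and each backend in its own source file chosen by the build. All names here (struct platform, platform_get, USE_GBA, draw_diagonal) are hypothetical, and the GBA backend is only a stub.

```c
#include <stdio.h>

/* platform.h: the only interface the shared game code sees. */
struct platform {
    const char *name;
    void (*draw_pixel)(int x, int y, unsigned color);
};

#ifdef USE_GBA
/* platform_gba.c: would write to GBA video memory (left as a stub here). */
static void gba_draw_pixel(int x, int y, unsigned color)
{
    (void)x; (void)y; (void)color;
}
static const struct platform current = { "gba", gba_draw_pixel };
#else
/* platform_pc.c: stand-in that just logs; an SDL build would draw instead. */
static void pc_draw_pixel(int x, int y, unsigned color)
{
    printf("pixel (%d, %d) = 0x%04x\n", x, y, color);
}
static const struct platform current = { "pc", pc_draw_pixel };
#endif

/* Returns the backend selected when this file was compiled. */
const struct platform *platform_get(void)
{
    return &current;
}

/* Shared game code: identical for every platform. */
void draw_diagonal(int length)
{
    const struct platform *p = platform_get();
    int i;
    for (i = 0; i < length; i++)
        p->draw_pixel(i, i, 0x7fffu);
}
```

The platform choice then shows up in exactly one place per backend instead of in every function body.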

Determine OS during runtime

Neither ISO C nor POSIX offer functionality to determine the underlying OS during runtime. From a theoretical point of view, it doesn't matter since C offers wrappers for the most common system calls, and from a nit-picking point of view, there doesn't even have to be an underlying OS.
However, in many real-world scenarios, it has proven helpful to know more about the host environment than C is willing to share, e.g. in order to find out where to store config files or how to call select(), so:
Is there an idiomatic way for an application written in C to determine the underlying OS during runtime?
At least, can I easily decide between Linux, Windows, BSD and MacOS?
My current guess is to check for the existence of certain files/directories, such as C:\ or /, but this approach seems unreliable. Maybe querying a series of such sources may help to establish the notion of "OS fingerprints", thus increasing reliability. Anyway, I'm looking forward to your suggestions.
Actually, most systems have a uname command which shows the current kernel in use. On Mac OS, this is usually "Darwin", on Linux it's just plain "Linux", on Windows it's "ERROR" and FreeBSD will return "FreeBSD".
More complete list of uname outputs
I'm pretty sure that there's a C equivalent for uname, so you won't need system()
If you are on a POSIX system, you can call uname() from <sys/utsname.h>.
This obviously isn't 100% portable, but I don't think there will be any method that can grant that at runtime.
see the man page for details
Runtime isn't the time to determine this. Without epic kludges, binaries for one platform won't run on another, so you should just use #ifdefs around the platform-sensitive code.
The accepted answer mentions uname but doesn't provide a minimal working example, so here it is for anyone interested. I hope it saves you the time it took me:
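As a rough sketch of the compile-time approach, the function below picks a name based on macros that common compilers predefine for their target (_WIN32, __APPLE__, __linux__, __FreeBSD__ and friends). The function name compiled_for_os is made up, and the list is deliberately incomplete: anything not listed falls through to "unknown".

```c
/* Returns a short name for the OS this file was compiled for.
 * The answer is fixed at compile time, not probed at runtime. */
const char *compiled_for_os(void)
{
#if defined(_WIN32)
    return "Windows";
#elif defined(__APPLE__) && defined(__MACH__)
    return "Apple (Darwin)";   /* covers macOS and other Apple targets */
#elif defined(__linux__)
    return "Linux";
#elif defined(__FreeBSD__)
    return "FreeBSD";
#elif defined(__NetBSD__)
    return "NetBSD";
#elif defined(__OpenBSD__)
    return "OpenBSD";
#else
    return "unknown";
#endif
}
```

Calling puts(compiled_for_os()) from main prints whichever branch was compiled in.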
#include <stdio.h>
#include <stdlib.h>
#include <sys/utsname.h>
int main(void) {
    struct utsname buffer;
    if (uname(&buffer) != 0) {
        perror("uname");
        exit(EXIT_FAILURE);
    }
    printf("OS: %s\n", buffer.sysname);
    return 0;
}
(Possible) Output:
OS: Linux
PS: Unfortunately, this relies on the POSIX header sys/utsname.h, so it most probably won't work on Windows: compilation fails there because the header is missing.
const char *path = getenv("PATH");
if (path != NULL && strchr(path, '\\'))
    puts("You may be on windows...");
Even though I agree that "Runtime isn't the time to determine this..."

Plugin architecture in C using libdl

I've been toying around, writing a small IRC framework in C that I'm now going to expand with some core functionality - but beyond that, I'd like it to be extensible with plugins!
Up until now, whenever I wrote something IRC related (and I wrote a lot, in about 6 different languages now... I'm on fire!) and actually went ahead to implement a plugin architecture, it was inside an interpreted language that had facilities for doing (read: abusing) so, like jamming a whole script file through eval in Ruby (bad!).
Now I want to abuse something in C!
Basically there are three things I could do:
1. define a simple script language inside of my program
2. use an existing one, embedding an interpreter
3. use libdl to load *.so modules at runtime
I'm fond of the third one and would rather avoid the other two options if possible. Maybe I'm a masochist of some sort, but I think it could be both fun and useful for learning purposes.
Logically thinking, the obvious "pain-chain" would be (lowest to highest) 2 -> 1 -> 3, for the simple reason that libdl is dealing with raw code that can (and will) explode in my face more often than not.
So this question goes out to you, fellow users of stackoverflow, do you think libdl is up to this task, or even a realistic thought?
libdl is very well suited to plug-in architectures - within certain boundaries :-). It is used a lot for exactly this sort of purpose in a lot of different software. It works well in situations where there is a well-defined API/interface between the main program and the plug-in, and a number of different plug-ins implement the same API/interface. For instance, your IRC client might have plug-ins that implement gateways to different IM protocols (Jabber, MSN, Sametime etc...) - all of these are very similar, so you could define an API with functions such as "send message", "check for reply" etc - and write a bunch of plug-ins that each implemented a different one of the protocols.
The situation where it works less well is where you want to have the plug-ins make arbitrary changes to the behaviour of the main program - in the way that, for instance, Firefox plug-ins can change the behaviour of browser tabs, their appearance, add/remove buttons, and so on. This sort of thing is much easier to achieve in a dynamic language (hence why much of Firefox is implemented in javascript), and if this is the sort of customisation you want you may be better off with your option (2), and writing a lot of your UI in the scripting language...
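To make the "well-defined API" idea concrete, here is a minimal sketch of such an interface as a table of function pointers. Everything in it is hypothetical (struct im_plugin, the "echo" plug-in, relay), and the plug-in is defined in the same file purely so the example is self-contained. In a real setup the table would live in a shared object and the host would obtain it through dlsym().

```c
#include <string.h>

/* Hypothetical interface every protocol plug-in would implement. */
struct im_plugin {
    const char *name;
    int (*send_message)(const char *to, const char *text);
    int (*check_for_reply)(char *buf, size_t buflen);
};

/* Stand-in "echo" plug-in: whatever is sent comes back as the reply. */
static char echo_last[256];

static int echo_send(const char *to, const char *text)
{
    (void)to;
    strncpy(echo_last, text, sizeof echo_last - 1);
    echo_last[sizeof echo_last - 1] = '\0';
    return 0;
}

/* Returns 1 and fills buf if a reply is waiting, 0 otherwise. */
static int echo_check(char *buf, size_t buflen)
{
    if (buflen == 0 || echo_last[0] == '\0')
        return 0;
    strncpy(buf, echo_last, buflen - 1);
    buf[buflen - 1] = '\0';
    echo_last[0] = '\0';
    return 1;
}

const struct im_plugin echo_plugin = { "echo", echo_send, echo_check };

/* Host code talks only to the table, never to a specific protocol.
 * Returns -1 if sending failed, otherwise the check_for_reply result. */
int relay(const struct im_plugin *p, const char *to, const char *text,
          char *reply, size_t replylen)
{
    if (p->send_message(to, text) != 0)
        return -1;
    return p->check_for_reply(reply, replylen);
}
```

Because the host only depends on struct im_plugin, adding a new protocol means writing one more table, not touching the host.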
dlopen() / dlsym() are probably the easiest way to go. Some silly pseudo code:
#include <dlfcn.h>
#include <stdio.h>
int run_module(const char *path, char **args)
{
    void *module;
    int (*initfunc)(char **args);
    int rc;
    module = dlopen(path, RTLD_NOW);
    if (module == NULL) {
        fprintf(stderr, "Could not open module %s: %s\n", path, dlerror());
        return -1;
    }
    /* idiom for storing a dlsym() result in a function pointer */
    *(void **)(&initfunc) = dlsym(module, "module_init");
    if (initfunc == NULL) {
        dlclose(module);
        fprintf(stderr, "Could not find symbol module_init in %s\n", path);
        return -1;
    }
    rc = initfunc(args);
    dlclose(module);
    return rc;
}
You would, of course, want much more in the way of error checking, as well as code that actually did something useful :) It is, however extremely easy and convenient to write a plug-in architecture around the pair and publish an easy spec for others to do the same.
You'd probably want something more along the lines of load_module(), the above just loads the SO, seeks an entry point and blocks until that entry point exits.
That's not to say that writing your own scripting language is a bad idea. People could write complex filters, responders, etc. without having to go through a lot of trouble. Perhaps both would be a good idea. I don't know if you'd want a full-fledged Lua interpreter... maybe you could come up with something that makes taking actions based on regular expressions simple.
Still, plug in modules will not only make your life simpler, they'll help you grow a community of people developing stuff around whatever you make.
There are plenty of existing C programs out there that use dlopen() / dlsym() to implement a plugin architecture (including more than one IRC-related one); so yes, it is definitely up to the task.

C - alternative to #ifdef

I'm trying to streamline a large chunk of legacy C code. Even today, before each build, the guy who maintains it takes the source file(s) and manually modifies the following section to match the target environment.
The example follows, but here's the question. I'm rusty on my C, but I do recall that using #ifdef is discouraged. Can you guys offer a better alternative? Also, I think some of it (if not all of it) can be set as an environment variable or passed in as a parameter. If so, what would be a good way of defining these and then accessing them from the source code?
Here's a snippet of the code I'm dealing with:
#define DAN NO
#define UNIX NO
#define LINUX YES
#define WINDOWS_ES NO
#define WINDOWS_RB NO
/* Later in the code */
#if ((DAN==1) || (UNIX==YES))
#include <sys/param.h>
#endif
#if ((WINDOWS_ES==YES) || (WINDOWS_RB==YES) || (WINDOWS_TIES==YES))
#include <param.h>
#include <io.h>
#include <ctype.h>
#endif
/* And totally insane hardcoded paths */
#if (DAN==YES)
char MasterSkipFile[MAXSTR] = "/home/dp120728/tools/testarea/test/MasterSkipFile";
#endif
#if (UNIX==YES)
char MasterSkipFile[MAXSTR] = "/home/tregrp/tre1/tretools/MasterSkipFile";
#endif
#if (LINUX==YES)
char MasterSkipFile[MAXSTR] = "/ptehome/tregrp/tre1/tretools/MasterSkipFile";
#endif
/* So on for every platform and combination */
Sure, you can pass -DWHATEVER on the command line. Or -DWHATEVER_ELSE=NO, etc. Maybe for the paths you could do something like
char MasterSkipFile[MAXSTR] = SOME_COMMAND_LINE_DEFINITION;
and then pass
-DSOME_COMMAND_LINE_DEFINITION='"/home/whatever/directory/filename"'
on the command line.
One thing we used to do is have a generated .h file with these definitions, and generate it with a script. That helped us get rid of a lot of brittle #ifs and #ifdefs
You need to be careful about what you put there, but machine-specific parameters are good candidates - this is how autoconf/automake work.
EDIT: in your case, an example would be to use the generated .h file to define INCLUDE_SYS_PARAM and INCLUDE_PARAM, and in the code itself use:
#ifdef INCLUDE_SYS_PARAM
#include <sys/param.h>
#endif
#ifdef INCLUDE_PARAM
#include <param.h>
#endif
Makes it much easier to port to new platforms - the existence of a new platform doesn't trickle into the code, only to the generated .h file.
Platform specific configuration headers
I'd have a system to generate the platform-specific configuration into a header that is used in all builds. The AutoConf name is 'config.h'; you can see 'platform.h' or 'porting.h' or 'port.h' or other variations on the theme. This file contains the information needed for the platform being built. You can generate the file by copying a version-controlled platform-specific variant to the standard name. You can use a link instead of copying. Or you can run configuration scripts to determine its contents based on what the script finds on the machine.
Default values for configuration parameters
The code:
#if (DAN==YES)
char MasterSkipFile[MAXSTR] = "/home/dp120728/tools/testarea/MasterSkipFile";
#endif
#if (UNIX==YES)
char MasterSkipFile[MAXSTR] = "/home/tregrp/tre1/tretools/MasterSkipFile";
#endif
#if (LINUX==YES)
char MasterSkipFile[MAXSTR] = "/ptehome/tregrp/tre1/tretools/MasterSkipFile";
#endif
Would be better replaced by:
#ifndef MASTER_SKIP_FILE_PATH
#define MASTER_SKIP_FILE_PATH "/opt/tretools/MasterSkipFile"
#endif
const char MasterSkipFile[] = MASTER_SKIP_FILE_PATH;
Those who want the build in a different location can set the location via:
-DMASTER_SKIP_FILE_PATH='"/ptehome/tregtp/tre1/tretools/PinkElephant"'
Note the use of single and double quotes; try to avoid doing this on the command line with backslashes in the path. You can use a similar default mechanism for all sorts of things:
#ifndef DEFAULTABLE_PARAMETER
#define DEFAULTABLE_PARAMETER default_value
#endif
If you choose your defaults well, this can save a lot of energy.
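The question also asks about environment variables. One possible way to combine them with the compile-time default is sketched below: the program checks the environment first and falls back to the macro. The variable name MASTER_SKIP_FILE and the function master_skip_file are assumptions made up for this example, not something from the original code.

```c
#include <stdlib.h>

#ifndef MASTER_SKIP_FILE_PATH
#define MASTER_SKIP_FILE_PATH "/opt/tretools/MasterSkipFile"
#endif

/* Returns the path to use: the MASTER_SKIP_FILE environment variable
 * (hypothetical name) if it is set and non-empty, otherwise the
 * compile-time default. The returned string must not be modified. */
const char *master_skip_file(void)
{
    const char *from_env = getenv("MASTER_SKIP_FILE");
    if (from_env != NULL && from_env[0] != '\0')
        return from_env;
    return MASTER_SKIP_FILE_PATH;
}
```

This gives three levels of control with one code path: the built-in default, a -D override at build time, and an environment override at run time.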
Relocatable software
I'm not sure about the design of the software that can only be installed in one location. In my book, you need to be able to have the old version 1.12 of the product installed on the machine at the same time as the new 2.1 version, and they should be able to operate independently. A hard-coded path name defeats that.
Parameterize by feature not platform
The key difference between the AutoConf tools and the average alternative system is that the configuration is done based on features, not on platforms. You parameterize your code to identify a feature that you want to use. This is crucial because features tend to appear on platforms other than the original. I look after code where there are lines like:
#if defined(SUN4) || defined(SOLARIS_2) || defined(HP_UX) || \
defined(LINUX) || defined(PYRAMID) || defined(SEQUENT) || \
defined(SEQUENT40) || defined(NCR) ...
#include <sys/types.h>
#endif
It would be much, much better to have:
#ifdef INCLUDE_SYS_TYPES_H
#include <sys/types.h>
#endif
And then on the platforms where it is needed, generate:
#define INCLUDE_SYS_TYPES_H
(Don't take this example header too literally; it is the concept I am trying to get over.)
Treat platform as a bundle of features
As a corollary to the previous point, you do need to detect platform and define the features that are applicable to that platform. This is where you have the platform-specific configuration header which defines the configuration features.
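A small sketch of what such a platform-to-feature mapping might look like is below. The HAVE_* names follow the AutoConf naming habit but are defined by hand here, and treating every non-Windows system as having <unistd.h> is a simplifying assumption. The function stdout_is_terminal is a made-up example of code that tests features rather than platform names.

```c
/* config.h (hypothetical): the ONLY place that looks at the platform. */
#if defined(_WIN32)
#  define HAVE_IO_H 1
#else
#  define HAVE_UNISTD_H 1   /* simplification: assume a POSIX-like system */
#endif

/* Everything below tests features only. */
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#ifdef HAVE_IO_H
#include <io.h>
#endif

/* Returns 1 if standard output is a terminal, 0 otherwise
 * (also 0 when no known feature lets us find out). */
int stdout_is_terminal(void)
{
#if defined(HAVE_UNISTD_H)
    return isatty(STDOUT_FILENO) != 0;
#elif defined(HAVE_IO_H)
    return _isatty(1) != 0;   /* 1 is the descriptor for stdout */
#else
    return 0;
#endif
}
```

Porting to a new platform then means adding one block to the configuration header, not hunting through the code for platform names.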
Product features should be enabled in a header
(Elaborating on a comment I made to another answer.)
Suppose you have a bunch of features in the product that need to be included or excluded conditionally. For example:
KVLOCKING
B1SECURITY
C2SECURITY
DYNAMICLOCKS
The relevant code is included when the appropriate define is set:
#ifdef KVLOCKING
...KVLOCKING stuff...
#else
...non-KVLOCKING stuff...
#endif
If you use a source code analysis tool like cscope, then it is helpful if it can show you when KVLOCKING is defined. If the only place where it is defined is in some random Makefiles scattered around the build system (let's assume there are a hundred sub-directories that are used in this), it is hard to tell whether the code is still in use on any of your platforms. If the defines are in a header somewhere - the platform specific header, or maybe a product release header (so version 1.x can have KVLOCKING and version 2.x can include C2SECURITY but 2.5 includes B1SECURITY, etc), then you can see that KVLOCKING code is still in use.
Believe me, after twenty years of development and staff turnover, people don't know whether features are still in use or not (because it is stable and never causes problems - possibly because it is never used). And if the only place to find whether KVLOCKING is still defined is in the Makefiles, then tools like cscope are less helpful - which makes modifying the code more error prone when trying to clean up later.
It's much saner to use:
#if SOMETHING
.. from platform to platform, to avoid confusing broken preprocessors. However any modern compiler should effectively argue your case in the end. If you give more details on your platform, compiler and preprocessor you might receive a more concise answer.
Conditional compilation, given the plethora of operating systems and variants therein, is a necessary evil. #if, #ifdef, etc. are most decidedly not an abuse of the preprocessor, just exercising it as intended.
My preferred way would be to have the build system do the OS detection. For complex cases, you'd want to isolate the machine-specific stuff into a single source file, and have completely different source files for the different OSes.
So in this case, you'd have a #include "OS_Specific.h" in that file. You put the different includes, and the definition of MasterSkipFile for this platform. You can select between them by specifying different -I (include path directories) on your compiler command line.
The nice thing about doing it this way is that somebody trying to figure out the code (perhaps debugging) doesn't have to wade through (and possibly be misled by) phantom code for a platform they aren't even running on.
I've seen build systems in which most of the source files started off something like this:
#include PLATFORM_CONFIG
#include BUILD_CONFIG
and the compiler was kicked off with:
cc -DPLATFORM_CONFIG="linuxconfig.h" -DBUILD_CONFIG="importonlyconfig.h"
(this may need backslash escapes)
This had the effect of letting you separate out the platform settings in one set of files and the configuration settings in another. Platform settings handle library calls that may not exist on one platform or are not in the right format, as well as defining important size-dependent types: things that are platform specific. Build settings handle which features are enabled in the output.
Generalities
I'm a heretic who has been cast out from the Church of the GNU Autotools. Why? Because I like to understand what the hell my tools are doing. And because I've had the experience of trying to combine two components, each of which insisted on a different, incompatible version of autotools being the default version installed on my computer.
I work by creating one .h file or .c file for every combination of platform and significant abstraction. I work hard to define a central .h file that says what the interface is. Often this means I wind up creating a "compatibility layer" that insulates me from differences between platforms. Often I wind up using ANSI Standard C whenever possible, instead of platform-specific functionality.
I sometimes write scripts to generate platform-dependent files. But the scripts are always written by hand and documented, so I know what they do.
I admire Glenn Fowler's nmake and Phong Vo's iffe (if feature exists), which I think are better engineered than the GNU tools. But these tools are part of the AT&T Software Technology suite, and I haven't been able to figure out how to use them without buying into the whole AST way of doing things, which I don't always understand.
Your example
There clearly needs to be
extern char MasterSkipFile[];
in a .h file somewhere, and you can then link against a suitable .o.
The conditional inclusion of the "right set of .h files for the platform" is something I would handle by trying to stick to ANSI C when possible, and when not possible, defining a compatibility layer in a platform-specific .h file. As it is, I can't tell what names the #includes are trying to import, so I can't give more specific advice.

How to walk a directory in C

I am using glib in my application, and I see there are convenience wrappers in glib for C's remove, unlink and rmdir. But these only work on a single file or directory at a time.
As far as I can see, neither the C standard nor glib include any sort of recursive directory walk functionality. Nor do I see any specific way to delete an entire directory tree at once, as with rm -rf.
For what I'm doing, I'm not worried about any complications like permissions, symlinks back up the tree (infinite recursion), or anything that would rule out a very naive implementation... so I am not averse to writing my own function for it.
However, I'm curious if this functionality is out there somewhere in the standard libraries gtk or glib (or in some other easily reused C library) already and I just haven't stumbled on it. Googling this topic generates a lot of false leads.
Otherwise my plan is to use this type of algorithm:
dir_walk(char* path, void* callback(char*)) {
    if(is_dir(path) && has_entries(path)) {
        entries = get_entries(path);
        for(entry in entries) { dir_walk(entry, callback); }
    }
    else { callback(path); }
}
dir_walk("/home/user/trash", remove);
Obviously I would build in some error handling and the like to abort the process as soon as a fatal error is encountered.
Have you looked at <dirent.h>? AFAIK this belongs to the POSIX specification, which should be part of the standard library of most, if not all C compilers. See e.g. this <dirent.h> reference (Single UNIX specification Version 2 by the Open Group).
P.S., before someone comments on this: No, this does not offer recursive directory traversal. But then I think this is best implemented by the developer; requirements can differ quite a lot, so one-size-fits-all recursive traversal function would have to be very powerful. (E.g.: Are symlinks followed up? Should recursion depth be limited? etc.)
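For reference, a minimal sketch of what <dirent.h> gives you: opendir(), readdir() and closedir() iterate over the entries of a single directory. The helper name count_entries is made up. A recursive walk would call something like this again for every entry that turns out to be a directory.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: returns the number of entries in path,
 * not counting "." and "..", or -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        /* every directory lists itself and its parent; skip both */
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);
    return count;
}
```

A result of 0 means the directory is empty, which is also the condition under which rmdir() will accept it.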
You can use GFileEnumerator if you want to do it with glib.
Several platforms include ftw and nftw: "(new) file tree walk". Checking the man page on an imac shows that these are legacy, and new users should prefer fts. Portability may be an issue with either of these choices.
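Where nftw() is available, a rough rm -rf equivalent is short. The sketch below assumes a POSIX system that still ships nftw(); the function name remove_tree and the limit of 16 open descriptors are arbitrary choices for this example. FTW_DEPTH makes the walk report a directory's contents before the directory itself, so each directory is already empty when it gets removed, and FTW_PHYS stops the walk from following symlinks.

```c
#define _XOPEN_SOURCE 700   /* must come before any #include to expose nftw() */
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Callback for nftw(): delete whatever we are handed.
 * remove() handles both files and (empty) directories. */
static int remove_entry(const char *path, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
    (void)sb; (void)typeflag; (void)ftwbuf;
    return remove(path);    /* a non-zero return stops the walk */
}

/* Hypothetical helper: deletes path and everything below it.
 * Returns 0 on success, non-zero on the first failure. */
int remove_tree(const char *path)
{
    return nftw(path, remove_entry, 16, FTW_DEPTH | FTW_PHYS);
}
```

Because the walk stops at the first error, a failure can leave the tree partly deleted, which matches the "abort as soon as a fatal error is encountered" plan from the question.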
Standard C libraries are meant to provide primitive functionality. What you are talking about is composite behavior. You can easily implement it using the low level features present in your API of choice -- take a look at this tutorial.
Note that the "convenience wrappers" you mention for remove(), unlink() and rmdir(), assuming you mean the ones declared in <glib/gstdio.h>, are not really "convenience wrappers". What is the convenience in prefixing totally standard functions with a "g_"? (And note that I say this even though I am the one who introduced them in the first place.)
The only reason these wrappers exist is for file name issues on Windows, where these wrappers actually consist of real code; they take file name arguments in Unicode, encoded in UTF-8. The corresponding "unwrapped" Microsoft C library functions take file names in the system codepage.
If you aren't specifically writing code intended to be portable to Windows, there is no reason to use the g_remove() etc wrappers.

Resources