Recursive CreateDirectory - c

I found many examples of CreatingDirectory recursively, but not the one I was looking for.
here is the spec
Given input
\\server\share\aa\bb\cc
c:\aa\bb\cc
USING helper API
CreateDirectory (char * path)
returns true, if successful
else
FALSE
Condition: There should not be any parsing to distinguish if the path is Local or Server share.
Write a routine in C, or C++

I think it's quite easier... here a version that works in every Windows version:
unsigned int pos = 0;
do
{
pos = path.find_first_of("\\/", pos + 1);
CreateDirectory(path.substr(0, pos).c_str(), NULL);
} while (pos != std::string::npos);
Unicode:
pos = path.find_first_of(L"\\/", pos + 1);
Regards,

This might be exactly what you want.
It doesn't try to do any parsing to distinguish if the path is Local or Server share.
bool TryCreateDirectory(char *path){
char *p;
bool b;
if(
!(b=CreateDirectory(path))
&&
!(b=NULL==(p=strrchr(path, '\\')))
){
size_t i;
(p=strncpy((char *)malloc(1+i), path, i=p-path))[i]='\0';
b=TryCreateDirectory(p);
free(p);
b=b?CreateDirectory(path):false;
}
return b;
}
The algorithm is quite simple, just pass the string of higher level directory recursively while creation of current level of directory fails until one success or there is no more higher level. When the inner call returns with succeed, create the current. This method do not parse to determ the local or server it self, it's according to the CreateDirectory.
In WINAPI, CreateDirectory will never allows you to create "c:" or "\" when the path reaches that level, the method soon falls in to calling it self with path="" and this fails, too. It's the reason why Microsoft defines file sharing naming rule like this, for compatibility of DOS path rule and simplify the coding effort.

Totally hackish and insecure and nothing you'd ever actually want to do in production code, but...
Warning: here be code that was typed in a browser:
int createDirectory(const char * path) {
char * buffer = malloc((strlen(path) + 10) * sizeof(char));
sprintf(buffer, "mkdir -p %s", path);
int result = system(buffer);
free(buffer);
return result;
}

How about using MakeSureDirectoryPathExists() ?

Just walk through each directory level in the path starting from the root, attempting to create the next level.
If any of the CreateDirectory calls fail then you can exit early, you're successful if you get to the end of the path without a failure.
This is assuming that calling CreateDirectory on a path that already exists has no ill effects.

The requirement of not parsing the pathname for server names is interesting, as it seems to concede that parsing for / is required.
Perhaps the idea is to avoid building in hackish expressions for potentially complex syntax for hosts and mount points, which can have on some systems elaborate credentials encoded.
If it's homework, I may be giving away the algorithm you are supposed to think up, but it occurs to me that one way to meet those requirements is to start trying by attempting to mkdir the full pathname. If it fails, trim off the last directory and try again, if that fails, trim off another and try again... Eventually you should reach a root directory without needing to understand the server syntax, and then you will need to start adding pathname components back and making the subdirs one by one.

std::pair<bool, unsigned long> CreateDirectory(std::basic_string<_TCHAR> path)
{
_ASSERT(!path.empty());
typedef std::basic_string<_TCHAR> tstring;
tstring::size_type pos = 0;
while ((pos = path.find_first_of(_T("\\/"), pos + 1)) != tstring::npos)
{
::CreateDirectory(path.substr(0, pos + 1).c_str(), nullptr);
}
if ((pos = path.find_first_of(_T("\\/"), path.length() - 1)) == tstring::npos)
{
path.append(_T("\\"));
}
::CreateDirectory(path.c_str(), nullptr);
return std::make_pair(
::GetFileAttributes(path.c_str()) != INVALID_FILE_ATTRIBUTES,
::GetLastError()
);
}

void createFolders(const std::string &s, char delim) {
std::stringstream ss(s);
std::string item;
char combinedName[50]={'\0'};
while (std::getline(ss, item, delim)) {
sprintf(combinedName,"%s%s%c",combinedName,item.c_str(),delim);
cout<<combinedName<<endl;
struct stat st = {0};
if (stat(combinedName,&st)==-1)
{
#if REDHAT
mkdir(combinedName,0777);
#else
CreateDirectory(combinedName,NULL);
#endif
}
}
}

Related

What does git_sysdir_find_in_dirlist() in libgit2 do?

I'm working on porting some logic from libgit2 into Go, but not a 1:1 port as Go works differently. I think this function scans a directory tree, but I'm not sure.
static int git_sysdir_find_in_dirlist(
git_buf *path,
const char *name,
git_sysdir_t which,
const char *label)
{
// allocations
size_t len;
const char *scan, *next = NULL;
const git_buf *syspath;
// check the path to make sure it exists?
GIT_ERROR_CHECK_ERROR(git_sysdir_get(&syspath, which));
if (!syspath || !git_buf_len(syspath))
goto done;
// this is the part I don't understand
for (scan = git_buf_cstr(syspath); scan; scan = next) {
/* find unescaped separator or end of string */
for (next = scan; *next; ++next) {
if (*next == GIT_PATH_LIST_SEPARATOR &&
(next <= scan || next[-1] != '\\'))
break;
}
len = (size_t)(next - scan);
next = (*next ? next + 1 : NULL);
if (!len)
continue;
GIT_ERROR_CHECK_ERROR(git_buf_set(path, scan, len));
if (name)
GIT_ERROR_CHECK_ERROR(git_buf_joinpath(path, path->ptr, name));
if (git_path_exists(path->ptr))
return 0;
}
done:
git_buf_dispose(path);
git_error_set(GIT_ERROR_OS, "the %s file '%s' doesn't exist", label, name);
return GIT_ENOTFOUND;
}
It's the for loop that's confusing me. for (scan = git_buf_cstr(syspath); scan; scan = next) { ... } looks like it's iterating/scanning syspath, and then I get totally lost at for (scan = git_buf_cstr(syspath); scan; scan = next) { ... }.
What does this function specifically do?
This is not looking at a directory tree but rather at delimited string containing a list of directories. For instance (though this clearly isn't aimed at this particular case), as the top level documentation says:
GIT_ALTERNATE_OBJECT_DIRECTORIES
Due to the immutable nature of Git objects, old objects can be
archived into shared, read-only directories. This variable
specifies a ":" separated (on Windows ";" separated) list of Git
object directories which can be used to search for Git objects. New
objects will not be written to these directories.
Entries that begin with " (double-quote) will be interpreted as
C-style quoted paths, removing leading and trailing double-quotes
and respecting backslash escapes. E.g., the value
"path-with-\"-and-:-in-it":vanilla-path has two paths:
path-with-"-and-:-in-it and vanilla-path.
The function is obviously scanning a character-separated path list (whatever that character is—probably colon or semicolon as above), checking for backslash prefixes, so that you can write /a:C\:/a to allow the thing to look into either /a or C:/a.
This function is tasked with locating the file name inside a "configuration level", (ie. ~/.git/, /etc/git, see git_sysdir_t for a list of known locations). As those levels are stored as a bunch of static ("readonly") / (or \)-separated C strings, and we can't modify that at runtime, we have to jump through hoops to what amounts to a foreach-string loop.

File IO in the apache portable runtime library

While working through Zed Shaw's learn C the Hard Way, I encountered the function apr_dir_make_recursive() which according to the documentation here has the type signature
apr_status_t apr_dir_make_recursive(const char *path, apr_fileperms_t perm, apr_pool_t *pool)
Which makes the directory, identical to the Unix command mkdir -p.
Why would the IO function need a memory pool in order to operate?
My first thought was that it was perhaps an optional argument to populate the newly made directory, however the code below uses an initialized but presumptively empty memory pool. Does this mean that the IO function itself needs a memory pool, that we are passing in for it to use? But that doesn't seem likely either; couldn't the function simply create a local memory pool for it to use which is then destroyed upon return or error?
So, what use is the memory pool? The documentation linked is unhelpful on this point.
Code shortened and shown below, for the curious.
int DB_init()
{
apr_pool_t *p = NULL;
apr_pool_initialize();
apr_pool_create(&p, NULL);
if(access(DB_DIR, W_OK | X_OK) == -1) {
apr_status_t rc = apr_dir_make_recursive(DB_DIR,
APR_UREAD | APR_UWRITE | APR_UEXECUTE |
APR_GREAD | APR_GWRITE | APR_GEXECUTE, p);
}
if(access(DB_FILE, W_OK) == -1) {
FILE *db = DB_open(DB_FILE, "w");
check(db, "Cannot open database: %s", DB_FILE);
DB_close(db);
}
apr_pool_destroy(p);
return 0;
}
If you pull up the source, you'll see: apr_dir_make_recursive() calls path_remove_last_component():
static char *path_remove_last_component (const char *path, apr_pool_t *pool)
{
const char *newpath = path_canonicalize (path, pool);
int i;
for (i = (strlen(newpath) - 1); i >= 0; i--) {
if (path[i] == PATH_SEPARATOR)
break;
}
return apr_pstrndup (pool, path, (i < 0) ? 0 : i);
}
This function is creating copies of the path in apr_pstrndup(), each representing a smaller component of it.
To answer your question - because of how it was implemented. Would it be possible to do the same without allocating memory, yes. I think in this case everything came out cleaner and more readable by copying the necessary path components.
The implementation of the function (found here) shows that the pool is used to allocate strings representing the individual components of the path.
The reason the function does not create its own local pool is because the pool may be reused across multiple calls to the apr_*() functions. It just so happens that DB_init() does not have a need to reuse an apr_pool_t.

Check if a file is a specific type in C

I'm writing my first C program, though I come from a C++ background.
I need to iterate through a directory of files and check to see if the file is a header file, and then return the count.
My code is as follows, it's pretty rudimentary I think:
static int CountHeaders( const char* dirname ) {
int header_count = 0;
DIR* dir_ptr;
struct dirent* entry;
dir_ptr = opendir( dirname );
while( ( entry = readdir( dir_ptr ) ) )
{
if ( entry->d_type == DT_REG )
{
//second if statement to verify the file is a header file should be???
++header_count;
}
}
closedir( dir_ptr );
return header_count;
}
What would be a good if statement to check to see if the file is a header?
Simply check if the file extension is .h, something like:
const char *ext = strrchr (entry->d_name, '.');
if ((ext != NULL) && (!strcmp (ext+1, "h"))) {
// header file
}
Ofcourse, note that this assumes all your header files have an .h extension, which may or may not be true, the C standard does not mandate that header files must have an .h extension.
Each dirent structure has a d_name containing the name of the file, so I'd be looking to see if that followed some pattern, like ending in .h or .hpp.
That would be code along the lines of:
int len = strlen (entry->d_name);
if ((len >= 2) && strcmp (&(entry->d_name[len - 2]), ".h") == 0))
header_count++;
if ((len >= 4) && strcmp (&(entry->d_name[len - 4]), ".hpp") == 0))
header_count++;
Of course, that won't catch truly evil people from calling their executables ha_ha_fooled_you.hpp but thanfkfully they're in the minority.
You may even want to consider an endsWith() function to make your life easier:
int endsWith (char *str, char *end) {
size_t slen = strlen (str);
size_t elen = strlen (end);
if (slen < elen)
return 0;
return (strcmp (&(str[slen-elen]), end) == 0);
}
:
if (endsWith (entry->d_name, ".h")) header_count++;
if (endsWith (entry->d_name, ".hpp")) header_count++;
There are some much better methods than checking the file extension.
Wikipedia has a good article here and here. The latter idea is called the magic number database which essentially means that if a file contains blah sequence then it is the matching type listed in the database. Sometimes the number has restrictions on locations and sometimes it doesnt. This method IMO is more accurate albeit slower than file extension detection.
But then again, for something as simple as checking to see if its a header, this may be a bit of overkill XD
You could check if the last few characters are one of the header-file extensions, .h, .hpp, etc. Use the dirent struct's d_name for the name of the file.
Or, you could run the 'file' command and parse its result.
You probably just want to check the file extension. Using dirent, you would want to look at d_name.
That's up to you.
The easiest way is to just look at the filename (d_name), and check whether it ends with something like ".h" or ".hpp" or whatever.
Opening the file and actually reading it to see if it's valid c/c++, on the other hand, will be A LOT more complex... you could run it through a compiler, but not every header works on its own, so that test will give you a lot of false negatives.

a recursive function to manipulate a given path

I am working on modifying the didactic OS xv6 (written in c) to support symbolic links (AKA shortcuts).
A symbolic link is a file of type T_SYM that contains a path to it's destination.
For doing that, i wrote a recursive function that gets a path and a buffer and fills the buffer with the "real" path (i.e. if the path contains a link, it should be replaced by the real path, and a link can occur at any level in the path).
Basically, if i have a path a/b/c/d, and a link from f to a/b, the following operations should be equivalent:
cd a/b/c/d
cd f/c/d
Now, the code is written, but the problem that i try to solve is the problem of starting the path with "/" (meaning that the path is absolute and not relative).
Right now, if i run it with a path named /dir1 it treats it like dir1 (relative instead of absolute).
This is the main function, it calls the recursive function.
pathname is the given path, buf will contain the real path.
int readlink(char *pathname, char *buf, size_t bufsize){
char name[DIRSIZ];
char realpathname[100];
memset(realpathname,0,100);
realpathname[0] = '/';
if(get_real_path(pathname, name, realpathname, 0, 0)){
memmove(buf, realpathname, strlen(realpathname));
return strlen(realpathname);
}
return -1;
}
This is the recursive part.
the function returns an inode structure (which represents a file or directory in the system). it builds the real path inside realpath.
ilock an iunlock are being used to use the inode safely.
struct inode* get_real_path(char *path, char *name, char* realpath, int position){
struct inode *ip, *next;
char buf[100];
char newpath[100];
if(*path == '/')
ip = iget(ROOTDEV, ROOTINO);// ip gets the root directory
else
ip = idup(proc->cwd); // ip gets the current working directory
while((path = skipelem(path, name)) != 0){name will get the next directory in the path, path will get the rest of the directories
ilock(ip);
if(ip->type != T_DIR){//if ip is a directory
realpath[position-1] = '\0';
iunlockput(ip);
return 0;
}
if((next = dirlookup(ip, name, 0)) == 0){//next will get the inode of the next directory
realpath[position-1] = '\0';
iunlockput(ip);
return 0;
}
iunlock(ip);
ilock(next);
if (next->type == T_SYM){ //if next is a symbolic link
readi(next, buf, 0, next->size); //buf contains the path inside the symbolic link (which is a path)
buf[next->size] = 0;
iunlockput(next);
next = get_real_path(buf, name, newpath, 0);//call it recursively (might still be a symbolic link)
if(next == 0){
realpath[position-1] = '\0';
iput(ip);
return 0;
}
name = newpath;
position = 0;
}
else
iunlock(next);
memmove(realpath + position, name, strlen(name));
position += strlen(name);
realpath[position++]='/';
realpath[position] = '\0';
iput(ip);
ip = next;
}
realpath[position-1] = '\0';
return ip;
}
I have tried many ways to do it right but with no success. If anyone sees the problem, i'd be happy to hear the solution.
Thanks,
Eyal
I think it's clear that after running get_real_path(pathname, name, realpathname, 0, 0) the realpathname cannot possibly start with a slash.
Provided the function executes successfully, the memmove(realpath + position, name, strlen(name)) ensures that realpath starts with name, as the position variable always contains zero at the first invocation of memmove.
I'd suggest something like
if(*path == '/') {
ip = iget(ROOTDEV, ROOTINO); // ip gets the root
realpath[position++] = '/';
} else
ip = idup(proc->cwd); // ip gets the current working directory
P.S. I'm not sure why you put a slash into the realpathname before executing the get_real_path, since at this point you don't really know whether the path provided is an absolute one.
Ok, found the problem...
The problem was deeper than what i thought...
Somehow the realpath was changed sometimes with no visible reason... but the reason was the line:
name = newpath;
the solution was to change that line to
strcpy(name,newpath);
the previous line made a binding between the name and the realpath... which can be ok if we were not dealing with softlinks. When dereferencing a subpath, this binding ruined everything.
Thanks for the attempts

Way to pass argv[] to CreateProcess()

My C Win32 application should allow passing a full command line for another program to start, e.g.
myapp.exe /foo /bar "C:\Program Files\Some\App.exe" arg1 "arg 2"
myapp.exe may look something like
int main(int argc, char**argv)
{
int i;
for (i=1; i<argc; ++i) {
if (!strcmp(argv[i], "/foo") {
// handle /foo
} else if (!strcmp(argv[i], "/bar") {
// handle /bar
} else {
// not an option => start of a child command line
break;
}
}
// run the command
STARTUPINFO si;
PROCESS_INFORMATION pi;
// customize the above...
// I want this, but there is no such API! :(
CreateProcessFromArgv(argv+i, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi);
// use startup info si for some operations on a process
// ...
}
I can think about some workarounds:
use GetCommandLine()
and find a substring corresponding to argv[i]
write something similar to ArgvToCommandLine() also mentioned in another SO question
Both of them lengthy and re-implement cumbersome windows command line parsing logic, which is already a part of CommandLineToArgvW().
Is there a "standard" solution for my problem? A standard (Win32, CRT, etc.) implementation of workarounds counts as a solution.
It's actually easier than you think.
1) There is an API, GetCommandLine() that will return you the whole string
myapp.exe /foo /bar "C:\Program Files\Some\App.exe" arg1 "arg 2"
2) CreateProcess() allows to specify the command line, so using it as
CreateProcess(NULL, "c:\\hello.exe arg1 arg2 etc", ....)
will do exactly what you need.
3) By parsing your command line, you can just find where the exe name starts, and pass that address to the CreateProcess() . It could be easily done with
char* cmd_pos = strstr(GetCommandLine(), argv[3]);
and finally: CreateProcess(NULL, strstr(GetCommandLine(), argv[i]), ...);
EDIT: now I see that you've already considered this option. If you're concerned about performance penalties, they are nothing comparing to process creation.
The only standard function which you not yet included in your question is PathGetArgs, but it do not so much. The functions PathQuoteSpaces and PathUnquoteSpaces can be also helpful. In my opinion the usage of CommandLineToArgvW in combination with the with GetCommandLineW is what you really need. The usage of UNICODE during the parsing of the command line is in my opinion mandatory if you want to have a general solution.
I solved it as follows: With your Visual Studio install you can find a copy of some of the standard code used to create the C library. In particular if you look in VC\crt\src\stdargv.c you will find the implementation of "wparse_cmdline" function which creates argc and argv from the result of GetCommandLineW API. I created an augmented version of this code which also created a "cmdv" array of pointers which pointed back into the original string at the place where each argv pointer begins. You can then act on argv arguments as you wish, and when you want to pass the "rest" on to CreateProcess you can just pass in cmdv[i] instead.
This solution has the advantages that is uses the exact same parsing code, still provides argv as usual, and allows you to pass on the original without needing to re-quote or re-escape it.
I have faced the same problems with you. The thing is, we don't need to parse the whole string, if we can separate the result of GetCommandLine(), then you can put them together afterwards.
According to Microsoft's documentation, you should consider only backslashes and quotes.
You can find their documents here.
And, then, you can call Solve to get the next parameter start point.
E.g.
"a b c" d e
First Part: "a b c"
Next Parameter Start: d e
I solved the examples in Microsoft documentation, so do worry about the compatibility.
By calling the Solve function recursively, you can get the whole argv array.
Here's the file test.c
#include <stdio.h>
extern char* Solve(char* p);
void showString(char *str)
{
char *end = Solve(str);
char *p = str;
printf("First Part: ");
while(p < end){
fputc(*p, stdout);
p++;
}
printf("\nNext Parameter Start: %s\n", p + 1);
}
int main(){
char str[] = "\"a b c\" d e";
char str2[] = "a\\\\b d\"e f\"g h";
char str3[] = "a\\\\\\\"b c d";
char str4[] = "a\\\\\\\\\"b c\" d e";
showString(str);
showString(str2);
showString(str3);
showString(str4);
return 0;
}
Running result are:
First Part: "a b c"
Next Parameter Start: d e
First Part: a\\b
Next Parameter Start: d"e f"g h
First Part: a\\\"b
Next Parameter Start: c d
First Part: a\\\\"b c"
Next Parameter Start: d e
Here's all the source code of Solve function, file findarg.c
/**
This is a FSM for quote recognization.
Status will be
1. Quoted. (STATUS_QUOTE)
2. Normal. (STATUS_NORMAL)
3. End. (STATUS_END)
Quoted can be ended with a " or \0
Normal can be ended with a " or space( ) or \0
Slashes
*/
#ifndef TRUE
#define TRUE 1
#endif
#define STATUS_END 0
#define STATUS_NORMAL 1
#define STATUS_QUOTE 2
typedef char * Pointer;
typedef int STATUS;
static void MoveSlashes(Pointer *p){
/*According to Microsoft's note, http://msdn.microsoft.com/en-us/library/17w5ykft.aspx */
/*Backslashes are interpreted literally, unless they immediately precede a double quotation mark.*/
/*Here we skip every backslashes, and those linked with quotes. because we don't need to parse it.*/
while (**p == '\\'){
(*p)++;
//You need always check the next element
//Skip \" as well.
if (**p == '\\' || **p == '"')
(*p)++;
}
}
/* Quoted can be ended with a " or \0 */
static STATUS SolveQuote(Pointer *p){
while (TRUE){
MoveSlashes(p);
if (**p == 0)
return STATUS_END;
if (**p == '"')
return STATUS_NORMAL;
(*p)++;
}
}
/* Normal can be ended with a " or space( ) or \0 */
static STATUS SolveNormal(Pointer *p){
while (TRUE){
MoveSlashes(p);
if (**p == 0)
return STATUS_END;
if (**p == '"')
return STATUS_QUOTE;
if (**p == ' ')
return STATUS_END;
(*p)++;
}
}
/*
Solve the problem and return the end pointer.
#param p The start pointer
#return The target pointer.
*/
Pointer Solve(Pointer p){
STATUS status = STATUS_NORMAL;
while (status != STATUS_END){
switch (status)
{
case STATUS_NORMAL:
status = SolveNormal(&p); break;
case STATUS_QUOTE:
status = SolveQuote(&p); break;
case STATUS_END:
default:
break;
}
//Move pointer to the next place.
if (status != STATUS_END)
p++;
}
return p;
}
I think it's actually harder than you think for a general case.
See What's up with the strange treatment of quotation marks and backslashes by CommandLineToArgvW.
It's ultimately up to the individual programs how they tokenize the command-line into an argv array (and even CommandLineToArgv in theory (and perhaps in practice, if what one of the comments said is true) could behave differently than the CRT when it initializes argv to main()), so there isn't even a standard set of esoteric rules that you can follow.
But anyway, the short answer is: no, there unfortunately is no easy/standard solution. You'll have to roll your own function to deal with quotes and backslashes and the like.

Resources