I have to do a program where I need to index the files in a specified directory. I've gotten the indexing part down, but what I'm having trouble with is how to navigate to the directory.
For example, say when I start the program, it will ask "What directory would you like to index," And then the input would be "usr/Documents/CS/Assignment4," how do I get to the "Assignment4" directory? I know recursion is needed, but I'm really confused as to how directories work in C. Say my source file is in "usr/Documents/SourceCode," then what should I do to get to Assignment4?
I know I sound like I want all the answers, but I'm completely lost as to how directories work, and the book I have sucks. So even if all you have is a link to a good tutorial on this, that would be fantastic.
I'm running Linux, Ubuntu to be exact. GCC is the compiler.
The C programming language doesn't have a notion of a file system. This is instead an operating system specific question.
Based on the style of directory in your question though it sounds like you're on a unix / linux style system. If that's the case then you're looking for the opendir function
http://linux.die.net/man/3/opendir
Recursively traversing a directory in C goes something like this:
Use opendir and readdir to list the directory entries. I probably shouldn't be doing this, but I'm posting a full code sample (sans error handling) because there are a bunch of little things you have to do to ensure you're using the API correctly:
/* needs <dirent.h> and <errno.h> */
DIR *dir;
struct dirent *de;
const char *name;

dir = opendir(dirpath);
if (dir == NULL) {
    /* handle error; dir must not be used below */
}
for (;;) {
    errno = 0;
    de = readdir(dir);
    if (de == NULL) {
        if (errno != 0) {
            /* handle error */
        }
        /* otherwise no more entries are left; stop either way */
        break;
    }
    /* name of file (prefix it with dirpath to get a usable file path) */
    name = de->d_name;
    /* ignore . and .. */
    if (name[0] == '.' && (name[1] == '\0' || (name[1] == '.' && name[2] == '\0')))
        continue;
    /* do something with the file */
}
if (closedir(dir) != 0) {
    /* handle error */
}
When working with each file, be sure to prepend the dirpath to it (along with a slash, if needed). You could also use chdir to descend and ascend, but it introduces complications in practice (e.g. you can't traverse two directories simultaneously), so I personally recommend keeping your current working directory stationary and using string manipulation to concatenate paths.
To find out if a path is a directory or not (and hence whether you should opendir() it), I recommend using lstat() rather than stat(), as the latter follows symbolic links, meaning your directory traversal could get caught in a loop and you'll end up with something like this ctags output.
Of course, since directory structure is recursive in nature, recursion plays a natural role in the traversal process: make a recursive call when a child path is a directory.
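Putting the pieces above together (opendir/readdir, path concatenation, lstat() and a recursive call for each subdirectory), a minimal sketch could look like the following. The function name index_tree and its "print every path, return the number of entries" behaviour are my own assumptions for illustration, not part of any library.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: print every entry below dirpath (depth first) to out.
 * Returns the number of entries seen, or -1 if anything failed. */
long index_tree(const char *dirpath, FILE *out)
{
    assert(dirpath != NULL && out != NULL);

    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    long count = 0;
    int failed = 0;
    for (;;) {
        errno = 0;
        struct dirent *de = readdir(dir);
        if (de == NULL) {
            if (errno != 0)
                failed = 1;          /* read error, not end of directory */
            break;
        }
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        /* Build "dirpath/name"; the working directory never changes. */
        char path[PATH_MAX];
        int n = snprintf(path, sizeof path, "%s/%s", dirpath, de->d_name);
        if (n < 0 || (size_t)n >= sizeof path) {
            failed = 1;
            break;
        }

        /* lstat() does not follow symlinks, so link loops are not entered. */
        struct stat st;
        if (lstat(path, &st) != 0) {
            failed = 1;
            break;
        }

        fprintf(out, "%s\n", path);
        count++;

        if (S_ISDIR(st.st_mode)) {
            long sub = index_tree(path, out);   /* recurse into the child */
            if (sub < 0) {
                failed = 1;
                break;
            }
            count += sub;
        }
    }
    if (closedir(dir) != 0)
        failed = 1;
    return failed ? -1 : count;
}
```

Calling index_tree("/usr/Documents/CS/Assignment4", stdout) would then list everything below that directory. Stopping at the first error is just one possible policy; a real indexer might prefer to report the error and carry on.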
The name of the directory is only a string.
So opendir("filename"); will make it possible to read the directory "filename".
However, you should perhaps start thinking in filenames and paths.
"usr/Documents/SourceCode" + "/../CS/Assignment4" is the same as "usr/Documents/CS/Assignment4"; however, I assume you are missing the leading "/".
Well, I don't get how you can be lost about how directories work. A directory is no different from a "folder" in Windows or Mac OS X. Bottom line: a hard disk has a filesystem, and a filesystem consists only of folders/directories that "contain" files (and special files like named sockets etc., which should not interest you right now).
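To make the path arithmetic from this answer concrete (a base directory plus a relative part containing ".."), here is a small sketch that joins two strings and lets realpath() collapse the result. The name resolve_from is made up for this example; note that realpath() only succeeds when the resulting path actually exists.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: join base and rel with a slash, then let realpath()
 * collapse "." and ".." and resolve symlinks. resolved must have room for
 * PATH_MAX bytes. Returns 0 on success, -1 on failure (for example when the
 * joined path does not exist or is too long). */
int resolve_from(const char *base, const char *rel, char resolved[PATH_MAX])
{
    assert(base != NULL && rel != NULL && resolved != NULL);

    char joined[PATH_MAX];
    int n = snprintf(joined, sizeof joined, "%s/%s", base, rel);
    if (n < 0 || (size_t)n >= sizeof joined)
        return -1;

    return realpath(joined, resolved) != NULL ? 0 : -1;
}
```

With the directories from the question, resolve_from("/usr/Documents/SourceCode", "../CS/Assignment4", buf) would fill buf with the absolute path of Assignment4, provided it exists. The result can be handed straight to opendir().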
Hope this helped at least a bit.
Angelo
These terms may not be 100% accurate, but I'm using the GCC compiler and POSIX library. I have C code compiled with the SQLite amalgamation file to a single executable.
In the user interface that exchanges JSON messages with the C program, I'd like to make it possible for users to copy the SQLite database files they create through the C program, and copy a full directory/folder.
Thus far, I've been able to rename and move files and folders programmatically.
I've read many questions and answers here, at Microsoft's C runtime library, and other places but I must be missing the fundamental points. I'm using regular old C, not C++ or C#.
My question is are there POSIX functions similar to rename(), _mkdir(), rmdir(), remove(), _stat(), that allow for programmatic copying of files and folders in Windows and Linux?
If not, can one just make a new folder and/or file and fread/fwrite the bytes from the original file to the new file?
I am primarily concerned with copying SQLite database files, although I wouldn't mind knowing the answer in general also.
Is this answer an adequate method?
Is the system() function a poor method? It seems to work quite well. However, it took a while to figure out how to stop messages such as "copied 2 files" from being sent to stdout and shutting down the requesting application, since they are not well-formed JSON. This answer explains how and has a link to Microsoft's "Using command redirection operators". A /q in xcopy may or may not be necessary as well, but it certainly didn't do the job alone.
Thank you very much for any direction you may be able to provide.
The question that someone suggested as an answer and placed the little submission box on this question is one that I had already linked to in my question. I don't mean to be rude but, if it had answered my question, I would not have written this one. Thank you whoever you are for taking the time to respond, I appreciate it.
I don't see how that would be a better option than using system() because with the right parameters all the sub-directories and files of a single parent folder can be copied in one statement without having to iterate through all of them manually. Is there any reason why it would not be better to use system() apart from the fact that code will need to be different for each OS?
Error handling is a bit different because system() reports an exit code rather than an errno value; however, the errors can be redirected from stderr to a file and read from there when necessary.
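For the exit-code side of system(), a rough POSIX-only sketch follows. The wrapper name run_command is an assumption of mine, and on Windows the return value of system() is interpreted differently, so this applies to the Linux build only.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd through the shell and return its exit code,
 * or -1 if the shell could not be started or the command was killed by a
 * signal. */
int run_command(const char *cmd)
{
    assert(cmd != NULL);

    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A copy could then be issued as something like run_command("cp -R src dst >/dev/null 2>copy-errors.log"), where the paths and the log file name are placeholders. The redirections keep the command's chatter out of the JSON stream, and a non-zero return tells the caller to go and read the log.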
rename(): posix
_mkdir(): not posix. You want mkdir, which is. mkdir takes two arguments, the second of which should usually be 0777 (the process umask then removes the permission bits you don't want).
rmdir(): posix
remove(): posix
_stat(): not posix, you want stat() which is.
_stat and _mkdir are called as such on the Windows C library because they're not quite compatible with the modern Unix calls. _mkdir is missing an argument, and _stat looks like a very old version of the Unix call. You'll have trouble on Windows with files larger than 2GB.
You could do:
#ifdef _WIN32
int mkdir(const char *path, int mode) { return _mkdir(path); } /* In the original C we could have #defined this but that doesn't work anymore */
#define stat _stat64
#endif
but if you do so, test it like crazy.
In the end, you're going to be copying stuff with stdio; this loop works. (beware the linked answer; it has bugs that'll bite ya.)
/* needs <stdio.h> and <stdlib.h> */
int copyfile(const char *src, const char *dst)
{
    const int bufsz = 65536;
    char *buf = malloc(bufsz);
    if (!buf) return -1; /* like mkdir, rmdir, return 0 for success, -1 for failure */
    FILE *hin = fopen(src, "rb");
    if (!hin) { free(buf); return -1; }
    FILE *hout = fopen(dst, "wb");
    if (!hout) { free(buf); fclose(hin); return -1; }
    size_t buflen;
    while ((buflen = fread(buf, 1, bufsz, hin)) > 0) {
        if (buflen != fwrite(buf, 1, buflen, hout)) {
            fclose(hout);
            fclose(hin);
            free(buf);
            return -1; /* IO error writing data */
        }
    }
    free(buf);
    int r = ferror(hin) ? -1 : 0; /* check if fread had indicated IO error on input */
    fclose(hin);
    return r | (fclose(hout) ? -1 : 0); /* final case: check if IO error flushing buffer -- don't omit this it really can happen; calling `fflush()` won't help. */
}
Given two paths as char*, I can't determine if the two paths are pointing to the same file.
How to implement in C a platform-independent utility to check if paths are pointing to the same file or not.
Using strcmp will not work because on Windows paths can contain \ or /
Using st_ino will not help because it does not work on Windows
char *fileName = du->getFileName();
char *oldFileName = m_duPtr->getFileName();
bool isSameFile = pathCompare(fileName, oldFileName) == 0; //(strcmp(fileName, oldFileName) == 0);
if (isSameFile) {
    stat(fileName, &pBuf);
    stat(oldFileName, &pBuf2);
    if (pBuf.st_ino == pBuf2.st_ino) {
        bRet = true;
    }
}
You can't. Hard links also exist on Windows and the C standard library has no methods for operating on them.
Plausible solutions to the larger problem: link against cygwin1.dll and use the st_ino method. You omitted st_dev from your sample code and need to put it back.
While there is an actual way to accomplish this on Windows, it involves ntdll methods and I had to read Cygwin's code to find out how to do it.
The methods are NtGetFileInformationByHandle and NtFsGetVolumeInformationByHandle. There are documented kernel32 calls that claim to do the same thing. See the Cygwin source code for why they don't work right (buggy fs drivers).
How to move a particular file from one folder to another folder?
What I have tried,
#include <stdio.h>
int main() {
FILE *tFile;
if (tFile != NULL)
tFile = NULL;
if ((tFile = fopen("TempFile.txt", "rw")) == NULL) {
return -1;
}
mv("TempFile.txt", "../MST");
printf("Done Succesfully\n");
return 0;
}
Error :
test.c:17:2: warning: no newline at end of file
/tmp/ccKLWYNa.o(.text+0x5e): In function `main':
: undefined reference to `mv'
collect2: ld returned 1 exit status
Please guide me how can I do this.
You really should read Advanced Linux Programming and syscalls(2)
To move (from C) a file from one place to another in the same file system just use the rename(2) syscall.
At the very least, for your particular example, you'll need to code:
/* needs <errno.h>, <stdio.h> and <stdlib.h> */
const char *srcpath = "TempFile.txt"; // assume it is a variable path
char destpath[1024];
snprintf (destpath, sizeof(destpath), "../MST/%s", srcpath);
if (rename (srcpath, destpath)) {
    // something went wrong
    if (errno == EXDEV) {
        // copy data and meta data
    } else {
        perror("rename");
        exit(EXIT_FAILURE);
    }
}
else { // the rename succeeded
}
If you really want to mv TempFile.txt ../MST/TempFile.txt specifically for TempFile.txt only, you could just call rename("TempFile.txt", "../MST/TempFile.txt") and handle the error cases as I suggest. If you are sure that ../MST/ lies in the same file system as . then EXDEV should not happen and you don't need to handle it specially (but you do need to handle errors).
If you want to move a file between two different file systems, you have to copy the data (and perhaps some of the meta-data) yourself, and then remove the original source file (e.g. with unlink(2)). You could detect that situation by various means: you could just try the rename, and if errno (see errno(3)) is EXDEV you need to copy the file. Or you could use stat(2) to query the meta-data of the source file (and of the destination directory), e.g. its size and its file system.
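The "try rename(), fall back to copy plus unlink() on EXDEV" approach described above might be sketched as follows. The names move_file and copy_bytes are invented for this example, and only the file contents are copied: permissions, timestamps and other meta-data are ignored.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Copy the bytes of src into dst using stdio. Returns 0 or -1. */
static int copy_bytes(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[65536];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;                 /* write error */
            break;
        }
    }
    if (ferror(in))
        rc = -1;                     /* read error */
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;                     /* error while flushing the last buffer */
    return rc;
}

/* Hypothetical helper: move src to dst. Returns 0 on success, -1 on failure. */
int move_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    if (rename(src, dst) == 0)
        return 0;                    /* same file system: done atomically */
    if (errno != EXDEV)
        return -1;                   /* a real error; errno says which */

    /* Different file systems: copy, then remove the original. */
    if (copy_bytes(src, dst) != 0) {
        unlink(dst);                 /* drop the partial copy */
        return -1;
    }
    return unlink(src);
}
```

Unlike the plain rename() case, the fallback path is not atomic: for a moment both files exist, and a crash in between can leave either a partial destination or two complete copies.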
Of course, you need to understand what files are on Linux (or POSIX), in particular what an inode is. See inode(7) and credentials(7).
You could have used system with /bin/mv (but be careful about strange characters -like spaces or semicolons- in the file paths, you need to escape them to avoid code injection), apparently you don't want to.
You should play with strace(1) (or perhaps also ltrace) on mv in various situations to understand what it is doing. Also, study the source code of GNU coreutils which provides /bin/mv notably in mv.c ...
Some extra C or C++ libraries may provide you with functions to move files (in the same filesystem they should do a rename, in different file systems they copy the source file data and perhaps some meta-data and unlink the source, so cannot be atomic), e.g. in C g_file_move (from Gio with Glib from Gnome), or in C++ copy_file -followed by remove in Boost, etc etc....
PS. For temporary files see tmpfile(3), mkstemp(3), etc...
I am trying to make a simple program that handles files and directories, but I have two major problems:
how can I check whether a file or directory exists or not, and
how do I know if it is a file, directory, symbolic link, device, named pipe etc.? Mainly file and directories matter for now, but I'd like to know the others too.
EDIT: To all of those who are suggesting stat() or a similar function: I have already looked into that, and while it might answer my first question, I can't figure out how it would answer the second...
Since you're inquiring about named pipes, symlinks etc., you're probably on *nix, so use the lstat() function:
/* needs <sys/stat.h> and <errno.h> */
struct stat info;
if(lstat(name,&info) != 0) {
    if(errno == ENOENT) {
        // doesn't exist
    } else if(errno == EACCES) {
        // we don't have permission to know if
        // the path/file exists.. impossible to tell
    } else {
        //general error handling
    }
    return;
}
//so, it exists.
if(S_ISDIR(info.st_mode)) {
    //it's a directory
} else if(S_ISFIFO(info.st_mode)) {
    //it's a named pipe
} else if(....) {
}
See the docs here for the S_ISxxx macros you can use.
The stat() function should give you everything you are looking for (or more specifically lstat() since stat() will follow the link).
Use stat (or if you wish to get information about a symbolic link instead of following it and getting information about the destination, lstat)
NAME
stat - get file status
SYNOPSIS
#include <sys/stat.h>
int stat(const char *restrict path, struct stat *restrict buf);
DESCRIPTION
The stat() function shall obtain information about the named file and write it to the area pointed to by the buf argument. The path argument points to a pathname naming a file. Read, write, or execute permission of the named file is not required. An implementation that provides additional or alternate file access control mechanisms may, under implementation-defined conditions, cause stat() to fail. In particular, the system may deny the existence of the file specified by path.
If the named file is a symbolic link, the stat() function shall continue pathname resolution using the contents of the symbolic link, and shall return information pertaining to the resulting file if the file exists.
The buf argument is a pointer to a stat structure, as defined in the <sys/stat.h> header, into which information is placed concerning the file.
The C routines opendir(), readdir() and closedir() provide a way for me to traverse a directory structure. However, each dirent structure returned by readdir() does not seem to provide a useful way for me to obtain the set of pointers to DIR that I would need to recurse into the directory subdirectories.
Of course, they give me the name of the files, so I could either append that name to the directory path and stat() and opendir() them, or I could change the current working directory of the process via chdir() and roll it back via chdir("..").
The problem with the first approach is that if the length of the directory path is great enough, then the cost to pass a string containing it to opendir() will outweigh the cost of opening a directory. If you are a bit more theoretical, you could say your complexity could increase beyond linear time (in the total character count of the (relative) filenames in the directory tree).
Also, the second approach has a problem. Since each process has a single current working directory, all but one thread will have to block in a multithreaded application. Also, I don't know if the current working directory is just a mere convenience (i.e., the relative path will be appended to it prior to a filesystem query). If it is, this approach will be inefficient too.
I am accepting alternatives to these functions. So how is it one can traverse a UNIX directory tree efficiently (linear time in the total character count of the files under it)?
Have you tried ftw() aka File Tree Walk ?
Snippet from man 3 ftw:
int ftw(const char *dir, int (*fn)(const char *file, const struct stat *sb, int flag), int nopenfd);
ftw() walks through the directory tree starting from the indicated directory dir. For each found entry in the tree, it calls fn() with the full pathname of the entry, a pointer to the stat(2) structure for the entry and an int flag.
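As a rough illustration of the ftw() signature quoted above, the sketch below counts files and directories under a root. The names tally and count_tree are made up, and because ftw() offers no user-data argument the counters live at file scope, which makes this version unsuitable for concurrent use.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* ftw() has no user-data pointer, so the callback reports through these. */
static long g_files;
static long g_dirs;

/* Callback invoked by ftw() once per entry in the tree. */
static int tally(const char *path, const struct stat *sb, int flag)
{
    (void)sb;                        /* stat data is available but unused here */
    if (flag == FTW_D)
        g_dirs++;                    /* a directory (the root is included) */
    else if (flag == FTW_F)
        g_files++;                   /* a file */
    printf("%s\n", path);
    return 0;                        /* returning non-zero would stop the walk */
}

/* Hypothetical helper: walk root, store the totals, return 0 or -1. */
int count_tree(const char *root, long *files, long *dirs)
{
    assert(root != NULL && files != NULL && dirs != NULL);

    g_files = 0;
    g_dirs = 0;
    if (ftw(root, tally, 16) != 0)   /* use at most 16 descriptors at once */
        return -1;
    *files = g_files;
    *dirs = g_dirs;
    return 0;
}
```

Note that ftw() follows symbolic links. Its sibling nftw() accepts an FTW_PHYS flag that avoids this, which is probably the safer choice for trees that may contain link loops.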
You seem to be missing one basic point: directory traversal involves reading data from the disk. Even when/if that data is in the cache, you end up going through a fair amount of code to get it from the cache into your process. Paths are also generally pretty short -- any more than a couple hundred bytes is pretty unusual. Together these mean that you can pretty reasonably build up strings for all the paths you need without any real problem. The time spent building the strings is still pretty minor compared to the time to read data from the disk. That means you can normally ignore the time spent on string manipulation, and work exclusively at optimizing disk usage.
My own experience has been that for most directory traversal a breadth-first search is usually preferable -- as you're traversing the current directory, put the full paths to all sub-directories in something like a priority queue. When you're finished traversing the current directory, pull the first item from the queue and traverse it, continuing until the queue is empty. This generally improves cache locality, so it reduces the amount of time spent reading the disk. Depending on the system (disk speed vs. CPU speed, total memory available, etc.) it's nearly always at least as fast as a depth-first traversal, and can easily be up to twice as fast (or so).
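The breadth-first idea described above can be sketched with a plain FIFO queue of directory paths (a simplification of the "something like a priority queue" wording). All names here are hypothetical, and whether this really beats depth-first on a given system is something to measure rather than assume.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Growable FIFO of heap-allocated directory paths. */
struct path_queue {
    char **items;
    size_t head, len, cap;
};

/* Append path to the queue; the queue takes ownership on success. */
static int queue_push(struct path_queue *q, char *path)
{
    if (q->len == q->cap) {
        size_t cap = q->cap ? q->cap * 2 : 16;
        char **items = realloc(q->items, cap * sizeof *items);
        if (items == NULL)
            return -1;
        q->items = items;
        q->cap = cap;
    }
    q->items[q->len++] = path;
    return 0;
}

/* Return a malloc'd "dir/name", or NULL when out of memory. */
static char *join_path(const char *dir, const char *name)
{
    size_t n = strlen(dir) + strlen(name) + 2;
    char *path = malloc(n);
    if (path != NULL)
        snprintf(path, n, "%s/%s", dir, name);
    return path;
}

/* Print every entry below root, level by level. Returns the entry count or -1. */
long walk_breadth_first(const char *root, FILE *out)
{
    assert(root != NULL && out != NULL);

    struct path_queue q = {0};
    long count = 0;
    char *start = malloc(strlen(root) + 1);
    if (start == NULL)
        return -1;
    strcpy(start, root);
    if (queue_push(&q, start) != 0) {
        free(start);
        return -1;
    }

    while (count >= 0 && q.head < q.len) {
        const char *dirpath = q.items[q.head++];   /* oldest directory first */
        DIR *dir = opendir(dirpath);
        if (dir == NULL) {
            count = -1;
            break;
        }
        struct dirent *de;
        while (count >= 0 && (de = readdir(dir)) != NULL) {
            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            char *path = join_path(dirpath, de->d_name);
            struct stat st;
            if (path == NULL || lstat(path, &st) != 0) {
                free(path);
                count = -1;
                break;
            }
            fprintf(out, "%s\n", path);
            count++;
            if (S_ISDIR(st.st_mode)) {
                if (queue_push(&q, path) != 0) {   /* visit it later */
                    free(path);
                    count = -1;
                }
            } else {
                free(path);
            }
        }
        closedir(dir);
    }

    for (size_t i = 0; i < q.len; i++)
        free(q.items[i]);
    free(q.items);
    return count;
}
```

Each directory is read completely before any of its subdirectories is opened, so only one DIR stream is open at a time. The price is that the paths of all pending directories are held in memory.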
The way to use opendir/readdir/closedir is to make the function recursive! Have a look at the snippet here on Dreamincode.net.
Hope this helps.
EDIT Thanks R.Sahu, the linky has expired, however, found it via wayback archive and took the liberty to add it to gist. Please remember, to check the license accordingly and attribute the original author for the source! :)
Probably overkill for your application, but here's a library designed to traverse a directory tree with hundreds of millions of files.
https://github.com/hpc/libcircle
Instead of opendir(), you can use a combination of openat(), dirfd() and fdopendir() and construct a recursive function to walk a directory tree:
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>

void
dir_recurse (DIR *parent, int level)
{
    struct dirent *ent;
    DIR *child;
    int fd;

    while ((ent = readdir(parent)) != NULL) {
        if ((strcmp(ent->d_name, ".") == 0) ||
            (strcmp(ent->d_name, "..") == 0)) {
            continue;
        }
        /* d_type can be DT_UNKNOWN on some file systems */
        if (ent->d_type == DT_DIR) {
            printf("%*s%s/\n", level, "", ent->d_name);
            fd = openat(dirfd(parent), ent->d_name, O_RDONLY | O_DIRECTORY);
            if (fd == -1) {
                perror("openat");
                continue;
            }
            child = fdopendir(fd);
            if (child == NULL) {
                perror("fdopendir");
                close(fd);  /* fdopendir() failed, so fd is still ours */
                continue;
            }
            dir_recurse(child, level + 1);
            closedir(child);  /* also closes fd */
        } else {
            printf("%*s%s\n", level, "", ent->d_name);
        }
    }
}

int
main (int argc, char *argv[])
{
    DIR *root;

    (void)argc;
    (void)argv;
    root = opendir(".");
    if (root == NULL) {
        perror("opendir");
        return 1;
    }
    dir_recurse(root, 0);
    closedir(root);
    return 0;
}
Here readdir() is still used to get the next directory entry. If the next entry is a directory, then we find the parent directory fd with dirfd() and pass this, along with the child directory name to openat(). The resulting fd refers to the child directory. This is passed to fdopendir() which returns a DIR * pointer for the child directory, which can then be passed to our dir_recurse() where it again will be valid for use with readdir() calls.
This program recurses over the whole directory tree rooted at "." (the current directory). Entries are printed, indented by 1 space per directory level. Directories are printed with a trailing /.