Problems grabbing file names using SDL_strdup and similar - c

I'm trying to create a program with SDL2.
In a certain part of the code, I'm writing functions to grab names of all present files in a given directory path (and keep them in memory) so that, in another function, I can check if a specified file was present the last moment the directory was checked.
I'm using dirent.h to suit my needs but I'm running into a few problems:
All the files are properly captured by readdir() (no exceptions), but they aren't always copied into memory correctly by SDL_strdup() (code is below).
I'm using SDL_malloc()/SDL_realloc()/SDL_strdup() to stay as cross-platform as possible and avoid problems when porting the code (I've read that strdup isn't standard C).
Here's my code:
typedef struct FileList {
    char **files;
    size_t num;
} FileList;

FileList *GetFileList(const char *path){
    struct dirent *dp = NULL;
    DIR *dir = NULL;
    size_t i = 0;
    FileList *filelist = SDL_malloc(sizeof(FileList)); /* changing this to a calloc doesn't help */
    /* Check if filelist == NULL */
    filelist->files = NULL;
    dir = opendir(path);
    /* Check if dir == NULL */
    while ((dp = readdir(dir))){
        if (dp->d_name[0] == '.'){
            continue; /* skip self, parent and all files starting with . */
        }
        printf("Copying: %s\n", dp->d_name); /* always shows the name of each file */
        filelist->files = SDL_realloc(filelist->files, ++i);
        filelist->files[i-1] = SDL_strdup(dp->d_name);
        printf("Copied: %s\n\n", filelist->files[i-1]); /* varies: the file's name, plain gibberish, or nothing at all */
    }
    filelist->num = i;
    closedir(dir);
    return filelist;
}
Output varies. When it doesn't crash, I either get all filenames correctly copied, or most of them copied with some containing nothing or plain gibberish (as commented); when it does crash, sometimes I get a segfault inside SDL_strdup(), other times a segfault in closedir().
I've even tried replacing the SDL_realloc() approach with a single up-front allocation of filelist->files sized to the number of files (obtained from another function), but I get the same problem.
Any suggestions for making my coding style more defensive (I do believe this version is rather fragile) are appreciated, although I've tried all I could for this case. I'm currently working on Mac OS X using the built-in gcc, Apple LLVM 6.0 (clang-600.0.56).

You need space for pointers, and sizeof(char *) != 1, so
filelist->files = SDL_realloc(filelist->files, ++i);
needs to be
filelist->files = SDL_realloc(filelist->files, ++i * sizeof(char *));
But assigning the result straight back is a bad idea: SDL_realloc can return NULL, in which case you lose your only reference to the original pointer. A better way of doing it is
void *ptr;

ptr = SDL_realloc(filelist->files, ++i * sizeof(char *));
if (ptr == NULL)
    handleThisErrorAndDoNotContinue();
filelist->files = ptr;
Always check whether allocator functions returned NULL: you have no control over the size of the data you are trying to read, and you can run out of memory, at least in theory, so make your code safe by checking that each of these calls succeeded.
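Putting it together, a minimal sketch of the corrected loop (same shape as the question's code; the opendir()/SDL_malloc() checks are still elided, and stopping at the first failed allocation is just one possible policy):

while ((dp = readdir(dir))){
    if (dp->d_name[0] == '.'){
        continue; /* skip self, parent and all files starting with . */
    }
    /* grow the pointer array by one slot without losing the old pointer */
    void *ptr = SDL_realloc(filelist->files, (i + 1) * sizeof(char *));
    if (ptr == NULL){
        break; /* out of memory: keep the names gathered so far */
    }
    filelist->files = ptr;
    filelist->files[i] = SDL_strdup(dp->d_name);
    if (filelist->files[i] == NULL){
        break; /* SDL_strdup can fail too */
    }
    ++i;
}
filelist->num = i;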

Related

How to make getdents() behave like read() on directories [K&R section 8.6]

I’m new to programming and C and I'm currently working through K&R. Apologies in advance if this isn't the most succinct way of characterizing the problem.
For context, in section 8.6 of K&R (not the exercises but the actual chapter) they implement the function fsize() that prints out the size of files in a directory and its sub-directories recursively. The code in the book uses the syscall read() to implement a basic version of readdir(), which returns a pointer to the next entry in a directory.
Up until this section of K&R, all the source code has worked fine on my machine; however, the code in this chapter relies on using the read() function on directories to get their contents, which, according to a source I found here [1], doesn't work on Linux and many other modern systems.
There is, though, a syscall getdents() which seems to do roughly the same thing [2]. So as an exercise I tried to re-implement readdir() and came across the following problems:
read() on directories seems to know the size of each entry in advance, so it can return one entry at a time and let the file offset handle the issue of "remembering" the location of the next entry every time it is called.
getdents(), on the other hand, doesn't know the size of each entry in advance, so I have to read the entire buffer first and then loop through it inside readdir() using the member d_reclen (I copied from the example at the bottom of man getdents). This means my readdir() function now has to handle the issue of "remembering" the location of the next entry in the stream every time it is called.
So my questions are as follows:
Am I correct in my understanding that getdents() cannot be made to behave like read(), in the sense of reading one entry at a time and handling the "remembering of the next position" itself?
If that is true, what is the best way to implement "remembering position", in particular when getdents() needs to be called multiple times on several sub-directories? I've shown an excerpt of what I tried below: using the file descriptor assigned by the system as a way of indexing the results of getdents() in an array. However, this attempt fails given how opendir() and closedir() are implemented: the system reassigns file descriptors once closedir() has been called and opendir() is called on the next subdirectory, and this information is not available to readdir().
Last note: I want my implementation of read_dir() to behave exactly like readdir() in K&R, meaning I wouldn't have to change any of the other functions or structures to make it work.
// NTD: _direct's structure needs to match how the system implements directory
// entries. After reading from the file descriptor into _direct, we then
// copy only the relevant elements (d_ino and d_name) to Dirent
struct _direct {             // directory entry
    long d_ino;              // inode number
    off_t d_off;             // not included in K&R
    unsigned short d_reclen; // not included in K&R
    char d_name[];           // long name does not have '\0'
};

#define BUFSIZE  4096 // size of buffer when reading from getdents()
#define MAXFILES 1024 // max files that read_dir() can open

struct _streamdents {
    int pos;
    int nread;
    char *buf;
};

// read_dir: read directory entries in sequence
Dirent *read_dir(_dir *dp)
{
    struct _direct *dirbuf; // local directory structure
    static Dirent d;        // return: portable structure
    static struct _streamdents *readdents[MAXFILES];

    if (dp->fd > MAXFILES - 1) {
        printf("Error in read_dir: Cannot continue reading, too many directories\n");
        return NULL;
    }
    // Check if directory has already been read; if not, create stream.
    // Important if fxn is called for a sub-directory and then needs
    // to return to a parent directory and continue reading.
    if (readdents[dp->fd] == NULL) {
        char *buf = malloc(BUFSIZE);
        int nread = syscall(SYS_getdents, dp->fd, buf, BUFSIZE);
        int pos = 0;
        struct _streamdents *newdent = malloc(sizeof(struct _streamdents));
        newdent->buf = buf;
        newdent->pos = pos;
        newdent->nread = nread;
        readdents[dp->fd] = newdent;
    }
    struct _streamdents *curdent = readdents[dp->fd];
    int pos = curdent->pos;
    int nread = curdent->nread;
    char *buf = curdent->buf;

    while (pos < nread) {
        dirbuf = (struct _direct *) (buf + pos);
        if (dirbuf->d_ino == 0) {    // slot not in use
            pos += dirbuf->d_reclen; // must still advance, or this loops forever
            curdent->pos = pos;
            continue;
        }
        d.ino = dirbuf->d_ino;
        strncpy(d.d_name, dirbuf->d_name, DIRSIZ);
        curdent->pos += dirbuf->d_reclen;
        return &d;
    }
    if (nread == -1) {
        printf("Error in getdents(): %s\n", strerror(errno));
    }
    return NULL;
}
Thank you

My C function recurses over system directories and segfaults over large inputs. How can I fix this?

I am trying to write a program that searches through a given directory and all of its sub-directories and files (and the sub-directories and files of those sub-directories, and so on) and prints out all files that have a given set of permissions (int target_perm).
It works fine on smaller input, but crashes with Segmentation fault (core dumped) when it has to recurse over directories with large quantities of files. Valgrind reveals that this is due to stack overflow.
Is there any way I can fix my function so it can work with arbitrarily large directories?
void recurse_dir(struct stat *sb, struct dirent *de, DIR *dr, int target_perm, char* curr_path) {
    if ((strcmp(".", de->d_name) != 0) && (strcmp("..", de->d_name) != 0)) {
        char full_file_name[strlen(curr_path) + strlen(de->d_name)+1];
        strcpy(full_file_name, curr_path);
        strcpy(full_file_name + strlen(curr_path), de->d_name);
        full_file_name[strlen(curr_path) + strlen(de->d_name)] = '\0';
        if (stat(full_file_name, sb) < 0) {
            fprintf(stderr, "Error: Cannot stat '%s'. %s\n", full_file_name, strerror(errno));
        } else {
            char* curr_perm_str = permission_string(sb);
            int curr_perm = permission_string_to_bin(curr_perm_str);
            free(curr_perm_str);
            if ((curr_perm == target_perm)) {
                printf("%s\n", full_file_name);
            }
            if (S_ISDIR(sb->st_mode)) {
                DIR *dp;
                struct dirent *dent;
                struct stat b;
                dp = opendir(full_file_name);
                char new_path[PATH_MAX];
                strcpy(new_path, full_file_name);
                new_path[strlen(full_file_name)] ='/';
                new_path[strlen(full_file_name)+1] ='\0';
                if (dp != NULL) {
                    if ((dent = readdir(dp)) != NULL) {
                        recurse_dir(&b, dent, dp, target_perm, new_path);
                    }
                    closedir(dp);
                } else {
                    fprintf(stderr, "Error: Cannot open directory '%s'. %s.\n", de->d_name, strerror(errno));
                }
            }
        }
    }
    if ((de = readdir(dr)) != NULL) {
        recurse_dir(sb, de, dr, target_perm, curr_path);
    }
}
The problem here is not actually the recursion, although I've addressed that particular problem below. The problem is that your directory hierarchy probably includes symbolic links which make some directories aliases for one of their parents. An example from a Ubuntu install:
$ ls -ld /usr/bin/X11
lrwxrwxrwx 1 root root 1 Jan 25 2018 /usr/bin/X11 -> .
$ # Just for clarity:
$ readlink -f /usr/bin/X11
/usr/bin
So once you encounter /usr/bin/X11, you enter into an infinite loop. This will rapidly exhaust the stack, but getting rid of recursion won't fix the problem, since the infinite loop is still an infinite loop.
What you need to do is either:
Avoid following symlinks, or
(better) Avoid following symlinks which resolve to directories, or
Keep track of all the directories you've encountered during the recursive scan, and check to make sure that any new directory hasn't already been examined.
The first two solutions are easier (you just need to check the filetype field in the struct stat) but they will fail to list some files you may be interested in (for example, when a symlink resolves to a directory outside of the directory structure you're examining.)
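For the second option, here is a minimal sketch of the check (not the poster's code; it uses lstat(), which reports on the link itself rather than its target, followed by stat(), which follows the link):

struct stat lsb;
if (lstat(full_file_name, &lsb) == 0 && S_ISLNK(lsb.st_mode)) {
    struct stat tsb;
    /* stat() resolves the link; if the target is a directory, list it
     * if you wish, but do not recurse into it. */
    if (stat(full_file_name, &tsb) == 0 && S_ISDIR(tsb.st_mode)) {
        printf("%s (symlink to a directory; not followed)\n", full_file_name);
    }
}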
Once you fix the above problem, you might want to consider these suggestions:
In recursive functions, it's always a good idea to reduce the size of a stack frame to the minimum possible. The maximum recursion depth during a directory walk shouldn't be more than the maximum number of path segments in a filepath (but see below), which shouldn't be too big a number. (On my system, the maximum depth of a file in the /usr hierarchy is 16, for example.) But the amount of stack used is the product of the size of the stack frame and the maximum recursion depth, so if your stack frames are large, you'll have less recursion capacity.
In pursuit of the above goal, you should avoid the use of local arrays. For example, the declaration
char new_path[PATH_MAX];
adds PATH_MAX bytes to every stack frame (on my system, that's 4k). And that's in addition to the VLA full_file_name. For what it's worth, I compiled your function on a 64-bit Linux system, and found that the stack frame size is 4,280 bytes plus the size of the VLA (rounded to a multiple of 16 for alignment). That's probably not going to use more than 150Kb of stack, assuming a reasonable file hierarchy, which is within the limits. But that could increase significantly if your system has a larger value of PATH_MAX (which, in any case, cannot be relied on to be the maximum size of a filepath).
Good style dictates using dynamically-allocated memory for variables like these ones. But an even better approach would be to avoid using so many different buffers.
Parenthetically, you also need to be aware of the cost of strlen. In order to compute the length of a string, the strlen function needs to scan all its bytes looking for the NUL terminator. C strings, unlike string objects in higher-level languages, do not contain any indication of their length. So when you do this:
char full_file_name[strlen(curr_path) + strlen(de->d_name)+1];
strcpy(full_file_name, curr_path);
strcpy(full_file_name + strlen(curr_path), de->d_name);
full_file_name[strlen(curr_path) + strlen(de->d_name)] = '\0';
you end up scanning curr_path three times, and de->d_name twice, even though the lengths of these strings will not change. Rather than doing that, you should save the lengths in local variables so that they can be reused.
Alternatively, you could find a different way to concatenate the strings. One simple possibility which also dynamically allocates the memory, is:
char* full_file_name;
asprintf(&full_file_name, "%s%s", curr_path, de->d_name);
Note: You should check the return value of asprintf, both to verify that there was not a memory allocation problem, and also to save the length of full_file_name in case you need it later. asprintf is available on Linux and BSD derivatives, including OS X. But it's easy to implement using the POSIX-standard snprintf, and there are short, freely-reusable implementations available.
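For instance, a minimal sketch of such an implementation (the name my_asprintf is made up to avoid clashing with a system asprintf; it relies on C99 vsnprintf semantics, where a NULL buffer of size 0 yields the length that would have been written):

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

int my_asprintf(char **strp, const char *fmt, ...)
{
    va_list ap;

    /* First pass: measure the formatted length. */
    va_start(ap, fmt);
    int len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0)
        return -1;

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return -1;

    /* Second pass: actually format into the buffer. */
    va_start(ap, fmt);
    vsnprintf(buf, (size_t)len + 1, fmt, ap);
    va_end(ap);

    *strp = buf;
    return len;
}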
You could use asprintf to compute new_path, as well, again removing the stack allocation of a possibly large array (and avoiding the buffer overflow if PATH_MAX is not big enough to contain the new filepath, which is definitely possible):
char* newpath;
asprintf(&newpath, "%s/", full_file_name);
But that's kind of silly. You're copying an entire filepath just in order to add a single character at the end. Better would be to leave space for the slash when you create full_file_name in the first place, and fill it in when you need it:
char* full_file_name;
int full_file_name_len = asprintf(&full_file_name, "%s%s%c",
                                  curr_path, de->d_name, '\0');
if (full_file_name_len < 0) { /* handle error */ }
--full_file_name_len; /* bytes written include the extra NUL printed by %c */
/* Much later, instead of creating new_path: */
if (dp != NULL) {
    full_file_name[full_file_name_len] = '/';  /* fill the reserved slot */
    if ((dent = readdir(dp)) != NULL) {
        recurse_dir(&b, dent, dp, target_perm, full_file_name);
    }
    full_file_name[full_file_name_len] = '\0'; /* restore the terminator */
    closedir(dp);
}
There are other ways to do this. In fact, you really only need a single filepath buffer which you could pass down through the recursion. The recursion only appends to the filepath, so it's only necessary to restore the NUL byte at the end of every recursive call. However, in production code you would not want a fixed-length buffer, which might turn out to be too small, so you would need to implement some kind of reallocation strategy.
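A minimal sketch of that single-buffer pattern (hypothetical helper; the buffer and its capacity are owned by the top-level caller):

#include <string.h>

/* Append "/<name>" to path at offset len, work on the result, then truncate. */
void visit(char *path, size_t len, size_t cap, const char *name)
{
    size_t name_len = strlen(name);
    if (len + 1 + name_len + 1 > cap) {
        return; /* too long: grow the buffer with realloc, or skip the entry */
    }
    path[len] = '/';
    memcpy(path + len + 1, name, name_len + 1); /* copies the NUL too */
    /* ... stat and recurse using `path` here ... */
    path[len] = '\0'; /* restore: the parent's view of the path is intact */
}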
While I was trying to figure out the actual stack frame size for your function, which required compiling it, I ran into this code which relies on some undeclared functions:
char* curr_perm_str = permission_string(sb);
int curr_perm = permission_string_to_bin(curr_perm_str);
free(curr_perm_str);
Making a guess about what these two functions do, I think you could safely replace the above with
int curr_perm = sb->st_mode & (S_IRWXU|S_IRWXG|S_IRWXO);
Or perhaps
int curr_perm = sb->st_mode
& (S_ISUID|S_ISGID|S_ISVTX|S_IRWXU|S_IRWXG|S_IRWXO);
if you want to include the setuid and sticky bits.

Using ftw() properly in C

I have the following in my code (coding in C):
ftw(argv[2], parseFile, 100)
argv[2] is a local directory path. For instance, argv[2] = "TestCases", and there is a TestCases folder in the same directory as my .o file.
My understanding is that this should traverse the directory TestCases and send every file it finds to the function parseFile.
What actually happens is that it simply sends my argument to the function parseFile and that is all. What am I doing wrong? How am I supposed to use this properly?
EDIT: This is parseFile:
int parseFile(const char * ftw_filePath, const struct stat * ptr, int flags){
    FILE * file;
    TokenizerT * currFile;
    char fileString[1000], * currWord, * fileName;
    fileName = strdup(ftw_filePath);
    if( fileName == NULL || strlen(fileName) <= 0){
        free(fileName);
        return -1;
    }
    printf("\n%s\n",fileName);
    if(strcmp(fileName,"-h")== 0){
        printf("To run this program(wordstats) type './wordstat.c' followed by a space followed by the file's directory location. (e.g. Desktop/CS211/Assignment1/test.txt )");
        free(fileName);
        return 1;
    }
    else{
        file=fopen(fileName,"r");
    }
    if(!file){
        fprintf(stderr,"Error: File Does not Exist in designated location. Please restart the program and try again.\n");
        free(fileName);
        return 0;
    }
    memset(fileString, '\0', 1000);
    while(fscanf(file,"%s", fileString) != EOF){ /* traverses the file line by line */
        stringToLower(fileString);
        currFile = TKCreate("alphanum",fileString);
        while((currWord = TKGetNextToken(currFile)) != NULL) {
            insert_List(currWord, words, fileName);
        }
        free(currFile->delimiters);
        free(currFile->copied_string);
        free(currFile);
        memset(fileString, '\0', 1000);
    }
    fclose(file);
    free(fileName);
    return 1;
}
It will work if I input TestCases/big.txt for my argv[2], but not if I put TestCases.
As described in the man page, a non-zero return value from the function that ftw is calling tells ftw to stop running.
Your code has various return statements, but the only one that returns 0 is an error condition.
A properly designed C callback interface has a void* argument that you can use to pass arbitrary data from the surrounding code into the callback. [n]ftw does not have such an argument, so you're kinda up a creek.
If your compiler supports thread-local variables (the __thread storage specifier), you can use one instead of a global; this works, though it is not really much tidier than a global, as the sketch below shows.
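For illustration, a minimal sketch of that workaround (the context struct and names are made up; the callback reads the thread-local variable in place of a real user-data argument):

#include <ftw.h>
#include <sys/stat.h>

struct walk_ctx { int files_seen; };          /* hypothetical context */
static __thread struct walk_ctx *current_ctx; /* thread-local, not a plain global */

static int on_entry(const char *path, const struct stat *sb, int typeflag)
{
    if (typeflag == FTW_F)
        current_ctx->files_seen++;
    return 0; /* keep walking: a non-zero return would stop ftw */
}

int count_files(const char *root)
{
    struct walk_ctx ctx = { 0 };
    current_ctx = &ctx;
    if (ftw(root, on_entry, 100) != 0)
        return -1;
    return ctx.files_seen;
}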
If your C library has the fts family of functions, use those instead. They are available on most modern Unixes (including Linux, OS X, and the recent *BSDs).
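A minimal sketch of the fts approach (the handler body is a placeholder; FTS_PHYSICAL avoids following symlinks, and FTS_NOCHDIR keeps the walk from changing the working directory):

#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdio.h>

int walk(char *root)
{
    char *paths[] = { root, NULL };
    FTS *ftsp = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (ftsp == NULL)
        return -1;

    FTSENT *ent;
    while ((ent = fts_read(ftsp)) != NULL) {
        if (ent->fts_info == FTS_F) {
            /* A regular file: hand it to your own parser here. */
            printf("file: %s\n", ent->fts_path);
        }
    }
    fts_close(ftsp);
    return 0;
}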

Custom shell glob problem

I have to write a shell program in C that doesn't use the system() function. One of the features is that we have to be able to use wildcards. I can't seem to find a good example of how to use the glob or fnmatch functions that I keep running into, so I have been messing around, and so far I have a somewhat working glob feature (depending on how I have arranged my code).
If I have the glob variable declared as a global, then the function partially works; however, any command afterwards produces an error. Example:
ls *.c
produces correct results
ls -l //no glob required
null passed through
so I tried making it a local variable. This is my code right now:
int runCommand(commandStruct * command1) {
    if(!globbing)
        execvp(command1->cmd_path, command1->argv);
    else{
        glob_t globbuf;
        printf("globChar: %s\n", globChar);
        glob(globChar, GLOB_DOOFFS, NULL, &globbuf);
        //printf("globbuf.gl_pathv[0]: %s\n", &globbuf.gl_pathv[0]);
        execvp(command1->cmd_path, &globbuf.gl_pathv[0]);
        //globfree(&globbuf);
        globbing = 0;
    }
    return 1;
}
When doing this with globbuf as a local, it produces a null for globbuf.gl_pathv[0]. I can't seem to figure out why. Does anyone who knows how glob works see what might be the cause? I can post more code if necessary, but this is where the problem lies.
this works for me:
...
glob_t glob_buffer;
const char * pattern = "/tmp/*";
int i;
int match_count;

glob( pattern , 0 , NULL , &glob_buffer );
match_count = glob_buffer.gl_pathc;
printf("Number of matches: %d \n", match_count);
for (i=0; i < match_count; i++)
    printf("match[%d] = %s \n", i, glob_buffer.gl_pathv[i]);
globfree( &glob_buffer );
...
Observe that the execvp function expects the argument list to end with a NULL pointer, so I think the easiest approach is to create your own char ** argv copy with all the elements from glob_buffer.gl_pathv[] and a NULL pointer at the end.
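A minimal sketch of building that vector (continuing the snippet above; command_name is a hypothetical variable holding the program to run):

/* Build { command_name, match0, match1, ..., NULL } for execvp. */
size_t n = glob_buffer.gl_pathc;
char **argv = malloc((n + 2) * sizeof(char *));
if (argv == NULL) {
    /* handle allocation failure */
}
argv[0] = command_name;
for (size_t k = 0; k < n; k++)
    argv[k + 1] = glob_buffer.gl_pathv[k];
argv[n + 1] = NULL; /* execvp requires the NULL terminator */

execvp(argv[0], argv);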
You are asking for GLOB_DOOFFS but you did not specify any number in globbuf.gl_offs saying how many slots to reserve.
Presumably as a global variable it gets initialized to 0.
Also this: &globbuf.gl_pathv[0] can simply be globbuf.gl_pathv.
And don't forget to call globfree(&globbuf).
I suggest running your program under valgrind because it probably has a number of memory leaks, and/or access to uninitialized memory.
If you don't have to use *-style wildcards, I've always found it simpler to use opendir(), readdir() and strcasestr(). opendir() opens a directory (which can be ".") much like a file, and readdir() reads an entry from it, returning NULL at the end. Use it like this:
struct dirent *de = NULL;
DIR *dirp = opendir(".");
while ((de = readdir(dirp)) != NULL) {
    if (strcasestr(de->d_name, ".jpg") != NULL) {
        // do something with your JPEG
    }
}
Just remember to closedir() what you opendir(). A struct dirent has a d_type field if you want to use it; most files are type DT_REG (as opposed to directories, pipes, symlinks, sockets, etc.).
It doesn't build a list the way glob does; the directory itself is the list, and you just use criteria to control what you select from it.
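If you do use d_type, a minimal sketch of the check (note that d_type is a BSD/glibc extension, and some filesystems report DT_UNKNOWN, in which case you would fall back to stat()):

while ((de = readdir(dirp)) != NULL) {
    if (de->d_type == DT_REG) {
        // a regular file: handle it here
    } else if (de->d_type == DT_UNKNOWN) {
        // the filesystem didn't say: confirm with stat() if it matters
    }
}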

fopen Segfault error on large files

Hello everyone, I'm new to C, but I've recently been getting a weird segfault error with my fopen.
FILE* thefile = fopen(argv[1],"r");
The problem I've been having is that this code works on other, smaller text files, but when I try a file around 400MB it gives a segfault error. I've even tried hardcoding the filename, but that doesn't work either. Could there be a problem in the rest of the code causing the segfault on this line? (I doubt it, but I'd like to know if it's possible.) It's just really odd that no errors come up for a small text file, but a large text file does get errors.
Thanks!
EDIT: I didn't want to bog this down with too much, but here's my code:
int main(int argc, char *argv[])
{
    if(argc != 3)
    {
        printf("[ERROR] Invalid number of arguments. Please pass 2 arguments, input_bound_file (column 1: probe, column 2,...: samples) and desired_output_file_name");
        exit(2);
    }
    int i,j;
    rankAvg = g_hash_table_new(g_direct_hash, g_direct_equal);
    rankCnt = g_hash_table_new(g_direct_hash, g_direct_equal);
    table = g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL, g_free);
    getCounts(argv[1]);
    printf("NC=: %i nR =: %i",nC,nR);
    double srcMat[nR][nC];
    int rankMat[nR][nC];
    double normMat[nR][nC];
    int sorts[nR][nC];
    char line[100];
    FILE* thefile = fopen(argv[1],"r");
    printf("%s\n", strerror(errno));
    FILE* output = fopen(argv[2],"w");
    char* rownames[100];
    i=0; j = 1;
    int processedProbeNumber = 0;
    int previousStamp = 0;
    fgets(line,sizeof(line),thefile); //read file
    while(fgets(line,sizeof(line),thefile) != NULL)
    {
        cleanSpace(line); //creates only one space between entries
        char dest[100];
        int len = strlen(line);
        for(i = 0; i < len; i++)
        {
            if(line[i] == ' ') //read in rownames
            {
                rownames[j] = strncpy(dest, line, i);
                dest[i] = '\0';
                break;
            }
        }
        char* token = strtok(line, " ");
        token = strtok(NULL, " ");
        i=1;
        while(token!=NULL) //put words into array
        {
            rankMat[j][i] = abs(atof(token));
            srcMat[j][i] = abs(atof(token));
            token = strtok(NULL, " ");
            i++;
        }
        // set the first column as a row id
        j++;
        processedProbeNumber++;
        if( (processedProbeNumber-previousStamp) >= 10000)
        {
            previousStamp = processedProbeNumber;
            printf("\tnumber of loaded lines = %i",processedProbeNumber);
        }
    }
    printf("\ttotal number of loaded lines = %i \n",processedProbeNumber);
    fclose(thefile);
How do you know that fopen is segfaulting? If you're simply sprinkling printf calls through the code, there's a chance the standard output isn't flushed to the console before the error occurs. Obviously, if you're using a debugger, you will know exactly where the segfault occurred.
nR and nC aren't defined in the code you posted, so I don't know how big rankMat and srcMat are, but two thoughts crossed my mind while looking at your code:
You don't check i and j to ensure that they don't exceed nR and nC
If nR and nC are sufficiently large, you may be using a very large amount of memory on the stack (srcMat, rankMat, normMat, and sorts are all huge). I don't know what environment you're running in, but some systems may not be able to handle huge stacks (Linux, Windows, etc. should be fine, but I do a lot of embedded work). I normally allocate very large structures on the heap using malloc, as sketched below.
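For example, a minimal sketch of moving one of those matrices onto the heap (assuming nR and nC are already known at that point, as in the posted code; the other three matrices would be handled the same way):

/* One nR x nC matrix of doubles on the heap instead of the stack. */
double *srcMat = malloc((size_t)nR * nC * sizeof *srcMat);
if (srcMat == NULL) {
    fprintf(stderr, "out of memory\n");
    exit(2);
}
/* Index it as srcMat[j * nC + i] instead of srcMat[j][i]. */
/* ... use the matrix ... */
free(srcMat);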
Generally, files of 2GB (2**31 bytes) or larger are the ones you can expect to get this on. This is because you start to run out of room in a 32-bit integer for things like file offsets, and one bit is typically taken up for the sign in relative offsets.
Supposedly on Linux you can get around this issue by using the following macro definition:
#define _FILE_OFFSET_BITS 64
Some systems also provide a separate API call for large file opens (eg: fopen64() in MKS).
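If you go that route, a minimal usage sketch (the define must appear before the first system #include, or it has no effect):

/* Before any #include, so the stdio declarations see it. */
#define _FILE_OFFSET_BITS 64

#include <stdio.h>

int main(int argc, char *argv[])
{
    /* On 32-bit Linux, fopen/fseeko/ftello now use 64-bit file offsets. */
    FILE *f = fopen(argv[1], "r");
    if (f != NULL)
        fclose(f);
    return 0;
}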
400MB should not be considered a "large file" nowadays; I would reserve that label for files larger than, say, 2GB.
Also, just opening a file is very unlikely to give a segfault. Would you show us the code that accesses the file? I suspect some other factor is at play here.
UPDATE
I still can't tell exactly what's happening here. There are strange things that could be legitimate: you discard the first line and also the first token of each line.
You also assign to each rownames[j] (except the first one) the address of dest, which is a variable with block scope whose associated memory is most likely to be reused outside that block. I hope you don't rely on rownames[j] being meaningful (but then why have them?) and never try to access them.
C99 allows you to mix variable declarations with statements, but I would suggest a little cleanup to make the code clearer (better indentation would help too).
From the symptoms, I would look for memory corruption somewhere. On small files (and hence fewer tokens) it may go unnoticed, but with larger files (and many more tokens) it triggers a segfault.
