SegFault in Cache Sim C program - c

EDIT: so it's easier for you guys to find where in my code I think the segfault is happening, look under the commented section called /* Cache Structure */
EDIT2: made a huge mistake with gdb the correct output when it segfaults from gdb is:
(gdb) run
Starting program: /.autofs/ilab/ilab_users/petejodo/Assignment4/a.out
1
/.autofs/ilab/ilab_users/petejodo/Assignment4/a.out
Program received signal SIGSEGV, Segmentation fault.
0x00449453 in strlen () from /lib/libc.so.6
(gdb)
EDIT3: after resorting to n00b tactics like printing lines, I've found it prints before:
file = fopen(purefile, "r");
if (file == 0){
printf ("Could not find file!\n");
return 0;
}
but doesn't print the "test" line I have underneath it.
ORIGINAL
So after checking all the threads on segfault and using gdb I can't figure out why my program is segfaulting at this line. I'm writing a cache simulator (only have write-through in my code currently) and I didn't include my methods because they are fine, it's in the main method.
int main(int argc, char **argv) {
FILE* file;
/* Counter variables */
int i;
int j;
/* Helper Variables */
int setAdd;
int totalSet;
int trash;
int size;
int extra;
char rw;
/* Necessary Character Arrays */
char hex[100];
char bin[100];
char origTag[100];
char bbits[100];
char sbits[100];
char tbits[100];
/* Cache Info Variables */
int setNumber = 4096; /* cacheSize/blockSize : (16,384/4) */
int setBits = 12; /* log(setNumber)/log(2) : (log(4096)/log(2)) */
int tagSize = 18; /* 32-(blockBits + setBits) **blockBits = log(blockSize)/log(2)** : (32 - (2 + 12) */
/* Results */
int cacheHit = 0;
int cacheMiss = 0;
int write = 0;
int read = 0;
/* Cache Structure */
tempLine cache[4096];
char* style;
char* purefile;
if (strcmp(argv[1], "-h")==0)
{
puts("Usage: sim <write policy> <trace file>");
return 0;
}
style = argv[1];
purefile = argv[2];
file = fopen(purefile, "r");/* HYPOTHESIZED SEGFAULT HERE */
if (file == 0){
printf ("Could not find file!\n");
return 0;
}
printf("test1");
/* Setting Structure Default Values */
for(i = 0; i < setNumber; i++)
{
cache[i].tag = (char *)malloc(sizeof(char)*(tagSize + 1));
for(j = 0; j < tagSize; j++)
{
cache[i].tag[j] = '0';
}
cache[i].valid = 0;
}
/* Main Loop */
while(fgetc(file) != '#')
{
setAdd = 0;
totalSet = 0;
fseek(file, -1, SEEK_CUR);
fscanf(file, "%d: %c %s\n", &trash, &rw, origTag);
/* Cutting off '0x' off from address '0x00000000' and adding 0's if necessary */
size = strlen(origTag);
extra = (10 - size);
for(i = 0; i < extra; i++)
hex[i] = '0';
for(i = extra, j = 0; i < (size-(2-extra)); i++, j++)
hex[i] = origTag[j + 2];
hex[8] = '\0';
hex2bin(hex, bin);
split(bin, bbits, sbits, tbits);
/* Changing cArray into int */
for(i = 0, j = (setBits - 1); i < setBits; i++, j--)
{
if (sbits[i] == '1')
setAdd = 1;
if (sbits[i] == '0')
setAdd = 0;
setAdd = setAdd * pow(2, j);
totalSet += setAdd;
}
/* Calculating Hits and Misses */
if (cache[totalSet].valid == 0)
{
cache[totalSet].valid = 1;
strcpy(cache[totalSet].tag, tbits);
}
if ((cache[totalSet].valid == 1) && (strcmp(cache[totalSet].tag, tbits) == 0))
{
/* HIT */
if (rw == 'W')
{
cacheHit++;
write++;
}
if (rw == 'R')
cacheHit++;
}
else
{
/* MISS */
if (rw == 'R')
{
cacheMiss++;
read++;
}
if (rw == 'W')
{
cacheMiss++;
read++;
write++;
}
cache[totalSet].valid = 1;
strcpy(cache[totalSet].tag, tbits);
}
/* End Calculations */
}
printResult(cacheHit, cacheMiss, read, write);
return 0;
}
And what I got out of gdb was this:
**INCORRECT**
(gdb) run
Starting program: /.autofs/ilab/ilab_users/petejodo/Assignment4/a.out
Program received signal SIGSEGV, Segmentation fault.
0x080489b6 in main (argc=1, argv=0xbfffe954) at sim.c:128
128 if (strcmp(argv[1], "-h")==0)
(gdb) bt
#0 0x080489b6 in main (argc=1, argv=0xbfffe954) at sim.c:128
I'm kind of lost, any help would be great. Thanks!
Oh, and I also checked the value of argv[1]: it is NULL (0x0), which I think is the cause. argv[1] is supposed to contain either the help flag or write through or write back, I just haven't coded in whether to check for write through or write back just yet.

You need to test if argc > 1 if you want to access argv[1]. A program always receives the name it was called by (e.g. ./a.out in your case) as argv[0], so argc >= 1 is always true. To access the first real argument - i.e. the second element of argv - you need to test if there are at least two arguments. However, testing if there are > n elements when accessing argv[n] is more readable IMO since both numbers are the same.
Here's an example on how to change your code:
if (argc < 3 || (argc > 1 && strcmp(argv[1], "-h")==0))
{
puts("Usage: sim <write policy> <trace file>");
return 0;
}
In this case you will show an error if there are not enough arguments provided to have valid argv[1] and argv[2] elements or if the first argument is -h (in case someone puts stuff after -h - otherwise it'd be handled by the too-few-arguments check).
You should have a look at getopt() by the way - it makes argument/switch parsing much easier and cleaner.
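As a rough illustration of the getopt() suggestion, here is a minimal sketch. The helper names usage() and parse_args() and their return convention are assumptions made for this example, not part of the original program, and getopt() itself is POSIX rather than standard C.

```c
#define _POSIX_C_SOURCE 200809L /* assumption: may be needed to expose getopt() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: prints the usage line from the question. */
static void usage(const char *prog)
{
    fprintf(stderr, "Usage: %s <write policy> <trace file>\n", prog);
}

/* Hypothetical helper: returns 0 and fills in policy/trace on success,
 * or returns 1 after printing the usage line. */
static int parse_args(int argc, char **argv,
                      const char **policy, const char **trace)
{
    int opt;

    while ((opt = getopt(argc, argv, "h")) != -1) {
        switch (opt) {
        case 'h':   /* explicit help request */
        default:    /* '?': getopt() saw a switch it does not know */
            usage(argv[0]);
            return 1;
        }
    }

    /* getopt() leaves optind at the first non-option argument,
     * so argv is never indexed without checking argc first. */
    if (argc - optind < 2) {
        usage(argv[0]);
        return 1;
    }

    *policy = argv[optind];
    *trace = argv[optind + 1];
    return 0;
}
```

main() would call parse_args(argc, argv, &style, &purefile) and return early if the result is nonzero.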

Related

(C DSA) Why are char positions written using fputc() are off by one when read by fgetc() and searched?

Both write() and search() use the same O(n) for loop but the search reports match position values one less than the write() reports writing them. Is there something that changes when a byte is written or is there something wrong with the read / search logic concerning the file start? The few related questions I found had to do with copying arrays and messing up indices, here it's a 1:1 basic for loop reporting different results as far as I understand for search and write cases.
void write() {
char ch_a = 'A', ch_b = 'B', ch_null = '\0';
FILE *p_file = fopen(pch_fname, "w");
if (p_file != NULL) {
for (int i = 0; i < n_out_len_bytes; i++) {
if (i % (999 * 777) == 0) {
printf("%d\n", i); //informs B's position
fputc(ch_b, p_file); //writes B instead of A
}else{
fputc(ch_a, p_file); //writes A
}
}
fputc(ch_null, p_file);
}
fclose(p_file);
}
void read() {
FILE *p_file = fopen(pch_input_fname, "r");
int n_size = -1;
int n_malloc = 30 * 1024 * 1024;
int n_char;
pch_input = (char *) malloc((ssize_t) n_malloc);
if (p_file != NULL) {
while (n_char = fgetc(p_file)) {
n_size++;
pch_input[n_size - 1] = n_char;
}
pch_input[n_size] = '\0';
}
fclose(p_file);
n_input_len = n_size;
}
void search() {
char *pch_pattern = "B";
int n_end_i = n_input_len - n_pattern_len;
for (int i = 0; i < n_end_i; i++)
if (pch_input[i] == pch_pattern[0])
printf("%c == %c at i:%d\n", pch_input[i], pch_pattern[0], i); // outputs B
//^^ expected to output same position as write() but outputs 'write().i-1'
if (i % (999 * 777) == 0)
printf("%c == %c at i:%d\n", pch_input[i], pch_pattern[0], i); // outputs A
}
In read(), n_size is initialized to -1 and each character is stored at index n_size - 1, which surprisingly does not crash in this case:
read starts out with int n_size = -1;. The while uses n_size++; and then stores in pch_input[n_size - 1]. The first character is stored in index [-1]. That should cause problems, but the characters are all off by one. – user3121023
Thank you so much, cannot believe I missed it!
Either initializing n_size = 0 or storing at n_size (vs n_size -1) fixes it. Search and write are both fine.
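For comparison, a minimal sketch of a read loop that keeps the index and the count in step. The helper name read_until_nul() and its buffer-size parameter are assumptions for this example; unlike the loop in the question, it also stops at EOF rather than only at a NUL byte.

```c
#include <stdio.h>

/* Hypothetical helper: reads at most cap-1 bytes from fp into buf, stopping
 * at EOF or at the first NUL byte. Returns the number of bytes stored. */
static size_t read_until_nul(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;   /* next free index, which is also the count stored so far */
    int c;          /* int, so that EOF is distinguishable from every byte value */

    if (cap == 0)
        return 0;
    while (n + 1 < cap && (c = fgetc(fp)) != EOF && c != '\0')
        buf[n++] = (char)c;   /* first byte lands at index 0, never at -1 */
    buf[n] = '\0';
    return n;
}
```

Because the first byte is stored at index 0, positions found by a later search line up with the loop index used when writing.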

Can't eliminate one character in my array while parsing it even though I handle that character

So this is my second time adapting my code to fscanf to get what I want. I threw some comments next to the output. The main issue I am having is that the one null character or space is getting added into the array. I have tried to check for the null char and the space in the string variable and it does not catch it. I am a little stuck and would like to know why my code is letting that one null character through?
Part where it is slipping up "Pardon, O King," output:King -- 1; -- 1
so here it parses king a word and then ," goes through the strip function and becomes \0, then my check later down the road allows it through??
Input: a short story containing apostrophes and commas (the lion's rock. First, the lion woke up)
//Output: Every unique word that shows up with how many times it shows up.
//Lion -- 1
//s - 12
//lion -- 8
//tree -- 2
//-- 1 //this is the line that prints a null char?
//cub -- //3 it is not a space! I even check if it is \0 before entering
//it into the array. Any ideas (this is my 2nd time)?
//trying to rewrite my code around a fscanf function.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <ctype.h>
//Remove non-alpha numeric characters
void strip_word(char* string)
{
char* string_two = calloc(80, sizeof(char));
int i;
int c = 0;
for(i = 0; i < strlen(string); i++)
{
if(isalnum(string[i]))
{
string_two[c] = string[i];
++c;
}
}
string_two[i] = '\0';
strcpy(string, string_two);
free(string_two);
}
//Parse through file
void file_parse(FILE* text_file, char*** word_array, int** count_array, int* total_count, int* unique_count)
{
int mem_Size = 8;
int is_unique = 1;
char** words = calloc(mem_Size, sizeof(char *)); //Dynamically allocate array of size 8 of char*
if (words == NULL)
{
fprintf(stderr, "ERROR: calloc() failed!");
}
int* counts = calloc(mem_Size, sizeof(int)); //Dynamically allocate array of size 8 of int
if (counts == NULL)
{
fprintf(stderr, "ERROR: calloc() failed!");
}
printf("Allocated initial parallel arrays of size 8.\n");
fflush(stdout);
char* string;
while('A')
{
is_unique = 1;
fscanf(text_file, " ,");
fscanf(text_file, " '");
while(fscanf(text_file, "%m[^,' \n]", &string) == 1) //%m length modifier
{
is_unique = 1;
strip_word(string);
if(string == '\0') continue; //if the string is empty move to next iteration
else
{
int i = 0;
++(*total_count);
for(i = 0; i < (*unique_count); i++)
{
if(strcmp(string, words[i]) == 0)
{
counts[i]++;
is_unique = 0;
break;
}
}
if(is_unique)
{
++(*unique_count);
if((*unique_count) >= mem_Size)
{
mem_Size = mem_Size*2;
words = realloc(words, mem_Size * sizeof(char *));
counts = realloc(counts, mem_Size * sizeof(int));
if(words == NULL || counts == NULL)
{
fprintf(stderr, "ERROR: realloc() failed!");
}
printf("Re-allocated parallel arrays to be size %d.\n", mem_Size);
fflush(stdout);
}
words[(*unique_count)-1] = calloc(strlen(string) + 1, sizeof(char));
strcpy(words[(*unique_count)-1], string);
counts[(*unique_count) - 1] = 1;
}
}
free(string);
}
if(feof(text_file)) break;
}
printf("All done (successfully read %d words; %d unique words).\n", *total_count, *unique_count);
fflush(stdout);
*word_array = words;
*count_array = counts;
}
int main(int argc, char* argv[])
{
if(argc < 2 || argc > 3) //Checks if too little or too many args
{
fprintf(stderr, "ERROR: Invalid Arguements\n");
return EXIT_FAILURE;
}
FILE * text_file = fopen(argv[1], "r");
if (text_file == NULL)
{
fprintf(stderr, "ERROR: Can't open file");
}
int total_count = 0;
int unique_count = 0;
char** word_array;
int* count_array;
file_parse(text_file, &word_array, &count_array, &total_count, &unique_count);
fclose(text_file);
int i;
if(argv[2] == NULL)
{
printf("All words (and corresponding counts) are:\n");
fflush(stdout);
for(i = 0; i < unique_count; i++)
{
printf("%s -- %d\n", word_array[i], count_array[i]);
fflush(stdout);
}
}
else
{
printf("First %d words (and corresponding counts) are:\n", atoi(argv[2]));
fflush(stdout);
for(i = 0; i < atoi(argv[2]); i++)
{
printf("%s -- %d\n", word_array[i], count_array[i]);
fflush(stdout);
}
}
for(i = 0; i < unique_count; i++)
{
free(word_array[i]);
}
free(word_array);
free(count_array);
return EXIT_SUCCESS;
}
I'm not sure quite what's going wrong with your code. I'm working on macOS Sierra 10.12.3 with GCC 6.3.0, and the local fscanf() does not support the m modifier. Consequently, I modified the code to use a fixed size string of 80 bytes. When I do that (and only that), your program runs without obvious problem (certainly on the input "the lion's rock. First, the lion woke up").
I also think that the while ('A') loop (which should be written conventionally while (1) if it is used at all) is undesirable. I wrote a function read_word() which gets the next 'word', including skipping blanks, commas and quotes, and use that to control the loop. I left your memory allocation in file_parse() unchanged. I did get rid of the memory allocation in strip_word() (eventually — it worked OK as written too).
That left me with:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <ctype.h>
static void strip_word(char *string)
{
char string_two[80];
int i;
int c = 0;
int len = strlen(string);
for (i = 0; i < len; i++)
{
if (isalnum(string[i]))
string_two[c++] = string[i];
}
string_two[c] = '\0';
strcpy(string, string_two);
}
static int read_word(FILE *fp, char *string)
{
if (fscanf(fp, " ,") == EOF ||
fscanf(fp, " '") == EOF ||
fscanf(fp, "%79[^,' \n]", string) != 1)
return EOF;
return 0;
}
static void file_parse(FILE *text_file, char ***word_array, int **count_array, int *total_count, int *unique_count)
{
int mem_Size = 8;
char **words = calloc(mem_Size, sizeof(char *));
if (words == NULL)
{
fprintf(stderr, "ERROR: calloc() failed!");
}
int *counts = calloc(mem_Size, sizeof(int));
if (counts == NULL)
{
fprintf(stderr, "ERROR: calloc() failed!");
}
printf("Allocated initial parallel arrays of size 8.\n");
fflush(stdout);
char string[80];
while (read_word(text_file, string) != EOF)
{
int is_unique = 1;
printf("Got [%s]\n", string);
strip_word(string);
if (string[0] == '\0')
continue;
else
{
int i = 0;
++(*total_count);
for (i = 0; i < (*unique_count); i++)
{
if (strcmp(string, words[i]) == 0)
{
counts[i]++;
is_unique = 0;
break;
}
}
if (is_unique)
{
++(*unique_count);
if ((*unique_count) >= mem_Size)
{
mem_Size = mem_Size * 2;
words = realloc(words, mem_Size * sizeof(char *));
counts = realloc(counts, mem_Size * sizeof(int));
if (words == NULL || counts == NULL)
{
fprintf(stderr, "ERROR: realloc() failed!");
exit(EXIT_FAILURE);
}
printf("Re-allocated parallel arrays to be size %d.\n", mem_Size);
fflush(stdout);
}
words[(*unique_count) - 1] = calloc(strlen(string) + 1, sizeof(char));
strcpy(words[(*unique_count) - 1], string);
counts[(*unique_count) - 1] = 1;
}
}
}
printf("All done (successfully read %d words; %d unique words).\n", *total_count, *unique_count);
fflush(stdout);
*word_array = words;
*count_array = counts;
}
int main(int argc, char *argv[])
{
if (argc < 2 || argc > 3)
{
fprintf(stderr, "ERROR: Invalid Arguements\n");
return EXIT_FAILURE;
}
FILE *text_file = fopen(argv[1], "r");
if (text_file == NULL)
{
fprintf(stderr, "ERROR: Can't open file");
return EXIT_FAILURE;
}
int total_count = 0;
int unique_count = 0;
char **word_array = 0;
int *count_array = 0;
file_parse(text_file, &word_array, &count_array, &total_count, &unique_count);
fclose(text_file);
if (argv[2] == NULL)
{
printf("All words (and corresponding counts) are:\n");
fflush(stdout);
for (int i = 0; i < unique_count; i++)
{
printf("%s -- %d\n", word_array[i], count_array[i]);
fflush(stdout);
}
}
else
{
printf("First %d words (and corresponding counts) are:\n", atoi(argv[2]));
fflush(stdout);
for (int i = 0; i < atoi(argv[2]); i++)
{
printf("%s -- %d\n", word_array[i], count_array[i]);
fflush(stdout);
}
}
for (int i = 0; i < unique_count; i++)
free(word_array[i]);
free(word_array);
free(count_array);
return EXIT_SUCCESS;
}
When run on the data file:
the lion's rock. First, the lion woke up
the output was:
Allocated initial parallel arrays of size 8.
Got [the]
Got [lion]
Got [s]
Got [rock.]
Got [First]
Got [the]
Got [lion]
Got [woke]
Got [up]
All done (successfully read 9 words; 7 unique words).
All words (and corresponding counts) are:
the -- 2
lion -- 2
s -- 1
rock -- 1
First -- 1
woke -- 1
up -- 1
When the code was run on your text, including double quotes, like this:
$ echo '"Pardon, O King,"' | cw37 /dev/stdin
Allocated initial parallel arrays of size 8.
Got ["Pardon]
Got [O]
Got [King]
Got ["]
All done (successfully read 3 words; 3 unique words).
All words (and corresponding counts) are:
Pardon -- 1
O -- 1
King -- 1
$
It took a little finagling of the code. If a token contains no alphanumeric character, your code still counts it (because of subtle problems in how the result of strip_word() is checked). You test if (string == '\0'), which checks (belatedly) whether memory was allocated, where you need if (string[0] == '\0') to test whether the string is empty.
Note that the code in read_word() would be confused into reporting EOF if there were two commas in a row, or an apostrophe followed by a comma (though it handles a comma followed by an apostrophe OK). Fixing that is fiddlier; you'd probably be better off using a loop with getc() to read a string of characters. You could even use that loop to strip non-alphabetic characters without needing a separate strip_word() function.
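A minimal sketch of such a getc() loop follows. The name next_word() and its return convention are assumptions for this example. Note one behavioural difference from strip_word(): here any non-alphanumeric character ends the word, so "a-b" would yield two words rather than "ab".

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical replacement for read_word() plus strip_word(): collects the
 * next run of alphanumeric characters into word (at most cap-1 of them;
 * cap is assumed to be at least 2).
 * Returns 1 if a word was stored, 0 at end of input. */
static int next_word(FILE *fp, char *word, size_t cap)
{
    int c;
    size_t n = 0;

    /* Skip everything that cannot start a word: blanks, commas, quotes... */
    while ((c = getc(fp)) != EOF && !isalnum(c))
        ;
    if (c == EOF)
        return 0;

    /* Store characters until the word ends; overlong words are truncated. */
    do {
        if (n + 1 < cap)
            word[n++] = (char)c;
    } while ((c = getc(fp)) != EOF && isalnum(c));
    word[n] = '\0';
    return 1;
}
```

Since separators are skipped in a loop, two commas in a row or an apostrophe followed by a comma no longer look like end of input.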
I am assuming you've not covered structures yet. If you had covered structures, you'd use an array of a structure such as struct Word { char *word; int count; }; and allocate the memory once, rather than needing two parallel arrays.
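A sketch of how the structure-based version might look. Only struct Word comes from the text above; struct WordList, word_list_add() and the growth policy (start at 8, then double) are assumptions for this example.

```c
#include <stdlib.h>
#include <string.h>

struct Word {
    char *word;   /* heap-allocated copy of the text */
    int count;    /* number of occurrences seen so far */
};

/* Hypothetical container: one growable array instead of two parallel ones. */
struct WordList {
    struct Word *items;
    size_t used;
    size_t cap;
};

/* Counts one occurrence of text.
 * Returns 0 on success, -1 if an allocation fails. */
static int word_list_add(struct WordList *list, const char *text)
{
    for (size_t i = 0; i < list->used; i++) {
        if (strcmp(list->items[i].word, text) == 0) {
            list->items[i].count++;
            return 0;
        }
    }
    if (list->used == list->cap) {
        size_t new_cap = list->cap ? list->cap * 2 : 8;
        /* Assign to a temporary so the old array is not lost on failure. */
        struct Word *tmp = realloc(list->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        list->items = tmp;
        list->cap = new_cap;
    }
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, len);
    list->items[list->used].word = copy;
    list->items[list->used].count = 1;
    list->used++;
    return 0;
}
```

A list starts out as struct WordList list = {0}; and each word and its count then travel together, so a single realloc() grows both.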

C: never getting to return statement

I am implementing a program which hides messages in ppm files and then retrieves them. One of the last things I need to do is to "return 2" if there is a problem opening the file. If I enter a file which can't be opened (file doesn't exist or wrong extension, etc) then it will display the error message on stderr, but for some reason it won't execute the very next line, "return 2"!
It simply then returns 0 instead of returning 2.
int load_ppm_image(const char *input_file_name, unsigned char **data, int *n,
int *width, int *height, int *max_color)
{
char line[80];
char c;
FILE * image_file;
//image_file = fopen(input_file_name, "rb");
if (fopen(input_file_name, "rb") == NULL) // checks to see if file cant be opened
{
fprintf(stderr, "The input image file could not be opened\n");
return 2; // why isn't this being returned???
}
else
{
image_file = fopen(input_file_name, "rb");
}
fscanf(image_file, "%s", line);
fscanf(image_file, "%d %d", width, height);
*n = (3 * (*width) * (*height));
fscanf(image_file, "%d%c", max_color, &c);
*data = (unsigned char *)malloc((*n)*sizeof(unsigned char));
size_t m = fread(*data, sizeof(unsigned char), *n, image_file);
assert(m == *n);
fclose(image_file);
return 0;
}
int hide_message(const char *input_file_name, const char *message, const char *output_file_name)
{
unsigned char * data;
int n;
int width;
int height;
int max_color;
n = 3 * width * height;
int code = load_ppm_image(input_file_name, &data, &n, &width, &height, &max_color);
if (code)
{
// return the appropriate error message if the image doesn't load correctly
return code;
}
int len_message;
int count = 0;
unsigned char letter;
// get the length of the message to be hidden
len_message = (int)strlen(message);
for(int j = 0; j < len_message; j++)
{
letter = message[j];
int mask = 0x80;
// loop through each byte
for(int k = 0; k < 8; k++)
{
if((letter & mask) == 0)
{
//set right most bit to 0
data[count] = 0xfe & data[count];
}
else
{
//set right most bit to 1
data[count] = 0x01 | data[count];
}
// shift the mask
mask = mask>>1 ;
count++;
}
}
// create the null character at the end of the message (00000000)
for(int b = 0; b < 8; b++){
data[count] = 0xfe & data[count];
count++;
}
// write a new image file with the message hidden in it
int code2 = write_ppm_image(output_file_name, data, n, width, height, max_color);
if (code2)
{
// return the appropriate error message if the image doesn't load correctly
return code2;
}
return 0;
}
Edit:
int main(int argc, const char * argv[])
{
if (argc < 2 || argv[1][0] != '-')
{
// the user didn't enter a switch
fprintf(stderr, "Usage: no switch was entered.\n");
return 1;
}
else if ((strcmp(argv[1], "-e") == 0) && (argc == 5))
{
// hides a message in an output file
hide_message(argv[2], argv[3], argv[4]);
}
else if ((strcmp(argv[1], "-d") == 0) && (argc == 3))
{
// retrieves a hidden message in a given file
retrieve_message(argv[2]);
}
else
{
// all other errors prompt a general usage error
fprintf(stderr, "Usage: error");
return 1;
}
}
Program ended with exit code: 0
The exit code of a C program is the return value of main or the status value in an explicit exit function call. In your case, the load_ppm_image returns a value to its caller function hide_message which in turn returns the same value to its caller function main. However, main does not explicitly return anything for the case that calls hide_message. The C standard specifies that main implicitly returns 0 if it reaches the end of the function without an explicit return. Hence the exit code of 0 in your case.
To get the behaviour you desire, modify main so that it returns the value hide_message returns:
else if ((strcmp(argv[1], "-e") == 0) && (argc == 5))
{
// hides a message in an output file
return (hide_message(argv[2], argv[3], argv[4]));
}
From the code:
In main you call hide_message() but discard its return value:
hide_message(argv[2], argv[3], argv[4]);
So main reaches its end without an explicit return and the program exits with 0.
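To make the chain of return values concrete, here is a small sketch in which each level hands the status to its caller. The names open_input(), process() and the STATUS_ constants are assumptions for this example. It also opens the file only once, whereas load_ppm_image() in the question calls fopen() twice and never closes the first handle.

```c
#include <stdio.h>

enum { STATUS_OK = 0, STATUS_OPEN_FAILED = 2 };   /* hypothetical names */

/* Opens path once and reports failure through the return value. */
static int open_input(const char *path, FILE **out)
{
    *out = fopen(path, "rb");
    if (*out == NULL) {
        fprintf(stderr, "The input image file could not be opened\n");
        return STATUS_OPEN_FAILED;
    }
    return STATUS_OK;
}

/* Stand-in for hide_message(): passes the callee's status upward unchanged. */
static int process(const char *path)
{
    FILE *fp;
    int code = open_input(path, &fp);

    if (code != STATUS_OK)
        return code;   /* the caller must return this too */
    /* ... real work on fp would go here ... */
    fclose(fp);
    return STATUS_OK;
}
```

The status only becomes the exit code if main ends the chain with something like return process(argv[2]);.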

"test.exe encountered a breakpoint"

I am writing a UNIX paste clone. However I keep getting "encountered a breakpoint" messages, but VS won't tell me on what line it happened.
#include <stdio.h>
#include <stdlib.h>
#define INITALLOC 16
#define STEP 8
int main(int argc, char *argv[])
{
if (horzmerge(argc - 1, argv + 1) == 0) {
perror("horzmerge");
return EXIT_FAILURE;
}
getchar();
return EXIT_SUCCESS;
}
int horzmerge(int nfiles, const char **filenames)
{
FILE **files;
char *line;
int i;
if ((files = malloc(nfiles * sizeof (FILE *))) == NULL)
return 0;
for (i = 0; i < nfiles; ++i)
if ((files[i] = fopen(filenames[i], "r")) == NULL)
return 0;
do {
for (i = 0; i < nfiles; ++i) {
if (getline(files[i], &line) == 0)
return 0;
fprintf(stdout, "%s", line);
free(line);
}
putchar('\n');
} while (!feof(files[0])); /* we can still get another line */
for (i = 0; i < nfiles; ++i)
fclose(files[i]);
free(files);
return 1;
}
int getline(FILE *fp, char **dynline)
{
size_t nalloced = INITALLOC;
int c, i;
if ((*dynline = calloc(INITALLOC, sizeof(char))) == NULL)
return 0;
for (i = 0; (c = getc(fp)) != EOF && c != '\n'; ++i) {
if (i == nalloced)
if ((*dynline = realloc(*dynline, nalloced += STEP)) == NULL)
return 0;
(*dynline)[i] = c;
}
(*dynline)[i] = '\0';
if (c == EOF)
return EOF;
return i;
}
I placed breakpoints, and saw that it was the free(line) statement in horzmerge. But sometimes the program runs fine. Sometimes it doesn't. Sometimes I get a "Heap corrupted" in getline. I've been working on this code for a week, still can't find the bug(s).
It looks to me like the line where you null-terminate the input string is capable of overrunning the buffer you calloced or realloced. That has the potential of corrupting your heap when you free that buffer.
Don't forget to leave room for the null character at the end of the string when you allocate memory.
Null-terminated strings are like disco. They still suck forty years later.
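A minimal sketch of a line reader that always keeps one byte in reserve for the terminator. The name read_line(), the doubling growth policy and the NULL-at-end-of-input convention are assumptions for this example, not the question's interface.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of the question's getline(): returns a heap-allocated
 * line without the trailing '\n', or NULL at end of input or on allocation
 * failure. The caller frees the result. */
static char *read_line(FILE *fp)
{
    size_t cap = 16;
    size_t len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != '\n') {
        /* Grow when storing this character would leave no room for '\0'. */
        if (len + 1 == cap) {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {   /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';   /* len < cap always holds here */
    return buf;
}
```

The check is len + 1 == cap rather than len == cap, so a line whose length equals the current capacity triggers a grow instead of writing the terminator one byte past the buffer.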

What is the cause of my segment fault in C?

When I compile my code, I get no errors. However, when I attempt to run it, I get a segmentation fault (core dumped). Here is my main:
Original code
void main(int argc, char *argv[]){
if(argc < 3){
return;
}
char *stop_list_name = argv[1];
char *doc_names[argc - 2];
int i;
for(i = 0; i < argc; i++){
doc_names[i] = argv[i];
}
//create the array of stop words
char *stopWords[50];
char *word;
int word_counter = 0;
FILE *fp;
fp = fopen(stop_list_name, "r");
if(fp != NULL){
while(!feof(fp)){
fscanf(fp, "%s", word);
stopWords[word_counter] = word;
word_counter++;
}
}
fclose(fp);
for(i = 0; stopWords[i] != '\0'; i++){
printf("%s", stopWords[i]);
}
}
I'm pretty sure something is wrong in my while loop, but I don't exactly know what, or how to fix it.
Amended code
After seeing the answers, I modified my code so it looks like this, but it still crashes. What's wrong now?
int main(int argc, char *argv[]){
if(argc < 3){
return;
}
char *stop_list_name = argv[1];
char *doc_names[argc - 2];
int i;
for(i = 2; i < argc; i++){
doc_names[i-2] = argv[i];
}
//create the array of stop words
enum {MAX_STOP_WORDS = 50};
char *stopWords[MAX_STOP_WORDS];
int word_counter = 0;
FILE *fp = fopen(stop_list_name, "r");
if(fp != NULL){
char word[64];
int i;
for(i = 0; i < MAX_STOP_WORDS && fscanf(fp, "%63s", word) == 1; i++){
stopWords[i] = strdup(word);
}
word_counter = i;
fclose(fp);
}
for(i = 0; stopWords[i] != '\0'; i++){
printf("%s", stopWords[i]);
}
}
Problems in the original code
One possible source of problems is:
char *doc_names[argc - 2];
int i;
for(i = 0; i < argc; i++){
doc_names[i] = argv[i];
}
You allocate space for argc-2 pointers and proceed to copy argc pointers into that space. That's a buffer overflow (here, of an array on the stack), and it can easily cause the trouble. A plausible fix is:
for (i = 2; i < argc; i++)
doc_names[i-2] = argv[i];
However, you really don't need to copy the argument list; you can just process the arguments from index 2 to the end. I note that the code shown doesn't actually use doc_names, but the out-of-bounds assignment can still cause trouble.
You are not allocating space to read a word into, nor allocating new space for each stop word, nor do you ensure that you do not overflow the bounds of the array in which you're storing the words.
Consider using:
enum { MAX_STOP_WORDS = 50 };
char *stopWords[MAX_STOP_WORDS];
int word_counter = 0;
FILE *fp = fopen(stop_list_name, "r");
if (fp != NULL)
{
char word[64];
for (i = 0; i < MAX_STOP_WORDS && fscanf(fp, "%63s", word) == 1; i++)
stopWords[i] = strdup(word);
word_counter = i;
fclose(fp);
}
This diagnosed problem is definitely a plausible cause of your crash. I used i (declared earlier in the code) in the loop because word_counter makes the loop control line too long for SO.
Strictly, strdup() is not a part of standard C, but it is a part of POSIX. If you don't have POSIX, you can write your own:
#include <stdlib.h>
#include <string.h>
char *strdup(const char *str)
{
size_t len = strlen(str) + 1;
char *result = malloc(len);
if (result != 0)
memmove(result, str, len);
return result;
}
You also have some other bad practices on display:
Using while (!feof(file)) to control a read loop is always wrong.
main() should return int, not void; see "What should main() return in C and C++?".
You should only call fclose(fp) if the fopen() worked, so you need to move the fclose() inside the if statement body.
Problems in the amended code
There's one important and a couple of very minor problems in the amended code:
Your loop that prints the stop words depends on a null pointer (curiously spelled as '\0' — it is a valid but unconventional spelling for a null pointer), but the initialization code doesn't set a null pointer.
There are (at least) two options for fixing that:
Add a null pointer:
for (i = 0; i < MAX_STOP_WORDS-1 && fscanf(fp, "%63s", word) == 1; i++)
stopWords[i] = strdup(word);
stopWords[i] = 0;
fclose(fp);
}
for (i = 0; stopWords[i] != '\0'; i++)
printf("%s\n", stopWords[i]);
Note that the upper bound is now MAX_STOP_WORDS - 1.
Or you can use word_counter instead of a condition:
for (i = 0; i < word_counter; i++)
printf("%s\n", stopWords[i]);
I'd choose the second option.
One reason for doing that is it avoids warnings about word_counter being set and not used — a minor problem.
And doc_names is also set but not used.
I worry about those because my default compiler options generate errors for unused variables — so the code doesn't compile until I fix it. That leads to:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
if (argc < 3)
{
fprintf(stderr, "Usage: %s stop-words docfile ...\n", argv[0]);
return 1;
}
char *stop_list_name = argv[1];
char *doc_names[argc - 2];
int i;
for (i = 2; i < argc; i++)
{
doc_names[i - 2] = argv[i];
}
int doc_count = argc - 2;
// create the array of stop words
enum { MAX_STOP_WORDS = 50 };
char *stopWords[MAX_STOP_WORDS];
int word_counter = 0;
FILE *fp = fopen(stop_list_name, "r");
if (fp != NULL)
{
char word[64];
int i;
for (i = 0; i < MAX_STOP_WORDS && fscanf(fp, "%63s", word) == 1; i++)
stopWords[i] = strdup(word);
word_counter = i;
fclose(fp);
}
for (i = 0; i < word_counter; i++)
printf("stop word %d: %s\n", i, stopWords[i]);
for (i = 0; i < doc_count; i++)
printf("document %d: %s\n", i, doc_names[i]);
return 0;
}
And, given a stop words file containing:
help
able
may
can
it
should
do
antonym
prozac
and compiling it (source file sw19.c, program sw19) with:
$ gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
> -Wold-style-definition -Werror sw19.c -o sw19
and running it as:
$ ./sw19 stopwords /dev/null
stop word 0: help
stop word 1: able
stop word 2: may
stop word 3: can
stop word 4: it
stop word 5: should
stop word 6: do
stop word 7: antonym
stop word 8: prozac
document 0: /dev/null
$
You are trying to store the scanned string through an uninitialized pointer:
fscanf(fp, "%s", word);
word is declared as a char * but is never made to point at any storage.
You could use a static buffer for that, just like this
char word[100];
if (fscanf(fp, "%99s", word) != 1)
word[0] = '\0'; /* ensure that `word' is nul terminated on input error */
Also, while (!feof(fp)) is wrong, because the end-of-file indicator won't be set until fscanf() attempts to read past the end of the file, so the loop typically iterates one extra time. On that extra iteration fscanf() fails and leaves word unchanged, so you would store the same word twice.
Note that you will also need to allocate space for the array of pointers, maybe there you could use malloc().
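Putting those points together, a minimal sketch of a loop driven by fscanf()'s return value instead of feof(). The helper name load_words() and the 63-character word limit are assumptions for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_WORD_LEN = 63 };   /* assumed limit; must match the %63s below */

/* Hypothetical helper: reads whitespace-separated words from fp into words[],
 * giving each one its own heap copy. Returns the number of words stored. */
static int load_words(FILE *fp, char *words[], int max_words)
{
    char buf[MAX_WORD_LEN + 1];   /* real storage for fscanf() to write into */
    int n = 0;

    /* The loop ends as soon as fscanf() fails to convert a word,
     * so no word is ever stored twice. */
    while (n < max_words && fscanf(fp, "%63s", buf) == 1) {
        size_t len = strlen(buf) + 1;
        char *copy = malloc(len);
        if (copy == NULL)
            break;
        memcpy(copy, buf, len);
        words[n++] = copy;
    }
    return n;
}
```

The caller supplies the pointer array, for example char *stopWords[50];, and frees each stored word when done.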
