Parsing and data overwriting issues in C using custom strtok - c

I'm reading in a .csv file, which I then need to parse into tokens. I tried using strtok(), but it skips empty fields instead of returning them (and my data is full of them). So I went with a home-made version of strtok that I found, strtok_single, which returns the values I need.
The data is stored in my array correctly at first, but before the initialization loops finish, it gets overwritten. I've tried print statements and analyzing the problem, but I just can't figure out what's wrong. Any insight at all would be helpful.
Here is the homemade strtok function I'm using:
char* strtok_single(char* str, char const* delims) {
static char* src = NULL;
char* p, *ret = 0;
if (str != NULL)
src = str;
if (src == NULL)
return NULL;
if ((p = strpbrk(src, delims)) != NULL) {
*p = 0;
ret = src;
src = ++p;
}
return ret;
}
Here is my code:
int main() {
int numLines = 0;
int ch, i, j;
char tmp[1024];
char* field;
char line[1024];
FILE* fp = fopen("filename.csv", "r");
// count number of lines in file
while ((ch = fgetc(fp)) != EOF) {
if (ch == '\n')
numLines++;
}
fclose(fp);
// Allocate memory for each line in file
char*** activity = malloc(numLines * sizeof(char**));
for (i = 0; i < numLines; i++) {
activity[i] = malloc(42 * sizeof(char*));
for (j = 0; j < 42; j++) {
activity[i][j] = malloc(100 * sizeof(char));
}
}
// read activity file and initialize activity matrix
FILE* stream = fopen("filename.csv", "r");
i = 0;
while (fgets(line, 1024, stream)) {
j = 0;
int newlineLoc = strcspn(line, "\n");
line[newlineLoc] = ',';
strcpy(tmp, line);
field = strtok_single(tmp, ",");
while (field != NULL) {
for (j = 0; j < 42; j++) {
activity[i][j] = field;
field = strtok_single(NULL, ",");
// when I print activity[i][j] here, the values are correct
}
// when I print activity[i][j] here, the values are correct for the
// first iteration
// and then get overwritten by partial data from the next line
}
i++;
} // close while
fclose(stream);
// by the time I get to here my matrix is full of garbage
// some more code that prints the array and frees memory
} // close main

activity[i][j] = field;
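When the loops finish, each activity[i][j] points somewhere inside tmp, which is overwritten on every iteration of the outer loop. The assignment also discards the pointer to the memory you allocated, so that memory leaks. Since you pre-allocate space for each activity[i][j], copy the contents of the string into it instead:
strcpy(activity[i][j], field);
Be careful about buffer overflow: a field longer than 99 characters will not fit in the 100-byte buffer. Also note that field can become NULL inside the for loop when a line has fewer than 42 fields, so check it before copying.
Also, the sizeof(char) is superfluous since it's always 1 by definition.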
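A minimal sketch of the bounded copy suggested above. The helper name store_field and its return convention are made up for illustration; they are not part of the question's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy field into dst (dst_size bytes) only if it
 * fits, terminator included. Returns 0 on success, -1 otherwise. */
int store_field(char *dst, size_t dst_size, const char *field)
{
    if (dst == NULL || field == NULL || dst_size == 0)
        return -1;

    size_t len = strlen(field);
    if (len >= dst_size)          /* no room for the '\0' */
        return -1;

    memcpy(dst, field, len + 1);  /* copies the terminator too */
    return 0;
}
```

In the question's loop this would stand in for the assignment, e.g. store_field(activity[i][j], 100, field), leaving it to the caller to decide what to do when it returns -1.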

Your line "activity[i][j] = field;" is backwards: it replaces the pointer to the malloc'd memory with field. You want to copy the string into the malloc'd memory instead.

Related

C, values in array of pointers disappear (pointers)

I seem to be losing the reference to my pointers here. I don't know why, but I suspect it's the pointer returned by fgets that messes this up.
I was told a good way to read words from a file is to get the line and then separate the words with strtok, but how can I do this if the pointers inside words[i] keep disappearing?
Text file:
Natural Reader is
john make tame
Result I'm getting:
array[0] = john
array[1] = e
array[2] =
array[3] = john
array[4] = make
array[5] = tame
int main(int argc, char *argv[]) {
FILE *file = fopen(argv[1], "r");
int ch;
int count = 0;
while ((ch = fgetc(file)) != EOF){
if (ch == '\n' || ch == ' ')
count++;
}
fseek(file, 0, SEEK_END);
size_t size = ftell(file);
fseek(file, 0, SEEK_SET);
char** words = calloc(count, size * sizeof(char*) +1 );
int i = 0;
int x = 0;
char ligne [250];
while (fgets(ligne, 80, file)) {
char* word;
word = strtok(ligne, " ,.-\n");
while (word != NULL) {
for (i = 0; i < 3; i++) {
words[x] = word;
word = strtok(NULL, " ,.-\n");
x++;
}
}
}
for (i = 0; i < count; ++i)
if (words[i] != 0){
printf("array[%d] = %s\n", i, words[i]);
}
free(words);
fclose(file);
return 0;
}
strtok does not allocate any memory; it returns a pointer to a delimited string inside the buffer you pass it.
Therefore you need to allocate memory for each result if you want to keep the word between loop iterations,
e.g. (after checking that strtok did not return NULL, since passing NULL to strdup is an error):
word = strdup(strtok(ligne, " ,.-\n"));
You could also handle this by using a separate ligne for each line read, so make it an array of strings like so:
char ligne[20][80]; // no need to make the string 250 since fgets limits it to 80
Then your while loop changes to:
int lno = 0;
while (fgets(ligne[lno], 80, file)) {
char *word;
word = strtok(ligne[lno], " ,.-\n");
while (word != NULL) {
words[x++] = word;
word = strtok(NULL, " ,.-\n");
}
lno++;
}
Adjust the first subscript as needed for the maximum size of the file, or dynamically allocate the line buffer during each iteration if you don't want such a low limit. You could also use getline instead of fgets, if your implementation supports it; it can handle the allocation for you, though you then need to free the blocks when you are done.
If you are processing real-world prose, you might want to include other delimiters in your list, like colon, semicolon, exclamation point, and question mark.
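As a rough illustration of the "allocate memory for each result" approach, the sketch below copies every token to the heap before the line buffer is reused. The names dup_string and collect_words are hypothetical, and dup_string is a hand-written stand-in for strdup so that the example only relies on standard C.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap copy of s, or NULL on allocation failure. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Split line in place and store an independent copy of each word in
 * words[]. Returns the number of words stored (at most max). */
size_t collect_words(char *line, char **words, size_t max)
{
    static const char delims[] = " ,.-\n";
    size_t n = 0;

    for (char *tok = strtok(line, delims);
         tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        char *copy = dup_string(tok);
        if (copy == NULL)
            break;                /* out of memory: stop early */
        words[n++] = copy;
    }
    return n;
}
```

Because each stored pointer owns its own memory, the same line buffer can be passed to fgets again without disturbing earlier words; the caller is responsible for calling free on every entry afterwards.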

How to avoid buffer overflow with C struct array of strings

I'm running into buffer overflows when reading a file in C and copying character arrays. There are three potentially offending pieces of code and I can't figure out where I'm going wrong.
The first reads a file and populates it into a hashmap:
bool load_file(const char* in_file, hmap hashtable[]) {
for(int x = 0; x < HASH_SIZE; x++) {
hashtable[x] = NULL;
}
FILE *fptr = fopen(in_file, "r");
char c[LENGTH] = "";
c[0] = '\0';
while (fgets(c, sizeof(c)-1, fptr) != NULL) {
node *n = malloc(sizeof(node));
hmap new_node = n;
new_node->next = NULL;
strncpy(new_node->content, c, LENGTH-1);
// do stuff to put it into the hashtable
}
fclose(fptr);
return true;
}
The second checks whether given content is in the hashmap:
bool check_content(const char* content, hmap hashtable[]) {
char c_content[LENGTH] = "";
strncpy(c_content, content, LENGTH-1);
// do stuff to check if it's in the hashmap
return false;
}
and the third parses a given file and checks whether its content is in the hashmap:
int check_file(FILE* fp, hmap hashtable[], char * not_found[]) {
int num_not_found = 0;
char c[1000] = "";
while (fgets(c, sizeof(c)-1, fp) != NULL) {
char * pch;
char curToken[LENGTH] = "";
pch = strtok (c," ");
strncpy(curToken, pch, LENGTH-1);
curToken[LENGTH]=0;
if(!check_content(curToken, hashtable)) {
not_found[num_not_found] = malloc(LENGTH*sizeof(not_found[num_not_found]));
strncpy(not_found[num_not_found], curToken, LENGTH-1);
num_not_found++;
}
}
fclose(fp);
return num_not_found;
}
Finally, main calls these and frees mallocs:
int main (int argc, char *argv[])
{
hmap hashtable[HASH_SIZE];
load_file(argv[2], hashtable);
FILE *fptr = fopen(argv[1], "r");
char * not_found[MAX_ENTRIES];
int num_not_found = check_file(fptr, hashtable, not_found);
for(int x=0; x<num_not_found; x++) {
free(not_found[x]);
}
for(int y=0; hashtable[y] != NULL; y++) {
free(hashtable[y]);
}
return 0;
}
My question is this: for each of the three code snippets, what have I done that causes buffer overflows? Many thanks in advance!
I finally got rid of the buffer overflow problems, mostly by following David's advice in the comments, plus figuring out that I had one more malloc than I needed. The fixes were:
new_node->next needed a malloc.
The malloc for new_node->next should happen only if it's actually going to be used.
not_found[num_not_found] = malloc(LENGTH*sizeof(not_found[num_not_found])); was wrong and should have been not_found[num_not_found] = malloc(sizeof(char) * (strlen(pch)+1)), which leaves room for the null terminator. (Side note: on my computer, malloc(sizeof(char) * strlen(pch)+1) seemed to behave differently from malloc(strlen(pch)+1), although the two expressions request the same number of bytes because sizeof(char) is always 1.)
The return of every malloc really does have to be validated.
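Two details of the question's code are worth a sketch: strncpy does not write a terminator when the source is at least as long as the count, and curToken[LENGTH]=0 writes one element past the end of a LENGTH-sized array. The helper below, whose name copy_truncated is an assumption made for this example, shows one way to copy with truncation and always terminate inside the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size - 1 characters of src into
 * dst and always terminate. Returns 1 if src was truncated, else 0. */
int copy_truncated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 1;

    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';     /* last valid index, not dst_size */

    return strlen(src) >= dst_size;
}
```

With a buffer declared as char curToken[LENGTH], the call would be copy_truncated(curToken, sizeof curToken, pch), after checking that pch is not NULL.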

C - Why does char array only return last value that's put into it?

I'm writing in C. I'm trying to read lines from a text file, parse the line, and put some info into an array of strings. When I test my code, every value in the array seems to be the last value inserted. What causes this?
int r;
char *users[51]; //given no more than 50 users
for (r = 0; r < 51; r++) {
int n = 15; //arbitrary guess at length of unknown usernames
users[r] = malloc((n + 1) * sizeof(char));
}
FILE *fp;
fp = fopen(argv[1], "r");
char *username;
int counter = 0;
char line[100];
while (fgets(line, 100, fp) != NULL) {
username = strtok(line, ":");
users[counter] = username;
printf("%s\n", username);
printf("%s\n", users[counter]);
//counter increase for later
counter += 1;
strtok is a very confusing function:
it modifies the array it receives a pointer to
it returns a pointer to an element of this array.
it keeps an internal state which makes it non-reentrant and non thread-safe.
Hence username points inside line. You store this pointer into users[counter]. At the end of the loop, all entries in users point to the same array that has been overwritten by every call to fgets().
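The aliasing can be reproduced without a file. In this small sketch, strcpy stands in for successive fgets calls into the same buffer; the function name and sample strings are invented for the demonstration.

```c
#include <string.h>

/* Simulate two fgets() calls into the same buffer and report whether
 * the two tokens returned by strtok() are the same pointer. */
int tokens_share_storage(void)
{
    char line[32];
    char *saved[2];

    strcpy(line, "alice:x:1000\n");
    saved[0] = strtok(line, ":");

    strcpy(line, "bob:x:1001\n");   /* stands in for the next fgets() */
    saved[1] = strtok(line, ":");

    /* Both pointers refer to line[0], so saved[0] now reads "bob". */
    return saved[0] == saved[1] && strcmp(saved[0], "bob") == 0;
}
```

The function returns 1: the first saved pointer never held its own copy of "alice", only the address of the shared buffer.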
You should duplicate the contents of the array with strdup():
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
char *users[51]; //given no more than 50 users
int r;
FILE *fp;
if (argc < 2) {
fprintf(stderr, "missing filename argument\n");
return 1;
}
fp = fopen(argv[1], "r");
if (fp == NULL) {
fprintf(stderr, "cannot open file %s\n", argv[1]);
return 1;
}
char line[100];
int counter = 0;
while (counter < 50 && fgets(line, 100, fp) != NULL) {
char *username = strtok(line, ":");
if (username != NULL) {
users[counter] = strdup(username);
//counter increase for later
counter += 1;
}
}
users[counter] = NULL;
...
}
You put a sensible value into each entry in the array:
users[r] = malloc((n+1) * sizeof(char));
But then you overwrite it with a nonsensical value (a pointer into line):
users[counter] = username;
What you presumably wanted to do was copy the string pointed to by username into the space allocated at users[counter]. The strcpy function can do this.
I would just like to add to David's answer that you should also check that the username fits in the destination before using strcpy(). strtok() does null-terminate each token it returns, but nothing guarantees that a token is at most 15 characters long.

read int** from file C

I have to read a file in C and create an int**.
This is the file:
2
-1,1,1,0,0,1
1,-1,0,1,0
I'm doing this:
FILE *fp = fopen("grafo.txt", "r");
char line[100];
int numLinea = 0;
char** tokens;
while (1) {
if (fgets(line,150, fp) == NULL) break;
if(numLinea == 0){
NUMERO_NODOS = atoi( line );
nodos = (int **)malloc (NUMERO_NODOS*sizeof(int *));
}else{
tokens = str_split(line, ',');
if (tokens) {
for (int i = 0; *(tokens + i); i++) {
char* contactoNodo;
strcpy(contactoNodo, *(tokens + i));
int numNodo = numLinea-1;
nodos[numNodo] = (int *) malloc (NUMERO_NODOS*sizeof(int));
nodos[numNodo][i] = atoi(contactoNodo);
printf("nodos[%i][%i] = %i\n",numNodo,i,nodos[numNodo][i]);
printf("nodos[0][0] = %i\n",nodos[0][0]);
//free(contactoNodo);
}
printf("nodos[0][0] = %i\n",nodos[0][0]);
//free(tokens);
}
}
numLinea++;
//printf("%3d: %s", i, line);
}
And this is the output:
nodos[0][0] = -1
nodos[0][0] = -1
nodos[0][1] = 1
nodos[0][0] = -1163005939
(...)
Why is nodos[0][0] = -1163005939 in the second iteration of the for loop?
SOLUTION
LOL, it was that:
if(i==0){
nodos[numNodo] = (int *) malloc (NUMERO_NODOS*sizeof(int));
}
I can't believe I didn't see it. Thanks MikeCAT!!!
Fatal errors:
You invoked undefined behavior by using the value of the uninitialized variable contactoNodo, which has automatic storage duration and is therefore indeterminate.
You threw away what was read in the first iteration by allocating a new buffer on every iteration and overwriting the pointer to the old one, and then invoked undefined behavior again by reading the contents of a malloc'd buffer that was never initialized.
Warnings:
You should pass the correct buffer size (equal to or less than the actual size) to fgets() to avoid a buffer overrun.
It is commonly advised not to cast the result of malloc() in C.
Try this:
FILE *fp = fopen("grafo.txt", "r");
char line[100];
int numLinea = 0;
char** tokens;
while (1) {
/* use correct buffer size to avoid buffer overrun */
if (fgets(line,sizeof(line), fp) == NULL) break;
if(numLinea == 0){
NUMERO_NODOS = atoi( line );
/* remove cast of what is returned from malloc() */
nodos = malloc (NUMERO_NODOS*sizeof(int *));
}else{
tokens = str_split(line, ',');
if (tokens) {
for (int i = 0; *(tokens + i); i++) {
char contactoNodo[100]; /* use a real array instead of an uninitialized pointer */
strcpy(contactoNodo, *(tokens + i));
int numNodo = numLinea-1;
if (i == 0) { /* allocate buffer in only the first iteration */
/* remove cast of what is returned from malloc() */
nodos[numNodo] = malloc (NUMERO_NODOS*sizeof(int));
}
nodos[numNodo][i] = atoi(contactoNodo);
printf("nodos[%i][%i] = %i\n",numNodo,i,nodos[numNodo][i]);
printf("nodos[0][0] = %i\n",nodos[0][0]);
/* do not free() what is not allocated via memory management functions such as malloc() */
}
printf("nodos[0][0] = %i\n",nodos[0][0]);
//free(tokens);
}
}
numLinea++;
//printf("%3d: %s", i, line);
}
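As an alternative sketch that avoids the intermediate token copies altogether, a row of comma-separated integers can be parsed with strtol. The function parse_int_row and its max parameter are assumptions for illustration and are not part of the original code or of str_split; values outside the range of int are not handled here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse up to max comma-separated integers from
 * line into out[]. Returns how many values were stored. */
size_t parse_int_row(const char *line, int *out, size_t max)
{
    size_t n = 0;
    const char *p = line;

    while (n < max) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p)             /* no digits found: stop */
            break;
        out[n++] = (int)v;
        if (*end != ',')          /* end of row (newline or '\0') */
            break;
        p = end + 1;              /* skip the comma */
    }
    return n;
}
```

Passing the number of ints actually allocated for the row as max keeps the function from writing past the buffer when a line holds more values than expected, as the sample file's rows do relative to its first line.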

Why is my array of pointers getting overwritten after dynamic allocation?

I'm working on a little C program for a class that reads the lines in from a file and then sorts them using qsort. Long story short, I am dynamically allocating memory for every line of a file, stored as a char*, in an array of char*. The reading in and storing ostensibly works fine based upon the output (see below), but when I print out the lines, they are all duplicates of the last line in the file. Can anyone point out my (most likely painfully obvious) error?
Here is the relevant code to the problem I'm currently running into:
char* trim_white_space(char* str);
char* get_line(FILE* infile, char temp[]);
int main(int argc, char* argv[]) {
FILE* infile;
char* input_file = argv[1];
int cnt = 0;
char temp[MAX_LINE_LENGTH]; //to hold each line as it gets read in
char* tempPointer = temp;
if (argc < 2) {
printf("No input file provided");
return EXIT_FAILURE;
}
//determine the number of lines in the file
infile = fopen(input_file, "r");
int num_lines_in_file = num_lines(infile);
fclose(infile);
//allocate pointers for each line
char** lines = (char**) malloc(num_lines_in_file * sizeof(char*));
//temporarily store each line, and then dynamically allocate exact memory for them
infile = fopen(input_file, "r");
for (cnt = 0; cnt != num_lines_in_file; cnt++) {
tempPointer = get_line(infile, temp);
lines[cnt] = (char*) malloc(strlen(tempPointer) + 1);
lines[cnt] = trim_white_space(tempPointer);
printf("%d: %s\n", cnt, lines[cnt]);
}
fclose(infile);
//print the unsorted lines (for debugging purposes)
printf("Unsorted list:\n");
for (cnt = 0; cnt != num_lines_in_file; cnt++) {
printf("%s\n", lines[cnt]);
}
char* get_line(FILE* infile, char temp[]) {
fgets(temp, MAX_LINE_LENGTH-1, infile);
char* pntr = temp;
return pntr;
}
char *trim_white_space(char *str)
{
char *end;
// Trim leading space
while(isspace(*str)) str++;
if(*str == 0) // All spaces?
return str;
// Trim trailing space
end = str + strlen(str) - 1;
while(end > str && isspace(*end)) end--;
// Write new null terminator
*(end+1) = 0;
return str;
}
I have this sample input file 5-1input.dat:
Hi guys
x2 My name is
Slim Shady
For real
And here's the output I'm getting:
user@user-VirtualBox ~/Desktop/Low-level/HW5 $ ./homework5-1 5-1input.dat
0: Hi guys
1: x2 My name is
2: Slim Shady
3: For real
Unsorted list:
For real
For real
For real
For real
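As noted in the comments, lines[cnt] = trim_white_space(tempPointer) replaces the pointer you just got from malloc with a pointer into temp, so every entry ends up pointing at the same buffer (and the allocated memory leaks). Copy the string instead by changing your loop to: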
for (cnt = 0; cnt != num_lines_in_file; cnt++) {
tempPointer = get_line(infile, temp);
lines[cnt] = (char*) malloc(strlen(tempPointer) + 1);
strncpy(lines[cnt], trim_white_space(tempPointer), strlen(tempPointer)+1);
printf("%d: %s\n", cnt, lines[cnt]);
}
The size passed to strncpy matches the size passed to malloc, so the copy, including the null terminator, always fits.
Of course you can optimize this code, e.g. by computing strlen only once.
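One possible way to fold the trimming, sizing, and copying into a single step is sketched below. The name dup_trimmed is hypothetical; unlike the question's trim function, this version leaves the input buffer unmodified and sizes the allocation from the trimmed length.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of line with leading and
 * trailing whitespace removed, or NULL if allocation fails.
 * The input buffer is not modified. */
char *dup_trimmed(const char *line)
{
    while (isspace((unsigned char)*line))
        line++;

    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;

    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, line, len);
    copy[len] = '\0';
    return copy;
}
```

In the loop this would read lines[cnt] = dup_trimmed(get_line(infile, temp)), with a NULL check on the result and a free for each entry once the lines are no longer needed.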
