I'm reading records from a CSV file using fgets() to read the file one line at a time, and strtok() to parse the fields in each line. I'm encountering a problem where fgets() overwrites a string that was previously written, in favor of the new string.
Here's an example of what I mean by that:
record.csv (This is the file I'm reading in)
John,18
Johann,29
main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct customer {
char *name;
int age;
} Customer;
int main(void)
{
FILE *csv_data;
char line[100], *token;
Customer newData[2];
csv_data = fopen("record.csv", "r");
// Index 0 for John's data, index 1 for Johann's data
int i = 0;
/* loops until end of file */
while(fgets(line, 100, csv_data)) {
/* name field */
token = strtok(line, ",");
if (token != NULL) {
newData[i].name = token;
}
/* age field */
token = strtok(NULL, ",");
if (token != NULL) {
// atoi() converts ascii char to integer
newData[i].age = atoi(token);
}
i++;
}
/* print John's records */
printf("%s\n", newData[0].name);
printf("%d\n", newData[0].age);
/* print Johann's records */
printf("%s\n", newData[1].name);
printf("%d\n", newData[1].age);
return 0;
}
When we compile and execute this, it prints out:
Johann
18
Johann
29
"John" in newData[0].name gets overwritten with "Johann" during the second iteration of the while loop. Notice however that only the strings get mixed up, but not the integers. I suspect this has to do with fgets because when I modified the above source to only run fgets once, the output for "John" was as it should be.
Maybe I'm misusing fgets (or perhaps my assumption is wrong), but could someone give me some pointers on why the strings are being overwritten with each call to fgets?
Second update: Thank you very much again to all the commenters and answerers. It's good to learn about things I was not aware of. The source works perfectly now.
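You are not copying the string, only the pointer to it. Every name member ends up pointing into line, the buffer that fgets() overwrites on each iteration.
A very simple way to copy the string is to store it in a fixed-size array, but note that this limits the string to 99 characters.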
typedef struct customer {
char name[100];
int age;
} Customer;
strcpy(newData[i].name, token);
Do:
newData[i].name = malloc( strlen( token ) + 1 );
strcpy( newData[i].name, token );
Alternatively, define the name member as char name[64]; and again use strcpy( newData[i].name, token ); without malloc. The 64 bytes for name can be more or less, as long as the longest name plus its terminating \0 fits.
Related
I am trying to read a file line by line and split it into words. Those words should be saved into an array. However, the program only gets the first line of the text file and when it tries to read the new line, the program crashes.
FILE *inputfile = fopen("file.txt", "r");
char buf [1024];
int i=0;
char fileName [25];
char words [100][100];
char *token;
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {
token = strtok(NULL, " ");
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
After the good answer from xing, I decided to write a full but simple program that performs your task and to say a few words about my solution. My program reads a file, given as a command-line argument, line by line and appends each line to a buffer.
Code:
#include <assert.h>
#include <errno.h>
#define _WITH_GETLINE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define assert_msg(x) for ( ; !(x) ; assert(x) )
int
main(int argc, char **argv)
{
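FILE *file;
char *buf, *token;
size_t length, size;
ssize_t read; /* getline() returns ssize_t; -1 means EOF or error */
assert(argc == 2);
file = fopen(argv[1], "r");
assert_msg(file != NULL) {
fprintf(stderr, "Error occurred: %s\n", strerror(errno));
}
token = NULL;
length = 0;
/* start with an empty string so that strncat() has a valid destination */
size = 1;
buf = malloc(size);
assert(buf != NULL);
buf[0] = '\0';
while ((read = getline(&token, &length, file)) != -1) {
/* replace the trailing newline, if the line has one, with a space */
if (token[read - 1] == '\n')
token[read - 1] = ' ';
size += read;
buf = realloc(buf, size);
assert(buf != NULL);
(void)strncat(buf, token, read);
}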
printf("%s\n", buf);
fclose(file);
free(buf);
free(token);
return (EXIT_SUCCESS);
}
For file file.txt:
that is a
text
which I
would like to
read
from file.
I got a result:
$ ./program file.txt
that is a text which I would like to read from file.
A few things are worth saying about this solution:
Instead of fgets(3) I used getline(3), because it makes it easy to learn the length of each line (the read variable) and allocates memory for the line automatically (token). It is important to remember to free(3) that memory. On some Unix-like systems getline(3) is not declared by default, in order to avoid compatibility problems. Therefore the _WITH_GETLINE macro is defined before including <stdio.h> to make the function available.
buf contains only the amount of space needed to store the string. After each line is read from the file, buf is extended by the required amount with realloc(3). This is a somewhat more "universal" solution. It is important to remember to free objects allocated on the heap.
I also used strncat(3), which ensures that no more than read characters (the length of token) are appended to buf. This is still not the best way of using strncat(3), because we should also test for string truncation. In general, though, it is better than plain strcat(3), which is not recommended because it lets malicious users arbitrarily change a running program's functionality through a buffer overflow attack. Both strcat(3) and strncat(3) add a terminating \0.
getline(3) returns the line with its newline character, so I decided to replace the newline with a space (in the context of building a sentence from the words given in the file). I should also remove the last space, but I did not want to complicate the source code.
As an optional extra, I defined my own macro assert_msg(x), which runs assert(3) and also shows a text message describing the error. It is only a convenience, but thanks to it we can see the error message when an attempt to open the file fails.
The problem is getting the next token in the inner while loop and passing the result to strcpy without any check for a NULL result.
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {//not at the end of the line. yet!
token = strtok(NULL, " ");//get next token. but token == NULL at end of line
//passing NULL to strcpy is a problem
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
By incorporating the check into the while condition, passing NULL as the second argument to strcpy is avoided.
while ( ( token = strtok ( NULL, " ")) != NULL) {//get next token != NULL
//if token == NULL the while block is not executed
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
Sanitize your loops, and don't repeat yourself:
#include <stdio.h>
#include <string.h>
int main(void)
{
FILE *inputfile = fopen("file.txt", "r");
char buf [1024];
int i=0;
char fileName [25];
char words [100][100];
char *token;
for(i=0; fgets(buf,sizeof(buf),inputfile); ) {
for(token = strtok(buf, " "); token != NULL; token = strtok(NULL, " ")){
strcpy(words[i++], token);
}
}
return 0;
}
I have some C-code that reads in a text file line by line, hashes the strings in each line, and keeps a running count of the string with the biggest hash values.
It seems to be doing the right thing but when I issue the print statement:
printf("Found Bigger Hash:%s\tSize:%d\n", textFile.biggestHash, textFile.maxASCIIHash);
my print returns this in the output:
Preprocessing: dict1
Found BiSize:110h:a
Found BiSize:857h:aardvark
Found BiSize:861h:aardwolf
Found BiSize:937h:abandoned
Found BiSize:951h:abandoner
Found BiSize:1172:abandonment
Found BiSize:1283:abbreviation
Found BiSize:1364:abiogenetical
Found BiSize:1593:abiogenetically
Found BiSize:1716:absentmindedness
Found BiSize:1726:acanthopterygian
Found BiSize:1826:accommodativeness
Found BiSize:1932:adenocarcinomatous
Found BiSize:2162:adrenocorticotrophic
Found BiSize:2173:chemoautotrophically
Found BiSize:2224:counterrevolutionary
Found BiSize:2228:counterrevolutionist
Found BiSize:2258:dendrochronologically
Found BiSize:2440:electroencephalographic
Found BiSize:4893:pneumonoultramicroscopicsilicovolcanoconiosis
Biggest Size:46umonoultTotal Words:71885covolcanoconiosis
So it seems I'm misusing printf(). Below is the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define WORD_LENGTH 100 // Max number of characters per word
// data1 struct carries information about the dictionary file; preprocess() initializes it
struct data1
{
int numRows;
int maxWordSize;
char* biggestWord;
int maxASCIIHash;
char* biggestHash;
};
int asciiHash(char* wordToHash);
struct data1 preprocess(char* fileName);
int main(int argc, char* argv[]){
//Diagnostics Purposes; Not used for algorithm
printf("Preprocessing: %s\n",argv[1]);
struct data1 file = preprocess(argv[1]);
printf("Biggest Word:%s\t Size:%d\tTotal Words:%d\n", file.biggestWord, file.maxWordSize, file.numRows);
//printf("Biggest hashed word (by ASCII sum):%s\tSize: %d\n", file.biggestHash, file.maxASCIIHash);
//printf("**%s**", file.biggestHash);
return 0;
}
int asciiHash(char* word)
{
int runningSum = 0;
int i;
for(i=0; i<strlen(word); i++)
{
runningSum += *(word+i);
}
return runningSum;
}
struct data1 preprocess(char* fName)
{
static struct data1 textFile = {.numRows = 0, .maxWordSize = 0, .maxASCIIHash = 0};
textFile.biggestWord = (char*) malloc(WORD_LENGTH*sizeof(char));
textFile.biggestHash = (char*) malloc(WORD_LENGTH*sizeof(char));
char* str = (char*) malloc(WORD_LENGTH*sizeof(char));
FILE* fp = fopen(fName, "r");
while( strtok(fgets(str, WORD_LENGTH, fp), "\n") != NULL)
{
// If found a larger hash
int hashed = asciiHash(str);
if(hashed > textFile.maxASCIIHash)
{
textFile.maxASCIIHash = hashed; // Update max hash size found
strcpy(textFile.biggestHash, str); // Update biggest hash string
printf("Found Bigger Hash:%s\tSize:%d\n", textFile.biggestHash, textFile.maxASCIIHash);
}
// If found a larger word
if( strlen(str) > textFile.maxWordSize)
{
textFile.maxWordSize = strlen(str); // Update biggest word size
strcpy(textFile.biggestWord, str); // Update biggest word
}
textFile.numRows++;
}
fclose(fp);
free(str);
return textFile;
}
You forget to remove the \r after reading. This is in your input because (1) your source file comes from a Windows machine (or at least one which uses \r\n line endings), and (2) you use the fopen mode "r", which does not translate line endings on your OS (again, presumably Windows).
This results in the weird output as follows:
Found Bigger Hash:text\r\tSize:123
See the position of the \r? When this string is output, you first get
Found Bigger Hash:text
and then the cursor gets repositioned to the start of the line by \r. Next, a tab is output, not by printing spaces but merely by moving the cursor to the 8th position:
1234567↓
Found Bigger Hash:text
and the rest of the string is printed over the one already shown:
Found BiSize:123h:text
Possible solutions:
Open your file in "rt" "text" mode, and/or
Check for, and remove, the \r code as well as \n.
I'd go for both. strchr is pretty cheap and will make your code a bit more foolproof.
(Also, please simplify your fgets line by splitting it up into several distinct operations.)
Your statement
while( strtok(fgets(str, WORD_LENGTH, fp), "\n") != NULL)
takes no account of the return value from fgets() or the way strtok() works.
The way to do this is something like
char *fptr, *sptr;
while ((fptr = fgets(str, WORD_LENGTH, fp)) != NULL) {
sptr = strtok(fptr, "\n");
while (sptr != NULL) {
printf ("%s,", sptr);
sptr = strtok (NULL, "\n");
}
printf("\n");
}
Note that after the first call to strtok(), subsequent calls on the same sequence must pass NULL as the first parameter.
Ok, so I'm going to explain my program.
It takes a text file that's setup as such: in pairs, first line being the title of an experiment, and the second line being 10 numbers separated by spaces. It saves the first lines of pairs in *experiments and the second lines of pairs in data. The last line is *** END *** which is what it's supposed to end with.
For some reason *** END *** doesn't end the program. Any ways I can fix this? I'm assuming it's because fgets gives str blank spaces (99 chars total) so that the string in quotes will never be equal to str?
Thanks.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int var;
int i=0,j,k;
char seps[] = " ";
char *experiments[20];
int data[10][20];
char str[100]; // make sure that this size is enough to hold the single line
char *ptr, *token;
int no_line=1;
while(fgets(str,100,stdin) != NULL && strcmp(str,"*** END ***"))
{
if(no_line % 2 == 0)
{
k=0;
token = strtok (str, seps);
while (token != NULL)
{
sscanf (token, "%d", &var);
data[i][k++] = var;
token = strtok (NULL, seps);
}
i++;
/* read integer values from the string "str" using sscanf; sscanf can be called in a loop with %d until it fails */
}
else
{
ptr = strdup(str);
experiments[i] = ptr;
/* store the string in your variable "experiments"; before copying, allocate memory for each entry */
}
no_line++;
}
for(j=0;j<i;j++)
{
printf("%s",experiments[j]);
for(k=0;k<10;k++)
{
printf("%d ",data[j][k]);
}
printf("\n");
}
}
You're declaring i here ...
int i,j,k;
... and using it here ...
data[i][k++] = var;
Nowhere do you initialize i. Also, why does data need to be a 2D array? Can't it just be a 1D array?
int data[10];
...
data[k++] = var;
From this code, int i seems to be declared, but not initialized?
data[i][k++] = var;
It may be helpful to use an IDE such as Eclipse or Code::Blocks to try out small, testable pieces of code, because they have all sorts of syntax and error checking features.
I'm sorry for the sloppy title, but I didn't know how to phrase my question correctly. I'm trying to read a .txt file in which every line has the information needed to fill a struct. First I use fgets to read the line, and then I was going to use sscanf to read the individual parts. Now here is where I'm stuck: normally sscanf breaks off parts on whitespace, but I need the whitespace to be included. I know that sscanf allows ignoring whitespace, but the tricky part is that I then need some other arbitrary character to separate the parts. For example, I have to break the line
Carl Sagan~Contact~scifi~1997
up into parts for author, name, genre, and year. You can see I need the space in Carl Sagan, but I need the function to break off the strings on the tilde character. Any help is appreciated.
If your input is delimited by ~, or by any other specific character, use this:
sscanf(s, "%[^~]", name);
%[^...] is a conversion specifier that matches a run of characters other than the ones listed between [^ and the closing ].
Here is the sample program for testing it:
#include <stdio.h>
int main(void)
{
char *s = "Carl Sagan~Contact~scifi~1997";
char name[100], contact[100], genre[100];
int yr;
sscanf(s, "%99[^~]~%99[^~]~%99[^~]~%d", name, contact, genre, &yr);
printf("%s\n%s\n%s\n%d\n", name, contact, genre, yr);
return 0;
}
You need strtok. Use ~ as your delimiter.
See the documentation: http://linux.die.net/man/3/strtok
strtok has some drawbacks but it sounds like it will work for you.
EDIT:
After reading this, it sounds like you can use sscanf cleverly to achieve the same result, and it may actually be safer after all.
#include <stddef.h>
#include <string.h>
#include <stdio.h>
char* mystrsep(char** input, const char* delim)
{
char* result = *input;
char* p;
p = (result != NULL) ? strpbrk(result, delim) : NULL;
if (p == NULL)
*input = NULL;
else
{
*p = '\0';
*input = p + 1;
}
return result;
}
int main()
{
char str[] = "Carl Sagan~Contact~scifi~1997";
const char delimiters[] = "~";
char* ptr;
char* token;
ptr = str;
token = mystrsep(&ptr, delimiters);
while(token)
{
printf("%s\n",token);
token = mystrsep(&ptr, delimiters);
}
return 0;
}
Output :-
Carl Sagan
Contact
scifi
1997
Well, I declared a global array of chars like this char * strarr[];
in a method I am tokenising a line and try to put everything into that array like this
*line = strtok(s, " ");
while (line != NULL) {
*line = strtok(NULL, " ");
}
This does not seem to work. How can I fix it?
Thanks
Any number of things could be going wrong with the code you haven't shown us, such as undefined behaviour from calling strtok on a string constant, or getting your parameters wrong when calling the function.
But the most likely problem from the code we can see is the use of *line instead of line, assuming that line is of type char *.
Use the following code as a baseline:
#include <stdio.h>
#include <string.h>
int main (void) {
char str[] = "My name is paxdiablo";
// Start tokenising words.
char *line = strtok (str, " ");
while (line != NULL) {
// Print current token and get next word.
printf ("[%s]\n", line);
line = strtok(NULL, " ");
}
return 0;
}
This outputs:
[My]
[name]
[is]
[paxdiablo]
and should be easily modifiable into something you can use.
Be aware that, if you're trying to save the character pointers returned from strtok (which would make sense for using *line), they are transitory and will not be what you expect after you're done. That's because modifications are made in-place within the source string. You can do it with something like:
#include <stdio.h>
#include <string.h>
int main (void) {
char *word[4]; // The array of words.
size_t i; // General counter.
size_t nextword = 0; // For preventing array overflow.
char str[] = "My name is paxdiablo";
// Start tokenising.
char *line = strtok (str, " ");
while (line != NULL) {
// If array not full, duplicate string to array and advance index.
if (nextword < sizeof(word) / sizeof(*word))
word[nextword++] = strdup (line);
// Get next word.
line = strtok(NULL, " ");
}
// Print out all stored words.
for (i = 0; i < nextword; i++)
printf ("[%s]\n", word[i]);
return 0;
}
Note the specific size of the word array in that code above. The use of char * strarr[] in your code, along with the message tentative array definition assumed to have one element is almost certainly where the problem lies.
If your implementation doesn't come with a strdup, you can get a reasonably-priced one here :-)