I am trying to read a file line by line and tokenize each line; each line contains strings separated by spaces and tabs. However, when I run my program, I get a Segmentation Fault when I try to print the token. I don't understand why this is happening, as I am using a buffer as the string to tokenize and checking whether the token is NULL. Below is my code:
#include <stdio.h>
#include <stdlib.h>
#define MAX_LINE_LENGTH 70
int main(void)
{
FILE * testFile;
char buf[MAX_LINE_LENGTH];
testFile = fopen("test_file.txt", "r");
if (testFile == NULL)
{
printf("Cannot open test_file.txt.\n");
exit(0);
}
while (fgets(buf, sizeof(buf), testFile) != NULL) {
char *token = strtok(buf," \t");
while (token != NULL)
{
token = strtok(NULL, " \t");
if (token != NULL) {
printf("%s\n", token);
}
}
}
exit(1);
}
Below is the contents of test_file.txt:
String1 String2 String3
String4 String5 String6
String7 String8 String9
Two helpful tips -- (1) enable compiler warnings, e.g. at minimum -Wall -Wextra -pedantic for gcc/clang or /W3 for VS (other compilers have similar options), and do not accept code until it compiles without warnings; (2) #include <string.h>, where strtok is declared.
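In addition to the lack of validation pointed out by @dreamer, you must be relying on an implicit declaration of strtok. Without a prototype, the compiler assumes strtok returns int, so on a typical 64-bit system the returned pointer can be truncated, which is a likely cause of your segfault. You should receive a compiler warning along those lines. Don't ignore any warning; go fix it instead. The warning will generally tell you the exact line the problem code is on.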
Next, don't hardcode filenames. It is just as simple to pass the filename as the first argument to your program (or read from stdin by default). Alternatively, prompt for the filename and read it as input.
Putting those together, you could do something simple like:
#include <stdio.h>
#include <string.h>
#define MAX_LINE_LENGTH 70
#define DELIM " \t\n"
int main (int argc, char **argv) {
char buf[MAX_LINE_LENGTH];
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (buf, sizeof buf, fp))
for (char *p = strtok(buf, DELIM); p; p = strtok(NULL, DELIM))
puts (p);
if (fp != stdin) /* close file if not stdin */
fclose (fp);
return 0;
}
(note: you need to include '\n' as a delimiter character to prevent the trailing '\n' that fgets keeps from becoming part of the last token in each line)
Example Use/Output
$ ./bin/strtokfile test_file.txt
String1
String2
String3
String4
String5
String6
String7
String8
String9
Look things over and let me know if you have questions.
It looks like you are printing without checking the token pointer for NULL.
If you need to print all tokens, you also need to print inside the while loop after each strtok call, in addition to the non-NULL check on the token; as written, the first token returned by the strtok call before the loop is never printed.
Related
I want to read stdin with fgets.
Then, I want to process each line. My lines are strings separated by ' '.
For example, this could be a line: 1 2 ab
I thought that I should use malloc to hold the params in my line, because their number can vary from one line to another.
1 2 3 4 has 4 but a b 2 has 3.
I split the line with strtok, then I fill my malloc'd array with the tokens and print them.
The final result should simply print each line.
For example, this is a file.txt:
1 2 33 4
a b1 c
4 b l 11
And I do:
$ cat file.txt | ./a.out
It should print:
1 2 33 4
a b1 c
4 b l 11
but it doesn't!
Can you guys help me with this please :O
Furthermore, I want to use an array to store the tokens and work with them later. For example, I need to work separately with the second param of each line, so array[1].
This is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv) {
while (fgets(argv, sizeof(argv), stdin) != NULL) {
char *array = (char *)malloc(sizeof(argv));
char *token = strtok(argv, " ");
strtok(token, "\n");
int i = 0;
while (token != NULL) {
array[i] = token;
token = strtok(NULL, " ");
printf("%d\n", array[i]);
++i;
}
}
return 0;
}
If all you want to do is count the tokens and output them, then you can simply use a pointer to each token; you do not need to allocate storage. Using strtok() would be a good way to go. (note: if you have empty fields, you will need to parse using another method, as strtok() treats sequential delimiters as a single delimiter). The string passed to strtok() must be mutable, as strtok() modifies the string (make a copy if you need to preserve the original).
The approach is to read each line with fgets() into a sufficiently sized buffer (character array), zero your counter, and then tokenize the line, incrementing the count for each token and outputting the tokens separated by a space.
You could do that as follows:
#include <stdio.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char line[MAXC]; /* storage for line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (line, MAXC, fp)) {
size_t toks = 0,
len;
line[(len = strcspn (line, "\n"))] = 0; /* trim \n, save length */
for (char *p = strtok (line, " \t"); p; p = strtok (NULL, " \t"))
printf (toks++ ? " %s" : "%s", p);
printf ("%*s\t(%zu tokens)\n", (int)len % 8, " ", toks);
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
Above, a ternary is used to control the output of the space separator (nothing before the first token, when the count is zero, and then a single space before each subsequent token). The count of the tokens is appended to the end of each line.
(the line length saved from strcspn() is only used to tidy up the spacing between the end of each output line and the appended count by adding part of a tab width, if needed)
Example Use/Output
With your example data in the file dat/tokfile.txt, you would receive:
$ ./bin/fgets_tok_count dat/tokfile.txt
1 2 33 4 (4 tokens)
a b1 c (3 tokens)
4 b l 11 (4 tokens)
By taking the filename as the first argument to the program, or reading from stdin by default if no argument is given, you can redirect information to your program as well, e.g.
$ ./bin/fgets_tok_count < dat/tokfile.txt
Or, heaven forbid, your UUOC (useless use of cat) form will also work:
$ cat dat/tokfile.txt | ./bin/fgets_tok_count
Dynamically Storing Unknown Number of Tokens Per-Line
To dynamically store each token and preserve it for the duration of your tokenization loop, all you need is a pointer-to-pointer-to-char and a counter to track the number of pointers and strings allocated. You can do that similar to:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXC 1024 /* if you need a constant, #define one (or more) */
int main (int argc, char **argv) {
char line[MAXC]; /* storage for line */
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) { /* validate file open for reading */
perror ("file open failed");
return 1;
}
while (fgets (line, MAXC, fp)) {
char **tokens = NULL;
size_t len = 0,
toks = 0;
line[(len = strcspn (line, "\n"))] = 0; /* trim \n, save length */
/* loop over each token */
for (char *p = strtok (line, " \t"); p; p = strtok (NULL, " \t")) {
size_t toklen = strlen (p);
/* allocate/validate 1 additional pointer for tokens */
void *tmp = realloc (tokens, (toks + 1) * sizeof *tokens);
if (!tmp) {
perror ("realloc-tokens");
break;
}
tokens = tmp;
/* allocate/validate storage for token of len + 1 */
if (!(tokens[toks] = malloc (toklen + 1))) {
perror ("malloc-tokens[toks]");
break;
}
/* copy token to allocated block */
memcpy (tokens[toks], p, toklen + 1);
toks++; /* increment no. of tokens in line */
}
/* output all stored line tokens and no. of tokens */
for (size_t i = 0; i < toks; i++) {
printf (i ? " %s" : "%s", tokens[i]);
free (tokens[i]); /* done with stored token, free token */
}
free (tokens); /* free pointers */
printf ("%*s\t(%zu tokens)\n", (int)len % 8, " ", toks);
}
if (fp != stdin) /* close file if not stdin */
fclose (fp);
}
(the program output is the same)
Basically, above, realloc is used to allocate storage for 1-additional pointer each time a token is found, malloc is then used to allocate for the length of the token (+1), and then the token is copied to that allocated block. When you are done tokenizing the line, the tokens pointer points to a block of memory containing toks pointers, to which a block of memory holding each token was assigned in turn. After all tokenization and storage is complete, the same output is produced by looping over the pointers, outputting the tokens (and the number of tokens). All memory is then freed.
Look things over and let me know if you have further questions.
I am trying to read a file line by line and split each line into words, which should be saved into an array. However, the program only gets the first line of the text file, and when it tries to read the next line, the program crashes.
FILE *inputfile = fopen("file.txt", "r");
char buf [1024];
int i=0;
char fileName [25];
char words [100][100];
char *token;
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {
token = strtok(NULL, " ");
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
After the good answer from xing, I decided to write a FULL simple program that does your task and to say something about my solution. My program reads a file, given as the input argument, line by line and appends each line to a buffer.
Code:
#define _WITH_GETLINE            /* FreeBSD: expose getline(3) */
#define _POSIX_C_SOURCE 200809L  /* POSIX.1-2008 systems: expose getline(3) */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define assert_msg(x) for ( ; !(x) ; assert(x) )
int
main(int argc, char **argv)
{
FILE *file;
char *buf, *token;
size_t length, size;
ssize_t read;                    /* getline(3) returns -1 at EOF */
assert(argc == 2);
file = fopen(argv[1], "r");
assert_msg(file != NULL) {
fprintf(stderr, "Error occurred: %s\n", strerror(errno));
}
token = NULL;
length = 0;
size = 1;                        /* room for the terminating '\0' */
buf = calloc(1, 1);              /* start with an empty string for strncat(3) */
assert(buf != NULL);
while ((read = getline(&token, &length, file)) != -1) {
if (token[read - 1] == '\n')     /* the last line may lack a newline */
token[read - 1] = ' ';
size += read;
buf = realloc(buf, size);
assert(buf != NULL);
(void)strncat(buf, token, read);
}
printf("%s\n", buf);
fclose(file);
free(buf);
free(token);
return (EXIT_SUCCESS);
}
For file file.txt:
that is a
text
which I
would like to
read
from file.
I got a result:
$ ./program file.txt
that is a text which I would like to read from file.
A few things worth saying about this solution:
Instead of fgets(3), I used getline(3), because it makes it easy to know the length of the line read (the read variable) and it allocates memory for the line (token) automatically. It is important to remember to free(3) that buffer. On some Unix-like systems (older FreeBSD, for example), getline(3) is hidden by default to avoid compatibility problems, which is why the _WITH_GETLINE macro is defined before the <stdio.h> header to make the function available; on POSIX.1-2008 systems, defining _POSIX_C_SOURCE as 200809L before any header has the same effect.
buf contains only the amount of space needed to hold the string. After each line is read from the file, buf is extended by the required amount of space with realloc(3). It is a somewhat more general solution. It is important to remember to free objects allocated on the heap.
I also used strncat(3), which ensures that no more than read characters (the length of token) are copied into buf. This is not the ideal use of strncat(3) either, because we should also test for string truncation. But in general it is better than plain strcat(3), which is not recommended because it can enable a buffer overflow attack that lets malicious users arbitrarily change a running program's behavior. Both strcat(3) and strncat(3) add a terminating \0, but both also require the destination to already hold a null-terminated string, so buf must start out as a valid (possibly empty) string.
getline(3) returns the line including its newline character, so I decided to replace the newline with a space (in the context of building sentences from the words in the file). I should also remove the last space, but I did not want to complicate the source code.
Optionally, I also defined my own macro assert_msg(x), which runs assert(3) and first prints an error message. It is only a convenience, but thanks to it we can see the error message when an attempt to open a file fails. Note that if NDEBUG is defined, assert(3) expands to nothing, so the loop never terminates when the condition is false.
The problem is getting the next token in the inner while loop and passing the result to strcpy without any check for a NULL result.
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {//not at the end of the line. yet!
token = strtok(NULL, " ");//get next token. but token == NULL at end of line
//passing NULL to strcpy is a problem
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
By incorporating the check into the while condition, passing NULL as the second argument to strcpy is avoided.
while ( ( token = strtok ( NULL, " ")) != NULL) {//get next token != NULL
//if token == NULL the while block is not executed
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
Sanitize your loops, and don't repeat yourself:
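#include <stdio.h>
#include <string.h>
#define MAXWORDS 100 /* number of words that fit in words[] */
#define MAXWLEN 100  /* bytes per word, including the '\0' */
int main(void)
{
FILE *inputfile = fopen("file.txt", "r");
char buf [1024];
int i=0;
char words [MAXWORDS][MAXWLEN];
char *token;
if (inputfile == NULL) { /* validate file open for reading */
perror("file.txt");
return 1;
}
for(i=0; i < MAXWORDS && fgets(buf,sizeof(buf),inputfile); ) {
/* include '\n' so the last word on a line does not keep the newline */
for(token = strtok(buf, " \n"); token != NULL && i < MAXWORDS; token = strtok(NULL, " \n")){
snprintf(words[i++], sizeof words[0], "%s", token); /* truncate, don't overflow */
}
}
fclose(inputfile);
return 0;
}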
I'd love some help here, driving me nuts.
I have a large file and I'm using strtok to split up its lines. I'm trying to separate out the first part and save it into my symbolTblChar[lblCnt]. What I can't seem to understand is why, after reading from the file, my output becomes so weird.
INPUT
"SPACE \n LINE \n A1 \n A2"
Code
char* symbolTblChar[MAX_SYMBOLS][100];
int lblCnt = 0;
char line[LINE_SIZE];
char* chk;
while(fgets(line, sizeof line, fp) != NULL) {
chk = (char*)strtok( line, delims );
printf("Adding %s to symbol table", chk);
*symbolTblChar[lblCnt]=chk + '\0';
lblCnt++;
int t;
for(t = 0; t < lblCnt; t++)
printf("%s\n", *symbolTblChar[t]);
}
Output:
Adding SPACE to symbol table
Adding LINE to symbol table
LINE
Adding A1 to symbol table
A1
A1
Adding A2 to symbol table
A2
A2
A2
You need to allocate storage and copy the characters. With your code you are storing a pointer into the character array line, whose contents get overwritten when you read subsequent lines. (Also, chk + '\0' just adds zero to the pointer, so it has no effect.)
You need to do something like
*symbolTblChar[lblCnt]= strdup(chk);
Also, I'm not sure you need a two-dimensional array of char pointers, as in
char* symbolTblChar[MAX_SYMBOLS][100];
You can work with the declaration below, which stores up to MAX_SYMBOLS strings.
char* symbolTblChar[MAX_SYMBOLS];
I think I understand what you are trying to do. In addition to Rohan's answer, you are also stumbling over the use of strtok. While it is somewhat of a catch-22, since you are reading a string with symbols separated by newlines, you can still make strtok work. Understand, though, that when using strtok, your first call to strtok uses the pointer to the string as its first argument:
chk = strtok (line, delims);
while all subsequent calls to strtok use NULL as the first argument:
chk = strtok (NULL, delims);
What is nice about strtok is that it is tailor-made for parsing an entire string in a for loop, e.g.:
for (chk = strtok (line, delims); chk; chk = strtok (NULL, delims))
Putting that together, and cleaning up symbolTblChar[MAX_SYMBOLS] to simply be an array of pointers to char, rearranging your logic a bit, provides the following example. I guessed at what you would need for LINE_SIZE and what would work for MAX_SYMBOLS (adjust as required):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define LINE_SIZE 128
#define MAX_SYMBOLS 32
int main (int argc, char **argv) {
char *symbolTblChar[MAX_SYMBOLS] = {NULL};
char line[LINE_SIZE] = {0};
char *chk = NULL;
char delims[] = " \n";
int lblCnt = 0;
int t = 0;
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (fp != stdin && !fp) {
fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
return 1;
}
printf ("\nCollecting Symbols:\n");
while (lblCnt < MAX_SYMBOLS && fgets (line, LINE_SIZE, fp)) /* stop once the table is full */
{
for (chk = strtok (line, delims); chk; chk = strtok (NULL, delims)) {
printf (" Adding %s to symbol table\n", chk);
symbolTblChar[lblCnt] = strdup (chk);
lblCnt++;
/* check for MAX_SYMBOLS */
if (lblCnt == MAX_SYMBOLS) {
fprintf (stderr, "warning: MAX_SYMBOLS limit reached.\n");
break;
}
}
}
/* close file if not stdin */
if (fp != stdin) fclose (fp);
/* output */
printf ("\nSymbols:\n");
for (t = 0; t < lblCnt; t++)
printf(" %s\n", symbolTblChar[t]);
/* free allocated memory */
for (t = 0; t < lblCnt; t++)
free (symbolTblChar[t]);
return 0;
}
Output
Using your sample input provides:
$ printf "SPACE \n LINE \n A1 \n A2\n" | ./bin/symbltbl
Collecting Symbols:
Adding SPACE to symbol table
Adding LINE to symbol table
Adding A1 to symbol table
Adding A2 to symbol table
Symbols:
SPACE
LINE
A1
A2
Note: you can of course remove the "Adding X ..." intermediate print line when you are done with it. Now, even though it may not be apparent, you need to free the memory associated with each symbol added to symbolTblChar using strdup (which both allocates and copies its argument to a new memory location). You can see that taking place at the end of main.
If this isn't what you intended, let me know. After looking at the question and what you were doing, this seemed like your logical intent. Let me know if you have any questions.
File Input: note, you can also provide an input filename as the first argument to the program and the program will read from the file instead of stdin. For example, the input file:
$ cat symbols.txt
SPACE
LINE
A1
A2
Output Reading from File
$ ./bin/symbltbl symbols.txt
Collecting Symbols:
Adding SPACE to symbol table
Adding LINE to symbol table
Adding A1 to symbol table
Adding A2 to symbol table
Symbols:
SPACE
LINE
A1
A2
I have the following program:
/* a.c */
#include <stdio.h>
int
main(int argc, char* argv[]){
size_t size=0;
char* lineptr;
while(getline(&lineptr, &size, stdin)){
fprintf(stderr, "line = %s\n", lineptr);
if(lineptr){
free(lineptr);
lineptr = NULL;
}
}
return 0;
}
I piped the output of the shell command "ls" to this program using the following line:
ls | ./a.out
Expected output:
The program should print the names of all files in the current directory and terminate.
Actual output:
The program prints the names of all the files but does not terminate; instead, it loops infinitely and prints the last entry over and over.
Thanks
GNU's getline function returns -1 upon end-of-file (or error), and since -1 is nonzero, your while condition never becomes false. Use
while(-1 != getline(&lineptr, &size, stdin))
...and set lineptr to NULL before the first call to getline.
Also, you don't have to free the pointer in every iteration of the loop; you can reuse the previous pointer and free once at the end:
size_t size = 0;
char* lineptr = NULL;
while(-1 != getline(&lineptr, &size, stdin)){
fprintf(stderr, "line = %s", lineptr);
}
free(lineptr);
getline will use realloc internally as needed. Note that you have to make sure that lineptr and size are not changed between calls to getline for this to work (although you may change the string to which lineptr points).
This is a "how to parse a config file" question.
Basically, I have a text file (/etc/myconfig) that has all kinds of settings. I need to read that file and search for the string:
wants_return=yes
Once I locate that string, I need to parse it and return only whatever comes after the equals sign.
I've tried using a combination of fgets and strtok, but I'm getting confused.
In any case, does anyone know a function that can do this?
Code is appreciated.
thanks
This works: (note: fgets keeps the newline character in the returned string when it fits in the buffer, which is why the code checks for it and removes it)
#include <stdio.h>
#include <string.h> /* strspn, strncmp, strlen */
#define MAXLINE 9999 /* a constant expression, so line[] is not a VLA */
char const* FCFG="/etc/myconfig";
char const* findkey="wants_return=";
char * skip_ws(char *line)
{
return line+strspn(line," \t");
}
char * findval(char *line,char const* prefix,int prelen)
{
char *p;
p=skip_ws(line);
if (strncmp(p,prefix,prelen)==0)
return p+prelen;
else
return NULL;
}
char *findval_slow(char *line,char const* prefix)
{
return findval(line,prefix,strlen(prefix));
}
int main() {
FILE *fcfg;
char line[MAXLINE];
char *p,*pend;
int findlen;
findlen=strlen(findkey);
fcfg=fopen(FCFG,"r");
if (!fcfg) { /* validate file open for reading */
perror(FCFG);
return 1;
}
while ((p=fgets(line,MAXLINE,fcfg))) {
printf("Looking at %s\n",p);
if ((p=findval(line,findkey,findlen))) {
pend=p+strlen(p)-1; /* check last char for newline terminator */
if (*pend=='\n') *pend=0;
printf("Found %s\n",p); /* process/parse the value */
}
}
fclose(fcfg);
return 0;
}
Here's a quick example using strtok:
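#include <stdio.h>
#include <string.h>
#define LINELEN 256
int main(int argc, char **argv) {
char line[LINELEN];
if (argc < 2) {
fprintf(stderr, "usage: %s config-file\n", argv[0]);
return 1;
}
FILE* fp = fopen(argv[1], "r");
if (fp == NULL) {
perror("Error opening file");
return 1;
}
/* loop on fgets() itself; feof() only becomes true after a read has failed */
while (fgets(line, sizeof line, fp)) {
const char* name = strtok(line, "= \r\n");
const char* value = strtok(NULL, "= \r\n");
if (name && value) /* skip blank or malformed lines */
printf("%s => %s\n", name, value);
}
fclose (fp);
return 0;
}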
Note, you'll need to put some additional error checking around it, but this works to parse the files I threw at it.
From your comment, it looks like you're already getting the appropriate line from the text file using fgets and loading it into a character buffer. You can use strtok to parse the tokens from the line.
If you call it with the string buffer as the first argument, it returns the first token from that string. If you call it again with the first argument set to NULL, it returns subsequent tokens from the same original string.
A quick example of how to retrieve multiple tokens:
#include <stdio.h>
#include <string.h>
int main() {
char buffer[17]="wants_return=yes";
char* tok;
tok = strtok(buffer, "=");
printf("%s\n", tok); /* tok points to "wants_return" */
tok = strtok(NULL, "=");
printf("%s\n", tok); /* tok points to "yes" */
return 0;
}
For the second strtok call, you can replace the "=" with "" to return everything to the end of the string, instead of breaking off at the next equal sign.
With a POSIX shell, I'd use something like:
answer=$(grep -E 'wants_return[ ]*=' /etc/myconfig | sed 's/^.*=[ ]*//')
Of course, if you're looking for an answer that uses the C STDIO library, then you really need to review the STDIO documentation.