Strange behaviour using fgets and strtok_r


This is my code:
#define LEN 40
#define STUDLIST "./students.txt"
int main()
{
FILE * studd;
char del[] = "" " '\n'";
char name[LEN], surname[LEN], str[LEN];
char *ret;
char *tokens[2] = {NULL};
char *pToken = str;
unsigned int i = 0;
/* open file */
if ( (studd = fopen(STUDLIST,"r") ) == NULL )
{
fprintf(stderr, "fopen\n");
exit(EXIT_FAILURE);
}
while((ret = fgets(str, LEN, studd)))
{
if(ret)
{
for( tokens[i] = strtok_r( str, del, &pToken ); ++i < 2;
tokens[i] = strtok_r( NULL, del, &pToken ) );
strcpy(name, tokens[0]);
strcpy(surname, tokens[1]);
printf( "name = %s\n", name );
printf( "surname = %s\n", surname );
}
fflush(studd);
}
fclose(studd);
return 0;
}
Here is the file students.txt: http://pastebin.com/wNpmXYis
I don't understand why the output isn't what I expected.
I use a loop to read each line with fgets, which gives me a string of the form [Name Surname], and I want to split it into two separate strings ([name] and [surname]) using strtok_r. I tried it with a static string and it works well, but if I read many strings from the file, the output is incorrect, as you can see here:
http://pastebin.com/70uPMzPh
Where is my mistake?

Why are you using a for loop?
...
while((ret = fgets(str, LEN, studd)))
{
if(ret)
{
tokens[0] = strtok_r( str, del, &pToken );
tokens[1] = strtok_r( NULL, del, &pToken );
strcpy(name, tokens[0]);
strcpy(surname, tokens[1]);
printf( "name = %s\n", name );
printf( "surname = %s\n", surname );
}
}

You start i at zero:
unsigned int i = 0;
And later you increment it:
++i < 2;
You never set i back to zero, and in fact you keep incrementing i for every new line in your file. With 14 names in your input file, I expect i to reach about 14 (or maybe 13 or 15, depending on the exact logic).
So this line:
tokens[i] = strtok_r(...);
ends up putting strtok results into tokens[2..15]. But only tokens[0] and tokens[1] are valid. Everything else is undefined behavior.
Answer: Be sure you reset i to zero when you read a new line of your file.
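A minimal sketch of that fix, assuming each line holds a name and a surname separated by whitespace. The helper name split_name is hypothetical and not part of the original code; the point is only that the token index lives inside the per-line call, so it starts from zero for every line.

```c
#define _POSIX_C_SOURCE 200809L /* for strtok_r */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits one line, in place, into at most two tokens.
 * Returns the number of tokens found (0, 1 or 2). */
int split_name(char *line, char *tokens[2])
{
    char *save = NULL;   /* strtok_r state, fresh for every line */
    int i = 0;           /* token index, reset to zero on every call */

    assert(line != NULL && tokens != NULL);
    tokens[0] = tokens[1] = NULL;

    for (char *tok = strtok_r(line, " \n", &save);
         tok != NULL && i < 2;
         tok = strtok_r(NULL, " \n", &save))
        tokens[i++] = tok;

    return i;
}
```

Inside the fgets loop, the tokens would be copied into name and surname only when split_name returns 2, which also avoids passing NULL to strcpy on a blank or incomplete line.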

Related

Using fgets and strtok() to read a text file -C

I’m trying to read text from stdin line by line using fgets() and store the text in a variable “text”. However, when I use strtok() to split the words, it only works for a couple lines before terminating. What should I change to make it run through the entire text?
#define WORD_BUFFER_SIZE 50
#define TEXT_SIZE 200
int main(void) {
char stopWords[TEXT_SIZE][WORD_BUFFER_SIZE];
char word[WORD_BUFFER_SIZE];
int numberOfWords = 0;
while(scanf("%s", word) == 1){
if (strcmp(word, "====") == 0){
break;
}
strcpy(stopWords[numberOfWords], word);
numberOfWords++;
}
char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
while(fgets(buffer, WORD_BUFFER_SIZE*TEXT_SIZE, stdin) != NULL){
strcat(text, buffer);
}
char *k;
k = strtok(text, " ");
while (k != NULL) {
printf("%s\n", k);
k = strtok(NULL, " ");
}
}

char *buffer = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
sizeof(WORD_BUFFER_SIZE) is a constant: it is the size of an int (typically 4), not 50. You probably mean WORD_BUFFER_SIZE * TEXT_SIZE. Alternatively, if the input is redirected from a file, you can find the file size and calculate exactly how much memory you need.
char *text = malloc(...)
strcat(text, buffer);
text is not initialized and doesn't have a null-terminator. strcat needs to know the end of text. You have to set text[0] = '\0' before using strcat (it's not like strcpy)
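#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
// seeking only works when stdin is redirected from a regular file
if (fseek(stdin, 0, SEEK_END) != 0)
{ printf("not using a file!\n"); return 0; }
long filesize = ftell(stdin);
rewind(stdin);
if (filesize <= 0)
{ printf("not using a file!\n"); return 0; }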
char word[1000] = { 0 };
//while (scanf("%s", word) != 1)
// if (strcmp(word, "====") == 0)
// break;
char* text = malloc(filesize + 1);
if (!text)
return 0;
text[0] = '\0';
while (fgets(word, sizeof(word), stdin) != NULL)
strcat(text, word);
char* k;
k = strtok(text, " ");
while (k != NULL)
{
printf("%s\n", k);
k = strtok(NULL, " ");
}
return 0;
}
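The two points above (size the buffer in bytes, and make it an empty string before the first strcat) can be sketched as a small helper that also refuses to overflow. The name append_checked and its return convention are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src to the string in dst, where dst has room
 * for cap bytes in total (including the terminating '\0').
 * Returns 1 on success, 0 if src would not fit (dst is left unchanged). */
int append_checked(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t used = strlen(dst);   /* requires dst to already be a valid string */
    size_t need = strlen(src);

    if (used >= cap || need > cap - used - 1)
        return 0;

    memcpy(dst + used, src, need + 1);   /* copy including the '\0' */
    return 1;
}
```

With this helper, text would be set up with text[0] = '\0' right after malloc, and each line read by fgets would be appended only while append_checked keeps returning 1.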

According to the information you provided in the comments section, the input text is longer than 800 bytes.
However, in the line
char *text = malloc(sizeof(WORD_BUFFER_SIZE)*TEXT_SIZE);
which is equivalent to
char *text = malloc(800);
you only allocated 800 bytes as storage for text. Therefore, you did not allocate sufficient space to store the entire input into text. Attempting to store more than 800 bytes will result in a buffer overflow, which invokes undefined behavior.
If you want to store the entire input into text, then you must ensure that it is large enough.
However, this is probably not necessary. Depending on your requirements, it is probably sufficient to process one line at a time, like this:
while( fgets( buffer, sizeof buffer, stdin ) != NULL )
{
char *k = strtok( buffer, " " );
while ( k != NULL )
{
printf( "%s\n", k );
k = strtok( NULL, " " );
}
}
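In that case, you do not need the array text. You only need the array buffer for storing the current contents of the line. Note that sizeof buffer gives the buffer's size only if buffer is declared as an array (for example char buffer[1024];); if buffer is a pointer returned by malloc, pass the allocated size to fgets instead.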
Since you did not provide any sample input, I cannot test the code above.
EDIT: Based on your comments to this answer, it seems that your main problem is how to read in all of the input from stdin and store it as a string, when you do not know the length of the input in advance.
One common solution is to allocate an initial buffer, and to double its size every time it gets full. You can use the function realloc for this:
#include <stdio.h>
#include <stdlib.h>
int main( void )
{
char *buffer;
size_t buffer_size = 1024;
size_t input_size = 0;
//allocate initial buffer
buffer = malloc( buffer_size );
if ( buffer == NULL )
{
fprintf( stderr, "allocation error!\n" );
exit( EXIT_FAILURE );
}
//continuously fill the buffer with input, and
//grow buffer as necessary
for (;;) //infinite loop, equivalent to while(1)
{
//we must leave room for the terminating null character
size_t to_read = buffer_size - input_size - 1;
size_t ret;
ret = fread( buffer + input_size, 1, to_read, stdin );
input_size += ret;
if ( ret != to_read )
{
//we have finished reading from input
break;
}
//buffer was filled entirely (except for the space
//reserved for the terminating null character), so
//we must grow the buffer
{
void *temp;
buffer_size *= 2;
temp = realloc( buffer, buffer_size );
if ( temp == NULL )
{
fprintf( stderr, "allocation error!\n" );
exit( EXIT_FAILURE );
}
buffer = temp;
}
}
//make sure that `fread` did not stop due to an
//error (it should only stop due to end-of-file)
if ( ferror(stdin) )
{
fprintf( stderr, "input error!\n" );
exit( EXIT_FAILURE );
}
//add terminating null character
buffer[input_size++] = '\0';
//shrink buffer to required size
{
void *temp;
temp = realloc( buffer, input_size );
if ( temp == NULL )
{
fprintf( stderr, "allocation error!\n" );
exit( EXIT_FAILURE );
}
buffer = temp;
}
//the entire contents is now stored in "buffer" as a
//string, and can be printed
printf( "contents of buffer:\n%s\n", buffer );
free( buffer );
}
The code above assumes that the input will be terminated by an end of file condition, which is probably the case if the input is piped from a file.
On second thought, instead of having one large string for the whole file, as you are doing in your code, you may rather want an array of char* pointing to the individual strings, each representing a line, so that for example lines[0] will be the string of the first line and lines[1] will be the string of the second line. That way, you can easily use strstr to find the " ==== " delimiter and strchr on each individual line to find the individual words, and still have all the lines in memory for further processing.
I don't recommend that you use strtok in this case, because that function is destructive in the sense that it modifies the string by replacing the delimiters with null characters. If you require the strings for further processing, as you stated in the comments section, then this is probably not what you want. That is why I recommend that you use strchr instead.
If a reasonable maximum number of lines is known at compile-time, then the solution is rather easy:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_LINE_LENGTH 1024
#define MAX_LINES 1024
int main( void )
{
char *lines[MAX_LINES];
int num_lines = 0;
char buffer[MAX_LINE_LENGTH];
//read one line per loop iteration
while ( fgets( buffer, sizeof buffer, stdin ) != NULL )
{
int line_length = strlen( buffer );
//verify that entire line was read in
if ( buffer[line_length-1] != '\n' )
{
//treat end-of-file as equivalent to newline character
if ( !feof( stdin ) )
{
fprintf( stderr, "input line exceeds maximum line length!\n" );
exit( EXIT_FAILURE );
}
}
else
{
//remove newline character from string
buffer[--line_length] = '\0';
}
//allocate memory for new string and add to array
lines[num_lines] = malloc( line_length + 1 );
//verify that "malloc" succeeded
if ( lines[num_lines] == NULL )
{
fprintf( stderr, "allocation error!\n" );
exit( EXIT_FAILURE );
}
//copy line to newly allocated buffer
strcpy( lines[num_lines], buffer );
//increment counter
num_lines++;
}
//All input lines have now been successfully read in, so
//we can now do something with them.
//handle one line per loop iteration
for ( int i = 0; i < num_lines; i++ )
{
char *p, *q;
//attempt to find the " ==== " marker
p = strstr( lines[i], " ==== " );
if ( p == NULL )
{
printf( "Warning: skipping line because unable to find \" ==== \".\n" );
continue;
}
//skip the " ==== " marker
p += 6;
//split tokens on remainder of line using "strchr"
while ( ( q = strchr( p, ' ') ) != NULL )
{
printf( "found token: %.*s\n", (int)(q-p), p );
p = q + 1;
}
//output last token
printf( "found token: %s\n", p );
}
//cleanup allocated memory
for ( int i = 0; i < num_lines; i++ )
{
free( lines[i] );
}
}
When running the program above with the following input
first line before deliminator ==== first line after deliminator
second line before deliminator ==== second line after deliminator
it has the following output:
found token: first
found token: line
found token: after
found token: deliminator
found token: second
found token: line
found token: after
found token: deliminator
If, however, there is no reasonable maximum number of lines known at compile-time, then the array lines will also have to be designed to grow in a similar way as buffer in the previous program. The same applies for the maximum line length.
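A rough sketch of what such a growable lines array could look like, using the same doubling strategy as the buffer in the earlier program. The struct and function names are hypothetical, and the initial capacity of 16 is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of heap-allocated line copies. */
struct line_list {
    char **items;
    size_t count;
    size_t capacity;
};

/* Appends a copy of text; doubles the capacity when the array is full.
 * Returns 1 on success, 0 on allocation failure (the list stays valid). */
int line_list_push(struct line_list *list, const char *text)
{
    assert(list != NULL && text != NULL);

    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 16;
        char **temp = realloc(list->items, new_cap * sizeof *temp);
        if (temp == NULL)
            return 0;
        list->items = temp;
        list->capacity = new_cap;
    }

    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, text, len + 1);

    list->items[list->count++] = copy;
    return 1;
}

/* Releases every stored line and the array itself. */
void line_list_free(struct line_list *list)
{
    assert(list != NULL);
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

A list declared as struct line_list list = {0}; starts empty, because realloc with a NULL pointer behaves like malloc.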

Split line into words and put them in char array using strtok

I have this simple function that parses a line into tokens...
But I'm missing something.
int parse_line(char *line,char **words){
int wordc=0;
/* get the first token */
char *word = strtok(line, " ");
words[wordc]=(char*)malloc(256*sizeof(char));
strcpy(words[wordc++],word );
/* walk through other tokens */
while( word != NULL ) {
word = strtok(NULL, " ");
words[wordc]=(char*)malloc(256*sizeof(char));
strcpy(words[wordc++],word );
}
return wordc;
}
When I run it I get a segmentation fault!
I pass a char[256] line as the first argument and a char **words as the second, for which I allocate memory first, like this:
char **words = (char **)malloc(256 * sizeof(char *));
main:
.
.
.
char buffer[256];
char **words = (char **)malloc(256 * sizeof(char *));
.
.
.
n = read(stdin, buffer, 255);
if (n < 0){
perror("ERROR");
break;
}
parse_line(buffer,words);
When the program executes parse_line, it exits with a segmentation fault.
I found where the segfault occurs. It is on this line:
strcpy(words[wordc++],word );
Specifically, it is on the first strcpy, before it even reaches the while loop.

while( word != NULL ) {
word = strtok(NULL, " ");
words[wordc]=(char*)malloc(256*sizeof(char));
strcpy(words[wordc++],word );
}
At the end of the line, word will always be set to NULL (as expected) and so strcpy(words[wordc++],word ) will be undefined behavior (likely a crash).
You need to reorganize the loop so you never try to copy a NULL string.
@jxh suggests this solution, which fixes the issue of word being NULL in either of your strcpy calls.
/* get the first token */
char *word = strtok(line, " ");
while( word != NULL ) {
words[wordc]=(char*)malloc(256*sizeof(char));
strcpy(words[wordc++],word );
word = strtok(NULL, " ");
}
I'd do this (uses less memory)
/* get the first token */
char *word = strtok(line, " ");
while( word != NULL ) {
words[wordc++] = strdup(word);
word = strtok(NULL, " ");
}

the following proposed code:
cleanly compiles
performs the desired functionality
properly checks for errors
displays the results to the user
fails to pass all allocated memory to free() so has lots of memory leaks
and now the proposed code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
// avoid 'magic' numbers in code
#define MAX_WORDS 256
#define MAX_LINE_LEN 256
int parse_line( char *line, char **words )
{
int wordc=0;
/* get the first token */
char *token = strtok(line, " ");
while( wordc < MAX_WORDS && token )
{
words[wordc] = strdup( token );
if( ! words[wordc] )
{
perror( "strdup failed" );
exit( EXIT_FAILURE );
}
// implied else, strdup successful
wordc++;
// get next token
token = strtok(NULL, " ");
}
return wordc;
}
int main( void )
{
char buffer[ MAX_LINE_LEN ];
// fix another problem with OPs code
char **words = calloc( MAX_WORDS, sizeof( char* ) );
if( ! words )
{
perror( "calloc failed" );
exit( EXIT_FAILURE );
}
// implied else, calloc successful
// note: would be much better to use 'fgets()' rather than 'read()'
// read at most sizeof(buffer)-1 bytes so that buffer[n] below stays in bounds
ssize_t n = read( 0, buffer, sizeof( buffer ) - 1 );
if (n <= 0)
{
perror("read failed");
exit( EXIT_FAILURE );
}
// implied else, read successful
// note: 'read()' does not NUL terminate the data
buffer[ n ] = '\0';
int count = parse_line( buffer, words );
for( int i = 0; i < count; i++ )
{
printf( "%s\n", words[i] );
}
}
here is a typical run of the program:
hello old friend <-- user entered line
hello
old
friend
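The proposed code's own description admits that it never frees what it allocates. A minimal sketch of the missing cleanup, assuming words was filled by parse_line and count is its return value; the helper name free_words is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cleanup helper: frees the first count strings (each one
 * returned by strdup) and then the pointer array itself. */
void free_words(char **words, int count)
{
    assert(words != NULL && count >= 0);

    for (int i = 0; i < count; i++)
        free(words[i]);
    free(words);
}
```

It would be called as free_words( words, count ); after the printing loop in main.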

Your answers are right! But I got a segfault again because of read!
I hadn't noticed that when I ran the program, it didn't stop to read the input at the read call.
Instead, it went straight past it. I changed read to fgets and it worked,
together with your changes.
Can someone explain this to me? Why doesn't it stop at the read function?
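A likely explanation, assuming the call was written as read(stdin, buffer, 255) as in the question: read() expects an integer file descriptor as its first argument, whereas stdin is a FILE * stream. A stream pointer converted to an int generally does not refer to standard input, so read() can fail and return immediately instead of waiting, and even when it succeeds it does not null-terminate the buffer. fgets() takes the stream itself and always terminates the string. A minimal sketch of reading one line that way (the helper name read_line is hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into buf (cap bytes) and
 * strips the trailing newline, if any.
 * Returns 1 if a line was read, 0 on end-of-file or error. */
int read_line(FILE *in, char *buf, int cap)
{
    assert(in != NULL && buf != NULL && cap > 0);

    if (fgets(buf, cap, in) == NULL)
        return 0;

    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline kept by fgets */
    return 1;
}
```

If read() is kept instead, the descriptor for standard input is STDIN_FILENO from <unistd.h> (the value 0 used in the proposed code above), and the buffer has to be terminated by hand at index n.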

C Programming, what am I doing wrong? Strtok

The text file, which is saved in the same folder as my main.c program, contains:
ape apple
ball bill bull
foot
parrot peeble
season
zebras zoo
For example, assume that the word “bull” is in the dictionary. The word “bull” contains 1 ‘b’ character, 2 ‘l’ characters, and 1 ‘u’ character. Now say the input letters were “alblldi”. In “alblldi”, we have enough ‘b’ characters for “bull”, since “alblldi” contains at least 1 ‘b’ character. Similarly, “alblldi” has enough ‘l’ characters for “bull”, since “alblldi” contains at least 2 ‘l’ characters. However, “alblldi” does not have at least 1 ‘u’ character, and as such we know that we cannot make “bull” from “alblldi”.
How do I achieve this?
So I just started to type this code and I need help, so far I got:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 100
int main( void )
{
int found = 0;
char string[SIZE];
char name[ SIZE ];
FILE *cfPtr;
char word[ SIZE ];
char *tokenPtr; // create char pointer
printf("\nGive me a sentence: ");
fgets( string, SIZE, stdin );
printf("The string to be tokenized is: %s\n", string);
printf("Give me a word: ");
scanf("%s",word);
tokenPtr = strtok( string, "" ); // begin tokenizing sentence
puts("");
// continue tokenizing sentence until tokenPtr becomes NULL
while ( tokenPtr != NULL ) {
if (!strcmp(word,tokenPtr)) {
printf("%s : This is the word you are looking for!\n", tokenPtr);
found = 1;
}
else {
printf( "%s\n", tokenPtr );
}
tokenPtr = strtok( NULL, " " ); // get next token
} // end while
if (!found) {
printf("The word \"%s\" was not found in the sentence\n",word);
}
if ( ( cfPtr = fopen( "dictionary.txt", "r" ) ) == NULL ) {
puts( "File could not be opened" );
}
else {
fgets( name, SIZE, cfPtr );
while ( !feof( cfPtr ) ) {
printf( "Line from file is: %s\n", name );
tokenPtr = strtok( name, "\t" ); // begin tokenizing sentence
// continue tokenizing sentence until tokenPtr becomes NULL
while ( tokenPtr != NULL ) {
printf( "\t%s\n", tokenPtr );
tokenPtr = strtok( NULL, "\t" ); // get next token
} // end while
fgets( name, SIZE, cfPtr );
}
fclose( cfPtr );
}
}

Obviously homework so I'm not doing it for you, but: You can do it on paper well enough to ask a clear question, so just figure out how to implement that in code.
A counter for each letter: int a=0; int b=0; ... int z=0; (one variable per letter? There has to be a better way...)
set your counters based on the input word
check if the word you are searching can satisfy all of the counters
There are a couple of easily identified common routines here.
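The "better way" hinted at above is usually an array of counters indexed by character value. A minimal sketch of the check described in the question, assuming plain single-byte characters and case-sensitive matching; the function name can_make is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns 1 if `word` can be spelled using the characters in `letters`
 * (each character used at most once), 0 otherwise. */
int can_make(const char *word, const char *letters)
{
    int available[UCHAR_MAX + 1] = {0};

    assert(word != NULL && letters != NULL);

    /* Count how many of each character the input provides. */
    for (const char *p = letters; *p != '\0'; p++)
        available[(unsigned char)*p]++;

    /* Consume one available character per character of the word. */
    for (const char *p = word; *p != '\0'; p++)
        if (available[(unsigned char)*p]-- <= 0)
            return 0;

    return 1;
}
```

With the letters "alblldi" from the question, "ball" and "bill" can be made, while "bull" cannot because there is no 'u'. Each dictionary word read from the file would be passed as word.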

I would do it like so:
#include <string.h>
#include <stdio.h>
int count(char *str, int c) {
int n = 0;
for (size_t i = 0; i < strlen(str); i++) {
if (str[i] == c) {
n++;
}
}
return n;
}
int check_word(int counts[256], char *str) {
if (str == NULL) {
return 0;
}
for (int i = 0; i < 256; i++) {
if (count(str, i) < counts[i]) {
return 0;
}
}
return 1;
}
int check_words(char *word, char **words, size_t n) {
int counts[256] = {0};
for (size_t i = 0; i < strlen(word); i++) {
counts[(unsigned char)word[i]]++; // unsigned char keeps the index within 0..255
}
int k = 0;
for (size_t i = 0; i < n; i++) {
if (check_word(counts, words[i])) {
printf(">>> %s\n", words[i]);
k++;
} else {
printf(" %s\n", words[i]);
}
}
return k;
}
int main(void) {
char *words[] = {
"bull", "bill", "ball", "bubblel", "babble", "bible",
};
check_words("bull", words, 6);
return 0;
}
I initially build an array containing the count for each character in the initial word, and then for each string in the array, I check whether it contains enough of each.
Running this code gives me the following output, which I expected.
>>> bull
bill
ball
>>> bubblel
babble
bible
meaning that bull matches bull, and bubblel.

Splitting a string into an array of strings completely dynamically allocated

This question is really close to this topic, but I prefer the readability of this solution and the clarification about pointers that it gave me.
So I've got a data file, and I get a very long array of char from it. I want to split this string into an array in which each element is a string that corresponds to one line of the file.
I saw solutions, but they all use fixed-size arrays. Since I don't know the length of each line, I really need to allocate all of them dynamically, but I can't find the length of the lines because strtok doesn't put a null character \0 at the end of each string.
What I've got for now are these two attempts, but neither works:
int get_lines(char *file, char **lines) {
int nb_lines = 0;
char *token = strtok(file, "\n");
for(int i = 0; token != NULL; i++) {
token = strtok(NULL, "\n");
nb_lines = i;
}
nb_lines++;
lines = malloc((nb_lines + 1) * sizeof(char*));
lines[nb_lines] = '\0';
token = strtok(file, "\n");
for(int i = 0; token != NULL; i++) {
token = strtok(NULL, "\n");
int nb_char = 0;
for(int j = 0; token[j] != '\n'; j++) //This will cause SIGSEGV because strtok don't keep the '\n' at the end
nb_char = j;
nb_char++;
token[nb_char] = '\0'; //This cause SIGSEGV because token's allocation finish at [nb_char-1]
lines[i] = malloc(strlen(token) * sizeof(char)); //strlen cause SIGSEGV because I cannot place the '\0' at the end of token
printf("%s", token); //SIGSEGV because printf don't find the '\0'
lines[i] = token;
}
for(int i = 0; i < nb_lines; i++) {
printf("%s", lines[i]); //SIGSEGV
}
return nb_lines;
}
The code above shows what I want to do and why it doesn't work.
Below is another attempt, but I'm stuck at the same point:
int count_subtrings(char* string, char* separator) {
int nb_lines = 0;
char *token = strtok(string, separator);
for(int i = 0; token != NULL; i++) {
token = strtok(NULL, separator);
nb_lines = i;
}
return nb_lines + 1;
}
char** split_string(char* string, char* separator) {
char **sub_strings = malloc((count_subtrings(string, separator) + 1) * sizeof(char*));
for(int i = 0; string[i] != EOF; i++) {
//How to get the string[i] lenght to malloc them ?
}
}
My file is quite big, and the lines can be too, so I don't want to malloc another table of size (strlen(file) + 1) * sizeof(char) just to be sure each line won't cause a SIGSEGV. I also find that solution quite dirty. If you have another idea, I would be really happy.
(Sorry for the English mistakes, I'm not really good at it.)

Your approach with strtok has two drawbacks: First, strtok modifies the string, so you can only tokenize the original string once. Second, it skips empty lines, because it treats stretches of newlines as a single token separator. (I don't know whether that is a concern to you.)
You can count the newlines with a single pass through the string. Then allocate memory for your line array and make a second pass, in which you split the string at the newlines:
char **splitlines(char *msg)
{
char **line;
char *prev = msg;
char *p = msg;
size_t count = 0;
size_t n;
while (*p) {
if (*p== '\n') count++;
p++;
}
line = malloc((count + 2) * sizeof(*line));
if (line == NULL) return NULL;
p = msg;
n = 0;
while (*p) {
if (*p == '\n') {
line[n++] = prev;
*p = '\0';
prev = p + 1;
}
p++;
}
if (*prev) line[n++] = prev;
line[n++] = NULL;
return line;
}
I've allocated two more line pointers than the newline count: one for the case that the last line doesn't end with a newline, and another one to place a NULL sentinel at the end, so that you know where your array ends. (You could, of course, return the actual line count via a pointer to a size_t.)
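The variant mentioned in parentheses could look roughly like this. It is a sketch only: the name splitlines_n and the out-parameter nlines are assumptions, and the splitting logic is meant to mirror splitlines above.

```c
#include <assert.h>
#include <stdlib.h>

/* Variant of splitlines() that also reports the number of lines.
 * Splits msg in place at '\n'; the returned array points into msg and is
 * NULL-terminated. Returns NULL on allocation failure. */
char **splitlines_n(char *msg, size_t *nlines)
{
    assert(msg != NULL && nlines != NULL);

    size_t count = 0;
    for (const char *p = msg; *p != '\0'; p++)
        if (*p == '\n')
            count++;

    /* +1 for an unterminated last line, +1 for the NULL sentinel */
    char **line = malloc((count + 2) * sizeof *line);
    if (line == NULL)
        return NULL;

    size_t n = 0;
    char *start = msg;
    for (char *p = msg; *p != '\0'; p++) {
        if (*p == '\n') {
            *p = '\0';          /* terminate the current line in place */
            line[n++] = start;
            start = p + 1;
        }
    }
    if (*start != '\0')
        line[n++] = start;      /* last line without a trailing newline */

    line[n] = NULL;
    *nlines = n;
    return line;
}
```

Because every lines[i] points into the original buffer, only the returned array needs to be freed, and strlen(lines[i]) gives the length of each line without any extra allocation. Empty lines are kept as empty strings.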

the following proposed code:
cleanly compiles
(within the limits of the heap size) doesn't care about the input file size
echoes the resulting array of file lines, double spaced, just to show it worked. For single spacing, replace the puts() with printf()
and now the code
#include <stdio.h> // getline(), perror(), fopen(), fclose()
#include <stdlib.h> // exit(), EXIT_FAILURE, realloc(), free()
int main( void )
{
FILE *fp = fopen( "untitled1.c", "r" );
if( !fp )
{
perror( "fopen for reading untitled1.c failed" );
exit( EXIT_FAILURE );
}
// implied else, fopen successful
char **lines = NULL;
size_t availableLines = 0;
size_t usedLines = 0;
char *line = NULL;
size_t lineLen = 0;
while( -1 != getline( &line, &lineLen, fp ) )
{
if( usedLines >= availableLines )
{
availableLines = (availableLines)? availableLines*2 : 1;
char **temp = realloc( lines, sizeof( char* ) * availableLines );
if( !temp )
{
perror( "realloc failed" );
free( lines );
fclose( fp );
exit( EXIT_FAILURE );
}
// implied else realloc successful
lines = temp;
}
lines[ usedLines ] = line;
usedLines++;
line = NULL;
lineLen = 0;
}
fclose( fp );
for( size_t i = 0; i<usedLines; i++ )
{
puts( lines[i] );
}
free( lines );
}
Given the above code is in a file named: untitled1.c the following is the output.
#include <stdio.h> // getline(), perror(), fopen(), fclose()
#include <stdlib.h> // exit(), EXIT_FAILURE, realloc(), free()
int main( void )
{
FILE *fp = fopen( "untitled1.c", "r" );
if( !fp )
{
perror( "fopen for reading untitled1.c failed" );
exit( EXIT_FAILURE );
}
// implied else, fopen successful
char **lines = NULL;
size_t availableLines = 0;
size_t usedLines = 0;
char *line = NULL;
size_t lineLen = 0;
while( -1 != getline( &line, &lineLen, fp ) )
{
if( usedLines >= availableLines )
{
availableLines = (availableLines)? availableLines*2 : 1;
char **temp = realloc( lines, sizeof( char* ) * availableLines );
if( !temp )
{
perror( "realloc failed" );
free( lines );
fclose( fp );
exit( EXIT_FAILURE );
}
// implied else realloc successful
lines = temp;
}
lines[ usedLines ] = line;
usedLines++;
line = NULL;
lineLen = 0;
}
fclose( fp );
for( size_t i = 0; i<usedLines; i++ )
{
puts( lines[i] );
}
free( lines );
}

Parsing a complex string in C

I have a string like this:
"00:00:00 000~00:02:00 0000|~00:01:00 0000;00:01:00 0000~",
I want to get each of the items, like "00:00:00 000".
My idea is to first split the string by ";", then by "|", and finally by "~".
The problem is that I can't detect an empty item, such as the part after "~" in "00:01:00 0000~". I want to detect it, set a default value for it and store it somewhere else, but my code doesn't do that. What is the problem?
Here is my code:
int main(int argc, char *argv[])
{
char *str1, *str2, *str3, *str4, *token, *subtoken, *subt1, *subt2;
char *saveptr1, *saveptr2, *saveptr3;
int j;
for (j = 1, str1 = argv[1]; ; j++, str1 = NULL) {
token = strtok_r(str1, ";", &saveptr1);
if (token == NULL)
break;
printf("%d: %s\n", j, token);
int flag1 = 1;
for (str2 = token; ; str2 = NULL) {
subtoken = strtok_r(str2, "|", &saveptr2);
if (subtoken == NULL)
break;
printf(" %d: --> %s\n", flag1++, subtoken);
int flag2 = 1;
for(str3 = subtoken; ; str3 = NULL) {
subt1 = strtok_r(str3, "~", &saveptr3);
if(subt1 == NULL) {
break;
}
printf(" %d: --> %s\n",flag2++, subt1);
}
}
}
exit(EXIT_SUCCESS);
} /* main */

You can simplify your algorithm if you first make all delimiters uniform. First replace all occurrences of ; and | with ~, then the parsing will be easier. You can do this externally via sed or vim, or programmatically in your C code. Then the empty ('NULL') items should be easier to detect. (Personally, I prefer not to use strtok, as it modifies the original string.)
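One way to make the empty items visible, sketched under the assumption that ';', '|' and '~' can all be treated as the same kind of separator: strtok_r silently skips empty items because it treats a run of delimiters as one, whereas scanning with strcspn stops at every single delimiter. The function name next_field and its cursor convention are illustrative, not from the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator: returns a pointer to the next field and stores its
 * length in *len, or returns NULL when no fields remain. Empty fields are
 * reported with *len == 0. The input string is not modified; *cursor tracks
 * the current position and must initially point at the string. */
const char *next_field(const char **cursor, const char *delims, size_t *len)
{
    assert(cursor != NULL && delims != NULL && len != NULL);

    const char *start = *cursor;
    if (start == NULL)
        return NULL;                 /* previous call consumed the last field */

    *len = strcspn(start, delims);   /* stops at the first delimiter or '\0' */
    *cursor = (start[*len] != '\0') ? start + *len + 1 : NULL;
    return start;
}
```

A caller would loop until next_field returns NULL, print or copy each field with the "%.*s" format and (int)len, and substitute its own default value whenever len is 0. On the string from the question this reports six fields, two of which are empty.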

It is indeed easier to just write a custom parser in this case.
The version below allocates new strings. If allocating new memory is not desired, change the add_string function to just point to start instead, and set start[len] to 0 (this requires the input string to be writable).
static int add_string( char **into, const char *start, int len )
{
if( len<1 ) return 0;
if( (*into = strndup( start, len )) )
return 1;
return 0;
}
static int is_delimeter( char x )
{
static const char delimeters[] = { 0, '~', ',', '|',';' };
int i;
for( i=0; i<sizeof(delimeters); i++ )
if( x == delimeters[i] )
return 1;
return 0;
}
static char **split( const char *data )
{
// worst case: every other character is a delimiter, plus one slot for the NULL terminator
char **res = malloc(sizeof(char *)*(strlen(data)/2+2));
char **cur = res;
int last_delimeter = 0, i = 0;
do {
if( is_delimeter( data[i] ) )
{
if( add_string( cur, data+last_delimeter,i-last_delimeter) )
cur++;
last_delimeter = i+1;
}
} while( data[i++] );
*cur = NULL;
return res;
}
An example usage of the method:
int main()
{
const char test[] = "00:00:00 000~00:02:00 0000|~00:01:00 0000;00:01:00 0000~";
char **split_test = split( test );
int i = 0;
while( split_test[i] )
{
fprintf( stderr, "%2d: %s\n", i, split_test[i] );
free( split_test[i] );
i++;
}
free( split_test );
return 0;
}

Instead of splitting the string, it might be more suitable to come up with a simple finite state machine that parses the string. Fortunately, your tokens seem to have an upper limit on their length, which makes things a lot easier:
Iterate over the string and distinguish four different states:
current character is not a delimiter, but previous character was (start of token)
current character is a delimiter and previous character wasn't (end of token)
current and previous character are both not delimiters (store them in temporary buffer)
current and previous character are both delimiters (ignore them, read next character)
It should be possible to come up with a very short (10 lines?) and concise piece of code that parses the string as specified.
