I am trying to parse strings delimited by a semicolon (;) using strtok(). But when I pass strtok() a string with no semicolons, it returns the entire string. Shouldn't it return NULL if there are no token matches?
This is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv)
{
char cmd[] = "INSERT A->B B->C INSERT C->D";
char delim[] = ";";
char *result = NULL;
result = strtok(cmd,delim);
if(result == NULL)
{
printf("\n NO TOKENS\n");
}
else
{
printf("\nWe got something !! %s ",result);
}
return (EXIT_SUCCESS);
}
The output is : We got something !! INSERT A->B B->C INSERT C->D
No. The delimiter is the thing that separates tokens, so if there are no delimiters, the entire string is considered the first token.
Consider a string with two tokens, then take one of those tokens away.
If you have
a;b
then you have tokens a and b.
Now if you take b away...
a
you still have token a.
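A quick way to check when the first call really returns NULL is a small helper like the one below. This is only a sketch: first_token is a made-up name, and it copies the input first because strtok() writes into the buffer it is given.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `input` into `out` (strtok modifies its
   argument) and returns the first token, or NULL when the input holds
   no token at all (empty string, or nothing but delimiters). */
char *first_token(const char *input, const char *delim,
                  char *out, size_t out_size)
{
    if (out_size == 0)
        return NULL;
    strncpy(out, input, out_size - 1);
    out[out_size - 1] = '\0';   /* strncpy may not terminate on truncation */
    return strtok(out, delim);
}
```

With a 64-byte buffer, first_token("abcdef", ";", buf, sizeof buf) gives "abcdef", while "" and ";;;" both give NULL, so NULL on the first call means "no tokens", not "no delimiters".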
If you read the man page (http://man7.org/linux/man-pages/man3/strtok.3.html) carefully, you will see that it says:
The strtok() function breaks a string into a sequence of zero or
more nonempty tokens.
So it either breaks your input string into multiple tokens or, when it finds none of the given delimiters in the string, returns the whole string as a single token.
Example:
input_string || delimiter || tokens
"abc:def" || ":" || "abc" and "def"
"abcdef" || ":" || "abcdef"
Related
I want to compare two strings which contain some other characters as well. To eliminate those characters I am using strtok().
First I am copying strings into temp buffers, which I will use in strtok().
#include<stdio.h>
#include<string.h>
int main()
{
char ch[50]="supl-dev.google.com";
char ch1[50]="*.google.com";
printf("ch =%s\n",ch);
printf("ch1 =%s\n",ch1);
char temp_ch[50], temp_ch1[50];
strcpy(temp_ch,ch);
strcpy(temp_ch1,ch1);
char *ch_token, *ch1_token;
ch_token = strtok(temp_ch,".");
ch1_token = strtok(temp_ch1,"*");
printf("ch_token=%s\n",ch_token);
printf("ch1_token = %s\n",ch1_token);
return 0;
}
Expected results :
ch =supl-dev.google.com
ch1 =*.google.com
ch_token=supl-dev
ch1_token = *
Actual results :
ch =supl-dev.google.com
ch1 =*.google.com
ch_token=supl-dev
ch1_token = .google.com
Here I am expecting ch1_token should contain '*'.
Nope. Your expectation is wrong. You set the delimiter for ch1 to *, which means that strtok will strip off the leading * in *.google.com and return .google.com as the first token. To get what you want, you have to set the delimiter to "." instead.
#include<stdio.h>
#include<string.h>
int main()
{
char ch[50]="supl-dev.google.com";
char ch1[50]="*.google.com";
printf("ch =%s\n",ch);
printf("ch1 =%s\n",ch1);
char temp_ch[50], temp_ch1[50];
strcpy(temp_ch,ch);
strcpy(temp_ch1,ch1);
char *ch_token, *ch1_token;
ch_token = strtok(temp_ch,".");
ch1_token = strtok(temp_ch1,".");
printf("ch_token=%s\n",ch_token);
printf("ch1_token = %s\n",ch1_token);
return 0;
}
Now ch_token should be supl-dev and ch1_token should be *.
The thing to keep in mind is that strtok skips empty tokens and moves on to the next one.
So when you call strtok on *.google.com with the delimiter *, it finds the delimiter at the very first position. Since the token before it is empty, the next token is returned, which is .google.com.
You are splitting ch1 by *, so the first piece is an empty string, which is ignored, and what remains is .google.com (the * is dropped because it is your delimiter).
Just change your splitting code to ch1_token = strtok(temp_ch1,"."); and successive calls will return *, google and then com.
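To see both behaviours side by side, here is a minimal sketch that collects every token strtok produces. The helper name split is an assumption made for illustration, not something from the question.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative helper: splits `s` in place and stores up to `max` token
   pointers in `tokens`. Returns the number of tokens stored. */
size_t split(char *s, const char *delim, char *tokens[], size_t max)
{
    size_t n = 0;
    char *tok = strtok(s, delim);
    while (tok != NULL && n < max) {
        tokens[n++] = tok;
        tok = strtok(NULL, delim);   /* NULL means "continue in the same string" */
    }
    return n;
}
```

Called on a writable copy of "*.google.com", split with "*" yields the single token ".google.com", while split with "." yields "*", "google" and "com".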
Your stated need is to search for a common sub-string within two strings.
Using strtok may work, but there are simpler ways to do this without parsing.
Have you considered using strstr()?
char ch[50]="supl-dev.google.com";
char ch1[50]="*.google.com";
if((strstr(ch, "google.com")) && (strstr(ch1, "google.com")))
{
/// sub-string exists in both strings
}
I have the following string,
#98727,72000,2500,2450,2200,999999,999999#2500,2450,2200,999999,999999#2500,2450,2200,999999,999999#2500,2450,2200,999999,999999#,
and I am looking to split this entire string on the special character # in order to get the four #-separated sets.
After that, each of these sets needs to be split on , to get individual values, e.g. 98727 alone.
I have managed to separate the first set of values using # and , with the code below, but it doesn't separate the second set. What am I missing to separate all the sets and their values?
char buffer[] = "#18115,72000,2500,2450,2200,999999,999999#2500,2450,2200,999999,999999#2500,2450,2200,999999,999999#2500,2450,2200,999999,999999#";
char *ptr = buffer;
char *p;
for (p = strtok(ptr,"#"); p != NULL; p = strtok(NULL, ","))
{
printf("Set: %s\r\n", p);
char *n;
for (n = strtok(p,","); n != NULL; n = strtok(NULL, ","))
{
printf("%s\r\n", n);
}
}
And getting an output like this:
Set: 18115,72000,2500,2450,2200,999999,999999
18115
72000
2500
2450
2200
999999
999999
Instead of first splitting on # and then on ,, you could split on either # or , in a single pass using strpbrk().
You could do something like
char str[]="#98727,72000,2500,2450,2200,999999,999999#2500,2450,2200,999999,999999#2500,2450,2200,999999,999999#2500,2450,2200,999999,999999#,";
char *p;
char num_str[32][10]; /* the sample input holds 22 values, so 20 rows would overflow */
int i=0;
for(p=str; i<32 && (p=strpbrk(p, "#,"))!=NULL; )
{
while(*(++p)=='#' || *p==',');
if(*p=='\0')
{
break; //input string over
}
sscanf(p, "%9[^#,]", num_str[i++]);
printf("\n%s", num_str[i-1]);
}
A pointer p is maintained which points to the next # or , to be examined in the input string. strpbrk() is used to find this pointer value.
strpbrk() returns a pointer to the first occurrence of any character of a string in another string.
After finding the value of p, it is incremented so that it points to the character after the # or , and if this character is also one of those, you keep on incrementing till another character is found. If this character is the nul character denoting the end of string, we exit the outer loop.
sscanf() is used to extract the characters till the next # or ,.
%9[^#,] reads till but not including the next # or ,.
P.S: The \r in printf("Set: %s\r\n", p); is superfluous.
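As for why the original nested loops stop after the first set: strtok() keeps a single hidden position, so the inner strtok(p, ",") overwrites the state of the outer loop. Once the inner loop runs out of tokens, the outer strtok(NULL, ...) has nothing left to continue from and returns NULL. If you want to keep the two-level structure, one option is strtok_r(), which is POSIX rather than standard C and takes an explicit state pointer per loop. A minimal sketch, where print_sets is just an illustrative name:

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <stdio.h>
#include <string.h>

/* Splits `buffer` in place into '#'-separated sets and each set into
   ','-separated values, printing both levels. Returns the total number
   of values printed. */
int print_sets(char *buffer)
{
    char *set_state, *val_state;   /* one saved position per nesting level */
    int count = 0;

    for (char *set = strtok_r(buffer, "#", &set_state); set != NULL;
         set = strtok_r(NULL, "#", &set_state)) {
        printf("Set: %s\n", set);
        for (char *val = strtok_r(set, ",", &val_state); val != NULL;
             val = strtok_r(NULL, ",", &val_state)) {
            printf("%s\n", val);
            count++;
        }
    }
    return count;
}
```

Passing the buffer from the question prints all four sets and returns 22, since each loop now advances its own state.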
I'm having trouble printing an equal sign before and after each token once my string is split. My conclusion is that I have to replace the newline character with the equal sign at some point after splitting the string. I looked through the C string.h reference for a way to do this and tried using strstr to check whether there was already a "\n" in the tokenized string, but that led to an infinite loop. I also thought about replacing the newline character directly, which should be at index string length minus 1. I admit I have low familiarity with C and am still reading through the library references. If you could take a look at my code and provide some feedback, I would greatly appreciate it. Thank you for your time.
// main method
int main(void){
// allocate memory
char string[256];
char *tokenizedString;
const char delimit[2] = " ";
const char *terminate = "\n";
do{
// prompt user for a string we will tokenize
do{
printf("Enter no more than 65 tokens:\n");
fgets(string, sizeof(string), stdin);
// verify input length
if(strlen(string) > 65 || strlen(string) <= 0) {
printf("Invalid input. Please try again\n"); }
} while(strlen(string) > 65);
// tokenize the string
tokenizedString = strtok(string, delimit);
while(tokenizedString != NULL){
printf("=%s=\n", tokenizedString);
tokenizedString = strtok(NULL, delimit);
}
// replace newline character implicitly made by enter, it seems to be adding my newline character at the end of output
} while(strcmp(string, "\n"));
return 0;
}// end of method main
OUTPUT:
Enter no more than 65 tokens:
i am very tired sadface
=i=
=am=
=very=
=tired=
=sadface
=
DESIRED OUTPUT
Enter no more than 65 tokens:
i am very tired sadface
=i=
=am=
=very=
=tired=
=sadface=
Since you are using strlen(), you can do this instead
size_t length = strlen(string);
// Check that `length > 0'
string[length - 1] = '\0';
Advantages:
This way you would call strlen() only once. Calling it multiple times for the same string is inefficient anyway.
You always remove the trailing '\n' from the input string, so your tokenization will work as expected.
Note: strlen() would never return a value < 0, because what it does is count the number of characters in the string, which is only 0 for "" and > 0 otherwise.
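One caveat with string[length - 1] = '\0' is that it drops the last character even when that character is not a newline (for example when the input ends at EOF without one), and it needs the length > 0 check. A common alternative is strcspn(), which returns the index of the first '\n' or the string length if there is none. A small sketch, with strip_newline as a made-up helper name:

```c
#include <string.h>

/* Cuts the string at the first '\n' if there is one; a string without a
   newline (including the empty string) is left unchanged. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

Calling strip_newline(string) right after fgets() and before strtok() would make the last token come out as =sadface= without touching the delimiter string.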
Well, you have two ways to do it. The simplest is to add a \n to the token delimiter string:
const char delimit[] = " \n";
(you don't need to use an array size if you are going to initialize a string array with a string literal)
so it eliminates the final \n that comes in with your input. Another way is to search for it on reading and eliminate it from the input string. You can use strtok(3) for this purpose also:
tokenizedString = strtok(string, "\n");
tokenizedString = strtok(tokenizedString, delimit);
I am using fgets() to read a line which contains integers values separated by spaces as follows:
while(fgets(str, sizeof(str), stdin) != NULL)
After reading the string into str, I am using strtok() to split it into tokens and then using the atoi() function to convert these tokens into integers.
token = strtok(str, s);
while( token != NULL) //If token is NULL then don't convert it to integer
int d = atoi(token);
The output of the first input is as expected.
Input-1:
5 1 0 3 4\n
Output-1:
d=5
d=1
d=0
d=3
d=4
Now the problem occurs when I give a space after string and hit enter.
Input-2:
5 1 0 3 4 \n
Output-2:
d=5
d=1
d=0
d=3
d=4
d=0
So now my questions are:
1. Will strtok() not return NULL when there are only spaces at the end?
2. How to differentiate between the two zeros that are coming in output?
3. How can I keep strtok() from reading that final space, or any number of spaces at the end?
The way you are calling the function is not correct. The delimiter string passed as the second parameter has to contain every character that separates your tokens.
token = strtok(str," \n\t"); //should use delimiter
while( token != NULL)
{
int d = atoi(token);
printf("%d\n",d);
token = strtok(NULL," \n\t");
}
Your problem is with the delimiter(s). One fix covers all your questions:
Add both space [ ] and newline [\n] to your delimiter string, and optionally tab [\t].
As per the man page of strtok()
char *strtok(char *str, const char *delim);
The delim argument specifies a set of bytes that delimit the tokens in the parsed string.
and
A sequence of two or more contiguous delimiter bytes in the parsed string is considered to be a single delimiter.
So, you can use
const char *s = " \n\t";
and then
token = strtok(str, s);
The signature of strtok is char *strtok(char *str, const char *delim);
A delimiter can be space [ ], newline [\n], comma [,], tab [\t]: any character that consistently separates the values in your string.
Delimiter characters at the start or end of the string are ignored by strtok.
You can use any number of delimiters. For your string you can use two:
1. space [ ]
2. \n
change:
1.token = strtok(str, " \n");
2.token = strtok(NULL," \n");
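On the question of telling a real 0 apart from a failed conversion: assuming the original delimiter was only a space, the trailing "\n" comes back as its own token and atoi("\n") returns 0, which looks the same as a genuine zero. strtol() reports where parsing stopped, so you can check whether a token was a complete number. A minimal sketch, where parse_ints is a hypothetical helper and tokens that are not complete integers are simply skipped:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parses whitespace-separated integers from `line` (modified in place)
   into `out`, storing at most `max` values. Tokens that are not complete
   integers are skipped rather than reported as 0. Returns the number of
   values stored. */
size_t parse_ints(char *line, int *out, size_t max)
{
    const char *delims = " \t\n";
    size_t n = 0;

    for (char *tok = strtok(line, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        char *end;
        errno = 0;
        long v = strtol(tok, &end, 10);
        /* end == tok: no digits; *end != '\0': trailing junk in the token */
        if (end == tok || *end != '\0' || errno == ERANGE ||
            v < INT_MIN || v > INT_MAX)
            continue;
        out[n++] = (int)v;
    }
    return n;
}
```

For the line "5 1 0 3 4 \n" this stores exactly five values, so the only 0 in the result is the one that was typed.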
I'm trying to split some strings on whitespace.
However, there is a problem with some of the splits: I want to split on whitespace but keep quoted sub-strings together.
example,
char *pch;
char str[] = "hello \"Stack Overflow\" good luck!";
pch = strtok(str," ");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok(NULL, " ");
}
This will give me
hello
"Stack
Overflow"
good
luck!
But what I want is
hello
Stack Overflow
good
luck!
Any suggestions or ideas, please?
You'll need to tokenize twice. The program flow you currently have is as follows:
1) Search for space
2) Print all characters prior to space
3) Search for next space
4) Print all characters between last space, and this one.
You'll need to start thinking in a different manner, with two layers of tokenization:
1) Search for quotation marks
2) On odd-numbered strings, perform your original program (search for spaces)
3) On even-numbered strings, print blindly
In this case, even-numbered strings are (ideally) within quotes. ab"cd"ef would result in ab being odd, cd being even, and so on.
The other side is remembering what you are actually looking for, which (as a regex) is either "[a-zA-Z0-9 \t\n]*" for a quoted string or [a-zA-Z0-9]+ for a bare word. The only difference between the two is whether the text is enclosed in quotes. So separate by quotes first, and identify from there.
Try altering your strategy.
Look at non-white space things, then when you find quoted string you can put it in one string value.
So, you need a function that examines the characters between white space. When you find '"' you can change the rules and hoover up everything until the matching '"'. If this function returns a token type and a value (the string matched), then whatever calls it can decide how to produce the correct output. At that point you have written a tokeniser; tools exist to generate them, and such tokenisers are called "lexers" because they are widely used to implement programming languages and config file parsers.
Assuming nextc() reads the next char from the string, starting at the position set by firstc(str):
for (firstc(str); (c = nextc()) != '\0';) {
if (isspace(c))
continue;
else if (c == '"')
return readQuote; /* Handle Quoted string */
else
return readWord; /* Terminated by space & '"' */
}
return EOS;
You'll need to define return values for EOS, QUOTE and WORD, and a way to get the text in each Quote or Word.
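A concrete version of that idea could look like the sketch below. It is one possible reading of the approach, not the answerer's own code: next_token and its cursor parameter are invented names, a quote is only treated as special at the start of a token, and an unterminated quote simply runs to the end of the string.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the next token starting at *cursor, or NULL at end of input.
   The input is modified in place: each token is terminated with '\0'.
   Text inside double quotes is returned as one token, without the quotes. */
char *next_token(char **cursor)
{
    char *p = *cursor;
    char *start;

    while (*p != '\0' && isspace((unsigned char)*p))
        p++;                         /* skip leading white space */
    if (*p == '\0') {
        *cursor = p;
        return NULL;                 /* nothing left */
    }
    if (*p == '"') {
        start = ++p;                 /* token begins after the opening quote */
        while (*p != '\0' && *p != '"')
            p++;
    } else {
        start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
    }
    if (*p != '\0')
        *p++ = '\0';                 /* terminate token, step past delimiter */
    *cursor = p;
    return start;
}
```

Starting with char *cur = str; and calling next_token(&cur) in a loop until it returns NULL yields hello, Stack Overflow, good and luck! for the string in the question.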
Here's code that works, in C.
The idea is that you first tokenize on the quote character, since that takes priority (if a string is inside quotes then we don't tokenize it, we just print it). Then, for each of the resulting strings, we tokenize on the space character, but only for every other string, because the strings alternate between being outside and inside the quotes. Note that this alternation assumes the input does not begin with a quote, since strtok_r skips a leading delimiter.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
int main() {
char *pch1, *pch2, *save_ptr1, *save_ptr2;
char str[] = "hello \"Stack Overflow\" good luck!";
pch1 = strtok_r(str,"\"", &save_ptr1);
bool in = false;
while (pch1 != NULL) {
if(in) {
printf ("%s\n", pch1);
pch1 = strtok_r(NULL, "\"", &save_ptr1);
in = false;
continue;
}
pch2 = strtok_r(pch1, " ", &save_ptr2);
while (pch2 != NULL) {
printf ("%s\n",pch2);
pch2 = strtok_r(NULL, " ", &save_ptr2);
}
pch1 = strtok_r(NULL, "\"", &save_ptr1);
in = true;
}
}
References
Tokenizing multiple strings simultaneously
http://linux.die.net/man/3/strtok_r
http://www.cplusplus.com/reference/cstring/strtok/