Splitting a string and counting tokens in C

I have a text file that contains multiple strings of different lengths that I need to split into tokens.
Is it best to use strtok to split these strings, and how can I count the tokens?
Example of strings from the file:
Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231
Emma Watson#1169875#COMP336#COMP2421#COMP231#COMP338#CCOMP3351
Kevin Hart#1146542#COMP142#COMP242#COMP231#COMP336#COMP331#COMP334
George Clooney#1164561#COMP336#COMP2421#COMP231#COMP338#CCOMP3351
Matt Damon#1118764#COMP439#COMP4232#COMP422#COMP311#COMP338
Johnny Depp#1019876#COMP311#COMP242#COMP233#COMP3431#COMP333#COMP432

Generally, using strtok is a good solution to the problem:
#include <stdio.h>
#include <string.h>
int main( void )
{
char line[] =
"Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231";
char *p;
int num_tokens = 0;
p = strtok( line, "#" );
while ( p != NULL )
{
num_tokens++;
printf( "Token #%d: %s\n", num_tokens, p );
p = strtok( NULL, "#" );
}
}
This program has the following output:
Token #1: Emma Stone
Token #2: 1169876
Token #3: COMP242
Token #4: COMP333
Token #5: COMP336
Token #6: COMP133
Token #7: COMP231
However, one disadvantage of using strtok is that it is destructive in the sense that it modifies the string, by replacing the # delimiters with terminating null characters. If you do not want this, then you can use strchr instead:
#include <stdio.h>
#include <string.h>
int main( void )
{
const char *const line =
"Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231";
const char *p = line, *q;
int num_tokens = 1;
while ( ( q = strchr( p, '#' ) ) != NULL )
{
printf( "Token #%d: %.*s\n", num_tokens, (int)(q - p), p ); /* the .* precision must be an int */
num_tokens++;
p = q + 1;
}
printf( "Token #%d: %s\n", num_tokens, p );
}
This program has identical output to the first program:
Token #1: Emma Stone
Token #2: 1169876
Token #3: COMP242
Token #4: COMP333
Token #5: COMP336
Token #6: COMP133
Token #7: COMP231
Another disadvantage of strtok is that it is not reentrant or thread-safe, whereas strchr is. However, some platforms provide a function strtok_r, which does not have these disadvantages. But that function still has the disadvantage of being destructive.

Yes, you should use strtok to split these strings.
On "how can I count the tokens":
You can simply add a counter inside the while loop and increment it by one in each iteration to get the total number of tokens.
#include <stdio.h>
#include <string.h>
int main(void) {
char string[] = "Hello world this is a simple string";
char *token = strtok(string, " ");
int count = 0;
while (token != NULL) {
count++;
token = strtok(NULL, " ");
}
printf("Total number of tokens = %d", count);
return 0;
}

strtok() is rarely the right tool for anything. In this case, it is unclear whether a sequence of ## is equivalent to a single # and whether a # appearing at the beginning or end of line is to be ignored...
strtok() makes strong assumptions for these cases that may not be the expected behavior.
Furthermore, strtok() modifies its string argument and uses a hidden static state that makes it unsafe in multithreaded programs and prone to programming errors in nested use cases. strtok_r(), where available, solves these issues but the semantics are still somewhat counterintuitive.
For your purpose, you must define precisely what is a token and a separator. If empty tokens are allowed, strtok() is definitely not a solution.

You can also write your own function to handle this fairly trivial split:
#include <stdio.h>
char **split(char *str, char **argv, size_t *argc, const char delim)
{
*argc = 0;
if(str && *str) /* reject a NULL pointer and an empty string */
{
argv[0] = str;
*argc = 1;
while(*str)
{
if(*str == delim)
{
*str = 0;
str++;
if(*str)
{
argv[*argc] = str;
*argc += 1;
continue;
}
}
str++;
}
}
return argv;
}
int main(void)
{
char *argv[10];
size_t argc;
char str[] = "Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231";
split(str, argv, &argc, '#');
printf("Number of substrings: %zu\n", argc);
for(size_t i = 0; i < argc; i++)
printf("token [%2zu] = `%s`\n", i, argv[i]);
}
https://godbolt.org/z/b1aarnfWs
Remarks: as with strtok, str must be modifiable, because it will be modified. The function does no bounds checking, so the caller must make sure argv has room for every token.

Related

How can I tokenize a sentence starting with spaces in C

I am trying to tokenize a sentence that starts with spaces but I get segmentation fault. I am trying to skip the spaces and store the words. Is there an alternative built-in function to achieve this?
#include <string.h>
#include <stdio.h>
[...]
char *buff = " push me", *token;
token = strtok(buff, " \n");
while (token) {
printf("%s", token);
token = strtok(NULL, " \n");
}
[...]
Change it to this:
char buff[100], *token;
strcpy(buff," push me");
Before you were pointing to a string constant with the buff variable. The compiler will put this in a segment that is read only. If you try to write to that segment you get a segmentation fault. So in the code above we allocate space in read/write memory to store the string and then copy it in.
buff points to a string literal that usually cannot be modified.
strspn and strcspn can be used to parse the sub-strings.
#include <stdio.h>
#include <string.h>
int main ( void) {
char *buff = " push me";
char *token = buff;
size_t span = 0;
if ( ! buff) {
return 1;
}
while ( *token) {
token += strspn ( token, " \r\n\t\f\v"); // count past whitespace
span = strcspn ( token, " \r\n\t\f\v"); // count non-whitespace
if ( span) {
printf ( "%.*s\n", (int)span, token); // precision field to print token
}
token += span; // advance token pointer
}
}
strtok(), like the others, is a "library function", not a "built-in".
Very intelligent people wrote these functions expecting they would be used in clean, elegant and terse ways. I believe this is what you want:
#include <stdio.h>
#include <string.h>
int main() {
char buf[] = " push me"; // don't point at a "string literal"
for( char *tok = buf; (tok = strtok( tok, " \n") ) != NULL; tok = NULL )
printf( "%s\n", tok ); // added '\n' for clarity
return 0;
}
Output
push
me

Taking only a part of a text to an array [duplicate]

How do I split a string into tokens and then save them in an array?
Specifically, I have a string "abc/qwe/jkh". I want to split it at "/", and then save the tokens into an array.
Output will be such that
array[0] = "abc"
array[1] = "qwe"
array[2] = "jkh"
please help me
#include <stdio.h>
#include <string.h>
int main ()
{
char buf[] ="abc/qwe/ccd";
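int i = 0, n = 0;
char *p = strtok (buf, "/");
char *array[3];
while (p != NULL && n < 3) /* never store more tokens than array can hold */
{
array[n++] = p;
p = strtok (NULL, "/");
}
for (i = 0; i < n; ++i) /* print only the tokens that were actually found */
printf("%s\n", array[i]);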
return 0;
}
You can use strtok()
char string[] = "abc/qwe/jkh";
char *array[10];
int i = 0;
array[i] = strtok(string, "/");
while(array[i] != NULL)
array[++i] = strtok(NULL, "/");
Why strtok() is a bad idea
Do not use strtok() in normal code. strtok() uses static variables, which causes some problems. There are some use cases on embedded microcontrollers where static variables make sense, but avoid them in most other cases. strtok() behaves unexpectedly when more than one thread uses it, when it is used in an interrupt, or in any other circumstances where more than one input is processed between successive calls to strtok().
Consider this example:
#include <stdio.h>
#include <string.h>
//Splits the input by the / character and prints the content in between
//the / character. The input string will be changed
void printContent(char *input)
{
char *p = strtok(input, "/");
while(p)
{
printf("%s, ",p);
p = strtok(NULL, "/");
}
}
int main(void)
{
char buffer[] = "abc/def/ghi:ABC/DEF/GHI";
char *p = strtok(buffer, ":");
while(p)
{
printContent(p);
puts(""); //print newline
p = strtok(NULL, ":");
}
return 0;
}
You may expect the output:
abc, def, ghi,
ABC, DEF, GHI,
But you will get
abc, def, ghi,
This is because the strtok() calls in printContent() reset the internal state that strtok() set up in main(). After printContent() returns, that state points at the end of the first substring, so the next call to strtok() in main() returns NULL.
What you should do instead
You could use strtok_r() on a POSIX system; this version does not need static variables. If your library does not provide strtok_r(), you can write your own version of it. This should not be hard, and Stack Overflow is not a coding service, so you can write it on your own.

String and String array Manipulation in c

I'm trying to write a string splitter function in C. It uses a space as the delimiter to split a given string into two or more pieces, much like the split function in Python. Here is the code:
#include <stdio.h>
#include <string.h>
void slice_input (char *t,char **out)
{
char *x,temp[10];
int i,j;
x = t;
j=0;
i=0;
for (;*x!='\0';x++){
if (*x!=' '){
temp[i] = *x;
i++;
}else if(*x==' '){
out[j] = temp;
j++;i=0;
}
}
}
int main()
{
char *out[2];
char inp[] = "HEllo World ";
slice_input(inp,out);
printf("%s\n%s",out[0],out[1]);
//printf("%d",strlen(out[1]));
return 0;
}
Expected output:
HEllo
World
But it is showing:
World
World
Can you help please?
out[j] = temp;
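Here temp is a local variable. Every out[j] is set to the address of that same buffer, which is why both entries show the last word stored in it. temp also ceases to exist as soon as your function returns, so out[j] is then a dangling pointer, and accessing it invokes undefined behavior.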
A simple fix would be to use a 2D array for out, and use strcpy() to copy the temp string to out[j], like this:
#include <stdio.h>
#include <string.h>
void slice_input(char *t, char out[2][10]) {
char *x, temp[10];
int i,j;
x = t;
j=0;
i=0;
for (;*x!='\0';x++) {
if (*x!=' ') {
temp[i] = *x;
i++;
} else if(*x==' ') {
temp[i] = '\0'; /* terminate the word before copying it */
strcpy(out[j], temp);
j++;
i=0;
}
}
}
int main()
{
char out[2][10];
char inp[] = "HEllo World ";
slice_input(inp,out);
printf("%s\n%s",out[0],out[1]);
return 0;
}
Output:
HEllo
World
http://www.cplusplus.com/reference/clibrary/cstring/strtok/
From the website:
char * strtok ( char * str, const char * delimiters ); On a first
call, the function expects a C string as argument for str, whose first
character is used as the starting location to scan for tokens. In
subsequent calls, the function expects a null pointer and uses the
position right after the end of last token as the new starting
location for scanning.
Once the terminating null character of str is found in a call to
strtok, all subsequent calls to this function (with a null pointer as
the first argument) return a null pointer.
Parameters
str C string to truncate. Notice that this string is modified by being
broken into smaller strings (tokens). Alternativelly [sic], a null
pointer may be specified, in which case the function continues
scanning where a previous successful call to the function ended.
delimiters C string containing the delimiter characters. These may
vary from one call to another.
Return Value
A pointer to the last token found in string. A null pointer is
returned if there are no tokens left to retrieve.
Example
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
You can use this function to split a string into tokens; there is no need to write your own. Your code looks like garbage, please format it.
The source of strtok probably looks something like this:
char *
strtok(s, delim)
char *s; /* string to search for tokens */
const char *delim; /* delimiting characters */
{
static char *lasts;
register int ch;
if (s == 0)
s = lasts;
do {
if ((ch = *s++) == '\0')
return 0;
} while (strchr(delim, ch));
--s;
lasts = s + strcspn(s, delim);
if (*lasts != 0)
*lasts++ = 0;
return s;
}

How to obtain a single word from a string in C?

In order to complete a program I am working on, I have to be able to put pieces of a string into a stack for later use. For example, say I had this string:
"22 15 - 2 +"
Ideally, I first want to extract 22 from the string, place it in a separate, temporary string, and then manipulate it as I would like. Here is the code that I'm using which I think would work, but it is very over-complicated.
void evaluatePostfix(char *exp){
stack *s = initStack();
char *temp_str;
char temp;
int temp_len, val, a, b, i=0, j;
int len = strlen(exp);
while(len > 0){
temp_str = malloc(sizeof(char)); //holds the string i am extracting
j=0; //first index in temp_str
temp = exp[i]; //current value in exp, incremented later on the function
temp_len = 1; //for reallocation purposes
while(!isspace(temp)){ //if a white space is hit, the full value is already scanned
if(ispunct(temp)) //punctuation will always be by itself
break; //break if it is encountered
temp_str = (char*)realloc(temp_str, temp_len+1); //or else reallocate the string to hold the new character
temp_str[j] = temp; //copy the character to the string
temp_len++; //increment for the length of temp_str
i++; //advance one value in exp
j++; //advance one value in temp_str
len--; //the number of characters left to scan is one less
temp = exp[i]; //prepare for the next loop
} //and so on, and so on...
} //more actions follow this, but are excluded
}
Like I said, overcomplicated. Is there a simpler way for me to extract this code? I can reliably depend upon there being white space between the values and characters I need to extract.
If you are allowed to use library functions, strtok is made for this:
#include <string.h>
#include <stdio.h>
int main()
{
char str[80] = "22 15 - 2 +";
const char s[2] = " ";
char *token;
/* get the first token */
token = strtok(str, s);
/* walk through other tokens */
while( token != NULL )
{
printf( " %s\n", token );
token = strtok(NULL, s);
}
return(0);
}
The limitation of strtok(char *str, const char *delim) is that it cannot work on multiple strings simultaneously, because it keeps a static pointer to the position it has parsed up to (so it is sufficient only when working on one string at a time). A better and safer method is to use strtok_r(char *str, const char *delim, char **saveptr), which takes an explicit third pointer in which to save that position.
#include <string.h>
#include <stdio.h>
int main()
{
char str[80] = "22 15 - 2 +";
const char s[2] = " ";
char *token, *saveptr;
/* get the first token */
token = strtok_r(str, s, &saveptr);
/* walk through other tokens */
while( token != NULL )
{
printf( " %s\n", token );
token = strtok_r(NULL, s, &saveptr);
}
return(0);
}
Take a look at the strtok function, I think it's what you're looking for.

Issue extracting tokens from a sub-token using the strtok() function in C

I have written the following code in C:
#include <stdio.h>
#include <string.h>
int main()
{
char str[] = "gatway=10.253.1.0,netmask=255.255.0.0,subnet=10.253.0.0,dns=10.253.0.203";
char name[100],value[100];
char *token1,*token2;
char *commasp = ", ";
char *equ="=";
token1 = strtok(str,commasp);
while(token1 != NULL)
{
token2 = strtok(token1,equ);
sprintf(name,"%s",token2);
token2 = strtok(NULL,commasp);
sprintf(value,"%s",token2);
printf("Name:%s Value:%s\n",name,value);
token1 = strtok(NULL,commasp);
}
return 0;
}
My problem is that I get only one line of output: Name:gatway Value:10.253.1.0. I know that the last strtok() in the while loop continues from the previous strtok() call on the sub-token, which has already reached its end, so token1 becomes NULL and the loop exits. One solution I have thought of is not to use strtok() inside the while loop for the sub-tokens (the name and value) and to extract them some other way, but that seems to need lengthy code (a for or while loop that matches characters). Does anyone have a better solution that keeps the code in a single loop?
You could use strtok_r instead of strtok.
char *key_value;
char *key_value_s;
key_value = strtok_r(str, ",", &key_value_s);
while (key_value) {
char *key, *value, *s;
key = strtok_r(key_value, "=", &s);
value = strtok_r(NULL, "=", &s);
printf("%s equals %s\n", key, value);
key_value = strtok_r(NULL, ",", &key_value_s);
}
gatway equals 10.253.1.0
netmask equals 255.255.0.0
subnet equals 10.253.0.0
dns equals 10.253.0.203
Frankly though I think it would be easier to just look for , and when you find one look for = backwards.
You can do this in two steps, first parse the main string:
#include <stdio.h>
#include <string.h>
int main()
{
char str[] = "gatway=10.253.1.0,netmask=255.255.0.0,subnet=10.253.0.0,dns=10.253.0.203";
char name[100],value[100];
char *commasp = ", ";
char *ptr[256], **t = ptr, *s = str;
*t = strtok(str, commasp);
while (*t) {
t++;
*t = strtok(0, commasp);
}
for (t = ptr; *t; t++) {
printf("%s\n", *t);
// now do strtok for '=' ...
}
return 0;
}
Then parse individual pairs as before.
The above results in:
gatway=10.253.1.0
netmask=255.255.0.0
subnet=10.253.0.0
dns=10.253.0.203

Resources