I am trying to tokenize a sentence that starts with spaces, but I get a segmentation fault. I want to skip the spaces and store the words. Is there an alternative built-in function to achieve this?
#include <string.h>
#include <stdio.h>
[...]
char *buff = " push me", *token;
token = strtok(buff, " \n");
while (token) {
printf("%s", token);
token = strtok(NULL, " \n");
}
[...]
Change it to this:
char buff[100], *token;
strcpy(buff," push me");
Previously, buff pointed to a string literal. Compilers typically place string literals in a read-only segment, and strtok writes to the string it tokenizes, so that write causes the segmentation fault. The code above instead allocates an array in read/write memory and copies the string into it.
buff points to a string literal that usually cannot be modified.
strspn and strcspn can be used to parse the sub-strings.
#include <stdio.h>
#include <string.h>
int main ( void) {
char *buff = " push me";
char *token = buff;
size_t span = 0;
if ( ! buff) {
return 1;
}
while ( *token) {
token += strspn ( token, " \r\n\t\f\v"); // count past whitespace
span = strcspn ( token, " \r\n\t\f\v"); // count non-whitespace
if ( span) {
printf ( "%.*s\n", (int)span, token); // precision field to print token
}
token += span; // advance token pointer
}
}
strtok(), like the others, is a "library function", not a "built-in".
Very intelligent people wrote these functions expecting they would be used in clean, elegant and terse ways. I believe this is what you want:
#include <stdio.h>
#include <string.h>
int main() {
char buf[] = " push me"; // don't point at a "string literal"
for( char *tok = buf; (tok = strtok( tok, " \n") ) != NULL; tok = NULL )
printf( "%s\n", tok ); // added '\n' for clarity
return 0;
}
Output
push
me
Related
I have a text file that contains multiple strings that are different lengths that I need to split into tokens.
Is it best to use strtok to split these strings and how can I count the tokens?
Example of strings from the file
Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231
Emma Watson#1169875#COMP336#COMP2421#COMP231#COMP338#CCOMP3351
Kevin Hart#1146542#COMP142#COMP242#COMP231#COMP336#COMP331#COMP334
George Clooney#1164561#COMP336#COMP2421#COMP231#COMP338#CCOMP3351
Matt Damon#1118764#COMP439#COMP4232#COMP422#COMP311#COMP338
Johnny Depp#1019876#COMP311#COMP242#COMP233#COMP3431#COMP333#COMP432
Generally, using strtok is a good solution to the problem:
#include <stdio.h>
#include <string.h>
int main( void )
{
char line[] =
"Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231";
char *p;
int num_tokens = 0;
p = strtok( line, "#" );
while ( p != NULL )
{
num_tokens++;
printf( "Token #%d: %s\n", num_tokens, p );
p = strtok( NULL, "#" );
}
}
This program has the following output:
Token #1: Emma Stone
Token #2: 1169876
Token #3: COMP242
Token #4: COMP333
Token #5: COMP336
Token #6: COMP133
Token #7: COMP231
However, one disadvantage of using strtok is that it is destructive in the sense that it modifies the string, by replacing the # delimiters with terminating null characters. If you do not want this, then you can use strchr instead:
#include <stdio.h>
#include <string.h>
int main( void )
{
const char *const line =
"Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231";
const char *p = line, *q;
int num_tokens = 1;
while ( ( q = strchr( p, '#' ) ) != NULL )
{
printf( "Token #%d: %.*s\n", num_tokens, (int)(q - p), p ); // precision must be an int
num_tokens++;
p = q + 1;
}
printf( "Token #%d: %s\n", num_tokens, p );
}
This program has identical output to the first program:
Token #1: Emma Stone
Token #2: 1169876
Token #3: COMP242
Token #4: COMP333
Token #5: COMP336
Token #6: COMP133
Token #7: COMP231
Another disadvantage of strtok is that it is not reentrant or thread-safe, whereas strchr is. However, some platforms provide a function strtok_r, which does not have these disadvantages. That function still has the disadvantage of being destructive, though.
Yes, you should use strtok to split these strings.
On "how can I count the tokens": you can simply add a counter inside the while loop and increment it by one in each iteration to get the total number of tokens.
#include <stdio.h>
#include <string.h>
int main(void) {
char string[] = "Hello world this is a simple string";
char *token = strtok(string, " ");
int count = 0;
while (token != NULL) {
count++;
token = strtok(NULL, " ");
}
printf("Total number of tokens = %d", count);
return 0;
}
strtok() is rarely the right tool for anything. In this case, it is unclear whether a sequence of ## is equivalent to a single # and whether a # appearing at the beginning or end of line is to be ignored...
strtok() makes strong assumptions for these cases that may not be the expected behavior.
Furthermore, strtok() modifies its string argument and uses a hidden static state that makes it unsafe in multithreaded programs and prone to programming errors in nested use cases. strtok_r(), where available, solves these issues but the semantics are still somewhat counterintuitive.
For your purpose, you must define precisely what is a token and a separator. If empty tokens are allowed, strtok() is definitely not a solution.
You can also write your own function to handle this quite trivial split:
char **split(char *str, char **argv, size_t *argc, const char delim)
{
*argc = 0;
if(str && *str) /* reject a NULL pointer and an empty string */
{
argv[0] = str;
*argc = 1;
while(*str)
{
if(*str == delim)
{
*str = 0;
str++;
if(*str)
{
argv[*argc] = str;
*argc += 1;
continue;
}
}
str++;
}
}
return argv;
}
int main(void)
{
char *argv[10];
size_t argc;
char str[] = "Emma Stone#1169876#COMP242#COMP333#COMP336#COMP133#COMP231";
split(str, argv, &argc, '#');
printf("Number of substrings: %zu\n", argc);
for(size_t i = 0; i < argc; i++)
printf("token [%2zu] = `%s`\n", i, argv[i]);
}
https://godbolt.org/z/b1aarnfWs
Remarks: like strtok, it requires str to be modifiable; str will be modified.
I get a segmentation fault when using char *s in main. If I use char s[100] or something like that, everything is OK. Why is that? The SIGSEGV appears when I call the find_short(char *s) function, on the line with the instruction char *token = strtok(s, delim);. This is my code:
#include <sys/types.h>
#include <string.h>
#include <limits.h>
#include <stdio.h>
int find_short(char *s)
{
int min = INT_MAX;
const char delim[2] = " ";
char *token = strtok(s, delim);
while(token != NULL) {
int len = (int)strlen(token);
if (min > len)
min = len;
token = strtok(NULL, delim);
}
return min;
}
int main()
{
char *s = "lel qwew dasdqew";
printf("%d",find_short(s));
return 0;
}
The line:
char *s = "lel qwew dasdqew";
creates a pointer to a constant string in memory.
Because that string is constant, you are unable to change its contents.
The strtok function will try to modify the contents by inserting \0 at the token-delimiter locations, and will fail because the string cannot be modified.
Changing the line to:
char s[] = "lel qwew dasdqew";
makes s a local array that you are free to change. strtok will now work because it can modify the array.
Your main mistake is that you selected the wrong function for the task. :)
I will come back to this below.
As for the current program: string literals in C are immutable, even though they do not have constant character array types. Any attempt to change a string literal results in undefined behavior, and strtok changes the string passed to it by inserting terminating zeros between substrings.
Instead of strtok, you should use the string functions strspn and strcspn. They do not change the passed argument, so with them you can also process string literals.
Here is a demonstrative program.
#include <stdio.h>
#include <string.h>
size_t find_short( const char *s )
{
const char *delim= " \t";
size_t shortest = 0;
while ( *s )
{
s += strspn( s, delim );
const char *p = s;
s += strcspn( s, delim );
size_t n = s - p;
if ( shortest == 0 || ( n && n < shortest ) ) shortest = n;
}
return shortest;
}
int main(void)
{
const char *s = "lel qwew dasdqew";
printf( "%zu", find_short( s ) );
return 0;
}
Its output is
3
I am trying to process a character string in order to change something in a file. I read from a file a character string which contains a command and an argument, separated by a space character. I split this array into tokens.
Now I want to pass the second token, which is the argument, to a function. My problem is that when I run my program, the screen freezes and nothing happens. Here is how I split the string and call the function.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void create_file(char *argument)
{
//some code goes here
}
int main()
{
int i = -1;
char *token[5];
char command[20];
const char delim[1] = " ";
FILE *fin;
fin = fopen("mbr.op", "r");
while(fscanf(fin, "%[^\n]", command) == 1)
{
i = -1;
token[++i] = strtok(command, delim);
while(token[i] != NULL)
token[++i] = strtok(NULL, delim);
if(strcmp(token[0], "CREATE_FILE") == 0)
create_file(token[1]);
}
fclose(fin);
return 0;
}
You have a few errors. First, command[20] is an uninitialised string, and that will cause undefined behaviour. Second, you check neither the first token nor the second before using them, so I added a test where commented. Also, the arrays are not long enough, so I removed the explicit lengths. Lastly, I test for a NULL pointer passed to the function.
Edit: code was added to the question to show that command[20] was initialised, but it is still too short to hold the command and a reasonable file name (thanks to #ameyCU).
#include <stdio.h>
#include <string.h>
void create_file(char *argument)
{
if(argument == NULL)
printf("NULL pointer\n");
else
printf("Arg: %s\n", argument);
}
int main(void)
{
int i = -1;
char *token[5];
char command[] = "CREATE_FILE myfile.txt";
const char delim[] = " ";
token[++i] = strtok(command, delim);
while(token[i] != NULL)
token[++i] = strtok(NULL, delim);
if(token[0] != NULL && strcmp(token[0], "CREATE_FILE") == 0) // added test
create_file(token[1]);
return 0;
}
Program output
Arg: myfile.txt
The first error is in the array definition:
const char delim[1] = " ";
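In C, " " is a string literal: an array of characters terminated by '\0'. This means that what stands to the right of "=" is an array of two chars:
// ' ' + '\0'
//0x20 0x00
With only one element, the terminating '\0' is dropped, so delim is not a valid string and strtok reads past its end. Therefore this should be an array of two chars: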
const char delim[2] = " ";
or
const char delim[] = " ";
I have to check if strings given by the user are correct expressions. The strings should look like this:
int1+int2+int3+int4+...
for example:
1+5+21
Is a correct expression, while 1+a is not.
How can I do that?
The problem I encountered is that I define strings like:
char *str;
str = (char*)malloc(1024*sizeof(char));
char **output = strtok(str, "+"); // error
So I get a segmentation fault when using the strtok function.
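Example of using strtok, in your case:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int function()
{
char* str = malloc(80);
if (str == NULL) /* allocation failed */
return 1;
strcpy(str,"1+5+21"); /* strtok needs an initialised, writable string */
const char s[2] = "+";
char *token;
token = strtok(str, s); /* get the first token (1) */
while( token != NULL ) /* walk through other tokens */
{
// characters manipulation for verification
token = strtok(NULL, s); /* advance to the next token, or NULL at the end */
}
free(str);
return(0);
}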
I have written the following code in C:
#include "stdio.h"
#include "string.h"
int main()
{
char str[] = "gatway=10.253.1.0,netmask=255.255.0.0,subnet=10.253.0.0,dns=10.253.0.203";
char name[100],value[100];
char *token1,*token2;
char *commasp = ", ";
char *equ="=";
token1 = strtok(str,commasp);
while(token1 != NULL)
{
token2 = strtok(token1,equ);
sprintf(name,"%s",token2);
token2 = strtok(NULL,commasp);
sprintf(value,"%s",token2);
printf("Name:%s Value:%s\n",name,value);
token1 = strtok(NULL,commasp);
}
return 0;
}
My problem is that I get only one line of output: Name:gatway Value:10.253.1.0. I know that the last strtok() in the while loop continues from the previous strtok() call on the sub-token, which has already reached the end, so token1 becomes NULL and the loop ends. One solution I have thought of is to not use strtok() inside the while loop for the sub-tokens (getting name and value) and to extract them another way, but that seems to need lengthy code (a for or while loop for character matching). Does anyone have a better solution that keeps the code in a single loop?
You could use strtok_r instead of strtok.
char *key_value;
char *key_value_s;
key_value = strtok_r(str, ",", &key_value_s);
while (key_value) {
char *key, *value, *s;
key = strtok_r(key_value, "=", &s);
value = strtok_r(NULL, "=", &s);
printf("%s equals %s\n", key, value);
key_value = strtok_r(NULL, ",", &key_value_s);
}
gatway equals 10.253.1.0
netmask equals 255.255.0.0
subnet equals 10.253.0.0
dns equals 10.253.0.203
Frankly, though, I think it would be easier to just look for a , and, when you find one, look backwards for the =.
You can do this in two steps, first parse the main string:
#include <stdio.h>
#include <string.h>
int main()
{
char str[] = "gatway=10.253.1.0,netmask=255.255.0.0,subnet=10.253.0.0,dns=10.253.0.203";
char name[100],value[100];
char *commasp = ", ";
char *ptr[256], **t = ptr, *s = str;
*t = strtok(str, commasp);
while (*t) {
t++;
*t = strtok(0, commasp);
}
for (t = ptr; *t; t++) {
printf("%s\n", *t);
// now do strtok for '=' ...
}
return 0;
}
Then parse individual pairs as before.
The above results in:
gatway=10.253.1.0
netmask=255.255.0.0
subnet=10.253.0.0
dns=10.253.0.203