This question already has answers here:
strtok program crashing
(4 answers)
Closed 8 years ago.
I'm just trying to understand how the strtok() function works below:
#define _CRT_SECURE_NO_WARNINGS
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(){
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
char* q=strtok(p, ",");
printf("%s", q);
return 0;
}
I was trying to run this simple code but I keep getting a weird error that I can't understand at all.
This is the error I keep getting:
It's probably not relevant, but because my command prompt kept closing, I changed some settings in my project and added the option Console (/SUBSYSTEM:CONSOLE). I first tried to run the program without debugging, which didn't run it at all; then I ran it with debugging, and that's how I got the error in the link.
Instead of
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
use
char p[] = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
because, as per the man page of strtok(), it may modify the input string. In your code, the string literal is read-only. That's why it produces the segmentation fault.
Related quote:
Be cautious when using these functions. If you do use them, note that:
* These functions modify their first argument.
First, to fix the segmentation fault:
Change char *p to char p[].
The reason is that strtok() overwrites the delimiter with '\0' (the string terminator) when it finds the end of a token.
In your case the string literal that p points to in char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz"; is stored in a read-only section of the program.
Trying to modify a location that is read-only is what leads to the segmentation fault.
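To make that write visible, here is a minimal sketch. The helper name first_delimiter_overwritten is made up for illustration; it only assumes a writable buffer that contains a comma.

```c
#include <string.h>

/* Hypothetical helper: reports whether strtok() replaced the first ','
   in buf with '\0'. buf must point to writable storage (an array,
   not a string literal). */
int first_delimiter_overwritten(char *buf)
{
    char *comma = strchr(buf, ',');   /* remember where the first ',' sits */
    if (comma == NULL)
        return 0;                     /* nothing for strtok to overwrite */

    strtok(buf, ",");                 /* extract the first token */
    return *comma == '\0';            /* 1 if the delimiter is now a terminator */
}
```

Called on char p[] = "abc,def,ghi"; this returns 1, and p then reads as just "abc", because the byte that used to hold the comma is now the terminator.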
You may not use string literals with function strtok because the function tries to change the string passed to it as an argument.
Any attempt to change a string literal results in undefined behaviour.
According to the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
Change this definition
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
to
char s[] = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
//...
char* q = strtok( s, "," );
For example function main can look like
int main()
{
char s[] = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
char* q = strtok( s, "," );
while ( q != NULL )
{
puts( q );
q = strtok( NULL, "," );
}
return 0;
}
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz"; is assigning read-only memory to p. Really the type of p should be const char*.
Attempting to change that string in any way gives undefined behaviour. So using strtok on it is a very bad thing to do. To remedy, use
char p[]="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
That allocates the string on the stack and modifications on it are permitted. p decays to a pointer in certain instances; one of them being as a parameter to strtok.
Related
The code:
#include <string.h>
#include <libgen.h>
#include <stdio.h>
int main()
{
char *token = strtok(basename("this/is/a/path.geo.bin"), ".");
if (token != NULL){
printf( " %s\n", token );
}
return(0);
}
If I extract the filepath and put it into a char array it works perfectly. However, like this I'm getting a segfault.
It seems that basename simply returns a pointer into the string literal "this/is/a/path.geo.bin", or even tries to change it.
However string literals may not be changed. Any attempt to change a string literal results in undefined behaviour. And function strtok changes the string passed to it as an argument.
According to the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
Thus the program could look like
int main()
{
char name[] = "this/is/a/path.geo.bin";
char *token = strtok(basename( name ), ".");
if (token != NULL){
printf( " %s\n", token );
}
return(0);
}
From the documentation of basename():
char *basename(char *path);
Both dirname() and basename() may modify the contents of path, so it may be desirable to pass a copy when calling one of these functions.
These functions may return pointers to statically allocated memory which may be overwritten by subsequent calls. Alternatively, they may return a pointer to some part of path, so that the string referred to by path should not be modified or freed until the pointer returned by the function is no longer required.
In addition strtok() is known to modify the string passed to it:
Be cautious when using these functions. If you do use them, note that:
These functions modify their first argument.
These functions cannot be used on constant strings.
This is likely the source of your problem as string literals must not be modified:
If the program attempts to modify such an array, the behavior is undefined.
You should work around that.
For reference:
http://linux.die.net/man/3/basename
http://linux.die.net/man/3/strtok
What is the type of string literals in C and C++?
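One way to work around it, as a rough sketch: copy the path before calling basename(), and copy the result again before calling strtok(). The helper name basename_stem and the 256-byte limit are assumptions made for this example, not anything from the man pages.

```c
#include <libgen.h>
#include <string.h>

/* Hypothetical helper: stores the part of the file name before the
   first '.' in out. Returns 0 on success, -1 if the path is too long,
   there is no token, or out is too small. */
int basename_stem(const char *path, char *out, size_t out_size)
{
    char path_copy[256];              /* assumed upper bound for this sketch */
    char name[256];

    if (strlen(path) >= sizeof path_copy)
        return -1;
    strcpy(path_copy, path);          /* basename() may modify its argument */

    /* The result may point into path_copy or into static storage,
       so take a private copy before strtok() writes to it. */
    strcpy(name, basename(path_copy));

    char *token = strtok(name, ".");
    if (token == NULL || strlen(token) >= out_size)
        return -1;
    strcpy(out, token);
    return 0;
}
```

With the path from the question, out should end up holding "path".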
For some reason I get an exception at the first use of strtok().
What I am trying to accomplish is a function that simply checks whether a substring repeats itself inside a string, but so far I haven't gotten strtok to work.
int CheckDoubleInput(char* input){
char* word = NULL;
char cutBy[] = ",_";
word = strtok(input, cutBy); /* <--- error line */
/* walk through other tokens */
while (word != NULL)
{
printf(" %s\n", word);
word = strtok(NULL, cutBy);
}
return 1;
}
and the main calling the function:
CheckDoubleInput("asdlakm,_asdasd,_sdasd,asdas_sas");
CheckDoubleInput() itself is fine; the problem is the argument you pass to it. Compare the two calls below:
int main(){
char a[100] = "asdlakm,_asdasd,_sdasd,asdas_sas";
// This will lead to segmentation fault.
CheckDoubleInput("asdlakm,_asdasd,_sdasd,asdas_sas");
// This works ok.
CheckDoubleInput(a);
return 0;
}
The strtok function modifies its first argument (the string being parsed), so you can't pass it a pointer to a constant string. In your code, you are passing a pointer to a string literal of type char[N] (i.e. a compile-time constant string), so strtok tries to modify a string literal, which is undefined behaviour. You'll have to copy the string into a temporary buffer first.
char* copy = strdup("asdlakm,_asdasd,_sdasd,asdas_sas");
int result = CheckDoubleInput(copy);
free(copy);
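If strdup is not available (it comes from POSIX rather than from older ISO C standards), the same copy can be made with malloc and memcpy. A minimal sketch follows; count_tokens is a made-up name, and it counts tokens rather than printing them so that the result is easy to check.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in text without modifying it,
   by running strtok() on a private heap copy.
   Returns -1 if the allocation fails. */
int count_tokens(const char *text, const char *delims)
{
    size_t size = strlen(text) + 1;   /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, size);

    int count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims))
        count++;

    free(copy);
    return count;
}
```

Because the parameter is const char *, passing a string literal is safe here: only the copy is ever written to.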
Here is what the man page for strtok says:
Bugs
Be cautious when using these functions. If you do use them, note that:
These functions modify their first argument.
These functions cannot be used on constant strings.
The identity of the delimiting byte is lost.
The strtok() function uses a static buffer while parsing, so it's not thread safe. Use strtok_r() if this matters to you.
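As a rough illustration of strtok_r(), assuming a POSIX system: each loop keeps its own state pointer, so two tokenizations can be in progress at once. The function name count_fields and the ";" and "," separators are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* request the strtok_r() declaration */
#include <string.h>

/* Hypothetical helper: splits buf into records on ';' and each record
   into fields on ',', returning the total number of fields.
   buf must be writable. */
int count_fields(char *buf)
{
    char *outer_state = NULL;     /* state for the record loop */
    int fields = 0;

    for (char *rec = strtok_r(buf, ";", &outer_state); rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        char *inner_state = NULL; /* separate state for the field loop */
        for (char *f = strtok_r(rec, ",", &inner_state); f != NULL;
             f = strtok_r(NULL, ",", &inner_state))
            fields++;
    }
    return fields;
}
```

The same nesting with plain strtok() would not work, since the inner loop would overwrite the single hidden state that the outer loop depends on. Note that strtok_r() still modifies its first argument, so it does not help with string literals.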
Either input is somehow bad (try printing it before calling strtok), or you're using strtok from multiple threads with GCC. Some compilers provide a thread-safe version called strtok_r. Visual Studio has made the original function thread safe.
Your edited question shows that you're passing a string literal, which is read-only.
I'm trying to understand why s[0]='H' fails. I'm guessing this has something to do with the data segment in the process memory, but maybe someone can explain this better?
void str2 (void)
{
char *s = "hello";
printf("%s\n", s);
s[0] = 'H'; //maybe this is a problem because content in s is constant?
printf("%s\n", s);
}
int main()
{
str2();
return 0;
}
It's wrong because the C standard says that attempting to modify a string literal gives undefined behavior.
Exactly what will happen can and will vary. In some cases it'll "work" -- the content of the string literal will change to what you've asked (e.g., back in the MS-DOS days, it usually did). In other cases, the compiler will merge identical string literals, so something like:
char *a = "1234";
char *b = "1234";
a[1] = 'a';
printf("%s\n", b);
...would print out 1a34, even though you never explicitly modified b at all.
In still other cases (including most modern systems) you can expect the attempted write to fail completely and some sort of exception/signal to be thrown instead.
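For comparison, a sketch of a version whose behavior is well defined: declaring s as an array makes the compiler copy the literal into storage the function owns, so the write is allowed. capitalize_first is a hypothetical helper added for this example.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: upper-cases the first character in place.
   s must point to writable storage. */
void capitalize_first(char *s)
{
    if (s[0] != '\0')
        s[0] = (char)toupper((unsigned char)s[0]);
}

void str2(void)
{
    char s[] = "hello";   /* array initialised from the literal: writable */
    printf("%s\n", s);
    capitalize_first(s);  /* modifies the array, not the literal */
    printf("%s\n", s);
}
```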
You are trying to modify a string literal, which typically resides in implementation-defined read-only memory; doing so causes undefined behavior. Note that undefined behavior does not guarantee that your program crashes; it might show any behavior at all.
Good Read:
What is the difference between char a[] = "string"; and char *p = "string";?
I think a good or strict compiler should not allow this, since char *s = "hello" is effectively a pointer to constant data: modifying the contents leads to undefined behavior even if the compiler does not report any error.
Given the following code:
#include "stdafx.h"
#include "string.h"
static char *myStaticArray[] = {"HelloOne", "Two", "Three"};
int _tmain(int argc, _TCHAR* argv[])
{
char * p = strstr(myStaticArray[0],"One");
char hello[10];
memset(hello,0,sizeof(hello));
strncpy(hello,"Hello",6);
strncpy(p,"Hello",3); // Access Violation
return 0;
}
I'm getting an access violation at precisely the point when it attempts to write to the address of myStaticArray[0].
Why is this a problem?
Background: I'm porting old C++ to C# as primarily a C# developer, so please excuse my ignorance! This piece of code apparently wasn't an issue in the old build, so I'm confused...
char * p = strstr(myStaticArray[0],"One");
p points to a part of the string literal "HelloOne". You mustn't try to modify string literals, that's undefined behaviour.
Often, string literals are stored in a read-only part of the memory, so trying to write to them causes a segmentation fault/access violation.
static char *myStaticArray[] = {"HelloOne", "Two", "Three"};
The strings in the array are string literals and are non-modifiable in C and in C++.
strncpy(p,"Hello",3);
This function call attempts to modify a string literal.
Another issue is your use of the strncpy function which does not always null terminate the string. This is the case here because strlen("Hello") is greater than 3 (your last strncpy argument).
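When a terminated result is wanted, one common pattern is to reserve the last byte and write the terminator explicitly. A minimal sketch, with copy_truncated as a made-up helper name:

```c
#include <string.h>

/* Hypothetical helper: copies at most dst_size - 1 characters from src
   and always terminates dst. Does nothing if dst_size is 0. */
void copy_truncated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);  /* may leave dst unterminated... */
    dst[dst_size - 1] = '\0';         /* ...so terminate it explicitly */
}
```

With a 4-byte destination and "Hello" as the source, the result is the terminated string "Hel".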
If you want to be able to modify the strings then you'll need to allocate the character arrays like this
static char myStaticArray[][25] = {"HelloOne", "two", "three"};
The problem, as others stated, is that your declaration causes the compiler to create an array of 3 pointers to constant strings. The above declaration creates a two-dimensional character array, then copies the constant-string data into that memory.
I am trying to understand why the following snippet of code is giving a segmentation fault:
void tokenize(char* line)
{
char* cmd = strtok(line," ");
while (cmd != NULL)
{
printf ("%s\n",cmd);
cmd = strtok(NULL, " ");
}
}
int main(void)
{
tokenize("this is a test");
}
I know that strtok() does not actually tokenize on string literals, but in this case, line points directly to the string "this is a test" which is internally an array of char. Is there any way of tokenizing line without copying it into an array?
The problem is that you're attempting to modify a string literal. Doing so causes your program's behavior to be undefined.
Saying that you're not allowed to modify a string literal is an oversimplification. Saying that string literals are const is incorrect; they're not.
WARNING : Digression follows.
The string literal "this is a test" is an expression of type char[15] (14 for the length, plus 1 for the terminating '\0'). In most contexts, including this one, such an expression is implicitly converted to a pointer to the first element of the array, of type char*.
The behavior of attempting to modify the array referred to by a string literal is undefined -- not because it's const (it isn't), but because the C standard specifically says that it's undefined.
Some compilers might permit you to get away with this. Your code might actually modify the static array corresponding to the literal (which could cause great confusion later on).
Most modern compilers, though, will store the array in read-only memory -- not physical ROM, but in a region of memory that's protected from modification by the virtual memory system. The result of attempting to modify such memory is typically a segmentation fault and a program crash.
So why aren't string literals const? Since you really shouldn't try to modify them, it would certainly make sense -- and C++ does make string literals const. The reason is historical. The const keyword didn't exist before it was introduced by the 1989 ANSI C standard (though it was probably implemented by some compilers before that). So a pre-ANSI program might look like this:
#include <stdio.h>
print_string(s)
char *s;
{
printf("%s\n", s);
}
main()
{
print_string("Hello, world");
}
There was no way to enforce the fact that print_string isn't allowed to modify the string pointed to by s. Making string literals const in ANSI C would have broken existing code, which the ANSI C committee tried very hard to avoid doing. There hasn't been a good opportunity since then to make such a change to the language. (The designers of C++, mostly Bjarne Stroustrup, weren't as concerned about backward compatibility with C.)
There's a very good reason that trying to tokenize a compile-time constant string will cause a segmentation fault: the constant string is in read-only memory.
The C compiler bakes compile-time constant strings into the executable, and the operating system loads them into read-only memory (.rodata in a *nix ELF file). Since this memory is marked as read-only, and since strtok writes into the string that you pass into it, you get a segmentation fault for writing into read-only memory.
As you said, you can't modify a string literal, which is what strtok does. You have to do
char str[] = "this is a test";
tokenize(str);
This creates the array str and initialises it with this is a test\0, and passes a pointer to it to tokenize.
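If the copy really is unwanted, tokens can also be located without writing to the string at all, by measuring them with strspn() and strcspn() instead of calling strtok(). This is a sketch of that alternative, not what strtok itself does; print_tokens is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each token of line on its own line and
   returns the number of tokens. line is never written to, so a string
   literal is a valid argument. */
int print_tokens(const char *line, const char *delims)
{
    int count = 0;
    const char *p = line + strspn(line, delims);  /* skip leading delimiters */

    while (*p != '\0') {
        size_t len = strcspn(p, delims);          /* length of this token */
        printf("%.*s\n", (int)len, p);            /* print exactly len chars */
        count++;
        p += len;
        p += strspn(p, delims);                   /* skip delimiters after it */
    }
    return count;
}
```

The trade-off is that the tokens are not '\0'-terminated strings; each one is a pointer plus a length, which is why the sketch prints with "%.*s".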
strtok modifies its first argument in order to tokenize it. Hence you can't pass it a string literal: although its type in C is an array of char rather than const char, modifying it is undefined behaviour. You have to copy the string literal into a char array that can be modified.
What point are you trying to make by your "...is internally an array of char" remark?
The fact that "this is a test" is internally an array of char does not change anything at all. It is still a string literal (all string literals are non-modifiable arrays of char). Your strtok still tries to tokenize a string literal. This is why it crashes.
I'm sure you'll get beaten up about this... but "strtok()" is inherently unsafe and prone to things like access violations.
Here, the cause is almost certainly the use of a string constant.
Try this instead:
void tokenize(char* line)
{
char* cmd = strtok(line," ");
while (cmd != NULL)
{
printf ("%s\n",cmd);
cmd = strtok(NULL, " ");
}
}
int main(void)
{
char buff[80];
strcpy (buff, "this is a test");
tokenize(buff);
}
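I also had big trouble with this error, and I found a simple solution:
include <string.h>.
Without the header, the compiler may implicitly declare strtok as returning int, which can truncate the returned pointer on 64-bit systems and cause a segmentation fault.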
I just hit the Segmentation Fault error from trying to use printf to print the token (cmd in your case) after it became NULL.