The code:
#include <string.h>
#include <libgen.h>
#include <stdio.h>
int main()
{
char *token = strtok(basename("this/is/a/path.geo.bin"), ".");
if (token != NULL){
printf( " %s\n", token );
}
return(0);
}
If I extract the filepath and put it into a char array it works perfectly. However, like this I'm getting a segfault.
It seems that basename() simply returns a pointer into the string literal "this/is/a/path.geo.bin", or even tries to change it.
However, string literals must not be changed. Any attempt to modify a string literal results in undefined behaviour, and strtok() modifies the string passed to it as its first argument.
According to the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
Thus the program could look like
int main()
{
char name[] = "this/is/a/path.geo.bin";
char *token = strtok(basename( name ), ".");
if (token != NULL){
printf( " %s\n", token );
}
return(0);
}
From the documentation of basename():
char *basename(char *path);
Both dirname() and basename() may modify the contents of path, so it may be desirable to pass a copy when calling one of these functions.
These functions may return pointers to statically allocated memory which may be overwritten by subsequent calls. Alternatively, they may return a pointer to some part of path, so that the string referred to by path should not be modified or freed until the pointer returned by the function is no longer required.
In addition strtok() is known to modify the string passed to it:
Be cautious when using these functions. If you do use them, note that:
These functions modify their first argument.
These functions cannot be used on constant strings.
This is likely the source of your problem as string literals must not be modified:
If the program attempts to modify such an array, the behavior is undefined.
You should work around that.
For reference:
http://linux.die.net/man/3/basename
http://linux.die.net/man/3/strtok
What is the type of string literals in C and C++?
void func(){
int i;
char str[100];
strcat(str, "aa");
printf("%s\n", str);
}
int main(){
func();
func();
func();
return 0;
}
this code prints:
?#aa
?#aaaa
?#aaaaaa
I don't understand why the garbage value (?#) appears, or why "aa" keeps being appended. In theory, local variables should be destroyed when the function returns, but that doesn't seem to happen here.
There's nothing in the standard that says data on the stack needs to be "destroyed" when a function returns. Reading the uninitialized array is undefined behavior, which means anything can happen.
Some implementations may choose to write zeros to all bytes that were on the stack, some might choose to write random data there, and some may choose not to touch it at all.
On this particular implementation, it appears that it doesn't attempt to sanitize what was on the stack.
After the first call to func returns, the data that was on the stack is still physically there because it hasn't been overwritten yet. When you then call func again immediately after the first call, the str variable happens to reside in the same physical memory that it did before. And since no other function calls were made in the calling function, this data is unchanged from the first call.
If you were to call some other function in between the calls to func, then you'd most likely see different behavior, as str would contain data that was used by whatever other function was called last.
To use strcat() you need to have a valid string; an uninitialized array of char is not a valid string.
To make it a valid "empty" string, you need to do this
char str[100] = {0};
this creates an empty string because it only contains the null terminator.
Be careful when using strcat(). If you intend to just concatenate two valid C strings it's OK; if there are more than two, it's not the right function because of the way it works: every call rescans the destination from the start to find its terminating null.
Also, if you want your code to work, declare the array in main() instead and pass it to the function, like this
void
func(char *str)
{
strcat(str, "aa");
printf("%s\n", str);
}
int
main()
{
char str[100] = {0};
func(str);
func(str);
func(str);
return 0;
}
When you declare:
char str[100];
all you are doing is declaring a character array, but it's random garbage.
So you are effectively appending to garbage.
If you did this instead:
char str[100] = {0};
Then you will get different results, since the string will be empty to begin with. Whenever you get memory for a local variable in C (or from malloc), it will simply contain whatever was in that spot at the time unless you initialize it. You cannot assume it will be zeroed for you.
The strcat function concatenates two strings, so both arguments must represent strings. Moreover, the first argument is modified.
For a character array to represent a string, you can either initialize it with a string literal (possibly an empty one) or explicitly store a zero character in it, either when the array is initialized or afterwards.
For example
void func(){
char str1[100] = "";
char str2[100] = {'\0'};
char str3[100];
str3[0] ='\0';
//...
}
Take into account that objects with the automatic storage duration are not initialized implicitly. They have indeterminate values.
As for your question
Why local array value isn't destroyed in function?
the reason is that the memory occupied by the array was not overwritten by another function. In any case, it is undefined behavior.
I'm just trying to understand how the strtok() function works below:
#define _CRT_SECURE_NO_WARNINGS
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(){
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
char* q=strtok(p, ",");
printf("%s", q);
return 0;
}
I was trying to run this simple code but I keep getting a weird error that I can't understand at all.
This is the error I keep getting:
It's probably not relevant, but because my command prompt kept exiting I changed some settings in my project and added the line Console (/SUBSYSTEM:CONSOLE) somewhere. I first tried to run the program without debugging, which didn't even run the program; then I tried it with debugging, and that's how I got the error in the link.
Instead of
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
use
char p[] = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
because, as per the man page of strtok(), it may modify the input string. In your code, the string literal is read-only. That's why it produces the segmentation fault.
Related quote:
Be cautious when using these functions. If you do use them, note that:
* These functions modify their first argument.
First, to fix the segmentation fault, change char *p to char p[].
The reason is that strtok() writes a '\0' string terminator over the delimiter character when it finds one.
In your case, the string in char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz"; is stored in a read-only section of the program as a string literal.
Trying to modify a location that is read-only leads to a segmentation fault.
You may not use string literals with function strtok because the function tries to change the string passed to it as an argument.
Any attempt to change a string literal results in undefined behaviour.
According to the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
Change this definition
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
to
char s[] = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
//...
char* q = strtok( s, "," );
For example function main can look like
int main()
{
char s[] = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
char* q = strtok( s, "," );
while ( q != NULL )
{
puts( q );
q = strtok( NULL, "," );
}
return 0;
}
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz"; makes p point to read-only memory. Really the type of p should be const char*.
Attempting to change that string in any way gives undefined behaviour. So using strtok on it is a very bad thing to do. To remedy, use
char p[]="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
That allocates the string on the stack and modifications on it are permitted. p decays to a pointer in certain instances; one of them being as a parameter to strtok.
I wonder why the following code does not throw a segmentation fault when a string literal which is the result of dirname() is modified, but throws a segmentation fault when a string literal created in the usual way is modified:
#include <stdio.h>
#include <stdlib.h>
#include <libgen.h>
#define FILE_PATH "/usr/bin/screen"
int main(void)
{
char db_file[] = FILE_PATH;
char *filename = dirname(db_file);
/* no segfault here */
filename[1] = 'a';
/* segfault here */
char *p = "abc";
p[1] = 'z';
exit(0);
}
I know that it's UB to modify a string literal, so the output I get may be perfectly valid, but I wonder if this can be explained. Are string literals that are returned by functions treated differently by compilers? The same thing happens when I compile this code with Clang 3.0 on x86 and with gcc on x86 and ARM.
dirname() does not return a reference to a "string" literal, so it is fully legal to modify the data referenced by the pointer returned. Whether to do so makes sense or not is a different question, as the pointer returned may reference the char array passed to dirname().
However, if the OP's code had passed a string literal to dirname(), this would already have been illegal, as the POSIX specification explicitly states that the function may modify the array passed in.
From the POSIX specifications:
The dirname() function may modify the string pointed to by path.
From the manual
These functions may return pointers to statically allocated memory which may be overwritten by subsequent calls. Alternatively, they may return a pointer to some part of path, so that the string referred to by path should not be modified or freed until the pointer returned by the function is no longer required.
So it might return memory you can modify, and it might not. Depends on your system I guess.
On most platforms, the compiler places string literals in a read-only segment of the executable, and they are loaded as such at runtime.
This question explains things very nicely:
String literals: Where do they go?
For some reason I get an exception at the first use of strtok().
What I am trying to accomplish is a function that simply checks whether a substring repeats itself inside a string, but so far I haven't gotten strtok to work.
int CheckDoubleInput(char* input){
char* word = NULL;
char cutBy[] = ",_";
word = strtok(input, cutBy); /* <--- error line */
/* walk through other tokens */
while (word != NULL)
{
printf(" %s\n", word);
word = strtok(NULL, cutBy);
}
return 1;
}
and the main calling the function:
CheckDoubleInput("asdlakm,_asdasd,_sdasd,asdas_sas");
CheckDoubleInput() is ok. Look at this. Hope you will understand
int main(){
char a[100] = "asdlakm,_asdasd,_sdasd,asdas_sas";
// This will lead to segmentation fault.
CheckDoubleInput("asdlakm,_asdasd,_sdasd,asdas_sas");
// This works ok.
CheckDoubleInput(a);
return 0;
}
The strtok function modifies its first argument (the string being parsed), so you can't pass it a pointer to a constant string. In your code you are passing a pointer to a string literal of type char[N], and strtok then tries to modify that literal, which is undefined behaviour. You'll have to copy the string into a temporary buffer first.
char* copy = strdup("asdlakm,_asdasd,_sdasd,asdas_sas");
int result = CheckDoubleInput(copy);
free(copy);
Here is what the man page for strtok says:
Bugs
Be cautious when using these functions. If you do use them, note that:
These functions modify their first argument.
These functions cannot be used on constant strings.
The identity of the delimiting byte is lost.
The strtok() function uses a static buffer while parsing, so it's not thread safe. Use strtok_r() if this matters to you.
Either input is somehow bad (try printing it before calling strtok), or you're using strtok on multiple threads with GCC. POSIX systems provide a thread-safe version called strtok_r. Visual Studio's strtok keeps its parsing state per thread.
Your modified question shows that you're passing a string literal, which is read-only.
I am trying to understand why the following snippet of code is giving a segmentation fault:
void tokenize(char* line)
{
char* cmd = strtok(line," ");
while (cmd != NULL)
{
printf ("%s\n",cmd);
cmd = strtok(NULL, " ");
}
}
int main(void)
{
tokenize("this is a test");
}
I know that strtok() does not actually work on string literals, but in this case, line points directly to the string "this is a test", which is internally an array of char. Is there any way of tokenizing line without copying it into an array?
The problem is that you're attempting to modify a string literal. Doing so causes your program's behavior to be undefined.
Saying that you're not allowed to modify a string literal is an oversimplification. Saying that string literals are const is incorrect; they're not.
WARNING : Digression follows.
The string literal "this is a test" is an expression of type char[15] (14 for the length, plus 1 for the terminating '\0'). In most contexts, including this one, such an expression is implicitly converted to a pointer to the first element of the array, of type char*.
The behavior of attempting to modify the array referred to by a string literal is undefined -- not because it's const (it isn't), but because the C standard specifically says that it's undefined.
Some compilers might permit you to get away with this. Your code might actually modify the static array corresponding to the literal (which could cause great confusion later on).
Most modern compilers, though, will store the array in read-only memory -- not physical ROM, but in a region of memory that's protected from modification by the virtual memory system. The result of attempting to modify such memory is typically a segmentation fault and a program crash.
So why aren't string literals const? Since you really shouldn't try to modify them, it would certainly make sense -- and C++ does make string literals const. The reason is historical. The const keyword didn't exist before it was introduced by the 1989 ANSI C standard (though it was probably implemented by some compilers before that). So a pre-ANSI program might look like this:
#include <stdio.h>
print_string(s)
char *s;
{
printf("%s\n", s);
}
main()
{
print_string("Hello, world");
}
There was no way to enforce the fact that print_string isn't allowed to modify the string pointed to by s. Making string literals const in ANSI C would have broken existing code, which the ANSI C committee tried very hard to avoid doing. There hasn't been a good opportunity since then to make such a change to the language. (The designers of C++, mostly Bjarne Stroustrup, weren't as concerned about backward compatibility with C.)
There's a very good reason that trying to tokenize a compile-time constant string will cause a segmentation fault: the constant string is in read-only memory.
The C compiler bakes compile-time constant strings into the executable, and the operating system loads them into read-only memory (.rodata in a *nix ELF file). Since this memory is marked as read-only, and since strtok writes into the string that you pass into it, you get a segmentation fault for writing into read-only memory.
As you said, you can't modify a string literal, which is what strtok does. You have to do
char str[] = "this is a test";
tokenize(str);
This creates the array str and initialises it with this is a test\0, and passes a pointer to it to tokenize.
strtok modifies its first argument in order to tokenize it. Hence you can't pass it a string literal: although its type in C is char[N] rather than const char[N], modifying it is undefined behaviour. You have to copy the string literal into a char array that can be modified.
What point are you trying to make by your "...is internally an array of char" remark?
The fact that "this is a test" is internally an array of char does not change anything at all. It is still a string literal (all string literals are non-modifiable arrays of char). Your strtok still tries to tokenize a string literal. This is why it crashes.
I'm sure you'll get beaten up about this... but "strtok()" is inherently unsafe and prone to things like access violations.
Here, the cause is almost certainly the use of a string constant.
Try this instead:
void tokenize(char* line)
{
char* cmd = strtok(line," ");
while (cmd != NULL)
{
printf ("%s\n",cmd);
cmd = strtok(NULL, " ");
}
}
int main(void)
{
char buff[80];
strcpy (buff, "this is a test");
tokenize(buff);
}
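I also had big trouble with this error.
I found a simple solution: include <string.h>.
Without that declaration, older compilers assume that strtok() returns int, which can truncate the returned pointer on 64-bit systems and lead to a segmentation fault.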
I just hit the Segmentation Fault error from trying to use printf to print the token (cmd in your case) after it became NULL.