Given the following code:
#include "stdafx.h"
#include "string.h"
static char *myStaticArray[] = {"HelloOne", "Two", "Three"};
int _tmain(int argc, _TCHAR* argv[])
{
char * p = strstr(myStaticArray[0],"One");
char hello[10];
memset(hello,0,sizeof(hello));
strncpy(hello,"Hello",6);
strncpy(p,"Hello",3); // Access Violation
return 0;
}
I'm getting an access violation at precisely the point when it attempts to write to the address of myStaticArray[0].
Why is this a problem?
Background: I'm porting old C++ to C# as primarily a C# developer, so please excuse my ignorance! This piece of code apparently wasn't an issue in the old build, so I'm confused...
char * p = strstr(myStaticArray[0],"One");
p points to a part of the string literal "HelloOne". You mustn't try to modify string literals, that's undefined behaviour.
Often, string literals are stored in a read-only part of the memory, so trying to write to them causes a segmentation fault/access violation.
static char *myStaticArray[] = {"HelloOne", "Two", "Three"};
The strings in the array are string literals and are non-modifiable in C and in C++.
strncpy(p,"Hello",3);
This function call attempts to modify a string literal.
Another issue is your use of the strncpy function which does not always null terminate the string. This is the case here because strlen("Hello") is greater than 3 (your last strncpy argument).
If you want to be able to modify the strings then you'll need to allocate the character arrays like this
static char myStaticArray[][25] = {"HelloOne", "two", "three"};
The problem, as others have stated, is that your declaration makes the compiler create an array of three pointers to constant strings. The declaration above instead creates a two-dimensional character array and copies the string data into that memory.
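As a rough sketch of the point above, the following uses the two-dimensional array so that the write lands in memory the program owns. The names my_static_array, overwrite_in_row0 and row0 are made up for this illustration and are not from the original code; the row width of 25 is taken from the declaration above.

```c
#include <string.h>

/* Writable storage: each row is a 25-byte array initialised from a literal. */
static char my_static_array[][25] = {"HelloOne", "Two", "Three"};

/* Hypothetical helper: overwrite the first occurrence of `needle` in row 0
 * with at most strlen(needle) bytes of `replacement`.
 * Returns 1 if `needle` was found, 0 otherwise. */
int overwrite_in_row0(const char *needle, const char *replacement)
{
    char *p = strstr(my_static_array[0], needle);
    if (p == NULL)
        return 0;

    size_t n = strlen(needle);
    size_t r = strlen(replacement);
    /* Never write past the matched text, so the terminator stays in place. */
    memcpy(p, replacement, r < n ? r : n);
    return 1;
}

/* Hypothetical accessor so callers can read the row back. */
const char *row0(void)
{
    return my_static_array[0];
}
```

Calling overwrite_in_row0("One", "Hello") mirrors the original strncpy(p, "Hello", 3): only three bytes are copied, but this time into a writable array rather than into a literal.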
Related
I was reading up on common C pitfalls and came up to this article on some famous Uni website. (It is the 2nd link that comes up on google).
The last example on that page is,
// Memory allocation on the stack
void b(char **p) {
char * str="print this string";
*p = str;
}
int main(void) {
char * s;
b(&s);
s[0]='j'; //crash, since the memory for str is allocated on the stack,
//and the call to b has already returned, the memory pointed to by str
//is no longer valid.
return 0;
}
That explanation in the comment got me thinking: isn't the memory for string literals static?
Isn't the actual error there then that you are not supposed to modify string literals, because it is undefined behavior? Or are the comments there correct and my understanding of that example is wrong?
Upon searching further, I saw this question: referencing a char that went out of scope and I understood from that question that, the following is valid code.
#include <stdio.h>
char* a = NULL;
int main() {
{
char* b = "stackoverflow";
a = b;
}
puts(a);
}
Also this question agrees with the other stackoverflow question and my thinking, but opposes the comment from that website's code.
To test it, I tried the following,
#include <stdio.h>
#include <malloc.h>
void b(char **p)
{
char * str = "print this string";
*p = str;
}
int main(void)
{
char * s;
b(&s);
// s[0]='j'; //crash, since the memory for str is allocated on the stack,
//and the call to b has already returned, the memory pointed to by str is no longer valid.
printf("%s \n", s);
return 0;
}
which as expected does not give a segmentation fault.
The standard says (emphasis mine):
6.4.5 String literals
[...] The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. [...]
[...] If the program attempts to
modify such an array, the behavior is undefined. [...]
No, you misunderstand the reason for crash. String literals have static duration, meaning that they exist for the lifetime of the program. Since your pointer points to the literal, you can use it anytime.
The reason for the crash is that string literals are read-only. In fact, char* x = "" is ill-formed in modern C++ (C++11 and later); it should be const char* x = "". String literals are read-only from the language's perspective, and any attempt to modify them leads to undefined behavior.
In practical terms, they are often put in the read-only segment, so any attempt at modification triggers a GPF - general protection fault. Usual response to GPF is a program termination - and this is what you are witnessing with your application.
String literals are generally placed in a read-only section of the executable (.rodata in an ELF file, with equivalents in the Windows and macOS executable formats). Under Linux, Windows and macOS they end up in a memory region that generates a fault when written to; the OS configures this through the MMU or MPU when it loads the program.
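To make the distinction concrete, here is a minimal sketch: a pointer to a literal stays valid for reading after the function returns, and a heap copy is what you modify. The helper names literal_text and writable_copy are invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* The literal has static storage duration, so the returned pointer stays
 * valid for the whole program. It must only be read, hence const. */
const char *literal_text(void)
{
    return "print this string";
}

/* Hypothetical helper: returns a heap-allocated, writable copy of `src`,
 * or NULL if allocation fails. The caller must free() the result. */
char *writable_copy(const char *src)
{
    size_t n = strlen(src) + 1;   /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, src, n);
    return copy;
}
```

With this, s = writable_copy(literal_text()); s[0] = 'j'; is well defined, whereas writing through the pointer returned by literal_text() would not be.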
I have written the following replace function which replaces substring inside a big string as follows:
void replace(char *str1, char *str2, int start, int end)
{
int i,j=0;
for(i=start; i<end; i++, j++)
*(str1+i)=*(str2+j);
}
It works fine when I put a string as replace("Move a mountain", "card", 0,4), but when I declare a string using pointer array like char *list[1]={"Move a mountain"} and pass it to the function as replace(list[0], "card",0,4), it gives me a segmentation fault.
Not able to figure it out. Can anybody please explain this to me?
The code of function replace looks fine, yet all the ways you call it introduce undefined behaviour:
First, with replace("Move a mountain", "card", 0,4), you are passing a string literal as argument for str1, which is modified in replace then. Modifying a string literal is undefined behaviour, and if it "works", it is just luck (more bad luck than good luck, actually).
Second, char *list[1]={"Move a mountain"} is similar: list is an array of pointers, and you initialize list[0] to point to the string literal "Move a mountain". So passing list[0] again leads to UB due to modifying a string literal. If you were to pass list[1] instead, that would be out of bounds and would introduce undefined behaviour as well. Again, anything can happen, a segfault being just one possibility.
Write
char list[] = "Move a mountain"; // copies string literal into a new and modifiable array
replace(list, "card",0,4);
and it should work better.
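For illustration only, here is one way the function could be hardened so that it never writes past the end of either string. The name replace_range and the choice to return the number of bytes written are assumptions of this sketch, not part of the original code.

```c
#include <string.h>

/* Hypothetical variant of replace(): copies characters from `src` into
 * dst[start..end), stopping early at the end of either string.
 * `dst` must point to a writable array, never to a string literal.
 * Returns the number of bytes written. */
size_t replace_range(char *dst, const char *src, size_t start, size_t end)
{
    size_t dst_len = strlen(dst);
    size_t written = 0;

    if (end > dst_len)
        end = dst_len;            /* keep the terminator intact */

    for (size_t i = start; i < end && src[written] != '\0'; i++, written++)
        dst[i] = src[written];

    return written;
}
```

Called as replace_range(list, "card", 0, 4) on char list[] = "Move a mountain";, it overwrites the first four characters in place.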
I'm just trying to understand how the strtok() function works below:
#define _CRT_SECURE_NO_WARNINGS
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(){
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
char* q=strtok(p, ",");
printf("%s", q);
return 0;
}
I was trying to run this simple code but I keep getting a weird error that I can't understand at all.
This is the error I keep getting:
It's probably not relevant, but because my command prompt kept exiting I changed some settings in my project and added the line Console (/SUBSYSTEM:CONSOLE) somewhere. I first tried to run the program without debugging, which didn't run it at all; then I tried it with debugging, and that's how I got the error in the link.
Instead of
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
use
char p[] = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
because, as per the man page of strtok(), it may modify the input string. In your code, the string literal is read-only. That's why it produces the segmentation fault.
Related quote:
Be cautious when using these functions. If you do use them, note that:
* These functions modify their first argument.
First, to fix the segmentation fault, change char *p to char p[].
The reason is that strtok() overwrites each delimiter it finds with '\0' in order to terminate the token.
In your case, the string in char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz"; is a string literal stored in a read-only section of the program.
Trying to modify a read-only location is what leads to the segmentation fault.
You may not use string literals with function strtok because the function tries to change the string passed to it as an argument.
Any attempt to change a string literal results in undefined behaviour.
According to the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
Change this definition
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
to
char s[] = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
//...
char* q = strtok( s, "," );
For example function main can look like
int main()
{
char s[] = "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
char* q = strtok( s, "," );
while ( q != NULL )
{
puts( q );
q = strtok( NULL, "," ); // store the next token, otherwise the loop never ends
}
return 0;
}
char* p="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz"; is assigning read-only memory to p. Really the type of p should be const char*.
Attempting to change that string in any way gives undefined behaviour. So using strtok on it is a very bad thing to do. To remedy, use
char p[]="abc,def,ghi,jkl,mno,pqr,stu,vwx,yz";
That allocates the string on the stack and modifications on it are permitted. p decays to a pointer in certain instances; one of them being as a parameter to strtok.
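The advice in these answers can be sketched as a small helper that always tokenizes a local copy, so it is safe to call with a literal. The name print_tokens and the 128-byte buffer size are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each token of `text` on its own line and
 * returns the number of tokens, or -1 if `text` does not fit the buffer.
 * strtok() only ever writes to the local copy, never to `text`. */
int print_tokens(const char *text, const char *delims)
{
    char buf[128];                      /* assumed upper bound for this sketch */
    if (strlen(text) >= sizeof buf)
        return -1;
    strcpy(buf, text);                  /* safe: length checked above */

    int count = 0;
    for (char *tok = strtok(buf, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        puts(tok);
        count++;
    }
    return count;
}
```

Passing the literal from the question with "," as the delimiter prints the nine tokens and leaves the literal untouched.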
#include <stdio.h>
#include <string.h>
int main(void){
char s1[30]="abcdefghijklmnopqrstuvwxyz";
printf("%s\n",s1);
printf("%s",memset(s1,'b',7));
getch();
return 0;
}
Above code works but when I create s1 array like this,
char *s1="abcdefghijklmnopqrstuvwxyz";
it does not give any errors in compile time but fails to run in runtime.
I am using Visual Studio 2012.
Do you know why?
I found prototype of memset is:
void *memset( void *s, int c, size_t n );
char s1[30] allocates writable memory to store the contents of the array; char *s1="abcdefghijklmnopqrstuvwxyz"; doesn't. The latter only sets a pointer to the address of a string constant, which the compiler will typically place in a read-only section of the object code.
String literals get space in the "read-only data" section, which is mapped into the process space as read-only (so you can't change it).
char s1[30]="abcdefghijklmnopqrstuvwxyz";
This declares s1 as an array of char and initializes it.
char *s1="abcdefghijklmnopqrstuvwxyz";
This typically places "abcdefghijklmnopqrstuvwxyz" in a read-only part of memory and makes s1 a pointer to it.
Modifying that string through memset is therefore undefined behavior.
A very good question!
If you make gcc output the assembly for both versions and compare them, you can find the answer. Here is why:
char s1[30]="abcdef";
When defined in a function, this defines an array of char named s1, and the program allocates its memory on the stack.
When defined globally, it defines an object in the program's data, and that object is not read-only.
char* s2 = "abcdef"; only defines a pointer to char, which points to constant characters stored in .rodata, the read-only data of the program.
To make the program run efficiently and to make process management easier, the compiler generates different sections for the code. Constant character data, such as the literal in char* s2 = "abcdef"; and the printf format string, is stored in the .rodata section. After the OS loader maps the program into main memory, this section is marked read-only. That is why using memset to modify the memory s2 points to results in a segmentation fault.
Here is an explanation: Difference between char* and char[]
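A minimal sketch of the working case follows: memset applied to a caller-owned array. The helper name fill_prefix is made up, and clamping the count to the string length is a choice of this sketch to keep the terminator in place.

```c
#include <string.h>

/* Hypothetical helper: overwrites at most the first `n` characters of the
 * string in `buf` with `c`. `buf` must be a writable array such as
 * char s1[30] = "...", not a pointer to a string literal. */
char *fill_prefix(char *buf, char c, size_t n)
{
    size_t len = strlen(buf);
    if (n > len)
        n = len;                  /* never overwrite the terminating '\0' */
    memset(buf, c, n);
    return buf;
}
```

With char s1[30]="abcdefghijklmnopqrstuvwxyz";, fill_prefix(s1, 'b', 7) does the same as memset(s1,'b',7) in the question.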
I am trying to understand why the following snippet of code is giving a segmentation fault:
void tokenize(char* line)
{
char* cmd = strtok(line," ");
while (cmd != NULL)
{
printf ("%s\n",cmd);
cmd = strtok(NULL, " ");
}
}
int main(void)
{
tokenize("this is a test");
}
I know that strtok() does not actually tokenize string literals, but in this case, line points directly to the string "this is a test", which is internally an array of char. Is there any way of tokenizing line without copying it into an array?
The problem is that you're attempting to modify a string literal. Doing so causes your program's behavior to be undefined.
Saying that you're not allowed to modify a string literal is an oversimplification. Saying that string literals are const is incorrect; they're not.
WARNING : Digression follows.
The string literal "this is a test" is an expression of type char[15] (14 for the length, plus 1 for the terminating '\0'). In most contexts, including this one, such an expression is implicitly converted to a pointer to the first element of the array, of type char*.
The behavior of attempting to modify the array referred to by a string literal is undefined -- not because it's const (it isn't), but because the C standard specifically says that it's undefined.
Some compilers might permit you to get away with this. Your code might actually modify the static array corresponding to the literal (which could cause great confusion later on).
Most modern compilers, though, will store the array in read-only memory -- not physical ROM, but in a region of memory that's protected from modification by the virtual memory system. The result of attempting to modify such memory is typically a segmentation fault and a program crash.
So why aren't string literals const? Since you really shouldn't try to modify them, it would certainly make sense -- and C++ does make string literals const. The reason is historical. The const keyword didn't exist before it was introduced by the 1989 ANSI C standard (though it was probably implemented by some compilers before that). So a pre-ANSI program might look like this:
#include <stdio.h>
print_string(s)
char *s;
{
printf("%s\n", s);
}
main()
{
print_string("Hello, world");
}
There was no way to enforce the fact that print_string isn't allowed to modify the string pointed to by s. Making string literals const in ANSI C would have broken existing code, which the ANSI C committee tried very hard to avoid doing. There hasn't been a good opportunity since then to make such a change to the language. (The designers of C++, mostly Bjarne Stroustrup, weren't as concerned about backward compatibility with C.)
There's a very good reason that trying to tokenize a compile-time constant string will cause a segmentation fault: the constant string is in read-only memory.
The C compiler bakes compile-time constant strings into the executable, and the operating system loads them into read-only memory (.rodata in a *nix ELF file). Since this memory is marked as read-only, and since strtok writes into the string that you pass into it, you get a segmentation fault for writing into read-only memory.
As you said, you can't modify a string literal, which is what strtok does. You have to do
char str[] = "this is a test";
tokenize(str);
This creates the array str and initialises it with this is a test\0, and passes a pointer to it to tokenize.
strtok modifies its first argument in order to tokenize it. Hence you can't pass it a string literal: a literal must not be modified (in C++ its type is even an array of const char), and doing so is undefined behaviour. You have to copy the string literal into a char array that can be modified.
What point are you trying to make by your "...is internally an array of char" remark?
The fact that "this is a test" is internally an array of char does not change anything at all. It is still a string literal (all string literals are non-modifiable arrays of char). Your strtok still tries to tokenize a string literal. This is why it crashes.
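On the question of tokenizing without copying: strtok cannot do it, but the standard functions strspn and strcspn can locate token boundaries without writing to the string. The following is only a sketch of that alternative; the name tokenize_readonly is an assumption of this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical read-only tokenizer: prints each token of `line` on its own
 * line and returns the number of tokens. It never writes to `line`, so it
 * is safe to call with a string literal. */
int tokenize_readonly(const char *line, const char *delims)
{
    int count = 0;
    const char *p = line;

    for (;;) {
        p += strspn(p, delims);            /* skip leading delimiters */
        if (*p == '\0')
            break;
        size_t len = strcspn(p, delims);   /* length of the current token */
        printf("%.*s\n", (int)len, p);     /* print exactly len characters */
        p += len;
        count++;
    }
    return count;
}
```

The trade-off is that tokens are not null-terminated in place, so callers work with a pointer plus a length instead of a plain C string.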
I'm sure you'll get beaten up about this... but "strtok()" is inherently unsafe and prone to things like access violations.
Here, the cause is almost certainly that you are passing a string constant.
Try this instead:
void tokenize(char* line)
{
char* cmd = strtok(line," ");
while (cmd != NULL)
{
printf ("%s\n",cmd);
cmd = strtok(NULL, " ");
}
}
int main(void)
{
char buff[80];
strcpy (buff, "this is a test");
tokenize(buff);
}
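I also had big trouble with this error, and I found a simple solution: include <string.h>.
Without the declaration, older compilers assume strtok returns int, which can truncate the returned pointer on 64-bit platforms. Including the header removed the strtok segmentation fault for me.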
I just hit the Segmentation Fault error from trying to use printf to print the token (cmd in your case) after it became NULL.