Difference between char *str="STRING" and char str[] = "STRING"? - c

While coding a simple function to remove a particular character from a string, I fell on this strange issue:
void str_remove_chars( char *str, char to_remove)
{
if(str && to_remove)
{
char *ptr = str;
char *cur = str;
while(*ptr != '\0')
{
if(*ptr != to_remove)
{
if(ptr != cur)
{
cur[0] = ptr[0];
}
cur++;
}
ptr++;
}
cur[0] = '\0';
}
}
int main()
{
setbuf(stdout, NULL);
{
char test[] = "string test"; // stack allocation?
printf("Test: %s\n", test);
str_remove_chars(test, ' '); // works
printf("After: %s\n",test);
}
{
char *test = "string test"; // non-writable?
printf("Test: %s\n", test);
str_remove_chars(test, ' '); // crash!!
printf("After: %s\n",test);
}
return 0;
}
What I don't get is why the second test fails?
To me it looks like the first notation char *ptr = "string"; is equivalent to this one: char ptr[] = "string";.
Isn't it the case?

The two declarations are not the same.
char ptr[] = "string"; declares a char array of size 7 and initializes it with the characters s ,t,r,i,n,g and \0. You are allowed to modify the contents of this array.
char *ptr = "string"; declares ptr as a char pointer and initializes it with address of string literal "string" which is read-only. Modifying a string literal is an undefined behavior. What you saw(seg fault) is one manifestation of the undefined behavior.

Strictly speaking a declaration of char *ptr only guarantees you a pointer to the character type. It is not unusual for the string to form part of the code segment of the compiled application which would be set read-only by some operating systems. The problem lies in the fact that you are making an assumption about the nature of the pre-defined string (that it is writeable) when, in fact, you never explicitly created memory for that string yourself. It is possible that some implementations of compiler and operating system will allow you to do what you've attempted to do.
On the other hand the declaration of char test[], by definition, actually allocates readable-and-writeable memory for the entire array of characters on the stack in this case.

As far as I remember
char ptr[] = "string";
creates a copy of "string" on the stack, so this one is mutable.
The form
char *ptr = "string";
is just backwards compatibility for
const char *ptr = "string";
and you are not allowed (in terms of undefined behavior) to modify it's content.
The compiler may place such strings in a read only section of memory.

char *test = "string test"; is wrong, it should have been const char*. This code compiles just because of backward comptability reasons. The memory pointed by const char* is a read-only memory and whenever you try to write to it, it will invoke undefined behavior. On the other hand char test[] = "string test" creates a writable character array on stack. This like any other regualr local variable to which you can write.

Good answer #codaddict.
Also, a sizeof(ptr) will give different results for the different declarations.
The first one, the array declaration, will return the length of the array including the terminating null character.
The second one, char* ptr = "a long text..."; will return the length of a pointer, usually 4 or 8.

char *str = strdup("test");
str[0] = 'r';
is proper code and creates a mutable string. str is assigned a memory in the heap, the value 'test' filled in it.

Related

Weird segmentation fault (difference between arrays and pointers ?)

i cant see why i'm getting segmentation fault from this code and somehow when i use arrays instead of pointers it works, i would be happy if anyone can make me understand this.
void main() {
char *str = "example string";
wrapChrInStr(str, 'a');
}
void wrapChrInStr(char *str, unsigned char chr) {
char *ptr = str;
char c;
while((c = *ptr)) {
if(c != chr) {
*str = c;
str++;
ptr++;
} else {
ptr++;
}
}
*str = '\0';
}
Thank you. I'm doing a lot of c programming and its really weird that i never faced that before.
Probably because you don't realize that there are different ways of storing a
C-String. You may have been lucky enough never to have encountered a segfault
because of this.
String literals
A string literal is declared with double quotation marks, e.g.
"hello world"
This string is usually stored in a read-only section. When using string
literals, it's best to declare the variables with a const like this:
const char *str = "hello world";
With this you know that str is pointing to read-only memory location and you cannot
manipulate the contents of the string. In fact, if you do this:
const char *str = "hello world";
str[0] = 'H';
// or the equivalent
*str = 'H'
the compiler will return an error like this:
a.c:5:5: error: assignment of read-only location ‘*str’
which I found very helpful, because you cannot accidentally manipulate the
contents pointed to by str.
Arrays
If you need to manipulate the contents of a string, then you need to store the
string in an array, e.g.
char str[] = "hello word";
In this case the compiler knows that the string literal has 10 characters and reserves 11 bytes (1 byte for '\0' - the terminating byte) for str and initializes the array with
the contents of the string literal.
Here you can do stuff like
str[0] = 'H'
but you cannot access beyond the 11th byte.
You can also declare an array with a fixed size. In this case the size must be
at least the same as the length+1 of the string literal.
char str[11] = "Hello world";
If you declare less space (char str[3] = "hello world"; for example),
your compiler will warn you with something like this
a.c:4:14: warning: initializer-string for array of chars is too long
but I'm not sure what happens if you execute the code anyway. I think this is a case of undefined behaviour
and that means: anything can happen.
Personally, I usually declare my string without a fixed size, unless there is
a reason for having a fixed size.

Bus error in my own strcat using pointer

I'm doing a pointer version of the strcat function, and this is my code:
void strcat(char *s, char *t);
int main(void) {
char *s = "Hello ";
char *t = "world\n";
strcat(s, t);
return 0;
}
void strcat(char *s, char *t) {
while (*s)
s++;
while ((*s++ = *t++))
;
}
This seems straightforward enough, but it gives bus error when running on my Mac OS 10.10.3. I can't see why...
In your code
char *s = "Hello ";
s points to a string literal which resides in read-only memory location. So, the problem is twofold
Trying to alter the content of the string literal invokes undefined behaviour.
(almost ignoring point 1) The destination pointer does not have enough memory to hold the concatinated final result. Memory overrun. Again undefined behaviour.
You should use an array (which resides in read-write memory) with sufficient length to hold the resulting (final) string instead (no memory overrun).
Suggestion: Don't use the same name for user-defined functions as that of the library functions. use some other name, e.g., my_strcat().
Pseudocode:
#define MAXVAL 512
char s[MAXVAL] = "Hello";
char *t = "world\n"; //are you sure you want the \n at the end?
and then
my_strcat(s, t);
you are adding the text of 't' after the last addres s is pointing to
char *s = "Hello ";
char *t = "world\n";
but writing after 's' is undefined behavior because the compiler might put that text in constant memory, in your case it crashes so it actually does.
you should reserve enough memory where s points to by either using malloc or declare it array style
char s[32] = "Hello ";

Run-time error in program

int main()
{
char *p="abcd";
while(*p!='\0') ++*p++;
printf("%s",p);
return 0;
}
I am not able to understand why the code does not run. The problem is in the statement ++*p++, but what is the problem?
P points to constant string literal *p="abcd"; by doing ++*p++ you are trying to modify string '"abcd"', for example a in string will be increment to 'b' because of ++*p that is undefined behavior (constant string can't change). it may cause a segmentation fault.
`++*p++` means `++*p` then `p++`
^
| this operation try to modify string constant
char *p="abcd";
p is pointing to a readonly segment and you can not increment p as you did here
while(*p!='\0') ++*p++;
//char *p="abcd";//"string" is const char, don't change.
char str[]="abcd";//is copied, including the '\0' to reserve space
char *p = str;
while(*p!='\0') ++*p++;
//printf("%s",p);//address pointing of 'p' has been changed
printf("%s",str);//display "bcde"
char *foo = "abcd";
char bar[] = "abcd";
Consider the differences between foo and bar. foo is a pointer that's initialised to point to memory reserved for a string literal. bar is an array with it's own memory that's initialised to a string literal; it's value is copied from the string literal during initialisation. It is appropriate to modify bar[0], etc, but not foo[0]. Hence, you need an array declaration.
However, you can't increment an array declaration; That's an operation for pointer variables or integer variables. Additionally, your loop changes where p points at, before it's printed, so you need to keep the original location of your string somewhere. Hence, you also need a pointer, or an integer declaration.
With this in mind, it might seem like a good idea to change your code to something like this:
int main()
{
char str[] = "abcd";
/* Using a pointer variable: */
for (char *ptr = str; *ptr != '\0'; ++*ptr++);
/* Using a size_t variable: */
for (size_t x = 0; str[x] != '\0'; str[x++]++);
printf("%s", str);
return 0;
}

changing one char in a c string

I am trying to understand why the following code is illegal:
int main ()
{
char *c = "hello";
c[3] = 'g'; // segmentation fault here
return 0;
}
What is the compiler doing when it encounters char *c = "hello";?
The way I understand it, its an automatic array of char, and c is a pointer to the first char. If so, c[3] is like *(c + 3) and I should be able to make the assignment.
Just trying to understand the way the compiler works.
String constants are immutable. You cannot change them, even if you assign them to a char * (so assign them to a const char * so you don't forget).
To go into some more detail, your code is roughly equivalent to:
int main() {
static const char ___internal_string[] = "hello";
char *c = (char *)___internal_string;
c[3] = 'g';
return 0;
}
This ___internal_string is often allocated to a read-only data segment - any attempt to change the data there results in a crash (strictly speaking, other results can happen as well - this is an example of 'undefined behavior'). Due to historical reasons, however, the compiler lets you assign to a char *, giving you the false impression that you can modify it.
Note that if you did this, it would work:
char c[] = "hello";
c[3] = 'g'; // ok
This is because we're initializing a non-const character array. Although the syntax looks similar, it is treated differently by the compiler.
there's a difference between these:
char c[] = "hello";
and
char *c = "hello";
In the first case the compiler allocates space on the stack for 6 bytes (i.e. 5 bytes for "hello" and one for the null-terminator.
In the second case the compiler generates a static const string called "hello" in a global area (aka a string literal, and allocates a pointer on the stack that is initialized to point to that const string.
You cannot modify a const string, and that's why you're getting a segfault.
You can't change the contents of a string literal. You need to make a copy.
#include <string.h>
int main ()
{
char *c = strdup("hello"); // Make a copy of "hello"
c[3] = 'g';
free(c);
return 0;
}

Why does *(str+i) = *(str +j) not work here?

void reverse(char *str){
int i,j;
char temp;
for(i=0,j=strlen(str)-1; i<j; i++, j--){
temp = *(str + i);
*(str + i) = *(str + j);
*(str + j) = temp;
printf("%c",*(str + j));
}
}
int main (int argc, char const *argv[])
{
char *str = "Shiv";
reverse(str);
printf("%s",str);
return 0;
}
When I use char *str = "Shiv" the lines in the swapping part of my reverse function i.e str[i]=str[j] dont seem to work, however if I declare str as char str[] = "Shiv", the swapping part works? What is the reason for this. I was a bit puzzled by the behavior, I kept getting the message "Bus error" when I tried to run the program.
When you use char *str = "Shiv";, you don't own the memory pointed to, and you're not allowed to write to it. The actual bytes for the string could be a constant inside the program's code.
When you use char str[] = "Shiv";, the 4(+1) char bytes and the array itself are on your stack, and you're allowed to write to them as much as you please.
The char *str = "Shiv" gets a pointer to a string constant, which may be loaded into a protected area of memory (e.g. part of the executable code) that is read only.
char *str = "Shiv";
This should be :
const char *str = "Shiv";
And now you'll have an error ;)
Try
int main (int argc, char const *argv[])
{
char *str = malloc(5*sizeof(char)); //4 chars + '\0'
strcpy(str,"Shiv");
reverse(str);
printf("%s",str);
free(str); //Not needed for such a small example, but to illustrate
return 0;
}
instead. That will get you read/write memory when using pointers. Using [] notation allocates space in the stack directly, but using const pointers doesn't.
String literals are non-modifiable objects in both C and C++. An attempt to modify a string literal always results in undefined behavior. This is exactly what you observe when you get your "Bus error" with
char *str = "Shiv";
variant. In this case your 'reverse' function will make an attempt to modify a string literal. Thus, the behavior is undefined.
The
char str[] = "Shiv";
variant will create a copy of the string literal in a modifiable array 'str', and then 'reverse' will operate on that copy. This will work fine.
P.S. Don't create non-const-qualified pointers to string literals. You first variant should have been
const char *str = "Shiv";
(note the extra 'const').
String literals (your "Shiv") are not modifiable.
You assign to a pointer the address of such a string literal, then you try to change the contents of the string literal by dereferencing the pointer value. That's a big NO-NO.
Declare str as an array instead:
char str[] = "Shiv";
This creates str as an array of 5 characters and copies the characters 'S', 'h', 'i', 'v' and '\0' to str[0], str[1], ..., str[4]. The values in each element of str are modifiable.
When I want to use a pointer to a string literal, I usually declare it const. That way, the compiler can help me by issuing a message when my code wants to change the contents of a string literal
const char *str = "Shiv";
Imagine you could do the same with integers.
/* Just having fun, this is not C! */
int *ptr = &5; /* address of 5 */
*ptr = 42; /* change 5 to 42 */
printf("5 + 1 is %d\n", *(&5) + 1); /* 6? or 43? :) */
Quote from the Standard:
6.4.5 String literals
...
6 ... If the program attempts to modify such an array [a string literal], the behavior is undefined.
char *str is a pointer / reference to a block of characters (the string). But its sitting somewhere in a block of memory so you cannot just assign it like that.
Interesting that I've never noticed this. I was able to replicate this condition in VS2008 C++.
Typically, it is a bad idea to do in-place modification of constants.
In any case, this post explains this situation pretty clearly.
The first (char[]) is local data you can edit
(since the array is local data).
The second (char *) is a local pointer to
global, static (constant) data. You
are not allowed to modify constant
data.
If you have GNU C, you can compile
with -fwritable-strings to keep the
global string from being made
constant, but this is not recommended.

Resources