string array initialisation - c

This is a continuation of another question I have.
Consider the following code:
char *hi = "hello";
char *array1[3] =
{
hi,
"world",
"there."
};
To my surprise, it doesn't compile (apparently I don't know C syntax as well as I thought) and generates the following error:
error: initializer element is not constant
If I change the char* into char[] it compiles fine:
char hi[] = "hello";
char *array1[3] =
{
hi,
"world",
"there."
};
Can somebody explain to me why?

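In the first example (char *hi = "hello";), you are creating a non-const pointer variable which is initialized to point to the string literal "hello", which has static storage duration. This pointer could, in theory, point at anything you like, so its value is not a constant expression, and an object with static storage duration (such as an array defined at file scope) can only be initialized with constant expressions.
In the second example (char hi[] = "hello";) you are specifically defining an array, not a pointer, so the address it refers to cannot change. When hi appears in the initializer it is converted to the address of its first element, which is an address constant and is therefore allowed.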
Your first example actually compiles without issue in C++ (my compiler, at least).

Related

Why can't you change characters using pointers?

Recently I got a copy of the book The C Programming Language.
On page 95 there is an implementation of the strcpy function.
void strcopy(char *s, char *t) {
while(*s++ = *t++)
;
}
int main(int argc, char *argv[]) {
char *hello = "hello";
char *world = "world";
strcopy(hello, world);
return 0;
}
I tried to implement it myself, but after (successfully) compiling with
gcc -o c_lang c_programming_lang_exercises.c
and trying to run it with
./c_lang
I get the following error message:
zsh: bus error
I tried it with different implementations, but somehow this does not work for me.
Maybe someone has an idea why?
Thank you for your help :)
Edit:
For my own understanding, please correct me if I'm wrong.
In the function call "strcopy(hello, world);" I pass the values of the pointers themselves, so I call the function with something like this: strcopy(0x123a4, 0x234859b). C doesn't see this as a variable, and therefore the value gets copied and passed to the function.
In the strcopy function itself I can't dereference it, because I just get the values of the pointers.
What I should do is pass the addresses the pointers refer to, so that I can dereference the values in the strcopy function and access the right memory address.
The other mistake was probably in the main function: I declared char *hello = "hello", which, as far as I understand it, is called a string constant and is therefore not modifiable.
So, if my assumptions are correct, the following code should work:
void strcopy(char *s, char *t) {
//code goes here...
}
int main(int stringc, char *argv[]) {
char hello[] = "hello";
char world[] = "world";
strcopy(&hello, &world);
return 0;
}
And it works, but I get the following compiler warning:
incompatible pointer types passing 'char (*)[6]' to parameter of type 'char **' [-Wincompatible-pointer-types]
strcopy(&hello, &world);
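C makes a really bad concession when it comes to the type of string literals, and this has to do with historical reasons: a literal has type char[], not const char[], even though it must not be modified.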
In main(), you declare your strings as:
char *hello = "hello";
What you should do is turn the warning levels waaaay up and recompile. The problem is that "hello" is a string literal — it cannot be modified. It should be declared as:
const char *hello = "hello";
(Turning warnings up will tell you this.)
Consequently, you are trying to copy characters into read-only memory, which, on modern processors and with modern compilers, is not permitted — it produces an access violation. Zsh complains for you.
Instead, make sure you have write access to a local array that is copied from read-only memory:
char hello[] = "hello";
Yep, that makes a local, mutable array that is six characters long (five for the word “hello” plus one for the null-terminator) and automatically initialized from the read-only memory.
This is one of those subtle sticking points to declaring things in C that confuses beginners regularly:
const char * s = "Hello world!"; — pointer to ROM
char s[] = "Hello world!"; — local, mutable array initialized from ROM
That’s it!

Why use the const char* form for a string

Take the following two forms of creating a string:
const char* pt1 = "Hello";
char* pt2 = "Goodbye";
What is the use of const in the above? In my understanding, doing:
ptr = "Adios";
Would work for both, since that is changing the address of the pointer, but trying to change a letter in the string would fail for both:
const char* pt1 = "Hello";
compiler error: assignment of read-only location
char* pt2 = "Goodbye";
runtime error: seg fault, trying to change .rodata
Since they produce the same result -- i.e., an error -- is there any advantage in using const when defining a string?
Defining pointers that point to string constants (aka string literals) as const char * allows the compiler to detect an incorrect access if code elsewhere tries to modify what pt1 points to, as in *pt1 = 'A';. If pt1 had type char *, you would instead just have undefined behavior at runtime, causing a crash on some architectures and less obvious but potentially more damaging side effects on others.
To expand on this subject, there is sometimes a confusion as to the meaning of const for pointer definitions:
const char *pt1 = "Hello"; defines a modifiable pointer pt1 that points to an array of char that cannot be modified through it. Since "Hello" is a string constant, it is the correct type for pt1. pt1 can be modified to point to another string or char, modifiable or not, or be set to NULL.
char *pt2 = "Hello"; defines a modifiable pointer pt2 that points to an array of char that, as far as the type is concerned, can be modified through it, although actually modifying a string literal is undefined behavior. The C Standard allows this initialization in spite of the constness of "Hello" for compatibility with historical code. gcc and clang can warn about it with the -Wwrite-strings command line option, which gives string literals the type const char[]. I strongly recommend using this and many more warnings to avoid common mistakes.
const char * const pt3 = "Hello"; defines a constant pointer pt3 that points to an array of char that cannot be modified through it. pt3 cannot be modified to point to another string or even be set to NULL.
char * const pt4 = "Hello"; defines a constant pointer pt4 that points to an array of char that can be modified through it. pt4 cannot be changed once initialized.
char and const can be placed in any order, but whether const is before or after the * makes a big difference.
What is the use of const in the above?
const char* pt1 = "Hello";
It simply means you cannot change the data that pt1 points to, at least not through pt1.
Both
const char* pt1 = "Hello";
char* pt2 = "Goodbye";
create static storage for the string literal.
So the advantage is that you always get a compile-time error with the first, whereas with the second it depends on the compiler and its warning settings whether the mistake is reported at all.
Why use the const char* form for a string
Use const char *ptr1 when the referenced string should not be modified; this also allows the compiler to optimize based on that.
This is always the case when assigning string literals.
Use char *ptr2 when the referenced string might be modified.
The danger of char* pt2 = "Goodbye"; is that later code may attempt to change the data referenced by pt2, which presently points to a string literal.
Why use the const char* form for a string
To notify yourself and other developers that the memory the pointer points to cannot be modified. In this context const mostly serves as documentation from one programmer to another that the data the pointer points to is const.
basically anything you can do to boil the error up to be caught by the compiler is preferable, right?
Yes. That is the reason why developers push to invent better tools for static code analysis. There are many such tools, and GCC 11 comes with a built-in static analyzer. It's also the reason why languages like Rust were invented and became so popular. All these tools try to make as many errors as possible detectable "statically", at compile time.
gcc also has the -Wwrite-strings warning, which flags code like char *str = "str" that assigns a string literal to a non-const pointer.
As KamilCuk pointed out, modifying const char* pt1 = "Hello"; gives you an error already at compile time, when you can just fix the code, recompile, and everything is fine. Modifying char* pt1 = "Hello"; throws the error at runtime, and you do not want all your 1 million users to redownload and reinstall your program (You would have to first buy a better internet connection for that). So, you definitely should use const char*.

Where does const string store? in stack or .data? [duplicate]

I have written a simple C program, shown below. With this code snippet I want to find out where the const string abcd is stored. My first guess was that it would be stored in the .data section as read-only data. After a test on Debian, however, things turned out differently from my initial guess. Checking the assembly code generated by gcc, I found that it is placed in the stack frame of function p. But when I tried it later on OSX, the string was stored in the .data section again. Now I am confused by this. Is there any standard for where a const string is stored?
#include<stdio.h>
char *p()
{
char p[] = "abcd";
return p;
}
int main()
{
char *pp = p();
printf("%s\n",pp);
return 0;
}
UPDATE: rici's answer woke me up. On OSX, the literal is stored in .data and then copied into the function's stack frame later. Thus, it becomes a local variable of this function. gcc on Debian, however, handles this differently: it stores the literal directly on the stack instead of copying it from .data. I'm sorry for my carelessness.
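In your case, it's located on the stack, and returning the pointer to main causes undefined behavior as soon as it is used. But if you have static char p[] = "abcd"; or char *p = "abcd"; the data has static storage duration: the static array lives in .data, and the literal typically lives in a read-only section such as .rodata.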
There is a huge difference between:
const char s[] = "abcd";
and
const char* t = "abcd";
The first of these declares s to be an array object initialized from the string "abcd". s will have an address distinct from that of any other object in the program. The character string itself might be a compile-time artifact; the initialization is a copy so the character string does not need to be present at runtime if the compiler can find some other way of performing the initialization (such as a store immediate operation).
The second declaration declares t to be a pointer to a string constant. The string constant now must be present at runtime, because expressions like t+1, which are pointers inside the string, are valid. The language standard does not guarantee that every occurrence of a string literal in the program is unique, nor does it guarantee that all occurrences are merged (although good compilers will try to do the second). It does, however, guarantee that they have static lifetime.
Consequently, this is undefined behaviour, because the lifetime of the array s ends when the function returns:
const char *gimme_a_string() {
const char s[] = "abcd";
return s;
}
However, this is fine:
const char *gimme_a_string() {
const char *s = "abcd";
return s;
}
Also:
const char s[] = "abcd";
const char t[] = "abcd";
printf("%d\n", s == t);
is guaranteed to print 0, while
const char* s = "abcd";
const char* t = "abcd";
printf("%d\n", s == t);
might print either 0 or 1, depending on the implementation. (As written, it will almost certainly print 1. However, if the two declarations are in separate compilation units and LTO is not enabled, it is likely to print 0.)
Since the array form is initialized with a copy, the non-const version is fine:
char s[] = "abcd";
s[3] = 'C';
But the char pointer version must be a const to avoid undefined behaviour.
// Produces a warning with gcc or clang only when -Wwrite-strings is enabled (it is not part of -Wall for C)
char* s = "abcd";
// *** UNDEFINED BEHAVIOUR *** Can cause random program breakage
s[3] = 'C';
Technically, the non-const declaration of s is legal (which is why the compiler only warns) because it is the attempt to modify the constant which is UB. But you should always heed compiler warnings; it is better to think of the declaration / initialization as wrong, because it is.

Why do I get "Incompatible types when assigning" error?

I have C code with seemingly similar pointer assignments that behave differently when compiled.
My structure declaration and definition are below:
typedef struct {
int a;
char b[20];
}
TestStruct;
TestStruct t1;
Why does the code below give "error: incompatible types when assigning to type ‘char[20]’ from type ‘char *’"
t1.b = "Hello World";
but the code below compiles successfully,
char *charPtr = t1.b;
charPtr = "Hello World";
Note: I'm using GCC compiler v4.6.3
Strings cannot be assigned to arrays in C, except as part of an initialization.
The right way to do something like that is by means of the function strcpy() from the <string.h> standard header.
strcpy(t1.b, "Hello world");
It is not true that the array t1.b is a pointer to char.
Actually, it has type array of 20 elements of type char.
In an expression, the array normally decays to a pointer to char.
However, the array has a fixed address in memory. It is an lvalue, but not a modifiable one: it cannot appear on the left side of an assignment, and its address cannot be changed.
The opposite assignment is valid:
charPtr = "Hello World";
The address of the string "Hello world" is assigned to charPtr.
However, your statements do not have the intended effect:
char *charPtr = t1.b;
charPtr = "Hello World";
The effect is that charPtr becomes equal to the address of t1.b.
Then, this value is discarded in the second statement and replaced by the address of the array "Hello World".
More details: Be careful when handling strings. A string literal like "Hello World" is an array stored (in general) in read-only memory. If you try to modify it, the behavior is undefined and you can get unexpected results.
In particular, this applies after the assignment charPtr = "Hello World", because charPtr then points to the literal.
The string can be read, but not changed.
To change or manipulate a string, it has to be copied (with strcpy()) to an array or to an allocated portion of memory.
You can't directly assign a string literal to a char array. Use strcpy(), or strlcpy() where it is available (it is not part of standard C).
In the second example, the array decays into a pointer, and then you change that pointer. Note that in this example, t1.b remains unchanged.
When you define:
typedef struct {
int a;
char b[20];
}TestStruct;
b is an array of 20 char, not a pointer. An array cannot be the target of an assignment, hence t1.b = "Hello World"; is a compile error.
Solution: use strcpy:
strcpy(t1.b, "Hello World");
A type-casting trick like the following is sometimes suggested, but it does not work:
char** pb = (char**)&t1.b;
*pb = "Hello World";
It compiles, but instead of copying the string it overwrites the first bytes of the array with the address of the literal, and writing to a char array through a char ** is undefined behavior.
You cannot assign a string literal (or any char pointer) to a char[]: arrays are not assignable in C, regardless of constness.

Why is the following code invalid?

Why is the following code invalid?
void foo()
{
char hello[6];
char *foo = "hello";
hello = foo;
}
But how the following code is valid?
void foo()
{
char hello[] = "hello";
char *foo = hello;
}
You are trying to assign to an array as if it were a pointer. This is invalid. Arrays are like pointer constants in that they can't be assigned to: they can't be made to refer to somewhere else. The closest you can get is to copy the contents of foo into hello.
In the second case, hello is an array of chars and foo is a pointer to a char. In most expressions an array is converted to a pointer to its first element, so initializing foo with hello is valid.
I think you assumed the string "hello" would be copied into hello. That's wrong: you're trying to assign a pointer to an array, and you cannot assign to hello.
The right way is:
strcpy(hello, foo);
In the first case you are assigning the pointer foo to the array hello, which is not allowed,
whereas in the second case you have an array of char and you are storing its address in the pointer foo.

Resources