Pointers create constant string literals, why? - c

I was going through some documentation which states that
First Case
char * p_var="Sack";
will create a constant string literal.
And hence code like
p_var[1]='u';
will fail because of that property.
Second Case
Also mentioned is that this is possible only for character literals and not for other data types through pointers. So code like
float *p="3.14";
will fail, resulting in a compiler error.
But when I try it out I don't get a compiler error, although accessing it gives me 0.000000f (using gcc on Ubuntu).
So regarding the above, I have three queries:
1. Why are string literals created in First Case read-only?
2. Why are only string literals allowed to be created and not other constants like float through pointers?
3. Why is Second Case not giving me compiler errors?
Update
Please discard the 3rd question and second case. I tested it by adding quotes.
Thanks

The premise is wrong: pointers don’t create any string literals, neither read-only nor writeable.
What does create a read-only string literal is the literal itself: "foo" is a read-only string literal. And if you assign it to a pointer, then that pointer points to a read-only memory location.
With that, let’s turn to your questions:
Why are string literals created in First Case read-only?
The real question is: why not? In most cases, you won’t want to change the value of a string literal later on so the default assumption makes sense. Furthermore, you can create writeable strings in C via other means.
Why are only string literals allowed to be created and not other constants like float?
Again, wrong assumption. You can create other constants:
float f = 1.23f;
Here, the 1.23f literal is read-only. You can also assign it to a constant variable:
const float f = 1.23f;
Why is Second Case not giving me compiler errors?
Because the compiler cannot check in general whether your pointer points to read-only memory or to writeable memory. Consider this:
char* p = "Hello";
char str[] = "world"; // `str` is a writeable string!
p = &str[0];
p[1] = 'x';
Here, p[1] = 'x' is entirely legal – if we hadn’t re-assigned p beforehand, it would have been illegal. Checking this cannot be generally done at compile-time.

Regarding your question:
Why are string literals created in First Case read-only?
char *p_var="Sack";
Well, p_var is assigned the starting address of the memory allocated to the string "Sack". The type that p_var points to is not const-qualified, since you haven't put the const keyword anywhere in the declaration. Nevertheless, modifying the pointed-to contents, for example with strcpy or strcat, is undefined behavior.
Quote C ISO 9899:
The declaration
char s[] = "abc", t[3] = "abc";
defines plain char array objects s and t whose elements are initialized with character string literals.
This declaration is identical to:
char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };
The contents of the arrays are modifiable. On the other hand, the declaration:
char *p = "abc";
defines p with type pointer to char and initializes it to point to an object with type array of char with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
An explanation of why it could be read-only per your platform and compiler:
Commonly, string literals are put in a "read-only data" section which gets mapped into the process space as read-only (which is why you are not allowed to change them).
But some platforms do allow the data segment to be writable.
Why are only string literals allowed to be created and not other constants like float? and the third question.
To create a float constant you should use:
const float f=1.5f;
Now, when you are doing:
float *p="3.14";
you are basically assigning the string literal's address to a float pointer.
Try compiling with -Wall -Werror -Wextra. You will find out what is happening.
It compiles because, in practice, a char * and a float * usually have the same representation under the hood.
It's as if you were writing this:
float *p=(float*) "3.14";
The conversion itself is well defined only if the resulting pointer is correctly aligned for float; otherwise the behaviour is undefined (Reference: C99, 6.3.2.3 p7). Reading a float through p is a separate problem: the bytes of the string are then accessed as a float, which the standard also leaves undefined (C99, 6.5 p7).

1. Efficiency.
2. They are.
3. It's a string; mind the quotes.

float *p="3.14";
This is also a string literal !
Why are string literals created in First Case read-only?
No, both "sack" and "3.14" are string literals and both are read-only.
Why are only string literals allowed to be created and not other constants like float?
If you want to create a float const then do:
const float p=3.14;
Why is Second Case not giving me compiler errors?
You are making the pointer p point to a string literal. When you dereference p, it expects to read a float value. So there's nothing wrong as far as the compiler can see.

Related

What does the following mean apropos C programming language?

A book on C programming says,
"There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals.
Takeaway - String literals are read-only.
If introduced today, the type of string literals would certainly be char const[], an array of const-qualified characters. Unfortunately, the const keyword was introduced to the C language much later than string literals, and therefore it remained as it is for backward compatibility."
Question 1. How can strings be read only, like takeaway says, if they can be modified?
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
How can strings be read only, like takeaway says, if they can be modified?
Because, as the book says, "unfortunately, the const keyword was introduced to the C language much later than string literals, and therefore it remained as it is for backward compatibility."
String literals existed before protection (in the form of the const keyword) existed in the language.
Ergo, they are not protected, because the tools to protect them did not exist. A classic example of undefined behavior is writing to a string literal; see "Undefined, unspecified and implementation-defined behavior".
What type is this referring to, which doesn't keep a string literal from being modified?
It has the type char[N] - "array of N chars" - where N is the number of characters in the string plus one for the terminating zero character.
See https://en.cppreference.com/w/c/language/string_literal.
char *str = "string literal";
// ^^^ - implicit decay from `char [N]` -> `char *`
str[0] = 'a'; // compiler will be fine, but it's invalid code
// or super shorter form:
"string literal"[0] = 'a'; // invalid code
If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
The type would be const char[N] - an array of N constant chars, which means you can't modify the characters.
// assuming string literal has type const char [N]
const char *str = "string literal";
// ^^^ - implicit decay from `const char [N]` -> `const char *`
str[0] = 'a'; // compile time error
// or super shorter form:
"string literal"[0] = 'a'; // with `const char [N]` would be a compile-time error
With the gcc compiler, use -Wwrite-strings to get warnings about mistakes like writing to string literals.
Question 1. How can strings be read only, like takeaway says, if they can be modified?
The text does not say they can be modified. It says they are not protected from being modified. That is a slight error; properly, it should say they are not protected from attempts to modify them: The rules of the C standard do not prevent you from writing code that attempts to modify a string literal, and they do not define the results when a program executes such an attempt. In some circumstances, attempting to modify a string literal may result in a signal, usually ending program execution by default. In other circumstances, the attempt may succeed, and the string literal will be modified. In other circumstances, nothing will happen; there will be neither a signal nor a change to the string literal. It is also possible other behaviors may occur.
Question 2. "There is another family of read-only objects that unfortunately are not protected by their type from being modified: string literals."
What type is this referring to, which doesn't keep a string literal from being modified?
Technically, a string literal is a piece of source code that has a character sequence inside quotes, optionally with an encoding prefix. During compilation or program execution, an array is generated with the contents of the character sequence and a terminating null character. For string literals without a prefix, the type of that array is char []. (If there is a prefix, the type may also be wchar_t [], char16_t [], or char32_t [], depending on the prefix.)
Colloquially, we often refer to this array as the string literal, even though the array is the thing that results from a string literal (an array in memory) not the actual string literal (in the source code).
The type char [] does not contain const, so it does not offer the protections that const char [] does. (Those protections are fairly mild.)
Question 3. If string literals were introduced today and had the type char const[], how'll they be an array i.e. I can't grasp as to how string literals will be an array of const qualified characters?
Your confusion here is unclear. When a string literal appears in source code, the compiler arranges for its contents to be in the memory of the running program. Those contents are in memory as an array of characters. If the rules of C were different, the type of that array would be const char [] instead of char [].

Why pointers can't be used to index arrays? [duplicate]

I am trying to change value of character array components using a pointer. But I am not able to do so. Is there a fundamental difference between declaring arrays using the two different methods i.e. char A[] and char *A?
I tried accessing arrays using A[0] and it worked. But I am not able to change values of the array components.
#include <stdio.h>
int main(void)
{
    char *A = "ab";
    printf("%c\n", A[0]); //Works. I am able to access A[0]
    A[0] = 'c'; //Segmentation fault. I am not able to edit A[0]
    printf("%c\n", A[0]);
}
Expected output:
a
c
Actual output:
a
Segmentation fault
The difference is that char A[] defines an array and char * does not.
The most important thing to remember is that arrays are not pointers.
In this declaration:
char *A = "ab";
the string literal "ab" creates an anonymous array object of type char[3] (2 plus 1 for the terminating '\0'). The declaration creates a pointer called A and initializes it to point to the initial character of that array.
The array object created by a string literal has static storage duration (meaning that it exists through the entire execution of your program) and does not allow you to modify it. (Strictly speaking an attempt to modify it has undefined behavior.) It really should be const char[3] rather than char[3], but for historical reasons it's not defined as const. You should use a pointer to const to refer to it:
const char *A = "ab";
so that the compiler will catch any attempts to modify the array.
In this declaration:
char A[] = "ab";
the string literal does the same thing, but the array object A is initialized with a copy of the contents of that array. The array A is modifiable because you didn't define it with const -- and because it's an array object you created, rather than one implicitly created by a string literal, you can modify it.
An array indexing expression, like A[0], actually requires a pointer as one of its operands (and an integer as the other). Very often that pointer will be the result of an array expression "decaying" to a pointer, but it can also be just a pointer -- as long as that pointer points to an element of an array object.
The relationship between arrays and pointers in C is complicated, and there's a lot of misinformation out there. I recommend reading section 6 of the comp.lang.c FAQ.
You can use either an array name or a pointer to refer to elements of an array object. You ran into a problem with an array object that's read-only. For example:
#include <stdio.h>
int main(void) {
char array_object[] = "ab"; /* array_object is writable */
char *ptr = array_object; /* or &array_object[0] */
printf("array_object[0] = '%c'\n", array_object[0]);
printf("ptr[0] = '%c'\n", ptr[0]);
}
Output:
array_object[0] = 'a'
ptr[0] = 'a'
String literals like "ab" are supposed to be immutable, like any other literal (you can't alter the value of a numeric literal like 1 or 3.14159, for example). Unlike numeric literals, however, string literals require some kind of storage to be materialized. Some implementations (such as the one you're using, apparently) store string literals in read-only memory, so attempting to change the contents of the literal will lead to a segfault.
The language definition leaves the behavior undefined - it may work as expected, it may crash outright, or it may do something else.
String literals are not meant to be overwritten, think of them as read-only. It is undefined behavior to overwrite the string and your computer chose to crash the program as a result. You can use an array instead to modify the string.
char A[3] = "ab";
A[0] = 'c';
Is there a fundamental difference between declaring arrays using the two different methods i.e. char A[] and char *A?
Yes, because the second one is not an array but a pointer.
The type of "ab" is char /*readonly*/ [3]. It is an array with immutable content. So when you want a pointer to that string literal, you should use a pointer to char const:
char const *foo = "ab";
That keeps you from altering the literal by accident. If you however want to use the string literal to initialize an array:
char foo[] = "ab"; // the size of the array is determined by the initializer
// here: 3 - the characters 'a', 'b' and '\0'
The elements of that array can then be modified.
Array indexing, by the way, is nothing more than syntactic sugar:
foo[bar]; /* is the same as */ *(foo + bar);
That's why one can do funny things like
"Hello!"[2]; /* 'l' but also */ 2["Hello!"]; // 'l'

How do strings work in C?

String is said to be a constant in C programming language.
So, when I give a statement like char *s = "Hello", I have learned that s points to a memory location of H since "Hello" is stored in some static memory of the program and also "Hello" is immutable.
Does it mean the variable s is now a variable of type pointer to constant data such as const int a = 3;const int *i = &a;. This seems so because I can't manipulate the data (when I do, it results in segmentation fault).
But, if it is so, shouldn't compiler be able to detect and say that I have assigned qualified data to unqualified variable.
Something like this: with char *p, p is a pointer to an unqualified character, so when I say char *p="Hello", how can p, a pointer to an unqualified character, point to a const character type?
What am I missing here?
If it is not the case as above, then how is an array of constant characters made immutable?
First of all, a string in C isn't immutable. C doesn't even know a type for strings -- a string is just defined as a sequence of char ending with '\0'.
What you're talking about are string literals, and they can be immutable. The C standard defines that attempting to modify a string literal is undefined behavior, yet their type is char[N] (which decays to char *), without const. So, if you are sure that in your implementation of C a string literal is writable, you can do so! *)
But your code won't be well-defined C any more and won't work on other platforms with read-only string literals. It will compile, because writing through char * is perfectly fine, but fail at runtime in unpredictable ways (like, possibly, a crash).
Therefore, it's just best practice for portable code to assign string literals only to const char * pointers and, if you need a mutable string, use the string literal as an initializer for a char [].
*) beware this is very uncommon, you'll find it nowadays only with specialized compilers targeting embedded or very old platforms. A modern platform will place string literals in a read-only data segment or similar.
The syntax char *s = "Hello"; dates from the days when the const keyword was not part of the C specification. It was later kept for backward compatibility. Writing to such an s[i] leads to undefined behaviour (a segfault was observed in your case in a few runs).
This behaviour (conversion from a string literal, i.e. const char [], to non-const char *) was allowed but deprecated in C++98 and C++03, and was removed in C++11.
Type safety in C is limited.

String literals vs array of char when initializing a pointer

Inspired by this question.
We can initialize a char pointer by a string literal:
char *p = "ab";
And it is perfectly fine.
One could think that it is equivalent to the following:
char *p = {'a', 'b', '\0'};
But apparently that is not the case. And not only because string literals are stored in read-only memory: it appears that even though the string literal has the type of a char array, and the initializer {...} has the type of a char array, the two declarations are handled differently, as the compiler gives the warning:
warning: excess elements in scalar initializer
in the second case. What is the explanation of such a behavior?
Update:
Moreover, in the latter case the pointer p will have the value 0x61 (the value of the first array element 'a') instead of a memory location, so the compiler, as it warned, takes just the first element of the initializer and assigns it to p.
I think you're confused because char *p = "ab"; and char p[] = "ab"; have similar syntax, but different meanings.
I believe that the latter case (char p[] = "ab";) is best regarded as a short-hand notation for char p[] = {'a', 'b', '\0'}; (initializes an array with the size determined by the initializer). Actually, in this case, you could say "ab" is not really used as a string literal.
However, the former case (char *p = "ab";) is different in that it simply initializes the pointer p to point to the first element of the read-only string literal "ab".
I hope you see the difference. While char p[] = "ab"; is representable as an initialization such as you described, char *p = "ab"; is not, as pointers are, well, not arrays, and initializing them with an array initializer does something entirely different (namely give them the value of the first element, 0x61 in your case).
Long story short, C compilers only "replace" a string literal with a char array initializer if it is suitable to do so, i.e. it is being used to initialize a char array.
String literals have a "magical" status in C. They're unlike anything else. To understand why, it's useful to think about this in terms of memory management. For example, ask yourself, "Where is a string literal stored in memory? When is it freed from memory?" and things will start making sense.
They're unlike numeric literals which translate easily to machine instructions. For a simplified example, something like this:
int x = 123;
... might translate to something like this at the machine level:
mov ecx, 123
When we do something like:
const char* str = "hello";
... we now have a dilemma:
mov ecx, ???
There's not necessarily some native understanding of the hardware of what a multi-byte, variable-length string actually is. It mainly knows about bits and bytes and numbers and has registers designed to store these things, yet a string is a memory block containing multiple of those.
So compilers have to generate instructions to store that string's memory block somewhere, and so they typically generate instructions when compiling your code to store that string somewhere in a globally-accessible place (typically a read-only memory segment or the data segment). They might also coalesce multiple literal strings that are identical to be stored in the same memory region to avoid redundancy. Now it can generate a mov/load instruction to load the address to the literal string, and you can then work with it indirectly through a pointer.
Another scenario we might run into is this:
static const char* some_global_ptr = "blah";
int main()
{
if (...)
{
const char* ptr = "hello";
...
some_global_ptr = ptr;
}
printf("%s\n", some_global_ptr);
}
Naturally ptr goes out of scope, but we need that literal string's memory to linger around for this program to have well-defined behavior. So literal strings translate not only to addresses to globally-accessible memory blocks, but they also don't get freed as long as your binary/program is loaded/running so that you don't have to worry about their memory management. [Edit: excluding potential optimizations: for the C programmer, we never have to worry about the memory management of a literal string, so the effect is like it's always there].
Now about character arrays: a literal string is an array in its own right (sizeof "hello" is 6), but that is easy to lose sight of. As soon as the literal is assigned to a char*/const char*, only a pointer remains, and sizeof applied to that pointer no longer gives the number of bytes allocated.
This code actually gives us a handle to such an array without involving a pointer:
char str[] = "hello";
Something interesting happens here. A production compiler is likely going to apply all kinds of optimizations, but excluding those, at a basic level such code might create two separate memory blocks.
The first block is going to be persistent for the duration of the program, and will contain that literal string, "hello". The second block will be for that actual str array, and it's not necessarily persistent. If we wrote such code inside a function, it's going to allocate memory on the stack, copy that literal string to the stack, and then free the memory from the stack when str goes out of scope. The address of str is not going to match the literal string, to put it another way.
Finally, when we write something like this:
char str[] = {'h', 'e', 'l', 'l', 'o', '\0'};
... it's not necessarily equivalent, as here there are no literal strings involved. Of course an optimizer is allowed to do all kinds of things, but in this scenario, it is possible that we will simply create a single memory block (allocated on the stack and freed from the stack if we're inside a function) with instructions to move all these numbers (characters) you specified to the stack.
So while we're effectively achieving the same effect as the previous version as far as the logic of the software is concerned, we're actually doing something subtly different when we don't specify a literal string. Again, optimizers can recognize when doing something different can have the same logical effect, so they might get fancy here and make these two effectively the same thing in terms of machine instructions. But short of that, this is subtly different code we're writing.
Last but not least, when we use initializers like {...}, the compiler expects you to assign it to an aggregate l-value with memory that is allocated and freed at some point when things go out of scope. So that's why you're getting the error trying to assign such a thing to a scalar (a single pointer).
The second example is syntactically incorrect. In C, {'a', 'b', '\0'} can be used to initialize an array, but not a pointer.
Instead, you can use a C99 compound literal (also available in some compilers as an extension, e.g., GCC) like this:
char *p = (char []){'a', 'b', '\0'};
Note that it's more powerful as the initializer isn't necessarily null-terminated.
From C99 we have
A character string literal is a sequence of zero or more multibyte characters enclosed in double-quotes
So in the second definition there is no string literal as it is not within the double quotes. The pointer should be allocated memory before writing something to it or if you want to go by initializer list then
char p[] = {'a','b','\0'};
is what you want. Basically both are different declarations.

Char String Assignment

I am working on Microsoft Visual Studio environment. I came across a strange behavior
char *src ="123";
char *des ="abc";
printf("\nThe src string is %c", src[0]);
printf("\tThe dest string is %c",des[0]);
des[0] = src[0];
printf("\nThe src string is %c", src[0]);
printf("\tThe dest string is %c",des[0]);
The result is:
1 a
1 a
That means the des[0] is not being initialized. As src is pointing to the first element of the string. I guess by rules this should work.
This is undefined behavior:
des[0] = src[0];
Try this instead:
char des[] ="abc";
Since src and des are initialized with string literals, their type should actually be const char *, not char *; like this:
const char * src ="123";
const char * des ="abc";
There was never memory allocated for either of them, they just point to the predefined constants. Therefore, the statement des[0] = src[0] is undefined behavior; you're trying to change a constant there!
Any decent compiler should actually warn you about the implicit conversion from const char * to char *...
If using C++, consider using std::string instead of char *, and std::cout instead of printf.
Section 2.13.4 of ISO/IEC 14882 (Programming languages - C++) says:
A string literal is a sequence of characters (as defined in 2.13.2) surrounded by double quotes, optionally beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n const char” and static storage duration (3.7), where n is the size of the string as defined below, and is initialized with the given characters. ...
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.
In C, string literals such as "123" are stored as arrays of char (const char in C++). These arrays are stored in memory such that they are available over the lifetime of the program. Attempting to modify the contents of a string literal results in undefined behavior; sometimes it will "work", sometimes it won't, depending on the compiler and the platform, so it's best to treat string literals as unwritable.
Remember that under most circumstances, an expression of type "N-element array of T" will be converted to an expression of type "pointer to T" whose value is the location of the first element in the array.
Thus, when you write
char *src = "123";
char *des = "abc";
the expressions "123" and "abc" are converted from "4-element array of char" (three characters plus the terminating '\0') to "pointer to char", and src will point to the '1' in "123", and des will point to the 'a' in "abc".
Again, attempting to modify the contents of a string literal results in undefined behavior, so when you write
des[0] = src[0];
the compiler is free to treat that statement any way it wants to, from ignoring it completely to doing exactly what you expect it to do to anything in between. That means that string literals, or a pointer to them, cannot be used as target parameters to calls like strcpy, strcat, memcpy, etc., nor should they be used as parameters to calls like strtok.
vinaygarg: That means the des[0] is not being initialized. As src is pointing to the first element of the string. I guess by rules this should work.
Firstly, you must remember that src and des are defined as pointers, nothing more, nothing less.
So you must then ask yourself what exactly "123" and "abc" are and why they cannot be altered. To cut a long story short, they are stored in application memory, which is read-only. Why? The strings must be stored with the program in order to be available to your code at run time; in theory you should get a compiler warning for assigning a const char * to a non-const char *. Why is it read-only? The memory for EXEs and DLLs needs to be protected from being overwritten somehow, so it must be read-only to stop bugs and viruses from modifying executing code.
So how can you get this string into modifiable memory?
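// Copying into an array.
const size_t BUFFER_SIZE = 256;
char buffer[BUFFER_SIZE];
// Either copy without a length check (safe here, since "abc" fits) ...
strcpy(buffer, "abc");
// ... or bound the copy and terminate explicitly, since strncpy may not.
strncpy(buffer, "abc", BUFFER_SIZE-1);
buffer[BUFFER_SIZE-1] = '\0';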

Resources