Changing constant variables in C

Why can we change a const variable through a pointer, but not a character of a constant string through a pointer?
For example,
Case 1: Changing a constant variable through a pointer works fine.
#include <stdio.h>
int main(void)
{
const int var = 10;
int *ptr = &var;
*ptr = 12;
printf("var = %d\n", var); //12
return 0;
}
Case 2: Changing a constant string through a pointer gives a compiler error
int main()
{
char * a = "test";//test is in ROM, a is a pointer to its start address in ROM
a[3] = 'M';//error
return 0;
}

Both programs have undefined behavior.
According to the C Standard (6.7.3 Type qualifiers)
6 If an attempt is made to modify an object defined with a
const-qualified type through use of an lvalue with non-const-qualified
type, the behavior is undefined.
It seems that the first program produces the expected result only because the variable var has automatic storage duration; that is, the compiler did not place it in read-only memory.
All string literals (which in C, unlike C++, have non-const array types) have static storage duration and are usually collected by the compiler into a literal pool stored in read-only memory.
In any case, according to the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
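A minimal sketch of the usual alternative, assuming you want a modifiable copy: use the literal only to initialize an array you own, and change that array instead. The helper name patched_copy is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: writes "tesM" (5 bytes including '\0') into out.
   The literal "test" only initializes the local array a; a itself is an
   ordinary modifiable char[5], so writing a[3] is well defined. */
void patched_copy(char out[5])
{
    char a[] = "test";        /* copy of the literal, including the '\0' */
    a[3] = 'M';               /* modifies a, not the string literal */
    memcpy(out, a, sizeof a); /* sizeof a == 5 */
}
```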

There are at least four factors involved in the observations made in the question.
1. Implicitly removing const should generate a warning
Consider this:
int *ptr = &var;
In this statement, &var is a pointer to a const int, but ptr is a pointer to int. This violates the constraints for simple assignments in C 2018 6.5.16.1 (which apply because the rules for initialization in 6.7.9 11 refer to them). In this case, the type pointed to by the left operand must have all the qualifiers of the type pointed to by the right operand, and the two pointed-to types must otherwise be compatible.
Because a constraint is violated, a compiler conforming to the C standard is required to issue a diagnostic. You either used a non-conforming compiler to compile this program or you ignored the diagnostic and executed the program anyway.
Since a constraint is violated, the resulting behavior is not defined by the C standard.
An important principle here is that the C standard does not prevent you from breaking some rules. In this case, it merely does not guarantee what will happen.
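For comparison, a sketch of Case 1 written so that no constraint is violated: the pointer keeps the const qualifier. The function name read_const is hypothetical.

```c
/* Const-correct version of Case 1: the pointed-to type keeps const,
   so the initialization satisfies the constraints of 6.5.16.1. */
int read_const(void)
{
    const int var = 10;
    const int *ptr = &var;   /* no qualifier is discarded */
    /* *ptr = 12; would be a constraint violation the compiler must diagnose */
    return *ptr;
}
```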
2. The behavior of attempting to modify a const object is not defined by the C standard
In this line:
*ptr = 12;
the program attempts to modify the constant var through a pointer. This violates C 6.7.3 7, which says:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
(The expression *ptr is an lvalue with non-const-qualified type.)
As above, the C standard does not prevent you from breaking this rule; it merely does not define what will happen.
What happens when you break this rule? It depends on how the compiler treated your program. Several things are common:
If the object is static and const, the compiler might assign it to a read-only location of memory. Then attempting to modify it would cause a memory access violation and crash the program.
If the object is automatic (defined inside a function with default storage), the compiler might assign it to the stack. The stack is both readable and writeable (it has to be writeable because we change the stack frequently, as routines are called and return). Thus, although the object is const, the compiler has no good way to put an automatic object into read-only memory. So it is on the stack and is writeable. Then attempting to modify it succeeds.
The compiler, during optimization, modifies your program in various ways. This can cause hard-to-predict results when you attempt to modify a const object. The optimizer might recognize that the attempt is not defined by the C standard and simply remove the attempt from your program. But other results are possible too.
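Note that the rule quoted above concerns objects defined with a const-qualified type. As a sketch (hypothetical function name), casting away const is well defined when the underlying object was not itself defined const:

```c
/* var is defined without const; only the access path through cp is const.
   Writing through the cast pointer is therefore well defined. */
int modify_through_cast(void)
{
    int var = 10;              /* not a const object */
    const int *cp = &var;      /* read-only view of a modifiable object */
    int *p = (int *)cp;        /* explicit cast removes the qualifier */
    *p = 12;                   /* fine: var itself is not const */
    return var;
}
```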
3. Due to historical language development, string literals are not const-qualified
At the time string literals were introduced to the C language, there was no const qualifier. There was merely a rule (at some point, if not initially), that you were not allowed to modify the elements of string literals.
When const was introduced to the C language, string literals could not be made const, because this would have caused many programs not to compile: those programs were using char * to refer to elements of string literals. They were not modifying the string literals, but they were using these old pointer types to refer to them.
So string literals remained non-const-qualified.
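A small check of this, assuming a C11 compiler that applies array-to-pointer conversion to the _Generic controlling expression (as C17 6.5.1.1 specifies): a literal selects the char * branch, not const char *. Both function names are hypothetical.

```c
#include <stddef.h>

/* Returns 1 if the literal converts to char *, 2 if to const char *. */
int literal_pointer_kind(void)
{
    return _Generic("test", char *: 1, const char *: 2, default: 0);
}

/* The literal "test" is an array of 5 char: 4 letters plus '\0'. */
size_t literal_size(void)
{
    return sizeof "test";
}
```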
4. The rule that modifying string literals is not supported remains
Another feature of string literals is that they may be consolidated. If you use "abcdefghijklmnopqrstuvwxyz" in one place in the program and you use the same string in another place in the program, even in a different translation unit, the compiler and linker are allowed to create just one instance of them in the executable file and in the memory of the loaded program. This feature was important for early programs, because machines had limited space, so combining copies of the same data was valuable.
This permission is in C 2018 6.4.5 7, which says, about string literals:
It is unspecified whether these arrays are distinct provided their elements have the appropriate values.…
That paragraph also gives us the rule that the behavior of attempting to modify string literals is not defined by the C standard:
… If the program attempts to modify such an array, the behavior is undefined.
The first rule of this paragraph is also a reason we need the second rule. If two string literals can be consolidated into one memory location, then a routine that changed what it thought of as its string could inadvertently change the data used by another routine, possibly in an entirely different part of the program written by a different person at a different company.
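Because consolidation is permitted but not required, portable code compares the contents of literals rather than their addresses. A sketch with a hypothetical function name:

```c
#include <string.h>

/* Whether a and b point to the same array is unspecified (6.4.5p7),
   so a == b may be true or false. Comparing the characters is portable. */
int same_contents(void)
{
    const char *a = "abcdefghijklmnopqrstuvwxyz";
    const char *b = "abcdefghijklmnopqrstuvwxyz";
    return strcmp(a, b) == 0;   /* 1 regardless of pooling */
}
```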
Thus, due to how the C language developed historically, string literals are not const-qualified, but the C standard does not support modifying them.
What happens when you break this rule? Commonly, string literals are put in a read-only portion of memory. When you attempt to modify them, a likely result is that your program causes a memory access violation and crashes. This is the proximate cause of the behavior you observed: the string was in read-only memory, and modifying it caused a crash, but the var object was on the stack, and modifying it “worked.” So the results are not a necessary consequence of the rules of C but were consequences of how your compiler behaved.

Declaring a const int and then assigning a value to it through a pointer is undefined behavior. In practice, modifying this variable through a pointer will not trigger an exception when it is allocated in writable memory (it seems your compiler allocates the variable on the stack).
However, when you declare char *a = "test";, the string "test" is usually allocated in a read-only memory section (which seems to be your case), so trying to change it through a[3] = 'M'; probably triggers an ACCESS_VIOLATION.
To stress it, though: both cases should be considered undefined behavior.

Related

difference between static const int vs const int

const int a = 100;
int *p = &a;
*p = 99;
printf("value %d", a);
The code above compiles and I am able to update its value via the pointer, but the code below doesn't output anything.
static const int a = 100;
int *p = &a;
*p = 99;
printf("value %d", a);
Can anyone explain this?
Both code snippets are invalid C. During initialization/assignment, there's a requirement that "the type pointed to by the left has all the qualifiers of the type pointed to by the right" (C17 6.5.16.1). This means we can't assign a const int* to an int*.
"Above code compiles": well, it doesn't compile cleanly. See "What must a C compiler do when it finds an error?".
I recommend blocking invalid C from compiling without errors by using -std=c11 -pedantic-errors (on gcc, clang, icc).
Since the code is invalid C, it has undefined behavior, and why it behaves a certain way is anyone's guess. Speculating about why one case of undefined behavior gives one particular output and another case gives a different one isn't very meaningful (see "What is undefined behavior and how does it work?"). Instead, focus on writing valid C code without bugs.
There are several things going on here:
const does not mean "Put this variable in read-only memory or otherwise guarantee that any attempt to modify it will definitively result in an error message."
What const does mean is "I promise not to try to modify this variable." (But you broke that promise in both code fragments.)
Attempting to modify a const-qualified variable (i.e., breaking your promise) yields undefined behavior, which means that anything can happen, meaning that it might do what you want, or it might give you an error, or it might do what you don't want, or it might do something totally different.
Compilers don't always complain about const violations. (Though a good compiler should really have complained about the ones here.)
Some compilers are selective in their complaints. Sometimes you have to ask the compiler to warn about iffy things you've done.
Some programmers carelessly ignore warnings. Did your compiler give you any warnings when you compiled this?
The compiler should complain in both cases when you store the address of a const int into p, a pointer to modifiable int.
In the first snippet, a is defined as a local variable with automatic storage: although you define it as const, the processor does not prevent storing a value into it via a pointer. The behavior is undefined, but consistent with your expectations (a is assigned the value 99 and this value is printed, but the compiler could have assumed that the value of a cannot be changed, hence could have passed 100 directly to printf without reading the value of a).
In the second snippet, a is a static variable only accessible from within its scope, and the compiler can place it in a read-only location, so attempting to modify its value via the pointer causes undefined behavior. The program may terminate before evaluating the printf() statement. This is consistent with your observations.
Briefly, modifying const-qualified static objects causes a trap and modifying a const-qualified automatic object does not, because implementations are able to place static objects in protected memory but automatic objects must be kept in writeable memory.
In common C implementations, a const-qualified static object is placed in a section of the program data that is marked read-only after it is loaded into memory. Attempting to modify this memory causes the processor to execute a trap, which results in the operating system terminating execution of the program.
In contrast, an object with automatic storage duration (one defined inside a function without static or other storage duration) cannot easily be put in a read-only program section. This is because automatic objects need to be allocated, initialized, and released during program execution, as the functions they are defined in are called and returned. So even though the object may be defined as const for the purposes of the C code, the program needs to be able to modify the memory actually used for it.
To achieve this, common C implementations put automatic objects on the hardware stack, and no attempt is made to mark the memory read-only. Then, if the program mistakenly attempts to modify a const-qualified automatic object, the hardware does not prevent it.
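To illustrate why automatic const objects end up in writable storage, here is a sketch (hypothetical function name) where the const object's value depends on a parameter, so it can only be created at run time on each call:

```c
/* Each call creates a fresh automatic const object whose value is only
   known at run time, so it has to be written (to a register or the
   writable stack) on every call rather than stored in a read-only section. */
int square_plus_one(int n)
{
    const int sq = n * n;   /* initialized at run time, once per call */
    return sq + 1;
}
```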
The C standard requires that the compiler issue a diagnostic message for the statement int *p = &a;, since it attempts to initialize a pointer to non-const with the address of a const-qualified type. When you ignore that message and execute the program anyway, the behavior is not defined by the C standard.
Also see this answer for explanation of why the program may behave as though a is not changed even after *p = 99; executes without trapping.
6.7.3 Type qualifiers
...
6 If an attempt is made to modify an object defined with a const-qualified type through use
of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is
made to refer to an object defined with a volatile-qualified type through use of an lvalue
with non-volatile-qualified type, the behavior is undefined.133)
133) This applies to those objects that behave as if they were defined with qualified types, even if they are
never actually defined as objects in the program (such as an object at a memory-mapped input/output
address).
C 2011 Online Draft
If you declare a as const, you're making a promise to the compiler that the value of a will not change during its lifetime; if you try to assign a new value to a directly, the compiler should at least issue a diagnostic. However, by trying to change a through a non-const pointer p you're breaking that promise, but you're doing it in such a way that the compiler can't necessarily detect it.
The resulting behavior is undefined: neither the compiler nor the runtime environment is required to handle the situation in any particular way. The code may work as expected, it may crash outright, it may appear to do nothing, or it may corrupt other data. const-ness may be handled in different ways depending on the compiler, the platform, and the code.
The use of static changes how a is stored, and the interaction of static and const is likely what's leading to the different behavior. The static version of a is likely being stored in a different memory segment which may be read-only.

Can a string literal and a non-string non-compound literal be modified? [duplicate]

This question already has answers here:
Why are compound literals in C modifiable
Why do I get a segmentation fault when writing to a "char *s" initialized with a string literal, but not "char s[]"?
String literals are lvalues, which leaves the door open to modifying them.
From C in a Nutshell:
In C source code, a literal is a token that denotes a fixed value, which may be an integer, a floating-point number, a character, or a string. A literal’s type is determined by its value and its notation.
The literals discussed here are different from compound literals, which were introduced in the C99 standard. Compound literals are ordinary modifiable objects, similar to variables.
Although C does not strictly prohibit modifying string literals, you should not attempt to do so. For one thing, the compiler, treating the string literal as a constant, may place it in read-only memory, in which case the attempted write operation causes a fault. For another, if two or more identical string literals are used in the program, the compiler may store them at the same location, so that modifying one causes unexpected results when you access another.
The first paragraph says that "a literal in C denotes a fixed value".
Does it mean that a literal (except compound literals) shouldn't be modified?
Since a string literal isn't a compound literal, should a string literal be modified?
The second paragraph says that "C does not strictly prohibit modifying string literals" while compilers do. So should a string literal be modified?
Do the two paragraphs contradict each other? How shall I understand them?
Can a literal which is neither compound literal nor string literal be modified?
From the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
As for your statement:
The second paragraph says that "C does not strictly prohibit modifying string literals" while compilers do. So should a string literal be modified?
Compilers do not prohibit it either; they may store identical string literals as one array.
As @o11c pointed out in a comment, Annex J (informative) Portability issues says:
J.5 Common extensions
1 The following extensions are widely used in many systems, but are not portable to all implementations. The inclusion of any extension that may cause a strictly conforming program to become invalid renders an implementation nonconforming. Examples of such extensions are new keywords, extra library functions declared in standard headers, or predefined macros with names that do not begin with an underscore.
J.5.5 Writable string literals
1 String literals are modifiable (in which case, identical string literals should denote distinct objects) (6.4.5).
Don't modify string literals. Treat them as char const[].
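One way to follow that advice, sketched with hypothetical names: refer to literals through const char * (or const char arrays) so that an accidental write becomes a compile-time diagnostic instead of undefined behavior.

```c
#include <stddef.h>

/* Taking strings as const char * lets the compiler catch writes. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    /* s[0] = 'X'; would be a constraint violation: s points to const char */
    return n;
}

/* A named array of const char, initialized from a literal. */
static const char greeting[] = "hello";

size_t greeting_length(void)
{
    return count_chars(greeting);
}
```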
String literals are effectively char const[] (modifying them results in undefined behavior), but for legacy reasons they're really char [], which means the compiler won't stop you from writing into them, but your program will still have undefined behavior if you do.
More practically: not every hardware platform provides mechanisms to protect the memory where read-only objects are stored, so this had to be defined as UB. There are three possible cases:
Literals (and constant objects more generally) are kept in RAM, but the hardware does not provide memory protection. Nothing can stop the programmer from writing to this location.
Literals (and constant objects) are kept in RAM, and the hardware does provide memory protection: you will get a segfault.
Read-only data is stored in read-only memory (for example, microcontroller flash). You can try to write to it, but the write has no effect (for example, on some ARM microcontrollers) and no hardware exception is raised.
The first paragraph says that "a literal in C denotes a fixed value".
Does it mean that a literal (except compound literals) shouldn't be modified?
I don't know what the author's intention was, but modifying the array resulting from a string literal at run time is blatantly undefined, according to C11/6.4.5p7: "If the program attempts to modify such an array, the behavior is undefined."
It should also be noted that attempts to modify a const-qualified compound literal at run time also result in undefined behavior, which is explained alongside some volatile-related undefined behaviour in C11/6.7.3p6. Otherwise, modifying compound literals is well defined.
For example:
char *fubar = "hello world";
(*fubar)++; // SQUARELY UNDEFINED BEHAVIOUR!
char *fubar = (char[]){"hello world"};
(*fubar)++; // This is well defined.
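A self-contained version of the well-defined case, wrapped in a hypothetical function so it can be run and checked:

```c
/* The compound literal is an unnamed, modifiable char array (C99 and
   later), so incrementing its first element is well defined. */
char bump_first(void)
{
    char *fubar = (char[]){"hello world"};
    (*fubar)++;          /* 'h' becomes the next character code */
    return fubar[0];
}
```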
Literally replacing "hello world" with "goodbye galaxy", in either piece of source code, is fine. Redefining standard functions, however (i.e. #define memcpy strncpy or #define size_t signed char, which are both great ways to ruin someone else's day), is undefined behaviour.
Since a string literal isn't a compound literal, should a string literal be modified?
The array resulting from a string literal should certainly not be modified during runtime, for any attempt to do so would trigger undefined behaviour.
The string literal itself, which exists as a quoted sequence of characters within your source code, on the other hand... of course, that can be modified as you choose. You're not obliged to modify it, though.
The second paragraph says that "C does not strictly prohibit modifying string literals" while compilers do. So should a string literal be modified?
The C standard doesn't strictly prohibit a lot of undefined behavior; it leaves the behavior undefined, meaning your program is likely to behave erratically or be non-portable. In the realms of well defined C, your programs should not invoke any undefined behaviour, including overflowing arrays, modifying const-qualified objects or the arrays resulting from string literals, race conditions caused by multithreading, etc.
If you want to invoke undefined behaviour, C will let you shoot yourself in the foot. You might have a good reason for doing so; perhaps your program will be more optimal, or perhaps your compiler actually lets you modify string literals ("it's a feature, not a bug", they say, "so give us your money", they say, as you become reliant upon their non-standard quirks). Be aware that some compilers will instead behave as though the attempted modification didn't occur, or crash, or there could be some vulnerability caused.
... and above all else, be aware that your code will no longer be compliant C code!
Do the two paragraphs contradict each other?
By omission, perhaps. The first paragraph does state that the values are fixed, and the second paragraph that the values might be modifiable during runtime through invocation of undefined behaviour.
I think the author meant to make the distinction between elements of source code and the runtime environment. He/she could simply clarify this by ensuring it's explicit that literals should not be modified during runtime, for example.
How shall I understand them?
In the realms of C such values can't change during runtime because invoking undefined behaviour means the code in question is no longer compliant C code.
Perhaps they were trying to avoid explaining undefined behaviour, because it may seem too complex to explain. If you look deeper into the subject, you'll find that the meaning is, as predicted, roughly a conjunction of the two words.
undefined: /ʌndɪˈfʌɪnd/ adj. not clear or defined.
behaviour: /bɪˈheɪvjə/ noun. the way in which a machine or natural phenomenon works or functions
That is to say, an attempt to modify the array resulting from a string literal during runtime results in "unclear functionality". It's not required to be documented anywhere in the realms of computer science, and even if it is documented, that documentation might be a lie.
Can a literal which is neither compound literal nor string literal be modified?
As a lexical element in source code, providing it doesn't override a standard symbol, yes. Literals which aren't l-values (i.e. don't have any storage) such as integer constants, obviously can't be modified during runtime. I suppose it might be possible on some systems to attempt to modify the memory which a function pointer points at, which could be seen as a literal; that's also undefined behaviour and would result in code that isn't C.
It might also be possible to modify many other types of elements which aren't seen as objects by the C standard, such as the return address on the stack. That's what makes buffer overflows so subtly dangerous!

How does the system know whether an address being accessed is constant or not?

Writing to a constant variable using a pointer gives a run-time error.
const int i;
int *p;
int main(void)
{
p = (int*)&i;
*p = 10; // Causes runtime error
return 0;
}
But on a Windows system everything runs from RAM itself.
When I printed the address of const variables and normal variables, I can see that they are in different offsets.
How does the system know that the address being accessed by the pointer is a const one?
Strictly speaking, your code yields undefined behavior according to the C-language standard.
In practice, the linker has probably placed the variable i in a RO section of the executable image.
So the write-operation *p = 10 resulted in a memory-access violation (aka segmentation fault).
How does the system know ....
Ideally, the system does not need to know. For objects with const-qualified type, the allocation will (in general) be in a read-only section, so any attempt to modify (write) them will cause an access violation. It's the programmer who should know.
When I printed the address of const variables and normal variables, I can see that they are in different offsets.
Yes, that's likely, because the normal variables reside in read-write memory, whereas const variables will reside in read-only memory.
Note that there's no syntax (or compilation) error in your code snippet; only the run-time behavior of the code is undefined.
FYI, quoting C11, chapter §6.7.3/p6
If an attempt is made to modify an object defined with a const-qualified type through use
of an lvalue with non-const-qualified type, the behavior is undefined. [...]

Does the preprocessor prepare a list of unique constant strings before the compiler goes into action?

In the code below, I have two different local char* variables declared in two different functions.
Each variable is initialized to point to a constant string, and the contents of the two strings are identical.
Checking in runtime, the variables are initialized to point to the same address in memory.
So the compiler must have assigned the same (constant) value to each one of them.
How is that possible?
#include <stdio.h>
void PrintPointer(void)
{
char* p = "abc";
printf("%p\n", (void *)p);
}
int main(void)
{
char* p = "abc";
printf("%p\n", (void *)p);
PrintPointer();
return 0;
}
It has nothing to do with the preprocessor. But the compiler is explicitly allowed (not required) by the standard to share the memory for identical string literals. For details on when this happens, you must consult your compiler's documentation.
For example, here's the relevant documentation for VC2013:
In some cases, identical string literals may be pooled to save space in the executable file. In string-literal pooling, the compiler causes all references to a particular string literal to point to the same location in memory, instead of having each reference point to a separate instance of the string literal. To enable string pooling, use the /GF compiler option.
The C++ standard says in N3797 2.14.5/12:
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.
The C standard now contains a similar rule (6.4.5p7). Historically it was possible to modify string literals at run-time in C, but this is now Undefined Behaviour. Some compilers may allow it, some not.
Technically, the compiler does it by storing string literals in the symbol table. If an identical string is seen more than once, the same symbolic reference is used each time. The same technique might well be used for other literals, but would not be so easily detected.
The preprocessor, by the way, has nothing to do with it.
How is that possible?
It's possible because the compiler keeps track of values like that. But no, the preprocessor generally doesn't get involved in things like this; the preprocessor does things like macro substitutions that modify the code before the compiler starts working. In this case, though, we're talking about actual code:
char* p = "abc";
and that's the domain of the compiler, not the preprocessor.
So the compiler must have assigned the same (constant) value to each one of them. How is that possible?
If you have two identical string literals, as you do here, then the compiler is allowed to combine them into a single one; apparently, your compiler does that. It's also allowed to store them separately.
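By contrast, arrays initialized from identical literals are separate objects and must have distinct addresses, while two const char * pointers initialized with "abc" may or may not compare equal. A sketch with a hypothetical function name:

```c
/* x and y are two distinct arrays that merely start with equal contents,
   so pointers to them can never compare equal. */
int arrays_are_distinct(void)
{
    char x[] = "abc";
    char y[] = "abc";
    const char *px = x;
    const char *py = y;
    return px != py;   /* always 1: two distinct objects */
}
```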

Why is it allowed to overwrite a const variable using a pointer to it using memcpy?

Why is it allowed to change a const variable using a pointer to it with memcpy?
This code:
const int i=5;
int j = 0;
memcpy(&j, &i, sizeof(int));
printf("Source: i = %d, dest: j = %d\n", i,j);
j = 100;
memcpy(&i, &j, sizeof(int));
printf("Source: j = %d, dest: i = %d\n", j,i);
return 0;
compiled with just a warning:
warning: passing argument 1 of ‘memcpy’ discards ‘const’ qualifier from pointer target type [enabled by default]
But it ran just fine and changed the value of a const variable.
Attempting to modify the value of a const-qualified variable leads to undefined behavior in C. You should not rely on your results, since anything can happen.
C11 (n1570), § 6.7.3 Type qualifiers
If an attempt is made to modify an object defined with a const-qualified type through use
of an lvalue with non-const-qualified type, the behavior is undefined.
Undefined behavior by itself does not force the compiler to produce a diagnostic message.
In fact, this qualifier does not have an enormous effect on the machine code. A const-qualified variable does not usually reside in a read-only data segment (it obviously did not in your implementation, although it could on another one).
The compiler can't easily tell what a pointer is pointing to in a given function. This is possible with some static analysis tools that perform pointer analysis, but it is difficult to implement, and it would be unreasonable to require it in the standard.
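For contrast, a sketch (hypothetical function name) of the direction that is fine: copying from a const object into a modifiable one. No qualifier is discarded, because memcpy's source parameter is const void *; only the destination must be modifiable.

```c
#include <string.h>

/* Reading a const object through memcpy is allowed; writing one is not. */
int copy_from_const(void)
{
    const int i = 5;
    int j = 0;
    memcpy(&j, &i, sizeof j);   /* j becomes 5; i is only read */
    return j;
}
```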
The question asks why. Here's why:
This is allowed because once you have a pointer to a memory address, the language does not know what it points to. It could be a variable, part of a struct, the heap or the stack, or anything. So it cannot prevent you from writing to it. Direct memory access is always unsafe and to be avoided if there's another way of doing it.
const stops you from modifying a const object with an assignment (or increment, etc.). That kind of direct mutation is the only operation it can guarantee you won't be able to perform on a const object.
Another way to look at this is the division between the static context (i.e. compile time) and the runtime context. When you compile a piece of code that may, for example, make an assignment to a variable, the language can say "that's not allowed, it's const", and that is a compilation error. After this, the code is compiled into an executable and the fact that it is const is lost. Variable declarations (and the rest of the language) are written as input to the compiler; once the program is compiled, the source code isn't relevant. You can do a logical proof in your compiler that consts aren't changed: the compiled program runs, and we know at compile time that we have created a program that doesn't break the rules.
When you introduce pointers, you have behaviour that can be defined at run-time. The code that you wrote is now irrelevant, and you can [attempt to] do what you want. The fact that pointers are typed (allowing pointer arithmetic, interpreting the memory at the end of a pointer as a particular type) means that the language gives you some help, but it can't prevent you from doing anything. It can make no guarantees, as you can point a pointer anywhere. The compiler can't stop you breaking the rules at run-time with code that uses pointers.
That said, pointers are the way we get dynamic behaviour and data structures, and are necessary for all but the most trivial code.
(The above is subject to lots of caveats, e.g. code heuristics and more sophisticated static analysis, but it is broadly true of a vanilla compiler.)
The reason is that the C language allows any object pointer type to be implicitly converted to and from void*. It is designed that way because void pointers are used for generic programming.
So a C compiler is not required to stop your code from compiling, even though the program invokes undefined behavior in this case. A good compiler will, however, give a warning as soon as you implicitly discard a const qualifier.
C++ has "stronger typing" than C, meaning that it would require an explicit cast of the pointer type for this code to compile. This is one flaw of the C language that C++ actually fixed.
While it's 'officially' undefined, in practice on implementations like yours you will usually change the value of the const variable. Which raises the question of why it's const to begin with.
