How would one modify a constant string? - c

To create a string that I can modify, I can do something like this:
// Creates a variable string via array
char string2[] = "Hello";
string2[0] = 'a'; // this is ok
And to create a constant string that cannot be modified:
// Creates a constant string via a pointer
char *string1 = "Hello";
string1[0] = 'a'; // This will give a bus error
My question then is how would one modify a constant string (for example, by casting)? And, is that considered bad practice, or is it something that is commonly done in C programming?

By definition, you cannot modify a constant. If you want to get the same effect, make a non-constant copy of the constant and modify that.

how would one modify a constant string (for example, by casting)?
If by this you mean, how would one attempt to modify it, you don't even need a cast. Your sample code was:
char *string1 = "Hello";
string1[0] = 'a'; // This will give a bus error
If I compile and run it, I get a bus error, as expected, and just like you did. But if I compile with -fwritable-strings, which causes the compiler to put string constants in read/write memory, it works just fine.
I suspect you were thinking of a slightly different case. If you write
const char *string1 = "Hello";
string1[0] = 'a'; // This will give a compilation error
the situation changes: you can't even compile the code. You don't get a Bus Error at run-time, you get a fatal error along the lines of "read-only variable is not assignable" at compile time.
Having written the code this way, one can attempt to get around the const-ness with an explicit cast:
((char *)string1)[0] = 'a';
Now the code compiles, and we're back to getting a Bus Error. (Or, with -fwritable-strings, it works again.)
is that considered bad practice, or is it something that is commonly done in C programming
I would say it is considered bad practice, and it is not something that is commonly done.
I'm still not sure quite what you're asking, though, or if I've answered your question. There's often confusion in this area, because there are typically two different kinds of "constness" that we're worried about:
whether an object is stored in read-only memory
whether a variable is not supposed to be modified, due to the constraints of a program's architecture
The first of these is enforced by the OS and by the MMU hardware. It doesn't matter what programming-language constructs you did or didn't use -- if you attempt to write to a read-only location, it's going to fail.
The second of these has everything to do with software engineering and programming style. If a piece of code promises not to modify something, that promise may let you make useful guarantees about the rest of the program. For example, the strlen function promises not to modify the string you hand it; all it does is inspect the string in order to compute its length.
Confusingly, in C at least, the const keyword has mostly to do with the second category. When you declare something as const, it doesn't necessarily (and in fact generally does not) cause the compiler to put the something into read-only memory. All it does is let the compiler give you warnings and errors if you break your promise -- if you accidentally attempt to modify something that elsewhere you declared as const. (And because it's a compile-time thing, you can also readily "cheat" and turn off this kind of constness with a cast.)
But there is read-only memory, and these days, compilers typically do put string constants there, even though (equally confusingly, but for historical reasons) string constants do not have the type const char [] in C. But since read-only memory is a hardware thing, you can't "turn it off" with a cast.

You cannot modify the contents of a string literal in a safe or reliable manner in C; it results in undefined behavior. From the C11 standard draft section 6.4.5 p7 concerning string literals:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.

Attempting to modify a string literal is undefined behavior. You may get a bus error, as in your case, or the program may give no indication at all that anything went wrong. Once the behavior is undefined, the language makes no promises about what happens.
You could reassign the pointer (losing your reference to the string "Hello"):
char *s1 = "Hello";
printf("%s ", s1);
s1 = "World";
printf("%s\n", s1);

Related

Troubles with Malloc function [duplicate]

This question already has answers here: Array index out of bound behavior
I think I have trouble understanding the malloc function in C. I have read many tutorials, and it seems to make sense theoretically, but I get destroyed when I try to do something practical.
Currently I understand malloc this way: It's a function to reserve memory.
So I tried this:
char *str;
/* Initial memory allocation */
str = malloc(3);
strcpy(str, "somestring");
printf("String = %s\n", str);
/* Reallocating memory */
str = realloc(str, 15);
strcat(str, ".com");
printf("String = %s", str);
free(str);
return(0);
The output is:
String = somestring
String = som(2-3 strange characters).com
I know I should not malloc 3, but rather the string length + 1 (for the null terminator). However, the strange thing is that I never get any errors on the first line: "somestring" is always displayed correctly. Only the second part misbehaves. When I delete the realloc part, everything works fine, even though in theory it shouldn't. Do I understand something wrong, or can somebody please explain this behaviour?
Exceeding an array's bounds, e.g. the bounds defined by a previous malloc, is undefined behaviour. After such an operation, all bets are off. The program might even function as intended, but it may also do things that are far from obvious. See, for example, the definition of UB in this online C standard draft:
3.4.3
(1) undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
(2) NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
...
So reserve enough memory, and this behaviour should turn into the intended one.
Your brief description of malloc is not incorrect. It does reserve memory. More accurately stated, from the link:
"Allocates a block of size bytes of memory, returning a pointer to the
beginning of the block."
(emphasis mine)
But that is not the only consideration for using it in preparation for creating a C string.
In addition to understanding malloc(), it is good to be aware of Undefined Behavior (1). At best it causes buggy behavior, and it can be much worse.
By allocating memory for 3 bytes and then writing a string of more than two characters (plus the null terminator), you have invoked undefined behavior (2). Sometimes it will seem to work, and sometimes it will not. In this case, you have written to memory that you do not own. If that memory is not currently being used by another variable, everything is likely to appear normal to you, as the results in your post demonstrate. But if it is being used by another variable, the write will clobber that data, and the program may misbehave or crash. The second of these scenarios is the better of the two, because the problem is more likely to be noticed. The first is especially bad because your program will appear fine, possibly for hours, until the first time a conflict occurs, and then it will fail.
Keep in mind also that a string in C is, by definition, a null-terminated character array:
char string[20];
char string2[3];
strcpy(string, "string content");//will always work
strcpy(string2, "string content");//may or may not appear to work
string would appear in memory as:
|s|t|r|i|n|g| |c|o|n|t|e|n|t|\0|?|?|?|?|?|
Where ? can be any value.
There is no guarantee what string2 will contain, or what the excess characters will overwrite.
Regarding your statement "However the strange thing is, I never get any errors in the first line...": undefined behavior (3) is by definition unpredictable, so the following may or may not help you see the effects. Because of the exaggerated sizes and assignments, though, it will likely cause an access violation at some point:
char shortStr[2][3] = {{"in"},{"at"}};//elements [0] & [1] will be provided memory locations close in proximity.
char longStr[100]={"this is a very long array of characters, this is a continuation of the same thing."};
strcpy(shortStr[0], longStr);
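This will likely result in an error. strcpy writes the more than 80 characters of longStr into the 3-byte shortStr[0], which overwrites shortStr[1] (stored immediately after it). The copy then continues past the end of shortStr into memory the program does not own, which may cause a memory access violation.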
The function malloc may reserve more bytes than you asked for, for example 16 when you asked for 3, for reasons pertinent to memory management. You don't own those other 13 bytes, but nobody else does either, so you might "get away with" using that memory without any conflict. Until you demo the product: then it will fail, for sure.

changing a const char *str1 = "abc"

Is this good practice? The code compiles and runs, but I wonder if this is a good practice to emulate.
In C code,
we write const char *str1 = "abc";
then later, let's say there is a pointer variable char *str2 that points to dynamically allocated memory,
and then we do str1 = str2, so now both str1 and str2 point to dynamically allocated memory.
So now we have lost track of any pointer to "abc". In this code we may not need it, but I wonder what the recommended way to handle this is.
The overall problem is that we need a string that is initially set to "abc". Later, depending on user input, we may want to use the string supplied by the user instead.
It's absolutely fine. const char *str1 means "a pointer that can be modified, to character data that cannot be modified (through this pointer)".
So, you can point str1 at any string you like, and it makes sense to "reseat" it to point at different strings at different times.
Obviously if your code is complicated enough, you can make it difficult for a reader to work out what the variable currently contains, but that's true of all variables. For example, you want to be careful with pointers that sometimes point at string literals and sometimes point at dynamically-allocated memory, because it might not always be clear whether the pointer should be freed.
If you wanted str1 to always point at the same string, you would define it const char * const str1 (or char const *const str1 in order to make the position of the const always consistent). That's not what you want in this case, and the fact that you haven't declared str1 const indicates as much to the reader.
Losing the pointer to the string literal will not lead to memory leak, so what you do is safe in that aspect.
The string literal "abc" is not dynamically allocated, so there is nothing that can leak in this situation.
String literals are a part of the "program image" that gets loaded into memory at startup by the executable loader of the operating system. The space that this image occupies is reclaimed by the operating system once the process is over. Of course this is not quite accurate since there are techniques like demand paging and copy-on-write, but they are irrelevant for that case.
It would be a problem if you omitted the const from that definition. A plain char * would allow you to attempt to modify memory that is usually stored in a read-only area of the process, so undefined behaviour would manifest.

Why do we need strdup()?

While I was working on an assignment, I learned that we should not use assignments such as:
char *s="HELLO WORLD";
Programs using such syntax are prone to crashing.
I tried and used:
int fun(char *temp)
{
// do some operation on temp
// print temp.
}
fun("HELLO WORLD");
Even the above works (though the output is compiler- and standard-specific).
Instead we should try strdup() or use const char *
I have tried reading other similar questions on the blog, but could not understand why the above code shouldn't work.
Is memory allocated? And what difference does const make?
Let's clarify things a bit. You don't ever specifically need strdup. It is just a function that allocates a copy of a string on the heap. The same result can be achieved in many different ways, including with stack-based buffers. What you need is the result: a mutable copy of the string.
The reason the code you've listed is dangerous is that it passes what is really a constant string, in the form of a string literal, into a slot which expects a mutable string. This is unfortunately allowed by the C standard, but it is inherently dangerous. Writing to a constant string will produce unexpected results and often crashes. The strdup function fixes the problem because it creates a mutable copy, which can then be placed into a slot expecting a mutable string.
String literals are stored in the program's static data (often in a read-only section). Writing through pointers to them attempts to modify the string literal, which can lead to... strange results at best. Instead, use strdup() to copy them to heap-allocated space, or copy them into a stack-allocated array.
String literals may be stored in portions of memory that do not have write privileges. Attempting to write to them will cause undefined behaviour. With const, the compiler ensures that you do not write through the pointer, which guarantees that you do not invoke undefined behaviour in this way.
This is a problem in C. Although string literals have type char[N] (which decays to char *) rather than const char[N], you can't modify them, so they are effectively const char *.
If you are using gcc, you can use -Wwrite-strings to check if you are using string literals correctly.
Read my answer on (array & string) Difference between Java and C. It contains the answer to your question in the section about strings.
You need to understand the difference between static and dynamic memory allocation, and that they do not use the same memory regions.

c char pointer problem

If we declare char * p="hello"; then, since it is stored in the data section, we cannot modify the contents that p points to, but we can modify the pointer itself. But I found this example in C Traps and Pitfalls by Andrew Koenig (AT&T Bell Laboratories, Murray Hill, New Jersey 07974).
The example is:
char *p, *q;
p = "xyz";
q = p;
q[1] = 'Y';
q would point to memory containing the string xYz. So would p, because p and q point to the same memory.
How can that be true if the first statement I mentioned is also true?
Similarly, I ran the following code:
#include <stdio.h>
int main(void)
{
char *p="hai friends",*p1;
p1=p;
while(*p!='\0') ++*p++;
printf("%s %s",p,p1);
return 0;
}
and got the output as
ibj!gsjfoet
Please explain how we are able to modify the contents in both of these cases.
thanks in advance
Your same example causes a segmentation fault on my system.
You're running into undefined behavior here. .data (note that the string literal might be in .text too) is not necessarily immutable - there is no guarantee that the machine will write protect that memory (via page tables), depending on the operating system and compiler.
Only your OS can guarantee that stuff in the data section is read-only, and even that involves setting segment limits and access flags and using far pointers and such, so it's not always done.
C itself has no such limitation; in a flat memory model (which almost all 32-bit OSes use these days), any bytes in your address space are potentially writable, even stuff in your code section. If you had a pointer to main(), and some knowledge of machine language, and an OS that had stuff set up just right (or rather, failed to prevent it), you could potentially rewrite it to just return 0. Note that this is all black magic of a sort, and is rarely done intentionally, but it's part of what makes C such a powerful language for systems programming.
Even if you can do this and it seems that there are no errors, it's a bad idea. Depending on the program in question, you could end up making it very easy for buffer overflow attacks. A good article explaining this is:
https://www.securecoding.cert.org/confluence/display/seccode/STR30-C.+Do+not+attempt+to+modify+string+literals
It'll depend on the compiler as to whether that works or not.
x86 is a von Neumann architecture (as opposed to Harvard), so there's no clear difference between the 'data' and 'program' memory at the basic level (i.e. the compiler isn't forced into having different types for program vs data memory, and so won't necessarily restrict any variable to one or the other).
So one compiler may allow modification of the string while another does not.
My guess is that a more lenient compiler (e.g. cl, the MS Visual Studio C++ compiler) would allow this, while a more strict compiler (e.g. gcc) would not. If your compiler allows it, chances are it's effectively changing your code to something like:
...
char p[] = "hai friends";
char *p1 = p;
...
// (some disassembly required to really see what it's done though)
perhaps with the 'good intention' of allowing new C/C++ coders to code with less restriction / fewer confusing errors. (whether this is a 'Good Thing' is up to much debate and I will keep my opinions mostly out of this post :P)
Out of interest, what compiler did you use?
In olden days, when C as described by K & R in their book "The C Programming Language" was the ahem "standard", what you describe was perfectly OK. In fact, some compilers jumped through hoops to make string literals writable. They'd laboriously copy the strings from the text segment to the data segment on initialisation.
gcc also used to have a flag to restore this behaviour: -fwritable-strings (it was deprecated in GCC 3.4 and removed in GCC 4.0).
#include <stdio.h>
int main(void)
{
int i = 0;
char *p= "hai friends", *p1;
p1 = p;
while(*(p + i) != '\0')
{
*(p + i); // reads the character only; nothing is written
i++;
}
printf("%s %s", p, p1);
return 0;
}
This code will give output: hai friends hai friends
Modifying string literals is a bad idea, but that doesn't mean it might not work.
One really good reason not to: your compiler is allowed to take multiple instances of the same string literal and make them point to the same block of memory. So if "xyz" was defined somewhere else in your code, you could inadvertently break other code that was expecting it to be constant.
Your program also works on my system (Windows + Cygwin). However, the standard says you shouldn't do it, because the consequence is undefined.
The following excerpt is from the book C: A Reference Manual 5/E, page 33:
You should never attempt to modify the memory that holds the characters of a string constant since [it] may be read-only
char p1[] = "Always writable";
char *p2 = "Possibly not writable";
const char p3[] = "Never writable";
Writing to p1 (for example, p1[0] = 'x') will always work; writing through p2 may work or may cause a run-time error; writing to p3 will always cause a compile-time error.
While modifying a string literal may be possible on your system, that's a quirk of your platform, rather than a guarantee of the language. The actual C language doesn't know anything about .data sections, or .text sections. That's all implementation detail.
On some embedded systems, you won't even have a filesystem to contain a file with a .text section. On some such systems, your string literals will be stored in ROM, and trying to write to the ROM will just crash the device.
If you write code that depends on undefined behavior, and only works on your platform, you can be guaranteed that sooner or later, somebody will think it is a good idea to port it to some new device that doesn't work the way you expected. When that happens, an angry pack of embedded developers will hunt you down and stab you.
p is effectively pointing to read-only memory. Assigning to the array that p points to is undefined behavior. Just because the compiler lets you get away with it doesn't mean it's OK.
Take a look at this question from the C-FAQ: comp.lang.c FAQ list · Question 1.32
Q: What is the difference between these initializations?
char a[] = "string literal";
char *p = "string literal";
My program crashes if I try to assign a new value to p[i].
A: A string literal (the formal term for a double-quoted string in C source) can be used in two slightly different ways:
As the initializer for an array of char, as in the declaration of char a[], it specifies the initial values of the characters in that array (and, if necessary, its size).
Anywhere else, it turns into an unnamed, static array of characters, and this unnamed array may be stored in read-only memory, and which therefore cannot necessarily be modified. In an expression context, the array is converted at once to a pointer, as usual (see section 6), so the second declaration initializes p to point to the unnamed array's first element.
Some compilers have a switch controlling whether string literals are writable or not (for compiling old code), and some may have options to cause string literals to be formally treated as arrays of const char (for better error catching).
I think you are confused about a very important general concept to understand when using C, C++ or other low-level languages. In a low-level language there is an implicit assumption that the programmer knows what s/he is doing and makes no programming errors.
This assumption allows the implementers of the language to simply ignore what should happen if the programmer violates the rules. The end effect is that in C or C++ there is no "runtime error" guarantee... if you do something bad, what happens is simply NOT DEFINED ("undefined behaviour" is the legalese term). Maybe it will crash (if you're very lucky), or maybe apparently nothing will happen (unfortunately most of the time... possibly with a crash in a perfectly valid place one million executed instructions later).
For example, if you access outside of an array, MAYBE you will get a crash, maybe not; a daemon may even come out of your nose (this is the "nasal daemon" you may find on the internet). It's simply not something the compiler writers had to think about.
Just never do that (if you care about writing decent programs).
An additional burden for those who use low-level languages is that you must learn all the rules very well and must never violate them. If you violate a rule you cannot expect a "runtime error angel" to help you... only "undefined behaviour daemons" are present down there.

Why can I change a local const variable through pointer casts but not a global one in C?

I wanted to change value of a constant by using pointers.
Consider the following code
int main()
{
const int const_val = 10;
int *ptr_to_const = &const_val;
printf("Value of constant is %d",const_val);
*ptr_to_const = 20;
printf("Value of constant is %d",const_val);
return 0;
}
As expected the value of constant is modified.
But when I tried the same code with a global constant, I got the following run-time error.
The Windows crash reporter opens. The executable halts after printing the first printf statement, at the statement "*ptr_to_const = 20;"
Consider the following code
const int const_val = 10;
int main()
{
int *ptr_to_const = &const_val;
printf("Value of constant is %d",const_val);
*ptr_to_const = 20;
printf("Value of constant is %d",const_val);
return 0;
}
This program is compiled in mingw environment with codeblocks IDE.
Can anyone explain what is going on?
It's a constant and you are using some tricks to change it anyway, so undefined behavior results. The global constant is probably in read-only memory and therefore cannot be modified. When you try to do that you get a runtime error.
The constant local variable is created on the stack, which can be modified. So you get away with changing the constant in this case, but it might still lead to strange things. For example the compiler could have used the value of the constant in various places instead of the constant itself, so that "changing the constant" doesn't show any effect in these places.
It's in read only memory!
Basically, your computer resolves virtual to physical addresses using a two-level page table system. Along with that grand data structure comes a special bit representing whether or not a page is writable. This is helpful, because user processes probably shouldn't be overwriting their own code (although self-modifying code is kind of cool). Of course, they probably also shouldn't be overwriting their own constant variables.
You can't put a "const" function-level variable into read-only memory, because it lives on the stack, which MUST be on a read-write page. For a global const, however, the compiler/linker sees your const and does you a favor by putting it in read-only memory (it's constant). Obviously, overwriting that will cause all kinds of unhappiness for the kernel, which will take out that anger on the process by terminating it.
Casting away pointer const-ness in C and C++ is only safe if you are certain that the pointed-to variable was originally non-const (and you just happen to have a const pointer to it). Otherwise, it is undefined, and depending on your compiler, the phase of the moon, etc, the first example could very well fail as well.
You should not even expect the value to be modified in the first place. According to the standard, it is undefined behavior. It is wrong both with the global variable and in the first example. Just don't do it :) It could have crashed the other way round, or with both the local and the global.
There are two errors here. The first one is:
int *ptr_to_const = &const_val;
which is a constraint violation according to C11 6.5.4/3 (earlier standards had similar text):
Constraints
Conversions that involve pointers, other than where permitted by the constraints of 6.5.16.1, shall be specified by means of an explicit cast
The conversion from const int * to int * is not permitted by the constraints of 6.5.16.1 (which can be viewed here).
Confusingly, when some compilers encounter a constraint violation, they write "warning" (or even nothing at all, depending on switches) and pretend that you wrote something else in your code, and carry on. This often leads to programs that do not behave as the programmer expected, or in fact don't behave in any predictable way. Why do compilers do this? Beats me, but it certainly makes for an endless stream of questions like this.
gcc appears to proceed as if you had written int *ptr_to_const = (int *)&const_val;.
This piece of code is not a constraint violation because an explicit cast is used. However this brings us to the second problem. The line *ptr_to_const = 20; then tries to write to a const object. This causes undefined behaviour, the relevant text from the Standard is in 6.7.3/6:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
This rule is a Semantic, not a Constraint, which means that the Standard does not require the compiler to emit any sort of warning or error message. The program is just wrong and may behave in nonsensical ways, with any sort of strange symptoms, including but not limited to what you observed.
Since this behavior is not defined by the specification, what actually happens can vary between implementations, so it is not portable, and not a good idea.
Why would you want to change the value of a constant?
Note: this is intended as an answer to Can we change the value of an object defined with const through pointers? which links to this question as a duplicate.
The Standard imposes no requirements on what a compiler must do with code that constructs a pointer to a const object and attempts to write to it. Some implementations--especially embedded ones--might possibly have useful behaviors (e.g. an implementation which uses non-volatile RAM could legitimately place const variables in an area of memory which is writable, but whose contents will remain even if the unit is powered down and back up), and the fact that the Standard imposes no requirements about how compilers handle code that creates non-const pointers to const memory does not affect the legitimacy of such code on implementations which expressly allow it. Even on such implementations, however, it's probably a good idea to replace something like:
volatile const uint32_t action_count;
BYPASS_WRITE_PROTECT = 0x55; // Hardware latch which enables writing to
BYPASS_WRITE_PROTECT = 0xAA; // const memory if written with 0x55/0xAA
BYPASS_WRITE_PROTECT = 0x04; // consecutively followed by the bank number
(*((volatile uint32_t*)&action_count))++; // increment the counter itself, not the pointer
BYPASS_WRITE_PROTECT = 0x00; // Re-enable write-protection of const storage
with
void protected_ram_store_u32(uint32_t volatile const *dest, uint32_t dat)
{
BYPASS_WRITE_PROTECT = 0x55; // Hardware latch which enables writing to
BYPASS_WRITE_PROTECT = 0xAA; // const memory if written with 0x55/0xAA
BYPASS_WRITE_PROTECT = 0x04; // consecutively followed by the bank number
*((volatile uint32_t*)dest)=dat;
BYPASS_WRITE_PROTECT = 0x00; // Re-enable write-protection of const storage
}
void protected_ram_finish(void) {}
...
protected_ram_store_u32(&action_count, action_count+1);
protected_ram_finish();
If a compiler would be prone to apply unwanted "optimizations" to code which writes to const storage, moving "protected_ram_store_u32" into a separately-compiled module could serve to prevent such optimizations. It could also be helpful if, e.g., the code needs to move to hardware which uses some other protocol to write to memory. Some hardware, for example, might use more complicated write protocols to minimize the probability of erroneous writes. Having a routine whose express purpose is to write to "normally-const" memory will make such intentions clear.
