Typecasting string and strdup - c

If an input const string is being modified in some way (which is resulting in C compiler warning), what is the best way to handle it - typecasting it to a new variable and then using it OR duplicating it and using it and then freeing it. Or is there any other way to handle this type of scenario. please suggest. Any help would be appreciated.
//Typecasting
const char * s1;
char * s2 = (char *)s1;
//Duplicate and free
const char * s1;
char * s2 = strdup( s1 );
free(s2)
EDIT: It is a C compiler; not C++. I am not sure whether in typecasting, s2 will be a new copy of string s1 or will it be pointing to the original string s1?
Thanks for the answers. I have one more doubt-
const char * c1;
const char * c2 = c1;
Is the above assignment valid?

Casting away const here might shut the compiler up but will lead to runtime failures. Make a copy of the string and work on that.
Casting away const does not copy the contents of the memory. It just creates a pointer to the same memory and tells the compiler that it can go right ahead and write to that memory. If the memory is read only you have a protection fault. More seriously you can have correctness problems which are hard to debug. Don't cast away const.
Of course, if you need to modify a variable and have those modifications visible to the caller, then you should not make it const in the first place. On the other hand, if the modifications are meant to be private to the function, then duplication of the const parameter is best.
As a broad rule you should attempt to avoid casts if at all possible. Casts are one of the most frequent sources of errors.

If the array that s1 points to is const then you must not modify it; doing so gives undefined behaviour. In that case, if you need to modify it, then you will have to take a copy. UPDATE: Typecasting will give you a pointer to the same array, which is still const, so you still must not modify (even though trying to do so no longer gives a compile error).
If you can guarantee that the array is not const, then you could modify it by casting away the const from the pointer. In that case, it would be less error-prone if you can enforce that guarantee by not adding a const qualifier to s1 in the first place.

If your input is a const char* you definitely should listen to the compiler warning and leave it alone.

Related

How Should I Define/Declare String Constants

I've always used string constants in C as one of the following
char *filename = "foo.txt";
const char *s = "bar"; /* preferably this or the next one */
const char * const s3 = "baz":
But, after reading this, now I'm wondering, should I be declaring my string constants as
const char s4[] = "bux";
?
Please note that linked question suggested as a duplicate is different because this one is specifically asking about constant strings. I know how the types are different and how they are stored. The array version in that question is not const-qualified. This was a simple question as to whether I should use constant array for constant strings vs. the pointer version I had been using. The answers here have answered my question, when two days of searching on SO and Google did not yield an exact answer. Thanks to these answers, I've learned that the compiler can do special things when the array is marked const, and there are indeed (at least one) case where I will now be using the array version.
Pointer and arrays are different. Defining string constants as pointers or arrays fits different purposes.
When you define a global string constant that is not subject to change, I would recommend you make it a const array:
const char product_name[] = "The program version 3";
Defining it as const char *product_name = "The program version 3"; actually defines 2 objects: the string constant itself, which will reside in a constant segment, and the pointer which can be changed to point to another string or set to NULL.
Conversely, defining a string constant as a local variable would be better done as a local pointer variable of type const char *, initialized with the address of a string constant:
int main() {
const char *s1 = "world";
printf("Hello %s\n", s1);
return 0;
}
If you define this one as an array, depending on the compiler and usage inside the function, the code will make space for the array on the stack and initialize it by copying the string constant into it, a more costly operation for long strings.
Note also that const char const *s3 = "baz"; is a redundant form of const char *s3 = "baz";. It is different from const char * const s3 = "baz"; which defines a constant pointer to a constant array of characters.
Finally, string constants are immutable and as such should have type const char []. The C Standard purposely allows programmers to store their addresses into non const pointers as in char *s2 = "hello"; to avoid producing warnings for legacy code. In new code, it is highly advisable to always use const char * pointers to manipulate string constants. This may force you to declare function arguments as const char * when the function does not change the string contents. This process is known as constification and avoid subtile bugs.
Note that some functions violate this const propagation: strchr() does not modify the string received, declared as const char *, but returns a char *. It is therefore possible to store a pointer to a string constant into a plain char * pointer this way:
char *p = strchr("Hello World\n", 'H');
This problem is solved in C++ via overloading. C programmers must deal with this as a shortcoming. An even more annoying situation is that of strtol() where the address of a char * is passed and a cast is required to preserve proper constness.
The linked article explores a small artificial situation, and the difference demonstrated vanishes if you insert const after * in const char *ptr = "Lorum ipsum"; (tested in Apple LLVM 10.0.0 with clang-1000.11.45.5).
The fact the compiler had to load ptr arose entirely from the fact it could be changed in some other module not visible to the compiler. Making the pointer const eliminates that, and the compiler can prepare the address of the string directly, without loading the pointer.
If you are going to declare a pointer to a string and never change the pointer, then declare it as static const char * const ptr = "string";, and the compiler can happily provide the address of the string whenever the value of ptr is used. It does not need to actually load the contents of ptr from memory, since it can never change and will be known to point to wherever the compiler chooses to store the string. This is then the same as static const char array[] = "string";—whenever the address of the array is needed, the compiler can provide it from its knowledge of where it chose to store the array.
Furthermore, with the static specifier, ptr cannot be known outside the translation unit (the file being compiled), so the compiler can remove it during optimization (as long as you have not taken its address, perhaps when passing it to another routine outside the translation unit). The result should be no differences between the pointer method and the array method.
Rule of thumb: Tell the compiler as much as you know about stuff: If it will never change, mark it const. If it is local to the current module, mark it static. The more information the compiler has, the more it can optimize.
From the performance perspective, this is a fairly small optimization which makes sense for low-level code that needs to run with the lowest possible latency.
However, I would argue that const char s3[] = "bux"; is better from the semantic perspective, because the type of the right hand side is closer to type of the left hand side. For that reason, I think it makes sense to declare string constants with the array syntax.

Differences between moments of char pointer assignment

Today I was told, that this code:
int main(){
char *a;
a = "foobar";
/* a used later to strcpy */
return 0;
}
Is bad, and may lead to problems and errors.
However, my code worked without any problems, and I don't understand, what is the difference between this, and
int main(){
char *a = "foobar";
/* a used later to strcpy */
return 0;
}
Which was described to me as the "correct" way.
Could someone describe, why these two codes are different?
And, if the first one may be problematic, show an example of this?
Functionally, they are the same.
In the former snippet, a is assigned to a string literal; in the latter, a is initialized with a string literal.
In both cases, a points to string literal (which can't be modified).
There's no reason to consider one as more correct than the other. I'd prefer the latter - but that's just my personal preference.
Both snippets are equally bad, because both end in a non const pointer pointing to const data. If the non const pointer is used to (try to) change the data, you will get Undefined Behaviour: everything can happen from it works to program crashes including modifying instruction is ignored.
The correct way is to either use a const pointer or to initialize a non const array.
const char *a = "foobar";
or
char a[] = "foobar";
But beware, in latter case you have a true array not a pointer, so you could also do if you really need pointer semantics:
char _a[] = "foobar";
char *a = _a;
There are some places that have coding standards, for instance to help with static code analysis by tools like Coverity.
A coding practice rule that I have seen several places is that variables should always be declared initialized to simplify things to make analysis easier.
You second snippet hews more closely to that rule than the first, as its impossible to insert new code where a could be used uninitialized.
That's a positive benefit when it comes to code maintenance.

When to allocate memory to char *

I am bit confused when to allocate memory to a char * and when to point it to a const string.
Yes, I understand that if I wish to modify the string, I need to allocate it memory.
But in cases when I don't wish to modify the string to which I point and just need to pass the value should I just do the below? What are the disadvantages in the below steps as compared to allocating memory with malloc?
char *str = NULL;
str = "This is a test";
str = "Now I am pointing here";
Let's try again your example with the -Wwrite-strings compiler warning flag, you will see a warning:
warning: initialization discards 'const' qualifier from pointer target type
This is because the type of "This is a test" is const char *, not char *. So you are losing the constness information when you assign the literal address to the pointer.
For historical reasons, compilers will allow you to store string literals which are constants in non-const variables.
This is, however, a bad behavior and I suggest you to use -Wwrite-strings all the time.
If you want to prove it for yourself, try to modify the string:
char *str = "foo";
str[0] = 'a';
This program behavior is undefined but you may see a segmentation fault on many systems.
Running this example with Valgrind, you will see the following:
Process terminating with default action of signal 11 (SIGSEGV)
Bad permissions for mapped region at address 0x4005E4
The problem is that the binary generated by your compiler will store the string literals in a memory location which is read-only. By trying to write in it you cause a segmentation fault.
What is important to understand is that you are dealing here with two different systems:
The C typing system which is something to help you to write correct code and can be easily "muted" (by casting, etc.)
The Kernel memory page permissions which are here to protect your system and which shall always be honored.
Again, for historical reasons, this is a point where 1. and 2. do not agree. Or to be more clear, 1. is much more permissive than 2. (resulting in your program being killed by the kernel).
So don't be fooled by the compiler, the string literals you are declaring are really constant and you cannot do anything about it!
Considering your pointer str read and write is OK.
However, to write correct code, it should be a const char * and not a char *. With the following change, your example is a valid piece of C:
const char *str = "some string";
str = "some other string";
(const char * pointer to a const string)
In this case, the compiler does not emit any warning. What you write and what will be in memory once the code is executed will match.
Note: A const pointer to a const string being const char *const:
const char *const str = "foo";
The rule of thumb is: always be as constant as possible.
If you need to modify the string, use dynamic allocation (malloc() or better, some higher level string manipulation function such as strdup, etc. from the libc), if you don't need to, use a string literal.
If you know that str will always be read-only, why not declare it as such?
char const * str = NULL;
/* OR */
const char * str = NULL;
Well, actually there is one reason why this may be difficult - when you are passing the string to a read-only function that does not declare itself as such. Suppose you are using an external library that declares this function:
int countLettersInString(char c, char * str);
/* returns the number of times `c` occurs in `str`, or -1 if `str` is NULL. */
This function is well-documented and you know that it will not attempt to change the string str - but if you call it with a constant string, your compiler might give you a warning! You know there is nothing dangerous about it, but your compiler does not.
Why? Because as far as the compiler is concerned, maybe this function does try to modify the contents of the string, which would cause your program to crash. Maybe you rely very heavily on this library and there are lots of functions that all behave like this. Then maybe it's easier not to declare the string as const in the first place - but then it's all up to you to make sure you don't try to modify it.
On the other hand, if you are the one writing the countLettersInString function, then simply make sure the compiler knows you won't modify the string by declaring it with const:
int countLettersInString(char c, char const * str);
That way it will accept both constant and non-constant strings without issue.
One disadvantage of using string-literals is that they have length restrictions.
So you should keep in mind from the document ISO/IEC:9899
(emphasis mine)
5.2.4.1 Translation limits
1 The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:
[...]
— 4095 characters in a character string literal or wide string literal (after concatenation)
So If your constant text exceeds this count (What some times throughout may be possible, especially if you write a dynamic webserver in C) you are forbidden to use the string literal approach if you want to stay system independent.
There is no problem in your code as long as you are not planing to modify the contents of that string. Also, the memory for such string literals will remain for the full life time of the program. The memory allocated by malloc is read-write, so you can manipulate the contents of that memory.
If you have a string literal that you do not want to modify, what you are doing is ok:
char *str = NULL;
str = "This is a test";
str = "Now I am pointing here";
Here str a pointer has a memory which it points to. In second line you write to that memory "This is a test" and then again in 3 line you write in that memory "Now I am pointing here". This is legal in C.
You may find it a bit contradicting but you can't modify string that is something like this -
str[0]='X' // will give a problem.
However, if you want to be able to modify it, use it as a buffer to hold a line of input and so on, use malloc:
char *str=malloc(BUFSIZE); // BUFSIZE size what you want to allocate
free(str); // freeing memory
Use malloc() when you don't know the amount of memory needed during compile time.
It is legal in C unfortunately, but any attempt to modify the string literal via the pointer will result in undefined behavior.
Say
str[0] = 'Y'; //No compiler error, undefined behavior
It will run fine, but you may get a warning by the compiler, because you are pointing to a constant string.
P.S.: It will run OK only when you are not modifying it. So the only disadvantage of not using malloc is that you won't be able to modify it.

Pointer to a character constant

Sorry for being such a dumb here. Can't sort this out for myself.
In a header file there is a Macro like this.
#define kOID "1.3.6.1.4.1.1.1.2.4.0"
How to declare and initialize a char pointer to this data without creating a copy of this string?
Preprocessor macros are nothing but a textual substitution. Thus if you write
const char *pointer = kOID;
the preprocessor will substitute the text with
const char *pointer = "1.3.6.1.4.1.1.1.2.4.0";
One thing to bear in mind is that the const specifier is necessary since once the textual substitution is made, the memory will be allocated on read-only segments.
Also be careful to have the macro visible at the point where you'd like to declare that pointer.
Assuming that you're not planning to change the contents of this string, you can simply use:
char* p = kOID;
The string will reside in a read-only section of the program, so any attempt to change its contents will result with a memory access violation during runtime. So for your own safety, you should generally use:
const char* p = kOID;
Thus, any attempt to change the contents of the string pointed by p will lead to a compile-time error instead of a runtime error. The former is typically much easier to track-down and fix than the latter.
To summarize the const issue, here are the options that you can use:
char* p = kOID;
char* const p = kOID; // compilation error if you change the pointer
const char* p = kOID; // compilation error if you change the pointed data
const char* const p = kOID; // compilation error if you change either one of them
UPDATE - Memory Usage Considerations:
Please note that every such declaration may result with an additional memory usage, adding up to the length of the string plus one character, plus 4 or 8 bytes for the pointer (depending on your system). Now, the pointer is perhaps less of an issue, but the string itself might yield an extensive memory usage if you instantiate it in several places in the code. So if you're planning to use the string in various places within your program, then you should probably declare it globally in one place.
In addition, please note that the string may reside either in the code-section of the program or in the data-section of the program. Depending on your memory partitions, you may prefer having it in one place over the other.
include the header file first.
#include <header.h>
Add the defined constant
char * s = kOID;
This will compile the program fine. However as kOID is a string literal it'll be saved on read only memory of your program. So if you modify the s it'll cause Segmentation fault. The get around is to make s constant.
const char * s = kOID;
Now if you compile the program compiler will check any assignment on s and notice accordingly.
a.c: In function ‘main’:
a.c:10:5: error: assignment of read-only location ‘*s’
So you'll be safe.
To add to what has been said by others, also you can initialize your array this way:
const char some_string[] = kOID;
This is similar to const char *const some_string = kOID;. Possibly, it may lead to additional memory allocation but this depends on compiler.

Which of these options is good practice for assigning a string value to a variable in C?

I have this snippet of C code:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
typedef struct Date {
int date;
char* month;
int year;
} Date_t;
typedef Date_t* pDate_t;
void assignMonth(pDate_t birth)
{
//1)
birth->month = "Nov";
//2)
//birth->month = malloc(sizeof(char) * 4);
//birth->month = strcpy(birth->month, "Nov");
}
int main()
{
Date_t birth;
birth.date = 13;
assignMonth(&birth);
birth.year = 1969;
printf("%d %s %d\n",birth.date, birth.month, birth.year);
return 0;
}
In the function assignMonth I have two possibilities for assigning month. Both give me the same result in the output, so what is the difference between them? I think that the second variant is the good one, am I wrong? If yes, why? If not, why?
Thanks in advance for any help.
P.S. I'm interested in what is going on in memory in both cases.
It depends on what you want to do with birth.month later. If you have no intention of changing it, then the first is better (quicker, no memory cleanup requirement required, and each Date_t object shares the same data). But if that is the case, I would change the definition of month to const char *. In fact, any attempt to write to *birth.month will cause undefined behaviour.
The second approach will cause a memory leak unless you remember to free(birth.month) before birth goes out of scope.
You're correct, the second variant is the "good" one.
Here's the difference:
With 1, birth->month ends up pointing to the string literal "Nov". It is an error to try to modify the contents of birth->month in this case, and so birth->month should really be a const char* (many modern compilers will warn about the assignment for this reason).
With 2, birth->month ends up pointing to an allocated block of memory whose contents are "Nov". You are then free to modify the contents of birth->month, and the type char* is accurate. The caveat is that you are also now required to free(birth->month) in order to release this memory when you are done with it.
The reason that 2 is the correct way to do it in general, even though 1 seems simpler in this case, is that 1 in general is misleading. In C, there is no string type (just sequences of characters), and so there is no assignment operation defined on strings. For two char*s, s1 and s2, s1 = s2 does not change the string value pointed to by s1 to be the same as s2, it makes s1 point at exactly the same string as s2. This means that any change to s1 will affect the contents of s2, and vice-versa. Also, you now need to be careful when deallocating that string, since free(s1); free(s2); will double-free and cause an error.
That said, if in your program birth->month will only ever be one of several constant strings ("Jan", "Feb", etc.) variant 1 is acceptable, however you should change the type of birth->month to const char* for clarity and correctness.
Neither is correct. Everyone's missing the fact that this structure is inherently broken. Month should be an integer ranging from 1 to 12, used as an index into a static const string array when you need to print the month as a string.
I suggest either:
const char* month;
...
birth->month = "Nov";
or:
char month[4];
...
strcpy(birth->month, "Nov");
avoiding the memory allocation altogether.
With option 1 you never allocate memory to store "Nov", which is okay because it's a static string. A fixed amount of memory was allocated for it automatically. This will be fine so long as it's a string that appears literally in the source and you never try to modify it. If you wanted to read a value in from the user, or from a file, then you'd need to allocate first.
Seperate to your question; why is struct Date typedefed? it has a type already - "struct Date".
You can use an incomplete type if you want to hide the structure decleration.
In my experience, people seem to typedef because they think they should - without actually thinking about the effect of doing so.
In the first case your cannot do something like birth->month[i]= 'c'. In other words you cannot modify the string literal "Mov" pointed to by birth->month because it is stored in the read only section of memory.
In the second case you can modify the contents of p->month because "Mov" resides on the heap. Also you need to deallocate the allocated memory using free in this case.
For this example, it doesn't matter too much. If you have a lot of Date_t variables (in, say, an in-memory database), the first methow will lead to less memory usage over-all, with the gotcha that you should not, under any circumstances, change any of the characters in the strings, as all "Nov" strings would be "the same" string (a pointer to the same 4 chars).
So, to an extent, both variants are good, but the best one would depend on expected usage pattern(s).

Resources