C: Pointer-to-array and array of characters

There are many similar questions on this topic, but they do not answer the following question:
Taking a swing
I am going to take a swing; if you want, go straight to the question under the next heading. Please correct me if I make any wrong assumptions here.
Let's assume I have this string declaration
char* cpHelloWorld = "Hello World!";
I understand the Compiler will make a char* to an anonymous Array stored somewhere in the Memory (by the way: where is it stored?).
If I have this declaration
char cHelloWorld[] = "Hello World!";
There will be no anonymous Array, as the Compiler will create the Array cHelloWorld right away.
The first difference between these two variables is that I can change cpHelloWorld, whereas the Array cHelloWorld is read-only, and I would have to redeclare it if I want to Change it.
My question is following
cpHelloWorld = "This is a pretty long Phrase, compared to the Hello World phrase above.";
How does my application allocate a new, bigger (anonymous) Array at runtime? Should I use this approach with the pointer, as it seems easier to use, or are there any cons? On paper, I would have used malloc if I had to work with dynamic Arrays.
My guess is that the Compiler (or runtime Environment?) creates a new anonymous Array every time I change the Content of my Array.

char* cpHelloWorld = "Hello World!";
makes cpHelloWorld a pointer to a string literal, which is typically stored in read-only memory. You must not modify the contents of this string; attempting to do so is undefined behavior.
char cHelloWorld[] = "Hello World!";
is an array of char initialized to "Hello World!\0".
(note: where the brackets are placed)
The storage for a string literal such as "This is a pretty long ... phrase above." is sized by the compiler from the literal itself: 1 char for each char in the string, +1 for the required nul-terminating character. Nothing is allocated at run time; assigning the literal to cpHelloWorld only stores its address in the pointer.
Whether you use a statically declared array (e.g. char my_str[ ] = "stuff";) or you seek to dynamically allocate storage for the characters, largely depends on whether you know what, and how much, of whatever you wish to store. Obviously, if you know beforehand what the string is, using a string literal or an initialized array of type char is a simple way to go.
However, when you do NOT know what will be stored, or how much, declare a pointer to char (e.g. char *my_string;). Once you have the data to store, allocate storage for it (e.g. my_string = malloc (len * sizeof *my_string);, where len includes room for the nul-terminating character). Since sizeof *my_string is always 1 for character arrays, it can be omitted. (Note: parentheses are required with sizeof when the operand is a type, e.g. sizeof (int), but are optional when the operand is an expression such as a variable.)
Then simply free whatever you have allocated when the values are no longer needed.

As a matter of fact, all string literals known at compile time are placed in the program image, typically in a read-only data section. The pointer itself is an ordinary variable; if it is a local variable, it usually lives on the stack.
There is no memory allocation at run-time, so it is nothing like malloc. There are no performance drawbacks here.

Each of the constant "anonymous" strings used in these contexts exists at its fixed address. The only dynamic part is the actual pointer assignment. You should get the same string address each time you execute a specific pointer assignment from a specific anonymous string (each string has its own address).

Related

When to allocate memory to char *

I am bit confused when to allocate memory to a char * and when to point it to a const string.
Yes, I understand that if I wish to modify the string, I need to allocate it memory.
But in cases when I don't wish to modify the string to which I point and just need to pass the value should I just do the below? What are the disadvantages in the below steps as compared to allocating memory with malloc?
char *str = NULL;
str = "This is a test";
str = "Now I am pointing here";
Compile your example again with the -Wwrite-strings compiler warning flag and you will see a warning:
warning: initialization discards 'const' qualifier from pointer target type
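This is because, with -Wwrite-strings, GCC gives string literals the type const char[N]. In standard C a string literal has type char[N] for historical reasons, but modifying it is undefined behavior all the same. So you are losing constness information that you actually want to keep when you assign the literal's address to a plain char *.
For historical reasons, compilers let you store the address of a string literal, which is effectively constant, in a pointer to non-const char.
This is bad practice, however, and I suggest using -Wwrite-strings all the time.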
If you want to prove it for yourself, try to modify the string:
char *str = "foo";
str[0] = 'a';
The behavior of this program is undefined, but you may see a segmentation fault on many systems.
Running this example with Valgrind, you will see the following:
Process terminating with default action of signal 11 (SIGSEGV)
Bad permissions for mapped region at address 0x4005E4
The problem is that the binary generated by your compiler will store the string literals in a memory location which is read-only. By trying to write in it you cause a segmentation fault.
What is important to understand is that you are dealing here with two different systems:
1. The C type system, which is there to help you write correct code and can easily be "muted" (by casting, etc.).
2. The kernel's memory page permissions, which are there to protect your system and are always enforced.
Again, for historical reasons, this is a point where 1. and 2. do not agree. Or, to be more precise, 1. is much more permissive than 2. (which results in your program being killed by the kernel).
So don't be fooled by the compiler, the string literals you are declaring are really constant and you cannot do anything about it!
As for your pointer str itself, reading and writing it is OK.
However, to write correct code, it should be a const char * and not a char *. With the following change, your example is a valid piece of C:
const char *str = "some string";
str = "some other string";
(const char * pointer to a const string)
In this case, the compiler does not emit any warning. What you write and what will be in memory once the code is executed will match.
Note: A const pointer to a const string being const char *const:
const char *const str = "foo";
The rule of thumb is: always be as constant as possible.
If you need to modify the string, use dynamic allocation (malloc() or, better, a higher-level string function such as strdup from the libc); if you don't need to, use a string literal.
If you know that str will always be read-only, why not declare it as such?
char const * str = NULL;
/* OR */
const char * str = NULL;
Well, actually there is one reason why this may be difficult - when you are passing the string to a read-only function that does not declare itself as such. Suppose you are using an external library that declares this function:
int countLettersInString(char c, char * str);
/* returns the number of times `c` occurs in `str`, or -1 if `str` is NULL. */
This function is well-documented and you know that it will not attempt to change the string str - but if you call it with a constant string, your compiler might give you a warning! You know there is nothing dangerous about it, but your compiler does not.
Why? Because as far as the compiler is concerned, maybe this function does try to modify the contents of the string, which would cause your program to crash. Maybe you rely very heavily on this library and there are lots of functions that all behave like this. Then maybe it's easier not to declare the string as const in the first place - but then it's all up to you to make sure you don't try to modify it.
On the other hand, if you are the one writing the countLettersInString function, then simply make sure the compiler knows you won't modify the string by declaring it with const:
int countLettersInString(char c, char const * str);
That way it will accept both constant and non-constant strings without issue.
One disadvantage of using string literals is that only a limited length is guaranteed to be portable.
So keep in mind this passage from ISO/IEC 9899:
(emphasis mine)
5.2.4.1 Translation limits
1 The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:
[...]
— 4095 characters in a character string literal or wide string literal (after concatenation)
So if your constant text exceeds this count (which may well happen, especially if you write a dynamic web server in C), the string literal approach is no longer guaranteed to work on every conforming implementation, and you should avoid it if you want to stay system independent.
There is no problem in your code as long as you are not planing to modify the contents of that string. Also, the memory for such string literals will remain for the full life time of the program. The memory allocated by malloc is read-write, so you can manipulate the contents of that memory.
If you have a string literal that you do not want to modify, what you are doing is ok:
char *str = NULL;
str = "This is a test";
str = "Now I am pointing here";
Here str is a pointer, and each assignment changes which memory it points to. The second line makes it point to the literal "This is a test", and the third line makes it point to the literal "Now I am pointing here". Nothing is copied or overwritten. This is legal in C.
You may find it a bit contradictory, but you can't modify the string itself, for example:
str[0]='X'; // will give a problem.
However, if you want to be able to modify it, use it as a buffer to hold a line of input and so on, use malloc:
char *str=malloc(BUFSIZE); // BUFSIZE size what you want to allocate
free(str); // freeing memory
Use malloc() when you don't know the amount of memory needed during compile time.
It is legal in C unfortunately, but any attempt to modify the string literal via the pointer will result in undefined behavior.
Say
str[0] = 'Y'; //No compiler error, undefined behavior
It will run fine, but you may get a warning by the compiler, because you are pointing to a constant string.
P.S.: It will run OK only when you are not modifying it. So the only disadvantage of not using malloc is that you won't be able to modify it.

Which method is correct for Initializing a wchar_t string?

I am writing a program and I need to initialize a message buffer which will hold text. I am able to make it work, however I am writing below various ways used to initialize the strings in C and I want to understand the difference. Also, which is the most appropriate method for initializing a wchar_t/char string?
Method I:
wchar_t message[100];
based on my understanding, this will allocate a memory space of 200 bytes (I think size of wchar_t is 2 bytes on Windows OS). This memory allocation is static and it will be allocated inside the .data section of the executable at the time of compiling.
message is also a memory address itself that points to the first character of the string.
This method of initializing a string works good for me.
Method II:
wchar_t *message;
message=(wchar_t *) malloc(sizeof(wchar_t) * 100);
This method will first initialize the variable message as a pointer to wchar_t. It is an array of wide characters.
next, it will dynamically allocate memory for this string. I think I have written the syntax for it correctly.
When I use this method in my program, it does not read the text after the space in a string.
Example text: "This is a message"
It will read only "This" into the variable message and no text after that.
Method III:
wchar_t *message[100];
This will define message as an array of 100 wide characters and a pointer to wchar_t. This method of initializing message works good. However, I am not sure if it is the right way. Because message in itself is pointing to the first character in the string. So, initializing it with the size, is it correct?
I wanted to understand it in more depth, the correct way of initializing a string. This same concept can be extended to a string of characters as well.
The magic is the encoding-prefix L:
#include <stdlib.h>
#include <wchar.h>
...
wchar_t m1[] = L"Hello World";
wchar_t m2[42] = L"Hello World";
wchar_t * pm = L"Hello World";
...
wcscat(m2, L" again");
pm = calloc(123, sizeof *pm);
wcscpy(pm, L"bye");
See also the related part of the C11 Standard.
It really depends on what you want to do and how you use the data. If you need it globally, by all means, define a static array. If you only need it in a method, do the same in the method. If you want to pass the data around between functions, over a longer lifetime, malloc the memory and use that.
However, your method III is wrong: it declares an array of 100 pointers to wchar_t. If you want a 100-element wchar_t array and a pointer, you need to use:
wchar_t message[100], *message_pointer;
Also, concerning terminology: you are only declaring a variable in the method I, you never assign anything to it.

Why does a char array need strcpy and char star doesn't - using structs in C

I have a misunderstanding regarding this code -
typedef struct _EXP{
int x;
char* name;
char lastName[40];
} XMP;
...main...
XMP a;
a.name = "eaaa";
strcpy(a.lastName, "bbb");
Why can't I use: a.lastName = "bbbb"; and that's all?
Well consider the types here. The array has the contents of the string, while the char* merely points to the data. Consequently the array requires strcpy and friends.
Besides, if you allocated memory for the char* (for example with malloc) and then wanted to put some content into that memory, you'd also have to use strcpy, because a mere assignment would only repoint the pointer and lose the address of the allocated block (i.e. a memory leak).
Because the location of an array is fixed, while the value of a pointer (which is itself a location) is not. You can assign new values to a pointer, but not an array.
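They are not the same thing under the hood: an array is a block of storage, whereas a pointer is a variable that holds an address. An array name does decay to a pointer to its first element in most expressions, but you cannot reassign an array, while you can repoint a pointer.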
When you write
a.name = "eaaa" ;
the compiler will reserve memory for a null-terminated string eaaa\0 and, because of that instruction, it will make the pointer name point to that location (i.e. the name variable will contain the address of the memory location where the first byte of the string resides).
If you have the array instead, you already have an allocated area of memory (which cannot be assigned to another memory location!), and you can only fill it with data (in this case bytes representing your string).
This is my understanding about what might be the reason for this.
I think it's about the way the language works. C (and also C++) produces unmanaged code, which means it doesn't need an environment (like the JVM) to run on and to manage memory, threading, etc. So the code is compiled to an executable that is run by the OS directly. For that reason, the executable includes information such as how much space is to be allocated for each object (not sure about dynamic types, though), including arrays. (This is also why a type's full definition, usually supplied by a header file, must be visible at compile time: it is how the compiler knows the size of an object.)
So, when the compiler sees an array of characters, it calculates how much space the array needs at compile time and puts that information into the executable. When the program runs, exactly that much memory is set aside for the array. If you could reassign it multiple times, say in a C function, each assignment would invalidate the previous one(s). So, IMO, that's why the compiler doesn't allow it.

C Duration of strings, constants, compound literals, and why not, the code itself

I don't remember where I read this, but suppose I pass a string to a function like this:
char *string;
string = func ("heyapple!");
char *func (char *string) {
char *p;
p = string;
return p;
}
printf ("%s\n", string);
The string pointer continues to be valid because "heyapple!" is in memory; it IS in the code that I wrote, so it will never be taken away, right?
And about constants like 1, 2.10, 'a'?
And compound literals?
like If I do it:
func (1, 'a', "string");
Will only the string live for the whole execution of my program, or will the constants too?
For example I learned that I can take the address of string doing it
&"string";
Can I take the address of the constants literals? like 1, 2.10, 'a'?
I'm passing these as function arguments, and they need to have static duration, like strings, without the word static.
Thanks a lot.
This doesn't make a whole lot of sense.
Values that are not pointers cannot be "freed", they are values, they can't go away.
If I do:
int c = 1;
The variable 'c' is not a pointer, it cannot do anything else than contain an integer value, to be more specific it can't NOT contain an integer value. That's all it does, there are no alternatives.
In practice, the literals will be compiled into the generated machine-code, so that somewhere in the code resulting from the above will be something like
load r0, 1
Or whatever the assembler for the underlying instruction set looks like. The '1' is a part of the instruction encoding, it can't go away.
Make sure you distinguish between values and pointers to memory. Pointers are themselves values, but a special kind of value that contains an address to memory.
With char* hello = "hello";, there are two things happening:
the string "hello" and a null-terminator are written somewhere in memory
a variable named hello contains a value which is the address to that memory
With int i = 0; only one thing happens:
a variable named i contains the value 0
When you pass around variables to functions their values are always copied. This is called pass by value and works fine for primitive types like int, double, etc. With pointers this is tricky because only the address is copied; you have to make sure that the contents of that address remain valid.
Short answer: yes. 1 and 'a' stick around due to pass by value semantics and "hello" sticks around due to string literal allocation.
Stuff like 1, 'a', and "heyapple!" are called literals, and they get stored in the compiled code, and in memory for when they have to be used. If they remain or not in memory for the duration of the program depends on where they are declared in the program, their size, and the compiler's characteristics, but you can generally assume that yes, they are stored somewhere in memory, and that they don't go away.
Note that, depending on the compiler and OS, it may be possible to change the value of literals, inadvertently or purposely. Many systems store literals in read-only areas (CONST sections) of memory to avoid nasty and hard-to-debug accidents.
For literals that fit into a memory word, like ints and chars it doesn't matter how they are stored: one repeats the literal throughout the code and lets the compiler decide how to make it available. For larger literals, like strings and structures, it would be bad practice to repeat, so a reference should be kept.
Note that if you use macros (#define HELLO "Hello!") it is up to the compiler to decide how many copies of the literal to store, because macro expansion is exactly that, a substitution of macros for their expansion that happens before the compiler takes a shot at the source code. If you want to make sure that only one copy exists, then you must write something like:
#define HELLO "Hello!"
char* hello = HELLO;
Which is equivalent to:
char* hello = "Hello!";
Also note that a declaration like:
const char* hello = "Hello!";
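Prevents modification of the characters through hello (the pointer itself can still be reassigned), but it does not physically protect the memory, because the const can be cast away:
char *h = (char *) hello;
h[3] = 'n'; /* compiles, but modifying a string literal is undefined behavior */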
The C standard leaves it unspecified whether identical string literals share storage, so I would not rely on either outcome here:
char* hello = "Hello!";
char* hello2 = "Hello!"; // is it the same memory?
It is better to think of literals as unique and constant, and treat them accordingly in the code.
If you do want to modify a copy of a literal, use arrays instead of pointers, so it's guaranteed a different copy of the literal (and not an alias) is used each time:
char hello[] = "Hello!";
Back to your original question, the memory for the literal "heyapple!" has static storage duration, so it remains available for as long as the code that defines it is loaded, which normally means the whole run of the program. Keeping a whole module (a loadable library) in memory because of a literal may have consequences on overall memory use, but that's another concern (you could also force the unloading of the module that defines the literal and get all kinds of strange results).
First, "it IS in the code that I wrote, so it will never be taken away, right?" My answer is yes. I recommend having a look at the structure of ELF, or at the run-time layout of an executable. Where a string literal is stored is implementation dependent; with gcc on ELF targets, string literals go into the .rodata section (.rdata in Windows PE files). As the name implies, that section is read-only. In your code
char *p;
p = string;
the pointer p now points to an address in a read-only segment, so that address is still valid even after the function call ends. But returning a pointer to a local variable is dangerous and may cause hard-to-find bugs:
int *func (void) {
int localVal = 100;
int *ptr = &localVal; /* address of a local (automatic) variable */
return ptr; /* dangling as soon as func returns */
}
int *val = func ();
printf ("%d\n", *val); /* undefined behavior: localVal no longer exists */
After func returns, its stack space is reclaimed, so the address where localVal was stored is no longer guaranteed to hold the original value. It can be overwritten by any operation that follows the call.
Back to your question title:
String literals have static storage duration.
As for "And about constants like 1, 2.10, 'a'?"
My answer is no, you can't get the address of an integer literal using &1. You may be confused by the name 'integer constant', but 1, 2.10 and 'a' are not lvalues! They do not identify a memory location, so they have no storage duration; only a variable that contains their value has one.
As for compound literals, well, I am not sure about this.

Advantages and disadvantages of using strdup on a string literal

I want to be clear about all the advantages/disadvantages of the following code:
{
char *str1 = strdup("some string");
char *str2 = "some string";
free(str1);
}
str1:
You can modify the contents of the string
str2:
You don't have to use free()
Faster
Any other differences?
Use neither if you can and avoid it by one of the following
static char const str3[] = { "some string" };
char str4[] = { "some string" };
str3 if you never plan to modify it and str4 if you do.
str3 ensures that no other function in your program can modify your string (string literals may be shared and mutable). str4 allocates a constant-sized array on the stack, so allocation and deallocation come with no overhead. The system just has to copy your data.
Using the original string - whether it's a literal in the source, part of a memory-mapped file, or even an allocated string "owned" by another part of your program - has the advantage of saving memory, and possibly eliminating ugly error conditions you'd otherwise have to handle if you performed an allocation (which could fail). The disadvantage, of course, is that you have to keep track of the fact that this string is not "owned" by the code currently using it, and thus that it cannot be modified/freed. Sometimes this means you need a flag in a structure to indicate whether a string it uses was allocated for the structure or not. With smaller programs, it might just mean you have to manually follow the logic of string ownership through several functions and make sure it's correct.
By the way, if the string is going to be used by a structure, one nice way to get around having to keep a flag marking whether it was allocated for the structure or not is to allocate space for the structure and the string (if needed) with a single call to malloc. Then, freeing the structure always just works, regardless of whether the string was allocated for the structure or assigned from a string literal or other source.
strdup is not in C89 or C99, so it is not ANSI C and not portable.
The str2 version is portable, and str2 is implicitly const.
