What happens with memory when we reassign value to char pointer? - c

I wonder what happens inside the memory when we do something like this
char *s;
s = "Text";
s = "Another Text";
If I'm getting it right, by assigning string to char pointer memory is dynamically allocated. So according to my understanding assignment expression
s = "Text";
equals to
s = (char *) malloc(5); // "Text" + '\0'
strcpy(s, "Text");
Well, this way we can easily free memory by using
free(s);
But... After reassigning same pointer to another value, it allocates new memory segment to store that value.
s = "Text";
printf("\n(%p) s = \"%s\"", s, s);
s = "Another Text";
printf("\n(%p) s = \"%s\"", s, s);
Output:
(0x400614) s = "Text"
(0x400628) s = "Another Text"
That means that address of old value is not accessible to us any longer and we can't free that any more. Another call to free(s); will probably deallocate only last memory segment used by that pointer.
My question is: If we reassign same char pointer over and over again, does it consume more and more program memory during run-time or that garbage somehow gets automatically freed?
I hope that was enough to demonstrate my problem, couldn't think better example. If something's not clear enough please ask for additional clarification.

Your understanding is wrong. It is just the assignment and it does not allocate any memory. In your example you assign the pointer with the addresses of the string literals. String literals are created compile time and placed in the read only memory
You do now allocate any memory by assigning the pointer

It's not equal to doing a malloc. What's happening is that the string literal is stored in a read only part of memory. And it's not the assignment of a pointer that does the allocation. All string literals in a program are already allocated from start.
It might be worth mentioning that they are not strictly speaking stored in read only memory, but they might be and writing to a string literal is undefined behavior.
You cannot and should not call free on a string literal. Well, you can, but the program will likely crash.

With no optimization, compiler will reserve two distinct memory space for string literals "text1" and "text2".
If assignment lines are very consecutive as in your question and if nothing is done after the first assignment line —assuming compiling with optimization— compiler, most probably, will not allocate any space for the first string literal nor will produce any opcode for the first assignment line.

Related

Assignments to malloced string behaving wierd

Here are the codes, it's C code. And please explain return value of ="string"
char * p = (char*) malloc(sizeof(char) * 100);
p = "hello";
*(p+1) = '1';
printf("%s", p);
free(p);
Notice that p = "hello"; does not copy any string, but just sets the pointer p to become the address of the 6 bytes literal string "hello". To copy a string use strncpy or strcpy (read strncpy(3) ...) but be scared of buffer overflows.
So you do
char * p = malloc(100);
which allocates a memory zone capable of holding 100 bytes. Let's pretend that malloc succeeded (read malloc(3)...) and returned the address 0x123456 for example (often, that concrete address is not reproducible from one run to the next, e.g. because of ASLR).
Then you assign p = "hello"; so you forgot the address 0x123456 (you've got now a memory leak), and you put in p the address of the 6 bytes literal string "hello" (let's imagine it is 0x2468a).
Later the machine executes the code for *(p+1) = '1'; so you are trying to replace the e character (at address 0x2468b) inside literal "hello" by 1. You get a segmentation violation (or some other undefined behavior), since that literal string sits in constant read only memory (e.g. the text segment of your executable, at least on Linux).
A better code might be:
#define MY_BUFFER_LEN 100
char *p = malloc(MY_BUFFER_LEN);
if (!p) { perror("malloc for p"); exit(EXIT_FAILURE); };
strncpy(p, "hello", MY_BUFFER_LEN); /// you could do strcpy(p, "hello")
*(p+1) = '1';
printf("%s", p);
free(p);
and if you are lucky (no memory failure) that would much later output h1llo (the output will happen only when stdout becomes flushed since it is buffered, e.g. by some later call to fflush). So don't forget to call
fflush(NULL);
after the previous code chunk. Read perror(3), fflush(3).
A generic advice is to read documentation of every function that you are using (even printf(3) ...).
Regarding printf and related functions, since stdout is often line buffered, in practice I strongly recommend to end every format control string with \n -e.g. printf("%s\n", p); in your code; when you don't do that (there are some cases where you don't want to...) think twice and perhaps comment your code.
Don't forget to compile with all warnings and debug info (e.g. gcc -Wall -Wextra -g) then learn how to use the debugger (e.g. gdb)
Your code doesn't do what you think it does.
"hello" is behind the scenes a pointer to a static array of six chars, most likely write protected.
When you assign it to p, the pointer that malloc returned is lost, instead p now contains a pointer to that static array of six chars.
The assigment to p+1 may crash, or may not crash, but whatever it does, it is undefined behaviour and will cause trouble.
free (p) tries to free a static array of six chars. That isn't going to work. Again, undefined behaviour, and an immediate crash if you are lucky.
There are problems in nearly each line:
1) In the first line you are assigning to p the address of newly allocated memory. which is OK
2) In the next line you overwrite it with an address of some static string. Which is bad, since the allocated memory is "lost", thus causing memory leak.
3) In the third line you are trying to overwrite something in the static string location, which might be read only, which is bad.
4) In the last line you are trying to free the memory at the string's location, which is memory violation.
You need to understand what pointers are.
In C, there is no variable that can store a string. Instead the strings are stored in arrays of chars. In order to be able to handle such an array, you use a variable (called pointer) that stores the memory address of the first element of the array. Also, strings have a special character (the nul character) the indicate where they end, but that's not relevant here.
In the code you posted, you allocate memory to store 100 chars (this is usually 100 bytes) and get a pointer to the first element of this memory region, called p (for further reading: Do I cast the result of malloc?).
Whenever you use a string literal, some memory gets allocated and it gets stored in that memory region. When you assign it to p, p now points to the first element of the new memory region, that stores the string (so you've lost the memory allocated you malloc -> memory leak). Now, you try to modify the string pointed at by p. This won't work, as the string literals are stored as constant strings. And after that, you try to free the memory of this string, which is not possible. All of these mistakes generate runtime errors and are not very easy to detect and fix. In order to be able to achieve what you want, use a function like strcpy to copy the string from the read-only memory region to your pointer:
strcpy(p, "string");
Strings in C are arrays, and you can't use = to assign values to arrays, only to indexed elements of arrays. For copying string data, use strcpy() or strncpy(). Your code could look like:
char *p = (char*) malloc(sizeof(char) * 100);
strcpy(p, "hello");
*(p+1) = '1';
printf("%s", p);
free(p);
Try that. The difference is that p = "hello" will set the pointer p to point to the constant string "hello", overwriting the pointer to the 100-byte memory block you just allocated. The free(p); call below will fail because you're passing a pointer not returned by an allocation function, even if the system you're using doesn't give a write fault on attempting to modify constant data.
If you ever do need to point a pointer at a constant string (it happens), be sure you declare the pointer as const char *. This is required in C++. I don't know if it's required in newer versions of C, but it's a Real Good Idea in any case.
1) In C it is incorrect cast the return of malloc() (C++, okay to cast)
2) Once allocated memory, assigning a value to p is not done by =.
strcpy(p, "hello"); is better.
3) Once assigned correctly, using strcpy(), *(p+1) = '1' is the same as p[1] = '1', and will result in "h1llo".
By the way, using the line *(p+1) = '1' AFTER using p = "hello" to assign the value hello (instead of strcpy()) will cause problems, as described by #gnasher, #Basile and others.

What is the difference between assigning a string pointer first to allocated memory and directly to a string literal?

So my understanding is that these two block of codes are valid and do the same thing.
1.)
char *ptr = malloc(5);
ptr = "hi";
2.)
char *ptr = "hi";
I would want to know the difference between the two like if there any advantages of one over the other.
The former is a bug, and that code should never have been written.
It overwrites the pointer returned by malloc() with the address of a string literal, dropping the original pointer and leaking memory.
You must use strcpy() or some other memory-copying method to initialize newly allocated heap memory with a string.
The second just assigns the (run-time constant) address of the string literal to the pointer ptr, no characters are copied anywhere.
The first bit is a possible memory leak, the second relies on the implicit const storage class being used, and assigns the memory address of an immutable string to a pointer.
Basically:
char *ptr = malloc(5);//allocates 5 * sizeof *ptr
//then assigns the address where this block starts to ptr
//this:
ptr = "hi";//assigns position of 'h','i', '\0' in read-only mem to ptr
Now, the address you've allocated, that ptr pointed to, still is allocated. The difference is that you have no "handle" on it anymore, because ptr's value changed. There's no pointer pointing to the dynamic memory you allocated using malloc, so it's getting rather tricky to manage the memory... You probably won't be able to free it, and calling free on ptr now will result in undefined behaviour.
If you write:
char *ptr = "hi";
Then you're actually writing:
const char *ptr = "hi";
Which means you can't change the string to which ptr points:
ptr[0] = 'H';//IMBOSSIBRU
Alternatives are:
char string[] = "Hi";//copies Hi\0 to string
//or
char *ptr = malloc(5);
strcpy(ptr, "hi");//requires string.h
The difference between the two snippets above is that the first creates a stack array, the second allocates a block of memory on the heap. Stack memory is easier to manage, faster and just better in almost every way, apart from it being less abundant, and not really usable as a return value...
There is a pool of string literals for every process. Whenever you create a string literal inside your code, the literal is saved in the pool and the string's address (i.e. an address pointing somewhere to the pool) is returned. Therefore, you are creating a memory leak in your first option, because you are overwriting the address you received with malloc.
In the first case
char *ptr = malloc(5);
ptr = "hi";
There is a memory leak and later you are pointing ptr to a string literal "hi" which does require any memory from the heap (that's why there is memory leak).
But if you are allocating the memory and if you are using
strcpy (ptr, "hi");
then if you wish you can modify it
strcpy (ptr, "hello")
with one condition that you allocate sufficient memory before.
But in your case you are assigning a pointer ptr with a string literal, here you will not be able to modify it
ptr = "hello" // not valid. This will give a segmentation fault error
In your second case there is no memory leak and you are making a pointer to point to a string literal and thus it's value cannot be modified as it will be stored in read only data segment.

string manipulations in C

Following are some basic questions that I have with respect to strings in C.
If string literals are stored in read-only data segment and cannot be changed after initialisation, then what is the difference between the following two initialisations.
char *string = "Hello world";
const char *string = "Hello world";
When we dynamically allocate memory for strings, I see the following allocation is capable enough to hold a string of arbitary length.Though this allocation work, I undersand/beleive that it is always good practice to allocate the actual size of actual string rather than the size of data type.Please guide on proper usage of dynamic allocation for strings.
char *str = (char *)malloc(sizeof(char));
scanf("%s",str);
printf("%s\n",str);
1.what is the difference between the following two initialisations.
The difference is the compilation and runtime checking of the error as others already told about this.
char *string = "Hello world";--->stored in read only data segment and
can't be changed,but if you change the value then compiler won't give
any error it only comes at runtime.
const char *string = "Hello world";--->This is also stored in the read
only data segment with a compile time checking as it is declared as
const so if you are changing the value of string then you will get an
error at compile time ,which is far better than a failure at run time.
2.Please guide on proper usage of dynamic allocation for strings.
char *str = (char *)malloc(sizeof(char));
scanf("%s",str);
printf("%s\n",str);
This code may work some time but not always.The problem comes at run-time when you will get a segmentation fault,as you are accessing the area of memory which is not own by your program.You should always very careful in this dynamic memory allocation as it will leads to very dangerous error at run time.
You should always allocate the amount of memory you need correctly.
The most error comes during the use of string.You should always keep in mind that there is a '\0' character present at last of the string and during the allocation its your responsibility to allocate memory for this.
Hope this helps.
what is the difference between the following two initialisations.
String literals have type char* for legacy reasons. It's good practice to only point to them via const char*, because it's not allowed to modify them.
I see the following allocation is capable enough to hold a string of arbitary length.
Wrong. This allocation only allocates memory for one character. If you tried to write more than one byte into string, you'd have a buffer overflow.
The proper way to dynamically allocate memory is simply:
char *string = malloc(your_desired_max_length);
Explicit cast is redundant here and sizeof(char) is 1 by definition.
Also: Remember that the string terminator (0) has to fit into the string too.
In the first case, you're explicitly casting the char* to a const one, meaning you're disallowing changes, at the compiler level, to the characters behind it. In C, it's actually undefined behaviour (at runtime) to try and modify those characters regardless of their const-ness, but a string literal is a char *, not a const char *.
In the second case, I see two problems.
The first is that you should never cast the return value from malloc since it can mask certain errors (especially on systems where pointers and integers are different sizes). Specifically, unless there is an active malloc prototype in place, a compiler may assume that it returns an int rather than the correct void *.
So, if you forget to include stdlib.h, you may experience some funny behaviour that the compiler couldn't warn you about, because you told it with an explicit cast that you knew what you were doing.
C is perfectly capable of implicit casting between the void * returned from malloc and any other pointer type.
The second problem is that it only allocates space for one character, which will be the terminating null for a string.
It would be better written as:
char *string = malloc (max_str_size + 1);
(and don't ever multiply by sizeof(char), that's a waste of time - it's always 1).
The difference between the two declarations is that the compiler will produce an error (which is much preferable to a runtime failure) if an attempt to modify the string literal is made via the const char* declared pointer. The following code:
const char* s = "hello"; /* 's' is a pointer to 'const char'. */
*s = 'a';
results in the VC2010 emitted the following error:
error C2166: l-value specifies const object
An attempt to modify the string literal made via the char* declared pointer won't be detected until runtime (VC2010 emits no error), the behaviour of which is undefined.
When malloc()ing memory for storing of strings you must remember to allocate one extra char for storing the null terminator as all (or nearly all) C string handling functions require the null terminator. For example, to allocate a buffer for storing "hello":
char* p = malloc(6); /* 5 for "hello" and 1 for null terminator. */
sizeof(char) is guaranteed to be 1 so is unrequired and it is not necessary to cast the return value of malloc(). When p is no longer required remember to free() the allocated memory:
free(p);
Difference between the following two initialisations.
first, char *string = "Hello world";
- "Hello world" stored in stack segment as constant string and its address is assigned to pointer'string' variable.
"Hello world" is constant. And you can't do string[5]='g', and doing this will cause a segmentation fault.
Where as 'string' variable itself is not constant. And you can change its binding:
string= "Some other string"; //this is correct, no segmentation fault
const char *string = "Hello world";
Again "Hello world" stored in stack segment as constant string and its address is assigned to 'string' variable.
And string[5]='g', and this cause segmentation fault.
No use of const keyword here!
Now,
char *string = (char *)malloc(sizeof(char));
Above declaration same as first one but this time you are assignment is dynamic from Heap segment (not from stack)
The code:
char *string = (char *)malloc(sizeof(char));
Will not hold a string of arbitrary length. It will allocate a single character and return a pointer to char character. Note that a pointer to a character and a pointer to what you call a string are the same thing.
To allocate space for a string you must do something like this:
char *data="Hello, world";
char *copy=(char*)malloc(strlen(data)+1);
strcpy(copy,data);
You need to tell malloc exactly how many bytes to allocate. The +1 is for the null terminator that needs to go onto the end.
As for literal string being stored in a read-only segment, this is an implementation issue, although is pretty much always the case. Most C compilers are pretty relaxed about const'ing access to these strings, but attempting to modify them is asking for trouble, so you should always declare them const char * to avoid any issues.
That particular allocation may appear to work as there's probably plenty of space in the program heap, but it doesn't. You can verify it by allocating two "arbitrary" strings with the proposed method and memcpy:ing some long enough string to the corresponding addresses. In the best case you see garbage, in the worst case you'll have segmentation fault or assert from malloc or free.

Pointer to C style string isn't allocated

#include <stdio.h>
int main() {
char *t = "hello world";
puts(t);
//printf("%s", t);
t = "goodbye world";
puts(t);
}
The memory for t isn't allocated, so why I don't get segfault when I run it?
t is a pointer, so you are just making t point to another string.
Because string literals are allocated statically in your program memory - you do not need to allocate memory for them explicitly.
Memory is allocated for t; enough memory is allocated for it to hold a pointer (typically, 4 bytes in a 32-bit program, 8 bytes in a 64-bit program).
Further, the initialization for t ensures that the pointer points somewhere:
char *t = "hello world";
String literals are also allocated space, somewhere. Often, that is in the read-only portion of memory, so you should really be using const char *t = "hello world"; and even if you don't use the explicit const, you should not try to modify the string that t points at. But it is the compiler's problem to ensure that t is pointing somewhere valid.
Similarly, after the assignment:
t = "goodbye, Cruel World!";
the variable is pointing at space allocated by the compiler. As long as you don't abuse it (and your code doesn't), this is fine.
What would get you into trouble is something like this:
char *t;
puts(t); // t is uninitialized; undefined behaviour
t = 0; // equivalently, t = NULL;
puts(t); // t contains the null pointer; undefined behaviour
The uninitialized local variable could contain any value; you cannot predict reliably what will happen. On some machines, it may contain a null pointer and cause a crash, but that is not something you can rely on.
A null pointer doesn't point at anything valid, so dereferencing a null pointer leads to undefined behaviour, and very often that undefined behaviour is a crash. (Classically, on DEC VAX machines, you got a zero byte at address zero instead of a crash. That led (in part) to one of Henry Spencer's Ten Commandments "All the world is not a VAX" — and also "Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end.")
So, in your program, memory is allocated for t and t is initialized and assigned to point to (read-only) string constants, so there is no excuse for the program to crash.
t is here a pointer to the first character of an anonymous string, which can be in read-only memory. A good idea is to declare the pointer as pointer to const char :
const char *t = "hello world";
See also here.
All the memory the compiler needs to allocate for t is 4 bytes on a 32-bit system. Remember that it's just a pointer. In the first couple of lines it's pointing to "hello world", but after that you change it so it points to "goodbye world". C will have allocated enough memory for the strings you have defined and passes you the pointer so you can point to them. You don't need to worry about that. Also remember that these string are static and read-only, which means you can't safely say t[4] = 'b';.

Memory allocated in char * var; declaration

In C, declaring a char pointer like this
char* p="Hello";
allocates some memory for a string literal Hello\0. When I do this afterwards
p="FTW";
what happens to the memory allocated to Hello\0? Is the address p points to changed?
There is no dynamic memory allocation in either statement.
Those strings are stored in your executable, loaded in a (likely read-only) section of memory that will live as long as your process does.
The second assignment only changes what p points to. Nothing else happens.
The memory remains occupied by "Hello". It is lost (unless you have other references to it).
The address p is pointing to (the value of p) is changed of course.
In this case, "Hello" is created at compile time and is part of the binary. In most situation "Hello" is stored in read only memory. "FTW" is also part of the binary. Second assignment will only change the pointer.
in addition - "Hello" and "FTW" have static storge duration as Met have pointed out
It creates a string constant that cannot be modified and should be used as it is.
If you try doing
p[0]='m';
It would give segmentation fault since this is not string literal with allocated memory in which you can reassign and read back values.
what if
p = getbuffer();
getbuffer()
{
return buf = malloc(buf, size);
}
how can free this memory before allocating new memory to p! imagine that p should use getbuffer() many times.

Resources