How much memory is reserved when i declare a string? - c

What exactly happens, in terms of memory, when i declare something like:
char arr[4];
How many bytes are reserved for arr?
How is null string accommodated when I 'strcpy' a string of length 4 in arr?
I was writing a socket program, and when I tried to suffix NULL at arr[4] (i.e. the 5th memory location), I ended up replacing the values of some other variables of the program (overflow) and got into a big time mess.
Any descriptions of how compilers (gcc is what I used) manage memory?

sizeof(arr) bytes are saved* (plus any padding the compiler wants to put around it, though that isn't for the array per se). On an implementation with a stack, this just means moving the stack pointer sizeof(arr) bytes down. (That's where the storage comes from. This is also why automatic allocation is fast.)
'\0' isn't accommodated. If you copy "abcd" into it, you get a buffer overrun, because that takes up 5 bytes total, but you only have 4. You enter undefined behavior land, and anything could happen.
In practice you'll corrupt the stack and crash sooner or later, or experience what you did and overwrite nearby variables (because they too are allocated just like the array was.) But nobody can say for certain what happens, because it's undefined.
* Which is sizeof(char) * 4. sizeof(char) is always 1, so 4 bytes.

What exactly happens, in terms of
memory, when i declare something like:
char arr[4];
4 * sizeof(char) bytes of stack memory is reserved for the string.
How is null string accommodated when I
'strcpy' a string of length 4 in arr?
You can not. You can only have 3 characters, 4th one (i.e. arr[3]) should be '\0' character for a proper string.
when I tried to suffix NULL at arr[4]
The behavior will be undefined as you are accessing a invalid memory location. In the best case, your program will crash immediately, but it might corrupt the stack and crash at a later point of time also.

In C, what you ask for is--usually--exactly what you get. char arr[4] is exactly 4 bytes.
But anything in quotes has a 'hidden' null added at the end, so char arr[] = "oops"; reserves 5 bytes.
Thus, if you do this:
char arr[4];
strcpy(arr, "oops");
...you will copy 5 bytes (o o p s \0) when you've only reserved space for 4. Whatever happens next is unpredictable and often catastrophic.

When you define a variable like char arr[4], it reserves exactly 4 bytes for that variable. As you've found, writing beyond that point causes what the standard calls "undefined behavior" -- a euphemism for "you screwed up -- don't do that."
The memory management of something like this is pretty simple: if it's a global, it gets allocated in a global memory space. If it's a local, it gets allocated on the stack by subtracting an appropriate amount from the stack pointer. When you return, the stack pointer is restored, so they cease to exist (and when you call another function, will normally get overwritten by parameters and locals for that function).

When you make a declaration like char arr[4];, the compiler allocates as many bytes as you asked for, namely four. The compiler might allocate extra in order to accommodate efficient memory accesses, but as a rule you get exactly what you asked for.
If you then declare another variable in the same function, that variable will generally follow arr in memory, unless the compiler makes certain optimizations again. For that reason, if you try to write to arr but write more characters than were actually allocated for arr, then you can overwrite other variables on the stack.
This is not really a function of gcc. All C compilers work essentially the same way.

Related

strcpy working no matter the malloc size?

I'm currently learning C programming and since I'm a python programmer, I'm not entirely sure about the inner workings of C. I just stumbled upon a really weird thing.
void test_realloc(){
// So this is the original place allocated for my string
char * curr_token = malloc(2*sizeof(char));
// This is really weird because I only allocated 2x char size in bytes
strcpy(curr_token, "Davi");
curr_token[4] = 'd';
// I guess is somehow overwrote data outside the allocated memory?
// I was hoping this would result in an exception ( I guess not? )
printf("Current token > %s\n", curr_token);
// Looks like it's still printable, wtf???
char *new_token = realloc(curr_token, 6);
curr_token = new_token;
printf("Current token > %s\n", curr_token);
}
int main(){
test_realloc();
return 0;
}
So the question is: how come I'm able to write more chars into a string than is its allocated size? I know I'm supposed to handle mallocated memory myself but does it mean there is no indication that something is wrong when I write outside the designated memory?
What I was trying to accomplish
Allocate a 4 char ( + null char ) string where I would write 4 chars of my name
Reallocate memory to acomodate the last character of my name
know I'm supposed to handle mallocated memory myself but does it mean there is no indication that something is wrong when I write outside the designated memory?
Welcome to C programming :). In general, this is correct: you can do something wrong and receive no immediate feedback that was the case. In some cases, indeed, you can do something wrong and never see a problem at runtime. In other cases, however, you'll see crashes or other behaviour that doesn't make sense to you.
The key term is undefined behavior. This is a concept that you should become familiar with if you continue programming in C. It means just like it sounds: if your program violates certain rules, the behaviour is undefined - it might do what you want, it might crash, it might do something different. Even worse, it might do what you want most of the time, but just occasionally do something different.
It is this mechanism which allows C programs to be fast - since they don't at runtime do a lot of the checks that you may be used to from Python - but it also makes C dangerous. It's easy to write incorrect code and be unaware of it; then later make a subtle change elsewhere, or use a different compiler or operating system, and the code will no longer function as you wanted. In some cases this can lead to security vulnerabilities, since unwanted behavior may be exploitable.
Suppose that you have an array as shown below.
int arr[5] = {6,7,8,9,10};
From the basics of arrays, name of the array is a pointer pointing to the base element of the array. Here, arr is the name of the array, which is a pointer, pointing to the base element, which is 6. Hence,*arr, literally, *(arr+0) gives you 6 as the output and *(arr+1) gives you 7 and so on.
Here, size of the array is 5 integer elements. Now, try accessing the 10th element, though the size of the array is 5 integers. arr[10]. This is not going to give you an error, rather gives you some garbage value. As arr is just a pointer, the dereference is done as arr+0,arr+1,arr+2and so on. In the same manner, you can access arr+10 also using the base array pointer.
Now, try understanding your context with this example. Though you have allocated memory only for 2 bytes for character, you can access memory beyond the two bytes allocated using the pointer. Hence, it is not throwing you an error. On the other hand, you are able to predict the output on your machine. But it is not guaranteed that you can predict the output on another machine (May be the memory you are allocating on your machine is filled with zeros and may be those particular memory locations are being used for the first time ever!). In the statement,
char *new_token = realloc(curr_token, 6); note that you are reallocating the memory for 6 bytes of data pointed by curr_token pointer to the new_tokenpointer. Now, the initial size of new_token will be 6 bytes.
Usually malloc is implemented such a way that it allocates chunks of memory aligned to paragraph (fundamental alignment) that is equal to 16 bytes.
So when you request to allocate for example 2 bytes malloc actually allocates 16 bytes. This allows to use the same chunk of memory when realloc is called.
According to the C Standard (7.22.3 Memory management functions)
...The pointer returned if the allocation succeeds is suitably aligned so
that it may be assigned to a pointer to any type of object
with a fundamental alignment requirement and then used to access such an
object or an array of such objects in the space allocated
(until the space is explicitly deallocated).
Nevertheless you should not rely on such behavior because it is not normative and as result is considered as undefined behavior.
No automatic bounds checking is performed in C.
The program behaviour is unpredictable.
If you go writing in the memory reserved for another process, you will end with a Segmentation fault, otherwise you will only corrupt data, ecc...

Char and strcpy in C

I came across a part of question in which, I am getting an output, but I need a explanation why it is true and does work?
char arr[4];
strcpy(arr,"This is a link");
printf("%s",arr);
When I compile and execute, I get the following output.
Output:
This is a link
The short answer why it worked (that time) is -- you got lucky. Writing beyond the end of an array is undefined behavior. Where undefined behavior is just that, undefined, it could just a easily cause a segmentation fault as it did produce output. (though generally, stack corruption is the result)
When handling character arrays in C, you are responsible to insure you have allocated sufficient storage. When you intend to use the array as a character string, you also must allocate sufficient storage for each character +1 for the nul-terminating character at the end (which is the very definition of a nul-terminated string in C).
Why did it work? Generally, when you request say char arr[4]; the compiler is only guaranteeing that you have 4-bytes allocated for arr. However, depending on the compiler, the alignment, etc. the compiler may actually allocate whatever it uses as a minimum allocation unit to arr. Meaning that while you have only requested 4-bytes and are only guaranteed to have 4-usable-bytes, the compiler may have actually set aside 8, 16, 32, 64, or 128, etc-bytes.
Or, again, you were just lucky that arr was the last allocation requested and nothing yet has requested or written to the memory address starting at byte-5 following arr in memory.
The point being, you requested 4-bytes and are only guaranteed to have 4-bytes available. Yes it may work in that one printf before anything else takes place in your code, but your code is wholly unreliable and you are playing Russian-Roulette with stack corruption (if it has not already taken place).
In C, the responsibility falls to you to insure your code, storage and memory use is all well-defined and that you do not wander off into the realm of undefined, because if you do, all bets are off, and your code isn't worth the bytes it is stored in.
How could you make your code well-defined? Appropriately limit and validate each required step in your code. For your snippet, you could use strncpy instead of strcpy and then affirmatively nul-terminate arr before calling printf, e.g.
char arr[4] = ""; /* initialize all values */
strncpy(arr,"This is a link", sizeof arr); /* limit copy to bytes available */
arr[sizeof arr - 1] = 0; /* affirmatively nul-terminate */
printf ("%s\n",arr);
Now, you can rely on the contents of arr throughout the remainder of your code.
Your code has some memory issues (buffer overrun) . The function strcpy copies bytes until the null character. The function printf prints until the null character.
There is no guarantee on the behavior of this piece of code.
It's just like: you told me "I'll pick you up at 5:00 p.m." and when you came I would be there(guarantee). But I can't guarantee whether I had grabbed you a cup of coffee or not, because you didn't told me you want one. Maybe I'm very nice and bought two cups of coffee, or maybe I'm a cheapskate and just bought one for myself.
It may work. It may not. It may fail immediately and obviously. It may fail at some arbitrary future time and in subtle ways that will drive you insane.
That is the often-insidious nature of undefined behaviour. Don't do it.
If it works at all, it's totally by accident and in no way guaranteed. It's possible that you're overwriting stuff on the stack or in other memory (depending on the implementation and how/where the actual variable str is defined(a)) but that the memory being overwritten is not used after that point (given the simple nature of the code).
That possibility of it working accidentally in no way makes it a good idea.
For the language lawyers among us, section J.2 (instances of undefined behaviour) of C11 clearly states:
An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]).
That informative section references 6.5.6, which is normative, and which states when discussing pointer/integer addition (of which a[b] is an example):
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
(a) For example, on my system, declaring the variable inside main causes the program to crash because the buffer overflow trashes the return address on the stack.
However, if I put the declaration at file level (outside of main), it seems to run just fine, printing the message then exiting the program.
But I assure you that's only because the memory you've trashed is not important for the continuation of the program in this case. It will almost certainly be important in anything more substantial than this example.
your code will always work as long as the printf is placed just after strcpy. But it is wrong coding
Try following and it won't work
int j;
char arr[4];
int i;
strcpy(arr,"This is a link");
i=0;
j=0;
printf("%s",arr);
To understand why it is so you must understand the idea of stack. All local variables are allocated on stack. Hence in your code, program control has allocated 4 bytes for "arr" and when you copy a string which is larger than 4 bytes then you are overwriting/corrupting some other memory. But as you accessed "arr" just after strcpy hence the area you have overwritten which may belong to some other variables still not updated by program that's why your printf works fine. But as I suggested in example code where other variables are updated which fall into the memory region you have overwritten, you won't get correct (? or more appropriate is desired) output
Your code is working also because stack grows downwards if it would have been other way then also you had not get desired output

Having issues with understanding Array Size

#include<stdio.h>
int main ()
{char c[5];
scanf ("%s",&c);
printf("%s",c);}
So I have declared the array size to be 5.
But lets say when I type Elephant which is an 8 lettered word it still gets printed.Can someone explain why and also suggest what I should do so that the computer takes/considers my input only upto 5 characters.
This is non-deterministic behavior. The array is 5 characters in memory, but scanf is not safe. It is putting all 8 characters into sequential memory starting at c[0] although 3 of the characters are technically not part of the array c. The last three characters may end up getting changed by some other application or function because they are not owned by the array c.
If you use the proper scan_s function it will throw an error when you try to do this. You should always use the "safe" functions for this.
Any array like this is actually a pointer to the first element of the array and the [5] indicates a 5 memory location offset from the first memory location defined by "c". You can freely set the number inside [] to any offset although it may go outside the bounds of what you allocated to c.
In C, arrays have no bound-check. So basically, if you declare a 5-element array, you could store 100 elements in it and, if you are lucky (here, by lucky I mean to not overwrite some other, important things, that affect the program execution), it will work. But it is dangerous and wrong. It's your responsibility to make sure this will not happen.
the variable c is allocated into the stack, when allocating space into the stack, the compiler may allocate more than what you're function really need so, that why when you write 8 character into the stack of the function main, you don't receive a segmentation fault.
you should change you're code and use :
fgets(c, sizeof c, stdin);

Why do I need to allocate memory?

#include<stdio.h>
#include<stdlib.h>
void main()
{
char *arr;
arr=(char *)malloc(sizeof (char)*4);
scanf("%s",arr);
printf("%s",arr);
}
In the above program, do I really need to allocate the arr?
It is giving me the result even without using the malloc.
My second doubt is ' I am expecting an error in 9th line because I think it must be
printf("%s",*arr);
or something.
do I really need to allocate the arr?
Yes, otherwise you're dereferencing an uninitialised pointer (i.e. writing to a random chunk of memory), which is undefined behaviour.
do I really need to allocate the arr?
You need to set arr to point to a block of memory you own, either by calling malloc or by setting it to point to another array. Otherwise it points to a random memory address that may or may not be accessible to you.
In C, casting the result of malloc is discouraged1; it's unnecessary, and in some cases can mask an error if you forget to include stdlib.h or otherwise don't have a prototype for malloc in scope.
I usually recommend malloc calls be written as
T *ptr = malloc(N * sizeof *ptr);
where T is whatever type you're using, and N is the number of elements of that type you want to allocate. sizeof *ptr is equivalent to sizeof (T), so if you ever change T, you won't need to duplicate that change in the malloc call itself. Just one less maintenance headache.
It is giving me the result even without using the malloc
Because you don't explicitly initialize it in the declaration, the initial value of arr is indeterminate2; it contains a random bit string that may or may not correspond to a valid, writable address. The behavior on attempting to read or write through an invalid pointer is undefined, meaning the compiler isn't obligated to warn you that you're doing something dangerous. On of the possible outcomes of undefined behavior is that your code appears to work as intended. In this case, it looks like you're accessing a random segment of memory that just happens to be writable and doesn't contain anything important.
My second doubt is ' I am expecting an error in 9th line because I think it must be printf("%s",*arr); or something.
The %s conversion specifier tells printf that the corresponding argument is of type char *, so printf("%s", arr); is correct. If you had used the %c conversion specifier, then yes, you would need to dereference arr with either the * operator or a subscript, such as printf("%c", *arr); or printf("%c", arr[i]);.
Also, unless your compiler documentation explicitly lists it as a valid signature, you should not define main as void main(); either use int main(void) or int main(int argc, char **argv) instead.
1. The cast is required in C++, since C++ doesn't allow you to assign void * values to other pointer types without an explicit cast
2. This is true for pointers declared at block scope. Pointers declared at file scope (outside of any function) or with the static keyword are implicitly initialized to NULL.
Personally, I think this a very bad example of allocating memory.
A char * will take up, in a modern OS/compiler, at least 4 bytes, and on a 64-bit machine, 8 bytes. So you use four bytes to store the location of the four bytes for your three-character string. Not only that, but malloc will have overheads, that add probably between 16 and 32 bytes to the actual allocated memory. So, we're using something like 20 to 40 bytes to store 4 bytes. That's a 5-10 times more than it actually needs.
The code also casts malloc, which is wrong in C.
And with only four bytes in the buffer, the chances of scanf overflowing is substantial.
Finally, there is no call to free to return the memory to the system.
It would be MUCH better to use:
int len;
char arr[5];
fgets(arr, sizeof(arr), stdin);
len = strlen(arr);
if (arr[len] == '\n') arr[len] = '\0';
This will not overflow the string, and only use 9 bytes of stackspace (not counting any padding...), rather than 4-8 bytes of stackspace and a good deal more on the heap. I added an extra character to the array, so that it allows for the newline. Also added code to remove the newline that fgets adds, as otherwise someone would complain about that, I'm sure.
In the above program, do I really need to allocate the arr?
You bet you do.
It is giving me the result even without using the malloc.
Sure, that's entirely possible... arr is a pointer. It points to a memory location. Before you do anything with it, it's uninitialized... so it's pointing to some random memory location. The key here is wherever it's pointing is a place your program is not guaranteed to own. That means you can just do the scanf() and at that random location that arr is pointing to the value will go, but another program can overwrite that data.
When you say malloc(X) you're telling the computer that you need X bytes of memory for your own usage that no one else can touch. Then when arr captures the data it will be there safely for your usage until you call free() (which you forgot to do in your program BTW)
This is a good example of why you should always initialize your pointers to NULL when you create them... it reminds you that you don't own what they're pointing at and you better point them to something valid before using them.
I am expecting an error in 9th line because I think it must be printf("%s",*arr)
Incorrect. scanf() wants an address, which is what arr is pointing to, that's why you don't need to do: scanf("%s", &arr). And printf's "%s" specificier wants a character array (a pointer to a string of characters) which again is what arr is, so no need to deference.

what is the correct way to define a string in C?

what is the correct way to define a string in C?
using:
char string[10];
or
char *string="???";
If I use an array, I can use any pointer to point to it and then manipulate it.
It seems like using the second one will cause trouble because we didn't allocate memory for that. I am taught that array is just a pointer value, I thought these two are the same before.
Until I did something like string* = *XXXX, and realize it didn't work like a pointer.
As #affenlehrer points out, how you "define" a string depends on how you want to use it. In reality, 'defining' a string in C really just amounts to putting it in quotes somewhere in your program. You should probably read more about how memory works and is allocated in C, but if you write:
char *ptr = "???"
What happens is that the compiler will take the string "???" (which is really four bytes of data, three '?'s followed by one zero byte for the NUL terminator). It will insert that at some static place in your program (in something called the .bss segment), and when your program starts running, the value of ptr will be initialized to point to that location in memory. This means you have a pointer to four bytes of memory, and if you try to write outside of those bytes, your program is doing something bad (and probably violating memory safety).
On the other hand, if you write
char string[10];
Then this basically tells the compiler to go allocate some space in your program of 10 bytes, and make the variable 'string' point to it. It depends where you put this: if you put it inside a function, then you will have a stack allocated buffer of 10 bytes. If you manipulate this buffer inside a function, and then don't do anything with the pointer afterwards, you're all fine. However, if you pass back the address of string -- or use the pointer in any way -- after the function returns, you're in the wrong. This is because, after the function returns, you lose all of the stack allocated variables.
There are even more ways to create strings in C (e.g. using malloc). What is your usecase? Basically you need a place in memory where the data is stored (on the stack, on the heap, static as in your second example) and then a character pointer to the first character of your string. Most string related functions will "see" the end of the string by the trailing '\0', in some other cases (mostly general purpose data related functions) you also have to provide the length of the string.

Resources