I came across part of a question in which I get output, but I need an explanation of why it works at all:
char arr[4];
strcpy(arr,"This is a link");
printf("%s",arr);
When I compile and execute, I get the following output.
Output:
This is a link
The short answer for why it worked (that time) is -- you got lucky. Writing beyond the end of an array is undefined behavior. Undefined behavior is just that, undefined; it could just as easily cause a segmentation fault as produce output (though generally, stack corruption is the result).
When handling character arrays in C, you are responsible for ensuring you have allocated sufficient storage. When you intend to use the array as a character string, you must also allocate sufficient storage for the characters plus 1 for the nul-terminating character at the end (which is the very definition of a nul-terminated string in C).
Why did it work? Generally, when you request, say, char arr[4];, the compiler only guarantees that you have 4 bytes allocated for arr. However, depending on the compiler, the alignment, etc., the compiler may actually hand arr whatever it uses as a minimum allocation unit. Meaning that while you have only requested 4 bytes and are only guaranteed to have 4 usable bytes, the compiler may have actually set aside 8, 16, 32, 64, or 128, etc., bytes.
Or, again, you were just lucky that arr was the last allocation requested and nothing has yet requested or written to the memory starting at byte 5 following arr.
The point being, you requested 4 bytes and are only guaranteed to have 4 bytes available. Yes, it may work in that one printf before anything else takes place in your code, but your code is wholly unreliable and you are playing Russian roulette with stack corruption (if it has not already taken place).
In C, the responsibility falls to you to ensure your code, storage, and memory use are all well-defined and that you do not wander off into the realm of the undefined, because if you do, all bets are off, and your code isn't worth the bytes it is stored in.
How could you make your code well-defined? Appropriately limit and validate each required step in your code. For your snippet, you could use strncpy instead of strcpy and then affirmatively nul-terminate arr before calling printf, e.g.
char arr[4] = ""; /* initialize all values */
strncpy(arr,"This is a link", sizeof arr); /* limit copy to bytes available */
arr[sizeof arr - 1] = 0; /* affirmatively nul-terminate */
printf ("%s\n",arr);
Now, you can rely on the contents of arr throughout the remainder of your code.
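If snprintf is available (assuming C99 or later), a minimal alternative sketch does the bounded copy and the nul-termination in a single call:

#include <stdio.h>

int main (void) {

    char arr[4];

    /* snprintf writes at most sizeof arr - 1 characters and always
       nul-terminates, so arr is a valid string afterwards */
    snprintf (arr, sizeof arr, "%s", "This is a link");

    printf ("%s\n", arr);   /* prints "Thi" (truncated) */

    return 0;
}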
Your code has a memory issue (a buffer overrun). The function strcpy copies bytes until it reaches the null character. The function printf prints until it reaches the null character.
There is no guarantee on the behavior of this piece of code.
It's just like: you told me "I'll pick you up at 5:00 p.m." and when you came I would be there (guaranteed). But I can't guarantee whether I grabbed you a cup of coffee or not, because you didn't tell me you wanted one. Maybe I'm very nice and bought two cups of coffee, or maybe I'm a cheapskate and just bought one for myself.
It may work. It may not. It may fail immediately and obviously. It may fail at some arbitrary future time and in subtle ways that will drive you insane.
That is the often-insidious nature of undefined behaviour. Don't do it.
If it works at all, it's totally by accident and in no way guaranteed. It's possible that you're overwriting stuff on the stack or in other memory (depending on the implementation and how/where the actual variable arr is defined (a)) but that the memory being overwritten is not used after that point (given the simple nature of the code).
That possibility of it working accidentally in no way makes it a good idea.
For the language lawyers among us, section J.2 (instances of undefined behaviour) of C11 clearly states:
An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]).
That informative section references 6.5.6, which is normative, and which states when discussing pointer/integer addition (of which a[b] is an example):
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
(a) For example, on my system, declaring the variable inside main causes the program to crash because the buffer overflow trashes the return address on the stack.
However, if I put the declaration at file level (outside of main), it seems to run just fine, printing the message then exiting the program.
But I assure you that's only because the memory you've trashed is not important for the continuation of the program in this case. It will almost certainly be important in anything more substantial than this example.
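A minimal sketch of the footnote's experiment (intentionally broken code; both placements invoke undefined behavior, only the observed symptom differs):

#include <stdio.h>
#include <string.h>

/* char arr[4];                 file scope: seemed to run "just fine" */

int main (void) {

    char arr[4];             /* block scope: crashed on the author's system */

    strcpy (arr, "This is a link");   /* undefined behavior either way */
    printf ("%s\n", arr);

    return 0;
}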
Your code may appear to work as long as the printf is placed just after the strcpy, but it is still wrong coding.
Try the following and it will likely fail:
int j;
char arr[4];
int i;

strcpy(arr, "This is a link");  /* overflows arr into neighboring memory */
i = 0;                          /* these writes may land on the overflowed bytes */
j = 0;
printf("%s", arr);
To understand why this is so, you must understand the idea of the stack. All local variables are allocated on the stack. Hence, in your code, 4 bytes are allocated for arr, and when you copy a string that is larger than 4 bytes, you overwrite/corrupt some other memory. Because you accessed arr just after strcpy, the area you overwrote (which may belong to other variables) had not yet been updated by the program; that's why your printf worked fine. But in the example code above, where other variables falling in the overwritten memory region are updated, you won't get correct (or, more appropriately, desired) output.
Your code also happens to work because the stack grows downwards on your platform; if it grew the other way, you would not have gotten the desired output either.
I know that declaring a char[] variable in a while loop is scoped, having seen this post: Redeclaring variables in C.
Going through a tutorial on creating a simple web server in C, I'm finding that I have to manually clear memory assigned to responseData in the example below, otherwise the contents of index.html are just continuously appended to the response and the response contains duplicated contents from index.html:
while (1)
{
    int clientSocket = accept(serverSocket, NULL, NULL);

    char httpResponse[8000] = "HTTP/1.1 200 OK\r\n\n";

    FILE *htmlData = fopen("index.html", "r");
    char line[100];
    char responseData[8000];

    while(fgets(line, 100, htmlData) != 0)
    {
        strcat(responseData, line);
    }

    strcat(httpResponse, responseData);

    send(clientSocket, httpResponse, sizeof(httpResponse), 0);
    close(clientSocket);
}
Corrected by:
while (1)
{
    ...
    char responseData[8000];
    memset(responseData, 0, strlen(responseData));
    ...
}
Coming from JavaScript, this was surprising. Why would I want to declare a variable and have access to the memory contents of a variable declared in a different scope with the same name? Why wouldn't C just reset that memory behind the scenes?
Also... Why is it that variables of the same name declared in different scopes get assigned the same memory addresses?
According to this question (Variable declared interchangebly has the same pattern of memory address), that ISN'T the case. However, I'm finding that this is occurring pretty reliably.
Not completely correct. You don't need to clear the whole responseData array - clearing its first byte is just enough:
responseData[0] = 0;
As Gabriel Pellegrino notes in the comment, a more idiomatic expression is
responseData[0] = '\0';
It explicitly defines a character via its code point of zero value, while the former uses an int constant zero. In both cases the right-hand argument has type int, which is implicitly converted (truncated) to char for the assignment. (Paragraph fixed thanks to pmg's comment.)
You could know that from the strcat documentation: the function appends its second argument string to the first one. If you need the very first chunk to be stored at the start of the buffer, you want to append it to an empty string, so you need to ensure the string in the buffer is empty; that is, it consists of the terminating NUL character only. memset-ting the whole array is overkill, hence a waste of time.
Additionally, using strlen on the array is asking for trouble. You can't know the actual contents of the memory block allocated for the array. If it was not used yet, or was overwritten with some other data since your last use, it may contain no NUL character at all. Then strlen will run off the end of the array, causing undefined behavior. And even if it returns successfully, it may give you a string length bigger than the size of the array, in which case memset will run off the end of the array, possibly overwriting some vital data!
Use sizeof whenever you memset an array!
memset(responseData, 0, sizeof(responseData));
EDIT
In the above I tried to explain how to fix the issue with your code, but I didn't answer your questions. Here they are:
Why do variables (...) in different scopes get assigned the same memory addresses?
In terms of execution, each iteration of the while(1) { ... } loop indeed creates a new scope. However, each scope terminates before the new one is created, so the compiler reserves an appropriate block of memory on the stack and the loop re-uses it in every iteration. That also simplifies the compiled code: every iteration is executed by exactly the same code, which simply jumps from the end back to the beginning. All instructions within the loop that access local variables use exactly the same addressing (relative to the stack) in each iteration. So, each variable in the next iteration has precisely the same location in memory as in all previous iterations.
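A minimal sketch of that reuse (the identical addresses are typical compiler behavior, in no way guaranteed by the standard):

#include <stdio.h>

int main (void) {

    for (int i = 0; i < 3; i++) {
        char buf[8];
        /* typically prints the same address every iteration, because the
           slot is reserved once on function entry and re-used */
        printf ("iteration %d: buf at %p\n", i, (void *)buf);
    }

    return 0;
}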
I'm finding that I have to manually clear memory
Yes, automatic variables, allocated on the stack, are not initialized in C by default. We always need to explicitly assign an initial value before we use them – otherwise the value is indeterminate and may be invalid (for example, a floating-point variable can appear to be not-a-number, a character array may appear not to be terminated, an enum variable may have a value outside the enum's definition, a pointer variable may not point at a valid, accessible location, etc.).
otherwise the contents (...) are just continuously appended
This one was answered above.
Coming from JavaScript, this was surprising
Yes, JavaScript apparently creates new variables at the new scope, hence each time you get a brand new array – and it is empty. In C you just get the same area of a previously allocated memory for an automatic variable, and it's your responsibility to initialize it.
Additionally, consider two consecutive loops:
void test()
{
int i;
for (i=0; i<5; i++) {
char buf1[10];
sprintf(buf1, "%d", i);
}
for (i=0; i<1; i++) {
char buf2[10];
printf("%s\n", buf2);
}
}
The first one prints a single-digit, character representation of five numbers into the character array, overwriting it each time - hence the last value of buf1[] (as a string) is "4".
What output do you expect from the second loop? Generally speaking, we can't know what buf2[] will contain, and printf-ing it causes UB. However we may suppose the same set of variables (namely a single 10-items character array) from both disjoint scopes will get allocated the same way in the same part of a stack. If this is the case, we'll get a digit 4 as an output from a (formally uninitialized) array.
This result depends on the compiler construction and should be considered a coincidence. Do not rely on it as this is UB!
Why wouldn't C just reset that memory behind the scenes?
Because it's not told to. The language was created to compile to efficient, compact code. It does as little 'behind the scenes' as possible. Among the things it does not do is initialize automatic variables, unless told to. Which means you need to add an explicit initializer to a local variable declaration, or add an initializing instruction (e.g. an assignment) before the first use. (This does not apply to global, module-scope variables; those are initialized to zeros by default.)
In higher-level languages some or all variables are initialized on creation, but not in C. That's a feature of the language, and we must live with it – or just not use this language.
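A minimal sketch of those initialization options (the names are illustrative only):

#include <stdio.h>

int global_counter;            /* static storage: zero-initialized by default */

int main (void) {

    int a = 0;                 /* explicit initializer in the declaration */
    int b;                     /* indeterminate until assigned */
    b = 42;                    /* initializing assignment before first use */
    char buf[8] = "";          /* empty string: buf[0] == '\0' */

    printf ("%d %d %d '%s'\n", global_counter, a, b, buf);

    return 0;
}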
With this line:
char responseData[8000];
You are saying to your compiler: hey big C, give me an 8000-byte chunk and name it responseData.
At runtime, if you don't specify otherwise, no one will ever clean that chunk or hand you a "brand-new" one. That means the 8000-byte chunk you get in each execution can hold any possible permutation of bits in those 8000 bytes. Something that can happen is that in every execution you get the same memory region, and thus the same bits in the 8000 bytes big C gave you the first time. So, if you don't clean it, you get the impression that you're using the same variable, but you're not! You're just using the same (never-cleaned) memory region.
I'd add that it's part of the programmer's responsibility to clean, if you need to, the memory you're using, whether it was allocated dynamically or statically.
Why would I want to declare a variable and have access to the memory contents of a variable declared in a different scope with the same name? Why wouldn't C just reset that memory behind the scenes?
Objects with auto storage duration (i.e., block-scope variables) are not automatically initialized - their initial contents are indeterminate. Remember that C is a product of the early 1970s, and errs on the side of runtime speed over convenience. The C philosophy is that the programmer is in the best position to know whether something should be initialized to a known value or not, and is smart enough to do it themselves if needed.
While you're logically creating and destroying a new instance of responseData on each loop iteration, it turns out the same memory location is being reused each time through. We like to think that space is allocated for each block-scope object as we enter the block and released as we leave it, but in practice that's (usually) not the case - space for all block-scope objects within a function is allocated on function entry, and released on function exit [1].
Different objects in different scopes may map to the same memory behind the scenes. Consider something like
void bletch( void )
{
    if ( some_condition )
    {
        int foo = some_function();
        printf( "%d\n", foo );
    }
    else
    {
        int bar = some_other_function();
        printf( "%d\n", bar );
    }
}
It's impossible for both foo and bar to exist at the same time, so there's no reason to allocate separate space for both - the compiler will (usually) allocate space for one int object at function entry, and that space gets used for either foo or bar depending on which branch is taken.
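A minimal sketch of that slot sharing (the coinciding addresses are typical compiler behavior, not a guarantee):

#include <stdio.h>

int main (void) {

    {
        int foo = 1;
        printf ("foo at %p\n", (void *)&foo);
    }
    {
        int bar = 2;
        /* foo's lifetime has ended, so the compiler is free to give
           bar the same stack slot, and it frequently does */
        printf ("bar at %p\n", (void *)&bar);
    }

    return 0;
}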
So, what happens with responseData is that space for one 8000-character array is allocated on function entry, and that same space gets used for each iteration of the loop. That's why you need to clear it out on each iteration, either with a memset call or with an initializer like
char responseData[8000] = {0};
[1] As M.M points out in a comment, this isn't true for variable-length arrays (and potentially other variably modified types) - space for those is set aside as needed, although where that space is taken from isn't specified by the language definition. For all other types, though, the usual practice is to allocate all necessary space on function entry.
My understanding of fscanf:
grabs a line from a file and based on format, stores it to a string.
That being said, there are three (seemingly different) ways to pass "strings" (arrays of chars) around.
Some assumptions:
1. fp is a valid FILE pointer.
2. The file has 1 line in it that reads "Something"
A pointer with allocated memory
char* temp = malloc(sizeof(char) * 1); // points to some small part in mem.
int resp = fscanf(fp,"%s", temp);
printf("Trying to print: %s\n",temp); // prints "Something" (that's what's in the file)
An array with predefined length (it's different from the pointer!)
char temp[100]; // this buffer MUST be big enough, or we get segmentation fault
int resp = fscanf(fp,"%s", temp);
printf("Trying to print: %s\n",temp); // prints "Something" (that's what's in the file)
A null pointer
char* temp; // null pointer
int resp = fscanf(fp,"%s", temp);
printf("Trying to print: %s\n",temp); // Crashes, segmentation fault
So a few questions have arisen!
How can a pointer with malloc of 1 contain longer texts?
Since the pointer's content doesn't seem to matter, why does a null pointer crash? I would expect the allocated pointer to crash as well, since it points to a small piece of memory.
Why does the pointer work, but an array (char temp[1];) crash?
Edit:
I'm well aware that you need to pass a big enough buffer to contain the data from the line, I was wondering why it was still working and not crashing in other situations.
My understanding of fscanf: grabs a line from a file and based on format, stores it to a string.
No, that contains some serious and important misconceptions. fscanf() reads from a file as directed by the specified format, so as to assign values to some or all of the objects pointed-to by its third and subsequent arguments. It does not necessarily read a whole line, but on the other hand, it may read more than one.
In your particular usage,
int resp = fscanf(fp,"%s", temp);
, it attempts to skip any leading whitespace, including but not limited to empty and blank lines, then read characters into the pointed-to character array, up to the first whitespace character or the end of the file. Under no circumstance will it consume the line terminator of the line from which it populates the array contents, but it will not even get that far if there is other whitespace on the line following at least one non-whitespace character (though that is not the case in the particular sample input you describe).
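A minimal sketch of that behavior (the input line is hypothetical):

#include <stdio.h>

int main (void) {

    char a[32], b[32];

    /* given the input "  Something else\n", the first %s skips the
       leading whitespace and reads "Something", the second reads "else";
       the trailing newline is left unconsumed */
    if (fscanf (stdin, "%31s %31s", a, b) == 2)
        printf ("[%s] [%s]\n", a, b);

    return 0;
}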
That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).
Strings are not an actual data type in C. Arrays of chars are, but such arrays are not "strings" in the C sense unless they contain at least one null character. Furthermore, in that case, C string functions for the most part operate only on the portions of such arrays up to and including the first null, so it is those portions that are best characterized as "strings".
There is more than one way to obtain storage for character sequences that can be considered strings, but there is only one way to pass them around: by means of a pointer to their first character. Whether you obtain storage by declaring a character array, by a string literal, or by allocating memory for it, the contents are accessed only via pointers. Even when you declare a char array and access elements by applying the index operator, [], to the name of the array variable, you are actually still using a pointer to access the contents.
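A minimal sketch of that single mechanism (the helper show and the sizes are illustrative only):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void show (const char *s)   /* every "string" arrives the same way: */
{                                  /* as a pointer to its first character  */
    printf ("%s\n", s);
}

int main (void) {

    char arr[] = "array";          /* storage from a declared array */
    const char *lit = "literal";   /* storage from a string literal */
    char *dyn = malloc (5);        /* storage from the heap */

    if (dyn != NULL) {
        strcpy (dyn, "heap");
        show (arr);
        show (lit);
        show (dyn);
        free (dyn);
    }

    return 0;
}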
Why can a pointer with malloc of 1 contain longer texts?
A pointer does not contain anything but itself. It is the space it points to that contains anything else, such as text. If you allocate only one byte, then the allocated space can contain only one byte. If you overrun that one byte by attempting to write a longer character sequence where the pointer points, then you invoke undefined behavior. In particular, C does not guarantee that an error will be generated, or that the program will fail to behave as you expect, but all manner of havoc can ensue, without limit.
Since the pointer content doesn't seem to matter, why does a null pointer crash? I would expect the allocated pointer to crash as well, since it points to a small piece of memory.
Attempting to dereference an invalid pointer, including, but not limited to a null pointer, also produces undefined behavior. A crash is well within the realm of possible behaviors. C does not guarantee a crash in that case, but that's reliably provided by some implementations.
Why does the pointer work, but an array (char temp[1];) crash?
You do not demonstrate your 1-character array alternative, but again, overrunning the bounds of the object -- in this case an array -- produces undefined behavior. It is undefined so it is not justified to suppose that the behavior would be the same as for overrunning the bounds of an allocated object, or even that either one of those behaviors would be consistent.
That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).
For passing a C-"string" to scanf() & friends there is just one way: Pass it the address of enough valid memory.
If you don't, the code invokes the infamous undefined behaviour, which means anything can happen, from a crash to seemingly running fine.
Why can a pointer with malloc of 1 contain longer texts?
In theory, it can't without causing undefined behavior. In practice, however, when you allocate a single byte, the allocator gives you a small chunk of memory of the smallest size it supports, which is usually sufficient for 8..10 characters without causing a crash. The additional memory serves as a "padding" that prevents a crash (but it is still undefined behavior).
Since the pointer content doesn't seem to matter, why does a null pointer crash, I would expect the allocated pointer to crash as well, since it points to a small piece of memory.
Null pointer, on the other hand, is not sufficient even for an empty string, because you need space for null terminator. Hence, it's a guaranteed UB, which manifests itself as a crash on most platforms.
Why does the pointer work, but an array (char temp[1]) crash?
Because arrays are allocated without any extra "padding" memory after them. Note that a crash is not guaranteed, because the array may be followed by unused bytes of memory, which your string could corrupt without any consequences.
Because null pointers don't point to any allocated memory.
When you request for a small piece of memory, it is allocated from a block of memory called "heap". The heap is always allocated and freed in units of blocks or pages, which will always be a little larger than a few bytes, usually several KBs.
So when you allocate memory with malloc (or C++'s new), you get a piece of memory within such a block in the heap. The actual available space is larger and can (often) extend beyond the amount you requested, so it's practically safe to write (and read) a little more than requested. But theoretically it's UB and could make the program crash.
When you create a null pointer, it points to 0, an invalid address that can't be read from or written to. So it's guaranteed that the program will crash, often by a segmentation fault.
Small arrays may crash more often than malloc'd memory because they aren't allocated from the heap, and may come without any extra space after them, so it's more dangerous to write past their limit. However, they often precede unused (unallocated) memory areas, so sometimes your program may not crash, but end up with corrupted data instead.
I'm currently learning C programming, and since I'm a Python programmer, I'm not entirely sure about the inner workings of C. I just stumbled upon a really weird thing.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void test_realloc(){
    // So this is the original place allocated for my string
    char * curr_token = malloc(2*sizeof(char));
    // This is really weird because I only allocated 2x char size in bytes
    strcpy(curr_token, "Davi");
    curr_token[4] = 'd';
    // I guess it somehow overwrote data outside the allocated memory?
    // I was hoping this would result in an exception ( I guess not? )
    printf("Current token > %s\n", curr_token);
    // Looks like it's still printable, wtf???
    char *new_token = realloc(curr_token, 6);
    curr_token = new_token;
    printf("Current token > %s\n", curr_token);
}

int main(){
    test_realloc();
    return 0;
}
So the question is: how come I'm able to write more chars into a string than its allocated size allows? I know I'm supposed to manage malloc'd memory myself, but does that mean there is no indication at all that something is wrong when I write outside the designated memory?
What I was trying to accomplish:
Allocate a 4-char (+ null char) string where I would write 4 chars of my name
Reallocate memory to accommodate the last character of my name
I know I'm supposed to handle malloc'd memory myself, but does it mean there is no indication that something is wrong when I write outside the designated memory?
Welcome to C programming :). In general, this is correct: you can do something wrong and receive no immediate feedback that this was the case. In some cases, indeed, you can do something wrong and never see a problem at runtime. In other cases, however, you'll see crashes or other behaviour that doesn't make sense to you.
The key term is undefined behavior. This is a concept that you should become familiar with if you continue programming in C. It means just like it sounds: if your program violates certain rules, the behaviour is undefined - it might do what you want, it might crash, it might do something different. Even worse, it might do what you want most of the time, but just occasionally do something different.
It is this mechanism which allows C programs to be fast - since they don't at runtime do a lot of the checks that you may be used to from Python - but it also makes C dangerous. It's easy to write incorrect code and be unaware of it; then later make a subtle change elsewhere, or use a different compiler or operating system, and the code will no longer function as you wanted. In some cases this can lead to security vulnerabilities, since unwanted behavior may be exploitable.
Suppose that you have an array as shown below.
int arr[5] = {6,7,8,9,10};
From the basics of arrays, the name of the array acts as a pointer to the base element of the array. Here, arr is the name of the array, and it points to the base element, whose value is 6. Hence *arr, literally *(arr+0), gives you 6 as the output, *(arr+1) gives you 7, and so on.
Here, the size of the array is 5 integer elements. Now, try accessing the 10th element, even though the size of the array is 5 integers: arr[10]. This is not going to give you an error; rather, it gives you some garbage value. Since arr behaves as a pointer, the dereference is done as arr+0, arr+1, arr+2, and so on. In the same manner, you can access arr+10 using the base array pointer.
Now, try understanding your context with this example. Though you have allocated only 2 bytes for characters, you can access memory beyond the two allocated bytes using the pointer. Hence, it is not throwing an error. On the other hand, you are able to predict the output on your machine, but it is not guaranteed that you can predict the output on another machine (maybe the memory you are allocating on your machine is filled with zeros, and maybe those particular memory locations are being used for the first time ever!). In the statement
char *new_token = realloc(curr_token, 6); note that you are reallocating 6 bytes for the data pointed to by the curr_token pointer and assigning the result to the new_token pointer. Now, the size of new_token's allocation will be 6 bytes.
Usually malloc is implemented in such a way that it allocates chunks of memory aligned to a paragraph (fundamental alignment), which is often 16 bytes.
So when you request, for example, 2 bytes, malloc actually sets aside 16 bytes. This is also what allows the same chunk of memory to be reused when realloc is called.
According to the C Standard (7.22.3 Memory management functions):
...The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
Nevertheless, you should not rely on such behavior, because the over-allocation is not normative; relying on it is undefined behavior.
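A minimal sketch of the reuse described above (whether realloc returns the same address is entirely implementation-specific; do not rely on it):

#include <stdio.h>
#include <stdlib.h>

int main (void) {

    char *p = malloc (2);
    printf ("before realloc: %p\n", (void *)p);

    char *q = realloc (p, 6);
    printf ("after realloc:  %p\n", (void *)q);   /* often identical */

    free (q);

    return 0;
}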
No automatic bounds checking is performed in C.
The program behaviour is unpredictable.
If you write into memory that your process has no access to, you will end up with a segmentation fault; otherwise you will silently corrupt data, etc.
char * a = (char *) malloc(10);
strcpy(a,"string1");
char * x = "string2";
strcat(a,x);
printf("\n%s",a);
Here, I allocated only 10B to a, but still after concatenating a and x (combined size is 16B), C prints the answer without any problem.
But if I do this:
char * a = "string1";
char * x = "string2";
strcat(a,x);
printf("\n%s",a);
Then I get a segfault. Why is this? Why does the first one work despite lower memory allocation? Does strcat reallocate memory for me? If yes, why does the second one not work? Is it because a & x declared that way are unmodifiable string literals?
In your first example, a is allocated in the heap. So when you're concatenating the other string, something in the heap will be overwritten, but there is no write-protection.
In your second example, a points to a region of the memory that contains constants, and is readonly. Hence the seg fault.
The first one doesn't always work; it has already caused an overflow. In the second one, a is a pointer to a constant string which is stored in the data section, in a read-only page.
In the 2nd case, what you have is a pointer to an unmodifiable string literal.
In the 1st case, you are overrunning a heap memory location, and in that case it's undefined; you cannot guarantee that it will work every time.
(Maybe run it in a very large loop; you may see this undefined behavior.)
Your code is writing beyond the buffer that it's permitted, which causes undefined behavior. This can work and it can fail, and worse: it can look like it worked but cause seemingly unrelated failures later. The language allows you to do things like this because you're supposed to know what you're doing, but it's not recommended practice.
In your first case, of having used malloc to allocate buffers, you're actually being helped but not in a manner you should ever rely on. The malloc function allocates at least as much space as you've requested, but in practice it typically rounds up to a multiple of 16... so your malloc(10); probably got a 16 byte buffer. This is implementation specific and it's never a good idea to rely on something like that.
In your second case, it's likely that the memory pointed to by your a (and x) variable(s) is non-writable, which is why you've encountered a segfault.
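A minimal sketch of one safe variant (the buffer size 16 is chosen to fit the concatenated result plus its terminator):

#include <stdio.h>
#include <string.h>

int main (void) {

    /* a writable array initialized from a literal, sized for the result */
    char a[16] = "string1";       /* 7 chars now, room for 8 more + '\0' */
    const char *x = "string2";    /* the literal itself stays read-only */

    strcat (a, x);                /* safe: a has enough space */
    printf ("%s\n", a);           /* prints "string1string2" */

    return 0;
}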
What exactly happens, in terms of memory, when i declare something like:
char arr[4];
How many bytes are reserved for arr?
How is null string accommodated when I 'strcpy' a string of length 4 in arr?
I was writing a socket program, and when I tried to suffix NULL at arr[4] (i.e. the 5th memory location), I ended up overwriting the values of some other variables of the program (overflow) and got into a big-time mess.
Any descriptions of how compilers (gcc is what I used) manage memory?
sizeof(arr) bytes are reserved* (plus any padding the compiler wants to put around it, though that isn't for the array per se). On an implementation with a stack, this just means moving the stack pointer sizeof(arr) bytes down. (That's where the storage comes from. This is also why automatic allocation is fast.)
'\0' isn't accommodated. If you copy "abcd" into it, you get a buffer overrun, because that takes up 5 bytes total, but you only have 4. You enter undefined behavior land, and anything could happen.
In practice you'll corrupt the stack and crash sooner or later, or experience what you did and overwrite nearby variables (because they too are allocated just like the array was.) But nobody can say for certain what happens, because it's undefined.
* Which is sizeof(char) * 4. sizeof(char) is always 1, so 4 bytes.
What exactly happens, in terms of memory, when I declare something like:
char arr[4];
4 * sizeof(char) bytes of stack memory is reserved for the string.
How is null string accommodated when I 'strcpy' a string of length 4 in arr?
You cannot. You can only store 3 characters; the 4th one (i.e. arr[3]) should be the '\0' character for a proper string.
when I tried to suffix NULL at arr[4]
The behavior will be undefined, as you are accessing an invalid memory location. In the best case, your program will crash immediately, but it might also corrupt the stack and crash at a later point in time.
In C, what you ask for is--usually--exactly what you get. char arr[4] is exactly 4 bytes.
But anything in quotes has a 'hidden' null added at the end, so char arr[] = "oops"; reserves 5 bytes.
Thus, if you do this:
char arr[4];
strcpy(arr, "oops");
...you will copy 5 bytes (o o p s \0) when you've only reserved space for 4. Whatever happens next is unpredictable and often catastrophic.
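A minimal sketch of the safe alternative (letting the compiler size the array, which includes the hidden '\0'):

#include <stdio.h>
#include <string.h>

int main (void) {

    char src[] = "oops";           /* compiler sizes this to 5 bytes */
    printf ("%zu\n", sizeof src);  /* prints 5 */

    char copy[sizeof src];
    strcpy (copy, src);            /* safe: destination fits all 5 bytes */
    printf ("%s\n", copy);

    return 0;
}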
When you define a variable like char arr[4], it reserves exactly 4 bytes for that variable. As you've found, writing beyond that point causes what the standard calls "undefined behavior" -- a euphemism for "you screwed up -- don't do that."
The memory management of something like this is pretty simple: if it's a global, it gets allocated in a global memory space. If it's a local, it gets allocated on the stack by subtracting an appropriate amount from the stack pointer. When you return, the stack pointer is restored, so they cease to exist (and when you call another function, will normally get overwritten by parameters and locals for that function).
When you make a declaration like char arr[4];, the compiler allocates as many bytes as you asked for, namely four. The compiler might allocate extra in order to accommodate efficient memory accesses, but as a rule you get exactly what you asked for.
If you then declare another variable in the same function, that variable will generally follow arr in memory, unless the compiler makes certain optimizations again. For that reason, if you try to write to arr but write more characters than were actually allocated for arr, then you can overwrite other variables on the stack.
This is not really a function of gcc. All C compilers work essentially the same way.
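A minimal sketch of that layout (addresses and ordering are implementation-specific; this only illustrates that locals typically sit near one another on the stack):

#include <stdio.h>

int main (void) {

    char arr[4];
    int next;

    printf ("arr  at %p\n", (void *)arr);
    printf ("next at %p\n", (void *)&next);

    return 0;
}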