Why does overflowing a char array influence the other array?


Hi. Here's the conundrum. I have this code:
#include<stdio.h>
#include<conio.h>
#include<string.h>
int main(){
char a[5];
char b[5];
memset(a, 0, 5);
memset(b, 0,5);
strcpy(a, "BANG");
printf("b = ");
scanf("%s", &b);
printf("a = %s\n", a);
getch();
}
When you run it, you'll notice that if you read a long enough string into b, the value of a will change too. You would expect it to remain "BANG", but that is not what happens. I would like to have an explanation for this. Thank you!

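You're creating a "buffer overflow". The array is dimensioned to hold only 5 bytes (4 characters plus the standard C string terminator), and if you put more than that into it, the rest spills over into whatever lies next to it in memory.
Often that is something important, which makes your program crash.
There are automated tools to detect this kind of bug. For stack arrays like these, AddressSanitizer (-fsanitize=address in GCC and Clang) is a good fit; valgrind's Memcheck mainly catches overruns of heap blocks.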

If the string is long enough, you get a buffer overrun and the behavior is undefined, which includes overwriting the other array or even crashing the application. Because the behavior is undefined you should avoid it, but just for the sake of understanding: the compiler has laid out the a array right after the b array in memory (in this particular run of the compiler). When you write to b + sizeof(b), you are writing to a[0].

Congratulations, you've run into your first buffer overflow (first that you're aware of :) ).
The arrays are allocated on the program's stack, and here they are adjacent. Since C does not check for violations of array bounds, you can access any accessible part of memory as if it were an element of any array.
Let's review a very common case: this program running on x86. The stack on x86 grows toward lower addresses, so the compiler usually places a[] above b[] on the stack. When you access b[5], you hit the same address as a[0]; b[6] is a[1], and so on.
This is how buffer overflow exploits work: some careless programmer does not check the string size in the buffer and then an evil hacker writes his malicious code to the stack and runs it.

Think about it in terms of your program's memory.
a is an array of 5 characters, b is an array of 5 characters. Something like this on your stack:
[0][0][0][0][0][0][0][0][0][0]
 ^              ^
 |              +--"a" something like 0xbfe69e52
 +-----------------"b" something like 0xbfe69e4d
So when you do your strcpy of "BANG":
[0][0][0][0][0][B][A][N][G][0]
 ^              ^
 |              +--"a" something like 0xbfe69e52
 +-----------------"b" something like 0xbfe69e4d
Now if you put a "long" string into b:
[T][h][i][s][I][s][l][o][n][g]
 ^              ^
 |              +--"a" something like 0xbfe69e52
 +-----------------"b" something like 0xbfe69e4d
Oops, just lost a. This is a "buffer overflow" because you overflowed b (into a in this case). C isn't going to stop you from doing that.

The one thing everyone above seems to forget to mention is the fact that the stack is usually handled in the opposite direction to what you'd expect.
Effectively the allocation of 'a' SUBTRACTS 5 bytes from the current stack pointer (esp/rsp on x86/x64). The allocation of 'b' then subtracts a further 5 bytes.
So let's say your esp is 0x1000 when you make your first stack allocation. This gives 'a' the memory address 0xFFB. 'b' then gets 0xFF6, and hence the 6th byte (i.e. index 5) of 'b' is at 0xFF6 + 5, or 0xFFB, and thus you are now writing into the array for a.
This can easily be confirmed by the following code:
printf( "%p\n", (void *)a );
printf( "%p\n", (void *)b );
You will see that b has a lower memory address than a.

b has room for only 5 chars (4 letters plus the terminator). So if you write a longer string, you are writing into the memory adjacent to b.

C does no bounds checking on memory access, so you are free to read and write past the declared end of an array. a and b may end up adjacent in memory, even in reverse order from their declaration, so unless your code takes care not to read more characters than e.g. belong to b, you can corrupt a. What will actually happen is undefined, and may change from run to run.
In this particular case, note that you can limit the number of characters read by scanf using a width in the format string: scanf("%4s", b);
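As a minimal sketch of the field-width approach (the helper name read_word4 is made up for illustration, and sscanf stands in for scanf so that the input can be passed as a string), a width of one less than the buffer size leaves room for the terminator:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses one word from `input` into a 5-byte buffer.
   The field width 4 means at most 4 characters are stored, followed by
   the terminating '\0', so the write always stays inside `out`.
   Returns the sscanf result: 1 if a word was stored. */
int read_word4(const char *input, char out[5])
{
    assert(input != NULL && out != NULL);
    return sscanf(input, "%4s", out);   /* width = buffer size - 1 */
}
```

With char b[5], a call such as read_word4("ThisIsLong", b) stores "This" in b; the remaining characters are simply not consumed, and a neighbouring array is left alone.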

Related

If strncat adding NUL may cause the array go out of bound

I have some trouble with strncat(). The book Pointers on C says that strncat() always adds a NUL at the end of the resulting string. To understand this better, I did an experiment.
#include<stdio.h>
#include<string.h>
int main(void)
{
char a[14]="mynameiszhm";
strncat(a,"hello",3);
printf("%s",a);
return 0;
}
The result is mynameiszhmhel
In this case the array has room for 14 chars, and it originally held 11 characters, not counting the NUL. So when I append three more characters, all 14 bytes of the array are filled, and when the function then adds a NUL, that NUL lands in memory outside the array. This makes the write go out of bounds, but the program above runs without any warning. Why? Will this cause something unexpected?
So when we use strncat, should we account for the NUL so that the array does not go out of bounds?
I also noticed that strncpy doesn't add a NUL. Why do these two string functions handle the same thing differently? And why did the designers of C do it this way?

This cause the array to go out of bounds but the program above can run without any warning. Why?
Maybe. With strncat(a,"hello",3);, the code attempted to write beyond the 14 bytes of a[]. It might go out of bounds, it might not. It is undefined behavior (UB). Anything is allowed.
Will this causes something unexpected?
Maybe, the behavior is not defined. It might work just as you expect - whatever that is.
So when we use strncat, should we consider the NUL, in case it causes the array to go out of bounds?
Yes, the size parameter needs to account for appending a null character, else UB.
I also notice the function strncpy don't add NUL. Why this two string function do different things about the same thing? And why the designer of C do this design?
The two functions strncpy()/strncat() simply share similar names; they are not as closely paired in functionality as strcpy()/strcat().
Consider that in the early 1970s memory was far more expensive, and many design decisions can be traced back to a byte of memory costing more than an hour's wage. Uniformity of functionality and names was of lesser importance.
And there were originally 11 characters in the array except for NUL.
More like "And there were originally 11 characters in the array except for 3 NUL." There is no partial initialization in C: the remaining elements are zero-filled.
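One way to account for the terminator, sketched under the assumption that dst already holds a valid string shorter than its buffer (append_bounded is a hypothetical name, not a library function): pass strncat only the space that is left after reserving one byte for the NUL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: appends as much of src as fits in dst, where dst is
   a buffer of dstsize bytes that already holds a NUL-terminated string.
   Returns the number of characters of src that had to be dropped. */
size_t append_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);
    assert(used < dstsize);             /* sanity check on the caller's size */
    size_t room = dstsize - used - 1;   /* keep one byte for the NUL */
    size_t want = strlen(src);
    strncat(dst, src, room);            /* copies at most room chars, then adds the NUL */
    return want > room ? want - room : 0;
}
```

For char a[14] = "mynameiszhm", append_bounded(a, sizeof a, "hello") has room for only two more characters, so a becomes "mynameiszhmhe" and the terminator stays inside the array.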

This is not really an answer, but a counterexample.
Observe the following modification to your program:
#include<stdio.h>
#include<string.h>
int main(void)
{
char p[]="***";
char a[14]="mynameiszhm";
char q[]="***";
strncat(a,"hello",3);
printf("%s%s%s", p, a, q);
return 0;
}
The results of this program are dependent on where p and q are located in memory, compared to a. If they are not adjacent, the results are not so clear but if either p or q immediately comes after a, then your strncat will overwrite the first * causing one of them not to be printed anymore because that will now be a string of length 0.
So the results are dependent on memory layout, and it should be clear that the compiler can put the variables in memory in any order it likes. And they can be adjacent or not.
So the problem is that you are not keeping to your promise not to put more than 14 bytes into a. The compiler did what you asked, and the C standards guarantee behaviour as long as you keep to the promises.
And now you have a program that may or may not do what you wanted it to do.

Does specifying array size for a user input string in C matter?

I am writing a code to take a user's input from the terminal as a string. I've read online that the correct way to instantiate a string in C is to use an array of characters. My question is if I instantiate an array of size [10], is that 10 indexes? 10 bits? 10 bytes? See the code below:
#include <stdio.h>
int main(int argc, char **argv){
char str[10] = "Jessica";
scanf("%s", &str);
printf("%c\n", str[15]);
}
In this example "str" is declared with size 10, and I am able to print out str[15], assuming the user inputs a string long enough to reach that index.
My questions are:
Does the size of the "str" array increase after taking a value from scanf?
At what amount of string characters will my original array have overflow?

When you declare an array of char as you have done:
char str[10] = "Jessica";
then you are telling the compiler that the array will hold up to 10 values of the type char (a char is at least 8 bits wide, and exactly 8 bits on virtually every current platform). When you then try to access a 'member' of that array with an index that goes beyond the allocated size, you get what is known as Undefined Behaviour, which means that absolutely anything may happen: your program may crash; you may get what looks like a 'sensible' value; you may find that your hard disk is entirely erased! The behaviour is undefined. So, make sure you stick within the limits you set in the declaration: for str[n] in your case, the behaviour is undefined if n < 0 or n > 9 (array indexes start at ZERO). Your code:
printf("%c\n", str[15]);
does just what I have described - it goes beyond the 'bounds' of your str array and, thus, will cause the described undefined behaviour (UB).
Also, your scanf("%s", &str); may cause such UB as well, if the user enters a string of characters longer than 9 (one must be reserved for the terminating nul character)! You can prevent this by telling the scanf function to accept a maximum number of characters:
scanf("%9s", str);
where the integer given after the % is the maximum number of characters stored (anything beyond that is left in the input stream for the next read). Also, as str is defined as an array, you don't need the explicit "address of" operator (&) in scanf: an array reference already decays to a pointer to its first element!
Hope this helps! Feel free to ask for further clarification and/or explanation.

One of C's funny little foibles is that in almost all cases it does not check to make sure you are not overflowing your arrays.
It's your job to make sure you don't access outside the bounds of your arrays, and if you accidentally do, almost anything can happen. (Formally, it's undefined behavior.)
About the only thing that can't happen is that you get a nice error message
Error: array out-of-bounds access at line 23
(Well, theoretically that could happen, but in practice, virtually no C implementation checks for array bounds violations or issues messages like that.)
See also this answer to a similar question.

An array declares the given number of whatever you are declaring. So in the case of:
char str[10]
You are declaring an array of ten chars.
Does the size of the "str" array increase after taking a value from scanf?
No, the size does not change.
At what amount of string characters will my original array have overflow?
An array of 10 chars will hold nine characters and the null terminator. So, technically, it limits the string to nine characters.
printf("%c\n", str[15]);
This code references the 16th character in your array. Because your array only holds ten characters, you are accessing memory outside of the array. It's anyone's guess as to if your program even owns that memory and, if it does, you are referencing memory that is part of another variable. This is a recipe for disaster.
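Since the language will not do the check for you, code that indexes with an untrusted value has to do it itself. A minimal sketch (char_at is a hypothetical helper, not part of the standard library):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores buf[index] in *out and returns 1 if index
   lies inside the size-element array; otherwise returns 0 and reads nothing. */
int char_at(const char *buf, size_t size, size_t index, char *out)
{
    assert(buf != NULL && out != NULL);
    if (index >= size)
        return 0;               /* out of bounds: refuse instead of reading */
    *out = buf[index];
    return 1;
}
```

For char str[10], the call char_at(str, sizeof str, 15, &c) returns 0, and sizeof str stays 10 no matter what scanf has stored.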

Difference in storage of memory in string and an integer array in C after using sprintf function

Could you please explain me memory allocation in C for strings and integer array after using sprintf function?
#include <stdio.h>
#include <stdlib.h>
int main() {
char str[10];
long long int i = 0, n = 7564368987643389, l;
sprintf(str, "%lld", n); // made it to a string
printf("%c%c", str[11], str[12]);
}
In the above code, the string size is 10, including the null character. How come we can access elements 11 and 12 of it? The program prints 43

Here
sprintf(str,"%lld",n);
n, i.e. 7564368987643389, is converted to its character representation and stored into str. It looks like this:
index:   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+----+
str:   | 7 | 5 | 6 | 4 | 3 | 6 | 8 | 9 | 8 | 7 | 6 | 4 | 3 | 3 | 8 | 9 | \0 |
       +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+----+
As you can see str[11] is 4 and str[12] is 3. Hence the printf() statement below prints:
printf("%c %c",str[11],str[12]);
4 3
But since you have declared str of 10 characters and in sprintf() you are trying to store more than 10 characters, it causes undefined behavior.

You’ve written past the end of the array into memory you don’t own - the behavior on doing so is undefined, and any outcome is possible.
C does not require bounds checking on array writes or accesses. You won’t get an ArrayIndexOutOfBounds-type exception if you index past the end of the array.
In this case you didn’t overwrite anything important, so the program ran as you would expect, but it doesn’t have to. You could have corrupted data in an adjacent object, or you could have gotten a runtime error from the OS.
You are responsible for knowing how big the target buffer is and not reading or writing past the end of it. The language will not protect you here.

Could you please explain me memory allocation in C for strings and integer array?
Most of the time in C, memory allocation is your responsibility. In particular, when you call functions such as strcpy, sprintf and fread, that write potentially arbitrarily many characters to a buffer you supply, it is your responsibility to ensure, somehow, beforehand, that the buffer is big enough.
Some functions, such as fread, let you say how big your buffer is, so that they can be sure not to overflow it. Others, such as strcpy, sprintf, and scanf with directives like %s, do not. You must be especially careful with these functions.
When you write something like
char str[10];
sprintf(str, "%lld", 7564368987643389);
where you supply a buffer that is not big enough for the result, two questions tend to arise:
Why didn't it work?
Why did it work?
If it didn't work, the reason why should be obvious: the destination buffer simply wasn't big enough. And if despite that problem, it did seem to work, the reason is because C doesn't typically enforce (doesn't explicitly guard against) buffer overflow.
Suppose I buy some land -- a 1600 square foot plot -- in an undeveloped neighborhood. Suppose my title deed says:
The property line runs south for 40 feet from an iron stake, then 40 feet west, then 40 feet north, then 40 feet east.
So I've got 40 x 40 foot plot of land, but the only feature on the ground that positively identifies where my property is, is an iron stake at one corner. There isn't a crisp black line painted on the soil, or anything, precisely delineating the property lines.
Suppose I hire an architect and a builder, and we build a house on my new plot of land, and we screw up our measurements, and we build the house 10 feet into a neighboring parcel of land (that I don't own). What happens?
What does not happen is that the instant we dig that first footing or pour that first concrete or erect that first wall that crosses over the property line, a giant error message appears in the sky saying "PROPERTY LINE EXCEEDED".
No, this kind of error is not guaranteed to get detected right away. The problem might not be noticed (by a building inspector, or by the owner of the adjacent property) until tomorrow, or next week, or next year; and under some circumstances it might never get noticed at all.
And the situation is just about exactly the same with C memory allocation. If you write more to an array than it's allocated to hold, the problem might not reveal itself for a while, or it might not reveal itself at all: the program might seem to work perfectly, despite this reasonably dire error it contains.
For this reason, not only do you have to be careful with memory allocation in C, there are some good habits to get into. Not only is it important to declare your arrays (or malloc your buffers) big enough, but you also want to make sure that, whenever possible, the size is checked. For example:
When you call functions like fread, that accept a destination buffer and the size of that buffer, make sure the size you pass is accurate.
Instead of calling functions like sprintf that accept a destination buffer but with no way to specify its size, prefer alternative functions like snprintf that do allow the size to be specified and that therefore can guard against overflow.
If there's a function that doesn't allow the buffer size to be specified and for which there's no better alternative, maybe just don't use that function at all. Examples are strcpy and scanf with the %s and %[ directives.
When you write your own functions that accept pointers to buffers and that write characters or other data into those buffers, make sure you provide an argument by which the caller can explicitly supply the buffer size, and make sure your function honors this limit.
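The snprintf habit from the list above can be sketched as follows. The helper name format_ll is hypothetical; the check relies on snprintf returning the length the complete output would have had, which lets the caller detect truncation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats n into buf (size bytes) with snprintf.
   snprintf never writes more than size bytes, terminator included.
   Returns 1 if the whole number fit, 0 if it was truncated or failed. */
int format_ll(char *buf, size_t size, long long n)
{
    assert(buf != NULL && size > 0);
    int needed = snprintf(buf, size, "%lld", n);  /* length of the full text */
    return needed >= 0 && (size_t)needed < size;
}
```

With char str[10], format_ll(str, sizeof str, 7564368987643389) returns 0 and leaves the truncated but properly terminated text "756436898" in str, instead of writing 17 bytes into a 10-byte array.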

The program has undefined behavior because:
sprintf stores the characters 7564368987643389 plus a null terminator (a total of 17 bytes) in the destination array str defined with a length of only 10 bytes. sprintf does not receive the length of the array, hence writes the output to memory beyond the end of the array if it is too short. You should always use snprintf(str, sizeof str, "%lld", n); to avoid such undefined behavior.
printf("%c%c", str[11], str[12]); reads 2 bytes beyond the end of the str array, namely the 12th and the 13th bytes. This has undefined behavior, but if sprintf did successfully store the 17 bytes into the memory starting at the address of str, reading these bytes may yield the values '4' and '3', and produce an output of 43, which may or may not be visible as you did not end the program's output with a newline.
Writing and reading beyond the end of an array has undefined behavior, it may cause the program to crash or may seem to function as expected, but undesirable side effects can occur and go unnoticed for a while. On your system it seems nothing bad happened, but on some other system, it may cause tremendous damage... imagine if the program were running as part of a nuclear powerplant regulation system, you would not want to test the system's resilience this way.

I suggest writing the function you need yourself.
Like this.
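#include <stdio.h>
#include <stdlib.h>
/* Converts A to a freshly malloc'ed decimal string; the caller must free it.
   Returns NULL if the allocation fails. Negating LLONG_MIN overflows, so
   that single value is not handled here. */
char* LL_Int_To_Str(long long int A){
    int Lenz=0,i;
    int NegFlag=0;
    if(A<0){
        A=-A;
        NegFlag=1;
    }
    long long int B=A;
    do{ /* count the decimal digits */
        B/=10;
        Lenz++;
    }while(B);
    char *Result=malloc(Lenz+1+NegFlag); /* digits + sign + '\0' */
    if(Result==NULL){
        return NULL;
    }
    Result[Lenz+NegFlag]='\0';
    for(i=Lenz-1;i>=0;i--){ /* fill in the digits from the right */
        Result[i+NegFlag]=(char)('0'+A%10);
        A/=10;
    }
    if(NegFlag){
        Result[0]='-';
    }
    return Result;
}
int main(void){
    long long int n=7564368987643389;
    //long long int n=-7564368987643389;
    char* StrX=LL_Int_To_Str(n);
    if(StrX==NULL){
        return 1;
    }
    /*char* Loop;//Debug
    for(Loop=StrX;*Loop!='\0';Loop++){
        printf("%c\n",*Loop);
    }*/
    printf("%s\n",StrX);
    free(StrX);
    return 0;
}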
Then you can be sure nothing goes wrong.
Sometimes you think a library function will save you time.
But often it doesn't.
Another suggestion:
#include <limits.h>
printf("%lld\n",LLONG_MAX);
You will find that the maximum long long int is 9223372036854775807
So the largest string needs 19+1+1 bytes (1 for '\0', 1 for '-')
Just use char str[21]; and the problem is solved

Why does C hold a large string in a tiny char array? [closed]

Disclaimer: Been doing Java for a while, but new to C.
I have a program that I wrote, and I'm purposely trying to see what happens with different inputs and outputs.
#include <stdio.h>
int main() {
printf("whattup\n");
char str1[1], str2[1];
printf("Enter something: ");
scanf("%s", &str1);
printf("Enter something else: ");
scanf("%s", &str2);
printf("first thing: %s\n", str1);
printf("second thing: %s", str2);
}
This is the program flow:
whattup
Enter something: ahugestatement
Enter something else: smallertext
first thing: mallertext
Things I don't understand:
Why does "first thing" print out the str2?
Why does str2 have its first letter cut off?
Why does "second thing:" not print out?
I made the char array with a size of 1, shouldn't it only hold 1 letter?

To answer your questions specifically, you'll have to keep in mind that what happens is very much implementation-specific. The specific behavior you're seeing doesn't have to hold true on all C implementations. This is what the C standard calls "undefined behavior". With that in mind:
Why does "first thing" print out the str2?
Why does str2 have its first letter cut off?
You have allocated storage for two chars on the stack. The compiler allocates them next to each other, with str2 preceding str1 in memory. Therefore, after your first scanf, part of the stack will look like this:
  str1 is allocated here
  v
? a h u g e s t a t e m e n t \0
^
str2 is allocated here
Then, after the second scanf, the same part of memory will look like this:
  str1 is allocated here
  v
s m a l l e r t e x t \0 e n t \0
^
str2 is allocated here
In other words, the second input simply overwrites the first, since it goes beyond the bounds of the storage you allocated for it. Then, when you print out str1, it simply prints whatever is at the address of str1, which, as you can see in the figure above, is mallertext.
Why does "second thing:" not print out?
This is because of two effects interacting. For one thing, where you print str2, you do not end the output with a newline. stdout is normally line-buffered, which means that data written to it is not actually written to the underlying terminal until either A) a newline is written, B) you explicitly call fflush(stdout), or C) the program exits.
It would, therefore, print it when the program exited, but your program never exits. Since you overwrite parts of the stack that you don't manage, in this case you overwrite the return address from main, and therefore, when you return from main, your program promptly crashes, and thus never arrives to the point where it would flush stdout.
In the case of your program, the stack-frame layout of main looks like this (assuming AMD64 Linux):
RBP+8: Return address
RBP+0: Previous frame address
RBP-1: str1
RBP-2: str2
Since ahugestatement including its NUL terminator is 15 bytes, the 14 of those bytes that don't fit in str1 overwrite the entire previous frame address and 6 bytes of the return address. Since the new return address is entirely invalid, your program segfaults when the return from main jumps to an address that isn't even mapped in memory.
I made the char array with a size of 1, shouldn't it only hold 1 letter?
Yes, and it does. It's just that you clobber the memory that follows it.
As a general statement, scanf is not really a terribly useful function if you want to do even any most basic form of checking for illegal input. If you're hoping to do interactive input at all, it is almost always better to use something like fgets() instead and then parse the read input. fgets(), unlike scanf, takes an additional input for the size of the receiving buffer, and will then make sure to not write outside it.
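A minimal sketch of the fgets approach described above. The helper name read_line is hypothetical, and discarding the part of an over-long line that does not fit is just one possible policy, chosen here so that the leftover text is not mistaken for the next answer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (at most size-1 characters),
   drops the trailing newline, and discards whatever did not fit.
   Returns 1 on success, 0 on end of file or read error. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';        /* the whole line fit: strip the newline */
    } else {
        int c;                  /* line too long: skip the rest of it */
        while ((c = getc(in)) != '\n' && c != EOF)
            ;
    }
    return 1;
}
```

Called as read_line(stdin, str1, sizeof str1), it can never write outside str1, whatever the user types. The buffer still has to be large enough to be useful: with char str1[1] there is room for the terminator only.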

You have to do the bounds checking yourself in C to make sure your buffers don't overflow. Here you don't, so the behavior is undefined. If you run that code many times, it's bound to crash at some point, because the overflowed buffers will end up overwriting something important.

That's called buffer overflow. You allocated one character to hold your input, but you are writing beyond that (messing up the rest of your program's memory).
Unlike Java, the C compiler and runtime do not enforce array bounds. That is one of the main differences between "(memory-) managed languages" and low-level languages.

Your array only holds one character and the rest is out of bounds.
Access out of the range of an array is undefined and usually disastrous.

Difference between array and malloc

Here is my code :
#include<stdio.h>
#include <stdlib.h>
#define LEN 2
int main(void)
{
char num1[LEN],num2[LEN]; //works fine with
//char *num1= malloc(LEN), *num2= malloc(LEN);
int number1,number2;
int sum;
printf("first integer to add = ");
scanf("%s",num1);
printf("second integer to add = ");
scanf("%s",num2);
//adds integers
number1= atoi(num1);
number2= atoi(num2);
sum = number1 + number2;
//prints sum
printf("Sum of %d and %d = %d \n",number1, number2, sum);
return 0;
}
Here is the output :
first integer to add = 15
second integer to add = 12
Sum of 0 and 12 = 12
Why it is taking 0 instead of first variable 15 ?
Could not understand why this is happening.
It is working fine if I am using
char *num1= malloc(LEN), *num2= malloc(LEN);
instead of
char num1[LEN],num2[LEN];
But it should work fine with this.
Edited :
Yes, it worked for LEN 3 but why it showed this undefined behaviour. I mean not working with the normal arrays and working with malloc. Now I got that it should not work with malloc also. But why it worked for me, please be specific so that I can debug more accurately ?
Is there any issue with my system or compiler or IDE ?
Please explain a bit more as it will be helpful or provide any links to resources. Because I don't want to be unlucky anymore.

LEN is 2, which is enough to store both digits but not the required null terminating character. You are therefore overrunning the arrays (and the heap allocations, in that version of the code!) and this causes undefined behavior. The fact that one works and the other does not is simply a byproduct of how the undefined behavior plays out on your particular system; the malloc version could indeed crash on a different system or a different compiler.
Correct results, incorrect results, crashing, or something completely different are all possibilities when you invoke undefined behavior.
Change LEN to 3 and your example input would work fine.
I would suggest indicating the size of your buffers in your scanf() line to avoid the undefined behavior. You may get incorrect results, but your program at least would not crash or have a security vulnerability:
scanf("%2s", num1);
Note that the number you use there must be one less than the size of the array -- in this example it assumes an array of size 3 (so you read a maximum of 2 characters, because you need the last character for the null terminating character).

LEN is defined as 2. You left no room for a null terminator. In the array case you would overrun the array end and damage your stack. In the malloc case you would overrun your heap and potentially damage the malloc structures.
Both are undefined behaviour. You are unlucky that your code works at all: if you were "lucky", your program would decide to crash in every case just to show you that you were triggering undefined behaviour. Unfortunately that's not how undefined behaviour works, so as a C programmer, you just have to be defensive and avoid entering into undefined behaviour situations.
Why are you using strings, anyway? Just use scanf("%d", &number1) and you can avoid all of this.

Your program does not "work fine" (and should not "work fine") with either explicitly declared arrays or malloc-ed arrays. Strings like 15 and 12 require char buffers of size 3 at least. You provided buffers of size 2. Your program overruns the buffer boundary in both cases, thus causing undefined behavior. It is just that the consequences of that undefined behavior manifest themselves differently in different versions of the code.
The malloc version has a greater chance of producing the illusion of "working", since the sizes of dynamically allocated memory blocks are typically rounded up to an implementation-dependent "round" boundary (like 8 or 16 bytes). That means that your malloc calls actually allocate more memory than you ask them to. This might temporarily hide the buffer overrun problems present in your code and make your program appear to be "working fine".
Meanwhile, the version with explicit arrays uses local arrays. Local arrays often have precise size (as declared) and also have a greater chance to end up located next to each other in memory. This means that buffer overrun in one array can easily destroy the contents of the other array. This is exactly what happened in your case.
However, even in the malloc-based version I'd still expect a good debugging version of standard library implementation to catch the overrun problems. It is quite possible that if you attempt to actually free these malloc-ed memory blocks (something you apparently didn't bother to do), free will notice the problem and tell you that heap integrity has been violated at some point after malloc.
P.S. Don't use atoi to convert strings to integers: it cannot report errors. The function to use for converting strings to integers is strtol.
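A minimal sketch of a strtol-based conversion (parse_int is a hypothetical name) that, unlike atoi, can tell the caller when the text is not a valid number or does not fit in an int:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the decimal text s to an int.
   Returns 1 and stores the value in *out on success; returns 0 if s has
   no digits, has trailing characters, or is out of range for int. */
int parse_int(const char *s, int *out)
{
    assert(s != NULL && out != NULL);
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0')       /* nothing parsed, or trailing junk */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                       /* does not fit in an int */
    *out = (int)v;
    return 1;
}
```

In the program above, parse_int(num1, &number1) could replace number1 = atoi(num1), once num1 is large enough to hold the digits and the terminator.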
