Input too big for array - c

I have a small question that I was just wondering about.
#include <stdio.h>
int main()
{
char n_string[5];
printf("Please enter your first name: ");
scanf("%s", n_string);
printf("\nYour name is: %s", n_string);
return 0;
}
On the 5th line I declare a string of 4 letters. Now this means I will only be able to hold 4 characters in that string, correct?
If I execute my program and write the name: Alexander, I get the output:
Your name is: Alexander
My question is, how come I could put a string of 9 characters into an array that holds 4?

You are overwriting a part of your program's stack by doing that, which is generally a very bad thing. In this case, you got lucky, but if you write further you will almost certainly get a segfault, when main tries to return.
Malicious actors will use this as a buffer overflow attack, to overwrite a function's return address.
If your question is "Why does C allow me to do this?", the answer is that C does not do bounds checking on arrays. It treats arrays (more or less) as a pointer to an address in memory, and scanf is more than happy to write to the memory location without worrying about what it actually represents.
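To make the "arrays are treated as pointers" point concrete, here is a minimal sketch. The two function names are hypothetical and only exist for illustration. It shows that the declared size of an array is known where the array is declared, but is lost as soon as the array is passed to a function such as scanf.

```c
#include <stddef.h>

/* Hypothetical helper: reports the size visible where the array is declared. */
size_t size_seen_by_owner(void)
{
    char n_string[5];

    return sizeof n_string;   /* the full array: 5 bytes */
}

/* Hypothetical helper: reports the size a callee can see for its argument. */
size_t size_seen_by_callee(char *buf)
{
    /* buf is a plain pointer here; the array length did not travel with it. */
    return sizeof buf;
}
```

Calling size_seen_by_callee(n_string) yields the size of a pointer, not 5. A function that receives only the pointer has no way to find out how much room is behind it unless the caller passes the capacity separately.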

You allocated 5 bytes, but since your CPU probably requires 16-byte alignment, the compiler probably allocated 16 bytes. Try this:
char n_string[5];
volatile int some_int;
some_int = 0;
scanf("%s", n_string);
printf("%s %d\n", n_string, some_int);
Is some_int still 0? Writing into n_string may have caused a buffer overflow and written bad data to some_int. Of course, your compiler probably knows that some_int will stay zero, so we declare it as volatile int some_int; to stop the compiler from optimizing it away.

You reserve memory for 4 letters and the terminating zero. You write nine letters and a zero to it. You overstepped your bounds by 5 bytes. Those 5 bytes belonged to someone else; you just trashed their memory.
The most likely candidates are variables that are stored nearby. Test this: although it is not guaranteed, chances are you will see what happens to the extra bytes. They will damage your i variable:
#include <stdio.h>
int main()
{
char n_string[5];
int i = 17;
printf("Please enter your first name: ");
scanf("%s", n_string);
printf("\nYour name is: %s", n_string);
printf("\nThe variable i is %d", i);
return 0;
}

I think there just happens to be valid memory in your process at the addresses immediately after your array, so it just happens to work. However, the write still corrupts whatever else in the process lives at those addresses.
Essentially, you have a buffer overflow.

Related

Writing 5 character to char[5] affects int

Easy code down below.
Mac OS X 10.10.5, Xcode 7.2, C-file.
If I input 1, and afterwards qwert, I get 0 and qwert back.
1 and qwer gives 1 and qwer.
1 and e.g. qwerty gives 121 and qwerty.
What have I missed - why can I write more than 4 chars (+null) to a 5 char variable?
Why is the integer affected?
#include <stdio.h>
int main() {
int userInput;
char q[5];
printf("Hello\n");
scanf("%d", &userInput);
printf("%d\nAnd\n", userInput);
scanf("%s", q);
printf("\n");
printf("%d\n%s", userInput, q);
return 0;
}
What have I missed - why can I write more than 4 chars (+null) to a 5 char variable?
There is nothing stopping you from accessing out-of-bounds portions of an array in C. This will compile:
char a[2];
a[10000] = 10;
Why is the integer affected?
What you are causing is undefined behavior, which is likely the reason your int is affected. You can learn more about this by reading about C arrays. It happens because you are putting a 5-character string plus a null terminating character (i.e. 6 chars) into a space meant for only 5. You are going outside the bounds of your array.
As a further note, scanf("%s", ...) offers no way of protecting against this behavior. If a user enters a string that is too long, then too bad. That is why you should protect your input by using a format string such as "%4s", or by using fgets:
fgets(q, sizeof q, stdin);
Both are ways to keep the input from storing more than 4 characters.
[Edit] User/code can try to "write more than 4 chars (+null) to a 5 char variable". C does not specify what should happen when code does not prevent such an event. C is coding without the safety net/training wheels.
scanf("%s", q); reads and saves the 5 characters of "qwert" and also appends a null character '\0'. @Weather Vane
Since q[] only has room for 5 characters, undefined behavior (UB) occurs. In OP's case, it appears to have overwritten userInput.
To avoid this, use a width limit on "%s" as below. It will not consume more than 4 non-white-space characters from the user. Unfortunately, any extra text will remain in stdin.
char q[5];
scanf("%4s", q);
Or better, review fgets() for reading user input.
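A minimal sketch of the fgets() route, which also deals with the leftover text that a width-limited "%4s" would leave in stdin. The helper name read_line_truncated is hypothetical, and silently dropping the overlong tail is an assumption about what the caller wants.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (capacity cap, including the
   '\0'). An overlong line is truncated and its tail is discarded.
   Returns 1 on success, 0 on end of input or read error. */
int read_line_truncated(FILE *in, char *buf, size_t cap)
{
    if (cap == 0 || fgets(buf, (int)cap, in) == NULL)
        return 0;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';          /* the whole line fit: drop the newline */
    } else {
        int c;                    /* the line was too long: skip the rest */
        while ((c = getc(in)) != '\n' && c != EOF)
            ;
    }
    return 1;
}
```

With char q[5], calling read_line_truncated(stdin, q, sizeof q) and typing qwerty stores "qwer" in q, and the next read starts on a fresh line instead of picking up "ty".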
The reason that the int userInput is affected is that you are writing past the end of the char array (q). Both of these are stack variables, and the compiler you're using seems to be allocating memory on the stack for the local variables in "reverse order", i.e., they are being "pushed" in the order defined, so the first local variable listed is lower on the stack. So, in your case, when you write past the end of q, you are writing into the memory allocated for userInput, which is why it is affected.

Buffer overflow in C with gets

I am very new to C and as a class assignment my instructor wanted us to play with buffer overflows. I found the following online as an example and I can't figure out how to use it!
#include <stdio.h>
char temp[32];
unsigned int setThis=1;
printf("Enter your temp: \n");
fgets(temp, 34, stdin); //Takes a 34 buffer size when temp can only be 32
printf("Value of you setThis: %d", setThis);
So my question is, how do i set "setThis" to a certain variable?
Any help is appreciated, BeastlyJman.
There's no guaranteed way to do it, but typically variables are put on the stack such that the first variable is last in memory. So if you declare setThis before temp[32], then setThis will be at the end of the temp array, and you can overwrite it.
But as I said, there's no guarantee that's what the compiler will do. You should really check the assembly code that the compiler generates to see where temp and setThis are located.
Also, you can save yourself some typing if you reduce the size of temp to temp[8] and then pass 10 to fgets. To cause an overflow, you need to type more characters than the buffer can hold.
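Since the placement of separate local variables is up to the compiler, one way to experiment without guessing is to pin the layout with a struct. The following is only a sketch under that assumption: struct frame and simulate_overflow are hypothetical names, not part of the assignment, and the copy is clamped to the size of the struct so that it stays inside one object.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: members of a struct are stored in declaration order,
   so setThis is guaranteed to come after temp. */
struct frame {
    char temp[8];
    unsigned int setThis;
};

/* Hypothetical helper: copies len bytes of input over the struct, starting
   at temp, and returns the resulting value of setThis. */
unsigned int simulate_overflow(const void *input, size_t len)
{
    struct frame f;

    memset(&f, 0, sizeof f);
    f.setThis = 1;

    if (len > sizeof f)
        len = sizeof f;           /* never write outside the struct itself */

    memcpy(&f, input, len);       /* bytes past temp land in setThis */
    return f.setThis;
}
```

Input no longer than sizeof(f.temp) leaves setThis at 1. Longer input changes it, and the bytes placed at offsetof(struct frame, setThis) in the input decide the new value.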

scanf reads more chars than the destination var can hold

The following code reads up to 10 chars from stdin and outputs them.
When I input more than 10 chars, I expect it to crash because msg has not enough room, but it does NOT! How could that be?
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[])
{
char* msg = malloc(sizeof(char)*10);
if (NULL==msg)
exit(EXIT_FAILURE);
scanf("%s", msg);
printf("You said: %s\n", msg);
if (strlen(msg)<10)
free(msg);
return EXIT_SUCCESS;
}
Use fgets instead; scanf is not buffer safe. What you are seeing is undefined behavior.
You may allocate a "safe", generously sized buffer when using scanf(). For user input, room for about 2 lines (approx. 2x80 chars) should do; for files, somewhat more.
Conclusion: scanf() is kind of quick-and-dirty stuff; don't use it in serious projects.
You can specify max size in scanf() format string
scanf("%9s", msg);
I would imagine that malloc() allocates blocks of memory aligned to word boundaries. On a 32-bit machine, that means whatever you ask for will be rounded up to the nearest multiple of 4. A request for 10 bytes would then really get 12, so you might get away with a string of up to 11 characters (plus a '\0' terminator) without suffering any problems.
But don't ever assume this to be the case. Like everyone else is saying, you should always specify a safe maximum length in your format string if you want to avoid problems.
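When the buffer size is only known at run time, as with the malloc'd msg in the question, the width cannot be hard-coded in the format string. A minimal sketch of one way around that, assuming a C99 library (for %zu): build the format from the capacity. The helper name read_word is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited word into buf without
   ever storing more than cap bytes (including the '\0').
   Returns the fscanf result (1 on success, EOF at end of input),
   or 0 if cap is too small to hold any character. */
int read_word(FILE *in, char *buf, size_t cap)
{
    char fmt[32];

    if (cap < 2)
        return 0;

    /* Builds e.g. "%9s" for cap == 10. */
    snprintf(fmt, sizeof fmt, "%%%zus", cap - 1);
    return fscanf(in, fmt, buf);
}
```

For the question's buffer this would be called as read_word(stdin, msg, 10). Characters beyond the limit stay in the input stream and are picked up by the next read.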
It does not crash because C is very lenient, contrary to popular belief. The program is not required to crash, or even complain, when a buffer is overflowed. Say you define
union {
uint8_t a[3];
uint32_t b;
};
then a[3] still lands in memory that belongs to the union (it overlaps the fourth byte of b), so there is no reason for a crash (but don't ever do this). Even a[5] or a[100] may happen to be perfectly fine memory.
On the other hand, I may try to access a[-1], which may happen to be memory the OS does not allow me to access, causing a segfault.
As for what you should do to fix this: as others have pointed out, scanf is not safe to use with buffers. Use one of their suggestions.

Why does overflowing a char array influence the other array?

Hi. Here's the conundrum. I have this code:
#include<stdio.h>
#include<conio.h>
#include<string.h>
int main(){
char a[5];
char b[5];
memset(a, 0, 5);
memset(b, 0,5);
strcpy(a, "BANG");
printf("b = ");
scanf("%s", &b);
printf("a = %s\n", a);
getch();
}
When you run it, you'll notice that if you read a long enough string into b, the value of a will change too. You would expect it to remain "BANG", but that is not what happens. I would like to have an explanation for this. Thank you!
You're creating a "buffer overflow". The array is dimensioned to only hold 5 bytes (4 characters plus standard C string terminator), and if you put there more than that, the rest will overspill.
Usually, into something important, making your program crash.
There are automated tools (e.g. valgrind) to detect this kind of bugs.
If the string is long enough, you are getting a buffer overrun and the behavior is undefined, which includes overwriting the other array or even crashing the application. Because the behavior is undefined you should avoid it, but just for the sake of understanding: the compiler has laid out the a array after the b array in memory (in this particular run of the compiler). When you write to b + sizeof(b), you are writing to a[0].
Congratulations, you've run into your first buffer overflow (first that you're aware of :) ).
The arrays will be allocated in the stack of the program and these arrays are adjacent. Since C does not check violation of array bounds, you may access any permitted part of memory as a cell of any array.
Let's review a very common runtime example: this program running on x86. The stack on x86 grows toward lower addresses, so the compiler usually places a[] above b[] on the stack. When you try to access b[5], it will be at the same address as a[0], b[6] is a[1], and so on.
This is how buffer overflow exploits work: some careless programmer does not check the string size in the buffer and then an evil hacker writes his malicious code to the stack and runs it.
Think about it in terms of your program's memory.
a is an array of 5 characters, b is an array of 5 characters. Something like this on your stack:
[0][0][0][0][0][0][0][0][0][0]
 ^              ^
 |              +--"a" something like 0xbfe69e52
 +-----------------"b" something like 0xbfe69e4d
So when you do your strcpy of "bang":
[0][0][0][0][0][B][A][N][G][0]
 ^              ^
 |              +--"a" something like 0xbfe69e52
 +-----------------"b" something like 0xbfe69e4d
Now if you put a "long" string into b:
[T][h][i][s][I][s][l][o][n][g]
 ^              ^
 |              +--"a" something like 0xbfe69e52
 +-----------------"b" something like 0xbfe69e4d
Oops, just lost a. This is a "buffer overflow" because you overflowed b (into a in this case). C isn't going to stop you from doing that.
The one thing everyone above seems to forget to mention is the fact that the stack is usually handled in the opposite direction to what you'd expect.
Effectively the allocation of 'a' SUBTRACTS 5 bytes from the current stack pointer (esp/rsp on x86/x64). The allocation of 'b' then subtracts a further 5 bytes.
So let's say your esp is 0x1000 when you make your first stack allocation. This gives 'a' the memory address 0xFFB. 'b' then gets 0xFF6, and hence the 6th byte (i.e. index 5) of 'b' is at 0xFF6 + 5, or 0xFFB, and thus you are now writing into the array a.
This can easily be confirmed by the following code:
printf( "%p\n", (void *)a );
printf( "%p\n", (void *)b );
You will see that b has a lower memory address than a.
b has only 5 letters. So if you write a longer string, you are writing the memory adjacent to b.
C does no bounds checking on memory access, so you are free to read and write past the declared end of an array. a and b may end up adjacent in memory, even in reverse order from their declaration, so unless your code takes care not to read more characters than e.g. belong to b, you can corrupt a. What will actually happen is undefined, and may change from run to run.
In this particular case, note that you can limit the number of characters read by scanf using a width in the format string: scanf("%4s", b);

C: scanf behavior in a for-loop

I came across the following code :
int i;
for(; scanf("%s", &i);)
printf("hello");
As per my understanding, if we provide integer input scanf would be unsuccessful in reading and therefore return 0, thus the loop should not run even once. However, it runs infinitely by accepting all types of inputs as successful reads.
Would someone kindly explain this behaviour?
That is the incorrect format specifier for an int: should be "%d".
It is attempting to read a string into an int variable, probably overwriting memory. As "%s" is specified, all inputs will be read thus scanf() returns a value greater than zero.
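For comparison, here is a minimal sketch of a loop that reads integers with the matching "%d" specifier and stops on bad input. The helper name count_ints is hypothetical. The point it illustrates is comparing the return value with the number of expected conversions instead of using it as a truth value.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many integers can be read from in before
   the first non-integer token or the end of input. */
int count_ints(FILE *in)
{
    int value;
    int count = 0;

    /* fscanf returns the number of items converted: 1 on success, 0 on a
       matching failure, and EOF (negative, so "true" in a bare condition)
       at end of input. Comparing with 1 stops the loop in both failure cases. */
    while (fscanf(in, "%d", &value) == 1)
        count++;

    return count;
}
```

On the input 1 2 3 x 4 this returns 3, because the x cannot be converted by "%d". A bare for(; scanf("%d", &i);) would also stop there, but it would keep spinning at end of input, since EOF is nonzero.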
(Edit: I don't think this answer should have been accepted. Upvoted maybe, but not accepted. It doesn't explain the infinite loop at all, @hmjd does that.)
(This doesn't actually answer the question, the other answers do that, but it's interesting and good to know.)
As hmjd says, using scanf like this will overwrite memory ("smash the stack"), as it starts writing to i in memory and then keeps going, even outside the memory that i takes up (sizeof(int) bytes, typically 4).
To illustrate, consider the following bit of code:
#include<stdio.h>
int main() {
char str_above[8] = "ABCDEFG";
int i;
char str_below[8] = "ABCDEFG";
scanf("%s", &i);
printf("i = %d\n", i);
printf("str_above = %s\nstr_below = %s\n", str_above, str_below);
return 0;
}
Compiling and running it, and entering 1234567890 produces the following output:
i = 875770417
str_above = 567890
str_below = ABCDEFG
Some points:
i has little correspondence to the integer 1234567890 (it is related to the values of the characters '1',...,'4' and the endianness of the system).
str_above has been modified by scanf: the characters '5',...,'0','\0' have overrun the end of the block of memory reserved for i and have been written to the memory reserved for str_above.
The stack has been smashed "upwards", i.e. str_above is stored later in memory than i and str_below is stored earlier in memory. (To put it another way &str_above > &i and &str_below < &i.)
This is the basis for "buffer overrun attacks", where values on the stack are modified by writing too much data to an array. And it is why gets is dangerous (and should never be used) and using scanf with a generic %s format specifier should also never be done.
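The value i = 875770417 in the output above can be reproduced by hand. The following sketch assumes ASCII character codes and a little-endian machine with a 32-bit int, as in that run. The helper name pack_little_endian is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: packs the first four bytes of s the way a
   little-endian machine with a 32-bit int stores them, i.e. the first
   byte becomes the least significant one. */
uint32_t pack_little_endian(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

For "1234567890" the first four bytes are 0x31, 0x32, 0x33 and 0x34, so the result is 0x34333231, which is 875770417 in decimal. The remaining characters "567890" and the terminating '\0' are the bytes that ended up in str_above.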
