Consider the following test program:
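#include <stdio.h>
#include <string.h>
int main(void) {
    char a[10];
    strcpy(a, "test");
    for (int i = 0; i < 3; i++) {
        char b[2];
        strcpy(b, "tt");   /* already overflows b: "tt" needs 3 bytes (2 chars + '\0') */
        strcat(a, b);      /* after 3 passes a needs 11 bytes (10 chars + '\0') */
    }
    printf("%zu %zu %s\n", strlen(a), sizeof(a), a);
    return 0;
}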
Output: 10 10 testtttttt.
Everything seems ok.
If the loop runs while i<7, the buffer overflows, but there is no error. Output: 18 10 testtttttttttttttt. The program seems to work.
With i<11, we get the error "stack smashing detected"...
Why doesn't the program report an error when i<7?
What you are doing is undefined behaviour. Anything could happen. What you saw is just one possible outcome: an automatic tool detected the problem quite late instead of immediately. Your code is wrong, any assumption about what should happen is wrong, and asking "why" is pointless. With a different compiler, different compiler settings, or on a different day, the outcome could be completely different.
By the way, there is a buffer overflow when i = 0 since you are trying to copy two chars and a trailing zero byte into a buffer that only has space for two chars.
If the loop runs while i<7, the buffer overflows, but there is no error. Output: 18 10 testtttttttttttttt. The program seems to work.
The reason is that it's undefined behavior. Any value may appear, because you are accessing the array outside its bounds.
You may want to check these scenarios with Valgrind.
Your buffer only holds 10 characters, but your string needs 11 bytes (10 characters plus the terminating '\0'). Increase the buffer size to fit it.
char a[10];
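The error you are getting, "stack smashing detected", comes from a protection mechanism (the stack protector) that gcc uses to detect buffer overflows.
As for why there is no error in the other case:
The buffer overflow detection is a feature meant to help you, but there is absolutely no guarantee that it will detect a buffer overflow. The stack protector places a guard value (a "canary") between the local variables and the saved return address and checks it only when the function returns. An overflow that never reaches the canary therefore goes unnoticed.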
Related
I understand that allocating memory for a string requires n+1 bytes because of the terminating null character. But what happens if you allocate 10 chars and then enter an 11-char string?
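#include <stdio.h>
#include <stdlib.h>
int main(void) {
    int n;
    char *str;
    printf("How long is your string? ");
    if (scanf("%d", &n) != 1 || n < 0) return 1;
    str = malloc(n + 1);        /* n characters + terminating '\0' */
    if (str == NULL) {
        printf("Uh oh.\n");
        return 1;
    }
    scanf("%s", str);           /* no width limit: input longer than n chars overflows str */
    printf("Your string is: %s\n", str);
    free(str);
    return 0;
}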
I tried running the program, but the result is the same as with n+1: it still seems to work.
If you allocated a char* of 10 characters but wrote 11 characters to it, you're writing to memory you haven't allocated. This has undefined behavior - it may happen to work, it may crash with a segmentation fault, and it may do something completely different. In short - don't rely on it.
If you overrun an area of memory given to you by malloc, you corrupt the heap. If you're lucky, your program will crash right away, or when you free the memory, or when your program uses the chunk of memory right after the area you overran. When your program crashes, you'll notice the bug and have a chance to fix it.
If you're unlucky your code goes into production, and some cybercriminal figures out how to exploit your overrun memory to trick your program into running some malicious code or using some malicious data they fed you. If you're really unlucky, you get featured in Krebs On Security or some other information security news outlet.
Don't do this. If you're not confident of your ability to avoid doing it, don't use C. Instead use a language with a native string data type. Seriously.
what if you allocate 10 chars but enter an 11 char string?
scanf("%s", str); invokes undefined behavior (UB) when the input is longer than the buffer. Anything may happen, including appearing to work, as in "I tried running the program but the result is still the same as n+1."
Instead, always use a width with "%s" in scanf() so that reading stops once str[] is full. Example:
char str[10+1];
scanf("%10s", str);
Since n is variable here, consider instead using fgets() to read a line of input.
Note that fgets() also reads and saves a trailing '\n'.
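Better to use fgets() for user input and drop scanf() calls altogether until you understand why scanf() is bad.
str = malloc(n+1);
if (str == NULL) { printf("Uh oh.\n"); return 1; }
for (int c; (c = getchar()) != '\n' && c != EOF; ) {} // discard rest of the line left by scanf("%d", &n)
if (fgets(str, n+1, stdin)) {
  str[strcspn(str, "\n")] = 0; // Lop off potential trailing \n
  printf("Your string is: %s\n", str);
}
free(str);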
When you write 11 bytes to a 10-byte buffer, the last byte will be out-of-bounds. Depending on several factors, the program may crash, have unexpected and weird behavior, or may run just fine (i.e., what you are seeing). In other words, the behavior is undefined. You pretty much always want to avoid this, because it is unsafe and unpredictable.
Try writing a bigger string to your 10-byte buffer, such as 20 or 30 bytes. You will likely see problems start to appear.
I have written a simple string program using an array. I allocated a character array of 10 bytes, but when I give input, the program accepts strings longer than 10 bytes. I get a segmentation fault only when I give an input string of around 21 characters. Why is there no segmentation fault when my input exceeds the allocated size of my array?
Program:
#include <stdio.h>
#include <string.h>
int main(void) {
char str[10];
printf ("\n Enter the string: ");
gets (str);   /* unsafe: no bounds check; removed from the language in C11 */
printf ("\n The value of string=%s",str);
int str_len;
str_len = (int) strlen (str);
printf ("\n Length of String=%d\n",str_len);
return 0;
}
Output:
Enter the string: n durga prasad
The value of string=n durga prasad
Length of String=14
As you can see, the string length is shown as 14, but I allocated only 10 bytes. How can the length be more than my allocated size?
Please don't use gets(): it suffers from buffer overflow issues, which in turn invoke undefined behaviour. (It was removed from the C standard in C11.)
Why there is no segmentation fault when my input exceed allocated my array limit?
Once your input exceeds the allocated array size (i.e., 9 valid characters + 1 null terminator), the very next access to an array location is illegal and invokes UB. A segmentation fault is one possible side effect of UB; it is not guaranteed.
Solution: Use fgets() instead.
When you declare an array such as char str[10];, your compiler won't always reserve exactly the number of bytes you asked for. It often reserves more, for example to keep the stack aligned. On a 64-bit system the space is commonly rounded up to a multiple of 8 or 16 bytes, so in your case it might be 16.
So even though you asked for 10 bytes, you may be able to write a few more without an immediate crash. But of course this is strongly discouraged because, as you said, it can produce segmentation faults.
And, as the other answers from Sourav and Gopi say, using fgets instead of gets (with the correct buffer size) prevents this overflow and the undefined behavior that comes with it.
When you enter more characters than the array can hold, you have undefined behavior. Your array can hold 9 characters followed by a null terminator, so writing anything beyond that is UB.
Don't use gets(); use fgets() instead:
char a[10];
fgets(a,sizeof(a),stdin);
By using fgets() you avoid the buffer overflow and therefore the undefined behavior.
PS: fgets() keeps the trailing newline character in the buffer (if there is room for it).
As you already know, your input causes a buffer overflow, so I'm not going to repeat the reason. Instead, I would like to answer this particular question:
"Why there is no segmentation fault when my input exceed allocated my array limit?"
Whether or not you get a segmentation fault comes down to something called undefined behaviour. Overrunning the allocated memory boundary does not guarantee a segmentation fault. Rather, what you'll be facing is UB (as said earlier). Quoting a description of the consequences of UB:
[...] programs invoking undefined behavior may compile and run, and produce correct results, or undetectably incorrect results, or any other behavior.
So you won't necessarily get a segmentation fault as soon as you access the memory just past the array. The program may run perfectly well until it touches memory that the process is actually not allowed to access. Only then is the SIGSEGV signal (signal 11 on Linux) raised.
However, after running into UB, any output from any subsequent statement cannot be validated. So, the output of strlen() is invalid here.
Here is my code:
#include<stdio.h>
#include <stdlib.h>
#define LEN 2
int main(void)
{
char num1[LEN],num2[LEN]; //works fine with
//char *num1= malloc(LEN), *num2= malloc(LEN);
int number1,number2;
int sum;
printf("first integer to add = ");
scanf("%s",num1);
printf("second integer to add = ");
scanf("%s",num2);
//adds integers
number1= atoi(num1);
number2= atoi(num2);
sum = number1 + number2;
//prints sum
printf("Sum of %d and %d = %d \n",number1, number2, sum);
return 0;
}
Here is the output :
first integer to add = 15
second integer to add = 12
Sum of 0 and 12 = 12
Why is it taking 0 instead of the first value, 15?
I can't understand why this is happening.
It works fine if I use
char *num1= malloc(LEN), *num2= malloc(LEN);
instead of
char num1[LEN],num2[LEN];
But it should work fine with the arrays too.
Edit:
Yes, it worked with LEN 3. But why did it show this undefined behaviour? I mean, it didn't work with the normal arrays but did work with malloc. I now understand that it shouldn't work with malloc either. But why did it work for me? Please be specific so that I can debug more accurately.
Is there any issue with my system or compiler or IDE ?
Please explain in a bit more detail, or point me to some resources, because I don't want to be unlucky anymore.
LEN is 2, which is enough to store both digits but not the required null terminating character. You are therefore overrunning the arrays (and the heap allocations, in that version of the code!) and this causes undefined behavior. The fact that one works and the other does not is simply a byproduct of how the undefined behavior plays out on your particular system; the malloc version could indeed crash on a different system or a different compiler.
Correct results, incorrect results, crashing, or something completely different are all possibilities when you invoke undefined behavior.
Change LEN to 3 and your example input would work fine.
I would suggest specifying the size of your buffers in your scanf() line to avoid the undefined behavior. You may get incorrect results, but at least your program will not crash or have a security vulnerability:
scanf("%2s", num1);
Note that the number you use there must be one less than the size of the array -- in this example it assumes an array of size 3 (so you read a maximum of 2 characters, because you need the last character for the null terminating character).
LEN is defined as 2. You left no room for a null terminator. In the array case you would overrun the array end and damage your stack. In the malloc case you would overrun your heap and potentially damage the malloc structures.
Both are undefined behaviour. You are unlucky that your code works at all: if you were "lucky", your program would decide to crash in every case just to show you that you were triggering undefined behaviour. Unfortunately that's not how undefined behaviour works, so as a C programmer, you just have to be defensive and avoid entering into undefined behaviour situations.
Why are you using strings, anyway? Just use scanf("%d", &number1) and you can avoid all of this.
Your program does not "work fine" (and should not "work fine") with either explicitly declared arrays or malloc-ed arrays. Strings like 15 and 12 require char buffers of size 3 at least. You provided buffers of size 2. Your program overruns the buffer boundary in both cases, thus causing undefined behavior. It is just that the consequences of that undefined behavior manifest themselves differently in different versions of the code.
The malloc version has a greater chance of producing the illusion of "working", since the sizes of dynamically allocated memory blocks are typically rounded up to some implementation-dependent "round" boundary (like 8 or 16 bytes). That means your malloc calls actually allocate more memory than you ask for. This can temporarily hide the buffer overrun problems present in your code and produce the illusion that your program "works fine".
Meanwhile, the version with explicit arrays uses local arrays. Local arrays often have precise size (as declared) and also have a greater chance to end up located next to each other in memory. This means that buffer overrun in one array can easily destroy the contents of the other array. This is exactly what happened in your case.
However, even in the malloc-based version, I'd still expect a good debugging version of the standard library to catch the overrun problems. If you actually free these malloc-ed memory blocks (something you apparently didn't bother to do), free may well notice the problem and tell you that heap integrity was violated at some point after malloc.
P.S. Don't use atoi to convert strings to integers: it has no way to report errors. The function for converting strings to integers is strtol.
I have some doubts about character arrays in C. I have a character array of size 1, so logic says that when I input more than 2 characters I should get a segmentation fault. However, puts prints the array properly, whereas printf prints some parts of the array along with garbage values. Why is this happening?
#include<stdio.h>
int main()
{
int i;
char A[1];
printf("%d\n",(int)sizeof(A));
gets(A);
puts(A);
for(i=0;i<8;i++)
{
printf("%c\n",A[i]);
}
}
O/P:
1
abcdefg
abcdefg
a
f
g
To add to this, I have to type many times more characters than the array size before the program throws a segmentation fault. Is it because of the SFP (saved frame pointer) on the stack? The size of the SFP is 4 bytes. Please correct me if I'm wrong.
1
abcdefghijklmnop
abcdefghijklmnop
a
f
g
h
Segmentation fault
OK, others have explained it at a high level and from experience.
I would like to explain what happens at the assembly level.
Why did your first case run without crashing?
Because your buffer overflow only overwrote memory that your process is allowed to write (its own stack), the OS had no reason to send your process a segmentation fault. A segmentation fault is raised only when a process touches memory it is not allowed to access.
And why is your stack frame larger than your array?
Because of alignment. Many platforms (ABIs) require stack frames to be aligned to x bytes for efficient addressing.
x is machine-dependent.
For example, if x is 16 bytes:
char s[1] may take up 16 bytes of stack;
char s[17] may take up 32 bytes.
Actually, even when you enter only one character it is still a buffer overflow, because gets() also writes a terminating null character to the array.
A buffer overflow doesn't necessarily mean a segmentation fault. You can't rely on undefined behavior in any way. Possibly the program simply had to write further before it damaged memory it shouldn't have touched.
It seems you already know that gets() is dangerous and should be avoided; I added this just in case.
I came across the following code:
int i;
for(; scanf("%s", &i);)
printf("hello");
As per my understanding, if we provide integer input scanf would be unsuccessful in reading and therefore return 0, thus the loop should not run even once. However, it runs infinitely by accepting all types of inputs as successful reads.
Would someone kindly explain this behaviour?
That is the wrong format specifier for an int: it should be "%d".
It is attempting to read a string into an int variable, probably overwriting memory. Because "%s" accepts any sequence of non-whitespace characters, every input is read successfully, so scanf() returns a value greater than zero.
(Edit: I don't think this answer should have been accepted. Upvoted maybe, but not accepted. It doesn't explain the infinite loop at all, #hmjd does that.)
(This doesn't actually answer the question, the other answers do that, but it's interesting and good to know.)
As hmjd says, using scanf like this will overwrite memory ("smash the stack"). It starts writing at i and keeps going, even past the 4 bytes of memory that i occupies (int is 4 bytes on most current platforms, including common 64-bit ones).
To illustrate, consider the following bit of code:
#include<stdio.h>
int main() {
char str_above[8] = "ABCDEFG";
int i;
char str_below[8] = "ABCDEFG";
scanf("%s", &i);
printf("i = %d\n", i);
printf("str_above = %s\nstr_below = %s\n", str_above, str_below);
return 0;
}
Compiling and running it, and entering 1234567890 produces the following output:
i = 875770417
str_above = 567890
str_below = ABCDEFG
Some points:
i has little correspondence to the integer 1234567890 (it is related to the values of the characters '1',...,'4' and the endianness of the system).
str_above has been modified by scanf: the characters '5',...,'0','\0' have overrun the end of the block of memory reserved for i and have been written to the memory reserved for str_above.
The stack has been smashed "upwards", i.e. str_above is stored later in memory than i and str_below is stored earlier in memory. (To put it another way &str_above > &i and &str_below < &i.)
This is the basis for "buffer overrun attacks", where values on the stack are modified by writing too much data to an array. It is why gets is dangerous and should never be used, and why scanf with an unbounded %s format specifier should never be used either.