I have some doubts regarding character arrays in C. I have a character array of size 1; logic says that when I input more than 2 characters, I should get a segmentation fault. However, puts prints out the array properly, whereas printf prints some parts of the array along with garbage values. Why is this happening?
#include <stdio.h>
int main()
{
    int i;
    char A[1];
    printf("%d\n", (int)sizeof(A));
    gets(A);
    puts(A);
    for (i = 0; i < 8; i++)
    {
        printf("%c\n", A[i]);
    }
}
Output:
1
abcdefg
abcdefg
a
f
g
To add to this, I have to type in several times the array's size before the program throws a segmentation fault. Is it because of the SFP (saved frame pointer) in the stack? The size of the SFP is 4 bytes. Please correct me if I'm wrong.
1
abcdefghijklmnop
abcdefghijklmnop
a
f
g
h
Segmentation fault
OK, others have explained it at the level of the C language and from experience.
I would like to explain your situation at the assembly layer.
Do you know why your first run finished without incident?
Because your buffer overflow did NOT destroy another process's memory, so the OS didn't signal a segmentation fault to your process.
And why is your stack frame larger than your array?
Because of alignment. Many ABIs require a stack frame to be aligned to x bytes for efficient addressing.
x is machine-dependent.
E.g., if x is 16 bytes:
char s[1] will lead to a 16-byte stack frame;
char s[17] will lead to a 32-byte stack frame.
Actually, even when you type only one character, it is still a buffer overflow, because gets() will also write a null character to the array.
A buffer overflow doesn't necessarily mean a segmentation fault. You can't rely on undefined behavior in any way. Possibly it just took the program several tries before it touched memory that it wasn't allowed to write.
It seems you already know that gets() is dangerous and should be avoided; I added this just in case.
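Since gets() can never be told the buffer size, here is a minimal sketch of the same experiment rewritten with fgets() (the buffer size of 16 is my assumption, chosen so the demo input fits; fgets() truncates instead of overflowing):
#include <stdio.h>

int main(void)
{
    char A[16];   /* assumption: large enough for the demo input */

    printf("%d\n", (int)sizeof(A));
    /* fgets() writes at most sizeof(A) - 1 characters plus '\0';
       anything longer simply stays unread in stdin. */
    if (fgets(A, sizeof(A), stdin) != NULL)
        printf("%s", A);   /* fgets() keeps the '\n', so no extra newline here */
    return 0;
}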
Related
Given an array with 5 elements, it is well known that if you use scanf() to read in exactly 5 characters, then scanf() will fill the array and then clobber memory by putting a null character '\0' into the 6th element without generating an error (I'm calling it a 6th element, but I know it's memory that's not part of the array), as is described here: Null termination of char array.
However, when you try to read in 6 characters or more, an error is generated because the OS detects that memory is being clobbered and the kernel sends a signal. Can someone clear up why no error is generated in the first case of memory clobbering above?
Example code:
// ex1.c
#include <stdio.h>
int main(void){
    char arr[5];
    scanf("%s", arr);
    printf("%s\n", arr);
    return 0;
}
Compile, run and enter four characters: 1234. This stores them in the array correctly and doesn't clobber memory. No error here.
$ ./ex1
1234
1234
Run again and enter five characters. This will clobber memory because scanf() stored an extra '\0' null character in memory after the 5th element. No error is generated.
$ ./ex1
12345
12345
Now enter six characters, which we expect to clobber memory. The error that is generated looks like (i.e., I'm guessing) the result of a signal sent by the kernel saying that we just clobbered the stack (local memory) somehow... Why is an error being generated for this memory clobbering but not for the previous one above?
$ ./ex1
123456
123456
*** stack smashing detected ***: ./ex1 terminated
Aborted (core dumped)
This seems to happen no matter what size I make the array.
The behaviour is undefined in both cases where you input more characters than the buffer can hold.
The stack-smashing detection mechanism works by using canaries. When the canary value gets overwritten, SIGABRT is raised. The reason it isn't raised here is probably that there is at least one extra byte of memory after the array (a pointer one-past-the-end of an object is required to be valid, but it can't legally be used to store a value).
In essence, the canary wasn't overwritten when you input 1 extra char, but it does get overwritten when you input 2 extra bytes for one reason or another, triggering SIGABRT.
If you have some other variables after arr such as:
#include <stdio.h>
int main(void){
    char arr[5];
    char var[128];
    scanf("%s", arr);
    printf("%s\n", arr);
    return 0;
}
Then the canary may not be overwritten when you input a few more bytes, as the overflow might simply be overwriting var, thus delaying the compiler's buffer-overflow detection. This is a plausible explanation. But in any case, your program is invalid if it overruns a buffer, and you should not rely on the compiler's stack-smashing detection to save you.
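If you want to experiment with the canary yourself, GCC lets you toggle the protector (a sketch; these flags exist in GCC, but the exact defaults vary by version and distribution):
$ gcc -fno-stack-protector ex1.c -o ex1     # no canary: the overflow may go unnoticed longer
$ gcc -fstack-protector-all ex1.c -o ex1    # canary in every frame: smashing is detected sooner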
Why is an error being generated for this memory clobbering but not for the previous one above?
Because the 1st test only seemed to work, thanks to (bad) luck.
In both cases arr was accessed out-of-bounds, and by doing so the code invoked undefined behaviour. This means the code might do what you expect, or not, or anything else whatsoever, like rebooting the machine or formatting the disk...
C does not check memory accesses, but leaves this to the programmer, who could have made the call to scanf() safe by doing:
char arr[5];
scanf("%4s", arr); /* Stop scanning after 4th character. */
The stack smashing here is actually reported by a protection mechanism used by the compiler to detect buffer overflow errors. The compiler adds protection variables (known as canaries) which have known values.
In your case, an input string longer than 5 characters corrupts this variable, resulting in SIGABRT terminating the program.
You can read more about buffer overflow protection. But, as #alk answered, you are invoking undefined behavior.
Actually, even if we declare an array of size 5, we can often still write to and read from the memory just beyond it, because nothing else may be using that memory yet. This can keep "working" as long as the memory we touch is accessible to our process; as soon as we touch memory our process does not own, the access fails.
I have written a simple string program using the array allocation method. I have allocated a character array of 10 bytes, but when I give input, the program accepts input strings longer than 10 bytes. I get a segmentation fault only when I give an input string of some 21 characters. Why is there no segmentation fault when my input exceeds my allocated array limit?
Program:
#include <stdio.h>
#include <string.h>
int main(void) {
    char str[10];
    printf("\n Enter the string: ");
    gets(str);
    printf("\n The value of string=%s", str);
    int str_len;
    str_len = strlen(str);
    printf("\n Length of String=%d\n", str_len);
}
Output:
Enter the string: n durga prasad
The value of string=n durga prasad
Length of String=14
As you can see, the string length is shown as 14, but I have allocated only 10 bytes. How can the length be more than my allocated size?
Please don't use gets(); it suffers from buffer overflow issues, which in turn invoke undefined behaviour.
Why there is no segmentation fault when my input exceed allocated my array limit?
Once your input exceeds the allocated array size (i.e., 9 valid characters + 1 null terminator), the very next access to the array location becomes illegal and invokes UB. A segmentation fault is one possible side effect of UB; it is not a must.
Solution: Use fgets() instead.
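For instance, a minimal sketch of the same program rewritten with fgets() (keeping the original str[10]):
#include <stdio.h>
#include <string.h>

int main(void) {
    char str[10];

    printf("\n Enter the string: ");
    /* fgets() reads at most sizeof(str) - 1 characters plus '\0'. */
    if (fgets(str, sizeof(str), stdin) != NULL) {
        size_t str_len = strlen(str);
        if (str_len > 0 && str[str_len - 1] == '\n')
            str[--str_len] = '\0';   /* drop the trailing newline, if any */
        printf("\n The value of string=%s", str);
        printf("\n Length of String=%zu\n", str_len);
    }
    return 0;
}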
When you declare an array like char str[10];, your compiler won't always reserve precisely the number of bytes you asked for. It often reserves more, usually rounding up to a multiple of 8 on a 64-bit system; for instance, it might be 16 in your case.
So even if you asked for 10 bytes, you can often manipulate a few more. But of course, this is strongly discouraged because, as you said, it can produce segmentation faults.
And, as said in the other answers from Sourav and Gopi, using fgets instead of gets also helps you avoid this undefined behavior.
When you enter more characters than the array can hold, you have undefined behavior. Your array can hold 9 characters followed by a null terminator, so any deviation from this is UB.
Don't use gets(); use fgets() instead:
char a[10];
fgets(a,sizeof(a),stdin);
By using fgets() you avoid the buffer overflow issue, and thereby the undefined behavior.
PS: fgets() keeps the newline character in the buffer.
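A common one-liner to strip that newline (a sketch; strcspn() returns the index of the first '\n', or the string length if there is none):
#include <stdio.h>
#include <string.h>

int main(void) {
    char a[10];
    if (fgets(a, sizeof(a), stdin) != NULL) {
        a[strcspn(a, "\n")] = '\0';   /* overwrite the '\n' (if any) with '\0' */
        puts(a);
    }
    return 0;
}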
As you already know, your input causes a buffer overflow; I'm not going to repeat the reason. Instead, I would like to answer the particular question,
"Why there is no segmentation fault when my input exceed allocated my array limit?"
Whether or not there is a segmentation fault comes down to undefined behaviour. Once you overrun the allocated memory boundary, you are not guaranteed a segmentation fault. Rather, what you'll be facing is UB (as told earlier). Now, quoting the results of UB,
[...] programs invoking undefined behavior may compile and run, and produce correct results, or undetectably incorrect results, or any other behavior.
So it is not a given that you'll get a segmentation fault immediately on accessing the adjacent memory. The program may run perfectly well until it touches memory that is actually inaccessible to the process, and then the SIGSEGV signal (11) is raised.
However, after running into UB, the output of any subsequent statement cannot be trusted. So the output of strlen() is invalid here.
Here is my code :
#include <stdio.h>
#include <stdlib.h>
#define LEN 2
int main(void)
{
    char num1[LEN], num2[LEN]; // works fine with
                               // char *num1 = malloc(LEN), *num2 = malloc(LEN);
    int number1, number2;
    int sum;

    printf("first integer to add = ");
    scanf("%s", num1);
    printf("second integer to add = ");
    scanf("%s", num2);

    // adds integers
    number1 = atoi(num1);
    number2 = atoi(num2);
    sum = number1 + number2;

    // prints sum
    printf("Sum of %d and %d = %d \n", number1, number2, sum);
    return 0;
}
Here is the output :
first integer to add = 15
second integer to add = 12
Sum of 0 and 12 = 12
Why is it taking 0 instead of the first value, 15?
I cannot understand why this is happening.
It works fine if I use
char *num1= malloc(LEN), *num2= malloc(LEN);
instead of
char num1[LEN],num2[LEN];
But it should also work fine with this.
Edited:
Yes, it worked for LEN 3, but why did it show this undefined behaviour? I mean, not working with plain arrays but working with malloc. I now understand that it should not work with malloc either, but why did it work for me? Please be specific so that I can debug more accurately.
Is there any issue with my system, compiler, or IDE?
Please explain a bit more, as it will be helpful, or provide links to resources, because I don't want to be unlucky anymore.
LEN is 2, which is enough to store both digits but not the required null terminating character. You are therefore overrunning the arrays (and the heap allocations, in that version of the code!) and this causes undefined behavior. The fact that one works and the other does not is simply a byproduct of how the undefined behavior plays out on your particular system; the malloc version could indeed crash on a different system or a different compiler.
Correct results, incorrect results, crashing, or something completely different are all possibilities when you invoke undefined behavior.
Change LEN to 3 and your example input would work fine.
I would suggest indicating the size of your buffers in your scanf() line to avoid the undefined behavior. You may get incorrect results, but your program at least would not crash or have a security vulnerability:
scanf("%2s", num1);
Note that the number you use there must be one less than the size of the array -- in this example it assumes an array of size 3 (so you read a maximum of 2 characters, because you need the last character for the null terminating character).
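For example, a sketch of the question's code with LEN raised to 3 and the width capped accordingly (keeping atoi to stay close to the original):
#include <stdio.h>
#include <stdlib.h>
#define LEN 3                /* two digits + '\0' */

int main(void)
{
    char num1[LEN], num2[LEN];

    printf("first integer to add = ");
    scanf("%2s", num1);      /* read at most LEN - 1 characters */
    printf("second integer to add = ");
    scanf("%2s", num2);

    printf("Sum of %d and %d = %d\n", atoi(num1), atoi(num2),
           atoi(num1) + atoi(num2));
    return 0;
}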
LEN is defined as 2. You left no room for a null terminator. In the array case you would overrun the array end and damage your stack. In the malloc case you would overrun your heap and potentially damage the malloc structures.
Both are undefined behaviour. You are unlucky that your code works at all: if you were "lucky", your program would decide to crash in every case just to show you that you were triggering undefined behaviour. Unfortunately that's not how undefined behaviour works, so as a C programmer, you just have to be defensive and avoid entering into undefined behaviour situations.
Why are you using strings, anyway? Just use scanf("%d", &number1) and you can avoid all of this.
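A sketch of that simpler approach, reading the integers directly:
#include <stdio.h>

int main(void)
{
    int number1, number2;

    printf("first integer to add = ");
    if (scanf("%d", &number1) != 1) return 1;   /* bail out on invalid input */
    printf("second integer to add = ");
    if (scanf("%d", &number2) != 1) return 1;

    printf("Sum of %d and %d = %d\n", number1, number2, number1 + number2);
    return 0;
}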
Your program does not "work fine" (and should not "work fine") with either explicitly declared arrays or malloc-ed arrays. Strings like 15 and 12 require char buffers of size 3 at least. You provided buffers of size 2. Your program overruns the buffer boundary in both cases, thus causing undefined behavior. It is just that the consequences of that undefined behavior manifest themselves differently in different versions of the code.
The malloc version has a greater chance of producing the illusion of "working", since the sizes of dynamically allocated memory blocks are typically rounded up to the nearest implementation-dependent "round" boundary (like 8 or 16 bytes). That means your malloc calls actually allocate more memory than you asked for. This might temporarily hide the buffer overrun problems present in your code, producing the illusion that your program "works fine".
Meanwhile, the version with explicit arrays uses local arrays. Local arrays often have precise size (as declared) and also have a greater chance to end up located next to each other in memory. This means that buffer overrun in one array can easily destroy the contents of the other array. This is exactly what happened in your case.
However, even in the malloc-based version I'd still expect a good debugging version of standard library implementation to catch the overrun problems. It is quite possible that if you attempt to actually free these malloc-ed memory blocks (something you apparently didn't bother to do), free will notice the problem and tell you that heap integrity has been violated at some point after malloc.
P.S. Don't use atoi to convert strings to integers; it gives you no way to detect errors. The function that converts strings to integers properly is strtol.
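A sketch of the strtol-based conversion, with the error checks atoi cannot give you (the input string here is a stand-in for the user's input):
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *num1 = "15";   /* stands in for the user's input */
    char *end;

    errno = 0;
    long number1 = strtol(num1, &end, 10);
    if (end == num1 || *end != '\0' || errno == ERANGE)
        fprintf(stderr, "'%s' is not a valid integer\n", num1);
    else
        printf("parsed %ld\n", number1);
    return 0;
}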
Consider the next test program:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[10];
    strcpy(a, "test");
    for (int i = 0; i < 3; i++) {
        char b[2];
        strcpy(b, "tt");
        strcat(a, b);
    }
    printf("%d %d %s\n", (int)strlen(a), (int)sizeof(a), a);
    return 0;
}
Output: 10 10 testtttttt.
Everything seems ok.
If i<7, the buffer overflows, yet there is no error. Output: 18 10 testtttttttttttttt. The program seems to be working.
If i<11, then we see the error "stack smashing detected"...
Why doesn't the program report an error when i<7?
What you are doing is undefined behaviour. Anything could happen. What you saw is just one possible outcome, that some automatic tool detected it quite late instead of instantly. Your code is wrong, any assumption what should happen is wrong, and asking for a "why" is pointless. With a different compiler, or with different compiler settings, or on a different day, the outcome could be completely different.
By the way, there is a buffer overflow even when i = 0, since you are copying two chars plus a trailing zero byte into a buffer that only has space for two chars.
If i<7, the buffer overflows, yet there is no error. Output: 18 10 testtttttttttttttt. The program seems to be working.
The reason is that it's undefined behavior. You can expect any value to appear, since you are accessing the array outside its limits.
You may check Valgrind for these scenarios.
Your buffer only has room for 10 characters, but the string you build is longer; increase your buffer so it can hold the result.
char a[10];
The error you are getting, i.e. stack smashing, comes from a protection mechanism used by gcc to detect buffer overflow errors.
You are asking why there is no error:
The buffer overflow detection is a feature meant to help you, but there is absolutely no guarantee that it will detect every buffer overflow.
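For completeness, a sketch of the test program with buffers sized for the worst case (the sizes are my assumption, chosen to fit the i<11 run):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[32];              /* "test" + 11 * "tt" + '\0' = 27 bytes at most */
    char b[3];               /* "tt" needs two chars plus the terminator */

    strcpy(a, "test");
    for (int i = 0; i < 11; i++) {
        strcpy(b, "tt");
        strcat(a, b);        /* total length stays below sizeof(a) */
    }
    printf("%zu %zu %s\n", strlen(a), sizeof(a), a);
    return 0;
}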
Hi. Here's the conundrum. I have this code:
#include <stdio.h>
#include <conio.h>
#include <string.h>

int main(){
    char a[5];
    char b[5];
    memset(a, 0, 5);
    memset(b, 0, 5);
    strcpy(a, "BANG");
    printf("b = ");
    scanf("%s", b);
    printf("a = %s\n", a);
    getch();
}
When you run it, you'll notice that if you read a long enough string into b, the value of a will change too. You would expect it to remain "BANG", but that is not what happens. I would like to have an explanation for this. Thank you!
You're creating a "buffer overflow". The array is dimensioned to hold only 5 bytes (4 characters plus the standard C string terminator), and if you put more than that there, the rest will spill over.
Usually into something important, making your program crash.
There are automated tools (e.g. valgrind) to detect this kind of bugs.
If the string is long enough, you get a buffer overrun, and the behavior is undefined; that includes overwriting the other array or even crashing the application. Because the behavior is undefined you should avoid it, but just for the sake of understanding: the compiler has laid out the a array right after the b array in memory (in this particular run of the compiler), so when you write to b + sizeof(b), you are writing to a[0].
Congratulations, you've run into your first buffer overflow (first that you're aware of :) ).
The arrays are allocated on the program's stack, adjacent to each other. Since C does not check array bounds, you may access any permitted part of memory as if it were a cell of any array.
Let's review a very common runtime layout, with this program running on x86. The stack on x86 grows toward lower addresses, so the compiler usually places a[] above b[] on the stack. When you try to access b[5], it is at the same address as a[0]; b[6] is a[1], and so on.
This is how buffer overflow exploits work: some careless programmer does not check the string size against the buffer, and then an attacker writes malicious code onto the stack and runs it.
Think about it in terms of your program's memory.
a is an array of 5 characters, b is an array of 5 characters. Something like this on your stack:
[0][0][0][0][0][0][0][0][0][0]
^ ^
| +--"a" something like 0xbfe69e52
+-----------------"b" something like 0xbfe69e4d
So when you do your strcpy of "BANG":
[0][0][0][0][0][B][A][N][G][0]
^ ^
| +--"a" something like 0xbfe69e52
+-----------------"b" something like 0xbfe69e4d
Now if you put a "long" string into b:
[T][h][i][s][I][s][l][o][n][g]
^ ^
| +--"a" something like 0xbfe69e52
+-----------------"b" something like 0xbfe69e4d
Oops, we just lost a. This is a "buffer overflow", because you overflowed b (into a, in this case). C isn't going to stop you from doing that.
The one thing everyone above seems to have forgotten to mention is the fact that the stack usually grows in the opposite direction to what you'd expect.
Effectively the allocation of 'a' SUBTRACTS 5 bytes from the current stack pointer (esp/rsp on x86/x64). The allocation of 'b' then subtracts a further 5 bytes.
So let's say your esp is 0x1000 when you make your first stack allocation. This gives 'a' the memory address 0xFFB. 'b' then gets 0xFF6, and hence the 6th byte (i.e. index 5) of b is at 0xFF6 + 5, or 0xFFB, and thus you are now writing into the array a.
This can easily be confirmed by the following code:
printf("%p\n", (void *)a);
printf("%p\n", (void *)b);
You will see that b has a lower memory address than a.
b only has room for 5 characters. So if you write a longer string, you are writing into the memory adjacent to b.
C does no bounds checking on memory accesses, so you are free to read and write past the declared end of an array. a and b may end up adjacent in memory, even in the reverse order of their declaration, so unless your code takes care not to read more characters than belong to b, you can corrupt a. What actually happens is undefined and may change from run to run.
In this particular case, note that you can limit the number of characters scanf reads by using a width in the format string: scanf("%4s", b);
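Put together, a sketch of the program with the bounded read (a keeps its value because b can no longer overflow):
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[5];
    char b[5];

    strcpy(a, "BANG");
    printf("b = ");
    scanf("%4s", b);          /* at most 4 chars + '\0' fit in b */
    printf("a = %s\n", a);    /* a is still "BANG" */
    return 0;
}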