Array memory allocation of strings - c

I have written a simple string program using an array. I allocated a character array of 10 bytes, but when I give input, the program accepts a string longer than 10 bytes. I only get a segmentation fault when the input is around 21 characters. Why is there no segmentation fault as soon as my input exceeds my array limit?
Program:
#include <stdio.h>
#include <string.h>

void main() {
    char str[10];
    printf("\n Enter the string: ");
    gets(str);
    printf("\n The value of string=%s", str);
    int str_len;
    str_len = strlen(str);
    printf("\n Length of String=%d\n", str_len);
}
Output:
Enter the string: n durga prasad
The value of string=n durga prasad
Length of String=14
As you can see, the string length is shown as 14, but I allocated only 10 bytes. How can the length be more than my allocated size?

Please don't use gets(); it suffers from buffer overflow issues, which in turn invoke undefined behaviour.
Why there is no segmentation fault when my input exceed allocated my array limit?
Once your input exceeds the allocated array size (i.e., 9 valid characters + 1 null terminator), the very next access to the array location becomes illegal and invokes UB. A segmentation fault is one possible side effect of UB; it is not guaranteed.
Solution: Use fgets() instead.
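For completeness, a minimal sketch of the same program rewritten with fgets(); input longer than 9 characters is simply truncated instead of overflowing the array:

#include <stdio.h>
#include <string.h>

int main(void) {
    char str[10];

    printf("\n Enter the string: ");
    if (fgets(str, sizeof(str), stdin) != NULL) {
        str[strcspn(str, "\n")] = '\0';   /* drop the newline fgets may keep */
        printf("\n The value of string=%s", str);
        printf("\n Length of String=%zu\n", strlen(str));
    }
    return 0;
}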

When you declare an array, like char str[10];, your compiler won't always reserve precisely the number of bytes you asked for. It often reserves more, usually rounding up to a multiple of 8 on a 64-bit system; for instance, it might be 16 in your case.
So even though you asked for 10 bytes, you can sometimes write a little past them without crashing. But of course this is strongly discouraged because, as you said, it can produce segmentation faults.
And, as the answers from Sourav and Gopi said, using fgets instead of gets also helps you avoid this undefined behavior.

When you enter more characters than the array can hold, you have undefined behavior. Your array can hold 9 characters followed by a null terminator, so any deviation from this is UB.
Don't use gets() use fgets() instead
char a[10];
fgets(a,sizeof(a),stdin);
By using fgets() you are avoiding buffer overflow issue and avoiding undefined behavior.
PS: fgets() stores the newline character in the buffer if there is room for it.
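If that newline is unwanted, a common idiom is to strip it after the call; a sketch:

#include <stdio.h>
#include <string.h>

int main(void) {
    char a[10];
    if (fgets(a, sizeof(a), stdin) != NULL) {
        char *nl = strchr(a, '\n');   /* newline kept by fgets, if any */
        if (nl != NULL)
            *nl = '\0';
        puts(a);
    }
    return 0;
}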

As you already know, your input causes a buffer overflow; I'm not going to repeat the reason. Instead, I would like to answer the particular question,
"Why there is no segmentation fault when my input exceed allocated my array limit?"
Whether or not there is a segmentation fault comes down to something called undefined behaviour. Once you overrun the allocated memory boundary, you are not guaranteed to get a segmentation fault; rather, what you are facing is UB (as mentioned earlier). Now, quoting the possible results of UB,
[...] programs invoking undefined behavior may compile and run, and produce correct results, or undetectably incorrect results, or any other behavior.
So it is not certain that you'll get a segmentation fault immediately on accessing the very next memory location. The program may run perfectly well until it reaches some memory that is actually inaccessible to the process, and only then will the SIGSEGV signal (11) be raised.
However, after running into UB, the output of any subsequent statement cannot be trusted. So the output of strlen() here is invalid.

Related

inputting a character string using scanf()

I started learning about inputting character strings in C. In the following source code I declare a character array of length 5.
#include <stdio.h>

int main(void)
{
    char s1[5];
    printf("enter text:\n");
    scanf("%s", s1);
    printf("\n%s\n", s1);
    return 0;
}
when the input is:
1234567891234567, it works fine up to 16 characters (which I don't understand, because that is more than 5 elements).
12345678912345678, it gives the error segmentation fault: 11 (I gave 17 characters in this case).
123456789123456789, the error is Illegal instruction: 4 (I gave 18 characters in this case).
I don't understand why there are different errors. Is this the behavior of scanf() or of character arrays in C? The book that I am reading doesn't have a clear explanation of these things. FYI, I don't know anything about pointers yet. Any further explanation would be really helpful.
Is this the behavior of scanf() or character arrays in C?
TL;DR - No, you're facing the side-effects of undefined behavior.
To elaborate: in your case, with code like
scanf("%s",s1);
where you have defined
char s1[5];
inputting anything more than 4 characters will cause your program to venture into invalid memory (past the allocated array), which in turn invokes undefined behavior.
Once you hit UB, the behavior of the program cannot be predicted or justified in any way. It can do absolutely anything possible (or even impossible).
There is nothing inherent in scanf() that stops you from reading overly long input and overrunning the buffer; you should keep the scan under control by giving a field width, like
scanf("%4s", s1); // 1 byte saved for the terminating null
When reading strings, the scanf function reads up to the next whitespace (e.g. newline, space, tab) or the end of file. It has no idea how big the buffer you provide is.
If the string you read is longer than the buffer provided, then it will write out of bounds, and you will have undefined behavior.
The simplest way to stop this is to provide a field length to the scanf format, as in
char s1[5];
scanf("%4s",s1);
Note that I use 4 as field length, as there needs to be space for the string terminator as well.
You can also use the "secure" scanf_s for which you need to provide the buffer size as an argument:
char s1[5];
scanf_s("%s", s1, sizeof(s1));

Why doesn't scanf() generate memory clobbering errors when filling an array?

Given an array with 5 elements, it is well known that if you use scanf() to read in exactly 5 characters, then scanf() will fill the array and then clobber memory by putting a null character '\0' into the 6th element without generating an error (I'm calling it a 6th element, but I know it's memory that's not part of the array), as described here: Null termination of char array.
However, when you try to read in 6 characters or more, an error is generated because the OS detects that memory is being clobbered and the kernel sends a signal. Can someone clear up why no error is generated in the first case of memory clobbering above?
Example code:
// ex1.c
#include <stdio.h>

int main(void){
    char arr[5];
    scanf("%s", arr);
    printf("%s\n", arr);
    return 0;
}
Compile, run and enter four characters: 1234. This stores them in the array correctly and doesn't clobber memory. No error here.
$ ./ex1
1234
1234
Run again and enter five characters. This will clobber memory because scanf() stored an extra '\0' null character in memory after the 5th element. No error is generated.
$ ./ex1
12345
12345
Now enter six characters, which we expect to clobber memory. The error that is generated looks like (i.e., I'm guessing) the result of a signal sent by the kernel saying that we just clobbered the stack (local memory) somehow... Why is an error generated for this memory clobbering but not for the previous one above?
$ ./ex1
123456
123456
*** stack smashing detected ***: ./ex1 terminated
Aborted (core dumped)
This seems to happen no matter what size I make the array.
The behaviour is undefined in both of the cases where you input more characters than the buffer can hold.
The stack smashing detection mechanism works by using canaries. When the canary value gets overwritten, SIGABRT is raised. The reason it wasn't raised in your five-character case is probably that there is at least one byte of slack after the array (one-past-the-end of an object is required to be a valid pointer, though it cannot legally be used to store values).
In essence, the canary wasn't overwritten when you input 1 extra char, but it does get overwritten when you input 2 extra chars, for one reason or another, triggering SIGABRT.
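To make the mechanism concrete, here is a simplified model of a canary check; the struct, guard value, and copy are illustrative stand-ins, not how the compiler really lays out your stack frame:

#include <stdio.h>
#include <string.h>

#define GUARD 0xDEADBEEFu

struct frame {              /* toy model of a stack frame */
    char buf[5];
    unsigned canary;        /* guard value placed after the buffer */
};

int main(void) {
    struct frame f;
    f.canary = GUARD;

    const char *input = "12345678";   /* too long for buf */
    /* Copying over the struct's bytes mimics what an overflowing
       scanf would do to the memory just past the array. */
    memcpy(&f, input, strlen(input) + 1);

    if (f.canary != GUARD)
        puts("canary overwritten: this is what triggers the abort");
    return 0;
}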
If you have some other variables after arr such as:
#include <stdio.h>

int main(void){
    char arr[5];
    char var[128];
    scanf("%s", arr);
    printf("%s\n", arr);
    return 0;
}
Then the canary may not be overwritten when you input a few more bytes, since the overflow might simply be spilling into var, thus delaying the compiler's buffer overflow detection. This is a plausible explanation. In any case, your program is invalid if it overruns a buffer, and you should not rely on the compiler's stack smashing detection to save you.
Why is an error being generated for this memory clobbering but not for the previous one above?
Because the 1st test only seemed to work thanks to (bad) luck.
In both cases arr was accessed out of bounds, and by doing so the code invoked undefined behaviour. This means the code might do what you expect, or not, or anything whatsoever, like rebooting the machine or formatting the disk...
C does not test memory accesses for you, but leaves this to the programmer, who could have made the call to scanf() safe by doing:
char arr[5];
scanf("%4s", arr); /* Stop scanning after 4th character. */
The stack smashing here is actually caught by a protection mechanism the compiler uses to detect buffer overflow errors. The compiler adds protection variables (known as canaries) which have known values.
In your case, an input string longer than 5 characters corrupts this variable, resulting in SIGABRT terminating the program.
You can read more about buffer overflow protection. But, as #alk answered, you are invoking undefined behavior.
Actually, if we declare an array of size 5, we may still be able to store and read data past the end of the array while the memory beyond it happens to be free; but once that memory is assigned to something else, we can no longer access the data there.

Printing char array in C causes segmentation fault

I did a lot of searching around for this, couldn't find any question with the same exact issue.
Here is my code:
void fun(char* name){
    printf("%s", name);
}
char name[6];
sscanf(input, "RECTANGLE_SEARCH(%6[A-Za-z0-9])", name);
printf("%s", name);
fun(name);
The name is grabbed by the sscanf, and it printed out fine at first. Then, when fun is called, there is a segmentation fault when it tries to print name. Why is this?
After looking in my scrying-glass, I have it:
Your sscanf overflowed the buffer (more than 6 bytes, including the terminator, were read), with the ill effect slightly delayed due to circumstance:
Nobody else relied on or re-used the corrupted memory at first, thus the first printf seemed to work.
Somewhere after the first and before the second call to printf, the space you overwrote got re-used, so the string you read was no longer terminated before running into unallocated pages.
Thus, a segmentation-fault at last.
Of course, your program was toast the moment it overflowed the buffer, not later when it finally crashed.
Moral: Never write to memory you have not dedicated for that purpose.
Looking at your edit, the format %6[A-Za-z0-9] reads up to 6 characters exclusive of the terminator, not inclusive!
Since you're reading 6 characters, you have to declare name to be 7 characters, so there's room for the terminating null character:
char name[7];
Otherwise, you'll get a buffer overflow, and the consequences are undefined. Once you have undefined consequences, anything can happen, including 2 successful calls to printf() followed by a segfault when you call another function.
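A self-contained sketch of the corrected code; the input literal here is an invented stand-in for the question's real input:

#include <stdio.h>

void fun(char *name) {
    printf("%s", name);
}

int main(void) {
    const char *input = "RECTANGLE_SEARCH(AB12CD)";  /* hypothetical input */
    char name[7];             /* room for 6 matched characters plus '\0' */

    if (sscanf(input, "RECTANGLE_SEARCH(%6[A-Za-z0-9])", name) == 1) {
        printf("%s\n", name);
        fun(name);
    }
    return 0;
}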
You're probably walking off the end of the array with your printf statement. Printf uses the terminating null character '\0' to know where the end of the string is. Try allocating your array like this:
char name[6] = {'\0'};
This will allocate your array with every element initially set to the '\0' character, which means that as long as you don't fill the entire array with your sscanf, printf will stop before walking off the end.
Are you sure that name is null-terminated? sscanf can overflow your buffer depending on how you call it.
If that happens then printf will read beyond the end of the array resulting in undefined behavior and probably a segmentation fault.

Difference between array and malloc

Here is my code :
#include <stdio.h>
#include <stdlib.h>

#define LEN 2

int main(void)
{
    char num1[LEN], num2[LEN]; // works fine with
    // char *num1 = malloc(LEN), *num2 = malloc(LEN);
    int number1, number2;
    int sum;

    printf("first integer to add = ");
    scanf("%s", num1);
    printf("second integer to add = ");
    scanf("%s", num2);

    // adds the integers
    number1 = atoi(num1);
    number2 = atoi(num2);
    sum = number1 + number2;

    // prints the sum
    printf("Sum of %d and %d = %d \n", number1, number2, sum);
    return 0;
}
Here is the output :
first integer to add = 15
second integer to add = 12
Sum of 0 and 12 = 12
Why is it producing 0 instead of the first value, 15?
I could not understand why this is happening.
It works fine if I use
char *num1 = malloc(LEN), *num2 = malloc(LEN);
instead of
char num1[LEN], num2[LEN];
but it should work fine with plain arrays too.
Edited:
Yes, it worked for LEN 3, but why did it show this undefined behaviour? I mean, why did it fail with plain arrays yet work with malloc? I now understand that it should not work with malloc either, but why did it work for me? Please be specific so that I can debug more accurately.
Is there any issue with my system, compiler, or IDE?
Please explain a bit more, as it will be helpful, or provide links to resources, because I don't want to be "unlucky" anymore.
LEN is 2, which is enough to store both digits but not the required null terminating character. You are therefore overrunning the arrays (and the heap allocations, in that version of the code!) and this causes undefined behavior. The fact that one works and the other does not is simply a byproduct of how the undefined behavior plays out on your particular system; the malloc version could indeed crash on a different system or a different compiler.
Correct results, incorrect results, crashing, or something completely different are all possibilities when you invoke undefined behavior.
Change LEN to 3 and your example input would work fine.
I would suggest indicating the size of your buffers in your scanf() line to avoid the undefined behavior. You may get incorrect results, but your program at least would not crash or have a security vulnerability:
scanf("%2s", num1);
Note that the number you use there must be one less than the size of the array -- in this example it assumes an array of size 3 (so you read a maximum of 2 characters, because you need the last character for the null terminating character).
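Putting both fixes together, a sketch of the corrected program (LEN raised to 3 so two digits plus the terminator fit, with the field width capped at LEN - 1):

#include <stdio.h>
#include <stdlib.h>

#define LEN 3                        /* two digits + '\0' */

int main(void)
{
    char num1[LEN], num2[LEN];

    printf("first integer to add = ");
    scanf("%2s", num1);              /* field width = LEN - 1 */
    printf("second integer to add = ");
    scanf("%2s", num2);

    printf("Sum of %d and %d = %d\n", atoi(num1), atoi(num2),
           atoi(num1) + atoi(num2));
    return 0;
}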
LEN is defined as 2. You left no room for a null terminator. In the array case you would overrun the array end and damage your stack. In the malloc case you would overrun your heap and potentially damage the malloc structures.
Both are undefined behaviour. You are unlucky that your code works at all: if you were "lucky", your program would decide to crash in every case just to show you that you were triggering undefined behaviour. Unfortunately that's not how undefined behaviour works, so as a C programmer, you just have to be defensive and avoid entering into undefined behaviour situations.
Why are you using strings, anyway? Just use scanf("%d", &number1) and you can avoid all of this.
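A sketch of that simpler approach, reading the integers directly and checking that scanf actually converted something:

#include <stdio.h>

int main(void)
{
    int number1, number2;

    printf("first integer to add = ");
    if (scanf("%d", &number1) != 1) return 1;   /* bail out on bad input */
    printf("second integer to add = ");
    if (scanf("%d", &number2) != 1) return 1;

    printf("Sum of %d and %d = %d\n", number1, number2, number1 + number2);
    return 0;
}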
Your program does not "work fine" (and should not "work fine") with either explicitly declared arrays or malloc-ed arrays. Strings like 15 and 12 require char buffers of size 3 at least. You provided buffers of size 2. Your program overruns the buffer boundary in both cases, thus causing undefined behavior. It is just that the consequences of that undefined behavior manifest themselves differently in different versions of the code.
The malloc version has a greater chance to produce illusion of "working" since sizes of dynamically allocated memory blocks are typically rounded to the nearest implementation-depended "round" boundary (like 8 or 16 bytes). That means that your malloc calls actually allocate more memory than you ask them to. This might temporarily hide the buffer overrun problems present in your code. This produces the illusion of your program "working fine".
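On glibc you can observe this rounding with the non-portable malloc_usable_size(); a sketch (the exact number printed is an implementation detail and will vary):

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific, declares malloc_usable_size() */

int main(void)
{
    char *p = malloc(2);
    if (p == NULL) return 1;
    /* Typically prints something larger than 2, which is why a small
       overrun of a malloc'd block can go unnoticed for a while. */
    printf("requested 2, usable %zu\n", malloc_usable_size(p));
    free(p);
    return 0;
}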
Meanwhile, the version with explicit arrays uses local arrays. Local arrays often have precise size (as declared) and also have a greater chance to end up located next to each other in memory. This means that buffer overrun in one array can easily destroy the contents of the other array. This is exactly what happened in your case.
However, even in the malloc-based version I'd still expect a good debugging version of standard library implementation to catch the overrun problems. It is quite possible that if you attempt to actually free these malloc-ed memory blocks (something you apparently didn't bother to do), free will notice the problem and tell you that heap integrity has been violated at some point after malloc.
P.S. Don't use atoi to convert strings to integers; it gives you no way to detect errors. The function that properly converts strings to integers is strtol.
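A minimal sketch of strtol usage; unlike atoi, it reports where parsing stopped, so errors become detectable:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *num = "15";
    char *end;
    long value = strtol(num, &end, 10);   /* base 10 */

    if (end == num)
        puts("no digits found");
    else
        printf("parsed %ld\n", value);
    return 0;
}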

Character array in C (puts vs printf)

I have some doubts regarding character arrays in C. I have a character array of size 1; logic says that when I input more than 2 characters, I should get a segmentation fault. However, puts prints the array properly, whereas printf prints parts of the array along with garbage values. Why is this happening?
#include <stdio.h>

int main()
{
    int i;
    char A[1];
    printf("%d\n", (int)sizeof(A));
    gets(A);
    puts(A);
    for (i = 0; i < 8; i++)
    {
        printf("%c\n", A[i]);
    }
}
Output:
1
abcdefg
abcdefg
a
f
g
To add to this, I have to type in several times as many characters as the array size to make the program throw a segmentation fault. Is it because of the SFP in the stack? The size of the SFP is 4 bytes. Please correct me if I'm wrong.
1
abcdefghijklmnop
abcdefghijklmnop
a
f
g
h
Segmentation fault
OK, others have explained it in terms of the high-level language and hard-won experience. I would like to explain your situation at the assembly layer.
Do you know why your first run finished without incident?
Because your buffer overflow did NOT touch memory outside what is mapped for your process, so the OS didn't send a segmentation fault signal to it.
And why is your stack frame larger than your array?
Because of alignment: many ABIs require stack frames to be aligned to x bytes to allow efficient addressing,
where x is machine-dependent.
E.g., if x is 16 bytes:
char s[1] will grow the stack frame to 16 bytes;
char s[17] will grow the stack frame to 32 bytes.
Actually, even when you type only one character it's still a buffer overflow, because gets() will also write a null character to the array.
A buffer overflow doesn't necessarily mean a segmentation fault; you can't rely on undefined behavior in any way. Possibly it just took the program several runs before it touched memory it wasn't allowed to write.
It seems you already know that gets() is dangerous and should be avoided; I added this just in case.
