I am trying to understand how strings work as character arrays in C.
char a[5]="hello";
Here, a is a character array of size 5. "hello" occupies indices 0 to 4. Since we declared the array with size 5, there is no room to store the null character at the end of the string.
So my understanding is that when we try to print a, printing should continue until a null character is encountered, or the program may even run into a segmentation fault.
But when I ran it on my system, it always printed "hello" and terminated.
So can anyone clarify whether my understanding is correct? Or does it depend on the system we execute it on?
As ever so often, the answer is:
Undefined behavior is undefined.
What this means is, trying to feed this character array to a function handling strings is wrong. It's wrong because it isn't a string. A string in C is a sequence of characters that ends with a \0 character.
The C standard will tell you that this is undefined behavior. So, anything can happen. In C, you don't have runtime checks, the code just executes. If the code has undefined behavior, you have to be prepared for any effect. This includes working like you expected, just by accident.
It's very well possible that the byte following in memory after your array happens to be a \0 byte. In this case, it will look to any function processing this "string" as if you passed it a valid string. A crash is just waiting to happen on some seemingly unrelated change to the code.
You could try to add some char foo = 42; before or after the array definition, it's quite likely that you will see that in the output. But of course, there's no guarantee, because, again, undefined behavior is undefined :)
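To make the "it isn't a string" point concrete, here is a minimal sketch. The helper names is_terminated and print_bounded are hypothetical, not part of any library. The first checks whether a buffer of known size contains a '\0' at all; the second prints a fixed number of bytes using a precision in %.*s, which is one way to print a character array that has no terminator.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the n-byte buffer contains a '\0'
   (so it holds a valid C string), 0 otherwise. */
int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Hypothetical helper: prints at most n bytes of buf followed by a newline.
   The precision in %.*s keeps printf from reading past the array.
   Returns whatever printf returns. */
int print_bounded(const char *buf, size_t n)
{
    return printf("%.*s\n", (int)n, buf);
}
```

With char a[5] = "hello"; the call is_terminated(a, sizeof a) returns 0, while print_bounded(a, sizeof a) is still well defined. Declaring char a[] = "hello"; instead lets the compiler size the array to 6, so the terminator fits.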
What you have done is undefined behavior. Apparently whatever compiler you used happened to initialize memory after your array to 0.
Here, a is a character array of size 5. "hello" occupies indices 0 to 4. Since we declared the array with size 5, there is no room to store the null character at the end of the string.
So my understanding is when we try to print a, it should print until a null character is encountered.
Yes, when you use printf("%s", a), it prints characters until it hits a '\0' character (or segfaults or something else bad happens - undefined behavior). I can demonstrate that with a simple program:
#include <stdio.h>
int main()
{
char a[5] = "hello";
char b[5] = "world";
int c = 5;
printf("%s%s%d\n", a, b, c);
return 0;
}
Output:
$ ./a.out
helloworldworld5
You can see the printf function continuing to read characters after it has already read all the characters in array a. I don't know when it will stop reading characters, however.
I've slightly modified my program to demonstrate how this undefined behavior can create bad problems.
#include <stdio.h>
#include <string.h>
int main()
{
char a[5] = "hello";
char b[5] = "world";
int c = 5;
printf("%s%s%d\n", a, b, c);
char d[5];
strcpy(d, a);
printf("%s", d);
return 0;
}
Here's the result:
$ ./a.out
helloworld��world��5
*** stack smashing detected ***: <unknown> terminated
helloworldhell�p��UAborted (core dumped)
This is a classic case of a stack buffer overflow (a "stack overflow", pun intended) caused by undefined behavior.
Edit:
I need to emphasize: this is UNDEFINED BEHAVIOR. What happened in this example may or may not happen to you, depending on your compiler, architecture, libraries, etc. You can make guesses as to what will happen based on your understanding of how various libraries and compilers are implemented on different platforms, but you can NEVER say for certain what will happen. My example was run on Ubuntu 17.10 with gcc version 7. My guess is that something very different could happen if I tried this on an embedded platform with a different compiler, but I cannot say for certain. In fact, something different could happen if I had this example inside a larger program on the same machine.
Related
I have a question about this code below:
#include <stdio.h>
char abcd(char array[]);
int main(void)
{
char array[4] = { 'a', 'b', 'c', 'd' };
printf("%c\n", abcd(array));
return 0;
}
char abcd(char array[])
{
char *p = array;
while (*p) {
putchar(*p);
p++;
}
putchar(*p);
putchar(p[4]);
return *p;
}
Why isn't a segmentation fault generated when this program reaches putchar(*p) right after exiting the while loop? I thought that once p goes beyond array[3], the memory it points to has no assigned value. For example, I thought accessing p[4] would be illegal because it is out of bounds. Yet the program runs with no errors. Is this because any memory that has not been assigned a value (here, anything outside the array) is guaranteed to be null, that is, '\0'?
OP seems to think that something special should happen when an array is accessed out of bounds.
Accessing outside array bounds is undefined behavior (UB). Anything may happen.
Let's clarify what undefined behavior is.
The C standard is a contract between the developer and the compiler as to what the code means. However, it just so happens that you can write things that are just outside what is defined by the standard.
One of the most common cases is trying to do out-of-bounds access. Other languages say that this should result in an exception or another error. C does not. An argument is that it would imply adding costly checks at every array access.
The compiler does not know that what you are writing is undefined behavior¹. Instead, the compiler assumes that what you write contains no undefined behavior, and translates your code to assembly accordingly.
If you want an example, compile the code below with or without optimizations:
#include <stdio.h>
int table[4] = {0, 0, 0, 0};
int exists_in_table(int v)
{
for (int i = 0; i <= 4; i++) {
if (table[i] == v) {
return 1;
}
}
return 0;
}
int main(void) {
printf("%d\n", exists_in_table(3));
}
Without optimizations, the assembly I get from gcc does what you might expect: it just goes too far in the memory, which might cause a segfault if the array is allocated right before a page boundary.
With optimizations, however, the compiler looks at your code and notices that it cannot exit the loop (otherwise, it would try to access table[4], which cannot be), so the function exists_in_table necessarily returns 1. And we get the following, valid, implementation:
exists_in_table(int):
mov eax, 1
ret
Undefined behavior means undefined. Such bugs are very tricky to detect, since they can be virtually invisible after compiling. You need an advanced static analyzer to interpret the C source code and determine whether what it does can be undefined behavior.
¹ in the general case, that is; modern compilers use some basic static analyzer to detect the most common errors
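For comparison, here is a minimal sketch of the same function with the loop bound fixed. The TABLE_LEN macro is an addition for illustration; it derives the element count from the array itself so the bound cannot drift from the declaration.

```c
#include <stddef.h>

int table[4] = {0, 0, 0, 0};

/* Element count computed from the array, not hard-coded (illustrative macro). */
#define TABLE_LEN (sizeof table / sizeof table[0])

/* Returns 1 if v occurs in table, 0 otherwise.
   The condition i < TABLE_LEN keeps every access inside the array. */
int exists_in_table(int v)
{
    for (size_t i = 0; i < TABLE_LEN; i++) {
        if (table[i] == v) {
            return 1;
        }
    }
    return 0;
}
```

With this bound there is no undefined behavior for the optimizer to reason from, so exists_in_table(3) returns 0 at every optimization level.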
C does no bounds checking on array accesses, and the language does not require any. An array expression decays to a plain pointer and subscripting is just pointer arithmetic, so the running program has no record of where the array ends. The operating environment may raise a runtime error if an access lands on memory the process is not allowed to use, but up until that point you can read or clobber any memory following the end of the array.
The behavior on subscripting past the end of the array is undefined - the language definition does not require the compiler or the operating environment to handle it any particular way. You may get a segfault, you may get corrupted data, you may clobber a frame pointer or return instruction address and put your code in a bad state, or it may work exactly as expected.
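Since the language will not check for you, any bounds check has to be written by hand and needs the length passed alongside the pointer. A minimal sketch, with get_checked being a hypothetical helper name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores arr[idx] in *out and returns true,
   or returns false without touching memory if idx is out of range. */
bool get_checked(const char *arr, size_t len, size_t idx, char *out)
{
    if (idx >= len) {
        return false;   /* out of bounds: refuse instead of reading */
    }
    *out = arr[idx];
    return true;
}
```

The cost is one comparison per access, which is exactly the overhead C chooses not to impose by default.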
There are a few points worth noting about your program:
The name array means different things in main and in abcd. In main, it is an array of 4 elements; in abcd, it is a parameter declared with array syntax, which the compiler treats as a pointer (char *). If you write array[4] inside main, the compiler will typically warn about it, but it will not warn about the same access inside abcd.
p is a pointer to the first element of array. C places no boundary or limit on p. Your program is lucky that the memory after array contains a 0 byte, which stops the while (*p) loop. If you print the value of p after the loop, it may well not be equal to array + 4.
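Because the parameter is only a pointer, the usual remedy is to pass the element count explicitly rather than rely on a terminator that was never stored. A minimal sketch, where put_n is a hypothetical replacement for the loop in abcd:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes exactly n characters starting at p and
   returns how many were written. The caller supplies n because a
   char * parameter carries no size information. */
size_t put_n(const char *p, size_t n)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (putchar(p[i]) == EOF) {
            break;          /* stop on an output error */
        }
        written++;
    }
    return written;
}
```

In main, where array is still a real array, the call would be put_n(array, sizeof array).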
#include <stdio.h>
void main()
{
char a[8];
a[0] = 'h';
a[1]='e';
a[2]='l';
/*a[3]='l';
a[4]='o';*/
printf("%s", a);
}
When I run this program it prints out: hel
But why is it that when I have it like this
#include <stdio.h>
void main()
{
char a[8];
a[0] = 'h';
a[1]='e';
a[2]='l';
a[3]='l';
a[4]='o';
printf("%s", a);
}
It prints out: hello��
If the string is 3 characters or fewer, it prints correctly, but if I have more than that and no null character at the end (to signify the end of the string), it prints some garbage. Why?
Also, this is in C.
You are seeing unexpected behavior because you are not terminating the string with a NUL ('\0') character. In the first version, where you skip index [3] of the array, that element probably happens to contain a 0 that terminates the string, but this is pure chance and cannot be relied on.
In C, dynamic (malloc...) and automatic (stack) variables are not zero-initialised. Only static and thread-local variables are zeroed.
Thus, if you do not provide the terminating 0 yourself, there might not be one, so your string is not terminated and using it results in undefined behavior.
Anyway, using indeterminate values can result in implementation defined or undefined behavior in certain cases all by itself, which might make your program misbehave erratically.
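Two common ways to guarantee the terminator are sketched below, wrapped in hypothetical functions so the result can be checked: either zero-initialize the whole array, or write the '\0' yourself after the last character.

```c
#include <stddef.h>
#include <string.h>

/* Variant 1: zero-initialize the whole array, then fill in the characters. */
size_t hello_zero_init(void)
{
    char a[8] = {0};      /* every element starts out as '\0' */
    a[0] = 'h';
    a[1] = 'e';
    a[2] = 'l';
    a[3] = 'l';
    a[4] = 'o';
    return strlen(a);     /* a[5] is still '\0', so this is well defined */
}

/* Variant 2: leave the array uninitialized, but store the terminator explicitly. */
size_t hello_explicit_terminator(void)
{
    char a[8];
    a[0] = 'h';
    a[1] = 'e';
    a[2] = 'l';
    a[3] = 'l';
    a[4] = 'o';
    a[5] = '\0';          /* marks the end of the string */
    return strlen(a);
}
```

Both functions return 5. In either variant, printf("%s", a) would stop after "hello" on every platform, not just by luck.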
I am coming back after reading this c-faq question, and I am totally confused about what is happening here.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main ()
{
char ar[3]="NIS", *c;
printf ("%s\n", ar);
strcpy (c, ar);
printf ("%s\n", c);
if (ar[4] == '\0')
{
printf ("Null");
}
else
{
printf ("%c\n", ar[4]);
}
}
Here I have assigned "NIS", which is exactly the size of the declared array. When I try to access ar[3] and ar[4], both give null. Why? That makes sense for ar[3], but why for ar[4]?
Another thought: the c-faq says that if you initialize an array with a string of exactly the declared size, you can't use printf("%s") or strcpy() on that array. But in my code above I used both printf and strcpy, and both work fine. I may have misinterpreted this, so please correct me.
Another question: when I compare ar[5] with null, nothing is printed, which is fine, but why does it print Null for ar[4]? My understanding is that the string "NIS" is stored in memory like this:
 --------------------------------------------------------
|   N   |   I    |   S   |   \0   | Garbage value here
|_______|________|_______|________|_____________________
  ar[0]    ar[1]   ar[2]    ar[3]
Here ar[3] gives null when I compare it with '\0', which is fine, but ar[4] also gives null instead of some garbage value.
Thanks in advance.
Your code exhibits undefined behaviour. It works for you by chance, but on another machine it could fail. As you understood from the FAQ, the code is not valid. But that does not mean it will always fail. That is simply the nature of undefined behaviour. Literally anything can happen.
Accessing ar[3] is illegal because that is beyond the end of the array. Valid indices for this array are 0, 1 and 2.
You did not allocate memory for c so any de-referencing of the pointer is undefined behaviour.
Your main declaration is wrong. You should write:
int main(void)
Don't do this. The declaration char ar[3]; gives you a three-character array with which you can use the indexes 0 through 2 inclusive.
Any use of other indexes (for dereferencing) is undefined behaviour and should not be done.
The reason why it may be working is because there's nothing stating that the "garbage" values have to be non-zero. That's what garbage means in this context, they could be anything.
Your strcpy is also undefined behaviour since your c pointer has not been initialised to anything useful.
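For the strcpy part, the destination has to point at real storage that is large enough for the characters plus the terminator. A minimal sketch, where copy_string is a hypothetical helper and the source is assumed to be a properly terminated string (for example char ar[] = "NIS";, which the compiler sizes to 4):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of the string src,
   or NULL if the allocation fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *dst = malloc(size);
    if (dst != NULL) {
        memcpy(dst, src, size);      /* copies the terminator as well */
    }
    return dst;
}
```

Note that this only works if src really is terminated; calling it on the original char ar[3] = "NIS" would itself read past the array.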
ar[3] does not exist because ar is only 3 characters long.
That faq is saying that it is legal, but that it's not a C string.
If the array is too short, the null character will be cut off.
Basically, "abc" is silently 'a', 'b', 'c', 0. However, since ar is of length 3 and not four, the null byte gets truncated.
What the compiler chooses to do in this situation (and the OS) is not known. If it happens to work, that's just by luck.
Example code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main ()
{
char b[] = {"abcd"};
char *c = NULL;
printf("\nsize: %zu\n",sizeof(b));
c = (char *)malloc(sizeof(char) * 3);
memcpy(c,b,10); // here invalid read and invalid write
printf("\nb: %s\n",b);
printf("\nc: %s\n",c);
return 0;
}
See in code I have done some invalid reads and invalid writes, but this small program works fine and does not create a core dump.
But once in my big library, whenever I make 1 byte of invalid read or invalid write, it was always creating core dump.
Question:
Why do I sometimes get a core dump from an invalid read/write and sometimes do not get a core dump?
It depends entirely on what you overwrite or dereference when you do the invalid read/write. For example, if you overwrite part of a pointer that is later dereferenced (say, its most significant byte), that pointer can end up referring to a completely different, and completely invalid, area of memory.
So, for example, if the stack were arranged such that memcpy past the end of c would overwrite part of b, when you attempt to call printf() with b as an argument, it tries to take that pointer and dereference it to print a string. Since it's no longer a valid pointer, that'll cause a segfault. But since things like stack arrangement are platform (and perhaps compiler?) dependent, you may not see the same behavior with similar examples in different programs.
What you are doing is basically a buffer overflow and, in your code sample, more specifically a heap overflow. Whether you see a crash depends on which memory area you access and whether you have permission to read or write it (as Dan Fego has explained well). I think the example provided by Dan Fego is more about stack overflow (correction welcome!). gcc has protection against buffer overflows on the stack (stack smashing). You can see this stack-based overflow in the following example:
#include <stdio.h>
#include <string.h>
int main (void)
{
char b[] = { "abcdefghijk"};
char c [8];
memcpy (c, b, sizeof c + 1); // here invalid read and invalid write
printf ("\nsize: %zu\n", sizeof b);
printf ("\nc: %s\n", c);
return 0;
}
Sample output:
$ ./a.out
size: 12
c: abcdefghi���
*** stack smashing detected ***: ./a.out terminated
This protection can be disabled using -fno-stack-protector option in gcc.
Buffer overflows are one of the major causes of security vulnerabilities. Unfortunately, functions like memcpy do not check for these kinds of problems, but there are ways to protect against them.
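One such way is to clamp the copy length to the destination size before calling memcpy. This is a minimal sketch under that assumption; copy_bounded is a hypothetical helper, not a standard function, and it truncates silently rather than reporting an error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the string src into dst, writing at most
   dst_size bytes including the terminator. Returns the number of
   characters copied, not counting the '\0'. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return 0;                    /* no room even for a terminator */
    }
    size_t n = strlen(src);
    if (n > dst_size - 1) {
        n = dst_size - 1;            /* truncate to what fits */
    }
    memcpy(dst, src, n);
    dst[n] = '\0';                   /* always leave a valid string */
    return n;
}
```

With char c[8]; the call copy_bounded(c, sizeof c, "abcdefghijk") stores "abcdefg" and never touches memory outside c.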
Hope this helps!
1 - You allocate 3 chars for c, but you copy 10 chars into it. That is an error.
It is called a buffer overflow: you write to memory that does not belong to you, so the behavior is undefined. It could crash, it could work fine, or it could modify another variable you created.
So the right thing to do is to allocate enough memory for c to hold the contents of b:
c = (char *)malloc(sizeof(b)); // sizeof(b) is 5 and already includes the '\0' char that ends every string in C.
2 - When you copy b into c, don't forget to copy the end-of-string char '\0' as well. Every C string must end with it,
so that printf("%s",c); knows where the string ends.
3 - You copied 10 chars from b to c, but b contains only 5 chars (a, b, c, d and '\0'), so the behavior of memcpy is undefined (e.g. memcpy may try to read memory that cannot be read).
You can copy only the memory you own: the 5 chars of b.
4 - I think the proper way to define b is: char b[] = "abcd"; or char b[] = {'a','b','c','d',0};
What should be the output of the following code snippet, and why?
#include <stdio.h>
#include <string.h>
int main()
{
char ch = 'A';
char str[3];
strcpy(str, "ABCDE");
printf("%c", ch);
}
The output of this program could be anything because you overrun the buffer str and get undefined behavior. In fact, the program might not output anything, it might crash, or it might do something far worse.
The snippet invokes undefined behaviour. The result can be anything, from crashing to unexpected output.
As others have mentioned, this is undefined behavior, since the result depends on what is located in memory after wherever str is allocated. strcpy writes six bytes (ABCDE plus the terminating '\0') into a three-byte array, so the three extra bytes overwrite whatever is adjacent, which may corrupt ch or crash the program.
The output is undefined. On Linux, I get the output D, I think because of how the data is laid out on the stack: ch ends up in memory directly after str. strcpy writes three bytes more than str can hold (D, E and the terminating '\0'), which corrupts the ch variable and may result in D being displayed. Again, this depends on the compiler and the operating system you are running.
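For a version of the snippet with defined behavior, one option is to let snprintf enforce the buffer size and report truncation. This is a sketch only; copy_checked is a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst without writing more than
   dst_size bytes. Returns 1 if all of src fit, 0 if it was truncated,
   and -1 if snprintf reported an error. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0) {
        return -1;
    }
    /* snprintf returns the length it wanted to write, excluding '\0' */
    return (size_t)needed < dst_size;
}
```

With char str[3]; the call copy_checked(str, sizeof str, "ABCDE") returns 0 and leaves "AB" in str, so ch is never touched and printf("%c", ch) prints A.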