Question about values out of bounds of an array in C

I have a question about this code below:
#include <stdio.h>
char abcd(char array[]);
int main(void)
{
    char array[4] = { 'a', 'b', 'c', 'd' };
    printf("%c\n", abcd(array));
    return 0;
}
char abcd(char array[])
{
    char *p = array;
    while (*p) {
        putchar(*p);
        p++;
    }
    putchar(*p);
    putchar(p[4]);
    return *p;
}
Why isn't segmentation fault generated when this program comes across putchar(*p) right after exiting while loop? I think that after *p went beyond the array[3] there is supposed to be no value assigned to other memory locations. For example, trying to access p[4] would be illegal because it would be out of the bound, I thought. On the contrary, this program runs with no errors. Is this because any other memories which no value are assigned (in this case any other memories than array[4]) should be null, whose value is '\0'?

The OP seems to expect that something special should happen when an array is accessed out of bounds.
Accessing outside array bounds is undefined behavior (UB). Anything may happen.

Let's clarify what undefined behavior is.
The C standard is a contract between the developer and the compiler as to what the code means. However, it just so happens that you can write things that are just outside what is defined by the standard.
One of the most common cases is trying to do out-of-bounds access. Other languages say that this should result in an exception or another error. C does not. An argument is that it would imply adding costly checks at every array access.
The compiler does not know that what you are writing is undefined behavior¹. Instead, the compiler assumes that what you write contains no undefined behavior, and translates your code to assembly accordingly.
If you want an example, compile the code below with or without optimizations:
#include <stdio.h>
int table[4] = {0, 0, 0, 0};
int exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) {
            return 1;
        }
    }
    return 0;
}
int main(void) {
    printf("%d\n", exists_in_table(3));
}
Without optimizations, the assembly I get from gcc does what you might expect: it just goes too far in the memory, which might cause a segfault if the array is allocated right before a page boundary.
With optimizations, however, the compiler looks at your code and notices that it cannot exit the loop (otherwise, it would try to access table[4], which cannot be), so the function exists_in_table necessarily returns 1. And we get the following, valid, implementation:
exists_in_table(int):
mov eax, 1
ret
Undefined behavior means undefined. Such bugs are very tricky to detect since they can be virtually invisible after compiling. You need an advanced static analyzer to interpret the C source code and determine whether what it does can be undefined behavior.
¹ In the general case, that is; modern compilers use some basic static analysis to detect the most common errors.

C does no bounds checking on array accesses; because of how arrays and array subscripting are defined (a[i] means *(a + i), and a pointer carries no length), it generally can't. It simply doesn't know that you've run past the end of the array. The operating environment may raise a runtime error if you cross into a page that is not mapped for your process, but up until that point you can read or clobber any memory following the end of the array.
The behavior on subscripting past the end of the array is undefined - the language definition does not require the compiler or the operating environment to handle it any particular way. You may get a segfault, you may get corrupted data, you may clobber a frame pointer or return instruction address and put your code in a bad state, or it may work exactly as expected.

A few remarks about your program:
The array in main and the array parameter of abcd are different things. In main, array is an array of 4 elements; in abcd, the parameter declared as char array[] is really a pointer to char. If you write something like array[4] inside main, the compiler may warn about it. It will not warn about the same access inside abcd, because the size is not known there.
p is a pointer to array or, in other words, it points to the first element of array. In C, nothing bounds or limits p. Your program is lucky because the memory after array happens to contain a 0 byte that stops the while (*p) loop. If you print the value of p after the loop, it might not equal &array[4].

Related

Following C code compiles and runs, but is it undefined behaviour?

I posted a question about some pointer issues I've been having earlier in this question:
C int pointer segmentation fault several scenarios, can't explain behaviour
From some of the comments, I've been led to believe that the following:
#include <stdlib.h>
#include <stdio.h>
int main(){
    int *p;
    *p = 1;
    printf("%d\n", *p);
    return 0;
}
is undefined behaviour. Is this true? I do this all the time, and I've even seen it in my C course.
However, when I do
#include <stdlib.h>
#include <stdio.h>
int main(){
    int *p=NULL;
    *p = 1;
    printf("%d\n", *p);
    return 0;
}
I get a seg fault right before printing the contents of p (after the line *p=1;). Does this mean I should have always been mallocing any time I actually assign a value for a pointer to point to?
If that's the case, then why does char *string = "this is a string" always work?
I'm quite confused, please help!
This:
int *p;
*p = 1;
Is undefined behavior because p isn't pointing anywhere. It is uninitialized. So when you attempt to dereference p you're essentially writing to a random address.
What undefined behavior means is that there is no guarantee what the program will do. It might crash, it might output strange results, or it may appear to work properly.
This is also undefined behavior:
int *p=NULL;
*p = 1;
Because you're attempting to dereference a NULL pointer.
This works:
char *string = "this is a string" ;
Because you're initializing string with the address of a string constant. It's not the same as the other two cases. It's actually the same as this:
char *string;
string = "this is a string";
Note that here string isn't being dereferenced. The pointer variable itself is being assigned a value.
Yes, doing int *p; *p = 1; is undefined behavior. You are dereferencing an uninitialized pointer (accessing the memory to which it points). If it works, it is only because the garbage in p happened to be the address of some region of memory which is writable, and whose contents weren't critical enough to cause an immediate crash when you overwrote them. (But you still might have corrupted some important program data causing problems you won't notice until later...)
An example as blatant as this should trigger a compiler warning. If it doesn't, figure out how to adjust your compiler options so it does. (On gcc, try -Wall -O).
Pointers have to point to valid memory before they can be dereferenced. That could be memory allocated by malloc, or the address of an existing valid object (p = &x;).
char *string = "this is a string"; is perfectly fine because this pointer is not uninitialized; you initialized it! (The * in char *string is part of its declaration; you aren't dereferencing it.) Specifically, you initialized it with the address of some memory which you asked the compiler to reserve and fill in with the characters this is a string\0. Having done that, you can safely dereference that pointer (though only to read, since it is undefined behavior to write to a string literal).
is undefined behaviour. Is this true?
Sure is. It just looks like it's working on your system with what you've tried, but you're performing an invalid write. The version where you set p to NULL first is segfaulting because of the invalid write, but it's still technically undefined behavior.
You can only write to memory that's been allocated. If you don't need the pointer, the easiest solution is to just use a regular int.
int p = 1;
In general, avoid pointers when you can, since automatic variables are much easier to work with.
Your char* example works because of the way string literals work in C: there's a block of memory somewhere holding the sequence "this is a string\0", and your pointer is pointing at that. This is typically read-only memory, though, and trying to change it (e.g., string[0] = 'T';) is undefined behavior.
With the line
char *string = "this is a string";
you are making the pointer string point to a place in read-only memory that contains the string "this is a string". The compiler/linker will ensure that this string will be placed in the proper location for you and that the pointer string will be pointing to the correct location. Therefore, it is guaranteed that the pointer string is pointing to a valid memory location without any further action on your part.
However, in the code
int *p;
*p = 1;
p is uninitialized, which means it is not pointing to a valid memory location. Dereferencing p will therefore result in undefined behavior.
It is not necessary to always use malloc to make p point to a valid memory location. It is one possible way, but there are many other possible ways, for example the following:
int i;
int *p;
p = &i;
Now p is also pointing to a valid memory location and can be safely dereferenced.
Consider the code:
#include <stdio.h>
int main(void)
{
    int i=1, j=2;
    int *p;
    ... some code goes here
    *p = 3;
    printf("%d %d\n", i, j);
}
Would the statement *p = 3; write to i, j, or neither? It would write to i or j if p points to that object, but not if p points somewhere else. If the ... portion of the code doesn't do anything with p, then p might happen to point to i, or j, or something within the stdout object, or anything at all. If it happens to point to i or j, then the write *p = 3; might affect that object without any side effects, but if it points to information within stdout that controls where output goes, it might cause the following printf to behave in unpredictable fashion. In a typical implementation, p might point anywhere, and there will be so many things to which p might point that it would be impossible to predict all of the possible effects of writing to them.
Note that the Standard classifies many actions as "Undefined Behavior" with the intention that many or even most implementations will extend the semantics of the language by documenting their behavior. Most implementations, for example, extend the meaning of the << operator to allow it to be used to multiply negative numbers by powers of two. Even on implementations that extend the language to specify that an assignment like *p = 3; will always perform a word-sized write of the value 3 to the indicated address, with whatever consequence results, there would be relatively few platforms(*) where it would be possible to fully characterize all possible effects of that action in cases where nothing is known about the value of p. In cases where pointers are read rather than written, some systems may be able to offer useful behavioral guarantees about the effect of arbitrary stray reads, but not all(**).
(*) Some freestanding platforms which keep code in read-only storage may be able to uphold some behavioral guarantees even if code writes to arbitrary pointer addresses. Such behavioral guarantees may be useful in systems whose state might be corrupted by electrical interference, but even when targeting such systems writing to a stray pointer would never be useful.
(**) On many platforms, stray reads will either yield a meaningless value without side effects or force an abnormal program termination, but on an Apple II with a Disk II card in the customary slot-6 location, if code reads from address 0xC0EF within a second of performing a disk access, the drive head will start overwriting whatever happens to be on the last track accessed. This is by design (software that needs to write to the disk does so by accessing address 0xC0EF, and having hardware respond to both reads and writes required one less logic gate--and thus one less chip--than would be required for hardware that only responded to writes) but does mean that code must be careful not to perform any stray reads.

Dynamic array without malloc?

I was reading through some source code and found a functionality that basically allows you to use an array as a linked list? The code works as follows:
#include <stdio.h>
int
main (void)
{
    int *s;
    for (int i = 0; i < 10; i++)
    {
        s[i] = i;
    }
    for (int i = 0; i < 10; i++)
    {
        printf ("%d\n", s[i]);
    }
    return 0;
}
I understand that s points to the beginning of an array in this case, but the size of the array was never defined. Why does this work and what are the limitations of it? Memory corruption, etc.
Why does this work
It does not, it appears to work (which is actually bad luck).
and what are the limitations of it? Memory corruption, etc.
Undefined behavior.
Keep in mind: whatever memory location your program uses must be validly allocated. Either you make use of compile-time allocation (scalar variable definitions, for example), or, for pointer types, you either make them point to some valid memory (the address of a previously defined variable) or allocate memory at run time (using allocator functions). Using an arbitrary memory location through an indeterminate pointer is invalid and causes UB.
I understand that s points to the beginning of an array in this case
No, the pointer has automatic storage duration and was not initialized:
int *s;
So it has an indeterminate value and points nowhere.
but the size of the array was never defined
There is no array declared or defined in the program.
Why does this work and what are the limitations of it?
It works by chance. That is, it produced the expected result when you ran it. But actually the program has undefined behavior.
As I pointed out first in the comments, what you are doing does not work; it seems to work, but it is in fact undefined behaviour.
In computer programming, undefined behavior (UB) is the result of
executing a program whose behavior is prescribed to be unpredictable,
in the language specification to which the computer code adheres.
Hence, it might "work" sometimes, and sometimes not. Consequently, one should never rely on such behaviour.
If it were that easy to allocate a dynamic array in C, why would anyone use malloc? Try it out with a bigger value than 10 to increase the likelihood of a segmentation fault.
Look into the SO thread to see how to properly allocate an array in C.

Do C pointers (always) start with a valid memory address?

Do C pointers (always) start with a valid memory address? For example, if I have the following piece of code:
int *p;
*p = 5;
printf("%i",*p); //shows 5
Why does this piece of code work? According to the books I have read, a pointer always needs a valid memory address, and they give the following and similar examples:
int *p;
int v = 5;
p = &v;
printf("%i",*p); //shows 5
Do C pointers (always) start with a valid memory address?
No.
Why does this code work?
The code invokes undefined behavior. If it appears to work on your particular system with your particular compiler options, that's merely a coincidence.
No. Uninitialized local variables have indeterminate values, and using them in expressions where they get evaluated causes undefined behavior.
The behaviour is undefined. A C compiler can optimize the pointer access away, noting that in fact the p is not used, only the object *p, and replace the *p with q and effectively produce the program that corresponds to this source code:
#include <stdio.h>
int main(void) {
    int q = 5;
    printf("%i", q); //shows 5
}
Such is the case when I compile the program with GCC 7.3.0 and -O3 switch - no crash. I get a crash if I compile it without optimization. Both programs are standard-conforming interpretations of the code, namely that dereferencing a pointer that does not point to a valid object has undefined behaviour.
No.
In older times, it was common to initialize pointers to specific memory addresses (e.g. ones tied to hardware):
char *start_memory_buffer = (char *)0xffffb000;
The compiler has no way to tell whether this is a valid address. This involves a cast, so it is cheating.
Consider
static int *p;
p will have the value NULL, which doesn't point to a valid address (on Linux, the kernel keeps that address invalid; other operating systems could use the memory at address 0 to store some data).
But you may also create uninitialized variables, which have indeterminate initial values (and using those is probably wrong).

Pointers address location

As part of our training in the Academy of Programming Languages, we also learned C. During the test, we encountered the question of what the program output would be:
#include <stdio.h>
#include <string.h>
int main(){
    char str[] = "hmmmm..";
    const char * const ptr1[] = {"to be","or not to be","that is the question"};
    char *ptr2 = "that is the qusetion";
    (&ptr2)[3] = str;
    strcpy(str,"(Hamlet)");
    for (int i = 0; i < sizeof(ptr1)/sizeof(*ptr1); ++i){
        printf("%s ", ptr1[i]);
    }
    printf("\n");
    return 0;
}
Later, after examining the answers, it became clear that the cell (& ptr2)[3] was identical to the memory cell in &ptr1[2], so the output of the program is: to be or not to be (Hamlet)
My question is, is it possible to know, only by written code in the notebook, without checking any compiler, that a certain pointer (or all variables in general) follow or precede other variables in memory?
Note that I do not mean array variables, where all the elements must be in sequence.
In this statement:
(&ptr2)[3] = str;
ptr2 was defined with char *ptr2 inside main. With this definition, the compiler is responsible for providing storage for ptr2. The compiler is allowed to use whatever storage it wants for this—it could be before ptr1, it could be after ptr1, it could be close, it could be far away.
Then &ptr2 takes the address of ptr2. This is allowed, but we do not know where that address will be in relation to ptr1 or anything else, because the compiler is allowed to use whatever storage it wants.
Since ptr2 is a char *, &ptr2 is a pointer to char *, also known as char **.
Then (&ptr2)[3] attempts to refer to element 3 of an array of char * that is at &ptr2. But there is no array there in C’s model of computation. There is just one char * there. When you try to refer to element 3 of an array when there is no element 3 of an array, the behavior is not defined by the C standard.
Thus, this code is a bad example. It appears the test author misunderstood C, and this code does not illustrate what was intended.
char *ptr2 = some initializer;
(&ptr2)[3] = str;
When you evaluate &ptr2, you obtain the address of the memory where the pointer ptr2 itself is stored (the pointer that points to that initializer).
When you do (&ptr2)[3] = something, you try to write the address of a string 3*sizeof(char *) bytes past the location of ptr2. This is invalid and may well end in a segmentation fault.
No, it's not possible and no such assumptions can be made.
By writing outside a variable's space, this code invokes undefined behavior, it's basically "illegal" and anything can happen when you run it. The C language specification says nothing about variables being allocated on a stack in some particular order that you can exploit, it does however say that accessing random memory is undefined behavior.
Basically this code is pretty horrible and should never be used, even less so in a teaching environment. It makes me sad, how people mis-understand C and still teach it to others. :/
A program usually is loaded in memory with this structure:
Stack, Mmap'ed files, Heap, BSS (uninitialized static variables), Data segment (Initialized static variables) and Text (Compiled code)
You can learn more here:
https://manybutfinite.com/post/anatomy-of-a-program-in-memory/
Depending on how you declare the variable it will go to one of the places said before.
The compiler and linker arrange the BSS and data segment variables as they see fit at build time, so usually there is no way to tell. The same goes for heap variables (the allocator picks whichever memory block best fits the requested size).
On the stack (which is a LIFO structure) the variables are often placed one after another, so if you have:
int a = 5;
int b = 10;
a and b will often end up next to each other in an unoptimized build, but the C standard does not guarantee it: the compiler may reorder them or keep them in registers. So even in this case you cannot tell for certain.
The exception is structures and arrays: members of a structure are laid out in declaration order (possibly with padding), and elements of an array are always contiguous.
In your code ptr1 is an array of pointers to char, so its elements (the pointers) are contiguous, though the strings they point to need not be.
In fact, do the following exercise:
#include <stdio.h>
#include <string.h>
int main(void){
    const char * const ptr1[] = {"to be","or not to be","that is the question"};
    for (int i = 0; i < 3; i++) {
        /* Print the address and value of every character of string i. */
        for (size_t j = 0; j < strlen(ptr1[i]); j++)
            printf("%p -> %c\n", (const void *)&ptr1[i][j], ptr1[i][j]);
        printf("\n");
    }
    return 0;
}
and you will see the memory address and its content!
Have a nice day.

Character array initialization in C

I am trying to understand the array concept in string.
char a[5]="hello";
Here, array a is a character array of size 5. "hello" occupies the array indices 0 to 4. Since we have declared the array size as 5, there is no space to store the null character at the end of the string.
So my understanding is when we try to print a, it should print until a null character is encountered. Otherwise it may also run into segmentation fault.
But, when I ran it in my system it always prints "hello" and terminates.
So can anyone clarify whether my understanding is correct? Or does it depend on the system we execute it on?
As so often, the answer is:
Undefined behavior is undefined.
What this means is, trying to feed this character array to a function handling strings is wrong. It's wrong because it isn't a string. A string in C is a sequence of characters that ends with a \0 character.
The C standard will tell you that this is undefined behavior. So, anything can happen. In C, you don't have runtime checks, the code just executes. If the code has undefined behavior, you have to be prepared for any effect. This includes working like you expected, just by accident.
It's quite possible that the byte following your array in memory happens to be a \0 byte. In this case, it will look to any function processing this "string" as if you passed it a valid string. A crash is just waiting to happen on some seemingly unrelated change to the code.
You could try to add some char foo = 42; before or after the array definition, it's quite likely that you will see that in the output. But of course, there's no guarantee, because, again, undefined behavior is undefined :)
What you have done is undefined behavior. Apparently whatever compiler you used happened to initialize memory after your array to 0.
Here, array a is a character array of size 5. "hello" occupies the array indices 0 to 4. Since we have declared the array size as 5, there is no space to store the null character at the end of the string.
So my understanding is when we try to print a, it should print until a null character is encountered.
Yes, when you use printf("%s", a), it prints characters until it hits a '\0' character (or segfaults or something else bad happens - undefined behavior). I can demonstrate that with a simple program:
#include <stdio.h>
int main()
{
    char a[5] = "hello";
    char b[5] = "world";
    int c = 5;
    printf("%s%s%d\n", a, b, c);
    return 0;
}
Output:
$ ./a.out
helloworldworld5
You can see the printf function continuing to read characters after it has already read all the characters in array a. I don't know when it will stop reading characters, however.
I've slightly modified my program to demonstrate how this undefined behavior can create bad problems.
#include <stdio.h>
#include <string.h>
int main()
{
    char a[5] = "hello";
    char b[5] = "world";
    int c = 5;
    printf("%s%s%d\n", a, b, c);
    char d[5];
    strcpy(d, a);
    printf("%s", d);
    return 0;
}
Here's the result:
$ ./a.out
helloworld��world��5
*** stack smashing detected ***: <unknown> terminated
helloworldhell�p��UAborted (core dumped)
This is a classic case of stack overflow (pun intended) due to undefined behavior.
Edit:
I need to emphasize: this is UNDEFINED BEHAVIOR. What happened in this example may or may not happen to you, depending on your compiler, architecture, libraries, etc. You can make guesses to what will happen based on your understanding of different implementations of various libraries and compilers on different platforms, but you can NEVER say for certain what will happen. My example was on Ubuntu 17.10 with gcc version 7. My guess is that something very different could happen if I tried this on an embedded platform with a different compiler, but I cannot say for certain. In fact, something different could happen if I had this example inside of a larger program on the same machine.

Resources