How to explain the following C code snippet? - c

Gcc version:
gcc 4.4.3
The code snippet:
#include <stdio.h>
struct str {
int len;
char s[0];
};
struct test {
struct str *p_str;
};
int main()
{
struct test t = { 0 };
if (t.p_str->s) // FLAG_0
printf("here!");
printf(t.p_str->s); // FLAG_1
return 0;
}
I got an error when I ran the code: Segmentation fault
I used gdb to debug it and found that it crashed at FLAG_1.
I am confused about it.
It ran OK at FLAG_0 but crashed at FLAG_1. Why?
Meanwhile, I found that the value of t.p_str is 0x00, so I don't understand why the if statement is OK.
Note: The code is just for the study!

Since s is an array, not a pointer, it will never be null. So the compiler is free to omit the check, and assume it isn't null. If it does that, then FLAG_0 will not attempt to dereference the null pointer, so you won't get a segmentation fault at that point.
Of course, it's free to do anything else it feels like, since the program has undefined behaviour.

If you're talking about C++ it's illegal to declare an array of length 0. C++11 §8.3.4 Arrays:
If the constant-expression is present, it shall be an integral
constant expression and its value shall be greater than zero.
The constant-expression here is the array length. However, the standard allows creating zero length arrays dynamically using new.
If you're talking about C, then the program has already entered undefined behaviour territory at FLAG_0, so nothing that happens afterwards is guaranteed by the language.

You declared an array with 0 elements: char s[0];.
Even an empty string has at least one element, the terminating binary zero.
struct test t = { 0 }; explicitly initialises the first member to 0, and any remaining members are zero-initialised implicitly. Since the only member here is a pointer, you just create a null pointer; no struct str object is ever allocated.
You can also zero a structure using memset.
#include <string.h>
typedef struct{
int len;
char s[10];
}MyStr;
int main()
{
MyStr str;
memset(&str, 0, sizeof(MyStr));
}
One statement dereferences a null pointer: printf(t.p_str->s); tries to resolve t.p_str->s with t.p_str being 0.

So first of all, the code is total rubbish and invokes undefined behaviour in multiple places, so anything can happen, including crashes or non crashes, unexpected or not.
What is t.p_str->s ?
s is an array. If t.p_str were a valid pointer, then t.p_str->s would be the address of the first character in the s array. That's a pointer, it's not a null pointer, so for the "if" statement that counts as a true result. The compiler doesn't actually need to evaluate the whole expression, because it doesn't care what pointer it is, only that it isn't a null pointer. That's why t.p_str->s doesn't crash here, because the program never evaluates it.
In the printf statement, the actual value of t.p_str->s is needed, so that one crashes.

Related

Question about values out of bounds of an array in C

I have a question about this code below:
#include <stdio.h>
char abcd(char array[]);
int main(void)
{
char array[4] = { 'a', 'b', 'c', 'd' };
printf("%c\n", abcd(array));
return 0;
}
char abcd(char array[])
{
char *p = array;
while (*p) {
putchar(*p);
p++;
}
putchar(*p);
putchar(p[4]);
return *p;
}
Why isn't a segmentation fault generated when this program reaches putchar(*p) right after exiting the while loop? I thought that once p goes beyond array[3], no value has been assigned to the following memory locations. For example, I thought that accessing p[4] would be illegal because it is out of bounds. On the contrary, this program runs with no errors. Is this because any memory to which no value has been assigned (in this case anything beyond the 4 elements of array) is null, with the value '\0'?
OP seems to think that something special should happen when an array is accessed out of bounds.
Accessing outside array bounds is undefined behavior (UB). Anything may happen.
Let's clarify what undefined behavior is.
The C standard is a contract between the developer and the compiler as to what the code means. However, it just so happens that you can write things that are just outside what is defined by the standard.
One of the most common cases is trying to do out-of-bounds access. Other languages say that this should result in an exception or another error. C does not. An argument is that it would imply adding costly checks at every array access.
The compiler does not know that what you are writing is undefined behavior¹. Instead, the compiler assumes that what you write contains no undefined behavior, and translates your code to assembly accordingly.
If you want an example, compile the code below with or without optimizations:
#include <stdio.h>
int table[4] = {0, 0, 0, 0};
int exists_in_table(int v)
{
for (int i = 0; i <= 4; i++) {
if (table[i] == v) {
return 1;
}
}
return 0;
}
int main(void) {
printf("%d\n", exists_in_table(3));
}
Without optimizations, the assembly I get from gcc does what you might expect: it just goes too far in the memory, which might cause a segfault if the array is allocated right before a page boundary.
With optimizations, however, the compiler looks at your code and notices that it cannot exit the loop (otherwise, it would try to access table[4], which cannot be), so the function exists_in_table necessarily returns 1. And we get the following, valid, implementation:
exists_in_table(int):
mov eax, 1
ret
Undefined behavior means undefined. Such bugs are very tricky to detect since they can be virtually invisible after compiling. You need an advanced static analyzer to interpret the C source code and understand whether what it does can be undefined behavior.
¹ in the general case, that is; modern compilers use some basic static analyzer to detect the most common errors
C does no bounds checking on array accesses; because of how arrays and array subscripting are implemented, it can't do any bounds checking. It simply doesn't know that you've run past the end of the array. The operating environment will throw a runtime error if you cross a page boundary, but up until that point you can read or clobber any memory following the end of the array.
The behavior on subscripting past the end of the array is undefined - the language definition does not require the compiler or the operating environment to handle it any particular way. You may get a segfault, you may get corrupted data, you may clobber a frame pointer or return instruction address and put your code in a bad state, or it may work exactly as expected.
There are a few points worth noting in your program:
array in main and array in abcd are different things. In main it is an array of 4 elements; in abcd the parameter declared as char array[] is really a pointer (char *). If you write something like array[4] inside main, the compiler may warn about it, but it will not warn about the same access inside abcd.
p is a pointer to the first element of array. In C there is no boundary or limit attached to p. Your program is lucky that the memory after array happens to contain a 0 byte, which stops the while (*p) loop. If you print the value of p after the loop, it will not necessarily equal &array[4].

Can't assign value to a structure's variable via pointer [C]

I'm pretty new to the C. I'm trying to create a simple program to represent a point using a structure. It looks like this:
// including standard libraries
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <assert.h>
// including user defined libraries
;
typedef struct point {
char p_name;
double *p_coords;
} point_t;
int main() {
point_t *pt;
pt->p_name = "A";
printf("%c", pt->p_name);
// returning 0 if there are no errors
return 0;
}
The problem is that when I try to print the name of the point after assigning the name "A" to it, the program outputs nothing except for the exit code, which is (probably) a random number:
Process finished with exit code -1073741819 (0xC0000005)
The fact is that pointers are a concept that is very hard for me to understand (I used to program in Python before), and therefore I'm probably missing something. I've also tried other variable types such as int, but the result is the same (even the exit status number is the same). Is there a way to fix this behaviour?
P.S.: Excuse my rudimentary English, I'm still practising it, and thanks a lot for your time!
In your code
pt->p_name = "A";
is wrong for two primary reasons:
You never made pt point to any valid memory location. Attempting to dereference an invalid pointer invokes undefined behavior.
p_name is of type char. "A" is a string literal of type char [2], which decays to char * in the assignment, and the two types are not compatible.
You need to
Make sure pt points to a valid memory location. Actually, you don't need a pointer here at all. Define pt as a variable (not a pointer variable) of the structure type, and access the members via the . operator.
Use 'A' for assignment, as in character constant, not a string.
Pointers must be made to point somewhere. You never assign a value to the pointer pt, and attempting to dereference an uninitialized pointer value invokes undefined behavior.
You're also assigning a string to a character value. Strings use double quotes, while single characters use single quotes.
Use single quotes for a character, and a pointer must first be made to point somewhere:
point_t p;
point_t *pt = &p;
pt->p_name = 'A';
printf("%c", pt->p_name);
One option is to use malloc to allocate the memory for the point_t structure (and free it when you are done).
Something like
point_t *pt = malloc(sizeof(point_t));
pt->p_name = 'A';
printf("%c", pt->p_name);
And, very importantly, as others mentioned, pt->p_name = "a" is also wrong: you are assigning a const char * to a char. I fixed that in my example.

strlen() of an empty array within a struct is not 0

I'm very new to C, and I'm not understanding this behavior. Upon printing the length of this empty array I get 3 instead of 0.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct entry entry;
struct entry{
char arr[16];
};
int main(){
entry a;
printf("%d\n",strlen(a.arr));
return 0;
}
What am I not understanding here?
The statement entry a; does not initialize the struct, so its value is likely garbage. Therefore, there's no guarantee that strlen on any of its members will return anything sensible. In fact, it might even crash the program, or worse.
There is no such thing as an "empty array" in C. Your array of char[16] always contains 16 bytes; as an uninitialized local variable, each char has an indeterminate value. In addition, if none of these values happens to be 0, strlen will read outside the array and your code will have undefined behaviour.
Additionally strlen returns size_t and using %d to print this has undefined behaviour too; you must use %zu where z says that the corresponding argument is size_t.
(If by happenstance you're using the MSVC++ "C" compiler, do note that it might not support %zu. Get a real C compiler and C standard library instead.)
Here's the source code to strlen():
size_t strlen(const char *str)
{
const char *s;
for (s = str; *s; ++s);
return(s - str);
}
Wait, you mean there's source code to strlen()? Why yes. Many of the standard functions in C can themselves be written in C, and this is one typical implementation.
This function starts at the memory address specified by str and uses a for loop to go forward, byte by byte, until it reaches a zero. How does the for loop do that? First it assigns str to s. Then it checks the value s points to. If it's zero (i.e. if *s evaluates to zero), the loop is done. If that value is not zero, the s pointer is incremented and the zero check is repeated, over and over, until it finds a zero.
Finally, the final value of s minus the original pointer you passed in is the result of strlen().
In other words, strlen() just walks through memory until it finds the next zero character, and it returns the number of characters from that point to the original pointer.
But, what if it doesn't find a zero? Does it stop? Nope. It will just trudge on and on until it finds a zero or the program crashes.
That is why strlen() is so confusing, and why it is a source of many critical bugs in modern software. This doesn't mean you can't use it, but it does mean you must be very, very careful to make sure that whatever you pass in is a null-terminated string (i.e. a set of zero or more non-zero characters, followed by a zero character).
Remember also that in C, you basically have no idea what memory contains when you allocate it or set it aside. If you want it to be all zeros, then you need to make sure to fill it with zeros yourself!
Anyway, the answer to your question involves the use of the memset() function. You'll have to pass memset() the pointer to the beginning of your array, the length of that array, and the value to fill it with (in your case, zero of course!)
No initialization of a, this leads to undefined behavior.
C "strings" are '\0' terminated arrays of char. So strlen() will browse whole memory from given address until it either finds a '\0' or results in a segmentation fault.
What am I not understanding here?
Perhaps the misunderstanding is that auto variables, such as:
entry a;
are assigned memory from the process' stack. The pre-existing content of that stack memory is not zeroed-out for your benefit. Hence the value(s) of the elements of a, which will also be located on the process stack, will not be initially zeroed-out for your benefit. Rather, the entire content of a and its elements (including .arr) will contain bizarre and perhaps unexpected values.
C programmers learn to initialize auto variables by zeroing them out, or initializing them with a desirable value.
For example, the question code might do this as follows:
int main(){
entry a =
{
.arr[0] = 0
};
...
}
Or:
int main(){
entry a;
memset(&a, 0, sizeof(a));
...
}

pointer in c program

program in c language
void main()
{
char *a,*b;
a[0]='s';
a[1]='a';
a[2]='n';
a[3]='j';
a[4]='i';
a[5]='t';
printf("length of a %d/n", strlen(a));
b[0]='s';
b[1]='a';
b[2]='n';
b[3]='j';
b[4]='i';
b[5]='t';
b[6]='g';
printf("length of b %d\n", strlen(b));
}
here the output is :
length of a 6
length of b 12
Why and please explain it.
thanks in advance.
You are assigning to pointer (which contains garbage) without allocating memory. What you are noticing is Undefined Behavior. Also main should return an int. Also it does not make sense to try and find the length of an array of chars which are not nul terminated.
When you declare a local variable without initializing it, it holds whatever was previously in that memory. Since pointers are essentially numbers, an uninitialized pointer refers to some arbitrary memory address.
Then, when you write a[i], the compiler steps i * sizeof(*a) bytes forward from a, so a[i] is the object at address a + i*1 (1 because a char occupies one byte).
Finally, C strings need to be null-terminated ('\0', also known as a sentinel). Functions like strlen walk along the string until they hit the sentinel; most likely your memory had a stray 0 somewhere that caused strlen to stop.
Allocate some memory and terminate the strings, then it will work better:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(void){
char *a=malloc(10);
char *b=malloc(10);
if(a){
a[0]='s';
a[1]='a';
a[2]='n';
a[3]='j';
a[4]='i';
a[5]='t';
a[6]=(char)0;
printf("length of a %d\n", (int)strlen(a));
}else{
printf("Failed to allocate 10 bytes\n" );
}
if(b){
b[0]='s';
b[1]='a';
b[2]='n';
b[3]='j';
b[4]='i';
b[5]='t';
b[6]='g';
b[7]=(char)0;
printf("length of b %d\n", (int)strlen(b));
}else{
printf("Failed to allocate 10 bytes\n" );
}
free(a);
free(b);
}
Undefined behavior. That's all.
You're using an uninitialized pointer. After that, all bets are off as to what will happen.
Of course, we can attempt to explain why your particular implementation acts in a certain way but it'd be quite pointless outside of novelty.
The indexing operator is de-referencing the pointers a and b, but you never initialized those pointers to point at valid memory. Writing to un-initialized memory triggers undefined behavior.
You are simply "lucky" (or unlucky, it depends on your viewpoint) that the program doesn't crash, that the pointer values are such that you succeed in writing at those locations.
Note that you never write the termination character ('\0') to either string, but still get the "right" value from strlen(); this implies that a and b both point at memory that happens to be full of zeros. More luck.
This is a very broken program; that it manages to run "successfully" is because its behavior is undefined, and undefined clearly includes "working as the programmer intended".
a and b are both char pointers. First of all, you didn't initialise them, and secondly you didn't terminate the strings with '\0'.

Is it necessary to initialize the char array for accurate length?

In the below example when I define char array uninitialized and want to find the length, it's undefined behavior.
#include<stdio.h>
int main()
{
char a[250];
printf("length=%d\n",strlen(a));
}
I got "0". I don't know how. Please explain.
Luck. Whether it's good or bad luck is a matter of opinion. The contents of your array are whatever happened to already occupy that memory; they are not initialized. In your case, it happened that the first byte was a '\0'.
This is, of course, undefined behavior and you can't depend on it happening this way.
You said in your example you were using an uninitialized char array to show undefined behavior, then when you got "0" you want an explanation? It's... undefined behavior.
If you got 0 for the length, it just means that there happens to be a 0 as the first element of a[] in your uninitialized array. When it's an uninitialized local, that means, as far as the C standards are concerned, anything can be in there, including a 0.
To address the question in your title: "Is it necessary to initialize the char array for accurate length?"
Yes, to be able to deterministically know the length of a string in a char array via the strlen() function, it is required for a null terminator to be present. That means it needs to be initialized or set in some manner or another.
As other answers say, the strlen() result is more a matter of luck than defined behaviour.
To find the size of the memory block, use sizeof instead.
Note: I've also included string.h and used the %zu conversion, which matches the size_t values passed to printf.
#include<stdio.h>
#include<string.h>
int main()
{
char a[250];
printf("length=%zu\n",strlen(a));
printf("sizeof=%zu\n",sizeof(a));
}
When you define
char a[250];
the array contains arbitrary garbage.
strlen(a) counts the non-null characters until it finds a null character ('\0'), then it stops.
So if your char a[250]; array contains garbage and the first element happens to be '\0', strlen(a) will return 0.
