What is wrong with this C cast?

I came across this in an IRC channel yesterday and didn't understand why it was bad behavior:
#include <stdio.h>
int main(void)
{
char x[sizeof(int)] = { '\0' };
int *y = (int *) x;
printf("%d\n", *y);
}
Is there any loss of data or anything? Can anyone give me any docs to explain further about what it does wrong?

The array x may not be properly aligned in memory for an int. On x86 you won't notice, but on other architectures, such as SPARC, dereferencing y will trigger a bus error (SIGBUS) and crash your program.
This problem may occur for any address:
#include <stdio.h>
int main(void)
{
short a = 1;
char b = 2;
/* y not aligned */
int* y = (int *)(&b);
printf("%d\n", *y); /* SIGBUS */
}

For one thing, the array x is not guaranteed to be aligned properly for an int.
There's been some discussion about how this might affect techniques like placement new. Note that placement new must also be given properly aligned memory, but it is often used with memory that is allocated dynamically, and allocation functions (in C and C++) are required to return memory that's suitably aligned for any type, precisely so that the address can be assigned to a pointer of any type.
The same isn't true for the memory allocated by the compiler for automatic variables.

Why not use a union instead?
union xy {
int y;
char x[sizeof(int)];
};
union xy xyvar = { .x = { 0 } };
...
printf("%d\n", xyvar.y);
I haven't verified it, but I would think the alignment problems mentioned by others would not be a problem here. If anyone has an argument for why this isn't portable, I'd like to hear it.

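I think that while the alignment issue is real, it is not the whole story.
Even if alignment is not a problem, you are still taking 4 bytes on the stack and treating them like an integer, so every one of those bytes has to be initialized.
In this snippet they are: the initializer { '\0' } sets the first element explicitly and the remaining elements are implicitly zero-initialized. Had x been declared without an initializer, the printed value would be built from un-initialized bits.
And using un-initialized values is a basic 'wrong'.
(Assuming sizeof(int)==4 for simplicity).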

Related

Why is this c code not changing the value of arr[3]?

I am creating an int array and then tricking C into believing that it's an array of short values. I know it's not good practice but I am just trying to understand why this isn't working. Shouldn't this change the value of arr[3]?
#include <stdio.h>
int main() {
printf("Hello, World!\n");
int arr[5];
arr[0] = 0; arr[1] = 0; arr[2] = 0; arr[4] = 0;
arr[3] = 128;
// Shouldn't this change arr[3]? Indices 6 and 7 of the short array should comprise arr[3] of the int array.
((short*)arr)[6] = 128;
int i = 0;
for (i = 0; i < 5; i++){
printf("%d\n", arr[i]);
}
return 0;
}
PS: Here's a deeper clarification:
When I cast int array to a short array, it seemingly becomes an array of 10 short elements (not 5). So when I change arr[6], I am changing only the first 16 bits of the int arr[3]. So arr[3] should still change and it is NOT that I am changing it to 128 again and not seeing the change.
FOR CLARIFICATION: THIS CODE IS ONLY FOR EXPERIMENTAL REASONS! I AM JUST LEARNING HOW POINTERS WORK AND I GET THAT ITS NOT GOOD PRACTICE.
Your code has undefined behavior, because you are writing a datum with a declared type through a pointer to a different type, and the different type is not char.
int arr[5];
/* ... */
((short*)arr)[6] = /* ANYTHING */;
The compiler is entitled to generate machine code that doesn't include the write to ((short*)arr)[6] at all, and this is quite likely with modern compilers. It's also entitled to delete the entire body of main on the theory that all possible executions of the program provoke undefined behavior, therefore the program will never actually be run.
(Some people say that when you write a program with undefined behavior, the C compiler is entitled to make demons fly out of your nose, but as a retired compiler developer I can assure you that most C compilers can't actually do that.)
Have you considered endianness?
EDIT: Now to add more clarity ...
As others have mentioned in the comments, this is most definitely undefined behavior! This is not just "not good practice"; it is a case of "just don't do it"!
Pointers on C is an excellent book that goes over everything you wanted to know about pointers and more. It's dated but still very relevant. You can probably find most of the information online, but I haven't seen many books that deal with pointers as completely as this one.
It sounds like you are experimenting, possibly as part of a class, so here are a number of things wrong with this code:
endianness
memory access model
assumption of type size
assumption of hardware architecture
cross type casting
Remember, even though C is considered a pretty low level language today, it is still a high level programming language that affords many key abstractions.
Now, look at your declaration again.
int arr[5];
You've allocated 5 ints grouped together and accessed via a common variable named arr. By the standard, the array is 5 elements of at least 16 bits (2 bytes on typical machines) per element, with base address &arr[0]. So you aren't guaranteed that an int is 2 bytes, or 4 bytes, or whatever. Likewise, short is defined by the standard as at least 16 bits. However, a short is not an int even if they have the same byte width! Remember, C's types are distinct even when their sizes match.
Now, it looks like you are running on a machine where shorts are 2 bytes and ints are 4 bytes. That is where the endianness issue comes into play: where is your most significant bit? And where is your most significant byte?
Casting the address of arr to a short pointer, first of all, breaks both the type and the memory access model. Then, you want to access the element at index 6 from the offset of arr. However, you aren't accessing relative to the int you declared arr to be; you are accessing through a short pointer that is pointing at the same address as arr!
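The following operations ARE NOT the same! Several of them also fall into the category of undefined behavior: don't do this, ever!
int foo;
int *pfooInt;
short bar;
short *pfooShort;
bar = (short) foo;               /* value conversion: well defined when the value fits */
pfooShort = (short*)&foo;        /* reinterprets the address of foo as a short pointer */
pfooInt = &foo;
bar = *pfooShort;                /* undefined: reads an int object through a short lvalue */
pfooShort = (short*)pfooInt[0];  /* converts the VALUE of foo into a pointer */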
Another thing to clarify for you:
int arr[5];
((short *)arr)[6] ...
This does not transform your int array of 5 elements into a short array with 10 elements. arr is still an int array of 5 elements. You just broke the access method and are trying to modify memory in an undefined manner. What you did is tell the compiler "ignore what I told you about arr previously, treat arr as a short pointer for the life of this statement and access/modify the short at index 6 relative to this pointer."
It is changing arr[3]; however, you are setting it back to 128, so you aren't noticing a change. Change the line to:
((short*)arr)[6] = 72;
and you should see a different value printed for arr[3].
Also a couple of things to clean up if you are new to C. You can initialize an array to zero by doing the following.
...
int arr[5] = { 0 };
arr[3] = 128;
...
Hope this helps!

Pointer array of pointers with C?

I want an array of pointers and I want to set byte values in the memory addresses where the pointers (of the array) are pointing.
Would this work:
unsigned int *pointer[4] = {
(unsigned int *) 0xFF200020,
(unsigned int *) 0xFF20001C,
(unsigned int *) 0xFF200018,
(unsigned int *) 0xFF200014
};
*pointer[0] = 0b0111111; // the value is correct for the address
Or is the syntax somehow different?
EDIT:
I'm coding for an SOC board and these are memory addresses that contain the case of some UI elements.
unsigned int *element1 = (unsigned int *) 0xFF200020;
*element1 = 0b0111111;
works so I'm just interested about the C syntax of this.
EDIT2: There was one 0 too much in ... = 0b0...
Short answer:
Everything you've written is fine.
Thoughts:
I'm a big fan of using the types from stdint.h. This would let you write uint32_t, which is more clearly a 32-bit unsigned number than unsigned int.
You'll often see people write macros to refer to these registers:
#define REG_IRQ (*(volatile uint32_t *)(0xFF200020))
REG_IRQ = 0x42;
It's possible that you actually want these pointers to be to volatile integers. You want it to be volatile if the value can change outside of the execution of your program. That is, if that memory position doesn't act strictly like a piece of memory. (For example, it's a register that stores the interrupt flags).
With most compilers I've used on embedded platforms, you'll have problems from ignoring volatile once optimizations have been enabled.
0b00111111 is, sadly, non-standard before C23 (it is a common compiler extension, for example in GCC and Clang). You can portably use octal, decimal, or hexadecimal.
Sure, this should work, provided the addresses are ones your program is allowed to access.
On a hosted system you would most probably get a segmentation fault when running this code, because 0xFF200020 has very little chance of being inside your program's address space.
This will not produce any compile error and will work fine, but hard-coding the memory address a pointer points to is generally not a good idea. Dereferencing an unknown or non-existent memory location will cause a segmentation fault, but if you are sure about the memory locations, hard-coding them as done here is totally fine.

sizeof sideeffect and allocation location [duplicate]

Possible Duplicate:
Why isn’t sizeof for a struct equal to the sum of sizeof of each member?
I cannot understand why it is like this:
#include <stdio.h>
#include <stdlib.h>
typedef struct
{
char b;
int a;
} A;
typedef struct
{
char b;
} B;
int main() {
A object;
printf("sizeof char is: %zu\n",sizeof(char));
printf("sizeof int is: %zu\n",sizeof(int));
printf("==> the sizeof both are: %zu\n",sizeof(int)+sizeof(char));
printf("and yet the sizeof struct A is: %zu\n",sizeof(object));
printf("why?\n");
B secondObject;
printf("pay attention that the sizeof struct B is: %zu which is equal to the "
"sizeof char\n",sizeof(secondObject));
return 0;
}
I think I explained my question in the code and there is no need to explain further. Besides that, I have another question:
I know that things can be allocated on the heap, in static storage, or on the stack, but what does it mean that an allocation location is unknown? How can that be?
I am talking about this example:
#include <stdlib.h>
#include <string.h>
typedef struct
{
char *_name;
int _id;
} Entry;
int main()
{
Entry ** vec = (Entry**) malloc(sizeof(Entry*)*2);
vec[0] = (Entry *) malloc(sizeof (Entry));
vec[0]->_name = (char*)malloc(6);
strcpy (vec[0]->_name, "name");
vec[0]->_id = 0;
return 0;
}
I know that:
vec is on the stack.
*vec is on the heap.
*vec[0] is on the heap.
vec[0]->_id is on the heap.
but :
vec[0]->_name is unknown
why ?
There is an unspecified amount of padding between the members of a structure and at the end of a structure. In C the size of a structure object is greater than or equal to the sum of the size of its members.
Take a look at this question as well as this one and many others if you search for CPU and memory alignment. In short, CPUs are happier if they access memory aligned to the size of the data they are reading. For example, if you are reading a uint16_t, then it is more efficient (on most CPUs) to read at an address that is a multiple of 2. The details of why CPUs are designed this way are a whole other story.
This is why compilers come to the rescue and pad the fields of structures in the way that is most comfortable for the CPU to access, at the cost of extra storage space. In your case, you are probably given 3 bytes of padding between your char and int, assuming int is 4 bytes.
If you look at the C standard (which I don't have nearby right now), or the man page of malloc, you will see such a phrase:
The malloc() and calloc() functions return a pointer to the allocated memory
that is suitably aligned for any kind of variable.
This behavior is exactly due to the same reason I mentioned above. So in short, memory alignment is something to care about, and that's what compilers do for you in struct layout and other places, such as layout of local variables etc.
You're running into structure padding here. The compiler is likely inserting three bytes' worth of padding after the b field in struct A, so that the a field is 4-byte aligned. You can control this padding to some degree using compiler-specific features, for example the pack pragma on MSVC or the packed and aligned attributes on GCC, but I would not recommend this. Structure padding is there to satisfy member alignment restrictions, and some architectures will fault on unaligned accesses. (Others might fix up the misaligned access for you, but typically do this rather slowly.)
See also: http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding
As to your second question, I'm unsure what you mean by the name is "unknown". Care to elaborate?
The compiler is free to add padding in structures to ensure that datatypes are aligned properly. For example, an int will typically be aligned to sizeof(int) bytes. So I expect the size of your A struct to be 8. The compiler does this because fetching an int from an unaligned address is at best inefficient, and at worst doesn't work at all; that depends on the processor the computer uses. x86 will happily fetch from unaligned addresses for most data types, but the fetch can take about twice as long.
In your second code-snippet, you haven't declared i.
So vec[0]->_name is not unknown - it is on the heap, just like anything else you get from "malloc" (and malloc's siblings).

Dynamic memory allocation in 'c' Issues

I was writing some code using malloc and ran into an issue, so I wrote the test code below, which sums up the whole confusion:
# include <stdio.h>
# include <stdlib.h>
# include <error.h>
int main()
{
int *p = NULL;
void *t = NULL;
unsigned short *d = NULL;
t = malloc(2);
if(t == NULL) perror("\n ERROR:");
printf("\nSHORT:%d\n",sizeof(short));
d =t;
(*d) = 65536;
p = t;
*p = 65536;
printf("\nP:%p: D:%p:\n",p,d);
printf("\nVAL_P:%d ## VAL_D:%d\n",(*p),(*d));
return 0;
}
Output:: abhi#ubuntu:~/Desktop/ad/A1/CC$ ./test
SHORT:2
P:0x9512008: D:0x9512008:
VAL_P:65536 ## VAL_D:0
I am allocating 2 bytes of memory using malloc. malloc returns a void * pointer, which I store in the void * pointer t.
Two more pointers are declared: p, of type int *, and d, of type unsigned short *. I then assign t to both of them (p = t and d = t), which means both d and p point to the same memory location on the heap.
On trying to store 65536 (2^16) in (*d) I get a warning that the large integer value is truncated, which is as expected.
I then stored 65536 (2^16) in (*p), which did not cause any warning.
On printing both (*p) and (*d) I got different values (though each is correct for its own pointer type).
My questions are:
1) Though I have allocated only 2 bytes (i.e. 16 bits) of heap memory using malloc, how am I able to store 65536 in those two bytes (by using p, which is a pointer to int)? I have a feeling that the cause is the automatic conversion of the void * to an int * (in p = t). Is it that assigning t to p leads to access to memory regions outside of what was allocated through malloc?
2) Even with all this happening, how does dereferencing the same memory region through (*p) and (*d) print two different answers? (Though this can also be explained if what I suspect in question 1 is the cause.)
Can somebody shed some light on this and explain the reasons behind it? It will be really appreciated.
Many thanks
Answering your second question first:
The explanation is that an int is generally 4 bytes and a short only 2, and both start at the same address. On a little-endian machine (which your output suggests), the least significant bytes of the int are stored in the first two positions, which are exactly the bytes the short occupies. 65536 is 0x00010000, so its two least significant bytes are both 0 and the 1 bit sits in the third byte, outside the storage for the short.
Therefore, when the program prints *d, it interprets the memory as a short and looks only at the first two bytes, which is not where the 1 bit of 65536 was stored when *p was written. Note that writing *p = 65536; overwrote the previous *d = 65536; (which had itself been truncated to 0), populating the two least significant bytes with 0.
Regarding the first question: The compiler does not store the 65536 for *p within 2 bytes. It simply goes outside the bounds of the memory you've allocated - which is likely to cause a bug at some point.
In C there is no protection at all for writing out of bounds of an allocation. Just don't do it, anything can happen. Here it seems to work for you because by some coincidence the space behind the two bytes you allocated isn't used for something else.
1) The granularity of the OS memory manager is typically 4K. An overwrite by a few bytes is unlikely to trigger an AV/segfault, but it will corrupt any data in the adjacent location, leading to:
2) Undefined behaviour. This set of behaviour includes 'apparently correct operation' (for now!).

Int in a simulated memory array of uchar

In C, in a Unix-like environment (Plan 9), I have an array that serves as memory.
uchar mem[32*1024];
I need that array to contain different fields, such as an int (integer) to indicate the amount of memory that is free and available. So, I've tried this:
uchar* memp=mem;
*memp=(int)250; //An example of size I want to assign.
I know the size of an int is 4, so I have to force, with a cast or something like that, the first four slots of mem to hold the number 250 in this case, in big-endian order.
But when I try to do what I've explained, it doesn't work. I suppose there is a mistake in the type conversion. How can I force mem[0] to mem[3] to hold the indicated size, represented as an int and not as a uchar?
Thanks in advance
Like this:
*((int*) memp) = 250;
That says "Even though memp is a pointer to characters, I want you to treat it as a pointer to integers, and put this integer where it points."
Have you considered using a union, as in:
union mem_with_size {
int size;
uchar mem[32*1024];
};
Then you don't have to worry about the casting. (You still have to worry about byte-ordering, of course, but that's a different issue.)
As others have pointed out, you need to cast to a pointer to int. You also need to make sure you take alignment of the pointer in consideration: on many architectures, an int needs to start at a memory location that is divisible by sizeof(int), and if you try to access an unaligned int, you get a SIGBUS. On other architectures, it works, but slowly. On yet others, it works quickly.
A portable way of doing this might be:
int x = 250;
memcpy(mem + offset, &x, sizeof(x));
Using unions may make this easier, though, so +1 to JamieH.
Cast to a pointer to int, not to unsigned char again!
int * memp = (int *)mem;
* memp = 250; //An example of size I want to assign.

Resources