Getting the size of a struct value - c

Example
#include <stdio.h>
struct A {
char *b;
};
int main(int argc, char *argv[]) {
char c[4] = { 'c', 'a', 't', '\0' };
struct A a;
a.b = c;
printf("%s\n", a.b); // cat
printf("%zu\n", sizeof c); // 4
printf("%zu\n", sizeof a.b); // 8 ???
}
Why does sizeof a.b return 8 and not 4? If I understood correctly, a.b returns the value that was assigned to it, which is c. But shouldn't it return the size of c (which is 4) then?

The sizeof operator gives the number of bytes occupied by an object, and in your case the object is a pointer, whose size appears to be 8 bytes on your system.

You're calling sizeof() on two different types.
sizeof(a.b) is sizeof(char *), which is 8 on your platform.
sizeof(c) is sizeof(char[4]), which is 4.
We can have pointers point to arrays via array decaying, which you can read about in this other answer: What is array decaying?

First of all, sizeof(a.b) is not the size of c. It doesn't give the size of what a.b points to; it gives the size of the pointer itself.
Take char as an example:
the size of char a is 1,
and the size of char *b is 8 (on a typical 64-bit platform; 4 on a typical 32-bit one).
So it is the size of the pointer, not of what it points to. Please note that pointer sizes are platform dependent.
Don't let int confuse you, though: an int and an int * happen to be the same size on some platforms.

If I understood correctly, a.b returns the value that was assigned to it,
Not exactly. a.b is what's called an lvalue. This means that it designates a memory location. However it does not read that memory location yet; that will only happen if we use a.b within a larger context that expects the memory location to be read.
For example:
a.b = something; // does not read a.b
something = a.b; // does read a.b
The case of sizeof is one context where it does not read the memory location. In fact it tells you how many bytes comprise that memory location; it doesn't tell you anything about what is stored there (let alone about some other memory location that might be pointed to by what is stored there, if it is a pointer).
The output is telling you that your system uses 8 bytes to store a pointer.

sizeof yields the number of bytes occupied by its operand's type.
In this case sizeof(char *) is 8, which is the number of bytes that make up a pointer on your system.

Related

Size of pointer, pointer to pointer in C

How can I justify the output of the below C program?
#include <stdio.h>
char *c[] = {"Mahesh", "Ganesh", "999", "333"};
char *a;
char **cp[] = {c+3, c+2, c+1, c};
char ***cpp = cp;
int main(void) {
printf("%d %d %d %d ",sizeof(a),sizeof(c),sizeof(cp),sizeof(cpp));
return 0;
}
Prints
4 16 16 4
Why?
Here is the ideone link if you want to fiddle with it.
char *c[] = {"Mahesh", "Ganesh", "999", "333"};
c is an array of char* pointers. The initializer gives it a length of 4 elements, so it's of type char *[4]. The size of that type, and therefore of c, is 4 * sizeof (char*).
char *a;
a is a pointer of type char*.
char **cp[] = {c+3, c+2, c+1, c};
cp is an array of char** pointers. The initializer has 4 elements, so it's of type char **[4]. Its size is 4 * sizeof (char**).
char ***cpp = cp;
cpp is a pointer to pointer to pointer to char, or char***. Its size is sizeof (char***).
Your code uses %d to print the size values. This is incorrect -- but it happens to work on your system. Probably int and size_t are the same size. To print a size_t value correctly, use %zu -- or, if the value isn't very large, you can cast it to int and use %d. (The %zu format was introduced in C99; there might still be some implementations that don't support it.)
The particular sizes you get:
sizeof a == 4
sizeof c == 16
sizeof cp == 16
sizeof cpp == 4
are specific to your system. Apparently your system uses 4-byte pointers. Other systems may have pointers of different sizes; 8 bytes is common. Almost all systems use the same size for all pointer types, but that's not guaranteed; it's possible, for example, for char* to be larger than char***. (Some systems might require more information to specify a byte location in memory than a word location.)
(You'll note that I omitted the parentheses on the sizeof expressions. That's legal because sizeof is an operator, not a function; its operand is either an expression (which may or may not be parenthesized) or a type name in parentheses, like sizeof (char*).)
a is a pointer, which usually holds a memory address. On a 32-bit system, an address is typically a 32-bit (4-byte) unsigned value. Therefore, sizeof(a) is 4.
c is an array with 4 elements, each of which is a pointer, so its size is 4*4 = 16.
cp is also an array, and each element is a pointer (the first *) which points to another pointer (the second *). That second pointer points to a string in memory. Each element is therefore the size of a pointer, so sizeof(cp) = 4*4 = 16.
cpp is a pointer to a pointer to a pointer. It also holds a 32-bit memory address, so its size is 4 as well.
a is a pointer. cpp is also a pointer, just to a different type (pointer to pointer to pointer).
Now c is an array. It has 4 elements, each a pointer, so you get 4 * 4 = 16 (it would be different if you ran it on x64).
The same goes for cp. Try changing the type to int and you will see the difference.
The reason you got 4 16 16 4 is that a is simply a pointer, which on your architecture holds a 32-bit address and so needs only 4 bytes. c and cp are declared with [], so they are arrays of pointers rather than pointers; each was initialized with 4 elements, which creates 4 pointers, hence 4 x 4 = 16. You may ask whether cpp should then also be 16, since it was initialized from cp. The answer is no: cpp is a separate variable and still just a single pointer (a pointer to a pointer to a pointer, here pointing at the first element of an array of pointers), so it needs only 4 bytes.

adding an integer to a char array in c

It may be a silly question, but I'm still not getting it.
I have a char array, say
char arr[100], holding some data:
char arry[100] ---- some data;
int test;
memcpy(&test,array+4,sizeof(int))
What will this memcpy do?
Thanks
SKP
This might be useful in so-called serialization of data.
Say someone saved an integer into a file.
Then you read the file into a buffer (arry in your case) as a stream of bytes. Now you want to convert these bytes into real data, e.g. in your case the integer test, which has been stored at offset 4.
There are several ways to do that. One is to use memcpy to copy the bytes into an area where the compiler will treat them as an integer.
So to answer your question:
memcpy(&test,array+4,sizeof(int))
...will copy sizeof(int) bytes, starting at byte offset 4 of array, into the memory allocated for the variable test (which has type int). Now test holds the integer value that was originally saved into arry, probably using the following code:
memcpy(array+4, &original_int, sizeof(int))
Doing this requires some knowledge of the hardware and the language, as there are several complications, among them:
byte order in integers;
data alignment.
It copies sizeof(int) bytes, starting at array[4], into the variable test. On a 32-bit machine sizeof(int) is typically 4, so memcpy will copy 4 bytes to the address &test, which can hold 4 bytes.
This will copy probably 4 bytes (depending on your machine and compiler--your int might be bigger or smaller) from array[4] through array[7] (the fifth through eighth bytes of arry) into the integer test.
According to the documentation of memcpy() :
void * memcpy ( void * destination, const void * source, size_t num );
Copies the values of num bytes from the location pointed by source directly to the memory block pointed by destination.
In your case :
num=sizeof(int)
destination=&test A pointer to test
source=&array[4] A pointer to the element at index 4 (the fifth element) of the char array
Hence, if sizeof(int)==4 it will copy array[4], array[5], array[6] and array[7] to test.
There are questions that can help you understand the memory layout of integers :
int32 storage in memory
How is an integer stored in memory?
There is also an issue with endianness: on my (little-endian) computer, array[4] corresponds to the least significant byte.
Consequently, if array[7]=0x80 and array[4]=array[5]=array[6]=0x00, then the bytes of test in memory are 00 00 00 80 and test is worth -2^31 (that is 0x80000000).
If array[4]=0x2A and array[5]=array[6]=array[7]=0x00, then the bytes of test in memory are 2A 00 00 00 and test is worth 42 (that is 0x0000002A).
Here is a test code to be compiled by gcc main.c -o main
#include <stdio.h>
#include <string.h>
int main(int argc,char *argv[]){
char array[100];
int test;
printf("sizeof(int) is %zu\n",sizeof(int));
array[4]=0x00;
array[5]=0;
array[6]=0;
array[7]=0x80;
memcpy(&test,&array[4],sizeof(int));
printf("test worth %d or(hexa) %x\n",test,test);
array[4]=0x2A;
array[5]=0;
array[6]=0;
array[7]=0x00;
memcpy(&test,&array[4],sizeof(int));
printf("test worth %d or(hexa) %x\n",test,test);
return 0;
}
Generally, the C library function void *memcpy(void *str1, const void *str2, size_t n) copies n bytes from memory area str2 to memory area str1, where:
str1 -- a pointer to the destination where the content is to be copied, converted to void*
str2 -- a pointer to the source of the data to be copied, converted to const void*
n -- the number of bytes to be copied
memcpy returns a pointer to the destination, which is str1.
In your case, sizeof(int) bytes (4 if your int is 32 bits wide) are copied from the address of array[4] to the address of test.

C: Access bytes in mallocated memory

I have allocated an array of void.
I need to access the bytes of the allocated memory.
void* array = (void*) malloc(12);
array[0] = 0;
It returns me this error:
main.c:9: error: invalid use of void expression
array[0] = 0;
^
Is there any way to do it?
Thanks!
Your array is a void-pointer. And void (in C) means 'has no type'. So when you dereference it (like array[0]) the compiler has no idea what that means.
To access bytes you need a char type, which is actually the C-equivalent of a byte (a remnant from the days when characters would still fit into (8-bit) bytes).
So declare your array as:
char * array = malloc(12);
Also note that you don't have to cast the result of malloc (especially in your case since it already returns a void *). And, if you want just the 12 bytes and only use them locally (within the function or translation-unit that declares it) then you can just use a 'proper array':
char array[12];
This has the added bonus that you don't need to free it afterwards.
You need to use char or unsigned char rather than void to access the bytes:
char *array = malloc(12);
array[0] = 0;
malloc() returns a void pointer because it doesn't know the type you're allocating. You can't access the memory via that void pointer; you need to tell the compiler how to treat the block of memory. To treat it as bytes, use char or unsigned char.

Dereferencing and typecasting

I've constructed the following sections of code to help myself understand pointer dereferencing and typecasting in C.
char a = 'a';
char * b = &a;
int i = (int) *b;
For the above, I understand that on the 3rd line, I've dereferenced b and got 'a' and (int) will typecast the value of 'a' to its corresponding value of 97 which is stored into i. But for this section of code:
char a = 'a';
char * b = &a;
int i = *(int *)b;
This results in i being some arbitrary large number like 792351. I'm assuming this is a memory address but my question is why? When I typecast b to an integer pointer, does this actually cause b to point to a different area in memory? What is going on?
EDIT: If the above doesn't work, then why would something like this work:
char a = 'a';
void * b = &a;
char c = *(char *)b;
This correctly assigns 'a' to c.
Your int is larger than your char - you get the 'a' value + some random data following it in memory.
E.g, assuming this layout in memory:
'a'
0xFF
0xFF
0xFF
Your char * and int * both point to the 'a'. When you dereference the char *, you get only the first byte, the 'a'. When you dereference the int * (assuming your int is 32-bit) you get the 'a' and the 3 bytes of uninitialized data following it.
EDIT: In response to updated question:
In char c = *(char *)b;, b still points at the 'a' value. You cast it to a char *, and then dereference it, getting the char pointed to by a char *
The last line you're concerned about does a very bad thing. It treats b as an int* whereas b is a char*. That is, the memory pointed to by b is assumed to be 4 bytes (typically) instead of 1 byte. So when you dereference it, it goes to the 1 byte that b actually points to, takes the following 3 bytes too, treats those 4 bytes as a single int, and gives you the result. That's why it's garbage.
In general, casting one pointer type to another pointer type must be done with great caution.
You're casting a char pointer to an int pointer. Characters are (usually) stored as 8 bits. ints, on the other hand, are typically 32 bits (even on most 64-bit systems, though the exact width is implementation-defined). So if you look at the other 24 bits of memory next to the 8 bits that b points to, you'll get a bunch of extra bits that weren't initialized. Even the position of *b within i is architecture dependent.
big-endian: **** ****|**** ****|**** ****|0110 0001
little-endian: 0110 0001|**** ****|**** ****|**** ****
When you read an int through the cast pointer, all the asterisks become relevant.
Since a char is 1 Byte long, and an int 4, when you read an int from the address of a single character, you're reading the character and 3 more bytes. The content of these bytes is just whatever happens to lie in memory (pointers, the value of b) and could even be unallocated (resulting in a segmentation fault).
When you cast it to int *, dereferencing it will refer to a total of 4 bytes (the size of an int) in memory.
In the second case, you're treating the same address as if it pointed to an int. Officially, the result is simply undefined behavior.
Realistically, what happens is that whatever happens to be in the four bytes starting at that address gets interpreted as an int. (Four bytes assumes a 32-bit int; if your implementation has, for example, a 64-bit int, it'll be 8 bytes.)

Why does my homespun sizeof operator need a char* cast?

Below is the program to find the size of a structure without using sizeof operator:
struct MyStruct
{
int i;
int j;
};
int main()
{
struct MyStruct *p=0;
int size = ((char*)(p+1))-((char*)p);
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
return 0;
}
Why is typecasting to char * required?
If I don't use the char* pointer, the output is 1 - why?
Because pointer arithmetic works in units of the type pointed to. For example:
int* p_num = malloc(10 * sizeof(int));
int* p_num2 = p_num + 5;
Here, p_num2 does not point five bytes beyond p_num, it points five integers beyond p_num. If on your machine an integer is four bytes wide, the address stored in p_num2 will be twenty bytes beyond that stored in p_num. The reason for this is mainly so that pointers can be indexed like arrays. p_num[5] is exactly equivalent to *(p_num + 5), so it wouldn't make sense for pointer arithmetic to always work in bytes, otherwise p_num[5] would give you some data that started in the middle of the second integer, rather than giving you the sixth integer as you would expect.
In order to move a specific number of bytes beyond a pointer, you need to cast the pointer to point to a type that is guaranteed to be exactly 1 byte wide (a char).
Also, you have an error here:
printf("\nSIZE : [%d]\nSIZE : [%d]\n", size);
You have two format specifiers but only one argument after the format string.
If I don't use the char* pointer, the output is 1 - WHY?
Because operator- obeys the same pointer arithmetic rules that operator+ does. Adding one to the pointer advanced the address by sizeof(struct MyStruct) bytes, but without the cast, the subtraction divides the byte difference by sizeof(struct MyStruct) again, which gives 1.
Why not use the built in sizeof() operator?
Because you want the size of your struct in bytes. And pointer arithmetics implicitly uses type sizes.
int* p;
p + 5; // the address advances by 5 * sizeof(int) bytes
By casting to char* you circumvent this behavior.
Pointer arithmetic is defined in terms of the size of the type of the pointer. This is what allows (for example) the equivalence between pointer arithmetic and array subscripting -- *(ptr+n) is equivalent to ptr[n]. When you subtract two pointers, you get the difference as the number of items they're pointing at. The cast to pointer to char means that it tells you the number of chars between those addresses. Since C makes char and byte essentially equivalent (i.e. a byte is the storage necessary for one char) that's also the number of bytes occupied by the first item.
