Implementation of sizeof operator - c

I have tried implementing the sizeof operator. I did it this way:
#define my_sizeof(x) ((&x + 1) - &x)
But it always ended up giving the result '1' for every data type.
I have then googled it, and I found the following code:
#define my_size(x) ((char *)(&x + 1) - (char *)&x)
The code works once the pointers are cast, and I don't understand why. It also accounts for structure padding correctly.
It is also working for:
#define my_sizeof(x) (unsigned int)(&x + 1) - (unsigned int)(&x)
Can anyone please explain why it works when the pointers are cast?

The result of pointer subtraction is in elements and not in bytes. Thus the first expression evaluates to 1 by definition.
This aside, you really ought to use parentheses in macros:
#define my_sizeof(x) ((&x + 1) - &x)
#define my_sizeof(x) ((char *)(&x + 1) - (char *)&x)
Otherwise attempting to use my_sizeof() in an expression can lead to errors.

The sizeof operator is part of the C (and C++) language specification, and is implemented inside the compiler (the front-end). There is no way to implement it with other C constructs (unless you use GCC extensions like typeof) because it can accept either types or expressions as operand, without making any side-effect (e.g. sizeof((i>1)?i:(1/i)) won't crash when i==0 but your macro my_sizeof would crash with a division by zero). See also C coding guidelines, and wikipedia.
You should understand C pointer arithmetic. See e.g. this question. Pointer difference is expressed in elements not bytes.

#define my_sizeof(x) ((char *)(&x + 1) - (char *)&x)
This my_sizeof() macro will not work in the following cases:
sizeof 1 - 4 byte (for a platform with 4-byte int)
my_sizeof(1) - won't compile at all.
sizeof (int) - 4 byte(for a platform with 4-byte int)
my_sizeof(int) - won't compile code at all.
It will work only for variables. It won't work for data types like int, float, char etc., for literals like 2, 3.4, 'A', etc., nor for rvalue expressions like a+b or foo().

#define my_sizeof(x) ((&x + 1) - &x)
&x gives the address of the variable (let's say double x) declared in the program, and adding 1 to it gives the address where the next variable of x's type could be stored (here addr_of(x) + 8, assuming the size of a double is 8 bytes).
The difference tells us how many variables of x's type can be stored in that amount of memory, which is obviously 1 (because we added 1 and then took the difference).
#define my_size(x) ((char *)(&x + 1) - (char *)&x)
Casting both pointers to char * and taking the difference tells us how many objects of type char fit in that memory space. Since each char occupies exactly 1 byte, the result is the number of bytes between two successive locations of x's type, and hence the amount of memory that x requires.
But you won't be able to pass any literal to this macro and know their size.

But it always ended up giving the result '1' for every data type
Yes, that's how pointer arithmetic works. It works in units of the type being pointed to. So casting to char * works in units of char, which is what you want.

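This will work for both literals and variables. It relies on the __typeof__ extension (GCC and Clang), and the whole expansion is parenthesised so it can be used inside larger expressions.
#define my_sizeof(x) ((char *)(&(((__typeof__(x) *)0)[1])) - (char *)(&(((__typeof__(x) *)0)[0])))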

#define my_sizeof(x) ((&x + 1) - &x)
This is basically (difference between the two addresses) / (size of the data type).
It gives the number of elements of x's type that fit between the two addresses, and that is 1: exactly one full element of that type fits in this memory space.
When we cast the pointers to some other type, the result is the number of elements of that type that fit in this memory space.
#define my_size(x) ((char *)(&x + 1) - (char *)&x)
Typecasting it to (char *) gives you the exact number of bytes of memory because char is of one byte.
#define my_sizeof(x) (unsigned int)(&x + 1) - (unsigned int)(&x)
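This relies on converting a pointer to unsigned int, which is implementation-defined. C compilers typically warn about it, and where pointers are wider than unsigned int (most 64-bit platforms) the address can be truncated; a C++ compiler rejects the cast outright in that case.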

I searched this yesterday, and I found this macro:
#define mysizeof(X) ((X*)0+1)
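It expands X only once (so there is no double evaluation of an expression such as x++), and it has worked fine so far. Note that X must be a type name here, and the result is a pointer value rather than a size_t.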

# define my_sizeof(x) ((&x + 1) - &x)
&x gives the address of your variable, and adding one to it (&x + 1) gives the address where another variable of x's type could be stored.
If we now do arithmetic on these addresses, as in ((&x + 1) - &x), the result says that 1 variable of x's type fits within that address range.
If we cast both addresses to (char *) [because the size of char is 1 byte and incrementing a char * moves by one byte only], then we get the number of bytes that x's type occupies.

#include <cstddef>
#include <iostream>
using namespace std;

//#define mySizeOf(T) (char*)(&T + 1) - (char*)(&T)
// Returns the size in bytes of the argument's type.
// T must be default-constructible, because a temporary of type T is created.
template<class T>
size_t mySizeOf(T)
{
    T temp1;
    return (char*)(&temp1 + 1) - (char*)(&temp1);
}

int main()
{
    int num = 5;
    long numl = 10;
    long long numll = 100;
    unsigned int num_un_sz = 500;
    cout << "size of int=" << mySizeOf(num) << endl;
    cout << "size of long=" << mySizeOf(numl) << endl;
    cout << "size of long long =" << mySizeOf(numll) << endl;
    cout << "size of unsigned int=" << mySizeOf(num_un_sz) << endl;
    return 0;
}

Related

C - Why does the 'address of' operator return an integer value of a variable, regardless of variable type?

I was doing some practice with strings of different types and returning their address so I could understand the concept of pointer arithmetic better.
I noticed that when using the printf function with %p as the conversion specifier, the address would increment by 4 + 1 bytes when using the & operator on the variable, and by 1 byte without it.
Here is an example of my code and its output:
#include <stdio.h>
#include <string.h>

int main ()
{
    char charString_1[] = "Hello";
    /* %p expects a void *, so the pointer arguments are cast explicitly */
    printf("%s\t%s\t %p\t %p\n", charString_1 + 1, charString_1 + 1, (void *)(&charString_1 + 1), (void *)(charString_1 + 1));
    return 0;
}
The output was the following
ello ello 0x7ffe76aba5d0 0x7ffe76aba5cb
Looking at the last two hex digits only, the addresses are 208 and 203 (in decimal) respectively. So the former is a char + int value bigger than the latter. If I increment by two (&charString_1 + 2), the gap is now 2(char + int) = 10 bytes.
I understand this question might be ridiculous, but my search results have turned up nothing. I'm trying to understand how memory works, and become better at finding common faults in buggy code.
When you do arithmetic on a pointer, the 'base unit size' is the size of the object pointed to.
So, for charString_1, which decays to a pointer to a char (size = 1), the + 1 operation adds just one.
However, the expression &charString_1 evaluates as a pointer to an array, which (in your example) has a size of six characters (including the nul terminator), so the + 1 operation on that adds 6.
The difference in values printed by your two %p fields (5) is the difference between those two sizes (6 - 1 = 5). If you change the length of the array (e.g. like char charString_1[] = "Hello, World!";) you will see a corresponding change in the value of &charString_1 + 1.
"+1" will add the size of one of whatever type the compiler determines it is working with. In one case it believes it is working with a char, so it will add one byte (one "sizeof" a char). In the other case, it determines it is working with a pointer, so it will add one "sizeof" a pointer (typically 4 bytes).
(Edit: as the correction by Eric Postpischil points out, the second case actually adds the size of the array, not the size of a pointer.)

difficulty in multidimensional pointers in c?

I have a C program that uses pointers but I am not able to understand the output. Why is the first output 1 and the other 210? They are both pointers to a 3-dimensional array.
I'm not able to find a solution.
#include <stdio.h>

int main() {
    char arr[5][7][6];
    char (*p)[5][7][6] = &arr;
    printf("%d\n", (&arr + 1) - &arr);
    printf("%d\n", (char *)(&arr + 1) - (char *)&arr);
    printf("%d\n", (unsigned)(arr + 1) - (unsigned)arr);
    printf("%d\n", (unsigned)(p + 1) - (unsigned)p);
    return 0;
}
The first output is 1 and the last is 210.
C does pointer arithmetic in units of the pointed-to type.
In (&arr + 1) - &arr, &arr is the address of a char [5][7][6] (an array of 5 arrays of 7 arrays of 6 char). Then &arr +1 is the address of one char [5][7][6] beyond &arr and (&arr + 1) - &arr is distance from &arr to &arr + 1 measured in units of char [5][7][6], so the distance is one unit.
In (char *)(&arr + 1) - (char *)&arr, the two addresses are converted to char *, so the arithmetic is done in units of char. So the result is the distance from &arr to &arr + 1 measured in units of char. Since the distance from &arr to &arr + 1 is one char [5][7][6], it is 5•7•6 char, which is 210 char, so the result is 210.
Incidentals
Do not use %d to print the results of subtracting pointers. When two pointers are subtracted, the type of the result is ptrdiff_t, and it may be printed with %td, as in printf("%td\n", (&arr + 1) - &arr);.
To convert pointers to integers, it is preferable to use uintptr_t, defined in <stdint.h>, rather than unsigned.
To print unsigned values, use %u, not %d.
To print uintptr_t values, include <inttypes.h> and use "%" PRIuPTR, as in printf("%" PRIuPTR "\n", (uintptr_t) (p + 1) - (uintptr_t) p);.
First, it's not safe to use %d to print pointer differences, which have type ptrdiff_t (a signed integer type, roughly the signed counterpart of size_t).
Ignoring that, you have the following declarations:
char arr[5][7][6];
char (*p)[5][7][6] = &arr;
When subtracting two pointers, result is divided by the size of the target (i.e., the inverse of what happens when you add an integer to a pointer, in which case the integer is scaled by the size).
For the first example:
(&arr + 1) - &arr
Here both &arr and &arr + 1 have type char (*)[5][7][6], so the size of what they point to is sizeof(char [5][7][6]). The pointer addition multiplies 1 by this size, and the pointer subtraction divides the difference by this size, canceling it out. So the result is 1, regardless of the target size.
For the second example:
(char *)(&arr + 1) - (char *)&arr
Here the pointer addition again multiplies 1 by sizeof(char [5][7][6]), which is sizeof(char)*5*7*6, i.e. 1*5*7*6 which is 210. But the subtraction divides by sizeof(char) which is 1. So the result is 210.
For the third example:
(unsigned)(arr + 1) - (unsigned)arr
The effect of the unsigned casts is similar to the effect of the char * casts in the previous example. However, in this one two pointers are arr and arr + 1. In this context, the array types "decay" to the pointer types char (*)[7][6]. The size of the pointer target is therefore sizeof(char)*7*6 i.e. 1*7*6 which is 42. So the result is 42.
Finally, for the last example:
(unsigned)(p + 1) - (unsigned)p
Both p and p + 1 have type char (*)[5][7][6], so the target size is 210. The unsigned casts again result in straight address subtraction, with no division applied to the result. So the result is 210.
char (*p)[5][7][6] = &arr;
Here p is a pointer to an array (a char [5][7][6]), not an array of pointers to char.
printf("%d\n", (&arr + 1) - &arr);
The & operator returns an address. You are doing math on addresses, not values, and any pointer plus 1 minus itself results in 1.
(unsigned)p
This casting behavior is not guaranteed and is not safe to do. Also, you are not dereferencing your pointer anywhere.
You should read more about pointers, types, casting and operator precedence before doing this.
I recommend these two videos by Brian Will:
the C language (part 2 of 5)
the C language (part 5 of 5)

what is the size of variable defined in #define macro? [duplicate]

int array[] = {23,34,12,17,204,99,16};
#define TOTAL_ELEMENTS (sizeof(array) / sizeof(array[0]))
printf("%d",sizeof(TOTAL_ELEMENTS));
Here is a piece of my sample code. The array is of integer type, so array[0] is also an integer. Division of an integer by an integer should yield an integer. But when I try to find the size of TOTAL_ELEMENTS using the sizeof() operator, it shows 8 bytes. Why?
Your use of the macro expands to something like sizeof (sizeof ....). Since the result of sizeof is a size_t, you're getting sizeof (size_t), which is evidently 8 on your platform.
You probably wanted
printf("%zu\n", TOTAL_ELEMENTS);
(Note that %d is the wrong conversion specifier for a size_t, and a good compiler will at least warn about your version.)
Here's a complete program that works:
#include <stdio.h>
int main()
{
    int array[] = {23,34,12,17,204,99,16};
    size_t const TOTAL_ELEMENTS = (sizeof array) / (sizeof array[0]);
    printf("%zu\n", TOTAL_ELEMENTS);
}
Output:
7
Note that I made TOTAL_ELEMENTS be an ordinary object, as there's no need for it to be a macro here. You may need it as a macro if you want a version that will substitute the array name, like this:
#define TOTAL_ELEMENTS(a) (sizeof (a) / sizeof (a)[0])
You'd then write:
printf("%zu\n", TOTAL_ELEMENTS(array));
When you use the #define the pre-processor replaces TOTAL_ELEMENTS with (sizeof(array) / sizeof(array[0])).
The result type of (sizeof(array) / sizeof(array[0])) is a size_t. When you use sizeof operator on a size_t it will return its size. In your case 8 bytes.
sizeof returns a size as type size_t, as per 6.5.3.4p5
Your platform's sizeof(size_t) is 8.
sizeof returns the size of an item, in bytes. Since a lot of types are larger than a single byte, if you have an array, one way to determine its size in elements is to take the total size in bytes of the array, then divide it by the size of the type of element it's composed of - and that type will be in the first element of the array, at index 0.
For example, if I have an array of ints, and an int is 4 bytes on my platform, it would look like this in memory
Item 0 | Item 1 | Item 2
4bytes | 4bytes | 4bytes
sizeof(array) would be 12, the array's total size in bytes. sizeof(array)/sizeof(array[0]) would be 3, the array's total size in elements.
You're using the macro wrong in your code. You should be using:
printf("%zu", TOTAL_ELEMENTS);
Otherwise the code expands to sizeof(sizeof ...), which is not what you want if you're actually after the number of elements in the array. sizeof(sizeof(something)) will return the size of a size_t type on your platform, which is why you're seeing 8.

What means of this code in C qsort?

void qsort (void *a, size_t n, size_t es, int (*compare)(const void *, const void *));
where a is the start address of the array, n is the number of elements in the array, and es is the size of one array element.
I read the source code of qsort in C and there is a part I can't understand. The code is as follows.
#define SWAPINT(a,es) swaptype = ((char*)a- (char*)0 % sizeof(long) || \
es % sizeof(long) ? 2: es == sizeof(long)? 0 : 1
I interpret this macro by,
if(((char*)a- (char*)0)% sizeof(long))==1 || es % sizeof(long)==1)
swaptype = 2;
else if(es== sizeof(long))
swaptype = 0;
else
swaptype = 1;
But I don't understand why type conversion is implemented, (char*)a.
And what means of this line?
(char*)a- (char*)0)% sizeof(long)==1
Wherever you found that code, you probably copied it incorrectly. I found some very similar code in libutil from Canu:
c.swaptype = ((char *)a - (char *)0) % sizeof(long) || \
es % sizeof(long) ? 2 : es == sizeof(long)? 0 : 1;
This code was likely illegitimately (because the terms of the copyright license are violated) copied from FreeBSD's libc:
//__FBSDID("$FreeBSD: src/lib/libc/stdlib/qsort.c,v 1.12 2002/09/10 02:04:49 wollman Exp $");
So I'm guessing you got it from a *BSD libc implementation. Indeed, FreeBSD's quicksort implementation contains the SWAPINIT macro (not SWAPINT):
#define SWAPINIT(TYPE, a, es) swaptype_ ## TYPE = \
((char *)a - (char *)0) % sizeof(TYPE) || \
es % sizeof(TYPE) ? 2 : es == sizeof(TYPE) ? 0 : 1;
After parsing, you should find that the above code is roughly the same as
condition_one = ((char *)a - (char *)0) % sizeof(long);
condition_two = es % sizeof(long);
condition_three = es == sizeof(long);
c.swaptype = (condition_one || condition_two) ? 2 : condition_three ? 0 : 1;
Note that condition_two, as a condition, is not the same as es % sizeof(long) == 1, but rather es % sizeof(long) != 0. Aside from that, your translation was correct.
The intent of these conditions seems to be as follows:
condition_one is true when a is not long-aligned.
condition_two is true when es is not a multiple of sizeof(long).
condition_three is true when es is exactly sizeof(long).
As a result,
swaptype == 2 is when you don't have enough guarantees about the elements to be clever about swapping,
swaptype == 1 is intended for arrays with elements that are aligned along long boundaries (note: but not necessarily aligned as longs!), and
swaptype == 0 is intended for arrays that match the previous description and whose elements are also exactly long-sized.
There is explicit type conversion in this case, because a has type void*, for which pointer arithmetic is not defined. However, also note that ((char *)a - (char *)0) is undefined too:
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.
(C11 draft N1570, section 6.5.6, clause 9 on pages 93 and 94.)
It's not exactly spelled out in C11, but the null pointer is not part of the same array as the object pointed to by a, so the basic rules for pointer arithmetic are violated, so the behaviour is undefined.
The macro is trying to check for alignment portably in a language, C, which doesn't really allow for such a test. So we subtract the null pointer from our pointer to obtain an integer, then take it modulo the size of a long. If the result is zero, the data is long-aligned and we can access it as longs. If it is not, we can try some other scheme.
As remarked in the comments, the macro definition you present does not expand to valid C code because it involves computing (char*)0 % sizeof(long), where the left-hand operand of the % has type char *. That is not an integer type, but both operands of % are required to have integer type.
Additionally, the macro's expansion has unbalanced parentheses. That's not inherently wrong, but it makes that macro tricky to use. Furthermore, even where operator precedence yields a sensible result, usage of parentheses and extra whitespace can aid human interpretation of the code, at no penalty to execution speed, and negligible extra compilation cost.
So, I think the desired macro would be more like this:
#define SWAPINT(a,es) swaptype = ( \
((((char*)a - (char*)0) % sizeof(long)) || (es % sizeof(long))) \
? 2 \
: ((es == sizeof(long)) ? 0 : 1) \
)
I'd consider instead writing the penultimate line as
: (es != sizeof(long))
to reduce the complexity of the expression at a slight cost to its comprehensibility. In any event, the intent appears to be to set swaptype to:
2 if a is not aligned on an n-byte boundary, where n is the number of bytes in a long, or if es is not an integer multiple of the size of a long; otherwise
1 if es is unequal to the size of a long; otherwise
0
That's similar, but not identical, to your interpretation. Note, however, that even this code has undefined behavior because of (char*)a - (char*)0. Evaluating that difference has defined behavior only if both pointers point into, or just past the end of, the same object, and (char *)0 does not point (in)to or just past the end of any object.
You asked specifically:
But I don't understand why type conversion is implemented, (char*)a.
That is performed because pointer arithmetic is defined in terms of the pointed-to type, so (1), a conforming program cannot perform arithmetic with a void *, and (2) the code wants the result of the subtraction to be in the same units as the result of the sizeof operator (bytes).
And what means of this line?
(char*)a- (char*)0)% sizeof(long)==1
That line does not appear in the macro you presented, and it is not a complete expression because of unbalanced parentheses. It appears to be trying to determine whether a points one past an n-byte boundary, where n is as defined above, but again, evaluating the pointer difference has undefined behavior. Note also that for an integer x, x % sizeof(long) == 1 evaluated in boolean context has different meaning than x % sizeof(long) evaluated in the same context. The latter makes more sense in the context you described.

Casting and adding pointers to types of different size

Suppose I have following code snippet:
int8_t *a = 1;
int16_t *b = (int16_t*)(a + 1);
int32_t *c = (int32_t*)b + 2;
Then a = 1, b = 2, c = 10.
(Here I am not sure either, because I used printf() with %i and I got a warning about this.)
I am not quite sure how this works. I have some theories, but I prefer to read some documentation about it.
Can someone give me a keyword to search for, or explain the exact behaviour in these three cases to me? I wasn't able to find information on this matter on SO or Google for lack of a word to search for.
Will the output change when I type
int16_t *a = 1;
int32_t *b = (int16_t*)(a + 1);
int64_t *c = (int32_t*)b + 2;
instead?
I think your whole program is undefined behaviour, because I'm not sure if it is valid to put arbitrary values into pointer variables or to output them with %i.
That said, I think most environments are ok with it, so I think I can start to explain.
If a is 1, it (misalignedly) points to the memory address 1.
Then you add 1, so that it points to 2, and cast the result to fit into b.
After that, you cast b to int32_t * and add 2, so effectively you add 2*4 and thus make c point to 10 (0xA).
If you make the changes you describe, a points to 1 (and 2, as it has 16 bits), adding 1 makes b point to 3 (and 4) (the cast is not needed there), and c will point to 3+2*4 = 11.
Rather than a specific question like this you should probably try to gain more understanding of the language itself (pointers, dereference etc) http://www.cplusplus.com/doc/tutorial/pointers/
Assigning a value to a pointer is like saying this pointer represents the memory at address (value).
int8_t * a = 1; // a is memory at address 1
int16_t *b = (int16_t*)(a + 1); // b is memory at address (a + 1)... 2
int32_t *c = (int32_t*)b + 2; // c is memory at address 2 + (2 * sizeof(int32_t))... 10

Resources