A very frequently executed piece of code contains the following calculation:
long *lp;
char *ep, *cp;
...
tlen = (ep - cp) / sizeof (*lp);
Would changing this to:
long *lp;
char *ep, *cp;
...
tlen = (ep - cp) / sizeof (long);
result in any more efficiency (since the sizeof is calculated at compile time), or would a modern compiler handle this at compile time already? What does gcc do?
The sizeof operator is always a compile-time evaluated construct [0], so there is no difference.
The fragment ...
tlen = (ep - cp) / sizeof (*lp);
will therefore be transformed into something not unlike ...
tlen = (ep - cp) / 4;
(assuming that sizeof(long) == 4 [1]); with optimizations applied, the next transformation is probably ...
tlen = (ep - cp) >> 2;
More optimizations will follow, of course; it's just a demonstration of a possible consequence of it being a compile-time construct [0].
I would always prefer sizeof(_var-name_) over sizeof(_typename_), as it's more generic and doesn't require manual adjustment when you change the type of the variable (except when you change from an array to a pointer).
[0]: Except for variable length arrays.
[1]: The size differs with the platform.
sizeof() is always calculated at compile-time, so there's no difference.
You can dispense with the division altogether by writing
tlen = ((long*)ep - (long*)cp);
I'm not sure whether this implementation would be more efficient, though. My little experiment was inconclusive. Test!
Edit: And as mentioned in the comments, it works only if the pointers actually point to longs (or to memory locations fit to hold longs). But if they didn't in the original code, the original result wouldn't make sense either, so I presumed that they are.
It would not result in a performance difference, but the two spellings can diverge depending on the platform and the declared type. E.g. on Win x64, sizeof(long) is 4; if lp were later changed to point to an 8-byte type such as long long, sizeof(*lp) would become 8 while sizeof(long) would silently stay 4.
Why I need to figure out the smallest type of a literal (Backstory)
I've written a set of macros to create and use FIFOs. Macros allow for a generic, yet still very fast, implementation on all systems with static memory allocation, such as small embedded systems. The folks over at Code Review did not have any major concerns with my implementation either.
The data is put into anonymous structs, and all data is accessed through the identifier of that struct. Currently the function-like macros to create these structs look like this:
#define _fff_create(_type, _depth, _id) \
struct {uint8_t read; uint8_t write; _type data[_depth];} _id = {0,0,{}}
#define _fff_create_deep(_type, _depth, _id) \
struct {uint16_t read; uint16_t write; _type data[_depth];} _id = {0,0,{}}
What I'm looking for
Now I'd like to merge both of these into one macro. To do this I have to figure out, at compile time, the minimum size required for read and write to index _depth elements. Parameter names starting with _ indicate that only a literal or a #define value may be passed; both are known at compile time.
Thus I hope to find a macro typeof_literal(arg) which returns uint8_t if arg < 256 and uint16_t otherwise.
What I've tried
GCC 4.9.2 offers an extension called typeof(). However, when used with any literal it returns an int type, which is two bytes on my system.
Another feature of GCC 4.9.2 is the compound statement expression: typeof(({uint8_t u8 = 1; u8;})) will correctly return uint8_t. However, I could not figure out a way to put a condition for the type in that block:
typeof(({uint8_t u8 = 1; uint16_t u16 = 1; input ? u8 : u16;})) always returns uint16_t because of the type promotion performed by the ?: operator.
if(...) can't be used either, as any statement inside it happens in a "lower" block scope.
Macros can't contain #if, which makes them unusable for this comparison, too.
Can't you just leave it like that?
I realize there might not be a solution to this problem. That's OK too; the current code is just a minor inconvenience. Yet I'd like to know if there's a tricky way around this; a solution could open up new possibilities for macros in general. If you are sure this is impossible, please explain why.
I think the building block you are looking for is __builtin_choose_expr, which is a lot like the ternary operator, but does not convert its result to a common type. With
#define CHOICE(x) __builtin_choose_expr (x, (int) 1, (short) 2)
this
printf ("%zu %zu\n", sizeof (CHOICE (0)), sizeof (CHOICE (1)));
will print
2 4
as expected.
However, as Greg Hewgill points out, C++ has better facilities for that (but they are still difficult to use).
The macro I was looking for can indeed be written with __builtin_choose_expr as Florian suggested. My solution is attached below, it has been tested and is confirmed working. Use it as you wish!
#define typeof_literal(_literal) \
typeof(__builtin_choose_expr((_literal)>0, \
__builtin_choose_expr((_literal)<=UINT8_MAX, (uint8_t) 0, \
__builtin_choose_expr((_literal)<=UINT16_MAX, (uint16_t) 0, \
__builtin_choose_expr((_literal)<=UINT32_MAX, (uint32_t) 0, (uint64_t) 0))), \
__builtin_choose_expr((_literal)>=INT8_MIN, (int8_t) 0, \
__builtin_choose_expr((_literal)>=INT16_MIN, (int16_t) 0, \
__builtin_choose_expr((_literal)>=INT32_MIN, (int32_t) 0, (int64_t) 0)))))
void qsort(void *a, size_t n, size_t es, int (*compare)(const void *, const void *));
where a is the start address of the array, n is the number of elements, and es is the size of each array element.
I'm reading the source code of a C qsort implementation that I can't understand. The code is as follows:
#define SWAPINT(a,es) swaptype = ((char*)a- (char*)0 % sizeof(long) || \
es % sizeof(long) ? 2: es == sizeof(long)? 0 : 1
I interpret this macro as:
if(((char*)a- (char*)0)% sizeof(long))==1 || es % sizeof(long)==1)
swaptype = 2;
else if(es== sizeof(long))
swaptype = 0;
else
swaptype = 1;
But I don't understand why the cast (char*)a is performed.
And what does this line mean?
(char*)a- (char*)0)% sizeof(long)==1
Wherever you found that code, you probably copied it incorrectly. I found some very similar code in libutil from Canu:
c.swaptype = ((char *)a - (char *)0) % sizeof(long) || \
es % sizeof(long) ? 2 : es == sizeof(long)? 0 : 1;
This code was likely copied illegitimately (because the terms of the copyright license were violated) from FreeBSD's libc:
//__FBSDID("$FreeBSD: src/lib/libc/stdlib/qsort.c,v 1.12 2002/09/10 02:04:49 wollman Exp $");
So I'm guessing you got it from a *BSD libc implementation. Indeed, FreeBSD's quicksort implementation contains the SWAPINIT macro (not SWAPINT):
#define SWAPINIT(TYPE, a, es) swaptype_ ## TYPE = \
((char *)a - (char *)0) % sizeof(TYPE) || \
es % sizeof(TYPE) ? 2 : es == sizeof(TYPE) ? 0 : 1;
After parsing, you should find that the above code is roughly the same as
condition_one = ((char *)a - (char *)0) % sizeof(long);
condition_two = es % sizeof(long);
condition_three = es == sizeof(long);
c.swaptype = (condition_one || condition_two) ? 2 : condition_three ? 0 : 1;
Note that condition_two, as a condition, is not the same as es % sizeof(long) == 1, but rather es % sizeof(long) != 0. Aside from that, your translation was correct.
The intent of these conditions seems to be as follows:
condition_one is true when a is not long-aligned.
condition_two is true when es is not a multiple of the size of a long.
condition_three is true when es is exactly the size of a long.
As a result,
swaptype == 2 is when you don't have enough guarantees about the elements to be clever about swapping,
swaptype == 1 is intended for arrays with elements that are aligned along long boundaries (note: but not necessarily aligned as longs!), and
swaptype == 0 is intended for arrays that match the previous description, that also have elements that are also long-sized.
There is an explicit type conversion here because a has type void *, for which pointer arithmetic is undefined. However, also note that ((char *)a - (char *)0) is undefined too:
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.
(C11 draft N1570, section 6.5.6, clause 9 on pages 93 and 94.)
It's not exactly spelled out in C11, but the null pointer is not part of the same array as the object pointed to by a, so the basic rules for pointer arithmetic are violated and the behaviour is undefined.
The macro is trying to check for alignment portably in a language, C, which doesn't really allow for such a test. So we subtract the null pointer from our pointer to obtain an integer, then take the remainder modulo the size of a long. If the result is zero, the data is long-aligned and we can access it as longs. If it is not, we can try some other scheme.
As remarked in the comments, the macro definition you present does not expand to valid C code because it involves computing (char*)0 % sizeof(long), where the left-hand operand of the % has type char *. That is not an integer type, but both operands of % are required to have integer type.
Additionally, the macro's expansion has unbalanced parentheses. That's not inherently wrong, but it makes that macro tricky to use. Furthermore, even where operator precedence yields a sensible result, usage of parentheses and extra whitespace can aid human interpretation of the code, at no penalty to execution speed, and negligible extra compilation cost.
So, I think the desired macro would be more like this:
#define SWAPINT(a, es) swaptype = ( \
    ((((char *)(a) - (char *)0) % sizeof(long)) || ((es) % sizeof(long))) \
        ? 2 \
        : (((es) == sizeof(long)) ? 0 : 1) \
)
I'd consider instead writing the penultimate line as
: (es != sizeof(long))
to reduce the complexity of the expression at a slight cost to its comprehensibility. In any event, the intent appears to be to set swaptype to:
2 if a is not aligned on an n-byte boundary, where n is the number of bytes in a long, or if es is not an integer multiple of the size of a long; otherwise
1 if es is unequal to the size of a long; otherwise
0
That's similar, but not identical, to your interpretation. Note, however, that even this code has undefined behavior because of (char*)a - (char*)0. Evaluating that difference has defined behavior only if both pointers point into, or just past the end of, the same object, and (char *)0 does not point (in)to or just past the end of any object.
You asked specifically:
But I don't understand why the cast (char*)a is performed.
That is performed because pointer arithmetic is defined in terms of the pointed-to type, so (1), a conforming program cannot perform arithmetic with a void *, and (2) the code wants the result of the subtraction to be in the same units as the result of the sizeof operator (bytes).
And what does this line mean?
(char*)a- (char*)0)% sizeof(long)==1
That line does not appear in the macro you presented, and it is not a complete expression because of unbalanced parentheses. It appears to be trying to determine whether a points one past an n-byte boundary, where n is as defined above, but again, evaluating the pointer difference has undefined behavior. Note also that for an integer x, x % sizeof(long) == 1 evaluated in boolean context has different meaning than x % sizeof(long) evaluated in the same context. The latter makes more sense in the context you described.
My guess would be that in C89 version 1 is faster, because sizeof is a compile-time operator, so we will be comparing with a constant. But in C99 we can take the sizeof of a VLA, so sizeof can be a run-time operator.
So which one is faster in C99?
And which one is faster in C89?
One define and array for both of them:
#define NUM_ROWS(x) (int) (sizeof(x) / sizeof((x)[0]))
int x[5] = { 0 };
Version 1:
int i;
for (i = 0; i < NUM_ROWS(x); i++) {
// code
}
Version 2:
const int length = NUM_ROWS(x);
int i;
for (i = 0; i < length; i++) {
// code
}
The only true answer to what is faster is: measure.
That said, in version 1 you evaluate the end condition at each iteration of your loop, while in version 2 you evaluate it only once.
Even if sizeof is a constant, if your compiler can put the constant value directly in a register for comparison in version 1, it can probably do the same in version 2.
So version 2 is, in theory, either faster or at worst the same speed as version 1 (most likely the same for a constant expression).
sizeof is only evaluated at run-time if a VLA is part of the expression.
Since that's not the case, it'll be just a compile time constant, and you will get the same performance.
Recently, I wrote some code to compare pointers like this:
if(p1+len < p2)
however, some staff said that I should write like this:
if(p2-p1 > len)
to be safe.
Here, p1 and p2 are char * pointers, and len is an integer.
I have no idea about that. Is that right?
EDIT1: Of course, p1 and p2 point to the same memory object at the beginning.
EDIT2: Just a minute ago, I found the bug behind this question in my code (about 3K lines): len is so big that p1+len doesn't fit in the 4 bytes of a pointer, so p1+len < p2 is true. But it shouldn't be, so I think we should compare pointers like this in some situations:
if(p2 < p1 || (uint32_t)p2-p1 > (uint32_t)len)
In general, you can only safely compare pointers if they're both pointing to parts of the same memory object (or one position past the end of the object). When p1, p1 + len, and p2 all conform to this rule, both of your if-tests are equivalent, so you needn't worry. On the other hand, if only p1 and p2 are known to conform to this rule, and p1 + len might be too far past the end, only if(p2-p1 > len) is safe. (But I can't imagine that's the case for you. I assume that p1 points to the beginning of some memory-block, and p1 + len points to the position after the end of it, right?)
What they may have been thinking of is integer arithmetic: if it's possible that i1 + i2 will overflow, but you know that i3 - i1 will not, then i1 + i2 < i3 could either wrap around (if they're unsigned integers) or trigger undefined behavior (if they're signed integers) or both (if your system happens to perform wraparound for signed-integer overflow), whereas i3 - i1 > i2 will not have that problem.
Edited to add: In a comment, you write "len is a value from buff, so it may be anything". In that case, they are quite right, and p2 - p1 > len is safer, since p1 + len may not be valid.
"Undefined behavior" applies here. You cannot compare two pointers unless they both point to the same object or to the first element after the end of that object. Here is an example:
void func(int len)
{
char array[10];
char *p = &array[0], *q = &array[10];
if (p + len <= q)
puts("OK");
}
You might think about the function like this:
// if (p + len <= q)
// if (array + 0 + len <= array + 10)
// if (0 + len <= 10)
// if (len <= 10)
void func(int len)
{
if (len <= 10)
puts("OK");
}
However, the compiler knows that p + len <= q is true for all valid values of p + len, so it might optimize the function to this:
void func(int len)
{
puts("OK");
}
Much faster! But not what you intended.
Yes, there are compilers that exist in the wild that do this.
Conclusion
This is the only safe version: subtract the pointers and compare the result, don't compare the pointers.
if (len <= q - p)
Technically, p1 and p2 must be pointers into the same array (or one past its end). If they are not in the same array, the behaviour is undefined.
For the addition version, the type of len can be any integer type.
For the difference version, the result of the subtraction is ptrdiff_t, but any integer type will be converted appropriately.
Within those constraints, you can write the code either way; neither is more correct. In part, it depends on what problem you're solving. If the question is 'are these two elements of the array more than len elements apart', then subtraction is appropriate. If the question is 'is p2 the same element as p1[len] (aka p1 + len)', then the addition is appropriate.
In practice, on many machines with a uniform address space, you can get away with subtracting pointers to disparate arrays, but you might get some funny effects. For example, if the pointers are pointers to some structure type, but not parts of the same array, then the difference between the pointers treated as byte addresses may not be a multiple of the structure size. This may lead to peculiar problems. If they're pointers into the same array, there won't be a problem like that — that's why the restriction is in place.
The existing answers show why if (p2-p1 > len) is better than if (p1+len < p2), but there's still a gotcha with it -- if p2 happens to point BEFORE p1 in the buffer and len is an unsigned type (such as size_t), then p2-p1 will be negative, but will be converted to a large unsigned value for comparison with the unsigned len, so the result will probably be true, which may not be what you want.
So you might actually need something like if (p1 <= p2 && p2 - p1 > len) for full safety.
As Dietrich already said, comparing unrelated pointers is dangerous, and could be considered as undefined behavior.
Given that two pointers are within the range 0 to 2GB (on a 32-bit Windows system), subtracting the 2 pointers will give you a value between -2^31 and +2^31. This is exactly the domain of a signed 32-bit integer. So in this case it does seem to make sense to subtract two pointers because the result will always be within the domain you would expect.
However, if the LargeAddressAware flag is enabled in your executable (this is Windows-specific, don't know about Unix), then your application will have an address space of 3GB (when run in 32-bit Windows with the /3G flag) or even 4GB (when run on a 64-bit Windows system).
If you then start to subtract two pointers, the result could be outside the domain of a 32-bit integer, and your comparison will fail.
I think this is one of the reasons why the address space was originally divided into 2 equal parts of 2GB, and the LargeAddressAware flag is still optional. However, my impression is that current software (your own software and the DLLs you're using) is quite safe (nobody subtracts pointers anymore, do they?), and my own application has the LargeAddressAware flag turned on by default.
Neither variant is safe if an attacker controls your inputs
The expression p1 + len < p2 compiles down to something like p1 + sizeof(*p1)*len < p2, and the scaling with the size of the pointed-to type can overflow your pointer:
int *p1 = (int*)0xc0ffeec0ffee0000;
int *p2 = (int*)0xc0ffeec0ffee0400;
int len = 0x4000000000000000;
if(p1 + len < p2) {
printf("pwnd!\n");
}
When len is multiplied by the size of int, it overflows to 0, so the condition is evaluated as if(p1 + 0 < p2). This is obviously true, and the following code is executed with a much too large length value.
OK, so what about p2-p1 < len? Same thing; overflow kills you:
char *p1 = (char*)0x0123456789012345;
char *p2 = (char*)0xa123456789012345;
int len = 1;
if(p2-p1 < len) {
printf("pwnd!\n");
}
In this case, the difference between the pointers is evaluated as p2-p1 = 0xa000000000000000, which is interpreted as a negative signed value. As such, it compares smaller than len, and the following code is executed with a much too low len value (or a much too large pointer difference).
The only approach that I know is safe in the presence of attacker-controlled values, is to use unsigned arithmetic:
if(p1 < p2 &&
((uintptr_t)p2 - (uintptr_t)p1)/sizeof(*p1) < (uintptr_t)len
) {
printf("safe\n");
}
The p1 < p2 guarantees that p2 - p1 cannot yield a genuinely negative value. The second clause performs the actions of p2 - p1 < len while forcing use of unsigned arithmetic in a non-UB way. I.e. (uintptr_t)p2 - (uintptr_t)p1 gives exactly the count of bytes between the bigger p2 and the smaller p1, no matter the values involved.
Of course, you don't want to see such comparisons in your code unless you know that you need to defend against determined attackers. Unfortunately, it's the only way to be safe, and if you rely on either form given in the question, you open yourself up to attacks.
I have tried implementing the sizeof operator. I did it this way:
#define my_sizeof(x) ((&x + 1) - &x)
But it always ended up giving the result '1' for every data type.
I have then googled it, and I found the following code:
#define my_size(x) ((char *)(&x + 1) - (char *)&x)
And the code works when it is typecast, which I don't understand. It also reports the size of a padded structure correctly.
It also works for:
#define my_sizeof(x) (unsigned int)(&x + 1) - (unsigned int)(&x)
Can anyone please explain how it works when typecast?
The result of pointer subtraction is in elements and not in bytes. Thus the first expression evaluates to 1 by definition.
This aside, you really ought to use parentheses in macros:
#define my_sizeof(x) ((&x + 1) - &x)
#define my_sizeof(x) ((char *)(&x + 1) - (char *)&x)
Otherwise attempting to use my_sizeof() in an expression can lead to errors.
The sizeof operator is part of the C (and C++) language specification, and is implemented inside the compiler (the front-end). There is no way to implement it with other C constructs (unless you use GCC extensions like typeof) because it can accept either types or expressions as operand, without making any side-effect (e.g. sizeof((i>1)?i:(1/i)) won't crash when i==0 but your macro my_sizeof would crash with a division by zero). See also C coding guidelines, and wikipedia.
You should understand C pointer arithmetic. See e.g. this question. Pointer difference is expressed in elements not bytes.
#define my_sizeof(x) ((char *)(&x + 1) - (char *)&x)
This my_sizeof() macro will not work in the following cases:
sizeof 1 - 4 bytes (on a platform with a 4-byte int)
my_sizeof(1) - won't compile at all.
sizeof (int) - 4 bytes (on a platform with a 4-byte int)
my_sizeof(int) - won't compile at all.
It will work only for variables. It won't work for data types like int, float, char etc., for literals like 2, 3.4, 'A', etc., nor for rvalue expressions like a+b or foo().
#define my_sizeof(x) ((&x + 1) - &x)
&x gives the address of the variable (let's say double x) declared in the program, and incrementing it by 1 gives the address where the next variable of the type of x could be stored (here addr_of(x) + 8, since the size of a double is 8 bytes).
The difference tells you how many variables of the type of x can be stored in that amount of memory, which is obviously 1 for the type of x (since we incremented by 1 and took the difference).
#define my_size(x) ((char *)(&x + 1) - (char *)&x)
Typecasting to char * and taking the difference tells us how many variables of type char can be stored in the given memory space (the difference). Since each char requires only 1 byte of memory, (amount of memory)/1 gives the number of bytes between two successive memory locations of the type of the variable passed to the macro, and hence the amount of memory a variable of type x requires.
But you won't be able to pass a literal to this macro to learn its size.
But it always ended up giving the result '1' for every data type
Yes, that's how pointer arithmetic works: it works in units of the type being pointed to. So casting to char * makes it work in units of char, which is what you want.
This will work for both literals and variables:
#define my_sizeof(x) ((char *)(&((__typeof__(x) *)0)[1]) - (char *)(&((__typeof__(x) *)0)[0]))
#define my_sizeof(x) ((&x + 1) - &x)
This is basically (difference of two addresses) / (size of the data type).
It gives you the number of elements of the type of x that can be stored in this memory space, and that is 1: you can fit exactly one x element there.
When we typecast it to some other data type, it represents how many elements of that data type can be stored in this memory space.
#define my_size(x) ((char *)(&x + 1) - (char *)&x)
Typecasting it to (char *) gives you the exact number of bytes of memory, because a char occupies one byte.
#define my_sizeof(x) (unsigned int)(&x + 1) - (unsigned int)(&x)
This one is not portable: casting a pointer to unsigned int truncates the pointer on platforms where pointers are wider than int, and compilers will typically warn about it.
I searched for this yesterday, and I found this macro (with a size_t cast added, since the pointer value by itself is not a number):
#define mysizeof(X) ((size_t)((X *)0 + 1))
It expands X only once (X is a type here, so there's no danger of double evaluation of an expression like x++), and it has worked fine for me so far.
#define my_sizeof(x) ((&x + 1) - &x)
&x gives the address of your variable, and incrementing it by one (&x + 1) gives the address where another variable of the type of x could be stored.
Now if we do arithmetic over these addresses, like ((&x + 1) - &x), it tells us that within this address range 1 variable of type x can be stored.
Now, if we typecast those addresses to (char *) [because the size of char is 1 byte, and incrementing a char * moves by exactly one byte], we get the number of bytes the type of x consumes.
#include <iostream>
using namespace std;

// #define mySizeOf(T) ((char *)(&T + 1) - (char *)(&T))

template<class T>
size_t mySizeOf(T)
{
    T temp1;
    return (char *)(&temp1 + 1) - (char *)(&temp1);
}

int main()
{
    int num = 5;
    long numl = 10;
    long long numll = 100;
    unsigned int num_un_sz = 500;
    cout << "size of int = " << mySizeOf(num) << endl;
    cout << "size of long = " << mySizeOf(numl) << endl;
    cout << "size of long long = " << mySizeOf(numll) << endl;
    cout << "size of unsigned int = " << mySizeOf(num_un_sz) << endl;
    return 0;
}