Why is the last argument of _mm_permute_ps an int? - c

GCC kindly informed me that the last argument of the SIMD intrinsic _mm_permute_ps must be an 8-bit immediate. Why then is its last argument declared as expecting an int?
__m128 _mm_permute_ps(__m128 a, int imm8);
__m256d _mm256_permute_pd(__m256d a, int imm8);
Would an 8-bit type parameter not provide a more helpful interface to the end user?

It is consistent with all the other intrinsics taking a shuffle vector or immediate argument. Probably to indicate that it is an integer and not a character, while avoiding a dependency on <stdint.h> for int8_t.
The funnier part, from a C++ point of view, is that it isn't constexpr, so you can give it non-compile-time arguments, which then causes fun stuff for the compiler. I once tried improving the intrinsics for GCC in a way that assumed the immediate argument was compile-time, and it broke a surprising amount of code.
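To illustrate, here is a minimal sketch of how the immediate is normally supplied; reverse_lanes is a made-up name, _MM_SHUFFLE is the usual helper macro, and AVX support (e.g. -mavx) is assumed:

#include <immintrin.h>

/* Requires AVX; the immediate must be a compile-time constant. */
__m128 reverse_lanes(__m128 v)
{
    /* _MM_SHUFFLE packs four 2-bit lane indices into one 8-bit value;
       _MM_SHUFFLE(0, 1, 2, 3) selects source lanes 3, 2, 1, 0, i.e. it
       reverses the vector. */
    return _mm_permute_ps(v, _MM_SHUFFLE(0, 1, 2, 3));
}

/* Passing a runtime variable instead, e.g.
       __m128 bad(__m128 v, int sel) { return _mm_permute_ps(v, sel); }
   is typically rejected by GCC with exactly the "must be an 8-bit
   immediate" diagnostic mentioned above. */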

Related

printf with unmatched format and parameters

I'm trying to understand the printf function.
I know, from reading about this function, that the C compiler automatically promotes all arguments smaller than int (such as chars and shorts) to int.
I also know that long long int (8 bytes) is not converted and is pushed onto the stack as it is.
So I wrote this simple C program:
#include <stdio.h>
int main()
{
    long long int a = 0x4444444443434343LL;
    // note that 0x44444444 is 4 times 0x44, which is 'D' in ASCII,
    // and 0x43434343 is 4 times 0x43, which is 'C' in ASCII.
    printf("%c %c\n", a);
    return 0;
}
That creates the variable a, whose size is 8 bytes, and pushes it onto the stack.
I also know that printf loops through the format string, and when it sees %c it will increment its argument pointer by 4 (because it knows that a char was promoted to int - example below)
something like:
char c = (char) va_arg(list, int) -->
(*(int *)((pointer += sizeof(int)) - sizeof(int)))
as you can see, it reads the 4 bytes where the pointer points, and increments the pointer by 4.
My question is:
by my logic, it should print C D on little-endian machines.
That is not what happens, and I am asking why. I'm sure some of you know more than I do about the implementation, and that's why I ask this question.
EDIT: the actual result is C with some garbage character following it.
I know some might say that it's undefined behavior; it really depends on the implementation, and I just want to know the logic of the implementation.
Your logic would have explained the behavior of early C compilers in the 70s and 80s. Newer ABIs use a variety of methods to pass arguments to functions, including variable-argument functions. You have to study your system's ABI to understand how parameters are passed in your case; inferring from constructions that have explicitly undefined behavior does not help.
By the way, types shorter than int are not cast, they are promoted to int. Note that float values are converted to double when passed to variable-argument functions. Non-integer types and integer types larger than int are passed according to the ABI, which means they may be passed in regular registers or even special registers, not necessarily on the stack.
printf relies on macros defined in <stdarg.h> to hide these implementation details, and thus can be written in a portable manner for architectures with different ABIs and different standard type sizes.
There is a fundamental misunderstanding here, as revealed by the comment
according to the format string here the compiler should know that 4 bytes were pushed, convert 4 bytes to char and print it...
But the problem is that there is no rule saying that C uses a single, byte-addressed stack for everything.
Different processor architectures can -- and do -- use a variety of techniques for passing arguments to functions. Some arguments may be passed on a conventional stack, but others may be passed in registers, or via other techniques. Arguments of different types may be passed in different types of registers (32 vs. 64 bit, integer vs. floating point, etc.).
Obviously a C compiler has to know how to properly pass arguments for the platform it's compiling for. Obviously a variadic function like printf has to be carefully written to fetch its variable arguments correctly, based on the platform it's being used on. But a format specifier like %d does not, repeat not, simply mean "pop 4 bytes from the stack and treat them as an int". Similarly, %c does not mean "pop 4 bytes and print the resulting integer as a character". When printf encounters the format specifier %c or %d, it needs to arrange to fetch the next argument of type int, whatever it takes to do that. And if, in fact, the next argument actually passed by the calling code was not of type int -- for example if, as here, the next argument was actually of type long long int -- there's just no way of knowing in general what might happen.
Specifically, when printf has just seen a %d or %c specifier, what it does internally is the equivalent of calling
va_arg(argp, int)
And this literally says, "fetch the next argument of type int". And then it's actually up to the author of va_arg (and the rest of the functions and macros declared in <stdarg.h>) to know exactly what it takes to fetch the next argument of type int on this particular platform.
Clearly it is possible to know what will actually happen on a particular platform. (Obviously the author of va_arg had to know!) But you won't figure it out based on the C language itself, or by making guesses about what you think ought to happen. You're going to have to read about the ABI -- the Application Binary Interface -- that specifies the details of function calling conventions on your platform. These details can be hard to find, because very few programmers actually care about them.
I said that "printf has to be carefully written to fetch its variable arguments correctly", but actually I misspoke slightly, because as I said later, "it's actually up to the author of va_arg to know exactly what it takes". You're right, it is possible to write a reasonably portable implementation of printf. There's an example in the C FAQ list.
If you want to know more about function calling conventions, another interesting topic to read about is Foreign Function Interfaces or FFI. (For example, there's another library libffi that helps you to -- portably! -- perform some more exotic tasks involved in manipulating function arguments.)
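To make the va_arg point concrete, here is a minimal sketch of a printf-like function that handles only %c and %d; mini_printf is a made-up name, and real implementations do far more, but note that nothing here assumes a stack:

#include <stdarg.h>
#include <stdio.h>

/* Minimal sketch: only %c and %d, no error handling. */
void mini_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    for (const char *p = fmt; *p; p++) {
        if (p[0] == '%' && p[1] == 'c') {
            putchar(va_arg(ap, int));   /* char args arrive promoted to int */
            p++;
        } else if (p[0] == '%' && p[1] == 'd') {
            printf("%d", va_arg(ap, int));
            p++;
        } else {
            putchar(*p);
        }
    }
    va_end(ap);
}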
There are simply too many types
C specifies 11 integer types, signed char, char, … unsigned long long, as distinct types. Aside from the requirement that char behave like either signed char or unsigned char, these could be implemented as 10 different encodings or as just 2 (use 64-bit signed or unsigned for all).
The standard library has printf() specifiers for each of those 11. (Due to sub-int promotions, there are additional concerns.)
So far, no real issues.
Yet C has lots of other types with printf() specifiers:
%ju          uintmax_t
%jd          intmax_t
%zu          size_t
%td          ptrdiff_t
PRIdLEASTN   int_leastN_t, where N is 8, 16, 32, 64
PRIuLEASTN   uint_leastN_t
PRIdN        intN_t
PRIuN        uintN_t
... and many others
In general [1], these additional types could be distinct from, or compatible with, the 11 above.
Any time code uses these other types in a printf() call, the distinct/compatible issue arises and prevents many compilers from detecting a mismatch or suggesting the best matching print specifier.
[1] Various conditions/limitations exist.
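As a small illustration, the portable way to print some of these types is via the <inttypes.h> macros and the z/t length modifiers; a minimal sketch:

#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
    int64_t   big  = -42;
    size_t    n    = sizeof big;
    ptrdiff_t diff = 3;

    printf("%" PRId64 "\n", big);    /* int64_t   */
    printf("%zu\n", n);              /* size_t    */
    printf("%td\n", diff);           /* ptrdiff_t */
    return 0;
}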

Convert __m256d to __m256i

Since casts like this:
__m256d a;
uint64_t t[4];
_mm256_store_si256( (__m256i*)t, (__m256i)a );/* Cast of 'a' to __m256i not allowed */
are not allowed when compiling under Visual Studio, I thought I could use some intrinsic functions to convert a __m256d value into a __m256i before passing it to _mm256_store_si256, and thus avoid the cast that causes the error.
But after looking at that list, I couldn't find a function that takes a __m256d value as argument and returns an __m256i value. So maybe you could help me write my own function, or find the one I'm looking for: a function that stores the bit patterns of 4x 64-bit doubles to an array of 4x 64-bit integers.
EDIT:
After further research, I found _mm256_cvtpd_epi64, which seems to be exactly what I want. But my CPU doesn't support the AVX512 instruction set...
What is left for me to do here?
You could use _mm256_store_pd( (double*)t, a). I'm pretty sure this is strict-aliasing safe because you're not directly dereferencing the pointer after casting it. The _mm256_store_pd intrinsic wraps the store with any necessary may-alias stuff.
(With AVX512, Intel switched to using void* for the load/store intrinsics instead of float*, double*, or __m512i*, to remove the need for these clunky casts and make it more clear that intrinsics can alias anything.)
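A minimal sketch of that first option, assuming a 32-byte-aligned destination (store_bits_to_u64 is just an illustrative name):

#include <immintrin.h>
#include <stdint.h>

/* Store the raw bit patterns of the 4 doubles into 4 uint64_t slots.
   t must be 32-byte aligned (e.g. declared alignas(32)). */
void store_bits_to_u64(__m256d a, uint64_t *t)
{
    _mm256_store_pd((double *)t, a);   /* pointer cast only, no dereference */
}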
The other option is to _mm256_castpd_si256 to reinterpret the bits of your __m256d as a __m256i:
alignas(32) uint64_t t[4];
_mm256_store_si256( (__m256i*)t, _mm256_castpd_si256(a));
If you read from t[] right away, your compiler might optimize away the store/reload and just shuffle or pextrq rax, xmm0, 1 to extract FP bit patterns directly into integer registers. You could write this manually with intrinsics. Store/reload is not bad, though, especially if you want more than 1 of the double bit-patterns as scalar integers.
You could instead use union m256_elements { uint64_t u64[4]; __m256d vecd; };, but there's no guarantee that will compile efficiently.
The _mm256_castpd_si256 cast compiles to zero asm instructions, i.e. it's just a type-pun to keep the C compiler happy.
If you wanted to actually round packed double to the nearest signed or unsigned 64-bit integer and have the result in 2's complement or unsigned binary instead of IEEE754 binary64, you need AVX512F _mm256/512_cvtpd_epi64 (vcvtpd2qq) for it to be efficient. SSE2 + x86-64 can do it for scalar, or you can use some packed FP hacks for numbers in the [0..2^52] range: How to efficiently perform double/int64 conversions with SSE/AVX?.
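For reference, here is a sketch of the range-limited trick mentioned above, adapted from the linked Q&A; it assumes AVX2 (for the 256-bit integer XOR) and values that are whole numbers in [0, 2^52):

#include <immintrin.h>

/* Convert packed double to packed uint64_t, valid only for whole-number
   values in [0, 2^52).  Adding 2^52 forces the integer into the low 52
   mantissa bits; XORing away the bit pattern of 2^52 leaves the plain
   binary integer.  Requires AVX2. */
__m256i double_to_uint64_small(__m256d x)
{
    const __m256d magic = _mm256_set1_pd(0x0010000000000000);   /* 2^52 */
    x = _mm256_add_pd(x, magic);
    return _mm256_xor_si256(_mm256_castpd_si256(x),
                            _mm256_castpd_si256(magic));
}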
BTW, storeu doesn't require an aligned destination, but store does. If the destination is a local, you should normally align it instead of using an unaligned store, at least if the store happens in a loop, or if this function can inline into a larger function.

sqrt() of int type in C

I am programming in C on Mac OS X. I am using the sqrt function from math.h, like this:
int start = Data->start_number;
double localSum = 0.0;   /* initialize the accumulator */
for (; start <= end; start++) {
    localSum += sqrt(start);
}
This works, but why? And why am I getting no warning? The man page for sqrt says it takes a double as parameter, but I give it an int - how can it work?
Thanks
Type conversions that do not cause a loss of precision might not trigger warnings. They are performed implicitly.
int --> double    // no loss of precision (e.g. 3 becomes 3.00)
double --> int    // loss of precision (e.g. 3.01222 becomes 3)
What triggers a warning and what doesn't depends largely upon the compiler and the flags supplied to it. However, most compilers (at least the ones I've used) don't consider implicit type conversions dangerous enough to warrant a warning, as they are a feature of the language specification.
To warn or not to warn:
The C99 Rationale states it as a guideline:
One of the important outcomes of exploring this (implicit casting) problem is the understanding that high-quality compilers might do well to look for such questionable code and offer (optional) diagnostics, and that conscientious instructors might do well to warn programmers of the problems of implicit type conversions.
— C99 Rationale (Apr 2003), page 45
The compiler knows the prototype of sqrt, so it can - and will - produce the code to convert an int argument to double before calling the function.
The same holds the other way round too: if you pass a double to a function (with a known prototype) taking an int argument, the compiler will produce the conversion code required.
Whether the compiler warns about such conversions is up to the compiler and the warning level you requested on the command line.
For the conversion int -> double, which usually (with 32-bit (or 16-bit) ints and 64-bit doubles in IEEE754 format) is lossless, getting a warning is probably hard, if possible at all.
For the double -> int conversion, with gcc and clang, you need to specifically ask for such warnings using -Wconversion, or they will silently compile the code.
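For example, something along these lines compiles silently with plain -Wall under gcc or clang, and only the double -> int line draws a warning once -Wconversion is added (truncate_sqrt is just an illustrative name):

#include <math.h>

int truncate_sqrt(int x)
{
    double d = x;         /* int -> double: lossless for 32-bit int, no warning */
    int    r = sqrt(d);   /* double -> int: warns only with -Wconversion */
    return r;
}

/* e.g.:  cc -Wall -Wconversion file.c -lm */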
An int can safely be converted automatically to a double because there's no risk of data loss. The reverse is not true: converting a double to an int loses the fractional part, so you should make that conversion explicit with a cast.
C compilers do some automatic conversion between double and int.
You could also do the following:
int start = Data->start_number;
int localSum = 0;
for (; start <= end; start++) {
    localSum += sqrt(start);
}
Even if localSum is an int this will still work, but it will always cut off everything after the decimal point.
For example, if the return value of sqrt() is 1.365, it will be stored as simply 1.

Unsigned Overflow in C

Consider the following piece of C code:
#include <stdint.h>
uint32_t inc(uint16_t x) {
    return x + 1;
}
When compiled with gcc-4.4.3 with flags -std=c99 -march=core2 -msse4.1 -O2 -pipe -Wall on a pure x86_64 system, it produces
movzwl %di,%eax
inc %eax
retq
Now, unsigned overflow is well defined in C. I do not know much about x86_64 assembly, but as far as I can see the 16-bit argument register is being moved to a 32-bit register, which is incremented and returned. My question is: what if x == UINT16_MAX? An overflow would occur, and the standard dictates x+1 == 0, right? However, given that %eax is a 32-bit register, it now contains UINT16_MAX+1, which is not correct.
This leads me to a related question: is there a portable way to disable unsigned overflow (wrapping) in C, so that the compiler can assume the upper bits of a small variable stored in a large register will always be 0 (so it need not clear them)? If not (or if the solution is syntactically nasty), is there a way to do it at least in GCC?
Thank you very much for your time.
No, C types are subject to the default promotions. Assuming uint16_t has lower conversion rank than int, it will be promoted to int, the addition will be carried out as an int, and the result will be converted to uint32_t when returned.
As for your related question at the end, I don't quite follow what you want.
Use a coding style that does not leave intermediate results to compiler-chosen types; note that the constant 1 is going to have the data type int.
uint32_t inc(uint16_t x) {
    uint16_t y = x + 1;
    return y;
}
A peculiarity of the way the standard describes signed integer overflow is that it allows compilers to assume that an overflow cannot occur. In the case you show, the compiler is not expected to preserve the behavior of an overflow, since, after all, the range of possible values that x+1 may take (assuming that overflow doesn't occur) fits in the return type.
For your second question: in C there is no such thing as overflow for unsigned types; the applicable term is wrapping. By definition, unsigned types are computed modulo 2^width. Whenever you convert a wider unsigned type to a narrower one, the upper bits are simply thrown away. All C compilers must implement it like this; there is nothing you have to worry about.
In essence, unsigned types are quite simple; the nasty things only come with signed types.
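To make the distinction concrete, a small sketch (assuming the usual 16-bit uint16_t and 32-bit int):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t x = UINT16_MAX;               /* 65535 */

    /* x is promoted to int, so the addition itself does not wrap. */
    uint32_t wide = x + 1;                 /* 65536 */

    /* Converting back to uint16_t reduces modulo 2^16: the wrap happens
       in the conversion, not in the addition. */
    uint16_t narrow = (uint16_t)(x + 1);   /* 0 */

    printf("%u %u\n", (unsigned)wide, (unsigned)narrow);   /* 65536 0 */
    return 0;
}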

doubt regarding operations on "int" flavors

I am having the following doubt regarding "int" flavors (unsigned int, long int, long long int).
When we do some operations (*, /, +, -) between an int and one of its flavors (let's say long int), on a 32-bit system or a 64-bit system, does an implicit typecast happen for the "int"?
For example:
int x;
long long int y = 2000;
x = y;   /* the wider value is assigned to the narrower one, so data truncation may happen */
I am expecting the compiler to give a warning for this, but I am not getting any such warning.
Is this due to an implicit typecast happening for "x" here?
I am using gcc with the -Wall option. Will the behavior change between 32-bit and 64-bit?
Thanks
Arpit
-Wall does not activate all possible warnings; -Wextra enables other warnings. Anyway, what you are doing is a perfectly "legal" operation, and since the compiler can't always know at compile time the value of the datum that could be "truncated", it is OK that it does not warn: the programmer should already be aware that a "large" integer may not fit into a "small" one, so it is usually left up to the programmer. If you think your program was written without awareness of this, add -Wconversion.
Converting without an explicit type cast operator is perfectly legal in C, but the result may be implementation-defined. In your case, int x; is signed, so if you try to store a value in it that's outside the range of int, the result is implementation-defined (or an implementation-defined signal is raised). On the other hand, if x were declared as unsigned x; the behavior would be well defined: the conversion is done via reduction modulo UINT_MAX+1.
As for arithmetic, when you perform arithmetic between integers of different types, the 'smaller' type is promoted to the 'larger' type prior to the arithmetic. The compiler is free to optimize out this promotion, of course, if it does not affect the result, which leads to idioms like casting a 32-bit integer to 64-bit before multiplying in order to get a full 64-bit result. Promotion gets a bit confusing and can have unexpected results when signed and unsigned values are mixed; you should look it up if you care to know, since it's hard to explain informally.
If you are worried, you can include <stdint.h> and use types with defined lengths, such as uint16_t for a 16-bit unsigned integer.
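The widening-multiply idiom mentioned above, as a small sketch (mul64 is just an illustrative name; 32-bit int is assumed):

#include <stdint.h>

/* Without the cast, a * b would be computed as a 32-bit int multiply and
   could overflow before being widened for the return value; casting one
   operand first makes the whole multiplication happen in 64 bits. */
int64_t mul64(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}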
Your code is perfectly valid (as others have already said). If you want to program in a portable way, in most cases you should not use the bare C types int, long, or unsigned int, but types that say a bit more about what you are planning to do with them.
E.g. for array indices, always use size_t. Regardless of whether you are on a 32-bit or a 64-bit system, this will be the right type. Or, if you want the integer of maximal width on whatever platform you land on, use intmax_t or uintmax_t.
See http://gcc.gnu.org/ml/gcc-help/2003-06/msg00086.html -- the code is perfectly valid C/C++.
You might want to look at static analysis tools (sparse, llvm, etc.) to check for this type of truncation.
