What do I have to write in C to get the assembler to show the imul with one operand? For example:
imul %ebp
If you want to write C code so that the compiler emits imul with one operand, the only way is to use widening signed multiplication, i.e. cast the operand(s) to a signed type twice the register width. Hence in 32-bit mode the following code will produce the expected output:
long long multiply(int x, int y) {
return (long long)x * y;
}
Because non-widening multiplication in 2's complement is the same for both signed and unsigned types, compilers will almost always use multi-operand imul as it's faster and more flexible.
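As a quick illustration (mine, not part of the original answer), a non-widening multiply compiles the same way for both signednesses:

// Sketch: with gcc/clang -O2, both of these typically compile to a
// two-operand imul (e.g. imul edi, esi), since the low 32 bits of the
// product are identical for signed and unsigned operands.
int mul_s(int a, int b) { return a * b; }
unsigned mul_u(unsigned a, unsigned b) { return a * b; }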
On x86-64, compilers use multi-operand imul to produce 64-bit results even when the inputs are only 32 bits, because the multi-operand forms still typically have better latency and throughput than single-operand imul. As a result you'll also have to use a double-register-width (128-bit) type, like above. Clang, ICC and GCC do support that via __int128, though.
See also: What gcc versions support the __int128 intrinsic type? and Is there a 128 bit integer in gcc?
The below snippet
__int128_t multiply(long long x, long long y) {
return (__int128_t)x * y;
}
will be compiled to
mov rax, rdi
imul rsi
ret
If you use MSVC, you can use _umul128 in 64-bit mode.
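A minimal sketch of that (the wrapper function name is mine): _umul128 from <intrin.h> returns the low 64 bits of the full 128-bit product and stores the high 64 bits through its out-parameter.

#include <intrin.h>

unsigned __int64 mul_full(unsigned __int64 a, unsigned __int64 b,
                          unsigned __int64 *hi)
{
    return _umul128(a, b, hi);  // low half returned, high half written to *hi
}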
frexpl won't work because it keeps the mantissa as part of a long double. Can I use type punning, or would that be dangerous? Is there another way?
x86's float and integer endianness is little-endian, so the significand (aka mantissa) is the low 64 bits of an 80-bit x87 long double.
In assembly, you just load the normal way, like mov rax, [rdi].
Unlike IEEE binary32 (float) or binary64 (double), 80-bit long double stores the leading 1 in the significand explicitly. (Or 0 for subnormal). https://en.wikipedia.org/wiki/Extended_precision#x86_extended_precision_format
So the unsigned integer value (magnitude) of the true significand is the same as what's actually stored in the object-representation.
If you want signed int, too bad; including the sign bit it would be 65 bits but int is only 32-bit on any x86 C implementation.
If you want int64_t, you could maybe right shift by 1 to discard the low bit, making room for a sign bit. Then do 2's complement negation if the sign bit was set, leaving you with a signed 2's complement representation of the significand value divided by 2. (IEEE FP uses sign/magnitude with a sign bit at the top of the bit-pattern)
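Here's a sketch of that idea (the function name is mine), assuming the x86 layout described above: significand in the low 8 bytes, sign+exponent word at byte offset 8.

#include <stdint.h>
#include <string.h>

int64_t ldbl_mant_div2(long double x)
{
    uint64_t mant;
    uint16_t sign_exp;                          // bit 15 = sign, bits 14..0 = exponent
    memcpy(&mant, &x, sizeof(mant));            // low 64 bits: the significand
    memcpy(&sign_exp, (const char *)&x + 8, sizeof(sign_exp));
    int64_t half = (int64_t)(mant >> 1);        // drop the low bit to make room for the sign
    return (sign_exp & 0x8000) ? -half : half;  // 2's complement negate if the sign bit was set
}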
In C/C++, yes you need to type-pun, e.g. with a union or memcpy. All C implementations on x86 / x86-64 that expose 80-bit floating point at all use a 12 or 16-byte type with the 10-byte value at the bottom.
Beware that MSVC uses long double = double, a 64-bit float, so check LDBL_MANT_DIG from float.h, or sizeof(long double). All 3 static_assert() statements trigger on MSVC, so they all did their job and saved us from copying a whole binary64 double (sign/exp/mantissa) into our uint64_t.
// valid C11 and C++11
#include <float.h> // float numeric-limit macros
#include <stdint.h>
#include <assert.h> // C11 static assert
#include <string.h> // memcpy
// inline
uint64_t ldbl_mant(long double x)
{
// we can assume CHAR_BIT = 8 when targeting x86, unless you care about DeathStation 9000 implementations.
static_assert( sizeof(long double) >= 10, "x87 long double must be >= 10 bytes" );
static_assert( LDBL_MANT_DIG == 64, "x87 long double significand must be 64 bits" );
uint64_t retval;
memcpy(&retval, &x, sizeof(retval));
static_assert( sizeof(retval) < sizeof(x), "uint64_t should be strictly smaller than long double" ); // sanity check for wrong types
return retval;
}
This compiles efficiently on gcc/clang/ICC (on Godbolt) to just one instruction as a stand-alone function (because the calling convention passes long double in memory). After inlining into code with a long double in an x87 register, it will presumably compile to a TBYTE x87 store and an integer reload.
## gcc/clang/ICC -O3 for x86-64
ldbl_mant:
mov rax, QWORD PTR [rsp+8]
ret
For 32-bit, gcc has a weird redundant-copy missed-optimization bug which ICC and clang don't have; they just do the 2 loads from the function arg without copying first.
# GCC -m32 -O3 copies for no reason
ldbl_mant:
sub esp, 28
fld TBYTE PTR [esp+32] # load the stack arg
fstp TBYTE PTR [esp] # store a local
mov eax, DWORD PTR [esp]
mov edx, DWORD PTR [esp+4] # return uint64_t in edx:eax
add esp, 28
ret
C99 makes union type-punning well-defined behaviour, and so does GNU C++. I think MSVC defines it too.
But memcpy is always portable so that might be an even better choice, and it's easier to read in this case where we just want one element.
If you also want the exponent and sign bit, a union between a struct and long double might be good, except that padding for alignment at the end of the struct will make it bigger. It's unlikely that there'd be padding after a uint64_t member before a uint16_t member, though. But I'd worry about :1 and :15 bitfields, because IIRC it's implementation-defined which order the members of a bitfield are stored in.
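A sketch of that union (member names mine), assuming little-endian x86 and no padding inserted between the two struct members:

#include <stdint.h>

typedef union {
    long double ld;          // the 80-bit value lives in the low 10 bytes
    struct {
        uint64_t mantissa;   // explicit-leading-1 significand
        uint16_t sign_exp;   // bit 15 = sign, bits 14..0 = biased exponent
    } parts;
} x87_bits;                  // padding still makes sizeof 12 or 16, as noted above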
C code:
*u += a;
*v += b;
return sizeof(a) + sizeof(b);
x86-64 code:
movslq %edi, %rdi
addq %rdi, (%rdx)
addq %sil, (%rcx)
movl $6, %eax
ret
I know that movl $6, %eax means 6 = 2+4 (or 4+2), so one of the types is int and the other is short.
But setting aside the movl $6, %eax, b could be a data type of 1, 2, 4 or 8 bytes, and I have a question about this.
Let's assume b is long (and, of course, ignore the movl $6, %eax). Does the assembly using %sil for b mean b has only 1 byte of data and the remaining 7 bytes are zero? Give me some examples where it is okay for b to use %sil (a 1-byte register) even though b is a long (an 8-byte type).
Does the assembly using %sil for b mean b has only 1 byte of data and the remaining 7 bytes are zero?
No, it means that *v (in memory) is only 1 byte long. Any bytes after that are not part of the object pointed to by v at all. (It has a different size than b.)
If you're supposed to reverse-engineer the types of a and b from the asm: notice that it's sizeof a and b, not sizeof *u and *v. The operand-size of the add instructions matches sizeof(*u) and sizeof(*v), and the source operands for those are the result of C integer promotion / conversion rules being applied to a and b.
e.g. l += s is like l += (long)s if we have long l; short s;
If the addq was confusing you, don't worry, that's invalid with a byte register. Trying to assemble that with GAS (gcc -c foo.s) gives:
foo.s:1: Error: `%sil' not allowed with `addq'
If we assume that it's actually addb %sil, (%rcx) instead of an illegal addq, then the question is answerable.
Assuming the C statements are in the same order as the asm instructions (the compiler chose not to reorder them), then this looks like code from a function signature like this, compiled for the x86-64 System V ABI, so args are in RDI, RSI, RDX, RCX in that order.
int f(TYPEA a, TYPEB b, TYPEU *u, TYPEV *v);
TYPEA and TYPEU are not the same type, which we can already tell because 8 > 6 (so no qword type fits in the sizeof sum) and because sign-extension was needed.
A dword a is sign-extended to qword. So a is a 32-bit signed integer type. In x86-64 System V, only int meets that description out of the basic types. long is 64-bit, short is 16-bit. (In Windows x64, long is also a 32-bit type, but this smells like x86-64 System V from the choice of registers.)
int32_t is defined in terms of int, on gcc, in case you want to think about it in terms of the fixed-width types.
If it had been movswq %di, %rdi, we'd have int16_t a (or short a). If there had been no sign-extension, then we'd know it was one of int64_t a or uint64_t a.
(*u is either uint64_t or int64_t; we can't tell which: (unsigned long long)(int)x sign-extends to the width of unsigned long long just the same.)
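To make that concrete, here are hypothetical variants (not from the question) showing how the type of a changes the extension instruction; compilers may pick different registers, but the operation is the same:

void f_short(short a, long *u) { *u += a; }  // movswq: sign-extend 16 -> 64
void f_int(int a, long *u)     { *u += a; }  // movslq: sign-extend 32 -> 64
void f_long(long a, long *u)   { *u += a; }  // already 64-bit: no extension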
Your 6 = 2+4 logic is correct. The other type is definitely 16-bit = 2 bytes, because char is 1 byte in x86-64 System V so sizeof sizes are in bytes. And no mainstream ABI has 5-byte integer types.
short is a 16-bit type; so is unsigned short. We can't uniquely determine which it is.
We're inferring it only from the size: any wider or narrower integer type added to an int8_t will be truncated to that 1-byte width. (Strictly, the addition happens after promotion to int, and the conversion back to a narrow signed type is implementation-defined rather than undefined in C; when compiled for x86-64, the resulting asm behaves the way you'd expect and only takes the low byte of whatever integer type it was.)
Compiling this with clang 7.0 -O3 (on the Godbolt compiler explorer) gives almost exactly the asm you show in the question (except with addb instead of addq). gcc puts the mov-immediate earlier in the function, which possibly lets the function decode in fewer clock cycles, or at least the mov decode a cycle earlier, along with one of the 2-fused-domain-uop memory-destination add instructions.
typedef int TYPEA;
typedef short TYPEB;
typedef long TYPEU;
typedef char TYPEV;
int f(TYPEA a, TYPEB b, TYPEU *u, TYPEV *v) {
*u += a;
*v += b;
return sizeof(a) + sizeof(b);
}
# clang -O3 output
f: # #f
movslq %edi, %rax # clang uses RAX instead of extending into the same register
addq %rax, (%rdx) # no difference in effect.
addb %sil, (%rcx)
movl $6, %eax
retq
Of course, unsigned char or unsigned long for the pointer types give the same asm. Or unsigned long long, which is also a 64-bit type.
But more importantly, unsigned short b would also give the same asm.
I want to do the following in Linux kernel code (on a 32-bit processor):
#define UQ64 long long int
#define UI32 long int
UQ64 qTimeStamp;
UQ64 qSeconds;
UI32 uTimeStampRes;
qTimeStamp = num1;
uTimeStampRes = num2;
// 64-bit division!
qSeconds = qTimeStamp / uTimeStampRes;
Is there an algorithm to compute this 64-bit division?
Thanks.
The GCC C compiler generates code that calls functions in the libgcc library to implement the / and % operations with 64-bit operands on 32-bit CPUs. However, the Linux kernel is not linked against the libgcc library, so such code will fail to link when building code for a 32-bit Linux kernel. (When building an external kernel module, the problem may not be apparent until you try and dynamically load the module into the running kernel.)
Originally, the Linux kernel only had the do_div(n,base) macro defined by #include <asm/div64.h>. The usage of this macro is unusual because it modifies its first argument in place to become the quotient resulting from the division, and yields (returns) the remainder from the division as its result. This was done for code efficiency reasons but is a bit of a pain to use. Also, it only supports division of a 64-bit unsigned dividend by a 32-bit divisor.
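For example (a sketch, not from the question), converting a nanosecond count to whole seconds with do_div looks like this:

#include <asm/div64.h>
#include <linux/types.h>

static u64 ns_to_sec(u64 ns, u32 *rem)
{
    *rem = do_div(ns, 1000000000);  /* do_div overwrites ns with the quotient */
    return ns;                      /* whole seconds */
}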
Linux kernel version 2.6.22 introduced the #include <linux/math64.h> header, which defines a set of functions which is more comprehensive than the old do_div(n,base) macro and is easier to use because they behave like normal C functions.
The functions declared by #include <linux/math64.h> for 64-bit division are listed below. Except where indicated, all of these have been available since kernel version 2.6.26.
One of the functions listed below does not exist yet as of kernel version 4.18-rc8; it is marked as such in the list. Who knows if it will ever be implemented? (Some other functions declared by the header file, related to multiply and shift operations in later kernel versions, have been omitted below.) A short usage sketch follows the list.
u64 div_u64(u64 dividend, u32 divisor) — unsigned division of 64-bit dividend by 32-bit divisor.
s64 div_s64(s64 dividend, s32 divisor) — signed division of 64-bit dividend by 32-bit divisor.
u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder) — unsigned division of 64-bit dividend by 32-bit divisor with remainder.
s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder) — signed division of 64-bit dividend by 32-bit divisor with remainder.
u64 div64_u64(u64 dividend, u64 divisor) — unsigned division of 64-bit dividend by 64-bit divisor.
s64 div64_s64(s64 dividend, s64 divisor) — (since 2.6.37) signed division of 64-bit dividend by 64-bit divisor.
u64 div64_u64_rem(u64 dividend, u64 divisor, u64 *remainder) — (since 3.12.0) unsigned division of 64-bit dividend by 64-bit divisor with remainder.
s64 div64_s64_rem(s64 dividend, s64 divisor, s64 *remainder) — (does not exist yet as of 4.18-rc8) signed division of 64-bit dividend by 64-bit divisor with remainder.
div64_long(x,y) — (since 3.4.0) macro to do signed division of a 64-bit dividend by a long int divisor (which is 32-bit or 64 bit, depending on the architecture).
div64_ul(x,y) — (since 3.10.0) macro to do unsigned division of a 64-bit dividend by an unsigned long int divisor (which is 32-bit or 64-bit, depending on the architecture).
u32 iter_div_u64_rem(u64 dividend, u32 divisor, u64 *remainder) — unsigned division of 64-bit dividend by 32-bit divisor by repeated subtraction of the divisor from the dividend, with remainder (may be faster than regular division if the dividend is not expected to be much bigger than the divisor).
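Applied to the code in the question (a sketch reusing its variable names, with the types switched to the kernel's unsigned u64/u32):

#include <linux/math64.h>
#include <linux/types.h>

static u64 timestamp_to_seconds(u64 qTimeStamp, u32 uTimeStampRes)
{
    /* 64-by-32 unsigned division without pulling in libgcc */
    return div_u64(qTimeStamp, uTimeStampRes);
}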
You can divide numbers of any size on a machine of any bit width. The only difference is how the division is done. On a processor which handles 64-bit integers natively, it will be one machine-code instruction (I do not know of any 64-bit processor without hardware division). On processors with narrower registers it will be translated into a series of machine-code instructions or a call to a library function that divides those 64-bit numbers:
uint64_t foo(uint64_t x, uint64_t y)
{
return x/y;
}
On the amd64 instruction set:
mov rax, rdi
xor edx, edx
div rsi
ret
On the IA-32 instruction set:
sub esp, 12
push DWORD PTR [esp+28]
push DWORD PTR [esp+28]
push DWORD PTR [esp+28]
push DWORD PTR [esp+28]
call __udivdi3
add esp, 28
ret
I learned about 2's complement and unsigned and signed int, so I decided to test my knowledge. As far as I know, a negative number is stored in 2's complement form so that addition and subtraction do not need different algorithms and the circuitry stays simple.
Now if I write
#include <stdio.h>

int main()
{
    int a = -1;
    unsigned int b = -1;
    printf("%d %u \n %d %u", a, a, b, b);
}
The output comes to be -1 4294967295 -1 4294967295. Now, I looked at the bit pattern and realized that -1 in 2's complement is 11111111 11111111 11111111 11111111, so when I interpret it using %d it gives -1, but when I interpret it using %u it is treated as a positive number and so gives 4294967295. The assembly of the code is:
.LC0:
.string "%d %u \n %d %u"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], -1
mov DWORD PTR [rbp-8], -1
mov esi, DWORD PTR [rbp-8]
mov ecx, DWORD PTR [rbp-8]
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-4]
mov r8d, esi
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
Here -1 is moved to a register both times, for the unsigned and the signed variable. What I want to know is: if reinterpretation is all that matters, why do we have the two types signed and unsigned? Is it only the printf format string, %d or %u, that matters?
Further, what really happens when I assign a negative number to an unsigned integer? (I learned that the initializer converts this value from int to unsigned int.) But in the assembly code I did not see such a thing. So what really happens?
And how does the machine know when it has to do 2's complement and when not? Does it see the negative sign and perform 2's complement?
I have read almost every question and answer you could think this question is a duplicate of, but I could not find a satisfactory solution.
Both signed and unsigned values are just bits in memory; the operations applied to them determine how they behave.
It doesn't make any difference when adding or subtracting because, thanks to 2's complement, the operations are exactly the same.
It matters when we compare two numbers: -1 is lower than 0 while 4294967295 obviously isn't.
About conversion: for the same size it simply takes the variable's content and moves it to the other, so 4294967295 becomes -1. For a bigger size, the value is first sign-extended and then the content is moved.
How does the machine know? By the instructions we use. Machines either have different instructions for comparing signed and unsigned values, or they provide different flags for it (x86 has Carry for unsigned overflow and Overflow for signed overflow).
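A small sketch (mine, not from the answer): the same < operator compiles to different condition codes depending on signedness, e.g. with gcc -O2 on x86-64.

int less_signed(int a, int b) { return a < b; }              // cmp + setl (looks at SF/OF)
int less_unsigned(unsigned a, unsigned b) { return a < b; }  // cmp + setb (looks at CF)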
Additionally, note that C is relaxed about how signed numbers are stored; they don't have to be 2's complement. But nowadays all common architectures store signed numbers like this.
There are a few differences between signed and unsigned types:
The behaviors of the operators <, <=, >, >=, /, %, and >> are all different when dealing with signed and unsigned numbers.
Compilers are not required to behave predictably if any computation on a signed value exceeds the range of its type. Even when using operators which would behave identically with signed and unsigned values in all defined cases, some compilers will behave in "interesting" fashion. For example, a compiler given x+1 > y could replace it with x>=y if x is signed, but not if x is unsigned.
As a more interesting example, on a system where "short" is 16 bits and "int" is 32 bits, a compiler given the function:
unsigned mul(unsigned short x, unsigned short y) { return x*y; }
might assume that no situation could ever arise where the product would exceed 2147483647. For example, if it saw the function invoked as unsigned x = mul(y,65535); and y was an unsigned short, it may omit code elsewhere that would only be relevant if y were greater than 32768.
It seems you have missed two facts: firstly, 0101 = 5 in both signed and unsigned interpretation, and secondly, you assigned a negative number to an unsigned int, which the compiler handles by converting the value to the unsigned type.
Strictly speaking, setting an unsigned int to -5 doesn't cause an error, even though unsigned ints can't store values under 0: the standard defines the conversion to wrap modulo 2^N.
You can understand it better by assigning a negative value to a larger-sized unsigned integer: the compiler generates assembly that sign-extends the small negative value when transferring it to the larger unsigned integer.
See this blog post for an assembly-level explanation.
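As a quick sketch (assuming a 64-bit unsigned long long), the conversion sign-extends the bit pattern:

#include <stdio.h>

int main(void)
{
    int x = -1;
    unsigned long long u = x;  /* sign-extended, e.g. via a movslq/cdqe-style instruction */
    printf("%llu\n", u);       /* prints 18446744073709551615 (2^64 - 1) */
    return 0;
}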
Choice of signed integer representation is left to the platform. The representation applies to both negative and non-negative values - for example, if 1101₂ (-5) is the two's complement of 0101₂ (5), then 0101₂ (5) is also the two's complement of 1101₂ (-5).
The platform may or may not provide separate instructions for operations on signed and unsigned integers. For example, x86 provides different multiplication and division instructions for signed (idiv and imul) and unsigned (div and mul) integers, but uses the same addition (add) and subtraction (sub) instructions for both.
Similarly, x86 provides a single comparison (cmp) instruction for both signed and unsigned integers.
Arithmetic and comparison operations will set one or more status register flags (carry, overflow, zero, etc.). These can be used differently when dealing with words that are supposed to represent signed vs. unsigned values.
As far as printf is concerned, you're absolutely correct that the conversion specifier determines whether the bit pattern 0xFFFFFFFF is displayed as -1 or 4294967295, although remember that if the type of the argument does not match what the conversion specifier expects, then the behavior is undefined. Using %u to display a negative signed int may or may not give you the expected equivalent unsigned value.
I have a problem with the size of long int on a 16-bit CPU. Looking at its architecture:
No register is more than 16 bits wide. So how come long int can have more than 16 bits? In fact, it seems to me that for any processor, the maximum size of a data type must be the size of the general-purpose registers. Am I right?
Yes, long int can be wider than the registers. In fact the C and C++ standards require that sizeof(long int) >= 4.*
(I'm assuming CHAR_BIT == 8 in this case.)
This is the same deal with 64-bit integers on 32-bit machines. The way it is implemented is to use two registers to represent the lower and upper halves.
Addition and subtraction are done as two instructions:
On x86:
Addition: add and adc where adc is "add with carry"
Subtraction: sub and sbb where sbb is "subtract with borrow"
For example:
long long a = ...;
long long b = ...;
a += b;
will compile to something like:
add eax,ebx
adc edx,ecx
Where eax and edx are the lower and upper parts of a. And ebx and ecx are the lower and upper parts of b.
Multiplication and division for double-word integers are more complicated, but they follow the same sort of grade-school math, where each "digit" is a processor word.
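A sketch of that grade-school idea (mine, not from the answer): a 32x32-to-64-bit unsigned multiply built from 16-bit "digits", the same way a 64-bit multiply can be built from 32-bit words on a 32-bit machine.

#include <stdint.h>

uint64_t mul32x32(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFF, ah = a >> 16;  // split each operand into 16-bit digits
    uint32_t bl = b & 0xFFFF, bh = b >> 16;

    uint64_t lo   = (uint64_t)al * bl;       // partial products, as on paper
    uint64_t mid1 = (uint64_t)al * bh;
    uint64_t mid2 = (uint64_t)ah * bl;
    uint64_t hi   = (uint64_t)ah * bh;

    return lo + ((mid1 + mid2) << 16) + (hi << 32);  // sum at the right place values
}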
No. If the machine doesn't have registers that can handle 32-bit values, it has to simulate them in software. It could do this using the same techniques that are used in any library for arbitrary precision arithmetic.