Difference between mult and multu when simulating in C

I'm writing a C program that decodes MIPS 32-bit instructions and simulates their functionality, minus the bitwise part. I don't know how I should be differentiating between signed and unsigned operations here.
For instance, given registers rs and rt, I need to multiply them and put the result in rd.
For the mult instruction, it's as simple as this:
reg[rd] = reg[rs] * reg[rt];
What should the multu instruction be? Do I need to be doing bitwise operations on the contents of the registers first?
I also need to do:
- add, addu
- div, divu
- sub, subu
Is it the same distinction in functionality for all of them?

MIPS multiply can't overflow. It's a 32x32-bit operation with a full 64-bit result.
There is a significant difference between signed and unsigned results, as well.
To simulate these easily in C, you'll need C99 integer types from <stdint.h>:
uint32_t reg1, reg2; /* Use this type for the registers, normally */
uint32_t hi, lo; /* special MIPS registers for 64-bit products and dividends */
/* Signed mult instruction: */
int64_t temp = (int64_t)(int32_t)reg1 * (int64_t)(int32_t)reg2;
hi = (uint32_t)((temp>>32) & 0xFFFFFFFF);
lo = (uint32_t)(temp & 0xFFFFFFFF);
The intermediate casts to signed 32-bit types are done so that the cast to the signed 64-bit type will sign-extend before multiplication. The unsigned multiply is similar, except no intermediate cast is needed:
/* Unsigned multu instruction: */
uint64_t tempu = (uint64_t)reg1 * (uint64_t)reg2;
hi = (uint32_t)((tempu>>32) & 0xFFFFFFFF);
lo = (uint32_t)(tempu & 0xFFFFFFFF);
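The question also asks about div/divu. MIPS puts the quotient in lo and the remainder in hi, so the same pattern works; a minimal sketch under the same assumptions (reg1, reg2, hi, lo as above). MIPS leaves the result unpredictable on division by zero and on INT32_MIN / -1, and both are also undefined behavior in C, so a simulator can simply guard those cases:
/* Signed div instruction: quotient -> lo, remainder -> hi */
int32_t s1 = (int32_t)reg1, s2 = (int32_t)reg2;
if (s2 != 0 && !(s1 == INT32_MIN && s2 == -1)) {
    lo = (uint32_t)(s1 / s2);
    hi = (uint32_t)(s1 % s2);
}

/* Unsigned divu instruction */
if (reg2 != 0) {
    lo = reg1 / reg2;
    hi = reg1 % reg2;
}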

IIRC, in MIPS assembly the only difference between the signed and unsigned add/sub variants is overflow behavior: MIPS has no flags register, so add and sub raise an integer overflow exception when the signed result overflows, while addu and subu simply wrap. The unsigned variants, then, are easier to implement. For the regular, signed ones you will need to determine whether the operation overflows and raise the simulated exception. (mult and multu never overflow, since the full 64-bit product goes into hi/lo; there the only difference is the sign extension shown above.)
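A branch-free way to detect signed add overflow in the simulator, without invoking C's undefined behavior, is to do the arithmetic in unsigned and inspect the sign bits. A sketch (the helper name is mine, not from the question):
#include <stdint.h>

/* Returns nonzero if a + b overflows 32-bit signed arithmetic.
   Overflow happens iff the operands share a sign and the sum's sign differs. */
static int add_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;  /* wraps; well defined for unsigned types */
    return (int)((~(ua ^ ub) & (ua ^ sum)) >> 31);
}
Usage in the simulator would be along the lines of: if add_overflows((int32_t)reg[rs], (int32_t)reg[rt]) raise the simulated exception, otherwise do reg[rd] = reg[rs] + reg[rt]. For addu, skip the check entirely.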

Related

Store signed 32-bit in unsigned 64-bit int

Basically, what I want is to "store" a signed 32-bit int inside (in the 32 rightmost bits) an unsigned 64-bit int - since I want to use the leftmost 32 bits for other purposes.
What I'm doing right now is a simple cast and mask:
#define packInt32(X) ((uint64_t)X | INT_MASK)
But this approach has an obvious issue: if X is a positive int (the sign bit is not set), everything goes fine. If it's negative, it becomes messy.
The question is:
How to achieve the above, also supporting negative numbers, in the fastest and most-efficient way?
The "mess" you mention happens because you cast a small signed type to a large unsigned type.
During this conversion, the size is adjusted first, applying sign extension. This is what causes your trouble.
You can simply cast the (signed) integer to an unsigned type of the same size first. Then casting to 64 bits will not trigger sign extension:
#define packInt32(X) ((uint64_t)(uint32_t)(X) | INT_MASK)
You need to mask out any bits besides the low order 32 bits. You can do that with a bitwise AND:
#define packInt32(X) (((uint64_t)(X) & 0xFFFFFFFF) | INT_MASK)
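For example, a small round-trip test (the value of INT_MASK here is made up purely for illustration; the question doesn't define it):
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

#define INT_MASK ((uint64_t)1 << 32)   /* hypothetical flag bit for the demo */
#define packInt32(X) (((uint64_t)(X) & 0xFFFFFFFF) | INT_MASK)
/* Converting back: the (int32_t) cast of an out-of-range uint32_t is
   implementation-defined, but is a two's-complement wrap everywhere in practice. */
#define unpackInt32(P) ((int32_t)(uint32_t)((P) & 0xFFFFFFFF))

int main(void)
{
    uint64_t p = packInt32(-5);
    printf("packed = 0x%016" PRIx64 ", unpacked = %" PRId32 "\n",
           p, unpackInt32(p));   /* unpacked = -5 */
    return 0;
}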
A negative 32-bit integer will get sign-extended into 64-bits.
#include <stdint.h>
uint64_t movsx(int32_t X) { return X; }
movsx on x86-64:
movsx:
movsx rax, edi
ret
Masking out the higher 32 bits will remove the sign extension, causing it to be just zero-extended:
#include <stdint.h>
uint64_t mov(int32_t X) { return (uint64_t)X & 0xFFFFFFFF; }
//or uint64_t mov(int32_t X) { return (uint64_t)(uint32_t)X; }
mov on x86-64:
mov:
mov eax, edi
ret
https://gcc.godbolt.org/z/fihCmt
Neither method loses any info from the lower 32-bits, so either method is a valid way of storing a 32-bit integer into a 64-bit one.
The x86-64 code for a plain mov is one byte shorter (3 bytes vs 4). I don't think there should be much of a speed difference, but if there is one, I'd expect the plain mov to win by a tiny bit.
One option is to untangle the sign-extension and the upper value when it is read back, but that can be messy.
Another option is to construct a union with a bit-packed word. This then defers the problem to the compiler to optimise:
union {
    int64_t merged;
    struct {
        int64_t field1 : 32,
                field2 : 32;
    };
};
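Used like this, for instance (a sketch: anonymous struct members need C11, bit-fields of type int64_t are a common but non-standard extension, and the placement of field1 in the low half is implementation-defined, so check your compiler):
#include <stdint.h>
#include <stdio.h>

union pack {                 /* tag name added for the example */
    int64_t merged;
    struct {
        int64_t field1 : 32, /* one half: the signed 32-bit value */
                field2 : 32; /* other half: free for other purposes */
    };
};

int main(void)
{
    union pack p;
    p.field1 = -5;
    p.field2 = 123;
    printf("field1=%d field2=%d merged=0x%016llx\n",
           (int)p.field1, (int)p.field2, (unsigned long long)p.merged);
    return 0;
}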
A third option is to deal with the sign bit yourself. Store a 31-bit absolute value and a 1-bit sign. Not super-efficient, but more likely to be legal if you should ever encounter a non-2's-complement processor where negative signed values can't be safely cast to unsigned. They are rare as hen's teeth, so I wouldn't worry about this myself.
Assuming that the only operation on the 64-bit value will be to convert it back to 32 bits (and potentially store/display it), there is no need to apply a mask. The compiler will sign-extend the 32-bit value when casting it to 64 bits, and will pick the lowest 32 bits when casting the 64-bit value back to 32 bits.
#define packInt32(X) ((uint64_t)(X))
#define unpackInt32(X) ((int)(X))
Or better, using (inline) functions:
inline uint64_t packInt32(int x) { return ((uint64_t) x) ; }
inline int unpackInt32(uint64_t x) { return ((int) x) ; }

Rotate left and back to the right for sign extension with (signed short) cast in C

Previously, I had the following C code, through which I intended to do sign extension of variable 'sample' after a cast to 'signed short' of variable 'sample_unsigned'.
unsigned short sample_unsigned;
signed short sample;
sample = ((signed short) sample_unsigned << 4) >> 4;
In binary representation, I would expect 'sample' to have its most significant bit repeated 4 times. For instance, if:
sample_unsigned = 0x0800 (corresponding to "100000000000" in binary)
I understand 'sample' should result being:
sample = 0xF800 (corresponding to "1111100000000000" in binary)
However, 'sample' always ended up being the same as 'sample_unsigned', and I had to split the assignment statement as below, which worked. Why is this?
sample = ((signed short) sample_unsigned << 4);
sample >>= 4;
Your approach will not work. There is no guarantee that right-shifting will preserve the sign. Even if it did, it would only work for 16-bit int. For int of 32 bits or more you have to replicate the sign manually into the upper bits; otherwise the shift just moves the same data back and forth. In general, bit-shifts of signed values are risky - see the [standard](http://port70.net/~nsz/c/c11/n1570.html#6.5.7) for details. Some combinations invoke undefined behaviour. It is better to avoid them and just work with unsigned integers.
For most platforms, the following works, however. It is not necessarily slower (on platforms with 16 bit int, it is likely even faster):
uint16_t usample;
int16_t ssample;
ssample = (int16_t)usample;
if (ssample & 0x800)
    ssample |= ~0xFFF;
The cast to int16_t is implementation-defined; your compiler must document how it is performed. For (almost?) all current implementations no extra operation is performed. Just verify the generated code or your compiler documentation. The bitwise-or relies on intX_t using 2's complement, which is guaranteed by the standard - as opposed to the standard integer types.
On 32 bit platforms, there might be an intrinsic instruction to sign-extend (e.g. ARM Cortex-M3/4 SBFX). Or the compiler provides a builtin function. Depending on your use-case and speed requirements, it might be suitable to use them.
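Another portable option is the classic xor-and-subtract trick, which sign-extends a w-bit two's-complement value without a branch. A sketch (the helper name is mine):
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` (two's complement, 1 <= bits <= 31). */
static inline int32_t sign_extend(uint32_t value, unsigned bits)
{
    uint32_t m = 1u << (bits - 1);          /* mask for the sign bit */
    value &= (1u << bits) - 1;              /* keep only the low `bits` bits */
    return (int32_t)(value ^ m) - (int32_t)m; /* flip the sign bit, subtract it back */
}

/* e.g. sign_extend(0x800, 12) == -2048, sign_extend(0x7FF, 12) == 2047 */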
Update:
An alternative approach would be using a bitfield structure:
struct {
    int16_t val : 12; // assuming 12 bit signed value like above
} extender;

extender.val = usample;
ssample = extender.val;
This might result in using the same assembler instructions I proposed above.
It is because (signed short) sample_unsigned is automatically converted to int as an operand, due to integer promotion.
sample = (signed short)((signed short) sample_unsigned << 4) >> 4;
will work as well.
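To see why, a small test (the right shift of a negative signed value is implementation-defined, but arithmetic on all mainstream compilers):
#include <stdio.h>

int main(void)
{
    unsigned short sample_unsigned = 0x0800;

    /* Both shifts happen in int after promotion, so >> 4 just undoes << 4: */
    short broken = ((signed short)sample_unsigned << 4) >> 4;

    /* Truncating to short after << 4 makes bit 15 the sign bit, so the
       subsequent >> 4 replicates it: */
    short fixed = (signed short)((signed short)sample_unsigned << 4) >> 4;

    printf("broken = 0x%04hX, fixed = 0x%04hX\n",
           (unsigned short)broken, (unsigned short)fixed);
    /* expected: broken = 0x0800, fixed = 0xF800 */
    return 0;
}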

C literal suffix U, UL problems

Could someone explain to me what can happen if I forget the suffix (postfix) on constants (literals) in ANSI C?
For example, I have seen defines like these used for bit-shift operations:
#define AAR_INTENSET_NOTRESOLVED_Pos (2UL) /*!< Position of NOTRESOLVED field. */
#define AAR_INTENSET_NOTRESOLVED_Msk (0x1UL << AAR_INTENSET_NOTRESOLVED_Pos) /*!< Bit mask of NOTRESOLVED field. */
#define AAR_INTENSET_NOTRESOLVED_Disabled (0UL) /*!< Interrupt disabled. */
#define AAR_INTENSET_NOTRESOLVED_Enabled (1UL) /*!< Interrupt enabled. */
#define AAR_INTENSET_NOTRESOLVED_Set (1UL) /*!< Enable interrupt on write. */
It's used on a 32-bit architecture, but it could be ported to 16-bit or 8-bit.
What can happen if the UL postfix is not used and I use these macros for bit-shift operations as intended?
I just assume that e.g. on an 8-bit architecture (1<<30) can lead to overflow.
EDIT: I have found a nice link: http://dystopiancode.blogspot.cz/2012/08/constant-suffixes-and-prefixes-in-ansi-c.html
But is it safe to use suffixes if the code is supposed to be ported to various architectures?
For instance, the suffix U represents unsigned int; on an 8-bit architecture that's usually 16 bits, but on a 32-bit architecture it's 32 bits, so 0xFFFFAAAAU is OK for a 32-bit compiler but not for an 8-bit compiler, right?
A decimal number like -1, 1, 2, 12345678, etc. without any suffix will get the smallest type it fits in, starting with int, then long, then long long.
An octal or hex number like 0, 0123, 0x123, 0X123 without any suffix will get the smallest type it fits in, starting with int, unsigned, long, unsigned long, long long, unsigned long long.
The following is a potential problem should AAR_INTENSET_NOTRESOLVED_Pos exceed 31. Note: unsigned long must be at least 32 bits. It would result in 0** if unsigned long was 32 bits, but non-zero if longer.
(0x1UL << AAR_INTENSET_NOTRESOLVED_Pos)
The following is a similar potential problem should AAR_INTENSET_NOTRESOLVED_Pos exceed 15: 0x1 without a suffix is an int, and int need only be 16 bits wide. If unsigned/int is 16 bits, the minimum, then shifting 0x1 left by 15 already shifts into the sign bit, so without explicitly using U, 0x1 could be a problem even when AAR_INTENSET_NOTRESOLVED_Pos == 15. [per Matt McNabb]
(0x1 << AAR_INTENSET_NOTRESOLVED_Pos)
Bitwise shift operators
"The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined." C11dr ยง6.5.7 3
Machine width is not the key issue. An 8-bit or 16-bit machine could use a 16-bit, 32-bit, etc. int. Again, 16 bits is the minimum int size for a compliant C compiler.
[Edit] ** I should have said: it (shifting by more than 31 bits) would result in undefined behavior (UB) if unsigned long was 32 bits.
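To make the hazards concrete, a sketch of how such expressions behave under a hypothetical implementation with 16-bit int and 32-bit long:
/* Assuming a hypothetical 16-bit-int, 32-bit-long implementation: */
int a = 1 << 15;             /* shifts into the sign bit of int: undefined behavior */
unsigned b = 1u << 15;       /* 0x8000u: well defined */
long c = 1L << 20;           /* well defined: long is at least 32 bits */
unsigned long d = 1UL << 31; /* 0x80000000UL: well defined on any conforming compiler */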
It can break.
It might be better to include the cast in the code itself, i.e.
uint32_t something = (uint32_t) AAR_INTENSET_NOTRESOLVED_Set << 30;
This makes it work even if the #define for the constant is simply an integer.

SSE unsigned/signed subtraction of 16 bit register

I have a __m128i register (Vector A) with 16 bit values with the content:
{100,26,26,26,26,26,26,100} // A Vector
Now I subtract the vector
{82,82,82,82,82,82,82,82}
With the instruction
_mm_sub_epi16(a_vec,_mm_set1_epi16(82))
The expected result should be the following vector
{18,-56,-56,-56,-56,-56,-56,18}
But I get
{18,65480,65480,65480,65480,65480,65480,18}
How can I make the vector be treated as signed?
Vector A was created by this instruction:
__m128i a_vec = _mm_srli_epi16(_mm_unpacklo_epi8(score_vec_8bit, score_vec_8bit), 8);
65480 is the same value as -56 (they are both 0xffc8 at the register level) - you're just displaying it as if it were an unsigned short.
Note that for non-saturating addition and subtraction of binary values, without carry/borrow flags, it really is irrelevant whether the values are signed or unsigned - the same instruction can be used for adding or subtracting both signed and unsigned shorts. The only difference is how you subsequently interpret (or display) the result.
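A quick way to check this is to store the result and print it through a signed type; a minimal sketch:
#include <stdio.h>
#include <emmintrin.h>   /* SSE2 */

int main(void)
{
    __m128i a = _mm_setr_epi16(100, 26, 26, 26, 26, 26, 26, 100);
    __m128i r = _mm_sub_epi16(a, _mm_set1_epi16(82));

    short out[8];
    _mm_storeu_si128((__m128i *)out, r);

    for (int i = 0; i < 8; i++)
        printf("%d ", out[i]);   /* prints: 18 -56 -56 -56 -56 -56 -56 18 */
    printf("\n");
    return 0;
}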

How can a 16bit Processor have 4 byte sized long int?

I have a problem with the size of long int on a 16-bit CPU. Looking at its architecture:
No register is more than 16 bits long. So how come long int can have more than 16 bits? In fact, I thought that for any processor the maximum size of a data type must be the size of its general-purpose registers. Am I right?
Yes. In fact the C and C++ standards require that sizeof(long int) >= 4.*
(I'm assuming CHAR_BIT == 8 in this case.)
This is the same deal with 64-bit integers on 32-bit machines. The way it is implemented is to use two registers to represent the lower and upper halves.
Addition and subtraction are done as two instructions:
On x86:
Addition: add and adc where adc is "add with carry"
Subtraction: sub and sbb where sbb is "subtract with borrow"
For example:
long long a = ...;
long long b = ...;
a += b;
will compile to something like:
add eax,ebx
adc edx,ecx
Where eax and edx are the lower and upper parts of a. And ebx and ecx are the lower and upper parts of b.
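The same lowering can be sketched in portable C for a hypothetical 16-bit machine (the struct and helper names are illustrative, not a real ABI):
#include <stdint.h>

/* A 32-bit value held in two 16-bit machine words. */
typedef struct { uint16_t lo, hi; } u32_words;

/* 32-bit addition built from 16-bit operations, propagating the carry
   the way add/adc would. */
static u32_words add32(u32_words a, u32_words b)
{
    u32_words r;
    r.lo = (uint16_t)(a.lo + b.lo);
    uint16_t carry = (r.lo < a.lo);   /* carry out of the low word */
    r.hi = (uint16_t)(a.hi + b.hi + carry);
    return r;
}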
Multiplication and division for double-word integers are more complicated, but they follow the same sort of grade-school arithmetic, where each "digit" is a processor word.
No. If the machine doesn't have registers that can handle 32-bit values, it has to simulate them in software. It could do this using the same techniques that are used in any library for arbitrary precision arithmetic.
