32-bit multiplication through 16-bit shifting - c

I am writing a soft-multiplication function using shifting and addition. The existing function looks like this:
unsigned long __mulsi3 (unsigned long a, unsigned long b) {
    unsigned long answer = 0;
    while (b)
    {
        if (b & 1) {
            answer += a;
        }
        a <<= 1;
        b >>= 1;
    }
    return answer;
}
Although my hardware does not have a multiplier, I have a hardware shifter. The shifter is able to shift up to 16 bits at one time.
I want to make full use of my 16-bit shifter. Any suggestions on how I can adapt the code above to reflect my hardware's capabilities? The given code shifts only 1 bit per iteration.
The 16-bit shifter can shift 32-bit unsigned long values up to 16 places at a time, and unsigned long is 32 bits wide.

The ability to shift multiple bits is not going to help much, unless you have a hardware multiply, say 8-bit x 8-bit, or you can afford some RAM/ROM to do (say) a 4-bit by 4-bit multiply by lookup.
The straightforward shift and add (as you are doing) can be helped by swapping the arguments so that the multiplier is the smaller.
If your machine is faster doing 16-bit things in general, then treating your 32-bit 'a' as 'a1:a0' 16 bits at a time, and similarly 'b', you just might be able to save some cycles. Your result is only 32 bits, so you don't need to do 'a1 * b1' -- though one or both of those may be zero, so the win may not be big! Also, you only need the low 16 bits of 'a0 * b1', so that can be done entirely in 16 bits -- but if b1 (assuming b <= a) is generally zero this is not a big win, either. For 'a * b0', you need a 32-bit 'a' and 32-bit adds into 'answer', but your multiplier is 16 bits only... which may or may not help.
Skipping runs of multiplier zeros could help -- depending on processor and any properties of the multiplier.
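For illustration, here is a sketch (mine, not the answer's) of how the question's loop could exploit the barrel shifter: make the smaller operand the multiplier, and when the multiplier has a run of low zero bits, shift past the whole run in one go, capped at the assumed 16-place limit. count_zero_run() is a hypothetical helper; replace it with whatever your target provides.
static unsigned count_zero_run(unsigned long b)   /* trailing zero run of b, capped at 16 */
{
    unsigned n = 0;
    while (n < 16 && ((b >> n) & 1) == 0)
        n++;
    return n;
}

unsigned long mulsi3_skip(unsigned long a, unsigned long b)
{
    unsigned long answer = 0;
    if (a < b) {                      /* make the smaller operand the multiplier */
        unsigned long t = a; a = b; b = t;
    }
    while (b) {
        if (b & 1) {
            answer += a;
            a <<= 1;
            b >>= 1;
        } else {
            unsigned n = count_zero_run(b);   /* run length is 1..16 */
            a <<= n;                          /* one wide shift instead of n single-bit shifts */
            b >>= n;
        }
    }
    return answer;
}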
FWIW: doing the magic 'a1*b1', '(a1-a0)*(b0-b1)', 'a0*b0' and combining the result by shifts, adds and subtracts is, in my small experience, an absolute nightmare... the signs of '(a1-a0)', '(b0-b1)' and their product have to be respected, which makes a bit of a mess of what looks like a cute trick. By the time you have finished with that and the adds and subtracts, you have to have a mighty slow multiply to make it all worthwhile! When multiplying very, very long integers this may help... but there the memory issues may dominate... when I tried it, it was something of a disappointment.

Having 16-bit shifts can help you make a minor speed enhancement using the following approach:
(U1 * P + U0) * (V1 * P + V0) =
= U1 * V1 * P * P + U1 * V0 * P + U0 * V1 * P + U0 * V0 =
= U1 * V1 * (P*P+P) + (U1-U0) * (V0-V1) * P + U0 * V0 * (1+P)
provided P is a convenient power of 2 (for example, 2^16, 2^32), so multiplying by it is a fast shift. This reduces the work from 4 to 3 multiplications of smaller numbers and, recursively, gives O(N^1.58) instead of O(N^2) for very long numbers.
This method is named Karatsuba's multiplication. There are more advanced versions of the same idea as well.
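As a concrete illustration of the split with P = 2^16 (my sketch, with the compiler's 16×16 multiplies standing in for whatever small multiply or table you actually have), a full 32×32 → 64-bit product comes out as:
#include <stdint.h>

uint64_t karatsuba32(uint32_t u, uint32_t v)
{
    uint32_t u1 = u >> 16, u0 = u & 0xFFFF;
    uint32_t v1 = v >> 16, v0 = v & 0xFFFF;

    uint64_t hi  = (uint64_t)u1 * v1;                              /* U1*V1 */
    uint64_t lo  = (uint64_t)u0 * v0;                              /* U0*V0 */
    int64_t  mid = (int64_t)((int32_t)u1 - (int32_t)u0)
                 * (int64_t)((int32_t)v0 - (int32_t)v1);           /* may be negative */

    /* U1*V1*(P*P+P) + (U1-U0)*(V0-V1)*P + U0*V0*(1+P), with P = 2^16 */
    return (hi << 32) + ((uint64_t)((int64_t)hi + mid + (int64_t)lo) << 16) + lo;
}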
For small numbers (e.g. 8 by 8 bits), the following method is fast, if you have enough fast ROM:
a * b = square(a+b)/4 - square(a-b)/4
If you tabulate int(square(x)/4), you'll need 1022 bytes for unsigned multiplication and 510 bytes for a signed one.
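A minimal sketch of that table-driven idea for unsigned 8×8 multiplication (my code, not the answer's); in a real system qsq[] would be a precomputed ROM table:
#include <stdint.h>

static uint16_t qsq[511];                 /* qsq[x] = floor(x*x/4), x = 0..510: 1022 bytes */

void init_qsq(void)
{
    for (uint32_t x = 0; x < 511; x++)
        qsq[x] = (uint16_t)(x * x / 4);
}

uint16_t mul8x8(uint8_t a, uint8_t b)
{
    /* a*b = floor((a+b)^2/4) - floor((a-b)^2/4); exact because a+b and a-b
       have the same parity, so the truncation errors cancel. */
    uint16_t sum  = (uint16_t)(a + b);
    uint16_t diff = (uint16_t)(a > b ? a - b : b - a);
    return (uint16_t)(qsq[sum] - qsq[diff]);
}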

The basic approach is (assuming shifting by 1) :-
Shift the top 16 bits
Set the bottom bit of the top 16 bits to the top bit of the bottom 16 bits
Shift the bottom 16 bits
Depends a bit on your hardware...
but you could try :-
assuming unsigned long is 32 bits
assuming Big Endian
then :-
union Data32
{
    unsigned long l;
    unsigned short s[2];
};

unsigned long shiftleft32(unsigned long valueToShift, unsigned short bitsToShift)
{
    union Data32 u;
    u.l = valueToShift;
    if (bitsToShift > 0)   /* shifting a 16-bit half by 16 - 0 would be undefined */
    {
        u.s[0] <<= bitsToShift;
        u.s[0] |= (u.s[1] >> (16 - bitsToShift));
        u.s[1] <<= bitsToShift;
    }
    return u.l;
}
then do the same in reverse for shifting right
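A possible sketch of that reverse direction, under the same assumptions (32-bit unsigned long, Big Endian, so s[0] holds the high half):
unsigned long shiftright32(unsigned long valueToShift, unsigned short bitsToShift)
{
    union Data32 u;
    u.l = valueToShift;
    if (bitsToShift > 0)   /* shifting a 16-bit half by 16 - 0 would be undefined */
    {
        u.s[1] >>= bitsToShift;
        u.s[1] |= (unsigned short)(u.s[0] << (16 - bitsToShift));
        u.s[0] >>= bitsToShift;
    }
    return u.l;
}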

The code above is multiplying in the traditional way, the way we learned in primary school:
EX:
0101
* 0111
-------
0101
0101.
0101..
--------
100011
Of course you cannot approach it like that if you don't have either a multiply operator or a 1-bit shifter!
Though you can do it in other ways, for example with a loop:
unsigned long _mult(unsigned long a, unsigned long b)
{
    unsigned long res = 0;
    while (a > 0)
    {
        res += b;
        a--;
    }
    return res;
}
It is costly but it serves your needs; anyway, you can think about other approaches if you have more constraints (like computation time...).


Generating random 64/32/16/ and 8-bit integers in C

I'm hoping that somebody can give me an understanding of why the code works the way it does. I'm trying to wrap my head around things but am lost.
My professor has given us this code snippet which we have to use in order to generate random numbers in C. The snippet in question generates a 64-bit integer, and we have to adapt it to also generate 32-bit, 16-bit, and 8-bit integers. I'm completely lost on where to start, and I'm not necessarily asking for a solution, just an explanation of how the original snippet works, so that I can adapt it from there.
long long rand64()
{
    int a, b;
    long long r;
    a = rand();
    b = rand();
    r = (long long)a;
    r = (r << 31) | b;
    return r;
}
Questions I have about this code are:
Why is it shifted 31 bits? I thought rand() generated a number between 0-32767 which is 16 bits, so wouldn't that be 48 bits?
Why do we say | (or) b on the second to last line?
I'm making the relatively safe assumption that, in your computer's C implementation, long long is a 64-bit data type.
The key here is that, since long long r is signed, any value with the highest bit set will be negative. Therefore, the code shifts r by 31 bits to avoid setting that bit.
The | is the bitwise OR operator, which combines the two values by setting in r all of the bits that are set in b.
EDIT:
After reading some of the comments, I realized that my answer needs correction. rand() returns a value no more than RAND_MAX which is typically 2^31-1. Therefore, r is a 31-bit integer. If you shifted it 32 bits to the left, you'd guarantee that its 31st bit (0-up counting) would always be zero.
rand() generates a random value in [0...RAND_MAX] of questionable repute - but let us set that reputation aside and assume rand() is good enough and that RAND_MAX is a Mersenne number (a power of 2, minus 1).
Weakness in OP's code: If RAND_MAX == pow(2,31)-1, a common occurrence, then OP's rand64() only returns values [0...pow(2,62)). @Nate Eldredge
Instead, loop as many times as needed.
To find how many random bits are returned with each call, we need the log2(RAND_MAX + 1). This fortunately is easy with an awesome macro from Is there any way to compute the width of an integer type at compile-time?
#include <stdlib.h>
/* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
#define RAND_MAX_BITWIDTH (IMAX_BITS(RAND_MAX))
Example: rand_ul() returns a random value in the [0...ULONG_MAX] range, whether unsigned long is 32-bit, 64-bit, etc.
unsigned long rand_ul(void) {
    unsigned long r = 0;
    for (int i = 0; i < IMAX_BITS(ULONG_MAX); i += RAND_MAX_BITWIDTH) {
        r <<= RAND_MAX_BITWIDTH;
        r |= rand();
    }
    return r;
}
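To get the narrower widths the assignment asks for, one hedged approach (not from the answer) is to build a 32-bit value the same way and truncate it for the smaller sizes; RAND_MAX_BITWIDTH is the macro defined above, assumed to be less than 32:
#include <stdint.h>
#include <stdlib.h>

uint32_t rand32(void)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i += RAND_MAX_BITWIDTH) {
        r <<= RAND_MAX_BITWIDTH;   /* excess high bits simply fall off the top */
        r |= (uint32_t)rand();
    }
    return r;
}

uint16_t rand16(void) { return (uint16_t)rand32(); }   /* keep the low 16 bits */
uint8_t  rand8(void)  { return (uint8_t)rand32(); }    /* keep the low 8 bits */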

Flipping bytes, doing arithmetic and flipping them back again

I have a programming/math related question regarding converting between big endian and little endian and doing arithmetic.
Assume we have two integers in little endian mode:
int a = 5;
int b = 6;
//a+b = 11
Let's flip the bytes and add them again:
int a = 1280;
int b = 1536;
//a+b = 2816
Now if we flip the byte order of 2816 we get 11. So essentially we can do arithmetic computation between little endian and big endian and once converted they represent the same number?
Does this have a theory/name behind it in the computer science world?
It doesn't work if the addition involves carrying since carrying propagates right-to-left. Swapping digits doesn't mean carrying switches direction, so any bytes that overflow into the next byte will be different.
Let's look at an example in hex, pretending that endianness means each 4-bit nibble is swapped:
int a = 0x68;
int b = 0x0B;
//a+b: 0x73
int a = 0x86;
int b = 0xB0;
//a+b: 0x136
8₁₆ + B₁₆ is 13₁₆. That 1 is carried and adds on to the 6 in the first sum. But in the second sum it's not carried right and added to the 6, it's carried left and overflows into the third hex digit.
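A small C demonstration of that (mine, not part of the answer), using 16-bit values whose low bytes carry into the next byte when added:
#include <stdint.h>
#include <stdio.h>

static uint16_t bswap16(uint16_t x) { return (uint16_t)((x << 8) | (x >> 8)); }

int main(void)
{
    uint16_t a = 0x0180, b = 0x0090;   /* low bytes sum to 0x110, so there is a carry */
    uint16_t direct  = (uint16_t)(a + b);                              /* 0x0210 */
    uint16_t swapped = bswap16((uint16_t)(bswap16(a) + bswap16(b)));   /* 0x0110 */
    printf("%#06x %#06x\n", (unsigned)direct, (unsigned)swapped);      /* they differ */
    return 0;
}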
First, it should be noted that your assumption that int in C has 16 bits is wrong. In most modern systems int is a 32-bit type, so if we reverse (not flip, which typically means taking the complement) the bytes of 5 we'll get 83886080 (0x05000000), not 1280 (0x0500).
Is the size of C "int" 2 bytes or 4 bytes?
What does the C++ standard state the size of int, long type to be?
Also note that you should write in hex to make it easier to understand because computers don't work in decimal:
int16_t a = 0x0005;
int16_t b = 0x0006;
// a+b = 0x000B
int16_t a = 0x0500; // 1280
int16_t b = 0x0600; // 1536
//a+b = 0x0B00
OK, now as others said, ntohl(htonl(5) + htonl(6)) happens to be the same as 5 + 6 just because you have small numbers whose byte-reversed values' sum doesn't produce any carry between bytes. Choose larger numbers and you'll see the difference right away.
However, that property does hold in ones' complement arithmetic for systems where values are stored in two smaller parts, as in this case.
In ones' complement one does arithmetic with end-around carry by propagating the carry out back to the carry in. That makes ones' complement arithmetic endian independent if one has only one internal "carry break" (i.e. the stored value is broken into two separate chunks), because of the "circular carry".
Suppose we have xxyy and zztt; then xxyy + zztt is done like this:
carry
xx yy
+ zz <───── tt
──────────────
carry aa bb
│ ↑
└─────────────┘
When we reverse the chunks, yyxx + ttzz is carried the same way. Because xx, yy, zz, tt are chunks of bits of any length, it works for PDP's mixed endian, or when you store a 32-bit number in two 16-bit parts, a 64-bit number in two 32-bit parts...
For example:
0x7896 + 0x6987 = 0xE21D
0x9678 + 0x8769 = 0x11DE1 → 0x1DE1 + 1 = 0x1DE2
0x2345 + 0x9ABC = 0xBE01
0x4523 + 0xBC9A = 0x101BD → 0x01BD + 1 = 0x01BE
0xABCD + 0xBCDE = 0x168AB → 0x68AB + 1 = 0x68AC
0xCDAB + 0xDEBC = 0x1AC67 → 0xAC67 + 1 = 0xAC68
Or John Kugelman's example above: 0x68 + 0x0B = 0x73; 0x86 + 0xB0 = 0x136 → 0x36 + 1 = 0x37
The end-around carry is one of the reasons why ones' complement was chosen for the TCP checksum, because you can calculate the sum in higher precision easily: 16-bit CPUs can work in 16-bit units as normal, but 32- and 64-bit CPUs can add 32- and 64-bit chunks in parallel without worrying about the carry even when SIMD isn't available, like the SWAR technique.
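A rough sketch of that folding idea (my code, not any particular implementation's): sum the 16-bit words in a wider accumulator, then add the carry-outs back in at the bottom.
#include <stddef.h>
#include <stdint.h>

uint16_t ones_complement_sum(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += words[i];
    while (sum >> 16)                  /* fold carry-outs back into the low 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}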
This only appears to work because you happened to pick numbers that are small enough so that they as well as their sum fit into one byte. As long as everything going on in your number stays within its respective byte, you can obviously shuffle and deshuffle your bytes however you want, it won't make a difference. If you pick larger numbers, e.g., 1234 and 4321, you will notice that it won't work anymore. In fact, you will most likely end up invoking undefined behavior because your int will overflow…
Apart from all that, you will almost certainly want to read this: https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html

16bit Float Multiplication in C

I'm working on a small project, where I need float multiplication with 16-bit floats (half precision). Unfortunately, I'm facing some problems with the algorithm:
Example Output
1 * 5 = 5
2 * 5 = 10
3 * 5 = 14.5
4 * 5 = 20
5 * 5 = 24.5
100 * 4 = 100
100 * 5 = 482
The Source Code
const int bits = 16;
const int exponent_length = 5;
const int fraction_length = 10;
const int bias = pow(2, exponent_length - 1) - 1;
const int exponent_mask = ((1 << 5) - 1) << fraction_length;
const int fraction_mask = (1 << fraction_length) - 1;
const int hidden_bit = (1 << 10); // Was 1 << 11 before update 1

int float_mul(int f1, int f2) {
    int res_exp = 0;
    int res_frac = 0;
    int result = 0;

    int exp1 = (f1 & exponent_mask) >> fraction_length;
    int exp2 = (f2 & exponent_mask) >> fraction_length;
    int frac1 = (f1 & fraction_mask) | hidden_bit;
    int frac2 = (f2 & fraction_mask) | hidden_bit;

    // Add exponents
    res_exp = exp1 + exp2 - bias; // Remove double bias

    // Multiply significands
    res_frac = frac1 * frac2; // 11 bit * 11 bit -> 22 bit!

    // Shift 22-bit product right to fit into 10 bits
    if (highest_bit_pos(res_frac) == 21) {
        res_frac >>= 11;
        res_exp += 1;
    } else {
        res_frac >>= 10;
    }

    res_frac &= ~hidden_bit; // Remove hidden bit

    // Construct float
    return (res_exp << bits - exponent_length - 1) | res_frac;
}
By the way: I'm storing the floats in ints, because I'll try to port this code to some kind of Assembler w/o float point operations later.
The Question
Why does the code work for some values only? Did I forget some normalization or similar? Or does it work only by accident?
Disclaimer: I'm not a CompSci student, it's a leisure project ;)
Update #1
Thanks to the comment by Eric Postpischil I noticed one problem with the code: the hidden_bit flag was off by one (should be 1 << 10). With that change, I don't get decimal places any more, but still some calculations are off (e.g. 3•3=20). I assume it's the res_frac shift as described in the answers.
Update #2
The second problem with the code was indeed the res_frac shifting. After update #1 I got wrong results when having 22-bit results of frac1 * frac2. I've updated the code above with the corrected shift statement. Thanks to all for every comment and answer! :)
From a cursory look:
No attempt is made to determine the location of the high bit in the product. Two 11-bit numbers, each with its high bit set, may produce a 21- or 22-bit number. (Example with two-bit numbers: 10₂•10₂ is 100₂, three bits, but 11₂•11₂ is 1001₂, four bits.)
The result is truncated instead of rounded.
Signs are ignored.
Subnormal numbers are not handled, on input or output.
11 is hardcoded as a shift amount in one place. This is likely incorrect; the correct amount will depend on how the significand is handled for normalization and rounding.
In decoding, the exponent field is shifted right by fraction_length. In encoding, it is shifted left by bits - exponent_length - 1. To avoid bugs, the same expression should be used in both places.
From a more detailed look by chux:
res_frac = frac1 * frac2 fails if int is less than 23 bits (22 for the product and one for the sign).
This is more a suggestion for how to make it easier to get your code right, rather than analysis of what is wrong with the existing code.
There are a number of steps that are common to some or all of the floating point arithmetic operations. I suggest extracting each into a function that can be written with focus on one issue, and tested separately. Then when you come to write e.g. multiplication, you only have to deal with the specifics of that operation.
All the operations will be easier working with a structure that has the actual signed exponent, and the full significand in a wider unsigned integer field. If you were dealing with signed numbers, it would also have a boolean for the sign bit.
Here are some sample operations that could be separate functions, at least until you get it working:
unpack: Take a 16 bit float and extract the exponent and significand into a struct.
pack: Undo unpack - deal with dropping the hidden bit, applying the bias to the exponent, and combining them into a float.
normalize: Shift the significand and adjust the exponent to bring the most significant 1-bit to a specified bit position.
round: Apply your rounding rules to drop low significance bits. If you want to do IEEE 754 style round-to-nearest, you need a guard digit that is the most significant bit that will be dropped, and an additional bit indicating if there are any one bits of lower significance than the guard bit.
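A minimal sketch of the unpack/pack split described above, assuming the usual half-precision layout (1 sign, 5 exponent, 10 fraction bits, bias 15) and ignoring subnormals, infinities and NaNs; the names are mine:
#include <stdint.h>

struct hfloat {
    int      sign;   /* 0 or 1 */
    int      exp;    /* unbiased exponent */
    uint32_t sig;    /* significand with hidden bit, in a wider field */
};

struct hfloat unpack(uint16_t f)
{
    struct hfloat r;
    r.sign = (f >> 15) & 1;
    r.exp  = ((f >> 10) & 0x1F) - 15;   /* remove the bias of 15 */
    r.sig  = (f & 0x3FF) | 0x400;       /* restore the hidden bit */
    return r;
}

uint16_t pack(struct hfloat h)
{
    return (uint16_t)((h.sign << 15)
         | (((h.exp + 15) & 0x1F) << 10)   /* re-apply the bias */
         | (h.sig & 0x3FF));               /* drop the hidden bit */
}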
One problem is that you are truncating instead of rounding:
res_frac >>= 11; // Shift 22bit int right to fit into 10 bit
You should compute res_frac & 0x7ff first, the part of the 22-bit result that your algorithm is about to discard, and compare it to 0x400. If it is below, truncate. If it is above, round away from zero. If it is equal to 0x400, round to the even alternative.
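In code, that rounding step might look roughly like this (a sketch assuming the product's high bit is at position 21 and 11 bits are being dropped; a carry-out would still need renormalization):
#include <stdint.h>

uint32_t round_ties_to_even(uint32_t res_frac)
{
    uint32_t kept    = res_frac >> 11;     /* the 11 bits you keep */
    uint32_t dropped = res_frac & 0x7FF;   /* the 11 bits about to be discarded */
    if (dropped > 0x400 || (dropped == 0x400 && (kept & 1)))
        kept += 1;                         /* may carry out; renormalize if it does */
    return kept;
}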

optimizing bitwise operations

I have an unsigned integer N = abcd where a, b, c, d represent bits from MSB to LSB. I want to get the following numbers:
x1 = ab0cd
x2 = ab1cd
What is the fastest way to do it using bitwise operations in C?
What I'm trying right now is as follows
unsigned int blockid = N >> offset;
unsigned int key1 = (blockid << (1 + offset)) | (((1 << offset) - 1) & N);
unsigned int key2 = key1 | (1 << offset);
here offset is the location where I want to insert 0 and 1.
const unsigned int mask = (~0) << offset; // same as -(2**offset)
unsigned int key1 = N + (N & mask);
unsigned int key2 = key1 - mask;
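A quick sanity check of that trick (my snippet), with a 4-bit N and the insertion point at offset 2:
#include <stdio.h>

int main(void)
{
    unsigned int N = 0xB;                        /* ab cd = 10 11 */
    unsigned int offset = 2;
    const unsigned int mask = (~0u) << offset;   /* unsigned here, to avoid shifting a negative value */
    unsigned int key1 = N + (N & mask);          /* ab0cd = 10011 = 0x13 */
    unsigned int key2 = key1 - mask;             /* ab1cd = 10111 = 0x17 */
    printf("%#x %#x\n", key1, key2);
    return 0;
}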
Since your input is only 4 bits wide, which means there are a total of only 16 outputs, I would recommend at least testing (i.e. implementing and profiling) a look-up table.
These days, with super-fast ALUs and slow(ish) memories, look-ups are not often faster, but it's always worth testing. A small table means it won't pollute the cache very much, which might make it faster than a sequence of arithmetic instructions.
Since your outputs are pretty small too, the complete table could be represented as:
const static uint8_t keys[16][2];
32 bytes is very small, and if you do this often (i.e. many times in a row in a tight loop), the table should fit totally in cache.
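For completeness, one way such a table could be filled (a sketch with the insertion point fixed at offset 2, i.e. between ab and cd):
#include <stdint.h>

static uint8_t keys[16][2];   /* keys[N][0] = ab0cd, keys[N][1] = ab1cd */

void init_keys(void)
{
    for (unsigned n = 0; n < 16; n++) {
        unsigned hi = n >> 2, lo = n & 3;
        keys[n][0] = (uint8_t)((hi << 3) | (0u << 2) | lo);
        keys[n][1] = (uint8_t)((hi << 3) | (1u << 2) | lo);
    }
}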
You should have a look at Jasper Neumann's pages about bit permutations. It includes an online code generator. However it may be uselessly complex for your specific case (permutation of one bit, if we consider the 0 or 1 to be the MSB).
Note: I'll let you google the address since it has no domain name and direct IPs are not allowed by SO.

What is the fastest way to multiply a 16-bit integer with a double?

On an 8-bit micro controller I would like to do the following:
16bit_integer = another_16bit_integer * 0.997;
with the least possible number of instructions.
How about integer arithmetic in 32 bits?
16bit_integer = (int16_t) (another_16bit_integer * (int32_t) 997 / 1000);
32 bits will be enough to store (INT16_MAX × 997); do the arithmetic on values 1000 times larger, then divide back to your 16-bit scale.
Bit shifts are usually very fast:
y = 0xFF3B * (int32_t) x >> 16;
This is probably better written as:
y = (0.997 * 0x10000) * (int32_t)x >> 16;
A good compiler will generate equivalent output.
If your integers are signed, the constants should be changed to 0x8000 and 15.
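If rounding matters (the snippet above truncates), a hedged variation for the unsigned case adds half an LSB before shifting back down:
#include <stdint.h>

uint16_t scale_997(uint16_t x)
{
    const uint32_t k = 0xFF3B;   /* round(0.997 * 65536) */
    return (uint16_t)(((uint32_t)x * k + 0x8000) >> 16);
}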
You probably meant to have some rounding in there, rather than truncating the result to an integer, otherwise the purpose of the operation is really limited.
But since you asked the question with that specific formula, it brought to mind that your result set is really coarse. For the first 333 numbers, the result is: another_16bit_integer-1. You can approximate it (maybe even exactly, when not performed in my head) with something like:
16bit_integer = another_16bit_integer - 1 - (another_16bit_integer/334);
edit: unsigned int, and you handle 0 on your own.
On my platform ( Atmel AVR 8-bit micro-controller, running gcc )
16bit_integer = another_16bit_integer * 0.997;
Takes about 26 instructions.
16bit_integer = (int16_t) (another_16bit_integer * (int32_t) 997 / 1000);
Takes about 25 instructions.
Here is a very fast way to do this operation:
a = b * 0.99609375;
It's similar to what you want, but it's much faster.
a = b;
a -= b>>8;
Or even faster using a trick that only works on little endian systems, like the PIC.
a = b;
a -= *(((int8*)&b) + 1);
Off the top of my head, this comes down to the following assembler on a PIC18:
; a = b
MOVFF 0xc4, 0xc2
NOP
MOVFF 0xc5, 0xc3
NOP
; a -= *((int8*)((&b)+1));
MOVF 0xc5, w
SUBWF 0xc2, f
BTFSC STATUS, C
DECF 0xc
Precomputed lookup table:
16bit_integer = products[another_16bit_integer];
Since you are using an 8 bit processor, you can probably only handle 16 bit results, not 32 bit results. To reduce 16 bit overflow issues I would restate the formula like this:
result16 = operand16 - (operand16 * 3)/1000
This would give accurate results for unsigned integers up to 21845, or signed integers up to 10922. I am assuming that the processor can do 16-bit integer division. If you cannot then you need to do the division the hard way. Multiplying by 3 can be done by simple shifts & adds, if no multiply instruction exists or if multiplication only works with 8-bit operands.
Without knowing the exact microprocessor it is impossible to determine how long such a calculation would take.
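A sketch of that restated formula for the unsigned case (my code), with the multiply-by-3 done as a shift and an add:
#include <stdint.h>

uint16_t mul_0997_16(uint16_t x)               /* valid for x up to 21845 */
{
    unsigned x3 = ((unsigned)x << 1) + x;      /* 3*x via a shift and an add */
    return (uint16_t)(x - x3 / 1000);          /* x - 3x/1000, roughly 0.997*x */
}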
Precomputed lookup table:
16bit_integer = products[another_16bit_integer];
That's not going to work so good on the AVR, the 16bit address space is going to be exhausted.
On my platform ( Atmel AVR 8-bit micro-controller, running gcc )
16bit_integer = another_16bit_integer * 0.997;
Takes about 26 instructions.
16bit_integer = (int16_t) (another_16bit_integer * (int32_t) 997 / 1000);
Takes about 25 instructions.
The Atmel AVR is a RISC chip, so counting instructions is a valid comparison.

Resources