Adding 32-bit signed integers in C

I have been given this problem and would like to solve it in C:
Assume you have a 32-bit processor and that the C compiler does not support long long (or long int). Write a function add(a,b) which returns c = a+b where a and b are 32-bit integers.
I wrote this code, which is able to detect overflow and underflow:
#include <stdio.h>

#define INT_MIN (-2147483647 - 1) /* minimum (signed) int value */
#define INT_MAX 2147483647        /* maximum (signed) int value */

int add(int a, int b)
{
    if (a > 0 && b > INT_MAX - a)
    {
        /* handle overflow */
        printf("Handle overflow\n");
    }
    else if (a < 0 && b < INT_MIN - a)
    {
        /* handle underflow */
        printf("Handle underflow\n");
    }
    return a + b;
}
I am not sure how to implement the long value using 32-bit registers so that I can print the result properly. Can someone help me with how to use the underflow and overflow information so that I can store the result properly in the c variable, which I think should be two 32-bit locations? I think that is what the problem is saying when it hints that long is not supported. Would the variable c be two 32-bit registers put together somehow to hold the correct result so that it can be printed? What action should I perform when the result overflows or underflows?

Since this is a homework question I'll try not to spoil it completely.
One annoying aspect here is that the result is bigger than anything you're allowed to use (I interpret the ban on long long to also include int64_t, otherwise there's really no point to it). It may be tempting to go for "two ints" for the result value, but interpreting such a value is awkward. So I'd go for two uint32_ts and interpret them as the two halves of a 64-bit two's complement integer.
Unsigned multiword addition is easy and has been covered many times (just search). The signed variant is really the same if the inputs are sign-extended: (not tested)
uint32_t a_l = a;
uint32_t a_h = -(a_l >> 31); // sign-extend a
uint32_t b_l = b;
uint32_t b_h = -(b_l >> 31); // sign-extend b
// todo: implement the addition
return some struct containing c_l and c_h
It can't overflow the 64-bit result when interpreted as signed, obviously. It can (and sometimes should) wrap.
To print that thing, if that's part of the assignment, first reason about which values c_h can have. There aren't many possibilities. It should be easy to print using existing integer printing functions (that is, you don't have to write a whole multiword-itoa, just handle a couple of cases).
As a hint for the addition: what happens when you add two decimal digits and the result is larger than 9? Why is the low digit of 7+6=13 a 3? Given only 7, 6 and 3, how can you determine the second digit of the result? You should be able to apply all this to base 2^32 as well.
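For what it's worth, here is one minimal, untested sketch of where these hints lead; the Wide struct and all names are my own, not part of the assignment or the answer above:
#include <stdint.h>

typedef struct { uint32_t lo, hi; } Wide; /* two halves of a 64-bit value */

Wide add(int32_t a, int32_t b)
{
    uint32_t a_l = (uint32_t)a, a_h = -(a_l >> 31); /* sign-extend a */
    uint32_t b_l = (uint32_t)b, b_h = -(b_l >> 31); /* sign-extend b */
    Wide c;
    c.lo = a_l + b_l;            /* low "digit" in base 2^32, may wrap */
    uint32_t carry = c.lo < a_l; /* it wrapped iff the sum is below an input */
    c.hi = a_h + b_h + carry;    /* high "digit" absorbs the carry */
    return c;
}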

First, the simplest solution that satisfies the problem as stated:
double add(int a, int b)
{
    // this will not lose precision, as a double-precision float
    // has more than 33 bits in the mantissa
    return (double) a + b;
}
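For instance (my own check, not part of the answer), the sum prints exactly with %.0f:
#include <stdio.h>

int main(void)
{
    /* 2147483647 + 2147483647 = 4294967294 fits comfortably in a
       double's 53-bit mantissa, so nothing is rounded */
    printf("%.0f\n", add(2147483647, 2147483647)); /* prints 4294967294 */
    return 0;
}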
More seriously, the professor probably expected the number to be decomposed into a combination of ints. Holding the sum of two 32-bit integers requires 33 bits, which can be represented with an int and a bit for the carry flag. Assuming unsigned integers for simplicity, adding would be implemented like this:
#include <limits.h> /* for UINT_MAX */

struct add_result {
    unsigned int sum;
    unsigned int carry:1;
};

struct add_result add(unsigned int a, unsigned int b)
{
    struct add_result ret;
    ret.sum = a + b;
    ret.carry = b > UINT_MAX - a;
    return ret;
}
The harder part is doing something useful with the result, such as printing it. As proposed by harold, a printing function doesn't need to do full division, it can simply cover the possible large 33-bit values and hard-code the first digits for those ranges. Here is an implementation, again limited to unsigned integers:
#include <stdio.h>

void print_result(struct add_result n)
{
    if (!n.carry) {
        // no carry flag - just print the number
        printf("%u\n", n.sum); // note: %u, since sum is unsigned
        return;
    }
    if (n.sum < 705032704u)
        printf("4%09u\n", n.sum + 294967296u);
    else if (n.sum < 1705032704u)
        printf("5%09u\n", n.sum - 705032704u);
    else if (n.sum < 2705032704u)
        printf("6%09u\n", n.sum - 1705032704u);
    else if (n.sum < 3705032704u)
        printf("7%09u\n", n.sum - 2705032704u);
    else
        printf("8%09u\n", n.sum - 3705032704u);
}
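As a quick sanity check (mine, not from the answer): 4000000000 + 1000000000 wraps the 32-bit sum to 705032704 with the carry set, and print_result() recovers the true value:
int main(void)
{
    struct add_result r = add(4000000000u, 1000000000u);
    print_result(r); /* prints 5000000000 */
    return 0;
}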
Converting this to signed quantities is left as an exercise.

Related

Generating random 64-, 32-, 16-, and 8-bit integers in C

I'm hoping that somebody can give me an understanding of why the code works the way it does. I'm trying to wrap my head around things but am lost.
My professor has given us this code snippet which we have to use in order to generate random numbers in C. The snippet in question generates a 64-bit integer, and we have to adapt it to also generate 32-bit, 16-bit, and 8-bit integers. I'm completely lost on where to start. I'm not necessarily asking for a solution, just an explanation of how the original snippet works, so that I can adapt it from there.
long long rand64()
{
    int a, b;
    long long r;
    a = rand();
    b = rand();
    r = (long long)a;
    r = (r << 31) | b;
    return r;
}
Questions I have about this code are:
Why is it shifted 31 bits? I thought rand() generated a number between 0-32767 which is 16 bits, so wouldn't that be 48 bits?
Why do we say | (or) b on the second to last line?
I'm making the relatively safe assumption that, in your computer's C implementation, long long is a 64-bit data type.
The key here is that, since long long r is signed, any value with the highest bit set will be negative. Therefore, the code shifts r by 31 bits to avoid setting that bit.
The | is the bitwise OR operator; it combines the two values by setting in r all of the bits which are set in b.
EDIT:
After reading some of the comments, I realized that my answer needs correction. rand() returns a value no more than RAND_MAX which is typically 2^31-1. Therefore, r is a 31-bit integer. If you shifted it 32 bits to the left, you'd guarantee that its 31st bit (0-up counting) would always be zero.
rand() generates a random value [0...RAND_MAX] of questionable repute - but let us set that reputation aside and assume rand() is good enough and that RAND_MAX is a Mersenne number (a power of 2, minus 1).
Weakness in OP's code: if RAND_MAX == pow(2,31)-1, a common occurrence, then OP's rand64() only returns values in [0...pow(2,62)). @Nate Eldredge
Instead, loop as many times as needed.
To find how many random bits are returned with each call, we need the log2(RAND_MAX + 1). This fortunately is easy with an awesome macro from Is there any way to compute the width of an integer type at compile-time?
#include <stdlib.h>
/* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
#define RAND_MAX_BITWIDTH (IMAX_BITS(RAND_MAX))
Example: rand_ul() returns a random value in the [0...ULONG_MAX] range, whether unsigned long is 32-bit, 64-bit, etc.
unsigned long rand_ul(void) {
    unsigned long r = 0;
    for (int i = 0; i < IMAX_BITS(ULONG_MAX); i += RAND_MAX_BITWIDTH) {
        r <<= RAND_MAX_BITWIDTH;
        r |= rand();
    }
    return r;
}
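Applying the same pattern to the fixed widths the original question asks about could look like the following; this is a sketch of mine that assumes the RAND_MAX_BITWIDTH macro above and the <stdint.h> types:
#include <stdint.h>

uint32_t rand32(void) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i += RAND_MAX_BITWIDTH) {
        r <<= RAND_MAX_BITWIDTH; /* RAND_MAX_BITWIDTH <= 31, so this is defined */
        r |= (uint32_t)rand();
    }
    return r;
}

/* narrower widths can simply truncate the 32-bit result */
uint16_t rand16(void) { return (uint16_t)rand32(); }
uint8_t  rand8(void)  { return (uint8_t)rand32(); }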

C - Method for setting all even-numbered bits to 1

I was charged with the task of writing a method that "returns the word with all even-numbered bits set to 1." Being completely new to C this seems really confusing and unclear. I don't understand how I can change the bits of a number with C. That seems like a very low level instruction, and I don't even know how I would do that in Java (my first language)! Can someone please help me! This is the method signature.
int evenBits(void) {
    return 0;
}
Any instruction on how to do this or even guidance on how to begin doing this would be greatly appreciated. Thank you so much!
Break it down into two problems.
(1) Given a variable, how do I set particular bits?
Hint: use a bitwise operator.
(2) How do I find out the representation of "all even-numbered bits" so I can use a bitwise operator to set them?
Hint: Use math. ;-) You could make a table (or find one) such as:
Decimal | Binary
--------+-------
0 | 0
1 | 1
2 | 10
3 | 11
... | ...
Once you know what operation to use to set particular bits, and you know a decimal (or hexadecimal) integer literal to use that with in C, you've solved the problem.
You must give a precise definition of all even numbered bits. Bits are numbered in different ways on different architectures. Hardware people like to number them from 1 to 32 from the least significant to the most significant bit, or sometimes the other way, from the most significant to the least significant bit... while software guys like to number bits in increasing order starting at 0, because bit 0 represents the number 2^0, i.e. 1.
With this latter numbering system, the bit pattern would be 0101...0101, thus a value in hex 0x555...555. If you number bits starting at 1 for the least significant bit, the pattern would be 1010...1010, in hex 0xAAA...AAA. But this representation actually encodes a negative value on current architectures.
I shall assume for the rest of this answer that even numbered bits are those representing even powers of 2: 1 (2^0), 4 (2^2), 16 (2^4)...
The short answer for this problem is:
int evenBits(void) {
    return 0x55555555;
}
But what if int has 64 bits?
int evenBits(void) {
    return 0x5555555555555555;
}
This would handle a 64-bit int, but would have implementation-defined behavior on systems where int is smaller.
Using macros from <limits.h>, you could mask off the extra bits to handle 16, 32 and 64 bit ints:
#include <limits.h>

int evenBits(void) {
    return 0x5555555555555555 & INT_MAX;
}
But this code still makes some assumptions:
int has at most 64 bits.
int has an even number of bits.
INT_MAX is a power of 2 minus 1.
These assumptions are valid for most current systems, but the C Standard allows for implementations where one or more are invalid.
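If you wanted to drop those assumptions, you could build the mask in a loop instead of spelling it out; here is a sketch of my own, not from the answer:
#include <limits.h>

int evenBits(void) {
    unsigned int mask = 0;
    /* set bits 0, 2, 4, ... until the moving bit shifts out to 0;
       unsigned overflow is well-defined, so the loop terminates */
    for (unsigned int bit = 1; bit != 0; bit <<= 2)
        mask |= bit;
    return (int)(mask & INT_MAX); /* keep the result representable as int */
}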
So basically every other bit has to be set to one? This is why we have bitwise operations in C. Imagine a regular bit array. What you want is to take the rightmost even bit and set it to 1 (this is the number 2). Then we just use the OR operator (|) to merge it into our existing number. After doing that, we bit-shift the number 2 places to the left (<< 2), which turns the previous 0010 into 1000, and OR again. The code below describes it better.
#include <stdio.h>

unsigned char SetAllEvenBitsToOne(unsigned char x);
int IsAllEvenBitsOne(unsigned char x);

int main()
{
    unsigned char x = 0; // char is a one-byte data type, i.e. 8 bits
    x = SetAllEvenBitsToOne(x);
    int check = IsAllEvenBitsOne(x);
    if (check == 1)
    {
        printf("shit works");
    }
    return 0;
}

unsigned char SetAllEvenBitsToOne(unsigned char x)
{
    int i = 0;
    unsigned char y = 2;
    for (i = 0; i < sizeof(char) * 8 / 2; i++)
    {
        x = x | y;
        y = y << 2;
    }
    return x;
}

int IsAllEvenBitsOne(unsigned char x)
{
    unsigned char y;
    for (int i = 0; i < (sizeof(char) * 8 / 2); i++)
    {
        y = x >> 7; // examine the current top bit
        if (y > 0)
        {
            printf("x before: %d\t", x);
            x = x << 2; // move the next even bit into the top position
            printf("x after: %d\n", x);
            continue;
        }
        else
        {
            printf("Not all even bits are 1\n");
            return 0;
        }
    }
    printf("All even bits are 1\n");
    return 1;
}
Here is a link to Bitwise Operations in C

What is the fastest way for calculating the sum of arbitrary large binary numbers

I can't seem to find any good literature about this. Having a BigBinaryNumber (two's complement with virtual sign bit) structure like this:
typedef unsigned char byte;

enum Sign {NEGATIVE = (-1), ZERO = 0, POSITIVE = 1};
typedef enum Sign Sign;

struct BigBinaryNumber
{
    byte *number;
    Sign signum;
    unsigned int size;
};
typedef struct BigBinaryNumber BigBinaryNumber;
I could just go for the elementary school approach (i.e. summing individual bytes and using the carry for subsequent summing) or perhaps work with a fixed size look-up table.
Is there any good literature about the fastest method for binary summation?
The fastest method for adding numbers is your processor's existing add instruction. So long as you've got the number laid out sensibly in memory (e.g, you don't have the bit order backwards or anything), it should be pretty straightforward to load 32 bits at a time from each number, add them together natively, and get the carry:
uint32_t *word_1 = (uint32_t *)number1.number + offset;
uint32_t *word_2 = (uint32_t *)number2.number + offset;
uint32_t *word_tgt = (uint32_t *)dest.number + offset;
uint64_t sum = (uint64_t)*word_1 + *word_2 + carry; // widen BEFORE adding, or the carry is lost
*word_tgt = (uint32_t)sum; // truncate to the low 32 bits
carry = sum >> 32;
Note that you might have to add some special cases for dealing with the last byte in the number (or make sure that *number always has a multiple of 4 bytes allocated).
If you're using a 64-bit CPU, you may be able to extend this to work with uint64_t. There's no uint128_t for the overflow, though, so you might have to use some trickery to get the carry bit.
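One such trick (my own illustration, not from the answer): a carry out of an unsigned add can be detected from wraparound, with no wider type at all:
#include <stdint.h>

/* full adder on 64-bit words without a 128-bit type */
uint64_t add_with_carry(uint64_t a, uint64_t b,
                        unsigned carry_in, unsigned *carry_out)
{
    uint64_t sum = a + b + carry_in;
    /* the add wrapped around iff the true sum exceeded UINT64_MAX */
    *carry_out = (sum < a) || (carry_in && sum == a);
    return sum;
}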
The "trick" is to use the native (or maybe larger) integer size.
@duskwuff is on the money: walk through number multiple bytes at a time.
As with all "What is the fastest way..." questions, candidate solutions should be profiled.
What follows is a one-type solution, so one could use the largest type, the native type, or any single type, e.g. uintmax_t or unsigned. The carry is partially handled via code: carry generation depends on testing whether the addition would overflow.
#include <limits.h>

typedef unsigned MyInt;
#define MyInt_MAX UINT_MAX

void Add(BigBinaryNumber *a, BigBinaryNumber *b, BigBinaryNumber *sum) {
    // Assume same size for a, b, sum.
    // Assume memory allocated for sum.
    // Assume a->size is a multiple of sizeof(MyInt).
    // Assume a->number endian is little and matches platform endian.
    // Assume a->number alignment matches MyInt alignment.
    unsigned int size = a->size;
    MyInt *ap = (MyInt *)a->number;
    MyInt *bp = (MyInt *)b->number;
    MyInt *sump = (MyInt *)sum->number;
    int carry = 0;
    while (size > 0) {
        size -= sizeof(MyInt);
        if (carry) {
            if (*ap <= (MyInt_MAX - 1 - *bp)) {
                carry = 0;
            }
            *sump++ = *ap++ + *bp++ + 1;
        }
        else {
            if (*ap > (MyInt_MAX - *bp)) {
                carry = 1;
            }
            *sump++ = *ap++ + *bp++;
        }
    } // end while
    // Integer overflow/underflow handling not shown,
    // but depends on carry and the signs of a, b.
    // Two's complement sign considerations not shown.
}

Why is my bitwise division function producing a segmentation fault?

My code is below, and it works for most inputs, but I've noticed that for very large numbers (2147483647 divided by 2, for a specific example), I get a segmentation fault and the program stops working. Note that the badd() and bsub() functions simply add or subtract integers respectively.
unsigned int bdiv(unsigned int dividend, unsigned int divisor){
    int quotient = 1;
    if (divisor == dividend)
    {
        return 1;
    }
    else if (dividend < divisor)
    {
        return -1; // this represents dividing by zero
    }
    quotient = badd(quotient, bdiv(bsub(dividend, divisor), divisor));
    return quotient;
}
I'm also having a bit of trouble with my bmult() function. It works for some values, but the program fails for values such as -8192 times 3. This function is also listed. Thanks in advance for any help. I really appreciate it!
int bmult(int x, int y){
    int total = 0;
    /*for (i = 31; i >= 0; i--)
    {
        total = total << 1;
        if(y&1 ==1)
            total = badd(total,x);
    }
    return total;*/
    while (x != 0)
    {
        if ((x & 1) != 0)
        {
            total = badd(total, y);
        }
        y <<= 1;
        x >>= 1;
    }
    return total;
}
The problem with your bdiv is most likely a result of recursion depth. In the example you gave, you will be putting about 1073741824 frames onto the stack, basically using up your allotted memory.
In fact, there is no real reason this function need be recursive. It could quite easily be converted to an iterative solution, alleviating the stack issue.
In the multiplication, this line is going to overflow and truncate y, and so badd() will be getting wrong inputs:
y<<=1;
This line:
x>>=1;
is not going to work well for negative x. Most compilers will do a so-called arithmetic shift here, which is like a regular shift except that, instead of 0, a copy of the most significant bit is shifted in, so the sign is preserved. Shifting any negative value right will therefore eventually give you -1 (for example, -2 >> 1 is -1), and -1 shifted right remains -1, resulting in an infinite loop in your multiplication.
You should not be using the algorithm for multiplication of unsigned integers to multiply signed integers. It's unlikely to work well (if at all) if it uses signed types in its core.
If you want to multiply signed integers, you can first implement multiplication for unsigned ones, using unsigned types. And then you can actually use it for signed multiplication. This will work on virtually all systems because they use 2's complement representation of signed integers.
Examples (assuming 16-bit 2's complement integers):
-1 * +1 -> 0xFFFF * 1 = 0xFFFF -> convert back to signed -> -1
-1 * -1 -> 0xFFFF * 0xFFFF = 0xFFFE0001 -> truncate to 16 bits & convert to signed -> 1
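A minimal sketch of that approach (mine, with ordinary operators standing in for the asker's badd(); the conversion back to signed is implementation-defined before C23, but is two's complement wrap on virtually all systems, as noted above):
#include <stdint.h>

/* shift-and-add multiply done entirely in unsigned arithmetic */
uint32_t umul(uint32_t x, uint32_t y)
{
    uint32_t total = 0;
    while (x != 0) {
        if (x & 1)
            total += y; /* wraps mod 2^32, which is exactly what we want */
        y <<= 1;        /* unsigned shift: well-defined, no UB */
        x >>= 1;        /* 0 is shifted in, so the loop terminates */
    }
    return total;
}

int32_t smul(int32_t x, int32_t y)
{
    /* to-unsigned conversion is always well-defined (mod 2^32);
       the back-conversion relies on two's complement wrap */
    return (int32_t)umul((uint32_t)x, (uint32_t)y);
}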
In the division, the following two lines
else if (dividend < divisor)
{ return -1; }// this represents dividing by zero
are plain wrong. Think: how much is 1/2? It's 0, not -1 or (unsigned int)-1.
Further, how much is UINT_MAX/1? It's UINT_MAX. So, when your division function returns UINT_MAX or (unsigned int)-1 you won't be able to tell the difference, because the two values are the same. You really should use a different mechanism to notify the caller of the overflow.
Oh, and of course, this line:
quotient = badd(quotient, bdiv(bsub(dividend, divisor), divisor));
is going to cause a stack overflow when the quotient is expected to be big. Don't do this recursively. At the very least, use a loop instead.
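For illustration, an iterative version might look like this; a sketch of mine, again with ordinary operators standing in for badd()/bsub(), and with the dividend < divisor case fixed to return 0:
unsigned int bdiv(unsigned int dividend, unsigned int divisor)
{
    unsigned int quotient = 0;
    /* repeated subtraction; assumes divisor != 0, which the caller
       should check and report separately */
    while (dividend >= divisor) {
        dividend -= divisor; /* the asker would use bsub() here */
        quotient++;          /* ...and badd() here */
    }
    return quotient;         /* 1/2 correctly yields 0 */
}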

How to treat a struct with two unsigned shorts as if it were an unsigned int? (in C)

I created a structure to represent a fixed-point positive number. I want the parts on both sides of the decimal point to each occupy 2 bytes.
typedef struct Fixed_t {
    unsigned short floor;    //left side of the decimal point
    unsigned short fraction; //right side of the decimal point
} Fixed;
Now I want to add two fixed point numbers, Fixed x and Fixed y. To do so I treat them like integers and add.
(Fixed) ( (int)x + (int)y );
But as my visual studio 2010 compiler says, I cannot convert between Fixed and int.
What's the right way to do this?
EDIT: I'm not committed to the {short floor, short fraction} implementation of Fixed.
You could attempt a nasty hack, but there's a problem here with endian-ness. Whatever you do to convert, how is the compiler supposed to know that you want floor to be the most significant part of the result, and fraction the less significant part? Any solution that relies on re-interpreting memory is going to work for one endian-ness but not another.
You should either:
(1) define the conversion explicitly. Assuming short is 16 bits:
unsigned int val = (x.floor << 16) + x.fraction;
(2) change Fixed so that it has an int member instead of two shorts, and then decompose when required, rather than composing when required.
If you want addition to be fast, then (2) is the thing to do. If you have a 64 bit type, then you can also do multiplication without decomposing: unsigned int result = (((uint64_t)x) * y) >> 16.
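A sketch of what option (2) might look like; the Fixed2 name and the 16.16 layout are my own choices, not from the answer:
#include <stdint.h>

typedef struct { uint32_t v; } Fixed2; /* high 16 bits = floor, low 16 = fraction */

Fixed2 fixed_add(Fixed2 a, Fixed2 b)
{
    return (Fixed2){ a.v + b.v }; /* one native add; the carry propagates itself */
}

Fixed2 fixed_mul(Fixed2 a, Fixed2 b)
{
    /* (a * 2^16)(b * 2^16) = ab * 2^32; shift right 16 to get ab * 2^16 */
    return (Fixed2){ (uint32_t)(((uint64_t)a.v * b.v) >> 16) };
}

uint16_t fixed_floor(Fixed2 a)    { return (uint16_t)(a.v >> 16); }
uint16_t fixed_fraction(Fixed2 a) { return (uint16_t)a.v; }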
The nasty hack, by the way, would be this:
#include <assert.h>
#include <string.h>

unsigned int val;
assert(sizeof(Fixed) == sizeof(unsigned int));              // could be a static test
assert(2 * sizeof(unsigned short) == sizeof(unsigned int)); // could be a static test
memcpy(&val, &x, sizeof(unsigned int));
That would work on a big-endian system, where Fixed has no padding (and the integer types have no padding bits). On a little-endian system you'd need the members of Fixed to be in the other order, which is why it's nasty. Sometimes casting through memcpy is the right thing to do (in which case it's a "trick" rather than a "nasty hack"). This just isn't one of those times.
If you have to, you can use a union, but beware of endian issues. You might find the arithmetic doesn't work, and it certainly is not portable.
typedef struct Fixed_t {
    union {
        struct { unsigned short floor; unsigned short fraction; };
        unsigned int whole;
    };
} Fixed;
which is more likely (I think) to work on big-endian systems (which Windows/Intel isn't).
Some magic:
typedef union Fixed {
    uint16_t w[2];
    uint32_t d;
} Fixed;

#define Floor    w[((Fixed){1}).d==1]
#define Fraction w[((Fixed){1}).d!=1]
Key points:
I use fixed-size integer types so you're not depending on short being 16-bit and int being 32-bit.
The macros for Floor and Fraction (capitalized to avoid clashing with floor() function) access the two parts in an endian-independent way, as foo.Floor and foo.Fraction.
Edit: At OP's request, an explanation of the macros:
Unions are a way of declaring an object consisting of several different overlapping types. Here we have uint16_t w[2]; overlapping uint32_t d;, making it possible to access the value as 2 16-bit units or 1 32-bit unit.
(Fixed){1} is a compound literal, and could be written more verbosely as (Fixed){{1,0}}. Its first element (uint16_t w[2];) gets initialized with {1,0}. The expression ((Fixed){1}).d then evaluates to the 32-bit integer whose first 16-bit half is 1 and whose second 16-bit half is 0. On a little-endian system, this value is 1, so ((Fixed){1}).d==1 evaluates to 1 (true) and ((Fixed){1}).d!=1 evaluates to 0 (false). On a big-endian system, it'll be the other way around.
Thus, on a little-endian system, Floor is w[1] and Fraction is w[0]. On a big-endian system, Floor is w[0] and Fraction is w[1]. Either way, you end up storing/accessing the correct half of the 32-bit value for the endian-ness of your platform.
In theory, a hypothetical system could use a completely different representation for 16-bit and 32-bit values (for instance interleaving the bits of the two halves), breaking these macros. In practice, that's not going to happen. :-)
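A quick usage example (mine, not from the answer), showing that d ends up laid out the same way on either endianness:
Fixed f;
f.Floor = 3;         /* expands to f.w[0] or f.w[1], whichever is correct */
f.Fraction = 0x8000; /* one half, in 16.16 fixed point */
/* f.d == 0x00038000 (i.e. 3.5) on both little- and big-endian systems */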
This is not possible portably, as the compiler does not guarantee a Fixed will use the same amount of space as an int. The right way is to define a function Fixed add(Fixed a, Fixed b).
Just add the pieces separately. You need to know the value of the fraction that means "1" - here I'm calling that FRAC_MAX:
// c = a + b
void fixed_add(Fixed* a, Fixed* b, Fixed* c){
    unsigned short carry = 0;
    // carry when the fractional parts add up to 1 or more
    if((int)(a->fraction) + (int)(b->fraction) >= FRAC_MAX){
        carry = 1;
        c->fraction = a->fraction + b->fraction - FRAC_MAX;
    }
    else {
        c->fraction = a->fraction + b->fraction;
    }
    c->floor = a->floor + b->floor + carry;
}
Alternatively, if you're just setting the fixed point as being at the 2 byte boundary you can do something like:
void fixed_add( Fixed* a, Fixed *b, Fixed *c){
    // note the parentheses: + binds tighter than <<
    unsigned int ia = ((unsigned int)a->floor << 16) + a->fraction;
    unsigned int ib = ((unsigned int)b->floor << 16) + b->fraction;
    unsigned int ic = ia + ib;
    c->floor = ic >> 16;
    c->fraction = ic & 0xFFFF;
}
Try this:
typedef union {
    struct Fixed_t {
        unsigned short floor;    //left side of the decimal point
        unsigned short fraction; //right side of the decimal point
    } Fixed;
    int Fixed_int;
} FixedUnion; // the original snippet left the typedef name off; FixedUnion is a placeholder
If your compiler puts the two shorts in 4 bytes, then you can use memcpy to copy your int into your struct, but as said in another answer, this is not portable... and quite ugly.
Do you really mind adding each field separately in a dedicated function?
Do you want to keep the integer for performance reasons?
// add two Fixed
Fixed operator+( Fixed a, Fixed b )
{
    ...
}

// add Fixed and int
Fixed operator+( Fixed a, int b )
{
    ...
}
You may cast any addressable type to another one by using:
*(newtype *)&var
