How to compare two complex numbers? - c

In C, complex numbers are float or double and have same problem as canonical types:
#include <stdio.h>
#include <complex.h>
int main(void)
{
double complex a = 0 + I * 0;
double complex b = 1 + I * 1;
for (int i = 0; i < 10; i++) {
a += .1 + I * .1;
}
if (a == b) {
puts("Ok");
}
else {
printf("Fail: %f + i%f != %f + i%f\n", creal(a), cimag(a), creal(b), cimag(b));
}
return 0;
}
The result:
$ clang main.c
$ ./a.out
Fail: 1.000000 + i1.000000 != 1.000000 + i1.000000
I try this syntax:
a - b < DBL_EPSILON + I * DBL_EPSILON
But the compiler hate it:
main.c:24:15: error: invalid operands to binary expression ('_Complex double' and '_Complex double')
if (a - b < DBL_EPSILON + I * DBL_EPSILON) {
~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This last works fine but it’s a little fastidious:
fabs(creal(a) - creal(b)) < DBL_EPSILON && fabs(cimag(a) - cimag(b)) < DBL_EPSILON

Comparing 2 complex floating point numbers is much like comparing 2 real floating point numbers.
Comparing for exact equivalences often is insufficient as the numbers involved contain small computational errors.
So rather than if (a == b) code needs to be if (nearlyequal(a,b))
The usual approach is double diff = cabs(a - b) and then comparing diff to some small constant value like DBL_EPSILON.
This fails when a,b are large numbers as their difference may many orders of magnitude larger than DBL_EPSILON, even though a,b differ only by their least significant bit.
This fails for small numbers too as the difference between a,b may be relatively great, but many orders of magnitude smaller than DBL_EPSILON and so return true when the value are relatively quite different.
Complex numbers literally add another dimensional problem to the issue as the real and imaginary components themselves may be greatly different. Thus the best answer for nearlyequal(a,b) is highly dependent on the code's goals.
For simplicity, let us use the magnitude of the difference as compared to the average magnitude of a,b. A control constant ULP_N approximates the number of binary digits of least significance that a,b are allowed to differ.
#define ULP_N 4
bool nearlyequal(complex double a, complex double b) {
double diff = cabs(a - b);
double mag = (cabs(a) + cabs(b))/2;
return diff <= (mag * DBL_EPSILON * (1ull << ULP_N));
}

Instead of comparing the complex number components, you can compute the complex absolute value (also known as norm, modulus or magnitude) of their difference, which is the distance between the two on the complex plane:
if (cabs(a - b) < DBL_EPSILON) {
// complex numbers are close
}
Small complex numbers will appear to be close to zero even if there is no precision issue, a separate issue that is also present for real numbers.

Since complex numbers are represented as floating point numbers, you have to deal with their inherent imprecision. Floating point numbers are "close enough" if they're within the machine epsilon.
The usual way is to subtract them, take the absolute value, and see if it's close enough.
#include <complex.h>
#include <stdbool.h>
#include <float.h>
static inline bool ceq(double complex a, double complex b) {
return cabs(a-b) < DBL_EPSILON;
}

Related

Underflow error in floating point arithmetic in C

I am new to C, and my task is to create a function
f(x) = sqrt[(x^2)+1]-1
that can handle very large numbers and very small numbers. I am submitting my script on an online interface that checks my answers.
For very large numbers I simplify the expression to:
f(x) = x-1
By just using the highest power. This was the correct answer.
The same logic does not work for smaller numbers. For small numbers (on the order of 1e-7), they are very quickly truncated to zero, even before they are squared. I suspect that this has to do with floating point precision in C. In my textbook, it says that the float type has smallest possible value of 1.17549e-38, with 6 digit precision. So although 1e-7 is much larger than 1.17e-38, it has a higher precision, and is therefore rounded to zero. This is my guess, correct me if I'm wrong.
As a solution, I am thinking that I should convert x to a long double when x < 1e-6. However when I do this, I still get the same error. Any ideas? Let me know if I can clarify. Code below:
#include <math.h>
#include <stdio.h>
double feval(double x) {
/* Insert your code here */
if (x > 1e299)
{;
return x-1;
}
if (x < 1e-6)
{
long double g;
g = x;
printf("x = %Lf\n", g);
long double a;
a = pow(x,2);
printf("x squared = %Lf\n", a);
return sqrt(g*g+1.)- 1.;
}
else
{
printf("x = %f\n", x);
printf("Used third \n");
return sqrt(pow(x,2)+1.)-1;
}
}
int main(void)
{
double x;
printf("Input: ");
scanf("%lf", &x);
double b;
b = feval(x);
printf("%f\n", b);
return 0;
}
For small inputs, you're getting truncation error when you do 1+x^2. If x=1e-7f, x*x will happily fit into a 32 bit floating point number (with a little bit of error due to the fact that 1e-7 does not have an exact floating point representation, but x*x will be so much smaller than 1 that floating point precision will not be sufficient to represent 1+x*x.
It would be more appropriate to do a Taylor expansion of sqrt(1+x^2), which to lowest order would be
sqrt(1+x^2) = 1 + 0.5*x^2 + O(x^4)
Then, you could write your result as
sqrt(1+x^2)-1 = 0.5*x^2 + O(x^4),
avoiding the scenario where you add a very small number to 1.
As a side note, you should not use pow for integer powers. For x^2, you should just do x*x. Arbitrary integer powers are a little trickier to do efficiently; the GNU scientific library for example has a function for efficiently computing arbitrary integer powers.
There are two issues here when implementing this in the naive way: Overflow or underflow in intermediate computation when computing x * x, and substractive cancellation during final subtraction of 1. The second issue is an accuracy issue.
ISO C has a standard math function hypot (x, y) that performs the computation sqrt (x * x + y * y) accurately while avoiding underflow and overflow in intermediate computation. A common approach to fix issues with subtractive cancellation is to transform the computation algebraically such that it is transformed into multiplications and / or divisions.
Combining these two fixes leads to the following implementation for float argument. It has an error of less than 3 ulps across all possible inputs according to my testing.
/* Compute sqrt(x*x+1)-1 accurately and without spurious overflow or underflow */
float func (float x)
{
return (x / (1.0f + hypotf (x, 1.0f))) * x;
}
A trick that is often useful in these cases is based on the identity
(a+1)*(a-1) = a*a-1
In this case
sqrt(x*x+1)-1 = (sqrt(x*x+1)-1)*(sqrt(x*x+1)+1)
/(sqrt(x*x+1)+1)
= (x*x+1-1) / (sqrt(x*x+1)+1)
= x*x/(sqrt(x*x+1)+1)
The last formula can be used as an implementation. For vwry small x sqrt(x*x+1)+1 will be close to 2 (for small enough x it will be 2) but we don;t loose precision in evaluating it.
The problem isn't with running into the minimum value, but with the precision.
As you said yourself, float on your machine has about 7 digits of precision. So let's take x = 1e-7, so that x^2 = 1e-14. That's still well within the range of float, no problems there. But now add 1. The exact answer would be 1.00000000000001. But if we only have 7 digits of precision, this gets rounded to 1.0000000, i.e. exactly 1. So you end up computing sqrt(1.0)-1 which is exactly 0.
One approach would be to use the linear approximation of sqrt around x=1 that sqrt(x) ~ 1+0.5*(x-1). That would lead to the approximation f(x) ~ 0.5*x^2.

Floating-point arithemtic in C: epsilon comparison

I'm trying to compare values with double precision using epsilon. However, I have a problem - initially I have thought that the difference should be equal to the epsilon, but it isn't. Additionally, when I've tried to check the binary representation using the successive multiplication something strange has happened and I feel confused, therefore I would appreciate your explanation to the problem and comments on my way of thinking
#include <stdio.h>
#define EPSILON 1e-10
void double_equal(double a, double b) {
printf("a: %.12f, b: %.12f, a - b = %.12f\n", a, b, a - b);
printf("a: %.12f, b: %.12f, b - a = %.12f\n", a, b, b - a);
if (a - b < EPSILON) printf("a - b < EPSILON\n");
if (a - b == EPSILON) printf("a - b == EPSILON\n");
if (a - b <= EPSILON) printf("a - b <= EPSILON\n");
if (b - a <= EPSILON) printf("b - a <= EPSILON\n");
}
int main(void) {
double wit1 = 1.0000000001;
double wit2 = 1.0;
double_equal(wit1, wit2);
return 0;
}
The output is:
a: 1.000000000100, b: 1.000000000000, a - b = 0.000000000100
a: 1.000000000100, b: 1.000000000000, b - a = -0.000000000100
b - a <= EPSILON
Numeric constants in C are declared as doubles if we don't provide "F"/"f" sign right after the number (#define EPSILON 1e-10F), therefore I can't see here the problem of conversion as in this question. Therefore, I have created really simple program for THESE SPECIFIC examples (I know it should include handling converting integral parts to binary numbers).
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
char* convert(double a) {
char* res = malloc(200);
int count = 0;
double integral;
a = modf(a, &integral);
if (integral == 1) {
res[count++] = integral + '0';
res[count++] = '.';
} else {
res[count++] = '0';
res[count++] = '.';
}
while(a != 0 && count < 200) {
printf("%.100f\n", a);
a *= 2;
a = modf(a, &integral);
if (integral == 1) res[count++] = integral + '0';
else res[count++] = '0';
}
res[count] = '\0';
return res;
}
int main(void) {
double wit1 = 1.0000000001;
double diff = 0.0000000001;
char* res = convert(wit1);
char* di = convert(diff);
printf("this: %s\n", res);
printf("diff: %s\n", di);
return 0;
}
Direct output:
this: 1.0000000000000000000000000000000001101101111100111
diff: 0.00000000000000000000000000000000011011011111001101111111011001110101111011110110111011
First question: why there are so many ending zero-ones in the difference? Why do the results after the binary point differ?
However, if we look at the process of calculation and the fractional part, printed out (I'm presenting only the first few lines):
1.0000000001:
0.0000000001000000082740370999090373516082763671875000000000000000000000000000000000000000000000000000
0.0000000002000000165480741998180747032165527343750000000000000000000000000000000000000000000000000000
0.0000000004000000330961483996361494064331054687500000000000000000000000000000000000000000000000000000
0.0000000001:
0.0000000001000000000000000036432197315497741579165547065599639608990401029586791992187500000000000000
0.0000000002000000000000000072864394630995483158331094131199279217980802059173583984375000000000000000
0.0000000004000000000000000145728789261990966316662188262398558435961604118347167968750000000000000000
Second question: why there are so many strange ending numbers? Is this a result of the incapability of the floating-point arithmetic of precisely representing decimal values?
Analyzing the subtraction, I can see, why the result is bigger than the epsilon. I follow the procedure:
Prepare a complement sequence of zero-ones for the sequence to subtract
"Add" the sequences
Subtract the one in the beginning, add it to the rightmost bit
Therefore:
1.0000000000000000000000000000000001101101111100111
- 1.0000000000000000000000000000000000000000000000000
|
\/
1.0000000000000000000000000000000001101101111100111
"+"0.1111111111111111111111111111111111111111111111111
--------------------------------------------------------
10.0000000000000000000000000000000001101101111100110
|
\/
0.0000000000000000000000000000000001101101111100111
Comparing with the calculated value of epsilon:
0.000000000000000000000000000000000110110111110011 0 1111111011001110101111011110110111011
0.000000000000000000000000000000000110110111110011 1
Spaces indicate the difference.
Third question: do I have to worry if I can't compare the value equal to the epsilon? I think that this situation indicates what the interval of tolerance with epsilon has been made for. However, is there anything I should change?
Why do the results after the binary point differ?
Because that is the difference.
Expecting something else comes from thinking 1.0000000001 and 0.0000000001 as double have those 2 values. They do not. Their difference is not 1.0. They have values near those two, each with about 53 binary digits of significance. Their difference is close to the unit in the last place of 1.0000000001.
why there are so many strange ending numbers? Is this a result of the incapability of the floating-point arithmetic of precisely representing decimal values?
Somewhat.
double can encode about 264 different numbers. 1.0000000001 and 0.0000000001 are not in that set. Instead nearby ones are used that look like strange ending numbers.
do I have to worry if I can't compare the value equal to the epsilon? I think that this situation indicates what the interval of tolerance with epsilon has been made for. However, is there anything I should change?
Yes, change use of epsilon. epsilon is useful for the relative difference, not absolute one. Very large consecutive double values are far more than epsilon apart. About 45% of all double, (the small ones) are all less than epsilon in magnitude. Either if (a - b <= EPSILON) printf("a - b <= EPSILON\n"); or if (b - a <= EPSILON) printf("b - a <= EPSILON\n"); will be true for small a, b even though they are trillions of times different in magnitude.
Oversimplification:
if (fabs(a-b) < EPSILON*fabs(a + b)) {
return values_a_b_are_near_each_other;
}
This answer assumes your C implementation uses IEEE-754 binary64, also known as the “double” format for its double type. This is common.
If the C implementation rounds correctly, then double wit1 = 1.0000000001; initializes wit1 to 1.0000000001000000082740370999090373516082763671875. This is because the two representable values nearest 1.0000000001 are 1.000000000099999786229432174877729266881942749023437500000000000000000000000000000000000000000000000 and 1.0000000001000000082740370999090373516082763671875. The latter is chosen since it is closer.
If correctly rounded, the 1e-10 used for EPSILON will produce 0.000000000100000000000000003643219731549774157916554706559963960899040102958679199218750000000000000.
Clearly wit1 - 1 exceeds EPSILON, so the test a - b < EPSILON in double_equal evaluates as false.
First question: why there are so many ending zero-ones in the difference?
Count the number of bits from the first 1 to the last 1. In each number, there are 53. That is because there are 53 bits in the significand of a double. It is a bit of a coincidence your numbers happened to end in a 1 bit. About half the time, the trailing bit is 0, and a quarter of the time, the last two bits are zeros, and so on. However, since there are 53 bits in the significand of a double, there will be exactly 53 bits from the first 1 bit to the last bit that is part of the represented value.
Since your first number starts with a 1 in the integer position, it has at most 52 bits after that. At that point, the number must be rounded to the nearest representable value.
Since your second number is between 2−34 and 2−33, its first 1 bit is in the 2−34 position, and it can go to the 2−86 position before it has to be rounded.
Third question: do I have to worry if I can't compare the value equal to the epsilon?
Why do you want to compare to the epsilon? There is no general solution for comparing floating-point numbers that contain errors from previous operations. Whether or not an “epsilon comparison” can or should be used is dependent on the application and the operations and numbers involved.

Questions about float characteristics [duplicate]

This question already has answers here:
Why are floating point numbers inaccurate?
(5 answers)
Closed 5 years ago.
Q1: For what reason isn't it recommended to compare floats by == or != like in V1?
Q2: Does fabs() in V2 work the same way, like I programmed it in V3?
Q3: Is it ok to use (x >= y) and (x <= y)?
Q4: According to Wikipedia float has a precision between 6 and 9 digits, in my case 7 digits. So on what does it depend, which precision between 6 and 9 digits my float has? See [1]
[1] float characteristics
Source: Wikipedia
Type | Size | Precision | Range
Float | 4Byte ^= 32Bits | 6-9 decimal digits | (2-2^23)*2^127
Source: tutorialspoint
Type | Size | Precision | Range
Float | 4Byte ^= 32Bits | 6 decimal digits | 1.2E-38 to 3.4E+38
Source: chortle
Type | Size | Precision | Range
Float | 4Byte ^= 32Bits | 7 decimal digits | -3.4E+38 to +3.4E+38
The following three codes produce the same result, still it is not recommended to use the first variant.
1. Variant
#include <stdio.h> // printf() scanf()
int main()
{
float a = 3.1415926;
float b = 3.1415930;
if (a == b)
{
printf("a(%+.7f) == b(%+.7f)\n", a, b);
}
if (a != b)
{
printf("a(%+.7f) != b(%+.7f)\n", a, b);
}
return 0;
}
V1-Output:
a(+3.1415925) != b(+3.1415930)
2. Variant
#include <stdio.h> // printf() scanf()
#include <float.h> // FLT_EPSILON == 0.0000001
#include <math.h> // fabs()
int main()
{
float x = 3.1415926;
float y = 3.1415930;
if (fabs(x - y) < FLT_EPSILON)
{
printf("x(%+.7f) == y(%+.7f)\n", x, y);
}
if (fabs(x - y) > FLT_EPSILON)
{
printf("x(%+.7f) != y(%+.7f)\n", x, y);
}
return 0;
}
V2-Output:
x(+3.1415925) != y(+3.1415930)
3. Variant:
#include <stdio.h> // printf() scanf()
#include <float.h> // FLT_EPSILON == 0.0000001
#include <stdlib.h> // abs()
int main()
{
float x = 3.1415926;
float y = 3.1415930;
const int FPF = 10000000; // Float_Precission_Factor
if ((float)(abs((x - y) * FPF)) / FPF < FLT_EPSILON) // if (x == y)
{
printf("x(%+.7f) == y(%+.7f)\n", x, y);
}
if ((float)(abs((x - y) * FPF)) / FPF > FLT_EPSILON) // if (x != y)
{
printf("x(%+.7f) != y(%+.7f)\n", x, y);
}
return 0;
}
V3-Output:
x(+3.1415925) != y(+3.1415930)
I am grateful for any help, links, references and hints!
When working with floating-point operations, almost every step may introduce a small rounding error. Convert a number from decimal in the source code to the floating-point format? There is a small error, unless the number is exactly representable. Add two numbers? Their exact sum often has more bits than fit in the floating-point format, so it has to be rounded to fit. The same is true for multiplication and division. Take a square root? The result is usually irrational and cannot be represented in the floating-point format, so it is rounded. Call the library to get the cosine or the logarithm? The exact result is usually irrational, so it is rounded. And most math libraries have some additional error as well, because calculating those functions very precisely is hard.
So, let’s say you calculate some value and have a result in x. It has a variety of errors incorporated into it. And you calculate another value and have a result in y. Suppose that, if calculated with exact mathematics, these two values would be equal. What is the chance that the errors in x and y are exactly the same?
It is unlikely. If x and y were calculated in different ways, they experienced different errors, and it is essentially chance whether they have the same total error or not. Therefore, even if the exact mathematical results would be equal, x == y may be false because of the errors.
Similarly, two exact mathematical values might be different, but the errors might coincide so that x == y returns true.
Therefore x == y and x != y generally cannot be used to tell if the desired exact mathematical values are equal or not.
What can be used? Unfortunately, there is no general solution to this. Your examples use FLT_EPSILON as an error threshold, but that is not useful. After doing more than a few floating-point operations, the error may easily accumulated to be more than FLT_EPSILON, either as an absolute error or a relative error.
In order to make a comparison, you need to have some knowledge about how large the accumulated error might be, and that depends greatly on the particular calculations you have performed. You also need to know what the consequences of false positives and false negatives are—is it more important to avoid falsely stating two things are equal or to avoid falsely stating two things are unequal? These issues are specific to each algorithm and its data.
Because on 64 bit machine you will find out that 0.1*3 = 0.30000000000000004 :-)
See the links #yano and #PM-77-1 provided as comments.
You know machine stores everything using 0 and 1.
Also know that not every floating point value is representable in binary within a limited bits.
Computers stores possible nearest representable binary of the given numbers.
So their is a difference between 2.0000001 and 2.0000000 in the eye of computer (but we say they are equal!).
Not always this trouble appears, but it is risky.

float vs double comparison [duplicate]

This question already has answers here:
Comparing float and double
(3 answers)
Closed 7 years ago.
int main(void)
{
  float me = 1.1;  
double you = 1.1;   
if ( me == you ) {
printf("I love U");
} else {
printf("I hate U");
}
}
This prints "I hate U". Why?
Floats use binary fraction. If you convert 1.1 to float, this will result in a binary representation.
Each bit right if the binary point halves the weight of the digit, as much as for decimal, it divides by ten. Bits left of the point double (times ten for decimal).
in decimal: ... 0*2 + 1*1 + 0*0.5 + 0*0.25 + 0*0.125 + 1*0.0625 + ...
binary: 0 1 . 0 0 0 1 ...
2's exp: 1 0 -1 -2 -3 -4
(exponent to the power of 2)
Problem is that 1.1 cannot be converted exactly to binary representation. For double, there are, however, more significant digits than for float.
If you compare the values, first, the float is converted to double. But as the computer does not know about the original decimal value, it simply fills the trailing digits of the new double with all 0, while the double value is more precise. So both do compare not equal.
This is a common pitfall when using floats. For this and other reasons (e.g. rounding errors), you should not use exact comparison for equal/unequal), but a ranged compare using the smallest value different from 0:
#include "float.h"
...
// check for "almost equal"
if ( fabs(fval - dval) <= FLT_EPSILON )
...
Note the usage of FLT_EPSILON, which is the aforementioned value for single precision float values. Also note the <=, not <, as the latter will actually require exact match).
If you compare two doubles, you might use DBL_EPSILON, but be careful with that.
Depending on intermediate calculations, the tolerance has to be increased (you cannot reduce it further than epsilon), as rounding errors, etc. will sum up. Floats in general are not forgiving with wrong assumptions about precision, conversion and rounding.
Edit:
As suggested by #chux, this might not work as expected for larger values, as you have to scale EPSILON according to the exponents. This conforms to what I stated: float comparision is not that simple as integer comparison. Think about before comparing.
In short, you should NOT use == to compare floating points.
for example
float i = 1.1; // or double
float j = 1.1; // or double
This argument
(i==j) == true // is not always valid
for a correct comparison you should use epsilon (very small number):
(abs(i-j)<epsilon)== true // this argument is valid
The question simplifies to why do me and you have different values?
Usually, C floating point is based on a binary representation. Many compilers & hardware follow IEEE 754 binary32 and binary64. Rare machines use a decimal, base-16 or other floating point representation.
OP's machine certainly does not represent 1.1 exactly as 1.1, but to the nearest representable floating point number.
Consider the below which prints out me and you to high precision. The previous representable floating point numbers are also shown. It is easy to see me != you.
#include <math.h>
#include <stdio.h>
int main(void) {
float me = 1.1;
double you = 1.1;
printf("%.50f\n", nextafterf(me,0)); // previous float value
printf("%.50f\n", me);
printf("%.50f\n", nextafter(you,0)); // previous double value
printf("%.50f\n", you);
1.09999990463256835937500000000000000000000000000000
1.10000002384185791015625000000000000000000000000000
1.09999999999999986677323704498121514916420000000000
1.10000000000000008881784197001252323389053300000000
But it is more complicated: C allows code to use higher precision for intermediate calculations depending on FLT_EVAL_METHOD. So on another machine, where FLT_EVAL_METHOD==1 (evaluate all FP to double), the compare test may pass.
Comparing for exact equality is rarely used in floating point code, aside from comparison to 0.0. More often code uses an ordered compare a < b. Comparing for approximate equality involves another parameter to control how near. #R.. has a good answer on that.
Because you are comparing two Floating point!
Floating point comparison is not exact because of Rounding Errors. Simple values like 1.1 or 9.0 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precisions, so results will differ depending on the details of your environment. For example:
float a = 9.0 + 16.0
double b = 25.0
if(a == b) // can be false!
if(a >= b) // can also be false!
Even
if(abs(a-b) < 0.0001) // wrong - don't do this
This is a bad way to do it because a fixed epsilon (0.0001) is chosen because it “looks small”, could actually be way too large when the numbers being compared are very small as well.
I personally use the following method, may be this will help you:
#include <iostream> // std::cout
#include <cmath> // std::abs
#include <algorithm> // std::min
using namespace std;
#define MIN_NORMAL 1.17549435E-38f
#define MAX_VALUE 3.4028235E38f
bool nearlyEqual(float a, float b, float epsilon) {
float absA = std::abs(a);
float absB = std::abs(b);
float diff = std::abs(a - b);
if (a == b) {
return true;
} else if (a == 0 || b == 0 || diff < MIN_NORMAL) {
return diff < (epsilon * MIN_NORMAL);
} else {
return diff / std::min(absA + absB, MAX_VALUE) < epsilon;
}
}
This method passes tests for many important special cases, for different a, b and epsilon.
And don't forget to read What Every Computer Scientist Should Know About Floating-Point Arithmetic!

How will you implement pow(a,b) in C ? condition follows --

without using multiplication or division operators.
You can use only add/substract operators.
A pointless problem, but solvable with the properties of logarithms:
pow(a,b) = exp( b * log(a) )
= exp( exp(log(b) + log(log(a)) )
Take care to insure that your exponential and logarithm functions are using the same base.
Yes, I know how to use a sliderule. Learning that trick will change your perspective of logarithms.
If they are integers, it's simple to turn pow (a, b) into b multiplications of a.
pow(a, b) = a * a * a * a ... ; // do this b times
And simple to turn a * a into additions
a * a = a + a + a + a + ... ; // do this a times
If you combine them, you can make pow.
First, make mult(int a, int b), then use it to make pow.
A recursive solution :
#include<stdio.h>
int multiplication(int a1, int b1)
{
if(b1)
return (a1 + multiplication(a1, b1-1));
else
return 0;
}
int pow(int a, int b)
{
if(b)
return multiplication(a, pow(a, b-1));
else
return 1;
}
int main()
{
printf("\n %d", pow(5, 4));
}
You've already gotten answers purely for FP and purely for integers. Here's one for a FP number raised to an integer power:
double power(double x, int y) {
double z = 1.0;
while (y > 0) {
while (!(y&1)) {
y >>= 2;
x *= x;
}
--y;
z = x * z;
}
return z;
}
At the moment this uses multiplication. You can implement multiplication using only bit shifts, a few bit comparisons, and addition. For integers it looks like this:
int mul(int x, int y) {
int result = 0;
while (y) {
if (y&1)
result += x;
x <<= 1;
y >>= 1;
}
return result;
}
Floating point is pretty much the same, except you have to normalize your results -- i.e., in essence, a floating point number is 1) a significand expressed as a (usually fairly large) integer, and 2) a scale factor. If you want to produce normal IEEE floating point numbers a few parts get a bit ugly though -- for example, the scale factor is stored as a "bias" number instead of any of the usual 1's complement, 2's complement, etc., so working with it is clumsy (basically, each operation you subtract off the bias, do the operation, check for overflow, and (assuming it hasn't overflowed) add the bias back on again).
Doing the job without any kind of logical tests sounds (to me) like it probably wasn't really intended. For quite a few computer architecture classes, it's interesting to reduce a problem to primitive operations you can express directly in hardware (e.g., bit shifts, bitwise-AND, -OR and -NOT, etc.) The implementation shown above fits that reasonably well (if you want to get technical, an adder takes a few gates, but VHDL, Verilog, etc., but it's included in things like VHDL and Verilog anyway).

Resources