I am trying to make a function that quickly calculates x^y mod z. It works well when calculating something like 2^63 mod 3, but at 2^64 mod 3 and higher exponents it just returns 0.
I suspect an overflow somewhere, but I can't pin it down. I have tried explicit casts at the places where calculations (* and mod) are made. I have also made my storage variables (resPow, curPow) unsigned long long int (as suggested here), but that didn't help much.
typedef unsigned long int lint;
lint fastpow(lint nBase, lint nExp, lint nMod) {
    int lastTrueBit = 0;
    unsigned long long int resPow = 1ULL;
    unsigned long long int curPow = nBase;
    for (int i = 0; i < 32; i++) {
        int currentBit = getBit(nExp, i);   // getBit() (not shown) returns bit i of nExp
        if (currentBit == 1) {
            for (lint j = 0; j < i - lastTrueBit; j++) {
                curPow = curPow * curPow;
            }
            resPow = resPow * curPow;
            lastTrueBit = i;
        }
    }
    return resPow % nMod;
}
I am suspecting an overflow somewhere,
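Yes, both curPow * curPow and resPow * curPow can exceed the range of unsigned long long and silently wrap around. With a 64-bit type, 2^64 itself wraps to 0, which is why 2^64 mod 3 comes out as 0.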
The usual way to contain overflow here is to perform mod on intermediate products.
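// unsigned long long int curPow = nBase;
unsigned long long int curPow = nBase % nMod;   // reduce the base before the first squaring too
// curPow = curPow * curPow;
curPow = (curPow * curPow) % nMod;
// resPow =resPow * curPow;
resPow = (resPow * curPow) % nMod;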
This is sufficient when (nMod - 1) * (nMod - 1) <= ULLONG_MAX, i.e. nMod - 1 <= ULLONG_MAX / (nMod - 1) for nMod > 1. Roughly, the mod value may use at most half the bits of unsigned long long. Otherwise more extreme measures are needed, as in "Modular exponentiation without range restriction".
Minor stuff
for(int i = 0; i < 32; i++) assumes lint/unsigned long is 32 bits. Portable code would avoid that magic number, for example by looping while nExp is non-zero. unsigned long is 64 bits on many platforms, so the upper 32 bits of the exponent would be ignored.
The LL suffix is not needed here. The U suffix remains useful to quiet various compiler warnings.
// unsigned long long int resPow = 1ULL;
unsigned long long int resPow = 1U;
I want to create a large dataset of 60M elements, each of dimension 256 (it will later be written to a file). I wrote the following to do so:
#define N 60000000L
#define DIM 256
int main(){
    int i,j;
    double tmp, *data;
    data = calloc(N*DIM, sizeof(double));
    tmp = 1.0 / RAND_MAX;
    for (i = 0; i < N*DIM; i++) data[i] = (double) rand() * tmp - 0.5;
    // TODO: Save data buffer in file
    free(data);
}
But I get this warning:
main.c:111:19: warning: comparison of constant 7695802368 with expression of type 'int' is always true [-Wtautological-constant-out-of-range-compare]
for (i = 0; i < N*DIM; i++) data[i] = (double) rand() * tmp - 0.5;
~ ^ ~~~~~~~~~~
1 warning generated.
Can someone please explain why this happens and how to get rid of it? Thank you.
The int type on your machine cannot hold a number large enough to ever be >= 7695802368, so the comparison is always true. Your int is most likely 32 bits, with a maximum of 2147483647. You'll need a 64-bit type such as long long int or, probably more appropriately, unsigned long long int.
Use size_t math for memory allocations and indexing.
It is the Goldilocks type for indexing and sizing: neither too narrow, nor too wide. Keep in mind that it is an unsigned type.
#include <stdlib.h>
//#define N 60000000L
#define N ((size_t) 60000000u)
#define DIM 256
int main(void) {
    size_t i;  // use the best type for indexing
    double tmp, *data;
    size_t n = N*DIM;
    data = calloc(n, sizeof *data);  // de-ref the pointer rather than naming the type
    if (data) {  // check allocation
        tmp = 1.0 / RAND_MAX;
        for (i = 0; i < n; i++) {
            data[i] = (double) rand() * tmp - 0.5;
        }
        // TODO: Save data buffer in file
        free(data);
    }
    return 0;
}
The warning explains it for you. You're testing whether a signed int, which probably maxes out at 2,147,483,647 (well below the constant in your compiler warning), is less than a value over 7 billion, so the test is always true. Did you perhaps mean i < N?
My goal is to create an integer type with a bigger size than 4 bytes (or 8, if I use long). I tried using malloc to give my integer more bytes of memory, but it still broke on the 31st iteration (it gave a negative number). Here's my code:
int main()
{
    int x = 31; //(normally an int can do up to 30 without going negative so this is my test number)
    int i;
    int *bigNum = NULL;
    bigNum = malloc((sizeof(int)*2));
    *bigNum = 1;
    for (i=0; i<x; i++) {
        *bigNum = *bigNum * 2;
        printf("%d \n", *bigNum);
    }
    free(bigNum);
}
Output:
2
4
...
1073741824
-2147483648
Although you have allocated more memory for your integer, no other part of the system knows this, including:
the compiler;
the CPU chip;
printf.
So all calculations are still carried out using the native int size. (Strictly speaking, overflowing a signed int is undefined behavior; wrapping around to -2147483648 is simply what typically happens.)
Note that you can't tell the CPU chip you use larger integers; it is a physical/design limitation of the chip.
Dereferencing an int * gives you an int no matter how much extra memory you allocate for it.
If you want a data type able to hold more information, try long or long long. The standard only guarantees minimums: long is at least 32 bits and long long at least 64 bits, each at least as wide as the type before it.
If you want to handle integers beyond what your implementation provides, use a bignum library, like MPIR.
goal is to create an integer type with a bigger size
To handle multi-int integers, code also needs supporting functions for each basic operation:
int main(void) {
    int x = 31;
    RandBigNum *bigNum = RandBigNum_Init();
    bigNum = RandBigNum_Assign_int(bigNum, 1);
    for (int i=0; i<x; i++) {
        bigNum = RandBigNum_Muliply_int(bigNum, 2);
        RandBigNum_Print(bigNum);
        printf(" \n");
    }
    free(bigNum);
    return 0;
}
Now, how might one implement all this? There are many approaches.
Below is a simple, incomplete and untested one. It is not necessarily a good approach, but it presents an initial idea of the details needed to build a big number library.
#include <limits.h>
#include <stdlib.h>
// Numbers are all non-negative. The first array element is the number of digits
// (base UINT_MAX + 1) that follow. Assumes unsigned is narrower than unsigned long long.
typedef unsigned RandBigNum;
#define RandBigNum_MAXP1 (UINT_MAX + 1ull)
RandBigNum *RandBigNum_Init(void) {
    return calloc(1, sizeof (RandBigNum));  // a number with zero digits
}
// Functions that may grow the array return the (possibly moved) pointer.
RandBigNum *RandBigNum_Muliply_int(RandBigNum *x, unsigned scale) {
    unsigned carry = 0;
    for (unsigned i = 1; i <= x[0]; i++) {
        unsigned long long product = 1ull * x[i] * scale + carry;
        x[i] = product % RandBigNum_MAXP1;
        carry = product / RandBigNum_MAXP1;
    }
    if (carry) {
        unsigned n = x[0] + 2;             // size element + old digits + one new digit
        x = realloc(x, sizeof *x * n);     // re-alloc check omitted
        x[x[0] + 1] = carry;               // new most significant digit
        x[0]++;
    }
    return x;
}
// many other functions
I am writing a very basic program to print the range of an unsigned long long variable in C: 0 to 2^n - 1, where n is the number of bits in the type on the given system and compiler. On my system, the size of a long long variable is 8 bytes.
I am using the following code:
#include<stdio.h>
#include<math.h>
int main()
{
    unsigned long long n;
    //n = pow(2, 63);
    //n = (n * 2) - 1;
    n = pow(2, 64) - 1;
    printf("\nn: %llu\n", n);
    return 0;
}
Upon compiling, gcc gives me the following warning:
Print_long_long_int.c:10:2: warning: overflow in implicit constant conversion [-Woverflow].
On executing it, I get the correct output of
n: 18446744073709551615
But, if I remove the single comments from the lines, and use them:
n = pow(2, 63);
n = (n * 2) - 1;
Instead of:
n = pow(2, 64) - 1;
It doesn't give me any such warning, and it executes normally.
Why is this discrepancy happening?
Thanks!
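When you write
pow(2,64) - 1;
pow() returns a double, and 2^64 - 1 cannot be represented exactly in a double. The subtraction rounds straight back to 2^64 = 18446744073709551616.
That exceeds the range of unsigned long long, which is 0 to 18,446,744,073,709,551,615 for a 64-bit type, so converting it to unsigned long long is undefined behavior. gcc evaluates the constant expression at compile time, which is why it warns. The value you see printed is just what that conversion happened to produce.
In the other version, pow(2, 63) fits exactly. (n * 2) - 1 is then computed in unsigned long long arithmetic, which is defined to wrap modulo 2^64: n * 2 becomes 0, and 0 - 1 becomes 18446744073709551615. Nothing overflows in the undefined sense, so there is no warning.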
If you want an exact integer result, compute the power in integer arithmetic instead of with pow(), which works in double. Here is a simple power function that returns unsigned long long:
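#include <stdio.h>
// Integer power: every step stays in unsigned long long arithmetic,
// which wraps modulo 2^n instead of going through double.
unsigned long long power(int base, int exponent)
{
    unsigned long long n = 1;
    int i;
    for (i = 0; i < exponent; i++)
        n *= base;
    return n;
}
int main()
{
    // power(2, 64) wraps to 0 for a 64-bit type, and 0 - 1 wraps to the maximum value
    unsigned long long n = power(2, 64) - 1;
    printf("n: %llu\n", n);
    return 0;
}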
I want to do the two's complement of a float value:
unsigned long Temperature ;
Temperature = (~(unsigned long)(564.48))+1;
But the problem is that the cast loses information: I get 564 instead of 564.48.
Can I do the two's complement without losing information?
That is a very weird thing to do. Floating-point numbers are not stored in two's complement, so it doesn't make a lot of sense.
Anyway, you can perhaps use the good old union trick:
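#include <stdint.h>
#include <stdio.h>
int main(void)
{
    union {
        float real;
        uint32_t integer;   // matches the 32-bit float; unsigned long may be 64 bits
    } tmp = { 564.48f };
    tmp.integer = ~tmp.integer + 1;
    printf("I got %f\n", tmp.real);
    return 0;
}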
When I tried it (on ideone) it printed:
I got -0.007412
Note that this relies on unspecified behavior, so it might break if your compiler does not implement the access in the most straightforward manner. This is distinct from undefined behavior (which would make the code invalid), but it is still not optimal. Newer standards make it clearer. In C99 (after TC3) and C11, a footnote to 6.5.2.3 states that reading a union member other than the one last stored reinterprets the stored bytes as the new type ("type punning"), which might produce a trap representation.
You can't apply ~ to a float (its operand must have an integer type), but you can apply it to the float's bytes:
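#include <stdint.h>
#include <stdio.h>
#include <string.h>
void print_binary(size_t const size, void const * const ptr)
{
    unsigned char const *b = (unsigned char const *) ptr;
    unsigned char byte;
    int i, j;
    for (i = (int) size - 1; i >= 0; i--) {    // most significant byte first on little endian
        for (j = 7; j >= 0; j--) {
            byte = b[i] & (1 << j);
            byte >>= j;
            printf("%u", byte);
        }
    }
    printf("\n");
}
int main(void)
{
    float f = 564.48f;
    unsigned char *p = (unsigned char *) &f;
    uint32_t u;
    size_t i;
    print_binary(sizeof(f), &f);
    for (i = 0; i < sizeof(float); i++) {
        p[i] = ~p[i];                          // one's complement, byte by byte
    }
    print_binary(sizeof(f), &f);
    // The "+1" belongs to the bit pattern; f += 1.f would add 1.0 to the float value
    memcpy(&u, &f, sizeof u);                  // assumes sizeof(float) == sizeof(uint32_t)
    u += 1;
    memcpy(&f, &u, sizeof f);
    return 0;
}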
Output:
01000100000011010001111010111000
10111011111100101110000101000111
print_binary is only there to test the result, so you can remove it. Also, as pointed out by @barakmanos, print_binary assumes little endian; the rest of the code is not affected by endianness:
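#include <stdint.h>
#include <string.h>
int main(void)
{
    float f = 564.48f;
    unsigned char *p = (unsigned char *) &f;
    uint32_t u;
    size_t i;
    for (i = 0; i < sizeof(float); i++) {
        p[i] = ~p[i];
    }
    memcpy(&u, &f, sizeof u);   // add 1 to the bit pattern (assumes a 32-bit float)
    u += 1;
    memcpy(&f, &u, sizeof f);
    return 0;
}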
Casting a floating-point value to an integer value changes the "bit contents" of that value.
In order to perform two's complement on the "bit contents" of a floating-point value:
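float f = 564.48f;
uint32_t bits;
memcpy(&bits, &f, sizeof bits);   // copy the bytes; *(unsigned long*)&f breaks strict aliasing
uint32_t Temperature = ~bits + 1;
This needs <stdint.h> and <string.h>. Make sure that sizeof(uint32_t) == sizeof(float), which holds for IEEE-754 single precision; for double, use uint64_t instead.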
Let me preface this with: I have extremely limited experience with ASM, and even less with SIMD.
But it happens that I have the following MMX/SSE optimised code that I would like to port to AltiVec instructions for use on PPC/Cell processors.
This is probably a big ask. Even though it's only a few lines of code, I've had no end of trouble trying to work out what's going on here.
The original function:
static inline int convolve(const short *a, const short *b, int n)
{
    int out = 0;
    union {
        __m64 m64;
        int i32[2];
    } tmp;
    tmp.i32[0] = 0;
    tmp.i32[1] = 0;
    while (n >= 4) {
        tmp.m64 = _mm_add_pi32(tmp.m64,
                               _mm_madd_pi16(*((__m64 *)a),
                                             *((__m64 *)b)));
        a += 4;
        b += 4;
        n -= 4;
    }
    out = tmp.i32[0] + tmp.i32[1];
    _mm_empty();
    while (n --)
        out += (*(a++)) * (*(b++));
    return out;
}
Any tips on how I might rewrite this to use AltiVec instructions?
My first attempt (a very wrong attempt) looks something like this, but it's not entirely (or even remotely) correct.
static inline int convolve_altivec(const short *a, const short *b, int n)
{
    int out = 0;
    union {
        vector unsigned int m128;
        int i64[2];
    } tmp;
    vector unsigned int zero = {0, 0, 0, 0};
    tmp.i64[0] = 0;
    tmp.i64[1] = 0;
    while (n >= 8) {
        tmp.m128 = vec_add(tmp.m128,
                           vec_msum(*((vector unsigned short *)a),
                                    *((vector unsigned short *)b), zero));
        a += 8;
        b += 8;
        n -= 8;
    }
    out = tmp.i64[0] + tmp.i64[1];
#endif
    while (n --)
        out += (*(a++)) * (*(b++));
    return out;
}
You're not far off - I fixed a few minor problems, cleaned up the code a little, added a test harness, and it seems to work OK now:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <altivec.h>
static int convolve_ref(const short *a, const short *b, int n)
{
    int out = 0;
    int i;
    for (i = 0; i < n; ++i)
    {
        out += a[i] * b[i];
    }
    return out;
}
static inline int convolve_altivec(const short *a, const short *b, int n)
{
    int out = 0;
    union {
        vector signed int m128;
        int i32[4];
    } tmp;
    const vector signed int zero = {0, 0, 0, 0};
    assert(((unsigned long)a & 15) == 0);
    assert(((unsigned long)b & 15) == 0);
    tmp.m128 = zero;
    while (n >= 8)
    {
        tmp.m128 = vec_msum(*((vector signed short *)a),
                            *((vector signed short *)b), tmp.m128);
        a += 8;
        b += 8;
        n -= 8;
    }
    out = tmp.i32[0] + tmp.i32[1] + tmp.i32[2] + tmp.i32[3];
    while (n --)
        out += (*(a++)) * (*(b++));
    return out;
}
int main(void)
{
    const int n = 100;
    vector signed short _a[n / 8 + 1];
    vector signed short _b[n / 8 + 1];
    short *a = (short *)_a;
    short *b = (short *)_b;
    int sum_ref, sum_test;
    int i;
    for (i = 0; i < n; ++i)
    {
        // small test values keep the int sums from overflowing
        a[i] = (short)(rand() % 2001 - 1000);
        b[i] = (short)(rand() % 2001 - 1000);
    }
    sum_ref = convolve_ref(a, b, n);
    sum_test = convolve_altivec(a, b, n);
    printf("sum_ref = %d\n", sum_ref);
    printf("sum_test = %d\n", sum_test);
    printf("%s\n", sum_ref == sum_test ? "PASS" : "FAIL");
    return 0;
}
(Warning: all of my Altivec experience comes from working on Xbox360/PS3 - I'm not sure how different they are from other Altivec platforms).
First off, you should check your pointer alignment. Most vector load and store operations expect 16-byte aligned addresses. If the address isn't aligned, things will usually carry on without warning (lvx, for example, simply ignores the low four address bits), but you won't get the data you were expecting.
It's possible (but slower) to do unaligned loads, but you basically have to read a bit before and after your data and combine them. See Apple's AltiVec page. I've also done it before using the lvlx and lvrx load instructions and then ORing the results together.
Next up, I'm not sure your multiplies and adds are the same. I've never used either _mm_madd_pi16 or vec_msum, so I'm not positive they're equivalent. You should step through in a debugger and make sure they give you the same output for the same input data. Another possible difference is that they may treat overflow differently (e.g. modular vs. saturating).
Last but not least, you're computing 4 ints at a time instead of 2. So your union should hold 4 ints, and you should sum all 4 of them at the end.