Modulo optimization

Modulo optimization - c

I have a C program which does some extensive swapping operations on a large array. It has a modulo operation in its tight loop. In fact there is an integer in the range [-N|N[ with N a power of 2 and it should be wrapped to [0,N[.
Example with N=4: -4 => 0, -3 => 1, -2 => 2, -1 => 3, 0 => 0, ..., 3 => 3
At first I tried the version 1 below but was surprised that version 2 is actually notably faster even though it has a conditional expression.
Can you explain why version 2 is faster than version 1 for this special case?
Version 1:
#define N (1<<(3*5))
inline int modBitAnd(int x)
{
return (x & (N-1));
}
Runtime: 17.1 seconds (for the whole program)
Version 2:
inline int modNeg1(int x)
{
return (x < 0 ? x + N : x);
}
Runtime: 14.6 seconds (for the whole program)
Program is compiled on GCC 4.8.2. with -std=c99 -O3.
Edit:
Here is the main loop in my program:
int en(uint16_t* p, uint16_t i, uint16_t v)
{
uint16_t n1 = p[modNeg1((int)i - 1)];
uint16_t n2 = p[modBitAnd((int)i + 1)];
uint16_t n3 = p[modNeg1((int)i - C_WIDTH)];
uint16_t n4 = p[modBitAnd((int)i + C_WIDTH)];
return d(n1,v) + d(n2,v) + d(n3,v) + d(n4,v);
}
void arrange(uint16_t* p)
{
for(size_t i=0; i<10000000; i++) {
uint16_t ia = random(); // random integer [0|2^15[
uint16_t va = p[ia];
uint16_t ib = random(); // random integer [0|2^15[
uint16_t vb = p[ib];
if(en(p,ia,vb) + en(p,ib,va) < en(p,ia,va) + en(p,ib,vb)) {
p[ia] = vb;
p[ib] = va;
}
}
}
int d(uint16_t a, uint16_t b) is a distance function e.g. abs((int)a-(int)b).
This is how p is initialized:
uint16_t* p = malloc(sizeof(uint16_t)*N);
for(unsigned i=0; i<N; i++) *p++ = i;
First I used modBitAnd everywhere, but found out that the modNeg1 is acutally faster for the two cases where it can be used.

First take a few stackshots to find out where the time is actually going. Your mod functions will grab some fraction of the samples, but you've also got two calls to random, plus a fair amount of array indexing. Also, it looks like you've got four calls to en with some arguments that are the same, so maybe your modularity is leading to repeat calls to the mod functions.

Related

Efficient algorithm to calculate the sum of number of base2 digits (number of bits) over an interval of positive integers

Let's say I've been given two integers a, b where a is a positive integer and is smaller than b. I have to find an efficient algorithm that's going to give me the sum of number of base2 digits (number of bits) over the interval [a, b]. For example, in the interval [0, 4] the sum of digits is equal to 9 because 0 = 1 digit, 1 = 1 digit, 2 = 2 digits, 3 = 2 digits and 4 = 3 digits.
My program is capable of calculating this number by using a loop but I'm looking for something more efficient for large numbers. Here are the snippets of my code just to give you an idea:
int numberOfBits(int i) {
if(i == 0) {
return 1;
}
else {
return (int) log2(i) + 1;
}
}
The function above is for calculating the number of digits of one number in the interval.
The code below shows you how I use it in my main function.
for(i = a; i <= b; i++) {
l = l + numberOfBits(i);
}
printf("Digits: %d\n", l);
Ideally I should be able to get the number of digits by using the two values of my interval and using some special algorithm to do that.

Try this code, i think it gives you what you are needing to calculate the binaries:
int bit(int x)
{
if(!x) return 1;
else
{
int i;
for(i = 0; x; i++, x >>= 1);
return i;
}
}

The main thing to understand here is that the number of digits used to represent a number in binary increases by one with each power of two:
+--------------+---------------+
| number range | binary digits |
+==============+===============+
| 0 - 1 | 1 |
+--------------+---------------+
| 2 - 3 | 2 |
+--------------+---------------+
| 4 - 7 | 3 |
+--------------+---------------+
| 8 - 15 | 4 |
+--------------+---------------+
| 16 - 31 | 5 |
+--------------+---------------+
| 32 - 63 | 6 |
+--------------+---------------+
| ... | ... |
A trivial improvement over your brute force algorithm would then be to figure out how many times this number of digits has increased between the two numbers passed in (given by the base two logarithm) and add up the digits by multiplying the count of numbers that can be represented by the given number of digits (given by the power of two) with the number of digits.
A naive implementation of this algorithm is:
int digits_sum_seq(int a, int b)
{
int sum = 0;
int i = 0;
int log2b = b <= 0 ? 1 : floor(log2(b));
int log2a = a <= 0 ? 1 : floor(log2(a)) + 1;
sum += (pow(2, log2a) - a) * (log2a);
for (i = log2b; i > log2a; i--)
sum += pow(2, i - 1) * i;
sum += (b - pow(2, log2b) + 1) * (log2b + 1);
return sum;
}
It can then be improved by the more efficient versions of the log and pow functions seen in the other answers.

First, we can improve the speed of log2, but that only gives us a fixed factor speed-up and doesn't change the scaling.
Faster log2 adapted from: https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogLookup
The lookup table method takes only about 7 operations to find the log
of a 32-bit value. If extended for 64-bit quantities, it would take
roughly 9 operations. Another operation can be trimmed off by using
four tables, with the possible additions incorporated into each. Using
int table elements may be faster, depending on your architecture.
Second, we must re-think the algorithm. If you know that numbers between N and M have the same number of digits, would you add them up one by one or would you rather do (M-N+1)*numDigits?
But if we have a range where multiple numbers appear what do we do? Let's just find the intervals of same digits, and add sums of those intervals. Implemented below. I think that my findEndLimit could be further optimized with a lookup table.
Code
#include <stdio.h>
#include <limits.h>
#include <time.h>
unsigned int fastLog2(unsigned int v)
{
static const char LogTable256[256] =
{
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
-1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};
register unsigned int t, tt; // temporaries
if (tt = v >> 16)
{
return (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
}
else
{
return (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}
}
unsigned int numberOfBits(unsigned int i)
{
if (i == 0) {
return 1;
}
else {
return fastLog2(i) + 1;
}
}
unsigned int findEndLimit(unsigned int sx, unsigned int ex)
{
unsigned int sy = numberOfBits(sx);
unsigned int ey = numberOfBits(ex);
unsigned int mx;
unsigned int my;
if (sy == ey) // this also means sx == ex
return ex;
// assumes sy < ey
mx = (ex - sx) / 2 + sx; // will eq. sx for sx + 1 == ex
my = numberOfBits(mx);
while (ex - sx != 1) {
mx = (ex - sx) / 2 + sx; // will eq. sx for sx + 1 == ex
my = numberOfBits(mx);
if (my == ey) {
ex = mx;
ey = numberOfBits(ex);
}
else {
sx = mx;
sy = numberOfBits(sx);
}
}
return sx+1;
}
int main(void)
{
unsigned int a, b, m;
unsigned long l;
clock_t start, end;
l = 0;
a = 0;
b = UINT_MAX;
start = clock();
unsigned int i;
for (i = a; i < b; ++i) {
l += numberOfBits(i);
}
if (i == b) {
l += numberOfBits(i);
}
end = clock();
printf("Naive\n");
printf("Digits: %ld; Time: %fs\n",l, ((double)(end-start))/CLOCKS_PER_SEC);
l=0;
start = clock();
do {
m = findEndLimit(a, b);
l += (b-m + 1) * (unsigned long)numberOfBits(b);
b = m-1;
} while (b > a);
l += (b-a+1) * (unsigned long)numberOfBits(b);
end = clock();
printf("Binary search\n");
printf("Digits: %ld; Time: %fs\n",l, ((double)(end-start))/CLOCKS_PER_SEC);
}
Output
From 0 to UINT_MAX
$ ./main
Naive
Digits: 133143986178; Time: 25.722492s
Binary search
Digits: 133143986178; Time: 0.000025s
My findEndLimit can take long time in some edge cases:
From UINT_MAX/16+1 to UINT_MAX/8
$ ./main
Naive
Digits: 7784628224; Time: 1.651067s
Binary search
Digits: 7784628224; Time: 4.921520s

Conceptually, you would need to split the task to two subproblems -
1) find the sum of digits from 0..M, and from 0..N, then subtract.
2) find the floor(log2(x)), because eg for the number 77 the numbers 64,65,...77 all have 6 digits, the next 32 have 5 digits, the next 16 have 4 digits and so on, which makes a geometric progression.
Thus:
int digits(int a) {
if (a == 0) return 1; // should digits(0) be 0 or 1 ?
int b=(int)floor(log2(a)); // use any all-integer calculation hack
int sum = 1 + (b+1) * (a- (1<<b) +1); // added 1, due to digits(0)==1
while (--b)
sum += (b + 1) << b; // shortcut for (b + 1) * (1 << b);
return sum;
}
int digits_range(int a, int b) {
if (a <= 0 || b <= 0) return -1; // formulas work for strictly positive numbers
return digits(b)-digits(a-1);
}

As efficiency depends on the tools available, one approach would be doing it "analog":
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
unsigned long long pow2sum_min(unsigned long long n, long long unsigned m)
{
if (m >= n)
{
return 1;
}
--n;
return (2ULL << n) + pow2sum_min(n, m);
}
#define LN(x) (log2(x)/log2(M_E))
int main(int argc, char** argv)
{
if (2 >= argc)
{
fprintf(stderr, "%s a b\n", argv[0]);
exit(EXIT_FAILURE);
}
long a = atol(argv[1]), b = atol(argv[2]);
if (0L >= a || 0L >= b || b < a)
{
puts("Na ...!");
exit(EXIT_FAILURE);
}
/* Expand intevall to cover full dimensions: */
unsigned long long a_c = pow(2, floor(log2(a)));
unsigned long long b_c = pow(2, floor(log2(b+1)) + 1);
double log2_a_c = log2(a_c);
double log2_b_c = log2(b_c);
unsigned long p2s = pow2sum_min(log2_b_c, log2_a_c) - 1;
/* Integral log2(x) between a_c and b_c: */
double A = ((b_c * (LN(b_c) - 1))
- (a_c * (LN(a_c) - 1)))/LN(2)
+ (b+1 - a);
/* "Integer"-integral - integral of log2(x)'s inverse function (2**x) between log(a_c) and log(b_c): */
double D = p2s - (b_c - a_c)/LN(2);
/* Corrective from a_c/b_c to a/b : */
double C = (log2_b_c - 1)*(b_c - (b+1)) + log2_a_c*(a - a_c);
printf("Total used digits: %lld\n", (long long) ((A - D - C) +.5));
}
:-)
The main thing here is the number and kind of iterations done.
Number is
log(floor(b_c)) - log(floor(a_c))
times
doing one
n - 1 /* Integer decrement */
2**n + s /* One bit-shift and one integer addition */
for each iteration.

Here's an entirely look-up based approach. You don't even need the log2 :)
Algorithm
First we precompute interval limits where the number of bits would change and create a lookup table. In other words we create an array limits[2^n], where limits[i] gives us the biggest integer that can be represented with (i+1) bits. Our array is then {1, 3, 7, ..., 2^n-1}.
Then, when we want to determine the sum of bits for our range, we must first match our range limits a and b with the smallest index for which a <= limits[i] and b <= limits[j] holds, which will then tell us that we need (i+1) bits to represent a, and (j+1) bits to represent b.
If the indexes are the same, then the result is simply (b-a+1)*(i+1), otherwise we must separately get the number of bits from our value to the edge of same number of bits interval, and add up total number of bits for each interval between as well. In any case, simple arithmetic.
Code
#include <stdio.h>
#include <limits.h>
#include <time.h>
unsigned long bitsnumsum(unsigned int a, unsigned int b)
{
// generate lookup table
// limits[i] is the max. number we can represent with (i+1) bits
static const unsigned int limits[32] =
{
#define LTN(n) n*2u-1, n*4u-1, n*8u-1, n*16u-1, n*32u-1, n*64u-1, n*128u-1, n*256u-1
LTN(1),
LTN(256),
LTN(256*256),
LTN(256*256*256)
};
// make it work for any order of arguments
if (b < a) {
unsigned int c = a;
a = b;
b = c;
}
// find interval of a
unsigned int i = 0;
while (a > limits[i]) {
++i;
}
// find interval of b
unsigned int j = i;
while (b > limits[j]) {
++j;
}
// add it all up
unsigned long sum = 0;
if (i == j) {
// a and b in the same range
// conveniently, this also deals with j == 0
// so no danger to do [j-1] below
return (i+1) * (unsigned long)(b - a + 1);
}
else {
// add sum of digits in range [a, limits[i]]
sum += (i+1) * (unsigned long)(limits[i] - a + 1);
// add sum of digits in range [limits[j], b]
sum += (j+1) * (unsigned long)(b - limits[j-1]);
// add sum of digits in range [limits[i], limits[j]]
for (++i; i<j; ++i) {
sum += (i+1) * (unsigned long)(limits[i] - limits[i-1]);
}
return sum;
}
}
int main(void)
{
clock_t start, end;
unsigned int a=0, b=UINT_MAX;
start = clock();
printf("Sum of binary digits for numbers in range "
"[%u, %u]: %lu\n", a, b, bitsnumsum(a, b));
end = clock();
printf("Time: %fs\n", ((double)(end-start))/CLOCKS_PER_SEC);
}
Output
$ ./lookup
Sum of binary digits for numbers in range [0, 4294967295]: 133143986178
Time: 0.000282s

Algorithm
The main idea is to find the n2 = log2(x) rounded down. That is the number of digits in x. Let pow2 = 1 << n2. n2 * (pow2 - x + 1) is the number of digits in the values [x...pow2]. Now find the sun of digits in the powers of 2 from 1 to n2-1
Code
I am certain various simplifications can be made.
Untested code. Will review later.
// Let us use unsigned for everything.
unsigned ulog2(unsigned value) {
unsigned result = 0;
if (0xFFFF0000u & value) {
value >>= 16; result += 16;
}
if (0xFF00u & value) {
value >>= 8; result += 8;
}
if (0xF0u & value) {
value >>= 4; result += 4;
}
if (0xCu & value) {
value >>= 2; result += 2;
}
if (0x2 & value) {
value >>= 1; result += 1;
}
return result;
}
unsigned bit_count_helper(unsigned x) {
if (x == 0) {
return 1;
}
unsigned n2 = ulog2(x);
unsigned pow2 = 1u << n;
unsigned sum = n2 * (pow2 - x + 1u); // value from pow2 to x
while (n2 > 0) {
// ... + 5*16 + 4*8 + 3*4 + 2*2 + 1*1
pow2 /= 2;
sum += n2 * pow2;
}
return sum;
}
unsigned bit_count(unsigned a, unsigned b) {
assert(a < b);
return bit_count_helper(b - 1) - bit_count_helper(a);
}

For this problem your solution is the simplest, the one called "naive" where you look for every element in the sequence or in your case interval for check something or execute operations.
Naive Algorithm
Assuming that a and b are positive integers with b greater than a let's call the dimension/size of the interval [a,b], n = (b-a).
Having our number of elements n and using some notations of algorithms (like big-O notation link), the worst case cost is O(n*(numberOfBits_cost)).
From this we can see that we can speed up our algorithm by using a faster algorithm for computing numberOfBits() or we need to find a way to not look at every element of the interval that costs us n operations.
Intuition
Now looking at a possible interval [6,14] you can see that for 6 and 7 we need 3 digits, with 4 need for 8,9,10,11,12,13,14. This results in calling numberOfBits() for every number that use the same number of digits to be represented, while the following multiplication operation would be faster:
(number_in_subinterval)*digitsForThisInterval
((14-8)+1)*4 = 28
((7-6)+1)*3 = 6
So we reduced the looping on 9 elements with 9 operations to only 2.
So writing a function that use this intuition will give us a more efficient in time, not necessarily in memory, algorithm. Using your numberOfBits() function I have created this solution:
int intuitionSol(int a, int b){
int digitsForA = numberOfBits(a);
int digitsForB = numberOfBits(b);
if(digitsForA != digitsForB){
//because a or b can be that isn't the first or last element of the
// interval that a specific number of digit can rappresent there is a need
// to execute some correction operation before on a and b
int tmp = pow(2,digitsForA) - a;
int result = tmp*digitsForA; //will containt the final result that will be returned
int i;
for(i = digitsForA + 1; i < digitsForB; i++){
int interval_elements = pow(2,i) - pow(2,i-1);
result = result + ((interval_elements) * i);
//printf("NumOfElem: %i for %i digits; sum:= %i\n", interval_elements, i, result);
}
int tmp1 = ((b + 1) - pow(2,digitsForB-1));
result = result + tmp1*digitsForB;
return result;
}
else {
int elements = (b - a) + 1;
return elements * digitsForA; // or digitsForB
}
}
Let's look at the cost, this algorithm costs is the cost of doing correction operation on a and b plus the most expensive one that of the for-loop. In my solution however I'm not looping over all elements but only on numberOfBits(b)-numberOfBits(a) that in the worst case, when [0,n], become log(n)-1 thats equivalent to O(log n).
To resume we passed from a linear operations cost O(n) to a logartmic one O(log n) in the worst case. Look on this diagram the diferinces between the two.
Note
When I talk about interval or sub-interval I refer to the interval of elements that use the same number of digits to represent the number in binary.
Following there are some output of my tests with the last one that shows the difference:
Considered interval is [0,4]
YourSol: 9 in time: 0.000015s
IntuitionSol: 9 in time: 0.000007s
Considered interval is [0,0]
YourSol: 1 in time: 0.000005s
IntuitionSol: 1 in time: 0.000005s
Considered interval is [4,7]
YourSol: 12 in time: 0.000016s
IntuitionSol: 12 in time: 0.000005s
Considered interval is [2,123456]
YourSol: 1967697 in time: 0.005010s
IntuitionSol: 1967697 in time: 0.000015s

Calculating multiples in Haskell (conversion from C)? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm trying to write a Haskell program that calculates multiples. Basically, when given two integers a and b, I want to find how many integers 1 ≤ bi ≤ b are multiple of any integer 2 ≤ ai ≤ a. For example, if a = 3 and b = 30, I want to know how many integers in the range of 1-30 are a multiple of 2 or 3; there are 20 such integers: 2, 3, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24, 26, 27, 28, 30.
I have a C program that does this. I'm trying to get this translated into Haskell, but part of the difficulty is getting around the loops that I've used since Haskell doesn't use loops. I appreciate any and all help in translating this!
My C program for reference (sorry if formatting is off):
#define PRIME_RANGE 130
#define PRIME_CNT 32
#define UPPER_LIMIT (1000000000000000ull) //10^15
#define MAX_BASE_MULTIPLES_COUNT 25000000
typedef struct
{
char primeFactorFlag;
long long multiple;
}multipleInfo;
unsigned char primeFlag[PRIME_RANGE + 1];
int primes[PRIME_CNT];
int primeCnt = 0;
int maxPrimeStart[PRIME_CNT];
multipleInfo baseMultiples[MAX_BASE_MULTIPLES_COUNT];
multipleInfo mergedMultiples[MAX_BASE_MULTIPLES_COUNT];
int baseMultiplesCount, mergedMultiplesCount;
void findOddMultiples(int a, long long b, long long *count);
void generateBaseMultiples(void);
void mergeLists(multipleInfo listSource[], int countS, multipleInfo
listDest[], int *countD);
void sieve(void);
int main(void)
{
int i, j, a, n, startInd, endInd;
long long b, multiples;
//Generate primes
sieve();
primes[primeCnt] = PRIME_RANGE + 1;
generateBaseMultiples();
baseMultiples[baseMultiplesCount].multiple = UPPER_LIMIT + 1;
//Input and Output
scanf("%d", &n);
for(i = 1; i <= n; i++)
{
scanf("%d%lld", &a, &b);
//If b <= a, all are multiple except 1
if(b <= a)
printf("%lld\n",b-1);
else
{
//Add all even multiples
multiples = b / 2;
//Add all odd multiples
findOddMultiples(a, b, &multiples);-
printf("%lld\n", multiples);
}
}
return 0;
}
void findOddMultiples(int a, long long b, long long *count)
{
int i, k;
long long currentNum;
for(k = 1; k < primeCnt && primes[k] <= a; k++)
{
for(i = maxPrimeStart[k]; i < maxPrimeStart[k + 1] &&
baseMultiples[i].multiple <= b; i++)
{
currentNum = b/baseMultiples[i].multiple;
currentNum = (currentNum + 1) >> 1; // remove even multiples
if(baseMultiples[i].primeFactorFlag) //odd number of factors
(*count) += currentNum;
else
(*count) -= currentNum;
}
}
}
void addTheMultiple(long long value, int primeFactorFlag)
{
baseMultiples[baseMultiplesCount].multiple = value;
baseMultiples[baseMultiplesCount].primeFactorFlag = primeFactorFlag;
baseMultiplesCount++;
}
void generateBaseMultiples(void)
{
int i, j, t, prevCount;
long long curValue;
addTheMultiple(3, 1);
mergedMultiples[0] = baseMultiples[0];
mergedMultiplesCount = 1;
maxPrimeStart[1] = 0;
prevCount = mergedMultiplesCount;
for(i = 2; i < primeCnt; i++)
{
maxPrimeStart[i] = baseMultiplesCount;
addTheMultiple(primes[i], 1);
for(j = 0; j < prevCount; j++)
{
curValue = mergedMultiples[j].multiple * primes[i];
if(curValue > UPPER_LIMIT)
break;
addTheMultiple(curValue, 1 - mergedMultiples[j].primeFactorFlag);
}
if(i < primeCnt - 1)
mergeLists(&baseMultiples[prevCount], baseMultiplesCount - prevCount, mergedMultiples, &mergedMultiplesCount);
prevCount = mergedMultiplesCount;
}
maxPrimeStart[primeCnt] = baseMultiplesCount;
}
void mergeLists(multipleInfo listSource[], int countS, multipleInfo listDest[], int *countD)
{
int limit = countS + *countD;
int i1, i2, j, k;
//Copy one list in unused safe memory
for(j = limit - 1, k = *countD - 1; k >= 0; j--, k--)
listDest[j] = listDest[k];
//Merge the lists
for(i1 = 0, i2 = countS, k = 0; i1 < countS && i2 < limit; k++)
{
if(listSource[i1].multiple <= listDest[i2].multiple)
listDest[k] = listSource[i1++];
else
listDest[k] = listDest[i2++];
}
while(i1 < countS)
listDest[k++] = listSource[i1++];
while(i2 < limit)
listDest[k++] = listDest[i2++];
*countD = k;
}
void sieve(void)
{
int i, j, root = sqrt(PRIME_RANGE);
primes[primeCnt++] = 2;
for(i = 3; i <= PRIME_RANGE; i+= 2)
{
if(!primeFlag[i])
{
primes[primeCnt++] = i;
if(root >= i)
{
for(j = i * i; j <= PRIME_RANGE; j += i << 1)
primeFlag[j] = 1;
}
}
}
}

First, unless I'm grossly misunderstanding, the number of multiples you have there is wrong. The number of multiples of 2 between 1 and 30 is 15, and the number of multiples of 3 between 1 and 30 is 10, so there should be 25 numbers there.
EDIT: I did misunderstand; you want unique multiples.
To get unique multiples, you can use Data.Set, which has the invariant that the elements of the Set are unique and ordered ascendingly.
If you know you aren't going to exceed x = maxBound :: Int, you can get even better speedups using Data.IntSet. I've also included some test cases and annotated with comments what they run at on my machine.
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 #-}
module Main (main) where
import System.CPUTime (getCPUTime)
import Data.IntSet (IntSet)
import qualified Data.IntSet as IntSet
main :: IO ()
main = do
test 3 30 -- 0.12 ms
test 131 132 -- 0.14 ms
test 500 300000 -- 117.63 ms
test :: Int -> Int -> IO ()
test !a !b = do
start <- getCPUTime
print (numMultiples a b)
end <- getCPUTime
print $ "Needed " ++ show ((fromIntegral (end - start)) / 10^9) ++ " ms.\n"
numMultiples :: Int -> Int -> Int
numMultiples !a !b = IntSet.size (foldMap go [2..a])
where
go :: Int -> IntSet
go !x = IntSet.fromAscList [x, x+x .. b]

I'm not really into understanding your C, so I implemented a solution afresh using the algorithm discussed here. The N in the linked algorithm is the product of the primes up to a in your problem description.
So first we'll need a list of primes. There's a standardish trick for getting a list of primes that is at once very idiomatic and relatively efficient:
primes :: [Integer]
primes = 2:filter isPrime [3..]
-- Doesn't work right for n<2, but we never call it there, so who cares?
isPrime :: Integer -> Bool
isPrime n = go primes n where
go (p:ps) n | p*p>n = True
| otherwise = n `rem` p /= 0 && go ps n
Next up: we want a way to iterate over the positive square-free divisors of N. This can be achieved by iterating over the subsets of the primes less than a. There's a standard idiomatic way to get a powerset, namely:
-- import Control.Monad
-- powerSet :: [a] -> [[a]]
-- powerSet = filterM (const [False, True])
That would be a fine component to use, but since at the end of the day we only care about the product of each powerset element and the value of the Mobius function of that product, we would end up duplicating a lot of multiplications and counting problems. It's cheaper to compute those two things directly while producing the powerset. So:
-- Given the prime factorization of a square-free number, produce a list of
-- its divisors d together with mu(d).
divisorsWithMu :: Num a => [a] -> [(a, a)]
divisorsWithMu [] = [(1, 1)]
divisorsWithMu (p:ps) = rec ++ [(p*d, -mu) | (d, mu) <- rec] where
rec = divisorsWithMu ps
With that in hand, we can just iterate and do a little arithmetic.
f :: Integer -> Integer -> Integer
f a b = b - sum
[ mu * (b `div` d)
| (d, mu) <- divisorsWithMu (takeWhile (<=a) primes)
]
And that's all the code. Crunched 137 lines of C down to 15 lines of Haskell -- not bad! Try it out in ghci:
> f 3 30
20
As an additional optimization, one could consider modifying divisorsWithMu to short-circuit when its divisor is bigger than b, as we know such terms will not contribute to the final sum. This makes a noticeable difference for large a, as without it there are exponentially many elements in the powerset. Here's how that modification looks:
-- Given an upper bound and the prime factorization of a square-free number,
-- produce a list of its divisors d that are no larger than the upper bound
-- together with mu(d).
divisorsWithMuUnder :: (Ord a, Num a) => a -> [a] -> [(a, a)]
divisorsWithMuUnder n [] = [(1, 1)]
divisorsWithMuUnder n (p:ps) = rec ++ [(p*d, -mu) | (d, mu) <- rec, p*d<=n]
where rec = divisorsWithMuUnder n ps
f' :: Integer -> Integer -> Integer
f' a b = b - sum
[ mu * (b `div` d)
| (d, mu) <- divisorsWithMuUnder b (takeWhile (<=a) primes)
]
Not much more complicated; the only really interesting difference is that there's now a condition in the list comprehension. Here's an example of f' finishing quickly for inputs that would take infeasibly long with f:
> f' 100 100000
88169

With data-ordlist package mentioned by Daniel Wagner in the comments, it is just
f a b = length $ unionAll [ [p,p+p..b] | p <- takeWhile (<= a) primes]
That is all. Some timings, for non-compiled code run inside GHCi:
~> f 100 (10^5)
88169
(0.05 secs, 48855072 bytes)
~> f 131 (3*10^6)
2659571
(0.55 secs, 1493586480 bytes)
~> f 131 132
131
(0.00 secs, 0 bytes)
~> f 500 300000
274055
(0.11 secs, 192704760 bytes)
Compiling will surely make the memory consumption a non-issue, by converting the length to a counting loop.

You'll have to use recursion in place of loops.
In (most) procedural or object-orientated languages, you should hardly ever (never?) be using recursion. It is horribly inefficient, as a new stack frame must be created each time the recursive function is called.
However, in a functional language, like Haskell, the compiler is often able to optimize the recursion away into a loop, which makes it much faster then its procedural counterparts.
I've converted your sieve function into a set of recursive functions in C. I'll leave it to you to convert it into Haskell:
int main(void) {
//...
int root = sqrt(PRIME_RANGE);
primes[primeCnt++] = 2;
sieve(3, PRIME_RANGE, root);
//...
}
void sieve(int i, int end, int root) {
if(i > end) {
return;
}
if(!primeFlag[i]) {
primes[primeCnt++] = i;
if(root >= i) {
markMultiples(i * i, PRIME_RANGE, i);
}
}
i += 2;
sieve(i, end, root);
}
void markMultiples(int j, int end, int prime) {
if(j > end) {
return;
}
primeFlag[j] = 1;
j += i << 1;
markMultiples(j, end, prime);
}
The point of recursion is that the same function is called repeatedly, until a condition is met. The results of one recursive call are passed onto the next call, until the condition is met.
Also, why are you bit-fiddling instead of just multiplying or dividing by 2? Any half-decent compiler these days can convert most multiplications and divisions by 2 into a bit-shift.

recurrence relation : find bit strings of length seven contain two consecutive 0 in C

i have the recurrence relation of
and the initials condition is
a0 = a1 = 0
with these two, i have to find the bit strings of length 7 contain two consecutive 0 which i already solve.
example:
a2 = a2-1 + a2-2 + 22-2
= a1 + a0 + 20
= 0 + 0 + 1
= 1
and so on until a7.
the problem is how to convert these into c?
im not really good at c but i try it like this.
#include<stdio.h>
#include <math.h>
int main()
{
int a[7];
int total = 0;
printf("the initial condition is a0 = a1 = 0\n\n");
// a[0] = 0;
// a[1] = 0;
for (int i=2; i<=7; i++)
{
if(a[0] && a[1])
a[i] = 0;
else
total = (a[i-1]) + (a[i-2]) + (2 * pow((i-2),i));
printf("a%d = a(%d-1) + a(%d-2) + 2(%d-2)\n",i,i,i,i);
printf("a%d = %d\n\n",i,total);
}
}
the output are not the same as i calculate pls help :(

int func (int n)
{
if (n==0 || n==1)
return 0;
if (n==2)
return 1;
return func(n-1) + func(n-2) + pow(2,(n-2));
}
#include<stdio.h>
#include <math.h>
int main()
{
return func(7);
}

First of uncomment the lines which initialized the 2 first elements. Then at the for loop the only 2 lines need are:
a[i]=a[i-1]+a[i-2]+pow(2, i-2);
And then print a i

In the pow() function, pow(x,y) = x^y (which operates on doubles and returns double). The C code in your example is thus doing 2.0*(((double)i-2.0)^(double)i)... A simpler approach to 2^(i-2) (in integer math) is to use the bitwise shift operation:
total = a[i-1] + a[i-2] + (1 << i-2);
(Note: For ANSI C operator precedence consult an internet search engine of your choice.)
If your intention is to make the function capable of supporting floating point, then the pow() function would be appropriate... but the types of the variables would need to change accordingly.
For integer math, you may wish to consider using a long or long long type so that you have less risk of running out of headroom in the type.

How to generate random 64-bit unsigned integer in C

I need generate random 64-bit unsigned integers using C. I mean, the range should be 0 to 18446744073709551615. RAND_MAX is 1073741823.
I found some solutions in the links which might be possible duplicates but the answers mostly concatenates some rand() results or making some incremental arithmetic operations. So results are always 18 digits or 20 digits. I also want outcomes like 5, 11, 33387, not just 3771778641802345472.
By the way, I really don't have so much experience with the C but any approach, code samples and idea could be beneficial.

Concerning "So results are always 18 digits or 20 digits."
See #Thomas comment. If you generate random numbers long enough, code will create ones like 5, 11 and 33387. If code generates 1,000,000,000 numbers/second, it may take a year as very small numbers < 100,000 are so rare amongst all 64-bit numbers.
rand() simple returns random bits. A simplistic method pulls 1 bit at a time
uint64_t rand_uint64_slow(void) {
uint64_t r = 0;
for (int i=0; i<64; i++) {
r = r*2 + rand()%2;
}
return r;
}
Assuming RAND_MAX is some power of 2 - 1 as in OP's case 1073741823 == 0x3FFFFFFF, take advantage that 30 at least 15 bits are generated each time. The following code will call rand() 5 3 times - a tad wasteful. Instead bits shifted out could be saved for the next random number, but that brings in other issues. Leave that for another day.
uint64_t rand_uint64(void) {
uint64_t r = 0;
for (int i=0; i<64; i += 15 /*30*/) {
r = r*((uint64_t)RAND_MAX + 1) + rand();
}
return r;
}
A portable loop count method avoids the 15 /*30*/ - But see 2020 edit below.
#if RAND_MAX/256 >= 0xFFFFFFFFFFFFFF
#define LOOP_COUNT 1
#elif RAND_MAX/256 >= 0xFFFFFF
#define LOOP_COUNT 2
#elif RAND_MAX/256 >= 0x3FFFF
#define LOOP_COUNT 3
#elif RAND_MAX/256 >= 0x1FF
#define LOOP_COUNT 4
#else
#define LOOP_COUNT 5
#endif
uint64_t rand_uint64(void) {
uint64_t r = 0;
for (int i=LOOP_COUNT; i > 0; i--) {
r = r*(RAND_MAX + (uint64_t)1) + rand();
}
return r;
}
The autocorrelation effects commented here are caused by a weak rand(). C does not specify a particular method of random number generation. The above relies on rand() - or whatever base random function employed - being good.
If rand() is sub-par, then code should use other generators. Yet one can still use this approach to build up larger random numbers.
[Edit 2020]
Hallvard B. Furuseth provides as nice way to determine the number of bits in RAND_MAX when it is a Mersenne Number - a power of 2 minus 1.
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
#define RAND_MAX_WIDTH IMAX_BITS(RAND_MAX)
_Static_assert((RAND_MAX & (RAND_MAX + 1u)) == 0, "RAND_MAX not a Mersenne number");
uint64_t rand64(void) {
uint64_t r = 0;
for (int i = 0; i < 64; i += RAND_MAX_WIDTH) {
r <<= RAND_MAX_WIDTH;
r ^= (unsigned) rand();
}
return r;
}

If you don't need cryptographically secure pseudo random numbers, I would suggest using MT19937-64. It is a 64 bit version of Mersenne Twister PRNG.
Please, do not combine rand() outputs and do not build upon other tricks. Use existing implementation:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html

Iff you have a sufficiently good source of random bytes (like, say, /dev/random or /dev/urandom on a linux machine), you can simply consume 8 bytes from that source and concatenate them. If they are independent and have a linear distribution, you're set.
If you don't, you MAY get away by doing the same, but there is likely to be some artefacts in your pseudo-random generator that gives a toe-hold for all sorts of hi-jinx.
Example code assuming we have an open binary FILE *source:
/* Implementation #1, slightly more elegant than looping yourself */
uint64_t 64bitrandom()
{
uint64_t rv;
size_t count;
do {
count = fread(&rv, sizeof(rv), 1, source);
} while (count != 1);
return rv;
}
/* Implementation #2 */
uint64_t 64bitrandom()
{
uint64_t rv = 0;
int c;
for (i=0; i < sizeof(rv); i++) {
do {
c = fgetc(source)
} while (c < 0);
rv = (rv << 8) | (c & 0xff);
}
return rv;
}
If you replace "read random bytes from a randomness device" with "get bytes from a function call", all you have to do is to adjust the shifts in method #2.
You're vastly more likely to get a "number with many digits" than one with "small number of digits" (of all the numbers between 0 and 2 ** 64, roughly 95% have 19 or more decimal digits, so really that is what you will mostly get.

If you are willing to use a repetitive pseudo random sequence and you can deal with a bunch of values that will never happen (like even numbers? ... don't use just the low bits), an LCG or MCG are simple solutions. Wikipedia: Linear congruential generator can get you started (there are several more types including the commonly used Wikipedia: Mersenne Twister). And this site can generate a couple prime numbers for the modulus and the multiplier below. (caveat: this sequence will be guessable and thus it is NOT secure)
#include <stdio.h>
#include <stdint.h>
uint64_t
mcg64(void)
{
static uint64_t i = 1;
return (i = (164603309694725029ull * i) % 14738995463583502973ull);
}
int
main(int ac, char * av[])
{
for (int i = 0; i < 10; i++)
printf("%016p\n", mcg64());
}

I have tried this code here and it seems to work fine there.
#include <time.h>
#include <stdlib.h>
#include <math.h>
int main(){
srand(time(NULL));
int a = rand();
int b = rand();
int c = rand();
int d = rand();
long e = (long)a*b;
e = abs(e);
long f = (long)c*d;
f = abs(f);
long long answer = (long long)e*f;
printf("value %lld",answer);
return 0;
}
I ran a few iterations and i get the following outputs :
value 1869044101095834648
value 2104046041914393000
value 1587782446298476296
value 604955295827516250
value 41152208336759610
value 57792837533816000

If you have 32 or 16-bit random value - generate 2 or 4 randoms and combine them to one 64-bit with << and |.
uint64_t rand_uint64(void) {
// Assuming RAND_MAX is 2^31.
uint64_t r = rand();
r = r<<30 | rand();
r = r<<30 | rand();
return r;
}

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
unsigned long long int randomize(unsigned long long int uint_64);
int main(void)
{
srand(time(0));
unsigned long long int random_number = randomize(18446744073709551615);
printf("%llu\n",random_number);
random_number = randomize(123);
printf("%llu\n",random_number);
return 0;
}
unsigned long long int randomize(unsigned long long int uint_64)
{
char buffer[100] , data[100] , tmp[2];
//convert llu to string,store in buffer
sprintf(buffer, "%llu", uint_64);
//store buffer length
size_t len = strlen(buffer);
//x : store converted char to int, rand_num : random number , index of data array
int x , rand_num , index = 0;
//condition that prevents the program from generating number that is bigger input value
bool Condition = 0;
//iterate over buffer array
for( int n = 0 ; n < len ; n++ )
{
//store the first character of buffer
tmp[0] = buffer[n];
tmp[1] = '\0';
//convert it to integer,store in x
x = atoi(tmp);
if( n == 0 )
{
//if first iteration,rand_num must be less than or equal to x
rand_num = rand() % ( x + 1 );
//if generated random number does not equal to x,condition is true
if( rand_num != x )
Condition = 1;
//convert character that corrosponds to integer to integer and store it in data array;increment index
data[index] = rand_num + '0';
index++;
}
//if not first iteration,do the following
else
{
if( Condition )
{
rand_num = rand() % ( 10 );
data[index] = rand_num + '0';
index++;
}
else
{
rand_num = rand() % ( x + 1 );
if( rand_num != x )
Condition = 1;
data[index] = rand_num + '0';
index++;
}
}
}
data[index] = '\0';
char *ptr ;
//convert the data array to unsigned long long int
unsigned long long int ret = _strtoui64(data,&ptr,10);
return ret;
}

Miller Rabin Primality test accuracy

I know the Miller–Rabin primality test is probabilistic. However I want to use it for a programming task that leaves no room for error.
Can we assume that it is correct with very high probability if the input numbers are 64-bit integers (i.e. long long in C)?

Miller–Rabin is indeed probabilistic, but you can trade accuracy for computation time arbitrarily. If the number you test is prime, it will always give the correct answer. The problematic case is when a number is composite, but is reported to be prime. We can bound the probability of this error using the formula on Wikipedia: If you select k different bases randomly and test them, the error probability is less than 4-k. So even with k = 9, you only get a 3 in a million chance of being wrong. And with k = 40 or so it becomes ridiculously unlikely.
That said, there is a deterministic version of Miller–Rabin, relying on the correctness of the generalized Riemann hypothesis. For the range u
up to 264, it is enough to check a = 2, 3, 5, 7, 11, 13, 17, 19, 23. I have a C++ implementation online which was field-tested in lots of programming contests. Here's an instantiation of the template for unsigned 64-bit ints:
bool isprime(uint64_t n) { //determines if n is a prime number
const int pn = 9, p[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23 };
for (int i = 0; i < pn; ++i)
if (n % p[i] == 0) return n == p[i];
if (n < p[pn - 1]) return 0;
uint64_t s = 0, t = n - 1;
while (~t & 1)
t >>= 1, ++s;
for (int i = 0; i < pn; ++i) {
uint64_t pt = PowerMod(p[i], t, n);
if (pt == 1) continue;
bool ok = 0;
for (int j = 0; j < s && !ok; ++j) {
if (pt == n - 1) ok = 1;
pt = MultiplyMod(pt, pt, n);
}
if (!ok) return 0;
}
return 1;
}
PowerMod and MultiplyMod are just primitives to multiply and exponentiate under a given modulus, using square-and-{multiply,add}.

For n < 2^64, it is possible to perform strong-pseudoprime tests to the seven bases 2, 325, 9375, 28178, 450775, 9780504, and 1795265022 and completely determine the primality of n; see http://miller-rabin.appspot.com/.
A faster primality test performs a strong-pseudoprime test to base 2 followed by a Lucas pseudoprime test. It takes about 3 times as long as a single strong-pseudoprime test, so is more than twice as fast as the 7-base Miller-Rabin test. The code is more complex, but not dauntingly so.
I can post code if you're interested; let me know in the comments.

In each iteration of Miller-Rabin you need to choose a random number. If you are unlucky this random number doesn't reveal certain composites. A small example of this is that 2^341 mod 341 = 2, passing the test
But the test guarantees that it only lets a composite pass with probability <1/4. So if you run the test 64 times with different random values, the probability drops below 2^(-128) which is enough in practice.
You should take a look at the Baillie–PSW primality test. While it may have false positives, there are no known examples for this and according to wikipedia has been verified that no composite number below 2^64 passes the test. So it should fit your requirements.

There are efficient deterministic variants of the MR test for 64-bit values - which do not rely on the GRH - having been exhaustively tested by exploiting GPUs and other known results.
I've listed the pertinent sections of a C program I wrote that tests the primality of any 64-bit value: (n > 1), using Jaeschke's and Sinclair's bases for the deterministic MR variant. It makes use of gcc and clang's __int128 extended type for exponentiation. If not available, an explicit routine is required. Maybe others will find this useful...
#include <inttypes.h>
/******************************************************************************/
static int sprp (uint64_t n, uint64_t a)
{
uint64_t m = n - 1, r, y;
unsigned int s = 1, j;
/* assert(n > 2 && (n & 0x1) != 0); */
while ((m & (UINT64_C(1) << s)) == 0) s++;
r = m >> s; /* r, s s.t. 2^s * r = n - 1, r in odd. */
if ((a %= n) == 0) /* else (0 < a < n) */
return (1);
{
unsigned __int128 u = 1, w = a;
while (r != 0)
{
if ((r & 0x1) != 0)
u = (u * w) % n; /* (mul-rdx) */
if ((r >>= 1) != 0)
w = (w * w) % n; /* (sqr-rdx) */
}
if ((y = (uint64_t) u) == 1)
return (1);
}
for (j = 1; j < s && y != m; j++)
{
unsigned __int128 u = y;
u = (u * u) % n; /* (sqr-rdx) */
if ((y = (uint64_t) u) <= 1) /* (n) is composite: */
return (0);
}
return (y == m);
}
/******************************************************************************/
static int is_prime (uint64_t n)
{
const uint32_t sprp32_base[] = /* (Jaeschke) */ {
2, 7, 61, 0};
const uint32_t sprp64_base[] = /* (Sinclair) */ {
2, 325, 9375, 28178, 450775, 9780504, 1795265022, 0};
const uint32_t *sprp_base;
/* assert(n > 1); */
if ((n & 0x1) == 0) /* even: */
return (n == 2);
sprp_base = (n <= UINT32_MAX) ? sprp32_base : sprp64_base;
for (; *sprp_base != 0; sprp_base++)
if (!sprp(n, *sprp_base)) return (0);
return (1); /* prime. */
}
/******************************************************************************/
Note that the MR (sprp) test is slightly modified to pass values on an iteration where the base is a multiple of the candidate, as mentioned in the 'remarks' section of the website
Update: while this has fewer base tests than Niklas' answer, it's important to note that the bases: {3, 5, 7, 11, 13, 17, 19, 23, 29} provide a cheap test that allows us to eliminate candidates exceeding: 29 * 29 = 841 -
simply using the GCD.
For (n > 29 * 29), we can clearly eliminate any even value as prime. The product of the small primes: (3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 * 29} = 3234846615, fits nicely in a 32-bit unsigned value. A gcd(n, 3234846615) is a lot cheaper than a MR test! If the result is not (1), then (n) > 841 has a small factor.
Merten's (?) theorem suggests that this simple gcd(u64, u64) test eliminates ~ 68% of all odd candidates (as composites). If you're using M-R to search for primes (randomly or incrementally), rather than just a 'one-off' test, this is certainly worth while!

Your computer is not perfect; it has a finite probability of failing in such a way as to produce an incorrect result to a calculation. Providing the probability of the M-R test giving a false result is greatly less than the probability of some other computer failure, then you are fine. There is no reason to run the M-R test for less than 64 iterations (a 1 in 2^128 chance of error). Most examples will fail in the first few iterations, so only the actual primes will be thoroughly tested. Use 128 iterations for a 1 in 2^256 chance of error.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Modulo optimization - c

Related

Efficient algorithm to calculate the sum of number of base2 digits (number of bits) over an interval of positive integers

Calculating multiples in Haskell (conversion from C)? [closed]

recurrence relation : find bit strings of length seven contain two consecutive 0 in C

How to generate random 64-bit unsigned integer in C

Miller Rabin Primality test accuracy

Categories

Resources