My task is
Show on the screen n-element of the progression {xi}.
Xi = Xi-1 - 3Xi-2
X0 = 0
X1 = 2
i = [2,n]
Here is done, but I didn't understand this theme very well, so i need some help with it.
My code(doesn't work):
void __fastcall TForm1::Button1Click(TObject *Sender)
{
int n = Edit1->Text.ToInt();
int i, x;
if(n==0){
i=0;
Label1->Caption = IntToStr(i);
}
if(n==1){
i=2;
Label1->Caption = IntToStr(i);
}
else {
for(i=2;i<=n;i++){
x=(i-1)-3*(i-2);
Label1->Caption = IntToStr(x);
}
}
}
It's not very nessesary to write code in C++ Builder
You misunderstood the progression formula. Xi-1 and Xi-2 refer to previous elements calculated in your progression.
So you need two variables, which will be carrying previous values that you have just calculated. At any given loop, you calculate the current Xi value using the general progression formula, then copy the value of Xi-1 into Xi-2, throwing the previous value of Xi-2. Then you copy the value of Xi (the up to now current value) into Xi-1.
void __fastcall TForm1::Button1Click(TObject *Sender)
{
int n = Edit1->Text.ToInt();
int i, x;
int xim1, xim2
if(n==0){
i=0;
Label1->Caption = IntToStr(i);
}
if(n==1){
i=2;
Label1->Caption = IntToStr(i);
}
else {
xim1 = 2;
xim2 = 0;
for(i=2;i<=n;i++){
x = xim1-3*xim2;
xim2 = xim1;
xim1 = x;
}
Label1->Caption = IntToStr(x);
}
}
Given this generating function:
X_0 = 0
X_1 = 2
X_i = X_{i-1} + 3*X_{i-2} i = [2,n]
How would you calculate x_4? We know that X_4 = X_3 + 3*X_2; which means that we need to be able to calculate X_3 and X_2. We can write these as:
X_2 = X_1 + 3*X_0 = 2 + 3*0 = 2
X_3 = X_2 + 3*X_1 = 2 + 3*2 = 8
X_4 = X_3 + 3*X_2 = 8 + 3*2 = 14
This can normally be written as a recursive function:
int calcSeries(int n)
{
if(0 == n)
return 0;
if(1 == n)
return 2;
return calcSeries(n-1) + 3*calcSeries(n-2)
}
BTW, this is a very naive implementation for this series, the main problem is that we have two recursive trees; if you look at the hand expansion of X_4 above notice that X_2 appears twice (in the calculation of X_3 and X_4), but we don't store this value so we need to calculate it twice.
Related
Having some difficulty troubleshooting code I wrote in C to perform a logistic regression. While it seems to work on smaller, semi-randomized datasets, it stops working (e.g. assigning proper probabilities of belonging to class 1) at around the point where I pass 43,500 observations (determined by tweaking the number of observations created. When creating the 150 features used in the code, I do create the first two as a function of the number of observations, so I'm not sure if maybe that's the issue here, though I am using double precision. Maybe there's an overflow somewhere in the code?
The below code should be self-contained; it generates m=50,000 observations with n=150 features. Setting m below 43,500 should return "Percent class 1: 0.250000", setting to 44,000 or above will return "Percent class 1: 0.000000", regardless of what max_iter (number of times we sample m observations) is set to.
The first feature is set to 1.0 divided by the total number of observations, if class 0 (first 75% of observations), or the index of the observation divided by the total number of observations otherwise.
The second feature is just index divided by total number of observations.
All other features are random.
The logistic regression is intended to use stochastic gradient descent, randomly selecting an observation index, computing the gradient of the loss with the predicted y using current weights, and updating weights with the gradient and learning rate (eta).
Using the same initialization with Python and NumPy, I still get the proper results, even above 50,000 observations.
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>
// Compute z = w * x + b
double dlc( int n, double *X, double *coef, double intercept )
{
double y_pred = intercept;
for (int i = 0; i < n; i++)
{
y_pred += X[i] * coef[i];
}
return y_pred;
}
// Compute y_hat = 1 / (1 + e^(-z))
double sigmoid( int n, double alpha, double *X, double *coef, double beta, double intercept )
{
double y_pred;
y_pred = dlc(n, X, coef, intercept);
y_pred = 1.0 / (1.0 + exp(-y_pred));
return y_pred;
}
// Stochastic gradient descent
void sgd( int m, int n, double *X, double *y, double *coef, double *intercept, double eta, int max_iter, int fit_intercept, int random_seed )
{
double *gradient_coef, *X_i;
double y_i, y_pred, resid;
int idx;
double gradient_intercept = 0.0, alpha = 1.0, beta = 1.0;
X_i = (double *) malloc (n * sizeof(double));
gradient_coef = (double *) malloc (n * sizeof(double));
for ( int i = 0; i < n; i++ )
{
coef[i] = 0.0;
gradient_coef[i] = 0.0;
}
*intercept = 0.0;
srand(random_seed);
for ( int epoch = 0; epoch < max_iter; epoch++ )
{
for ( int run = 0; run < m; run++ )
{
// Randomly sample an observation
idx = rand() % m;
for ( int i = 0; i < n; i++ )
{
X_i[i] = X[n*idx+i];
}
y_i = y[idx];
// Compute y_hat
y_pred = sigmoid( n, alpha, X_i, coef, beta, *intercept );
resid = -(y_i - y_pred);
// Compute gradients and adjust weights
for (int i = 0; i < n; i++)
{
gradient_coef[i] = X_i[i] * resid;
coef[i] -= eta * gradient_coef[i];
}
if ( fit_intercept == 1 )
{
*intercept -= eta * resid;
}
}
}
}
int main(void)
{
double *X, *y, *coef, *y_pred;
double intercept;
double eta = 0.05;
double alpha = 1.0, beta = 1.0;
long m = 50000;
long n = 150;
int max_iter = 20;
long class_0 = (long)(3.0 / 4.0 * (double)m);
double pct_class_1 = 0.0;
clock_t test_start;
clock_t test_end;
double test_time;
printf("Constructing variables...\n");
X = (double *) malloc (m * n * sizeof(double));
y = (double *) malloc (m * sizeof(double));
y_pred = (double *) malloc (m * sizeof(double));
coef = (double *) malloc (n * sizeof(double));
// Initialize classes
for (int i = 0; i < m; i++)
{
if (i < class_0)
{
y[i] = 0.0;
}
else
{
y[i] = 1.0;
}
}
// Initialize observation features
for (int i = 0; i < m; i++)
{
if (i < class_0)
{
X[n*i] = 1.0 / (double)m;
}
else
{
X[n*i] = (double)i / (double)m;
}
X[n*i + 1] = (double)i / (double)m;
for (int j = 2; j < n; j++)
{
X[n*i + j] = (double)(rand() % 100) / 100.0;
}
}
// Fit weights
printf("Running SGD...\n");
test_start = clock();
sgd( m, n, X, y, coef, &intercept, eta, max_iter, 1, 42 );
test_end = clock();
test_time = (double)(test_end - test_start) / CLOCKS_PER_SEC;
printf("Time taken: %f\n", test_time);
// Compute y_hat and share of observations predicted as class 1
printf("Making predictions...\n");
for ( int i = 0; i < m; i++ )
{
y_pred[i] = sigmoid( n, alpha, &X[i*n], coef, beta, intercept );
}
printf("Printing results...\n");
for ( int i = 0; i < m; i++ )
{
//printf("%f\n", y_pred[i]);
if (y_pred[i] > 0.5)
{
pct_class_1 += 1.0;
}
// Troubleshooting print
if (i < 10 || i > m - 10)
{
printf("%g\n", y_pred[i]);
}
}
printf("Percent class 1: %f", pct_class_1 / (double)m);
return 0;
}
For reference, here is my (presumably) equivalent Python code, which returns the correct percent of identified classes at more than 50,000 observations:
import numpy as np
import time
def sigmoid(x):
return 1 / (1 + np.exp(-x))
class LogisticRegressor:
def __init__(self, eta, init_runs, fit_intercept=True):
self.eta = eta
self.init_runs = init_runs
self.fit_intercept = fit_intercept
def fit(self, x, y):
m, n = x.shape
self.coef = np.zeros((n, 1))
self.intercept = np.zeros((1, 1))
for epoch in range(self.init_runs):
for run in range(m):
idx = np.random.randint(0, m)
x_i = x[idx:idx+1, :]
y_i = y[idx]
y_pred_i = sigmoid(x_i.dot(self.coef) + self.intercept)
gradient_w = -(x_i.T * (y_i - y_pred_i))
self.coef -= self.eta * gradient_w
if self.fit_intercept:
gradient_b = -(y_i - y_pred_i)
self.intercept -= self.eta * gradient_b
def predict_proba(self, x):
m, n = x.shape
y_pred = np.ones((m, 2))
y_pred[:,1:2] = sigmoid(x.dot(self.coef) + self.intercept)
y_pred[:,0:1] -= y_pred[:,1:2]
return y_pred
def predict(self, x):
return np.round(sigmoid(x.dot(self.coef) + self.intercept))
m = 50000
n = 150
class1 = int(3.0 / 4.0 * m)
X = np.random.rand(m, n)
y = np.zeros((m, 1))
for obs in range(m):
if obs < class1:
continue
else:
y[obs,0] = 1
for obs in range(m):
if obs < class1:
X[obs, 0] = 1.0 / float(m)
else:
X[obs, 0] = float(obs) / float(m)
X[obs, 1] = float(obs) / float(m)
logit = LogisticRegressor(0.05, 20)
start_time = time.time()
logit.fit(X, y)
end_time = time.time()
print(round(end_time - start_time, 2))
y_pred = logit.predict(X)
print("Percent:", y_pred.sum() / len(y_pred))
The issue is here:
// Randomly sample an observation
idx = rand() % m;
... in light of the fact that the OP's RAND_MAX is 32767. This is exacerbated by the fact that all of the class 0 observations are at the end.
All samples will be drawn from the first 32768 observations, and when the total number of observations is greater than that, the proportion of class 0 observations among those that can be sampled is less than 0.25. At 43691 total observations, there are no class 0 observations among those that can be sampled.
As a secondary issue, rand() % m does not yield a wholly uniform distribution if m does not evenly divide RAND_MAX + 1, though the effect of this issue will be much more subtle.
Bottom line: you need a better random number generator.
At minimum, you could consider combining the bits from two calls to rand() to yield an integer with sufficient range, but you might want to consider getting a third-party generator. There are several available.
Note: OP reports "m=50,000 observations with n=150 features.", so perhaps this is not the issue for OP, but I'll leave this answer up for reference when OP tries larger tasks.
A potential issue:
long overflow
m * n * sizeof(double) risks overflow when long is 32-bit and m*n > LONG_MAX (or about 46,341 if m, n are the same).
OP does report
A first step is to perform the multiplication using size_t math where we gain at least 1 more bit in the calculation.
// m * n * sizeof(double)
sizeof(double) * m * n
Yet unless OP's size_t is more than 32-bit, we still have trouble.
IAC, I recommend to use size_t for array sizing and indexing.
Check allocations for failure too.
Since RAND_MAX may be too small and array indexing should be done using size_t math, consider a helper function to generate a random index over the entire size_t range.
// idx = rand() % m;
size_t idx = rand_size_t() % (size_t)m;
If stuck with the standard rand(), below is a helper function to extend its range as needed.
It uses the real nifty IMAX_BITS(m).
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
// https://stackoverflow.com/a/4589384/2410359
/* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
// Test that RAND_MAX is a power of 2 minus 1
_Static_assert((RAND_MAX & 1) && ((RAND_MAX/2 + 1) & (RAND_MAX/2)) == 0, "RAND_MAX is not a Mersenne number");
#define RAND_MAX_WIDTH (IMAX_BITS(RAND_MAX))
#define SIZE_MAX_WIDTH (IMAX_BITS(SIZE_MAX))
size_t rand_size_t(void) {
size_t index = (size_t) rand();
for (unsigned i = RAND_MAX_WIDTH; i < SIZE_MAX_WIDTH; i += RAND_MAX_WIDTH) {
index <<= RAND_MAX_WIDTH;
index ^= (size_t) rand();
}
return index;
}
Further considerations can replace the rand_size_t() % (size_t)m with a more uniform distribution.
As has been determined elsewhere, the problem is due to the implementation's RAND_MAX value being too small.
Assuming 32-bit ints, a slightly better PRNG function can be implemented in the code, such as this C implementation of the minstd_rand() function from C++:
#define MINSTD_RAND_MAX 2147483646
// Code assumes `int` is at least 32 bits wide.
static unsigned int minstd_seed = 1;
static void minstd_srand(unsigned int seed)
{
seed %= 2147483647;
// zero seed is bad!
minstd_seed = seed ? seed : 1;
}
static int minstd_rand(void)
{
minstd_seed = (unsigned long long)minstd_seed * 48271 % 2147483647;
return (int)minstd_seed;
}
Another problem is that expressions of the form rand() % m produce a biased result when m does not divide (unsigned int)RAND_MAX + 1. Here is an unbiased function that returns a random integer from 0 to le inclusive, making use of the minstd_rand() function defined earlier:
static int minstd_rand_max(int le)
{
int r;
if (le < 0)
{
r = le;
}
else if (le >= MINSTD_RAND_MAX)
{
r = minstd_rand();
}
else
{
int rm = MINSTD_RAND_MAX - le + MINSTD_RAND_MAX % (le + 1);
while ((r = minstd_rand()) > rm)
{
}
r /= (rm / (le + 1) + 1);
}
return r;
}
(Actually, it does still have a very small bias because minstd_rand() will never return 0.)
For example, replace rand() % 100 with minstd_rand_max(99), and replace rand() % m with minstd_rand_max(m - 1). Also replace srand(random_seed) with minstd_srand(random_seed).
I need to find out the solution of the expression presented on the picture (110), but can't define the exact formula which would satisfy the conditions.
That's the code I have, but it seems wrong and not finished:
int n = 1, i = 1, x = 1;
float j, k, z, result;
while (i<51)
{
z = n+2;
x = z+2;
k = 1./x;
n+=2;
j = 1./z+k;
i++;
}
result = 1./(1+j);
printf("\nThe result is: %f", result);
}
I would be very grateful for pointing out the mistakes!
Working from inside to outside, each step is the reciprocal of i plus the previous step, where i runs from 103 to 1 in steps of −2, and we start with a “previous step” of 0:
#include <stdio.h>
int main(void)
{
double x = 0;
for (int i = 103; 1 <= i; i -= 2)
x = 1/(i + x);
printf("%g\n", x);
}
I am fighting some simple question.
I want to get prime numbers
I will use this algorithm
and... I finished code writing like this.
int k = 0, x = 1, n, prim, lim = 1;
int p[100000];
int xCount=0, limCount=0, kCount=0;
p[0] = 2;
scanf("%d", &n);
start = clock();
do
{
x += 2; xCount++;
if (sqrt(p[lim]) <= x)
{
lim++; limCount++;
}
k = 2; prim = true;
while (prim && k<lim)
{
if (x % p[k] == 0)
prim = false;
k++; kCount++;
}
if (prim == true)
{
p[lim] = x;
printf("prime number : %d\n", p[lim]);
}
} while (k<n);
I want to check how much repeat this code (x+=2; lim++; k++;)
so I used xCount, limCount, kCount variables.
when input(n) is 10, the results are x : 14, lim : 9, k : 43. wrong answer.
answer is (14,3,13).
Did I write code not well?
tell me correct point plz...
If you want to adapt an algorithm to your needs, it's always a good idea to implement it verbatim first, especially if you have pseudocode that is detailed enough to allow for such a verbatim translation into C-code (even more so with Fortran but I digress)
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
int main (void){
// type index 1..n
int index;
// var
// x: integer
int x;
//i, k, lim: integer
int i, k, lim;
// prim: boolean
bool prim;
// p: array[index] of integer {p[i] = i'th prime number}
/*
We cannot do that directly, we need to know the value of "index" first
*/
int res;
res = scanf("%d", &index);
if(res != 1 || index < 1){
fprintf(stderr,"Only integral values >= 1, please. Thank you.\n");
return EXIT_FAILURE;
}
/*
The array from the pseudocode is a one-based array, take care
*/
int p[index + 1];
// initialize the whole array with distinguishable values in case of debugging
for(i = 0;i<index;i++){
p[i] = -i;
}
/*
Your variables
*/
int lim_count = 0, k_count = 0;
// begin
// p[1] = 2
p[1] = 2;
// write(2)
puts("2");
// x = 1
x = 1;
// lim = 1
lim = 1;
// for i:=2 to n do
for(i = 2;i < index; i++){
// repeat (until prim)
do {
// x = x + 2
x += 2;
// if(sqr(p[lim]) <= x) then
if(p[lim] * p[lim] <= x){
// lim = lim +1
lim++;
lim_count++;
}
// k = 2
k = 2;
// prim = true
prim = true;
// while (prim and (k < lim)) do
while (prim && (k < lim)){
// prim = "x is not divisible by p[k]"
if((x % p[k]) == 0){
prim = false;
}
// k = k + 1
k++;
k_count++;
}
// (repeat) until prim
} while(!prim);
// p[i] := x
p[i] = x;
// write(x)
printf("%d\n",x);
}
// end
printf("x = %d, lim_count = %d, k_count = %d \n",x,lim_count,k_count);
for(i = 0;i<index;i++){
printf("%d, ",p[i]);
}
putchar('\n');
return EXIT_SUCCESS;
}
It will print an index - 1 number of primes starting at 2.
You can easily change it now--for example: print only the primes up to index instead of index - 1 primes.
In your case the numbers for all six primes up to 13 gives
x = 13, lim_count = 2, k_count = 3
which is distinctly different from the result you want.
Your translation looks very sloppy.
for i:= 2 to n do begin
must translate to:
for (i=2; i<=n; i++)
repeat
....
until prim
must translate to:
do {
...
} while (!prim);
The while prim... loop is inside the repeat...until prim loop.
I leave it to you to apply this to your code and to check that all constructs have been properly translated. it doesn't look too difficult to do that correctly.
Note: it looks like the algorithm uses 1-based arrays whereas C uses 0-based arrays.
I've been following the guide my prof gave us, but I just can't find where I went wrong. I've also been going through some other questions about implementing the Taylor Series in C.
Just assume that RaiseTo(raise a number to the power of x) is there.
double factorial (int n)
{
int fact = 1,
flag;
for (flag = 1; flag <= n; flag++)
{
fact *= flag;
}
return flag;
}
double sine (double rad)
{
int flag_2,
plusOrMinus2 = 0; //1 for plus, 0 for minus
double sin,
val2 = rad,
radRaisedToX2,
terms;
terms = NUMBER_OF_TERMS; //10 terms
for (flag_2 = 1; flag_2 <= 2 * terms; flag_2 += 2)
{
radRaisedToX2 = RaiseTo(rad, flag_2);
if (plusOrMinus2 == 0)
{
val2 -= radRaisedToX2/factorial(flag_2);
plusOrMinus2++; //Add the next number
}
else
{
val2 += radRaisedToX2/factorial(flag_2);
plusOrMinus2--; //Subtract the next number
}
}
sin = val2;
return sin;
}
int main()
{
int degree;
scanf("%d", °ree);
double rad, cosx, sinx;
rad = degree * PI / 180.00;
//cosx = cosine (rad);
sinx = sine (rad);
printf("%lf \n%lf", rad, sinx);
}
So during the loop, I get the rad^x, divide it by the factorial of the odd number series starting from 1, then add or subtract it depending on what's needed, but when I run the program, I get outputs way above one, and we all know that the limits of sin(x) are 1 and -1, I'd really like to know where I went wrong so I could improve, sorry if it's a pretty bad question.
Anything over 12! is larger than can fit into a 32-bit int, so such values will overflow and therefore won't return what you expect.
Instead of computing the full factorial each time, take a look at each term in the sequence relative to the previous one. For any given term, the next one is -((x*x)/(flag_2*(flag_2-1)) times the previous one. So start with a term of x, then multiply by that factor for each successive term.
There's also a trick to calculating the result to the precision of a double without knowing how many terms you need. I'll leave that as an exercise to the reader.
In the function factorial you are doing an int multiply before assigned to the double return value of the function. Factorials can easily break the int range, such as 20! = 2432902008176640000.
You also returned the wrong variable - the loop counter!
Please change the local variable to double, as
double factorial (int n)
{
double fact = 1;
int flag;
for (flag = 1; flag <= n; flag++)
{
fact *= flag;
}
return fact; // it was the wrong variable, and wrong type
}
Also there is not even any need for a factorial calculation. Note that each term of the series multiplies the previous term by rad and divides by the term number - with a change of sign.
Another fairly naive, 5-minute approach involves computing a look-up table that contains the first 20 or so factorials, i.e 1! .. 20! This requires very little memory and can increase speed over the 'each-time' computation method. A further optimization can easily be realized in the function that pre-computes the factorials, taking advantage of the relationship each has to the previous one.
An approach that efficiently eliminated branching (if X do Y else do Z) in the loops of the two trig functions would provide yet more speed again.
C code
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
const int nMaxTerms=20;
double factorials[nMaxTerms];
double factorial(int n)
{
if (n==1)
return 1;
else
return (double)n * factorial(n - 1.0);
}
void precalcFactorials()
{
for (int i=1; i<nMaxTerms+1; i++)
{
factorials[i-1] = factorial(i);
}
}
/*
sin(x) = x - (x^3)/3! + (x^5)/5! - (x^7)/7! .......
*/
double taylorSine(double rads)
{
double result = rads;
for (int curTerm=1; curTerm<=(nMaxTerms/2)-1; curTerm++)
{
double curTermValue = pow(rads, (curTerm*2)+1);
curTermValue /= factorials[ curTerm*2 ];
if (curTerm & 0x01)
result -= curTermValue;
else
result += curTermValue;
}
return result;
}
/*
cos(x) = 1 - (x^2)/2! + (x^4)/4! - (x^6)/6! .......
*/
double taylorCos(double rads)
{
double result = 1.0;
for (int curTerm=1; curTerm<=(nMaxTerms/2)-1; curTerm++)
{
double curTermValue = pow(rads, (curTerm*2) );
curTermValue /= factorials[ (curTerm*2) - 1 ];
if (curTerm & 0x01)
result -= curTermValue;
else
result += curTermValue;
}
return result;
}
int main()
{
precalcFactorials();
printf("Math sin(0.5) = %f\n", sin(0.5));
printf("taylorSin(0.5) = %f\n", taylorSine(0.5));
printf("Math cos(0.5) = %f\n", cos(0.5));
printf("taylorCos(0.5) = %f\n", taylorCos(0.5));
return 0;
}
output
Math sin(0.5) = 0.479426
taylorSin(0.5) = 0.479426
Math cos(0.5) = 0.877583
taylorCos(0.5) = 0.877583
Javascript
Implemented in javascript, the code produces seemingly identical results (I didn't test very much) to the inbuilt Math library when summing just 7 terms in the sin/cos functions.
window.addEventListener('load', onDocLoaded, false);
function onDocLoaded(evt)
{
console.log('starting');
for (var i=1; i<21; i++)
factorials[i-1] = factorial(i);
console.log('calculated');
console.log(" Math.cos(0.5) = " + Math.cos(0.5));
console.log("taylorCos(0.5) = " + taylorCos(0.5));
console.log('-');
console.log(" Math.sin(0.5) = " + Math.sin(0.5));
console.log("taylorSine(0.5) = " + taylorSine(0.5));
}
var factorials = [];
function factorial(n)
{
if (n==1)
return 1;
else
return n * factorial(n-1);
}
/*
sin(x) = x - (x^3)/3! + (x^5)/5! - (x^7)/7! .......
*/
function taylorSine(x)
{
var result = x;
for (var curTerm=1; curTerm<=7; curTerm++)
{
var curTermValue = Math.pow(x, (curTerm*2)+1);
curTermValue /= factorials[ curTerm*2 ];
if (curTerm & 0x01)
result -= curTermValue;
else
result += curTermValue;
}
return result;
}
/*
cos(x) = 1 - (x^2)/2! + (x^4)/4! - (x^6)/6! .......
*/
function taylorCos(x)
{
var result = 1.0;
for (var curTerm=1; curTerm<=7; curTerm++)
{
var curTermValue = Math.pow(x, (curTerm*2));
curTermValue /= factorials[ (curTerm*2)-1 ];
if (curTerm & 0x01)
result -= curTermValue;
else
result += curTermValue;
}
return result;
}
I have a simple (brute-force) recursive solver algorithm that takes lots of time for bigger values of OpxCnt variable. For small values of OpxCnt, no problem, works like a charm. The algorithm gets very slow as the OpxCnt variable gets bigger. This is to be expected but any optimization or a different algorithm ?
My final goal is that :: I want to read all the True values in the map array by
executing some number of read operations that have the minimum operation
cost. This is not the same as minimum number of read operations.
At function completion, There should be no True value unread.
map array is populated by some external function, any member may be 1 or 0.
For example ::
map[4] = 1;
map[8] = 1;
1 read operation having Adr=4,Cnt=5 has the lowest cost (35)
whereas
2 read operations having Adr=4,Cnt=1 & Adr=8,Cnt=1 costs (27+27=54)
#include <string.h>
typedef unsigned int Ui32;
#define cntof(x) (sizeof(x) / sizeof((x)[0]))
#define ZERO(x) do{memset(&(x), 0, sizeof(x));}while(0)
typedef struct _S_MB_oper{
Ui32 Adr;
Ui32 Cnt;
}S_MB_oper;
typedef struct _S_MB_code{
Ui32 OpxCnt;
S_MB_oper OpxLst[20];
Ui32 OpxPay;
}S_MB_code;
char map[65536] = {0};
static int opx_ListOkey(S_MB_code *px_kod, char *pi_map)
{
int cost = 0;
char map[65536];
memcpy(map, pi_map, sizeof(map));
for(Ui32 o = 0; o < px_kod->OpxCnt; o++)
{
for(Ui32 i = 0; i < px_kod->OpxLst[o].Cnt; i++)
{
Ui32 adr = px_kod->OpxLst[o].Adr + i;
// ...
if(adr < cntof(map)){map[adr] = 0x0;}
}
}
for(Ui32 i = 0; i < cntof(map); i++)
{
if(map[i] > 0x0){return -1;}
}
// calculate COST...
for(Ui32 o = 0; o < px_kod->OpxCnt; o++)
{
cost += 12;
cost += 13;
cost += (2 * px_kod->OpxLst[o].Cnt);
}
px_kod->OpxPay = (Ui32)cost; return cost;
}
static int opx_FindNext(char *map, int pi_idx)
{
int i;
if(pi_idx < 0){pi_idx = 0;}
for(i = pi_idx; i < 65536; i++)
{
if(map[i] > 0x0){return i;}
}
return -1;
}
static int opx_FindZero(char *map, int pi_idx)
{
int i;
if(pi_idx < 0){pi_idx = 0;}
for(i = pi_idx; i < 65536; i++)
{
if(map[i] < 0x1){return i;}
}
return -1;
}
static int opx_Resolver(S_MB_code *po_bst, S_MB_code *px_wrk, char *pi_map, Ui32 *px_idx, int _min, int _max)
{
int pay, kmax, kmin = 1;
if(*px_idx >= px_wrk->OpxCnt)
{
return opx_ListOkey(px_wrk, pi_map);
}
_min = opx_FindNext(pi_map, _min);
// ...
if(_min < 0){return -1;}
kmax = (_max - _min) + 1;
// must be less than 127 !
if(kmax > 127){kmax = 127;}
// is this recursion the last one ?
if(*px_idx >= (px_wrk->OpxCnt - 1))
{
kmin = kmax;
}
else
{
int zero = opx_FindZero(pi_map, _min);
// ...
if(zero > 0)
{
kmin = zero - _min;
// enforce kmax limit !?
if(kmin > kmax){kmin = kmax;}
}
}
for(int _cnt = kmin; _cnt <= kmax; _cnt++)
{
px_wrk->OpxLst[*px_idx].Adr = (Ui32)_min;
px_wrk->OpxLst[*px_idx].Cnt = (Ui32)_cnt;
(*px_idx)++;
pay = opx_Resolver(po_bst, px_wrk, pi_map, px_idx, (_min + _cnt), _max);
(*px_idx)--;
if(pay > 0)
{
if((Ui32)pay < po_bst->OpxPay)
{
memcpy(po_bst, px_wrk, sizeof(*po_bst));
}
}
}
return (int)po_bst->OpxPay;
}
int main()
{
int _max = -1, _cnt = 0;
S_MB_code best = {0};
S_MB_code work = {0};
// SOME TEST DATA...
map[ 4] = 1;
map[ 8] = 1;
/*
map[64] = 1;
map[72] = 1;
map[80] = 1;
map[88] = 1;
map[96] = 1;
*/
// SOME TEST DATA...
for(int i = 0; i < cntof(map); i++)
{
if(map[i] > 0)
{
_max = i; _cnt++;
}
}
// num of Opx can be as much as num of individual bit(s).
if(_cnt > cntof(work.OpxLst)){_cnt = cntof(work.OpxLst);}
best.OpxPay = 1000000000L; // invalid great number...
for(int opx_cnt = 1; opx_cnt <= _cnt; opx_cnt++)
{
int rv;
Ui32 x = 0;
ZERO(work); work.OpxCnt = (Ui32)opx_cnt;
rv = opx_Resolver(&best, &work, map, &x, -42, _max);
}
return 0;
}
You can use dynamic programming to calculate the lowest cost that covers the first i true values in map[]. Call this f(i). As I'll explain, you can calculate f(i) by looking at all f(j) for j < i, so this will take time quadratic in the number of true values -- much better than exponential. The final answer you're looking for will be f(n), where n is the number of true values in map[].
A first step is to preprocess map[] into a list of the positions of true values. (It's possible to do DP on the raw map[] array, but this will be slower if true values are sparse, and cannot be faster.)
int pos[65536]; // Every position *could* be true
int nTrue = 0;
void getPosList() {
for (int i = 0; i < 65536; ++i) {
if (map[i]) pos[nTrue++] = i;
}
}
When we're looking at the subproblem on just the first i true values, what we know is that the ith true value must be covered by a read that ends at i. This block could start at any position j <= i; we don't know, so we have to test all i of them and pick the best. The key property (Optimal Substructure) that enables DP here is that in any optimal solution to the i-sized subproblem, if the read that covers the ith true value starts at the jth true value, then the preceding j-1 true values must be covered by an optimal solution to the (j-1)-sized subproblem.
So: f(i) = min(f(j) + score(pos(j+1), pos(i)), with the minimum taken over all 1 <= j < i. pos(k) refers to the position of the kth true value in map[], and score(x, y) is the score of a read from position x to position y, inclusive.
int scores[65537]; // We effectively start indexing at 1
scores[0] = 0; // Covering the first 0 true values requires 0 cost
// Calculate the minimum score that could allow the first i > 0 true values
// to be read, and store it in scores[i].
// We can assume that all lower values have already been calculated.
void calcF(int i) {
int bestStart, bestScore = INT_MAX;
for (int j = 0; j < i; ++j) { // Always executes at least once
int attemptScore = scores[j] + score(pos[j + 1], pos[i]);
if (attemptScore < bestScore) {
bestStart = j + 1;
bestScore = attemptScore;
}
}
scores[i] = bestScore;
}
int score(int i, int j) {
return 25 + 2 * (j + 1 - i);
}
int main(int argc, char **argv) {
// Set up map[] however you want
getPosList();
for (int i = 1; i <= nTrue; ++i) {
calcF(i);
}
printf("Optimal solution has cost %d.\n", scores[nTrue]);
return 0;
}
Extracting a Solution from Scores
Using this scheme, you can calculate the score of an optimal solution: it's simply f(n), where n is the number of true values in map[]. In order to actually construct the solution, you need to read back through the table of f() scores to infer which choice was made:
void printSolution() {
int i = nTrue;
while (i) {
for (int j = 0; j < i; ++j) {
if (scores[i] == scores[j] + score(pos[j + 1], pos[i])) {
// We know that a read can be made from pos[j + 1] to pos[i] in
// an optimal solution, so let's make it.
printf("Read from %d to %d for cost %d.\n", pos[j + 1], pos[i], score(pos[j + 1], pos[i]));
i = j;
break;
}
}
}
}
There may be several possible choices, but all of them will produce optimal solutions.
Further Speedups
The solution above will work for an arbitrary scoring function. Because your scoring function has a simple structure, it may be that even faster algorithms can be developed.
For example, we can prove that there is a gap width above which it is always beneficial to break a single read into two reads. Suppose we have a read from position x-a to x, and another read from position y to y+b, with y > x. The combined costs of these two separate reads are 25 + 2 * (a + 1) + 25 + 2 * (b + 1) = 54 + 2 * (a + b). A single read stretching from x-a to y+b would cost 25 + 2 * (y + b - x + a + 1) = 27 + 2 * (a + b) + 2 * (y - x). Therefore the single read costs 27 - 2 * (y - x) less. If y - x > 13, this difference goes below zero: in other words, it can never be optimal to include a single read that spans a gap of 12 or more.
To make use of this property, inside calcF(), final reads could be tried in decreasing order of start-position (i.e. in increasing order of width), and the inner loop stopped as soon as any gap width exceeds 12. Because that read and all subsequent wider reads tried would contain this too-large gap and therefore be suboptimal, they need not be tried.