I would like to use the SolveWithGuess() function to solve a system of linear equations, starting from a good approximation. However, when I tried a test with an initial guess having small perturbations (1.e-9*i, 0 <= i < 20) relative to the true solution, I received the following error values during the conjugate gradient iterations:
0 1.922e-09
1 3.694e-09
2 7.101e-09
3 1.365e-08
4 2.623e-08
5 5.043e-08
6 9.692e-08
7 1.863e-07
8 3.581e-07
9 6.882e-07
10 1.323e-06
Could you please tell me what the problem could be? My test code is the following:
#include <iostream>
#include <iomanip>
#include <stdio.h>
#include <Eigen/Eigen>
using namespace Eigen;
using namespace std;
void solve()
{
    int n = 20;
    typedef SparseMatrix<double, ColMajor> SM;
    typedef Matrix<double, -1, 1> DV;

    SM a(n, n);
    DV b(n), x(n);
    for (int i = 0; i < n; i++)
    {
        b[i] = double(i);
        for (int j = 0; j < n; j++) a.insert(i, j) = 0.0;
        a.coeffRef(i, n - i - 1) = 1.0;
    }
    for (int i = 0; i < n; i++) x[i] = double(n - i - 1) + 1.e-9 * double(i);

    ConjugateGradient<SM> cg;
    cg.setMaxIterations(1);
    cg.compute(a);
    for (int it = 0; it < 100; it++)
    {
        cg.compute(a);
        x = cg.solveWithGuess(b, x);
        cout << it << " " << scientific << setw(10) << setprecision(3) << cg.error() << endl;
    }
}
There are actually two problems here. First, your matrix is not positive definite, so you should use one of Eigen's other iterative solvers instead. But more importantly, if you set the maximum number of iterations to 1 and call solveWithGuess repeatedly, you are effectively doing gradient descent (the previous search direction is not kept between calls), which happens to behave terribly with your matrix.
If you are actually interested in what happens "inside" an iterative solver, you need to insert debug code into the corresponding _solve_impl method (or re-implement the solver accordingly).
Related
I know that e.g. doing this
if (rand() % 2 == 0) value = 0;
else value = (float)rand()/(float)(RAND_MAX/(abs(rand())));
will generate roughly 50% zeroes and 50% other values. But are there other implementations that let me set the sparsity arbitrarily, e.g. to 42% or so?
Since C++11, std::bernoulli_distribution from the <random> header can be used.
#include <random>
#include <iostream>

int main() {
    std::random_device rd;
    std::default_random_engine e(rd());
    std::bernoulli_distribution d(0.42); // 42% probability of true

    std::cout << std::boolalpha;
    for (int i = 0; i < 10; ++i)
        std::cout << d(e) << '\n';
    return 0;
}
You're getting 50/50 because you're using '%2'.
If you want something like 42 out of 100 values to be non-zero, then all you need is
val = ( rand() % 100 < 42 ) ? rand() : 0;
If you need 42.5%, for instance, simply use '1000' and '425' instead.
Figure out exactly how many zeroes you want.
Fill that many elements of your array with zeroes.
Fill the rest with random values (rand() + 1, to make sure there are no unwanted zeroes).
Shuffle the resulting array.
I just can't understand how you do this. You input something like N=2 and S=3, which asks: how many N-digit numbers have a digit sum of S? Like 12 => 1+2 = 3; for N=2 and S=3 there are 3 such numbers: 12, 21, 30.
I don't really know dynamic programming very well. How are you supposed to think about this algorithm and others like it?
Just think about a brute-force / backtracking solution first. How would you write that? Which states do you need, and which states can be dropped?
For this particular problem, what you can do is insert a digit at a position, starting from the most significant digit, and move on to the next position while keeping track of the remaining sum. After inserting all the digits, if the sum is zero, you increment your answer by 1.
So the backtracking solution will be something like this:
void f(int pos, int sum) {
    if (pos == n) {
        if (!sum) ans += 1;
        return;
    }
    for (int i = 0; i < 10; i++)
        f(pos + 1, sum - i);
}
If you can come up with the backtracking solution, then 90% of your work is already done. For dynamic programming, you just have to save the states so that you never recalculate a state twice. Also, think about the corner cases and base cases. Our backtracking solution above has a bug: we should not insert 0 at the most significant digit.
Fixing all of that, the code should look something like this:
#include <iostream>
#include <cstring>
using namespace std;

#define maxN 1000
#define maxSum 1000
#define i64 long long int
#define mod 1000000007

int n;
i64 dp[maxN][maxSum];

i64 f(int pos, int sum) {
    if (sum < 0) return 0;
    if (pos == n) {
        if (!sum) return 1;
        return 0;
    }
    if (dp[pos][sum] != -1)
        return dp[pos][sum];
    int lo = 0;
    if (!pos)          // no leading zero at the most significant digit
        lo = 1;
    i64 ans = 0;
    for (int i = lo; i < 10; i++)
        ans += f(pos + 1, sum - i);
    ans %= mod;
    return dp[pos][sum] = ans;
}

int main()
{
    int s;
    cin >> n >> s;
    memset(dp, -1, sizeof dp);
    cout << f(0, s) << endl;
    return 0;
}
The time complexity of the code is O(maxN * maxSum * 10).
You can find a more optimized solution online, but once you learn a bit more about dynamic programming, you will realize that coming up with a DP solution is much faster and easier than the alternatives. Happy coding.
I'm trying to generate a random floating point number between 0 and 1 (whether it's on [0,1] or [0,1) shouldn't matter for me). Every question online about this seems to involve the rand() call, seeded with time(NULL), but I want to be able to invoke my program more than once a second and get different random numbers every time. This led me to the getrandom syscall on Linux, which pulls from /dev/urandom. I came up with this:
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
int main() {
    uint32_t r = 0;
    for (int i = 0; i < 20; i++) {
        syscall(SYS_getrandom, &r, sizeof(uint32_t), 0);
        printf("%f\n", ((double)r)/UINT32_MAX);
    }
    return 0;
}
My question is simply whether or not I'm doing this correctly. It appears to work, but I'm worried that I'm misusing something, and there are next to no examples using getrandom() online.
OP has 2 issues:
How to seed the sequence so it starts very randomly.
How to generate a double on the [0...1) range.
The usual method is to take a very random source like /dev/urandom, or the result of the syscall(), or maybe even seed = time() ^ process_id;, and seed via srand(). Then call rand() as needed.
Below is a quickly written method to generate a uniform [0.0 to 1.0) (linear distribution). But like all random-generating functions, really good ones are based on extensive study. This one simply calls rand() a few times, based on DBL_MANT_DIG and RAND_MAX.
[Edit] The original double rand_01(void) had a weakness: it only generated 2^52 different doubles rather than 2^53. It has been amended. Alternative: a double version of rand_01_ld(void), far below.
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
double rand_01(void) {
    assert(FLT_RADIX == 2); // needed for DBL_MANT_DIG
    unsigned long long limit = (1ull << DBL_MANT_DIG) - 1;
    double r = 0.0;
    do {
        r += rand();
        // Assume RAND_MAX is a power-of-2 minus 1
        r /= (RAND_MAX/2 + 1)*2.0;
        limit = limit / (RAND_MAX/2 + 1) / 2;
    } while (limit);
    // Use only DBL_MANT_DIG (53) bits of precision.
    if (r < 0.5) {
        volatile double sum = 0.5 + r;
        r = sum - 0.5;
    }
    return r;
}

int main(void) {
    FILE *istream = fopen("/dev/urandom", "rb");
    assert(istream);
    unsigned long seed = 0;
    for (unsigned i = 0; i < sizeof seed; i++) {
        seed *= (UCHAR_MAX + 1);
        int ch = fgetc(istream);
        assert(ch != EOF);
        seed += (unsigned) ch;
    }
    fclose(istream);
    srand(seed);

    for (int i = 0; i < 20; i++) {
        printf("%f\n", rand_01());
    }
    return 0;
}
If one wanted to extend this to an even wider FP type, unsigned wide integer types may be insufficient. Below is a portable method that does not have that limitation.
#include <math.h> // needed for log2() and round()

long double rand_01_ld(void) {
    // These should be calculated once rather than on each function call;
    // leave that as a separate implementation problem.
    // Assume RAND_MAX is a power-of-2 minus 1
    assert((RAND_MAX & (RAND_MAX + 1U)) == 0);
    double rand_max_p1 = (RAND_MAX/2 + 1)*2.0;
    unsigned BitsPerRand = (unsigned) round(log2(rand_max_p1));
    assert(FLT_RADIX != 10);
    unsigned BitsPerFP = (unsigned) round(log2(FLT_RADIX)*LDBL_MANT_DIG);
    long double r = 0.0;
    unsigned i;
    for (i = BitsPerFP; i >= BitsPerRand; i -= BitsPerRand) {
        r += rand();
        r /= rand_max_p1;
    }
    if (i) {
        r += rand() % (1 << i);
        r /= 1 << i;
    }
    return r;
}
If you need to generate doubles, the following algorithm could be of use:
CPython generates random numbers using the following algorithm (I changed the function name, typedefs and return values, but algorithm remains the same):
double get_random_double() {
    uint32_t a = get_random_uint32_t() >> 5;
    uint32_t b = get_random_uint32_t() >> 6;
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0);
}
The source of that algorithm is a Mersenne Twister 19937 random number generator by Takuji Nishimura and Makoto Matsumoto. Unfortunately the original link mentioned in the source is not available for download any longer.
The comment on this function in CPython notes the following:
[this function] is the function named genrand_res53 in the original code;
generates a random number on [0,1) with 53-bit resolution; note that
9007199254740992 == 2**53; I assume they're spelling "/2**53" as
multiply-by-reciprocal in the (likely vain) hope that the compiler will
optimize the division away at compile-time. 67108864 is 2**26. In
effect, a contains 27 random bits shifted left 26, and b fills in the
lower 26 bits of the 53-bit numerator.
The original code credited Isaku Wada for this algorithm, 2002/01/09.
Simplifying from that code, if you want to create a float fast, you should mask the bits of uint32_t with (1 << FLT_MANT_DIG) - 1 and divide by (1 << FLT_MANT_DIG) to get the proper [0, 1) interval:
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <float.h>
int main() {
    uint32_t r = 0;
    float result;
    for (int i = 0; i < 20; i++) {
        syscall(SYS_getrandom, &r, sizeof(uint32_t), 0);
        result = (float)(r & ((1 << FLT_MANT_DIG) - 1)) / (1 << FLT_MANT_DIG);
        printf("%f\n", result);
    }
    return 0;
}
Since it can be assumed that your Linux has a C99 compiler, we can use ldexpf instead of that division:
#include <math.h>
result = ldexpf(r & ((1 << FLT_MANT_DIG) - 1), -FLT_MANT_DIG);
To get the closed interval [0, 1], you can do the slightly less efficient
result = ldexpf(r % ((1 << FLT_MANT_DIG) + 1), -FLT_MANT_DIG);
To generate lots of good quality random numbers fast, I'd just use the system call to fetch enough data to seed a PRNG or CPRNG, and proceed from there.
What is the best way to manipulate indexing in Armadillo? I was under the impression that it heavily used template expressions to avoid temporaries, but I'm not seeing these speedups.
Is direct array indexing still the best way to approach calculations that rely on consecutive elements within the same array?
Keep in mind, that I hope to parallelise these calculations in the future with TBB::parallel_for (In this case, from a maintainability perspective, it may be simpler to use direct accessing?) These calculations happen in a tight loop, and I hope to make them as optimal as possible.
ElapsedTimer timer;

int n = 768000;
int numberOfLoops = 5000;

arma::Col<double> directAccess1(n);
arma::Col<double> directAccess2(n);
arma::Col<double> directAccessResult1(n);
arma::Col<double> directAccessResult2(n);

arma::Col<double> armaAccess1(n);
arma::Col<double> armaAccess2(n);
arma::Col<double> armaAccessResult1(n);
arma::Col<double> armaAccessResult2(n);

std::valarray<double> valArrayAccess1(n);
std::valarray<double> valArrayAccess2(n);
std::valarray<double> valArrayAccessResult1(n);
std::valarray<double> valArrayAccessResult2(n);

// Prefill
for (int i = 0; i < n; i++) {
    directAccess1[i] = i;
    directAccess2[i] = n - i;
    armaAccess1[i] = i;
    armaAccess2[i] = n - i;
    valArrayAccess1[i] = i;
    valArrayAccess2[i] = n - i;
}

timer.Start();
for (int j = 0; j < numberOfLoops; j++) {
    for (int i = 1; i < n; i++) {
        directAccessResult1[i] = -directAccess1[i] / (directAccess1[i] + directAccess1[i - 1]) * directAccess2[i - 1];
        directAccessResult2[i] = -directAccess1[i] / (directAccess1[i] + directAccess1[i]) * directAccess2[i];
    }
}
timer.StopAndPrint("Direct Array Indexing Took");
std::cout << std::endl;

timer.Start();
for (int j = 0; j < numberOfLoops; j++) {
    armaAccessResult1.rows(1, n - 1) = -armaAccess1.rows(1, n - 1) / (armaAccess1.rows(1, n - 1) + armaAccess1.rows(0, n - 2)) % armaAccess2.rows(0, n - 2);
    armaAccessResult2.rows(1, n - 1) = -armaAccess1.rows(1, n - 1) / (armaAccess1.rows(1, n - 1) + armaAccess1.rows(1, n - 1)) % armaAccess2.rows(1, n - 1);
}
timer.StopAndPrint("Arma Array Indexing Took");
std::cout << std::endl;

timer.Start();
for (int j = 0; j < numberOfLoops; j++) {
    for (int i = 1; i < n; i++) {
        valArrayAccessResult1[i] = -valArrayAccess1[i] / (valArrayAccess1[i] + valArrayAccess1[i - 1]) * valArrayAccess2[i - 1];
        valArrayAccessResult2[i] = -valArrayAccess1[i] / (valArrayAccess1[i] + valArrayAccess1[i]) * valArrayAccess2[i];
    }
}
timer.StopAndPrint("Valarray Array Indexing Took:");
std::cout << std::endl;
In VS release mode (/O2, to avoid Armadillo's array-indexing checks), they produce the following timings:
Started Performance Analysis!
Direct Array Indexing Took: 37.294 seconds elapsed
Arma Array Indexing Took: 39.4292 seconds elapsed
Valarray Array Indexing Took:: 37.2354 seconds elapsed
Your direct code is already quite optimal, so expression templates are not going to help here.
However, you may want to make sure the optimization level in your compiler actually enables auto-vectorization (-O3 in gcc). Secondly, you can get a bit of extra speed by #define ARMA_NO_DEBUG before including the Armadillo header. This will turn off all run-time checks (such as bound checks for element access), but this is not recommended until you have completely debugged your program.
What is the complexity of the following C function?
double foo (int n) {
    int i;
    double sum;
    if (n == 0) return 1.0;
    else {
        sum = 0.0;
        for (i = 0; i < n; i++)
            sum += foo(i);
        return sum;
    }
}
Please don't just post the complexity; can you help me understand how to go about working it out?
EDIT: It was an objective question asked in an exam, and the options provided were:
1. O(1)
2. O(n)
3. O(n!)
4. O(n^n)
It's Θ(2^n) (taking f(n) to be the running time of the algorithm, we have):
f(n) = f(n-1) + f(n-2) + ... + f(0)
f(n-1) = f(n-2) + f(n-3) + ... + f(0)
==> f(n) = 2*f(n-1), f(0) = 1
==> f(n) is in Θ(2^n)
Actually, if we ignore the constant-time operations, the exact number of calls is 2^n.
Also, since you wrote that this was an exam, both O(n!) and O(n^n) are true, and the nearer of them to Θ(2^n) is O(n!); but if I were the student, I'd mark both of them :)
Explanation of O(n!):
for all n >= 1: n! = n*(n-1)*...*2*1 >= 2*2*2*...*2*1 = 2^(n-1)
==> 2*n! >= 2^n ==> 2^n is in O(n!)
Also, n! <= n^n for all n >= 1, so n! is in O(n^n).
So of the given options, O(n!) is the nearest acceptable bound to Θ(2^n).
For one, it is poorly coded :)
double foo (int n) {          // foo returns a double and takes an integer parameter
    int i;                    // declare an integer variable i, used as a counter below
    double sum;               // this is the value that is returned
    if (n==0) return 1.0;     // if someone calls foo(0), this function returns 1.0
    else {                    // if n != 0
        sum = 0.0;            // set sum to 0
        for (i = 0; i < n; i++) // recursively call this function n times, then add up the results
            sum += foo(i);
        return sum;           // return the result
    }
}
You're calling foo() a total of 2^n times.
e.g.:
foo(3) will trigger 2^3 = 8 calls in total.
Good luck, and merry Christmas.
EDIT: oops, just corrected something. Why does foo return a double? Its result is always a whole number, so an int would do.
Here would be a better version, with micro-optimizations! :D
int foo(int n)
{
    if (n == 0) return 1;
    else {
        int sum = 0;
        for (int i = 0; i < n; ++i)
            sum += foo(i);
        return sum;
    }
}
You could have been a bit clearer... grumble grumble
<n = ?> : <return value> : <number of times called>
n = 0 : 1 : 1
n = 1 : 1 : 2
n = 2 : 2 : 4
n = 3 : 4 : 8
n = 4 : 8 : 16
n = 5 : 16 : 32
n = 6 : 32 : 64
n = 7 : 64 : 128
n = 8 : 128 : 256
n = 9 : 256 : 512
n = 10 : 512 : 1024
number_of_times_called = pow(2, n);
Let's try putting in inputs, shall we?
Using this code:
#include <iostream>

double foo (int n) {
    int i;
    double sum;
    if (n == 0) return 1.0;
    else {
        sum = 0.0;
        for (i = 0; i < n; i++)
            sum += foo(i);
        return sum;
    }
}

int main(int argc, char* argv[])
{
    for (int n = 0; 1; n++)
    {
        std::cout << "n = " << n << " : " << foo(n) << std::endl;
        std::cin.ignore();
    }
    return 0;
}
We get:
n = 0 : 1
n = 1 : 1
n = 2 : 2
n = 3 : 4
n = 4 : 8
n = 5 : 16
n = 6 : 32
n = 7 : 64
n = 8 : 128
n = 9 : 256
n = 10 : 512
Therefore, the return value can be simplified to:
double foo(int n)
{
    if (n == 0) return 1.0;
    return pow(2, n - 1);
}
The function is composed of multiple parts.
The first bit is the if (n==0) return 1.0;, which is a single step. That part is O(1).
The next part is the for (i=0; i<n; i++) loop. Since it runs from 0..n-1, it is O(n).
Then there is the recursion: for every number below n you run the function again, and inside that call the loop runs again, spawning further calls, and so on...
To figure out what that amounts to, I recommend adding a global counter inside the function so you can see how many times it is executed for a given input.