Nonlinear optimization in C

I would like to run a nonlinear optimization algorithm using C.
The problem is to optimize over the five points that are in vector X.
X, Y(X), and the lower and upper bounds are known.
I have found the NLopt library for C, but I do not know whether it is possible to perform the optimization over the five discrete points.
Anything to suggest, even another library?
Thanks!
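For illustration, a minimal sketch of a bound-constrained NLopt setup in C, assuming five variables and a derivative-free algorithm; the objective below is only a placeholder for the real Y(X):

#include <stdio.h>
#include <nlopt.h>

/* Placeholder objective: replace with the actual Y(X) evaluation. */
static double objective(unsigned n, const double *x, double *grad, void *data)
{
    (void)grad; (void)data;          /* COBYLA is derivative-free */
    double s = 0.0;
    for (unsigned i = 0; i < n; ++i)
        s += (x[i] - 1.0) * (x[i] - 1.0);
    return s;
}

int main(void)
{
    double lb[5] = {0, 0, 0, 0, 0};  /* known lower bounds */
    double ub[5] = {2, 2, 2, 2, 2};  /* known upper bounds */
    double x[5]  = {1, 1, 1, 1, 1};  /* starting point */
    double minf;

    nlopt_opt opt = nlopt_create(NLOPT_LN_COBYLA, 5);
    nlopt_set_lower_bounds(opt, lb);
    nlopt_set_upper_bounds(opt, ub);
    nlopt_set_min_objective(opt, objective, NULL);
    nlopt_set_xtol_rel(opt, 1e-6);

    if (nlopt_optimize(opt, x, &minf) < 0)
        fprintf(stderr, "nlopt failed\n");
    else
        printf("minimum %g at x[0] = %g\n", minf, x[0]);

    nlopt_destroy(opt);
    return 0;
}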

I would suggest Octave. For nonlinear programming in Octave, refer to
Octave Optimization.
You could implement it using its MATLAB-like language.
It also has a C/C++ API.
See this post: How to embed the GNU Octave in C/C++ program?
And also this PDF.

Consider optimizing the MATLAB code instead of reimplementing the algorithm in another language - MATLAB can be pretty fast if optimized properly (avoid for loops, use vectorized computations, pre-allocate memory).
Take a look at http://www.mathworks.com/company/newsletters/news_notes/june07/patterns.html


Does Intel MKL or some similar library provide a vectorized way to count the number of elements in an array fulfilling some condition in C?

The problem
I'm working on implementing and refining an optimization algorithm with some fairly large arrays (from tens of millions of floats and up), using mainly Intel MKL in C (not C++, at least not so far) to squeeze out every possible bit of performance. Now I've run into a silly problem - I have a parameter that sets maxima and minima for subsets of a set of (tens of millions of) coefficients. Actually applying these maxima and minima using MKL functions is easy - I can create equally-sized vectors with the limits for every element and use v?Fmax and v?Fmin to apply them. But I also need to account for this clipping in my error metric, which requires me to count the number of elements that fall outside these constraints.
However, I can't find an MKL function that allows me to do things like counting the number of elements that fulfill some condition, the way you can create and sum logical arrays with e.g. NumPy in Python or in MATLAB. Irritatingly, when I try to google this question, I only get answers relating to Python and R.
Obviously I can just write a loop that increments a counter for each element that fulfills one of the conditions, but if there is an already optimized implementation that allows me to achieve this, I would much prefer that just owing to the size of my arrays.
Does anyone know of a clever way to achieve this robustly and very efficiently using Intel MKL (maybe with the statistics toolbox or some creative use of elementary functions?), a similarly optimized library that does this, or a highly optimized way to hand-code this? I've been racking my brain trying to come up with some out-of-the box method, but I'm coming up empty.
Note that it's necessary for me to be able to do this in C, that it's not viable for me to shift this task to my Python frontend, and that it is indeed necessary for me to code this particular subprogram in C in the first place.
Thanks!
If you were using C++, std::count_if from the algorithms library with an execution policy of std::execution::par_unseq may parallelize and vectorize the count. On Linux at least, it typically uses Intel TBB to do this.
It's not likely to be as easy in C. Because C doesn't have concepts like templates, callables or lambdas, the only way to specialize a generic (library-provided) count() function would be to pass a function pointer as a callback (like qsort() does). Unless the compiler manages to devirtualize and inline the callback, you can't vectorize at all, leaving you with (possibly thread-parallelized) scalar code. OTOH, if you use for example GCC vector intrinsics (my favourite!), you get vectorization but not parallelization. You could try to combine the approaches, but I'd say get over yourself and use C++.
However, if you only need vectorization, you can almost certainly just write sequential code and have the compiler autovectorize it, unless the predicate for what should be counted is poorly written or your compiler is braindamaged.
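A minimal sketch of such an autovectorizable counting loop (function name hypothetical), counting the elements that fall outside per-element bounds:

#include <stddef.h>

/* Count elements of x falling outside their per-element [lo, hi] bounds.
 * Note the bitwise | instead of short-circuiting ||, so the compiler is
 * free to load both bounds unconditionally and vectorize the loop. */
size_t count_clipped(const float *x, const float *lo, const float *hi, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        count += (x[i] < lo[i]) | (x[i] > hi[i]);
    return count;
}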
For example, GCC vectorizes such a loop on x86 if at least SSE4 instructions are available (-msse4). With AVX[2/512] (-mavx / -mavx2 / -mavx512f) you get wider vectors to do more elements at once. In general, if you're compiling on the same hardware you will be running the program on, I'd recommend letting GCC autodetect the optimal instruction set extensions (-march=native).
Note that the conditions should not use a short-circuiting OR (||), because then the read from the max vector is semantically forbidden if the comparison with the min vector was already true for the current element, severely hindering vectorization (though AVX-512 could potentially vectorize this with a somewhat catastrophic slowdown).
I'm pretty sure GCC is not nearly optimal in the code it generates for AVX-512, since it could do the OR in the mask registers (k-regs) with kor[b/w/d/q], but maybe somebody with more experience in AVX-512 (*cough* Peter Cordes *cough*) could weigh in on that.
MKL doesn't provide such functions, but you may want to check another performance library - IPP - which contains a set of threshold functions that could be useful in your case. Please refer to the IPP Developer Reference for more details - https://software.intel.com/content/www/us/en/develop/documentation/ipp-dev-reference/top/volume-1-signal-and-data-processing/essential-functions/conversion-functions/threshold.html

Checking C code against MATLAB

I have a linear algebra algorithm written in MATLAB, and this code runs correctly.
I have implemented the same algorithm in C to obtain better performance.
However, my C code has bugs.
To find the bugs, I visually compare the matrices generated by the two implementations at each step of the algorithm.
Is there a better way to compare the output of the MATLAB and C code? I have to work close to machine precision.
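One common approach is to dump the matrices from both implementations and compare them numerically rather than visually; a minimal sketch (function name hypothetical) that reports the largest absolute and relative differences:

#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Compare two matrices stored as flat arrays of length n, e.g. one dumped
 * from MATLAB and one produced by the C implementation, and report the
 * largest absolute and relative element-wise differences. */
static void compare_matrices(const double *a, const double *b, size_t n)
{
    double max_abs = 0.0, max_rel = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double diff  = fabs(a[i] - b[i]);
        double scale = fmax(fabs(a[i]), fabs(b[i]));
        if (diff > max_abs) max_abs = diff;
        if (scale > 0.0 && diff / scale > max_rel) max_rel = diff / scale;
    }
    printf("max abs diff = %g, max rel diff = %g\n", max_abs, max_rel);
}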

Any way to vectorize in C

My question may seem primitive or dumb because I've just switched to C.
I have been working with MATLAB for several years, and I've learned that any computation should be vectorized in MATLAB and that I should avoid for loops to get acceptable performance.
It seems that in C, if I want to add two vectors, multiply matrices, or do any other matrix computation, I have to use a for loop.
I would appreciate it if you could let me know whether there is any way to do these computations in a vectorized sense, e.g. reading all elements of a vector with only one command and adding them to another vector with one command.
Thanks
MATLAB suggests you avoid for loops because most of the operations available on vectors and matrices are already implemented in its API and ready to be used. They are probably optimized, and they work directly on the underlying data rather than at the MATLAB language level - a sort of opaque implementation, I guess.
Even MATLAB uses for loops underneath to implement most of its magic (or delegates them to highly specialized assembly instructions, or through CUDA to the GPU).
What you are asking for is not directly possible: you will need loops to work on vectors and matrices. In practice, you would look for a library that lets you do most of the work without writing for loops directly, by using functions that already wrap them.
As was mentioned, it is not possible to hide the for loops. However, I doubt that the code MATLAB produces is in any way faster than the one produced by C. If you compile your C code with -O3, the compiler will try to use every hardware feature your computer has available, such as SIMD extensions and multiple issue. Moreover, if your code is good and doesn't cause too many pipeline stalls, and you use the cache well, it will be really fast.
But I think what you are looking for are libraries; search for LAPACK or BLAS, they might be what you are looking for.
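For instance, assuming a CBLAS implementation such as OpenBLAS or Netlib BLAS is installed, a vector update y = a*x + y can be delegated to the library instead of hand-writing the loop (a minimal sketch):

#include <stdio.h>
#include <cblas.h>   /* CBLAS interface, e.g. from OpenBLAS or Netlib BLAS */

int main(void)
{
    double x[4] = {1.0, 2.0, 3.0, 4.0};
    double y[4] = {10.0, 20.0, 30.0, 40.0};

    /* y = 2*x + y, computed by the (typically SIMD-optimized) BLAS routine */
    cblas_daxpy(4, 2.0, x, 1, y, 1);

    for (int i = 0; i < 4; ++i)
        printf("%g ", y[i]);
    printf("\n");
    return 0;
}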
In C there is no way to perform operations in a vectorized way at the language level. You can use structures and functions to abstract away the details of the operations, but in the end you will always be using for loops to process your data.
As for speed, C is a compiled language and you will not get a performance hit from using for loops in C. C has the benefit (compared to MATLAB) that it does not hide anything from you, so you can always see where your time is being spent. On the downside, you will notice that things MATLAB makes trivial (svd, cholesky, inv, cond, imread, etc.) are challenging in C.

C API Library for Solving Sparse Systems of Linear Equations?

I need to solve a large, sparse system of linear equations from a program written in D. Ideally I'd like a library with a D-style interface, but I doubt one exists. However, D can directly access C APIs. Therefore, please suggest some libraries that solve large, sparse systems of linear equations with the following characteristics:
Exposes a C API.
Free/open source. Preferably non-copyleft, too, but this is not a hard requirement.
Well-tested and debugged. Easy to set up and use. Not written by academics just to get a paper on their method and then completely unmaintained.
The classical library for sparse problems is SuiteSparse. Packages are available on many systems. MATLAB uses it internally.
My bad, I mixed up LAPACK, which I used a long time ago, with ARPACK, which I used even longer ago.
Here is a link to ARPACK: http://www.caam.rice.edu/~kristyn/parpack_home.html
The package is designed to compute a few eigenvalues and corresponding eigenvectors of a general n by n matrix A. It is most appropriate for large sparse or structured matrices.
And here is a link with a comparison of libraries for linear algebra:
http://www.netlib.org/utk/people/JackDongarra/la-sw.html
There you can find SparseLib++, the ARPACK mentioned here, and many more libraries in matrix form.
There is a dedicated package called CSparse, and it's written in C. It seems that the implementation is based on Davis (2006).
https://people.sc.fsu.edu/~jburkardt/c_src/csparse/csparse.html
Davis, T. A. (2006). Direct methods for sparse linear systems. Society for Industrial and Applied Mathematics.
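A rough sketch of solving A*x = b with CSparse's triplet interface, assuming the cs.h header from that package; the matrix values here are arbitrary, and the ordering/tolerance arguments to cs_lusol are worth checking against the documentation:

#include <stdio.h>
#include "cs.h"   /* CSparse header */

int main(void)
{
    /* Assemble a small 3x3 matrix in triplet form. */
    cs *T = cs_spalloc(3, 3, 4, 1, 1);   /* room for 4 entries */
    cs_entry(T, 0, 0, 4.0);
    cs_entry(T, 1, 1, 5.0);
    cs_entry(T, 2, 2, 6.0);
    cs_entry(T, 0, 2, 1.0);

    cs *A = cs_compress(T);              /* convert to compressed-column form */
    cs_spfree(T);

    double b[3] = {5.0, 10.0, 12.0};     /* right-hand side, overwritten with x */
    if (!cs_lusol(1, A, b, 1.0))         /* LU solve with AMD ordering */
        fprintf(stderr, "solve failed\n");
    else
        printf("x = %g %g %g\n", b[0], b[1], b[2]);

    cs_spfree(A);
    return 0;
}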

Vectorized Trig functions in C?

I'm looking to calculate highly parallelized trig functions (in blocks of like 1024), and I'd like to take advantage of at least some of the parallelism that modern architectures have.
When I compile a block
for (int i = 0; i < SIZE; i++) {
    arr[i] = sin((float)i / 1024);
}
GCC won't vectorize it, and says
not vectorized: relevant stmt not supported: D.3068_39 = __builtin_sinf (D.3069_38);
Which makes sense to me. However, I'm wondering if there's a library to do parallel trig computations.
With just a simple Taylor series up to the 11th order, GCC will vectorize all the loops, and I'm getting speeds over twice as fast as a naive sin loop (with bit-exact answers; with a 9th-order series, only a single bit off for the last two out of 1600 values, for a >3x speedup). I'm sure someone has encountered a problem like this before, but when I google, I find no mention of any libraries or the like.
A. Is there something existing already?
B. If not, advice for optimizing parallel trig functions?
EDIT: I found the following library called "SLEEF": http://shibatch.sourceforge.net/ which is described in this paper and uses SIMD instructions to calculate several elementary functions. It uses SSE and AVX specific code, but I don't think it will be hard to turn it into standard C loops.
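To make the Taylor-series idea concrete, a rough sketch of such a loop (hypothetical function name); compiled with something like -O3 and -msse4 or -march=native, GCC can autovectorize it:

#include <stddef.h>

/* 11th-order Taylor polynomial approximation of sin(x), reasonable for
 * small |x| (here x in [0,1)).  Written as a plain loop so the compiler
 * can autovectorize it. */
void sin_taylor(const float *x, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float x1 = x[i];
        float x2 = x1 * x1;
        /* sin(x) ~ x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - x^11/11! */
        out[i] = x1 * (1.0f
               + x2 * (-1.0f / 6.0f
               + x2 * ( 1.0f / 120.0f
               + x2 * (-1.0f / 5040.0f
               + x2 * ( 1.0f / 362880.0f
               + x2 * (-1.0f / 39916800.0f))))));
    }
}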
Since you said you were using GCC, it looks like there are some options:
http://gruntthepeon.free.fr/ssemath/
This uses SSE and SSE2 instructions to implement it.
http://www.gamasutra.com/view/feature/4248/designing_fast_crossplatform_simd_.php
This has an alternate implementation. Some of the comments are pretty good.
That said, I'd probably look into GPGPU for a solution, maybe writing it in CUDA or OpenCL (if I remember correctly, CUDA supports the sine function). Here are some libraries that look like they might make it easier.
https://code.google.com/p/slmath/
https://code.google.com/p/thrust/
Since you are looking to calculate harmonics here, I have some code that addressed a similar problem. It is already vectorized and faster than anything else I have found. As a side benefit, you get the cosine for free.
What platform are you using? Many libraries of this sort already exist:
Intel provides the Vector Math Library (VML) with icc (see the sketch after this list).
Apple provides the vForce library as part of the Accelerate framework.
HP provides their own Vector Math Library for Itanium (and maybe other architectures, too).
Sun provided libmvec with their compiler tools.
...
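As a rough illustration of the first option, assuming Intel MKL's VML is available, a whole block of sines can be computed in one call (a minimal sketch):

#include <stdio.h>
#include <mkl.h>   /* Intel MKL, including the VML functions */

#define SIZE 1024

int main(void)
{
    float in[SIZE], out[SIZE];
    for (int i = 0; i < SIZE; ++i)
        in[i] = (float)i / 1024.0f;

    /* vsSin computes sin() element-wise over the whole block,
     * using SIMD internally. */
    vsSin(SIZE, in, out);

    printf("out[100] = %f\n", out[100]);
    return 0;
}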
Instead of the Taylor series, I would look at the algorithms fdlibm uses. They should get you as much precision with fewer steps.
My answer was to create my own library to do exactly this, called vectrig: https://github.com/jeremysalwen/vectrig
