Implementation of Image Convolution (Image Processing) in C - c

I am testing some convolution algorithms I found on various sites, but none of them apply the matrix filters as they should.
I am writing a very simple 24-bit BMP library on my own, but now I need a little help with the convolution. I don't need FFT or a complex algorithm; running time is not important at this time.
The last code I was testing was this: but it didn't work properly.
Could someone point me to some code or an algorithm in C?
You can have a look at this algorithm - it is the closest I could find:
Convolution to blur the image
Note that the basic convolution algorithm is more or less always the same; the effect changes only with the kernel values.
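For what it's worth, here is a minimal sketch of that basic algorithm in C. It is not taken from the linked code: the function name convolve3x3, the flat 9-element kernel, and the divisor/bias parameters are my own assumptions. It works on a single 8-bit channel stored row by row, so for a 24-bit BMP you would either split the image into three planes first or adapt the indexing to step over the interleaved bytes. Edge pixels are handled by repeating the nearest valid pixel, which is just one common choice. Strictly speaking the loop does not flip the kernel, so it computes a correlation; for symmetric kernels (blur, sharpen) the result is identical, and for an asymmetric kernel you can pass it rotated by 180 degrees.

```c
#include <stddef.h>

/* Clamp an index into [0, max]; used to repeat the edge pixels. */
static int clamp_index(int i, int max)
{
    if (i < 0) return 0;
    if (i > max) return max;
    return i;
}

/*
 * Apply a 3x3 kernel (9 values, row by row) to one 8-bit channel.
 * Each weighted sum is divided by `divisor`, shifted by `bias`,
 * and clamped to 0..255. src and dst must not overlap.
 * Hypothetical helper, not part of any library mentioned in this thread.
 */
void convolve3x3(const unsigned char *src, unsigned char *dst,
                 int width, int height,
                 const int kernel[9], int divisor, int bias)
{
    if (divisor == 0) divisor = 1;  /* avoid division by zero */

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int ky = -1; ky <= 1; ky++) {
                for (int kx = -1; kx <= 1; kx++) {
                    int sy = clamp_index(y + ky, height - 1);
                    int sx = clamp_index(x + kx, width - 1);
                    sum += kernel[(ky + 1) * 3 + (kx + 1)]
                         * src[(size_t)sy * width + sx];
                }
            }
            sum = sum / divisor + bias;
            if (sum < 0) sum = 0;
            if (sum > 255) sum = 255;
            dst[(size_t)y * width + x] = (unsigned char)sum;
        }
    }
}
```

A box blur would be a kernel of nine 1s with divisor 9 and bias 0. Writing into a separate dst buffer matters: filtering in place would feed already-filtered pixels back into later sums.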

There is an open source C# library which provides methods to perform image convolution of simple filters. It would be an easy port to C.
The actual methods to perform convolution can be found here. The BitmapContext class is used to just wrap a pointer to bitmap. I believe in C# this is treated as int*, so this code is operating on 4 bytes at a time.

I created an image convolution library for simple cases -
It is pretty fast (OpenMP + SIMD).
I'm not an advanced programmer, though; I just wrote it as a first step in using SIMD.
Still, from what can be seen in VS 2015, the CPU utilization is pretty good.
If you have ideas to make it even faster, I'd be happy to hear them.
Feel free to use it in any manner you'd like.


JPEG source-code and quantization mode change - C language

I've been assigned a project that consists of changing the quantization in the JPEG source code, from the quantization tables to Lloyd-Max quantization. The problem is not what to do (I know how to change the quantization), but where to find the code I'm supposed to change.
If someone is familiar with the libjpeg-turbo, could you give me some advice on doing so?
I refrained from responding because it has been a long time since I have prowled around in the LIBJPEG code and I understand that it has been rewritten. The code functions well and is efficient but it is quite torturous to read and understand.
This is a C++ library that apparently was written for instructive purposes. For understandability it is about as good as you are going to get with JPEG:
However, if I remember correctly, this one, like libjpeg, combines some steps of the DCT and quantization.

How come the mex code is running more slowly than the matlab code

I use matlab to write a program with many iterations. It cannot be vectorized since the data processing in each iteration is related to that in the previous iteration.
Then I converted the MATLAB code to mex using the built-in MATLAB Coder, and the resulting code is even slower. I don't know whether I need to write the mex code myself, since the generated mex code doesn't seem to help.
I'd suggest that if you can, you get in touch with MathWorks to ask them for some advice. If you're not able to do that, then I would suggest really reading through the documentation and trying everything you find before giving up.
I've found that a few small changes to the way one implements the MATLAB code, and a few small changes to the project settings (such as disabling responsiveness to Ctrl-C, or extrinsic calls back to MATLAB), can make a speed difference of an order of magnitude or more in the generated code. There are not many people outside MathWorks who would be able to give good advice on exactly what changes might be worthwhile/sensible for you.
I should say that I've only used MATLAB Coder on one project, and I'm not at all an expert (actually not even a competent) C programmer. Nevertheless I've managed to produce C code that was about 10-15 times as fast as the original MATLAB code when mexed. I achieved that by a) just fiddling with all the different settings to see what happened and b) methodically going through the documentation, and seeing if there were places in my MATLAB code where I could apply any of the constructs I came across (such as coder.nullcopy, coder.unroll etc). Of course, your code may differ substantially.

how to align two meshes

I have a very nice & tricky question for you.
I need to align two meshes using a very fast algorithm. Given mesh1 and mesh2, I want to find how I need to translate and rotate mesh1 so that it is in the same position as mesh2.
At first I did this using the moments of inertia of the two meshes, but that algorithm does not work if the second mesh is similar to the first one but has some parts missing. In other words, take two identical meshes and cut some parts off one of them.
I'd like to write the code in C because I need to run it on multiple platforms (Linux/Windows) and it has to be very fast: it will be put inside a GA algorithm.
The two meshes are in STL (stereolithography) format (binary or ASCII), but another file format could be used if that is more useful.
Do you have any idea how to perform this stuff?
question update:
First of all I want to thank you guys very much for all your suggestions. I've downloaded and installed PCL on my machine and successfully compiled the ICP (tutorial) algorithm, taken from the PCL web site.
But now I have some questions about it, maybe because this is a brand new thing for me. What is the meaning of the 4x4 matrix output for the fitness? I would expect a rotation matrix and a translation vector.
I hope some of you can help me.
If you need any other info please ask.
Point Cloud Library (PCL) has several resources that you may find useful. As @Throwback1986 says, ICP is one excellent algorithm for aligning geometry. PCL also features other, often faster alignment algorithms, based on identifying and matching features of interest in two pieces of geometry. The library finds a lot of use in the robotics community, which, like you, is very performance conscious.
PCL is written in C++. While it is not as portable as straight C, they offer installation instructions for Windows, a few *nix flavors, and Mac OS. I've seen it running on iOS and Android as well. Check out the tutorials.
Iterative Closest Point (ICP) is one way of registering (aligning) 3D point clouds with rigid transformations. (It can also apply to meshes.)
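A rigid transformation like this is commonly packed into a single 4x4 homogeneous matrix, and as far as I know that is what the 4x4 output asked about in the question update is: the upper-left 3x3 block is the rotation matrix, the first three entries of the last column are the translation vector, and the bottom row is 0 0 0 1. The fitness score is a separate scalar. Here is a small sketch in plain C of how such a matrix is applied to a point. The function name transform_point and the row-major float[16] layout are assumptions for illustration only (Eigen, which PCL uses, stores matrices column-major by default, so copy element by element rather than with memcpy).

```c
/*
 * Apply a 4x4 homogeneous rigid transform to a 3D point.
 * m is assumed to be row-major: m[4*r + c] is row r, column c.
 *   rotation R    = rows 0..2, columns 0..2
 *   translation t = rows 0..2, column 3
 * Computes out = R * p + t. Hypothetical helper, not a PCL function.
 */
void transform_point(const float m[16], const float p[3], float out[3])
{
    for (int r = 0; r < 3; r++) {
        out[r] = m[4 * r + 0] * p[0]
               + m[4 * r + 1] * p[1]
               + m[4 * r + 2] * p[2]
               + m[4 * r + 3];
    }
}
```

Applying this to every vertex of mesh1 would move it onto mesh2, assuming the registration converged.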
Here is a good introduction:
Here is a reasonable summary:
Here is a matlab implementation:
Here are some potential optimizations:

Matrix solving with C (within CUDA)

As part of a larger problem, I need to solve small linear systems (i.e NxN where N ~10) so using the relevant cuda libraries doesn't make any sense in terms of speed.
Unfortunately something that's also unclear is how to go about solving such systems without pulling in the big guns like GSL, EIGEN etc.
Can anyone point me in the direction of a dense matrix solver (Ax=B) in straight C?
For those interested, the basic structure of the generator for this section of code is:
for v in range N:
for x in range N:
Unfortunately I have approximately zero knowledge of higher mathematics, so any advice would be appreciated.
UPDATE: I've been working away at this, and have a nearly-solution that runs but isn't working. Anyone lurking is welcome to check out what I've got so far on pastebin.
I'm using Crout Decomposition with Pivoting which seems to be the most general approach. The idea for this test is that every thread does the same work. Boring I know, but the plan is that the matrixcount variable is increased, actual data is put in, and each thread solves the small matrices individually.
Thanks to everyone who's been checking on this.
POST-ANSWER UPDATE: Finished the matrix solving code for CPU and GPU operation, check out my lazy-writeup here
CUDA won't help here, that's true. Matrices like that are just too small for it.
What you do to solve a system of linear equations is LU decomposition:
Or, even better, a QR decomposition, for example with Householder reflections or the Gram-Schmidt process.
Solving the linear equation becomes easy afterwards, but I'm afraid there always is some "higher mathematics" (linear algebra) involved. That, and there are many (many!) C libraries out there for solving linear equations. Doesn't seem like "big guns" to me.
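To make that a bit more concrete, here is a minimal sketch in straight C of Gaussian elimination with partial pivoting followed by back substitution, which is the same elimination that an LU decomposition performs. It is not the Crout code from the question, and the name solve_dense, the in-place interface, and the 1e-12 singularity threshold are all assumptions of mine. It has no library dependencies, so in principle the same body could be marked as a device function and called per thread, though I have not tested that.

```c
/* Absolute value without pulling in math.h. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/*
 * Solve A x = b for a dense n x n system.
 * a: row-major matrix, overwritten during elimination.
 * b: right-hand side on entry, solution x on success.
 * Returns 0 on success, -1 if a pivot is (numerically) zero.
 * Hypothetical helper for illustration.
 */
int solve_dense(double *a, double *b, int n)
{
    for (int k = 0; k < n; k++) {
        /* Partial pivoting: find the largest |a[i][k]| for i >= k. */
        int piv = k;
        double best = abs_d(a[k * n + k]);
        for (int i = k + 1; i < n; i++) {
            double v = abs_d(a[i * n + k]);
            if (v > best) { best = v; piv = i; }
        }
        if (best < 1e-12) return -1;  /* assumed singularity threshold */

        if (piv != k) {               /* swap rows k and piv */
            for (int j = 0; j < n; j++) {
                double t = a[k * n + j];
                a[k * n + j] = a[piv * n + j];
                a[piv * n + j] = t;
            }
            double t = b[k]; b[k] = b[piv]; b[piv] = t;
        }

        /* Eliminate column k from all rows below the pivot. */
        for (int i = k + 1; i < n; i++) {
            double f = a[i * n + k] / a[k * n + k];
            for (int j = k; j < n; j++)
                a[i * n + j] -= f * a[k * n + j];
            b[i] -= f * b[k];
        }
    }

    /* Back substitution on the upper-triangular system. */
    for (int i = n - 1; i >= 0; i--) {
        double s = b[i];
        for (int j = i + 1; j < n; j++)
            s -= a[i * n + j] * b[j];
        b[i] = s / a[i * n + i];
    }
    return 0;
}
```

For N around 10 this is a few hundred multiply-adds per system. If the same matrix is reused with many right-hand sides, storing the L and U factors and reusing them would be the better design.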

How to use cepstral?

Recently I asked this question: How to get the fundamental frequency from FFT? (you don't actually need to read it)
My question right now is: how do I use the cepstral algorithm?
I just don't know how to use it because the only language that I know is ActionScript 3, and for this reason I have few references about the native functions found in C, Java and so on, and how I should implement them in AS. Most articles are about these languages =/
(although answers in languages other than AS are welcome, just explain how the script works please)
The articles I found about cepstral to find the fundamental frequency of a FFT result told me that I should do this:
signal → FT → abs() → square → log → FT → abs() → square → power cepstrum
Important info:
I am developing a GUITAR TUNER in flash
This is the first time I am dealing with advanced sound
I am using an FFT to extract frequency bins from the signal that reaches user's microphone, but I got stuck in getting the fundamental frequency from it
I don't know:
How to apply a square to an ARRAY (I mean, the data that my FFT gives me is an array. Should I multiply it by itself? ActionScript's debugger throws errors when I try fftResults * fftResults)
How to apply the "log". I would not know how to apply it even if I had a single number.
What is the difference between the complex cepstrum and the power cepstrum? Also, which of them should I use? I am trying to develop a guitar tuner.
Note that the output of an FFT is an array of complex values, i.e. each bin = re + j*im. I think you can just combine the abs and square operations and calculate re*re + im*im for each bin. This gives you a single positive value for each bin, and obviously you can calculate the log value for each bin quite easily. You then need to do a second FFT on this log squared data and again, using the output of this second FFT, you will calculate re*re + im*im for each bin. You will then have an array of positive values which will have one or more peaks representing the fundamental frequency or frequencies of your input.
The autocorrelation is the easiest and most logical approach, and the best place to start.
To get this working, start with a simple autocorrelation, and then, if necessary, improve it following the outline provided by YIN. (YIN is based on the autocorrelation with refinements. But whether or not you'll need these refinements depends on details of your situation.) This way also, you can learn as you go rather than trying to understand the whole thing in one shot.
Although FFT approaches can also work, they are a bit more confusing. The issue is that what you are really after is the period, and this isn't well represented by the FFT. The missing fundamental is a good example of this: if you have 2Hz and 3Hz, the fundamental is 1Hz, but it is nowhere in the FFT, while 1Hz is obvious in a time-based representation (e.g. the autocorrelation). Add to this that overtones aren't necessarily harmonic, that there is noise, etc., and all of these issues usually make it best to start with a direct approach to the problem.
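A simple autocorrelation of the kind described above could look roughly like this in C. This is only a sketch and it is not YIN: it slides the signal against a delayed copy of itself and returns the delay (lag) where the two match best. The name best_lag and the min_lag/max_lag search bounds are my own assumptions; the bounds restrict the search to plausible periods so that the trivial match at lag 0 is never considered.

```c
#include <stddef.h>

/*
 * Return the lag (in samples) within [min_lag, max_lag] at which the
 * signal correlates best with a delayed copy of itself, or 0 if the
 * arguments are invalid. The fundamental frequency estimate is then
 * sample_rate / lag. Hypothetical helper for illustration.
 */
size_t best_lag(const float *x, size_t n, size_t min_lag, size_t max_lag)
{
    if (min_lag == 0 || min_lag > max_lag || max_lag >= n)
        return 0;

    size_t best = 0;
    double best_sum = 0.0;
    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        double sum = 0.0;
        for (size_t i = 0; i + lag < n; i++)
            sum += (double)x[i] * x[i + lag];  /* product with delayed copy */
        if (best == 0 || sum > best_sum) {     /* keep the first maximum */
            best_sum = sum;
            best = lag;
        }
    }
    return best;
}
```

For a guitar tuner the bounds could be derived from the range of the open strings, roughly 82 Hz to 330 Hz in standard tuning, as min_lag = sample_rate / 330 and max_lag = sample_rate / 82, perhaps with some margin. Octave errors and noise are where the YIN refinements come in.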
There are many ways of finding fundamental frequency (F0).
For languages like Java etc. there are many libraries with these types of algorithms already implemented (you can study their sources).
MFCC (based on cepstral) implemented in Comirva (Open source).
Audacity (beta version!) (Open source) presents cepstrum, autocorrelation, enhanced autocorrelation,
Yin based on autocorrelation (example)
Finding max signal values after FFT
All these algorithms may be very helpful for you. However, the easiest way to get F0 (one value in Hz) would be to use Yin.
