Real-time linear interpolation for a 3-D servo motor - c

As we know, in ordinary linear interpolation the final destination is fixed. I want to use a camera to track a moving object, so that the object's current coordinates become the final destination. Could anybody help me finish this algorithm in C?

Assuming that you're trying to track a moving object with a gimballed camera, the problem is the mismatch between the linear, constant-speed assumption and the motion of the camera. Even if your object is moving at a constant speed, the camera will have to rotate at a non-constant speed to keep track of it. For example, the camera will have to rotate quickly when the object is near the camera, but only very slowly when the object is far away. One way to handle this is to interpolate in Cartesian space and convert to camera angles afterwards:
1) Figure out the Cartesian (XYZ) coordinates of the starting and ending points.
2) Compute a sequence of linear interpolations between the start and end point in Cartesian space. This is a sequence of points in Cartesian space that estimate the object's trajectory.
3) Convert the sequence of Cartesian points from the Cartesian coordinate system to the Spherical coordinate system.
4) The spherical coordinates Theta and Phi are the angles that your camera must move through in time.
All of the computations described above are simple and closed-form. You shouldn't need to apply any "real-time" programming techniques aside from basic practices like avoiding dynamic allocation and avoiding interpreted or garbage-collected languages. If reliability is very important, you will want to employ a suitable real-time OS. Linux has a real-time patch (PREEMPT_RT) that provides pretty good soft real-time performance.

Related

AI of spaceship's propulsion: land a 3D ship at position=0 and angle=0

This is a very difficult problem about how to maneuver a spaceship that can both translate and rotate in 3D, for a space game.
The spaceship has n jets placed at various positions and in various directions.
The transformation of the i-th jet relative to the spaceship's center of mass (CM) is a constant, Ti.
A transformation is a tuple of position and orientation (a quaternion, a 3x3 matrix or, less preferably, Euler angles).
A transformation can also be denoted by a single 4x4 matrix.
In other words, all jets are glued to the ship and cannot rotate.
A jet can exert force on the spaceship only in the direction of its axis (green).
Because the jets are glued on, each axis rotates along with the spaceship.
Every jet exerts a force (vector, Fi) with a certain magnitude (scalar, fi):
the i-th jet can exert a force (Fi = axis * fi) only within the range min_i <= fi <= max_i.
Both min_i and max_i are constants with known values.
To be clear, the unit of min_i, fi and max_i is the newton.
Ex. If the range doesn't cover 0, it means that the jet can't be turned off.
The spaceship's mass = m and inertia tensor = I.
The spaceship's current transformation = Tran0, velocity = V0, angularVelocity = W0.
The spaceship's physics body follows the well-known rules of physics:
Torque = r x F (cross product)
F = m * a
angularAcceleration = I^-1 * Torque
linearAcceleration = m^-1 * F
I is different for each direction, but for the sake of simplicity it has the same value in every direction (sphere-like). Thus, I can be thought of as a scalar instead of a 3x3 matrix.
Question
How can I control all the jets (all fi) to land the ship with position=0 and angle=0?
Math-like specification: find functions fi(time) that reach position=(0,0,0) and orientation=identity, with final velocity and angularVelocity equal to zero, in minimum time.
More specifically, what are the names of techniques or related algorithms that solve this problem?
My research (1 dimension)
If the universe is 1D (thus, no rotation), the problem will be easy to solve.
(Thanks to Gavin Lock, https://stackoverflow.com/a/40359322/3577745)
First, find the values MIN_BURN=sum{min_i}/m and MAX_BURN=sum{max_i}/m.
Second, think in reverse: assume that x=0 (position) and v=0 at t=0,
then create two parabolas with x''=MIN_BURN and x''=MAX_BURN.
(The 2nd derivative is assumed to be constant for a period of time, so each curve is a parabola.)
The only remaining work is to join the two parabolas together.
The red dashed line is where they join.
In the period of time that x''=MAX_BURN, all fi=max_i.
In the period of time that x''=MIN_BURN, all fi=min_i.
It works really well for 1D, but in 3D the problem is far harder.
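For reference, the 1D parabola-joining idea can be sketched as a switching rule in C. This is only a sketch under the assumption that MIN_BURN < 0 < MAX_BURN (so the ship can accelerate both ways); the function name `choose_burn` is made up. It uses the fact that on a constant-acceleration arc ending at x=0, v=0 the state satisfies v*v = 2*a*x.

```c
/* Pick the acceleration (MIN_BURN or MAX_BURN) for the current 1D state
 * (x, v) so that the ship reaches x = 0 with v = 0 in minimum time.
 * Assumes min_burn < 0 < max_burn (an assumption of this sketch). */
double choose_burn(double x, double v, double min_burn, double max_burn)
{
    /* Switching curve: the position from which a single constant burn
     * ends exactly at x = 0, v = 0, given the current velocity.
     * Moving in -x (v < 0) means braking with max_burn, and vice versa. */
    double x_switch = (v < 0.0) ? v * v / (2.0 * max_burn)
                                : v * v / (2.0 * min_burn);

    if (x > x_switch)
        return min_burn;   /* beyond the curve on the + side: push towards -x */
    if (x < x_switch)
        return max_burn;   /* beyond the curve on the - side: push towards +x */
    return (v < 0.0) ? max_burn : min_burn;   /* on the curve: final braking arc */
}
```

As in the description above, a return value of MAX_BURN corresponds to all fi=max_i and MIN_BURN to all fi=min_i. With discrete time-steps the state will rarely land exactly on the curve, so some tolerance or a final correction step would probably be needed.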
Note:
Even a rough guide pointing me in the right direction would be really appreciated.
I don't need a perfect AI, e.g. it can take a little more time than the optimum.
I have been thinking about this for more than a week and still have no clue.
Other attempts / opinions
I don't think machine learning, such as a neural network, is appropriate for this case.
Boundary-constrained least-squares optimisation may be useful, but I don't know how to fit my two hyper-parabolas to that form of problem.
This may be solved by using many iterations, but how?
I have searched NASA's website, but did not find anything useful.
The feature may exist in the game "Space Engineers".
Commented by Logman: Knowledge in mechanical engineering may help.
Commented by AndyG: It is a motion planning problem with nonholonomic constraints. It could be solved with rapidly exploring random trees (RRTs), theory around the Lyapunov equation, and the linear-quadratic regulator.
Commented by John Coleman: This seems more like optimal control than AI.
Edit: "Near-0 assumption" (optional)
In most cases, the AI (to be designed) runs continuously (i.e. it is called every time-step).
Thus, with the AI's tuning, Tran0 is usually near identity, and V0 and W0 are usually not far from 0, e.g. |Seta0| < 30 degrees, |W0| < 5 degrees per time-step.
I think that an AI based on this assumption would work OK in most cases. Although not perfect, it can be considered a correct solution (I have started to think that without this assumption, this question might be too hard).
I vaguely feel that this assumption may enable some tricks that use a "linear" approximation.
The 2nd Alternative Question - "Tune 12 Variables" (easier)
The above question might also be viewed as follows:
I want to tune all six values and their six first derivatives (values') to 0, using the fewest time-steps.
Here is a table showing a possible situation that the AI can face:
The Multiplier table stores inertia^-1 * r and mass^-1 from the original question.
The Multiplier and Range are constant.
Each time-step, the AI will be asked to pick a tuple of values fi, where each fi must be in the range [min_i,max_i] of the corresponding jet.
Ex. From the table, the AI can pick (f0=1,f1=0.1,f2=-1).
Then, the caller will multiply fi by the Multiplier table to get values''.
Px'' = f0*0.2+f1*0.0+f2*0.7
Py'' = f0*0.3-f1*0.9-f2*0.6
Pz'' = ....................
SetaX''= ....................
SetaY''= ....................
SetaZ''= f0*0.0+f1*0.0+f2*5.0
After that, the caller will update all values' with formula values' += values''.
Px' += Px''
.................
SetaZ' += SetaZ''
Finally, the caller will update all values with formula values += values'.
Px += Px'
.................
SetaZ += SetaZ'
The AI will be asked only once per time-step.
The objective of the AI is to return tuples of fi (which can differ between time-steps) that bring Px, Py, Pz, SetaX, SetaY, SetaZ, Px', Py', Pz', SetaX', SetaY', SetaZ' to 0 (or very near 0)
in as few time-steps as possible.
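To make the caller's update rule concrete, here is a small C sketch of one time-step. The names `ShipState` and `ship_step` and the flat row-major layout of the Multiplier table are assumptions for illustration; the table itself is passed in, so the rows left out above stay unspecified.

```c
#include <stddef.h>

#define NUM_AXES 6   /* Px, Py, Pz, SetaX, SetaY, SetaZ */

typedef struct {
    double value[NUM_AXES];  /* the six values */
    double rate[NUM_AXES];   /* the six values' (first derivatives) */
} ShipState;

/* One time-step of the caller's update.
 * multiplier is NUM_AXES x num_jets, row-major: multiplier[a * num_jets + i]
 * is the contribution of jet i to values'' on axis a.
 * f[i] is the force the AI picked for jet i (assumed already in range). */
void ship_step(ShipState *s, const double *multiplier,
               const double *f, size_t num_jets)
{
    for (size_t a = 0; a < NUM_AXES; ++a) {
        double accel = 0.0;                 /* values'' for this axis */
        for (size_t i = 0; i < num_jets; ++i)
            accel += multiplier[a * num_jets + i] * f[i];
        s->rate[a]  += accel;               /* values' += values'' */
        s->value[a] += s->rate[a];          /* values  += values'  */
    }
}
```

A search-based AI could call `ship_step` on a copy of the state to evaluate candidate tuples of fi before committing to one.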
I hope providing another view of the problem will make it easier.
It is not the exact same problem, but I feel that a solution that can solve this version can bring me very close to the answer of the original question.
An answer for this alternate question can be very useful.
The 3rd Alternative Question - "Tune 6 Variables" (easiest)
This is a lossy, simplified version of the previous alternative.
The only difference is that the world is now 2D, and Fi is also 2D (x,y).
Thus I have to tune only Px, Py, SetaZ, Px', Py', SetaZ' to 0, in as few time-steps as possible.
An answer to this easiest alternative question can be considered useful.
I'll try to keep this short and sweet.
One approach that is often used to solve these problems in simulation is a Rapidly-Exploring Random Tree. To give at least a little credibility to my post, I'll admit I studied these, and motion planning was my research lab's area of expertise (probabilistic motion planning).
The canonical paper to read on these is Steven LaValle's Rapidly-exploring random trees: A new tool for path planning, and there have been a million papers published since that all improve on it in some way.
First I'll cover the most basic description of an RRT, and then I'll describe how it changes when you have dynamical constraints. I'll leave fiddling with it afterwards up to you:
Terminology
"Spaces"
The state of your spaceship can be described by its 3-dimensional position (x, y, z) and its 3-dimensional rotation (alpha, beta, gamma) (I use those Greek names because those are the Euler angles).
State space is the set of all possible positions and rotations your spaceship can occupy. Of course, this is infinite.
Collision space is the set of all "invalid" states, i.e. realistically impossible positions. These are states where your spaceship is in collision with some obstacle (for other bodies this would also include collision with itself, for example when planning for a length of chain). The literature often calls this region C-obstacle, where C-space stands for the whole configuration space.
Free space is anything that is not collision space.
General Approach (no dynamics constraints)
For a body without dynamical constraints the approach is fairly straightforward:
Sample a state
Find nearest neighbors to that state
Attempt to plan a route between the neighbors and the state
I'll briefly discuss each step
Sampling a state
Sampling a state in the most basic case means choosing random values for each entry in your state space. If we did this with your spaceship, we'd randomly sample x, y, z, alpha, beta, gamma across all of their possible values (uniform random sampling).
Of course, typically far more of your space is obstacle space than free space (because you usually confine the object in question to some "environment" you want it to move about in). So it is very common to take the bounding cube of your environment and sample positions (x, y, z) within it, which gives a much higher chance of sampling in the free space.
In an RRT, you'll sample randomly most of the time. But with some probability you will actually choose your next sample to be your goal state (play with it, start with 0.05). This is because you need to periodically test to see if a path from start to goal is available.
Finding nearest neighbors to a sampled state
You choose some fixed integer k > 0. Your k nearest neighbors are the k tree nodes closest to the sample in state space. That means you need some distance metric that can tell you how far apart two states are. The most basic distance metric is Euclidean distance, which only accounts for physical distance and doesn't care about rotational angles (because in the simplest case you can rotate 360 degrees in a single timestep).
Initially you'll only have your starting position, so it will be the only candidate in the nearest neighbor list.
Planning a route between states
This is called local planning. In a real-world scenario you know where you're going, and along the way you need to dodge other people and moving objects. We won't worry about those things here. In our planning world we assume the universe is static but for us.
What's most common is to assume some linear interpolation between the sampled state and its nearest neighbor. The neighbor (i.e. a node already in the tree) is moved along this linear interpolation bit by bit until it either reaches the sampled configuration, or it travels some maximum distance (recall your distance metric).
What's going on here is that your tree is growing towards the sample. When I say that you step "bit by bit" I mean you define some "delta" (a really small value) and move along the linear interpolation that much each timestep. At each point you check whether the new state is in collision with some obstacle. If you hit an obstacle, you keep the last valid configuration as part of the tree (don't forget to store the edge somehow!). So what you'll need for a local planner is:
Collision checking
How to "interpolate" between two states (for your problem you don't need to worry about this because we'll do something different).
A physics simulation for timestepping (Euler integration is quite common, but less stable than something like Runge-Kutta). Fortunately you already have a physics model!
Modification for dynamical constraints
Of course if we assume you can linearly interpolate between states, we'll violate the physics you've defined for your spaceship. So we modify the RRT as follows:
Instead of sampling random states, we sample random controls and apply said controls for a fixed time period (or until collision).
Before, when we sampled random states, what we were really doing was choosing a direction (in state space) to move. Now that we have constraints, we randomly sample our controls, which is effectively the same thing, except we're guaranteed not to violate our constraints.
After you apply your control for a fixed time interval (or until collision), you add a node to the tree, with the control stored on the edge. Your tree will grow very fast to explore the space. This control application replaces linear interpolation between tree states and sampled states.
Sampling the controls
You have n jets that individually have some min and max force they can apply. Sample within that min and max force for each jet.
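A minimal C sketch of that sampling step, assuming uniform sampling and the standard library's `rand()` (good enough for experiments; a better generator may be worth it later). The function names are made up for illustration.

```c
#include <stdlib.h>

/* Uniform sample in [lo, hi] using rand(). */
double sample_uniform(double lo, double hi)
{
    return lo + (hi - lo) * ((double)rand() / (double)RAND_MAX);
}

/* Fill f[0..n-1] with one random control: a force for each jet,
 * drawn independently from that jet's [min_f[i], max_f[i]] range. */
void sample_controls(const double *min_f, const double *max_f,
                     double *f, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        f[i] = sample_uniform(min_f[i], max_f[i]);
}
```

Each sampled `f` would then be fed to your physics step for the fixed time interval, and the resulting state added to the tree with `f` stored on the edge.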
Which node(s) do I apply my controls to?
Well, you can choose at random, or you can bias the selection towards nodes that are nearest to your goal state (this needs the distance metric). This biasing will try to grow nodes closer to the goal over time.
Now, with this approach, you're unlikely to reach your goal exactly, so you need some definition of "close enough". That is, you will use your distance metric to find the nearest neighbors to your goal state, and then test them for "close enough". This "close enough" metric can be different from your distance metric, or not. If you're using Euclidean distance, but it's very important that your goal configuration is also rotated properly, then you may want to modify the "close enough" metric to look at angle differences.
What is "close enough" is entirely up to you. Also something for you to tune, and there are a million papers that try to get you a lot closer in the first place.
Conclusion
This random sampling may sound ridiculous, but your tree will grow to explore your free space very quickly. See some YouTube videos on RRT for path planning. We can't guarantee something called "probabilistic completeness" with dynamical constraints, but it's usually "good enough". Sometimes a solution will not exist, so you'll need to put some logic in there to stop growing the tree after a while (20,000 samples, for example).
More Resources:
Start with these, and then start looking into their citations, and then start looking into who is citing them.
Kinodynamic RRT*
RRT-Connect
This is not an answer, but it's too long to place as a comment.
First of all, a real solution will involve both linear programming (for multivariate optimization with constraints, which will be used in many of the substeps) and techniques used in trajectory optimization and/or control theory. This is a very complex problem and if you can solve it, you could have a job at any company of your choosing. The only thing that could make this problem worse would be friction (drag) effects or external body gravitation effects. A real solution would also ideally use Verlet integration or 4th-order Runge-Kutta, which offer improvements over the Euler integration you've implemented here.
Secondly, I believe your "2nd Alternative Version" of your question above has omitted the rotational influence on the positional displacement vector you add into the position at each timestep. While the jet axes all remain fixed relative to the frame of reference of the ship, they do not remain fixed relative to the global coordinate system you are using to land the ship (at global coordinate [0, 0, 0]). Therefore the [Px', Py', Pz'] vector (calculated from the ship's frame of reference) must undergo appropriate rotation in all 3 dimensions prior to being applied to the global position coordinates.
Thirdly, there are some implicit assumptions you failed to specify. For example, one dimension should be defined as the "landing depth" dimension and negative coordinate values should be prohibited (unless you accept a fiery crash). I developed a mockup model for this in which I assumed z dimension to be the landing dimension. This problem is very sensitive to initial state and the constraints placed on the jets. All of my attempts using your example initial conditions above failed to land. For example, in my mockup (without the 3d displacement vector rotation noted above), the jet constraints only allow for rotation in one direction on the z-axis. So if aZ becomes negative at any time (which is often the case) the ship is actually forced to complete another full rotation on that axis before it can even try to approach zero degrees again. Also, without the 3d displacement vector rotation, you will find that Px will only go negative using your example initial conditions and constraints, and the ship is forced to either crash or diverge farther and farther onto the negative x-axis as it attempts to maneuver. The only way to solve this is to truly incorporate rotation or allow for sufficient positive and negative jet forces.
However, even when I relaxed your min/max force constraints, I was unable to get my mockup to land successfully, demonstrating how complex planning will probably be required here. Unless it is possible to completely formulate this problem in linear programming space, I believe you will need to incorporate advanced planning or stochastic decision trees that are "smart" enough to continually use rotational methods to reorient the most flexible jets onto the currently most necessary axes.
Lastly, as I noted in the comments section, "On May 14, 2015, the source code for Space Engineers was made freely available on GitHub to the public." If you believe that game already contains this logic, that should be your starting place. However, I suspect you are bound to be disappointed. Most space game landing sequences simply take control of the ship and do not simulate "real" force vectors. Once you take control of a 3-d model, it is very easy to predetermine a 3d spline with rotation that will allow the ship to land softly and with perfect bearing at the predetermined time. Why would any game programmer go through this level of work for a landing sequence? This sort of logic could control ICBM missiles or planetary rover re-entry vehicles and it is simply overkill IMHO for a game (unless the very purpose of the game is to see if you can land a damaged spaceship with arbitrary jets and constraints without crashing).
I can introduce another technique into the mix of (awesome) answers proposed.
It lies more in AI, and provides close-to-optimal solutions. It's called Machine Learning, more specifically Q-Learning. It's surprisingly easy to implement but hard to get right.
The advantage is that the learning can be done offline, so the algorithm can then be super fast when used.
You could do the learning when the ship is built or when something happens to it (thruster destruction, large chunks torn away...).
Optimality
I observed you're looking for near-optimal solutions. Your method with parabolas is good for optimal control. What you did is this:
Observe the state of the system.
For every state (coming in too fast, too slow, heading away, closing in etc.) you devised an action (apply a strategy) that will bring the system into a state closer to the goal.
Repeat
This is pretty much intractable for a human in 3D (too many cases; it will drive you nuts). However, a machine may learn where to split the parabolas in every dimension, and devise an optimal strategy by itself.
Q-learning works very similarly to the way we do:
Observe the (discretized) state of the system
Select an action based on a strategy
If this action brought the system into a desirable state (closer to the goal), mark the action/initial state as more desirable
Repeat
Discretize your system's state.
For each state, have a map, initialized quasi-randomly, which maps every state to an action (this is the strategy). Also assign a desirability to each state (initially zero everywhere, and 1000000 for the target state (X=0, V=0)).
Your state would be your 3 positions, 3 angles, 3 translation speeds, and 3 rotation speeds.
Your actions can be any combination of thrusters.
Training
Train the AI (offline phase):
Generate many diverse situations
Apply the strategy
Evaluate the new state
Let the algo (see links above) reinforce the selected strategies' desirability value.
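For the reinforcement step, the textbook tabular Q-learning update can be sketched in C as follows. This is the generic rule, not something specific to this problem; the table layout and the names `q_update`, `alpha` (learning rate) and `discount` are assumptions for illustration.

```c
#include <stddef.h>

/* One tabular Q-learning update after taking action a in state s,
 * receiving reward, and landing in state s_next.
 * q is a num_states x num_actions table, row-major:
 * q[s * num_actions + a] is the desirability of action a in state s. */
void q_update(double *q, size_t num_actions,
              size_t s, size_t a, double reward, size_t s_next,
              double alpha, double discount)
{
    /* Best desirability reachable from the next state. */
    const double *next = q + s_next * num_actions;
    double best = next[0];
    for (size_t i = 1; i < num_actions; ++i)
        if (next[i] > best)
            best = next[i];

    /* Move the old estimate towards reward + discounted future value. */
    double *cell = q + s * num_actions + a;
    *cell += alpha * (reward + discount * best - *cell);
}
```

The strategy for a state is then simply the action with the highest entry in that state's row.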
Live usage in the game
After some time, a global strategy for navigation emerges. You then store it, and during your game loop you simply sample your strategy and apply it to each situation as they come up.
The strategy may still learn during this phase, but probably more slowly (because it happens real-time). (Btw, I dream of a game where the AI would learn from every user's feedback so we could collectively train it ^^)
Try this on a simple 1D problem; it devises a strategy remarkably quickly (a few seconds).
In 2D I believe excellent results could be obtained in an hour.
For 3D... you're looking at overnight computations. There are a few things to try to accelerate the process:
Try to never 'forget' previous computations, and feed them as an initial 'best guess' strategy. Save it to a file!
You might drop some states (like ship roll maybe?) without losing much navigation optimality but increasing computation speed greatly. Maybe change referentials so the ship is always on the X-axis, this way you'll drop x&y dimensions!
States more frequently encountered will have a reliable and very optimal strategy. Maybe normalize the state to make your ship state always close to a 'standard' state?
Typically the rotation speed intervals can be bounded safely (you don't want a ship tumbling wildly, so the strategy will always be to "un-wind" that speed). Of course, rotation angles are bounded as well.
You can also probably discretize the positions non-linearly, because farther away from the objective, precision won't affect the strategy much.
For these kinds of problems there are two techniques available: brute-force search and heuristics. Brute force means treating the problem as a black box with input and output parameters, where the aim is to find the right input parameters for winning the game. To program such a brute-force search, the game physics runs in a simulation loop (physics simulation) and via stochastic search (minimax, alpha-beta pruning) every possibility is tried out. The disadvantage of brute-force search is the high CPU consumption.
The other technique utilizes knowledge about the game: knowledge about motion primitives and about evaluation. This knowledge is programmed in normal computer languages like C++ or Java. The disadvantage of this idea is that it is often difficult to capture the knowledge.
The best practice for solving spaceship navigation is to combine both ideas into a hybrid system. For programming source code for this concrete problem, I estimate that nearly 2000 lines of code are necessary. These kinds of problems are normally tackled within huge projects with many programmers and take about 6 months.

Triangulating a planar point set with known Boundary

I have a planar point set P. I already know which points p in P belong to the boundary B(p). Said boundary may be convex or non-convex. Now, I would like to find a triangulation of P with boundary B(p). My questions:
Is there an algorithm that achieves this directly? A close candidate would be the Constrained Delaunay Triangulation (CDT). However, I don't think CDT applies here: I could feed all the edges in B(p) as a constraint, such that all the edges would be contained in the triangulation. However, this does not necessarily entail that this will be the boundary of the triangulation. Correct me if I'm wrong here.
If you know of such an algorithm, can you point me to a (lightweight) C library that provides an implementation?
Alternatively: I could of course just triangulate P using the standard Delaunay triangulation from GTS. I would then need to prune all the faces with a vertex outside of B(p). Is this possible with GTS?
I could feed all the edges in B(p) as a constraint, such that all the edges would be contained in the triangulation. However, this does not necessarily entail that this will be the boundary of the triangulation.
You're right that the constrained Delaunay triangulation may fill in the concavities of the boundary. Every triangle, however, is either completely inside or completely outside of the boundary, so it's easy enough to delete the ones outside by traversing the dual of the planar straight-line graph starting from the infinite face, treating the boundary edges as impassable. Jonathan Shewchuk's library Triangle, for example, does this. The license may not be to your liking, but if you already have another library to compute constrained Delaunay triangulations, we're not talking about a lot of additional code.
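If a dual-graph traversal is not convenient with your library, a simpler (though slower) alternative is to test each triangle's centroid against the boundary polygon. This is a different technique from the flood fill described above; it works here because each triangle of the constrained triangulation lies entirely inside or entirely outside the boundary. The sketch below assumes the boundary is a single simple polygon given in order, and the names are made up.

```c
typedef struct { double x, y; } Pt;

/* Even-odd (ray casting) test: returns 1 if p lies inside the simple
 * polygon poly[0..n-1], 0 otherwise. Points exactly on an edge may be
 * classified either way. */
int point_in_polygon(Pt p, const Pt *poly, int n)
{
    int inside = 0;
    for (int i = 0, j = n - 1; i < n; j = i++) {
        /* Does edge (j, i) straddle the horizontal line through p? */
        if ((poly[i].y > p.y) != (poly[j].y > p.y)) {
            double x_cross = poly[j].x + (p.y - poly[j].y) *
                             (poly[i].x - poly[j].x) /
                             (poly[i].y - poly[j].y);
            if (p.x < x_cross)
                inside = !inside;
        }
    }
    return inside;
}

/* Keep a triangle only if its centroid is inside the boundary. */
int triangle_inside(Pt a, Pt b, Pt c, const Pt *boundary, int n)
{
    Pt centroid = { (a.x + b.x + c.x) / 3.0, (a.y + b.y + c.y) / 3.0 };
    return point_in_polygon(centroid, boundary, n);
}
```

This costs one pass over the boundary per triangle, so the flood fill is preferable for large inputs.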
poly2tri finds a CDT of a planar region given its boundary. It is easy to build and use.
You need first to do Ear Clipping using the boundary points, then transfer the result to Delaunay triangulation and add internal points.

OpenGL rotation and scaling

Does rotation always occur about the origin (0,0,0)?
Does translation always occur relative to previous translation?
Does scaling increase the coordinates axes size?
I suggest that a good way for a beginner to start is by thinking about points rather than 3D objects. All the transformations can then be thought of as functions that move a point from one position to a new position.
First imagine an XYZ Cartesian coordinate space, then imagine a point (X,Y,Z) in that space, with the origin at (0, 0, 0). All OpenGL knows at this stage is the point X,Y,Z. Now you are ready to begin:
Rotation requires an angle and a center of rotation. glRotate only allows you to specify the angle and the axis. By virtue of mathematics, conceptually, the center of rotation is at the location (X-X,Y-Y,Z-Z), or (0,0,0).
Translation is just an offset from the current position. Since OpenGL knows your point (X,Y,Z), it simply adds the offset to the position vector. It is therefore more correct to say it is relative to the current position rather than to the previous translation.
Scaling is a multiplication of the point vector by a factor m, giving (X*m, Y*m, Z*m), so it simply moves that point away from (or towards) the origin by a factor of m. Hence conceptually one can say it doesn't change the coordinate axes' size.
However, when you start to think in 3D, things get a bit tricky, because you will realise that if you are not careful, all the points in a single 3D object don't always change position relative to each other in the way you desire. You will learn, for example, that if you want to rotate about the object's center, you will have to "move it to the origin, rotate, and then move it back again". This process of moving it back and forth can be thought of as specifying the center of rotation. These are actually mathematical "tricks" that you apply.
Does rotation always occur about the origin (0,0,0)?
Indeed this is the case.
Does translation always occur relative to previous translation?
Does scaling increase the coordinates axes size?
This requires some explanation: OpenGL, like much other software that operates on geometry data, doesn't build a list of chained transformations. What it maintains is one single homogeneous transformation matrix.
"Appending" a transformation is done by multiplying the current transformation matrix with the transformation matrix describing the "next" transformation, replacing the old transformation. This also means that a compound transformation matrix, like what you end up having in the OpenGL modelview, may be applied as a transformation as well.
To make a long story short, it all depends on the transformation applied. Old OpenGL gives you some basic matrix manipulations. In the OpenGL 3 core profile they have been removed, because OpenGL is not a math library; it draws stuff.
So how does such a transformation matrix look like? Like this:
Xx Yx Zx Tx
Xy Yy Zy Ty
Xz Yz Zz Tz
_x _y _z w
Maybe you noticed that there are 3 major columns designated by capital X, Y, Z. Those columns form vectors. In the case of 3D transformations those are the base vectors of a coordinate system, relative to the one the transformation is applied upon. However, vectors only give "directions" and a length. So what's needed as well is the relative point of origin of the new coordinate system, and that's what the T vector contains.
Most of the time _x = _y = _z = 0 and w = 1.
Transforming a point of geometry happens by multiplying the point's vector with the matrix. Let such a matrix be M and the point p, then
p' = M * p
Now assume we chain transformations:
p'' = M' * p' = M' * M * p
We can substitute M_ = M' * M, so
p'' = M_ * p
It's easy to see that we can chain this arbitrarily long.
To answer your two last questions: Transformations (not just translations) do chain. And yes, applying a scaling transform will "scale" the axes.
And to clear up a common misunderstanding: OpenGL is not a scene graph. It does not deal with "objects", just with lists of geometry. glScale, glTranslate and glRotate don't transform objects, but "chain up" transformation operations.
Someone with more experience will surely point you to a good tutorial, but your question suggests that you don't yet understand the 3D graphics pipeline, and more precisely the concept of the projection matrix (I might have the wrong name here since I studied this ages ago in French, lol).
Basically, whenever you apply a rotation/translation/scaling you are modifying the same matrix,
therefore each operation modifies the existing state.
For example, doing a rotation then a translation will give you a different result than a translation then a rotation (try modelling the solar system with sun, earth and moon; it will help you understand).
Regarding your questions:
No, the basic rotation will not always occur about 0,0,0. For example, if you first translate to 2,3,4 then the rotation will happen about 2,3,4.
The simple answer is yes: you are moving your matrix from its last position (read my comment at the end for the not-so-simple answer ^^).
Scaling will affect all the transformations done after it. For example, a scale of 1,2,2 followed by a translation of 2,3,4 could be seen as a global translation of 2,6,8.
Now for the not-so-simple part:
As explained, each change will be affected by the previous changes (see the scale example).
Also, there are a lot of ways to do the same thing or to alter the behavior. For example,
achieving absolute translation can be done like this:
-translate
-create an object
-identity (reset the matrix to the identity)
-translate2
-create object2
My advice is to read tutorials, but also general 3D programming blogs or a book (the Red Book is good when you start, lol).

Fast way to implement 2D convolution in C

I am trying to implement a vision algorithm, which includes a prefiltering stage with a 9x9 Laplacian-of-Gaussian filter. Can you point to a document which explains fast filter implementations briefly? I think I should make use of FFT for most efficient filtering.
Are you sure you want to use FFT? That will be a whole-array transform, which will be expensive. If you've already decided on a 9x9 convolution filter, you don't need any FFT.
Generally, the cheapest way to do convolution in C is to set up a loop that moves a pointer over the array, summing the convolved values at each point and writing the data to a new array. This loop can then be parallelised using your favourite method (compiler vectorisation, MPI libraries, OpenMP, etc).
Regarding the boundaries:
If you assume the values to be 0 outside the boundaries, then add a 4 element border of 0 to your 2d array of points. This will avoid the need for `if` statements to handle the boundaries, which are expensive.
If your data wraps at the boundaries (ie it is periodic), then use a modulo or add a 4 element border which copies the opposite side of the grid (abcdefg -> fgabcdefgab for 2 points). **Note: this is what you are implicitly assuming with any kind of Fourier transform, including FFT**. If that is not the case, you would need to account for it before any FFT is done.
The 4 points are because the maximum boundary overlap of a 9x9 kernel is 4 points outside the main grid. Thus, n points of border needed for a 2n+1 x 2n+1 kernel.
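As a rough sketch of the zero-border approach, assuming a row-major float image and a square kernel of side 2r+1 (r = 4 for the 9x9 case): pad_zero and convolve_padded are hypothetical helper names, and the inner loops are left unvectorised for clarity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies a w x h image into a new (w+2r) x (h+2r) buffer surrounded by
   r rows/columns of zeros. The caller frees the result. */
float *pad_zero(const float *src, int w, int h, int r) {
    assert(src && w > 0 && h > 0 && r >= 0);
    int pw = w + 2 * r, ph = h + 2 * r;
    float *dst = calloc((size_t)pw * (size_t)ph, sizeof *dst);
    if (!dst) return NULL;
    for (int y = 0; y < h; ++y)
        memcpy(dst + (size_t)(y + r) * pw + r,
               src + (size_t)y * w, (size_t)w * sizeof *dst);
    return dst;
}

/* Convolves the padded image with a (2r+1) x (2r+1) kernel and writes a
   w x h result. No boundary checks are needed inside the loops because
   the border absorbs every out-of-range access. */
void convolve_padded(const float *padded, int w, int h, int r,
                     const float *kernel, float *out) {
    int pw = w + 2 * r, k = 2 * r + 1;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int ky = 0; ky < k; ++ky) {
                const float *row = padded + (size_t)(y + ky) * pw + x;
                /* The kernel is read back to front: true convolution
                   flips it (irrelevant for a symmetric LoG kernel). */
                const float *krow = kernel + (size_t)(k - 1 - ky) * k;
                for (int kx = 0; kx < k; ++kx)
                    sum += row[kx] * krow[k - 1 - kx];
            }
            out[(size_t)y * w + x] = sum;
        }
    }
}
```

For periodic data the same convolve_padded loop would work unchanged if the border were filled with copies of the opposite edges instead of zeros.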
If you need this convolution to be really fast, and/or your grid is large, consider partitioning it into smaller pieces that can be held in the processor's cache, and thus calculated far more quickly. This also goes for any GPU-offloading you might want to do (they are ideal for this type of floating-point calculation).
Here is a theory link
http://hebb.mit.edu/courses/9.29/2002/readings/c13-1.pdf
And here is a link to fftw, which is a pretty good FFT library that I've used in the past (check licenses to make sure it is suitable) http://www.fftw.org/
All you do is FFT your image and kernel (the 9x9 matrix). Multiply together, then back transform.
However, with a 9x9 matrix you may still be better doing it in real coordinates (just with a double loop over the image pixels and the matrix). Try both ways!
Actually you don't need to use a FFT size large enough to hold the entire image. You can do a lot of smaller overlapping 2d ffts. You can search for "fast convolution" "overlap save" "overlap add".
However, for a 9x9 kernel you may not see much of a speed advantage.

Spatial Data Structures in C

I do work in theoretical chemistry on a high performance cluster, often involving molecular dynamics simulations. One of the problems my work addresses involves a static field of N-dimensional (typically N = 2-5) hyper-spheres, that a test particle may collide with. I'm looking to optimize (read: overhaul) the data structure I use for representing the field of spheres so I can do rapid collision detection. Currently I use a dead simple array of pointers to an N-membered struct (doubles for each coordinate of the center) and a nearest-neighbor list. I've heard of oct- and quad- trees but haven't found a clear explanation of how they work, how to efficiently implement one, or how to then do fast collision detection with one. Given the size of my simulations, memory is (almost) no object, but cycles are.
How best to approach this for your problem depends on several factors that you have not described:
- Will the same hypersphere arrangement be used for many particle collision calculations?
- Are the hyperspheres uniform size?
- What is the movement of the particle (e.g. straight line/curve) and is that movement affected by the spheres?
- Do you consider the particle to have zero volume?
I assume that the particle does not have simple straight line movement as that would be the relatively fast calculation of finding the closest point between a line and a point, which is likely going to be about the same speed as finding which of the boxes the line intersects with (to determine where in the n-tree to examine).
If your hypersphere positions are fixed for a lot of particle collisions then computing a voronoi decomposition/Dirichlet tessellation would give you a fast way of later finding exactly which sphere is closest to your particle for any given point in the space.
However to answer your original question about octrees/quadtrees/2^n-trees, in n dimensions you start with a (hyper)-cube that contains the area of space that you are interested in. This will be subdivided into 2^n hypercubes if you deem the contents to be too complicated. This continues recursively until you have only simple elements (e.g. one hypersphere centroid) in the leaf nodes.
Now that the n-tree is built you use it for collision detection by taking the path of your particle and intersecting it with the outer hypercube. The intersection position will tell you which hypercube in the next level down of the tree to visit next, and you determine the position of intersection with all 2^n hypercubes at that level, following downwards until you reach a leaf node. Once you reach the leaf you can examine interactions between your particle path and the hypersphere stored at that leaf. If you have collision you have finished, otherwise you have to find the exit point of the particle path from the current hypercube leaf and determine which hypercube it moves to next. Continue until you find a collision or entirely leave the overall bounding hypercube.
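For a straight-line path, the entry and exit points of a hypercube can be found with the slab method. This is a standard technique that the answer does not name, so take it as one possible sketch rather than the intended implementation. The function ray_box and its parameters are hypothetical; the box is axis-aligned with corners lo and hi, and the path is o + t*d for t >= 0.

```c
#include <assert.h>
#include <math.h>

/* Slab test in n dimensions. Returns 1 if the ray o + t*d (t >= 0)
   crosses the box [lo, hi], storing the entry and exit parameters in
   *t_in and *t_out. Returns 0 if the ray misses the box. */
int ray_box(int n, const double *o, const double *d,
            const double *lo, const double *hi,
            double *t_in, double *t_out) {
    assert(n > 0 && o && d && lo && hi && t_in && t_out);
    double t0 = 0.0, t1 = INFINITY;
    for (int i = 0; i < n; ++i) {
        if (d[i] == 0.0) {
            /* Parallel to this pair of faces: must already lie between them. */
            if (o[i] < lo[i] || o[i] > hi[i]) return 0;
        } else {
            double a = (lo[i] - o[i]) / d[i];
            double b = (hi[i] - o[i]) / d[i];
            if (a > b) { double tmp = a; a = b; b = tmp; }
            if (a > t0) t0 = a;   /* latest entry over all slabs */
            if (b < t1) t1 = b;   /* earliest exit over all slabs */
            if (t0 > t1) return 0;
        }
    }
    *t_in = t0;
    *t_out = t1;
    return 1;
}
```

The exit parameter t_out gives the point where the path leaves the current hypercube, which is where the search for the neighbouring hypercube would start.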
Efficiently finding the neighbouring hypercube when exiting a hypercube is one of the most challenging parts of this approach. For 2^n trees Samet's approaches {1, 2} can be adapted. For kd-trees (binary trees) an approach is suggested in {3} section 4.3.3.
Efficient implementation can be as simple as storing a list of 2^n pointers (8 for an octree) from each hypercube to its children hypercubes, and marking the hypercube in a special way if it is a leaf (e.g. make all pointers NULL).
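A minimal sketch of such a node, assuming a compile-time dimension NDIM and one sphere index per leaf. The struct layout, the field names and the ntree_* functions are illustrative choices, not something prescribed by the references.

```c
#include <assert.h>
#include <stddef.h>

#define NDIM 3                  /* hypothetical: number of dimensions */
#define NCHILD (1 << NDIM)      /* 2^n children per node */

typedef struct NTreeNode {
    double centre[NDIM];              /* centre of this hypercube */
    double half;                      /* half of the side length */
    struct NTreeNode *child[NCHILD];  /* all NULL marks a leaf */
    int sphere;                       /* sphere index at a leaf, -1 if none */
} NTreeNode;

int ntree_is_leaf(const NTreeNode *n) {
    for (int i = 0; i < NCHILD; ++i)
        if (n->child[i]) return 0;
    return 1;
}

/* Child index of the sub-cube containing p: bit d is set when p lies on
   the upper side of the centre along axis d. */
int ntree_child_index(const NTreeNode *n, const double p[NDIM]) {
    int idx = 0;
    for (int d = 0; d < NDIM; ++d)
        if (p[d] >= n->centre[d]) idx |= 1 << d;
    return idx;
}

/* Walks down from the root to the leaf containing p. Returns NULL if p
   falls in a sub-cube that was never allocated (empty space). */
const NTreeNode *ntree_find_leaf(const NTreeNode *root, const double p[NDIM]) {
    assert(root && p);
    const NTreeNode *n = root;
    while (n && !ntree_is_leaf(n))
        n = n->child[ntree_child_index(n, p)];
    return n;
}
```

Encoding the child index as one bit per axis keeps the descent branch-light and works unchanged for any NDIM, which is the main reason for choosing it here.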
A description of dividing space to create a quadtree (which you can generalise to n-tree) can be found in Klinger & Dyer {4}
As others have mentioned kd-trees may be more suited than 2^n-trees as extension to an arbitrary number of dimensions is more straightforward, however they will result in a deeper tree. It is also easier to adapt the split positions to match the geometry of your hyperspheres with a kd-tree. The description above of collision detection in a 2^n tree is equally applicable to a kd-tree.
{1} Connected Component Labeling Using Quadtrees, Hanan Samet, Journal of the ACM, Volume 28, Issue 3 (July 1981)
{2} Neighbor finding in images represented by octrees, Hanan Samet, Computer Vision, Graphics, and Image Processing, Volume 46, Issue 3 (June 1989)
{3} Convex hull generation, connected component labelling, and minimum distance calculation for set-theoretically defined models, Dan Pidcock, 2000
{4} Experiments in picture representation using regular decomposition, Klinger, A., and Dyer, C.R., Comptr. Graphics and Image Processing 5 (1976), 68-105.
It sounds like you'd want to implement a kd-tree, which would allow you to more quickly search the N-dimensional space. There's some more information and links to implementations at the Stony Brook Algorithm Repository.
Since your field is static (by which I'm assuming you mean that the hyper spheres don't move), then the fastest solution I know of is a Kdtree.
You can either make your own, or use someone else's, like this one:
http://libkdtree.alioth.debian.org/
A Quad tree is a 2 dimensional tree, in which at each level a node has 4 children, each of which covers 1/4 of the area of the parent node.
An Oct tree is a 3 dimensional tree, in which at each level a node has 8 children, each of which contains 1/8th of the volume of the parent node. Here is a picture to help you visualize it: http://en.wikipedia.org/wiki/Octree
If you're doing N dimensional intersection tests, you could generalize this to an N tree.
Intersection algorithms work by starting at the top of the tree and recursively traversing into any child nodes that intersect the object being tested. Eventually you reach the leaf nodes, which contain the actual objects.
An octree will work as long as you can specify the spheres by their centres - it hierarchically bins points into cubic regions with eight children. Working out neighbours in an octree data structure will require you to do sphere-intersecting-cube calculations (to some extent easier than they look) to work out which cubic regions in an octree are within the sphere.
Finding the nearest neighbours means walking back up the tree until you get a node with more than one populated child and all surrounding nodes included (this ensures the query gets all sides).
From memory, this is the (somewhat naive) basic algorithm for sphere-cube intersection:
i. Is the centre within the cube (this gets the eponymous situation)
ii. Are any of the corners of the cube within radius r of the centre (corners within the sphere)
iii. For each surface of the cube (you can eliminate some of the surfaces by working out which side of the surface the centre lies on) work out (this is all first-year vector arithmetic):
a. A normal of the surface that goes to the centre of the sphere
b. The distance from the centre of the sphere to the intersection of the normal with the plane of the surface (the chord intersects the plane of the cube's surface)
c. Intersection of the plane lies within the side of the cube (one condition of chord intersection to the cube)
iv. Calculate the size of the chord (Sin of Cos^-1 of ratio of normal length to radius of sphere)
v. If the nearest point on the line is less than the distance of the chord and the point lies between the ends of the line the chord intersects one of the edges of the cube (chord intersects cube surface somewhere along one of the edges).
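A shorter alternative to the steps above is the clamped-distance test (often attributed to Arvo): find the point of the box closest to the sphere's centre and compare its distance with the radius. This is a different technique from the one recalled in the list, shown here only as a sketch. The function name and parameters are hypothetical, and the box is assumed to be axis-aligned with corners lo and hi.

```c
#include <assert.h>

/* Returns 1 if a sphere (centre c, radius r) overlaps the axis-aligned
   box [lo, hi] in n dimensions, 0 otherwise. Touching counts as overlap. */
int sphere_intersects_box(int n, const double *c, double r,
                          const double *lo, const double *hi) {
    assert(n > 0 && r >= 0.0 && c && lo && hi);
    double d2 = 0.0;   /* squared distance from c to the nearest box point */
    for (int i = 0; i < n; ++i) {
        double e = 0.0;
        if (c[i] < lo[i])      e = lo[i] - c[i];
        else if (c[i] > hi[i]) e = c[i] - hi[i];
        d2 += e * e;   /* axes where c is inside the slab contribute 0 */
    }
    return d2 <= r * r;
}
```

Because it loops over the axes, the same function covers squares, cubes and higher-dimensional hypercubes, and it handles the centre-inside, face, edge and corner cases without treating them separately.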
Slightly dimly remembered but this is something I did for a situation involving spherical regions using an octree data structure (many years ago). You may also wish to check out KD-trees as some of the other posters suggest but your initial question sounds very similar to what I did.

Resources