Artificial neural network configuration

Artificial neural network configuration - artificial-intelligence

I believe I know the answer to this question already, but I wanted the input from the community. Below is my question.
The following diagram represents an artificial neural network for an associative memory as discussed in this chapter.
What stable configuration would it associate with any initial pattern in which only two neurons that are opposite each
other on the perimeter were excited?
Answer 1 : If the network is given an initial configuration in which only two opposing units are excited, it will blink – alternating between vertical and horizontal “lines.”
Answer 2: If the network is given an initial configuration in which only two opposing units are excited, the center until will be excited and all the perimeter units will remain inhibited.
Answer 3: Any pattern in which at least two perimeter units are excited and the center unit is inhibited is associated with the pattern in which the perimeter units, as well as the center unit, are excited.
Answer4: Any pattern in which at least two perimeter units are excited and the center unit is inhibited is associated with the pattern in which the perimeter units are excited and the center unit is inhibited.
The answer I choose was answer 4. The reason why I choose answer 4 is because if you look at the diagram the weight of the outer diagram are positive. This may sound trivial, but because the sum of the neurons of the perimeter is positive output( i.e 1) this would cause opposite perimeter to be excited.

Related

How to determine optimum hidden layers and neurons based on inputs and outputs in a NN?

I'm refering mostly to this paper here: http://clgiles.ist.psu.edu/papers/UMD-CS-TR-3617.what.size.neural.net.to.use.pdf
Current Setup:
I'm currently trying to port the neural-genetic AI solution that I have laying around to get into a multi-purpose multi-agent tool. So, for example, it should work as an AI in a game engine for moving around entities and let 'em shoot and destroy the enemy (so e.g. 4 inputs like distance x,y and angle x,y and 2 outputs like accelerate left,right).
The state so far is that I'm using the same amount of genomes as there are agents to determine the fittest agents. 20% of the fittest agents are combined with each other (zz, zw genomes selected) and create 2 babies for the new population each. The rest of the new population per-new-generation is selected randomly across the old population, including the fittest with-an-unfit-genome.
That works pretty well to prime the AI, after generation 50-100 it is pretty much human-unbeatable in a Breakout clone and a little Tank game where you can shoot and move around.
As I had the idea to use on evolution population for each "type of Agent" the question is now if it is possible to determine the amount of hidden layers and the amount of neurons in the hidden layers generically.
My setup for the tank game is 4 inputs, 3 outputs and 1 hidden layer with 12 neurons that worked the best (around 50 generations to be really strong).
My setup for a breakout game is 6 inputs, 2 outputs and 2 hidden layers with 12 neurons that seems to work best.
Done Research:
So, back to the paper: On page 32 you can see that it seems that more neurons per hidden layer need of course more time for priming, but the more neurons are in between, the more are the chances to get into the function without noise.
I currently prime my AI only using the fitness increase on successfully being better than the last try.
So in a tank game it means he successfully shot the other tank (wounded him 4 times is better, then enemy is dead) and won the round.
In the breakout game it's similar as I have a paddle that the AI can move around and it can collect points. "Getting shot" or negative treatment here is that it forgot to catch the ball. So potential noise input would be 2 output values (move-left, move-right) that depend on 4 input values (ball x, y, degx, degy).
Questions:
So, what kind of calculation for the amount of hidden layers and amount of neurons do you think can be a good tradeoff to have no noise that kills the genome evolution?
What is the minimum amount of agents until you can say that "it evolves further"? My current training setup is always around having 50 agents that learn in parallel (so they basically simulate 50 games in parallel "behind the scenes").

In sum, for most problems, one could probably get decent performance (even without a second optimization step) by setting the hidden layer configuration using just two rules: (i) number of hidden layers equals one; and (ii) the number of neurons in that layer is the mean of the neurons in the input and output layers.
-doug
In short. It's an ongoing area of research. Most (All that I know of) ANN using numerous neurons and H-Layers don't set a static number of either, instead they use algorithms to continuously modify these values. Usually constructing and destroying when outputs converge/diverge.
Since it sounds like you're already using some evolutionary computing, consider looking into Andrew Turner's work on CGPANN, I remember it getting pretty decent improvements on benchmarks similar to your work.

How to understand Locality Sensitive Hashing? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
Improve this question
I noticed that LSH seems a good way to find similar items with high-dimension properties.
After reading the paper http://www.slaney.org/malcolm/yahoo/Slaney2008-LSHTutorial.pdf, I'm still confused with those formulas.
Does anyone know a blog or article that explains that the easy way?

The best tutorial I have seen for LSH is in the book: Mining of Massive Datasets.
Check Chapter 3 - Finding Similar Items
http://infolab.stanford.edu/~ullman/mmds/ch3a.pdf
Also I recommend the below slide:
http://www.cs.jhu.edu/%7Evandurme/papers/VanDurmeLallACL10-slides.pdf .
The example in the slide helps me a lot in understanding the hashing for cosine similarity.
I borrow two slides from Benjamin Van Durme & Ashwin Lall, ACL2010 and try to explain the intuitions of LSH Families for Cosine Distance a bit.
In the figure, there are two circles w/ red and yellow colored, representing two two-dimensional data points. We are trying to find their cosine similarity using LSH.
The gray lines are some uniformly randomly picked planes.
Depending on whether the data point locates above or below a gray line, we mark this relation as 0/1.
On the upper-left corner, there are two rows of white/black squares, representing the signature of the two data points respectively. Each square is corresponding to a bit 0(white) or 1(black).
So once you have a pool of planes, you can encode the data points with their location respective to the planes. Imagine that when we have more planes in the pool, the angular difference encoded in the signature is closer to the actual difference. Because only planes that resides between the two points will give the two data different bit value.
Now we look at the signature of the two data points. As in the example, we use only 6 bits(squares) to represent each data. This is the LSH hash for the original data we have.
The hamming distance between the two hashed value is 1, because their signatures only differ by 1 bit.
Considering the length of the signature, we can calculate their angular similarity as shown in the graph.
I have some sample code (just 50 lines) in python here which is using cosine similarity.
https://gist.github.com/94a3d425009be0f94751

Tweets in vector space can be a great example of high dimensional data.
Check out my blog post on applying Locality Sensitive Hashing to tweets to find similar ones.
http://micvog.com/2013/09/08/storm-first-story-detection/
And because one picture is a thousand words check the picture below:
http://micvog.files.wordpress.com/2013/08/lsh1.png
Hope it helps.
#mvogiatzis

Here's a presentation from Stanford that explains it. It made a big difference for me. Part two is more about LSH, but part one covers it as well.
A picture of the overview (There are much more in the slides):
Near Neighbor Search in High Dimensional Data - Part1:
http://www.stanford.edu/class/cs345a/slides/04-highdim.pdf
Near Neighbor Search in High Dimensional Data - Part2:
http://www.stanford.edu/class/cs345a/slides/05-LSH.pdf

LSH is a procedure that takes as input a set of documents/images/objects and outputs a kind of Hash Table.
The indexes of this table contain the documents such that documents that are on the same index are considered similar and those on different indexes are "dissimilar".
Where similar depends on the metric system and also on a threshold similarity s which acts like a global parameter of LSH.
It is up to you to define what the adequate threshold s is for your problem.
It is important to underline that different similarity measures have different implementations of LSH.
In my blog, I tried to thoroughly explain LSH for the cases of minHashing( jaccard similarity measure) and simHashing (cosine distance measure). I hope you find it useful:
https://aerodatablog.wordpress.com/2017/11/29/locality-sensitive-hashing-lsh/

I am a visual person. Here is what works for me as an intuition.
Say each of the things you want to search for approximately are physical objects such as an apple, a cube, a chair.
My intuition for an LSH is that it is similar to take the shadows of these objects. Like if you take the shadow of a 3D cube you get a 2D square-like on a piece of paper, or a 3D sphere will get you a circle-like shadow on a piece of paper.
Eventually, there are many more than three dimensions in a search problem (where each word in a text could be one dimension) but the shadow analogy is still very useful to me.
Now we can efficiently compare strings of bits in software. A fixed length bit string is kinda, more or less, like a line in a single dimension.
So with an LSH, I project the shadows of objects eventually as points (0 or 1) on a single fixed length line/bit string.
The whole trick is to take the shadows such that they still make sense in the lower dimension e.g. they resemble the original object in a good enough way that can be recognized.
A 2D drawing of a cube in perspective tells me this is a cube. But I cannot distinguish easily a 2D square from a 3D cube shadow without perspective: they both looks like a square to me.
How I present my object to the light will determine if I get a good recognizable shadow or not. So I think of a "good" LSH as the one that will turn my objects in front of a light such that their shadow is best recognizable as representing my object.
So to recap: I think of things to index with an LSH as physical objects like a cube, a table, or chair, and I project their shadows in 2D and eventually along a line (a bit string). And a "good" LSH "function" is how I present my objects in front of a light to get an approximately distinguishable shape in the 2D flatland and later my bit string.
Finally when I want to search if an object I have is similar to some objects that I indexed, I take the shadows of this "query" object using the same way to present my object in front of the light (eventually ending up with a bit string too). And now I can compare how similar is that bit string with all my other indexed bit strings which is a proxy for searching for my whole objects if I found a good and recognizable way to present my objects to my light.

As a very short, tldr answer:
An example of locality sensitive hashing could be to first set planes randomly (with a rotation and offset) in your space of inputs to hash, and then to drop your points to hash in the space, and for each plane you measure if the point is above or below it (e.g.: 0 or 1), and the answer is the hash. So points similar in space will have a similar hash if measured with the cosine distance before or after.
You could read this example using scikit-learn: https://github.com/guillaume-chevalier/SGNN-Self-Governing-Neural-Networks-Projection-Layer

Find optimal/good-enough strategy and AI for the game 'Proximity'?

'Proximity' is a strategy game of territorial domination similar to Othello, Go and Risk.
Two players, uses a 10x12 hex grid. Game invented by Brian Cable in 2007.
Seems to be a worthy game for discussing a) optimal algorithm then b) how to build an AI.
Strategies are going to be probabilistic or heuristic-based, due to the randomness factor, and the insane branching factor (20^120).
So it will be kind of hard to compare objectively.
A compute time limit of 5 seconds max per turn seems reasonable => this rules out all brute-force attempts. (Play the game's AI on Expert level to get a feel - it does a very good job based on some simple heuristic)
Game: Flash version here, iPhone version iProximity here and many copies elsewhere on the web
Rules: here
Object: to have control of the most armies after all tiles have been placed. You start with an empty hexboard. Each turn you receive a randomly numbered tile (value between 1 and 20 armies) to place on any vacant board space. If this tile is adjacent to any ALLY tiles, it will strengthen each of those tile's defenses +1 (up to a max value of 20). If it is adjacent to any ENEMY tiles, it will take control over them IF its number is higher than the number on the enemy tile.
Thoughts on strategy: Here are some initial thoughts; setting the computer AI to Expert will probably teach a lot:
minimizing your perimeter seems to be a good strategy, to prevent flips and minimize worst-case damage
like in Go, leaving holes inside your formation is lethal, only more so with the hex grid because you can lose armies on up to 6 squares in one move
low-numbered tiles are a liability, so place them away from your main territory, near the board edges and scattered. You can also use low-numbered tiles to plug holes in your formation, or make small gains along the perimeter which the opponent will not tend to bother attacking.
a triangle formation of three pieces is strong since they mutually reinforce, and also reduce the perimeter
Each tile can be flipped at most 6 times, i.e. when its neighbor tiles are occupied. Control of a formation can flow back and forth. Sometimes you lose part of a formation and plug any holes to render that part of the board 'dead' and lock in your territory/ prevent further losses.
Low-numbered tiles are obvious-but-low-valued liabilities, but high-numbered tiles can be bigger liabilities if they get flipped (which is harder). One lucky play with a 20-army tile can cause a swing of 200 (from +100 to -100 armies). So tile placement will have both offensive and defensive considerations.
Comment 1,2,4 seem to resemble a minimax strategy where we minimize the maximum expected possible loss (modified by some probabilistic consideration of the value ß the opponent can get from 1..20 i.e. a structure which can only be flipped by a ß=20 tile is 'nearly impregnable'.)
I'm not clear what the implications of comments 3,5,6 are for optimal strategy.
Interested in comments from Go, Chess or Othello players.
(The sequel ProximityHD for XBox Live, allows 4-player -cooperative or -competitive local multiplayer increases the branching factor since you now have 5 tiles in your hand at any given time, of which you can only play one. Reinforcement of ally tiles is increased to +2 per ally.)

A former member of the U of A GAMES group here.
That branching factor is insane. Far worse than Go.
Basically, you're hooped.
The problem with this game is that it is not deterministic due to the selection of a random tile. This actually adds another layer of nodes between each existing layer of nodes in the tree. You'll be interested in my publications on *-Minimax to learn about techniques for searching in stochastic domains.
In order to complete one-ply searches before the end of this century, you're going to need some very aggressive forward pruning techniques. Throw provably best move out the window early and concentrate on building good move ordering.

For general algorithms, I would suggest you to check the research done by the Alberta University AI Games group: http://games.cs.ualberta.ca Many of the algorithms there guarantee to find optimal policies. However, I doubt you're really interested in finding the optimal, aim for the "good enough" unless you want to sell that game in Korea :D
From your description, I have understood the game to be a two-player with full-observability i.e. no hidden units and such and fully deterministic i.e. player's actions outcomes do not require rolling, then you should take a look at the real-time bounded-search minimax derivatives proposed by the U Alberta guys. However, being able to do bound as well the depth of the backups of the value function would perhaps be a nice way to add a "difficulty level" to your game. They have been doing some work - a bit fishy imo - on sampling the search space for improving value function estimates.
About the "strategy" section you describe: in the framework I am mentioning, you will have to encode that knowledge as an evaluation function. Look at the work of Michael Büro and others - also in the U Alberta group - for examples of such knowledge engineering.
Another possibility would be to pose the problem as a Reinforcement Learning problem, where adversary moves are compiled as "afterstates". Look that up on the Barto & Sutton book: http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html However the value function for a RL problem resulting from such a compilation might prove a bit difficult to solve optimally - the number of states will blow up like an H-Bomb. However, if you see how to use a factored representation, things can be much easier. And your "strategy" could perhaps be encoded as some shaping function, which would be speeding up the learning process considerably.
EDIT: Damn English prepositions

How to program a neural network for chess?

I want to program a chess engine which learns to make good moves and win against other players. I've already coded a representation of the chess board and a function which outputs all possible moves. So I only need an evaluation function which says how good a given situation of the board is. Therefore, I would like to use an artificial neural network which should then evaluate a given position. The output should be a numerical value. The higher the value is, the better is the position for the white player.
My approach is to build a network of 385 neurons: There are six unique chess pieces and 64 fields on the board. So for every field we take 6 neurons (1 for every piece). If there is a white piece, the input value is 1. If there is a black piece, the value is -1. And if there is no piece of that sort on that field, the value is 0. In addition to that there should be 1 neuron for the player to move. If it is White's turn, the input value is 1 and if it's Black's turn, the value is -1.
I think that configuration of the neural network is quite good. But the main part is missing: How can I implement this neural network into a coding language (e.g. Delphi)? I think the weights for each neuron should be the same in the beginning. Depending on the result of a match, the weights should then be adjusted. But how? I think I should let 2 computer players (both using my engine) play against each other. If White wins, Black gets the feedback that its weights aren't good.
So it would be great if you could help me implementing the neural network into a coding language (best would be Delphi, otherwise pseudo-code). Thanks in advance!

In case somebody randomly finds this page. Given what we know now, what the OP proposes is almost certainly possible. In fact we managed to do it for a game with much larger state space - Go ( https://deepmind.com/research/case-studies/alphago-the-story-so-far ).

I don't see why you can't have a neural net for a static evaluator if you also do some classic mini-max lookahead with alpha-beta pruning. Lots of Chess engines use minimax with a braindead static evaluator that just adds up the pieces or something; it doesn't matter so much if you have enough levels of minimax. I don't know how much of an improvement the net would make but there's little to lose. Training it would be tricky though. I'd suggest using an engine that looks ahead many moves (and takes loads of CPU etc) to train the evaluator for an engine that looks ahead fewer moves. That way you end up with an engine that doesn't take as much CPU (hopefully).
Edit: I wrote the above in 2010, and now in 2020 Stockfish NNUE has done it. "The network is optimized and trained on the [classical Stockfish] evaluations of millions of positions at moderate search depth" and then used as a static evaluator, and in their initial tests they got an 80-elo improvement when using this static evaluator instead of their previous one (or, equivalently, the same elo with a little less CPU time). So yes it does work, and you don't even have to train the network at high search depth as I originally suggested: moderate search depth is enough, but the key is to use many millions of positions.

Been there, done that. Since there is no continuity in your problem (the value of a position is not closely related to an other position with only 1 change in the value of one input), there is very little chance a NN would work. And it never did in my experiments.
I would rather see a simulated annealing system with an ad-hoc heuristic (of which there are plenty out there) to evaluate the value of the position...
However, if you are set on using a NN, is is relatively easy to represent. A general NN is simply a graph, with each node being a neuron. Each neuron has a current activation value, and a transition formula to compute the next activation value, based on input values, i.e. activation values of all the nodes that have a link to it.
A more classical NN, that is with an input layer, an output layer, identical neurons for each layer, and no time-dependency, can thus be represented by an array of input nodes, an array of output nodes, and a linked graph of nodes connecting those. Each node possesses a current activation value, and a list of nodes it forwards to. Computing the output value is simply setting the activations of the input neurons to the input values, and iterating through each subsequent layer in turn, computing the activation values from the previous layer using the transition formula. When you have reached the last (output) layer, you have your result.

It is possible, but not trivial by any means.
https://erikbern.com/2014/11/29/deep-learning-for-chess/
To train his evaluation function, he utilized a lot of computing power to do so.
To summarize generally, you could go about it as follows. Your evaluation function is a feedforward NN. Let the matrix computations lead to a scalar output valuing how good the move is. The input vector for the network is the board state represented by all the pieces on the board so say white pawn is 1, white knight is 2... and empty space is 0. An example board state input vector is simply a sequence of 0-12's. This evaluation can be trained using grandmaster games (available at a fics database for example) for many games, minimizing loss between what the current parameters say is the highest valuation and what move the grandmasters made (which should have the highest valuation). This of course assumes that the grandmaster moves are correct and optimal.

What you need to train a ANN is either something like backpropagation learning or some form of a genetic algorithm. But chess is such an complex game that it is unlikly that a simple ANN will learn to play it - even more if the learning process is unsupervised.
Further, your question does not say anything about the number of layers. You want to use 385 input neurons to encode the current situation. But how do you want to decide what to do? On neuron per field? Highest excitation wins? But there is often more than one possible move.
Further you will need several hidden layers - the functions that can be represented with an input and an output layer without hidden layer are really limited.
So I do not want to prevent you from trying it, but chances for a successful implemenation and training within say one year or so a practically zero.
I tried to build and train an ANN to play Tic-tac-toe when I was 16 years or so ... and I failed. I would suggest to try such an simple game first.

The main problem I see here is one of training. You say you want your ANN to take the current board position and evaluate how good it is for a player. (I assume you will take every possible move for a player, apply it to the current board state, evaluate via the ANN and then take the one with the highest output - ie: hill climbing)
Your options as I see them are:
Develop some heuristic function to evaluate the board state and train the network off that. But that begs the question of why use an ANN at all, when you could just use your heuristic.
Use some statistical measure such as "How many games were won by white or black from this board configuration?", which would give you a fitness value between white or black. The difficulty with that is the amount of training data required for the size of your problem space.
With the second option you could always feed it board sequences from grandmaster games and hope there is enough coverage for the ANN to develop a solution.
Due to the complexity of the problem I'd want to throw the largest network (ie: lots of internal nodes) at it as I could without slowing down the training too much.

Your input algorithm is sound - all positions, all pieces, and both players are accounted for. You may need an input layer for every past state of the gameboard, so that past events are used as input again.
The output layer should (in some form) give the piece to move, and the location to move to.
Write a genetic algorithm using a connectome which contains all neuron weights and synapse strengths, and begin multiple separated gene pools with a large number of connectomes in each.
Make them play one another, keep the best handful, crossover and mutate the best connectomes to repopulate the pool.

Read blondie24 : http://www.amazon.co.uk/Blondie24-Playing-Kaufmann-Artificial-Intelligence/dp/1558607838.
It deals with checkers instead of chess but the principles are the same.

Came here to say what Silas said. Using a minimax algorithm, you can expect to be able to look ahead N moves. Using Alpha-beta pruning, you can expand that to theoretically 2*N moves, but more realistically 3*N/4 moves. Neural networks are really appropriate here.
Perhaps though a genetic algorithm could be used.

Spatial Data Structures in C

I do work in theoretical chemistry on a high performance cluster, often involving molecular dynamics simulations. One of the problems my work addresses involves a static field of N-dimensional (typically N = 2-5) hyper-spheres, that a test particle may collide with. I'm looking to optimize (read: overhaul) the the data structure I use for representing the field of spheres so I can do rapid collision detection. Currently I use a dead simple array of pointers to an N-membered struct (doubles for each coordinate of the center) and a nearest-neighbor list. I've heard of oct- and quad- trees but haven't found a clear explanation of how they work, how to efficiently implement one, or how to then do fast collision detection with one. Given the size of my simulations, memory is (almost) no object, but cycles are.

How best to approach this for your problem depends on several factors that you have not described:
- Will the same hypersphere arrangement be used for many particle collision calculations?
- Are the hyperspheres uniform size?
- What is the movement of the particle (e.g. straight line/curve) and is that movement affected by the spheres?
- Do you consider the particle to have zero volume?
I assume that the particle does not have simple straight line movement as that would be the relatively fast calculation of finding the closest point between a line and a point, which is likely going to be about the same speed as finding which of the boxes the line intersects with (to determine where in the n-tree to examine).
If your hypersphere positions are fixed for a lot of particle collisions then computing a voronoi decomposition/Dirichlet tessellation would give you a fast way of later finding exactly which sphere is closest to your particle for any given point in the space.
However to answer your original question about octrees/quadtrees/2^n-trees, in n dimensions you start with a (hyper)-cube that contains the area of space that you are interested in. This will be subdivided into 2^n hypercubes if you deem the contents to be too complicated. This continues recursively until you have only simple elements (e.g. one hypersphere centroid) in the leaf nodes.
Now that the n-tree is built you use it for collision detection by taking the path of your particle and intersecting it with the outer hypercube. The intersection position will tell you which hypercube in the next level down of the tree to visit next, and you determine the position of intersection with all 2^n hypercubes at that level, following downwards until you reach a leaf node. Once you reach the leaf you can examine interactions between your particle path and the hypersphere stored at that leaf. If you have collision you have finished, otherwise you have to find the exit point of the particle path from the current hypercube leaf and determine which hypercube it moves to next. Continue until you find a collision or entirely leave the overall bounding hypercube.
Efficiently finding the neighbouring hypercube when exiting a hypercube is one of the most challenging parts of this approach. For 2^n trees Samet's approaches {1, 2} can be adapted. For kd-trees (binary trees) an approach is suggested in {3} section 4.3.3.
Efficient implementation can be as simple as storing a list of 8 pointers from each hypercube to its children hypercubes, and marking the hypercube in a special way if it is a leaf (e.g. make all pointers NULL).
A description of dividing space to create a quadtree (which you can generalise to n-tree) can be found in Klinger & Dyer {4}
As others have mentioned kd-trees may be more suited than 2^n-trees as extension to an arbitrary number of dimensions is more straightforward, however they will result in a deeper tree. It is also easier to adapt the split positions to match the geometry of your
hyperspheres with a kd-tree. The description above of collision detection in a 2^n tree is equally applicable to a kd-tree.
{1} Connected Component Labeling, Hanan Samet, Using Quadtrees Journal of the ACM Volume 28 , Issue 3 (July 1981)
{2} Neighbor finding in images represented by octrees, Hanan Samet, Computer Vision, Graphics, and Image Processing Volume 46 , Issue 3 (June 1989)
{3} Convex hull generation, connected component labelling, and minimum distance
calculation for set-theoretically defined models, Dan Pidcock, 2000
{4} Experiments in picture representation using regular decomposition, Klinger, A., and Dyer, C.R. E, Comptr. Graphics and Image Processing 5 (1976), 68-105.

It sounds like you'd want to implement a kd-tree, which would allow you to more quickly search the N-dimensional space. There's some more information and links to implementations at the Stony Brook Algorithm Repository.

Since your field is static (by which I'm assuming you mean that the hyper spheres don't move), then the fastest solution I know of is a Kdtree.
You can either make your own, or use someone else's, like this one:
http://libkdtree.alioth.debian.org/

A Quad tree is a 2 dimensional tree, in which at each level a node has 4 children, each of which covers 1/4 of the area of the parent node.
An Oct tree is a 3 dimensional tree, in which at each level a node has 8 children, each of which contains 1/8th of the volume of the parent node. Here is picture to help you visualize it: http://en.wikipedia.org/wiki/Octree
If you're doing N dimensional intersection tests, you could generalize this to an N tree.
Intersection algorithms work by starting at the top of the tree and recursively traversing into any child nodes that intersect the object being tested, at some point you get to leaf nodes, which contain the actual objects.

An octree will work as long as you can specify the spheres by their centres - it hierarchically bins points into cubic regions with eight children. Working out neighbours in an octree data structure will require you to do sphere-intersecting-cube calculations (to some extent easier than they look) to work out which cubic regions in an octree are within the sphere.
Finding the nearest neighbours means walking back up the tree until you get a node with more than one populated child and all surrounding nodes included (this ensures the query gets all sides).
From memory, this is the (somewhat naive) basic algorithm for sphere-cube intersection:
i. Is the centre within the cube (this gets the eponymous situation)
ii. Are any of the corners of the cube within radius r of the centre (corners within the sphere)
iii. For each surface of the cube (you can eliminate some of the surfaces by working out which side of the surface the centre lies on) work out (this is all first-year vector arithmetic):
a. A normal of the surface that goes to the centre of the sphere
b. The distance from the centre of the sphere to the intersection of the normal with the plane of the surface (chord intersets plane the surface of the cube)
c. Intersection of the plane lies within the side of the cube (one condition of chord intersection to the cube)
iv. Calculate the size of the chord (Sin of Cos^-1 of ratio of normal length to radius of sphere)
v. If the nearest point on the line is less than the distance of the chord and the point lies between the ends of the line the chord intersects one of the edges of the cube (chord intersects cube surface somewhere along one of the edges).
Slightly dimly remembered but this is something I did for a situation involving spherical regions using an octee data structure (many years ago). You may also wish to check out KD-trees as some of the other posters suggest but your initial question sounds very similar to what I did.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight