Subtle Lack of Randomness in Maze Generation Algorithm in C

This concerns what I can only surmise is a flaw in somebody's code used to generate random mazes. The code is somewhat long, but most of it is commented-out options and/or not concerned specifically with the randomization.
I got the 2001x2001 maze from the link dllu put up and saved it as a png here. From that, I have created this. To get the blue pattern, I started filling in dead ends starting in the bottom left-hand corner of the maze. According to the backtracker algorithm he used, that's the point at which the maze begins to be generated: so if you follow the trail of dead ends created from that point, you can systematically fill in all the dead ends on that side of the maze. In other words, the central blue mass represents the total accessible area starting from the bottom left, up to the sole frontier pixel at 2678 x 1086.
But there's something immediately anomalous, in that the blue "fractal" seems to repeat itself. Indeed, by overlaying one part of the fractal, rotated and mirrored, you can see there is an exact correspondence of shape. Another anomaly from this overlay maps part of one of the continents onto another, but strangely only a sliver of the landmass this time. Evidently these aren't the only auto-correspondences.
But besides the shape of the dead-end components, there is repetition of the actual pattern of the walls when you zoom in. Strangest of all, the repetition isn't exact, but only maybe 50-60% of the walls correspond. Here is a zoomed and contrasted sample of a region:
Long question short, what in the code is causing this obscure lack of randomness?

The standard library function rand is often horribly implemented, and your code is doing rand()%4, which exacerbates the problem because poor implementations tend to have even less randomness in the lower bits. Try replacing rand with a different random number generator.
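For example, a tiny xorshift generator (a sketch; seeding is up to you, and it is not suitable for cryptography) already has far better-behaved low bits than many rand() implementations:
#include <stdint.h>
static uint32_t rng_state = 2463534242u;   /* any nonzero seed */
static uint32_t xorshift32(void)
{
    uint32_t x = rng_state;
    x ^= x << 13;                          /* Marsaglia's xorshift32 */
    x ^= x >> 17;
    x ^= x << 5;
    rng_state = x;
    return x;
}
/* then:  direction = xorshift32() % 4;  instead of  rand() % 4; */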

Anonymous has correctly pointed out that rand is often poorly implemented and that it has poor entropy in its lower bits.
To alleviate this, you could use a stronger pseudo-random number generator (such as one suitable for cryptographic use); however, the higher quality of randomness tends to come at the cost of more time spent in the random number generator.
I would first try pulling some of the upper bits out of your random generator; that alone may be enough to improve the result, even with a less-than-perfect generator. Because they come from a different region of the result, and are shifted down to the 0-3 values you need, their distribution should be different (and hopefully a bit more random).
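As a concrete sketch of that idea (rand_dir is just an illustrative name), divide the full range of rand() into four buckets so the choice is driven by the high-order bits rather than the bottom two:
#include <stdlib.h>
/* Illustrative sketch: map rand() to 0..3 using its upper bits. */
int rand_dir(void)
{
    return rand() / (RAND_MAX / 4 + 1);
}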

Related

How to account for move order in chess board evaluation

I am programming a Chess AI using an alpha-beta pruning algorithm that works at fixed depth. I was quite surprised to see that by setting the AI to a higher depth, it played even worse. But I think I have figured out why.
It currently works this way: all moves are listed, and for each of them, every possible reply is listed, and so on, until the fixed depth is reached. At that depth the board is evaluated by checking which pieces are present and assigning a value to each piece type. Then the value bubbles up to the root using the minimax algorithm with alpha-beta pruning.
But I need to account for move order. For instance, if there are two options, a checkmate in 2 moves and another in 7 moves, then the first one has to be chosen. The same goes for taking a queen in either 3 or 6 moves.
But since I only evaluate the board at the deepest nodes, and the evaluation only looks at the board itself, the algorithm doesn't know what the previous moves were.
I'm sure there is a better way to evaluate the game that can account for the way the pieces moved through the search.
EDIT: I figured out why it was playing weirdly. When I searched for moves (depth 5), the search ended with an AI move (a MAX node level). As a result, it counted moves such as taking a knight with a rook as good, even if that left the rook vulnerable (the algorithm cannot see that, because it doesn't search any deeper).
So I changed that and set the depth to 6, so the search ends with a MIN node level.
Its moves now make more sense, as it actually takes revenge when attacked (which it sometimes failed to do before, playing a pointless move instead).
However, it is now more defensive than ever and does not really play: it moves its knight, then moves it back to where it was, and so it ends up losing.
My evaluation is very standard: only the presence of pieces matters to the node value, so the engine is free to pick whatever strategy it wants without being forced into anything.
Considering that, is this normal behaviour for my algorithm? Is it a sign that my alpha-beta search is badly implemented, or is it perfectly normal with such an evaluation function?
If you want to select the shortest path to a win, you probably also want to select the longest path to a loss. If you were to try to account for this in the evaluation function, you would have to track the path length along with the score and have separate evaluation functions for min and max. It's a lot of complex and confusing overhead.
The standard way to solve this problem is with an iterative deepening approach to the search. First you search to a depth of 1 move for each player, then you run the entire search again to a depth of 2 moves for each player, and so on until you run out of time. If you find a win in 2 moves, you stop searching and you'll never run into the 7-move situation. This also solves your problem of searching odd depths and getting strange evaluations. It has many other benefits, like always having a move ready to go when you run out of time, and some significant algorithmic improvements because you won't need the overhead of tracking visited states.
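A minimal sketch of that driver loop (the Board and Move types and the helpers search_root, is_forced_mate and out_of_time are all hypothetical placeholders for whatever your engine already provides):
/* Iterative deepening: repeat the full alpha-beta search at increasing depths. */
Move think(Board *board, int max_depth)
{
    Move best_move = {0};
    for (int depth = 1; depth <= max_depth; depth++) {
        Move candidate;
        int score = search_root(board, depth, &candidate); /* full search to this depth */
        best_move = candidate;            /* keep the deepest completed result */
        if (is_forced_mate(score) || out_of_time())
            break;                        /* a found mate needs no deeper search */
    }
    return best_move;
}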
As for the defensive play, that is a little bit of the horizon effect and a little bit of the evaluation function. If you have a perfect evaluation function, the algorithm only needs to see one move deep. If it's not perfect (and it's not), then you'll need to search much deeper. Last I checked, algorithms that can run on your laptop and see about 8 plies deep (a ply is a single move by one player, i.e. half a full move) can compete with strong humans.
In order to let the program choose the shortest checkmate, the standard approach is to give a higher value to mates that occur closer to the root. Of course, you must detect checkmates, and give them some score.
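One common way to express "closer mates score higher" is to fold the distance from the root into the mate score; a sketch (MATE_VALUE is just a large constant, ply is the number of half-moves below the root, and is_checkmate is a hypothetical helper):
#define MATE_VALUE 100000
/* Inside the search, at a node 'ply' half-moves below the root
   (negamax convention: the side to move here has been mated): */
if (is_checkmate(board))
    return -(MATE_VALUE - ply);   /* nearer mates get more extreme scores */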
Also, from what you describe, you need a quiescence search.
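A bare-bones quiescence search looks roughly like this (negamax form; evaluate, generate_captures, make_move, unmake_move and the MoveList type are hypothetical placeholders for your own routines):
/* Keep searching capture moves until the position is "quiet". */
int quiesce(Board *board, int alpha, int beta)
{
    int stand_pat = evaluate(board);       /* "do nothing" baseline score */
    if (stand_pat >= beta)
        return beta;
    if (stand_pat > alpha)
        alpha = stand_pat;
    MoveList captures = generate_captures(board);
    for (int i = 0; i < captures.count; i++) {
        make_move(board, captures.moves[i]);
        int score = -quiesce(board, -beta, -alpha);
        unmake_move(board, captures.moves[i]);
        if (score >= beta)
            return beta;
        if (score > alpha)
            alpha = score;
    }
    return alpha;
}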
All of this (and much more) is explained in the chess programming wiki. You should check it out:
https://chessprogramming.wikispaces.com/Checkmate#MateScore
https://chessprogramming.wikispaces.com/Quiescence+Search

Why does convolution with kernels work?

I don't understand how someone could come up with a simple 3x3 matrix, called a kernel, such that when applied to an image it produces some awesome effect. Examples: http://en.wikipedia.org/wiki/Kernel_(image_processing) . Why does it work? How did people come up with those kernels (trial and error?)? Is it possible to prove it will always work for all images?
If you want to dig into the history, you'll need to check some other terms. In older textbooks on image processing, what we think of as kernels today are more likely to be called "operators." Another key term is convolution. Both these terms hint at the mathematical basis of kernels.
http://en.wikipedia.org/wiki/Convolution
You can read about mathematical convolution in the textbook Computer Vision by Ballard and Brown. The book dates back to the early 80s, but it's still quite useful, and you can read it for free online:
http://homepages.inf.ed.ac.uk/rbf/BOOKS/BANDB/toc.htm
From the table of contents to the Ballard and Brown book you'll find a link to a PDF for section 2.2.4 Spatial Properties.
http://homepages.inf.ed.ac.uk/rbf/BOOKS/BANDB/LIB/bandb2_2.pdf
In the PDF, scroll down to the section "The Convolution Theorem." This provides the mathematical background for convolution. It's a relatively short step from thinking about convolution expressed as functions and integrals to the application of the same principles to the discrete world of grayscale (or color) data in 2D images.
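For reference, the discrete 2D form used on images is just a double sum: for an image I and a kernel K with half-widths a and b,
(I * K)(x, y) = sum over i = -a..a and j = -b..b of K(i, j) * I(x - i, y - j)
For the symmetric kernels discussed later, flipping the kernel makes no difference, so this is the same weighted sum of neighbouring pixels you would write down naively.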
You will notice that a number of kernels/operators are associated with names: Sobel, Prewitt, Laplacian, Gaussian, and so on. These names help suggest that there's a history--really quite a long history--of mathematical development and image processing research that has led to the large number of kernels in common use today.
Gauss and Laplace lived long before us, but their mathematical work has trickled down into forms we can use in image processing. They didn't work on kernels for image processing, but mathematical techniques they developed are directly applicable and commonly used in image processing. Other kernels were developed specifically for processing images.
The Prewitt operator (kernel), which is quite similar to the Sobel operator, was published in 1970, if Wikipedia is correct.
http://en.wikipedia.org/wiki/Prewitt_operator
Why does it work?
Read about the mathematical theory of convolution to understand how one function can be "passed over" or "dragged" across another. That can explain the theoretical basis.
Then there's the question of why individual kernels work. If you look at the edge transition from dark to light in an image, and if you plot the pixel brightness on a 2D scatterplot, you'll notice that the values on the Y-axis increase rapidly around the edge transition in the image. That edge transition is a slope. A slope can be found using the first derivative. Tada! A kernel that approximates a first derivative operator will find edges.
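A tiny one-dimensional version of that idea: the response p[i+1] - p[i-1] (the kernel [-1 0 1]) is zero on flat regions and large only at the step:
pixels:    10  10  10  10  80  80  80
response:       0   0  70  70   0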
If you know there's such a thing in optics as Gaussian blur, then you might wonder how it could be applied to a 2D image. Thus the derivation of the Gaussian kernel.
The Laplacian, for instance, is an operator that, according to the first sentence from the Wikipedia entry, "is a differential operator given by the divergence of the gradient of a function on Euclidean space."
http://en.wikipedia.org/wiki/Laplacian
Hoo boy. It's quite a leap from that definition to a kernel. The following page does a fine job of explaining the relationship between derivatives and kernels, and it's a quick read:
http://www.aishack.in/2011/04/the-sobel-and-laplacian-edge-detectors/
You'll also see that one form of the Laplacian kernel is simply named the "edge-finding" kernel in the Wikipedia entry you cited.
There is more than one edge-finding kernel, and each has its place. The Laplacian, Sobel, Prewitt, Kirsch, and Roberts kernels all yield different results, and are suited for different purposes.
How did people come up with those kernels (trial and error?)?
Kernels were developed by different people following a variety of research paths.
Some kernels (to my memory) were developed specifically to model the process of "early vision." Early vision isn't what happens only to early humans, or only for people who rise at 4 a.m., but instead refers to the low-level processes of biological vision: sensing of basic color, intensity, edges, and that sort of thing. At the very low level, edge detection in biological vision can be modeled with kernels.
Other kernels, such as the Laplacian and Gaussian, are approximations of mathematical functions. With a little effort you can derive the kernels yourself.
Image editing and image processing software packages will often allow you to define your own kernel. For example, if you want to identify a shape in an image small enough to be defined by a few connected pixels, then you can define a kernel that matches the shape of the image feature you want to detect. Using custom kernels to detect objects is too crude to work in most real-world applications, but sometimes there are reasons to create a special kernel for a very specific purpose, and sometimes a little trial and error is necessary to find a good kernel.
As user templatetypedef pointed out, you can think of kernels intuitively, and in a fairly short time develop a feel for what each would do.
Is it possible to prove it will always work for all images?
Functionally, you can throw a 3x3, 5x5, or NxN kernel at an image of the appropriate size and it'll "work" in the sense that the operation will be performed and there will be some result. But then the ability to compute a result whether it's useful or not isn't a great definition of "works."
One informal definition of whether a kernel "works" is whether convolving an image with that kernel produces a result that you find useful. If you're manipulating images in Photoshop or GIMP, and you find that a particular enhancement kernel doesn't yield quite what you want, then you might say that kernel doesn't work in the context of your particular image and the end result you want. In image processing for computer vision there's a similar problem: we must pick one or more kernels and other (often non-kernel-based) algorithms that will operate in sequence to do something useful, such as identify faces, measure the velocity of cars, or guide robots in assembly tasks.
Homework
If you want to understand how you can translate a mathematical concept into a kernel, it helps to derive a kernel by yourself. Even if you know what the end result of the derivation should be, to grok the notion of kernels and convolution it helps to derive a kernel from a mathematical function by yourself, on paper, and (preferably) from memory.
Try deriving the 3x3 Gaussian kernel from the mathematical function.
http://en.wikipedia.org/wiki/Gaussian_function
Deriving the kernel yourself, or at least finding an online tutorial and reading it closely, will be quite revealing. If you'd rather not do the work, then you may not appreciate the way that some mathematical expression "translates" to a bunch of numbers in a 3x3 matrix. But that's okay! If you get a general sense of how a common kernel is useful, and if you observe how two similar kernels produce slightly different results, then you'll develop a good feel for them.
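If you want a quick programmatic check of your hand derivation, a sketch like this samples the 2D Gaussian at integer offsets and normalizes it; sigma is a free choice, and around 0.85 the result lands close to the familiar pattern [1 2 1; 2 4 2; 1 2 1] / 16 (compile with -lm):
#include <math.h>
#include <stdio.h>
int main(void)
{
    double sigma = 0.85, kernel[3][3], sum = 0.0;
    /* sample exp(-(x^2 + y^2) / (2*sigma^2)) at offsets -1..1 */
    for (int y = -1; y <= 1; y++) {
        for (int x = -1; x <= 1; x++) {
            kernel[y + 1][x + 1] = exp(-(x * x + y * y) / (2.0 * sigma * sigma));
            sum += kernel[y + 1][x + 1];
        }
    }
    /* normalize so the weights add to 1, then print the kernel */
    for (int y = 0; y < 3; y++) {
        for (int x = 0; x < 3; x++)
            printf("%6.3f ", kernel[y][x] / sum);
        printf("\n");
    }
    return 0;
}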
Intuitively, a convolution of an image I with a kernel K produces a new image that's formed by computing a weighted sum, for each pixel, of all the nearby pixels weighted by the weights in K. Even if you didn't know what a convolution was, this idea still seems pretty reasonable. You can use it to do a blur effect (by using a Gaussian weighting of nearby pixels) or to sharpen edges (by subtracting each pixel from its neighbors and putting no weight anywhere else.) In fact, if you knew you needed to do all these operations, it would make sense to try to write a function that given I and K did the weighted sum of nearby pixels, and to try to optimize that function as aggressively as possible (since you'd probably use it a lot).
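A minimal sketch of that weighted-sum function for a grayscale image (names are illustrative; border pixels are simply skipped and the result is clamped to 0..255):
/* in and out are w-by-h grayscale images stored row-major; k is the 3x3 kernel.
   Strictly this is a correlation (the kernel is not flipped), which is the same
   thing for the symmetric kernels discussed here. */
void convolve3x3(const unsigned char *in, unsigned char *out,
                 int w, int h, const float k[3][3])
{
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            float sum = 0.0f;
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++)
                    sum += k[j + 1][i + 1] * in[(y + j) * w + (x + i)];
            if (sum < 0.0f)   sum = 0.0f;    /* clamp to the valid pixel range */
            if (sum > 255.0f) sum = 255.0f;
            out[y * w + x] = (unsigned char)(sum + 0.5f);
        }
    }
}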
To get from there to the idea of a convolution, you'd probably need to have a background in Fourier transforms and Fourier series. Convolutions are a totally natural idea in that domain - if you compute the Fourier transformation of two images and multiply the transforms together, you end up computing the transform of the convolution. Mathematicians had worked that out a while back, probably by answering the very natural question "what function has a Fourier transform defined by the product of two other Fourier transforms?," and from there it was just a matter of time before the connection was found. Since Fourier transforms are already used extensively in computing (for example, in signal processing in networks), my guess is that someone with a background in Fourier series noticed that they needed to apply a kernel K to an image I, then recognized that this is way easier and more computationally efficient when done in frequency space.
I honestly have no idea what the real history is, but this is a pretty plausible explanation.
Hope this helps!
There is a good deal of mathematical theory about convolutions, but the kernel examples you link to are simple to explain intuitively:
[ 0 0 0]
[ 0 1 0]
[ 0 0 0]
This one says to take the original pixel and nothing else, so it yields just the original image.
[-1 -1 -1]
[-1 8 -1]
[-1 -1 -1]
This one says to subtract the eight neighbors from eight times the original pixel. First consider what happens in a smooth part of the image, where there is solid, unchanging color. Eight times the original pixel equals the sum of eight identical neighbors, so the difference is zero. Thus, smooth parts of the image become black. However, parts of the images where there are changes do not become black. Thus, this kernel highlights changes, so it highlights places where one shape ends and another begins: the edges of objects in the image.
[ 0 1 0]
[ 1 -4 1]
[ 0 1 0]
This is similar to the one above, but it uses only the four orthogonal neighbors, so it is tuned differently.
[ 0 -1 0]
[-1 5 -1]
[ 0 -1 0]
Observe that this is just the negation of the edge detector above plus the first filter we saw, the one for the original image. So this kernel both highlights edges and adds that to the original image. The result is the original image with more visible edges: a sharpening effect.
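Written out, that decomposition is:
[ 0  0  0]   [ 0  1  0]   [ 0 -1  0]
[ 0  1  0] - [ 1 -4  1] = [-1  5 -1]
[ 0  0  0]   [ 0  1  0]   [ 0 -1  0]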
[ 1 2 1]
[ 2 4 2]
[ 1 2 1]
[ 1 1 1]
[ 1 1 1]
[ 1 1 1]
Both of these blend the original pixel with its neighbors (in practice the weights are divided by their sum, 16 and 9 respectively, so the overall brightness is preserved). So they blur the image a little.
There are two ways of thinking about (or encoding) an image: the spatial domain and the frequency domain. A spatial representation is based on pixels, so it's more familiar and easier to obtain. Both the image and the kernel are expressed in the spatial domain.
To get to the frequency domain, you need to use a Fourier or related transform, which is computationally expensive. Once you're there, though, many interesting manipulations are simpler. To blur an image, you can just chop off some high-frequency parts — like cropping the image in the spatial domain. Sharpening is the opposite, akin to increasing the contrast of high-frequency information.
Most of the information of an image is in the high frequencies, which represent detail. Most interesting detail information is at a small, local scale. You can do a lot by looking at neighboring pixels. Blurring is basically taking a weighted average of neighboring pixels. Sharpening consists of looking at the difference between a pixel and its neighbors and enhancing the contrast.
A kernel is usually produced by designing a filter in the frequency domain (keeping or discarding certain frequency bands) and then expressing it back in the spatial domain. This can only be done for certain transformations. You can compute the ideal kernel for blurring, sharpening, selecting certain kinds of lines, etc., and it will work, but from the pixel side it can seem like magic because we don't really have a "pixel arithmetic."
Once you have a kernel, of course, there's no need to get into the frequency domain at all. That hard work is finished, conceptually and computationally. Convolution is pretty friendly to all involved, and you can seldom simplify any further. Of course, smaller kernels are friendlier. Sometimes a large kernel can be expressed as a convolution of small sub-kernels, which is a kind of factoring in both the math and software senses.
The mathematical process is pretty straightforward and has been studied since long before there were computers. Most common manipulations can be done mechanically on an optical bench using 18th century equipment.
I think the best way to explain them is to start in 1d and discuss the z-transform and its inverse. That switches from the time domain to the frequency domain — from describing a wave as a timed sequence of samples to describing it as the amplitude of each frequency that contributes to it. The two representations contain the same amount of information, they just express it differently.
Now suppose you had a wave described in the frequency domain and you wanted to apply a filter to it. You might want to remove high frequencies. That would be a blur. You might want to remove low frequencies. That would be a sharpen or, in extremis, an edge detect.
You could do that by just forcing the frequencies you don't want to 0 — e.g. by multiplying the entire range by a particular mask, where 1 is a frequency you want to keep and 0 is a frequency you want to eliminate.
But what if you want to do that in the time domain? You could transfer to the frequency domain, apply the mask, then transform back. But that's a lot of work. So what you do (approximately) is transform the mask from the frequency domain to the time domain. You can then apply it in the time domain.
Following the maths involved for transforming back and forth, in theory to apply that you'd have to make each output sample a weighted sum of every single input sample. In the real world you make a trade-off. You use the sum of, say, 9 samples. That gives you a smaller latency and less processing cost than using, say, 99 samples. But it also gives you a less accurate filter.
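As a sketch of what that trade-off looks like in code, here is a 9-tap filter applied directly in the time domain (the tap values themselves would come from transforming whatever frequency mask you chose; everything here is illustrative):
/* Apply a 9-tap filter to a 1D signal of n samples; the first and last
   4 output samples are skipped rather than handling the boundary. */
void fir9(const float *in, float *out, int n, const float taps[9])
{
    for (int i = 4; i < n - 4; i++) {
        float sum = 0.0f;
        for (int t = -4; t <= 4; t++)
            sum += taps[t + 4] * in[i + t];
        out[i] = sum;    /* each output is a weighted sum of 9 inputs */
    }
}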
A graphics kernel is the 2d analogue of that line of thought. They tend to be small because processing cost grows with the square of the edge length so it gets expensive very quickly. But you can approximate any sort of frequency domain limiting filter.

What is the most practical board representation for a magic bitboard move generation system?

I am rewriting a chess engine I wrote to run on magic bitboards. I have the magic functions written and they take a square of the piece and an occupancy bitboard as parameters. What I am having debates with myself is which one of these board representation schemes is faster/more practical:
scheme 1: There is a bitboard for each type of piece, 1 for white knights, 1 for black rooks, and so on. In order to generate moves and push them to the move stack, I must serialize them to find the square of the piece and then call the magic function. Then I must serialize that move bitboard and push the moves. The advantage is that the attacking and occupancy bitboards are closer at hand.
scheme 2: A simple piece-centric array [2][16] or [32] contains the square indices of the pieces. A simple loop-through and call of the functions is all it takes for the move bitboards. I then serialize those bitboards and push them to the move stack. I also have to maintain an occupancy bitboard. I guess getting an attack bitboard shouldn't be any different: I have to once again generate all the move bitboards and, instead of serializing them, I bitwise-operate them in a mashup of magic.
I'm leaning towards scheme 2, but for some reason I think there is some sort of implementation similar to scheme 1 that is standard. For some reason I can't find drawbacks of making a "bitboard" engine without actually using bitboards. I wouldn't even be using bitboards for king and knight data, just a quick array lookup.
I guess my question is more of whether there is a better way to do this board representation, because I remember reading that keeping a bitboard for each type of piece is standard (maybe this is only necessary with rotated bitboards?). I'm relatively new to bitboard engines but I've read a lot and I've implemented the magic method. I certainly like the piece centric array approach - it makes a lot of arbitrary stuff like printing the board to the screen easier, but if there is a better/equal/more standard way can someone please point it out? Thanks in advance - I know this is a fairly specific question and difficult to answer unless you are very familiar with chess programming.
Last minute question: how does the speed of a lookup into a 2D array measure up against using a 1D array and adding 16 * team_side to the normal index to look up the piece?
edit: I thought I should add that I am valuing speed over almost all else in my chess implementation. Why else would I go with magic bitboards as opposed to simply arrays with slide data?
There is no standard answer to this, sorry.
The number and types of data structures you need depends on exactly what you want to do in your program. For example, having more than one representation of the pieces on the board makes some operations faster. On the other hand, it takes more time to update your data during each move.
To get the maximum speed, it is your job to find out what works best for your program. Does maintaining an extra array of pieces result in a net speedup for a program? It depends!
Sometimes it is a net gain to maintain a data structure, sometimes you can delay the calculations and cache the result, and sometimes you just calculate it when needed (and hope it isn't needed very often).

Why is Faile so much faster than The Simple Chess Program (TSCP)? (Chess engine optimization)

I hope this isn't too much of an arbitrary question, but I have been looking through the source codes of Faile and TSCP and I have been playing them against each other. As far as I can see the engines have a lot in common, yet Faile searches ~1.3 million nodes per second while TSCP searches only 300k nodes per second.
The source code for faile can be found here: http://faile.sourceforge.net/download.php. TSCP source code can be found here: http://www.tckerrigan.com/Chess/TSCP.
After looking through them I see some similarities: both use an array board representation (although Faile uses a 144-square board), both use an alpha-beta search with some sort of transposition table, and both have very similar evaluation functions. The main difference I can find is that Faile uses a redundant representation of the board by also keeping arrays of the piece locations. This means that when moves are generated (by very similar functions in both programs), Faile has to loop through far fewer irrelevant squares, while maintaining this array costs relatively little.
My question is: why is there a 4x difference in the speed of these two programs? Also, why does Faile consistently beat TSCP (I estimate about a ~200 ELO difference just by watching their moves)? For the latter, it seems to be because Faile is searching several plies deeper.
Short answer: TSCP is very simple (as you can guess from its name). Faile is more advanced, some time was spent by developers to optimize it. So it is just reasonable for Faile to be faster, which means also deeper search and higher ELO.
Long answer: As far as I remember, the most important part of a program using alpha-beta search (the part which influences performance the most) is the move generator. TSCP's move generator does not generate moves in any particular order. Faile's generator (as you noticed) uses a piece list, which is sorted in order of decreasing piece value. This means it generates the more important moves first. This allows alpha-beta pruning to cut more unneeded moves and makes the search tree less branchy. And a less branchy tree can be deeper while still having the same number of nodes, which allows a deeper search.
Here is a very simplified example of how the order of moves allows a faster search. Suppose White's last move was silly: they moved some piece to an unprotected square. If we find some Black move that captures this piece, we can ignore all other, not-yet-evaluated moves and return to processing White's move list. A queen controls much more space than a pawn, so it has more chances to capture that piece; so if we look at the queen's moves first, we are more likely to skip more unneeded moves.
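A sketch of that ordering idea in code (not Faile's actual implementation, which sorts the piece list itself; here the generated moves are simply sorted so captures of valuable pieces come first, and Move, piece_value and the move list are hypothetical):
#include <stdlib.h>
/* Score each generated move so captures of valuable pieces come first,
   then sort the list before the alpha-beta loop over it. */
static int move_score(const Move *m)
{
    return piece_value(m->captured_piece);   /* 0 for quiet moves */
}
static int cmp_moves(const void *a, const void *b)
{
    return move_score((const Move *)b) - move_score((const Move *)a);
}
/* ... after generating list.moves[0 .. list.count-1]: */
qsort(list.moves, list.count, sizeof(Move), cmp_moves);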
I didn't compare other parts of these programs, but most likely Faile optimizes them better as well. Things like the alpha-beta algorithm itself, variable search-tree depth, and static position analysis may also be optimized.
TSCP has no hash tables (-75 ELO).
TSCP has no killer moves for move ordering (-50 ELO).
TSCP has no null move pruning (-100 ELO).
TSCP has a bad attack function design (-25 ELO).
These 4 things add up to a difference of about 250 ELO points. They also affect the number of nodes per second, but you cannot compare nodes per second across different engines, as programmers may use different interpretations of what counts as a node.

Generating totally random numbers without random function? [duplicate]

Possible Duplicate:
True random number generator
I was talking to a friend the other day and we were trying to figure out if it is possible to generate completely random numbers without the help of a random function. In C, for example, "rand" generates pseudo-random numbers. Or we can use something like "srand( time( NULL ) );" so the computer reads numbers from its clock as seed values. So if I understand everything I have read so far correctly, then I am pretty sure that no random function actually produces truly random numbers. How would one write a program that generates numbers that are completely random, and what would the code look like?
Check out this question:
True random number generator
Also, from Wikipedia's entry on pseudorandom numbers:
As John von Neumann joked, "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."
The excellent random.org website provides hardware-based random numbers as well as a number of software interfaces to retrieve these.
This can be used e.g. for genuinely unpredictable seeds or for 'true' random numbers. Being a web service, there are limits on the number of draws you can make, so don't try to use this for your graduate school monte carlo simulation.
FWIW, I wrapped one of those interfaces in the R package random.
It would look like:
int random = CallHardwareRandomGenerator();
Even with hardware, randomness is tricky. There are things which are physically random (atomic decay is random, but with predictable average rates, so it can be used as a source of random information), and there are things that are physically random enough to make prediction impractical (this is how casinos make money).
There are things that are largely indeterminate (mix up information from key-stroke rate, mouse-movements, and a few things like that), which are a good-enough source of "randomness" for many uses.
Mathematically, we cannot produce randomness, but we can improve distribution and make something harder to predict. Cryptographic PRNGs do a stronger job at this than most, but are more expensive in terms of resources.
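As a concrete middle ground on most Unix-like systems, programs often just read from the kernel's entropy-backed generator; a minimal sketch (Linux/BSD-style, with error handling kept to a minimum):
#include <stdio.h>
/* Read one unsigned int from the OS entropy pool. Returns 0 on success. */
int os_random(unsigned int *out)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f)
        return -1;
    size_t got = fread(out, sizeof *out, 1, f);
    fclose(f);
    return got == 1 ? 0 : -1;
}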
This is more of a physics question, I think. If you think about it, nothing is random; things just occur due to events whose complexity makes them unpredictable to us. A computer is a subsystem just like any other in the universe, and by giving it unpredictable external inputs (RTC, I/O garbage) we can get the same kind of randomness that a roulette wheel gets from varying friction, air resistance, initial impulse, and millions of factors that I can't wrap my head around.
There's room for a fair amount of philosophical debate about what "truly random" really even means. From a practical viewpoint, though, even sources we know aren't truly random can be used in ways that are probably close enough for almost any practical purpose (in particular, at least with current technology, full knowledge of the previously produced bitstream appears to be insufficient to predict the next bit accurately). Most of those do involve a bit of extra hardware though -- for example, it's pretty easy to put a source together from a little bit of Americium out of a smoke detector.
There are quite a few more sources as well, though they're mostly pretty low bandwidth (e.g., collect one bit for each keystroke, based on whether the interval between keystrokes was an even or odd number of CPU clocks -- assuming the CPU clock and keyboard clock are derived from separate crystals). OTOH, you have to be really careful with this -- a fair number of security holes (e.g., in Netscape around v. 4.0 or so) have stemmed from people believing that such sources were a lot more random than they really were.
While there are a number of web sites that produce random numbers from hardware sources, most of them are useless from a viewpoint of encryption. Even at best, you're just trusting your SSL (or TLS) connection to be secure so nobody captured the data you got from the site.
