Giving the shape as output using GA - artificial-intelligence

The scenario is this: I want to produce a shape as output when the number of edges, the number of vertices and the interior angle are given as input, and I am trying to do this using Genetic Algorithms.
My problem is getting started. How would I create the initial population randomly for this case? And how could I define the chromosomes in a bitwise representation?
I was referring to some PPTs.
But in my case, I think I can't represent the chromosome as bits, because the inputs I would be giving are numeric values, aren't they? Any clues to help me move forward?

Chromosomes in a Genetic Algorithm don't have to be represented as bits, although I prefer to do it this way. The best way is probably to just convert the numbers from binary to whatever form you need to represent your shapes, and back again.
You can either scale the decoded binary value or clip it at the edges to make it fit whatever boundary you need.
In terms of initialisation, all you need to do is work out how many bits you need to represent all your inputs and generate that many bits randomly. For example, if you wanted 3 whole numbers between 0 and 255 you would need 24 bits (8 * 3). Just randomly generate such a 24-bit number for each chromosome in the population. When creating the shape you just split the chromosome into 3 parts, convert them into your 3 whole numbers and use them.
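As a rough illustration of the 24-bit example above, here is a minimal Python sketch. The names `random_chromosome`, `decode`, `initial_population` and `scale` are made up for this example, and the idea that the three genes stand for edges, vertices and interior angle is only an assumption about how the question's inputs might be mapped.

```python
import random

BITS_PER_GENE = 8   # each whole number lies in 0..255
NUM_GENES = 3       # assumed: edges, vertices, interior angle

def random_chromosome(rng):
    """Return a random chromosome of NUM_GENES * BITS_PER_GENE bits, stored as an int."""
    return rng.getrandbits(NUM_GENES * BITS_PER_GENE)

def decode(chromosome):
    """Split the chromosome into NUM_GENES whole numbers, most significant gene first."""
    mask = (1 << BITS_PER_GENE) - 1
    return [
        (chromosome >> (BITS_PER_GENE * (NUM_GENES - 1 - i))) & mask
        for i in range(NUM_GENES)
    ]

def initial_population(size, seed=None):
    """Generate `size` random chromosomes."""
    rng = random.Random(seed)
    return [random_chromosome(rng) for _ in range(size)]

def scale(value, low, high):
    """Map a gene in 0..255 linearly onto the interval [low, high]."""
    return low + (high - low) * value / ((1 << BITS_PER_GENE) - 1)
```

For instance, `decode(0x0102FF)` gives `[1, 2, 255]`, and `scale(gene, 0, 180)` would be one way to turn a gene into an angle in degrees.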

Related

From int array of 4096 elements to a float between 0 and 1

I am currently working on a project and I need to work with images.
However, my images are 64*64 sized, so when I load one, I have a 4096 int array.
I would like to convert this array to a float between 0 and 1 (and of course I will also need the function that rebuilds an image from that float).
Do you have any idea or suggestion of how to do it?
Because I need to make an algorithm but I don't really know how to proceed.
Best regards and thank you.

The only way this could make sense is if the image is binary (1 bit per pixel), but even then a naive lossless conversion takes 64x64 = 4096 bits, which is far more than a single 32-bit float can hold. So some piece of information is missing. To make this possible you need to introduce some kind of compression, and even that may not be enough unless lossy compression is used. In any case, you should add some sample images so we can see what you are dealing with.
I am afraid the only usable compression for this would be a DCT (as in JPEG) on the full image: compute the DCT of the image and store only the first few coefficients. For example, with 4-bit coefficients you can store 32/4 = 8 coefficients, which could be enough, but it is hard to say whether 4 bits per coefficient will be enough to reconstruct the image.
In similar cases visual hashes are used, but you have no way to turn them back into the original image. They are pretty much the same as hashes, except that their binary representation is visually similar to the image.
A float is really not a good container for this due to precision/rounding problems. You are losing more bits than if a plain integer type were used. Yes, you can store an integer bit pattern in a float, but the resulting float value can be gibberish, with the possibility of throwing an exception if it is used as a regular float.
If the target float should be in the range <0.0,1.0> then exceptions will not occur, but you cannot use the exponent or the sign for storage, which limits the usable bits to only 23 of the original 32.
Putting it all together, without additional info I would:
Do a DCT on the 64x64 image matrix
use only 1x4bit + 6x3bit cells from the top left corner of the matrix (22 bits in total)
encode them into the mantissa bits by concatenating
mantissa = coeff0 | (coeff1<<4) | (coeff2<<7) | (coeff3<<10) | ...
set sign and exponent so that the value falls in the range <0.0,1.0>
If I am not mistaken, sign=0 and exponent=-1 + 32bit_float_bias (that is, 126 for a bias of 127)
put the integer parts into a floating-point value
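union { float f; DWORD dw; } x;  // overlay the float and its 32-bit pattern
DWORD sign=...,mantissa=...,exponent=...;
x.dw=sign<<31;       // bit 31: sign
x.dw|=exponent<<23;  // bits 30..23: biased exponent
x.dw|=mantissa;      // bits 22..0: mantissa (packed coefficients)
return x.f;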
To get the image back (or at least something close to it), reverse the steps. You can improve the quality by introducing some filters to get closer to your original images. But without actually seeing any of them it is hard to tell which filter to use, or whether this is even possible...
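The bit-packing part of the steps above can be sketched in Python as follows. This is only an illustration under assumptions: it covers the packing and unpacking of already quantized coefficients (one 4-bit value followed by six 3-bit values), not the DCT itself, and the names `WIDTHS`, `pack_coeffs` and `unpack_coeffs` are hypothetical. With sign 0 and a biased exponent of 126, every packed value lands in [0.5, 1.0).

```python
import struct

WIDTHS = [4, 3, 3, 3, 3, 3, 3]  # 1 x 4 bits + 6 x 3 bits = 22 mantissa bits

def pack_coeffs(coeffs):
    """Pack quantized coefficients into the mantissa of a 32-bit float."""
    mantissa = 0
    shift = 0
    for value, width in zip(coeffs, WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError("coefficient does not fit its bit width")
        mantissa |= value << shift
        shift += width
    sign = 0
    exponent = 126  # -1 + bias of 127, so the float lies in [0.5, 1.0)
    bits = (sign << 31) | (exponent << 23) | mantissa
    # Reinterpret the 32-bit integer pattern as an IEEE 754 single.
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def unpack_coeffs(value):
    """Recover the quantized coefficients from a float made by pack_coeffs."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    mantissa = bits & ((1 << 23) - 1)
    coeffs = []
    for width in WIDTHS:
        coeffs.append(mantissa & ((1 << width) - 1))
        mantissa >>= width
    return coeffs
```

Going through the integer bit pattern with `struct` plays the same role as the union in the C snippet; the round trip is exact because the value is never modified as a float in between.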

How to normalize multiple array of different size in matlab

I use a set of images for image processing, in which each image generates a unique code (Freeman chain code). The size of the array varies from image to image, but the values always range from 0 to 7. For example, the first image creates an array of 3124 elements and the second image creates an array of 1800 elements.
Now, for further processing, I need those arrays to have a fixed size. So, is there any way to normalize them?

There is a reason why you are getting different sized arrays when applying a chain code algorithm to different images. This is because the contours that represent each shape are completely different. For example, the letter C and D will most likely contain chain codes that are of a different length because you are describing a shape as a chain of values from a starting position. The values ranging from 0-7 simply tell you which direction you need to look next given the current position of where you're looking in the shape. Usually, chain codes have the following convention:
3 2 1
4 x 0
5 6 7
0 means to move to the east, 1 means to move north east, 2 means to move north and so on. Therefore, if we had the following contour:
o o x
o
o o o
With the starting position at x, the chain code would be:
4 4 6 6 0 0
Chain codes encode how we should trace the perimeter of an object given a starting position. Now, what you are asking is whether or not we can take two different contours with different shapes and represent them using the same number of values that represent their chain code. You can't because of the varying length of the chain code.
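To make the convention concrete, here is a small Python sketch that derives the chain code from an ordered list of contour points. It assumes integer (x, y) coordinates with y increasing towards north and consecutive points that are 8-connected neighbours; `DIRECTIONS`, `STEP_TO_CODE` and `chain_code` are names chosen for this example.

```python
# Direction code -> (dx, dy), following the 0..7 layout above (y grows northwards).
DIRECTIONS = {
    0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (-1, 1),
    4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1),
}
STEP_TO_CODE = {step: code for code, step in DIRECTIONS.items()}

def chain_code(points):
    """Return the chain code of an ordered list of 8-connected contour points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(STEP_TO_CODE[(x1 - x0, y1 - y0)])
    return codes

# The example contour, traced from the starting position x at the top right.
contour = [(2, 2), (1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0)]
print(chain_code(contour))  # [4, 4, 6, 6, 0, 0]
```

The length of the result is always one less than the number of contour points, which is why contours of different lengths cannot share a chain code length.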
tl;dr
In general, you can't. The different sized arrays mean that the contours that are represented by those chain codes are of different lengths. What you are actually asking is whether or not you can represent two different and unrelated contours / chain codes with the same amount of elements.... and the short answer is no.
What you need to think about is why you want to try and do this? Are you trying to compare the shapes between different contours? If you are, then doing chain codes is not the best way to do that due to how sensitive chain codes are with respect to how the contour changes. Adding the slightest bit of noise would result in an entirely different chain code.
Instead, you should investigate shape similarity measures. An authoritative paper by Remco Veltkamp talks about different shape similarity measures for the purposes of shape retrieval. See here: http://www.staff.science.uu.nl/~kreve101/asci/smi2001.pdf . Measures such as the Hausdorff distance, Minkowski distance... or even simple moments are some of the most popular measures that are used.
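As one example of such a measure, the Hausdorff distance can be computed directly between two point sets of different sizes, so no fixed-length normalization is needed. The brute-force Python sketch below assumes each contour is given as a list of (x, y) points; the function names are made up for this example, and the quadratic running time is fine for illustration but may be too slow for long contours.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

For example, `hausdorff([(0, 0), (1, 0)], [(0, 0), (1, 0), (1, 3)])` is 3.0, because the point (1, 3) is 3 units away from its nearest neighbour in the first set.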

sequences with the same order in an array - Identify sequences

I'm looking for a hint towards a solution of the problem:
Suppose there's an array with some numbers in ascending order and some in descending, for example [1,2,5,9,6,3,2,4,7,8] has sequences asc [1,2,5,9], desc [(9),6,3,2], asc [(2),4,7,8].
Now this isn't a problem: I could simply loop through the array and add the values to some data structure, and when the direction changes, store this structure somewhere and start filling the next one.
What I've found tricky is if I want to have threshold of some sort. For example: [0,50,100,99,98,97,105,160]
So the sequence in descending order [(100), 99, 98, 97] could be neglected, because overall change is -3, whereas the sequence was increasing much more dramatically (+100) and as a result, the algorithm identifies only one sequence in ascending order.
I have tried the same method as above, simply adding all sequences to a data structure and then comparing the change in values of two consecutive items (100 vs -3 means -3 can be neglected). But then the problem is if I have, say, this situation:
(example showing only the change in value from the start to the end of each sequence)
[+100, -3, +1, -50]
in this situation I cannot neglect descending movement, because the numbers start to descend, then slightly ascend and again go down pretty significantly.
and it gets really confusing with stuff like that:
[+100, -3, +3, -3, +3, -50]
this is quick sketch of representation of what I am trying to achieve:
black lines represent initial data in an array, red thin lines are desired resulting output
Could somebody point me in the right direction? How would I approach this situation? Compare multiple sequences at a time, slowly combining sequences together? Maybe I would need to go through the sequences multiple times?
I'm not sure if I've come across a problem like that before, and I don't know a working algorithm. This is a problem I've faced myself while trying to analyse some data.

If I understand correctly, you expect your curve to be a succession of alternatively increasing and decreasing sequences, with a bit of added noise.
The usual way to get rid of noise is to filter data. There are millions of ways to do that, most of them requiring frequency analysis, but in your case you could probably get good enough results with something simple.
The main point is that the relevant variable is not the values in the array, but their variations.
Given N values, consider the array of N-1 elements holding the differences between two consecutive values.
[0,50,100,99,98,97,105,160] -> 50,50,-1,-1,-1,8,55
Now set to zero all differences whose absolute value is below a given threshold (say 10 for instance)
-> 50,50,0,0,0,0,55
You can then detect a rising sequence by looking for streaks of positive or zero values (and the same for decreasing sequences, considering zero or negative values).
As for all filtering processes, you will have to find a sweet spot for your threshold. Too low and it will fail to eliminate insignificant variations, too high and it will wipe out significant slope inversions.
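A minimal Python sketch of this difference-and-threshold idea follows. The function name `find_runs` and its return format are assumptions made for this example, and zeroed differences are arbitrarily attached to the run that precedes them.

```python
def find_runs(values, threshold):
    """Return runs as (direction, start_index, end_index) into `values`.

    direction is +1 for a rising run and -1 for a falling run.
    Differences with absolute value below `threshold` are treated as zero
    and absorbed into the current run.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    filtered = [d if abs(d) >= threshold else 0 for d in diffs]
    runs = []
    direction = 0  # 0 means no significant change seen yet
    start = 0
    for i, d in enumerate(filtered):
        sign = (d > 0) - (d < 0)
        if sign == 0 or sign == direction:
            continue  # noise or same direction: the run goes on
        if direction != 0:
            runs.append((direction, start, i))  # the previous run ends at values[i]
            start = i
        direction = sign
    if direction != 0:
        runs.append((direction, start, len(values) - 1))
    return runs
```

With a threshold of 10, the question's array `[0,50,100,99,98,97,105,160]` comes out as a single rising run covering indices 0 to 7, while `[1,2,5,9,6,3,2,4,7,8]` with a threshold of 1 splits into rising, falling and rising runs.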

I don't know if I understand your problem correctly, but I had to do this kind of dimensionality reduction many times before, so I wrote a small javascript library to do so. It uses the Perceptually Important Points algorithm.
In the algorithm you can define a custom metric of the distance between three consecutive points (to measure how much a single point adds in entropy).
Here is a demonstration (in JS). It works kind of like a heap, where you remove the points that contribute least to the overall entropy:
// add every point to the heap, then drop the least important ones
for (var i = 0; i < data.length; i++)
    heap.add(data[i]);
while (heap.minValue() < threshold)
    heap.removeMin();
And here is the library.

Generate unique identifier for chess board

I'm looking for something like a checksum for a chess board with pieces in specific places. I'm looking to see if a dynamic programming or memoized solution is viable for an AI chess player. The unique identifier would be used to easily check if two boards are equal or to use as indices in the arrays. Thanks for the help.

An extensively used checksum for board positions is the Zobrist signature.
It's an almost unique index number for any chess position, with the requirement that two similar positions generate entirely different indices. These index numbers are used for faster and space efficient transposition tables / opening books.
You need a set of randomly generated bitstrings:
one for each piece at each square;
one to indicate the side to move;
four for castling rights;
eight for the file of a valid en-passant square (if any).
If you want to get the Zobrist hash code of a certain position, you have to xor all random numbers linked to the given feature (details: here and Correctly Implementing Zobrist Hashing).
E.g the starting position:
[Hash for White Rook on a1] xor [White Knight on b1] xor ... ( all pieces )
... xor [White castling long] xor ... ( all castling rights )
XOR allows a fast incremental update of the hash key during make / unmake of moves.
Usually 64 bits are used as the standard size in modern chess programs (see The Effect of Hash Signature Collisions in a Chess Program).
You can expect to encounter a collision in a 32-bit hash when you have evaluated √(2^32) = 2^16 positions. With a 64-bit hash, you can expect a collision after about 2^32, or 4 billion, positions (birthday paradox).
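Here is a minimal Python sketch of the scheme, including the incremental XOR update. The board encoding (a dict from square index 0..63 to a piece letter), the castling labels "KQkq" and all function names are assumptions made for this example, not part of any particular chess engine.

```python
import random

PIECES = "PNBRQKpnbrqk"  # white pieces uppercase, black pieces lowercase

def make_zobrist_table(seed=2024):
    """Create the random 64-bit strings for every feature of a position."""
    rng = random.Random(seed)
    return {
        "piece": {(p, sq): rng.getrandbits(64) for p in PIECES for sq in range(64)},
        "black_to_move": rng.getrandbits(64),
        "castling": {c: rng.getrandbits(64) for c in "KQkq"},
        "ep_file": [rng.getrandbits(64) for _ in range(8)],
    }

def zobrist_hash(table, board, black_to_move, castling, ep_file=None):
    """XOR together the random numbers of all features present in the position."""
    h = 0
    for sq, piece in board.items():
        h ^= table["piece"][(piece, sq)]
    if black_to_move:
        h ^= table["black_to_move"]
    for right in castling:
        h ^= table["castling"][right]
    if ep_file is not None:
        h ^= table["ep_file"][ep_file]
    return h

def move_piece(table, h, piece, from_sq, to_sq):
    """Incrementally update hash h for a quiet move and flip the side to move."""
    h ^= table["piece"][(piece, from_sq)]  # remove the piece from its old square
    h ^= table["piece"][(piece, to_sq)]    # place it on the new square
    h ^= table["black_to_move"]            # toggle the side to move
    return h
```

Because XOR is its own inverse, calling `move_piece` again with the two squares swapped restores the previous hash, which is what makes unmake cheap.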

If you're looking for a checksum, the usual solution is Zobrist Hashing.
If you're looking for a true unique-identifier, the usual human-readable solution is Forsyth notation.
For a non-human-readable unique identifier, you can store the type/color of the piece on each square using four bits. Throw in another 3 bits for the en-passant square, 4 bits for which castlings are still allowed, and one bit for whose turn it is, and you end up with exactly 33 bytes (264 bits) for each board setup.

You can use a checksum such as MD5 or SHA: just pass your chessboard cells as text, like:
TKBQKBHT
........
........
........
tkbqkbht
And get the checksum of the generated text.
The checksums of two different boards will differ in an unrelated way, so at this point creating a unique string (or array of bits) may be the best approach:
TKBQKBHT........................tkbqkbht
Because it will be unique too and is easy to compare with others.
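A minimal Python sketch of the text-checksum idea, using the standard `hashlib` module. The function name `board_digest` and the row-per-string layout are assumptions for this example; the piece letters simply follow the ones used above. Note that the digest only covers piece placement, not side to move, castling or en-passant rights.

```python
import hashlib

def board_digest(rows):
    """Return the MD5 hex digest of a board given as a list of row strings."""
    text = "\n".join(rows)
    return hashlib.md5(text.encode("ascii")).hexdigest()

start = ["TKBQKBHT"] + ["........"] * 6 + ["tkbqkbht"]
print(board_digest(start))
```

Two boards with identical text always give the same digest, so equality checks reduce to comparing two short strings.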

If two games achieve the same configuration through different moves or move orders, they should still be "equal". e.g. You shouldn't have to distinguish between which pawn is in a particular location, as long as the location is the same. You don't seem to really want to hash, but to uniquely and correctly distinguish between these board states.
One method is to use a 64x12 square-by-piecetype membership matrix. You can store this as a bit vector and then compare vectors for the check. e.g. the first 64 addresses in the vector might show which locations on the board contain pawns. The next 64 show locations which contain knights. You could let the first 6 sections show membership of white pieces and the final 6 show membership of black pieces.
Binary membership matrix pseudocode:
bool[] memberships = zeros(64*12);
def move(piece,location,oldlocation):
    memberships(piece,location) = 1;      // mark the new square
    memberships(piece,oldlocation) = 0;   // clear the old square
move(pawn,a3,a2);   // move a pawn from a2 to a3
This is cumbersome because you have to be careful how you implement it. e.g. make sure there is only one king maximum for each player. The advantage is that it only takes 768 bits to store a state.
Another way is a length-64 integer vector representing vectorized addresses for the board locations. In this case, the first 8 addresses might represent the state of the first row of the board.
Non-binary membership matrix pseudocode:
half[] memberships = zeros(64);
memberships[8] = 1; // white pawn at location a2
memberships[0] = 2; // white rook at location a1
...
memberships[62] = 11; // black knight at location g8
memberships[63] = 12; // black rook at location h8
The nice thing about the non-binary vector is that you don't have as much freedom to accidentally assign multiple pieces to one location. The downside is that it is now larger to store each state. Larger representations will be slower to do equality comparisons on. (In my example, assuming each vector location stores a 16-bit half-word, we get 64*16=1024 bits to store one state, compared to the 768 bits for the binary vector.)
Either way, you'd probably want to enumerate each piece and board location.
enumerate piece {
empty = 0;
white_pawn = 1;
white_rook = 2;
...
black_knight = 11;
black_rook = 12;
}
enumerate location {
a1 = 0;
...
}
And testing for equality is just comparing two vectors together.
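The binary membership matrix can be sketched in Python by using one arbitrary-precision integer as the 768-bit vector. The piece order in `PIECE_TYPES`, the square numbering (a1 = 0, a2 = 8, as in the example above) and the function names are assumptions for this illustration.

```python
PIECE_TYPES = [
    "white_pawn", "white_knight", "white_bishop", "white_rook", "white_queen", "white_king",
    "black_pawn", "black_knight", "black_bishop", "black_rook", "black_queen", "black_king",
]

def bit_index(piece, square):
    """Position of (piece, square) in the 64*12-bit membership vector."""
    return PIECE_TYPES.index(piece) * 64 + square

def place(state, piece, square):
    """Return a new state with `piece` present on `square`."""
    return state | (1 << bit_index(piece, square))

def move(state, piece, new_square, old_square):
    """Return a new state with `piece` moved from old_square to new_square."""
    state &= ~(1 << bit_index(piece, old_square))
    return state | (1 << bit_index(piece, new_square))
```

Two states reached through different move orders compare equal with a plain `==`, since only the final membership bits matter.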

There are 64 squares. There are twelve different figures in chess that can occupy a square plus the possibility of no figure occupying it. Makes 13. You need 4 bits to represent those 13 (2^4 = 16). So you end up with 32 bytes to unambiguously store a chess board.
If you want to ease handling you can store 64 bytes instead, one byte per square, as bytes are easier to read and write.
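The 4-bits-per-square packing can be sketched in Python like this. The occupancy codes (0 for empty, 1..12 for the twelve figures) and the function names are assumptions for this example; any fixed numbering of the 13 possibilities would do.

```python
def pack_board(squares):
    """Pack 64 occupancy codes (0..12) into 32 bytes, two squares per byte."""
    if len(squares) != 64:
        raise ValueError("expected 64 squares")
    out = bytearray(32)
    for i, code in enumerate(squares):
        if not 0 <= code <= 12:
            raise ValueError("occupancy code out of range")
        out[i // 2] |= code << (4 * (i % 2))  # even square: low nibble, odd: high nibble
    return bytes(out)

def unpack_board(data):
    """Inverse of pack_board: return the 64 occupancy codes."""
    return [(data[i // 2] >> (4 * (i % 2))) & 0xF for i in range(64)]
```

The resulting `bytes` object is hashable, so it can be used directly as a dictionary key for memoization.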
EDIT: I've read some more on chess and have come to the following conclusion: Two boards are only the same, if all previous boards since last capture or pawn move are also the same. This is because of the threefold repetition rule. If for the third time the board looks exactly the same in a game, a draw can be claimed. So in spite of seeing the same board in two matches, it may be considered unfortunate in one match to make a certain move, so as to avoid a draw, whereas in the other match there is no such danger.
It is up to you how you want to go about it. You would need a unique identifier of variable length due to the variable number of previous boards to store. Well, maybe you take it easy, turn a blind eye to this and just store the last five moves to detect directly repetitive moves that could lead to a third repetition of positions, this being the most frequently occurring reason.
If you want to store moves with the board: there are 64x63=4032 conceivable moves (12 bits necessary), but many of them are illegal of course. If I count correctly there are 1792 geometrically possible moves (A1->A2 = legal, A1->D2 illegal for instance), which would fit in 11 bits. I would still go for the 12 bits, however, to make interpretation as easy as possible by storing 0/1 for A1->A2 and 62/63 for H7->H8.
Then there is the 50 moves rule. You don't have to store moves here. Only the number of moves since last capture or pawn move from 0 to 50 (that's enough; it doesn't matter whether it's 50, 51 or more). So another six bits for this.
At last: Black's or white's move? Enpassantable pawn? Castlingable rook? Some additional bits for this (or extension of the 13 occupancies to save some bits).
EDIT again: So if you want to use the board to compare with other matches, then "two boards are only the same, if all previous boards since last capture or pawn move are also the same" applies. If you only want to detect repetition of positions in the same game, however, then you should be fine by just using the 15 occupancies x 64 squares plus one bit for whose move it is.

Continuous Vs. discrete attributes

Could anyone please clarify the difference between continuous and discrete attributes?
Thanks.

I will try to explain with an example:
Suppose your table in the database has a column which stores the temperature of the day or say a furnace. The values for that column come from a continuous domain of temperature values.
If the table has a column named gender, then that column is discrete in the sense that only two or maybe three values comprise its domain.
I hope this helps.
cheers

(It's been a long while since I did any pure maths, so take this with a pinch of salt.)
Speaking theoretically, continuous attributes come from an uncountably infinite set (e.g. the real numbers, where you can make values as large or small as you need). Discrete attributes come from a finite or countably infinite set (e.g. the integers).
Another way of looking at it is that continuous attributes can have infinitesimally small differences between one value and the next, while discrete attributes always have some lower limit on the difference between one value and the next.
Practically speaking, continuous attributes would be a floating-point type, whereas discrete ones would be integers or characters.

Simon Righarts is right, except for his final conclusion.
Since computer memory is always finite, the set of representable values of any type is by definition also always finite, and therefore in computer science there is no such thing as "continuous TYPES" (which I think was what you were really asking about, not "continuous attributes"). Well, at least not in that part of computer science that gets applied anywhere in real life.
The classical floating-point type, encoded in 32 bits, has a maximum of 2^32 representable values. The classical floating-point type, encoded in 64 bits, has a maximum of 2^64 representable values. Non-representable values are plain useless and not worth considering. BigInteger types, which take as many bytes as are needed to hold a value, are limited to a maximum of 2^(8*computermemorysize) representable values. All of them are very much finite.

Data can be Descriptive (like "high" or "fast") or Numerical (numbers).
And Numerical Data can be Discrete or Continuous:
Discrete data is counted,
Continuous data is measured
Discrete Data
Discrete Data can only take certain values.
Example 1: the number of students in a class; we can't have half a student.
Example 2: the results of rolling 2 dice can only have the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12; we cannot have 2.1 or 3.5.
Continuous Data
Continuous Data can take any value (within a range).
Examples:
A person's height could be any value (within the range of human heights), not just certain fixed heights. The same goes for the time in a race (you could even measure it to fractions of a second), a dog's weight, or the length of a leaf.

Attributes:
Discrete Attribute
Has only a finite or countably infinite set of values
E.g., zip codes, profession, or the set of words in a collection of documents
Sometimes, represented as integer variables
Note: Binary attributes are a special case of discrete attributes
Continuous Attribute:
Has real numbers as attribute values
E.g., temperature, height, or weight
Practically, real values can only be measured and represented using a finite number of digits