pattern recognition - "is this a pattern?"

I have a large vector of numbers, say 500 numbers. I would like a program to detect patterns (recurrences, in this case) in such a vector based on the following rules:
A sequence of numbers is a pattern if:
The size of the sequence is between 3 and 20 numbers.
The RELATIVE positions of the numbers in the sequence are repeated at least one other time in the vector. So if I have a sequence (1,4,3) and then (3,6,5) somewhere else in the vector, then (1,4,3) is a pattern (as well as (2,5,4), (3,6,5), etc.).
The sequences can't intersect. So a vector (1,2,3,4,5) does not contain patterns (1,2,3) and (3,4,5) (we can't use the same number for both sequences). However, (1,2,3,3,4,5) does contain a pattern (1,2,3) (or (3,4,5)).
A subset A of a pattern B is a pattern ONLY IF A appears somewhere else outside B. So a vector (1,2,3,4,7,8,9,2,3,4,5) would contain patterns (1,2,3,4) and (1,2,3), because (1,2,3,4) is repeated (in the form of (2,3,4,5)) and (1,2,3) is repeated (in the form of (7,8,9)). However, if the vector was (1,2,3,4,2,3,4,5), the only pattern would be (1,2,3,4), because (1,2,3) appears only in the context of (1,2,3,4).
I'd like to know several things:
First of all I hope the rules don't go against each other. I made them myself so there might be a clash somewhere that I didn't notice, please let me know if you do notice it.
Secondly, how would one implement such a system in the most efficient way? Maybe someone can point me towards some particular literature on the subject? I could go number by number, searching for a repetition of every sequence of length 3, then 4, 5 and so on up to 20. But that does not seem very efficient.
I am interested in implementation of such system in C, but any general guidance is very welcome.
Thank you in advance!

Just a couple of observations:
If you're interested in relative values, then your first step should be to calculate the differences between adjacent elements of the vector, e.g.:
Original numbers:
1  4  3  2  5  1  1  3  6  5  6  2  5  4  4  4  1  4  3  2
*******              *******     *******        *******
Difference values:
3  -1 -1 3  -4 0  2  3  -1 1  -4 3  -1 0  0  -3 3  -1 -1
*****                *****       *****          *****
Once you've done that, you could use an autocorrelation method to look for repeated patterns in the data. This can be computed in O(n log n) time, and possibly even faster if you're only concerned with exact matches.
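The difference step can be sketched in C as below. This is only a brute-force illustration of the idea, not the autocorrelation method: it compares difference windows directly, so checking one candidate costs O(n * len). The function names adjacent_differences and find_repeat are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill diff[0..n-2] with v[i+1] - v[i]; v holds n numbers. */
void adjacent_differences(const int *v, size_t n, int *diff)
{
    for (size_t i = 0; i + 1 < n; i++)
        diff[i] = v[i + 1] - v[i];
}

/*
 * Look for a later, non-overlapping repeat of the len-number sequence that
 * starts at index `start` of the original vector, comparing relative values
 * only. `diff` is the difference array of a vector with n numbers.
 * Returns the start index of the first repeat, or -1 if there is none.
 */
long find_repeat(const int *diff, size_t n, size_t start, size_t len)
{
    assert(len >= 2 && start + len <= n);
    /* A sequence of len numbers corresponds to len - 1 differences.
       Starting the search at start + len keeps the sequences disjoint. */
    for (size_t j = start + len; j + len <= n; j++) {
        if (memcmp(diff + start, diff + j, (len - 1) * sizeof *diff) == 0)
            return (long)j;
    }
    return -1;
}
```

For the 20 numbers above, find_repeat(diff, 20, 0, 3) finds the repeat of (1,4,3) at index 7, which is (3,6,5). For (1,2,3,4,5) it returns -1, because the only matching windows overlap.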

Related

How to check n-puzzle with two blanks is Solvable?

I have modified the question on n-puzzle. In this scenario, the puzzle has two blanks instead of one blank.
Initial State
3 5 1
4 6 -
7 2 -
Goal State
- 1 7
3 2 -
5 6 4
Is there any algorithm that I can use for this?

All existing algorithms that solve the regular sliding tile puzzle (such as A* or IDA*) can solve this variant as well. The puzzle with multiple blanks is equivalent to a pattern database for the sliding-tile puzzle - the exact solution to the puzzle with some pieces replaced with blanks can be used as a heuristic for the original puzzle with only a single blank.
(To be precise they are equivalent to additive pattern databases. You can combine several together and add their heuristic values as long as the action cost of swapping two blanks is 0 and none of the tiles are duplicated.)
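A* and IDA* need a heuristic. One simple candidate, given here only as a sketch, is the sum of Manhattan distances of the non-blank tiles. It assumes that every move slides exactly one tile into an adjacent blank, so each move reduces the sum by at most 1. The encoding of blanks as 0 and the function name are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

#define N 3
#define BLANK 0   /* assumption: both blanks are encoded as 0 */

/*
 * Sum of Manhattan distances of all non-blank tiles from their goal cells.
 * Assumes state and goal contain the same set of distinct tiles in 1..N*N.
 */
int manhattan_ignoring_blanks(int state[N][N], int goal[N][N])
{
    int goal_row[N * N + 1], goal_col[N * N + 1];

    /* Record where each tile belongs in the goal state. */
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            int t = goal[r][c];
            if (t != BLANK) {
                assert(t > 0 && t <= N * N);
                goal_row[t] = r;
                goal_col[t] = c;
            }
        }
    }

    /* Add up how far each tile currently is from its goal cell. */
    int h = 0;
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            int t = state[r][c];
            if (t != BLANK)
                h += abs(r - goal_row[t]) + abs(c - goal_col[t]);
        }
    }
    return h;
}
```

For the initial and goal states in the question this gives 14, so under the stated move model no solution can be shorter than 14 moves.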

Best approach for finding the maximum array element in a given range

Given a non-negative integer array of length n and m queries, each consisting of two integers a and b, the task is to find the maximum in the index range [a,b] of the array. Note that a can be greater than b, in which case the desired range runs from a to n and then from 1 to b. An input k is also given, which signifies that the length of the range to be considered is constant and equal to k.
Example:
INPUT:
6 3 5 ---> n,m,k
7 6 2 6 1 5 ---> integer array
1 5 ---> query 1
2 6 ---> query 2
4 2 ---> query 3
OUTPUT:
7
6
7
I referred to this article but cannot work out how to handle the cases where a>b. Is there any other approach for this problem?

Sliding window approach:
To solve the problem using the approach you mentioned, i.e. Sliding Window Maximum, just append the input array to itself as shown below:
7 6 2 6 1 5 7 6 2 6 1 5
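For the a<=b case, work as normal.
For the a>b case: set b = a + k - 1. Your new range is then [a,a+k-1] in the extended array (the range is inclusive, so it holds exactly k elements), which you can solve without any changes to the algorithm.
To optimize the above approach a bit, you can append just the first k-1 elements, since a wrapped window never reaches further than that.
If you slide over the range every time a query arrives, each query costs O(k), which approaches O(n) in the worst case when k is close or equal to n.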
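Because k is the same for every query, the window maxima can also be computed once for every start position, after which each query is a lookup. The sketch below assumes this precomputation is acceptable. It uses the usual monotonic deque of indices and indexes the array modulo n instead of physically appending elements. The name circular_window_max is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * out[s] = maximum of the k elements starting at 0-based index s of a[0..n-1],
 * wrapping around the end of the array. out must have room for n values.
 * Returns 0 on success, -1 if memory allocation fails.
 */
int circular_window_max(const int *a, int n, int k, int *out)
{
    assert(n > 0 && k >= 1 && k <= n);
    int total = n + k - 1;   /* the array plus its first k-1 elements */
    int *dq = malloc((size_t)total * sizeof *dq);
    if (!dq)
        return -1;

    int head = 0, tail = 0;  /* the deque occupies dq[head..tail-1] */
    for (int i = 0; i < total; i++) {
        /* Drop indices whose values can no longer be a window maximum. */
        while (tail > head && a[dq[tail - 1] % n] <= a[i % n])
            tail--;
        dq[tail++] = i;
        /* Drop the front index if it left the window [i-k+1, i]. */
        if (dq[head] <= i - k)
            head++;
        /* The front of the deque is the maximum of the current window. */
        if (i >= k - 1)
            out[i - k + 1] = a[dq[head] % n];
    }
    free(dq);
    return 0;
}
```

With this table, a 1-based query (a, b) is answered by out[a-1] regardless of whether a<=b or a>b. For the example input, out[0], out[1] and out[3] are 7, 6 and 7.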
Alternative Approach: Use the following approach in case of heavy querying and flexible ranges.
You are looking for range queries and this is what Segment Trees are popular for.
This tutorial finds the minimum in given range. I know you have asked for maximum, which is just a trivial change you have to make in code.
For the a>b case, query twice, once for [1,b] and once for [a,n], and report the larger of the two results.
Preprocessing time: O(n)
Extra Space: O(n)
This approach is very efficient, as it answers every query in O(log n), which helps a lot when you have many queries.
The Sliding Window approach outputs the maximum element of every range, but you need the maximum only in the given ranges. So instead of the Sliding Window approach, go with Segment Trees or Binary Indexed Trees. You'll feel the fun of truly querying within a range and not sliding over. (Just sliding over every time a query arrives won't scale if the range is flexible.)
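A minimal sketch of the segment tree variant in C follows. It uses a compact bottom-up layout, which is one of several common ways to store the tree. The type MaxTree and the function names are invented for this example. Building takes O(n) time and space, and each query takes O(log n).

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int n;    /* number of array elements */
    int *t;   /* t[n..2n-1] are the leaves; t[i] = max(t[2i], t[2i+1]) */
} MaxTree;

/* Build the tree in O(n). Returns 0 on success, -1 on allocation failure. */
int maxtree_init(MaxTree *st, const int *a, int n)
{
    assert(n > 0);
    st->n = n;
    st->t = malloc(2 * (size_t)n * sizeof *st->t);
    if (!st->t)
        return -1;
    for (int i = 0; i < n; i++)
        st->t[n + i] = a[i];
    for (int i = n - 1; i >= 1; i--)
        st->t[i] = st->t[2 * i] > st->t[2 * i + 1] ? st->t[2 * i]
                                                   : st->t[2 * i + 1];
    return 0;
}

/* Maximum over the 0-based inclusive range [l, r] with l <= r. */
static int maxtree_range(const MaxTree *st, int l, int r)
{
    int best = st->t[st->n + l];
    for (l += st->n, r += st->n + 1; l < r; l >>= 1, r >>= 1) {
        if (l & 1) { if (st->t[l] > best) best = st->t[l]; l++; }
        if (r & 1) { r--; if (st->t[r] > best) best = st->t[r]; }
    }
    return best;
}

/* 1-based inclusive query. If a > b the range wraps: [a, n] then [1, b]. */
int maxtree_query(const MaxTree *st, int a, int b)
{
    assert(a >= 1 && a <= st->n && b >= 1 && b <= st->n);
    if (a <= b)
        return maxtree_range(st, a - 1, b - 1);
    int hi = maxtree_range(st, a - 1, st->n - 1);
    int lo = maxtree_range(st, 0, b - 1);
    return hi > lo ? hi : lo;
}
```

On the example array, the queries (1,5), (2,6) and (4,2) return 7, 6 and 7. The caller releases the tree with free(st.t) when done.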

I think this could be done using a divide and conquer approach, so let's take a look at the above example.
For the case a>b:
find the max for range (1,b), say max_b = max_in_range(1,b).
find the max for range (a,n), say max_a = max_in_range(a,n).
Now you can easily take the max of the two numbers using the built-in max function of any language:
ans = max(max_a, max_b)
Problems like this that involve ranges can also be solved using segment trees; here is a link to start with - https://en.wikipedia.org/wiki/Segment_tree
Hope this helps!

Algorithm so that i can index 2^n combinations in a way so i can backtrack from any index value of 1 to 2^n without using an array

I am trying to do something that is outside my field. To explain, let's set n=3 to simplify things, where n is the total number of parameters, in this example A, B, C. Each parameter can be either ON or OFF (i.e. 0 or 1).
The total number of combinations of these parameters is 2^n = 8 in this case which can be visualized as:
ABC
1: 000
2: 111
3: 100
4: 010
5: 001
6: 110
7: 011
8: 101
Of course the above list can be sorted in (2^n)! = 40320 ways.
I want an algorithm that lets me calculate the state of any of my parameters (0 or 1) given a number from 1 to 2^n. For example, if I have the number 3, then using the table above I know that the state of A is 1 and that B and C are 0. Of course you can have a table/array to look it up given a specific ordering, but even for relatively small values of n you need a huge table.
I'm not familiar with this topic or with the methods available for indexing, which is why I need help.
Kind regards

Just realised you can actually look at it another way. What you want is a function encrypting N bits to another set of N bits. In practice this is the same as format preserving encryption. The question is, do you care whether:
all 2^n cases are covered, or just a large enough number close to 2^n (you have to choose the right encryption/hash method)
you want to do this one way or both ways (that is, do you ever want to ask - I have this number corresponding to that number, which permutation am I using)
If the answer is no to both, you can just find an FPE algorithm that doesn't require you to generate the whole table (some do).

I have seen another problem of finding all subsets of a given set using bitmask. You can use the same concept in your case. This link contains a good tutorial.
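As a minimal sketch of the bitmask idea in C: if the combinations are listed in plain binary counting order (an assumption, since the table in the question uses a different order), the state of each parameter is just one bit of index - 1, and no table is needed. The function name and the convention that parameter 0 is the leftmost one (A) are choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * State (0 or 1) of parameter `param` for the 1-based combination `index`,
 * assuming binary counting order: index 1 is 00...0, index 2^n is 11...1.
 * Parameter 0 is the leftmost (most significant) of the n parameters.
 */
int param_state(uint64_t index, unsigned param, unsigned n)
{
    assert(n >= 1 && n <= 63 && param < n);
    assert(index >= 1 && index <= (UINT64_C(1) << n));
    return (int)(((index - 1) >> (n - 1 - param)) & 1u);
}
```

With n=3, index 5 corresponds to 100, so param_state(5, 0, 3) is 1 while param_state(5, 1, 3) and param_state(5, 2, 3) are 0. Going the other way, the index of a combination is its bits read as a binary number, plus 1.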

Heuristic for shifting array

Given a goal state
int final[3][3]={{1,2,3},
{4,5,6},
{7,8,9}};
and a random initial state, I want to transform my array into final using only row shifts (right or left) and column shifts (up or down).
For example, shifting the first row of
7 8 4
2 1 9
6 5 3
to the right gives
4 7 8
2 1 9
6 5 3
So I want to use A* search and I'm trying to find a good heuristic.
I've already tried the number of misplaced array elements.
Any suggestions?

I view this as an algebraic problem. You are given a group of permutations generated by 6 cycles (3 rows and 3 columns), and you want to find some more moves which help you get to any permutation.
First advice: not all permutations are possible! Since every shift is an even permutation (a 3-cycle is the composition of two transpositions), only even permutations are possible. Hence you will not find any solution to a configuration where everything is in place except for two swapped numbers, as in (2,1,3),(4,5,6),(7,8,9).
Second advice. If r is a row shift and c is a column shift, compute the action of rcr'c', where r' and c' are the inverse shifts. This "commutator" is again a cycle of 3 elements, but this time they are not in a row or column. By choosing different r and c you get a lot of 3-cycles which can be used in the third advice.
Third advice. Consider the region of numbers which are already in their final position. Apply 3-cycles to the complement of this set to reduce it, until you get to a solution.
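The commutator from the second advice can be checked with a few lines of C. This is only a sketch: the helper names are made up, and the order in which the four shifts are applied (row right, column down, row left, column up) is one possible reading of rcr'c'.

```c
#include <assert.h>

#define N 3

/* Shift row r cyclically by one cell: dir = +1 is right, -1 is left. */
void shift_row(int g[N][N], int r, int dir)
{
    int tmp[N];
    for (int c = 0; c < N; c++)
        tmp[(c + dir + N) % N] = g[r][c];
    for (int c = 0; c < N; c++)
        g[r][c] = tmp[c];
}

/* Shift column c cyclically by one cell: dir = +1 is down, -1 is up. */
void shift_col(int g[N][N], int c, int dir)
{
    int tmp[N];
    for (int r = 0; r < N; r++)
        tmp[(r + dir + N) % N] = g[r][c];
    for (int r = 0; r < N; r++)
        g[r][c] = tmp[r];
}

/* Apply the four shifts: row right, column down, row left, column up. */
void commutator(int g[N][N], int row, int col)
{
    assert(row >= 0 && row < N && col >= 0 && col < N);
    shift_row(g, row, +1);
    shift_col(g, col, +1);
    shift_row(g, row, -1);
    shift_col(g, col, -1);
}

/* Number of cells in which the two grids differ. */
int count_differences(int a[N][N], int b[N][N])
{
    int count = 0;
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            if (a[r][c] != b[r][c])
                count++;
    return count;
}
```

Applied to the goal state with row 0 and column 0, the grid becomes 3 2 7 / 4 5 6 / 1 8 9: only 1, 3 and 7 have moved, and they do not share a single row or column.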

FFT and convolution

I'm writing a 2D FFT for school, to be used for image filtering.
I have a problem with the filter matrix.
I wrote my FFT so that it accepts inputs of size 2^n, but all filter matrices have odd dimensions.
So I need some way to transform the filter matrix into an acceptable input for my function.
I have the following idea, and I'm not sure whether it will work.
If I have filter matrix:
1 2 3
4 5 6
7 8 9
transform it to:
0 0 0 0
1 2 3 0
4 5 6 0
7 8 9 0
Then, when I'm matching the "center" of the matrix with my pixel, I would match the center of the "submatrix" and afterwards extract the values I need.
Is that possible?
Also, can someone tell me the maximum filter size I can use? Can it be larger than, let's say, 32x32?

Filter masks are used to express filters with compact support. Compact support means that the signal has non-zero values only in a limited range. By extending your filter mask with zero values, you are in fact doing a natural thing. The zeros are part of the original filter.
The real problem however is a different thing. I assume that you use FFT according to the convolution theorem. For that, you need element-wise multiplication. You can only do element-wise multiplication when both your filter and your signal have the same number of elements. So you would need to extend your filter to the signal size (using zeros).
There is no limit on filter mask size. For convolution the only restriction is compact support (as explained above).
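Extending the filter with zeros to the signal size can be sketched in C as follows. The function name, the row-major layout and the choice to place the filter in the top-left corner are assumptions for this example. A different placement only shifts the filtered output cyclically, so the filter center has to be accounted for when reading the result.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy a kh x kw filter into the top-left corner of an h x w buffer and fill
 * the rest with zeros. Both arrays are stored row by row (row-major).
 */
void zero_pad_filter(const double *kernel, int kh, int kw,
                     double *padded, int h, int w)
{
    assert(kh >= 1 && kw >= 1 && kh <= h && kw <= w);

    /* Start from an all-zero buffer of the signal size. */
    for (size_t i = 0; i < (size_t)h * (size_t)w; i++)
        padded[i] = 0.0;

    /* Copy the filter coefficients into the top-left corner. */
    for (int r = 0; r < kh; r++)
        for (int c = 0; c < kw; c++)
            padded[(size_t)r * w + c] = kernel[(size_t)r * kw + c];
}
```

Here h and w would be the (power-of-two) dimensions of the image passed to the FFT, so that the transformed filter and the transformed image can be multiplied element by element.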