Say I have two multidimensional arrays of equal depth, say:
[ [1, 2, 3],
[4, 5, 6],
[7, 8, 9] ]
and
[ [2, 3],
[5, 6] ]
What sort of algorithm can I follow to determine if the latter is a contiguous subarray of the former?
With the above pair of arrays, for example, it is.
And also with this pair of 3d arrays:
[ [ [4, 6],
[5, 7] ],
[ [2, 8],
[9, 3] ] ]
[ [ [4, 6] ],
[ [2, 8] ] ]
Another way of interpreting this is that by removing the first or last item from a dimension of the first array repeatedly, you will eventually get the target array.
The Rabin-Karp string search algorithm can be extended to multiple dimensions to solve this problem.
Let's say your pattern array is M rows by N columns:
Using any rolling hash function, like a polynomial hash, first replace every column of your pattern array with the hash of the column, reducing it to 1 dimension. Then hash the remaining row. This will be your pattern hash.
Now use the rolling hash in your target array to replace all values in rows >= M-1 (counting from 0) with the hash of those values and the M-1 values above them.
Then, similarly, replace all remaining values in columns >= N-1 with the hash of those values and the N-1 values to the left.
Finally, find any instances of the pattern hash in the resulting matrix. When you find one, compare with your pattern array to see if it's a real match.
This algorithm extends to as many dimensions as you like and, like simple Rabin-Karp, it takes expected time linear in the total number of elements of the target array if the number of dimensions is constant.
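A minimal 2D sketch of the steps above in Python, assuming rectangular, non-empty nested lists with integer entries. The names `MOD`, `BASE_COL`, `BASE_ROW`, `window_hashes`, `hash_grid` and `find_subarray` are illustrative choices, not part of any library, and the hash constants are arbitrary.

```python
MOD = (1 << 61) - 1        # assumed large prime modulus
BASE_COL = 998_244_353     # assumed base for hashing down a column
BASE_ROW = 1_000_003       # assumed base for hashing along a row


def window_hashes(values, window, base):
    """Polynomial rolling hash of every length-`window` slice of `values`."""
    if window > len(values):
        return []
    top = pow(base, window - 1, MOD)  # weight of the value leaving the window
    h = 0
    for v in values[:window]:
        h = (h * base + v) % MOD
    hashes = [h]
    for i in range(window, len(values)):
        # Drop the oldest value, shift, then append the newest value.
        h = ((h - values[i - window] * top) * base + values[i]) % MOD
        hashes.append(h)
    return hashes


def hash_grid(grid, rows, cols):
    """Table whose [i][j] entry hashes the rows x cols block starting at (i, j)."""
    # Step 1: roll down each column, collapsing `rows` stacked values into one hash.
    columns = [window_hashes(list(col), rows, BASE_COL) for col in zip(*grid)]
    # columns[j][i] covers grid[i..i+rows-1][j]; transpose back to row-major order.
    collapsed = [list(r) for r in zip(*columns)]
    # Step 2: roll along each collapsed row, combining `cols` column hashes.
    return [window_hashes(r, cols, BASE_ROW) for r in collapsed]


def find_subarray(hay, needle):
    """Return (row, col) of the first occurrence of needle in hay, or None."""
    m, n = len(needle), len(needle[0])
    if m > len(hay) or n > len(hay[0]):
        return None
    target = hash_grid(needle, m, n)[0][0]
    for i, row in enumerate(hash_grid(hay, m, n)):
        for j, h in enumerate(row):
            # A hash hit may be a collision, so confirm with a direct comparison.
            if h == target and all(hay[i + k][j:j + n] == needle[k] for k in range(m)):
                return (i, j)
    return None


hay = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(find_subarray(hay, [[2, 3], [5, 6]]))  # (0, 1)
```

The same two-pass idea would repeat once per extra axis for higher-dimensional arrays.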
The simple, naive approach is to look for a cell that matches the needle's (0,0) element and then compare the sub-array starting at that cell.
Example: (Python)
hay = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
needle = [[2, 3],
          [5, 6]]

def get_sub_array(array, i, j, width, height):
    sub_array = []
    for n in range(i, i + height):
        sub_array.append(array[n][j:j + width])
    return sub_array

def compare(arr1, arr2):
    for i in range(len(arr1)):
        for j in range(len(arr1[0])):
            if arr1[i][j] != arr2[i][j]:
                return False
    return True

def is_sub_array(hay, needle):
    hay_width = len(hay[0])
    hay_height = len(hay)
    needle_width = len(needle[0])
    needle_height = len(needle)
    for i in range(hay_height - needle_height + 1):
        for j in range(hay_width - needle_width + 1):
            if hay[i][j] == needle[0][0]:
                if compare(
                    get_sub_array(hay, i, j, needle_width, needle_height),
                    needle
                ):
                    return True
    return False

print(is_sub_array(hay, needle))
Output:
True
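The example above handles two dimensions only. A hedged sketch of the same naive idea for nested lists of any equal depth could look like this; `shape`, `get_item` and `is_sub_array_nd` are hypothetical helper names, and the inputs are assumed to be rectangular (every sub-list at a given depth has the same length).

```python
from itertools import product


def shape(arr):
    """Shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(arr, list):
        dims.append(len(arr))
        arr = arr[0] if arr else None
    return tuple(dims)


def get_item(arr, index):
    """Follow a tuple of indices into a nested list."""
    for i in index:
        arr = arr[i]
    return arr


def is_sub_array_nd(hay, needle):
    hay_shape, needle_shape = shape(hay), shape(needle)
    if len(hay_shape) != len(needle_shape):
        return False
    if any(n > h for h, n in zip(hay_shape, needle_shape)):
        return False
    # Every offset at which the needle still fits inside the haystack.
    offsets = product(*(range(h - n + 1) for h, n in zip(hay_shape, needle_shape)))
    # Every cell position inside the needle.
    cells = list(product(*(range(n) for n in needle_shape)))
    for offset in offsets:
        if all(
            get_item(hay, tuple(o + c for o, c in zip(offset, cell)))
            == get_item(needle, cell)
            for cell in cells
        ):
            return True
    return False


hay3 = [[[4, 6], [5, 7]], [[2, 8], [9, 3]]]
needle3 = [[[4, 6]], [[2, 8]]]
print(is_sub_array_nd(hay3, needle3))  # True
```

This compares every cell at every offset, so it is only practical for small arrays.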
I have the following 2 arrays:
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [7, 5, 6, 3],
                [2, 4, 8, 9]])
ids = np.array([6, 5])
Each row in the array arr describes a 4-digit id, there are no redundant ids - neither in their values nor their combination. So if [1, 2, 3, 4] exists, no other combination of these 4 digits can exist. This will be important in a sec.
The array ids contains a 2-digit id, but the order of its digits is random. I need to go through each row of arr and check whether this 2-digit partial id is part of any 4-digit id. In this example, ids matches the 2nd and 3rd rows from the top of arr.
My current solution with np.isin only works if the array ids has also a 4-digit row.
arr[np.isin(arr, ids).all(1)]
Changing all(1) to any(1) doesn't do the trick either, because then it would be enough for just one digit of ids to appear in a row of arr, whereas I need both values.
Does anyone have a compact solution?
You just need the boolean index to accept only the rows whose number of matches is 2. In non-boolean operations such as sum, the True and False values of a boolean array are interpreted as 1 and 0.
arr[np.isin(arr, ids).sum(1) == 2]
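As a hedged, step-by-step illustration of the line above, using the arrays from the question: the sketch compares the count against len(ids) rather than a literal 2, and it assumes that no digit repeats within a row of arr or within ids, since repeats would inflate the count. The names `mask`, `hits` and `matches` are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [7, 5, 6, 3],
                [2, 4, 8, 9]])
ids = np.array([6, 5])

# True wherever a digit of arr is one of the digits in ids.
mask = np.isin(arr, ids)

# Summing booleans counts the True entries per row (True -> 1, False -> 0).
hits = mask.sum(axis=1) == len(ids)

matches = arr[hits]
print(matches)
```

With these inputs, `hits` selects the 2nd and 3rd rows, as described in the question.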
I have come across a line of code that works for what I am doing, but I do not understand it. I would like someone to explain what it means.
b=(3,1,2,1)
a=2
q=np.zeros(b+(a,))
I would like to know why the length of q is always the first entry of b.
For example, len(q)=3 here,
and if b=(1,2,4,3) then len(q)=1.
This is really confusing, as I thought that the function 'len' returns the number of columns of a given array. Also, how do I get the number of rows of q? So far the only attributes I have found are len(q), q.size (which gives the total number of elements in q) and q.shape (whose output I also do not quite understand, because in the latter case q.shape=(b,a)=(1,2,4,3,2)).
Is there a function that could return the size of the array in terms of the number of columns and rows, for example 24x2?
Thank you in advance.
A plain Python list has only one dimension, which is why len(array) returns a single number. For a NumPy array, len() likewise returns only the size of the first axis.
Assuming that you have a 'matrix' in form of array of arrays, like this:
1 2 3
4 5 6
7 8 9
declared like
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
you can determine the 'number of columns and rows' by the following commands:
rows = len(mat)
columns = len(mat[0])
Note that this only works if the number of elements in each row is constant.
If you are using numpy to make the arrays, another way to get the number of rows and columns is to use the tuple returned by the np.shape() function. Here is a complete example:
import numpy as np
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rownum = np.shape(mat)[0]
colnum = np.shape(mat)[1]
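For the five-dimensional array from the question, a short sketch of what each attribute reports may help. It uses b=(1,2,4,3) and a=2 as in the question; reading "24x2" as "flatten every axis except the last" is an assumption about what the asker wants, and `flat` is just an illustrative name.

```python
import numpy as np

b = (1, 2, 4, 3)
a = 2
q = np.zeros(b + (a,))   # b + (a,) concatenates tuples: (1, 2, 4, 3, 2)

print(q.shape)   # size of every axis
print(len(q))    # size of the first axis only, i.e. q.shape[0]
print(q.ndim)    # number of axes
print(q.size)    # total number of elements

# Collapse all leading axes into one, keeping the last axis of length a.
flat = q.reshape(-1, a)
print(flat.shape)
```

Here -1 asks reshape to infer the first dimension from the total size, which gives a 24 by 2 view of the same 48 elements.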
I'm not looking for any code or having anything being done for me. I need some help to get started in the right direction but do not know how to go about it. If someone could provide some resources on how to go about solving these problems I would very much appreciate it. I've sat with my notebook and am having trouble designing an algorithm that can do what I'm trying to do.
I can probably do:
foreach element in array1
    foreach element in array2
        check if array1[i] == array2[j]+x
I believe this would work for both forward and backward sequences, and for the multiples just check array1[i] % array2[j] == 0. I have a list which contains int arrays and am getting list[index] (for array1) and list[index+1] for array2, but this solution can get complex and lengthy fast, especially with large arrays and a large list of those arrays. Thus, I'm searching for a better solution.
I'm trying to come up with an algorithm for finding sequential numbers in different arrays.
For example:
[1, 5, 7] and [9, 2, 11] would find that 1 and 2 are sequential.
This should also work for multiple sequences in multiple arrays. So if there is a third array of [24, 3, 15], it will also include 3 in that sequence, and continue on to the next array until there isn't a number that matches the last sequential element + 1.
It also should be able to find more than one sequence between arrays.
For example:
[1, 5, 7] and [6, 3, 8] would find that 5 and 6 are sequential and also 7 and 8 are sequential.
I'm also interested in finding reverse sequences.
For example:
[1, 5, 7] and [9, 4, 11] would return that 5 and 4 are reverse sequential.
Example with all:
[1, 5, 8, 11] and [2, 6, 7, 10] would return 1 and 2 are sequential, 5 and 6 are sequential, 8 and 7 are reverse sequential, 11 and 10 are reverse sequential.
It can also overlap:
[1, 5, 7, 9] and [2, 6, 11, 13] would return 1 and 2 sequential, 5 and 6 sequential and also 7 and 6 reverse sequential.
I also want to expand this to check numbers with a difference of x (above examples check with a difference of 1).
In addition to all of that (although this might be a different question), I also want to check for multiples,
Example:
[5, 7, 9] and [10, 27, 8] would return 5 and 10 as multiples, 9 and 27 as multiples.
and numbers with the same ones place.
Example:
[3, 5, 7] and [13, 23, 25] would return 3 and 13 and 23 have the same ones digit.
Use a dictionary (set or hashmap)
dictionary1 = {}
Go through each item in the first array and add it to the dictionary.
[1, 5, 7]
Now dictionary1 = {1:true, 5:true, 7:true}
dictionary2 = {}
Now go through each item in [6, 3, 8] and lookup if it's part of a sequence.
6 is part of a sequence because dictionary1[6+1] == true
so dictionary2[6] = true
We get dictionary2 = {6:true, 8:true}
Now set dictionary1 = dictionary2 and dictionary2 = {}, and go to the third array.. and so on.
We only keep track of sequences.
Since each lookup is O(1), and we do 2 lookups per number (e.g. 6-1 and 6+1), the total is N*O(1), which is O(N), where N is the total count of numbers across all the arrays.
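A minimal Python sketch of this set-based idea, using a set in place of the dictionary of true values. The function names `sequential_pairs` and `chain_survivors` and the `step` parameter (the difference x from the question) are hypothetical. Like the description above, `chain_survivors` treats forward and reverse neighbours alike.

```python
def sequential_pairs(first, second, step=1):
    """Pairs (a, b) with a in `first`, b in `second` and b == a + step or a - step."""
    seen = set(first)
    forward, reverse = [], []
    for value in second:
        if value - step in seen:      # e.g. 5 then 6
            forward.append((value - step, value))
        if value + step in seen:      # e.g. 7 then 6
            reverse.append((value + step, value))
    return forward, reverse


def chain_survivors(arrays, step=1):
    """Values of the last array that continue a chain through every array."""
    current = set(arrays[0])
    for arr in arrays[1:]:
        # Keep only the values that extend something kept from the previous array.
        current = {v for v in arr if v - step in current or v + step in current}
    return current


print(sequential_pairs([1, 5, 7], [6, 3, 8]))   # ([(5, 6), (7, 8)], [(7, 6)])
print(chain_survivors([[1, 5, 7], [6, 3, 8]]))  # {6, 8}
```

Each value costs two set lookups, so the work grows linearly with the total count of numbers.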
The brute force approach outlined in your pseudocode will be O(c^n) (exponential), where c is the average number of elements per array and n is the number of total arrays.
If the input space is sparse (meaning there will be more missing numbers on average than present numbers), then one way to speed up this process is to first create a single sorted set of all the unique numbers from all your different arrays. This "master" set will then allow you to exit early (i.e. break statements in your loops) on any sequences which are not viable.
For example, if we have input arrays [1, 5, 7] and [6, 3, 8] and [9, 11, 2], the master ordered set would be {1, 2, 3, 5, 6, 7, 8, 9, 11}. If we are looking for n+1 type sequences, we could stop checking any sequence as soon as it reaches a 3 or 9 or 11 (because the n+1 value is not present at the next index in the sorted set). While the speedups are not drastic in this particular example, if you have hundreds of input arrays and a very large range of values for n (sparsity), then the speedups should be exponential because you will be able to exit early on many permutations. If the input space is not sparse (such as in this example where we didn't have many holes), the speedups will be less than exponential.
A further improvement would be to store a "master" set of key-value pairs, where the key is the n value as shown in the example above, and the value portion of the pair is a list of the indices of any arrays that contain that value. The master set of the previous example would then be: {[1, 0], [2, 2], [3, 1], [5, 0], [6, 1], [7, 0], [8, 1], [9, 2], [11, 2]}. With this architecture, scan time could potentially be as low as O(c*n), because you could just traverse this single sorted master set looking for valid sequences instead of looping over all the sub-arrays. By also requiring the array indexes to increment, you can clearly see that the 1->2 sequence can be skipped because the arrays are not in the correct order, and the same with the 2->3 sequence, etc. Note this toy example is somewhat oversimplified because in practice you would need a list of indices for the value portions of the key-value pairs. This would be necessary if the same value of n ever appeared in multiple arrays (duplicate values).
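One possible way to sketch this key-value master structure in Python follows. It stores a list of array indices per value, as the closing note suggests, and only looks for n+1 sequences between directly consecutive arrays. `build_master` and `forward_pairs` are hypothetical names, and the consecutive-array rule is an assumption about what "requiring the array indexes to increment" means.

```python
from collections import defaultdict


def build_master(arrays):
    """Map each value to the indices of the arrays containing it, sorted by value."""
    master = defaultdict(list)
    for index, arr in enumerate(arrays):
        for value in arr:
            master[value].append(index)
    return dict(sorted(master.items()))


def forward_pairs(arrays, step=1):
    """Pairs (n, n + step) where n + step sits in the array right after n's array."""
    master = build_master(arrays)
    pairs = []
    for value, owners in master.items():
        next_owners = master.get(value + step, [])
        # The pair is valid only if the array indices increase by exactly one.
        if any(i + 1 in next_owners for i in owners):
            pairs.append((value, value + step))
    return pairs


arrays = [[1, 5, 7], [6, 3, 8], [9, 11, 2]]
print(build_master(arrays))
print(forward_pairs(arrays))  # [(5, 6), (7, 8), (8, 9)]
```

As in the text, 1->2 and 2->3 are rejected because their arrays are not in increasing order.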
Let's say I have an array with 5 elements. How can I calculate all possible repetitive permutations of this array in C?
Edit: What I mean is creating all possible arrays by using those 5 numbers. So the position matters.
Example:
array = [1,2,3,4,5]
[1,1,1,1,1]
[1,1,1,1,2]
[1,1,1,2,3]
.
.
A common way to generate combinations or permutations is to use recursion: enumerate each of the possibilities for the first element, and prepend those to each of the combinations or permutations for the same set reduced by one element. So, if we say that you're looking for the number of permutations of n things taken k at a time and we use the notation perms(n, k), you get:
perms(5,5) = {
[1, perms(5,4)]
[2, perms(5,4)]
[3, perms(5,4)]
[4, perms(5,4)]
[5, perms(5,4)]
}
Likewise, for perms(5,4) you get:
perms(5,4) = {
[1, perms(5,3)]
[2, perms(5,3)]
[3, perms(5,3)]
[4, perms(5,3)]
[5, perms(5,3)]
}
So part of perms(5,5) looks like:
[1, 1, perms(5,3)]
[1, 2, perms(5,3)]
[1, 3, perms(5,3)]
[1, 4, perms(5,3)]
[1, 5, perms(5,3)]
[2, 1, perms(5,3)]
[2, 2, perms(5,3)]
...
Defining perms(n, k) is easy. As for any recursive definition, you need two things: a base case and a recursion step. The base case is where k = 0: perms(n, 0) contains a single permutation, the empty array []. For the recursive step, you generate elements by prepending each of the possible values in your set to all of the elements of perms(n, k-1).
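The question asks for C, but the recursion is easiest to show compactly in Python, so here is a minimal sketch under that assumption. The function takes the list of values instead of the count n, and the base case returns a list holding one empty arrangement so that the prepend step has something to extend.

```python
def perms(items, k):
    """All length-k arrangements of `items`, with repetition allowed."""
    if k == 0:
        return [[]]  # base case: exactly one arrangement, the empty one
    shorter = perms(items, k - 1)
    # Recursive step: prepend every possible first value to every shorter arrangement.
    return [[first] + rest for first in items for rest in shorter]


result = perms([1, 2, 3, 4, 5], 5)
print(len(result))   # 3125
print(result[:3])    # [[1, 1, 1, 1, 1], [1, 1, 1, 1, 2], [1, 1, 1, 1, 3]]
```

The result list holds all 5^5 arrangements at once, so a generator may be preferable for larger inputs.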
If I get your question correctly, you need to generate all 5 digit numbers with digits 1,2,3,4 and 5. So there is a simple solution - generate all numbers base five up to 44444 and then map the 0 to 1, 1 to 2 and so on. Add leading zeros where needed - so 10 becomes 00010 or [1,1,1,2,1].
NOTE: you don't actually have to generate the numbers themselves; you can just iterate over the integers from 0 up to 5**5 (exclusive) and, for each of them, find the corresponding sequence by taking its digits in base 5.
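A hedged Python sketch of this base-5 idea: each integer below 5**5 is split into base-5 digits, and each digit is used as an index into the array. `nth_arrangement` and `all_arrangements` are illustrative names only.

```python
def nth_arrangement(items, index, length):
    """Arrangement whose base-len(items) digits spell out `index`."""
    base = len(items)
    digits = []
    for _ in range(length):
        index, digit = divmod(index, base)  # peel off the least significant digit
        digits.append(digit)
    digits.reverse()                        # most significant digit first
    return [items[d] for d in digits]       # map digit 0 -> items[0], and so on


def all_arrangements(items, length):
    """Yield every arrangement by counting from 0 to base**length - 1."""
    for index in range(len(items) ** length):
        yield nth_arrangement(items, index, length)


print(nth_arrangement([1, 2, 3, 4, 5], 5, 5))  # [1, 1, 1, 2, 1]
```

The printed case is the one from the text: 5 is 10 in base 5, padded to 00010, which maps to [1,1,1,2,1].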
int increment(size_t *dst, size_t len, size_t base) {
    if (len == 0) return 0;
    if (dst[len-1] != base-1) {
        ++dst[len-1];
        return 1;
    } else {
        dst[len-1] = 0;
        return increment(dst, len-1, base);
    }
}
Armed with this function you can iterate over all repetitive permutations of (0 ... 4) starting from {0, 0, 0, 0, 0}. The function will return 0 when it runs out of repetitive permutations.
Then for each repetitive permutation in turn, use the contents as indexes into your array so as to get a repetitive permutation of the array rather than of (0 ... 4).
In your given example, each position could be occupied by either 1, 2, 3, 4, 5. As there are 5 positions, the total number of possibilities = 5 * 5 * 5 * 5 * 5 = 5 ^ 5 = 3125. In general, it would be N ^ N. (where ^ is the exponentiation operator).
To generate these possibilities, in each of the positions, put the numbers 1, 2, 3, 4, 5, one by one, and increment starting from the last position, similar to a 5 digit counter.
Hence, start with 11111. Increment the last position to get 11112 ... until 11115.
Then wrap the last position back to 1 and increment the next digit to get 11121, and continue with 11122 ... 11125, etc. Repeat this until the carry reaches the first position, and you will end at 55555.
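The counter described above can be sketched in Python as follows. It keeps a list of 0-based indices rather than the digits 1 to 5 themselves, which is an implementation choice made here so that any array of values can be used. `increment` and `repetitive_permutations` are illustrative names.

```python
def increment(counter, base):
    """Advance the counter in place; return False once it has wrapped to all zeros."""
    for pos in range(len(counter) - 1, -1, -1):  # start from the last position
        if counter[pos] != base - 1:
            counter[pos] += 1
            return True
        counter[pos] = 0                         # wrap and carry to the left
    return False


def repetitive_permutations(items, length):
    """Yield every arrangement of `items` with repetition, in counter order."""
    counter = [0] * length
    while True:
        yield [items[i] for i in counter]
        if not increment(counter, len(items)):
            break


for n, p in enumerate(repetitive_permutations([1, 2, 3, 4, 5], 5)):
    if n < 6:
        print(p)
```

The first six lines printed run from 11111 to 11115 and then 11121, matching the wrap-around described above.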