Sort array based on frequency - arrays

How can I sort an array by its most repeated values?
Suppose I have an array [3, 3, 3, 3, 4, 4].
The expected result is [3, 4], since 3 is the most repeated value and 4 is the least repeated.
Is there any way to do it?
Thanks in advance!

Here is one way of doing it:
distinctList: holds all distinct values from the array.
countArray: for each index i in distinctList, countArray[i] holds the number of occurrences of distinctList[i].
Now sort countArray in descending order and apply the same swaps to distinctList simultaneously.
Ex: [3, 3, 4, 4, 4]
distinctList [3, 4]
countArray [2, 3]
Descending sort of countArray gives [3, 2]; applying the same swaps to distinctList gives [4, 3].
Output: [4, 3]
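A minimal Python sketch of the two-list approach described above. The function name sort_distinct_by_frequency is illustrative, and pairing each count with its value before sorting stands in for "applying the same swaps" to both lists; treat it as one possible reading of the steps, not the only one.

```python
def sort_distinct_by_frequency(values):
    """Return the distinct values ordered from most to least frequent."""
    distinct_list = []  # distinct values, in order of first appearance
    count_array = []    # count_array[i] = occurrences of distinct_list[i]
    for v in values:
        if v in distinct_list:
            count_array[distinct_list.index(v)] += 1
        else:
            distinct_list.append(v)
            count_array.append(1)
    # Sorting (count, value) pairs by count keeps the two lists in step.
    # sorted() is stable, so values with equal counts keep first-seen order.
    pairs = sorted(zip(count_array, distinct_list),
                   key=lambda pair: pair[0], reverse=True)
    return [value for _, value in pairs]


print(sort_distinct_by_frequency([3, 3, 4, 4, 4]))  # [4, 3]
```

The list lookups make this quadratic in the number of distinct values; a dictionary of counts avoids that cost.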

Simple in Python:
data = [3, 2, 3, 4, 2, 1, 3]
frequencies = {x:0 for x in data}
for x in data:
    frequencies[x] = frequencies[x] + 1
sorted_with_repetitions = sorted(data, key=lambda x:frequencies[x],reverse=True)
sorted_without_repetitions = sorted(frequencies.keys(), key=lambda x:frequencies[x],reverse=True)
print(data)
print(sorted_with_repetitions)
print(sorted_without_repetitions)
print(frequencies)
The same approach (an associative container to collect distinct values and count occurrences, used in a custom comparison to sort an array with the original data or only distinct items) is suitable for Java.
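As a possible shortcut in Python, the counting and ordering can be delegated to collections.Counter from the standard library. This is a sketch of the same idea, not a replacement for the code above; the variable names are illustrative.

```python
from collections import Counter

data = [3, 2, 3, 4, 2, 1, 3]
counts = Counter(data)  # maps each distinct value to its number of occurrences

# most_common() lists (value, count) pairs from most to least frequent;
# values with equal counts keep the order in which they were first seen.
distinct_by_frequency = [value for value, _ in counts.most_common()]
print(distinct_by_frequency)  # [3, 2, 4, 1]
```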

Related

Algorithm to check if a multidimensional array contains another?

Say I have two multidimensional arrays of equal depth, say:
[ [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9] ]
and
[ [2, 3],
  [5, 6] ]
What sort of algorithm can I follow to determine if the latter is a contiguous subarray of the former?
With the example above, it is.
It also is with this pair of 3D arrays:
[ [ [4, 6],
    [5, 7] ],
  [ [2, 8],
    [9, 3] ] ]
and
[ [ [4, 6] ],
  [ [2, 8] ] ]
Another way of interpreting this is that by removing the first or last item from a dimension of the first array repeatedly, you will eventually get the target array.
The Rabin-Karp string search algorithm can be extended to multiple dimensions to solve this problem.
Let's say your pattern array is M rows by N columns:
Using any rolling hash function, such as a polynomial hash, first replace every column of your pattern array with the hash of that column, reducing the pattern to one dimension. Then hash the remaining row. This is your pattern hash.
Now use the rolling hash in your target array to replace every value in rows >= M-1 (0-indexed) with the hash of that value together with the M-1 values above it.
Then, similarly, replace every remaining value in columns >= N-1 with the hash of that value together with the N-1 values to its left.
Finally, find any instances of the pattern hash in the resulting matrix. When you find one, compare that position with your pattern array to confirm it's a real match and not a hash collision.
This algorithm extends to as many dimensions as you like and, like simple Rabin-Karp, it takes expected time linear in the size of the target array if the number of dimensions is constant.
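The steps above could be sketched in Python roughly as follows, for the 2D case only. The names MOD, BASE_COL, BASE_ROW, window_hashes and find_2d are illustrative, the modulus and bases are arbitrary choices, and the sketch assumes non-empty rectangular lists of integers.

```python
MOD = (1 << 61) - 1  # large prime modulus (arbitrary choice)
BASE_COL = 263       # base for hashing down a column (arbitrary choice)
BASE_ROW = 257       # base for hashing along a row (arbitrary choice)


def window_hashes(seq, width, base):
    """Polynomial rolling hash of every window of length `width` in seq."""
    if width > len(seq):
        return []
    top = pow(base, width - 1, MOD)  # weight of the element leaving the window
    h = 0
    for x in seq[:width]:
        h = (h * base + x) % MOD
    hashes = [h]
    for k in range(width, len(seq)):
        h = ((h - seq[k - width] * top) * base + seq[k]) % MOD
        hashes.append(h)
    return hashes


def find_2d(target, pattern):
    """Return (row, col) of the first occurrence of pattern in target, or None."""
    m, n = len(pattern), len(pattern[0])
    rows, cols = len(target), len(target[0])
    if m > rows or n > cols:
        return None
    # Pattern: hash each column to one value, then hash the resulting row.
    pattern_cols = [window_hashes([row[j] for row in pattern], m, BASE_COL)[0]
                    for j in range(n)]
    pattern_hash = window_hashes(pattern_cols, n, BASE_ROW)[0]
    # Target: col_hashes[j][i] covers rows i..i+m-1 of column j.
    col_hashes = [window_hashes([row[j] for row in target], m, BASE_COL)
                  for j in range(cols)]
    for i in range(rows - m + 1):
        row_of_hashes = [col_hashes[j][i] for j in range(cols)]
        for j, h in enumerate(window_hashes(row_of_hashes, n, BASE_ROW)):
            # Verify on a hash hit, since different blocks can collide.
            if h == pattern_hash and all(
                target[i + d][j:j + n] == pattern[d] for d in range(m)
            ):
                return (i, j)
    return None


print(find_2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[2, 3], [5, 6]]))  # (0, 1)
```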
The simple, naive approach is to look for a match of the needle's (0,0) element and then compare the subarray at that position.
Example (Python):
hay = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
needle = [[2, 3],
          [5, 6]]
def get_sub_array(array, i, j, width, height):
    sub_array = []
    for n in range(i, i + height):
        sub_array.append(array[n][j:j + width])
    return sub_array
def compare(arr1, arr2):
    for i in range(len(arr1)):
        for j in range(len(arr1[0])):
            if arr1[i][j] != arr2[i][j]:
                return False
    return True
def is_sub_array(hay, needle):
    hay_width = len(hay[0])
    hay_height = len(hay)
    needle_width = len(needle[0])
    needle_height = len(needle)
    for i in range(hay_height - needle_height + 1):
        for j in range(hay_width - needle_width + 1):
            if hay[i][j] == needle[0][0]:
                if compare(
                    get_sub_array(hay, i, j, needle_width, needle_height),
                    needle
                ):
                    return True
    return False
print(is_sub_array(hay, needle))
Output:
True

Loop to perform operation on i+1 in numpy array

I have a numpy array. I'd like to take the 3 numbers in each row, subtract the next row's numbers from them, and store those values in another array.
Something like:
for i in array:
    a = i - i+1
I know this is very wrong, but at least it gives the idea of what I want.
Obviously i+1 will just result in the value + 1, and then all I have is a = 1,1,1.
When I say i+1 I mean the next row in line.
So for example:
input = np.array([[4,4,5], [2,3,1],[1,2,0]])
output = np.array([[2,1,4],[1,1,1]]) etc....
What would be the best way to do this iteratively on thousands of rows?
IIUC, instead of looping, you can shift your array up by one row using np.roll, subtract that from your original input, and take all the resulting rows except the last (np.roll wraps around, so the last row would have the first row subtracted from it):
>>> inp = np.array([[4,4,5], [2,3,1],[1,2,0]])
>>> inp
array([[4, 4, 5],
       [2, 3, 1],
       [1, 2, 0]])
>>> (inp - np.roll(inp,-1,axis=0))[:-1]
array([[2, 1, 4],
       [1, 1, 1]])
Or, a more straightforward way would just be to use numpy indexing:
>>> inp[:-1] - inp[1:]
array([[2, 1, 4],
       [1, 1, 1]])
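Another option, not mentioned in the answer above, is np.diff. It computes next minus current along an axis, so the result has to be negated to match the current-minus-next convention used in the question. A small sketch:

```python
import numpy as np

inp = np.array([[4, 4, 5], [2, 3, 1], [1, 2, 0]])

# np.diff(inp, axis=0)[i] is inp[i + 1] - inp[i]; negate for inp[i] - inp[i + 1].
result = -np.diff(inp, axis=0)
print(result)
```

This gives the same values as inp[:-1] - inp[1:].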

How to find indexes of max n values in array

I have an array [0, 0, 10, 0, 3, 1]. I want the indexes of the three largest elements of this array, which would be [2, 4, 5].
How do I do it without finding the max element, deleting it (setting it to 0), then finding the next one, deleting it, and finally finding the third one? I can't sort this array; I need the indexes from the current positions.
a = [0, 0, 10, 0, 3, 1]
a.each_index.max_by(3){|i| a[i]} # => [2, 4, 5]
or
[0, 0, 10, 0, 3, 1].each_with_index.max(3).map(&:last) # => [2, 4, 5]
arr = [1, 3, 2, 4]
n = 2
p arr.each_with_index.sort.map(&:last).last(n).reverse
#=> [3,1]
How does it work?
arr.each_with_index.sort
Returns an array of arrays. Each inner array has the form [value, index], and they are sorted by value. In our example it returns [[1, 0], [2, 2], [3, 1], [4, 3]].
arr.map(&:last)
Loops through all of those arrays, takes the last element of each (the index), and returns an array of those indexes. In our example it returns [0, 2, 1, 3]. Now we have the indexes ordered by ascending value, so the first index points at the minimum and the last at the maximum.
arr.last(n)
Returns an array containing the last n elements of an array. Since the values are in ascending order, the n largest values are the last n.
arr.reverse
Reverses an array. It is optional: use it if you want the index of the maximum first and the index of the nth largest value last; otherwise leave it out.
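For readers working in Python rather than Ruby, a rough analogue of the answers above is heapq.nlargest from the standard library, applied to the indexes with the array values as the key. This is a sketch, and the variable name top3 is illustrative.

```python
import heapq

a = [0, 0, 10, 0, 3, 1]

# Pick the 3 indexes whose values are largest, largest value first,
# without sorting or modifying the original list.
top3 = heapq.nlargest(3, range(len(a)), key=a.__getitem__)
print(top3)  # [2, 4, 5]
```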

numpy using multidimensional index array on another multidimensional array

I have two multidimensional arrays, and I'd like to use one as the index to produce a new multidimensional array. For example:
a = array([[4, 3, 2, 5],
           [7, 8, 6, 8],
           [3, 1, 5, 6]])
b = array([[0,2],[1,1],[3,1]])
I want to use the first array in b to return those indexed elements in the first array of a, and so on. So I want the output to be:
array([[4,2],[8,8],[6,1]])
This is probably simple but I couldn't find an answer by searching. Thanks.
This is a little tricky, but the following will do it:
>>> a[np.arange(3)[:, np.newaxis], b]
array([[4, 2],
       [8, 8],
       [6, 1]])
You need to index both the rows and the columns of the a array, so to match your b array you would need an array like this:
rows = np.array([[0, 0],
                 [1, 1],
                 [2, 2]])
And then a[rows, b] would clearly return what you are after. You can get the same result relying on broadcasting as above, replacing the rows array with np.arange(3)[:, np.newaxis], which is equivalent to np.arange(3).reshape(3, 1).
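A short sketch that checks the broadcasting form against the explicit rows array. It also shows np.take_along_axis, which is not part of the answer above but (in NumPy 1.15 and later) expresses the same per-row lookup; the variable names are illustrative.

```python
import numpy as np

a = np.array([[4, 3, 2, 5],
              [7, 8, 6, 8],
              [3, 1, 5, 6]])
b = np.array([[0, 2], [1, 1], [3, 1]])

# Explicit row indexes, same shape as b.
rows = np.array([[0, 0], [1, 1], [2, 2]])
explicit = a[rows, b]

# A (3, 1) column of row indexes broadcasts against b's (3, 2) shape.
via_broadcast = a[np.arange(a.shape[0])[:, np.newaxis], b]

# For each row i, pick the columns listed in b[i].
via_take = np.take_along_axis(a, b, axis=1)

print(via_broadcast)
```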

How to extract lines in an array, which contain a certain value? (numpy, scipy)

I have a numpy 2D array, and I want to return the values in column c for every row r where the entry at (r, c-1) (row r, column c-1) equals a certain value (int n).
I don't want to iterate over the rows, writing something like
for r in range(len(rows)):
    if array[r, c-1] == 1:
        store array[r, c]
because there are 4000 of them and this 2D array is just one of 20 I have to look through.
I found "filter" but don't know how to use it (I found no docs).
Is there a function that provides such a search?
I hope I understood your question correctly. Let's say you have an array a
import numpy as np
a = np.array(list(range(7)) * 3).reshape(7, 3)
print(a)
[[0 1 2]
 [3 4 5]
 [6 0 1]
 [2 3 4]
 [5 6 0]
 [1 2 3]
 [4 5 6]]
and you want to extract all lines where the first entry is 2. This can be done like this:
print(a[a[:, 0] == 2])
[[2 3 4]]
a[:,0] denotes the first column of the array, == 2 returns a Boolean array marking the entries that match, and then we use advanced indexing to extract the respective rows.
Of course, NumPy needs to iterate over all entries, but this will be much faster than doing it in Python.
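Applied to the original question (values of column c where column c-1 equals n), the same Boolean mask can be combined with a column index. A minimal sketch, with c and n chosen arbitrarily for illustration:

```python
import numpy as np

a = np.array(list(range(7)) * 3).reshape(7, 3)
c, n = 1, 2  # illustrative choices: look at column 1 where column 0 equals 2

# The mask selects the matching rows; the second index keeps only column c.
values = a[a[:, c - 1] == n, c]
print(values)  # [3]
```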
NumPy arrays carry no search index. If you need to perform this specific operation more efficiently than in time linear in the array size, you need to use something other than NumPy.

Resources