Select the most different elements in an array

I have a text file with P random entries in binary (or hex) for processing. From those P entries, I have to take N entries that are as different from one another as possible, so that I have a good representative sample of the possible population.
So far, I have thought of comparing the current N entries against an average of the array that contains the elements, using a modified version of the algorithm in: How do I calculate similarity of two integers?
Alternatively, I could keep a cumulative score of similarity (the higher, the more different) between the next candidate and all the elements in the array, choose the best candidate, and repeat until I have selected the required N.
I do not know if there is a better solution to this.
Ex.
[00011111, 00101110, 11111111, 01001010 , 00011000, 10010000, 01110101]
P = 7
N = 3
Result: [00011111, 10010000, 00101110]
Thanks in advance

You should compare them pairwise. This comparison is the shortest common supersequence problem: a shortest common supersequence of strings x and y is a shortest string z such that both x and y are subsequences of z. It is closely related to the longest common subsequence problem, for which the standard solution is dynamic programming.

You could calculate the Hamming distances for all combinations if you want to choose the most different binary representation (see https://en.wikipedia.org/wiki/Hamming_distance ).
Edit: small hack
import numpy as np
a = [0b00011111, 0b00101110, 0b11111111, 0b01001010, 0b00011000, 0b10010000, 0b01110101]
N = 3
b = []
for i in a:
    b.append(np.unpackbits(np.uint8(i)))  # to binary representation
valuesWithBestOverallDiffs = []
def getItemWithBestOverallDiff(b):
    itemWithBestOverallDiff = [0, 0]  # idx, value
    for biter, bval in enumerate(b):
        hammDistSum = 0
        for biter2, bval2 in enumerate(b):
            if biter == biter2:
                continue
            print("iter1: " + str(biter) + " = " + str(bval))
            print("iter2: " + str(biter2) + " = " + str(bval2))
            hammDist = len(np.bitwise_xor(bval, bval2).nonzero()[0])
            print(" => " + str(hammDist))
            hammDistSum = hammDistSum + hammDist
        if hammDistSum > itemWithBestOverallDiff[1]:
            itemWithBestOverallDiff = [biter, hammDistSum]
    # print(itemWithBestOverallDiff)
    return itemWithBestOverallDiff
for i in range(N):
    itemWithBestOverallDiff = getItemWithBestOverallDiff(b)
    print("adding item nr " + str(itemWithBestOverallDiff[0]) + " with value 0b" + str(b[itemWithBestOverallDiff[0]]) + " = " + str(a[itemWithBestOverallDiff[0]]))
    val = a.pop(itemWithBestOverallDiff[0])
    b.pop(itemWithBestOverallDiff[0])
    valuesWithBestOverallDiffs.append(val)
print("result: ")
print(valuesWithBestOverallDiffs)
The final output is
result:
[144, 117, 255]
which is 0b10010000, 0b01110101, 0b11111111
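The questioner's second idea, scoring each candidate by its cumulative distance to the entries already chosen, can be sketched without NumPy. This is only a rough sketch and not the answer's method: the names `hamming` and `pick_most_different` are made up for illustration, the first pick borrows the rule of the largest total distance to all other entries, ties go to the earliest entry, and the result is a heuristic rather than a guaranteed optimum.

```python
def hamming(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def pick_most_different(values, n):
    """Greedily pick n values that are far apart in Hamming distance.

    Hypothetical helper: a heuristic, not guaranteed to be optimal.
    """
    remaining = list(values)
    if n <= 0 or not remaining:
        return []
    # Seed with the value that has the largest total distance to all others.
    first = max(remaining, key=lambda v: sum(hamming(v, w) for w in remaining))
    remaining.remove(first)
    chosen = [first]
    while remaining and len(chosen) < n:
        # Next pick: largest cumulative distance to the values chosen so far.
        best = max(remaining, key=lambda v: sum(hamming(v, c) for c in chosen))
        remaining.remove(best)
        chosen.append(best)
    return chosen


data = [0b00011111, 0b00101110, 0b11111111, 0b01001010,
        0b00011000, 0b10010000, 0b01110101]
picked = pick_most_different(data, 3)
print([format(v, "08b") for v in picked])
```

Because candidates are measured against the selected set rather than against everything still unselected, this variant may return a different subset than the script above.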

Related

Algorithm: Given an array, find the maximum sum after rearrangement

You are given an array A of size N, containing numbers from 0 to N. For each sub-array starting at index 0, let's call it Si, we say Bi is the smallest non-negative number that is not present in Si.
We need to find the maximum possible sum of all Bi of this array.
We can rearrange the array to obtain the maximum sum.
For example:
A = 1, 2, 0, N = 3
then let's say we rearrange it as A = 0, 1, 2
S1 = 0, B1= 1
S2 = 0,1 B2= 2
S3 = 0,1,2 B3= 3
Hence the sum is 6
In every example I have tried, the sorted array gives the maximum sum. Am I correct, or am I missing something here?
Please help me find the correct logic for this problem. I am not looking for an optimal solution, just the correct logic.
Yes, sorting the array maximizes the sum of 𝐵𝑖.
As the input size is 𝑛, it does not include every number in the range {0, ..., 𝑛}, as that is a set of 𝑛 + 1 numbers. Let's say it only lacks value 𝑘; then, in the sorted array, 𝐵𝑖 is 𝑘 for all 𝑖 >= 𝑘. If there are other numbers that are missing, but greater than 𝑘, there is no impact on any 𝐵𝑖.
Thus we need to find the minimum missing value 𝑘 in the range {0, ..., 𝑛}. The maximised sum is then 1 + 2 + ... + 𝑘 + (𝑛−𝑘)𝑘. This is 𝑘(𝑘+1)/2 + (𝑛−𝑘)𝑘 = 𝑘(1 + 2𝑛 − 𝑘)/2.
To find the value of 𝑘, create a boolean array of size 𝑛 + 1, and set the entry at index 𝑣 to true when 𝑣 is encountered in the input. 𝑘 is then the first index at which that boolean array still has a false value.
Here is a little implementation in a JavaScript snippet:
function maxSum(arr) {
    const n = arr.length;
    const isUsed = Array(n + 1).fill(false);
    for (const value of arr) {
        isUsed[value] = true;
    }
    const k = isUsed.indexOf(false);
    return k * (1 + 2*n - k) / 2;
}
console.log(maxSum([0, 1, 2])); // 6
console.log(maxSum([0, 2, 2])); // 3
console.log(maxSum([1, 0, 1])); // 5
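For small inputs, the closed form can be cross-checked against an exhaustive search over all rearrangements. The following Python sketch is only a sanity check under that assumption; the function names `prefix_mex_sum`, `max_sum_formula` and `max_sum_brute` are illustrative and do not come from the answer.

```python
from itertools import permutations


def prefix_mex_sum(arr):
    """Sum of B_i, where B_i is the smallest non-negative integer missing from arr[:i]."""
    seen = set()
    mex = 0
    total = 0
    for value in arr:
        seen.add(value)
        while mex in seen:
            mex += 1
        total += mex
    return total


def max_sum_formula(arr):
    """Closed form k(1 + 2n - k)/2, with k the smallest value missing from arr."""
    n = len(arr)
    present = set(arr)
    k = 0
    while k in present:
        k += 1
    return k * (1 + 2 * n - k) // 2


def max_sum_brute(arr):
    """Try every rearrangement; only feasible for very small inputs."""
    return max(prefix_mex_sum(p) for p in permutations(arr))


for sample in ([0, 1, 2], [0, 2, 2], [1, 0, 1], [3, 3, 0, 1]):
    # The formula, the sorted order and the exhaustive search should agree.
    assert max_sum_formula(sample) == max_sum_brute(sample)
    assert prefix_mex_sum(sorted(sample)) == max_sum_brute(sample)
```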

Find Minimum Score Possible

Problem statement:
We are given three arrays A1, A2, A3 of lengths n1, n2, n3. Each array contains some (or no) natural numbers (i.e. > 0). These numbers denote program execution times.
At each step we may choose the first element of any array, execute that program, and remove it from that array.
For example:
if A1=[3,2] (n1=2),
A2=[7] (n2=1),
A3=[1] (n3=1)
then we can execute programs in various orders like [1,7,3,2] or [7,1,3,2] or [3,7,1,2] or [3,1,7,2] or [3,2,1,7] etc.
Now if we take S=[1,3,2,7] as the order of execution the waiting time of various programs would be
for S[0] waiting time = 0, since executed immediately,
for S[1] waiting time = 0+1 = 1, taking previous time into account, similarly,
for S[2] waiting time = 0+1+3 = 4
for S[3] waiting time = 0+1+3+2 = 6
The score of an ordering is defined as the sum of all waiting times = 0 + 1 + 4 + 6 = 11. This is the minimum score we can get from any order of execution.
Our task is to find this minimum score.
How can we solve this problem? I tried picking the minimum of the three first elements each time, but that is not correct, because it gets stuck when two or three equal elements are encountered.
One more example:
if A1=[23,10,18,43], A2=[7], A3=[13,42] minimum score would be 307.
The simplest way to solve this is with dynamic programming (which runs in cubic time).
For each array A: Suppose you take the first element from array A, i.e. A[0], as the next process. Your total cost is the wait-time contribution of A[0] (i.e., A[0] * (total_remaining_elements - 1)), plus the minimal wait time sum from A[1:] and the rest of the arrays.
Take the minimum cost over each possible first array A, and you'll get the minimum score.
Here's a Python implementation of that idea. It works with any number of arrays, not just three.
import functools
from typing import List, Tuple
def dp_solve(arrays: List[List[int]]) -> int:
    """Given list of arrays representing dependent processing times,
    return the smallest sum of wait_time_before_start for all job orders"""
    arrays = [x for x in arrays if len(x) > 0]  # Remove empty
    @functools.lru_cache(100000)
    def dp(remaining_elements: Tuple[int, ...],
           total_remaining: int) -> int:
        """Returns minimum wait time sum when suffixes of each array
        have lengths in 'remaining_elements' """
        if total_remaining == 0:
            return 0
        rem_elements_copy = list(remaining_elements)
        best = 10 ** 20
        for i, x in enumerate(remaining_elements):
            if x == 0:
                continue
            # arrays[i][-x] is the next unexecuted job of array i;
            # every job still waiting behind it is delayed by its run time
            cost_here = arrays[i][-x] * (total_remaining - 1)
            if cost_here >= best:
                continue
            rem_elements_copy[i] -= 1
            best = min(best,
                       dp(tuple(rem_elements_copy), total_remaining - 1)
                       + cost_here)
            rem_elements_copy[i] += 1
        return best
    return dp(tuple(map(len, arrays)), sum(map(len, arrays)))
Better solutions
The naive greedy strategy of 'smallest first element' doesn't work, because it can be worth it to do a longer job to get a much shorter job in the same list done, as the example of
A1 = [100, 1, 2, 3], A2 = [38], A3 = [34],
best solution = [100, 1, 2, 3, 34, 38]
by user3386109 in the comments demonstrates.
A more refined greedy strategy does work. Instead of the smallest first element, consider each possible prefix of the array. We want to pick the array with the smallest prefix, where prefixes are compared by average process time, and perform all the processes in that prefix in order.
A1 = [ 100, 1, 2, 3]
Prefix averages = [(100)/1, (100+1)/2, (100+1+2)/3, (100+1+2+3)/4]
= [ 100.0, 50.5, 34.333, 26.5]
A2=[38]
A3=[34]
Smallest prefix average in any array is 26.5, so pick
the prefix [100, 1, 2, 3] to complete first.
Then [34] is the next prefix, and [38] is the final prefix.
And here's a rough Python implementation of the greedy algorithm. This code computes subarray averages in a completely naive/brute-force way, so the algorithm is still quadratic (but an improvement over the dynamic programming method). Also, it computes 'maximum suffixes' instead of 'minimum prefixes' for ease of coding, but the two strategies are equivalent.
import math
from typing import List
def greedy_solve(arrays: List[List[int]]) -> int:
    """Given list of arrays representing dependent processing times,
    return the smallest sum of wait_time_before_start for all job orders"""
    def max_suffix_avg(arr: List[int]):
        """Given arr, return value and length of max-average suffix"""
        if len(arr) == 0:
            return (-math.inf, 0)
        best_len = 1
        best = -math.inf
        curr_sum = 0.0
        for i, x in enumerate(reversed(arr), 1):
            curr_sum += x
            new_avg = curr_sum / i
            if new_avg >= best:
                best = new_avg
                best_len = i
        return (best, best_len)
    arrays = [x for x in arrays if len(x) > 0]  # Remove empty
    if not arrays:
        return 0  # Nothing to schedule; max() below would fail on an empty range
    total_time_sum = sum(sum(x) for x in arrays)
    my_averages = [max_suffix_avg(arr) for arr in arrays]
    total_cost = 0
    while True:
        largest_avg_idx = max(range(len(arrays)),
                              key=lambda y: my_averages[y][0])
        _, n_to_remove = my_averages[largest_avg_idx]
        if n_to_remove == 0:
            break
        # Jobs are scheduled back to front: each popped job waits for
        # everything that has not been popped yet
        for _ in range(n_to_remove):
            total_time_sum -= arrays[largest_avg_idx].pop()
            total_cost += total_time_sum
        # Recompute the changed array's avg
        my_averages[largest_avg_idx] = max_suffix_avg(arrays[largest_avg_idx])
    return total_cost
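The 'minimum prefix average' rule, as it is worded in the explanation, can also be coded directly, which makes the resulting execution order visible. This is a brute-force sketch for illustration only; `min_prefix_avg`, `greedy_order` and `score` are hypothetical names, and ties between equal averages are broken arbitrarily (shorter prefix, then lower array index).

```python
from itertools import accumulate


def min_prefix_avg(arr):
    """Return (average, length) of the non-empty prefix with the smallest average."""
    return min((s / i, i) for i, s in enumerate(accumulate(arr), 1))


def greedy_order(arrays):
    """Execution order produced by the minimum-prefix-average rule."""
    queues = [list(a) for a in arrays if a]  # copy, dropping empty arrays
    order = []
    while queues:
        # The queue whose best prefix has the lowest average goes next.
        _, length, idx = min(min_prefix_avg(q) + (i,) for i, q in enumerate(queues))
        order.extend(queues[idx][:length])
        del queues[idx][:length]
        if not queues[idx]:
            queues.pop(idx)
    return order


def score(order):
    """Sum of waiting times when jobs run in the given order."""
    elapsed = 0
    total = 0
    for t in order:
        total += elapsed
        elapsed += t
    return total


print(greedy_order([[100, 1, 2, 3], [38], [34]]))
print(score(greedy_order([[23, 10, 18, 43], [7], [13, 42]])))
```

On the two examples from this thread, the sketch reproduces the order [100, 1, 2, 3, 34, 38] and the score 307.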

Minimize the number of operation to make all elements of array equal

Given an array of n elements, you are allowed to perform only two kinds of operations to make all elements of the array equal:
multiply any element by 2
divide any element by 2 (integer division)
Your task is to minimize the total number of these operations needed to make all elements of the array equal.
Example
array = [3,6,7]: the minimum number of operations is 2, as 6 and 7 can each be divided by 2 to obtain 3.
I cannot think of even a brute-force solution.
Constraints
1 <= n <= 100000 and
1 <= ai <=100000
where ai is the ith element of array.
View all numbers as strings of 0 and 1, via their binary expansion.
E.g.: 3, 6, 7 are represented as 11, 110, 111, respectively.
Dividing by 2 is equivalent to removing the right most 0 or 1, and multiplying by 2 is equivalent to adding a 0 from the right.
For a string consisting of 0s and 1s, let us define a "head" to be a prefix of the string (its leftmost several digits) that ends with 1.
E.g.: 1100101 has heads 1, 11, 11001, 1100101.
The task becomes finding longest common head of all the given strings, and then determining how many 0's to add after this common head.
An example:
Say you have the following strings:
10101001, 101011, 10111, 1010001
find the longest common head of 10101001 and 101011, which is 10101;
find the longest common head of 10101 and 10111, which is 101;
find the longest common head of 101 and 1010001, which is 101.
Then you are sure that all the numbers should become a number of the form 101 00....
To determine how many 0's to add after 101, find the number of consecutive 0's directly following 101 in every string:
For 10101001: 1
For 101011: 1
For 10111: 0
For 1010001: 3
It remains to find an integer k that minimizes |k - 1| + |k - 1| + |k - 0| + |k - 3|. Here we find k = 1, so every number should become 1010 in the end.
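The steps above can be sketched in Python by working on the binary strings directly. This is a minimal sketch, assuming a non-empty list of positive integers; the function name `min_operations` is made up, and picking k as a median of the zero counts is one standard way to minimise a sum of absolute differences.

```python
def min_operations(nums):
    """Return (operation count, target value) for the common-head strategy."""
    bits = [bin(x)[2:] for x in nums]  # binary strings without the '0b' prefix

    # Longest common prefix of all binary strings.
    prefix = bits[0]
    for b in bits[1:]:
        i = 0
        while i < min(len(prefix), len(b)) and prefix[i] == b[i]:
            i += 1
        prefix = prefix[:i]
    head = prefix.rstrip("0")  # a head must end with 1

    removals = 0    # digits that have to be stripped whatever k is chosen
    zero_runs = []  # consecutive 0's directly following the head
    for b in bits:
        tail = b[len(head):]
        zeros = len(tail) - len(tail.lstrip("0"))
        removals += len(tail) - zeros
        zero_runs.append(zeros)

    # A median minimises the sum of |k - zeros|.
    k = sorted(zero_runs)[len(zero_runs) // 2]
    ops = removals + sum(abs(k - z) for z in zero_runs)
    return ops, int(head, 2) << k


print(min_operations([3, 6, 7]))
print(min_operations([0b10101001, 0b101011, 0b10111, 0b1010001]))
```

For [3, 6, 7] this gives 2 operations with target 3, and for the four strings of the example it gives the target 1010 (decimal 10).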
As the other answer explains, backtracking is not necessary. Just for fun, here is a little implementation of that approach (see the link at the bottom to run it online).
First we need a function that determines the number of binary digits in a number:
def getLength(i: Int): Int = {
  @annotation.tailrec
  def rec(i: Int, result: Int): Int =
    if(i > 0)
      rec(i >> 1, result + 1)
    else
      result
  rec(i, 0)
}
Then we need a function that determines the common prefix of two numbers of equal length:
@annotation.tailrec
def getPrefix(i: Int, j: Int): Int =
  if(i == j) i
  else getPrefix(i >> 1, j >> 1)
And of a list of arbitrary numbers:
def getPrefix(is: List[Int]): Int = is.reduce((x,y) => {
  val shift = Math.abs(getLength(x) - getLength(y))
  val x2 = Math.max(x,y)
  val y2 = Math.min(x,y)
  getPrefix((x2 >> shift), y2)
})
Then we need the length of the suffix, not counting the leading zeros of the suffix:
def getSuffixLength(i: Int, prefix: Int) = {
  val suffix = i ^ (prefix << (getLength(i) - getLength(prefix)))
  getLength(suffix)
}
Now we can compute the number of operations needed to bring a number i to the prefix with "zeros" zeros appended:
def getOperations(i: Int, prefix: Int, zeros: Int): Int = {
  val length = getLength(i) - getLength(prefix)
  val suffixLength = getSuffixLength(i, prefix)
  suffixLength + Math.abs(zeros - length + suffixLength)
}
Now we can find the minimal numbers of operations and return that together with the value we will sync to:
def getMinOperations(is: List[Int]) = {
  val prefix = getPrefix(is)
  val maxZeros = getLength(is.max) - getLength(prefix)
  (0 to maxZeros).map{zeros => (is.map{getOperations(_, prefix, zeros)}.sum, prefix << zeros)}.minBy(_._1)
}
You can try this solution at:
http://goo.gl/lLr5jl
The last step of finding the right number of zeros can be improved, as only the length of a suffix without leading zeros matters, not what it looks like. So we can compute the number of operations we need for these together by counting how many there are:
def getSuffixLength(i: Int, prefix: Int) = {
  val suffix = i ^ (prefix << (getLength(i) - getLength(prefix)))
  getLength(suffix)
}
def getMinOperations(is: List[Int]) = {
  val prefix = getPrefix(is)
  val maxZeros = getLength(is.max) - getLength(prefix)
  val baseCosts = is.map(getSuffixLength(_,prefix)).sum
  val suffixLengths: List[(Int, Int)] = is.foldLeft(Map[Int, Int]()){
    case (m,i) => {
      val x = getSuffixLength(i,prefix) - getLength(i) + getLength(prefix)
      m.updated(x, 1 + m.getOrElse(x, 0))
    }
  }.toList
  val (minOp, minSol) = (0 to maxZeros).map{zeros => (suffixLengths.map{
    case (x, count) => count * Math.abs(zeros + x)
  }.sum, prefix << zeros)}.minBy(_._1)
  (minOp + baseCosts, minSol)
}
All auxiliary operations take only logarithmic time in the size of the maximal number. We have to go through the whole list to collect the suffix lengths. Then we have to try each candidate number of zeros, of which there are at most logarithmically many in the maximal number. So we should have a complexity of
O(|list|*ld(maxNum) + (ld(maxNum))^2)
So for your bounds this is basically linear in the input size.
This version can be found here:
http://goo.gl/ijzYik

Count items in one cell array in another cell array matlab

I have two cell arrays, "celldata" and "data". Both of them store strings. I would like to check, for each element in "celldata", whether it occurs in "data". For example, celldata = {'AB'; 'BE'; 'BC'} and data={'ABCD' 'BCDE' 'ACBE' 'ADEBC '}. The expected output is s=3 and v=1 for AB, s=2 and v=2 for BE, s=2 and v=2 for BC, because I just need to count the occurrences of each string from 'celldata' as a sequence.
The code I wrote is shown below. Any help would certainly be appreciated.
My code:
s=0; % support counter
v=0; % violate counter
SV=[]; % array to store the support
VV=[]; % array to store the violate
pairs = ['AB'; 'BE'; 'BC']
%celldata = cellstr(pairs)
celldata = {'AB'; 'BE'; 'BC'}
data={'ABCD' 'BCDE' 'ACBE' 'ADEBC '} % 3 AB, 2 BE, 2 BC
for jj=1:length(data)
    for kk=1:length(celldata)
        res = regexp( data(jj),celldata(kk) )
        m = cell2mat(res);
        e=isempty(m) % check res array is empty or not
        if e == 0
            s = s + 1;
            SV(jj)=s;
            v=v;
        else
            s=s;
            v= v+1;
            VV(jj)=v;
        end
    end
end
If I am understanding your variables correctly, s is the number of cells in which the substring (AB, BE, or BC) does not appear and v is the number of times it does. If this is accurate, then
v = cellfun(@(x) length(cell2mat(strfind(data, x))), celldata);
s = numel(data) - v;
gives
v = [1;1;3];
s = [3;3;1];

Matlab, Random Cell Array

I have a cell array Q, which contains questions, and a logical vector A containing 1/0 as true/false, in the same order as Q, like this:
Q = {'A ball is squared: ' 'My computer is slow: ' 'A triangle has 3 corners: '};
A = {0 1 1};
I would then like to make a Q_random, containing the questions from Q in random order, and an A_random containing the logical values that correspond to Q_random. I've come up with this code, but I am not sure that this is the best way to do it.
Could I use another method that is simpler and more efficient?
Q = {'A ball is squared: ' 'My computer is slow: ' 'A triangle has 3 corners: '};
A = {0 1 1};
Q_random = cell(1,numel(Q));
A_random = cell(1,numel(Q));
i = 1;
while i <= numel(Q)
    random_number = randi(numel(Q));
    % only fill a slot that is still empty; otherwise draw again
    if isempty(Q_random{random_number})
        Q_random(random_number) = Q(i);
        A_random(random_number) = A(i);
        i = i + 1;
    end
end
I would use randperm to generate randomly ordered indexes
rand_ind=randperm(length(Q));
and then use the random indexes to generate the randomly permuted cell arrays
Q_random=Q(rand_ind);
A_random=A(rand_ind);
This answer to a previous related question may also be worth looking at.
