Given an array of integers, resize the array with approximately equal distances

Given an array of numbers like this:
[0, 99, 299, 498, 901]
What algorithm could I use to resize that array into an array with approximately equal distances between its elements? Or, said another way, resample it on an approximate greatest common divisor. In the example above the greatest common divisor is approximately 100, so the result would be:
[0, 99, 200, 299, 400, 498, 600, 700, 800, 901]
Using the original values would be nice, and error bars can be set (the solution above has the error set to 2), but I would also be happy with this result:
[0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
Update 12 Jan 2017
Based on Redu's answer, here is the Swift version of his code:
var arr: [Int] = [0, 99, 299, 498, 901]
var minGap: Int = 0
// Find the smallest gap between neighbouring values
for x in 0..<arr.count-1 {
    let gap = arr[x+1] - arr[x]
    if minGap == 0 || minGap > gap {
        minGap = gap
    }
}
var resamples = [Int]()
for item in arr {
    if let lastSample = resamples.last {
        // How many minimum-sized gaps fit between the last sample and this item (at least 1)
        let n = max(1, Int((Float(item - lastSample) / Float(minGap)).rounded()))
        let g = (item - lastSample) / n
        // Add the n-1 interim samples first, then the original value
        for x in 0..<n-1 {
            resamples.append(lastSample + ((x+1) * g))
        }
        resamples.append(item)
    } else {
        resamples.append(item)
    }
}
print(resamples)

Essentially you want to use a least squares regression against an arithmetic progression.
An arithmetic progression can be parameterised with 3 terms: first term, last term, and common difference. These 3 terms would form the parameters of your objective function, which you will seek to minimise.
In each optimisation step, you'd need to pick which terms in the trial arithmetic progression need to be regressed against your original set. That will be quite challenging, but luckily both series will be sorted so this ought to be an O(N) traversal.
A constraint around the 3 terms would be a set that is typographically pleasing. For example, would 100, 200, 300 be preferred over 99, 198, 297 even if the source series is 99, 297?
A full answer would I feel be too broad - and is probably at least a week's work. But this is how I would embark on the project.
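As a starting point, here is a minimal Python sketch of the regression step only. It sidesteps the hard part (choosing which terms to regress against) with a simplifying assumption that is mine, not part of the approach above: the smallest gap in the input is taken to be roughly one step, and each value is assigned to a slot of the progression on that basis. The function name fit_progression is made up for this example.

```python
def fit_progression(values):
    """Least-squares fit of sorted values to a + d * k, where k is an integer slot.

    Assumption: the smallest gap is close to one step of the progression,
    so it is used to assign each value to a slot. Needs at least two
    distinct values.
    """
    min_gap = min(b - a for a, b in zip(values, values[1:]))
    slots = [round((v - values[0]) / min_gap) for v in values]
    n = len(values)
    mean_k = sum(slots) / n
    mean_v = sum(values) / n
    s_kk = sum((k - mean_k) ** 2 for k in slots)
    s_kv = sum((k - mean_k) * (v - mean_v) for k, v in zip(slots, values))
    d = s_kv / s_kk          # fitted common difference
    a = mean_v - d * mean_k  # fitted first term
    return a, d, slots[-1] + 1


a, d, count = fit_progression([0, 99, 299, 498, 901])
grid = [round(a + d * k) for k in range(count)]
print(d, grid)
```

For the example series the fitted common difference comes out close to 100 and the grid has 10 terms. A real implementation would still have to search over the slot assignment and apply the "pleasing numbers" constraint.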

The following would be my solution in JS. I first find the minimum gap and then try to find how many of that would fit in-between each item and process accordingly without altering the original values.
Obviously for this algorithm to work the input array must be sorted in ascending order.
var arr = [0, 99, 299, 498, 901],
    gap = Math.min(...Array(arr.length-1).fill().map((_,i) => arr[i+1]-arr[i])), // Find the minimum gap
    res = arr.reduce((p,c,i) => { var n = Math.round((c-p[p.length-1])/gap), // Find how many gaps fit in between, according to the minimum gap
                                      g = Math.round((c-p[p.length-1])/n);   // Calculate the average gap to apply
                                  return i ? p.concat(Array(Math.round(n-1)).fill().map((_,i) => p[p.length-1] + (i+1)*g),c)
                                           : p.concat(c);
                                },[]);
console.log(res);
Explanation:
gap = Math.min(...Array(arr.length-1).fill().map((_,i) => arr[i+1]-arr[i])),
First we set up a new array one element shorter than the input array (Array(arr.length-1)). We initialize it with undefined elements (.fill()) and then .map() every element to arr[i+1]-arr[i], which gives us the array of gaps. Then we spread it into Math.min() as arguments; that is the Math.min(...Array( part. So now we have the minimum gap, which is 99 in the case given above.
res = arr.reduce((p,c,i) => { var n = Math.round((c-p[p.length-1])/gap),
                                  g = Math.round((c-p[p.length-1])/n);
                              return i ? p.concat(Array(Math.round(n-1)).fill().map((_,i) => p[p.length-1] + (i+1)*g),c)
                                       : p.concat(c);
                            },[]);
The .reduce() part looks slightly tough, but it's easy. Our .reduce() operation takes a function as its argument (usually called a callback function) and runs it once for every item of the array. This callback function is the part which starts with (p,c,i) => {... }. This is an arrow function, which is essentially the same as a normal function: x => x means function(x) { return x;} or x => {return x;}. In our case, since we use braces to define the body of our function (due to multiple statements), we have to use a return statement.
Our .reduce() uses an initial value which is an empty array. It's the ,[]); part at the very end. The callback function, which reduce will invoke per array item, will be passed three arguments (p,c,i). On the first call the initial empty array gets assigned to the p (previous) argument; on every call the current item gets assigned to the c argument and the current index gets assigned to the i argument.
In the body of our callback we define 2 variables. n and g.
n = Math.round((c-p[p.length-1])/gap);
p[p.length-1] returns the last element of the p array. So in the first turn, when i = 0, p is still empty, so p[p.length-1] is undefined and Math.round((c-p[p.length-1])/gap) is NaN (Not a Number), but we don't care because:
return i ? p.concat(Array(Math.round(n-1)).fill().map((_,i) => p[p.length-1] + (i+1)*g),c)
: p.concat(c);
The ternary conditional means:
result = condition ? if true do this
: if false do this
So, as you see, depending on the condition it evaluates one of the two expressions and returns the result. In our case the result is returned as the new value of p.
So in our case, if i == 0 (a falsy value in JS), we only do p.concat(c), return the new p value and continue with the next iteration (the callback is invoked with the new p, c and i values).
If i is not falsy (any value other than 0), we do this instead:
p.concat(Array(Math.round(n-1)).fill().map((_,i) => p[p.length-1] + (i+1)*g),c)
This means: create an array sized to hold the n-1 interim elements, initialize it with undefineds, map each element to p[p.length-1] + (i+1)*g, concatenate this array to the p array, append c to the very end, and then return the resulting array.
One thing to keep in mind: p.concat(whatever...) returns a new array consisting of the elements of p followed by the items of any arrays passed as arguments, or the arguments themselves when they are not arrays. I mean:
[1,2,3].concat([4,5,6],[7,8],9) would result [1,2,3,4,5,6,7,8,9]
So this should explain it.
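For readers who find the one-liner style hard to follow, here is the same idea as a plain loop, sketched in Python. It is a rough port rather than an exact translation: the name resample is made up, it uses integer arithmetic for the interim values (so they can differ by one from the JS output), and it assumes a sorted input with no repeated values.

```python
def resample(values):
    """Fill the gaps of a sorted list, using the smallest gap as the step estimate."""
    if len(values) < 2:
        return list(values)
    gap = min(b - a for a, b in zip(values, values[1:]))
    if gap <= 0:
        raise ValueError("values must be strictly increasing")
    result = [values[0]]
    for current in values[1:]:
        last = result[-1]
        diff = current - last
        # How many minimum-sized gaps fit between the two original values
        n = max(1, round(diff / gap))
        # n - 1 evenly spaced interim values, then the original value itself
        result.extend(last + k * diff // n for k in range(1, n))
        result.append(current)
    return result


print(resample([0, 99, 299, 498, 901]))
# [0, 99, 199, 299, 398, 498, 598, 699, 800, 901]
```

The original values are kept as they are; only the interim values are computed.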


Next greater element over a certain percentage of each element in array

I have seen some posts about the next greater element problem. I am looking for a more performant solution for one of its variants.
The problem:
I have an array of numbers. For each number, I want to know the next index where the value becomes larger than that number by at least X percent.
Example:
Let's suppose I have this array [1000, 900, 1005, 1022, 1006] and I set a target of 1%. In other words, I want to know when the value becomes 1% bigger than it was.
1000 -> We want to know when the value becomes bigger than or equal to 1010 -> Index = 3
900 -> We want to know when the value becomes bigger than or equal to 909 -> Index = 2
1005 -> We want to know when the value becomes bigger than or equal to 1015.05 -> Index = 3
1022 -> We want to know when the value becomes bigger than or equal to 1032.22 -> Index = -1
1006 -> We want to know when the value becomes bigger than or equal to 1016.06 -> Index = -1
Naïve solution:
An O(n^2) algorithm can solve the problem, but it's too slow for my needs.
Does anyone know a faster algorithm to solve this problem or one of its close variants?
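For reference, the naive O(n^2) scan can be sketched in Python as follows. The function name next_index_above and its percent parameter are only illustrative; the sketch is useful mainly as a baseline to check faster solutions against.

```python
def next_index_above(values, percent):
    """For each value, index of the next value that is at least `percent` % larger, or -1."""
    result = []
    for i, value in enumerate(values):
        target = value * (1 + percent / 100)
        found = -1
        # Scan everything to the right of position i
        for j in range(i + 1, len(values)):
            if values[j] >= target:
                found = j
                break
        result.append(found)
    return result


print(next_index_above([1000, 900, 1005, 1022, 1006], 1))  # [3, 2, 3, -1, -1]
```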
I'd use a min heap. Each element in the min heap is a tuple (value, index) where value is the target value, and index is the index in the input array where that target value originated.
Then the algorithm is:
create an output array with all elements set to -1
for each element in the input array
for each target value on the min heap less than the element's value
pop the (targetValue, targetIndex) tuple
record the index of the current input element at the target index
add the current element (value, index) tuple to the min heap
For example, given the array in the question, the algorithm performs the following steps:
Create an output array with all elements set to -1
Read 1000, put (1010, 0) in the min heap.
Read 900, put (909, 1) in the min heap.
Read 1005. That's larger than 909, so pop the (909, 1), and record index 2 as the answer for element 900. Put (1015.05, 2) in the min heap.
Read 1022. Pop (1010, 0) and then (1015.05, 2) from the min heap, recording index 3 as the answer for elements 1000 and 1005. Put (1032.22, 3) in the min heap.
Read 1006, put (1016.06, 4) in the min heap.
Since the end of the input array has been reached, (1032.22, 3) and (1016.06, 4) will never be popped, and the corresponding elements in the output array remain as -1
Running time is O(nlogn).
Sample python implementation:
from heapq import heappush, heappop

def nextGreater(inputArray):
    targetHeap = []
    outputArray = [-1] * len(inputArray)
    for inputIndex, inputValue in enumerate(inputArray):
        # Resolve every pending target that the current value exceeds
        # (use <= here if a value equal to the target should also count)
        while targetHeap and targetHeap[0][0] < inputValue:
            targetValue, targetIndex = heappop(targetHeap)
            outputArray[targetIndex] = inputIndex
        heappush(targetHeap, (inputValue * 1.01, inputIndex))
    return outputArray

inputArray = [1000, 900, 1005, 1022, 1006]
outputArray = nextGreater(inputArray)
print(outputArray)  # [3, 2, 3, -1, -1]
You can create a list of tuples of index and value in array. Sort the list by value. Then you can iterate over the list using two pointers finding values that are greater by the given percentage and capture the corresponding indices. Complexity would be O(nlogn)
Sample implementation in java 17 given below:
final double percentage = 1.01;
int[] arr = new int[]{1000, 900, 1005, 1022, 1006};
record KeyValuePair(int value, int index) {}
List<KeyValuePair> keyValuePairs = new ArrayList<>();
for (int i = 0; i < arr.length; ++i) {
keyValuePairs.add(new KeyValuePair(arr[i], i));
}
keyValuePairs.sort(Comparator.comparingInt(KeyValuePair::value));
int i = 0, j = 1;
while (i != keyValuePairs.size() && j != keyValuePairs.size()) {
if (keyValuePairs.get(i).value() * percentage < keyValuePairs.get(j).value()) {
if (keyValuePairs.get(i).index() < keyValuePairs.get(j).index()) {
System.out.println("For index " + keyValuePairs.get(i).index() + " -> " + keyValuePairs.get(j).index());
} else if (keyValuePairs.get(i).index() + 1 != keyValuePairs.size()) {
System.out.println("For index " + keyValuePairs.get(i).index() + " -> " + (keyValuePairs.get(i).index() + 1));
}
++i;
} else {
++j;
}
}

How to get average of values in array between two given indexes in Swift

I'm trying to get the average of the values between two indexes in an array. The solution I first came to reduces the array to the required range, before taking the sum of values divided by the number of values. A simplified version looks like this:
let array = [0, 2, 4, 6, 8, 10, 12]
// The aim is to take the average of the values between array[n] and array[.count - 1].
I attempted with the following code:
func avgOf(x: Int) throws -> String {
let avgforx = solveList.count - x
// Error handling to check if x in average of x does not overstep bounds
guard avgforx > 0 else {
throw FuncError.avgNotPossible
}
solveList.removeSubrange(ClosedRange(uncheckedBounds: (lower: 0, upper: avgforx - 1)))
let avgx = (solveList.reduce(0, +)) / Double(x)
// Rounding
let roundedAvgOfX = (avgx * 1000).rounded() / 1000
print(roundedAvgOfX)
return "\(roundedAvgOfX)"
}
where avgforx is used to represent the lower bound :
array[(.count - 1) - x])
The guard statement makes sure that if the index is out of range, the error is handled properly.
solveList.removeSubrange was my initial solution, as it removes the values outside of the needed index range (and subsequently delivers the needed result), but this has proved to be problematic as the values not taken in the average should remain.
The line in removeSubrange basically takes a needed index field (e.g. array[5] to array[10]), removes all the values from array[0] to array[4], and then takes the sum of the resulting array divided by the number of elements.
Instead, the values in array[0] to array[4] should remain.
I would appreciate any help.
(Swift 4, Xcode 10)
Apart from the fact that the original array is modified, the error in your code is that it divides the sum of the remaining elements by the count of the removed elements (x) instead of dividing by the count of remaining elements.
A better approach might be to define a function which computes the average of a collection of integers:
func average<C: Collection>(of c: C) -> Double where C.Element == Int {
    precondition(!c.isEmpty, "Cannot compute average of empty collection")
    return Double(c.reduce(0, +))/Double(c.count)
}
Now you can use that with slices, without modifying the original array:
let array = [0, 2, 4, 6, 8, 10, 12]
let avg1 = average(of: array[3...]) // Average from index 3 to the end
let avg2 = average(of: array[2...4]) // Average from index 2 to 4
let avg3 = average(of: array[..<5]) // Average of first 5 elements

How to convert two associated arrays so that elements are evenly distributed?

There are two arrays, an array of images and an array of the corresponding labels (e.g. pictures of figures and their values).
The occurrences in the labels are unevenly distributed.
What I want is to cut both arrays in such a way, that the labels are evenly distributed. E.g. every label occurs 2 times.
To test I've just created two 1D arrays and it was working:
import numpy as np

labels = np.array([1, 2, 3, 3, 1, 2, 1, 3, 1, 3, 1,])
images = np.array(['A','B','C','C','A','B','A','C','A','C','A',])
x, y = zip(*sorted(zip(images, labels)))
label = list(set(y))
new_images = []
new_labels = []
amount = 2
for i in label:
    start = y.index(i)
    stop = start + amount
    new_images = np.append(new_images, x[start: stop])
    new_labels = np.append(new_labels, y[start: stop])
What I get/want is this:
new_labels: [ 1. 1. 2. 2. 3. 3.]
new_images: ['A' 'A' 'B' 'B' 'C' 'C']
(It is not necessary, that the arrays are sorted)
But when I tried it with the right data (images.shape = (35000, 32, 32, 3), labels.shape = (35000)) I've got an error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This message does not help me much.
I think that my solution is quite dirty anyhow. Is there a way to do it right?
Thank you very much in advance!
The error comes from sorted(): it compares the tuples produced by zip element by element, and with your real data one element of each tuple is an array with shape (32, 32, 3). Comparing two such arrays yields an array of booleans rather than a single True or False, so Python cannot decide the order and raises this error.
Let me explain it in a bit more detail:
x, y = zip(*sorted(zip(images, labels)))
First, you zip your images and labels. This creates tuples of the corresponding elements of images and labels: the first element of images paired with the first element of labels, and so on.
In the case of your real data, each label is paired with an array with shape (32, 32, 3).
Second, you sort all those tuples. sorted() compares the first elements of two tuples, and only falls back to the second elements when the first ones are equal. Here the first element is the image, so the arrays get compared and the error is thrown. (With your 1D test data the images were plain strings, which compare fine.)
You can solve this by explicitly telling sorted to sort only on the label, which is the second element of each tuple.
x, y = zip(*sorted(zip(images, labels), key=lambda pair: pair[1]))
If performance is required, using itemgetter will be faster.
from operator import itemgetter
x, y = zip(*sorted(zip(images, labels), key=itemgetter(1)))
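If the order of the result does not matter, the sort can be avoided altogether. The following is only a sketch of that alternative, not part of the fix above: it collects the indices of the first `amount` occurrences of each label in a single pass. The helper name balanced_indices is made up for this example.

```python
from collections import defaultdict


def balanced_indices(labels, amount):
    """Indices of the first `amount` occurrences of every label, in input order."""
    seen = defaultdict(int)  # label -> number of occurrences kept so far
    keep = []
    for index, label in enumerate(labels):
        if seen[label] < amount:
            seen[label] += 1
            keep.append(index)
    return keep


labels = [1, 2, 3, 3, 1, 2, 1, 3, 1, 3, 1]
images = ['A', 'B', 'C', 'C', 'A', 'B', 'A', 'C', 'A', 'C', 'A']
keep = balanced_indices(labels, 2)
print([labels[i] for i in keep])  # [1, 2, 3, 3, 1, 2]
print([images[i] for i in keep])  # ['A', 'B', 'C', 'C', 'A', 'B']
```

With NumPy arrays the selection can be written as images[keep] and labels[keep], which picks whole images along the first axis without ever comparing them. Labels that occur fewer than `amount` times simply contribute fewer items.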

Is there a way to reshape an array that does not maintain the original size (or a convenient work-around)?

As a simplified example, suppose I have a dataset composed of 40 sorted values. The values of this example are all integers, though this is not necessarily the case for the actual dataset.
import numpy as np
data = np.linspace(1,40,40)
I am trying to find the maximum value inside the dataset for certain window sizes. The formula to compute the window sizes yields a pattern that is best executed with arrays (in my opinion). For simplicity's sake, let's say the indices denoting the window sizes are a list [1,2,3,4,5]; this corresponds to window sizes of [2,4,8,16,32] (the pattern is 2**index).
## this code looks long because I've provided docstrings
## just in case the explanation was unclear
def shapeshifter(num_col, my_array=data):
    """
    This function reshapes an array to have 'num_col' columns, where
    'num_col' corresponds to index.
    """
    return my_array.reshape(-1, num_col)

def looper(num_col, my_array=data):
    """
    This function calls 'shapeshifter' and returns a list of the
    MAXimum values of each row in 'my_array' for 'num_col' columns.
    The length of each row (or the number of columns per row if you
    prefer) denotes the size of each window.
    EX:
        num_col = 2
        ==> window_size = 2
        ==> check max( data[1], data[2] ),
                  max( data[3], data[4] ),
                  max( data[5], data[6] ),
                  .
                  .
                  .
                  max( data[39], data[40] )
            for k rows, where k = len(my_array)//num_col
    """
    my_array = shapeshifter(num_col=num_col, my_array=data)
    rows = [my_array[index] for index in range(len(my_array))]
    res = []
    for index in range(len(rows)):
        res.append( max(rows[index]) )
    return res
So far, the code is fine. I checked it with the following:
check1 = looper(2)
check2 = looper(4)
print(check1)
>> [2.0, 4.0, ..., 38.0, 40.0]
print(len(check1))
>> 20
print(check2)
>> [4.0, 8.0, ..., 36.0, 40.0]
print(len(check2))
>> 10
So far so good. Now here is my problem.
def metalooper(col_ls, my_array=data):
    """
    This function calls 'looper' - which calls
    'shapeshifter' - for every 'col' in 'col_ls'.
    EX:
        j_list = [1,2,3,4,5]
        ==> col_ls = [2,4,8,16,32]
        ==> looper(2), looper(4),
            looper(8), ..., looper(32)
        ==> shapeshifter(2), shapeshifter(4),
            shapeshifter(8), ..., shapeshifter(32)
            such that looper(2^j) ==> shapeshifter(2^j)
            for j in j_list
    """
    res = []
    for col in col_ls:
        res.append(looper(num_col=col))
    return res

j_list = [2,4,8,16,32]
check3 = metalooper(j_list)
Running the code above provides this error:
ValueError: total size of new array must be unchanged
With 40 data points, the array can be reshaped into 2 columns of 20 rows, or 4 columns of 10 rows, or 8 columns of 5 rows, BUT at 16 columns, the array cannot be reshaped without clipping data since 40/16 ≠ integer. I believe this is the problem with my code, but I do not know how to fix it.
I am hoping there is a way to cut off the last values that do not fit into a full window. If this is not possible, I am hoping I can append zeroes to fill the missing entries so that the reshape succeeds, and then remove the zeroes afterwards. Or maybe even some complicated if - try - break block. What are some ways around this problem?
I think this will give you what you want in one step:
def windowFunc(a, window, f = np.max):
    return np.array([f(i) for i in np.split(a, range(window, a.size, window))])
With the default f, that will give you an array of the maximums of your windows.
Generally, using np.split and range, this will let you split into a (possibly ragged) list of arrays:
def shapeshifter(num_col, my_array=data):
    return np.split(my_array, range(num_col, my_array.size, num_col))
You need a list of arrays because a 2D array can't be ragged (every row needs the same number of columns)
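To make the ragged-versus-truncated choice concrete, here is a minimal pure-Python sketch (no NumPy needed; it works on lists and on 1D arrays alike). The helper name window_max and its keep_partial flag are invented for this illustration.

```python
def window_max(values, window, keep_partial=True):
    """Maximum of each consecutive block of `window` values.

    keep_partial=True keeps a shorter last block (the ragged case);
    keep_partial=False drops it (the truncating case).
    """
    stop = len(values) if keep_partial else len(values) - window + 1
    return [max(values[i:i + window]) for i in range(0, stop, window)]


data = list(range(1, 41))
print(window_max(data, 16))                      # [16, 32, 40]
print(window_max(data, 16, keep_partial=False))  # [16, 32]
```

With 40 values and a window of 16, the ragged version reports a third maximum taken from only 8 values, while the truncating version ignores those 8 values entirely.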
If you really want to pad with zeros, you can use np.pad:
def shapeshifter(num_col, my_array=data):
    # -size % num_col is the number of zeros needed to complete the last row (0 if it already fits)
    return np.pad(my_array, (0, -my_array.size % num_col), 'constant', constant_values = 0).reshape(-1, num_col)
Warning:
It is also technically possible to use, for example, a.resize(32,2), which resizes the array in place and pads it with zeros (as you requested). But there are some big caveats:
You would need to calculate the axis lengths yourself, because the -1 trick doesn't work with resize.
If the original array a is referenced by anything else, a.resize will fail with the following error:
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
The resize function (i.e. np.resize(a, new_shape)) is not equivalent to a.resize: instead of padding with zeros, it repeats the array from the beginning.
Since you seem to want to reference a by a number of windows, a.resize isn't very useful. But it's a rabbit hole that's easy to fall into.
EDIT:
Looping through a list is slow. If your input is long and windows are small, the windowFunc above will bog down in the for loops. This should be more efficient:
def windowFunc2(a, window, f = np.max):
    tail = - (a.size % window)
    if tail == 0:
        return f(a.reshape(-1, window), axis = -1)
    else:
        body = a[:tail].reshape(-1, window)
        return np.r_[f(body, axis = -1), f(a[tail:])]
Here's a generalized way to reshape with truncation:
def reshape_and_truncate(arr, shape):
    desired_size_factor = np.prod([n for n in shape if n != -1])
    if -1 in shape:  # implicit array size
        desired_size = arr.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return arr.flat[:desired_size].reshape(shape)
Which your shapeshifter could use in place of reshape

Swift Comparing values in a single array

Using swift in Xcode I have an array of float values 'IMProdArray.'
I would like to determine a function that checks the values in the array to determine if any of the values are within 0.200 of each other. If they are return 'false', if they aren't, return 'true'.
As a similar function I would also like to calculate the biggest distance between two values and return the halfway point value: i.e.
In an array I have values: 1, 3, 4, 10, 11, 12
the largest gap between two values (if they are in order) is 4-10. The mid value of this is 7. So return 7.
A nudge in the right direction would be greatly appreciated.
Since you asked for a solution, here's the solution (though I really shouldn't just be writing your code for you).
This works for an array sorted in ascending order (sort your array before using this code if it isn't already sorted):
var maxGap: Float = -1
var maxGapIndex = -1
// dropFirst() skips index 0, so i-1 is always valid (an empty array yields no iterations)
for i in IMProdArray.indices.dropFirst() {
    let gap = IMProdArray[i] - IMProdArray[i-1]
    if gap <= 0.2 {
        // handle the values being within 0.2 of each other
    }
    if gap > maxGap {
        maxGap = gap
        maxGapIndex = i-1 // store the index of the first number
    }
}
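You can then retrieve the index of the gap from maxGapIndex. For your example, maxGapIndex will be 2, which is the index of 4 in your array. The halfway point is then (IMProdArray[maxGapIndex] + IMProdArray[maxGapIndex + 1]) / 2, which is 7 for your example.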
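Both checks can also be written as two small functions. The following Python sketch shows the logic only (it is not Swift, and the names all_apart and largest_gap_midpoint are invented for this example); it sorts a copy of the input first, so the caller's array is left untouched.

```python
def all_apart(values, min_distance):
    """True if no two values are within `min_distance` of each other."""
    ordered = sorted(values)
    # After sorting, the closest pair is always a pair of neighbours
    return all(b - a > min_distance for a, b in zip(ordered, ordered[1:]))


def largest_gap_midpoint(values):
    """Halfway point of the largest gap between neighbouring sorted values."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    low, high = max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
    return (low + high) / 2


values = [1, 3, 4, 10, 11, 12]
print(all_apart(values, 0.2))        # True
print(largest_gap_midpoint(values))  # 7.0
```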
