Minimize (firstA_max - firstA_min) + (secondB_max - secondB_min) - arrays

Given n pairs of integers. Split into two subsets A and B to minimize sum(maximum difference among first values of A, maximum difference among second values of B).
Example : n = 4
{0, 0}; {5;5}; {1; 1}; {3; 4}
A = {{0; 0}; {1; 1}}
B = {{5; 5}; {3; 4}}
(maximum difference among first values of A, maximum difference among second values of B).
(maximum difference among first values of A) = fA_max - fA_min = 1 - 0 = 1
(maximum difference among second values of B) = sB_max - sB_min = 5 - 4 = 1
Therefore, the answer if 1 + 1 = 2. And this is the best way.
Obviously, maximum difference among the values equals to (maximum value - minimum value). Hence, what we need to do is find the minimum of (fA_max - fA_min) + (sB_max - sB_min)
Suppose the given array is arr[], first value if arr[].first and second value is arr[].second.
I think it is quite easy to solve this in quadratic complexity. You just need to sort the array by the first value. Then all the elements in subset A should be picked consecutively in the sorted array. So, you can loop for all ranges [L;R] of the sorted. Each range, try to add all elements in that range into subset A and add all the remains into subset B.
For more detail, this is my C++ code
int calc(pair<int, int> a[], int n){
int m = 1e9, M = -1e9, res = 2e9; //m and M are min and max of all the first values in subset A
for (int l = 1; l <= n; l++){
int g = m, G = M; //g and G are min and max of all the second values in subset B
for(int r = n; r >= l; r--) {
if (r - l + 1 < n){
res = min(res, a[r].first - a[l].first + G - g);
}
g = min(g, a[r].second);
G = max(G, a[r].second);
}
m = min(m, a[l].second);
M = max(M, a[l].second);
}
return res;
}
Now, I want to improve my algorithm down to loglinear complexity. Of course, sort the array by the first value. After that, if I fixed fA_min = a[i].first, then if the index i increase, the fA_max will increase while the (sB_max - sB_min) decrease.
But now I am still stuck here, is there any ways to solve this problem in loglinear complexity?

The following approach is an attempt to escape the n^2, using an argmin list for the second element of the tuples (lets say the y-part). Where the points are sorted regarding x.
One Observation is that there is an optimum solution where A includes index argmin[0] or argmin[n-1] or both.
in get_best_interval_min_max we focus once on including argmin[0] and the next smallest element on y and so one. The we do the same from the max element.
We get two dictionaries {(i,j):(profit, idx)}, telling us how much we gain in y when including points[i:j+1] in A, towards min or max on y. idx is the idx in the argmin array.
calculate the objective for each dict assuming max/min or y is not in A.
combine the results of both dictionaries, : (i1,j1): (v1, idx1) and (i2,j2): (v2, idx2). result : j2 - i1 + max_y - min_y - v1 - v2.
Constraint: idx1 < idx2. Because the indices in the argmin array can not intersect, otherwise some profit in y might be counted twice.
On average the dictionaries (dmin,dmax) are smaller than n, but in the worst case when x and y correlate [(i,i) for i in range(n)] they are exactly n, and we do not win any time. Anyhow on random instances this approach is much faster. Maybe someone can improve upon this.
import numpy as np
from random import randrange
import time
def get_best_interval_min_max(points):# sorted input according to x dim
L = len(points)
argmin_b = np.argsort([p[1] for p in points])
b_min,b_max = points[argmin_b[0]][1], points[argmin_b[L-1]][1]
arg = [argmin_b[0],argmin_b[0]]
res_min = dict()
for i in range(1,L):
res_min[tuple(arg)] = points[argmin_b[i]][1] - points[argmin_b[0]][1],i # the profit in b towards min
if arg[0] > argmin_b[i]: arg[0]=argmin_b[i]
elif arg[1] < argmin_b[i]: arg[1]=argmin_b[i]
arg = [argmin_b[L-1],argmin_b[L-1]]
res_max = dict()
for i in range(L-2,-1,-1):
res_max[tuple(arg)] = points[argmin_b[L-1]][1]-points[argmin_b[i]][1],i # the profit in b towards max
if arg[0]>argmin_b[i]: arg[0]=argmin_b[i]
elif arg[1]<argmin_b[i]: arg[1]=argmin_b[i]
# return the two dicts, difference along y,
return res_min, res_max, b_max-b_min
def argmin_algo(points):
# return the objective value, sets A and B, and the interval for A in points.
points.sort()
# get the profits for different intervals on the sorted array for max and min
dmin, dmax, y_diff = get_best_interval_min_max(points)
key = [None,None]
res_min = 2e9
# the best result when only the min/max b value is includes in A
for d in [dmin,dmax]:
for k,(v,i) in d.items():
res = points[k[1]][0]-points[k[0]][0] + y_diff - v
if res < res_min:
key = k
res_min = res
# combine the results for max and min.
for k1,(v1,i) in dmin.items():
for k2,(v2,j) in dmax.items():
if i > j: break # their argmin_b indices can not intersect!
idx_l, idx_h = min(k1[0], k2[0]), max(k1[1],k2[1]) # get index low and idx hight for combination
res = points[idx_h][0]-points[idx_l][0] -v1 -v2 + y_diff
if res < res_min:
key = (idx_l, idx_h) # new merged interval
res_min = res
return res_min, points[key[0]:key[1]+1], points[:key[0]]+points[key[1]+1:], key
def quadratic_algorithm(points):
points.sort()
m, M, res = 1e9, -1e9, 2e9
idx = (0,0)
for l in range(len(points)):
g, G = m, M
for r in range(len(points)-1,l-1,-1):
if r-l+1 < len(points):
res_n = points[r][0] - points[l][0] + G - g
if res_n < res:
res = res_n
idx = (l,r)
g = min(g, points[r][1])
G = max(G, points[r][1])
m = min(m, points[l][1])
M = max(M, points[l][1])
return res, points[idx[0]:idx[1]+1], points[:idx[0]]+points[idx[1]+1:], idx
# let's try it and compare running times to the quadratic_algorithm
# get some "random" points
c1=0
c2=0
for i in range(100):
points = [(randrange(100), randrange(100)) for i in range(1,200)]
points.sort() # sorted for x dimention
s = time.time()
r1 = argmin_algo(points)
e1 = time.time()
r2 = quadratic_algorithm(points)
e2 = time.time()
c1 += (e1-s)
c2 += (e2-e1)
if not r1[0] == r2[0]:
print(r1,r2)
raise Exception("Error, results are not equal")
print("time of argmin_algo", c1, "time of quadratic_algorithm",c2)

UPDATE: #Luka proved the algorithm described in this answer is not exact. But I will keep it here because it's a good performance heuristics and opens the way to many probabilistic methods.
I will describe a loglinear algorithm. I couldn't find a counter example. But I also couldn't find a proof :/
Let set A be ordered by first element and set B be ordered by second element. They are initially empty. Take floor(n/2) random points of your set of points and put in set A. Put the remaining points in set B. Define this as a partition.
Let's call a partition stable if you can't take an element of set A, put it in B and decrease the objective function and if you can't take an element of set B, put it in A and decrease the objective function. Otherwise, let's call the partition unstable.
For an unstable partition, the only moves that are interesting are the ones that take the first or the last element of A and move to B or take the first or the last element of B and move to A. So, we can find all interesting moves for a given unstable partition in O(1). If an interesting move decreases the objective function, do it. Go like that until the partition becomes stable. I conjecture that it takes at most O(n) moves for the partition to become stable. I also conjecture that at the moment the partition becomes stable, you will have a solution.

Related

Find Minimum Score Possible

Problem statement:
We are given three arrays A1,A2,A3 of lengths n1,n2,n3. Each array contains some (or no) natural numbers (i.e > 0). These numbers denote the program execution times.
The task is to choose the first element from any array and then you can execute that program and remove it from that array.
For example:
if A1=[3,2] (n1=2),
A2=[7] (n2=1),
A3=[1] (n3=1)
then we can execute programs in various orders like [1,7,3,2] or [7,1,3,2] or [3,7,1,2] or [3,1,7,2] or [3,2,1,7] etc.
Now if we take S=[1,3,2,7] as the order of execution the waiting time of various programs would be
for S[0] waiting time = 0, since executed immediately,
for S[1] waiting time = 0+1 = 1, taking previous time into account, similarly,
for S[2] waiting time = 0+1+3 = 4
for S[3] waiting time = 0+1+3+2 = 6
Now the score of array is defined as sum of all wait times = 0 + 1 + 4 + 6 = 11, This is the minimum score we can get from any order of execution.
Our task is to find this minimum score.
How can we solve this problem? I tried with approach trying to pick minimum of three elements each time, but it is not correct because it gets stuck when two or three same elements are encountered.
One more example:
if A1=[23,10,18,43], A2=[7], A3=[13,42] minimum score would be 307.
The simplest way to solve this is with dynamic programming (which runs in cubic time).
For each array A: Suppose you take the first element from array A, i.e. A[0], as the next process. Your total cost is the wait-time contribution of A[0] (i.e., A[0] * (total_remaining_elements - 1)), plus the minimal wait time sum from A[1:] and the rest of the arrays.
Take the minimum cost over each possible first array A, and you'll get the minimum score.
Here's a Python implementation of that idea. It works with any number of arrays, not just three.
def dp_solve(arrays: List[List[int]]) -> int:
"""Given list of arrays representing dependent processing times,
return the smallest sum of wait_time_before_start for all job orders"""
arrays = [x for x in arrays if len(x) > 0] # Remove empty
#functools.lru_cache(100000)
def dp(remaining_elements: Tuple[int],
total_remaining: int) -> int:
"""Returns minimum wait time sum when suffixes of each array
have lengths in 'remaining_elements' """
if total_remaining == 0:
return 0
rem_elements_copy = list(remaining_elements)
best = 10 ** 20
for i, x in enumerate(remaining_elements):
if x == 0:
continue
cost_here = arrays[i][-x] * (total_remaining - 1)
if cost_here >= best:
continue
rem_elements_copy[i] -= 1
best = min(best,
dp(tuple(rem_elements_copy), total_remaining - 1)
+ cost_here)
rem_elements_copy[i] += 1
return best
return dp(tuple(map(len, arrays)), sum(map(len, arrays)))
Better solutions
The naive greedy strategy of 'smallest first element' doesn't work, because it can be worth it to do a longer job to get a much shorter job in the same list done, as the example of
A1 = [100, 1, 2, 3], A2 = [38], A3 = [34],
best solution = [100, 1, 2, 3, 34, 38]
by user3386109 in the comments demonstrates.
A more refined greedy strategy does work. Instead of the smallest first element, consider each possible prefix of the array. We want to pick the array with the smallest prefix, where prefixes are compared by average process time, and perform all the processes in that prefix in order.
A1 = [ 100, 1, 2, 3]
Prefix averages = [(100)/1, (100+1)/2, (100+1+2)/3, (100+1+2+3)/4]
= [ 100.0, 50.5, 34.333, 26.5]
A2=[38]
A3=[34]
Smallest prefix average in any array is 26.5, so pick
the prefix [100, 1, 2, 3] to complete first.
Then [34] is the next prefix, and [38] is the final prefix.
And here's a rough Python implementation of the greedy algorithm. This code computes subarray averages in a completely naive/brute-force way, so the algorithm is still quadratic (but an improvement over the dynamic programming method). Also, it computes 'maximum suffixes' instead of 'minimum prefixes' for ease of coding, but the two strategies are equivalent.
def greedy_solve(arrays: List[List[int]]) -> int:
"""Given list of arrays representing dependent processing times,
return the smallest sum of wait_time_before_start for all job orders"""
def max_suffix_avg(arr: List[int]):
"""Given arr, return value and length of max-average suffix"""
if len(arr) == 0:
return (-math.inf, 0)
best_len = 1
best = -math.inf
curr_sum = 0.0
for i, x in enumerate(reversed(arr), 1):
curr_sum += x
new_avg = curr_sum / i
if new_avg >= best:
best = new_avg
best_len = i
return (best, best_len)
arrays = [x for x in arrays if len(x) > 0] # Remove empty
total_time_sum = sum(sum(x) for x in arrays)
my_averages = [max_suffix_avg(arr) for arr in arrays]
total_cost = 0
while True:
largest_avg_idx = max(range(len(arrays)),
key=lambda y: my_averages[y][0])
_, n_to_remove = my_averages[largest_avg_idx]
if n_to_remove == 0:
break
for _ in range(n_to_remove):
total_time_sum -= arrays[largest_avg_idx].pop()
total_cost += total_time_sum
# Recompute the changed array's avg
my_averages[largest_avg_idx] = max_suffix_avg(arrays[largest_avg_idx])
return total_cost

Maximum cost of Matrix

You are given an N*M 2D matrix, Each cell contains a number and no two cells have the same number. We have to select one element from each of the N rows, let selected elements be c1,c2,...cN.
The cost of Matrix is defined as - the sum of (ci-sj)(1<=i,j<=N) where sj represents the greatest element in jth row that is <=ci, If no such sj exists then sj=0.
We have to find the maximum possible cost of the matrix.
For Example:-
If N=M=3 and matrix = [[4,3,2], [6,1,5], [8,9,7]]
Now the values for c1 can be 4,3 or 2, values for c2 can be 6,1 or 5 and values for c3 can be 8,9 or 7
If we select c1=4, c2=6 and c3=9
Cost of c1 = (4-4)+(4-1)+(4-0)=7, here s1 = 4(greatest element <=c1=4 in 1st row), s2 = 1, s3 = 0(as no element is <=c3=9 in 3rd row)
similarly, Cost of c2 = (6-4)+(6-6)+(6-0)=8
cost of c3 = (9-4)+(9-6)+(9-9) = 8
Total Cost = 7+8+8 = 23
This is the maximum score that we can get from any values of c1,c2,c3.
Another Example:-
If Matrix = [[2,22,28,30],[21,5,14,4],[20,6,15,23]] then maximum cost would be 60.
I tried choosing the maximum element from each row as ci, but this does not work. Can anyone tell how we can approach and solve this problem?
Edit:
I have tried long for coding and understanding this but am unable to do so successfully, here is what I am able to code until now, this passes some cases but fails some.
def solve(matrix):
N = len(matrix)
M = len(matrix[1])
i = 0
totalcost = 0
for row in matrix:
itotalcost = 0
for ci in row:
icost = 0
totalicost = 0
for k in range(0, N):
if k != i:
idx = 0
sj = 0
isj = 0
for idx in range(0, M):
isj = matrix[k][idx]
if isj <= ci and isj > sj:
sj = isj
icost = ci - sj
totalicost += icost
#print("ci=", ci, "sj", sj, "icost=", icost)
if itotalcost < totalicost:
itotalcost = totalicost
i += 1
#print("itotalcost=", itotalcost)
totalcost += itotalcost
return totalcost
You can solve this in O(N log N) time, where N is the total number of elements.
First, we can define the value of a nonnegative integer x independently of what row it is in. Letting max_at_most(row, x) denote a function returning the largest element in row that is at most x, or 0 if none exists.
Then:
value(x) = sum over all rows R of: { x - max_at_most(R, x) }
Now, max_at_most is a monotonic function of x for a fixed row, and it only changes at most length(row) times for each row, which we can use to calculate it quickly.
Find all unique elements in your matrix, and track the row indices where each element occurs. Sort the unique elements. Now, if we iterate over the elements in order, we can track the largest elements seen in each row (and also the sum of max_at_most(row, x) over all rows) in O(1) time.
def max_matrix_value(matrix: List[List[int]]) -> int:
"""Maximize the sum over all rows i, j, of (c_i - f(j, c_i)})
where c_i must be an element of row i and f(j, c_i) is the largest
element of row j less than or equal to c_i, or 0 if none exists."""
num_rows = len(matrix)
elem_to_indices = defaultdict(list)
for index, row in enumerate(matrix):
for elem in row:
elem_to_indices[elem].append(index)
current_max_element = [0] * num_rows
current_max_sum = 0
max_value_by_row = [0] * num_rows
for element in sorted(elem_to_indices):
# Update maximum element seen in each row
for index in elem_to_indices[element]:
difference = element - current_max_element[index]
current_max_sum += difference
current_max_element[index] = element
max_value_for_element = element * num_rows - current_max_sum
# Update maximum value achieved by row, if we have a new record
for index in elem_to_indices[element]:
max_value_by_row[index] = max(max_value_by_row[index],
max_value_for_element)
return sum(max_value_by_row)

Generate a matrix of combinations (permutation) without repetition (array exceeds maximum array size preference)

I am trying to generate a matrix, that has all unique combinations of [0 0 1 1], I wrote this code for this:
v1 = [0 0 1 1];
M1 = unique(perms([0 0 1 1]),'rows');
• This isn't ideal, because perms() is seeing each vector element as unique and doing:
4! = 4 * 3 * 2 * 1 = 24 combinations.
• With unique() I tried to delete all the repetitive entries so I end up with the combination matrix M1 →
only [4!/ 2! * (4-2)!] = 6 combinations!
Now, when I try to do something very simple like:
n = 15;
i = 1;
v1 = [zeros(1,n-i) ones(1,i)];
M = unique(perms(vec_1),'rows');
• Instead of getting [15!/ 1! * (15-1)!] = 15 combinations, the perms() function is trying to do
15! = 1.3077e+12 combinations and it's interrupted.
• How would you go about doing in a much better way? Thanks in advance!
You can use nchoosek to return the indicies which should be 1, I think in your heart you knew this must be possible because you were using the definition of nchoosek to determine the expected final number of permutations! So we can use:
idx = nchoosek( 1:N, k );
Where N is the number of elements in your array v1, and k is the number of elements which have the value 1. Then it's simply a case of creating the zeros array and populating the ones.
v1 = [0, 0, 1, 1];
N = numel(v1); % number of elements in array
k = nnz(v1); % number of non-zero elements in array
colidx = nchoosek( 1:N, k ); % column index for ones
rowidx = repmat( 1:size(colidx,1), k, 1 ).'; % row index for ones
M = zeros( size(colidx,1), N ); % create output
M( rowidx(:) + size(M,1) * (colidx(:)-1) ) = 1;
This works for both of your examples without the need for a huge intermediate matrix.
Aside: since you'd have the indicies using this approach, you could instead create a sparse matrix, but whether that's a good idea or not would depend what you're doing after this point.

How to translate a solution into divide-and-conquer (finding a sub array with the largest, smallest value)

I am trying to get better at divide an conquer algorithms and am using this one below as an example. Given an array _in and some length l it finds the start point of a sub array _in[_min_start,_min_start+l] such that the lowest value in that sub array is the highest it could possible be. I have come up with a none divide and conquer solution and am wondering how I could go about translating this into one which divides the array up into smaller parts (divide-and-conquer).
def main(_in, l):
_min_start = 0
min_trough = None
for i in range(len(_in)+1-l):
if min_trough is None:
min_trough = min(_in[i:i+l])
if min(_in[i:i+l]) > min_trough:
_min_start = i
min_trough = min(_in[i:i+l])
return _min_start, _in[_min_start:_min_start+l]
e.g. For the array [5, 1, -1, 2, 5, -4, 3, 9, 8, -2, 0, 6] and a sub array of lenght 3 it would return start position 6 (resulting in the array [3,9,8]).
Three O(n) solutions and a benchmark
Note I'm renaming _in and l to clearer-looking names A and k.
Solution 1: Divide and conquer
Split the array in half. Solve left half and right half recursively. The subarrays not yet considered cross the middle, i.e., they're a suffix of the left part plus a prefix of the right part. Compute k-1 suffix-minima of the left half and k-1 prefix-minima of the right half. That allows you to compute the minimum for each middle-crossing subarray of length k in O(1) time each. The best subarray for the whole array is the best of left-best, right-best and crossing-best.
Runtime is O(n), I believe. As Ellis pointed out, in the recursion the subarray can become smaller than k. Such cases take O(1) time to return the equivalent of "there aren't any k-length subarrays in here". So the time is:
T(n) = { 2 * T(n/2) + O(k) if n >= k
{ O(1) otherwise
For any 0 <= k <= n we have k=nc with 0 <= c <= 1. Then the number of calls is Θ(n1-c) and each call's own work takes Θ(nc) time, for a total of Θ(n) time.
Posted a question about the complexity to be sure.
Python implementation:
def solve_divide_and_conquer(A, k):
def solve(start, stop):
if stop - start < k:
return -inf,
mid = (start + stop) // 2
left = solve(start, mid)
right = solve(mid, stop)
i0 = mid - k + 1
prefixes = accumulate(A[mid:mid+k-1], min)
if i0 < 0:
prefixes = [*prefixes][-i0:]
i0 = 0
suffixes = list(accumulate(A[i0:mid][::-1], min))[::-1]
crossing = max(zip(map(min, suffixes, prefixes), count(i0)))
return max(left, right, crossing)
return solve(0, len(A))[1]
Solution 2: k-Blocks
As commented by #benrg, the above dividing-and-conquering is needlessly complicated. We can simply work on blocks of length k. Compute the suffix minima of the first block and the prefix minima of the second block. That allows finding the minimum of each k-length subarray within these two blocks in O(1) time. Do the same with the second and third block, the third and fourth block, etc. Time is O(n) as well.
Python implementation:
def solve_blocks(A, k):
return max(max(zip(map(min, prefixes, suffixes), count(mid-k)))
for mid in range(k, len(A)+1, k)
for prefixes in [accumulate(A[mid:mid+k], min, initial=inf)]
for suffixes in [list(accumulate(A[mid-k:mid][::-1], min, initial=inf))[::-1]]
)[1]
Solution 3: Monoqueue
Not divide & conquer, but first one I came up with (and knew was O(n)).
Sliding window, represent the window with a deque of (sorted) indexes of strictly increasing array values in the window. When sliding the window to include a new value A[i]:
Remove the first index from the deque if the sliding makes it fall out of the window.
Remove indexes whose array values are larger than A[i]. (They can never be the minimum of the window anymore.)
Include the new index i.
The first index still in the deque is the index of the current window's minimum value. Use that to update overall result.
Python implementation:
from collections import deque
A = [5, 1, -1, 2, 5, -4, 3, 9, 8, -2, 0, 6]
k = 3
I = deque()
for i in range(len(A)):
if I and I[0] == i - k:
I.popleft()
while I and A[I[-1]] >= A[i]:
I.pop()
I.append(i)
curr_min = A[I[0]]
if i == k-1 or i > k-1 and curr_min > max_min:
result = i - k + 1
max_min = curr_min
print(result)
Benchmark
With 4000 numbers from the range 0 to 9999, and k=2000:
80.4 ms 81.4 ms 81.8 ms solve_brute_force
80.2 ms 80.5 ms 80.7 ms solve_original
2.4 ms 2.4 ms 2.4 ms solve_monoqueue
2.4 ms 2.4 ms 2.4 ms solve_divide_and_conquer
1.3 ms 1.4 ms 1.4 ms solve_blocks
Benchmark code (Try it online!):
from timeit import repeat
from random import choices
from itertools import accumulate
from math import inf
from itertools import count
from collections import deque
def solve_monoqueue(A, k):
I = deque()
for i in range(len(A)):
if I and I[0] == i - k:
I.popleft()
while I and A[I[-1]] >= A[i]:
I.pop()
I.append(i)
curr_min = A[I[0]]
if i == k-1 or i > k-1 and curr_min > max_min:
result = i - k + 1
max_min = curr_min
return result
def solve_divide_and_conquer(A, k):
def solve(start, stop):
if stop - start < k:
return -inf,
mid = (start + stop) // 2
left = solve(start, mid)
right = solve(mid, stop)
i0 = mid - k + 1
prefixes = accumulate(A[mid:mid+k-1], min)
if i0 < 0:
prefixes = [*prefixes][-i0:]
i0 = 0
suffixes = list(accumulate(A[i0:mid][::-1], min))[::-1]
crossing = max(zip(map(min, suffixes, prefixes), count(i0)))
return max(left, right, crossing)
return solve(0, len(A))[1]
def solve_blocks(A, k):
return max(max(zip(map(min, prefixes, suffixes), count(mid-k)))
for mid in range(k, len(A)+1, k)
for prefixes in [accumulate(A[mid:mid+k], min, initial=inf)]
for suffixes in [list(accumulate(A[mid-k:mid][::-1], min, initial=inf))[::-1]]
)[1]
def solve_brute_force(A, k):
return max(range(len(A)+1-k),
key=lambda start: min(A[start : start+k]))
def solve_original(_in, l):
_min_start = 0
min_trough = None
for i in range(len(_in)+1-l):
if min_trough is None:
min_trough = min(_in[i:i+l])
if min(_in[i:i+l]) > min_trough:
_min_start = i
min_trough = min(_in[i:i+l])
return _min_start # , _in[_min_start:_min_start+l]
solutions = [
solve_brute_force,
solve_original,
solve_monoqueue,
solve_divide_and_conquer,
solve_blocks,
]
for _ in range(3):
A = choices(range(10000), k=4000)
k = 2000
# Check correctness
expect = None
for solution in solutions:
index = solution(A.copy(), k)
assert 0 <= index and index + k-1 < len(A)
min_there = min(A[index : index+k])
if expect is None:
expect = min_there
print(expect)
else:
print(min_there == expect, solution.__name__)
print()
# Speed
for solution in solutions:
copy = A.copy()
ts = sorted(repeat(lambda: solution(copy, k), number=1))[:3]
print(*('%5.1f ms ' % (t * 1e3) for t in ts), solution.__name__)
print()

Fractal dimension algorithms gives results of >2 for time-series

I'm trying to compute Fractal Dimension of very specific time series array.
I've found implementations of Higuchi FD algorithm:
def hFD(a, k_max): #Higuchi FD
L = []
x = []
N = len(a)
for k in range(1,k_max):
Lk = 0
for m in range(0,k):
#we pregenerate all idxs
idxs = np.arange(1,int(np.floor((N-m)/k)),dtype=np.int32)
Lmk = np.sum(np.abs(a[m+idxs*k] - a[m+k*(idxs-1)]))
Lmk = (Lmk*(N - 1)/(((N - m)/ k)* k)) / k
Lk += Lmk
L.append(np.log(Lk/(m+1)))
x.append([np.log(1.0/ k), 1])
(p, r1, r2, s)=np.linalg.lstsq(x, L)
return p[0]
from https://github.com/gilestrolab/pyrem/blob/master/src/pyrem/univariate.py
and Katz FD algorithm:
def katz(data):
n = len(data)-1
L = np.hypot(np.diff(data), 1).sum() # Sum of distances
d = np.hypot(data - data[0], np.arange(len(data))).max() # furthest distance from first point
return np.log10(n) / (np.log10(d/L) + np.log10(n))
from https://github.com/ProjectBrain/brainbits/blob/master/katz.py
I expect results of ~1,5 in both cases however get 2,2 and 4 instead...
hFD(x,4) = 2.23965648024 (k value of here is chosen as an example, however result won't change much in range 4-12 edit: I was able to get result of ~1,9 with k=22, however this still does not make any sense);
katz(x) = 4.03911343057
Which in theory should not be possible for 1D time-series array.
Questions here are: are Higuchi and Katz algorithms not suitable for time-series analysis in general, or am I doing something wrong on my side? Also are there any other python libraries with already implemented and error-less algorithms to verify my results?
My array of interest (each element represents point in time t, t+1, t+2,..., t+N)
x = np.array([373.4413096546802, 418.58026161917803,
395.7387698762124, 416.21163042783206,
407.9812265426947, 430.2355284504048,
389.66095393296763, 442.18969320408166,
383.7448638776275, 452.8931822090381,
413.5696828065546, 434.45932712853585
,429.95212301648996, 436.67612861616215,
431.10235365546964, 418.86935850068545,
410.84902747247423, 444.4188867775925,
397.1576881118471, 451.6129904245434,
440.9181246439599, 438.9857353268666,
437.1800408012741, 460.6251405281339,
404.3208481355302, 500.0432305427639,
380.49579242696177, 467.72953450552893,
333.11328535523967, 444.1171938340972,
303.3024198243042, 453.16332062153276,
356.9697406524534, 520.0720647379901,
402.7949987727925, 536.0721418821788,
448.21609036718445, 521.9137447208354,
470.5822486372967, 534.0572029633416,
480.03741443274765, 549.2104258193126,
460.0853321729541, 561.2705350421926,
444.52689144575794, 560.0835589548401,
462.2154563472787, 559.7166600213686,
453.42374550322353, 559.0591804941763,
421.4899935529862, 540.7970410737004,
454.34364779193913, 531.6018122709779,
437.1545739076901, 522.4262260216169,
444.6017030695873, 533.3991716674865,
458.3492761150962, 513.1735160522104])
The array you are trying to estimate hDF is too short. You need to get longer sample or oversample the current one to have at least 128 points for hDF and more then 4000 points for Katz
import scipy.signal as signal
...
x_res=signal.resample(x,128)
hfd(x_res,4) will be 1.74383694265

Resources