Finding the center of 1D data

Finding the center of 1D data - c

Say I have data: 0 (or near 0), 0, 0, ..., 1, 10, 52, 80, 100, 100, 100, 100 (for a while), 90, 45, 5, 0, 0, 0...
I want to find the index (not necessarily an int, I want more precision) of the 'center' of my plateau of data.
My first thought was to do a gaussian fit, but the data is rather flat for a while in the center. So maybe some kind of square (?) fit. I've been looking at minimization with gsl also, but I don't know what the simplest way to do this would be.
A simple way would be to find the index corresponding to the median value, but that gives me only a precision of 1. With a curve fitting I can do better.
Note: I'm in C and can use GSL, but a general math solution would work too!

Suggested algorithm:
Optionally filter data: median of 3, low pass, etc.
Find average value: Avg
Find average index of values above Avg: Center_index.
Average a few of the "values above" near Center_index.

Weighted Mean Center of a line, with an array similar to your data:
int w[] = {0, 0, 0, 1, 10, 52, 80, 100, 100, 100, 100, 90, 45, 5, 0, 0}
...is calculated by multiplying the x and y coordinate by the weight
for that feature and summing all for both x and y individually, and
then dividing this by the sum of all the weights.
Because this is a 1D array, position is expressed using the position within the array, i.e. the index, and looks like this:
weighted mean center = sum(w[i]*i)/sum(w[i]) //for all i
in pseudo code:
double sum_w=0;//sum of all values (weights)
double prod_wx=0;//product of all corresponding weights and positions
double wmc=0; //weighted mean center
for(int i=0;i<sizeof(w)/sizeof(w[0]);i++)
{
prod_wx += w[i]*i;
sum_w += w[i];
}
wmc = prod_wx/sum_w;

Related

Make a list of incremented numbers from a set of numbers

I've been trying to write a formula to create a list from a given set of numbers. I tried to do this with "if" function but failed many times.
The rule is to create a set of numbers by adding -2, -1 and +1, +2 to each number so that growing the size of the original list exponentially.
so if the original list includes is (x,y), the calculated new list would include (x-2,x-1,x,x+1,x+2, y-2,y-1, y, y+1, y+2).
Original numbers are all positive, and the calculated list is expected to be all positive, and should not include zero and repetitive numbers neither.
E.g,
in columns A1,B1,C1,D1, etc
List = 1, 2, 4, 5, 9, 10, 22, 25 (in this case there is a total of 8 numbers)
when you add -2, -1 and +1, +2 to each number we get ;
Calculated values = -1, 0, 1, 2, 3, 0, 1, 2, 3, 4, 2,3,4,5,6,
3,4,5,6,7, 7,8,9,10,11, 8,9,10,11,12, 20,21,22,23,24, 23,24,25,26,27
Final Calculated List should be : 1,2,3,4,5,6,7,8,9,10,11,12,20,21,22,23,24,25,26,27
( all repetitive and negative numbers, 0's are omitted)
I realized that I can't do this with if function, can I get help how to solve this problem? please.
thanks in advance.

Try:
=TEXTJOIN(", ", 1, INDEX(QUERY(SORT(UNIQUE(FLATTEN(
SPLIT(A1, ",")+{-2; -1; 0; 1; 2}))), "where Col1>0", )))

How to set the minimum and maximum value for each item in a Numpy array?

Suppose I have a numpy array
a = np.array([1, 100, 123, -400, 85, -98])
And I want to limit each value between -100 and 90. So basically, I want the numpy array to be like this:
a = np.array([1, 90, 90, -100, 85, -98])
I know this can be done through iterating over the numpy array, but is there any other efficient method to carry out this task?

There are several ways of doing so. First, using a numpy function as proposed by Sridhar Murali :
a = np.array([1, 100, 123, -400, 85, -98])
np.clip(a,-100,90)
Second, using numpy array comparison :
a = np.array([1, 100, 123, -400, 85, -98])
a[a>90] = 90
a[a<-100] = -100
Third, if a numpy is not required for the rest of your code, using list comprehension :
a = [1, 100, 123, -400, 85, -98]
a = [-100 if x<-100 else 90 if x>90 else x for x in a]
They all give the same result :
a = [1, 90, 90, -100, 85, -98]
As for coding style, I would prefer numpy comparison or list comprehension as they state clearly what is done, but it is up to you really. As for speed, with timeit.repeat on 100000 repetitions, I get on average, from the best to the worst :
4.8e-3 sec for list comprehension
1.8e-1 sec for numpy array comparison
2.7e-1 sec for np.clip function
Clearly, if an array is not necessary afterwards, list comprehension is the way to go. And if you need an array, direct comparison is almost twice more efficient that the clip function, while more readable.

I think the easiest way for you to get the result is using the clip function from numpy.
import numpy as np
a = np.array([1, 100, 123, -400, 85, -98])
np.clip(a,-100,90)

Element by Element Comparison of Multiple Arrays in MATLAB

I have a multiple input arrays and I want to generate one output array where the value is 0 if all elements in a column are the same and the value is 1 if all elements in a column are different.
For example, if there are three arrays :
A = [28, 28, 43, 43]
B = [28, 43, 43, 28]
C = [28, 28, 43, 43]
Output = [0, 1, 0, 1]
The arrays can be of any size and any number, but the arrays are also the same size.

A none loopy way is to use diff and any to advantage:
A = [28, 28, 43,43];
B = [28, 43, 43,28];
C = [28, 28, 43,43];
D = any(diff([A;B;C])) %Combine all three (or all N) vectors into a matrix. Using the Diff to find the difference between each element from row to row. If any of them is non-zero, then return 1, else return 0.
D = 0 1 0 1

There are several easy ways to do it.
Let's start by putting the relevant vectors in a matrix:
M = [A; B; C];
Now we can do things like:
idx = min(M)==max(M);
or
idx = ~var(M);

No one seems to have addressed that you have a variable amount of arrays. In your case, you have three in your example but you said you could have a variable amount. I'd also like to take a stab at this using broadcasting.
You can create a function that will take a variable number of arrays, and the output will give you an array of an equal number of columns shared among all arrays that conform to the output you're speaking of.
First create a larger matrix that concatenates all of the arrays together, then use bsxfun to take advantage of broadcasting the first row and ensuring that you find columns that are all equal. You can use all to complete this step:
function out = array_compare(varargin)
matrix = vertcat(varargin{:});
out = ~all(bsxfun(#eq, matrix(1,:), matrix), 1);
end
This will take the first row of the stacked matrix and see if this row is the same among all of the rows in the stacked matrix for every column and returns a corresponding vector where 0 denotes each column being all equal and 1 otherwise.
Save this function in MATLAB and call it array_compare.m, then you can call it in MATLAB like so:
A = [28, 28, 43, 43];
B = [28, 43, 43, 28];
C = [28, 28, 43, 43];
Output = array_compare(A, B, C);
We get in MATLAB:
>> Output
Output =
0 1 0 1

Not fancy but will do the trick
Output=nan(length(A),1); %preallocation and check if an index isn't reached
for i=1:length(A)
Output(i)= ~isequal(A(i),B(i),C(i));
end
If someone has an answer without the loop take that, but i feel like performance is not an issue here.

Searching through an unsorted nonuniform pair array for closest entry

I have an array that looks something like this:
[[320, 80], [300, 70], [300, 80], [270, 75], [260, 70], [280, 70]]
That is just a snippet, the actual array is 338 big.
I am trying to find the next logical element in the array based on some input. So for example I feed in two numbers, I.e. 315, 80 The next logical one is 320, 80 if you wanted to find a bigger entry.
I don't want to correlate logical to closest because it depends on whether you want a bigger or smaller element. So I suppose by logical I mean "closest in the required direction"
As an additional requirement the second number should try and remain as close as possible to the entered value OR the first number should try and remain as close as possible to the original number.
I am having issues when it comes to cases such as 275, 70, and I want to find the next smallest. That should be 260, 70 but my implementation keeps picking 280, 70
My current implementation adds the difference between the two numbers and looks for the smallest difference possible. I'm not sure how to enforce a direction.
Python Example (although really I'm looking for a language agnostic solution)
elements = [ [320, 80],
[300, 70],
[300, 80],
[270, 75],
[260, 70],
[280, 70]
]
target = [275, 70]
bestMatch = []
bestDifference = 0
for e in elements:
currentDifference = abs((target[0] - e[0]) - (target[1] - e[1]))
if not bestMatch or currentDifference < bestDifference:
bestMatch = e
bestDifference = currentDifference
print bestMatch

Based on your description and example input I have interpreted that as you should take the min of the two differences, rather than the difference of them. Then you'll pick the element that has the smallest change in either of the two numbers.
To go in the right direction you can just check whether the element you are currently at is larger or smaller than the target
Doing that you'll get the following:
elements = [ [320, 80],
[300, 70],
[300, 80],
[270, 75],
[260, 70],
[280, 70]
]
def nextLogicalElement(target, bigger=True):
bestScore = 0
bestMatch = []
for e in elements:
score = min(abs(target[0] - e[0]), abs(target[1] - e[1]))
if bigger and target[0] > e[0] or not bigger and target[0] < e[0]:
continue
if not bestMatch or score < bestScore:
bestMatch = e
bestScore = score
return bestMatch
Output:
>>> print nextLogicalElement([315, 80], bigger=True)
[320, 80]
>>> print nextLogicalElement([275, 70], bigger=False)
[260, 70]

Weighted random selection from array

I would like to randomly select one element from an array, but each element has a known probability of selection.
All chances together (within the array) sums to 1.
What algorithm would you suggest as the fastest and most suitable for huge calculations?
Example:
id => chance
array[
0 => 0.8
1 => 0.2
]
for this pseudocode, the algorithm in question should on multiple calls statistically return four elements on id 0 for one element on id 1.

Compute the discrete cumulative density function (CDF) of your list -- or in simple terms the array of cumulative sums of the weights. Then generate a random number in the range between 0 and the sum of all weights (might be 1 in your case), do a binary search to find this random number in your discrete CDF array and get the value corresponding to this entry -- this is your weighted random number.

The algorithm is straight forward
rand_no = rand(0,1)
for each element in array
if(rand_num < element.probablity)
select and break
rand_num = rand_num - element.probability

I have found this article to be the most useful at understanding this problem fully.
This stackoverflow question may also be what you're looking for.
I believe the optimal solution is to use the Alias Method (wikipedia).
It requires O(n) time to initialize, O(1) time to make a selection, and O(n) memory.
Here is the algorithm for generating the result of rolling a weighted n-sided die (from here it is trivial to select an element from a length-n array) as take from this article.
The author assumes you have functions for rolling a fair die (floor(random() * n)) and flipping a biased coin (random() < p).
Algorithm: Vose's Alias Method
Initialization:
Create arrays Alias and Prob, each of size n.
Create two worklists, Small and Large.
Multiply each probability by n.
For each scaled probability pi:
If pi < 1, add i to Small.
Otherwise (pi ≥ 1), add i to Large.
While Small and Large are not empty: (Large might be emptied first)
Remove the first element from Small; call it l.
Remove the first element from Large; call it g.
Set Prob[l]=pl.
Set Alias[l]=g.
Set pg := (pg+pl)−1. (This is a more numerically stable option.)
If pg<1, add g to Small.
Otherwise (pg ≥ 1), add g to Large.
While Large is not empty:
Remove the first element from Large; call it g.
Set Prob[g] = 1.
While Small is not empty: This is only possible due to numerical instability.
Remove the first element from Small; call it l.
Set Prob[l] = 1.
Generation:
Generate a fair die roll from an n-sided die; call the side i.
Flip a biased coin that comes up heads with probability Prob[i].
If the coin comes up "heads," return i.
Otherwise, return Alias[i].

Here is an implementation in Ruby:
def weighted_rand(weights = {})
raise 'Probabilities must sum up to 1' unless weights.values.inject(&:+) == 1.0
raise 'Probabilities must not be negative' unless weights.values.all? { |p| p >= 0 }
# Do more sanity checks depending on the amount of trust in the software component using this method,
# e.g. don't allow duplicates, don't allow non-numeric values, etc.
# Ignore elements with probability 0
weights = weights.reject { |k, v| v == 0.0 } # e.g. => {"a"=>0.4, "b"=>0.4, "c"=>0.2}
# Accumulate probabilities and map them to a value
u = 0.0
ranges = weights.map { |v, p| [u += p, v] } # e.g. => [[0.4, "a"], [0.8, "b"], [1.0, "c"]]
# Generate a (pseudo-)random floating point number between 0.0(included) and 1.0(excluded)
u = rand # e.g. => 0.4651073966724186
# Find the first value that has an accumulated probability greater than the random number u
ranges.find { |p, v| p > u }.last # e.g. => "b"
end
How to use:
weights = {'a' => 0.4, 'b' => 0.4, 'c' => 0.2, 'd' => 0.0}
weighted_rand weights
What to expect roughly:
sample = 1000.times.map { weighted_rand weights }
sample.count('a') # 396
sample.count('b') # 406
sample.count('c') # 198
sample.count('d') # 0

An example in ruby
#each element is associated with its probability
a = {1 => 0.25 ,2 => 0.5 ,3 => 0.2, 4 => 0.05}
#at some point, convert to ccumulative probability
acc = 0
a.each { |e,w| a[e] = acc+=w }
#to select an element, pick a random between 0 and 1 and find the first
#cummulative probability that's greater than the random number
r = rand
selected = a.find{ |e,w| w>r }
p selected[0]

This can be done in O(1) expected time per sample as follows.
Compute the CDF F(i) for each element i to be the sum of probabilities less than or equal to i.
Define the range r(i) of an element i to be the interval [F(i - 1), F(i)].
For each interval [(i - 1)/n, i/n], create a bucket consisting of the list of the elements whose range overlaps the interval. This takes O(n) time in total for the full array as long as you are reasonably careful.
When you randomly sample the array, you simply compute which bucket the random number is in, and compare with each element of the list until you find the interval that contains it.
The cost of a sample is O(the expected length of a randomly chosen list) <= 2.

This is a PHP code I used in production:
/**
* #return \App\Models\CdnServer
*/
protected function selectWeightedServer(Collection $servers)
{
if ($servers->count() == 1) {
return $servers->first();
}
$totalWeight = 0;
foreach ($servers as $server) {
$totalWeight += $server->getWeight();
}
// Select a random server using weighted choice
$randWeight = mt_rand(1, $totalWeight);
$accWeight = 0;
foreach ($servers as $server) {
$accWeight += $server->getWeight();
if ($accWeight >= $randWeight) {
return $server;
}
}
}

Ruby solution using the pickup gem:
require 'pickup'
chances = {0=>80, 1=>20}
picker = Pickup.new(chances)
Example:
5.times.collect {
picker.pick(5)
}
gave output:
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1]]

If the array is small, I would give the array a length of, in this case, five and assign the values as appropriate:
array[
0 => 0
1 => 0
2 => 0
3 => 0
4 => 1
]

"Wheel of Fortune" O(n), use for small arrays only:
function pickRandomWeighted(array, weights) {
var sum = 0;
for (var i=0; i<weights.length; i++) sum += weights[i];
for (var i=0, pick=Math.random()*sum; i<weights.length; i++, pick-=weights[i])
if (pick-weights[i]<0) return array[i];
}

the trick could be to sample an auxiliary array with elements repetitions which reflect the probability
Given the elements associated with their probability, as percentage:
h = {1 => 0.5, 2 => 0.3, 3 => 0.05, 4 => 0.05 }
auxiliary_array = h.inject([]){|memo,(k,v)| memo += Array.new((100*v).to_i,k) }
ruby-1.9.3-p194 > auxiliary_array
=> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]
auxiliary_array.sample
if you want to be as generic as possible, you need to calculate the multiplier based on the max number of fractional digits, and use it in the place of 100:
m = 10**h.values.collect{|e| e.to_s.split(".").last.size }.max

Another possibility is to associate, with each element of the array, a random number drawn from an exponential distribution with parameter given by the weight for that element. Then pick the element with the lowest such ‘ordering number’. In this case, the probability that a particular element has the lowest ordering number of the array is proportional to the array element's weight.
This is O(n), doesn't involve any reordering or extra storage, and the selection can be done in the course of a single pass through the array. The weights must be greater than zero, but don't have to sum to any particular value.
This has the further advantage that, if you store the ordering number with each array element, you have the option to sort the array by increasing ordering number, to get a random ordering of the array in which elements with higher weights have a higher probability of coming early (I've found this useful when deciding which DNS SRV record to pick, to decide which machine to query).
Repeated random sampling with replacement requires a new pass through the array each time; for random selection without replacement, the array can be sorted in order of increasing ordering number, and k elements can be read out in that order.
See the Wikipedia page about the exponential distribution (in particular the remarks about the distribution of the minima of an ensemble of such variates) for the proof that the above is true, and also for the pointer towards the technique of generating such variates: if T has a uniform random distribution in [0,1), then Z=-log(1-T)/w (where w is the parameter of the distribution; here the weight of the associated element) has an exponential distribution.
That is:
For each element i in the array, calculate zi = -log(T)/wi (or zi = -log(1-T)/wi), where T is drawn from a uniform distribution in [0,1), and wi is the weight of the I'th element.
Select the element which has the lowest zi.
The element i will be selected with probability wi/(w1+w2+...+wn).
See below for an illustration of this in Python, which takes a single pass through the array of weights, for each of 10000 trials.
import math, random
random.seed()
weights = [10, 20, 50, 20]
nw = len(weights)
results = [0 for i in range(nw)]
n = 10000
while n > 0: # do n trials
smallest_i = 0
smallest_z = -math.log(1-random.random())/weights[0]
for i in range(1, nw):
z = -math.log(1-random.random())/weights[i]
if z < smallest_z:
smallest_i = i
smallest_z = z
results[smallest_i] += 1 # accumulate our choices
n -= 1
for i in range(nw):
print("{} -> {}".format(weights[i], results[i]))
Edit (for history): after posting this, I felt sure I couldn't be the first to have thought of it, and another search with this solution in mind shows that this is indeed the case.
In an answer to a similar question, Joe K suggested this algorithm (and also noted that someone else must have thought of it before).
Another answer to that question, meanwhile, pointed to Efraimidis and Spirakis (preprint), which describes a similar method.
I'm pretty sure, looking at it, that the Efraimidis and Spirakis is in fact the same exponential-distribution algorithm in disguise, and this is corroborated by a passing remark in the Wikipedia page about Reservoir sampling that ‘[e]quivalently, a more numerically stable formulation of this algorithm’ is the exponential-distribution algorithm above. The reference there is to a sequence of lecture notes by Richard Arratia; the relevant property of the exponential distribution is mentioned in Sect.1.3 (which mentions that something similar to this is a ‘familiar fact’ in some circles), but not its relationship to the Efraimidis and Spirakis algorithm.

I would imagine that numbers greater or equal than 0.8 but less than 1.0 selects the third element.
In other terms:
x is a random number between 0 and 1
if 0.0 >= x < 0.2 : Item 1
if 0.2 >= x < 0.8 : Item 2
if 0.8 >= x < 1.0 : Item 3

I am going to improve on https://stackoverflow.com/users/626341/masciugo answer.
Basically you make one big array where the number of times an element shows up is proportional to the weight.
It has some drawbacks.
The weight might not be integer. Imagine element 1 has probability of pi and element 2 has probability of 1-pi. How do you divide that? Or imagine if there are hundreds of such elements.
The array created can be very big. Imagine if least common multiplier is 1 million, then we will need an array of 1 million element in the array we want to pick.
To counter that, this is what you do.
Create such array, but only insert an element randomly. The probability that an element is inserted is proportional the the weight.
Then select random element from usual.
So if there are 3 elements with various weight, you simply pick an element from an array of 1-3 elements.
Problems may arise if the constructed element is empty. That is it just happens that no elements show up in the array because their dice roll differently.
In which case, I propose that the probability an element is inserted is p(inserted)=wi/wmax.
That way, one element, namely the one that has the highest probability, will be inserted. The other elements will be inserted by the relative probability.
Say we have 2 objects.
element 1 shows up .20% of the time.
element 2 shows up .40% of the time and has the highest probability.
In thearray, element 2 will show up all the time. Element 1 will show up half the time.
So element 2 will be called 2 times as many as element 1. For generality all other elements will be called proportional to their weight. Also the sum of all their probability are 1 because the array will always have at least 1 element.

I wrote an implementation in C#:
https://github.com/cdanek/KaimiraWeightedList
O(1) gets (fast!), O(n) recalculates, O(n) memory use.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight