In R, how do I extract the column number of a certain row in an array that's closest to a certain value?

Given:
library(survival)
data(veteran)
veteran$prognostic_indicator <- 0
veteran$prognostic_indicator[veteran$karno<50] <- 1
model <- coxph(Surv(time,status)~age+prognostic_indicator,data=veteran)
library(obsSens)
object <- obsSensSCC(model, which = "prognostic_indicator", g0 = seq(1,10,0.01),p0 = c(0.05,0.1,0.2,0.3,0.4), p1 = seq(0, 1, 0.05), logHaz = FALSE, method = "approx")
I can extract the vector:
object$lcl[21,1,1:901]
This vector is ordered by descending values. I want to extract the "name" of the number that is closest to 1, but above it. In this case I want to extract the name "2.69" (position 170), since the corresponding number is 1.0001292. The number at the next name, "2.70" (position 171), is 0.9968844 and thus too low.
How do I extract the position (or name) in a vector of descending values where the number is nearest the value 1.0, but above?

Create the vector, identify the first element that drops below 1, then step back one position in the sequence:
obj <- object$lcl[21,1,1:901]
obj[which(obj < 1)[1] - 1]
#     2.69
# 1.000129
The other way would be to work on the reversed vector. Then you do not need to backtrack:
> rev(obj)[which(rev(obj) > 1)[1]]
    2.69
1.000129

Here's another way in addition to DWin's cleaner method.
which.min(subset(object$lcl[21,1,1:901], object$lcl[21,1,1:901] > 1) - 1)
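The same search can be sketched in any language; here is a minimal Python version (illustrative names, not part of the original answers) that bisects the reversed vector, since the values are sorted descending:

from bisect import bisect_right

def last_above(values, threshold=1.0):
    # In a descending sequence, return the index of the last value
    # still above `threshold` (None if every value is at or below it).
    rev = values[::-1]                  # ascending copy for bisect
    k = bisect_right(rev, threshold)    # first ascending index with value > threshold
    if k == len(rev):
        return None
    return len(values) - 1 - k          # map back to the original order

vals = [1.2, 1.1, 1.0001292, 0.9968844, 0.9]
print(last_above(vals))  # -> 2, the position of 1.0001292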


Defining pointers vs accessing values in an Array in Ruby

Let's say I'm passing the following array to a method:
input = [1,9,10,3,2,3,11,0,99,30,40,50]
I need to work with sets of 4 numbers from that array as follows:
OPCODE = input[0] # first of the 4 numbers
pos1_pointer = will always be opcode position + 1 position to the right
pos2_pointer = will always be opcode position + 2 positions to the right
output = will always be opcode position + 3 positions to the right
The pos1 and pos2 numbers are actually pointers to the actual values (e.g. pos1_pointer = 9, one position from the opcode; the actual value is 30, at position 9 in the array).
How do I define the pointer based on where the OPCODE is sitting?
I've tried:
pos1_pointer = input[input[opcode] + 1] # points to 40 which is wrong (because opcode = input[0] which is 1, and it sums 9 positions to that, position 10 being value 40)
pos1_pointer = input[opcode + 1] # is also wrong because it assigns a value of 2 to it (it sums 1 to the value of opcode which is 1)
Use Iterators Rather Than Indexing
In Ruby, if you want to work with subsets of an array, there's a method for that! Array#each_slice can be used to feed your slices directly into a method, or to deconstruct them further. For example:
input = [1, 9, 10, 3, 2, 3, 11, 0, 99, 30, 40, 50]
input.each_slice(4) do |slice|
  opcode, pos1, pos2, output = slice
  pp slice
end
You could replace pp slice with a call to your method(s) of choice, passing in either the whole slice or the deconstructed values as positional or keyword arguments. Let Ruby manage the indexing so you can focus on more important things.
Why not try this. The first argument, arr, is the array. The second argument, i, is the index you want to start from.
The method below takes the array and the index, and returns everything from that index (0 in this case) through index i+3.
The two-dot range x..y runs from x up to and including y; the three-dot range x...y runs from x up to but excluding y.
def four(arr, i)
  arr[i..i+3]
end
print four([1,9,10,3,2,3,11,0,99,30,40,50],0)
# input = [1,9,10,3,2,3,11,0,99,30,40,50]
# output = [1,9,10,3]
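For the pointer question itself, the key is to index relative to the opcode's position rather than its value. A minimal Python sketch of the idea (variable names mirror the question; the Ruby translation is direct):

program = [1, 9, 10, 3, 2, 3, 11, 0, 99, 30, 40, 50]

for i in range(0, len(program) - 3, 4):   # i is the opcode's position
    opcode = program[i]
    if opcode == 99:                      # 99 has no operands in the question's data
        break
    pos1_pointer = program[i + 1]         # one slot right of the opcode
    pos2_pointer = program[i + 2]
    output_pos = program[i + 3]
    value1 = program[pos1_pointer]        # dereference: pointer 9 -> value 30
    value2 = program[pos2_pointer]
    print(opcode, pos1_pointer, value1, pos2_pointer, value2, output_pos)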

Argmax of a multidimensional array along a subset of dimensions in Matlab

Say, Y is a 7-dimensional array, and I need an efficient way to maximize it along the last 3 dimensions, that will work on GPU.
As a result I need a 4-dimensional array with maximal values of Y and three 4-dimensional arrays with the indices of these values in the last three dimensions.
I can do
[Y7, X7] = max(Y , [], 7);
[Y6, X6] = max(Y7, [], 6);
[Y5, X5] = max(Y6, [], 5);
Then I have already found the values (Y5) and the indices along the 5th dimension (X5). But I still need indices along the 6th and 7th dimensions.
Here's a way to do it. Let N denote the number of dimensions along which to maximize.
Reshape Y to collapse the last N dimensions into one.
Maximize along the collapsed dimensions. This gives argmax as a linear index over those dimensions.
Unroll the linear index into N subindices, one for each dimension.
The following code works for any number of dimensions (not necessarily 7 and 3 as in your example). To achieve that, it handles the size of Y generically and uses a comma-separated list obtained from a cell array to get N outputs from ind2sub.
Y = rand(2,3,2,3,2,3,2); % example 7-dimensional array
N = 3; % last dimensions along which to maximize
D = ndims(Y);
sz = size(Y);
[~, ind] = max(reshape(Y, [sz(1:D-N) prod(sz(D-N+1:end))]), [], D-N+1);
sub = cell(1,N);
[sub{:}] = ind2sub(sz(D-N+1:D), ind);
As a check, after running the above code, observe for example Y(2,3,1,2,:) (shown as a row vector for convenience):
>> reshape(Y(2,3,1,2,:), 1, [])
ans =
0.5621 0.4352 0.3672 0.9011 0.0332 0.5044 0.3416 0.6996 0.0610 0.2638 0.5586 0.3766
The maximum is seen to be 0.9011, which occurs at the 4th position (where "position" is defined along the N=3 collapsed dimensions). In fact,
>> ind(2,3,1,2)
ans =
4
>> Y(2,3,1,2,ind(2,3,1,2))
ans =
0.9011
or, in terms of the N=3 subindices,
>> Y(2,3,1,2,sub{1}(2,3,1,2),sub{2}(2,3,1,2),sub{3}(2,3,1,2))
ans =
0.9011
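For comparison, the same collapse-and-unravel idea can be sketched in NumPy (not part of the original answer; names are illustrative): reshape to merge the trailing axes, take argmax along the merged axis, then np.unravel_index recovers the per-axis indices.

import numpy as np

Y = np.random.rand(2, 3, 2, 3, 2, 3, 2)  # example 7-dimensional array
N = 3                                     # trailing axes to maximize over

head, tail = Y.shape[:-N], Y.shape[-N:]
flat = Y.reshape(head + (-1,))            # collapse the last N axes into one
ind = flat.argmax(axis=-1)                # linear index over the collapsed axes
sub = np.unravel_index(ind, tail)         # N index arrays, each of shape `head`
vals = flat.max(axis=-1)

# spot-check one entry, mirroring the MATLAB example (0-based here)
i = (1, 2, 0, 1)
assert vals[i] == Y[i + tuple(s[i] for s in sub)]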

Is there a way to reshape an array that does not maintain the original size (or a convenient work-around)?

As a simplified example, suppose I have a dataset composed of 40 sorted values. The values of this example are all integers, though this is not necessarily the case for the actual dataset.
import numpy as np
data = np.linspace(1,40,40)
I am trying to find the maximum value inside the dataset for certain window sizes. The formula to compute the window sizes yields a pattern that is best executed with arrays (in my opinion). For simplicity sake, let's say the indices denoting the window sizes are a list [1,2,3,4,5]; this corresponds to window sizes of [2,4,8,16,32] (the pattern is 2**index).
## this code looks long because I've provided docstrings
## just in case the explanation was unclear
def shapeshifter(num_col, my_array=data):
    """
    This function reshapes an array to have 'num_col' columns, where
    'num_col' corresponds to index.
    """
    return my_array.reshape(-1, num_col)
def looper(num_col, my_array=data):
    """
    This function calls 'shapeshifter' and returns a list of the
    MAXimum values of each row in 'my_array' for 'num_col' columns.
    The length of each row (or the number of columns per row if you
    prefer) denotes the size of each window.
    EX:
        num_col = 2
        ==> window_size = 2
        ==> check max( data[1], data[2] ),
                  max( data[3], data[4] ),
                  max( data[5], data[6] ),
                  .
                  .
                  .
                  max( data[39], data[40] )
            for k rows, where k = len(my_array)//num_col
    """
    my_array = shapeshifter(num_col=num_col, my_array=data)
    rows = [my_array[index] for index in range(len(my_array))]
    res = []
    for index in range(len(rows)):
        res.append(max(rows[index]))
    return res
So far, the code is fine. I checked it with the following:
check1 = looper(2)
check2 = looper(4)
print(check1)
>> [2.0, 4.0, ..., 38.0, 40.0]
print(len(check1))
>> 20
print(check2)
>> [4.0, 8.0, ..., 36.0, 40.0]
print(len(check2))
>> 10
So far so good. Now here is my problem.
def metalooper(col_ls, my_array=data):
    """
    This function calls 'looper' - which calls
    'shapeshifter' - for every 'col' in 'col_ls'.
    EX:
        j_list = [1,2,3,4,5]
        ==> col_ls = [2,4,8,16,32]
        ==> looper(2), looper(4),
            looper(8), ..., looper(32)
        ==> shapeshifter(2), shapeshifter(4),
            shapeshifter(8), ..., shapeshifter(32)
        such that looper(2^j) ==> shapeshifter(2^j)
        for j in j_list
    """
    res = []
    for col in col_ls:
        res.append(looper(num_col=col))
    return res
j_list = [2,4,8,16,32]
check3 = metalooper(j_list)
Running the code above provides this error:
ValueError: total size of new array must be unchanged
With 40 data points, the array can be reshaped into 2 columns of 20 rows, or 4 columns of 10 rows, or 8 columns of 5 rows, BUT at 16 columns, the array cannot be reshaped without clipping data since 40/16 ≠ integer. I believe this is the problem with my code, but I do not know how to fix it.
I am hoping there is a way to cutoff the last values in each row that do not fit in each window. If this is not possible, I am hoping I can append zeroes to fill the entries that maintain the size of the original array, so that I can remove the zeroes after. Or maybe even some complicated if - try - break block. What are some ways around this problem?
I think this will give you what you want in one step:
def windowFunc(a, window, f=np.max):
    return np.array([f(i) for i in np.split(a, range(window, a.size, window))])
With the default f, that will give you an array of maximums for your windows.
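A quick check with the 40-point data from the question (a sketch; note the final, ragged window is kept rather than clipped):

data = np.linspace(1, 40, 40)
print(windowFunc(data, 16))  # -> [16. 32. 40.]
print(windowFunc(data, 8))   # -> [ 8. 16. 24. 32. 40.]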
Generally, using np.split and range, this will let you split into a (possibly ragged) list of arrays:
def shapeshifter(num_col, my_array=data):
    return np.split(my_array, range(num_col, my_array.size, num_col))
You need a list of arrays because a 2D array can't be ragged (every row needs the same number of columns)
If you really want to pad with zeros, you can use np.lib.pad:
def shapeshifter(num_col, my_array=data):
    pad = -my_array.size % num_col  # zero when the size already divides evenly
    return np.lib.pad(my_array, (0, pad), 'constant', constant_values=0).reshape(-1, num_col)
Warning:
It is also technically possible to use, for example, a.resize(32,2), which will create an ndarray padded with zeros (as you requested). But there are some big caveats:
You would need to calculate the second axis because -1 tricks don't work with resize.
If the original array a is referenced by anything else, a.resize will fail with the following error:
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
The resize function (i.e. np.resize(a)) is not equivalent to a.resize, as instead of padding with zeros it will loop back to the beginning.
Since you seem to want to reference a by a number of windows, a.resize isn't very useful. But it's a rabbit hole that's easy to fall into.
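The looping behavior of np.resize is easy to verify (a quick illustration):

a = np.arange(5)         # [0 1 2 3 4]
print(np.resize(a, 8))   # [0 1 2 3 4 0 1 2] -- wraps around instead of zero-padding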
EDIT:
Looping through a list is slow. If your input is long and windows are small, the windowFunc above will bog down in the for loops. This should be more efficient:
def windowFunc2(a, window, f=np.max):
    tail = -(a.size % window)
    if tail == 0:
        return f(a.reshape(-1, window), axis=-1)
    else:
        body = a[:tail].reshape(-1, window)
        return np.r_[f(body, axis=-1), f(a[tail:])]
Here's a generalized way to reshape with truncation:
def reshape_and_truncate(arr, shape):
    desired_size_factor = np.prod([n for n in shape if n != -1])
    if -1 in shape:  # implicit array size
        desired_size = arr.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return arr.flat[:desired_size].reshape(shape)
Your shapeshifter could use this in place of reshape.
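For example, with the 40-point data and the problematic window of 16 (a sketch of the expected result):

data = np.linspace(1, 40, 40)
trimmed = reshape_and_truncate(data, (-1, 16))  # silently drops the 8 leftover values
print(trimmed.shape)         # (2, 16)
print(trimmed.max(axis=1))   # [16. 32.]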

Cost efficient algorithm to group array of sets

Can anyone help me out with some effectively good algorithm to carry out the following task:
I have a file of unique row numbers, with an array of integers per row.
I need to check every row for array values that also show up in other rows, and put such rows in one group. Here is an example of how it may look:
Row Number; Array of data [...]
L1; [1,2,3,4,5]
L2; [2,3]
L3; [8,9]
L4; [6]
L5; [7]
L6; [5,6]
Based on these input data, I expect the algorithm to produce the result:
Group N; Array of rows [...]
G1; [L1,L2,L4,L6]
G2; [L3]
G3; [L5]
P.S. The original dataset has hundreds of millions of rows and can contain close to a million array elements... time efficiency is a concern.
Thanks
I believe this is equivalent to finding connected components of a graph in which:
The vertices correspond to the initial row numbers
There is an edge between two vertices x and y if there is a common element in the array for x and the array for y
This can be done efficiently using a disjoint set data structure as follows:
MakeSet(d) for each of the data values d (1,2,3,4,5,6,7,8,9 in your example)
For each row with array A, call Union(A[0], A[i]) for each choice of i.
This will produce a set for each connected component. You can then produce your output array by iterating over the rows a second time:
set output to an array of empty lists
for each row r:
    A = array for row r
    id = Find(A[0])
    output[id].append(r)
Example Python Code
from collections import defaultdict

data = [[1,2,3,4,5],
        [2,3],
        [8,9],
        [6],
        [7],
        [5,6]]

N = max(max(A) for A in data)
rank = [0]*(N+1)
parent = list(range(N+1))  # list(...) so it is mutable under Python 3

def Find(x):
    """Find representative of connected component"""
    if parent[x] != x:
        parent[x] = Find(parent[x])  # path compression
    return parent[x]

def Union(x, y):
    """Merge sets containing elements x and y"""
    x = Find(x)
    y = Find(y)
    if x == y:
        return
    if rank[x] < rank[y]:
        parent[x] = y
    elif rank[x] > rank[y]:
        parent[y] = x
    else:
        parent[y] = x
        rank[x] += 1

# First join all data
for row, A in enumerate(data):
    for x in A:
        Union(A[0], x)

# Then place rows into sets
D = defaultdict(list)
for row, A in enumerate(data):
    D[Find(A[0])].append(row+1)

# Then display output
for i, L in enumerate(D.values()):
    print(i+1, L)
Running this code prints the output:
1 [1, 2, 4, 6]
2 [3]
3 [5]

Weighted random selection from array

I would like to randomly select one element from an array, but each element has a known probability of selection.
All chances together (within the array) sum to 1.
What algorithm would you suggest as the fastest and most suitable for huge calculations?
Example:
id => chance
array[
0 => 0.8
1 => 0.2
]
For this example, over many calls the algorithm should statistically return four selections of id 0 for every one selection of id 1.
Compute the discrete cumulative density function (CDF) of your list -- or in simple terms the array of cumulative sums of the weights. Then generate a random number in the range between 0 and the sum of all weights (might be 1 in your case), do a binary search to find this random number in your discrete CDF array and get the value corresponding to this entry -- this is your weighted random number.
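A minimal Python sketch of that description (names are illustrative): build the cumulative sums once, then every draw is a binary search.

import bisect, random
from itertools import accumulate

values = [0, 1]          # the ids from the example
weights = [0.8, 0.2]
cdf = list(accumulate(weights))       # discrete CDF: [0.8, 1.0]

def weighted_choice():
    x = random.uniform(0, cdf[-1])    # uniform over the total weight
    return values[bisect.bisect_left(cdf, x)]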
The algorithm is straightforward:
rand_num = rand(0,1)
for each element in array
    if rand_num < element.probability
        select and break
    rand_num = rand_num - element.probability
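The same scan in Python (a sketch; the trailing return guards against floating-point rounding at the very end of the range):

import random

def linear_weighted_pick(elements, probs):
    x = random.random() * sum(probs)  # works even if probs don't sum to exactly 1
    for e, p in zip(elements, probs):
        if x < p:
            return e
        x -= p
    return elements[-1]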
I have found this article to be the most useful at understanding this problem fully.
This stackoverflow question may also be what you're looking for.
I believe the optimal solution is to use the Alias Method (wikipedia).
It requires O(n) time to initialize, O(1) time to make a selection, and O(n) memory.
Here is the algorithm for generating the result of rolling a weighted n-sided die (from here it is trivial to select an element from a length-n array), as taken from this article.
The author assumes you have functions for rolling a fair die (floor(random() * n)) and flipping a biased coin (random() < p).
Algorithm: Vose's Alias Method
Initialization:
Create arrays Alias and Prob, each of size n.
Create two worklists, Small and Large.
Multiply each probability by n.
For each scaled probability pi:
If pi < 1, add i to Small.
Otherwise (pi ≥ 1), add i to Large.
While Small and Large are not empty: (Large might be emptied first)
Remove the first element from Small; call it l.
Remove the first element from Large; call it g.
Set Prob[l]=pl.
Set Alias[l]=g.
Set pg := (pg+pl)−1. (This is a more numerically stable option.)
If pg<1, add g to Small.
Otherwise (pg ≥ 1), add g to Large.
While Large is not empty:
Remove the first element from Large; call it g.
Set Prob[g] = 1.
While Small is not empty: This is only possible due to numerical instability.
Remove the first element from Small; call it l.
Set Prob[l] = 1.
Generation:
Generate a fair die roll from an n-sided die; call the side i.
Flip a biased coin that comes up heads with probability Prob[i].
If the coin comes up "heads," return i.
Otherwise, return Alias[i].
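A direct Python transcription of those steps (a sketch that follows the article's pseudocode; the function names are mine):

import random

def build_alias(probs):
    # Vose's alias method: O(n) setup, O(1) per draw. `probs` sums to ~1.
    n = len(probs)
    prob, alias = [0.0] * n, [0] * n
    scaled = [p * n for p in probs]
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        l, g = small.pop(), large.pop()
        prob[l], alias[l] = scaled[l], g
        scaled[g] = (scaled[g] + scaled[l]) - 1.0  # numerically stable update
        (small if scaled[g] < 1.0 else large).append(g)
    for i in large + small:   # leftovers equal 1 up to rounding error
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    i = int(random.random() * len(prob))                  # fair die roll
    return i if random.random() < prob[i] else alias[i]   # biased coin flip

prob, alias = build_alias([0.8, 0.2])
counts = [0, 0]
for _ in range(10000):
    counts[alias_draw(prob, alias)] += 1
print(counts)  # roughly [8000, 2000]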
Separately, here is a Ruby implementation of the cumulative-probability approach:
def weighted_rand(weights = {})
  raise 'Probabilities must sum up to 1' unless weights.values.inject(&:+) == 1.0
  raise 'Probabilities must not be negative' unless weights.values.all? { |p| p >= 0 }
  # Do more sanity checks depending on the amount of trust in the software component using this method,
  # e.g. don't allow duplicates, don't allow non-numeric values, etc.

  # Ignore elements with probability 0
  weights = weights.reject { |k, v| v == 0.0 } # e.g. => {"a"=>0.4, "b"=>0.4, "c"=>0.2}

  # Accumulate probabilities and map them to a value
  u = 0.0
  ranges = weights.map { |v, p| [u += p, v] } # e.g. => [[0.4, "a"], [0.8, "b"], [1.0, "c"]]

  # Generate a (pseudo-)random floating point number between 0.0 (included) and 1.0 (excluded)
  u = rand # e.g. => 0.4651073966724186

  # Find the first value that has an accumulated probability greater than the random number u
  ranges.find { |p, v| p > u }.last # e.g. => "b"
end
How to use:
weights = {'a' => 0.4, 'b' => 0.4, 'c' => 0.2, 'd' => 0.0}
weighted_rand weights
What to expect roughly:
sample = 1000.times.map { weighted_rand weights }
sample.count('a') # 396
sample.count('b') # 406
sample.count('c') # 198
sample.count('d') # 0
An example in Ruby:
# each element is associated with its probability
a = {1 => 0.25, 2 => 0.5, 3 => 0.2, 4 => 0.05}

# at some point, convert to cumulative probability
acc = 0
a.each { |e,w| a[e] = acc += w }

# to select an element, pick a random number between 0 and 1 and find the first
# cumulative probability that's greater than the random number
r = rand
selected = a.find{ |e,w| w > r }
p selected[0]
This can be done in O(1) expected time per sample as follows.
Compute the CDF F(i) for each element i to be the sum of probabilities less than or equal to i.
Define the range r(i) of an element i to be the interval [F(i - 1), F(i)].
For each interval [(i - 1)/n, i/n], create a bucket consisting of the list of the elements whose range overlaps the interval. This takes O(n) time in total for the full array as long as you are reasonably careful.
When you randomly sample the array, you simply compute which bucket the random number is in, and compare with each element of the list until you find the interval that contains it.
The cost of a sample is O(the expected length of a randomly chosen list) <= 2.
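A Python sketch of this bucket scheme (the construction follows the steps above; the rounding guard at the end is my own addition):

import random
from itertools import accumulate

def build_buckets(weights):
    # Bucket b covers [b/n, (b+1)/n); it lists every element whose
    # CDF range overlaps that interval (total list length <= 2n).
    n = len(weights)
    total = sum(weights)
    cdf = [c / total for c in accumulate(weights)]
    buckets = [[] for _ in range(n)]
    lo = 0.0
    for i, hi in enumerate(cdf):
        for b in range(int(lo * n), min(int(hi * n) + 1, n)):
            buckets[b].append((i, hi))
        lo = hi
    return buckets

def sample(buckets):
    x = random.random()
    bucket = buckets[int(x * len(buckets))]
    for i, hi in bucket:
        if x < hi:
            return i
    return bucket[-1][0]  # float-rounding fallback

buckets = build_buckets([0.8, 0.2])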
This is PHP code I used in production:
/**
 * @return \App\Models\CdnServer
 */
protected function selectWeightedServer(Collection $servers)
{
    if ($servers->count() == 1) {
        return $servers->first();
    }

    $totalWeight = 0;
    foreach ($servers as $server) {
        $totalWeight += $server->getWeight();
    }

    // Select a random server using weighted choice
    $randWeight = mt_rand(1, $totalWeight);
    $accWeight = 0;
    foreach ($servers as $server) {
        $accWeight += $server->getWeight();
        if ($accWeight >= $randWeight) {
            return $server;
        }
    }
}
Ruby solution using the pickup gem:
require 'pickup'
chances = {0=>80, 1=>20}
picker = Pickup.new(chances)
Example:
5.times.collect {
  picker.pick(5)
}
gave output:
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1]]
If the array is small, I would give the array a length of, in this case, five and assign the values as appropriate:
array[
0 => 0
1 => 0
2 => 0
3 => 0
4 => 1
]
"Wheel of Fortune" O(n), use for small arrays only:
function pickRandomWeighted(array, weights) {
var sum = 0;
for (var i=0; i<weights.length; i++) sum += weights[i];
for (var i=0, pick=Math.random()*sum; i<weights.length; i++, pick-=weights[i])
if (pick-weights[i]<0) return array[i];
}
One trick is to sample from an auxiliary array in which each element is repeated a number of times that reflects its probability.
Given the elements associated with their probabilities, as percentages:
h = {1 => 0.5, 2 => 0.3, 3 => 0.05, 4 => 0.05 }
auxiliary_array = h.inject([]){|memo,(k,v)| memo += Array.new((100*v).to_i,k) }
ruby-1.9.3-p194 > auxiliary_array
=> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]
auxiliary_array.sample
If you want to be as generic as possible, you need to calculate the multiplier based on the maximum number of fractional digits, and use it in place of 100:
m = 10**h.values.collect{|e| e.to_s.split(".").last.size }.max
Another possibility is to associate, with each element of the array, a random number drawn from an exponential distribution with parameter given by the weight for that element. Then pick the element with the lowest such ‘ordering number’. In this case, the probability that a particular element has the lowest ordering number of the array is proportional to the array element's weight.
This is O(n), doesn't involve any reordering or extra storage, and the selection can be done in the course of a single pass through the array. The weights must be greater than zero, but don't have to sum to any particular value.
This has the further advantage that, if you store the ordering number with each array element, you have the option to sort the array by increasing ordering number, to get a random ordering of the array in which elements with higher weights have a higher probability of coming early (I've found this useful when deciding which DNS SRV record to pick, to decide which machine to query).
Repeated random sampling with replacement requires a new pass through the array each time; for random selection without replacement, the array can be sorted in order of increasing ordering number, and k elements can be read out in that order.
See the Wikipedia page about the exponential distribution (in particular the remarks about the distribution of the minima of an ensemble of such variates) for the proof that the above is true, and also for the pointer towards the technique of generating such variates: if T has a uniform random distribution in [0,1), then Z=-log(1-T)/w (where w is the parameter of the distribution; here the weight of the associated element) has an exponential distribution.
That is:
For each element i in the array, calculate zi = -log(T)/wi (or zi = -log(1-T)/wi), where T is drawn from a uniform distribution in [0,1), and wi is the weight of the i'th element.
Select the element which has the lowest zi.
The element i will be selected with probability wi/(w1+w2+...+wn).
See below for an illustration of this in Python, which takes a single pass through the array of weights, for each of 10000 trials.
import math, random

random.seed()
weights = [10, 20, 50, 20]
nw = len(weights)
results = [0 for i in range(nw)]
n = 10000
while n > 0:  # do n trials
    smallest_i = 0
    smallest_z = -math.log(1-random.random())/weights[0]
    for i in range(1, nw):
        z = -math.log(1-random.random())/weights[i]
        if z < smallest_z:
            smallest_i = i
            smallest_z = z
    results[smallest_i] += 1  # accumulate our choices
    n -= 1
for i in range(nw):
    print("{} -> {}".format(weights[i], results[i]))
Edit (for history): after posting this, I felt sure I couldn't be the first to have thought of it, and another search with this solution in mind shows that this is indeed the case.
In an answer to a similar question, Joe K suggested this algorithm (and also noted that someone else must have thought of it before).
Another answer to that question, meanwhile, pointed to Efraimidis and Spirakis (preprint), which describes a similar method.
I'm pretty sure, looking at it, that the Efraimidis and Spirakis is in fact the same exponential-distribution algorithm in disguise, and this is corroborated by a passing remark in the Wikipedia page about Reservoir sampling that ‘[e]quivalently, a more numerically stable formulation of this algorithm’ is the exponential-distribution algorithm above. The reference there is to a sequence of lecture notes by Richard Arratia; the relevant property of the exponential distribution is mentioned in Sect.1.3 (which mentions that something similar to this is a ‘familiar fact’ in some circles), but not its relationship to the Efraimidis and Spirakis algorithm.
I would imagine that numbers greater than or equal to 0.8 but less than 1.0 select the third element.
In other terms:
x is a random number between 0 and 1
if 0.0 <= x < 0.2 : Item 1
if 0.2 <= x < 0.8 : Item 2
if 0.8 <= x < 1.0 : Item 3
I am going to improve on the answer by https://stackoverflow.com/users/626341/masciugo.
Basically you make one big array where the number of times an element shows up is proportional to the weight.
It has some drawbacks.
The weights might not be integers. Imagine element 1 has a probability of pi and element 2 has a probability of 1-pi. How do you divide that? Or imagine there are hundreds of such elements.
The array created can be very big. Imagine the least common multiple is 1 million; then we would need an array of 1 million elements to pick from.
To counter that, this is what you do.
Create such an array, but insert each element only randomly. The probability that an element is inserted is proportional to its weight.
Then select a random element from it as usual.
So if there are 3 elements with various weights, you simply pick an element from an array of 1-3 elements.
Problems may arise if the constructed array is empty, that is, if it just happens that no elements were inserted because the dice rolled that way.
In which case, I propose that the probability an element is inserted is p(inserted)=wi/wmax.
That way, one element, namely the one with the highest probability, is always inserted. The other elements are inserted with relative probability.
Say we have 2 objects:
element 1 shows up 20% of the time.
element 2 shows up 40% of the time and has the highest probability.
In the array, element 2 will show up all the time; element 1 will show up half the time.
So element 2 will be called twice as often as element 1. For generality, all other elements will be called in proportion to their weights. Also, since the array always has at least 1 element, a selection always succeeds.
I wrote an implementation in C#:
https://github.com/cdanek/KaimiraWeightedList
O(1) gets (fast!), O(n) recalculates, O(n) memory use.
