Propagating a selection on a subset of awkward-array back up - awkward-array

Is there a better way to express this logic? I want to propagate a selection that is only available on a subset of inner elements back up to the event level.
Specifically, I am looking to build an event-level cut requiring an oppositely charged muon-electron pair.
req_mu = (events.Muon.counts >= 1)
req_ele = (events.Electron.counts >= 1)
req = req_ele & req_mu
def propagate_up(subset, selection):
    '''
    subset: bool array slice on upper level
    selection: bool array defined only on the events passing subset
    '''
    dummy = np.zeros_like(subset)
    dummy[subset] = selection
    return dummy
req_opposite_charge = propagate_up(req, events[req].Muon[:, 0].charge * events[req].Electron[:, 0].charge == -1)
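The scatter trick inside propagate_up can be checked with plain NumPy; the event counts and cut values below are made up purely for illustration:

```python
import numpy as np

def propagate_up(subset, selection):
    # Scatter a selection computed on events[subset] back to full event length
    full = np.zeros_like(subset, dtype=bool)
    full[subset] = selection
    return full

# Hypothetical toy data: 6 events, of which 4 pass the preselection
req = np.array([True, False, True, True, False, True])
# A cut evaluated only on the 4 surviving events
opposite_charge = np.array([True, False, True, False])

mask = propagate_up(req, opposite_charge)
print(mask)  # True only where both req and the pair cut pass
```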

The easiest way to select a pair from separate collections is with cross, e.g.
good_el = electrons[electrons.pt > 10]
good_mu = muons[muons.pt > 10]
pairs = good_el.cross(good_mu)
# filter our pairs to have opposite charge (jagged mask)
pairs = pairs[pairs.i0.charge == -1 * pairs.i1.charge]
# filter events to have exactly one good pair
pairs = pairs[pairs.counts == 1]
# get the two leptons as a now flat array
el, mu = pairs.i0[:, 0], pairs.i1[:, 0]

Related

How many random requests do I need to make to a set of records to get 80% of the records?

Suppose I have an array of 100_000 records (this is Ruby code, but any language will do):
ary = ['apple','orange','dog','tomato', 12, 17,'cat','tiger' .... ]
results = []
I can only make random calls to the array (I cannot traverse it in any way):
results << ary.sample
# in ruby this will pull a random record from the array, and
# push into results array
How many random calls like that do I need to make to get at least 80% of the records from ary? Expressed another way: what should the size of results be so that results.uniq contains around 80_000 records from ary?
From my rusty memory of my college stats class, I think it needs to be about twice the result-set size, or around 160_000 requests (assuming the random function is uniform and there is no other underlying issue). My testing seems to confirm this.
ary = [*1..100_000];
result = [];
160_000.times{result << ary.sample};
result.uniq.size # ~ 80k
This is stats, so we are talking about probabilities, not guaranteed results; I just need a reasonable estimate.
So the real question is: what is the formula that confirms this?
I would just perform a quick simulation study. In R,
N = 1e5
# Simulate 300 times
s = replicate(300, sample(x = 1:N, size = 1.7e5, replace = TRUE))
Now work out when you hit your target
f = function(i) which(i == unique(i)[80000])[1]
stats = apply(s, 2, f)
To get
summary(stats)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 159711 160726 161032 161037 161399 162242
So in 300 trials, the maximum number of draws needed was 162242, with a mean of about 161037.
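The formula the question asks for is the partial coupon-collector expectation: seeing the (i+1)-th new record takes N/(N-i) draws on average, so collecting k distinct of N records takes the sum of those terms, roughly N*ln(N/(N-k)). For 80% of 100_000 records that is about N*ln(5), close to 161,000, which matches the simulation above. A quick Python check:

```python
import math

N, k = 100_000, 80_000

# Exact expectation: the (i+1)-th new record takes N/(N-i) draws on average
expected = sum(N / (N - i) for i in range(k))

# Harmonic-number approximation: N * (H_N - H_{N-k}) ~= N * ln(N / (N - k))
approx = N * math.log(N / (N - k))

print(round(expected), round(approx))  # both close to 161,000
```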
With a Fisher-Yates shuffle you could get 80K items from exactly 80K random calls.
I have no knowledge of Ruby, but looking at https://gist.github.com/mindplace/3f3a08299651ebf4ab91de3d83254fbc and modifying it:
def shuffle(array, counter)
  # counter = array.length - 1 for a full shuffle
  while counter > 0
    # pick an item from the not-yet-shuffled part of the array
    # (counter + 1 so the item may also stay in place; rand(counter) would bias the shuffle)
    random_index = rand(counter + 1)
    # swap the items at those locations
    array[counter], array[random_index] = array[random_index], array[counter]
    # decrement counter
    counter -= 1
  end
  array
end
indices = (0...ary.length).to_a # 0 up to 99999
shuffle(indices, indices.length - 1) # full shuffle; 80K swaps from one end would already suffice
res = []
counter = 80_000
i = 0
while counter > 0
  res[i] = ary[indices[i]]
  counter -= 1
  i += 1
end
UPDATE
Packing the sampled indices into a custom RNG (bear with me, I know nothing about Ruby):
class FYRandom
  def initialize(indices)
    @indices = indices
    @max = indices.length
    @idx = 0
  end

  # Array#sample calls rand with a limit argument, which we ignore
  def rand(_limit = nil)
    return -1.0 if @idx >= @max
    r = @indices[@idx]
    @idx += 1
    r.to_f / @max.to_f
  end
end
And code for sample would be
rng = FYRandom.new(indices)
results << ary.sample(random: rng)
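For comparison, here is the same partial Fisher-Yates idea in Python rather than Ruby (the function name is mine): k swaps from the front of a full index array yield k distinct, uniformly chosen items, so 80K random calls really are enough.

```python
import random

def sample_without_replacement(ary, k, rng=random):
    # Partial Fisher-Yates: after k swaps, the first k slots hold a
    # uniform random sample without replacement.
    a = list(ary)
    n = len(a)
    for i in range(k):
        j = rng.randrange(i, n)  # pick from the not-yet-selected tail
        a[i], a[j] = a[j], a[i]
    return a[:k]

rng = random.Random(42)
picked = sample_without_replacement(range(100_000), 80_000, rng)
print(len(picked), len(set(picked)))  # 80000 80000 -- exactly 80K distinct items
```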

Mapping An Array To Logical Array In Matlab

Let's say I have an array a=[1,3,8,10,11,15,24] and a logical array b=[1,0,0,1,1,1,0,0,0,1,1,1,1,1]. How do I get [1,1,3,1,3,8,1,3,8,1,3,8,10,11]? Wherever b switches value (0 to 1 or 1 to 0), the index into a resets, so within each run it starts again from the beginning and continues as 1, 3, 8, 10, and so on.
You can use diff to find where b changes, then use arrayfun to generate the indices into a:
a=[1,3,8,10,11,15,24];
b=[1,0,0,1,1,1,0,0,0,1,1,1,1,1];
idxs = find(diff(b) ~= 0) + 1; % where b changes
startidxs = [1 idxs];
endidxs = [idxs - 1,length(b)];
% indexes for a
ia = cell2mat(arrayfun(@(x,y) 1:(y-x+1),startidxs,endidxs,'UniformOutput',0));
res = a(ia);
You can use a for loop and track the state (0 or 1) of the b array:
a = [1,3,8,10,11,15,24];
b = [1,0,0,1,1,1,0,0,0,1,1,1,1,1];
final = [];
index = 0;
state = b(1);
for i = 1:numel(b)
    if b(i) ~= state
        state = b(i);
        index = 1;
    else
        index = index + 1;
    end
    final = [final, a(index)];
end
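The same run-reset indexing is compact in Python with itertools.groupby, shown here only as a cross-language sketch alongside the MATLAB answers:

```python
from itertools import groupby

a = [1, 3, 8, 10, 11, 15, 24]
b = [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

# For each run of equal values in b, restart indexing into a from the front.
result = []
for _, run in groupby(b):
    run_len = len(list(run))
    result.extend(a[:run_len])

print(result)  # [1, 1, 3, 1, 3, 8, 1, 3, 8, 1, 3, 8, 10, 11]
```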

adding array to an existing array

I perform calculations using 5-fold cross-validation. I want to collect all the predictions in one array in order to avoid computing statistics per fold. I have tried extending the array of predictions by concatenating each fold's array onto an existing array. For example:
for train_index, test_index in skf:
    fold += 1
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    rf.fit(x_train, y_train)
    predicted = rf.predict_proba(x_test)
    round_predicted = rf.predict(x_test)
    if fold > 1:
        allFolds_pred = np.concatenate((predicted, allFolds_pred), axis=1)
        allFolds_rpred = np.concatenate((round_predicted, allFolds_rpred), axis=1)
        allFolds_y = np.concatenate((y_test, allFolds_y), axis=1)
    else:
        allFolds_pred = predicted
        allFolds_rpred = round_predicted
        allFolds_y = y_test

fpr, tpr, _ = roc_curve(allFolds_y, allFolds_pred[:, 1])
roc_auc = auc(fpr, tpr)
cm = confusion_matrix(allFolds_y, allFolds_rpred, labels=[0, 1])
Then calculate statistics.
However, it is not working. What is the best way to proceed? Is there a better way to do the same thing?
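One likely culprit is the axis: samples stack along axis 0, and round_predicted and y_test are 1-D, so axis=1 fails for them. The usual pattern is to append each fold's output to a list and concatenate once after the loop. A minimal sketch with random stand-ins for the classifier outputs (the skf and rf objects from the question are not reproduced here):

```python
import numpy as np

folds_pred, folds_rpred, folds_y = [], [], []

for fold in range(5):
    n = 10  # pretend each fold's test split has 10 samples
    predicted = np.random.rand(n, 2)                       # like predict_proba, shape (n, 2)
    round_predicted = (predicted[:, 1] > 0.5).astype(int)  # like predict, shape (n,)
    y_test = np.random.randint(0, 2, n)                    # true labels, shape (n,)
    folds_pred.append(predicted)
    folds_rpred.append(round_predicted)
    folds_y.append(y_test)

# Samples stack along axis 0; 1-D arrays concatenate along their only axis.
allFolds_pred = np.concatenate(folds_pred, axis=0)
allFolds_rpred = np.concatenate(folds_rpred)
allFolds_y = np.concatenate(folds_y)
print(allFolds_pred.shape, allFolds_rpred.shape, allFolds_y.shape)  # (50, 2) (50,) (50,)
```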

Difference Between Arrays Preserving Duplicate Elements in Ruby

I'm quite new to Ruby, and was hoping to get the difference between two arrays.
I am aware of the usual method:
a = [...]
b = [...]
difference = (a-b)+(b-a)
But the problem is that this computes a set-like difference: in Ruby, the expression (a-b) removes every element of b from a, however many times it occurs.
This means [1,2,2,3,4,5,5,5,5] - [5] = [1,2,2,3,4], because it takes out all occurrences of 5 from the first array, not just one, behaving like a filter on the data.
I want it to remove differences only once, so for example, the difference of [1,2,2,3,4,5,5,5,5], and [5] should be [1,2,2,3,4,5,5,5], removing just one 5.
I could do this iteratively:
a = [...]
b = [...]
complementAbyB = a.dup
complementBbyA = b.dup
b.each do |bValue|
  complementAbyB.delete_at(complementAbyB.index(bValue) || complementAbyB.length)
end
a.each do |aValue|
  complementBbyA.delete_at(complementBbyA.index(aValue) || complementBbyA.length)
end
difference = complementAbyB + complementBbyA
But this seems awfully verbose and inefficient. I have to imagine there is a more elegant solution. So my question is: what is the most elegant way to find the difference of two arrays such that, if one array has more occurrences of an element than the other, only the surplus occurrences are removed?
I recently proposed that such a method, Array#difference, be added to Ruby's core. For your example, it would be written:
a = [1,2,2,3,4,5,5,5,5]
b = [5]
a.difference b
#=> [1,2,2,3,4,5,5,5]
The example I've often given is:
a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]
a.difference b
#=> [1, 3, 2, 2]
I first suggested this method in my answer here. There you will find an explanation and links to other SO questions where I proposed use of the method.
As shown at the links, the method could be written as follows:
class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end
Another approach builds hashes of element counts:
ha = a.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
hb = b.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
ha.merge(hb){|_, va, vb| (va - vb).abs}.inject([]){|a, (k, v)| a + [k] * v}
ha and hb are hashes mapping each element of the original array to its number of occurrences. The merge combines them into a hash whose values are the absolute differences of the occurrence counts in the two arrays, and inject converts that into an array in which each element is repeated the number of times given in the hash.
Another way:
ha = a.group_by(&:itself)
hb = b.group_by(&:itself)
ha.merge(hb){|k, va, vb| [k] * (va.length - vb.length).abs}.values.flatten
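The hash-of-counts idea translates directly to Python's collections.Counter, shown here as a cross-language sketch (the function name is mine):

```python
from collections import Counter

def multiset_symmetric_difference(a, b):
    # Keep |count_in_a - count_in_b| copies of each element,
    # mirroring the merge-with-abs trick on the count hashes.
    ca, cb = Counter(a), Counter(b)
    return [key for key in {*ca, *cb} for _ in range(abs(ca[key] - cb[key]))]

print(sorted(multiset_symmetric_difference([1, 2, 2, 3, 4, 5, 5, 5, 5], [5])))
# [1, 2, 2, 3, 4, 5, 5, 5]
```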

What feature of table.sort allows it to sort an array by the values of an associative array?

I can't wrap my head around why record[x] matches a string from the array up with its key name and then uses the associated value to decide the ordering. Is this some special feature of table.sort?
list = {"b", "c", "a"}
record = {a = 1, b = 2, c = 3}
table.sort(list, function (x, y) return record[x] < record[y] end)
for _, v in ipairs(list) do print(v) end
>a
>b
>c
The statement record = {a = 1, b = 2, c = 3} is equivalent to
record = {}
record["a"] = 1
record["b"] = 2
record["c"] = 3
This should make it clear how the values in list map to keys in record.
(I'll factor out the comparison function to make the explanation easier.)
list = {"b", "c", "a"}
record = {a = 1, b = 2, c = 3}
local function compare(x, y)
  return record[x] < record[y]
end
table.sort(list, compare)
In the function compare, x and y may be any two elements of list. table.sort must call this function many times to figure out which elements are considered less than others. Without compare, table.sort would just use the < operator. As you can see in the code, compare refers to the record table when deciding whether to return true or false. table.sort merely calls compare without knowing anything about record.
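The same lookup-based ordering reads naturally in Python, where the key function plays the role of the Lua comparator closing over record:

```python
record = {"a": 1, "b": 2, "c": 3}
lst = ["b", "c", "a"]

# The sort key just looks each element up in the mapping, exactly as the
# Lua comparator indexes the captured `record` table.
lst.sort(key=lambda x: record[x])
print(lst)  # ['a', 'b', 'c']
```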
