Pairwise comparison inside array in Julia

Suppose we have a 6-element array in Julia, for example Int64[1,1,2,3,3,4]. To compare two arrays elementwise we can use ".==", but my goal is to do all the pairwise comparisons inside this one array: if the elements of a pair (i,j) are equal, the result is 1 (or true); if they differ, it is 0 (or false). All the pairwise comparisons should be stored in a 6x6 matrix. Is it possible to do that in Julia without a for loop? Thank you.

You can use the fact that broadcasting a column vector against a row expands both to a matrix, so comparing the array with its transpose gives every pairwise comparison:
julia> A = [1,1,2,3,3,4]
6-element Array{Int64,1}:
1
1
2
3
3
4
julia> A .== A'
6×6 BitArray{2}:
true true false false false false
true true false false false false
false false true false false false
false false false true true false
false false false true true false
false false false false false true

Related

Julia, integer vs boolean results from selection of instances in two arrays

I've got a 3D array called Pop. I want to find out how many times two different conditions are both met. Each condition works for me independently, but I can't put the two together.
Pop[end, :, 1] .== 3
works ok, produces an integer vector of 1's and 0's which is correct. Also
Pop[end-1, :, 1] .== 4
works, again returns integer vector, however when I put the two together as:
count(Pop[end, :, 1] .== 3 && Pop[end-1, :, 1] .== 4)
I get this error:
ERROR: TypeError: non-boolean (BitArray{1}) used in boolean context
That sort of helps: I can see that the two arrays cannot be combined in a boolean way. What is wrong with my syntax for counting how many times both conditions are met? Simple, I know, but I can't get it! Thx. J

&& is the short-circuiting boolean AND, which means that if the first term is false, the rest isn't evaluated (see documentation). It also means that it only works on single Boolean values and cannot be broadcast over an array.
& is the bitwise AND operator (documentation), which is what you want here, because it can be broadcast over arrays with the syntax .&, the same way you use .==. Since & binds more tightly than ==, each comparison needs its own parentheses, e.g. count((Pop[end, :, 1] .== 3) .& (Pop[end-1, :, 1] .== 4)). A small demonstration of .&:
julia> [true, true, false, false] .& [true, false, true, false]
4-element BitVector:
1
0
0
0
Update
In Julia 1.7+, the short-circuiting operators && and || can be dotted to participate in broadcast fusion as .&& and .|| (#39594):
julia> [true, true, false, false] .&& [true, false, true, false]
4-element BitVector:
1
0
0
0

Check if array contains more nils than other values

Given an array with only odd counts:
[1,nil,nil]
[1,nil,Module,nil,2]
[1,Class.new,nil]
I would like to determine whether there are more nils or more non-nils. The approach I used was to first convert every value to either true or false, and then to determine whether there are more true or more false values:
[ 1,nil,nil,nil,2,3].collect {|val| !!!val }.max
#=> ArgumentError: comparison of TrueClass with false failed
The max method does not want to play nice with booleans. How can I accomplish this?
Now this might not be the best approach to determine whether there are more nils or non-nils, but this is the approach that I used.

Given an array with only odd counts
If by that you mean that there will always be an unequal number of truthy and falsey values in the array, then, first of all, [] is not a valid input.
And here's the solution:
def truthy?(array)
falsey, truthy = array.partition(&:!)
truthy.size > falsey.size
end
You can use a one-liner if you prefer:
def truthy?(array)
array.partition(&:!).max_by(&:size).any?
end
Spec:
truthy?([1,nil,nil]) #=> false
truthy?([1,nil,nil,nil,2]) #=> false
truthy?([1,4,nil]) #=> true
truthy?([1,nil,nil]) #=> false
truthy?([1,nil,Module,nil,2]) #=> true
truthy?([1,Class.new,nil]) #=> true
It uses the Enumerable#partition and BasicObject#! methods.
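If you indeed intended to count only nils, not all falsey values (as stated in the OP):
def more_nils?(array)
# all?(&:nil?) rather than none?, so a larger group consisting only of false is not mistaken for nils
array.partition(&:nil?).max_by(&:size).all?(&:nil?)
end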
Spec:
more_nils?([1,nil,nil]) #=> true
more_nils?([1,nil,nil,nil,2]) #=> true
more_nils?([1,4,nil]) #=> false
more_nils?([1,nil,nil]) #=> true
more_nils?([1,nil,Module,nil,2]) #=> false
more_nils?([1,Class.new,nil]) #=> false
It uses Object#nil? method.
Inspired by @pjs's answer:
array.sum { |el| el.nil? ? -1 : 1 }.negative?
Even simpler (from @SagarPandya's comment):
array.count(nil) > array.compact.count
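The difference between counting nils and counting falsey values only shows up when the array contains false. The sketch below makes that visible; the method names are hypothetical and chosen for illustration, not taken from the answers above.

```ruby
# Hypothetical helper: true when nil elements outnumber everything else.
def more_nils_than_others?(array)
  nil_count = array.count(nil)            # counts elements equal to nil
  nil_count > array.size - nil_count
end

# Hypothetical helper: true when falsey elements (nil or false) outnumber truthy ones.
def more_falsey_than_truthy?(array)
  falsey_count = array.count { |el| !el } # nil and false are the only falsey values
  falsey_count > array.size - falsey_count
end

sample = [false, false, nil]
p more_nils_than_others?(sample)    # 1 nil vs 2 non-nil
p more_falsey_than_truthy?(sample)  # 3 falsey vs 0 truthy
```

For an input without false, such as [1, nil, nil], both helpers agree.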

A fairly straightforward solution would be:
def truthy?(ary)
ary.map { |bool| bool ? 1 : -1 }.sum > 0
end
Map entries to +/-1 based on their truthiness, sum them, and check whether the sum is positive or negative.
This also handles empty arrays: it returns false in that case.
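A quick check of the edge cases may help here. This is a minimal sketch of the same +/-1 idea using Array#sum with a block; more_truthy? is a hypothetical name, and treating a tie as false is simply a consequence of the "> 0" comparison.

```ruby
# Hypothetical helper: same +/-1 scoring as above, summed in one pass.
def more_truthy?(ary)
  ary.sum { |value| value ? 1 : -1 } > 0
end

p more_truthy?([1, nil, nil])  # one truthy, two falsey
p more_truthy?([1, 4, nil])    # two truthy, one falsey
p more_truthy?([])             # empty array: the sum is 0
p more_truthy?([1, nil])       # tie: the sum is 0
```

Both the empty array and a tie give a sum of 0, so they are reported as "not more truthy".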

Here is another one:
if array.size > 2*array.compact.size
# We have more nil than non-nil
end

Assuming that the falsy values are nil and false, and everything else is truthy (which is how conditional statements treat them), you can leverage Object#itself with Array#select.
irb(main):013:0> ary = [1,nil,nil,false,2]
=> [1, nil, nil, false, 2]
irb(main):014:0> ary.select(&:itself).length
=> 2
irb(main):015:0> ary.reject(&:itself).length
=> 3
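To turn those two lengths into a yes/no answer, one option is to count instead of building intermediate arrays. This is a sketch under the same assumption (only nil and false are falsy); more_falsy? is a hypothetical name.

```ruby
# Hypothetical helper: compares the number of falsy and truthy elements.
def more_falsy?(ary)
  truthy = ary.count(&:itself)  # elements for which the block result is truthy
  falsy  = ary.size - truthy
  falsy > truthy
end

ary = [1, nil, nil, false, 2]
p more_falsy?(ary)  # 3 falsy vs 2 truthy
```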

Final logical value of boolean array in ruby

Let's say I have an array that looks like:
[true, true, false]
I pass an operator along with the array, which may be AND, OR or XOR, and I want to calculate the logical value of the array based on that operator.
Example: for the array [true, true, false] and the operator AND, the operator should be applied successively across all n elements of the array.
Steps: true AND true -> true, true AND false -> false
Therefore the output should be false.
The array can contain any number of boolean values.

A simple way to do this is with reduce:
def logical_calculation(arr, op)
op=='AND' ? arr.reduce(:&) : op=='OR' ? arr.reduce(:|) : arr.reduce(:^)
end
Alternatively, look the operator up in a hash and use inject (an alias of reduce):
OPS = { "AND" => :&, "OR" => :|, "XOR" => :^ }
def logical_calculation(array, op)
array.inject(&OPS[op])
end
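As a worked example of the fold, the sketch below applies each operator to the array from the question. fold_booleans is a hypothetical name, and raising on an unknown operator is an assumption added here; the answers above do not specify that behaviour.

```ruby
# Hypothetical helper: folds a boolean array with the named operator.
def fold_booleans(values, op_name)
  operators = { "AND" => :&, "OR" => :|, "XOR" => :^ }
  # Assumption: an unknown operator name is an error rather than a silent default.
  operator = operators.fetch(op_name) { raise ArgumentError, "unknown operator: #{op_name}" }
  values.reduce(operator)
end

input = [true, true, false]
p fold_booleans(input, "AND")  # true & true -> true,  true & false -> false
p fold_booleans(input, "OR")   # true | true -> true,  true | false -> true
p fold_booleans(input, "XOR")  # true ^ true -> false, false ^ false -> false
```

Note that reduce returns nil for an empty array when no initial value is given, so callers may want to handle that case separately.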

fill part of a MATLAB struct array using a binary filter

I can easily fill part of an array using a logical array filter, i.e. the following works for an array:
mydata=[2 2 2];
myfilter=[false true true true false false];
myarray(myfilter)=mydata;
I tried the following for a struct array, but it gives an error:
mydata=[2 2 2];
myfilter=[false true true true false false];
[mystruct(myfilter).myval] = mydata;
If I have already filled my struct array using a loop, I can access the data with the same filter as follows:
mydata=[2 2 2];
myfilter=[false true true true false false];
pp=0;
for p=1:length(myfilter)
if myfilter(p)
pp=pp+1;
mystruct(p).myval = mydata(pp);
end
end
[mystruct(myfilter).myval]
So I can make a loop work to load the data and then retrieve it as expected, but is there a vectorised way to fill part of a struct array?

You can fill a plain array with the filter first and then convert it to a struct array:
mydata=[2 2 2];
myfilter=[false true true true false false];
myarray(myfilter) = mydata ;
% make structure
mystruct = struct('myval', num2cell(myarray));

How can I find periodically appearing NA values in an 3D array (along dimension time) with R

I have a time series (monthly values over several years) of spatial data (originally NetCDF) in an array. If a pixel (now a cell in the matrix of one time step) is NA in more than 2 consecutive Januaries, for example, I want to exclude it completely from further analysis by setting it to NA in all time steps.
As far as I know, "time.series" only works for vectors or matrices (at most two dimensions).
One workaround I can see (but have not managed to implement) is:
Re-sorting the array so that the order is no longer purely chronological but grouped by month (Jan 2001, Jan 2002, Jan 2003, Feb 2001, Feb 2002, Feb 2003, ...) would already help a lot. But it would still wrongly flag pixels where, e.g., Jan 2002, Jan 2003 and Feb 2001 are NA.
Any help would be really appreciated. Please ask if my question is unclear. It's my first one and I tried my best.
Edit:
My actual dataset is a global satellite-based radiation dataset. Pixels affected by, e.g., periodically appearing clouds (during the rainy season in the same month every year) should not be considered any further. I also have some other criteria that eliminate pixels. Only this one criterion is missing.
# create any array with scattered NAs
set.seed (10)
array <- replicate(48, replicate(10, rnorm(20)))
na_pixels <- array((sample(c(1, NA), size = 7200, replace = TRUE, prob = c(0.95, 0.05))), dim = c(20,10,48))
na_array <- array * na_pixels
dimnames(na_array) <- list(NULL, NULL, as.character(seq(as.Date("2001-01-01"), as.Date("2004-12-01"), "month")))
#I want to test several conditions that would make a pixel not usable for me
#in the end I want to retrieve a mask of usable "pixels".
#what I am doing already is:
mask <- apply(na_array, MARGIN = c(1,2), FUN=function(x){
  #check if more than 5% of a pixel's values are NA over time
  if (sum(is.na(x)) > (length(x)*0.05)){
    mask_val <- 0
  }
  #check if more than 5 consecutive time steps are missing
  else if (max(with(rle(is.na(x)), lengths[values])) > 5){
    mask_val <- 0
  }
  #this is the missing part
  else if (...more than 2 Januaries or 2 Februaries or... are NA){#check for periodically appearing NAs
    mask_val <- 0
  }
  else {
    mask_val <- 1
  }
  return(mask_val)
})

It's probably more convenient (if enough memory is available) to convert your 3D array to a 'long' "data.frame":
as.data.frame(as.table(na_array))
# Var1 Var2 Var3 Freq
#1 A A 2001-01-01 0.01874617
#2 B A 2001-01-01 -0.18425254
#3 C A 2001-01-01 -1.37133055
# ...........................
#9598 R J 2004-12-01 NA
#9599 S J 2004-12-01 -1.11411416
#9600 T J 2004-12-01 0.01435433
Instead of relying on as.table and as.data.frame coercions, it could be done manually and more efficiently:
dat = data.frame(i = rep_len(seq_len(dim(na_array)[1]), prod(dim(na_array))),
j = rep_len(rep(seq_len(dim(na_array)[2]), each = dim(na_array)[1]), prod(dim(na_array))),
date = rep(as.Date(dimnames(na_array)[[3]]), each = prod(dim(na_array)[1:2])) ,
month = rep(format(as.Date(dimnames(na_array)[[3]]), "%b"), each = prod(dim(na_array)[1:2])),
isNA = c(is.na(na_array)))
dat
# i j date month isNA
#1 1 1 2001-01-01 Jan FALSE
#2 2 1 2001-01-01 Jan FALSE
#3 3 1 2001-01-01 Jan FALSE
#4 4 1 2001-01-01 Jan TRUE
# ..............
#9597 17 10 2004-12-01 Dec FALSE
#9598 18 10 2004-12-01 Dec TRUE
#9599 19 10 2004-12-01 Dec FALSE
#9600 20 10 2004-12-01 Dec FALSE
Where i: row in na_array, j: column in na_array, date: 3rd dim of na_array, month: month of the date column (as it will be needed later), isNA: whether the value of na_array is NA.
And building the three conditions:
cond1 = aggregate(isNA ~ i + j, dat, function(x) sum(x) > (dim(na_array)[3] * 0.05))
(A more efficient way to create cond1 is rowSums(is.na(na_array), dims = 2) > (dim(na_array)[3] * 0.05)).
cond2 = aggregate(isNA ~ i + j, dat, function(x) any(with(rle(x), lengths[values]) > 5))
And to compute cond3, first find the number of missing values per "month" for each 'cell' (i.e. [i, j]); "month" is the variable extracted from dimnames(na_array)[[3]] when the 'long' "data.frame" dat was created:
NA_per_month = aggregate(isNA ~ i + j + month, dat, function(x) sum(x))
Having the number of NAs per "month" for each [i, j], we build cond3 by checking if each [i, j] contains any "month" with more than 2 NAs:
cond3 = aggregate(isNA ~ i + j, NA_per_month, function(x) any(x > 2))
(It's trivial to replace aggregate in the above 'group-by' operations with any other available grouping tool.)
Perhaps we could avoid creating a 'long' "data.frame" and operate on na_array directly. For example, calculating cond1 with the rowSums version is much more efficient and straightforward, and cond2 could also be computed with an apply on na_array. But cond3 is much more straightforward with a 'long' "data.frame" than with a 3D array. So, as far as efficiency goes, it's generally better to work with the structure the data already has; if that gets cumbersome enough, it is probably worth restructuring the data once and doing the remaining calculations on the new structure.
To get the final result, allocate a "matrix" of appropriate size:
ans = matrix(NA, dim(na_array)[1], dim(na_array)[2])
and fill in after ORing the conditions:
ans[cbind(cond1$i, cond1$j)] = cond1$isNA | cond2$isNA | cond3$isNA
ans
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
# [2,] TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE
# [3,] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
# [4,] FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
# [5,] FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
# [6,] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
# [7,] FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE
# [8,] TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE
# [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
#[10,] TRUE FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
#[11,] FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
#[12,] TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
#[13,] FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
#[14,] FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
#[15,] TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
#[16,] FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
#[17,] TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE
#[18,] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE
#[19,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE
#[20,] TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE

@alexis_laz: Yes, this works now. Unfortunately I realised that ans[cbind(cond1$i, cond1$j)] = cond1$isNA | cond2$isNA | cond3$isNA is not working. I get the error: number of items to replace is not a multiple of replacement length. I think it only takes cond1 for the replacement. (I am sorry that my example dataset gives 'FALSE' in all cases for cond2 and cond3, but the code should still check the 'OR', even though the result will look the same as cond1.) I came up with the following code, which works but is definitely not nice or efficient, because I am not too familiar with boolean operations. Perhaps you could optimize my code or edit your line (as my real dataset is huge, I would be grateful for any optimization). In the end I need all TRUE conditions (meaning NA) to be 0 and all FALSE conditions to be 1. That's why I already did this in my code here.
ans = matrix(NA, dim(na_array)[1], dim(na_array)[2])
cond1_bool <- ans
cond1_bool[cbind(cond1$i, cond1$j)] = cond1$isNA
cond2_bool <- ans
cond2_bool[cbind(cond2$i, cond2$j)] = cond2$isNA
cond3_bool <- ans
cond3_bool[cbind(cond3$i, cond3$j)] = cond3$isNA
ans_bool <- ans
ans_bool[which(cond1_bool == T|cond2_bool == T|cond3_bool == T)] <- 0
ans_bool[which(is.na(ans_bool))] <- 1
