Working with vectors in Matlab - arrays

I have 2 vectors:
1) xvn = [-6.2 -5.2 -4.2 -3.2 -2.2 -1.2 -0.2 0.8 1.8 2.8 3.8 4.8 5.8]
2) xg = [-2.0 -1.0 -0.0 1.0 2.0]
I am trying to get a new vector which looks like this.
xv = [-6.2 -5.2 -4.2 -3.2 -2.2 -2.0 -1.0 -0.0 1.0 2.0 2.8 3.8 4.8 5.8]
Essentially, xg covers the range from -2.0 to 2.0 and xvn covers the range from -6.2 to 5.8. The new vector xv contains the values of xvn below -2.0, then all values of xg, then the values of xvn above 2.0.
All vectors are in increasing order.

Since they're monotonically increasing, something like this:
xv = [xvn(xvn<xg(1)) xg xvn(xvn>xg(end))]
If they are column vectors rather than the row vectors you've shown, concatenate vertically instead (using ; or vertcat).
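For readers who want to try the same idea outside Matlab, here is a minimal Python sketch of the logical-indexing answer above. The function name splice_sorted is hypothetical, and the sketch assumes both inputs are sorted in increasing order, as the question states.

```python
def splice_sorted(outer, inner):
    """Keep the values of `outer` strictly outside the range of `inner`,
    and place all of `inner` between them. Both inputs must be sorted."""
    lo, hi = inner[0], inner[-1]
    below = [v for v in outer if v < lo]   # like xvn(xvn < xg(1))
    above = [v for v in outer if v > hi]   # like xvn(xvn > xg(end))
    return below + list(inner) + above


xvn = [-6.2, -5.2, -4.2, -3.2, -2.2, -1.2, -0.2, 0.8, 1.8, 2.8, 3.8, 4.8, 5.8]
xg = [-2.0, -1.0, -0.0, 1.0, 2.0]
xv = splice_sorted(xvn, xg)
print(xv)
```

The strict comparisons mean that a value of xvn equal to an endpoint of xg would be dropped rather than duplicated.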

Related

julia arrays select the first x rows by group

Using julia, I want to select the first x rows of an array per group.
In the following example, I want the first two rows where the second column is equal to 1.0, then the first two rows where the second column is equal to 2.0, etc.
XX = [repeat([1.0], 6) vcat(repeat([1.0], 3), repeat([2.0], 3))]
XX2 = [repeat([2.0], 6) vcat(repeat([3.0], 3), repeat([4.0], 3))]
beg = [XX;XX2]
> 12×2 Matrix{Float64}:
> 1.0 1.0
> 1.0 1.0
> 1.0 1.0
> 1.0 2.0
> 1.0 2.0
> 1.0 2.0
> 2.0 3.0
> 2.0 3.0
> 2.0 3.0
> 2.0 4.0
> 2.0 4.0
> 2.0 4.0
The final array would look like this:
8×2 Matrix{Float64}:
1.0 1.0
1.0 1.0
1.0 2.0
1.0 2.0
2.0 3.0
2.0 3.0
2.0 4.0
2.0 4.0
I use the following code, but I am not sure whether there is a simpler way (a single function) that already does this more efficiently.
x = []
for val in unique(beg[:,2])
    x = append!(x, findfirst(beg[:,2].==val))
end
idx = sort([x; x.+1])
final = beg[idx, :]
Assuming your data:
is sorted (i.e. groups form contiguous blocks)
guarantees that each group has at least two elements
(your code assumes both)
then you can generate the idx filter that you want in the following way:
idx = [i for i in axes(beg, 1) if i < 3 || beg[i, 2] != beg[i-1, 2] || beg[i, 2] != beg[i-2, 2]]
If you cannot assume either of the above please comment and I can show a more general solution.
EDIT
Here is an example without using any external packages:
julia> using Random
julia> XX = [repeat([1.0], 6) vcat(repeat([1.0], 3), repeat([2.0], 3))]
6×2 Matrix{Float64}:
1.0 1.0
1.0 1.0
1.0 1.0
1.0 2.0
1.0 2.0
1.0 2.0
julia> XX2 = [repeat([2.0], 7) vcat(repeat([3.0], 3), repeat([4.0], 3), 5.0)] # last group has length 1
7×2 Matrix{Float64}:
2.0 3.0
2.0 3.0
2.0 3.0
2.0 4.0
2.0 4.0
2.0 4.0
2.0 5.0
julia> beg = [XX;XX2][randperm(13), :] # shuffle groups so they are not in order
13×2 Matrix{Float64}:
2.0 3.0
1.0 2.0
2.0 4.0
2.0 3.0
2.0 4.0
2.0 5.0
2.0 3.0
1.0 2.0
1.0 2.0
1.0 1.0
1.0 1.0
2.0 4.0
1.0 1.0
julia> x = Dict{Float64, Vector{Int}}() # this will store indices per group
Dict{Float64, Vector{Int64}}()
julia> for (i, v) in enumerate(beg[:, 2]) # collect the indices
           push!(get!(x, v, Int[]), i)
       end
julia> x
Dict{Float64, Vector{Int64}} with 5 entries:
5.0 => [6]
4.0 => [3, 5, 12]
2.0 => [2, 8, 9]
3.0 => [1, 4, 7]
1.0 => [10, 11, 13]
julia> idx = sort!(mapreduce(x -> first(x, 2), vcat, values(x))) # get first two indices per group in ascending order
9-element Vector{Int64}:
1
2
3
4
5
6
8
10
11
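The dictionary-based approach above can also be sketched in Python. This is only an illustration under stated assumptions: the function name first_n_per_group is hypothetical, indices are 0-based (unlike Julia's 1-based indices), and a running count per key replaces the per-group index lists, which keeps the result in ascending order without a final sort.

```python
def first_n_per_group(keys, n=2):
    """Return the 0-based positions of the first `n` occurrences of each key.
    The keys do not need to be sorted or grouped into contiguous blocks."""
    seen = {}   # how many times each key has been kept so far
    idx = []
    for i, key in enumerate(keys):
        count = seen.get(key, 0)
        if count < n:
            idx.append(i)
            seen[key] = count + 1
    return idx


# Second column of the shuffled 13-row matrix shown above
col2 = [3.0, 2.0, 4.0, 3.0, 4.0, 5.0, 3.0, 2.0, 2.0, 1.0, 1.0, 4.0, 1.0]
idx = first_n_per_group(col2)
print([i + 1 for i in idx])  # shifted to 1-based to compare with the Julia output
```

Groups with fewer than n elements, such as the single row with key 5.0, simply contribute all the rows they have.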

Pairwise Cohen's Kappa of rows in DataFrame in Pandas (python)

I'd greatly appreciate some help on this. I'm using jupyter notebook.
I have a dataframe for which I want to calculate the interrater reliability. I want to compare the rows pairwise by the value of the ID column (every ID occurs exactly twice, once for each coder). Each ID value represents a different article, so I do not want to compare all rows together; instead I want to take the average of the interrater reliability over the pairs (and potentially also for each column).
N.  ID.      A.   B.
0   8818313  Yes  Yes  1.0  1.0  1.0  1.0  1.0  1.0
1   8818313  Yes  No   0.0  1.0  0.0  0.0  1.0  1.0
2   8820105  No   Yes  0.0  1.0  1.0  1.0  1.0  1.0
3   8820106  No   No   0.0  0.0  0.0  1.0  0.0  0.0
I've been able to find some instructions for Cohen's kappa, but not for how to compute it pairwise by the value in the ID column.
Does anyone know how to go about this?
Here is how I would approach it:
import pandas as pd
from io import StringIO
from sklearn.metrics import cohen_kappa_score

df = pd.read_csv(StringIO("""
N,ID,A,B,Nums
0, 8818313, Yes, Yes,1.0 1.0 1.0 1.0 1.0 1.0
1, 8818313, Yes, No,0.0 1.0 0.0 0.0 1.0 1.0
2, 8820105, No, Yes,0.0 1.0 1.0 1.0 1.0 1.0
3, 8820105, No, No,0.0 0.0 0.0 1.0 0.0 0.0 """))

def kappa(df):
    # Parse the space-separated ratings of the two coders in this ID group
    nums1 = [float(num) for num in df.Nums.iloc[0].split(' ') if num]
    nums2 = [float(num) for num in df.Nums.iloc[1].split(' ') if num]
    return cohen_kappa_score(nums1, nums2)

# One kappa per ID; each ID is expected to have exactly two rows
df.groupby('ID').apply(kappa)
This will generate:
ID
8818313 0.000000
8820105 0.076923
dtype: float64
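To make the numbers above easier to check by hand, here is a small pure-Python sketch of the standard unweighted Cohen's kappa formula, kappa = (p_o - p_e) / (1 - p_e). It is not scikit-learn's implementation; the function name cohen_kappa and the choice to return NaN when p_e equals 1 are assumptions made for this illustration. The last lines also average the per-ID values, which is what the question asks for.

```python
def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater1)
    # Observed agreement: share of items both raters labelled identically
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: sum over labels of the product of the raters' label frequencies
    labels = set(rater1) | set(rater2)
    p_expected = sum((rater1.count(lab) / n) * (rater2.count(lab) / n) for lab in labels)
    if p_expected == 1.0:
        return float("nan")  # assumption: treat this degenerate case as undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# The two coder rows per ID from the example data above
pairs = {
    8818313: ([1, 1, 1, 1, 1, 1], [0, 1, 0, 0, 1, 1]),
    8820105: ([0, 1, 1, 1, 1, 1], [0, 0, 0, 1, 0, 0]),
}
per_id = {key: cohen_kappa(a, b) for key, (a, b) in pairs.items()}
mean_kappa = sum(per_id.values()) / len(per_id)  # average over the ID pairs
print(per_id, mean_kappa)
```

With the pandas version, the same average could be obtained by calling .mean() on the Series returned by the groupby.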

Use for loop to get i:i+1 columns

I want to use a for loop so that each iteration fills a pair of columns. E.g. when i = 1, I get the 1st and 2nd columns; when i = 2, the 3rd and 4th columns.
m= [2 -3;4 6]
al= [1 3;-2 -4]
l=[1 0; 2 4]
Random.seed!(1234)
d= [rand(2:20, 10) rand(-1:10, 10)]
Random.seed!(1234)
c= [rand(0:30, 10) rand(-1:20, 10)]
n,g=size(w)
mx=zeros(n,2*g)
for i =1:g
mx[ : ,i:i+1] = m[:,i]' .+ (d[:,i]' .* al[:,i])' .+ (l[:,i]' .* c[:,i])
end
return mx
I got the following, which is wrong when I compare it with doing the process manually:
julia> mx
10×4 Array{Float64,2}:
8.0 -6.0 10.0 0.0
22.0 3.0 18.0 0.0
40.0 18.0 38.0 0.0
9.0 9.0 22.0 0.0
51.0 3.0 18.0 0.0
10.0 18.0 34.0 0.0
52.0 12.0 30.0 0.0
36.0 21.0 42.0 0.0
44.0 24.0 42.0 0.0
33.0 24.0 42.0 0.0
The first 2 columns and the last 2 columns of mx should match the following results, in the same order:
mx1_2=m[:,1]' .+ (d[:,1]' .* al[:,1])' .+ (l[:,1]' .* c[:,1])
mx2_4=m[:,2]' .+ (d[:,2]' .* al[:,2])' .+ (l[:,2]' .* c[:,2])
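One possible explanation, offered as a sketch rather than a confirmed diagnosis: the range i:i+1 covers columns 1 and 2 for i = 1 but columns 2 and 3 for i = 2, so the second iteration overwrites column 2 and never reaches column 4. That would be consistent with the all-zero last column in the printed mx. The Python snippet below only illustrates the index arithmetic with 1-based column numbers; both helper names are hypothetical.

```python
def overlapping_cols(i):
    """1-based columns addressed by the range i:i+1 used in the loop above."""
    return [i, i + 1]


def block_cols(i, width=2):
    """1-based columns of the i-th non-overlapping block of `width` columns."""
    start = (i - 1) * width + 1
    return list(range(start, start + width))


g = 2  # number of iterations in the loop above
written_overlap = [overlapping_cols(i) for i in range(1, g + 1)]
written_blocks = [block_cols(i) for i in range(1, g + 1)]
print(written_overlap)  # column 2 is written twice, column 4 never
print(written_blocks)   # every column from 1 to 4 is written exactly once
```

In Julia, the non-overlapping block for iteration i corresponds to the range 2i-1:2i.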

Comparing two columns and summing the values in Matlab

I have 2 columns like this:
0.0 1.2
0.0 2.3
0.0 1.5
0.1 1.0
0.1 1.2
0.1 1.4
0.1 1.7
0.4 1.1
0.4 1.3
0.4 1.5
In the 1st column, 0.0 is repeated 3 times. I want to sum corresponding elements
(1.2 + 2.3 + 1.5) in the 2nd column. Similarly, 0.1 is repeated 4 times in the 1st
column. I want to sum the corresponding elements (1.0 + 1.2 + 1.4 + 1.7) in the 2nd
column and so on.
I am trying like this
for i = 1:length(col1)
    for j = 1:length(col2)
        if col2(j) == col1(i)
            % to do
        end
    end
end
This is a classical use of unique and accumarray:
x = [0.0 1.2
0.0 2.3
0.0 1.5
0.1 1.0
0.1 1.2
0.1 1.4
0.1 1.7
0.4 1.1
0.4 1.3
0.4 1.5]; % data
[~, ~, w] = unique(x(:,1)); % labels of unique elements
result = accumarray(w, x(:,2)); % sum using the above as grouping variable
You can also use the newer splitapply function instead of accumarray:
[~, ~, w] = unique(x(:,1)); % labels of unique elements
result = splitapply(@sum, x(:,2), w); % sum using the above as grouping variable
a=[0.0 1.2
0.0 2.3
0.0 1.5
0.1 1.0
0.1 1.2
0.1 1.4
0.1 1.7
0.4 1.1
0.4 1.3
0.4 1.5]
% Get unique col1 values, and indices
[uniq,~,ib]=unique(a(:,1));
% for each unique value in col1
for ii=1:length(uniq)
    % sum all col2 values that correspond to the current index of the unique value
    s(ii)=sum(a(ib==ii,2));
end
Gives:
s =
5.0000 5.3000 3.9000
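The grouping idea behind unique plus accumarray can be sketched in plain Python with a dictionary of running totals. This is only an illustration: the function name group_sums is hypothetical, and using floats as dictionary keys assumes that repeated keys are bit-for-bit identical, which holds for the literals in this example but may not hold for computed values.

```python
def group_sums(rows):
    """Sum the second element of each (key, value) pair per distinct key.
    Keys are reported in order of first appearance."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0.0) + value
    return totals


rows = [
    (0.0, 1.2), (0.0, 2.3), (0.0, 1.5),
    (0.1, 1.0), (0.1, 1.2), (0.1, 1.4), (0.1, 1.7),
    (0.4, 1.1), (0.4, 1.3), (0.4, 1.5),
]
sums = group_sums(rows)
for key, total in sums.items():
    print(key, round(total, 4))
```

Rounding is applied only for display, because sums of decimal fractions are generally not exact in binary floating point.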

Difference between elements of two arrays by common values of one column in Julia

Here is a question related to a previous question of mine, which I prefer to submit as a new question. Suppose this time we have only the following 2 arrays in Julia:
[5.0 3.5
6.0 3.6
7.0 3.0]
and
[5.0 4.5
6.0 4.7
8.0 3.0]
I want to obtain an array that calculates the difference between elements of the second column (the first array minus the second array, by this order) but only for common values of the first column. The resulting array must then be the following:
[5.0 -1
6.0 -1.1]
How can we code this in Julia to obtain this last array?
Assume:
x = [5.0 3.5
6.0 3.6
7.0 3.0]
y = [5.0 4.5
6.0 4.7
8.0 3.0]
Again there are many ways to do it. Using DataFrames you can write:
using DataFrames
df = innerjoin(DataFrame(x, [:id, :x]), DataFrame(y, [:id, :y]), on=:id)
df = [df.id df.x-df.y]
## 2×2 Matrix{Float64}:
## 5.0 -1.0
## 6.0 -1.1
You could also convert original arrays to dictionaries and work with them:
dx = Dict(x[i,1] => x[i,2] for i in 1:size(x, 1))
dy = Dict(y[i,1] => y[i,2] for i in 1:size(y, 1))
ks = sort!(collect(intersect(keys(dx), keys(dy))))
[ks [dx[k]-dy[k] for k in ks]]
## 2×2 Matrix{Float64}:
## 5.0 -1.0
## 6.0 -1.1
The difference between those two methods is how they handle duplicates in the first column of either x or y. The first will produce all combinations, while the second will keep only the last value for each key.
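For comparison, the dictionary-based variant can be sketched in Python as follows. The function name diff_on_common_keys is hypothetical. As with the Julia dictionary version, duplicate keys in y keep only their last value; unlike it, this sketch follows the row order of x instead of sorting the keys.

```python
def diff_on_common_keys(x, y):
    """For keys present in both tables, return (key, x_value - y_value).
    Each table is a list of (key, value) rows."""
    dy = {key: value for key, value in y}   # later duplicates overwrite earlier ones
    return [(key, value - dy[key]) for key, value in x if key in dy]


x = [(5.0, 3.5), (6.0, 3.6), (7.0, 3.0)]
y = [(5.0, 4.5), (6.0, 4.7), (8.0, 3.0)]
result = diff_on_common_keys(x, y)
for key, diff in result:
    print(key, round(diff, 4))
```

Keys 7.0 and 8.0 are dropped because each appears in only one of the two tables.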
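A solution without DataFrames.jl, which works only when both arrays have the same number of rows and matching keys sit at the same row positions (as in this example), is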
julia> idx = findall(x[:,1] .== y[:,1]) # findall match of 1st col
2-element Vector{Int64}:
1
2
julia> [x[idx,1] (x-y)[idx,2]]
2×2 Matrix{Float64}:
5.0 -1.0
6.0 -1.1
