I am new to R and struggle with arrays. My question is very simple, but I didn't find an easy answer on the web or in the R documentation.
I have a table with column and row numbers that I want to use to generate a new matrix.
Original table:
V1 V2 pval
1 1 2 5.914384e-13
2 1 3 8.143390e-01
3 1 4 7.587818e-01
4 1 5 9.734698e-12
5 1 6 7.812521e-19
I want to use:
V1 as the column number for the new matrix;
V2 as the row number
pval as the value
Targeted matrix:
1 2 3 4
1 0 5e-1 8e-1 7e-1
2 5e-13 0
3 8e-1 0
4 7e-1 0
#some data
set.seed(42)
df <- data.frame(V1=rep(1:6,each=3),V2=rep(1:3,6),pval=runif(18,0,1))
df <- df[df$V1!=df$V2,]
# V1 V2 pval
#2 1 2 0.560332746
#3 1 3 0.904031387
#4 2 1 0.138710168
#6 2 3 0.946668233
#7 3 1 0.082437558
#8 3 2 0.514211784
# ...
#use dcast to change to wide format
library(reshape2)
df2 <- dcast(df, V2 ~ V1, value.var = "pval", fill = 0)  # name the value column explicitly
# V2 1 2 3 4 5 6
#1 1 0.0000000 0.1387102 0.08243756 0.9057381 0.7375956 0.685169729
#2 2 0.5603327 0.0000000 0.51421178 0.4469696 0.8110551 0.003948339
#3 3 0.9040314 0.9466682 0.00000000 0.8360043 0.3881083 0.832916080
#in case you really want a matrix object
m <- as.matrix(df2[,-1])
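If you would rather stay in base R, matrix indexing with a two-column index is one possible alternative to dcast. The following is only a sketch: the small table `tab` and the names `n` and `m` are made up for illustration, and the second assignment (mirroring the values) is an assumption based on the symmetric target matrix shown in the question.

```r
# Hypothetical table in the same shape as the question's data.
tab <- data.frame(V1 = c(1, 1, 1),
                  V2 = c(2, 3, 4),
                  pval = c(5.9e-13, 0.81, 0.76))

n <- max(tab$V1, tab$V2)              # size of the square matrix
m <- matrix(0, nrow = n, ncol = n)    # cells without a value stay 0

# A two-column matrix used as an index addresses (row, column) pairs.
m[cbind(tab$V2, tab$V1)] <- tab$pval  # row = V2, column = V1
m[cbind(tab$V1, tab$V2)] <- tab$pval  # assumed: mirror to get a symmetric matrix
m
```

Drop the second assignment if only the V2-by-V1 half should be filled.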
I'm looking for a quick way in MATLAB to do the following:
Given a permutation matrix of a vector, say [1, 2, 3], I would like to remove all duplicate reverse rows.
So the matrix P = perms([1, 2, 3])
3 2 1
3 1 2
2 3 1
2 1 3
1 3 2
1 2 3
becomes
3 2 1
3 1 2
2 3 1
Notice that, by symmetry, it is enough to keep the rows whose first element is bigger than the last one:
n = 4; %row size
x = perms(1:n) %all perms
p = x(x(:,1)>x(:,n),:) %non symmetrical perms
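Alternatively, notice that the number of rows in p follows this OEIS sequence for each n and equals size(x,1)/2. For n = 3, since perms outputs the permutations in reverse lexicographic order, the first half of the rows is exactly that set (for larger n the first half can contain a row together with its reverse, e.g. 4 1 2 3 and 3 2 1 4 for n = 4, so prefer the comparison above there):
n = 3; %row size
x = perms(1:n) %all perms
p = x(1:size(x,1)/2,:) %non symmetrical perms (first half, n = 3 only)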
You can use MATLAB's fliplr function to flip your array left to right, and then use ismember to find the rows of P in the flipped version. Finally, iterate over all locations and deselect the reverse of every row that is kept.
Here's some code (tested with Octave 5.2.0 and MATLAB Online):
a = [1, 2, 3];
P = perms(a)
% For each row of P, where can it be found in the left-right flipped version of P?
[~, Locb] = ismember(P, fliplr(P), 'rows');
% Set up logical vector to store indices to take from P.
n = length(Locb);
idx = true(n, 1);
% Iterate all locations and set already found row to false.
for I = 1:n
if (idx(I))
idx(Locb(I)) = false;
end
end
% Generate result matrix.
P_star = P(idx, :)
Your example:
P =
3 2 1
3 1 2
2 3 1
2 1 3
1 3 2
1 2 3
P_star =
3 2 1
3 1 2
2 3 1
Added 4 to the example:
P =
4 3 2 1
4 3 1 2
4 2 3 1
4 2 1 3
4 1 3 2
4 1 2 3
3 4 2 1
3 4 1 2
3 2 4 1
3 2 1 4
3 1 4 2
3 1 2 4
2 4 3 1
2 4 1 3
2 3 4 1
2 3 1 4
2 1 4 3
2 1 3 4
1 4 3 2
1 4 2 3
1 3 4 2
1 3 2 4
1 2 4 3
1 2 3 4
P_star =
4 3 2 1
4 3 1 2
4 2 3 1
4 2 1 3
4 1 3 2
4 1 2 3
3 4 2 1
3 4 1 2
3 2 4 1
3 1 4 2
2 4 3 1
2 3 4 1
As requested in your question (at least from my understanding), rows are taken from top to bottom.
Here's another approach:
result = P(all(~triu(~pdist2(P,P(:,end:-1:1)))),:);
pdist2 computes the distance between rows of P and rows of P(:,end:-1:1).
~ negates the result, so that true corresponds to coincident pairs.
triu keeps only the upper triangular part of the matrix, so that only one of the two rows of the coincident pair will be removed.
~ negates back, so that true corresponds to non-coincident pairs.
all gives a row vector with true for rows that should be kept (because they do not coincide with any previous row).
This is used as a logical index to select rows of P.
I have the following table:
DATA:
Lines <- " ID MeasureX MeasureY x1 x2 x3 x4 x5
1 1 1 1 1 1 1 1
2 1 1 0 1 1 1 1
3 1 1 1 2 3 3 3"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
What I would like to achieve is:
Create 5 columns (r1-r5)
which are each of the columns x1-x5 divided by MeasureX (for example x1/MeasureX, x2/MeasureX, etc.)
Create 5 columns (p1-p5)
which are each of the columns x1-x5 divided by its number 1-5 (the index of the x column), for example x1/1, x2/2, etc.
MeasureY is irrelevant for now. The end product would be the ID and the columns r1-r5 and p1-p5. Is this feasible?
In SAS I would go with something like this:
data test6;
set test5;
array x {5} x1- x5;
array r{5} r1 - r5;
array p{5} p1 - p5;
do i=1 to 5;
r{i} = x{i}/MeasureX;
p{i} = x{i}/(i);
end;
The reason is to keep this dynamic, because the number of columns could change in the future.
Argument recycling allows you to do element-wise division with a constant vector. The tricky part was extracting the digits from the column names. I then repeated each of the digits by the number of rows to do the second division task.
DF[ ,paste0("r", 1:5)] <- DF[ , grep("x", names(DF) )]/ DF$MeasureX
DF[ ,paste0("p", 1:5)] <- DF[ , grep("x", names(DF) )]/ # element-wise division
rep( as.numeric( sub("\\D","",names(DF)[ # remove non-digits
grep("x", names(DF))] #returns only 'x'-cols
) ), each=nrow(DF) ) # make them as long as needed
#-------------
> DF
ID MeasureX MeasureY x1 x2 x3 x4 x5 r1 r2 r3 r4 r5 p1 p2 p3 p4 p5
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.5 0.3333333 0.25 0.2
2 2 1 1 0 1 1 1 1 0 1 1 1 1 0 0.5 0.3333333 0.25 0.2
3 3 1 1 1 2 3 3 3 1 2 3 3 3 1 1.0 1.0000000 0.75 0.6
This could be greatly simplified if you already know the sequence vector for the second division task would be 1-5, but this was designed to allow "gaps" in the sequence for column names and still use the digit information in the names as the divisor. (You were not entirely clear about what situations this code would be used in.) The construct of r{1-5} in SAS is mimicked by [ , paste0('r', 1:5)]. SAS is a macro language and sometimes experienced users have trouble figuring out how to make R behave like one. Generally it takes a while to lose the for-loop mentality and begin using R as a functional language.
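For the simpler case mentioned above, where the x columns are numbered consecutively from 1, a shorter variant might look like the sketch below. It rebuilds the question's data inline; the names `xcols`, `k`, `r`, `p` and `out` are illustrative only, and the pattern `^x[0-9]+$` assumes the columns are named exactly x1, x2, and so on.

```r
# The question's data, rebuilt as a plain data.frame.
DF <- data.frame(ID = 1:3, MeasureX = c(1, 1, 1), MeasureY = c(1, 1, 1),
                 x1 = c(1, 0, 1), x2 = c(1, 1, 2), x3 = c(1, 1, 3),
                 x4 = c(1, 1, 3), x5 = c(1, 1, 3))

# Pick up however many x columns exist, so nothing is hard-coded to 5.
xcols <- grep("^x[0-9]+$", names(DF), value = TRUE)
k <- seq_along(xcols)

r <- DF[xcols] / DF$MeasureX   # every x column divided by MeasureX, row by row
p <- Map(`/`, DF[xcols], k)    # j-th x column divided by j (assumes no gaps)
names(r) <- paste0("r", k)
names(p) <- paste0("p", k)

# Keep only ID plus the new r and p columns.
out <- cbind(DF["ID"], r, as.data.frame(p))
out
```

If more x columns are added later, `xcols` and `k` grow with them and the rest of the code stays unchanged.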
An alternative with the data.table package:
cols <- names(DF)[4:8]  # the x columns
library(data.table)
# Within each ID group, MeasureX refers to that group's own value.
setDT(DF)[, (paste0("r",1:5)) := .SD / MeasureX, by = ID, .SDcols = cols
][, (paste0("p",1:5)) := .SD / 1:5, by = ID, .SDcols = cols]
which results in:
> DF
ID MeasureX MeasureY x1 x2 x3 x4 x5 r1 r2 r3 r4 r5 p1 p2 p3 p4 p5
1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.5 0.3333333 0.25 0.2
2: 2 1 1 0 1 1 1 1 0 1 1 1 1 0 0.5 0.3333333 0.25 0.2
3: 3 1 1 1 2 3 3 3 1 2 3 3 3 1 1.0 1.0000000 0.75 0.6
You could put together a nifty loop or apply to do this, but here it is explicitly:
# Handling the "r" columns.
DF$r1 <- DF$x1 / DF$MeasureX
DF$r2 <- DF$x2 / DF$MeasureX
DF$r3 <- DF$x3 / DF$MeasureX
DF$r4 <- DF$x4 / DF$MeasureX
DF$r5 <- DF$x5 / DF$MeasureX
# Handling the "p" columns.
DF$p1 <- DF$x1 / 1
DF$p2 <- DF$x2 / 2
DF$p3 <- DF$x3 / 3
DF$p4 <- DF$x4 / 4
DF$p5 <- DF$x5 / 5
# Taking only the columns we want.
FinalDF <- DF[, c("ID", "r1", "r2", "r3", "r4", "r5", "p1", "p2", "p3", "p4", "p5")]
Just noting that this is pretty straightforward matrix manipulation that you definitely could have found elsewhere. Perhaps you're new to R, but still put a little more effort in next time. If you are new to R, it's definitely worth the time to look up some basic R coding tutorial or video.
I have 2 arrays in R. I want to combine them into a data.table (or data.frame) such that a row is created for each value from array 1 combined with each value in array 2.
For example, if I had:
Array1 <- c("A", "B", "C")
Array2 <- c(1, 2, 3)
I want the output data.frame to look like:
> DF
Array1 Array2
1 A 1
2 A 2
3 A 3
4 B 1
5 B 2
6 B 3
7 C 1
8 C 2
9 C 3
Does anyone know how to do this?
data.table has a function for this, CJ, that's very similar to expand.grid, and produces a keyed data.table (which can be very advantageous in advanced data.table joins):
CJ(a = Array1, b = Array2)
# a b
#1: A 1
#2: A 2
#3: A 3
#4: B 1
#5: B 2
#6: B 3
#7: C 1
#8: C 2
#9: C 3
key(CJ(a = Array1, b = Array2))
#[1] "a" "b"
The obvious choice:
expand.grid(Array1,Array2)
If you need the variable names:
expand.grid(Array1=Array1,Array2=Array2)
Result:
# Array1 Array2
#1 A 1
#2 B 1
#3 C 1
#4 A 2
#5 B 2
#6 C 2
#7 A 3
#8 B 3
#9 C 3
If you specifically need a data.table output, as @mnel suggests you can do:
out <- setDT(expand.grid(Array1=Array1,Array2=Array2))
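Note that expand.grid varies its first argument fastest, so the row order above differs from the one shown in the question. If that order matters, one possible approach (a sketch; the intermediate name `g` is illustrative) is to pass the arguments in reverse and then reorder the columns:

```r
Array1 <- c("A", "B", "C")
Array2 <- c(1, 2, 3)

# Array2 is listed first so that it cycles fastest within each Array1 value.
g <- expand.grid(Array2 = Array2, Array1 = Array1, stringsAsFactors = FALSE)

# Put the columns back in the order requested in the question.
DF <- g[, c("Array1", "Array2")]
DF
```

Setting stringsAsFactors = FALSE keeps Array1 as a character column instead of a factor.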
Below are two adjacency matrices. I have to find which row of matrix1 corresponds to which row of matrix2, based on the diagonal values. In the example below (matrix1 row = matrix2 row):
1st row=1st row(diagonal value=4)
2nd row=5th row(diagonal value=5)
3rd row=4th row(diagonal value=1)
4th row=2nd row(diagonal value=3)
5th row=3rd row(diagonal value=2)
matrix1 =
4 4 1 3 2
4 5 1 3 2
1 1 1 1 1
3 3 1 3 2
2 2 1 2 2
matrix2 =
4 3 2 1 4
3 3 2 1 3
2 2 2 1 2
1 1 1 1 1
4 3 2 1 5
How can this be done in MATLAB?
Use the second output of ismember:
[~, result] = ismember(diag(matrix1), diag(matrix2))
In your example, this returns
result =
1
5
4
2
3
Assuming mat1 and mat2 to be the first and second matrices respectively and that you are looking to find the first match of diagonal values, try this -
[~,ind] = max(bsxfun(@eq,diag(mat2),diag(mat1)'))
or
[~,ind] = max(bsxfun(@eq,diag(mat1),diag(mat2)'),[],2)
If you are certain that there are always unique matches, you can use find too -
[ind,~] = find(bsxfun(@eq,diag(mat2),diag(mat1)'))
I have got a question regarding all the combinations of matrix-rows in Matlab.
I currently have a matrix with the following structure:
1 2
1 3
1 4
2 3
2 4
3 4
Now I want to get all the possible combinations of these "pairs" without using a number twice in the same row:
1 2 3 4
1 3 2 4
1 4 2 3
It must also work with n "double columns". That is, when my pair matrix goes up to, for example, "5 6", I want to create the matrix with 3 of these double columns:
1 2 3 4 5 6
1 2 3 5 4 6
1 2 3 6 4 5
1 3 2 4 5 6
1 3 2 5 4 6
....
I hope you understand what I mean :)
Any ideas how to solve this?
Thanks and best regards
Jonas
M = [1 2
1 3
1 4
2 3
2 4
3 4]; %// example data
n = floor(max(M(:))/2); %// size of tuples. Compute this way, or set manually
p = nchoosek(1:size(M,1), n).'; %'// generate all n-tuples of row indices
R = reshape(M(p,:).', n*size(M,2), []).'; %// generate result...
R = R(all(diff(sort(R.'))),:); %'//...removing combinations with repeated values