Switching row-major to column-major dimensions - arrays

I am reading row-major data into R as a vector. R interprets it as column-major data, and as far as I can see there is no way to tell array() to fill in a row-major way.
Let's say I have:
array(1:12, c(3,2,2),
dimnames=list(c("r1", "r2", "r3"), c("c1", "c2"),c("t1", "t2"))
)
Which gives:
, , t1
c1 c2
r1 1 4
r2 2 5
r3 3 6
, , t2
c1 c2
r1 7 10
r2 8 11
r3 9 12
I want to transform this data to row-major array:
, , t1
c1 c2
r1 1 2
r2 3 4
r3 5 6
, , t2
c1 c2
r1 7 8
r2 9 10
r3 11 12

Assuming that your array is in a, i.e. that you already have this array and can't change it at read time, then the following will work:
a <- array(1:12, c(3,2,2),
dimnames=list(c("r1", "r2", "r3"), c("c1", "c2"),c("t1", "t2")))
b <- aperm(array(a, dim = c(2,3,2),
dimnames = dimnames(a)[2:1]),
perm = c(2,1,3))
b
> b
, , 1
c1 c2
r1 1 2
r2 3 4
r3 5 6
, , 2
c1 c2
r1 7 8
r2 9 10
r3 11 12

The solution:
aperm(array(1:12, c(2,3,2),
dimnames=list(c("c1","c2"),c("r1","r2","r3"),c("t1","t2"))),
perm=c(2,1,3)
)
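Note that aperm permutes the dimensions; with perm=c(2,1,3) it swaps rows and columns within each slice. Because the array is first built with those two dimensions swapped (2 x 3 x 2), the column names and row names in dimnames also have to be given in swapped order.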
It produces exactly what is needed:
, , t1
c1 c2
r1 1 2
r2 3 4
r3 5 6
, , t2
c1 c2
r1 7 8
r2 9 10
r3 11 12
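The same idea can be wrapped in a small helper. This is only a sketch: the name row_major_slices is made up for illustration, and it assumes, as in the question, that the first two dimensions are filled row by row while any further dimensions keep R's usual order.

```r
# Hypothetical helper (not part of base R): build an array from data
# that is row-major within each slice.
row_major_slices <- function(x, dim, dimnames = NULL) {
  stopifnot(length(dim) >= 2)
  perm <- seq_along(dim)
  perm[1:2] <- 2:1                      # swap the first two dimensions only
  # Fill an array whose rows and columns are swapped ...
  a <- array(x, dim = dim[perm], dimnames = dimnames[perm])
  # ... then swap them back, so each slice ends up filled row by row.
  aperm(a, perm = perm)
}

b <- row_major_slices(1:12, c(3, 2, 2),
                      list(c("r1", "r2", "r3"), c("c1", "c2"), c("t1", "t2")))
b
```

Passing the dimnames in their final (row, column, slice) order and letting the helper reorder them avoids having to swap them by hand.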

Table sort and lookup

I have an excel table (25x25) which looks like this,
C1 C2 C3
R1 5 6 7
R2 1 7 9
R3 2 3 0
my goal is to make it look like this,
C3 R3 0
C1 R2 1
C1 R3 2
C2 R3 3
C1 R1 5
C2 R1 6
C2 R2 7
C3 R1 7
C3 R2 9
The new table is ranked by the values in the first one and also gives the corresponding column and row names. The table contains duplicates, negatives and decimals.
I'm doing this because I'd like to find the 3 closest candidates (and hence the C's and R's) of a given value, and VLOOKUP() requires a sorted table.
Another problem (a step further) is that VLOOKUP() returns the closest smaller value instead of the value that is actually closest. Is there a better way to do it, or a workaround, so that the result is a neat table like this:
Value to look up = 2.8
>> C2 R3 3
>> C1 R3 2
>> C1 R1 5
For some reasons I cannot use VBA for this project. Any solutions with just built-in functions in MS Excel?
If you need to use only native worksheet functions, this can be accomplished; even without array formulas.
        
With your original data in A1:D4, the formulas in F3:H3 are,
=INDEX(B$1:D$1, AGGREGATE(15, 6, COLUMN($A:$C)/(B$2:D$4=H3), COUNTIF(H$3:H3, H3)))
=INDEX(A$2:A$4, AGGREGATE(15, 6, ROW($1:$3)/(B$2:D$4=H3), COUNTIF(H$3:H3, H3)))
=SMALL(B$2:D$4,ROW(1:1))
Fill down as necessary.
The formulas in K5:N5 are,
=INDEX(B$1:D$1, AGGREGATE(15, 6, COLUMN($A:$C)/(B$2:D$4=M5), COUNTIF(M$5:M5, M5)))
=INDEX(A$2:A$4, AGGREGATE(15, 6, ROW($1:$3)/(B$2:D$4=M5), COUNTIF(M$5:M5, M5)))
=IF(COUNTIF($B$2:$D$4, N5+$K$2)>=COUNTIF(N$5:N5, N5), N5+$K$2, $K$2-N5)
=AGGREGATE(15,6,ABS($B$2:$D$4-$K$2),ROW(1:1))
Fill down as necessary.
I've included enough rows in the K5:N13 matrix that you can see how the two 7 values are handled.
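As a cross-check outside Excel, the two steps (rank every cell with its labels, then rank by absolute distance to the lookup value) can be sketched in R. This is not a replacement for the worksheet formulas; the object names long, sorted and nearest are made up for illustration.

```r
# Sample table from the question, entered row by row.
m <- matrix(c(5, 6, 7,
              1, 7, 9,
              2, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("R1", "R2", "R3"), c("C1", "C2", "C3")))

# One row per cell: column label, row label, value.
long <- data.frame(col   = colnames(m)[col(m)],
                   row   = rownames(m)[row(m)],
                   value = as.vector(m),
                   stringsAsFactors = FALSE)

# Step 1: the table ranked by value (ties keep their original order).
sorted <- long[order(long$value), ]

# Step 2: the k cells whose values are closest to a target,
# ranked by absolute difference like the AGGREGATE(..., ABS(...)) formula.
nearest <- function(tbl, target, k = 3) {
  tbl[order(abs(tbl$value - target))[seq_len(k)], ]
}

sorted
nearest(long, 2.8)
```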

Array row calculations

I have the following table:
DATA:
Lines <- " ID MeasureX MeasureY x1 x2 x3 x4 x5
1 1 1 1 1 1 1 1
2 1 1 0 1 1 1 1
3 1 1 1 2 3 3 3"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
What I would like to achieve is:
Create 5 columns(r1-r5)
which is the division of each column x1-x5 with MeasureX (example x1/measurex, x2/measurex etc.)
Create 5 columns(p1-p5)
which is the division of each column x1-x5 with number 1-5 (the number of xcolumns) example x1/1, x2/2 etc.
MeasureY is irrelevant for now, the end product would be the ID and columns r1-r5 and p1-p5, is this feasible?
In SAS I would go with something like this:
data test6;
set test5;
array x {5} x1- x5;
array r{5} r1 - r5;
array p{5} p1 - p5;
do i=1 to 5;
r{i} = x{i}/MeasureX;
p{i} = x{i}/(i);
end;
The reason is to keep this dynamic, because the number of columns could change in the future.
Argument recycling allows you to do element-wise division with a constant vector. The tricky part was extracting the digits from the column names. I then repeated each of the digits by the number of rows to do the second division task.
DF[ ,paste0("r", 1:5)] <- DF[ , grep("x", names(DF) )]/ DF$MeasureX
DF[ ,paste0("p", 1:5)] <- DF[ , grep("x", names(DF) )]/ # element-wise division
rep( as.numeric( sub("\\D","",names(DF)[ # remove non-digits
grep("x", names(DF))] #returns only 'x'-cols
) ), each=nrow(DF) ) # make them as long as needed
#-------------
> DF
ID MeasureX MeasureY x1 x2 x3 x4 x5 r1 r2 r3 r4 r5 p1 p2 p3 p4 p5
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.5 0.3333333 0.25 0.2
2 2 1 1 0 1 1 1 1 0 1 1 1 1 0 0.5 0.3333333 0.25 0.2
3 3 1 1 1 2 3 3 3 1 2 3 3 3 1 1.0 1.0000000 0.75 0.6
This could be greatly simplified if you already know the sequence vector for the second division task would be 1-5, but this was designed to allow "gaps" in the sequence for column names and still use the digit information in the names as the divisor. (You were not entirely clear about what situations this code would be used in.) The construct of r{1-5} in SAS is mimicked by [ , paste0('r', 1:5)]. SAS is a macro language and sometimes experienced users have trouble figuring out how to make R behave like one. Generally it takes a while to lose the for-loop mentality and begin using R as a functional language.
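A slightly more compact variant of the same idea is sketched below. It assumes the measurement columns are named "x" followed by digits (the anchored pattern is my assumption, chosen so that MeasureX can never match) and uses the digits both as the divisor and as the suffix of the new column names.

```r
Lines <- " ID MeasureX MeasureY x1 x2 x3 x4 x5
1 1 1 1 1 1 1 1
2 1 1 0 1 1 1 1
3 1 1 1 2 3 3 3"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

# Find the x columns by name and pull out their numeric suffix.
xcols <- grep("^x[0-9]+$", names(DF), value = TRUE)
idx   <- as.integer(sub("^x", "", xcols))

# r columns: every x column divided by MeasureX (recycled down each column).
DF[paste0("r", idx)] <- DF[xcols] / DF$MeasureX

# p columns: each x column divided by its own suffix.
DF[paste0("p", idx)] <- Map(`/`, DF[xcols], idx)

FinalDF <- DF[c("ID", paste0("r", idx), paste0("p", idx))]
FinalDF
```

If more x columns are added later, nothing in the code needs to change as long as they follow the same naming scheme.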
An alternative with the data.table package:
cols <- names(DF)[4:8]
library(data.table)
# the division by 1:5 relies on there being one row per ID
setDT(DF)[, (paste0("r",1:5)) := .SD / MeasureX, by = ID, .SDcols = cols
][, (paste0("p",1:5)) := .SD / 1:5, by = ID, .SDcols = cols]
which results in:
> DF
ID MeasureX MeasureY x1 x2 x3 x4 x5 r1 r2 r3 r4 r5 p1 p2 p3 p4 p5
1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.5 0.3333333 0.25 0.2
2: 2 1 1 0 1 1 1 1 0 1 1 1 1 0 0.5 0.3333333 0.25 0.2
3: 3 1 1 1 2 3 3 3 1 2 3 3 3 1 1.0 1.0000000 0.75 0.6
You could put together a nifty loop or apply to do this, but here it is explicitly:
# Handling the "r" columns.
DF$r1 <- DF$x1 / DF$MeasureX
DF$r2 <- DF$x2 / DF$MeasureX
DF$r3 <- DF$x3 / DF$MeasureX
DF$r4 <- DF$x4 / DF$MeasureX
DF$r5 <- DF$x5 / DF$MeasureX
# Handling the "p" columns.
DF$p1 <- DF$x1 / 1
DF$p2 <- DF$x2 / 2
DF$p3 <- DF$x3 / 3
DF$p4 <- DF$x4 / 4
DF$p5 <- DF$x5 / 5
# Taking only the columns we want.
FinalDF <- DF[, c("ID", "r1", "r2", "r3", "r4", "r5", "p1", "p2", "p3", "p4", "p5")]
Just noting that this is pretty straightforward matrix manipulation that you definitely could have found elsewhere. Perhaps you're new to R, but still put a little more effort in next time. If you are new to R, it's definitely worth the time to look up some basic R coding tutorial or video.

Natural join subset decomposition

For this question, I found that the answer is (c), but I can give an example showing that (c) is not correct. Which is the answer?
Let r be a relation instance with schema R = (A, B, C, D). We define r1 = ‘select A,B,C from r’ and r2 = ‘select A, D from r’. Let s = r1 * r2 where * denotes natural join. Given that the decomposition of r into r1 and r2 is lossy, which one of the following is true?
(a) s is subset of r
(b) r U s = r
(c) r is a subset of s
(d) r * s = s
If the answer is (c), consider the following example with a lossy decomposition of r into r1 and r2.
Table r
A B C D
1 10 100 1000
2 20 200 1000
3 20 200 1001
Table r1
A B C
1 10 100
2 20 200
Table r2
A D
2 1000
3 1001
Table s (natural join of r1 and r2)
A B C D
2 20 200 1000
So the answer is not (c). But I can also give an example where (c) holds.
What should the answer be?
Table r
A B C D
1 10 100 1000
1 20 200 2000
Table r1
A B C
1 10 100
1 20 200
Table r2
A D
1 1000
1 2000
Table s (natural join of r1 and r2)
A B C D
1 10 100 1000
1 20 200 1000
1 10 100 2000
1 20 200 2000
The decomposition is called "lossy" because we have lost the ability to recompose the original relation value using natural join. This lost ability manifests itself by extra rows appearing when we try the natural join. The underlying cause why this happens is that the decomposition did not retain any key ( { {B} {C} {D} } ) fully in both tables of the decomposition. If any key of the original is fully retained in all components of a decomposition, then the decomposition isn't lossy.
Table r is:
A B C D
1 10 100 1000
2 20 200 1000
3 20 200 1001
r1 is:
r1 = ‘select A,B,C from r’
A B C
1 10 100
2 20 200
3 20 200
r2 is:
r2 = ‘select A, D from r’
A D
1 1000
2 1000
3 1001
s is:
s = r1 * r2
A B C D
1 10 100 1000
2 20 200 1000
3 20 200 1001
So r is in effect a subset of s. If r1 is defined as ‘select A,B,C from r’, you can't just remove a row from the result (as you did in your example) and say that r1 still complies with the definition; the same applies to r2, where you removed the first row.
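The second example from the question (where A is not unique) can be checked mechanically. The following R sketch treats data frames as relations, uses unique() for the projections and merge() for the natural join; the helper key() is made up for illustration and simply turns each row into a string so that rows can be compared.

```r
# Relation r from the second example in the question.
r <- data.frame(A = c(1, 1), B = c(10, 20), C = c(100, 200), D = c(1000, 2000))

# Projections: select the columns, then drop duplicate rows.
r1 <- unique(r[c("A", "B", "C")])
r2 <- unique(r[c("A", "D")])

# Natural join on the only shared attribute, A.
s <- merge(r1, r2, by = "A")

# Hypothetical helper: one string per row, for set-style comparison.
key <- function(d) do.call(paste, c(d[c("A", "B", "C", "D")], sep = "|"))

all(key(r) %in% key(s))   # is every row of r also in s?
nrow(s) > nrow(r)         # does s contain extra rows?
```

Both checks come out TRUE for this data: the join brings back every original row plus spurious ones, which is what option (c) describes.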

matrix multiplication with factor of array matlab

I have a matrix A of size x by y
and a matrix B of size x by 1.
Each element of B is a factor that corresponds to one row of A.
I want to compute A * B, that is, multiply each row of A by its factor.
Example
A (4 * 3) = [ 2 4 6 ;
5 10 15 ;
7 11 13 ;
1 1 1];
B (4 * 1) = [ 4 ; 1/5 ; 3 ; 7];
I want A * B like [ 2*4 , 4*4 , 6*4
;5/5 , 10/5 , 15/5
;7*3 , 11*3 , 13*3
;1*7 , 1*7 , 1*7];
expected RESULT = [ 8 16 24 ; 1 2 3 ; 21 33 39 ; 7 7 7];
I tried scalar multiplication, but it didn't work because the arrays must have the same size. How do I solve this?
Use bsxfun to multiply each row of A by the corresponding single value in B:
bsxfun(@times, A, B)

R generate matrix from linear table with column and row numbers

I am new to R and struggle with arrays. My question is very simple, but I didn't find an easy answer on the web or in the R documentation.
I have a table with column and row numbers that I want to use to generate a new matrix.
Original table:
V1 V2 pval
1 1 2 5.914384e-13
2 1 3 8.143390e-01
3 1 4 7.587818e-01
4 1 5 9.734698e-12
5 1 6 7.812521e-19
I want to use:
V1 as the column number for the new matrix;
V2 as the row number
pvals as the value
Targeted matrix:
      1      2     3     4
1     0  5e-13  8e-1  7e-1
2 5e-13      0
3  8e-1            0
4  7e-1                  0
#some data
set.seed(42)
df <- data.frame(V1=rep(1:6,each=3),V2=rep(1:3,6),pval=runif(18,0,1))
df <- df[df$V1!=df$V2,]
# V1 V2 pval
#2 1 2 0.560332746
#3 1 3 0.904031387
#4 2 1 0.138710168
#6 2 3 0.946668233
#7 3 1 0.082437558
#8 3 2 0.514211784
# ...
#use dcast to change to wide format
library(reshape2)
df2 <- dcast(df, V2 ~ V1, value.var = "pval", fill = 0)
# V2 1 2 3 4 5 6
#1 1 0.0000000 0.1387102 0.08243756 0.9057381 0.7375956 0.685169729
#2 2 0.5603327 0.0000000 0.51421178 0.4469696 0.8110551 0.003948339
#3 3 0.9040314 0.9466682 0.00000000 0.8360043 0.3881083 0.832916080
#in case you really want a matrix object
m <- as.matrix(df2[,-1])
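If you would rather stay in base R, a matrix can also be filled directly by indexing it with a two-column matrix of (row, column) positions. The sketch below uses the five rows shown in the question and assumes, as the targeted matrix suggests, that the result should be symmetric with zeros on the diagonal; the name tab is made up for illustration.

```r
# The five rows shown in the question.
tab <- data.frame(V1 = c(1, 1, 1, 1, 1),
                  V2 = c(2, 3, 4, 5, 6),
                  pval = c(5.914384e-13, 8.143390e-01, 7.587818e-01,
                           9.734698e-12, 7.812521e-19))

# Square matrix of zeros, large enough for every index that occurs.
n <- max(tab$V1, tab$V2)
m <- matrix(0, nrow = n, ncol = n,
            dimnames = list(as.character(seq_len(n)), as.character(seq_len(n))))

# cbind(row, col) addresses one cell per table row:
# V2 is the row number, V1 the column number.
m[cbind(tab$V2, tab$V1)] <- tab$pval
# Mirror the values so the matrix is symmetric (an assumption, see above).
m[cbind(tab$V1, tab$V2)] <- tab$pval
m
```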
