Extracting array values based on values in different dimension - arrays

I've got a problem with subsetting values of an array.
raw.table <- array(data = c(1:12,13:24,rep(1:6, each=2)),
dim=c(3,4,3),
dimnames=list(LETTERS[1:3],1:4,c("target","ctrl","samples")))
The first two dimensions of my array represent some values that I want to do statistics on and the higher dimensions contain different attributes I want to use to access specific subsets. In this case I have only sample numbers, whereas there are always two values assigned to the same sample number (measurement replicates).
, , target
1 2 3 4
A 1 4 7 10
B 2 5 8 11
C 3 6 9 12
, , ctrl
1 2 3 4
A 13 16 19 22
B 14 17 20 23
C 15 18 21 24
, , samples
1 2 3 4
A 1 2 4 5
B 1 3 4 6
C 2 3 5 6
How do I access the values in dimension 1 (= target) that have the same sample number denoted in dimension 3 (= samples)? I tried out different approaches using unique(), duplicated() and match() but without coming to a result. I just cannot wrap my head about the indexing of arrays -.-
Cheers,
zuup

Form a logical index with a logical test (across dimensions):
> raw.table[,,1] == raw.table[,,3]
1 2 3 4
A TRUE FALSE FALSE FALSE
B FALSE FALSE FALSE FALSE
C FALSE FALSE FALSE FALSE
And use it to select items from the first dimension (and since they will be equal length there is no recycling):
> raw.table[, , 1 ][ raw.table[,,1] == raw.table[,,3] ]
[1] 1
Chaining calls to the Extract-operator is perfectly acceptable in R

Related

How to find minimum value of a column imported from Excel using MATLAB

I have a set of values in the following pattern.
A B C D
1 5 6 11
2 6 5 21
3 7 3 42
4 3 7 22
1 2 3 54
2 3 2 43
3 4 3 27
4 3 2 14
I exported the every column into MATLAB workspace as follows.
A = xlsread('F:\R.xlsx','Complete Data','A2:A43');
B = xlsread('F:\R.xlsx','Complete Data','B2:B43');
C = xlsread('F:\R.xlsx','Complete Data','C2:C43');
D = xlsread('F:\R.xlsx','Complete Data','D2:D43');
I need help with code where the it has to check the Column A, find the lowest D value and output the corresponding B and C values. I need the output to look like.
1 5 6 11
2 6 5 21
3 4 3 27
4 3 2 14
I read through related questions and understand that I need to make it a matrix and sort it based on the element on the 4th column using
sortrows
and get indices of the sorted elements. But I am stuck here. Please Guide me.
You can export those columns in one go as:
ABCD = xlsread('F:\R.xlsx','Complete Data','A2:D43');
Now use sortrows to sort the rows according to the first and the fourth column.
req = sortrows(ABCD, [1 4]);
☆ If all elements of the first column exist twice then:
req = req(1:2:end,:);
☆ If it is not necessary that all elements of the first column will exist twice then:
[~, ind] = unique(req(:,1));
req = req(ind,:);

xor after applying filters on an array

We have an original array and a list of filters where each filter consists of indices which are allowed through the filter. The filters are rather nice, e.g. they are grouped for each power of 2 in the following way (the filters are upto n = 20).
1 (2^0) = 1 3 5 7 9 11 13 15 17 19
2 (2^1) = 1 2 5 6 9 10 13 14 17 18
4 (2^2) = 1 2 3 4 9 10 11 12 17 18 19 20
8 (2^3) = 1 2 3 4 5 6 7 8 17 18 19 20
I hope you get the idea. Now we would apply some or all of these filters (user dictates which filters to apply) to the original array and the xor of the elements of the transformed array is the answer. To take an example if the original array would have been [3 7 8 1 2 9 6 4 11] e.g. n = 9 and we needed to apply the filters of 4, 2 and 1, the transformations would be like this.
After applying filter of 4 - [3 7 8 1 x x x x 11]
After applying filter of 2 - [3 7 x x x x x x 11]
After applying filter of 1 - [3 x x x x x x x 11]
Now the xor of 3 and 11 e.g. 8 is the answer. I can solve this O(n * no. of filters) time, but I need a better solution which might give the answer in O(no of filters) time. Is there any way to take advantage of the properties of xor and/or pre-compute the results for some and then give the answer for the filters. This is because there are many queries with filters, so I need to answer the queries in O(no of filters) time. Any kind of help will be appreciated.
It can be done in O(M) where M is the number of items that pass all filters (independent of the number of filters) by iterating over the array in a particular way, generating only the indexes that pass all the filters.
This is easier to see if you write down the examples starting at zero:
1: 0 2 4 6 8 10 12 14 16 18 (numbers that don't contain 1)
2: 0 1 4 5 8 9 12 13 16 17 (numbers that don't contain 2, etc)
4: 0 1 2 3 8 9 10 11 16 17 18 19
8: 0 1 2 3 4 5 6 7 16 17 18 19
The filters are really just a constraint on the bits of the indexes in the array. That constraint is of the form index & filters = 0, where filters is just the sum of all the individual filters (eg 1 + 2 + 4 = 7). Given a valid index i the next valid index i' can be computed with only primitive operations: i' = (i | filters) + 1 & ~filters. The idea here is to set the bits that are filtered to zero so the +1 will carry through them, then filtered bits are cleared again to make the index valid. The total effect is that the unfiltered bits are incremented and the filtered bits stay zero.
This gives a simple algorithm to iterate directly over all valid indexes. Start at 0 (which is always valid) and increment using the rule above until the end of the array is reached:
for (int i = 0; i < N; i = (i | filters) + 1 & ~filters)
// do something with array[i], like XOR them all together

Julia: Sort the columns of a matrix by the values in another vector (in place...)?

I am interested in sorting the columns of a matrix in terms of the values in 2 other vectors. As an example, suppose the matrix and vectors look like this:
M = [ 1 2 3 4 5 6 ;
7 8 9 10 11 12 ;
13 14 15 16 17 18 ]
v1 = [ 2 , 6 , 6 , 1 , 3 , 2 ]
v2 = [ 3 , 1 , 2 , 7 , 9 , 1 ]
I want to sort the columns of A in terms of their corresponding values in v1 and v2, with v1 taking precedence over v2. Additionally, I am interested in trying to sort the matrix in place as the matrices I am working with are very large. Currently, my crude solution looks like this:
MM = [ v1' ; v2' ; M ] ; ## concatenate the vectors with the matrix
MM[:,:] = sortcols(MM , by=x->(x[1],x[2]))
M[:,:] = MM[3:end,:]
which gives the desired result:
3x6 Array{Int64,2}:
4 6 1 5 2 3
10 12 7 11 8 9
16 18 13 17 14 15
Clearly my approach is not ideal is it requires computing and storing intermediate matrices. Is there a more efficient/elegant approach for sorting the columns of a matrix in terms of 2 other vectors? And can it be done in place to save memory?
Previously I have used sortperm for sorting an array in terms of the values stored in another vector. Is it possible to use sortperm with 2 vectors (and in-place)?
I would probably do it this way:
julia> cols = sort!([1:size(M,2);], by=i->(v1[i],v2[i]));
julia> M[:,cols]
3×6 Array{Int64,2}:
4 6 1 5 2 3
10 12 7 11 8 9
16 18 13 17 14 15
This should be pretty fast and uses only one temporary vector and one copy of the matrix. It's not fully in-place, but doing this operation completely in-place is not easy. You would need a sorting function that moves columns as it works, or alternatively a version of permute! that works on columns. You could start with the code for permute!! in combinatorics.jl and modify it to permute columns, reusing a single column-size temporary buffer.

How to logically index entire columns in MATLAB

Given a logical column vector (size n x 1) v and an array a (size m x n) how do I generate a new array consisting of all the columns in a where the numerical index of said column (1...n) is 1 at the corresponding location in v.
So for example if v was
1
0
0
1
and a was
1 4 7 10
2 5 8 11
3 6 9 12
the new array would be
1 10
2 11
3 12
because the first and fourth elements of v are 1 (true), so the new array should contain the first and fourth columns of a.
I have tried a bunch of things involving normal logical indexing and transpose but I can't get it to work. All help is appreciated
You want to use the logical indexing to select the columns and select all rows. In the example below, I have explicitly cast v as a logical just in case it's not a logical matrix already.
new = a(:, logical(v))
1 10
2 11
3 12

VTK Structured Point file

I am trying to parse a VTK file in C by extracting its point data and storing each point in a 3D array. However, the file I am working with has 9 shorts per point and I am having difficulty understanding what each number means.
I believe I understand most of the header information (please correct me if I have misunderstood):
ASCII: Type of file (ASCII or Binary)
DATASET: Type of dataset
DIMENSIONS: dims of voxels (x,y,z)
SPACING: Volume of each voxel (w,h,d)
ORIGIN: Unsure
POINT DATA: Total number of points/voxels (dimx.dimy.dimz)
I have looked at the documentation and I am still not getting an understanding on how to interpret the data. Could someone please help me understand or point me to some helpful resources
# vtk DataFile Version 3.0
vtk output
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 256 256 130
SPACING 1 1 1.3
ORIGIN 86.6449 -133.929 116.786
POINT_DATA 8519680
SCALARS scalars short
LOOKUP_TABLE default
0 0 0 0 0 0 0 0 0
0 0 7 2 4 5 3 3 4
4 5 5 1 7 7 1 1 2
1 6 4 3 3 1 0 4 2
2 3 2 4 2 2 0 2 6
...
thanks.
You are correct regarding the meaning of fields in the header.
ORIGIN corresponds to the coordinates of the 0-0-0 corner of the grid.
An example of a DATASET STRUCTURED_POINTS can be found in the documentation.
Starting from this, here is a small file with 6 shorts per point. Each line represents a point.
# vtk DataFile Version 2.0
Volume example
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 3 4 2
ASPECT_RATIO 1 1 1
ORIGIN 0 0 0
POINT_DATA 24
SCALARS volume_scalars char 6
LOOKUP_TABLE default
0 1 2 3 4 5
1 1 2 3 4 5
2 1 2 3 4 5
0 2 2 3 4 5
1 2 2 3 4 5
2 2 2 3 4 5
0 3 2 8 9 10
1 3 2 8 9 10
2 3 2 8 9 10
0 4 2 8 9 10
1 4 2 8 9 10
2 4 2 8 9 10
0 1 3 18 19 20
1 1 3 18 19 20
2 1 3 18 19 20
0 2 3 18 19 20
1 2 3 18 19 20
2 2 3 18 19 20
0 3 3 24 25 26
1 3 3 24 25 26
2 3 3 24 25 26
0 4 3 24 25 26
1 4 3 24 25 26
2 4 3 24 25 26
The 3 first fields may be displayed to understand the data layout : x change faster than y, which change faster than z in file.
If you wish to store the data in an array a[2][4][3][6], just read while doing a loop :
for(k=0;k<2;k++){ //z loop
for(j=0;j<4;j++){ //y loop : y change faster than z
for(i=0;i<3;i++){ //x loop : x change faster than y
for(l=0;l<6;l++){
fscanf(file,"%d",&a[k][j][i][l]);
}
}
}
}
To read the header, fscanf() may be used as well :
int sizex,sizey,sizez;
char headerpart[100];
fscanf(file,"%s",headerpart);
if(strcmp(headerpart,"DIMENSIONS")==0){
fscanf(file,"%d%d%d",&sizex,&sizey,&sizez);
}
Note than fscanf() need the pointer to the data (&sizex, not sizex). A string being a pointer to an array of char terminated by \0, "%s",headerpart works fine. It can be replaced by "%s",&headerpart[0]. The function strcmp() compares two strings, and return 0 if strings are identical.
As your grid seems large, smaller files can be obtained using the BINARY kind instead of ASCII, but watch for endianess as specified here.

Resources