Replace the values in a vector with the keys from a map in MATLAB? - arrays

I have a vector of nondecreasing data. Here is a sample:
1
1
1
2
2
2
2
2
2
2
2
3
3
4
4
6
Clearly there are duplicates and missing numbers. I can remove the duplicates using unique, so my unique values are:
uniqueVals = unique(sortedData);
So far, so good. Now, I want to change the data so that the values in sortedData are replaced with their index number in uniqueVals. For instance, uniqueVals first 5 elements would be 1,2,3,4,6, with indices 1,2,3,4,5. I want to change sortedData so that 1 maps to 1, 2 maps to 2, 3 to 3, 4 to 4, 6 to 5 and so on.
I know I can create a "map" object, but that seems to just be used to map uniqueVals to its index. How do I apply that mapping so that the entries in sortedData are changed?
I have no need for this to be a particularly fast operation. sortedData contains only a few hundred thousand rows and it only needs to be done once.

You can use the third output from unique
[uniqueVals,~,yourOutput] = unique(sortedData);
yourOutput =
1
1
1
2
2
2
2
2
2
2
2
3
3
4
4
5

You can also use g = findgroups(sortedData);, which will give you the group index, where there is one group per unique value. The 2nd output of this tells you the value itself
[g, gValue] = findgroups( sortedData );

Related

How to compare and add elements of an array

I'd like to take a single array lets say 3x5 as follows
1 3 5
1 3 5
2 8 6
4 5 7
4 5 8
and write a code that will create a new array that adds the third column together with the previous number if the numbers in the first and second columns equal the numbers in the row below it.
since the first two values in row 1 and 2, then add the third elements in row 1 and 2 together
so the output from the array above should look like this
1 3 10
2 8 6
4 5 15
The function accumarray(subs,val) accumulate elements of vector val using the subscripts subs. So we can use this function to sum the elements in the third column having the same value in the first and second column. We can use unique(...,'rows') to determine which pairs of value are unique.
%Example data
A = [1 3 5,
1 3 5,
2 3 6,
4 5 7,
4 5 7]
%Get the unique pair of value based on the first two column (uni) and the associated index.
[uni,~,sub] = unique(A(:,1:2),'rows');
%Create the result using accumarray.
R = [uni, accumarray(sub,A(:,3))]
If the orders matters the script would be a little bit more complex:
%Get the unique pair of value based on the first two column (uni) and the associated index.
[uni,~,sub] = unique(A(:,1:2),'rows');
%Locate the consecutive similar row with this small tricks
dsub = diff([0;sub])~=0;
%Create the adjusted index
subo = cumsum(dsub);
%Create the new array
R = [uni(sub(dsub),:), accumarray(subo,A(:,3))]
Or you can get an identical result with a for loop:
R = A(1,:)
for ii = 2:length(A)
if all(A(ii-1,1:2)==A(ii,1:2))
R(end,3) = R(end,3)+A(ii,3)
else
R(end+1,:) = A(ii,:)
end
end
Benchmark:
With an array A of size 100000x3 on the mathworks live editor:
The for loop take about 5.5s (no pre-allocation, so it's pretty slow)
The vectorized method take about 0.012s

Intersection with two columns in each matrix

I would like to find the intersection of two columns in two matrix (see example below). So to find the position where A and B intersect -- in this case in position 3 and 5.
My solution so far, was to combine the two columns to one column and use intersect function on one column afterwards with a string. Is there a more elegant solution?
A = [1,1;1,3;1,4;2,1;2,5;3,1]
A =
1 1
1 3
1 4
2 1
2 5
3 1
B = [2,5;1,4]
B =
2 5
1 4
You can avoid combining the columns. When using intersect you can use the rows option.
A = [1,1;1,3;1,4;2,1;2,5;3,1]
B = [2,5;1,4]
[C,ia,ib] = intersect(B,A,'rows');
>>ib
3
5
Additionally, if you do not want the intersection result to be ordered you can use the stable option.
[C,ia,ib] = intersect(B,A,'rows','stable');
>>ib
5
3

MATLAB - extract array values based on conditions

I have 4x4 matrix A
[1 2 3 4;
2 2 2 3;
5 5 5 5;
4 4 4 4]
I know how to locate all values less than 4. A<4. But I'm not sure how to write an 'if' statement for; three or more values, all which are less than 4, contained in the same row. For instance; see above A(1,:) and A(2,:) satisfies my conditions.
You can basically do A<4 to know which ones are smaller. If you want to know which rows contain N values smaller than 4 then you can do
rows=find(sum(A<4,2)>=3)
This basically does:
find smaller than 4
Count how many of them in each row (sum(_,2))
find if they are 3 or more
give the row index of those find()

Count based on column and row

I can't seem to find something quite like this problem...
I have an array table where each row contains a random assortment of numbers 1-N
On another sheet, I have a table with column and row headers numbered 1-N
I want to count how many rows in the array contain both the column and row headers for a given cell in the table. Since countifs only reference the current cell in the specified array, they don't seem to be working in this scenario.
Example array:
A B C D
1 3 5 7
1 2 3 4
2 3 4 5
2 4 6 8
...
Table results (symmetrical about the diagonal):
A B C D E F
. 1 2 3 4 5 ...
1 - 1 2 1 1
2 1 - 2 2 1
3 2 2 - 2 2
4 1 2 2 - 1
5 1 1 2 1 -
Would using nested countifs work?
I don't agree with your results corresponding to 4/2, which surely should be 3, not 2, but this formula, based on the array table being in Sheet1 A1:D4 and the results table being in Sheet2 A1:F6, placed in cell B2 of the latter, should work:
=IF($A2=B$1,"-",SUMPRODUCT(N(MMULT(N(COUNTIF(OFFSET(Sheet1!$A$1:$D$1,ROW(Sheet1!$A$1:$D$4)-MIN(ROW(Sheet1!$A$1:$D$4)),),CHOOSE({1,2},B$1,$A2))>0),{1;1})=2)))
Copy across and down as required.
Note: If your actual table is in fact much larger than that given, it will probably be worth adding a simple clause into the above to the effect that the results for approximately half of the cells are obtained from their symmetrical counterparts, rather than via calculation of this construction, thus saving resource.
Regards

Calling multiple values from data frame by row and column in R

I'm working in R and I'd like to call a selection of values from a data frame by their column and row indices. However doing this yields a matrix rather than an array. I shall demonstrate:
Given the data.frame:
a = data.frame( a = array(c(1,2,3,4,5,6,7,8,9), c(3,3)) )
(for those of you who don't want to plug it in, it looks like this)
a.1 a.2 a.3
1 1 4 7
2 2 5 8
3 3 6 9
And lets say I have two arrays pointing to the values I'd like to grab
grab_row = c(3,1,2)
grab_col = c(1,2,1)
Now I'd expect this to be the code I want...
a[ grab_row, grab_col ]
To get these results...
[1] 3 4 2
But that comes out as a 3x3 matrix, which makes enough sense in and of itself
a.1 a.2 a.1.1
3 3 6 3
1 1 4 1
2 2 5 2
Alright, I also see my answer is in the diagonal of the 3x3 matrix... but I'd really rather stick to an array as the output.
Any thoughts? Danka.
Passing the row and column indices in as a two-column matrix (here constructed using cbind()) will get you the elements you were expecting:
a[cbind(grab_row, grab_col)]
[1] 3 4 2
This form of indexing is documented in ?"[":
Matrices and array:
[...snip...]
A third form of indexing is via a numeric matrix with the one
column for each dimension: each row of the index matrix then
selects a single element of the array, and the result is a vector.
Try this:
> mapply(function(i,j)a[i,j], grab_row, grab_col)
[1] 3 4 2
Works for both dataframes and matrices.

Resources