join columns of arrays in matlab - arrays

I have the following inputs
dataset 1 with tens of thousands of rows and 5 array columns
dataset 2 with tens of thousands of rows and 3 array columns
I want to join/merge (add) the 3rd column of dataset 1 to a new 4th array column of dataset 2 for the elements for which the ID is the same (same value in column 1 of dataset 1 and column 1 of dataset 2). Mathematically you can write it like this, I think:
dataset2(i,4)=dataset1(find(dataset1(:,1)==c(i,1)),3);
but how to put it in MATLAB?
None of the methods mentioned in the MATLAB help function or elsewhere on the internet seem to work. I have already tried merge, join, ismember, vectors, but I can't solve the problem.
Does someone have any ideas? I know the problem can be solved with for loops, but I'm not allowed to use them, so I am searching for alternatives.

I believe this is what you want
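%We keep the index of all the matching rows
%NOTICE: I changed c(i,1) to dataset2(:,1)
%This row-by-row comparison assumes both datasets have the same number of
%rows and that matching IDs sit in the same row of each dataset
%matches_in_col_1 = find(dataset1(:,1)==dataset2(:,1));
%EDIT: HOW TO COMPARE MORE THAN 2 COLUMNS
%if you want to find matches in 4 datasets, combine the pairwise tests with &
%(a chained a==b==c==d compares the logical result of a==b with c, not the values)
matches_in_col_1 = find(dataset1(:,1)==dataset2(:,1) & dataset2(:,1)==dataset3(:,1) & dataset3(:,1)==dataset4(:,1));
%now copy the values from those rows into the corresponding row
%of dataset2
dataset2(matches_in_col_1,4) = dataset1(matches_in_col_1,3);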
I'm not 100% sure. Why is i present? Were you trying a loop implementation? My solution also assumes that c was supposed to be dataset2.
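If the two datasets do not have their IDs in the same row order, the join has to look each ID up rather than compare rows position by position. The sketch below illustrates that lookup idea in Python rather than MATLAB; the sample values and the names `third_col_by_id` and `joined` are made up for the illustration. In MATLAB, the two-output form `[tf, loc] = ismember(dataset2(:,1), dataset1(:,1))` is probably the closest vectorized equivalent, though that is a suggestion and not something tested against the asker's data.

```python
# Hypothetical sample data: each row is a list, and index 0 holds the ID.
dataset1 = [
    [101, 0.5, 7.0, 1, 1],
    [102, 0.1, 8.5, 2, 2],
    [104, 0.9, 9.9, 3, 3],
]
dataset2 = [
    [104, 11, 12],
    [101, 21, 22],
    [103, 31, 32],
]

# Build the lookup once: ID -> value in the 3rd column of dataset1.
third_col_by_id = {row[0]: row[2] for row in dataset1}

# Append a 4th column to every row of dataset2; IDs without a match get None.
joined = [row + [third_col_by_id.get(row[0])] for row in dataset2]

for row in joined:
    print(row)
```

Building the lookup first means each ID in dataset 2 is resolved without scanning dataset 1 again, which matters with tens of thousands of rows.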

Related

Performing operations on items within an array - VBA

Sorry for this question, but I haven’t found the answer in any of the texts or sites I’ve been researching. I am trying to do something that seems like it should be easy, but I don’t understand enough about arrays to pull it off. I am trying to create an array that is some number of rows; let’s say 10 rows, by 3 columns, or Myarr(1 to 10 , 1 to 3) – and then populate it as follows in memory before pasting it back into an excel sheet. Here’s an example using very simple constants and functions, not the ones I really need to run.
The reason is that I've found that running my particular construct as set of Excel formulas and VBA custom functions is very slow and results in a recalculation problem that I have written about in this forum that is not yet solved, so I am trying a work-around that performs all operations in an array, and then just pastes the result back to Excel.
Column 1 is just the list of numbers 1 to 10
Column 2 is the value of the previous row of Column 2 plus a constant; “Constant”; this is the part I really am puzzled by
Column 3 is just a function of the value of this row of Column 2
For example:
Constant = 2
Function of Column 2 value is simply Column 2 value x 4
So the output should be
Value col 1, previous value col 2 + Constant, column 2 x 4 as follows:
1,2,8
2,4,16
3,6,24
4,8,32
5,10,40
6,12,48
7,14,56
8,16,64
9,18,72
10,20,80
I just can't find any instructions about how to refer backwards to previous row values in an array and use them to produce a new value for that same column.
The simplest example would be a 1 dimensional array making a list of numbers where you started with a number and each successive row was the previous value + 1.
I realize this is probably basic stuff, but I must be searching on the wrong term to find an answer so I turn to you. Thank you very much for your help.
Did you try something like
Myarr(i,2)=Myarr(i-1,2)+const
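The recurrence suggested above can be sketched end to end. This is an illustration in Python, not VBA, and it assumes a starting value of 0 for column 2 so that row 1 comes out as 0 + 2 = 2, as in the expected output; `build_table` is a hypothetical helper name.

```python
ROWS = 10
CONSTANT = 2  # the constant used in the question's example


def build_table(rows, constant):
    """Fill a rows x 3 table entirely in memory, one row at a time."""
    table = []
    previous_col2 = 0  # assumed starting value, so that row 1 gives 0 + constant
    for i in range(1, rows + 1):
        col2 = previous_col2 + constant  # previous row's column 2 plus the constant
        col3 = col2 * 4                  # example function of this row's column 2
        table.append([i, col2, col3])
        previous_col2 = col2             # remember the value for the next row
    return table


table = build_table(ROWS, CONSTANT)
for row in table:
    print(row)
```

In VBA the same idea applies: fill row 1 explicitly, loop from row 2 while reading Myarr(i-1,2), then write the whole array to the sheet in a single assignment.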

Get Row and Column Totals from 2d Array in Excel

I wanted to know how to get row and column totals from a 2D array in Excel. This is a fairly common thing to do but I couldn't find an answer to it by searching on row and column totals so I thought it would be worth posting it as a question.
Supposing I wanted to find the lowest column total and highest row total in the following array which is in cells A1:D3:-
1 2 3 4
5 6 7 8
9 10 11 12
my initial thoughts were along the lines of
=min(A1:D3*(column(A1:D3)={1,2,3,4}))
but this kind of simple approach doesn't work. I remembered reading that you had to use mmult in some of these situations and have seen advanced formulae using them but couldn't quite remember how. I shall try and answer my own question but other suggestions are more than welcome.
You can do it with MMULT as you mentioned. The following should work with your setup:
Smallest column
=MIN(MMULT({1,1,1},A1:D3))
Largest row:
=MAX(MMULT(A1:D3,{1;1;1;1}))
Note how many 1s are in each array: for the row totals you need a 1 for each column (i.e. 4), and for the column totals a 1 for each row (i.e. 3). Also note the order of the arrays; it won't work the other way around.
Yes, you have to use MMULT to deliver either a column array or a row array containing the required totals; then you can use MAX, MIN or any other aggregate function to get the value you require.
Column totals
=MIN(MMULT(TRANSPOSE(ROW(A1:D3))^0,A1:D3))
Row Totals
=MAX(MMULT(A1:D3,TRANSPOSE(COLUMN(A1:D3))^0))
So the idea is that you create a single-row array {1,1,1} and multiply it by the 2D array to end up with an array {15,18,21,24} and take the minimum value from it.
Or create a single-column array {1;1;1;1} and multiply the original array by it to end up with an array {10;26;42} from which you get the maximum value.
Remember that mmult works like the matrix multiplication you might have learned at college where for each cell it works across the cells in the corresponding row of the first array and down the cells of the corresponding column in the second array multiplying each pair and adding them to the total. So the number of columns in the first array must always equal the number of rows in the second array.
These are, as @Scott Craner reminds me, array formulae that have to be entered with Ctrl+Shift+Enter.
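To see why multiplying by an array of 1s yields totals, here is a small sketch of the same matrix product in Python. The `mmult` function is a hand-written stand-in for Excel's MMULT, written only for this illustration.

```python
def mmult(a, b):
    """Plain matrix product; columns of a must equal rows of b."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first array must equal rows of the second")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

ones_row = [[1, 1, 1]]           # 1 x 3: one 1 per data row
ones_col = [[1], [1], [1], [1]]  # 4 x 1: one 1 per data column

column_totals = mmult(ones_row, data)[0]                  # 1 x 4 result
row_totals = [r[0] for r in mmult(data, ones_col)]        # 3 x 1 result

print(column_totals, min(column_totals))
print(row_totals, max(row_totals))
```

Swapping the operands makes the inner dimensions disagree (4 columns against 1 row, for instance), which is why the order of the arrays matters.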

speed up array formula

I have the following formula which I will explain below:
{=SUM(IF(($G$1:$L$1=$O$1)*($G$2:$L$2=$O$2)*($G$3:$L$3=$O$3)*($G$4:$L$4=$O$4)*($G$5:$L$5=$O$5)*($G$6:$L$6=$O$6)*($G$7:$L$7=$O$7);G21:L21))}
Here is what the worksheet looks like:
Under columns G - L we have a 'database' of all data. These columns will be added cumulatively each quarter (approx 30 columns a quarter). So after a few years we have ended up with a bunch of database columns (1000 + columns of raw data). For the sake of this demo, I have only included those 6 columns.
As you can see, each column contains specific parameters, between rows 1 - 7, which allows to identify specific CountryCode + Project Code + Category + Fiscal Year, + ... (etc.). This allows us to track down a unique specific project and retrieve its data.
What we have afterwards on the column O is a specific project we are trying to retrieve values for (you can see that the rows 1 - 7 are the same as under column G (we are trying to retrieve values for this particular project).
Here comes our formula. I have attached above. Here is what it looks like when I press F2. As you can see the IF statement is first simply checking whether the particular columns match the pre-defined criteria under column O and it sums all the columns that match all the criteria between rows 1-7.
Now here is the problem. We have a worksheet, which contains 20 projects (such as column O) and we are using this array formula there to retrieve values. The problem is that retrieving data using this way takes A LOT OF TIME. We have also adopted a principle via VBA that we iterate through all the cells, then we insert a formula, calculate array cell, and then we copy & Paste resulting value inside (so that we won't end up with full sheet of array formulas). However it still takes LONG to calculate (1 minute or so).
I was wondering whether there is a better way to retrieve the data in the format described above (that is, given specific criteria we are trying to find). Maybe SUMIFS would be better? Or SUMPRODUCT? Or even a completely different solution?
I am open to any proposal that would speed up the process.
I met a similar problem about 2 weeks ago. At first I used a helper column/row. The helper row concatenates the 7 strings in each column, so that a single comparison only has to check whether the joined text matches. Assuming the helper row is row 8 in your sample, the formula in cell G8 would be
=CONCATENATE(G1,"|",G2,"|",G3,"|",G4,"|",G5,"|",G6,"|",G7)
and do the same for the rest including column O
=CONCATENATE(O1,"|",O2,"|",O3,"|",O4,"|",O5,"|",O6,"|",O7)
Then do a HLOOKUP
=HLOOKUP(O8,G8:L21,14,0)
In my case, the calculation time dropped from 10 minutes to a few seconds!
Alternatively I also found a way to do without helper column, using array again, but the idea is pretty much the same,
the formula in O21 as per your sample would be
=SUM(IF(CONCATENATE(G1:L1,G2:L2,G3:L3,G4:L4,G5:L5,G6:L6,G7:L7)=CONCATENATE(O1,O2,O3,O4,O5,O6,O7),G21:L21))
(I didn't add the "|" delimiter in this formula, but it is better to do so.)
But in the end I prefer the helper column method.
For your reference
HTH
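The helper-key idea can be sketched outside Excel as follows. This is a Python illustration with invented sample data (only 4 criteria per column instead of 7); `make_key` and `value_by_key` are hypothetical names. Like HLOOKUP, it returns a single match, so it assumes each combination of criteria is unique.

```python
DELIMITER = "|"


def make_key(criteria):
    """Join the criteria of one column into a single comparable string."""
    return DELIMITER.join(criteria)


# Hypothetical database: (criteria of a column, value stored in that column).
database = [
    (["US", "P1", "Capex", "2015"], 100),
    (["US", "P2", "Capex", "2015"], 200),
    (["DE", "P1", "Opex", "2016"], 300),
]

# Build every helper key once, then each lookup is a single string comparison.
value_by_key = {make_key(criteria): value for criteria, value in database}

wanted = ["US", "P2", "Capex", "2015"]
result = value_by_key.get(make_key(wanted))
print(result)

# Why the delimiter matters: without it, different criteria can collide.
collides_without_delimiter = "".join(["AB", "C"]) == "".join(["A", "BC"])
collides_with_delimiter = make_key(["AB", "C"]) == make_key(["A", "BC"])
print(collides_without_delimiter, collides_with_delimiter)
```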
To improve performance, avoid repeating the same calculations multiple times.
"This allows us to track down a unique specific project and retrieve its data."
If the combination of the 7 values is unique, calculate the position of the chosen project only once in a helper cell (for example O15) with an array formula (confirmed with Ctrl+Shift+Enter):
=MATCH(1;(G1:L1=O1)*(G2:L2=O2)*(G3:L3=O3)*(G4:L4=O4)*(G5:L5=O5)*(G6:L6=O6)*(G7:L7=O7);0)
Use the following formula in O21 and drag down:
=INDEX(G21:L21;1;$O$15)
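The saving comes from doing the expensive multi-criteria match once and reusing the resulting position for every data row. A minimal Python sketch of that split, using invented data with 3 criteria rows instead of 7 and a hypothetical `match_position` helper:

```python
# Hypothetical layout: criteria_rows[r][c] is criterion r of database column c.
criteria_rows = [
    ["US", "US", "DE", "DE"],
    ["P1", "P2", "P1", "P2"],
    ["2015", "2015", "2015", "2016"],
]
# data_rows[r][c] is the value in data row r of database column c.
data_rows = [
    [10, 20, 30, 40],
    [11, 21, 31, 41],
    [12, 22, 32, 42],
]
wanted = ["DE", "P1", "2015"]


def match_position(criteria_rows, wanted):
    """Return the index of the first column whose criteria all equal `wanted`."""
    for col, column_criteria in enumerate(zip(*criteria_rows)):
        if list(column_criteria) == wanted:
            return col
    return None


# The expensive step runs once (the role of the MATCH helper cell) ...
position = match_position(criteria_rows, wanted)

# ... and every data row is then a cheap positional read (the role of INDEX).
values = [row[position] for row in data_rows] if position is not None else []
print(position, values)
```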

Find a MAX and SUM a variable number of columns in Excel without Macros

I have a table in Excel:
     A     B     C     D
           Day1  Day2  Day3
1    Ron   3     2     2
2    Don   4     2     1
3    Ton   1     5     2
On a different worksheet, I need to produce a table of the type (without using macros):
     A     B     C     D
           Ron   Don   Ton
1    Ron   -     -     -
2    Don   8     -     -
3    Ton   10    11    -
Where the value is the MAX of the pair's values for each day, summed across all days. So if it were just for 3 names and 3 days, I would just use the formula below with a VLOOKUP to see what the value for each name is on a particular day and copy/paste it for each day. (Side note: my project is actually not that big, so by the time I posted this question I could have been done with it, but I really want to learn how to do this in a more intelligent way.)
=SUM(MAX( VLOOKUP(Table2!$A2,Table1!$A:$D,2,FALSE), VLOOKUP(Table2!B$1,Table1!$A:$D,2,FALSE)), and so on Day2, Day3...
I tried the following:
{=SUM(MAX(VLOOKUP(Table2!$A2,Table1!$A:$D,{2,3,4},FALSE),VLOOKUP(Table2!D$1,Table1!$A:$D,{2,3,4},FALSE)))}
However, apparently the VLOOKUP can't return an array (and INDEX MATCH can't either).
Any help would be much appreciated.
L42 is right - VLOOKUP has trouble returning an array when the lookup value is a range or array but with an array in place of column index number it returns an array no problem
To see that just put a single VLOOKUP in a cell, e.g.
=VLOOKUP($A2,Table1!$A:$D,{2,3,4},0)
then select the cell, press F2 to select the formula and F9 to see the result and you get a result like ={3,2,2}
The problem with your formulas is that MAX just takes the largest value of the six values returned by the two VLOOKUPs, it doesn't compare the two arrays
To do that in a single formula you can use a formula of the following form:
=SUM(IF(array1>array2,array1,array2))
that will compare the 2 arrays and sum the largest value in each position, for your setup that would be a formula like this in Table2 B2
=IF($A2=B$1,"-",SUM(IF(VLOOKUP($A2,Table1!$A:$D,{2,3,4},0)>VLOOKUP(B$1,Table1!$A:$D,{2,3,4},0),VLOOKUP($A2,Table1!$A:$D,{2,3,4},0),VLOOKUP(B$1,Table1!$A:$D,{2,3,4},0))))
confirm with CTRL+SHIFT+ENTER and copy across and down
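The difference between taking one MAX over both arrays and comparing them position by position can be checked with a short Python sketch that uses the numbers from the question; `pair_total` is a hypothetical helper name.

```python
# Day1..Day3 values per name, copied from the table in the question.
scores = {
    "Ron": [3, 2, 2],
    "Don": [4, 2, 1],
    "Ton": [1, 5, 2],
}


def pair_total(name_a, name_b):
    """Sum, over all days, the larger of the two names' values for that day."""
    return sum(max(a, b) for a, b in zip(scores[name_a], scores[name_b]))


# What a single MAX over both lookups does: one value out of all six.
single_max = max(scores["Don"] + scores["Ron"])

print(single_max)
print(pair_total("Don", "Ron"), pair_total("Ton", "Ron"), pair_total("Ton", "Don"))
```

Because the comparison is done per position with `zip`, the number of day columns can vary without changing the function.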
If you have a larger number of columns then to avoid using {2,3,4,5,6,7,8,9,10......etc.} you can use INDEX/MATCH instead of VLOOKUP like this
=INDEX(Table1!$B:$D,MATCH($A2,Table1!$A:$A,0),0)
Note: INDEX here is returning a range (not an array). The MATCH function determines the row number and a zero as the column argument means you get the whole row returned

turn a matrix into a sorted list google docs/spreadsheets

I've created a large-ish matrix by doing a =pearson( analysis on survey responses in google docs/spreadsheets and would like to convert it into a sorted list.
The matrix has labels (the survey questions) in row 2 and column b. Each intersecting cell has the value. Here's what the formula looks like.
=pearson(FILTER( Pc!$C$2:$AW$999 ; Pc!$C$2:$AW$2= C$2 ),FILTER(Pc!$C$2:$AW$999 ;Pc!$C$2:$AW$2=$B3))
This is what I'd like to get to:
a b c
Question one question 2 correlation
Then sorting by column c is easy.
How can I get all the points out of the matrix/array, along with the labels in this way?
Ideally I'd be able to do this only to points below the diagonal as there of course are dupes above..
Thanks!
I think I found a solution to placing the combination of the headers in a single column.
It involves a series of auxiliary columns, but works.
Let's say we have a single column with all unique headers on column A. I'll assume it's 6 values. So, on cell B1 we paste:
=ArrayFormula(join(";";A1&","&A2:A$6))
And then copy it down to B5. On C1 we join it all and split making a single column:
=transpose(split(join(";";B1:B5);";"))
If needed, we can split the combination in two columns again on D1
=ArrayFormula(split(C1:C15;","))
I don't know why, but the value on E1 does not work correctly, so I just pasted =A2
With these columns you can easily do your nice Pearson-Filter trick again to have it all in a single column. Hope this helps :)
Maybe something like this will help:
=ArrayFormula(transpose(split(CONCATENATE(transpose(C2:AW999)&char(9)), char(9))))
(C2:AW999 is your data range)
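For comparison, the reshaping asked for in the question (label pair plus value, below the diagonal only, sorted by value) can be sketched in Python. The labels and correlation values here are invented for the illustration and are not taken from the survey.

```python
# Hypothetical labels and a symmetric correlation matrix.
labels = ["Q1", "Q2", "Q3", "Q4"]
corr = [
    [1.0, 0.2, -0.5, 0.1],
    [0.2, 1.0, 0.7, 0.3],
    [-0.5, 0.7, 1.0, -0.1],
    [0.1, 0.3, -0.1, 1.0],
]

# Keep only cells below the diagonal (j < i), so each pair appears once
# and the trivial self-correlations are skipped.
pairs = [
    (labels[i], labels[j], corr[i][j])
    for i in range(len(labels))
    for j in range(i)
]

# Sort by the correlation column, highest first.
pairs.sort(key=lambda p: p[2], reverse=True)

for question_a, question_b, value in pairs:
    print(question_a, question_b, value)
```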
