Finding Most Common Word In A Tally/Ledger System - arrays

I currently use the following array formula to find the most common word or number in a range, ignoring any blank cells:
{=(INDEX(D1:D10,MODE(IF((D1:D10<>"")*ISNA(MATCH(D1:D10,$A$1:$A1,0)),MATCH(D1:D10,D1:D10,0)))))}
I am now looking to do something slightly different. I still want to find the most common word or number in a range; however, I now have two lists: the first is a list of 'positive' words/numbers and the second is a list of 'negative' words/numbers.
To illustrate with an example: the colour green appears 4 times in the 'positive' list and the colour blue appears twice, but green appears 3 times in the 'negative' list and blue does not appear there at all. Using the above formula on the first list would return green as the most common word. However, I now want it to take into account that green is not the most common word once the lists are combined (i.e. 4 positives - 3 negatives = 1 green, and 2 positives - 0 negatives = 2 blue).
In the image below, the formula under each list shows green to be the most common word. I would like to combine these lists and cancel out any instances where the colour appears in both lists, so 3 of the greens on the positive list would be cancelled out by the 3 greens on the negative list, leaving only one.
In essence, I suppose I am trying to create a tally or ledger of some kind where, rather than numbers that add or subtract, I have words whose frequencies are added or subtracted.
Thanks for the help, and apologies if I haven't been very clear about the task!

This should work:
=IF(SUMPRODUCT((MMULT(COUNTIF(OFFSET(B2:B11,,{0,1}),B2:B11),{1;-1})=MAX(MMULT(COUNTIF(OFFSET(B2:B11,,{0,1}),B2:B11),{1;-1})))/COUNTIF(B2:B11,B2:B11&""))>1,"No Favourite",INDEX(B2:B11,MATCH(MAX(MMULT(COUNTIF(OFFSET(B2:B11,,{0,1}),B2:B11),{1;-1})),MMULT(COUNTIF(OFFSET(B2:B11,,{0,1}),B2:B11),{1;-1}),0)))
And for non-contiguous, dynamically-defined ranges, assumed to be stored as Defined Names Positive and Negative, array formula**:
=IF(SUM((COUNTIF(Positive,Positive)-COUNTIF(Negative,Positive)=MAX(COUNTIF(Positive,Positive)-COUNTIF(Negative,Positive)))/COUNTIF(Positive,Positive&""))>1,"No Favourite",INDEX(Positive,MATCH(MAX(COUNTIF(Positive,Positive)-COUNTIF(Negative,Positive)),COUNTIF(Positive,Positive)-COUNTIF(Negative,Positive),0)))
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
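For readers who want to check the logic outside Excel, here is a rough Python sketch of what the Defined Names formula computes: for each value in the positive list, COUNTIF(Positive,Positive)-COUNTIF(Negative,Positive), then the value with the highest net tally, or "No Favourite" when more than one distinct value shares the maximum. The function name `net_favourite` is hypothetical, blanks are simply skipped (an assumption), and Python's string comparison is case-sensitive whereas COUNTIF is not.

```python
from collections import Counter


def net_favourite(positive, negative):
    """Net tally per distinct positive value; hypothetical helper, not the answer's formula."""
    # Assumption: blank cells ("" or None) are ignored on both lists
    pos = Counter(v for v in positive if v not in ("", None))
    neg = Counter(v for v in negative if v not in ("", None))
    # COUNTIF(Positive,Positive)-COUNTIF(Negative,Positive): only positive values are candidates
    net = {value: pos[value] - neg[value] for value in pos}
    if not net:
        return None
    best = max(net.values())
    winners = [value for value, score in net.items() if score == best]
    # The SUM(.../COUNTIF(...))>1 test counts distinct values sitting at the maximum
    if len(winners) > 1:
        return "No Favourite"
    return winners[0]
```

With the question's numbers (green 4 - 3 = 1, blue 2 - 0 = 2) this returns "blue".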

First, list your candidates in column D, starting at D2.
Then in E2 enter:
=COUNTIF(B$2:B$12,D2)-COUNTIF(C$2:C$12,D2)
and copy down.
Finally in F2 enter:
=INDEX(D:D,MATCH(MAX(E:E),E:E,0))
With your data:
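The helper-column approach can be sketched in Python as follows, purely as an illustration: column E becomes a list of net scores (one per candidate) and column F picks the first candidate holding the maximum, as MATCH(...,0) does. The name `favourite_from_candidates` is hypothetical, and `list.count` is case-sensitive, unlike COUNTIF.

```python
def favourite_from_candidates(candidates, positive, negative):
    """Hypothetical helper mirroring columns D, E and F of the answer above."""
    # Column E: =COUNTIF(B$2:B$12,D2)-COUNTIF(C$2:C$12,D2) for each candidate in column D
    scores = [positive.count(c) - negative.count(c) for c in candidates]
    # Column F: =INDEX(D:D,MATCH(MAX(E:E),E:E,0)) picks the first candidate at the maximum
    best = max(scores)
    return candidates[scores.index(best)]
```

Because the candidates are listed separately, a word that only appears in the negative list still gets a (negative) score, which the previous formula does not consider.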

Related

Count repeating occurrences in existing table

I have some 15 items, and I wish to count how many times each one repeats over a series of time periods. An item's presence in a time period is marked by a 1. As per the attached picture, the blue area is the data and the yellow area is the result that I wish to obtain.
E.g. in occurrence 5, this repeats current record - 1 three times, current record - 2 two times, etc. I also do not want to count the current week's item twice, as with Observation item 14. The results in yellow have been calculated manually, and the only alternative I can see is to write individual IF statements based on my knowledge...
EXAMPLE
Is there a formula to do this automatically? I have tried to work with arrays and matching items to check for repeats but cannot work it out.
Update for clarification: in the current Occurrence 5, Obs 1, 6 and 15 came up in Occurrence 4 (Repeats Current -1, or Occurrence 4, three times), and Obs 5 and 8 came up in Occurrence 3 (Repeats Current -2, or Occurrence 3, two times). (Occurrences are just points in time. Observations might be people having been observed doing something; the number of times is irrelevant, it is just a true or false outcome.)
Looking at Occurrence 4, Obs 6 and 9 repeat Current -3 (Occurrence 1), i.e. two times (my bad, I only specified 1 time, which might have caused confusion).
I would like the formula to stop searching for prior occurrences once the most recent one has been found, e.g. Obs 14 in Occurrence 3 should stop looking once it finds a repeat in Occurrence 2.
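The rule described in the clarification (for each observation in the current occurrence, look back to the most recent earlier occurrence that also contains it, count the gap, and stop there) can be sketched in Python. This is an illustration of the stated requirement, not of either answer's formula. It assumes a hypothetical layout in which each occurrence is a set of observation numbers, ordered oldest first; `repeat_lags` is a made-up name.

```python
from collections import Counter


def repeat_lags(occurrences, current):
    """Tally how far back each observation in occurrences[current] was last seen.

    occurrences: list of sets of observation numbers, oldest first (assumed layout).
    Returns a Counter mapping lag (1 = previous occurrence) to a number of observations.
    """
    tally = Counter()
    for obs in occurrences[current]:
        # Walk backwards and stop at the first (most recent) earlier occurrence containing obs
        for back in range(current - 1, -1, -1):
            if obs in occurrences[back]:
                tally[current - back] += 1
                break
    return tally
```

With made-up data shaped like the clarification, where Occurrence 5 shares Obs 1, 6 and 15 with Occurrence 4 and Obs 5 and 8 with Occurrence 3, the lag-1 count is 3 and the lag-2 count is 2. An observation such as Obs 14 in Occurrence 3 is only counted once, at the nearest earlier occurrence.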
To expand on Edward's answer, I used the following as an intermediate formula:
=IF(AND(ISNUMBER(C7);C7=C8;C8=C9;C9=C10);"-3";IF(AND(ISNUMBER(C7);C7=C8;C8=C9);"-2";IF(AND(ISNUMBER(C7);C7=C8);"-1";"")))
Then use COUNTIF(R3:AF3;"-3") to count the number of -3 results.
By starting the If constraints at -3 and working your way back, the formula stops automatically when the last match is found.
You can use intermediate calculated cells, which you can hide later if you wish.
Insert 15 columns (1 for each observation) between observations and repeats.
In the first inserted cell create a formula for an individual repeated observation:
=AND(ISNUMBER(B3),B3=B4)
i.e. To be counted as a repeat, the observation must be a number AND the same as the cell immediately below it.
Duplicate this formula horizontally so it calculates across for all 15 observations.
Then create a formula to the right to count the matching observations
=COUNTIF(R3:AF3,TRUE)
Now duplicate the formulas down for all the occurrences.
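The helper-column idea above can be sketched in Python to show what the grid of TRUE/FALSE cells holds. The sketch assumes each spreadsheet row is a list of cells holding 1 or None (blank), narrowed to three columns instead of 15 for brevity. `repeat_flags` and `repeat_counts` are hypothetical names. The last row has no row below, so it is left out here, whereas in Excel it would compare against blanks and simply count 0.

```python
def repeat_flags(grid):
    """Helper cells: =AND(ISNUMBER(B3),B3=B4), i.e. a number equal to the cell below."""
    flags = []
    for row, below in zip(grid, grid[1:]):
        flags.append([isinstance(a, (int, float)) and a == b for a, b in zip(row, below)])
    return flags


def repeat_counts(grid):
    # Equivalent of =COUNTIF(R3:AF3,TRUE) on each helper row
    return [sum(row) for row in repeat_flags(grid)]
```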
For current-2 you can reference the cell directly below and to the left, and duplicate this across and down.
For clarity, I have left out your last requirement, to stop searching once a match is found. To do this, you will need to expand the intermediate formula to take it into account.

Cell array matches common values excludes blank

I have a list of names on one sheet and need to find the most commonly repeated name in that list. The list spans the entire month. Example:
10/1 James
10/2 Bill
10/3 Fred
10/4 Hank
etc...
On another sheet I have this in-cell array formula that finds the most common name, BUT if the list has blanks, it returns an error. Only when the list is full does it give an answer.
{=INDEX('Sept 18'!B$2:B$151,MODE(MATCH('Sept 18'!B$2:B$151,'Sept 18'!B$2:B$151,0)+{0,0})),"")}
Is there a way to make it always show a name and exclude the blanks as it goes?
If there is data in B2 through B21 that may include blanks, then use the array formula:
=INDEX(B2:B21,MODE(IF(B2:B21<>"",MATCH(B2:B21,B2:B21,0))))
Array formulas must be entered with Ctrl + Shift + Enter rather than just the Enter key. If this is done correctly, the formula will appear with curly braces around it in the Formula Bar.
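To see why the IF(...<>"",...) wrapper helps, here is a Python sketch of the same idea: MATCH(B2:B21,B2:B21,0) turns every name into the position of its first occurrence, blanks are dropped, MODE picks the most frequent position, and INDEX turns it back into a name. `most_common_ignoring_blanks` is a hypothetical name; ties are assumed to go to the value met first, and returning None stands in for the #N/A that MODE gives when nothing repeats. Python compares names case-sensitively, unlike MATCH.

```python
from collections import Counter


def most_common_ignoring_blanks(values):
    """Hypothetical mirror of =INDEX(B2:B21,MODE(IF(B2:B21<>"",MATCH(B2:B21,B2:B21,0))))."""
    first_pos = {}
    positions = []
    for i, v in enumerate(values):
        if v in ("", None):
            continue  # IF(B2:B21<>"",...) leaves blanks out of MODE
        first_pos.setdefault(v, i)  # MATCH(...,0): index of the first occurrence
        positions.append(first_pos[v])
    counts = Counter(positions)
    if not counts or max(counts.values()) < 2:
        return None  # MODE returns #N/A when no value repeats
    best = max(counts.values())
    # Assumption: among equal modes, the one met first in array order wins
    for p in positions:
        if counts[p] == best:
            return values[p]
```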

Excel Array: Reducing the Number of Calculation Steps

As per my on-going journey through the world of Excel arrays, I was wondering if someone might be able to give me a pointer or two.
On the attached Excel sheet, I currently have a four-step process to get from a segregated lookup to a gapless list:
Step 1 (yellow): For the 50-word long list in sheet 'Data', a 50-cell lookup is performed to see whether the input in row 1 (red) appears somewhere in the corresponding cell. In this case, the lookup is performed three times for three different inputs, i.e. in columns C-E.
Step 2 (orange): An array then relists the contents of the 50-cell lookup above it but removes all empty cells (i.e. where there is no match to the input in row 1)
Step 3 (green): The results from step 2 are listed out in a single column.
Step 4 (blue): The results from step 3 are listed out using the same technique as in step 2 in order to remove the blank cells.
Collectively, this enables a gapless listing of all data objects which contain the given inputs somewhere in their string.
However, my real list of data objects is 5,000 entries long and I would like to look up the results for 100 or more inputs. As step 1 requires each combination to be looked up separately, this requires 500,000 calculations for step 1 alone, which takes a heavy toll on the processor.
Therefore, I was wondering if anyone had an idea as to how I could shortcut this process to reduce the number of cells / calculations involved. I assume that step 1 and 2 could somehow be merged, but my knowledge of arrays is not sufficient to think of how this could be done.
It would be brilliant to hear from somebody who may have some advice on the matter!
Kind regards,
Rob
File Link: https://drive.google.com/open?id=10O91QDD78RkbWtQx2iWfax17Dt5TPw1G
Since you're not removing duplicated entries from the final list, this is quite straightforward.
Based on the workbook you provided, to be entered within the Lookup sheet:
In cell A1:
=SUMPRODUCT(0+ISNUMBER(FIND(C1:E1,Data!A1:A50)))
In any cell of your choice, to begin the list of returns, array formula**:
=IF(ROWS($1:1)>A$1,"",INDIRECT("'Data'!"&TEXT(SMALL(IF(ISNUMBER(FIND(C$1:E$1,Data!A$1:A$50)),10^5*ROW(Data!A$1:A$50)+COLUMN(Data!A$1:A$50)),ROWS($1:1)),"R0C00000"),0))
and copied down until you start to get blanks for the results.
Notes:
Instructions for entering an array formula are at the foot of this post.
The sheet name ('Data' within the second formula) should be amended as required.
It is important that the range containing the values being searched for (C1:E1 here) and the range containing the entries to be searched within (A1:A50) be orthogonal, i.e. one is a horizontal range and the other a vertical range.
If you are not using an English-language version of Excel then the part "R0C00000" within the second formula may need amending.
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
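As a cross-check on what the two formulas return, the following Python sketch produces the same gapless list: every entry that contains any of the inputs, in row order, with an entry listed once per input it matches (duplicates are not removed, as the answer notes). The length of the result corresponds to the SUMPRODUCT count in A1. `gapless_matches` is a hypothetical name. FIND is case-sensitive, and so is Python's `in`.

```python
def gapless_matches(entries, inputs):
    """List every entry containing any input, in row order, duplicates kept.

    entries: the vertical range (Data!A1:A50); inputs: the horizontal range (C1:E1).
    """
    results = []
    for entry in entries:  # SMALL over 10^5*ROW(...) walks the rows in order
        for needle in inputs:  # ISNUMBER(FIND(needle, entry)) for each input column
            if needle in entry:
                results.append(entry)
    return results
```

This loop still visits every entry/input pair once, but the work lives in one pass rather than in 500,000 worksheet cells.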

make link to column excluding certain value(s) in excel

OK. I have a column of values which are numbers in ascending order. At certain points the sequence is broken and 0s replace the values, e.g. 1, 2, 3, 0, 0, 6, 0, 8, ... in consecutive cells of a column. Now I want another column to be linked to this one, but instead of the zeros, the next non-zero number in the sequence should be shown, i.e. a link to the array which excludes a certain value and skips its place. I want it to update in real time, using either formulas or macros. Thank you in advance.
OK, I got it now (hopefully)... what you are looking for is doable with a simple SMALL function like:
D2: =IFERROR(SMALL(A:A,COUNTIF(A:A,0)+ROW()-1),"")
E2: =IFERROR(INDEX(B:B,MATCH(D2,A:A,0)),"")
The formulas can then simply be copied down. The -1 is the offset for not starting in row 1 (starting at row 25 would need -24). The COUNTIF simply skips the 0s.
It should be pretty much self-explanatory, but if you still have any questions, just ask :)
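In Python terms, copying the SMALL formula down amounts to sorting the numbers and skipping as many leading values as there are zeros, which works because the zeros are the smallest values. The sketch below assumes there are no negative numbers, treats anything that is not a number as ignored (as SMALL does), and uses the hypothetical name `skip_zeros_small`. It covers column D only, not the INDEX/MATCH lookup in column E.

```python
def skip_zeros_small(column):
    """Column D, copied down: =IFERROR(SMALL(A:A,COUNTIF(A:A,0)+ROW()-1),"")."""
    # SMALL ignores text and blanks, so keep numbers only
    numbers = sorted(v for v in column if isinstance(v, (int, float)))
    # COUNTIF(A:A,0): assumes no negatives, so all zeros sit at the front after sorting
    zeros = sum(1 for v in numbers if v == 0)
    # k-th smallest after the zeros; running off the end is where IFERROR shows ""
    return numbers[zeros:]
```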
Starting with a blank or zero in C1, and assuming the list of numbers starts in A2 and numbers aren't repeated, you could look for the next number which is greater than the number above starting in C2:-
=IFERROR(INDEX(A$2:A$10,MATCH(TRUE,INDEX(A$2:A$10>C1,0),0)),"")
Is this what you meant?
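The C-column formula can be read as "the first value in the list that is greater than the result above it". A Python sketch of that loop, assuming a purely numeric column and a starting value of 0 for the blank C1 (the function name `next_greater_chain` is hypothetical):

```python
def next_greater_chain(column, start=0):
    """C2 and down: =IFERROR(INDEX(A$2:A$10,MATCH(TRUE,INDEX(A$2:A$10>C1,0),0)),"")."""
    result = []
    previous = start  # C1 is blank or zero
    while True:
        # MATCH(TRUE, range>previous, 0): first value in list order that exceeds previous
        nxt = next((v for v in column if isinstance(v, (int, float)) and v > previous), None)
        if nxt is None:
            break  # IFERROR(...,"") shows blanks once nothing larger remains
        result.append(nxt)
        previous = nxt
    return result
```

Because each step only looks for a larger value, the chain relies on the list being ascending apart from the zeros and on numbers not being repeated, which is exactly the assumption stated in the answer.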

How to count groups of same cells in a 2d array?

Here's the example (counting black ones):
input:
output:
5 4 // 5 groups (4 squares each)
1 1 // 1 group containing 1 square
For now, I can't think of anything better than a painful for loop. Would it be possible to get these groups recursively?
Thanks
Treat every black square as a node. Two black squares that are next to each other are joined by an edge.
This gives you a graph.
A DFS in the graph will get you all the groups. Note that DFS is recursive by nature.
At the beginning, every cell is "unvisited".
I would iterate through the cells until you meet an "unvisited" black cell. Each white cell you pass up to that point can simply be marked as visited.
Once you hit a black cell, you "expand" it in all directions where possible (similar to flood filling). You expand as far as you can and mark all the cells you reach as "visited". After that, you count how many black cells you reached, and that tells you how big the group is. Once the group is detected, you move on to the next "unvisited" black cell.
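The scan-and-expand procedure above, with the expansion written as a recursive DFS over the four neighbours, could look like this in Python. It assumes a hypothetical encoding of the grid as lists of 1 (black) and 0 (white); `group_sizes` is a made-up name. Counting the sizes with `collections.Counter` gives the "how many groups of which size" output the question asks for. For very large grids an explicit stack avoids Python's recursion limit.

```python
from collections import Counter


def group_sizes(grid):
    """Sizes of 4-connected groups of black cells (1 = black, 0 = white), in scan order."""
    rows, cols = len(grid), len(grid[0]) if grid else 0
    visited = [[False] * cols for _ in range(rows)]

    def fill(r, c):
        # Recursive DFS: stop at the border, at white cells and at already visited cells
        if r < 0 or r >= rows or c < 0 or c >= cols:
            return 0
        if visited[r][c] or grid[r][c] != 1:
            return 0
        visited[r][c] = True
        return 1 + fill(r + 1, c) + fill(r - 1, c) + fill(r, c + 1) + fill(r, c - 1)

    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not visited[r][c]:
                sizes.append(fill(r, c))  # one new group per unvisited black cell
    return sizes
```

For example, `Counter(group_sizes(grid))` maps each group size to the number of groups of that size.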
You can use a connected-component labeling algorithm with 4-connectivity.
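One common connected-component labeling variant is the two-pass algorithm with union-find. Here is a Python sketch of it for 4-connectivity. It assumes the same hypothetical 1/0 grid encoding, and `label_components` is a made-up name. The first pass gives each black cell a provisional label taken from its upper or left neighbour and records which labels touch. The second pass replaces every label by its set representative and counts group sizes. No recursion is needed.

```python
def label_components(grid):
    """Two-pass 4-connectivity labeling with union-find; returns (labels, {label: size})."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    rows, cols = len(grid), len(grid[0]) if grid else 0
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    # First pass: provisional labels from the up and left neighbours
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent[next_label] = next_label
                labels[r][c] = next_label
                next_label += 1
            else:
                labels[r][c] = min(lab for lab in (up, left) if lab)
                if up and left:
                    union(up, left)  # both neighbours belong to the same group
    # Second pass: resolve each provisional label to its representative and count sizes
    sizes = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
                sizes[labels[r][c]] = sizes.get(labels[r][c], 0) + 1
    return labels, sizes
```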
