Pulling ordered list using array functions in Excel - arrays

I have a report in excel that displays the sales results from each employee. The columns are Location, Region, Username & Sales. It is sorted by Sales descending, showing which employee has the best sales in the company.
I am attempting to have an additional sheet per region that displays the results for all employees in that region also sorted by Sales (to avoid sorting the results of the many regions myself everyday).
An example version of the first 12 rows of the Data Sheet:
G H I J K X
Row Location Username Sales Region Region
1 38 John.Doe 85 North1 North1
2 154 John.Smith 83 South2
3 23 E.Williams 83 North1
4 210 M.Williams 79 East5
5 139 Joe.Dawn 77 North2
6 22 Kay.Smith 69 South2
7 51 Jay.Smith 69 South2
8 125 L.Smith 69 East2
9 51 L.Day 69 South2
10 23 23.Guest2 67 North1
11 92 U.Goode 65 North4
I have successfully created an array function that pulls the Sales column of only the results in the specified region.
{=LARGE(SMALL(IF(IF(ISERROR(K:K),"",K:K)=$X$2,J:J),
ROW(INDIRECT("1:"&COUNTIF(K:K,$X$2)))),F2)}
I am attempting now for an array function that pulls the Username that matches the corresponding sales amount in the original array, and also matches the region. I am having trouble when a single region has 'ties' or more than one employee with the same sales that month. Here is what I started with for that function:
=INDEX(I:I,MATCH(1,(Y2=J:J)*($X$1=K:K),0)
but that is having trouble when a single region has multiple users with the same sales. So I am trying a conditional to accomodate, with the function I know that works for singles when there's only one of that sales for that region.
{=IF(COUNTIF($AB$2:AB2,AB2)>1,
INDEX(I:I,
SMALL(IF(J:J=AB2,
IF(K:K=$AB$2,ROW(K:K)-ROW(INDEX(K:K,1,1))+1)),
COUNTIF($AB$2:AB2,AB2))),
INDEX(I:I,MATCH(1,(AC2=J:J)*($AB$2=K:K),0)))}
The inner piece may be sufficient if it worked, excluding the need for the conditional:
{=INDEX(I:I,
SMALL(IF(J:J=AB2,
IF(K:K=$AB$2,ROW(K:K)-ROW(INDEX(K:K,1,1))+1)),
COUNTIF($AB$2:AB2,AB2)))}
I'll use the same function for Username.
Expected results for two regions:
X Y Z AA AB AC AD AE
Region Sales Username Location Region Sales Username Location
North1 85 John.Doe 38 South2 83 John.Smith 154
83 E.Williams 23 69 Kay.Smith 22
67 23.Guest2 23 69 Jay.Smith 51
69 L.Day 51
Since beginning to type this question I have found a work around that includes a few additional columns to complete the calculation, but still wanted to ask this to see if it was possible for knowledge's sake.

With North1 in X2, these are the formulas for Y2:AA2.
=IFERROR(AGGREGATE(14, 6, ($J$2:$J$999)/($K$2:$K$999=X$2), ROW(1:1)), "")
=IFERROR(INDEX($H:$H, AGGREGATE(15, 6, ROW($2:$999)/(($K$2:$K$999=X$2)*($J$2:$J$999=Y2)), COUNTIF(Y$2:Y2, Y2))), "")
=IFERROR(INDEX($H:$H, AGGREGATE(15, 6, ROW($2:$999)/(($K$2:$K$999=X$2)*($J$2:$J$999=Y2)), COUNTIF(Y$2:Y2, Y2))), "")
Fill down as necessary.
With South2 in AB2, copy Y2:AA2 to AC2:AE2 and fill down as necessary.

Related

when duplicate values found then

I want to have a query that selects all duplicate values in a column. If those value meet the conditions then I'd like the query to return only those values.
Class Student_ID Location
Biology 511 4A
Biology 512 15B
Biology 513 15B
English 514 6A
Biology 521 6A
Spanish 522 6A
Spanish 523 15B
Chemistry 524 4A
English 531 15B
Biology 532 4A
Chemistry 534 4A
Select all duplicate values in the class column and if among those values there is location in both 4A and 15B then assign 1.
CASE WHEN count(class) > 1 AND (Location = '4A' AND Location = '15B') THEN 1
ELSE 0 END
what is most important is how to select duplicate values as a group and then look at the condition (location must be 4A and 15B). So the query must first group the duplicated values from the class column and then see if within the group the values meet the condition of location. So for example we first group the class column we get 5x biology this is then seen as a group and then within this group if there exist one row with location 4A AND one row with location 15B then and only then assign value 1 for biology. Almost all the values in class column have duplicates.
Desired Output
Class Location
Biology 1
Chemistry 0
English 0
Spanish 0
As an alternative to Tim Schmelter's answer, you can also do this with a LEFT JOIN.
SELECT yt1.Class, IIF(COUNT(yt2.Class) > 0, 1, 0) AS IsMatch
FROM YourTable yt1
LEFT JOIN YourTable yt2 ON yt1.Location = '4A' AND yt2.Location = '15B' AND
yt2.Class = yt1.Class
GROUP BY yt1.Class

Add Countif to Array Formula (Subtotal) in Excel

I am new to array formulae and have noticed that while SUBTOTAL includes many functions, it does not feature COUNTIF (only COUNT and COUNTA).
I'm trying to figure out how I can integrate a COUNTIF-like feature to my array formula.
I have a matrix, a small subset of which looks like:
A B C D E
48 53 46 64 66
48 66 89
40 38 42 49 44
37 33 35 39 41
Thanks to the help of #Tom Shape in this post, I (he) was able to average the sum of each row in the matrix provided it had complete data (so rows 2 and 4 in the example above would not be included).
Now I would like to count the number of rows with complete data (so rows 2 and 4 would be ignored) which include at least one value above a given threshold (say 45).
In the current example, the result would be 2, since row 1 has 5/5 values > 45, and row 3 has 1 value > 45. Row 5 has values < 45 and rows 2 and 3 have partially or fully missing data, respectively.
I have recently discovered the SUMPRODUCT function and think that perhaps SUMPRODUCT(--(A1:E1 >= 45 could be useful but I'm not sure how to integrate it within Tom Sharpe's elegant code, e.g.,
=AVERAGE(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))=COLUMNS(A1:E1),SUBTOTAL(9,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1))),""))
Remember, I am no longer looking for the average: I want to filter rows for whether they have full data, and if they do, I want to count rows with at least 1 entry > 45.
Try the following. Enter as array formula.
=COUNT(IF(SUBTOTAL(4,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))>45,IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))=COLUMNS(A1:E1),SUBTOTAL(9,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1))))))
Data

comparing multiple column files using python3

input_file1:
a 1 33
a 34 67
a 68 78
b 1 99
b 100 140
c 1 70
c 71 100
c 101 190
input file2:
a 5 23
a 30 72
a 76 78
b 5 30
c 23 88
c 92 98
I want to compare these two files such that for every value of 'a' in file2 the two integers (boundary) fall in the range (boundaries) of 'a' in file1 or between two ranges.
Instead of storing values like this 'a 1 33', you can make one structure (like 'a:1:33') for your data while writing into file. So that it will become easy to read data also.
Then, you can read each line and can split it based on ':' separator and you can compare with another file easily.

Retrieving specific elements in matlab from arrays in MATLAB

I have an array in MATLAB
For example
a = 1:100;
I want to select the first 4 element in every successive 10 elements.
In this example I want to b will be
b = [1,2,3,4,11,12,13,14, ...]
can I do it without for loop?
I read in the internet that i can select the element for each step:
b = a(1:10:end);
but this is not working for me.
Can you help me?
With reshape
%// reshaping your matrix to nx10 so that it has successive 10 elements in each row
temp = reshape(a,10,[]).'; %//'
%// taking first 4 columns and reshaping them back to a row vector
b = reshape(temp(:,1:4).',1,[]); %//'
Sample Run for smaller size (although this works for your actual dimensions)
a = 1:20;
>> b
b =
1 2 3 4 11 12 13 14
To vectorize the operation you must generate the indices you wish to extract:
a = 1:100;
b = a(reshape(bsxfun(#plus,(1:4)',0:10:length(a)-1),[],1));
Let's break down how this works. First, the bsxfun function. This performs a function, here it is addition (#plus) on each element of a vector. Since you want elements 1:4 we make this one dimension and the other dimension increases by tens. this will lead a Nx4 matrix where N is the number of groups of 4 we wish to extract.
The reshape function simply vectorizes this matrix so that we can use it to index the vector a. To better understand this line, try taking a look at the output of each function.
Sample Output:
>> b = a(reshape(bsxfun(#plus,(1:4)',0:10:length(a)-1),[],1))
b =
Columns 1 through 19
1 2 3 4 11 12 13 14 21 22 23 24 31 32 33 34 41 42 43
Columns 20 through 38
44 51 52 53 54 61 62 63 64 71 72 73 74 81 82 83 84 91 92
Columns 39 through 40
93 94

Sum of multiple variables by group

I have a dataset with over 900 observations, each observation represents the population of a sub-geographical area for a given year by gender (male, female, all) and 20 different age groups.
I have dropped the variable for the sub-geographical area and I want to collape into the greater geographical area (called Geo).
I am having a difficult time doing a SUM or PROC MEANS because I have so many age groups to sum up and I am trying to avoid writing them all out. I want to collapse across the group year, geo, sex so that I only have 3 observations per Geo (my raw data could have as many as 54 observations).
This is an example of what a tiny section of the raw data looks like:
Year Geo Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
This is how I want it to look:
Year Group Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 133 111 118
2010 1 2 109 122 08
2010 1 3 252 233 226
2010 2 1 103 101 102
2010 2 2 92 95 106
2010 2 3 195 196 208
Any ideas? Please help!
You don't have to write out each variable name individually - there are ways of getting around that. E.g. if all of the age group variables that need to be summed up start with age then you can use a : wildcard to match them:
proc summary nway data = have;
var age:;
class year geo sex;
output out = want sum=;
run;
If your variables don't have a common prefix, but are all next to each other in one big horizontal group in your dataset, you can use a double dash list instead:
proc summary nway data = have;
var age005--age1115; /*Includes all variables between these two*/
class year geo sex;
output out = want sum=;
run;
Note also the use of sum= - this means that each summarised variable is reproduced with its original name in the output dataset.
I personally like to use proc sql for this, since it makes it very clear what you're summing and grouping by.
data old ;
input Year Geo Sex Age0005 Age0610 Age1115 ;
datalines;
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
;
run;
proc sql ;
create table new as select
year
, geo label = 'Group'
, sex
, sum(age0005) as age0005
, sum(age0610) as age0610
, sum(age1115) as age1115
from old
group by geo, year, sex ;
quit;

Resources