I am new to array formulae and have noticed that while SUBTOTAL includes many functions, it does not feature COUNTIF (only COUNT and COUNTA).
I'm trying to figure out how I can integrate a COUNTIF-like feature to my array formula.
I have a matrix, a small subset of which looks like:
A B C D E
48 53 46 64 66
48 66 89
40 38 42 49 44
37 33 35 39 41
Thanks to the help of #Tom Shape in this post, I (he) was able to average the sum of each row in the matrix provided it had complete data (so rows 2 and 4 in the example above would not be included).
Now I would like to count the number of rows with complete data (so rows 2 and 4 would be ignored) which include at least one value above a given threshold (say 45).
In the current example, the result would be 2, since row 1 has 5/5 values > 45, and row 3 has 1 value > 45. Row 5 has values < 45 and rows 2 and 3 have partially or fully missing data, respectively.
I have recently discovered the SUMPRODUCT function and think that perhaps SUMPRODUCT(--(A1:E1 >= 45 could be useful but I'm not sure how to integrate it within Tom Sharpe's elegant code, e.g.,
=AVERAGE(IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))=COLUMNS(A1:E1),SUBTOTAL(9,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1))),""))
Remember, I am no longer looking for the average: I want to filter rows for whether they have full data, and if they do, I want to count rows with at least 1 entry > 45.
Try the following. Enter as array formula.
=COUNT(IF(SUBTOTAL(4,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))>45,IF(SUBTOTAL(2,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1)))=COLUMNS(A1:E1),SUBTOTAL(9,OFFSET(A1,ROW(A1:A5)-ROW(A1),0,1,COLUMNS(A1:E1))))))
Data
Related
Table copied as Text
Column1 Column2 Column3 Column4 Column5 Column6
A AA AAA 100 95 92
A AA AAA 85 83 81
A AA BBB 200 199 160
A BB AAA 65 55 49
B AA AAA 89 88 83
B AA BBB 150 149 145
B BB AAA 140 135
B BB BBB 190 185
B AA AAA 510
AA
AAA BBB
A 173 160
B 593 145
and some more explanation
Basically i want the sum of "Column 6" for the given criteria but the data in Column 6 can only be entered after some delay w.r.t. Column 1, Column 2, Column 3 & Column 4.
Till Column 6 data is entered, i want excel to use the number available in Column 5 which is also entered after some delay w.r.t. Column 1, Column 2, Column 3 & Column 4 but before Column 6.
And till Column 5 data is entered, i want excel to use the number available in Column 4.
Now I am familiar with two SUM/IF arrangements as included below in post.
First one is array sum/if arrangement which is convenient to write but results in terribly long calculation time with 1.5 seconds for just one column and I have over 100 columns in one sheet and about 9 sheets.
Second one is using SUMIFS which requires extensive time to write but relatively better calculation time of 0.5 seconds for column but is still quite high.
Now I need to do away with the array arrangement but doing so will take quite some time and I want to know if there is any better/other arrangement.
Just let me know other arrangement which can get the required result and I will check the arrangement for calculation timing. If the other arrangement is also convenient to write than that is a plus.
This is my table:
And I want to add the right most columns which are not empty i.e. have a number in it, but with the criteria for the first three columns in cell D15.
I only found option to add image. Please let me know how to upload excel file.
enter image description here
Can somebody please suggest an alternate to this array formula so it can calculate way faster
{=SUM(
IF(
($B$2:$B$10=$C15)*
($C$2:$C$10=$C$13)*
($D$2:$D$10=D$14)>0,
IF(
$G$2:$G$10<>"",
$G$2:$G$10,
IF(
$F$2:$F$10<>"",
$F$2:$F$10,
$E$2:$E$10))))}
I have tried below which reduces the calculation time to 1/3 but it is too much typing for the large data I am dealing with
=SUMIFS(
$G$2:$G$10,
$B$2:$B$10,$C15,
$C$2:$C$10,$C$13,
$D$2:$D$10,H$14,
$G$2:$G$10,"<>"&"")
+SUMIFS(
$F$2:$F$10,
$B$2:$B$10,$C15,
$C$2:$C$10,$C$13,
$D$2:$D$10,H$14,
$G$2:$G$10,"="&"",
$F$2:$F$10,"<>"&"")
+SUMIFS(
$E$2:$E$10,
$B$2:$B$10,$C15,
$C$2:$C$10,$C$13,
$D$2:$D$10,H$14,
$G$2:$G$10,"="&"",
$F$2:$F$10,"="&"")
If you're OK with using a helper column (which you should be), you can use this formula in a helper cell and drag down. (In my example at bottom, this formula is in cell H2 and drag down.)
= INDEX(E2:G2,MATCH(-1E+300,E2:G2,-1))
This gets all of the data in either column 4 5 or 6 all into one column.
Then you can use a simpler SUMIFS formula in cell D15:
= SUMIFS($H$2:$H$10, // Sum range (helper column)
$B$2:$B$10,$C15, // Criteria 1 (A or B)
$C$2:$C$10,$C$13, // Criteria 2 (AA or BB)
$D$2:$D$10,D$14) // Criteria 3 (AAA or BBB)
See below, working example:
DISCLAIMER
This answer will simplify your formulas, but I'm not sure if this will help with the performance problems you are experiencing. SUMIFS in itself I don't see being likely the cause of long calculation times. Probably you are experiencing long calculation times because other parts of your spreadsheet are using inefficient formulas and/or formulas involving volatile cells, but that is just a guess because I have no idea what the rest of your spreadsheet looks like.
I am new to using Julia and have little experience with the language. I am trying to understand how multi-dimensional arrays work in it and how to access the array at the different dimensions. The documentation confuses me, so maybe someone here can explain it better.
I created an array (m = Array{Int64}(6,3)) and am trying to access the different parts of that array. Clearly I am understanding it wrong so any help in general about Arrays/Multi-Dimensional Arrays would help.
Thanks
Edit I am trying to read a file in that has the contents
58 129 10
58 129 7
25 56 10
24 125 25
24 125 15
13 41 10
0
The purpose of the project is to take these fractions (58/129) and round the fractions using farey sequence. The last number in the row is what both numbers need to be below. Currently, I am not looking for help on how to do the problem, just how to create a multidimensional array with all the numbers except the last row (0). My trouble is how to put the numbers into the array after I have created it.
So I want m[0][0] = 58, so on. I'm not sure how syntax works for this and the manual is confusing. Hopefully this is enough information.
Julia's arrays are not lists-of-lists or arrays of pointers. They are a single container, with elements arranged in a rectangular shape. As such, you do not access successive dimensions with repeated indexing calls like m[j][i] — instead you use one indexing call with multiple indices: m[i, j].
If you trim off that last 0 in your file, you can just use the built-in readdlm to load that file into a matrix. I've copied those first six rows into my clipboard to make it a bit easier to follow here:
julia> str = clipboard()
"58 129 10\n58 129 7\n25 56 10\n24 125 25\n24 125 15\n13 41 10"
julia> readdlm(IOBuffer(str), Int) # or readdlm("path/to/trimmed/file", Int)
6×3 Array{Int64,2}:
58 129 10
58 129 7
25 56 10
24 125 25
24 125 15
13 41 10
That's not very helpful in teaching you how Julia's arrays work, though. Constructing an array like m = Array{Int64}(6,3) creates an uninitialized matrix with 18 elements arranged in 6 rows and 3 columns. It's a bit easier to see how things work if we fill it with a sensible pattern:
julia> m .= [10,20,30,40,50,60] .+ [1 2 3]
6×3 Array{Int64,2}:
11 12 13
21 22 23
31 32 33
41 42 43
51 52 53
61 62 63
This has set up the values of the array to have the row number in their tens place and the column number in the ones place. Accessing m[r,c] returns the value in m at row r and column c.
julia> m[2,3] # second row, third column
23
Now, r and c don't have to be integers — they can also be vectors of integers to select multiple rows or columns:
julia> m[[2,3,4],[1,2]] # Selects rows 2, 3, and 4 across columns 1 and 2
3×2 Array{Int64,2}:
21 22
31 32
41 42
Of course ranges like 2:4 are just vectors themselves, so you can more easily and efficiently write that example as m[2:4, 1:2]. A : by itself is a shorthand for a vector of all the indices within the dimension it indexes into:
julia> m[1, :] # the first row of all columns
3-element Array{Int64,1}:
11
12
13
julia> m[:, 1] # all rows of the first column
6-element Array{Int64,1}:
11
21
31
41
51
61
Finally, note that Julia's Array is column-major and arranged contiguously in memory. This means that if you just use one index, like m[2], you're just going to walk down that first column. As a special extension, we support what's commonly referred to as "linear indexing", where we allow that single index to span into the higher dimensions. So m[7] accesses the 7th contiguous element, wrapping around into the first row of the second column:
julia> m[5],m[6],m[7],m[8]
(51, 61, 12, 22)
Hi i have a 289x2 array that i want to sort in MatLab. I want to sort the first column into numerical ascending order. However I want to keep the second column entry that is associated with it. Best way to explain is through an example.
x = 76 1
36 2
45 3
Now I want to sort x so that it returns an array that looks like:
x = 36 2
45 3
76 1
So the first column has been sorted into numerical order but has retained its second column value. So far I have tried sort(x,1). This sorts the first column as i want but does not keep the pairing. This returns x as:
x = 36 1
45 2
76 3
Any help would be great. Cheers!!
This is exactly what sortrows does.
x=sortrows(x); % or x=sortrows(x,1);
or if you want to use sort then get the sorted indexes first and then arrange the rows accordingly like this:
[~, idx] = sort(x); %Finding the sorted indexes
x = x(idx(:,1),:) ; %Arranging according to the indexes of the first column
Output for both approaches:
x =
36 2
45 3
76 1
I struggle with this simple problem: I want to create some random poll numbers. I have 4 variables I need to fill with data (actually an array of integer). These numbers should represent a random percentage. All percentages added will be 100% . Sounds simple.
But I think it isn't that easy. My first attempt was to generate a random number between 10 and base (base = 100), and substract the number from the base. Did this 3 times, and the last value was assigned the base. Is there a more elegant way to do that?
My question in a few words:
How can I fill this array with random values, which will be 100 when added together?
int values[4];
You need to write your code to emulate what you are simulating.
So if you have four choices, generate a sample size of random number (0..1 * 4) and then sum all the 0's, 1's, 2's, and 3's (remember 4 won't be picked). Then divide the counts by the sample size.
for (each sample) {
poll = random(choices);
survey[poll] += 1;
}
It's easy to use a computer to simulate things, simple simulations are very fast.
Keep in mind that you are working with integers, and integers don't divide nicely without converting them to floats or doubles. If you are missing a few percentage points, odds are it has to do with your integers dividing with remainders.
What you have here is a problem of partitioning the number 100 into 4 random integers. This is called partitioning in number theory.
This problem has been addressed here.
The solution presented there does essentially the following:
If computes, how many partitions of an integer n there are in O(n^2) time. This produces a table of size O(n^2) which can then be used to generate the kth partition of n, for any integer k, in O(n) time.
In your case, n = 100, and k = 4.
Generate x1 in range <0..1>, subtract it from 1, then generate x2 in range <0..1-x1> and so on. Last value should not be randomed, but in your case equal 1-x1-x2-x3.
I don't think this is a whole lot prettier than what it sounds like you've already done, but it does work. (The only advantage is it's scalable if you want more than 4 elements).
Make sure you #include <stdlib.h>
int prev_sum = 0, j = 0;
for(j = 0; j < 3; ++j)
{
values[j] = rand() % (100-prev_sum);
prev_sum += values[j];
}
values[3] = 100 - prev_sum;
It takes some work to get a truly unbiased solution to the "random partition" problem. But it's first necessary to understand what "unbiased" means in this context.
One line of reasoning is based on the intuition of a random coin toss. An unbiased coin will come up heads as often as it comes up tails, so we might think that we could produce an unbiased partition of 100 tosses into two parts (head-count and tail-count) by tossing the unbiased coin 100 times and counting. That's the essence of Edwin Buck's proposal, modified to produce a four-partition instead of a two-partition.
However, what we'll find is that many partitions never show up. There are 101 two-partitions of 100 -- {0, 100}, {1, 99} … {100, 0} but the coin sampling solution finds less than half of them in 10,000 tries. As might be expected, the partition {50, 50} is the most common (7.8%), while all of the partitions from {0, 100} to {39, 61} in total achieved less than 1.7% (and, in the trial I did, the partitions from {0, 100} to {31, 69} didn't show up at all.) [Note 1]
So that doesn't seem like a unbiased sample of possible partitions. An unbiased sample of partitions would return every partition with equal probability.
So another temptation would be to select the size of the first part of the partition from all the possible sizes, and then the size of the second part from whatever is left, and so on until we've reached one less than the size of the partition at which point anything left is in the last part. However, this will turn out to be biased as well, because the first part is much more likely to be large than any other part.
Finally, we could enumerate all the possible partitions, and then choose one of them at random. That will obviously be unbiased, but unfortunately there are a lot of possible partitions. For the case of 4-partitions of 100, for example, there are 176,581 possibilities. Perhaps that is feasible in this case, but it doesn't seem like it will lead to a general solution.
For a better algorithm, we can start with the observation that a partition
{p1, p2, p3, p4}
could be rewritten without bias as a cumulative distribution function (CDF):
{p1, p1+p2, p1+p2+p3, p1+p2+p3+p4}
where the last term is just the desired sum, in this case 100.
That is still a collection of four integers in the range [0, 100]; however, it is guaranteed to be in increasing order.
It's not easy to generate a random sorted sequence of four numbers ending in 100, but it is trivial to generate three random integers no greater than 100, sort them, and then find adjacent differences. And that leads to an almost unbiased solution, which is probably close enough for most practical purposes, particularly since the implementation is almost trivial:
(Python)
def random_partition(n, k):
d = sorted(randrange(n+1) for i in range(k-1))
return [b - a for a, b in zip([0] + d, d + [n])]
Unfortunately, this is still biased because of the sort. The unsorted list is selected without bias from the universe of possible lists, but the sortation step is not a simple one-to-one match: lists with repeated elements have fewer permutations than lists without repeated elements, so the probability of a particular sorted list without repeats is much higher than the probability of a sorted list with repeats.
As n grows large with respect to k, the number of lists with repeats declines rapidly. (These correspond to final partitions in which one or more of the parts is 0.) In the asymptote, where we are selecting from a continuum and collisions have probability 0, the algorithm is unbiased. Even in the case of n=100, k=4, the bias is probably ignorable for many practical applications. Increasing n to 1000 or 10000 (and then scaling the resulting random partition) would reduce the bias.
There are fast algorithms which can produce unbiased integer partitions, but they are typically either hard to understand or slow. The slow one, which takes time(n), is similar to reservoir sampling; for a faster algorithm, see the work of Jeffrey Vitter.
Notes
Here's the quick-and-dirty Python + shell test:
$ python -c '
from random import randrange
n = 2
for i in range(10000):
d = n * [0]
for j in range(100):
d[randrange(n)] += 1
print(' '.join(str(f) for f in d))
' | sort -n | uniq -c
1 32 68
2 34 66
5 35 65
15 36 64
45 37 63
40 38 62
66 39 61
110 40 60
154 41 59
219 42 58
309 43 57
385 44 56
462 45 55
610 46 54
648 47 53
717 48 52
749 49 51
779 50 50
788 51 49
723 52 48
695 53 47
591 54 46
498 55 45
366 56 44
318 57 43
234 58 42
174 59 41
118 60 40
66 61 39
45 62 38
22 63 37
21 64 36
15 65 35
2 66 34
4 67 33
2 68 32
1 70 30
1 71 29
You can brute force it by, creating a calculation function that adds up the numbers in your array. If they do not equal 100 then regenerate the random values in array, do calculation again.
I am having difficulties with making an array formula work the way I want it to work.
Out of a column of dates which is not sorted, I want it to extract values into a new column. The formula below identifies the required cells of a given month and year, but they appear in their original row rather than on top of the output range. Moreover, I want all ""/FALSE cells to be excluded from the output array.
=IF((MONTH($I$15:$I$1346)=1)*(YEAR($I$15:$I$1346)=2008),$I$15:$I$1346,"")
In fact, the $I$15:$I$1346 should be dynamic and go to the last filled range (I could make a named range for that)
Part two is to expand on that formula so that it calculates the data that is an two column offset of the data described above.
Is the above possible to build into one cell probably with a combination of IF, INDEX, SMALL and maybe others?
I'm not looking for a filter solution. Hope the above is clear enough and that you can help!
Here's a shortened sample layout:
A B C
1 Date Series_A Series_B
2 03/01/2011 45 20
3 04/01/2011 73 30
4 06/01/2011 95 40
5 08/01/2011 72 50
6 06/02/2011 5 13
7 09/02/2011 12 #N/A
8 05/02/2011 23 65
9 07/03/2011 12 65
Then I want three input cells for the year and and the month and series name (index/match, as there are many more columns with data). If it would be 2011, Feb and Series_A, I want it to calculate the average for that month. In this case it would be (5+12+23)/3. If it would be Feb-2011 and Series_B instead, which has an error, it should show (13+65)/2 rather than an error.
Aside from that I want a separate which will output an array with the data instead without 'holes' in between and with the right 'length'. Example for Feb-2011 in Column C:
A B C D
1 Date Series_A Desired Output Output based on f above
2 03/01/2011 45 5
3 04/01/2011 73 12
4 06/01/2011 95 23
5 08/01/2011 72
6 06/02/2011 5 5
7 09/02/2011 12 12
8 05/02/2011 23 23
9 07/03/2011 12
If I then run a =ISBLANK(C5) it should be true, rather than =""=C5
Hope the edit clarifies
I reached out to various platsforms to get an answer, and here you have one which is ok. Still doesn't fully answer part 1, but works nonetheless.
http://www.excelforum.com/excel-formulas-and-functions/905356-exclude-blank-false-cells-in-in-excel-array-if-formula-output.html