I am using VBScript to find a quarterback's passer rating. This rating is based on a single season. I would like to find a QB's overall career passer rating by:
Running the single-season passer rating program and recording the result for the given input.
Running the program again with the stats from a subsequent or previous season.
Taking the collected passer ratings and finding the average across all seasons.
My logic is that I would place each season's passer rating into an array.
Dim arrPasserRating
arrPasserRating = Array(Passer1, Passer2, ...)
This array would have no upper limit so I assume I would have to do something like this:
ReDim Preserve arrPasserRating(UBound(arrPasserRating) + 1)
arrPasserRating(UBound(arrPasserRating)) = "..."
where "..." is the next passer rating I calculated.
Now I would have the problem of a naming scheme: creating a For ... Next loop that names each new array value after the previous item plus the number of the index it occupies in the array.
For instance, passerRating would stay passerRating, the next passerRating entered would be passerRating1, etc.
Does anyone have a starting point for me or a solution to this problem?
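One possible starting point, sketched in Python rather than VBScript purely to show the logic: a single growable list holds one rating per season, so the individual values never need names such as passerRating1 or passerRating2. The function name and the sample ratings are hypothetical.

```python
def career_average(season_ratings):
    """Return the mean of the per-season passer ratings, or None if there are none."""
    if not season_ratings:
        return None
    return sum(season_ratings) / len(season_ratings)


season_ratings = []           # grows by one entry per season, like ReDim Preserve
season_ratings.append(92.5)   # hypothetical season 1 rating
season_ratings.append(101.0)  # hypothetical season 2 rating
season_ratings.append(88.5)   # hypothetical season 3 rating
print(career_average(season_ratings))
```

Note that a plain average weights every season equally, regardless of how many passes were attempted in each.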
Cheers, I am trying to find an algorithm/data structure I can use to rank elements by their frequency.
For example, let's say I am given 5 names and I want to rank them based on their frequency. I am given the names consecutively, and every insertion and query I perform MUST be in O(log(n)) time, where n is the number of given names.
For example let's say I am given:
"foo"
"bar"
"bar"
"pop"
"foo"
"bar"
Then, by ranking, the 1st should be "bar" (3 times), the 2nd "foo" and the 3rd "pop". Keep in mind that when two or more elements have the same frequency (and the same ranking), whichever one I return is correct.
I have tried using a Map (Hash) to keep the frequency with which the strings are given, so that, for example, if given "bar" I can return 3 (NOT the rank, however). I have also thought of using a Set (backed by an AVL tree) to arrange them by their frequency, but again I can't turn that into a ranking data structure in logarithmic time. Any ideas?
Return rating by name.
You can do insert and query in constant time, O(1). For this, you need to employ two structures: a hash-map and something that I call a doubly-linked-list.
Hash-map contains pairs - a name and pointer to a list item/bucket with this name statistics.
Doubly-linked-list bucket stores two numbers: an integer for the number of names pointing to the lower buckets (Rating) and a number of repetitions for the names in it (RepCount).
Initialization:
Create the first bucket, put all names into the hash-map and initialize pointers with the address of the first bucket. Create another bucket with RepCount = INFINITY and Rating = #names.
OPERATIONS:
Insert name. Find the address of the corresponding bucket Target, then check whether a bucket OneMore with OneMore.RepCount == Target.RepCount + 1 exists. If it exists, then --OneMore.Rating; if not, create one with RepCount = Target.RepCount + 1 and Rating = NextToTarget.Rating - 1. Observe that NextToTarget always exists due to initialization. Repoint the hash-map entry to OneMore.
Query rating. Extract appropriate pointer from the hash-map and read Target.Rating.
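A minimal Python sketch of the bucket scheme described above, under some assumptions: all names are known up front, empty buckets are never removed, and only a forward link is kept because nothing here needs to walk backwards. The class and attribute names are illustrative. Rating here means "how many names have strictly fewer repetitions", so a higher value means a more frequent name.

```python
import math


class Bucket:
    def __init__(self, rep_count, rating, next_bucket=None):
        self.rep_count = rep_count  # repetitions shared by every name in this bucket
        self.rating = rating        # number of names sitting in lower buckets
        self.next = next_bucket     # bucket with the next higher rep_count


class RatingByName:
    def __init__(self, names):
        names = list(dict.fromkeys(names))  # drop duplicates, keep order
        # Sentinel bucket: RepCount = INFINITY and Rating = #names.
        sentinel = Bucket(math.inf, len(names))
        first = Bucket(0, 0, sentinel)
        self.bucket_of = {name: first for name in names}

    def insert(self, name):
        target = self.bucket_of[name]
        nxt = target.next  # always exists thanks to the sentinel
        if nxt.rep_count == target.rep_count + 1:
            nxt.rating -= 1  # one fewer name now sits below that bucket
            one_more = nxt
        else:
            one_more = Bucket(target.rep_count + 1, nxt.rating - 1, nxt)
            target.next = one_more
        self.bucket_of[name] = one_more

    def rating(self, name):
        return self.bucket_of[name].rating


ranking = RatingByName(["foo", "bar", "pop"])
for seen in ["foo", "bar", "bar", "pop", "foo", "bar"]:
    ranking.insert(seen)
print(ranking.rating("bar"), ranking.rating("foo"), ranking.rating("pop"))
```

Each insert touches one hash-map entry and at most two buckets, which is where the constant-time claim comes from.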
Return name by rating (and rating by names)
You need two hash-maps and a doubly-linked-list. In the hash-map names, store name => name-in-list*. In the hash-map ratings, store each rating together with pointers to the first and the last name with this rating in the list: rating => (first, last). In the list, store pairs (name, rating) in the order described below.
Initialization:
Insert all names into the list. Insert a single entry into the ratings hash-map: (0, (list.head, list.tail)).
OPERATIONS:
Insert name. Recover the name's list node using names. Using ratings, find out where node.rating finishes and move node next to it, increasing its rating by one. Compare the new rating with the next node's rating to see whether you need to update an existing rating or create a new one in ratings. Remove the ratings entry if the old rating is now empty, or update it if node was its first or last.
Query name. Use ratings[..].first, or return null if it does not exist.
Query rating. Return names[..].rating.
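The same idea can be sketched in Python with an array in place of the linked list. This is a swapped-in variant, not the exact structure described above: names are kept in one list ordered by count (highest first), and a name that gains a repetition is swapped to the front of its block of equal counts before its count is increased. All identifiers are illustrative.

```python
class FrequencyRanking:
    def __init__(self):
        self.order = []   # names ordered by count, highest first
        self.pos = {}     # name -> index in self.order
        self.count = {}   # name -> occurrences seen so far
        self.first = {}   # count -> index of the first name with that count

    def insert(self, name):
        if name not in self.pos:
            # A new name starts with count 0 at the tail of the list.
            self.pos[name] = len(self.order)
            self.order.append(name)
            self.count[name] = 0
            self.first.setdefault(0, self.pos[name])
        c = self.count[name]
        i, j = self.pos[name], self.first[c]
        # Swap the name to the front of its block of equal counts.
        other = self.order[j]
        self.order[i], self.order[j] = other, name
        self.pos[other], self.pos[name] = i, j
        self.count[name] = c + 1
        # The old block now starts one slot later, or disappears entirely.
        if j + 1 < len(self.order) and self.count[self.order[j + 1]] == c:
            self.first[c] = j + 1
        else:
            del self.first[c]
        # The name joins the end of block c + 1, or starts that block.
        self.first.setdefault(c + 1, j)

    def name_at_rank(self, k):
        """Return a name with rank k (1 = most frequent), or None."""
        return self.order[k - 1] if 1 <= k <= len(self.order) else None

    def rank_of(self, name):
        """Return the rank shared by every name with the same count."""
        return self.first[self.count[name]] + 1


ranking = FrequencyRanking()
for seen in ["foo", "bar", "bar", "pop", "foo", "bar"]:
    ranking.insert(seen)
print([ranking.name_at_rank(k) for k in (1, 2, 3)])
```

Every operation is a fixed number of dictionary and list accesses, so both queries and inserts stay within the O(log(n)) bound the question asks for.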
As you can see, I have a database table on the left. I want to add an IF statement that allows me to look up the [Code], [Name] and [Amount] of the top 5 of Company A ONLY, then do a top 5 for Company B, and so on. I have managed to look up the top 5 out of ALL companies but cannot seem to add a criterion to target a specific company.
Here are my formulas so far:
Formula in Column K [Company]: = INDEX(Database,MATCH(N3,sales,0),1)
Formula in Column L [Code]: = INDEX(Database,MATCH(N3,sales,0),2)
Formula in Column M [Name]: = INDEX(Database,MATCH(N3,sales,0),2)
Formula in Column N [Amount]: = LARGE(sales,ROW(1:20))
The intended result is to show the top 5 salespeople in each company along with their [Code], [Name] and [Amount]. Feel free to suggest any edits to the worksheet.
Here's an alternative if you know the code is unique. Start by putting A into K3:K7.
First get the highest amounts for Company A starting in N3
=AGGREGATE(14,6,Database[Amount]/(Database[Company]=K3),ROWS(N$1:N1))
Then find the code which matches the amount, but only if it hasn't been used before (this assumes that the code is unique) starting in L3
=INDEX(Database[Code],MATCH(1,INDEX((Database[Company]="A")*(Database[Amount]=N3)*ISNA(MATCH(Database[Code],L$2:L2,0)),0),0))
Then find the matching name with a normal INDEX/MATCH starting in M3
=INDEX(Database[Name],MATCH(L3,Database[Code],0))
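For readers who want to check what those three formulas should return, here is a rough Python sketch of the same selection: filter to one company, order by amount from highest to lowest, keep the first five. The tuple layout (company, code, name, amount) and the sample rows are assumptions for illustration only.

```python
def top_n_for_company(rows, company, n=5):
    """rows: iterable of (company, code, name, amount) tuples."""
    matching = [row for row in rows if row[0] == company]
    # Python's sort is stable, so tied amounts keep their sheet order.
    matching.sort(key=lambda row: row[3], reverse=True)
    return [(code, name, amount) for _, code, name, amount in matching[:n]]


# Hypothetical sample data, not taken from the worksheet in the question.
rows = [
    ("A", "A1", "Ann", 300),
    ("B", "B1", "Bob", 900),
    ("A", "A2", "Al", 500),
    ("A", "A3", "Amy", 500),
    ("B", "B2", "Bea", 100),
]
print(top_n_for_company(rows, "A", n=2))
```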
Okay, I have achieved this with the use of a helper column, which you can hide. Please note, though, that this will only work as long as there are no more than 9 identical totals for any one company. I don't think you should have that issue, but if it does occur, the digits added by the helper column would need to be tweaked.
First Helper Column:
Adds a digit to the end of the total representing the number of times that amount already exists above for that company. This formula is =CONCATENATE([#Amount],COUNTIFS($A$1:A1,A2,$D$1:D1,D2))*1
This is multiplied by 1 to convert the concatenated text back into a number for LARGE to work with.
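As a rough illustration of what the first helper column produces, the Python sketch below appends, to each amount, the number of earlier rows with the same company and amount. It assumes whole-number amounts and rows given in sheet order; the function name and sample rows are hypothetical.

```python
def helper_values(rows):
    """rows: list of (company, amount) pairs in sheet order."""
    seen = {}  # (company, amount) -> how many earlier rows had this pair
    out = []
    for company, amount in rows:
        prior = seen.get((company, amount), 0)
        # Same effect as CONCATENATE(amount, count) * 1 in the sheet.
        out.append(int(f"{amount}{prior}"))
        seen[(company, amount)] = prior + 1
    return out


rows = [("A", 2000), ("A", 2000), ("A", 2000), ("B", 2000), ("A", 1500)]
print(helper_values(rows))
```

Once the count reaches 10 the suffix becomes two digits and the ordering breaks, which is the limit mentioned at the start of this answer.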
Second Helper Column:
This is an array formula and will need to be input by using Ctrl+Shift+Enter while still in the formula bar.
The formula for this one is:
=LARGE(IF(Company="A",Helper),ROW(1:1))
What this formula does as an array formula is produce a list of results based on the IF statement that LARGE can use. Rather than the entire column being ranked largest to smallest, we can now single out the rows that have company "A" like so:
=LARGE({20000;20001;20002;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;15000;14000;30000;FALSE;FALSE;FALSE;FALSE},ROW(1:1))
LARGE will only work with numeric values, so the FALSEs produced where column A does not match "A" will be ignored. This is why I have used the helper column here: it makes duplicate amounts distinct without changing which rows make the top 5.
ROW(1:1) has been used as this will automatically update when the formula is dragged down to produce the next highest result in this array.
The main formula for top 5 array
Again this is an Array formula so will need to be input by using Ctrl+Shift+Enter while still in the formula bar.
=INDEX(Database,SMALL(IF(Company="A",IF(Helper=$O3,ROW(Company))),1)-1,COLUMN(A:A))
With array formulas, IF(AND()) does not work for this, because AND collapses the whole array into a single TRUE/FALSE instead of evaluating row by row, so I have nested two IFs instead.
Notice how I am again checking whether the first column matches "A" and then whether the last column matches the result of the second formula. Where both of these conditions match in the array (as in both produce TRUE for the same row), I want the row number to be returned.
IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE},IF({FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;FALSE;FALSE},{2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20}))
It looks like a mess, I know, but the position where both TRUEs align gives us row 16 as a result.
{FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;16;FALSE;FALSE;FALSE;FALSE}
As I know that there can only be one match possible for this, I use SMALL to grab the smallest number to use as the row in the INDEX formula, and deduct 1 because the headers are not part of the range used in INDEX, so we actually want the 15th result.
Again, COLUMN(A:A) has been used for the column number to return as this will automatically update when the formula is dragged across.
If you are struggling with my explanation and want me to provide more clarity, feel free to reach out and I will try my best to explain the logic in more detail
It's almost the new year. I have an Excel question.
In worksheet "Workouts", I have a "WorkoutsSummary" table, containing the following information:
As you can see, there are two cells containing the workout "Ares" (that's just a name for the training session); however, in the "Workout Order" column (second column), they have different values. I can add more rows to the table that contain the same values (same Workout, same Start Time, same End Time, same Workout Duration), except in the Workout Order column, which is unique for each row.
So, having said that, what I want to do is show all Ares workouts (that is, their Workout Duration values) in another worksheet named "Results", in ascending order. Even though Ares appears twice, the numeric value in Workout Order does not repeat. This means that the Results worksheet should display the first Ares first (with the Workout Duration of 16:03), and below it the second Ares (with the Workout Duration of 20:04).
I created this code:
=INDEX(WorkoutsSummary[Workout Duration], MATCH("Ares", WorkoutsSummary[Workout], 0))
This code, used in cell C13 in the worksheet Results, displays "16:03", which is the workout duration of the first Ares. THE PROBLEM is that I can't figure out how to write another formula to display the second Ares. Obviously I can't use the same code, because it will always show the duration of the same first Ares. That's where I think the Workout Order values come into play.
If anything is unclear, leave a comment and I'll try to explain better.
You want to use SMALL(IF()):
=INDEX(WorkoutsSummary[Workout Duration], MATCH(SMALL(IF( "Ares" = WorkoutsSummary[Workout],WorkoutsSummary[Workout Order]),1),IF( "Ares" = WorkoutsSummary[Workout],WorkoutsSummary[Workout Order]), 0))
This is an array formula and needs to be confirmed with Ctrl-Shift-Enter.
Change the 1 in the second argument of SMALL to 2 to get the second match, and so forth.
You could use COUNT() or some other function to generate that number dynamically, but this will give you a place to start.
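The logic of that formula can be sketched in Python as follows: keep only the rows for the requested workout, order them by Workout Order, and take the k-th one. The tuple layout is an assumption, and the order numbers and the "Zeus" row are made up for illustration; only the two Ares durations come from the question.

```python
def kth_duration(rows, workout, k):
    """rows: (workout, order, duration) tuples; k is 1-based, like SMALL's second argument."""
    matches = sorted(
        (order, duration) for name, order, duration in rows if name == workout
    )
    return matches[k - 1][1] if 1 <= k <= len(matches) else None


rows = [
    ("Ares", 1, "16:03"),
    ("Zeus", 2, "12:30"),  # hypothetical extra workout
    ("Ares", 3, "20:04"),
]
print(kth_duration(rows, "Ares", 1), kth_duration(rows, "Ares", 2))
```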
I have a cell that currently uses an array formula to return the name associated with the minimum hours worked for all my employees. However, what I am trying to do now is write an array formula that lists the three next employees with lowest hours. I have written a formula similar to this in the past, but can't seem to get the two formulas to appropriately match up.
My current return minimum employee formula in G5:
={INDEX(A:A,MATCH(MIN(IF(B:B=G3,IF(C:C>=$G$2,D:D)))&G3,D:D&B:B,0))}
Here is an example of my data:
...and now I'm attempting to incorporate it into the following array formula, which would return a list of qualifying results as I drag it down a column:
={(IF(ROWS(G$7:G7)<=F$8,INDEX($A$2:$A$8,SMALL(IF(Employees!$B$2:$B$8=$G$3,ROW($A$2:$A$8)-ROW($A$2)+1),ROWS(G$7:G7))),""))}
Currently, this array formula is only set up to match on position title and not the other qualifiers that I need from my minimum employee formula. How can I mesh the two formulas correctly? Thank you for any and all help and please, let me know if you need any clarification.
The ideal array result would show Boris and two blanks in consecutive rows in the Next 3 Employees chart.
Set your page up like this:
With the ranking in column F.
Then it is a quick modification of the last formula. Instead of MIN we use SMALL. The k argument of SMALL is the ranking number:
=INDEX(A:A,MATCH(SMALL(IF(B:B=$G$3,IF(C:C>=$G$2,D:D)),F5)&$G$3,D:D&B:B,0))
This goes in G5, is confirmed with Ctrl-Shift-Enter, and is then copied down for the remaining rows.
If you do not want the errors to show, then wrap it in IFERROR:
=IFERROR(INDEX(A:A,MATCH(SMALL(IF(B:B=$G$3,IF(C:C>=$G$2,D:D)),F5)&$G$3,D:D&B:B,0)),"NO MATCHES")
I have the following array formula that calculates the returns on a particular stock in a particular year:
=IF(AND(NOT(E2=E3),H2=H3),PRODUCT(IF($E$2:E2=E1,$O$2:O2,""))-1,"")
But since I have 500,000 row entries, as soon as I hit row 50,000 I get an error from Excel stating that my machine does not have enough resources to compute the values.
How should I optimize the function so that it actually works?
E column refers to a counter to check the years and ticker values of stocks. If year is different from the previous value the function will output 1. It will also output 1 when the name of stock has changed. So for example you may have values for year 1993 and the next value is 1993 too but the name of stock is different, so clearly the return should be calculated anew, and I use 1 as an indication for that.
Then I have another column that runs a cumulative sum of those 1s. When a new 1 in that previous column is encountered I add 1 to the running total and keep printing same number until I observe a new one. This makes possible use of the array function, if the column that contains running total values (E column) has a next value that is different from previous I use my twist on SUMIF but with PRODUCT IF. This will return the product of all the corresponding running total E column values.
The source of the inefficiency, I believe, is in the steady increase with row number of the number of cells that must be examined in order to evaluate each successive array formula. In row 50,000, for example, your formula must examine cells in all the rows above it.
I'm a big fan of array formulas, so it pains me to say this, but I wouldn't do it this way. Instead, use additional columns to compute, in each row, the pieces of your formula that are needed to return the desired result. By taking that approach, you're exploiting Excel's very efficient recalculation engine to compute only what's needed.
As for the final product, compute it from a cumulative running product in an auxiliary column that resets to the value now in column O when column P in the row above contains a number. This approach is much more "local" and avoids formulas that depend on large numbers of cells.
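To make the running-product idea concrete, here is a small Python sketch of the single-pass computation. It assumes the group counter (column E in the question) and the per-row growth factors (column O) arrive as two parallel lists; the function name and sample numbers are hypothetical.

```python
def group_returns(group_ids, factors):
    """Return one (group_id, product - 1) pair per run of equal ids, in a single pass."""
    results = []
    running = None  # running product for the current group
    prev = None     # group id of the previous row
    for gid, factor in zip(group_ids, factors):
        if running is None or gid != prev:
            if running is not None:
                results.append((prev, running - 1))  # close the finished group
            running = factor  # reset: a new group starts here
        else:
            running *= factor
        prev = gid
    if running is not None:
        results.append((prev, running - 1))
    return results


print(group_returns([1, 1, 2, 2, 2], [1.5, 2.0, 0.5, 2.0, 4.0]))
```

Each row is visited once, so the work grows linearly with the number of rows instead of each formula rescanning everything above it.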
I realize that text is not the best language for describing this, and my poor writing skills might be adding to the challenge, so please let me know if more detail is needed.
Interesting problem, thanks.
Could I suggest a really quick and [very] dirty VBA macro? Something like the below. Obviously, make a backup of your file before running this. This assumes you want to start calculating from row 13.
Sub calculateP()
    'start on row 13, column P:
    Cells(13, 16).Select
    'loop through every row as long as column A is populated:
    Do
        If ActiveCell(1, -14).Value = "" Then Exit Do 'column A not populated so exit loop
        'enter formula:
        Selection.FormulaR1C1 = _
            "=IF(AND(NOT(RC[-11]=R[1]C[-11]),RC[-8]=R[1]C[-8]),PRODUCT(IF(R[-11]C5:RC[-11]=R[-1]C[-11],R2C15:RC[-1],""""))-1,"""")"
        'convert cell value to value only (remove formula):
        ActiveCell.Value = ActiveCell.Value
        'select next row:
        ActiveCell(2, 1).Select
    Loop
End Sub
Sorry, this is definitely not a great answer for you. In fact, even this method could be achieved more elegantly using a Range, but the quick and dirty approach may help you in the interim.