Approximate match within a sub-array - arrays

I have a table with the following:
Name  Quota-Date  Quota
Ami   5/1/2010     75000
Ami   1/1/2012    100000
Ami   6/1/2014    150000
John  8/1/2014         0
John  4/1/2015     50000
Rick  5/1/2011    100000
(Dates are shown in American format: m/d/yyyy.) "Quota Date" is the first month in which the new "Quota" next to it is active.
E.g. Ami's quota is 75000 for each month between May 2010 and December 2011.
I need a formula that fetches the quota of a given person for a given month, i.e. the quota active for that person in that month. This formula is to calculate the third column of this table:
Name  Month      Quota
Ami   6/1/2010    75000
Ami   12/1/2011   75000
Ami   1/1/2012   100000
Ami   7/1/2014   150000
John  10/1/2014       0
John  4/1/2015    50000
I'd prefer not to keep the first table sorted, but if that makes things significantly simpler, I will.
What would be the correct formula for "Quota" on the second table?

If your new data is in columns A-C and the original data is also in columns A-C in Sheet1, then enter this formula in C2 (confirmed with Ctrl+Shift+Enter, since the MAX(IF(...)) part needs array entry):
=SUMIFS(Sheet1!C:C,Sheet1!A:A,A2,Sheet1!B:B,MAX(IF((Sheet1!A:A=A2)*(Sheet1!B:B<=B2),Sheet1!B:B,"")))
This formula works well if you have only numbers in your 3rd column; it would be more complicated to make it work with text too.
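If the third column can contain text, a minimal sketch of an INDEX/MATCH variant (my addition, not part of the answer above; also entered with Ctrl+Shift+Enter) that returns the matched cell instead of summing it:
=INDEX(Sheet1!C:C,MATCH(MAX(IF((Sheet1!A:A=A2)*(Sheet1!B:B<=B2),Sheet1!B:B)),IF(Sheet1!A:A=A2,Sheet1!B:B),0))
The MAX(IF(...)) part finds the person's latest quota date not after the target month, and the MATCH then locates that date within that person's rows only.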

Thanks, Máté Juhász!
I just worked out another solution (not an array formula), but I like yours better: it's more elegant, and I will use it!
My solution for the record:
=INDEX(INDIRECT("Quota!$E$" & MATCH([#PM],PMQuotaTable[PM],0)+ROW(PMQuotaTable[#Headers]) & ":$E$" & MATCH([#PM],PMQuotaTable[PM],0)+ROW(PMQuotaTable[#Headers])+COUNTIF(PMQuotaTable[PM],[#PM])-1),MATCH([#Month],INDIRECT("Quota!$D$" & MATCH([#PM],PMQuotaTable[PM],0)+ROW(PMQuotaTable[#Headers]) & ":$D$" & MATCH([#PM],PMQuotaTable[PM],0)+ROW(PMQuotaTable[#Headers])+COUNTIF(PMQuotaTable[PM],[#PM])-1),1))
I'm running a regular INDEX/MATCH with match type = 1 to find the row with the largest date, but I construct the target range dynamically so it scopes only the rows of the current person (PM). (Match type 1 assumes each person's dates are sorted in ascending order.)
I identify the first row of this PM with this part:
MATCH([#PM],PMQuotaTable[PM],0)+ROW(PMQuotaTable[#Headers])
...and the last row for the PM by adding the number of rows they have in the table, retrieved using this:
MATCH([#PM],PMQuotaTable[PM],0)+ROW(PMQuotaTable[#Headers]) + COUNTIF(PMQuotaTable[PM],[#PM])-1
The dynamic range is then constructed using INDIRECT. So the complete range (over column E, the column whose value is eventually retrieved) is determined by this part:
INDIRECT("Quota!$E$" & MATCH([#PM],PMQuotaTable[PM],0)+ROW(PMQuotaTable[#Headers]) & ":$E$" & MATCH([#PM],PMQuotaTable[PM],0)+ROW(PMQuotaTable[#Headers])+COUNTIF(PMQuotaTable[PM],[#PM])-1)
Mor

Related

Sheets header row ARRAYFORMULA to look up rate based on the job's turnaround AND date received within a range of dates

I've got a Google Sheets workbook with two sheets, Jobs and Rates.
Jobs:
     A           B           C       D      E
1    Turnaround  Received    Rate    Pages  Total
2    Standard    12/2/2021   $0.40   204    $81.60
3    Rush        12/9/2021   $0.60   79     $47.40
4    Rush        12/29/2021  $0.60   24     $14.40
5    Standard    1/1/2022    $0.45   81     $36.45
6    Standard    1/2/2022    $0.45   137    $61.65
7    Standard    1/5/2022    $0.45   95     $42.75
8    Standard    1/15/2022   $0.45   162    $72.90
Rates:
     A           B          C           D
1    Turnaround  Base Rate  Start Date  End Date
2    Standard    $0.40      9/1/2021    12/31/2021
3    Rush        $0.60      8/17/2018   6/10/2022
4    Expedited   $0.80      8/17/2018   6/10/2022
5    Daily       $1.00      8/17/2018   6/10/2022
6    Standard    $0.45      1/1/2022    6/10/2022
I'm trying to use an ARRAYFORMULA in Jobs!C1 to look up the value in Rates!B:B where the Turnaround in Jobs!A:A matches the Turnaround in Rates!A:A and the Date Received in Jobs!B:B falls on or between the Start Date in Rates!C:C and End Date in Rates!D:D.
The idea is that rates may change over time, but the job totals will still calculate using the correct rate at the time each job came in.
I know I can't use SUMIFS with ARRAYFORMULA, so I tried using QUERY, but this only populates the rate for the first job.
={"Rate";
ARRAYFORMULA(QUERY(Rates!A:D,
"select B where A contains '"&Jobs!A2:A
&"' and C < date'"&TEXT(Jobs!B2:B, "YYYY-MM-DD")
&"' and D > date'"&TEXT(Jobs!B2:B, "YYYY-MM-DD")&"'",0))}
I'm okay with adding helper columns if needed. I'm trying to avoid having to manually fill the formula down the column as jobs are added.
Here is a link to the workbook:
Job Rate Lookup By Turnaround + Date Range
I appreciate any help on this.
try:
={"Rate"; ARRAYFORMULA(IFNA(VLOOKUP(A2:A&B2:B, SORT({
FILTER(Rates!A2:A, Rates!A2:A<>"")&Rates!C2:C, Rates!B2:B}, Rates!C2:C, 1, Rates!A2:A, 1), 2, 1)))}
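Not part of the answer above, but as a hedged single-cell cross-check (it would have to be filled down rather than arrayed), a FILTER-based version of the same lookup should read:
=IFNA(INDEX(FILTER(Rates!B:B, Rates!A:A=A2, Rates!C:C<=B2, Rates!D:D>=B2), 1))
It pulls the Base Rate whose Turnaround matches and whose Start Date / End Date window contains the Received date.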
When using ARRAYFORMULA you won't be able to use QUERY to get the whole array of values, as it will only return the first value found.
I created a formula that matches the value using VLOOKUP; however, I had to modify the name in Jobs from Standard to Standard 2.
This is the formula:
=IFERROR(ARRAYFORMULA(VLOOKUP(A2:A,Rates!A2:D6,2,0)))
These are the results: (screenshot omitted)

How can I aggregate on 2 dimensions in Google Data Studio?

I have data in 2 dimensions (let's say, time and region) counting the number of visitors on a website for a given day and region, as per the following:
time        region   visitors
2021-01-01  Europe   653
2021-01-01  America  849
2021-01-01  Asia     736
2021-01-02  Europe   645
2021-01-02  America  592
2021-01-02  Asia     376
...         ...      ...
2021-02-01  Asia     645
...         ...      ...
I would like to create a table showing the average daily worldwide visitors for each month, that is:
time     visitors
2021-01  25238
2021-02  16413
This means I need to aggregate the data this way:
first, sum over regions for distinct dates
then, calculate average on dates
I was thinking of taking a global average of all lines of data for each month and then multiplying the value by the number of days in the month, but since that number is variable, I can't do it.
Is there any way to do this?
Create 2 calculated fields:
Month(time)
SUM(visitors)/COUNT_DISTINCT(time)
Used as the dimension and metric of a table, this divides each month's total visitors by its number of distinct days, which is exactly the two-step aggregation described above.
In case it might help someone... so far (January 2021) it seems there is no way to do that in Data Studio. Calculated fields and data blending do not have a GROUP BY-like function.
So, I found 2 alternative solutions:
create an additional table in my data with the first aggregation (sum over regions). This gives a table with the number of visitors for each date.
Then I import it into Data Studio and do the second aggregation in the table.
since my data is stored in BigQuery, a custom SQL query can be used to create another data source from the same dataset. This way, a GROUP BY statement can be used to sum over regions before the average is calculated (see the sketch below).
These solutions have a big drawback: I cannot add controls to filter by region (since data from all regions is aggregated before entering Data Studio).
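For the record, a minimal sketch of the BigQuery query behind the second alternative (standard SQL; the table name is a placeholder, and time is assumed to be a DATE column):
SELECT
  FORMAT_DATE('%Y-%m', time) AS month,
  AVG(daily_visitors) AS avg_daily_visitors
FROM (
  -- first aggregation: sum over regions for each distinct date
  SELECT time, SUM(visitors) AS daily_visitors
  FROM `my_project.my_dataset.visits`
  GROUP BY time
)
GROUP BY month
ORDER BY month;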

How to calculate New Fans from Total Fans using dates in Google Data Studio

I am pulling Facebook Fan data daily via Supermetrics into Data Studio and I was hoping someone could share a formula that I could use to calculate New Fans, as a calculated field, from Total Fans.
The formula would need to identify the total on the last day of the month and then subtract the total from the first day of the month.
For example: if there are 100 fans at the end of September and 60 fans at the beginning of September, the formula would show 40 new fans.
Assuming the net fan number is always increasing, first set the aggregation method for Total Fans to None on the Fields screen by editing the data source. Then you can create a new calculated metric with the formula max(Total Fans)-min(Total Fans). This will work only at the aggregate level (in a scorecard or a table total), not at row level in tables.
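As a sketch, with Total Fans set to no aggregation on the data source, the calculated metric is just:
max(Total Fans) - min(Total Fans)
Over a date range covering one month, MIN and MAX then evaluate across that month's rows, picking up the month-start and month-end totals.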

Calculate Facebook likes, comments, and shares for different time zones from saved UTC

I've been struggling with this for a while and hope someone can give me an idea of how to tackle it.
We have a service that goes out and collects Facebook likes, comments, and shares for each status update multiple times a day. The table that stores this data is something like this:
PostId  EngagementTypeId  Value  CollectedDate
100     1 (likes)         10     1/1/2013 1:00
100     2 (comments)       2     1/1/2013 1:00
100     3 (shares)         0     1/1/2013 1:00
100     1                 12     1/1/2013 3:00
100     2                  3     1/1/2013 3:00
100     3                  5     1/1/2013 3:00
Value holds the total for each engagement type at the time of collection.
I got a requirement to create a report that shows the new value per day in different time zones.
Currently, I'm doing the calculation in a stored procedure that takes in a time zone offset, and based on that I calculate the delta for each day. If this is for someone in California, the report will show 12 likes, 3 comments, and 5 shares for 12/31/2012. But someone with a time zone offset of -1 will see 10 likes on 12/31/2012 and 2 likes on 1/1/2013.
The problem I'm having is that doing the calculation on the fly can be slow if we have a lot of data and a big date range. We're talking about having the delta pre-calculated for each day and stored in a table so I can just query from that (we're considering SSAS, but that's for the next phase). But doing this, I would need to have the data for each day for 24 time zones. Am I correct (and if so, this is not ideal), or is there a better way to approach this?
I'm using SQL 2012.
Thank you!
You need to convert the UTC DateTime stored in your column to a Date based on the user's UTC offset. This way you don't have to worry about any table that has to be populated with data. To get the user's local date from your UTC column you can use something like this:
SELECT CONVERT(DATE,(DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), '01/29/2014 04:00')))
AS MyLocalDate
The SELECT statement above figures out the local date based on the difference between UTC and local time. You will need to replace GETDATE() with the user's DATETIME that is passed in to your procedure, and replace '01/29/2014 04:00' with your column. This way, any date you select from your table will reflect what that date was in the user's local time. Then you can calculate the other fields accordingly.
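If the user's offset is passed to the procedure as minutes, a minimal sketch of the same conversion with an explicit parameter (the parameter and table names here are made up, and a fixed offset ignores DST transitions):
DECLARE @UserOffsetMinutes INT = -480;  -- e.g. California at UTC-8

SELECT PostId,
       EngagementTypeId,
       Value,
       -- shift the stored UTC timestamp, then truncate to the user's local date
       CONVERT(DATE, DATEADD(MINUTE, @UserOffsetMinutes, CollectedDate)) AS LocalDate
FROM dbo.Engagements;
The daily deltas can then be grouped by LocalDate on the fly instead of being pre-calculated for 24 time zones.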

DATE lookup table (1990/01/01:2041/12/31)

I use a DATE master table for looking up dates and other values in order to control several events, intervals, and calculations within my app. It has a row for every single day beginning 01/01/1990 and ending 12/31/2041.
One example of how I use this lookup table is:
A customer pawned an item on: JAN-31-2010
Customer returns on MAY-03-2010 to make an interest payment to avoid forfeiting the item.
If he pays 1 month's interest, the employee enters a "1" and the app looks up the pawn
date (JAN-31-2010) in the date master table and puts FEB-28-2010 in the applicable interest
payment date. FEB-28 is returned because FEB-31 doesn't exist! If 2010 were a leap year, it
would've returned FEB-29.
If the customer pays 2 months, MAR-31-2010 is returned; 3 months, APR-30... If the customer
pays more than 3 months, or another period not covered by the date lookup table, the
employee manually enters the applicable date.
Here's what the date lookup table looks like:
{ Copyright 1990:2010, Frank Computer, Inc. }
{ DBDATE=YMD4- (correctly sorted for faster lookup) }
CREATE TABLE datemast
(
dm_lookup DATE, {lookup col used for obtaining values below}
dm_workday CHAR(2), {NULL=Normal Working Date,}
{NW=National Holiday(Working Date),}
{NN=National Holiday(Non-Working Date),}
{NH=National Holiday(Half-Day Working Date),}
{CN=Company Proclamated(Non-Working Date),}
{CH=Company Proclamated(Half-Day Working Date)}
{several other columns omitted}
dm_description CHAR(30), {NULL, holiday description or any comments}
dm_day_num SMALLINT, {number of elapsed days since beginning of year}
dm_days_left SMALLINT, {number of remaining days until end of year}
dm_plus1_mth DATE, {plus 1 month from lookup date}
dm_plus2_mth DATE, {plus 2 months from lookup date}
dm_plus3_mth DATE, {plus 3 months from lookup date}
dm_fy_begins DATE, {fiscal year begins on for lookup date}
dm_fy_ends DATE, {fiscal year ends on for lookup date}
dm_qtr_begins DATE, {quarter begins on for lookup date}
dm_qtr_ends DATE, {quarter ends on for lookup date}
dm_mth_begins DATE, {month begins on for lookup date}
dm_mth_ends DATE, {month ends on for lookup date}
dm_wk_begins DATE, {week begins on for lookup date}
dm_wk_ends DATE, {week ends on for lookup date}
{several other columns omitted}
)
IN "S:\PAWNSHOP.DBS\DATEMAST";
Is there a better way of doing this or is it a cool method?
This is a reasonable way of doing things. If you look into data warehousing, you'll find that those systems often use a similar structure for the time fact table. Since there are fewer than 20K rows in the fifty-year span you're using, there isn't a huge amount of data.
There's an assumption that the storage gives better performance than doing the computations; that most certainly isn't clear cut since the computations are not that hard (though neither are they trivial) and any disk access is very slow in computational terms. However, the convenience of having the information in one table may be sufficient to warrant having to keep track of an appropriate method for each of the computed values stored in the table.
It depends on which database you are using. SQL Server has horrible support for temporal data and I almost always end up using a date fact table there. But databases like Oracle, Postgres and DB2 have really good support and it is typically more efficient to calculate dates on the fly for OLTP applications.
For instance, Oracle has a last_day() function to get the last day of a month and an add_months() function to, well, add months. Typically in Oracle I'll use a pipelined function that takes start and end dates and returns a nested table of dates.
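For example, ADD_MONTHS handles the JAN-31 cases from the question directly:
SELECT ADD_MONTHS(DATE '2010-01-31', 1) FROM dual;  -- 28-FEB-2010
SELECT ADD_MONTHS(DATE '2010-01-31', 2) FROM dual;  -- 31-MAR-2010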
The cool way of generating a rowset of dates in Oracle is to use the hierarchical query functionality, connect by. I have posted an example of this usage in another thread.
It gives a lot of flexibility without the PL/SQL overhead of a pipelined function.
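For reference, a minimal sketch of that CONNECT BY trick, generating one row per day over the question's 1990-2041 range:
SELECT DATE '1990-01-01' + LEVEL - 1 AS the_date
FROM dual
CONNECT BY DATE '1990-01-01' + LEVEL - 1 <= DATE '2041-12-31';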
OK, so I tested my app using 31 days/month to calculate interest rates, and the pawnshops are happy with it! Local law provides as follows: from the pawn or last interest payment date to 5 elapsed days, 5% interest on principal; 6 to 10 days = 10%; 11 to 15 days = 15%; and 16 days to 1 "month" = 20%.
So the interest table is now defined as follows:
NUMBER OF ELAPSED DAYS SINCE
PAWN DATE OR LAST INTEREST PYMT

FROM   TO    ACCUMULATED
DAY    DAY   INTEREST
-----  ----  -----------
0      5     5.00%
6      10    10.00%
11     15    15.00%
16     31    20.00%
32     36    25.00%
37     41    30.00%
42     46    35.00%
47     62    40.00%
[... until day 90 (forfeiture allowed)]
From day 91 to 999, prorated daily based on 20%/month (i.e., 20%/31 ≈ 0.645% per elapsed day).
Did something bad happen in the UK in MAR-1752 or SEP-1752?
