How can I aggregate on 2 dimensions in Google Data Studio?

I have data in 2 dimensions (let's say, time and region) counting the number of visitors on a website for a given day and region, as per the following:
time        region   visitors
2021-01-01  Europe   653
2021-01-01  America  849
2021-01-01  Asia     736
2021-01-02  Europe   645
2021-01-02  America  592
2021-01-02  Asia     376
...         ...      ...
2021-02-01  Asia     645
...         ...      ...
I would like to create a table showing the average daily worldwide visitors for each month, that is:
time     visitors
2021-01  25238
2021-02  16413
This means I need to aggregate the data in two steps:
first, sum over regions for each distinct date;
then, average those daily sums over the dates of each month.
I was thinking of taking a global average of all rows of data for each month and then multiplying the value by the number of days in the month, but since that number varies I can't do it.
Is there any way to do this?

Create 2 calculated fields:
MONTH(time)
SUM(visitors) / COUNT_DISTINCT(time)
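For reference, the calculation these two fields perform is equivalent to the following SQL sketch (the table name visits is a placeholder for a source with the time / region / visitors columns shown above):
-- Average daily worldwide visitors per month (BigQuery-flavoured SQL, illustrative only).
SELECT
  FORMAT_DATE('%Y-%m', time) AS month,
  SUM(visitors) / COUNT(DISTINCT time) AS avg_daily_visitors
FROM visits
GROUP BY month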

In case it might help someone... so far (January 2021) it seems there is no way to do that in Data Studio: neither calculated fields nor data blending offer a GROUP BY-like function.
So, I found 2 alternative solutions:
create an additional table in my data source with the first aggregation (the sum over regions). This gives a table with the number of visitors for each date. Then I import it into Data Studio and do the second aggregation in the report table.
since my data is stored in BigQuery, a custom SQL query can be used to create another data source from the same dataset. This way, a GROUP BY statement can sum over regions before the average is calculated (see the sketch below).
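For the BigQuery route, the custom query behind the new data source could look roughly like this (the dataset and table names are placeholders, assuming the time / region / visitors columns shown above):
-- Pre-aggregate daily worldwide visitors before the data reaches Data Studio.
SELECT
  time,
  SUM(visitors) AS visitors
FROM `my_dataset.daily_visits`
GROUP BY time
In Data Studio, a table that uses time at month granularity with visitors aggregated as AVG then shows the average daily worldwide visitors per month.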
These solutions have a big drawback: I cannot add controls to filter by region, since data from all regions is aggregated before it enters Data Studio.

Related

How to calculate New Fans from Total Fans using dates in Google Data Studio

I am pulling Facebook Fan data daily via Supermetrics into Data Studio and I was hoping someone could share a formula that I could use to calculate New Fans, as a calculated field, from Total Fans.
The formula would need to identify the Total Fans on the last day of the month and subtract the Total Fans from the first day of the month.
For example: if there are 100 Fans at the end of September and 60 Fans at the beginning of September, the formula would show 40 New Fans.
Formula Example
Assuming the net fan number is always increasing, first set the aggregation method for Total Fans to None on the Fields screen by editing the data source. Then you can create a new calculated metric with the formula MAX(Total Fans) - MIN(Total Fans). This will work only at the aggregate level (in a scorecard or as a table total), not at row level in tables.
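To make the logic explicit, the same monthly delta can be sketched in SQL terms (fan_history is a hypothetical table with one row per day holding the cumulative total, i.e. columns day and total_fans):
-- New fans per month = last cumulative total minus first cumulative total,
-- assuming total_fans never decreases; table and column names are illustrative.
SELECT
  FORMAT_DATE('%Y-%m', day) AS month,
  MAX(total_fans) - MIN(total_fans) AS new_fans
FROM fan_history
GROUP BY month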

Calculating a % rate based on two date dimensions in one cube

Setup: SSAS 2012 with OLAP cubes (built by a supplier) and MS Report Builder v3. No access to BIDS.
I am building a report which needs to calculate a disposal rate based on data from a single cube. Historically this would have been calculated from two separate tables of data, giving a count of new items by month recorded and a count of items disposed by month of disposal. This can then be turned into a disposal rate using a lookup or similar.
Blank disposal dates are fine (can take months to dispose of items).
I would like to keep this in a single query so that I can introduce extra dimensions to analyse the data and represent it in multiple ways easily. My suspicion is that I need a calculated member, but I am not sure where to start with these. Any help would be greatly appreciated - I am trying out a few things and will update this should I solve it myself.
A simple formula would be
=(sumif(Items, DisposalDate="July 2014")) / (sumif(Items, DateReported="July 2014"))
So the following data...
Month Recorded   Month Disposed   No of Items
May-14           May-14           25
May-14           Jun-14           3
May-14           Jul-14           45
Jun-14                            232
Jun-14           Jun-14           40
Jun-14           Jul-14           46
Should produce...
Month        No Recorded   No Disposed   Disposal Rate
01/05/2014   73            25            34%
01/06/2014   48            43            90%
01/07/2014   45            91            202%
My current MDX statement:
SELECT
NON EMPTY { [Measures].[No of Items] } ON COLUMNS,
NON EMPTY
{
([Date Reported].[Calendar Months].[Month].ALLMEMBERS
*
[Disposal Date].[Calendar Months].[Month].ALLMEMBERS )
} ON ROWS
FROM [Items]
You can use LinkMember to move a reference to one hierarchy (like [Date Reported].[Calendar Months]) to another one (like [Disposal Date].[Calendar Months]), provided both hierarchies have the exact same structure. Thus, only using [Date Reported] in your query, the calculation can use [Disposal Date]. The query would be like the following:
WITH MEMBER Measures.[Disposed in Date Reported] AS
(Measures.[No of Items],
LinkMember([Date Reported].[Calendar Months].CurrentMember, [Disposal Date].[Calendar Months]),
[Date Reported].[Calendar Months].[All]
)
MEMBER Measures.[Disposal Rate] AS
IIf([Measures].[No of Items] <> 0,
Measures.[Disposed in Date Reported] / [Measures].[No of Items],
NULL
), FORMAT_STRING = '0%'
SELECT { [Measures].[No of Items], Measures.[Disposed in Date Reported], Measures.[Disposal Rate] }
ON COLUMNS,
[Date Reported].[Calendar Months].[Month].ALLMEMBERS
ON ROWS
FROM [Items]
Possibly, you would want to adapt the column titles in your report. I left that out and used member names that describe more what they do than what should be shown to users.

Calculate Facebook likes, comments, and shares for different time zones from saved UTC

I've been struggling with this for a while and hope someone can give me an idea of how to tackle this.
We have a service that goes out and collects Facebook likes, comments, and shares for each status update multiple times a day. The table that stores this data is something like this:
PostId   EngagementTypeId   Value   CollectedDate
100      1 (likes)          10      1/1/2013 1:00
100      2 (comments)       2       1/1/2013 1:00
100      3 (shares)         0       1/1/2013 1:00
100      1                  12      1/1/2013 3:00
100      2                  3       1/1/2013 3:00
100      3                  5       1/1/2013 3:00
Value holds the total for each engagement type at the time of collection.
I got a requirement to create a report that shows the new value per day in different time zones.
Currently, I'm doing the calculation in a stored procedure that takes in a time zone offset and, based on that, calculates the delta for each day. For someone in California, the report will show 12 likes, 3 comments, and 5 shares for 12/31/2012. But someone with a time zone offset of -1 will see 10 likes on 12/31/2012 and 2 likes on 1/1/2013.
The problem I'm having is that doing the calculation on the fly can be slow if we have a lot of data and a big date range. We're talking about having the delta pre-calculated for each day and stored in a table that I can just query from (we're considering SSAS, but that's for the next phase). But to do this, I would need to have the data for each day for 24 time zones. Am I correct (and if so, this is not ideal), or is there a better way to approach this?
I'm using SQL 2012.
Thank you!
You need to convert the UTC DateTime stored in your column to a Date based on the user's UTC offset. This way you don't have to worry about any table that has to be populated with data. To get the user's date from your UTC column you can use something like this:
SELECT CONVERT(DATE,(DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), '01/29/2014 04:00')))
AS MyLocalDate
The SELECT statement above figures out the local date based on the difference between the UTC date and the local date. You will need to replace GETDATE() with the user's DATETIME that is passed in to your procedure and replace '01/29/2014 04:00' with your column. This way, when you select any date from your table it will be according to what that date was at the user's local time. Then you can calculate the other fields accordingly.
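Building on that conversion, a hedged sketch of the full per-day delta calculation might look like the T-SQL below (the table name Engagements, the @TzOffsetMinutes parameter, and the use of MAX/LAG are assumptions; the columns follow the sample above):
-- Illustrative only: convert each UTC timestamp to the user's local date,
-- take the last cumulative Value per local day, and diff against the previous day.
DECLARE @TzOffsetMinutes INT = -480;  -- e.g. Pacific Standard Time (UTC-8)

WITH LocalDaily AS (
    SELECT
        PostId,
        EngagementTypeId,
        CONVERT(DATE, DATEADD(MINUTE, @TzOffsetMinutes, CollectedDate)) AS LocalDate,
        MAX(Value) AS CumulativeValue   -- cumulative total at the end of that local day
    FROM Engagements                    -- placeholder name for the table shown above
    GROUP BY PostId, EngagementTypeId,
             CONVERT(DATE, DATEADD(MINUTE, @TzOffsetMinutes, CollectedDate))
)
SELECT
    PostId,
    EngagementTypeId,
    LocalDate,
    CumulativeValue
      - LAG(CumulativeValue, 1, 0) OVER (PARTITION BY PostId, EngagementTypeId
                                         ORDER BY LocalDate) AS NewValue
FROM LocalDaily;
LAG requires SQL Server 2012 or later, which matches the version mentioned in the question.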

Database schema design for financial forecasting

I need to develop a web app that allows companies to forecast financials.
The app has different screens: one for defining employee salaries, another for sales projections, etc.
Basically, it turns an Excel financial forecast model into an app.
The question is: what would be the best way to design the database so that financial reports (e.g. a profit and loss statement or a balance sheet) can be generated quickly?
Assuming the forecast period is 5 years, would you have a table with 5 years * 12 months = 60 fields per row? Is that performant enough?
Would you use DB triggers to recalculate salary expenses whenever a single employee's data is changed?
I'd think it would be better to store each month's forecast in its own row, in a table that looks like this:
month forecast
----- --------
1 30000
2 31000
3 28000
... ...
60 52000
Then you can use aggregate functions to calculate forecast reports, discounted cash flow, etc. (for example, if you want the undiscounted total for just the first 4 years):
SELECT SUM(forecast) FROM FORECASTS WHERE month >= 1 AND month <= 48
For salary expenses, I would think that a view that does the calculations on the fly (or a materialized view, if your DB engine supports them) should have sufficient performance unless we're talking about a giant number of employees or a really slow DB.
Maybe have a salary history table that a trigger populates when employee data changes or payroll runs:
employeeId month Salary
---------- ----- ------
1 1 4000
2 1 3000
3 1 5000
1 2 4100
2 2 3100
3 2 4800
... ... ...
Then, again, you can use SUM or another aggregate function to get to the reported data.
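For instance, the monthly salary expense could be aggregated with something like this sketch (the table name salary_history is a placeholder for the table above):
-- Total salary expense per forecast month, summed over all employees.
SELECT month, SUM(Salary) AS total_salary_expense
FROM salary_history
GROUP BY month
ORDER BY month;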

SSAS -> AdventureWorks Example -> Using the browser to slice a measure by week shows results that have two of the same week records?

I have been working on a cube and noticed that when I browse measures in my cube by weeks, I get an unexpected result, but first let me describe my current scenario. I am looking at counts of a fact load by week. When I do so, I get results like these:
Weeks | Fact Internet Sales Count
2001-07-01 00:00:00.000 | 28
2001-07-08 00:00:00.000 | 29
...and so on, as you would expect.
Further down I noticed this:
2001-09-30 00:00:00.000 | 10
2001-09-30 00:00:00.000 | 24
As you can see, it shows the week twice with different counts; when you add these counts together, you get the correct count for this week (i.e. 34).
I am just confused about why it is showing two rows for the week. When I look at the data in SQL, I can see that the difference between the two is strictly the month in which the dates fell (10 in the earlier month and 24 in the later month, in this example).
I initially saw this in a cube that I created on my own; I then pulled up the trusty AdventureWorks practice cube and found that it was present in that cube also.
This is because, within this date hierarchy, the lowest attribute in the hierarchy was date, not week. Therefore, there was always a split of weeks by date. This can be alleviated by building a date hierarchy with week as the lowest level of the hierarchy.
