Google Data Studio with BigQuery Data Source Issue in Calculated Fields and Aggregation


I have a Google Data Studio dashboard that loads really slowly because it uses Google Sheets as a Data Source. I migrated the same data to BigQuery and used it as my new Data Source; however, I came across an issue:
When creating a calculated field, the new field is not tagged as Auto in the Default Aggregation; I still have to select Sum as the Default Aggregation. This causes problems in my report. Also, the field is not blue: normal fields are shown in green and calculated fields in blue.
When I was using Google Sheets, I could do direct computations in the calculated fields.
Example:
Handle Time = Talk Time / Number of calls
I would just create a calculated field called Handle Time with the formula Talk Time / Number of calls.
Now, I need to create 3 separate Calculated Fields:
Calculated Field 1: SUM(Talk Time)
Calculated Field 2: SUM(Number of calls)
Calculated Field 3: Calculated Field 1 / Calculated Field 2
This is even though I already tagged them as Sum in the Default Aggregation. Can anyone help me understand what I'm doing wrong?

Solution:
A single calculated field will do the trick; the aggregation of each respective field needs to be stated explicitly in the calculated field:
SUM(Talk Time) / SUM(Number of calls)
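To see why the placement of the aggregation matters, here is a minimal Python sketch (not Data Studio code). The sample rows and function names are hypothetical, and it assumes that a row-level formula would be evaluated per row and then summed:

```python
# Hypothetical rows: (talk_time, number_of_calls), e.g. one row per agent per day.
rows = [(600, 3), (300, 1), (900, 6)]

def handle_time(rows):
    """Ratio of sums, mirroring SUM(Talk Time) / SUM(Number of calls)."""
    total_talk = sum(talk for talk, _ in rows)
    total_calls = sum(calls for _, calls in rows)
    return total_talk / total_calls

def sum_of_row_ratios(rows):
    """Divide on each row first, then sum: a different quantity."""
    return sum(talk / calls for talk, calls in rows)

print(handle_time(rows))        # 1800 / 10
print(sum_of_row_ratios(rows))  # 200 + 300 + 150
```

The two results differ, which is why the aggregated form is the one to use for a Handle Time metric.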
Why the Change?
To elaborate, the change was part of the Data Modeling update on 31st October 2020. One benefit of stating the aggregation explicitly is greater flexibility: each field can be aggregated as required within a single calculated field, for example:
MAX(Talk Time) - MIN(Talk Time) / COUNT(Handle Time) * AVG(Handle Time) / COUNT_DISTINCT(Text_Field1) * COUNT(Text_Field2)
Speed
Regarding speed: if the Data Set is large and static (daily updates are fine and real-time data is not required), a Data Extract would be a good option.

Dimensions are shown as green, metrics are shown as blue. Data imported from other sources, particularly from Google Sheets, tends to show metrics as green, but when you add them to a chart or table they are recognised as metrics and change to blue.

Related

Issue with slicer Filtering from different data sets/ columns

I am having a problem trying to understand how to accomplish this. I want to use one set of slicers in my Excel spreadsheet to drill down to specific information. The problem is that I have duplicated Model names under the "Intel" worksheet. The reason is that Model Name could have one or two controllers. I have created all the queries, Power Pivots, and relationships. The link to the file is available here (this is all public data) if someone is willing to take a look and provide the guideline.
PROBLEM:
Due to the duplication of Model Name under the "Intel" worksheet, I created a "DUP" column that marks duplicates in my data with an "X". I thought that if I added a column “RELATED -Devide by 2” in the Power Pivot “Intel” table with the formula =IF([DUP]="X", [RELATED - 12 Month Volume]/2, [RELATED - 12 Month Volume]), I would be able to show the correct 12 Month Volume based on the Volume worksheet. This is only partially true. I came to understand that I need to use both “RELATED - 12 Month Volume” and “RELATED -Devide by 2”, depending on which slicer I am filtering with:
If Filtered by FORM Factor or Vendor, I can use RELATED - Divide by 2 (Orange color as shown below).
Now, if I filter above with Controller (like X710-TM4), this is not good. For Controller Filter, I would need to use “RELATED - 12 Month Volume” (Blue color as shown below), which is NOT suitable for above
How do I accomplish this: one set of slicers that drills down and shows the correct value for whichever slicer is used?

Never mind... I figured it out with the CROSSFILTER measure

Google Data Studio : how to obtain a SUM related to a COUNT_DISTINCT?

I have a dataset including 3 columns :
ID transac (The unique ID of the transaction - Dimension)
Source (The source of the transaction - Dimension)
Amount € (The amount of the transaction - Stat)
screenshot of my dataset
To count the number of transactions (for one or more sources), I use the COUNT_DISTINCT function.
I want to sum the transaction amounts (for one or more sources), but I don't want to add up the amounts of transactions that share the same ID!
Is there a way to do this calculation with a DataStudio function?
Thanks for your answers. :-)
EDIT : I saw that we could do this type of calculation via SQL here and I would like to do this in DataStudio (so that I don't have to pre-calculate the amounts per source.)

IMO, your dataset contains wrong data. Each value should relate only to its own line, but that is not the case here: if the total is 20, each line should describe that line's share of the total. With 4 sources, each line should be 5, or some other set of values that sums to 20.
To solve it in DataStudio, you would need something like the CALCULATE function in PowerBI, but DataStudio currently doesn't support this feature.
There are some options to consider for repairing your data:
If you're sure there are always 4 sources, just create a new calculated field with the expression Amount/4 and SUM it. It is not an elegant solution, but it works.
If your data source is Google Sheets, you can easily repair the data using formulas, as in this example:
Link to spreadsheet
For this spreadsheet, I used this formula in the adjusted_amount column: =C2/COUNTIF(A:A,A2). With this column in DataStudio, just use the usual SUM aggregation function to summarize it correctly.
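The same repair can be sketched outside the spreadsheet. This is a minimal Python illustration of the =C2/COUNTIF(A:A,A2) idea; the rows and the function name are hypothetical, not taken from the question's dataset:

```python
from collections import Counter

# Hypothetical rows: (transaction_id, source, amount).
# The full amount is repeated on every row of the same transaction.
rows = [
    ("T1", "A", 20.0),
    ("T1", "B", 20.0),
    ("T2", "A", 10.0),
    ("T3", "A", 30.0),
    ("T3", "B", 30.0),
    ("T3", "C", 30.0),
]

def adjusted_amounts(rows):
    """Divide each amount by the number of rows sharing its transaction ID."""
    rows_per_id = Counter(tid for tid, _, _ in rows)
    return [(tid, src, amount / rows_per_id[tid]) for tid, src, amount in rows]

adjusted = adjusted_amounts(rows)
total = sum(amount for _, _, amount in adjusted)
print(total)  # each transaction is now counted once: 20 + 10 + 30
```

Note that once a source filter is applied, the sum of the adjusted column is the share attributed to that source, not the full amount of every transaction that touched it.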

representing 2 different metrics with different columns in line chart

I am preparing a Data Studio report. The report consists of the columns below:
As seen in the picture it captures metric data at a particular time.
The date range is set as the end_time
The X axis will represent the end_time column and the breakdown dimension will be InstanceName column and I am preparing to show it as line chart.
There are 2 metrics readops and writeops columns.
I need to represent these 2 metrics as 2 different lines in the same chart so that the read and write operation fluctuations for the instance at a particular time can be easily viewed.
I am not sure how to represent this in Data Studio. With 1 metric it is straightforward: I can set InstanceName as the breakdown dimension, end_time as both the dimension and the date range, and chart it. But I don't know how to represent the 2 metrics as 2 different lines for a particular instance, with the time range on the x axis, as I am very new to Data Studio. I want to do it without drill-down, similar to the picture below from the Google Cloud console, which shows ReadBytes as the big triangle and WriteBytes as the smaller one at the bottom in red.
Can anyone help me? Thanks

There are two approaches based on how the charts need to be displayed:
1) Filter Control
If the aim is to only display 2 lines (2 Metrics):
readops
writeops
While allowing the user to select the required InstanceName, then a Filter Control (optionally with a default selection) could be used.
The chart would be set up using:
Dimension: end_time
Metric #1: readops
Metric #2: writeops
Editable Google Data Studio Report and a GIF to expand on the above:
2) Multiple Metrics
If the objective is to display a line for each of the InstanceName values as well as both the Metrics (readops and writeops), then the below approach would be one way.
Currently, when using a Breakdown Dimension, Google Data Studio charts (such as a Time Series chart) support a single metric.
Using the Data Set below, based on the screenshot in the question (Editable Google Sheets):
One approach is to create and use multiple CASE statements at the Data Source; for example:
readops_dum
CASE
WHEN REGEXP_MATCH(InstanceName, "(dum)") THEN readops
ELSE NULL
END
writeops_dum
CASE
WHEN REGEXP_MATCH(InstanceName, "(dum)") THEN writeops
ELSE NULL
END
etc...
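What these CASE fields do to the data can be sketched in Python. The rows, the second instance name "prod" and the helper name are hypothetical; the sketch assumes REGEXP_MATCH tests the whole field value, which is why it uses re.fullmatch:

```python
import re

# Hypothetical rows: (end_time, InstanceName, readops, writeops).
rows = [
    ("2020-01-01 00:00", "dum", 10, 4),
    ("2020-01-01 00:00", "prod", 7, 2),
    ("2020-01-01 00:05", "dum", 12, 5),
]

def case_field(rows, pattern, metric_index):
    """Keep the metric where InstanceName matches the pattern, else None (NULL)."""
    return [
        row[metric_index] if re.fullmatch(pattern, row[1]) else None
        for row in rows
    ]

readops_dum = case_field(rows, "(dum)", 2)
writeops_dum = case_field(rows, "(dum)", 3)
print(readops_dum)
print(writeops_dum)
```

Each instance gets its own pair of metric columns, so the chart no longer needs a Breakdown Dimension to separate the lines.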
Editable Google Data Studio Data Source and an image to elaborate:
A Chart could then be created with end_time as the Dimension and using the newly created Metrics; Editable Google Data Studio Report and an image to visualise with a Time Series chart:

How do I calculate the percentage of a count function?

I am trying to take the percentage of a count function in order to create an MS BIDS report resembling this Excel file:
Excel Close Rate Summary
The unique identifier for the opportunities is the field "opportunityid", so I am using COUNT(Fields!opportunityid.Value) to determine the number of cases in each stage. I want to write an expression that will return the percentage of cases in each stage per creation month, as can be seen in the Excel screenshot above.
This is my current MS BIDS report when I preview it
To be more specific, I want to have the percentage of "Active" and "New" opportunities in January to represent 67% and 33% respectively. 67% comes from 4/6. The 4 comes from the active opportunities out of the 6 opportunities created in January. Likewise, the 33% comes from the 2 new opportunities out of the 6 that were created in January.
There are more stage names than Active and New. Other options include New, Warm, Hot, Implementation, Active, Hibernate or Canceled. This is relevant to mention because I have tried to create an expression that counts based on the number of opportunities with a specific stage name, but have been unsuccessful.
Currently the expression I am using to calculate the percentage is:
=COUNT(Fields!new_rptstage.Value)/SUM(COUNT(Fields!opportunityid.Value),"GroupbyStageName")
Based on this expression, I am only able to get 1/1 or 100% for each of the stage names. I have tried a bunch of variations of the above expression by changing the scope, but have been unsuccessful in getting the desired results. Can someone explain how to correct this?
SAMPLE DATA:
In the sample data, I want the expression to be in the percentage column. The percentage should be the # of cases in a particular stage for the total cases that month. So looking at the above picture:
Active February 54 54/168 [have 54/168 display as a percentage]
Warm February 8 8/168
etc.
EDIT:
These are the expressions that may help show the underlying data in the chart.
The creation month expression is
=Fields!MonthCreated.Value & " " & year(Fields!createdon.Value)
The percent expression is listed above.

You don't want to use the COUNT() function. COUNT(*) returns a count of the number of rows that have a value. It doesn't return the actual value.
Since you've only shown a screenshot of your report, I don't know how your underlying data columns relate to it, but what you want to do for your Percent column expression is this:
This is pseudocode because I don't know your dataset field names:
CaseCount.Value / SUM(CaseCount.Value)
EDIT: Now that I better understand how your data relates to your report, I think the only change you need to make to your existing formula is casting it to a decimal type. It's probably rounding all fractions up to 1.
Try this for the expression in your percentage column:
=CDbl(COUNT(Fields!new_rptstage.Value))/CDbl(SUM(COUNT(Fields!opportunityid.Value),"GroupbyStageName"))
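For reference, the arithmetic the question asks for (stage count divided by the total count for the same creation month) can be sketched in Python. This is not an SSRS expression; the rows and the function name are hypothetical and only reproduce the January example of 4 Active and 2 New opportunities:

```python
from collections import Counter

# Hypothetical rows: (creation_month, stage), one per opportunityid.
opportunities = [
    ("January", "Active"),
    ("January", "Active"),
    ("January", "Active"),
    ("January", "Active"),
    ("January", "New"),
    ("January", "New"),
]

def stage_percentages(opportunities):
    """Fraction of each month's opportunities that fall in each stage."""
    per_month_stage = Counter(opportunities)
    per_month = Counter(month for month, _ in opportunities)
    return {
        (month, stage): count / per_month[month]
        for (month, stage), count in per_month_stage.items()
    }

pct = stage_percentages(opportunities)
for (month, stage), value in pct.items():
    print(month, stage, f"{value:.0%}")
```

The point to carry over is that the denominator is counted over the whole month, while the numerator is counted over one stage within that month.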

Creating custom rollups with SSAS

I am currently working on a requirement as follows and would appreciate some help in figuring out a way to configure the aggregation of my measure:
I have a fact table that contains the following columns: Item ID, DateID, StoreID, ReceivedComments. The way received comments work is that on a daily basis a new record is created that adds to the value of received comments (for example, if Item 5 in Store 5 on 1 Jan had 23 Received Comments and it received 5 comments the following day, the row for Jan 2 would be Item 5, Store 5, Jan 2, 28).
We created a measure using MAX and it works fine whenever Item ID is used in the query. When we start moving to a higher level, the MAX produces wrong results. Our requirement is to set up the measure as follows:
If the member selected is on the Item Level then MAX, if it's on any other level (Date or Store) then the measure should aggregate the Max of all Items under this date or store.
Due to the business rules and structure of the database Store and Item are different dimensions so I can not include them in 1 Hierarchy.
We have been playing around with Custom RollUps but so far haven't been able to get it to work.
Thanks

I would solve this by using a more traditional approach to your fact table. Instead of keeping a cumulative count in the ReceivedComments column, I would keep only the number of comments received THAT DAY.
That way, instead of using MAX, you can create your measure using SUM, and it will automatically rollup when you go to higher levels.
The only disadvantage I can see to this approach is that you will need to use a range of dates, instead of only the most recent date, to get a full total of all the comments for a given item/store/date. But that's a very small change to your MDX.
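A minimal Python sketch of that conversion, using the Item 5 / Store 5 example from the question plus one hypothetical second item. It assumes the first row of each item/store pair carries its opening total:

```python
# Hypothetical fact rows: (item_id, store_id, day, cumulative_comments).
cumulative = [
    (5, 5, 1, 23),
    (5, 5, 2, 28),
    (6, 5, 1, 10),
    (6, 5, 2, 10),
]

def to_daily(rows):
    """Turn running totals into per-day counts for each (item, store) pair."""
    previous = {}
    daily = []
    for item, store, day, total in sorted(rows):
        key = (item, store)
        daily.append((item, store, day, total - previous.get(key, 0)))
        previous[key] = total
    return daily

daily = to_daily(cumulative)
store_total = sum(count for _, store, _, count in daily if store == 5)
print(daily)
print(store_total)  # a plain SUM now rolls up correctly at the store level
```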

Someone suggested using ISLEAF to determine the level. Instead of using ISLEAF, I went with AS CASE WHEN [Item].[ItemID].CURRENTMEMBER.LEVEL IS [Item].[ItemID].[(All)], so I don't have to account for other dimensions such as Date, Store, etc., as I have several other dimensions that all behave the same way.
I then went with this formula to determine the Sum of the Max of the items in a particular store:
SUM({[Item].[Item ID].children},[Measures].[ReceivedComments])
I expect some performance issues with this measure, but we are currently running tests to see whether it is reliable enough to use on actual data.
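The "Sum of the Max" rollup can be illustrated in Python. This is only a sketch of the arithmetic, not of how SSAS evaluates the MDX; the rows are hypothetical and already restricted to a single store:

```python
# Hypothetical rows for one store: (item_id, day, cumulative_comments).
store_rows = [
    (5, 1, 23),
    (5, 2, 28),
    (6, 1, 10),
    (6, 2, 10),
]

def sum_of_item_max(rows):
    """Take the MAX per item, then SUM those maxima across items."""
    best = {}
    for item, _day, total in rows:
        best[item] = max(best.get(item, total), total)
    return sum(best.values())

plain_max = max(total for _, _, total in store_rows)
print(plain_max)                    # what MAX alone returns at store level
print(sum_of_item_max(store_rows))  # the intended store-level figure
```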