Google Data Studio - Custom Aggregation formula

I am building a report in Google Data Studio, and I have run into a problem with the aggregation for a couple of metrics.
As an example, I have the following table:
name | M1 | M2 | M3
A    | 23 | 45 | 1,9
B    | 45 |  6 | 0,1
C    | 23 | 45 | 1,9
D    | 12 | 34 | 2,8
E    |  4 |  2 | 0,5
Where
M3 = M2/M1
Now, when I display this in a table in GDS, the totals for M1 and M2 are the sums of the values, which is fine. However, I can only choose between fixed aggregation operations, and the total for M3 should be:
Total M3 = sum(M2)/sum(M1)
Any idea if this is possible?

You can create your own calculated field and define M3 there as a ratio of aggregations, for example SUM(M2)/SUM(M1):
https://support.google.com/datastudio/answer/6299685?hl=en
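To see why the total has to come from the sums rather than from the per-row ratios, here is a minimal Python sketch using the five rows from the question. The variable names are illustrative only and are not part of Data Studio.

```python
# Rows from the question: (name, M1, M2)
rows = [("A", 23, 45), ("B", 45, 6), ("C", 23, 45), ("D", 12, 34), ("E", 4, 2)]

# Per-row metric, M3 = M2 / M1
m3_per_row = {name: m2 / m1 for name, m1, m2 in rows}

# Total M3 computed as a ratio of sums, as the question asks for
total_m1 = sum(m1 for _, m1, _ in rows)
total_m2 = sum(m2 for _, _, m2 in rows)
total_m3 = total_m2 / total_m1

# Summing the per-row ratios instead gives a different (larger) number
sum_of_ratios = sum(m3_per_row.values())

print(f"sum(M2)/sum(M1) = {total_m3:.2f}")
print(f"sum of row M3   = {sum_of_ratios:.2f}")
```

The two printed values differ, which is why a fixed aggregation such as SUM or AVG applied to the row-level M3 cannot produce the desired total.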

Related

How to use pivoted column values from a Matrix in another Tablix and write expressions on top of them

I have one dataset with the fields StudentId, Name and Address, which is used in one Tablix.
I also have another dataset with the fields StudentID, Subject and Marks, and I am using a Matrix to pivot it in the report.
I am able to produce the report in this way:
StudentID  Name   Address  Maths  Physics  Chemistry  Median
1          Mike   NJ       85     70       90         2
2          David  CA       81     85       90         1
I calculate Median by counting the number of subject marks greater than 80.
Now, how do I use the value of Median in the Tablix instead of in the Matrix?
Below is the expected output format:
StudentID  Median  Name   Address  Maths  Physics  Chemistry
1          2       Mike   NJ       85     70       90
2          3       David  CA       81     85       90
Note: I am using a Matrix to pivot the Subject column in the SSRS report. I am pivoting in SSRS instead of in the stored procedure because I get 40 columns after pivoting in the SP and would need to physically map all 40 columns. In this example I have only given 3 columns (Maths, Physics and Chemistry).
Also, please let me know whether the expected output format is possible at all.
Is there any way to pivot the Subject columns inside the Tablix itself instead of using another Matrix?
Thank you.
There are typically two ways to go about an aggregation like this. If you stick with the two existing datasets, you'll have to use the Lookup or LookupSet functions to get data from the other dataset. For example, if your table/matrix is using the second dataset as its source, you would Lookup the Name of each student. Keep in mind that this is not efficient for large reports.
The other approach, which I would recommend, is to join these two datasets in SQL and use that as the data source for the report. This is more efficient and makes the report simpler to maintain.
It's good that you are letting the report do the pivoting for you; it works much better that way.
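The join-then-pivot approach can be sketched outside SSRS to show the shape of the result. The Python below is only an illustration, not SSRS or SQL code: the `build_report` helper and the sample rows are hypothetical stand-ins for the two datasets, and "Median" follows the question's naming (a count of marks above 80).

```python
# Hypothetical rows standing in for the two report datasets
students = [
    {"StudentID": 1, "Name": "Mike", "Address": "NJ"},
    {"StudentID": 2, "Name": "David", "Address": "CA"},
]
marks = [
    {"StudentID": 1, "Subject": "Maths", "Marks": 85},
    {"StudentID": 1, "Subject": "Physics", "Marks": 70},
    {"StudentID": 1, "Subject": "Chemistry", "Marks": 90},
    {"StudentID": 2, "Subject": "Maths", "Marks": 81},
    {"StudentID": 2, "Subject": "Physics", "Marks": 85},
    {"StudentID": 2, "Subject": "Chemistry", "Marks": 90},
]


def build_report(students, marks, threshold=80):
    """Join marks to students on StudentID and pivot subjects into columns."""
    by_id = {s["StudentID"]: dict(s, Median=0) for s in students}
    for m in marks:
        row = by_id[m["StudentID"]]
        row[m["Subject"]] = m["Marks"]   # pivot: one column per subject
        if m["Marks"] > threshold:
            row["Median"] += 1           # count of marks above the threshold
    return list(by_id.values())


report = build_report(students, marks)
for row in report:
    print(row)
```

Because each output row already carries the student fields, the pivoted subjects and the count, a single data region can display all of them without a cross-dataset lookup.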

Snowflake run multiple MERGE statements on same table

I have a very large table (4+ billion rows) structured as follows :
user_id month nb_logins
-----------------------------
1 01 0
1 02 1
1 03 4
1 04 0
... ... ...
2 01 5
2 02 0
2 03 0
2 04 1
I would like to add a new column which is simply the cumulative sum of nb_logins partitioned by user_id and ordered by month.
I used to compute the whole thing in one query, but I decided to parallelize this because each user is independent (i.e. I can compute the cumulative sum for user 1 and user 2 in parallel).
Now, to parallelize, I've created a list of "partitions" based on user_id (it's an evenly balanced int). Let's say I have the following two partitions:
user_id between 0 and 10
user_id between 11 and 20
So I run two MERGE requests in parallel, one for each partition, however I see that most of the requests are BLOCKED in Snowflake because they try to write onto the same micro-partition.
Question : what would be the best way to parallelize this operation ?
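For reference, the computation itself (a running total of nb_logins per user_id, ordered by month, evaluated separately for each user_id range) can be sketched in Python. This only illustrates the logic and the fact that the ranges are independent; it says nothing about how Snowflake locks the table. The function name is hypothetical, and months are written as integers here rather than the zero-padded values shown above.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (user_id, month, nb_logins)
rows = [
    (1, 1, 0), (1, 2, 1), (1, 3, 4), (1, 4, 0),
    (2, 1, 5), (2, 2, 0), (2, 3, 0), (2, 4, 1),
]


def cumsum_for_range(rows, lo, hi):
    """Running total of nb_logins per user, for lo <= user_id <= hi."""
    subset = sorted((r for r in rows if lo <= r[0] <= hi), key=itemgetter(0, 1))
    out = []
    for user_id, group in groupby(subset, key=itemgetter(0)):
        running = 0
        for _, month, nb_logins in group:
            running += nb_logins
            out.append((user_id, month, nb_logins, running))
    return out


# Two disjoint user_id ranges, computed independently of each other
part_a = cumsum_for_range(rows, 0, 1)
part_b = cumsum_for_range(rows, 2, 2)

# Concatenating the partial results matches the single-pass result
whole = cumsum_for_range(rows, 0, 2)
print(part_a + part_b == whole)
```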

How should I format/set up my dataset/data frame? And factor -> numeric problems

New to R and new to this forum. I tried searching, and I hope I don't embarrass myself by failing to identify previous answers.
So I have my data, and I intend to fit some kind of GLMMs in the end, but that's far away in the future. First I'm going to fit some simple GLMs/LMs to learn what I'm doing.
First, about my data:
I have data sampled from 2 "general areas" on opposite sides of the country.
In each of these general areas there are roughly 50 trakts placed (in a grid, with a random starting point).
Trakts have been revisited each year for 4 years.
A trakt contains 16 sample plots. I intend to work at trakt level, so I use the means of the 16 sample plots for each trakt.
2x4x50 = 400 rows (the actual number is 373 rows after removing trakts where not enough plots could be sampled due to terrain etc.)
The data in my Excel file is currently organized like this:
Rows = trakts
Columns = the measured variables
I have 8-10 columns I want to use.
Short example of how the data looks now:
V1 - predictor, 4 different columns
V2 - response variable = proportional data, 1-4 columns depending on which hypothesis I end up testing
The GLMM in the end would look something like (V2~V1+V1+V1,(area,year))
Area  Year  Trakt  V1           V2
A     2015  1      25.165651    0
A     2015  2      11.16894652  0.1
A     2015  3      18.231       0.16
A     2014  1      3.1222       N/A
A     2014  2      6.1651       0.98
A     2014  3      8.651        1
A     2013  1      6.16416      0.16
B     2015  1      9.12312      0.44
B     2015  2      22.2131      0.17
B     2015  3      12.213       0.76
B     2014  1      1.123132     0.66
B     2014  2      0.000        0.44
B     2014  3      5.213265     0.33
B     2013  1      2.1236       0.268
How should I get started on this?
8 different files?
Nested by trakts (do I start nesting now, or later when I'm fitting GLMMs?)
I load my data into R with the read.table function.
If I run sapply(dataframe, class),
V1 and V2 are factors and everything else is integer.
If I run sapply(dataframe, mode),
everything is numeric.
So finally, to my actual problems: I have been trying to run normality tests (I have only tried Shapiro so far), but I keep getting errors that imply my data is not numeric.
Also, when I run a normality test, do I run only one column and evaluate it before moving on to the next column, or should I run several columns? The entire dataset?
Should I in my case run independent normality tests for each of my areas and years?
Hope it didn't end up too cluttered.
Best regards

Can Someone Normalize this table?

I have this table to normalize for a uni project. Every time I think it should just be two tables, I then think no, it should be three... I am going to throw this out to you guys with superior knowledge, as maybe you can indicate the best way it should be done and why.
Number  Type  Single rate  Double rate  Family rate
1       D     56           72
2       D     56           72
3       T     50           72
4       T     50           72
5       S     48
6       S     48
7       S     48
8       T     50           72
9       T     50           72
10      D     56           72
11      D     56           72
12      D     56           72
13      D     56           72
14      F     56           72           84
15      F     56           72           84
16      S     48
17      S     48
18      T     50           72
20      D     56           72
Many thanks to anyone who can help me see the correct way.
It is not possible to produce a correct table design unless one understands exactly what the columns mean and how the data columns depend on one another. However, here is an attempt that can be refined once you provide more information. The naming is not as good as I'd like it to be but, as I said, the purpose is not clear from the question. Anyway, this is a start; I hope it helps.
Also note that normalization is not always required for all types of applications. For example, Business Intelligence could use schemas that are deliberately not fully normalized (e.g. Star Schema). So the database design may sometimes depend on the nature of the application and how the data changes.
Main
----
MainID                  int      PK
MainTypeID              Char(1)  Example: D, T, S etc.
MainRateIntersectionID  Int

MainRateIntersection
--------------------
MainRateIntersectionID  int      PK
MainID                  int
RateCategoryID          int
The combination of MainID and RateCategoryID should be constrained
using UNIQUE INDEX

RateCategory
------------
RateCategoryID    int           PK
RateCategoryText  Varchar2(15)  Not Null  Example: Single, Family, etc.
RateValue         Int           Nullable

MainType
--------
MainTypeID  Char(1)  PK
Edit
Based on the new information, I have revised the model. I have removed the 'artificial' IDs since this is a training project for normalization. Artificial IDs (surrogate keys) are fine to add, but I guess they are not your objective. I had to add a bookings table, in which a row is inserted for each customer who makes a booking. You need to add the appropriate customer information to that table. The table you provided is more of a logical view that could be returned from a query than a physical table to store and update in the database. Instead, the bookings table should be used.
I hope this could help you.
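One way to explore how the columns depend on one another is to test the sample rows directly. The Python sketch below is not the model from this answer; it only checks whether, in the rows given, the rates are determined by Type alone, and if so splits the data into a room table and a rate-per-type table. It assumes the rate values fill the Single, Double and Family columns from left to right, and the `decompose` helper is hypothetical.

```python
# Sample rows from the question: (Number, Type, Single, Double, Family).
# None marks a blank cell; the left-to-right column reading is an assumption.
D, T, S, F = (56, 72, None), (50, 72, None), (48, None, None), (56, 72, 84)
rows = [
    (1, "D", *D), (2, "D", *D), (3, "T", *T), (4, "T", *T), (5, "S", *S),
    (6, "S", *S), (7, "S", *S), (8, "T", *T), (9, "T", *T), (10, "D", *D),
    (11, "D", *D), (12, "D", *D), (13, "D", *D), (14, "F", *F), (15, "F", *F),
    (16, "S", *S), (17, "S", *S), (18, "T", *T), (20, "D", *D),
]


def decompose(rows):
    """Split rows into Number -> Type and Type -> rates, if Type determines rates."""
    room = {}
    room_type = {}
    for number, rtype, *rates in rows:
        rates = tuple(rates)
        # setdefault keeps the first rates seen for a type and returns them
        if room_type.setdefault(rtype, rates) != rates:
            raise ValueError(f"rates for type {rtype!r} are not consistent")
        room[number] = rtype
    return room, room_type


room, room_type = decompose(rows)
print(len(room), "rooms,", len(room_type), "distinct types")
print(room_type)
```

If the check passes for the real data as well, the repeated rate values only need to be stored once per type; whether that holds beyond these sample rows depends on the business rules, which the question does not state.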

multiple criteria for sumifs

I have two sheets: one is input and the other is master.
A snapshot of my input sheet is shown below:
Workers Name | WEEK | working hrs
a11          | w1   | 40
a22          | w5   | 30
a33          | w9   | 10
a44          | w10  | 80
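A snapshot of my master sheet is shown below (NB: the workers' names are unique):
Workers Name | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10
a11          | 40 |    |    |    |    |    |    |    |    |
a22          |    |    |    |    | 30 |    |    |    |    |
a33          |    |    |    |    |    |    |    |    | 10 |
a44          |    |    |    |    |    |    |    |    |    | 80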
I want a SUMIFS loop that fills in each worker's working hours in my master table under the respective week (w1 to w13).
I am using a SUMIFS formula for this:
Sheets("Master").Range("B2:B" & Range("A" & Rows.Count).End(xlUp).Row).Formula = "=SUMIFS(Input!C32,Input!C37,Master!C1,Input!C31,Master!B1)"
Any suggestions on how to loop it in VBA?
Please help.
I'm sorry, but I'm not sure you need VBA. If you know what your row and column criteria are in Master, you can write something like
=SUMIFS(workinghrs,workersname,$A2,week,B$1)
Then fill it down and across for your grid.
NB: I've used named ranges for your input sheet for readability.
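What the filled-down, filled-across SUMIFS grid computes can be sketched in Python: each cell is the sum of working hours for one worker and one week. The `build_master` helper is hypothetical, and the sketch only mirrors the logic; it is not a replacement for the worksheet formula.

```python
from collections import defaultdict

# Input sheet rows from the question: (worker, week, working hours)
input_rows = [("a11", "w1", 40), ("a22", "w5", 30), ("a33", "w9", 10), ("a44", "w10", 80)]


def build_master(input_rows, weeks):
    """Return {worker: [hours per week]}, one list entry per week column."""
    totals = defaultdict(int)
    for worker, week, hours in input_rows:
        totals[(worker, week)] += hours   # same idea as SUMIFS on two criteria
    workers = sorted({worker for worker, _, _ in input_rows})
    return {
        worker: [totals.get((worker, week), 0) for week in weeks]
        for worker in workers
    }


weeks = [f"w{i}" for i in range(1, 11)]   # w1 .. w10, as in the master snapshot
master = build_master(input_rows, weeks)
for worker, hours in master.items():
    print(worker, hours)
```

Cells with no matching input row come out as 0, which is also what SUMIFS returns when nothing matches.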
