How to calculate tables with percentages - pivot-table

I am having trouble creating a table in qliksense expressed in percentages. My raw data are like the following:
raw data:
I want to create a table that shows the number of new people per year among the different ages, both in absolute values and in percentage (of the total new students of each year).
I am using the following code to calculate the absolute value
Count(IF(NUM=1 and [NEW STUDENT]='YES', ((NAME))))
But I am really having trouble to calculate the same value in percentage. I studied the function TOTAL, but it does not work with IFS, so I tried with something like this after navigating on different forums
Count(IF(NUM=1 and [NEW STUDENT]='YES', ((NAME))))
/
Count(TOTAL {<[NEW STUDENT] = {'YES'}>} {<[NUM] = {1}>} NAME)
But it still does not work

please try following expression in table/chart:
Count({<[NEW STUDENT]={'YES'}, [NUM]={1}>} NAME)
/
Count({<[NEW STUDENT]={'YES'}, [NUM]={1}, [NAME]= >} NAME)
Please provide desired outcome table from included sample data so I can test it (there is also ready options like "show in percent" which can be used)

Related

Google Data Studio : how to obtain a SUM related to a COUNT_DISTINCT?

I have a dataset including 3 columns :
ID transac (The unique ID of the transaction - Dimension)
Source (The source of the transaction - Dimension)
Amount € (The amount of the transaction - Stat)
screenshot of my dataset
To Count the number of transactions (for one or more sources), i use COUNT_DISTINCT function
I want to make the sum of the transactions amounts (for one or more sources). But i don't want to additionate the amounts of the transactions with the same ID !
Is there a way to do this calcul with a DataStudio function ?
Thanks for your answers. :-)
EDIT : I saw that we could do this type of calculation via SQL here and I would like to do this in DataStudio (so that I don't have to pre-calculate the amounts per source.)
IMO, your dataset contains wrong data. Each value should be relative only to that line, but this is not the case: if the total is =20, each line should describe the participation of that line to the total. With 4 sources, each line should be =5 or something else that sums 20.
To solve it in DataStudio, you need something like CALCULATE function in PowerBI, but currently DataStudio doesn't support this feature.
But there are some options to consider to repair your data:
If you're sure there are always 4 sources, just create a new calculated field with the expression Amount/4 and SUM it. It is not an elegant solution, but it works.
If your data source is Google Sheets, you can easily repair the data using formulas, like in this example:
Link to spreadsheet
For this spreadsheet, I used this formula in adjusted_amount column: =C2/COUNTIF(A:A,A2). With this column in DataStudio, just use the usual SUM aggregation function to summarize it correctly.

Vlookup from multiple criteria to display nearest answer

I was hoping someone can help me. I have hit a solid wall.
I have a table with product information included and I am building a calculator which should spit out a number of options based on set criteria which is in the table. I am failing at just pulling through a code. I feel rather embarassed asking about how to do a vlookup here. But basically I have a vlookup which depends on multiple criteria and for the calc to cough out the nearest match (if applicable) based on this criteria.
Criteria 1 = Product
Criteria 2 = Type
Criteria 3 = Height
Criteria 4 = Min
I have created a search key in the table to concatenate all of these columns and then done a vlookup, which is =Vlookup(Criteria1 & Criteria2 & Criteria3 & Criteria4, Table Data, Code Required) But this does not appear to be giving me results, it either coughs out an error or the incorrect product. Below is my data and my calc I am hoping to complete. Can someone please help?
Here is an example looking for a closest match on Min. It demonstrates the principle so you can extend.
The closest match formula part is:
MATCH(MIN(ABS(E2:E4-K2)),ABS(E2:E4-K2),0))
Column E for column with Min values in. And K2 for target Min. This is an array formula entered with Ctrl + Shift+Enter. You would adjust the range of E2:E4.
The multiple criteria part is using:
=MATCH(lookup_value_1&lookup_value_2&lookup_value_3, lookup_array_1&lookup_array_2&lookup_array_3, match_type)
Where you are concantenating your parameters and searching for a match of the concatenation of those parameters in the table (you could do this against the key column if the key is made up of the same parameters.)
Overall formula with some test data (using one estimate figure):
=INDEX(F:F,MATCH(K1&K5&J5&INDEX(E2:E4,MATCH(MIN(ABS(E2:E4-K2)),ABS(E2:E4-K2),0)),B:B&C:C&D:D&E:E,0))
Above entered combined formula remember is an array formula so entered with Ctrl+Shift+Enter . You can reduce the ranges from entire columns to only those rows holding data.
Data data:
I am not typing all that out from picture so here is a quick n dirty
I tried with the QHarr's solution but it didn't work with all the rows.
My solution is:
Add a column with:
=IF(E2 < $K$2, E2, 0) and copy for all rows
In L5 create the formula:
{=INDEX(F2:F19,MATCH($K$1&K5&$J$5&INDEX(E2:E19,MATCH(MAX(SI(B2:B19=$K$1,1,0)*IF(C2:C19=K5,1,0)*IF(D2:D19=$J$5,1,0)*G2:G19,0),E2:E19,0)),B2:B19&C2:C19&D2:D19&E2:E19,0))}
Copy the formula to L6 and L7
Excel exercise printscreen
Originally marked this as answered and it did work initially but as I added more products it began to fail. I did manage to (after much trial and error) find a simple solution {=INDEX(Calc!$I$2:$I$189,MATCH(Output!$H$7,IF(Calc!$B$2:$B$189=Output!A12,Calc!$H$2:$H$189),1))}

How do I calculate the percentage of a count function?

I am trying to take the percentage of a count function so to create a MS BIDS report resembling this excel file:
Excel Close Rate Summary
The unique identifier for the opportunities is the field "opportunityid", so I am using COUNT(Fields!opportunityid.Value) to determine the number of cases in each stage. I want to write an expression that will return the percentage of cases in each stage per creation month. Which can be seen in the above excel screenshot.
This is my current MS BIDS report when i preview it
To be more specific, I want to have the percentage of "Active" and "New" opportunities in January to represent 67% and 33% respectively. 67% comes from 4/6. The 4 comes from the active opportunities out of the 6 opportunities created in January. Likewise, the 33% comes from the 2 new opportunities out of the 6 that were created in January.
There are more stage names than Active and New. Other options include New, Warm, Hot, Implementation, Active, Hibernate or Canceled. This is relevant to mention because I have tried to create an expression that counts based on the number of opportunities with a specific stage name, but have been unsuccessful.
Currently the expression I am using to calculate the percentage is:
=COUNT(Fields!new_rptstage.Value)/SUM(COUNT(Fields!opportunityid.Value),"GroupbyStageName")
Based on this expression, I am only able to get 1/1 or 100% for each of the stage names. I have tried a bunch of variations of the above expression by changing the scope, but have been unsuccessful in getting the desired results. Can someone explain how to correct this?
SAMPLE DATA:
In the sample data, I want the expression to be in the percentage column. The percentage should be the # of cases in a particular stage for the total cases that month. So looking at the above picture:
Active February 54 54/168 [have 54/168 display as a percentage]
Warm February 8 8/168
etc.
EDIT:
These are the expressions that may help show the underlying data in the chart.
The creation month expression is
=Fields!MonthCreated.Value & " " & year(Fields!createdon.Value)
The percent expression is listed above.
You don't want to use the COUNT() function. COUNT(*) returns a count of the number of rows that have a value. It doesn't return the actual value.
Since you've only showed a screen shot of your report, I don't know how your underlying data columns relate to it, but what you want to do for your Percent column expression is this:
This is psuedo code because I don't know your dataset field names:
CaseCount.Value / SUM(CaseCount.Value)
EDIT: Now that I better understand how your data relates to your report, I think the only change you need to make to your existing formula is casting it to a decimal type. It's probably rounding all fractions up to 1.
Try this for the expression in your percentage column:
=CDbl(COUNT(Fields!new_rptstage.Value))/CDbl(SUM(COUNT(Fields!opportunityid.Value),"GroupbyStageName"))

Creating custom rollups with SSAS

I am currently working on a requirement as follows and would appreciate some help in figuring out a way to configure the aggregation of my measure:
I have a fact table that contains the following Item ID, DateID,StoreID, ReceivedComments. The way received comments work is that on a daily basis a new record is created that adds to the value of received comments (for example if Item 5 in Store 5 on 1 Jan had 23 Received Comments and it received 5 comments the following day, the row for Jan 2 would be Item 5, Store 5, Jan 2, 28)
We created a measure using MAX and it works fine whenever Item ID is used in the query. When we start moving to a higher level the max produces wrong results. Our requirement is to setup the measure to be as follows:
If the member selected is on the Item Level then MAX, if it's on any other level (Date or Store) then the measure should aggregate the Max of all Items under this date or store.
Due to the business rules and structure of the database Store and Item are different dimensions so I can not include them in 1 Hierarchy.
We have been playing around with Custom RollUps but so far haven't been able to get it to work.
Thanks
I would solve this by using a more traditional approach to your fact table. Instead of keeping a cumulative count in the ReceivedComments column, I would keep only the number of comments received THAT DAY.
That way, instead of using MAX, you can create your measure using SUM, and it will automatically rollup when you go to higher levels.
The only disadvantage I can see to this approach is that you will need to use a range of dates, instead of only the most recent date, to get a full total of all the comments for a given item/store/date. But that's a very small change to your MDX.
Someone suggested using ISLEAF to determine the level, Instead of using ISLeaf i went with AS CASE WHEN [Item].[ItemID].CURRENTMEMBER.LEVEL IS [Item].[ItemID].[(All)] so I don't have to account for other dimensions such as Date, Store, etc as I have several other dimensions that all behave the same way.
And then I went with this formula to determine the Sum of the Max of the items in a particular store like this:
SUM({[Item].[Item ID].children},[Measures].[ReceivedComments]), Now I expect some performance issues with this measure but we are currently running some tests to see if it's gonna be reliable to work with it on actual data.

couchdb map / reduce multiple keys filtering by date

I have a view setup with a map reduce. Right now this code works great:
function(doc) {
if (doc.type == 'test'){
if(doc.trash != 1){
for (var id in doc.items) {
emit([id,doc.items[id].name], 1);
}
}
}
}
function(keys,prices){
return (keys, sum(prices));
}
I get a return and when using the group parameter, it condenses everything just fine.
My issue/question, I want to add a third key.... DATE, so I may only reduce records from certain dates. So for example:
function(doc) {
if (doc.type == 'test'){
if(doc.trash != 1){
for (var id in doc.items) {
emit([date,id,doc.items[id].name], 1);
}
}
}
}
My issue is that since date is at the beginning of the array, the reduce groups by date, id etc. I know I use group_level and say just take the first key from the array or the first 2 keys, but that doesn't help either because afaik, group_level goes from left to right in the array. I could put the date on the end of the emit array, but that doesn't help either because I need to have values at the beginning of my startkey and endkey to search on.
Here is an example of the output of data:
{"key":["2012-03-13","356752b8a5f6871f3","Apple"],"value":1},
{"key":["2012-03-20","123752b8a76986857","Pear"],"value":1},
{"key":["2012-04-12","3013531de05871194","Grapefruit"],"value":1},
{"key":["2012-04-12","356752b8a5f6871f3","Apple"],"value":1},
I want APPLE to be added up in one row, here it's adding up apples by date first. I was able to successfully just add up all the apples if I remove DATE as the first key in the array, but then I can't search by date range.
Any ideas on how to accomplish this?
If I correctly understand what you want to do, then you'd want to put the date as the first element of your array, and use group_level as well as start_key and end_key.
Eg. startkey=[1, "someid"] endkey=[1,"someid",{}] group_level=2
Will get you all items from date 1 (obviously choose your own format here), with id "someid" and any name. It seems funny that you emit id's before names, and without having more information about what you're actually trying to accomplish, it's hard to advise your general data model. If ID is a "type" id meaning that many items share the same ID then this makes sense. If ID is a unique per item ID, then it does not. In that case, you'd want to emit "name" before ID...
Edit 1
As per your comment, to do a range of dates you do this:
startkey=[1] endkey=[5,{}] group_level=2
You will get everything from date 1 to date 5 grouped by id ie. apples, oranges etc. I use this exact technique in a very large scale production application. I actually formatted the dates as an easily human readable integers of the format yyyymmdd, so 20140624 would sort to the top. If I want everything from the start of the month till now grouped by my group ids, I call
startkey=[20140601] endkey=[20140624,{}] group_level=2
It works perfectly and as far as I can tell that's what you're looking to do. I also have a third key layer "detail" which allows me to provide a deeper level of grouping for items that need it. I can then call
startkey=[20140601, "someid"] endkey=[20140624, "someid",{}] group_level=3
To drill to the detail level for a particular id, or just use the previous query with group_level=3 if I want the details for every id. I'm certain you can make this work - I've solved this exact problem in a production application using the techniques described.
Edit 2
If you want to group all apples regardless of date, then you'll need to let apples be the first element in the key. You can then get all apples over all time as a single row in the view result using group_level=1, and Apples over a date range using group_level=2. The difference here is that you'll only be able to do the group_level=2 query on a single item type at a time. If you want the best of both worlds, you unfortunately just need to make 2 views. That's just how key ordering works... If you need fast response times for both types of queries, all item types over a date range, and all of a particular item not grouped by date, I believe 2 views is the only way to achieve that.
Note
Another thing to note is about your reduce function. Wherever possible it is highly recommended that you use the built in reduce functions. They're implemented in erlang and are highly optimized compared to custom javascript reduce functions.
In your case, just replace your reduce function with this
_sum
Easy hey?
If you post more info about your application, data model etc. then I'd be happy to help out more with your database design.

Resources