How can I get Tableau to pull a specific data point?

I am new to Tableau and am trying to figure out how to isolate a row of data in a calculated field.
For example, to get a certain piece of data for a certain year only and then use that for calculations.
I want to use a calculated field to find the percentage change in the undocumented population between 2000, 2010, and 2018.
I have tried the following just to isolate the value for 2018:
(IF STR([Year])='2018' THEN [Undocmented Population] ELSE NULL end)
and also this, but Tableau complains that the inner expression needs to be an aggregate:
{ FIXED [State]: IF STR([Year])='2018' THEN [Undocmented Population] ELSE NULL end}
I just want to isolate the data points for the various years per state so that I can perform a simple percentage-change calculation, and I cannot seem to figure out how to do it...
[Screenshot: Tableau excerpt]
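One way around the aggregation error (a sketch, not part of the original post) is to wrap the IF inside an aggregate such as MAX, which is what the FIXED expression requires; the field names below are copied from the question:
{ FIXED [State] : MAX(IF STR([Year]) = '2018' THEN [Undocmented Population] END) }
With one such field per year (say [Population 2010] and [Population 2018], names invented here), the percentage change is then:
([Population 2018] - [Population 2010]) / [Population 2010]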

Related

Google Data Studio with BigQuery Data Source Issue in Calculated Fields and Aggregation

I have a Google Data Studio dashboard that loads really slowly since it uses Google Sheets as a Data Source. I migrated the same data to BigQuery and used it as my new Data Source; however, I came across an issue:
When creating a calculated field, the new calculated field is not tagged as Auto in the Default Aggregation; I still have to select Sum as the Default Aggregation, which causes problems in my report. It also isn't blue: normal fields are shown as green, while calculated fields are shown as blue.
When I was using Google Sheets, I could do direct computations in the calculated fields.
Example:
Handle Time = Talk Time / Number of calls
I just create a calculated field called Handle Time, then put the formula Talk Time / Number of calls
Now, I need to create 3 separate Calculated Fields:
Calculated Field 1: SUM(Talk Time)
Calculated Field 2: SUM(Number of calls)
Calculated Field 3: Calculated Field 1 / Calculated Field 2
This is even though I already tagged them as Sum in the Default Aggregation. Can anyone help me understand what I'm doing wrong?
Solution:
A single calculated field will do the trick; the aggregation of each respective field needs to be stated explicitly in the calculated field:
SUM(Talk Time) / SUM(Number of calls)
Why the Change?
To elaborate, the change was part of the Data Modeling update of 31st October 2020. One benefit of explicitly stating the aggregation is greater flexibility: fields can be aggregated as required when creating a calculated field, for example something like:
MAX(Talk Time) - MIN(Talk Time) / COUNT(Handle Time) * AVG(Handle Time) / COUNT_DISTINCT(Text_Field1) * COUNT(Text_Field2)
Speed
Regarding speed: where the data set is large and static (daily updates are fine and real-time data is not required), a Data Extract would be a good option.
Dimensions are shown as green; metrics are shown as blue. Data imported from other sources, particularly from Google Sheets, tends to show metrics as green, but when you add them to a chart or table they get recognised as metrics and change to blue.

ValueFilter for DateTime Attributes

I'm working with the Blog app and I see how to filter the blog posts by year using the Visual Query Designer. I use the querystring value that has the year in the ValueFilter, and my properties are as follows:
Attribute: PublicationMoment
Value: [QueryString:year]-01-01 and [QueryString:year]-12-31
Operation: between
How would I get the posts from a specific month and year if those values are passed via querystring parameters? Because the months of the year have a varying number of days, I'm not sure how to accomplish this in the Value field of the ValueFilter. Currently I'm passing the two-digit month as the parameter.
I tried something like: [QueryString:year]-[Querystring:month]
Operation: contains
but the above operation doesn't really work because the datatype is a DateTime object.
I could do it in the razor view, but I'm afraid the paging datasource would have too many pages in it, since it would be based on the larger subset of posts for the whole year passed in the querystring parameter.
Is there any way to do this with the filter?
Basically dates are not perfectly handled yet, but there are a few ways to do it using the visual query:
Use the correct dates in the query, like between [QueryString:Start] and [QueryString:End], and calculate the correct dates where you generate the links
Since your main problem with the "between" filter is actually that it would include the last day too, you could also use two filters: one >= the first date and another < the second date. The first date would be the year/month with day 1; the second would be the following month, also with day 1.
Last but not least: if you do it with Razor and LINQ you shouldn't run into any performance issues - it's technically the same thing the pipeline does, and it's been tested to perform well with tens of thousands of records.
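To illustrate the >= / < idea in code, here is a generic LINQ sketch (plain C#, not the 2sxc API; the Post type and PostsForMonth helper are invented for the example):
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical post type; PublicationMoment matches the attribute in the question
class Post { public DateTime PublicationMoment; public string Title; }

static class PostFilter
{
    public static IEnumerable<Post> PostsForMonth(IEnumerable<Post> posts, int year, int month)
    {
        var first = new DateTime(year, month, 1); // first day of the requested month
        var next = first.AddMonths(1);            // first day of the following month
        // >= first and < next sidesteps the varying number of days per month
        return posts.Where(p => p.PublicationMoment >= first && p.PublicationMoment < next);
    }
}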

How do I calculate the percentage of a count function?

I am trying to take the percentage of a count function to create an MS BIDS report resembling this Excel file:
[Screenshot: Excel Close Rate Summary]
The unique identifier for the opportunities is the field "opportunityid", so I am using COUNT(Fields!opportunityid.Value) to determine the number of cases in each stage. I want to write an expression that returns the percentage of cases in each stage per creation month, as seen in the above Excel screenshot.
This is my current MS BIDS report when I preview it:
To be more specific, I want to have the percentage of "Active" and "New" opportunities in January to represent 67% and 33% respectively. 67% comes from 4/6. The 4 comes from the active opportunities out of the 6 opportunities created in January. Likewise, the 33% comes from the 2 new opportunities out of the 6 that were created in January.
There are more stage names than Active and New. Other options include New, Warm, Hot, Implementation, Active, Hibernate or Canceled. This is relevant to mention because I have tried to create an expression that counts based on the number of opportunities with a specific stage name, but have been unsuccessful.
Currently the expression I am using to calculate the percentage is:
=COUNT(Fields!new_rptstage.Value)/SUM(COUNT(Fields!opportunityid.Value),"GroupbyStageName")
Based on this expression, I am only able to get 1/1 or 100% for each of the stage names. I have tried a bunch of variations of the above expression by changing the scope, but have been unsuccessful in getting the desired results. Can someone explain how to correct this?
SAMPLE DATA:
In the sample data, I want the expression to be in the percentage column. The percentage should be the # of cases in a particular stage for the total cases that month. So looking at the above picture:
Active February 54 54/168 [have 54/168 display as a percentage]
Warm February 8 8/168
etc.
EDIT:
These are the expressions that may help show the underlying data in the chart.
The creation month expression is
=Fields!MonthCreated.Value & " " & year(Fields!createdon.Value)
The percent expression is listed above.
You don't want to use the COUNT() function that way: COUNT() returns the number of rows that have a value; it doesn't return the actual value.
Since you've only shown a screenshot of your report, I don't know how your underlying data columns relate to it, but what you want for your Percent column expression is this (pseudocode, because I don't know your dataset field names):
CaseCount.Value / SUM(CaseCount.Value)
EDIT: Now that I better understand how your data relates to your report, I think the only change you need to make to your existing formula is casting it to a decimal type. It's probably rounding all fractions up to 1.
Try this for the expression in your percentage column:
=CDbl(COUNT(Fields!new_rptstage.Value))/CDbl(SUM(COUNT(Fields!opportunityid.Value),"GroupbyStageName"))
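If the result should then display as a percentage (as the sample data asks), one option is to keep the numeric expression and set the textbox's Format property to P0; alternatively, format inline with VB's Format function, as in this sketch:
=Format(CDbl(COUNT(Fields!new_rptstage.Value))/CDbl(SUM(COUNT(Fields!opportunityid.Value),"GroupbyStageName")), "P0")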

Creating custom rollups with SSAS

I am currently working on a requirement as follows and would appreciate some help in figuring out a way to configure the aggregation of my measure:
I have a fact table with the following columns: ItemID, DateID, StoreID, ReceivedComments. The way received comments work is that on a daily basis a new record is created that adds to the value of received comments (for example, if Item 5 in Store 5 had 23 received comments on Jan 1 and received 5 more the following day, the row for Jan 2 would be Item 5, Store 5, Jan 2, 28).
We created a measure using MAX and it works fine whenever ItemID is used in the query. When we start moving to a higher level, MAX produces wrong results. Our requirement is to set up the measure as follows:
If the member selected is at the Item level, then MAX; if it's at any other level (Date or Store), the measure should aggregate the max of all items under that date or store.
Due to the business rules and structure of the database, Store and Item are different dimensions, so I cannot include them in one hierarchy.
We have been playing around with Custom RollUps but so far haven't been able to get it to work.
Thanks
I would solve this by using a more traditional approach to your fact table. Instead of keeping a cumulative count in the ReceivedComments column, I would keep only the number of comments received THAT DAY.
That way, instead of using MAX, you can create your measure using SUM, and it will automatically rollup when you go to higher levels.
The only disadvantage I can see to this approach is that you will need to use a range of dates, instead of only the most recent date, to get a full total of all the comments for a given item/store/date. But that's a very small change to your MDX.
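If the cumulative figure is still needed after that redesign, a running total can recover it; a minimal MDX sketch, assuming a [Date].[Calendar] hierarchy and a daily-delta measure called [Measures].[Comments That Day] (both names invented):
CREATE MEMBER CURRENTCUBE.[Measures].[Comments To Date] AS
    // NULL : <member> spans from the first date up to the current member
    AGGREGATE(NULL : [Date].[Calendar].CURRENTMEMBER, [Measures].[Comments That Day]);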
Someone suggested using ISLEAF to determine the level. Instead of using IsLeaf, I went with CASE WHEN [Item].[ItemID].CURRENTMEMBER.LEVEL IS [Item].[ItemID].[(All)], so I don't have to account for other dimensions such as Date and Store, as I have several other dimensions that all behave the same way.
I then used this formula to get the sum of the max of the items in a particular store:
SUM({[Item].[Item ID].children},[Measures].[ReceivedComments])
I expect some performance issues with this measure, but we are currently running tests to see whether it will be reliable on actual data.
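Putting those fragments together, the calculated measure might look like this (a reconstruction, not verbatim from the post; dimension and measure names follow the question but may not match the real cube):
CREATE MEMBER CURRENTCUBE.[Measures].[Comments Rollup] AS
    CASE
        // above the item level: sum the per-item values
        WHEN [Item].[Item ID].CURRENTMEMBER.LEVEL IS [Item].[Item ID].[(All)]
        THEN SUM({[Item].[Item ID].CHILDREN}, [Measures].[ReceivedComments])
        // at the item level: the MAX-aggregated base measure is already correct
        ELSE [Measures].[ReceivedComments]
    END;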

CouchDB map/reduce: multiple keys, filtering by date

I have a view set up with a map/reduce. Right now this code works great:
function(doc) {
  if (doc.type == 'test') {
    if (doc.trash != 1) {
      for (var id in doc.items) {
        emit([id, doc.items[id].name], 1);
      }
    }
  }
}
function(keys, prices) {
  return sum(prices);
}
I get results back, and when using the group parameter it condenses everything just fine.
My issue/question: I want to add a third key, DATE, so I can reduce only records from certain dates. So for example:
function(doc) {
  if (doc.type == 'test') {
    if (doc.trash != 1) {
      for (var id in doc.items) {
        // assuming the document carries its date, e.g. doc.date
        emit([doc.date, id, doc.items[id].name], 1);
      }
    }
  }
}
My issue is that since the date is at the beginning of the array, the reduce groups by date, then id, and so on. I know I can use group_level to take just the first key, or the first two keys, from the array, but that doesn't help either because, as far as I know, group_level goes from left to right in the array. I could put the date at the end of the emit array, but that doesn't help either, because I need values at the beginning of my startkey and endkey to search on.
Here is an example of the output of data:
{"key":["2012-03-13","356752b8a5f6871f3","Apple"],"value":1},
{"key":["2012-03-20","123752b8a76986857","Pear"],"value":1},
{"key":["2012-04-12","3013531de05871194","Grapefruit"],"value":1},
{"key":["2012-04-12","356752b8a5f6871f3","Apple"],"value":1},
I want APPLE to be added up in one row; here it's adding up apples by date first. I was able to add up all the apples successfully by removing DATE as the first key in the array, but then I can't search by date range.
Any ideas on how to accomplish this?
If I correctly understand what you want to do, then you'd want to put the date as the first element of your array, and use group_level as well as start_key and end_key.
Eg. startkey=[1, "someid"] endkey=[1,"someid",{}] group_level=2
This will get you all items from date 1 (obviously choose your own format here), with id "someid" and any name. It seems funny that you emit IDs before names, and without more information about what you're actually trying to accomplish, it's hard to advise on your general data model. If ID is a "type" id, meaning that many items share the same ID, then this makes sense. If ID is a unique per-item ID, then it does not; in that case, you'd want to emit "name" before ID...
Edit 1
As per your comment, to do a range of dates you do this:
startkey=[1] endkey=[5,{}] group_level=2
You will get everything from date 1 to date 5, grouped by id, i.e. apples, oranges, etc. I use this exact technique in a very large-scale production application. I actually formatted the dates as easily human-readable integers of the format yyyymmdd, so 20140624 would sort to the top. If I want everything from the start of the month till now, grouped by my group ids, I call
startkey=[20140601] endkey=[20140624,{}] group_level=2
It works perfectly and as far as I can tell that's what you're looking to do. I also have a third key layer "detail" which allows me to provide a deeper level of grouping for items that need it. I can then call
startkey=[20140601, "someid"] endkey=[20140624, "someid",{}] group_level=3
To drill to the detail level for a particular id, or just use the previous query with group_level=3 if I want the details for every id. I'm certain you can make this work - I've solved this exact problem in a production application using the techniques described.
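Over HTTP, such a query looks something like this (a sketch; the database, design document, and view names are made up, and curl needs -g so the brackets aren't treated as globs):
curl -g 'http://localhost:5984/mydb/_design/app/_view/by_date?startkey=[20140601]&endkey=[20140624,{}]&group_level=2'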
Edit 2
If you want to group all apples regardless of date, then you'll need to let apples be the first element in the key. You can then get all apples over all time as a single row in the view result using group_level=1, and Apples over a date range using group_level=2. The difference here is that you'll only be able to do the group_level=2 query on a single item type at a time. If you want the best of both worlds, you unfortunately just need to make 2 views. That's just how key ordering works... If you need fast response times for both types of queries, all item types over a date range, and all of a particular item not grouped by date, I believe 2 views is the only way to achieve that.
Note
Another thing to note concerns your reduce function. Wherever possible, it is highly recommended that you use the built-in reduce functions: they're implemented in Erlang and are highly optimized compared to custom JavaScript reduce functions.
In your case, just replace your reduce function with this:
_sum
Easy hey?
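For reference, a minimal design document wiring a map like the one above to the built-in reduce (the document and view names are invented):
{
  "_id": "_design/app",
  "views": {
    "by_date": {
      "map": "function(doc) { if (doc.type == 'test' && doc.trash != 1) { for (var id in doc.items) { emit([doc.date, id, doc.items[id].name], 1); } } }",
      "reduce": "_sum"
    }
  }
}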
If you post more info about your application, data model etc. then I'd be happy to help out more with your database design.