Return the first non-zero in a column/row in Tableau

I am trying to return the first appearance of a non-zero value in each row. The value I want to return is the Fiscal Year in which each customer first bought the product.
In my data, the first "1" in each row marks the year the customer first bought, so I want to return that Year for each customer.
ID 1950 1951 1953 1955 1959 1965 1968 1972 1974 1975 1976
1 1 1 1 1 1 1
2 1
3 1 1 1
4 1 1 1 1
5 1 1
6 1
7 1
8 1 1
9
10 1 1 1 1 1
11 1 1 1 1
12 1

Use a level-of-detail (LOD) calculation. An LOD expression lets you apply a calculation, in this case MIN(), to a dataset at a given set of dimensions. You will need to decide whether to use FIXED or INCLUDE for your particular situation (they behave differently in the presence of filters). I'm assuming that your ID column is a customer ID.
{ INCLUDE [ID] : Min([Fiscal Year])}
Much more info available in the online help documents at https://onlinehelp.tableau.com/current/pro/desktop/en-us/calculations_calculatedfields_lod_overview.html.
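To make the effect of that expression concrete, here is a minimal Python sketch of the same idea outside Tableau: take the minimum Fiscal Year per ID. It assumes the data is in long format, one row per (ID, Fiscal Year) with a purchase; the sample rows and the function name are hypothetical and are not taken from the question's table.

```python
# Hypothetical long-format rows: (customer ID, fiscal year with a purchase).
purchases = [
    (1, 1955), (1, 1959), (1, 1965),
    (2, 1972),
    (3, 1968), (3, 1974), (3, 1975),
]


def first_year_per_customer(rows):
    """Return {ID: earliest fiscal year}, i.e. MIN([Fiscal Year]) per [ID]."""
    first = {}
    for cust_id, year in rows:
        # Keep the smallest year seen so far for this customer.
        if cust_id not in first or year < first[cust_id]:
            first[cust_id] = year
    return first


first_years = first_year_per_customer(purchases)
print(first_years)
```

Customers with no purchase rows simply do not appear in the result, which matches an ID with an empty row in the wide table.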

Related

Partition by with conditions

I have a table which contains info on customer purchases per year and month respectively. Here is a simplified version.
id   year   month   nb_purch
1    2001   1       1
1    2001   2       4
1    2001   3       7
...  ...    ...     ...
1    2001   12      3
1    2003   1       3
1    2003   2       2
1    2003   3       5
1    2003   4       7
...  ...    ...     ...
1    2003   12      3
2    2001   1       3
2    2001   2       2
2    2001   3       5
2    2001   4       7
Basically there are several constraints. The table contains only the years in which the client made a purchase. If the client made a purchase in year X, then X is split into 12 rows, one per month. Months with no purchases have the value 0.
What I am trying to do is retrieve the number of purchases within a certain window, currently set to 3 years. For example, I want to retrieve the sum of nb_purch within the last 3 years counting back from March 2003. This means I need to add all values from March 2001 to March 2003.
SELECT SUM(nb_purch) OVER (PARTITION BY id ORDER BY year, month ASC ROWS BETWEEN 36 PRECEDING AND CURRENT ROW) AS LAST_3_YEARS FROM T
The issue I am facing is that the table does not contain all years, so the row-based frame reaches further back than intended: in my example of purchases between 2001 and 2003, if the year 2002 is missing then I get wrong results. I would like to avoid having to add all missing years and fill them with NULL values for each customer.
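One way to sidestep the missing years, sketched here in Python rather than SQL, is to bound the window by calendar distance instead of by row count: map each (year, month) to a running month index and sum the rows whose index falls inside the window. The helper names and the window_months parameter are assumptions made for illustration. In SQL, the analogous idea would be a RANGE frame over such a month index, where the engine supports it.

```python
# Sample rows in the question's layout: (id, year, month, nb_purch).
# Year 2002 is absent for customer 1, as in the problem description.
rows = [
    (1, 2001, 1, 1), (1, 2001, 2, 4), (1, 2001, 3, 7),
    (1, 2003, 1, 3), (1, 2003, 2, 2), (1, 2003, 3, 5),
]


def month_index(year, month):
    """Map (year, month) to a running month number so gaps stay visible."""
    return year * 12 + (month - 1)


def windowed_sum(rows, cust_id, end_year, end_month, window_months):
    """Sum nb_purch over the window_months months ending at (end_year, end_month)."""
    end = month_index(end_year, end_month)
    start = end - window_months + 1
    return sum(
        nb_purch
        for cid, year, month, nb_purch in rows
        if cid == cust_id and start <= month_index(year, month) <= end
    )


# March 2001 through March 2003 inclusive is 25 calendar months.
total = windowed_sum(rows, cust_id=1, end_year=2003, end_month=3, window_months=25)
print(total)
```

Because the filter works on the month index, a missing year contributes nothing instead of pulling older rows into the frame.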

How to aggregate number of notes sent to each user?

Consider the following tables
group (obj_id here is user_id)
group_id obj_id role
--------------------------
100 1 A
100 2 root
100 3 B
100 4 C
notes
obj_id  ref_obj_id  note    note_id
-------------------------------------------
1       2                   10
1       3                   10
1       0           foobar  10
1       4                   20
1       2                   20
1       0           barbaz  20
2       0           caszes  30
2       1                   30
4       1                   70
4       0           taz     70
4       3                   70
Note: a note in the system can be assigned to multiple users (for instance, an admin could write "sent warning to 2 users" and link it to 2 user_ids). The first user the note gets linked to is stored differently from the other linked users. The note itself is linked to the first linked user only. Whenever group.obj_id = notes.obj_id, then ref_obj_id = 0 and note <> null.
I need to make an overview of the notes per user. Normally I would do this by joining on group.obj_id = notes.ref_obj_id, but here this goes wrong because ref_obj_id is 0 for the primary user (in which case I should join on notes.obj_id).
There are 4 notes in this system (foobar, barbaz, caszes and taz).
The desired output is:
obj_id  user_is_primary  notes_primary  user_is_linked  notes_linked
-------------------------------------------------------------------
1       2                10;20          2               30;70
2       1                30             2               10;20
3       0                               2               10;70
4       1                70             1               20
How can I get to this aggregated result?
I hope that I was able to explain the situation clearly; perhaps it is my inexperience but I find the data model not the most straightforward.
Couldn't you simply put this in the ON clause of your join?
case when notes.ref_obj_id = 0 then notes.obj_id else notes.ref_obj_id end = group.obj_id
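To check that this join key produces the desired output, here is a small Python sketch of the same rule applied to the notes rows from the question: a row with ref_obj_id = 0 belongs to obj_id as the primary user, and any other row belongs to ref_obj_id as a linked user. The function name and the returned structure are assumptions made for illustration.

```python
from collections import defaultdict

# Rows from the question's notes table: (obj_id, ref_obj_id, note_id).
notes = [
    (1, 2, 10), (1, 3, 10), (1, 0, 10),
    (1, 4, 20), (1, 2, 20), (1, 0, 20),
    (2, 0, 30), (2, 1, 30),
    (4, 1, 70), (4, 0, 70), (4, 3, 70),
]


def notes_per_user(rows):
    """Return {user: (primary note_ids, linked note_ids)}."""
    primary = defaultdict(list)
    linked = defaultdict(list)
    for obj_id, ref_obj_id, note_id in rows:
        if ref_obj_id == 0:
            # Same as the CASE branch: ref_obj_id = 0 -> use obj_id.
            primary[obj_id].append(note_id)
        else:
            # Otherwise the row describes a linked user.
            linked[ref_obj_id].append(note_id)
    users = sorted(set(primary) | set(linked))
    return {
        user: (sorted(primary.get(user, [])), sorted(linked.get(user, [])))
        for user in users
    }


overview = notes_per_user(notes)
for user, (prim, link) in overview.items():
    print(user, len(prim), ";".join(map(str, prim)), len(link), ";".join(map(str, link)))
```

The printed rows line up with the desired output in the question, including user 3, who has no primary notes.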

Grain of factless fact table

I am designing a factless fact table in SQL Server 14. This should be quite simple, yet I am unsure about the grain: I need to count the number of visits per day/client/team/status.
Aside from the number of visits, I need to track the number of actions done at every visit.
SELECT [VISITS_PK]
,[PERIOD_SK]
,[CLIENT_SK]
,[TEAM_SK]
,[STATUSS_SK]
,[ACTIONS_SK]
FROM [dbo].[FACT_VISITS]
Will return
VISITS_PK PERIOD_SK CLIENT_SK TEAM_SK STATUSS_SK ACTIONS_SK
1 20160515 1 1 1 1
2 20160515 1 1 1 2
3 20160515 1 1 1 3
4 20160515 2 2 1 1
5 20160515 2 2 1 2
Summary: 2 visits are done, 5 actions are done in total.
Tracking the actions allows me to use COUNT for the number of actions, yet if I want to ignore the actions and just see how many visits I got in total, do I need another fact table with another grain? I'd rather use one fact table, as the number of visits is in fact just a more aggregated view.
Edit: ACTIONS_SK contains a link to a dimension table with detailed information on the performed actions. The first 3 rows are one visit with 3 actions; the last 2 rows are one visit with 2 actions.
Instead of a row for every action, just have one row per visit, with the SUM of the actions in that visit:
VISITS_PK PERIOD_SK CLIENT_SK TEAM_SK STATUSS ACTIONS
1 20160515 1 1 1 3
2 20160515 2 2 1 2
EDIT based on new understanding of your data:
OK, I would change the table name to Fact_Actions, since that is the lowest level of granularity, and make visits an SK, like so:
VISITS_SK PERIOD_SK CLIENT_SK TEAM_SK STATUSS_SK ACTIONS_PK
1 20160515 1 1 1 1
1 20160515 1 1 1 2
1 20160515 1 1 1 3
2 20160515 2 2 1 4
2 20160515 2 2 1 5
Now you can count Actions by counting rows, and count Visits by counting DISTINCT Visits_SK values.
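As a quick illustration of the two counts on the restructured table, here is a Python sketch using the five rows shown above, reduced to the two columns that matter. The variable names are assumptions; in SQL the same idea would be COUNT(*) versus COUNT(DISTINCT VISITS_SK).

```python
# Rows from the restructured fact table: (VISITS_SK, ACTIONS_PK).
fact_actions = [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5)]

# One row per action, so the action count is the row count.
action_count = len(fact_actions)

# Several rows share a visit key, so visits are counted as distinct keys.
visit_count = len({visit_sk for visit_sk, _action_pk in fact_actions})

print(action_count, visit_count)
```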

Comparisons across multiple rows in Stata (household dataset)

I'm working on a household dataset and my data looks like this:
input id id_family mother_id male
1 2 12 0
2 2 13 1
3 3 15 1
4 3 17 0
5 3 4 0
end
What I want to do is identify the mother in each family. A mother is a member of the family whose id is equal to one of the mother_id's of another family member. In the example above, for the family with id_family=3, individual 5 has mother_id=4, which makes individual 4 her mother.
I create a family size variable that tells me how many members there are per family. I also create a rank variable for each member within a family. For families of three, I then have the following piece of code that works:
bysort id_family: gen family_size=_N
bysort id_family: gen rank=_n
gen mother=.
bysort id_family: replace mother=1 if male==0 & rank==1 & family_size==3 & (id[_n]==mother_id[_n+1] | id[_n]==mother_id[_n+2])
bysort id_family: replace mother=1 if male==0 & rank==2 & family_size==3 & (id[_n]==mother_id[_n-1] | id[_n]==mother_id[_n+1])
bysort id_family: replace mother=1 if male==0 & rank==3 & family_size==3 & (id[_n]==mother_id[_n-1] | id[_n]==mother_id[_n-2])
What I get is:
id id_family mother_id male family_size rank mother
1 2 12 0 2 1 .
2 2 13 1 2 2 .
3 3 15 1 3 1 .
4 3 17 0 3 2 1
5 3 4 0 3 3 .
However, in my real data set, I have to get the mother for families of size 4 and higher (up to 9), which makes this procedure very inefficient (in the sense that there are too many row elements to compare "manually").
How would you obtain this in a cleaner way? Would you make use of permutations to index the rows? Or would you use a for-loop?
Here's an approach using merge.
// create sample data
clear
input id id_family mother_id male
1 2 12 0
2 2 13 1
3 3 15 1
4 3 17 0
5 3 4 0
end
save families, replace
clear
// do the job
use families
drop id male
rename mother_id id
sort id_family id
duplicates drop
list, clean abbreviate(10)
save mothers, replace
use families, clear
merge 1:1 id_family id using mothers, keep(master match)
generate byte is_mother = _merge==3
list, clean abbreviate(10)
The second list yields
id id_family mother_id male _merge is_mother
1. 1 2 12 0 master only (1) 0
2. 2 2 13 1 master only (1) 0
3. 3 3 15 1 master only (1) 0
4. 4 3 17 0 matched (3) 1
5. 5 3 4 0 master only (1) 0
where I retained _merge only for expositional purposes.
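The logic of the merge can also be read as a set-membership test, sketched here in Python on the same sample data: collect every (id_family, mother_id) pair, then flag a person as a mother when their own (id_family, id) pair is in that set. The dictionary layout and variable names are assumptions made for illustration.

```python
# Sample data from the question.
people = [
    {"id": 1, "id_family": 2, "mother_id": 12, "male": 0},
    {"id": 2, "id_family": 2, "mother_id": 13, "male": 1},
    {"id": 3, "id_family": 3, "mother_id": 15, "male": 1},
    {"id": 4, "id_family": 3, "mother_id": 17, "male": 0},
    {"id": 5, "id_family": 3, "mother_id": 4, "male": 0},
]

# Equivalent of the "mothers" file: distinct (family, mother_id) pairs.
mother_keys = {(p["id_family"], p["mother_id"]) for p in people}

# Equivalent of the merge: a match on (id_family, id) marks a mother.
for p in people:
    p["is_mother"] = int((p["id_family"], p["id"]) in mother_keys)

is_mother = [p["is_mother"] for p in people]
print(is_mother)
```

Like the merge, this works for any family size because nothing depends on a person's position within the family.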

In MDX how do I do a group by count(*) for a given dimension?

In T-SQL I would just use a group by clause and a count(*) in the select statement to give me the value I need. But with cubes it's different, because the count isn't just over rows, but dimensional combinations. So I've googled for an answer to no avail. Here is a detailed explanation of my problem:
My original MDX is:
SELECT
  NON EMPTY
  {
    [Measures].[Budget]
  } ON COLUMNS
  ,NON EMPTY
  {
    [Location].[Category - Entity - Facility].[Facility].ALLMEMBERS*
    [Location].[Category - Facility - Unit].[Location].ALLMEMBERS*
    [Calendar].[Day].[Day].ALLMEMBERS
  } ON ROWS
FROM
(
  SELECT
    {[Location].[Category - Entity - Facility].[Category].&[3]} ON COLUMNS
  FROM
  (
    SELECT
      [Calendar].[Year - Quarter - Month - Day].[Day].&[2012-01-01T00:00:00]
      : [Calendar].[Year - Quarter - Month - Day].[Day].&[2012-05-31T00:00:00]
      ON COLUMNS
    FROM [PHI Census]
  )
)
Results look like this:
Facility 1 Location 1 Day 1 100
Facility 1 Location 1 Day 2 100
Facility 1 Location 1 Day 3 100
Facility 1 Location 1 Day 4 100
Facility 1 Location 2 Day 1 80
Facility 1 Location 2 Day 2 80
Facility 1 Location 2 Day 3 80
Facility 2 Location 1 Day 1 65
Facility 2 Location 1 Day 2 65
Facility 2 Location 1 Day 3 65
Facility 2 Location 1 Day 4 65
Facility 2 Location 2 Day 1 73
Facility 2 Location 2 Day 2 73
Facility 2 Location 2 Day 3 73
This gives me the [Budget] listed once for each Facility-Location-Day combination. I would like to remove [Calendar].[Day].[Day].ALLMEMBERS from the ON ROWS clause and instead use a calculated member that returns the number of days for each Facility-Location combination alongside each row.
The results would look like this:
Facility Location Budget DayCount
Facility 1 Location 1 100 4
Facility 1 Location 2 80 3
Facility 2 Location 1 65 4
Facility 2 Location 2 73 3
The expression of DayCount could be:
MEMBER [Measures].[DayCount] AS Count(NonEmpty([Calendar].[Day].[Day].ALLMEMBERS, [Measures].[Budget]))
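To show what that expression counts, here is a Python sketch of the same idea on a few cells from the result above: for each Facility-Location pair, count the days that have a non-empty Budget. The cell layout and variable names are assumptions, and the one cell with an empty Budget is hypothetical, added only to show that empty cells are skipped.

```python
from collections import Counter

# Cells as (facility, location, day, budget); None stands for an empty cell.
cells = [
    ("Facility 1", "Location 1", "Day 1", 100),
    ("Facility 1", "Location 1", "Day 2", 100),
    ("Facility 1", "Location 1", "Day 3", 100),
    ("Facility 1", "Location 1", "Day 4", 100),
    ("Facility 1", "Location 2", "Day 1", 80),
    ("Facility 1", "Location 2", "Day 2", 80),
    ("Facility 1", "Location 2", "Day 3", 80),
    ("Facility 1", "Location 2", "Day 4", None),  # hypothetical empty cell
]

# Each day appears at most once per pair here, so counting cells counts days.
day_count = Counter(
    (facility, location)
    for facility, location, _day, budget in cells
    if budget is not None
)

for (facility, location), days in sorted(day_count.items()):
    print(facility, location, days)
```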
