Tableau pivot rows into fixed number of columns - analytics

I am new to Tableau so maybe this is an easy question but I can't get it done yet. I have my data in the following format:
EntityId | ActionId
------------|------------
1 | 2
1 | 6
1 | 1
1 | 7
1 | 7
2 | 1
2 | 2
2 | 3
2 | 3
My desired table format for my visualizations looks like the following:
EntityId | 1stActId | 2ndActId | 3rdActId
-------------------------------------------------
1 | 2 | 6 | 1
1 | 6 | 1 | 7
1 | 1 | 7 | 7
2 | 1 | 2 | 3
2 | 2 | 3 | 3
So I want to extract all Action Triples where every action is in one column. The next step would be to have the number of columns variable so that I can get Tuples, Triples, Quadruples and so on.
Is there a way to do this in Tableau directly or do I have to transform it before importing it in Tableau?
Thanks in advance!
Best regards,
Tim

Interestingly, Tableau works best with your current data format rather than your desired table format. Its Pivot feature transforms your desired table format into your current one, but not vice versa. To achieve what you want, you will have to transform the data before importing it into Tableau. Otherwise, consider the format below. Depending on your objective, it may give you the opportunity to filter, group and drill down into your data. It does duplicate the EntityId, which I assume is not an issue for you.
EntityId Value ActionId
1 2 1st
1 6 1st
1 1 1st
2 1 1st
2 2 1st
1 6 2nd
1 1 2nd
1 7 2nd
2 2 2nd
2 3 2nd
1 1 3rd
1 7 3rd
1 7 3rd
2 3 3rd
2 3 3rd
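If you go the route of transforming the data before importing it, the step could look roughly like the Python sketch below. It is only an illustration: the function name `action_ngrams` and the list-of-pairs input are assumptions, and it takes the row order of the source as the action order within each entity. The `n` parameter covers the tuples/triples/quadruples case.

```python
from collections import defaultdict

def action_ngrams(rows, n=3):
    """Return (entity_id, act_1, ..., act_n) for every run of n consecutive actions."""
    # Collect each entity's actions in the order they appear in the input.
    by_entity = defaultdict(list)
    for entity_id, action_id in rows:
        by_entity[entity_id].append(action_id)

    result = []
    for entity_id, actions in by_entity.items():
        # Slide a window of width n over the entity's action list.
        for i in range(len(actions) - n + 1):
            result.append((entity_id, *actions[i:i + n]))
    return result

# The sample data from the question.
rows = [(1, 2), (1, 6), (1, 1), (1, 7), (1, 7),
        (2, 1), (2, 2), (2, 3), (2, 3)]

triples = action_ngrams(rows, n=3)
for row in triples:
    print(row)
# (1, 2, 6, 1)
# (1, 6, 1, 7)
# (1, 1, 7, 7)
# (2, 1, 2, 3)
# (2, 2, 3, 3)
```

The resulting rows can be written to a CSV file and loaded into Tableau as a regular table.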

Related

How to show/group 2 columns by single column in report builder 3.0

I have a dataset with the data below. I want to group Columns 2 and 3 in a single row against the value in Column 1. Is it possible?
From:
Column 1 | Column 2 | Column 3
A | X | 1
A | Y | 2
B | Z | 3
To:
Column 1 | Column 2 | Column 3
A | X | 1
| Y | 2
B | Z | 3
Right now I am getting the table below in RB 3.0:
Column 1 | Column 2 | Column 3
A | X | 1
| | 2
A | Y | 1
| | 2
B | Z | 3
You just need to add a row group for both Columns 1 and 2.
Under the main design window, if you click on your table, you should see the row and column group panels. In row groups you will probably have a details group.
Right click the details group, add a parent group and choose Column 2. Once this is done right click the new group and add a parent group again, this time choose Column 1.
You will probably now have Columns 1 and 2 duplicated across multiple columns; just delete the columns you don't need and that should be it.

What is the maximum number of tuples that can be returned by natural join?

Consider that the relation R(A,B,C) contains 200 tuples and the relation S(A,D,E) contains 100 tuples. What is the maximum number of tuples possible in a natural join of R and S?
Select one:
A. 300
B. 200
C. 100
D. 20000
It will be great if the answer is provided with some explanation.
The maximum number of tuples possible in natural join will be 20000.
You can find what natural join exactly is in this site.
Let us check for the given example:
Let the table R(A,B,C) be in the given format:
A | B | C
---------------
1 | 2 | 4
1 | 6 | 8
1 | 5 | 7
and the table S(A,D,E) be in the given format:
A | D | E
---------------
1 | 2 | 4
1 | 6 | 8
Here, the result of natural join will be:
A | B | C | D | E
--------------------------
1 | 2 | 4 | 2 | 4
1 | 2 | 4 | 6 | 8
1 | 6 | 8 | 2 | 4
1 | 6 | 8 | 6 | 8
1 | 5 | 7 | 2 | 4
1 | 5 | 7 | 6 | 8
Thus we can see the resulting table has 3*2=6 rows. This is the maximum possible value because both the input tables have the same single value in column A (1).
Natural join returns all tuple values that can be formed from (tuple-joining or tuple-unioning) a tuple value from one input relation and a tuple value from the other. Since they could agree on a single subtuple value for the common set of attributes, and there could be unique values for the non-common subtuples within each relation, you could get a unique result tuple from every pairing, although no more than that. So the maximum number of tuples is the product of the tuple counts of the relations.
Here that's option D, 20000.
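To check the product bound concretely, here is a small Python sketch of a natural join over lists of tuples. The helper `natural_join` and its attribute-list arguments are made up for this illustration and are not a database API. Every tuple shares the same value of A, and the other attributes are distinct within each relation.

```python
def natural_join(r, s, r_attrs, s_attrs):
    """Join two relations (collections of tuples) on their common attribute names."""
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in common]
    out = set()
    for rt in r:
        rd = dict(zip(r_attrs, rt))
        for st in s:
            sd = dict(zip(s_attrs, st))
            # Keep the pairing only if the tuples agree on every common attribute.
            if all(rd[a] == sd[a] for a in common):
                out.add(rt + tuple(sd[a] for a in s_only))
    return out

# The 3-tuple and 2-tuple example from the answer above.
small_r = [(1, 2, 4), (1, 6, 8), (1, 5, 7)]
small_s = [(1, 2, 4), (1, 6, 8)]
small = natural_join(small_r, small_s, ["A", "B", "C"], ["A", "D", "E"])
print(len(small))  # 6

# 200 and 100 tuples, all with A = 1 and unique values elsewhere.
big_r = [(1, b, b) for b in range(200)]
big_s = [(1, d, d) for d in range(100)]
big = natural_join(big_r, big_s, ["A", "B", "C"], ["A", "D", "E"])
print(len(big))  # 20000
```

If the A values did not all match, some pairings would be dropped and the count would fall below the product.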
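A is the attribute common to R and S. If A were a key in both relations, at most 100 tuples could take part in the join, and option C (100) would be the answer.
The question states no such constraint, though, so the maximum is 20000.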

SQL query/functions to flatten a multiple table item and hierarchical item links

I have the data structure below, storing items and the links between them in a parent-child relationship.
I need to display the result as shown below, one line per parent, with all of its children.
Values are the ItemCodes by item type; for example, C-1 and C-2 are the first two items of type C, and so on.
In a previous application version, there was at most one C and one H for each P.
So I used a mix of max() and group by, and that produced the result.
But now, parents may be linked to different types and numbers of children.
I tried several techniques including adding temporary tables, views, use of PIVOT, ROLLUP, CUBE, stored procedures and cursors (!), but nothing worked for this specific problem.
I finally managed to adapt the query. However, there are many select from (select ...) clauses, as well as row_number based queries.
Also, the result is not dynamic, meaning the number of columns is fixed (which is acceptable).
My question is: what would be your approach for such an issue (if possible in a single query)? Thank you!
The table structure:
Item
-------------------------------
ItemId | ItemCode | ItemType
-------------------------------
1 | P1 | P
2 | C11 | C
3 | H11 | H
4 | H12 | H
5 | P2 | P
6 | C21 | C
7 | C22 | C
8 | C23 | C
9 | H21 | H
ItemLink
---------------------------------------
LinkId | ParentItemId | ChildItemId
---------------------------------------
1 | 1 | 2
2 | 1 | 3
3 | 1 | 4
4 | 5 | 6
5 | 5 | 7
6 | 5 | 8
7 | 5 | 9
Expected Result
-----------------------------------------------------
P C-1 C-2 ... C-N H-1 H-2 ... H-N
-----------------------------------------------------
P1 C11 NULL NULL NULL H11 H12 NULL NULL
P2 C21 C22 C23 NULL H21 NULL NULL NULL
...
Part of my current query (which is working):
!http://s12.postimg.org/r64tgjjnh/SOQuestion.png
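For reference, the target shape can be sketched outside SQL. The Python below is not the query from the screenshot, just an illustration of the row_number-style idea: number each parent's children within their type and place them in fixed slots. It assumes links 4 to 7 point at P2 (ItemId 5), as the expected result implies, and a fixed width of 4 columns per type; the name `flatten` and the in-memory dict/list layout are illustrative.

```python
from collections import defaultdict

# ItemId -> (ItemCode, ItemType), as in the Item table.
items = {
    1: ("P1", "P"), 2: ("C11", "C"), 3: ("H11", "H"), 4: ("H12", "H"),
    5: ("P2", "P"), 6: ("C21", "C"), 7: ("C22", "C"), 8: ("C23", "C"),
    9: ("H21", "H"),
}
# (ParentItemId, ChildItemId); links 4-7 assumed to belong to P2 (ItemId 5).
links = [(1, 2), (1, 3), (1, 4), (5, 6), (5, 7), (5, 8), (5, 9)]

def flatten(items, links, child_types=("C", "H"), width=4):
    """One row per parent: parent code, then `width` slots per child type."""
    children = defaultdict(lambda: defaultdict(list))
    for parent_id, child_id in links:
        code, item_type = items[child_id]
        children[parent_id][item_type].append(code)

    rows = []
    for item_id, (code, item_type) in items.items():
        if item_type != "P":
            continue
        row = [code]
        for t in child_types:
            codes = children[item_id][t][:width]
            # Pad unused slots with None, the counterpart of NULL.
            row.extend(codes + [None] * (width - len(codes)))
        rows.append(tuple(row))
    return rows

flat = flatten(items, links)
for row in flat:
    print(row)
# ('P1', 'C11', None, None, None, 'H11', 'H12', None, None)
# ('P2', 'C21', 'C22', 'C23', None, 'H21', None, None, None)
```

As in the working query, the number of columns is fixed in advance by `width`.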

Why do these join differently based on size?

In Postgresql, if you unnest two arrays of the same size, they line up each value from one array with one from the other, but if the two arrays are not the same size, it joins each value from one with every value from the other.
select unnest(ARRAY[1, 2, 3, 4, 5]::bigint[]) as id,
unnest(ARRAY['a', 'b', 'c', 'd', 'e']) as value
Will return
1 | "a"
2 | "b"
3 | "c"
4 | "d"
5 | "e"
But
select unnest(ARRAY[1, 2, 3, 4, 5]::bigint[]) as id, -- 5 elements
unnest(ARRAY['a', 'b', 'c', 'd']) as value -- 4 elements
order by id
Will return
1 | "a"
1 | "b"
1 | "c"
1 | "d"
2 | "b"
2 | "a"
2 | "c"
2 | "d"
3 | "b"
3 | "d"
3 | "a"
3 | "c"
4 | "d"
4 | "a"
4 | "c"
4 | "b"
5 | "d"
5 | "c"
5 | "b"
5 | "a"
Why is this? I assume some sort of implicit rule is being used, and I'd like to know if I can do it explicitly (eg if I want the second style when I have matching array sizes, or if I want missing values in one array to be treated as NULL).
Support for set-returning functions in SELECT is a PostgreSQL extension, and an IMO very weird one. It's broadly considered deprecated and best avoided where possible.
Avoid using SRF-in-SELECT where possible
Now that LATERAL is supported in 9.3, one of the two main uses is gone. It used to be necessary to use a set-returning function in SELECT if you wanted to use the output of one SRF as the input to another; that is no longer needed with LATERAL.
The other use will be replaced in 9.4, when WITH ORDINALITY is added, allowing you to preserve the output ordering of a set-returning function. That's currently the main remaining use: to do things like zip the output of two SRFs into a rowset of matched value pairs. WITH ORDINALITY is most anticipated for unnest, but works with any other SRF.
Why the weird output?
The logic that PostgreSQL is using here (for whatever IMO insane reason it was originally introduced in ancient history) is: whenever either function produces output, emit a row. If only one function has produced output, scan the other one's output again to get the rows required. If neither produces output, stop emitting rows.
It's easier to see with generate_series.
regress=> SELECT generate_series(1,2), generate_series(1,2);
generate_series | generate_series
-----------------+-----------------
1 | 1
2 | 2
(2 rows)
regress=> SELECT generate_series(1,2), generate_series(1,3);
generate_series | generate_series
-----------------+-----------------
1 | 1
2 | 2
1 | 3
2 | 1
1 | 2
2 | 3
(6 rows)
regress=> SELECT generate_series(1,2), generate_series(1,4);
generate_series | generate_series
-----------------+-----------------
1 | 1
2 | 2
1 | 3
2 | 4
(4 rows)
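The three results above are consistent with each function restarting from its first value until both finish on the same row, which gives a row count equal to the least common multiple of the two set sizes (2, 6 and 4 here). The Python sketch below only models that observed behaviour; `legacy_srf_select` is a made-up name and not anything PostgreSQL exposes.

```python
from itertools import cycle, islice
from math import gcd

def legacy_srf_select(a, b):
    """Model two set-returning functions in one SELECT list, as described above."""
    a, b = list(a), list(b)
    if not a or not b:
        # Assumption: an empty set on either side yields no rows.
        return []
    # Both sequences restart until they end together: lcm(len(a), len(b)) rows.
    n_rows = len(a) * len(b) // gcd(len(a), len(b))
    return list(islice(zip(cycle(a), cycle(b)), n_rows))

print(legacy_srf_select([1, 2], [1, 2]))
# [(1, 1), (2, 2)]
print(legacy_srf_select([1, 2], [1, 2, 3]))
# [(1, 1), (2, 2), (1, 3), (2, 1), (1, 2), (2, 3)]
print(legacy_srf_select([1, 2], [1, 2, 3, 4]))
# [(1, 1), (2, 2), (1, 3), (2, 4)]
```

This also explains the question's two cases: equal sizes give a pairwise match, while sizes 5 and 4 share no common factor, so all 20 combinations appear and the result looks like a cross join.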
In the majority of cases what you really want is a simple cross join of the two, which is a lot saner.
regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,2) b;
a | b
---+---
1 | 1
1 | 2
2 | 1
2 | 2
(4 rows)
regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,3) b;
a | b
---+---
1 | 1
1 | 2
1 | 3
2 | 1
2 | 2
2 | 3
(6 rows)
regress=> SELECT a, b FROM generate_series(1,2) a, generate_series(1,4) b;
a | b
---+---
1 | 1
1 | 2
1 | 3
1 | 4
2 | 1
2 | 2
2 | 3
2 | 4
(8 rows)
The main exception is currently for when you want to run multiple functions in lock-step, pairwise (like a zip), which you cannot currently do with joins.
WITH ORDINALITY
This will be improved in 9.4 with WITH ORDINALITY, and while it'll be a bit less efficient than a multiple SRF scan in SELECT (unless optimizer improvements are added) it'll be a lot saner.
Say you wanted to pair up 1..3 and 1..4 with nulls for excess elements. Using WITH ORDINALITY that'd be (PostgreSQL 9.4 or later):
regress=# SELECT aval, bval
FROM generate_series(1,3) WITH ORDINALITY a(aval,apos)
RIGHT OUTER JOIN generate_series(1,4) WITH ORDINALITY b(bval, bpos)
ON (apos=bpos);
aval | bval
------+------
1 | 1
2 | 2
3 | 3
| 4
(4 rows)
whereas the SRF-in-SELECT would instead return:
regress=# SELECT generate_series(1,3) aval, generate_series(1,4) bval;
aval | bval
------+------
1 | 1
2 | 2
3 | 3
1 | 4
2 | 1
3 | 2
1 | 3
2 | 4
3 | 1
1 | 2
2 | 3
3 | 4
(12 rows)
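For comparison, the positional pairing that the WITH ORDINALITY join produces corresponds to a "zip longest" in most languages. The Python sketch below uses the standard `itertools.zip_longest`; the wrapper name `zip_with_nulls` is illustrative. Note that it pads whichever side is shorter, so it behaves like a FULL OUTER JOIN on position, which matches the RIGHT OUTER JOIN above only because the right side is the longer one.

```python
from itertools import zip_longest

def zip_with_nulls(a, b):
    """Pair elements by position, padding the shorter side with None."""
    return list(zip_longest(a, b, fillvalue=None))

pairs = zip_with_nulls(range(1, 4), range(1, 5))
print(pairs)
# [(1, 1), (2, 2), (3, 3), (None, 4)]

# The arrays from the question: the missing fifth value becomes None.
print(zip_with_nulls([1, 2, 3, 4, 5], ["a", "b", "c", "d"]))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, None)]
```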

Conditional SUM using multiple tables in EXCEL

I have a table that I'm trying to populate based on the values of two reference tables.
I have various different projects 'Type 1', 'Type 2' etc. that each run for 4 months and cost different amounts depending on when in their life cycle they are. These costings are shown in Ref Table 1.
Ref Table 1
Month | a | b | c | d
---------------------------------
Type 1 | 1 | 2 | 3 | 4
Type 2 | 10 | 20 | 30 | 40
Type 3 | 100 | 200 | 300 | 400
Ref Table 2 shows my schedule of projects for the next 3 months. Two new projects start in Jan, one a Type 1 and the other a Type 2. In Feb I'll have 4 projects: the first two enter their second month and two new ones start, this time a Type 1 and a Type 3.
Ref table 2
Date | Jan | Feb | Mar
--------------------------
Type 1 | a | b | c
Type 1 | | a | b
Type 2 | a | b | c
Type 2 | | | a
Type 3 | | a | b
I'd like to create a table which calculates the total costs spent per project type each month. Example results are shown below in Results table.
Results
Date | Jan | Feb | Mar
-------------------------------
Type 1 | 1 | 3 | 5
Type 2 | 10 | 20 | 40
Type 3 | 0 | 100 | 200
I tried doing it with an array formula:
Res!b2 = {sum(if((Res!A2 = Ref2!A2:A6) * (Res!A2 = Ref1!A2:A4) * (Ref2!B2:D6 = Ref1!B1:D1), Ref!B2:E4))}
However it doesn't work and I believe that it's because of the third condition trying to compare a vector with another vector rather than a single value.
Does anyone have any idea how I can do this? Happy to use arrays, index, match, vector, lookups but NOT VBA.
Thanks
Assuming that months in results table headers are in the same order as Ref table 2 (as per your example) then try this formula in Res!B2
=SUM(SUMIF(Ref1!$B$1:$E$1,IF(Ref2!$A$2:$A$6=Res!$A2,Ref2!B$2:B$6),INDEX(Ref1!$B$2:$E$4,MATCH(Res!$A2,Ref1!$A$2:$A$4,0),0)))
confirm with CTRL+SHIFT+ENTER and copy down and across
That gives me the same results as in your results table.
If the months might be in a different order, you can add something to check that too. I assumed that the types in the results table row labels might be in a different order from Ref table 1; if they are always in the same order (as per your example), the INDEX/MATCH part at the end can be simplified to a single range.
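To sanity-check the expected numbers outside Excel, the same lookup-and-sum can be written as a short Python sketch. It is not a translation of the formula's mechanics, only of its intent: for each scheduled project, look up the cost of its current phase and add it to that type's monthly total. The dict and list layout and the name `monthly_costs` are assumptions for the illustration.

```python
# Ref Table 1: cost per project type and life-cycle phase.
costs = {
    "Type 1": {"a": 1, "b": 2, "c": 3, "d": 4},
    "Type 2": {"a": 10, "b": 20, "c": 30, "d": 40},
    "Type 3": {"a": 100, "b": 200, "c": 300, "d": 400},
}

# Ref Table 2: one entry per project, mapping month -> phase (blank cells omitted).
schedule = [
    ("Type 1", {"Jan": "a", "Feb": "b", "Mar": "c"}),
    ("Type 1", {"Feb": "a", "Mar": "b"}),
    ("Type 2", {"Jan": "a", "Feb": "b", "Mar": "c"}),
    ("Type 2", {"Mar": "a"}),
    ("Type 3", {"Feb": "a", "Mar": "b"}),
]
months = ["Jan", "Feb", "Mar"]

def monthly_costs(costs, schedule, months):
    """Total cost per project type and month."""
    totals = {t: {m: 0 for m in months} for t in costs}
    for project_type, phases in schedule:
        for month, phase in phases.items():
            totals[project_type][month] += costs[project_type][phase]
    return totals

totals = monthly_costs(costs, schedule, months)
for project_type, row in totals.items():
    print(project_type, [row[m] for m in months])
# Type 1 [1, 3, 5]
# Type 2 [10, 20, 40]
# Type 3 [0, 100, 200]
```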
