How to create a Power BI pivot table without drill-down?

I have data of the following form for several categories and years. The data is too large for Import so I am using DirectQuery.
Id Cat1 Cat2 Cat3 Year Value
1 A X Q 2000 1
2 A X R 2000 2
3 A Y Q 2000 3
4 A Y R 2000 4
5 A X Q 2000 1
6 A X R 2000 2
7 A Y Q 2000 3
8 A Y R 2000 4
9 A X Q 2001 1
10 A X R 2001 2
11 A Y Q 2001 3
12 A Y R 2001 4
13 A X Q 2001 1
14 A X R 2001 2
15 A Y Q 2001 3
16 A Y R 2001 4
I would like to construct a pivot table similar to what can be done in Excel.
Cat1 Cat2 Cat3 2000 2001
A X Q 2 2
A X R 4 4
A Y Q 6 6
A Y R 8 8
I tried this with the Matrix visual by placing columns Cat1, Cat2, and Cat3 in Rows, Year in Columns, and Value in Values. Unfortunately, this produces a hierarchical view.
Cat1 2000 2001
A 20 20
X 6 6
Q 2 2
R 4 4
Y 14 14
Q 6 6
R 8 8
How do I get the simpler Excel pivot table view of the data instead of the hierarchical view?

I'm not sure if it's possible to get the row headers to repeat like in your example, but if you go to Format > Row headers > Stepped layout and toggle that off, your matrix should change from the hierarchical view you are seeing to something much closer to what you want.

Related

How to make Groupby dataframe using list?

I have an xyz dataframe like the one below.
x y z
1 2 1
1 2 2
3 3 1
3 1 2
4 1 2
...
9 3 4
and I have to split it into separate dataframes by x.
df1(x=1)
x y z
1 2 3
1 3 3
df2(x=2)
x y z
2 3 3
2 4 5
dfx(x=n)
x y z
n y z
- - -
I know that pandas df.groupby("x") groups the dataframe by "x",
but there are so many distinct "x" values in my data that I can't list them all by hand.
Is there a function that makes the dataframes from a list, like groupby(list)?
Thanks in advance.
In your case, save the groups from df.groupby into a dict:
d = {x : y for x , y in df.groupby('x')}
d[1]
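The one-liner above can be sketched end to end; this is a minimal runnable example with toy data assumed (not the asker's actual values):

```python
import pandas as pd

# Runnable sketch of the dict-of-DataFrames approach (toy data assumed).
df = pd.DataFrame({"x": [1, 1, 3, 3, 4],
                   "y": [2, 2, 3, 1, 1],
                   "z": [1, 2, 1, 2, 2]})

# groupby already enumerates every distinct x for you, so no list of
# x values needs to be known in advance.
d = {x: sub for x, sub in df.groupby("x")}

# d[1] is the sub-DataFrame where x == 1, d[3] where x == 3, and so on.
```

Because groupby discovers the keys itself, this scales to any number of distinct x values without enumerating them.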

SQLite: combination of different ordering policies in one query

I have a table which holds 4 columns: int x, int y, int z, int type.
I need to select a sorted list of records with the following properties preserved:
All records with type 1 must be ordered by x and y (y is secondary key) in ascending order;
All records of type 2 must be ordered by z, in ascending order.
Records of type 2 are usually ordered by x and y, and ordering them by z should, in theory, give the same order. In reality, however, some type 2 records may be partially disordered with respect to z, i.e. for them table.x2 > table.x1 but table.z2 < table.z1.
I need all selected records of type 2 to satisfy table.z2 > table.z1, even at the price of having table.x1 > table.x2 (it is implied that table.x1 < table.x2 for all other records).
I figured out some query like this:
SELECT * FROM aTable ORDER BY CASE type WHEN 1 THEN x ASC, y ASC ELSE z ASC END;
but for some reason SQLite rejects it as syntactically wrong.
Is it possible to construct such a query at all?
UPDATE:
Example
Data in table
x y z type
1 4 3 1
2 3 6 1
2 2 6 1
8 3 5 2
4 6 3 2
5 8 6 1
7 6 2 2
Expected result:
1 4 3 1
2 2 6 1
2 3 6 1
7 6 2 2
4 6 3 2
5 8 6 1
8 3 5 2
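The query is rejected because each branch of a CASE must be a single expression; the `THEN x ASC, y ASC` part is not valid SQL. Splitting the sort keys into separate CASE terms makes it valid. A minimal sqlite3 sketch using the data above (the extra `type` term is an assumption, added to reproduce the tie-breaking shown in the expected result):

```python
import sqlite3

# In ORDER BY, a CASE must yield one value per row, so the two-key sort
# for type 1 is split across two CASE terms; NULL (type 2 rows in the
# last term) is simply never compared within a (key, type) group here.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE aTable (x INT, y INT, z INT, type INT)")
con.executemany("INSERT INTO aTable VALUES (?,?,?,?)", [
    (1, 4, 3, 1), (2, 3, 6, 1), (2, 2, 6, 1), (8, 3, 5, 2),
    (4, 6, 3, 2), (5, 8, 6, 1), (7, 6, 2, 2),
])
rows = con.execute("""
    SELECT x, y, z, type FROM aTable
    ORDER BY CASE type WHEN 1 THEN x ELSE z END,  -- primary key: x or z
             type,                                -- assumed tie-breaker
             CASE type WHEN 1 THEN y END          -- secondary key for type 1
""").fetchall()
```

This reproduces the expected result exactly, including the placement of the type 2 rows among the type 1 rows.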

Loop through each observation in SAS

Let's say I have a table of 10000 observations:
Obs X Y Z
1
2
3
...
10000
For each observation, I call a macro mymacro(X, Y, Z) that takes X, Y, and Z as inputs. The macro creates a table with 1 observation and 4 new variables: var1, var2, var3, var4.
I would like to know how to loop through 10000 observations in my initial set, and the result would be like:
Obs X Y Z Var1 Var2 Var3 Var4
1
2
3
...
10000
Update:
The calculation of Var1, Var2, Var3, Var4:
I have a reference table:
Z 25 26 27 28 29 30
0 10 000 10 000 10 000 10 000 10 000 10 000
1 10 000 10 000 10 000 10 000 10 000 10 000
2 10 000 10 000 10 000 10 000 10 000 10 000
3 10 000 10 000 10 000 10 000 10 000 10 000
4 9 269 9 322 9 322 9 381 9 381 9 436
5 8 508 8 619 8 619 8 743 8 743 8 850
6 7 731 7 914 7 914 8 102 8 102 8 258
7 6 805 7 040 7 040 7 280 7 280 7 484
8 5 864 6 137 6 137 6 421 6 421 6 655
9 5 025 5 328 5 328 5 629 5 629 5 929
10 4 359 4 648 4 648 4 934 4 934 5 320
And my HAVE data set is like:
Obs X Y Z
1 27 4 9
2
3
10000
So for the first observation (27, 4, 9):
Var1 = (8 619+ 7 914+ 7 040 + 6 137 + 5 328)/ 9 322
Var2 = (8 743+ 8 102+ 7 280+ 6 421 + 5 629 )/ 9 381
So that:
Var1 = the sum of the numbers in column 27 (X), from row 5 (Y+1) through row 9 (Z), divided by the value in column 27 (X) at row 4 (Y)
Var2 = the sum of the numbers in column 28 (X+1), from row 5 (Y+1) through row 9 (Z), divided by the value in column 28 (X+1) at row 4 (Y)
I would convert the reference table to a form that lets you do the calculations for all observations at once. So make your reference table into a tall structure, either by transposing the existing table or just reading it that way to start with:
data ref_tall;
  input z @;
  do col=25 to 30;
    input value :comma9. @;
    output;
  end;
datalines;
0 10,000 10,000 10,000 10,000 10,000 10,000
1 10,000 10,000 10,000 10,000 10,000 10,000
2 10,000 10,000 10,000 10,000 10,000 10,000
3 10,000 10,000 10,000 10,000 10,000 10,000
4 9,269 9,322 9,322 9,381 9,381 9,436
5 8,508 8,619 8,619 8,743 8,743 8,850
6 7,731 7,914 7,914 8,102 8,102 8,258
7 6,805 7,040 7,040 7,280 7,280 7,484
8 5,864 6,137 6,137 6,421 6,421 6,655
9 5,025 5,328 5,328 5,629 5,629 5,929
10 4,359 4,648 4,648 4,934 4,934 5,320
;
Now take your list table HAVE:
data have;
input id x y z;
datalines;
1 27 4 9
2 25 2 4
;
And combine it with the reference table and make your calculations:
proc sql;
  create table want1 as
    select a.id
         , sum(b.value)/min(c.value) as var1
    from have a
    left join ref_tall b
      on a.x = b.col
     and b.z between a.y+1 and a.z
    left join ref_tall c
      on a.x = c.col
     and c.z = a.y
    group by a.id
  ;
  create table want2 as
    select a.id
         , sum(d.value)/min(e.value) as var2
    from have a
    left join ref_tall d
      on a.x+1 = d.col
     and d.z between a.y+1 and a.z
    left join ref_tall e
      on a.x+1 = e.col
     and e.z = a.y
    group by a.id
  ;
  create table want as
    select *
    from want1 natural join want2 natural join have
  ;
quit;
Results:
Obs id x y z var1 var2
1 1 27 4 9 3.75864 3.85620
2 2 25 2 4 1.92690 1.93220
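For cross-checking the numbers outside SAS, the same tall-reference join can be sketched in pandas (data transcribed from the question; this is not part of the original answer):

```python
import pandas as pd

# Reference grid from the question, melted into a tall (z, col, value) table,
# mirroring the ref_tall data set in the SAS answer.
grid = {
    0: [10000]*6, 1: [10000]*6, 2: [10000]*6, 3: [10000]*6,
    4: [9269, 9322, 9322, 9381, 9381, 9436],
    5: [8508, 8619, 8619, 8743, 8743, 8850],
    6: [7731, 7914, 7914, 8102, 8102, 8258],
    7: [6805, 7040, 7040, 7280, 7280, 7484],
    8: [5864, 6137, 6137, 6421, 6421, 6655],
    9: [5025, 5328, 5328, 5629, 5629, 5929],
    10: [4359, 4648, 4648, 4934, 4934, 5320],
}
ref_tall = pd.DataFrame(
    [(z, col, v) for z, vals in grid.items() for col, v in zip(range(25, 31), vals)],
    columns=["z", "col", "value"],
)

have = pd.DataFrame({"id": [1, 2], "x": [27, 25], "y": [4, 2], "z": [9, 4]})

def ratio(row, offset):
    """Sum column (x+offset) over rows y+1..z, divided by its value at row y."""
    sub = ref_tall[ref_tall["col"] == row["x"] + offset]
    num = sub.loc[sub["z"].between(row["y"] + 1, row["z"]), "value"].sum()
    den = sub.loc[sub["z"] == row["y"], "value"].iat[0]
    return num / den

have["var1"] = have.apply(ratio, axis=1, offset=0)
have["var2"] = have.apply(ratio, axis=1, offset=1)
```

The results agree with the PROC SQL output above to rounding (3.75864, 3.85620 for the first observation).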
The reference table can be established in an array that makes performing the specified computations easy. The reference values can then be accessed using a direct address reference.
Example
The reference table data was moved into a data set so the values can be changed over time or reloaded from some source such as Excel. The reference values can be loaded into an array for use during a DATA step.
* reference information in data set, x property column names are _<num>;
data ref;
input z (_25-_30) (comma9. &);
datalines;
0 10,000 10,000 10,000 10,000 10,000 10,000
1 10,000 10,000 10,000 10,000 10,000 10,000
2 10,000 10,000 10,000 10,000 10,000 10,000
3 10,000 10,000 10,000 10,000 10,000 10,000
4 9,269 9,322 9,322 9,381 9,381 9,436
5 8,508 8,619 8,619 8,743 8,743 8,850
6 7,731 7,914 7,914 8,102 8,102 8,258
7 6,805 7,040 7,040 7,280 7,280 7,484
8 5,864 6,137 6,137 6,421 6,421 6,655
9 5,025 5,328 5,328 5,629 5,629 5,929
10 4,359 4,648 4,648 4,934 4,934 5,320
;
* computation parameters, might be a thousand of them specified;
data have;
input id x y z;
datalines;
1 27 4 9
;
* perform computation for each set of parameters specified;
data want;
  set have;
  array ref[0:10,1:30] _temporary_;

  if _n_ = 1 then do ref_row = 0 by 1 until (last_ref);
    * load reference data into an array for direct addressing during computation;
    set ref end=last_ref;
    array ref_cols _25-_30;
    do index = 1 to dim(ref_cols);
      colname = vname(ref_cols[index]);
      colnum = input(substr(colname,2),8.);
      ref[ref_row,colnum] = ref_cols[index];
    end;
  end;

  * perform computation for parameters specified;
  array vars var1-var4;
  do index = 1 to dim(vars);
    ref_column = x + index - 1; * column x, then x+1, then x+2, then x+3;
    numerator = 0;              * algorithm against reference data;
    do ref_row = y+1 to z;
      numerator + ref[ref_row,ref_column];
    end;
    denominator = ref[y,ref_column];
    vars[index] = numerator / denominator; * result;
  end;

  keep id x y z numerator denominator var1-var4;
run;

Cumulative Sum from a range identified based on Vlookup

In Excel sheet 1, I have the following data:
A B C D E F G
------------------------------
Name1 1 2 3 4 5 6
Name2 2 9 3 8 4 7
Name3 4 6 0 3 2 1
In Excel sheet 2, I have to calculate a cumulative sum based on the values in sheet 1.
For example,
A B C D E F G
------------------------------
Name1 1 3 6 10 15 21
While I can calculate the cumulative sum easily, I do not know how to select the correct range of cells from sheet 1 by searching for 'Name1'.
You need a SUMPRODUCT with both relative and absolute column/row cell references.
=SUMPRODUCT(($A2:INDEX($A:$A,MATCH(1E+99,$B:$B))=$I5)*($B2:INDEX(B:B,MATCH(1E+99, B:B))))
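For comparison, the same lookup-then-running-total can be sketched in pandas, using the sheet 1 data from the question (the SUMPRODUCT formula above remains the Excel-native route):

```python
import pandas as pd

# Sheet 1 data as given in the question; column letters used as headers.
sheet1 = pd.DataFrame(
    [["Name1", 1, 2, 3, 4, 5, 6],
     ["Name2", 2, 9, 3, 8, 4, 7],
     ["Name3", 4, 6, 0, 3, 2, 1]],
    columns=["name", "B", "C", "D", "E", "F", "G"],
)

# Select the row by name (the "search for 'Name1'" step), then take the
# running total across its columns.
row = sheet1.set_index("name").loc["Name1"]
cumulative = row.cumsum()
```

`cumulative` holds the row 1, 3, 6, 10, 15, 21 from the question's sheet 2 example.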

Informix subqueries with FIRST option

What is the best way of transcribing the following Transact-SQL code to Informix Dynamic Server (IDS) 9.40:
Objective: I need the first 50 orders with their respective order lines
select *
from (select top 50 * from orders) a inner join lines b
on a.idOrder = b.idOrder
My problem is with the subselect, because Informix does not allow the FIRST option in a subselect.
Any simple ideas?
The official answer would be 'Please upgrade from IDS 9.40 since it is no longer supported by IBM'. That is, IDS 9.40 is not a current version - and should (ideally) not be used.
Solution for IDS 11.50
Using IDS 11.50, I can write:
SELECT *
FROM (SELECT FIRST 10 * FROM elements) AS e
INNER JOIN compound_component AS a
ON e.symbol = a.element
INNER JOIN compound AS c
ON c.compound_id = a.compound_id
;
This is more or less equivalent to your query. Consequently, if you use a current version of IDS, you can write the query using almost the same notation as in Transact-SQL (using FIRST in place of TOP).
Solution for IDS 9.40
What can you do in IDS 9.40? Excuse me a moment...I have to run up my IDS 9.40.xC7 server (this fix pack was released in 2005; the original release was probably in late 2003)...
First problem - IDS 9.40 does not allow sub-queries in the FROM clause.
Second problem - IDS 9.40 does not allow 'FIRST n' notation in either of these contexts:
SELECT FIRST 10 * FROM elements INTO TEMP e;
INSERT INTO e SELECT FIRST 10 * FROM elements;
Third problem - IDS 9.40 doesn't have a simple ROWNUM.
So, to work around these, we can write (using a temporary table - we'll remove that later):
SELECT e1.*
FROM elements AS e1, elements AS e2
WHERE e1.atomic_number >= e2.atomic_number
GROUP BY e1.atomic_number, e1.symbol, e1.name, e1.atomic_weight, e1.stable
HAVING COUNT(*) <= 10
INTO TEMP e;
SELECT *
FROM e INNER JOIN compound_component AS a
ON e.symbol = a.element
INNER JOIN compound AS c
ON c.compound_id = a.compound_id;
This produces the same answer as the single query in IDS 11.50. Can we avoid the temporary table? Yes, but it is more verbose:
SELECT e1.*, a.*, c.*
FROM elements AS e1, elements AS e2, compound_component AS a,
compound AS c
WHERE e1.atomic_number >= e2.atomic_number
AND e1.symbol = a.element
AND c.compound_id = a.compound_id
GROUP BY e1.atomic_number, e1.symbol, e1.name, e1.atomic_weight,
e1.stable, a.compound_id, a.element, a.seq_num,
a.multiplicity, c.compound_id, c.name
HAVING COUNT(*) <= 10;
Applying that to the original orders plus order lines example is left as an exercise for the reader.
Relevant subset of schema for 'Table of Elements':
-- See: http://www.webelements.com/ for elements.
-- See: http://ie.lbl.gov/education/isotopes.htm for isotopes.
CREATE TABLE elements
(
atomic_number INTEGER NOT NULL UNIQUE CONSTRAINT c1_elements
CHECK (atomic_number > 0 AND atomic_number < 120),
symbol CHAR(3) NOT NULL UNIQUE CONSTRAINT c2_elements,
name CHAR(20) NOT NULL UNIQUE CONSTRAINT c3_elements,
atomic_weight DECIMAL(8,4) NOT NULL,
stable CHAR(1) DEFAULT 'Y' NOT NULL
CHECK (stable IN ('Y', 'N'))
);
CREATE TABLE compound
(
compound_id SERIAL NOT NULL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE
);
-- The sequence number is used to order the components within a compound.
CREATE TABLE compound_component
(
compound_id INTEGER REFERENCES compound,
element CHAR(3) NOT NULL REFERENCES elements(symbol),
seq_num SMALLINT DEFAULT 1 NOT NULL
CHECK (seq_num > 0 AND seq_num < 20),
multiplicity INTEGER NOT NULL
CHECK (multiplicity > 0 AND multiplicity < 20),
PRIMARY KEY(compound_id, seq_num)
);
Output (on my sample database):
1 H Hydrogen 1.0079 Y 1 H 1 2 1 water
1 H Hydrogen 1.0079 Y 3 H 2 4 3 methane
1 H Hydrogen 1.0079 Y 4 H 2 6 4 ethane
1 H Hydrogen 1.0079 Y 5 H 2 8 5 propane
1 H Hydrogen 1.0079 Y 6 H 2 10 6 butane
1 H Hydrogen 1.0079 Y 11 H 2 5 11 ethanol
1 H Hydrogen 1.0079 Y 11 H 4 1 11 ethanol
6 C Carbon 12.0110 Y 2 C 1 1 2 carbon dioxide
6 C Carbon 12.0110 Y 3 C 1 1 3 methane
6 C Carbon 12.0110 Y 4 C 1 2 4 ethane
6 C Carbon 12.0110 Y 5 C 1 3 5 propane
6 C Carbon 12.0110 Y 6 C 1 4 6 butane
6 C Carbon 12.0110 Y 7 C 1 1 7 carbon monoxide
6 C Carbon 12.0110 Y 9 C 2 1 9 magnesium carbonate
6 C Carbon 12.0110 Y 10 C 2 1 10 sodium bicarbonate
6 C Carbon 12.0110 Y 11 C 1 2 11 ethanol
8 O Oxygen 15.9990 Y 1 O 2 1 1 water
8 O Oxygen 15.9990 Y 2 O 2 2 2 carbon dioxide
8 O Oxygen 15.9990 Y 7 O 2 1 7 carbon monoxide
8 O Oxygen 15.9990 Y 9 O 3 3 9 magnesium carbonate
8 O Oxygen 15.9990 Y 10 O 3 3 10 sodium bicarbonate
8 O Oxygen 15.9990 Y 11 O 3 1 11 ethanol
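The self-join counting trick above generalizes to any engine that lacks FIRST/TOP in the needed position. A minimal sqlite3 sketch applied to the original orders/lines shape (table and column names taken from the question, toy data assumed):

```python
import sqlite3

# Emulate "FIRST n orders" without LIMIT: count, for each order, how many
# orders sort at or before it, and keep those with count <= n.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (idOrder INTEGER PRIMARY KEY);
CREATE TABLE lines (idOrder INTEGER, item TEXT);
""")
con.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 6)])
con.executemany("INSERT INTO lines VALUES (?, ?)",
                [(i, f"item-{i}") for i in range(1, 6)])

rows = con.execute("""
    SELECT o1.idOrder, b.item
    FROM orders o1 JOIN orders o2 ON o1.idOrder >= o2.idOrder
    JOIN lines b ON o1.idOrder = b.idOrder
    GROUP BY o1.idOrder, b.item
    HAVING COUNT(*) <= 2          -- "first 2" orders; use 50 in the original
    ORDER BY o1.idOrder
""").fetchall()
```

Each surviving group's count equals the order's rank under the self-join, which is exactly what the HAVING clause filters on.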
If I understand your question, you are having a problem with "TOP". Try using a top-N query.
For example:
select *
from (SELECT *
FROM foo
where foo_id=[number]
order by foo_id desc)
where rownum <= 50
This will get you the top fifty results (because of the descending order in the subquery).
