Summarize rows based on multiple cells - pivot-table

Let's say columns 1-10 are identifiers and 11-15 are values. Identifiers might occur multiple times and I'd like to add up the values in each column. Example:
A1|B1|1|2|
A1|B1|5|3|
A2|B2|3|6|
A2|B2|4|2|
should become:
A1|B1|6|5|
A2|B2|7|8|

This is a straightforward case for a Pivot Table.
Add column headings in the first row: Identifier 1|Identifier 2|Value 1|Value 2.
Select the data (A1 to D5 in this example) and go to Data -> Pivot Table -> Create.
Drag Identifier 1 and Identifier 2 to Row Fields, and Value 1 and Value 2 to Data Fields. Uncheck Total Rows and Total Columns.
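The result has one row per identifier pair with the summed values, matching the desired output in the question.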
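Outside the spreadsheet, the same group-and-sum can be sketched in a few lines of Python. This is only an illustration of what the pivot table computes, not part of the spreadsheet solution; the function name `summarize` and the `id_count` parameter are made up for this example, and the rows are the ones from the question.

```python
def summarize(lines, id_count=2):
    """Sum the value columns of pipe-delimited rows, grouped by the identifier columns."""
    totals = {}  # identifier tuple -> list of running sums
    for line in lines:
        cells = line.rstrip("|").split("|")
        key = tuple(cells[:id_count])
        values = [int(c) for c in cells[id_count:]]
        if key in totals:
            totals[key] = [a + b for a, b in zip(totals[key], values)]
        else:
            totals[key] = values
    # Rebuild pipe-delimited rows in first-seen order of the identifiers.
    return ["|".join(list(key) + [str(v) for v in sums]) + "|"
            for key, sums in totals.items()]


rows = ["A1|B1|1|2|", "A1|B1|5|3|", "A2|B2|3|6|", "A2|B2|4|2|"]
result = summarize(rows)
for row in result:
    print(row)
```

With ten identifier columns and five value columns, as in the question, `id_count=10` would be the corresponding setting.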

Join tables in google sheet - full join

I have two tables in google spreadsheet. They have a common unique identifier (Account id). Now I need to join these tables into a third table containing all rows from both tables.
Please have a look at this example spreadsheet:
https://docs.google.com/spreadsheets/d/17ka2tS5ysXqJnrpCxCTwNmCsTORPFP1Gatq4p1fPldA/edit?usp=sharing
I have managed to join the tables using this arrayformula:
=ARRAYFORMULA({G3:H8,VLOOKUP(G3:G8,{A3:A7,B3:C7},{2,3},false)})
But with this formula the joined table "misses" two rows:
20 N/A Klaus Berlin
4 VW David Paris
The first missing row is found only in Table 1. The second missing row has an ID that is found in Table 2 and matches two rows there, but it gets only one row in the joined table.
Is there a way to provide a formula that can handle this?
The previous answer is incorrect. Use:
=ARRAYFORMULA(QUERY(UNIQUE(IFNA({G2:H10,
VLOOKUP(G2:G10, A2:C10, {2, 3}, 0);
VLOOKUP({A3:A10; G3:G10}, {G3:H10; {A3:A10, IF(A3:A10, )}}, {1, 2}, 0), {B3:C10;
VLOOKUP(G3:G10, {A3:C10; {G3:G10, IF(G3:G10, ), IF(G3:G10, )}}, {2, 3}, 0)}})),
"where Col1 is not null order by Col1", 1))
Formula
=ArrayFormula(UNIQUE({
VLOOKUP({A3:A7;G3:G8},{G3:H8;{A3:A7,IF(A3:A7,)}},{1,2},FALSE),
{B3:C7;VLOOKUP(G3:G8,{A3:C7;{G3:G8,IF(G3:G8,),IF(G3:G8,)}},{2,3},FALSE)}
}))
Result
Explanation
{A3:A7;G3:G8} creates a list of all the account ids (it includes duplicates)
{G3:H8;{A3:A7,IF(A3:A7,)}} creates an array having two columns with values from Table 2 and blanks for ids from Table 1
{A3:C7;{G3:G8,IF(G3:G8,),IF(G3:G8,)}} creates an array having three columns with values from Table 1 and blanks for ids from Table 2
The first vlookup populates the first two columns (account id and car)
{B3:C7;VLOOKUP(G3:G8,{A3:C7;{G3:G8,IF(G3:G8,),IF(G3:G8,)}},{2,3},FALSE)} populates the third and fourth columns (name and city)
UNIQUE removes duplicates
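For readers who want to check what a full join should return, here is a minimal Python sketch of the operation itself. It is not a translation of either formula. The column layout (Table 1 as id, name, city; Table 2 as id, car) is inferred from the explanation above, and apart from the David/Paris/VW and Klaus/Berlin values mentioned in the question, the sample rows are made up.

```python
def full_join(table1, table2):
    """Full outer join on the first field.

    table1 rows are (account_id, name, city); table2 rows are (account_id, car).
    Output rows are (account_id, car, name, city), with "" where a side is missing.
    """
    joined = []
    ids1 = {row[0] for row in table1}
    for account_id, name, city in table1:
        cars = [car for other_id, car in table2 if other_id == account_id]
        # One output row per matching car, or a single row with a blank car.
        for car in cars or [""]:
            joined.append((account_id, car, name, city))
    for account_id, car in table2:
        if account_id not in ids1:
            joined.append((account_id, car, "", ""))
    # set() drops exact duplicates, like UNIQUE; sorted() orders by id.
    return sorted(set(joined))


table1 = [(1, "Anna", "Oslo"), (4, "David", "Paris"), (20, "Klaus", "Berlin")]
table2 = [(1, "Audi"), (4, "VW"), (4, "Fiat"), (7, "Volvo")]
joined = full_join(table1, table2)
for row in joined:
    print(row)
```

The two cases from the question are both covered: id 20 exists only in Table 1 and still gets a row, and id 4 gets one row per matching row in Table 2.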

Distinct values for first/last date in group

I have data in the format below with unique IDs in Column A, but these IDs can appear on multiple rows, each representing a repeat transaction against that individual. Col B holds the datetime stamp of the transaction and Col C the name of the transaction:
Col A Col B Col C
ABC1 15/02/2018 16:26 Apple
ABC1 14/02/2018 11:26 Pear
ABC1 13/02/2018 09:11 Pear
ABC2 15/02/2018 16:26 Orange
ABC2 14/02/2018 11:26 Pear
ABC2 13/02/2018 09:11 Apple
ABC3 15/02/2018 16:26 Grape
ABC3 14/02/2018 11:26 Orange
ABC3 13/02/2018 09:11 Apple
I'm trying to pivot this data with MIN and MAX criteria on the datestamp, to count how many IDs had each transaction in Col C as their first transaction, how many had each transaction as their latest transaction, and so on. The aim is to finalise the data in something like this:
MIN (first) transactions:
Distinct Count Col A Col C
1 Pear
2 Apple
MAX (last) Transactions:
Distinct Count Col A Col C
1 Grape
1 Orange
1 Apple
Is there a way to do this with Pivot tables I'm missing? I'm working with several million rows of data here so manipulating via a pivot is easier for me to do (data loaded via power query) than using a formula or something. I can concatenate columns during the load process if needed.
Thanks in advance for your help.
Use helper columns as this will allow you to use page filters for max and min rather than relying on ordering each column in question.
Set your data up as a table. Then add a max column and a min column.
Max column formula:
=IF([@[Col B]]=MAX([Col B]),1,0)
Min column formula:
=IF([@[Col B]]=MIN([Col B]),1,0)
Create 2 pivots, one for max and one for min. Put the max or min helper column in the page field and filter on 1 (i.e. the date is the max or min of the source values).
Order Column C (the fruit name column) by its count in whichever way you see fit. Use ascending order for the min pivot if you are interested in the fruit with the smallest count on the min date.
Final outcome:
You can always remove unwanted fields e.g. Column B to get the exact same look:
Edit:
If you want to show the count of each fruit, by ID, for the minimum date for that ID you can use lookup table pivot(s)
An example lookup table pivot for minimum values for each ID
You then reference this table in your source table, in a helper column, using index match to retrieve the minimum date and compare against the date in your data table for the same ID:
Formula in helper column (MinMatch):
=IF(INDEX(LookupMin!B:B,MATCH(A2,LookupMin!A:A,0))=[@Date],1,0)
Note: This would be a lot easier if you created a unique key of ID & Fruit and looked up against that.
The helper column formula is:
=IFERROR(IF([@[Col B]]=INDEX(LookupMin!$A:$E,MATCH([@[Col A]],LookupMin!$A:$A,0),MATCH([@[Col C]],LookupMin!$4:$4)),1,0),"")
LookupMin! is the sheet with the minimum pivot in.
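As a cross-check on the pivot results, the per-ID first and last transactions can be computed directly. The following Python sketch is not the Excel approach described here; it uses the sample rows from the question, and the function name `first_last_counts` is made up for illustration.

```python
from collections import Counter
from datetime import datetime

rows = [
    ("ABC1", "15/02/2018 16:26", "Apple"),
    ("ABC1", "14/02/2018 11:26", "Pear"),
    ("ABC1", "13/02/2018 09:11", "Pear"),
    ("ABC2", "15/02/2018 16:26", "Orange"),
    ("ABC2", "14/02/2018 11:26", "Pear"),
    ("ABC2", "13/02/2018 09:11", "Apple"),
    ("ABC3", "15/02/2018 16:26", "Grape"),
    ("ABC3", "14/02/2018 11:26", "Orange"),
    ("ABC3", "13/02/2018 09:11", "Apple"),
]


def first_last_counts(rows):
    """Count, per transaction name, how many IDs had it first and how many had it last."""
    first, last = {}, {}  # ID -> (timestamp, transaction name)
    for row_id, stamp, name in rows:
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M")
        if row_id not in first or when < first[row_id][0]:
            first[row_id] = (when, name)
        if row_id not in last or when > last[row_id][0]:
            last[row_id] = (when, name)
    first_counts = Counter(name for _, name in first.values())
    last_counts = Counter(name for _, name in last.values())
    return first_counts, last_counts


first_counts, last_counts = first_last_counts(rows)
print(dict(first_counts))
print(dict(last_counts))
```

Because the minimum and maximum are tracked per ID, this also works when different IDs have different first and last dates.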
Note that I have used a pivot on the data table to see count of each fruit, on the minimum date for each ID.
You could have used a formula instead, but then you would have repeating sums, i.e. see Column F.
Formula in E (then dragged down):
=SUMIFS([MinMatch],[Fruit],C2,[ID],A2)
Finally, if you then decided you wanted earliest date for ID and fruit you could change the lookup as follows:

SQL Shift Table Column Down 1

I have a table of more than 15 million rows and 36 columns. There are two rows of data for every object to which the table refers. I need to:
Move Column 0 down one space so that the useful information from that column appears in the row below.
Here is a sample of the data with fewer columns:
Table name = ekd0310
I want to shift Column 0 down 1
Column 0 Column 1 Column 2 Column 3
B02100AA.CZE
B02100AA.CZF I MIGA0027 SUBDIREC.019
B02100AA.CZG
B02100AA.CZH I MIGA0027 SUBDIREC.019
B02100AA.CZI
B02100AA.CZJ I MIGA0027 SUBDIREC.019
B02100AA.CZK
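The function that you are looking for is probably lead(), which returns the value from the following row; its counterpart lag() uses the same syntax and returns the value from the preceding row, which is what shifting a column down amounts to. You can use either if you assume that there is a column that specifies the ordering. An example: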
select e.*, lead(col) over (order by id) as nextcol
from ekd0310 e;
Although this is an ANSI standard function, not all databases support it (yet). You can do something similar with correlated subqueries. Similarly, the above returns the information, but it is possible to do this as an update as well.
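As a rough illustration of the correlated-subquery alternative, here is a sketch run against an in-memory SQLite database from Python. The `id` ordering column and the column names `col0` and `col1` are assumptions, since the question does not show the real column names; the query pulls `col0` from the preceding row, i.e. it shifts that column down by one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: id gives the row order, col0 is the column to shift.
conn.execute("create table ekd0310 (id integer primary key, col0 text, col1 text)")
conn.executemany(
    "insert into ekd0310 (id, col0, col1) values (?, ?, ?)",
    [
        (1, "B02100AA.CZE", None),
        (2, "B02100AA.CZF", "I"),
        (3, "B02100AA.CZG", None),
        (4, "B02100AA.CZH", "I"),
    ],
)

# For each row, look up col0 of the row with the largest id below the current id.
shifted = conn.execute(
    """
    select e.id,
           (select p.col0
              from ekd0310 p
             where p.id = (select max(q.id) from ekd0310 q where q.id < e.id)) as prev_col0,
           e.col1
      from ekd0310 e
     order by e.id
    """
).fetchall()

for row in shifted:
    print(row)
```

The first row has no predecessor, so its shifted value is NULL. On a table of this size the subquery relies on an index on the ordering column to stay usable.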

Database column Varchar Int dilemma

I have an ERD for recipes:
recipe->recipecomponent<-component
If I insert a recipe with ingredients, I insert into both the recipe and component tables, then take the ids of the inserted rows and insert them into the middle table.
So the middle table has two columns, both foreign keys referencing the primary keys of the other two tables, which are auto-increment int types.
The problem is that when I insert a recipe with 2 ingredients, I insert 2 rows into component, which means I need to insert 2 ids from component into recipecomponent.
For example.
Say I just inserted a recipe with 2 ingredients.
When I inserted into recipe, the id was 1 (AI, INT).
Since it has 2 ingredients, I insert the 2 into component.
These should then have the ids 1 (AI, INT) and 2 (AI, INT).
I would then have to insert those ids (which are PKs of the 2 tables) as FKs into the middle table.
The expected row on the recipecomponent table would be:
recipeid - componentid
1 || 1 2
How do I insert the component ids? Do I insert them as an array?
$insert_row = array('recipeid'=>$recipeid,'componentid'=>$componentids);
Assuming that $componentids is an array that contains the ids 1,2 from the component table.
Building this is no problem, but when I try to insert it, the value shows up as Array, which gives an error:
Severity: Notice
Message: Array to string conversion
Filename: mysqli/mysqli_driver.php
Line Number: 553
and
Error Number: 1054
Unknown column 'Array' in 'field list'
INSERT INTO recipecomponent ( recipeid, componentid) VALUES ( 1,
Array)
Filename: C:\www\KG\system\database\DB_driver.php
Line Number: 330
I found a workaround for this, though: I converted it to a string with implode
$new_component_id = implode(' ',$componentids);
but since it is now the string "1 2", when I insert it into the column, which is an int type, the row only shows the first digit, which is 1.
I thought about just inserting the rows separately. This would be no problem for a recipe with only 2 ingredients.
It would look like this:
recipeid - componentid
1 || 1
1 || 2
But say I inserted a recipe with at least 4 ingredients, with many more recipes to be inserted. Would that be a waste of memory?
If so, I was wondering whether there is any character that is treated as part of an integer and accepted as a value to be inserted, for example the character -
so that when I insert the string 1-2 it shows up as 1-2 in my column, which is an int type.
I need some professional help and advice.
The last option, with one record per ingredient, is the correct way to go. This is what the bridge table "recipecomponent" is for.
Inserting multiple values in the same column (like in your first example) is against normalisation (again, that's what the bridge table is for). More importantly, when you're querying for a particular id, having the ids on different records is quicker than parsing a string with multiple ids.
What happens with $new_component_id (i.e. the second id is cut off) is probably due to a data type conversion; whether this happens on the database side or on the PHP side is not clear from the problem you're reporting.
If you wish to insert multiple ingredients using only one query, you can use the following syntax:
INSERT INTO recipecomponent (recipeid, componentid) VALUES (1,1), (1,2), (1,3);
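To make the one-record-per-ingredient flow concrete, here is a minimal sketch using Python and an in-memory SQLite database rather than the PHP/MySQL stack from the question. The `name` columns, the `add_recipe` helper and the sample recipe are assumptions made for illustration; only the three table names and the two bridge columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table recipe (id integer primary key autoincrement, name text);
    create table component (id integer primary key autoincrement, name text);
    create table recipecomponent (
        recipeid integer references recipe(id),
        componentid integer references component(id),
        primary key (recipeid, componentid)
    );
    """
)


def add_recipe(conn, name, ingredients):
    """Insert a recipe, its components, and one bridge row per component."""
    recipe_id = conn.execute(
        "insert into recipe (name) values (?)", (name,)
    ).lastrowid
    component_ids = [
        conn.execute("insert into component (name) values (?)", (ing,)).lastrowid
        for ing in ingredients
    ]
    # One (recipeid, componentid) pair per ingredient, never a packed string.
    conn.executemany(
        "insert into recipecomponent (recipeid, componentid) values (?, ?)",
        [(recipe_id, cid) for cid in component_ids],
    )
    return recipe_id


add_recipe(conn, "pancakes", ["flour", "milk"])
links = conn.execute(
    "select recipeid, componentid from recipecomponent order by componentid"
).fetchall()
print(links)
```

Each bridge row holds just two integers, so a recipe with many ingredients costs very little storage, and looking up the components of one recipe is a plain indexed query.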

Sql query to check combinations and match it to the given combination

I have a scenario in SQL with the following schema.
If I have 3 items in the Item table, then one unique combination of the items will be assigned to each user. For example:
Items:
1
2
3
Then the combinations will be: {1}, {2}, {3}, {1,2}, {1,2,3}, {1,3}, {2,3}; all are unique combinations.
Each of these combinations will be assigned to a single user.
Now I want to find out which user a given combination belongs to. How can I do that? For example, if I pass the item list {2,3}, it should return the userid of the user who has that combination in the table UserItemCombinations. {2,3} is passed as a comma-separated string to an SP. I've taken 3 items as an example; the table may contain any number of items. The number of users depends on the number of combinations. For example, for three items there are 7 combinations, so there will be 7 users in the user table.
UserItemCombinations will have one row for each user-item pair, and one user can have only one combination, so if the query combination is {2,3}:
select userid from user where userid not in
(select distinct userid from UserItemCombinations where itemid in
(select itemid from item where itemid not in (2,3)));
If there are not too many updates to UserItemCombinations and the performance of the desired query is critical enough, you could add an additional field to the User table, for example Items, and create an SP that fills in its value for every user. The stored procedure selects the sorted items for each user in a loop and concatenates them into one string to put in the User.Items field. You can also create a trigger on UserItemCombinations for INSERT, UPDATE and DELETE to recalculate the value.
You may also create index on that field.
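The idea of storing the sorted items as one string can be sketched in Python as follows. This only illustrates the key construction; the function names and the user ids 101 to 103 are hypothetical, and a dictionary stands in for the indexed User.Items field.

```python
def combination_key(item_ids):
    """Canonical string for a combination: sorted, de-duplicated, comma-separated."""
    return ",".join(str(i) for i in sorted(set(item_ids)))


def parse_items(csv_text):
    """Turn the comma-separated string passed by the caller into a list of ints."""
    return [int(part) for part in csv_text.split(",")]


# Hypothetical user -> items assignment standing in for UserItemCombinations.
user_items = {101: [3, 2], 102: [1], 103: [1, 2, 3]}

# Equivalent of filling User.Items for every user and indexing that field.
users_by_key = {combination_key(items): user for user, items in user_items.items()}

match = users_by_key.get(combination_key(parse_items("2,3")))
print(match)
```

Sorting before concatenating is what makes the lookup exact: "3,2" and "2,3" map to the same key, while "2" and "1,2,3" map to different ones.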
select userid from UserItemCombinations where itemid in (2,3) and itemid not in
(select itemid from Item where itemid not in (2,3));
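Note that the queries above can also return users who hold only part of the requested combination or more than it. A different technique, swapped in here and not taken from any of the answers, is to group by user and require that the user's item count and the count of requested items both equal the size of the requested set. The sketch below runs it against an in-memory SQLite database from Python; the sample rows and the `find_user` helper are made up, and it assumes each (userid, itemid) pair appears at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table UserItemCombinations (userid integer, itemid integer)")
conn.executemany(
    "insert into UserItemCombinations (userid, itemid) values (?, ?)",
    [(1, 2), (2, 3), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3)],
)


def find_user(conn, item_ids):
    """Return the userids whose item set is exactly item_ids."""
    wanted = sorted(set(item_ids))
    marks = ",".join("?" for _ in wanted)
    sql = (
        "select userid from UserItemCombinations "
        "group by userid "
        "having count(*) = ? "
        f"and sum(case when itemid in ({marks}) then 1 else 0 end) = ?"
    )
    # Parameter order follows the placeholders: total count, the items, matched count.
    params = [len(wanted), *wanted, len(wanted)]
    return [row[0] for row in conn.execute(sql, params)]


print(find_user(conn, [2, 3]))
```

Here user 1 holds only {2} and user 4 holds {1,2,3}, so neither matches {2,3}; only user 3 does.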
