How to query different columns and stack them into a new table in Google Sheets - arrays

Hi, I have columns like the ones below, where every row is filled in automatically.
Columns B:D come from source A, columns E:G from source B, and columns H:J from source C.
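sheet data
  | A | B          | C    | D    | E          | F     | G    | H          | I     | J
--+---+------------+------+------+------------+-------+------+------------+-------+-----
1 |   | Date       | Name | Cost | Date       | Name  | Cost | Date       | Name  | Cost
2 |   | 2022-01-02 | Alan | 5    | 2022-01-03 | James | 6    | 2022-01-02 | Timmy | 5
3 |   | 2022-01-02 | Hana | 5    | 2022-01-03 | Paul  | 6    | 2022-01-02 | Jane  | 5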
into
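summary sheet
  | A | B          | C     | D    | E
--+---+------------+-------+------+---------
1 |   | Date       | Name  | Cost | Source
2 |   | 2022-01-02 | Alan  | 5    | sourceA
3 |   | 2022-01-02 | Hana  | 5    | sourceA
4 |   | 2022-01-03 | James | 6    | sourceB
5 |   | 2022-01-03 | Paul  | 6    | sourceB
6 |   | 2022-01-02 | Timmy | 5    | sourceC
7 |   | 2022-01-02 | Jane  | 5    | sourceC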
How do I achieve this with a QUERY formula, stacking the sources on top of one another?
The Source column could be filled with IF, but how do I detect the last row of each source and use it in the IF?
The number of rows may differ from source to source.

This is an array literal: {}. Inside it, a comma , puts values next to each other and a semicolon ; puts values underneath each other. For example:
={1,2;3,4}
will yield:
  |   A   |   B
--+-------+-------
1 |   1   |   2
--+-------+-------
2 |   3   |   4
In the same way you can do:
={QUERY(B:D);
QUERY(E:G);
QUERY(H:J)}
Side note: if your spreadsheet locale uses a comma as the decimal separator (as many non-English locales do), the comma , inside an array is replaced by a backslash \
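The stacking idea can also be sketched outside the spreadsheet. The Python below is only an illustration of the logic, not a Sheets formula: it walks each group of three columns, skips rows that are blank for that source, and appends a source label. The function name `stack_sources`, the labels, and the extra third row are assumptions made up for the example.

```python
def stack_sources(rows, sources, width=3):
    """Stack side-by-side column groups vertically and label each row.

    rows    -- sheet rows without the header row, each a flat list of cells
    sources -- one label per group of `width` columns, left to right
    """
    stacked = []
    for index, label in enumerate(sources):
        start = index * width
        for row in rows:
            group = row[start:start + width]
            # Sources can have different lengths, so skip rows that are
            # completely blank for this source.
            if all(cell in ("", None) for cell in group):
                continue
            stacked.append(list(group) + [label])
    return stacked


# Columns B:J from the question. The third row is a hypothetical extra row
# that only sourceB fills, added to show the blank handling.
sheet = [
    ["2022-01-02", "Alan", 5, "2022-01-03", "James", 6, "2022-01-02", "Timmy", 5],
    ["2022-01-02", "Hana", 5, "2022-01-03", "Paul", 6, "2022-01-02", "Jane", 5],
    ["", "", "", "2022-01-04", "Extra", 7, "", "", ""],
]
summary = stack_sources(sheet, ["sourceA", "sourceB", "sourceC"])
for line in summary:
    print(line)
```

Because blank rows are dropped per source, no "last row" detection is needed: the label is attached while each block is read.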

Use array notation to combine the ranges, then wrap the result in either filter() or query() to remove the empty rows.
Filter docs
Query docs

Related

Query min column header while excluding blanks and handling duplicates

I have the following table.
Name  | Score A | Score B | Score C
------+---------+---------+--------
Bob   | 8       |         | 6
Sue   | 9       | 12      | 9
Joe   | 11      | 2       |
Susan | 7       | 9       | 10
Tim   | 10      | 12      | 4
Ellie | 9       | 8       | 7
In my actual table there are about 2k rows.
I am trying to get the column header of the minimum score (excluding blanks and handling duplicate scores) for each person into another column using QUERY or ARRAYFORMULA, so that I don't have to enter a formula on every row.
I currently have this:
=INDEX($B$1:$D$1,MATCH(MIN(B2:D2),B2:D2,0))
But that means dragging the formula down through every row. I do this on several sheets with around 2k rows each, and it becomes very slow when entering new data.
This should be the end result:
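Name  | Score A | Score B | Score C | Min Score
------+---------+---------+---------+----------
Bob   | 8       |         | 6       | Score C
Sue   | 9       | 12      | 9       | Score A
Joe   | 11      | 2       |         | Score B
Susan | 7       | 9       | 10      | Score A
Tim   | 10      | 12      | 4       | Score C
Ellie | 9       | 8       | 7       | Score C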
use:
=INDEX(SORTN(SORT(SPLIT(QUERY(FLATTEN(
IF(B2:D="",,B1:D1&"×"&B2:D&"×"&ROW(B2:D))),
"where Col1 is not null", ),
"×"), 3, 1, 2, 1), 9^9, 2, 3, 1),, 1)
The following answer uses three of the newest functions, which Google is still rolling out, so you might not be able to use it right now. In a few weeks, once they are fully rolled out, you will (it worked for me just now in the Android version of Sheets):
=arrayformula(if(len(A2:A),byrow(B2:D,lambda(row,xlookup(min(row),row,B1:D1))),))
Assuming the names are in column A, this should give a result for every row which has a name in it. I'm sure there are other ways of doing this, but these 'row/column-wise' problems are really ideal use-cases for LAMBDA and its helper functions like BYROW.
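For readers who want to check the row-wise logic outside Sheets, here is a minimal Python sketch of what the BYROW/XLOOKUP formula computes for each row. It is an illustration only: `min_score_header` is a made-up name, and the position of the blank cells in Bob's and Joe's rows is an assumption about the sample data.

```python
def min_score_header(headers, row):
    """Return the header of the smallest non-blank value in `row`.

    Blank cells ("" or None) are ignored. On a tie the leftmost column wins,
    like the first-match behaviour of MATCH(..., 0) and XLOOKUP.
    Returns "" when the whole row is blank.
    """
    best_header, best_value = "", None
    for header, value in zip(headers, row):
        if value in ("", None):
            continue
        if best_value is None or value < best_value:
            best_header, best_value = header, value
    return best_header


headers = ["Score A", "Score B", "Score C"]
scores = {
    "Bob": [8, "", 6],      # blank position assumed
    "Sue": [9, 12, 9],      # tie between Score A and Score C
    "Joe": [11, 2, ""],     # blank position assumed
    "Susan": [7, 9, 10],
    "Tim": [10, 12, 4],
    "Ellie": [9, 8, 7],
}
result = {name: min_score_header(headers, row) for name, row in scores.items()}
print(result)
```

The strict `<` comparison is what makes the leftmost column win on duplicates, which is why Sue resolves to Score A.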

How to count classes in columns

I'm trying to write a query and I'm having a hard time with one thing. Suppose I have a table that looks like this:
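id | Sample | Species | Quantity | Group
---+--------+---------+----------+------
1  | 1      | AA      | 5        | A
2  | 1      | AB      | 6        | A
3  | 1      | AC      | 10       | A
4  | 1      | CD      | 15       | C
5  | 1      | CE      | 20       | C
6  | 1      | DA      | 13       | D
7  | 1      | DB      | 7        | D
8  | 1      | EA      | 6        | E
9  | 1      | EF      | 4        | E
10 | 1      | EB      | 2        | E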
The table is filtered here to a single sample (I have many). Each row has the species, the quantity of that species and a functional group (there are only five groups, A to E). I would like a query that groups by sample and produces one column per group holding the count of species in that group, something like this:
Sample | N_especies | Group A | Group B | Group C | Group D | Group E
-------+------------+---------+---------+---------+---------+--------
1      | 10         | 3       | 0       | 2       | 2       | 3
So I have to count the species (that's easy), but I don't know how to build the per-group columns. Can anyone help me?
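You can use PIVOT:
-- Grp is assumed to hold the question's Group values (GROUP is a reserved word in T-SQL)
SELECT a.Sample, [A]+[B]+[C]+[D]+[E] AS N_especies, [A], [B], [C], [D], [E]
FROM (
    SELECT t.Sample, t.Species, t.Grp
    FROM [WS_Database].[dbo].[test1] t
) src
PIVOT (
    COUNT(Species)           -- count species rows per group
    FOR Grp IN ([A], [B], [C], [D], [E])
) a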
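To make clear what the pivot produces, here is a small Python sketch of the same counting logic. It is an illustration only, not a replacement for the SQL: the function name `count_groups` is made up, and it assumes each row is one species observation, so counting rows equals counting species.

```python
from collections import Counter, defaultdict

GROUPS = ["A", "B", "C", "D", "E"]  # the five functional groups from the question


def count_groups(records):
    """Return one row per sample: [sample, total species, count per group...]."""
    counts = defaultdict(Counter)
    for sample, _species, group in records:
        counts[sample][group] += 1
    table = []
    for sample in sorted(counts):
        # A Counter returns 0 for groups that never occur, like COUNT in PIVOT.
        per_group = [counts[sample][group] for group in GROUPS]
        table.append([sample, sum(per_group)] + per_group)
    return table


# (Sample, Species, Group) rows from the question's table
records = [
    (1, "AA", "A"), (1, "AB", "A"), (1, "AC", "A"),
    (1, "CD", "C"), (1, "CE", "C"),
    (1, "DA", "D"), (1, "DB", "D"),
    (1, "EA", "E"), (1, "EF", "E"), (1, "EB", "E"),
]
result = count_groups(records)
print(result)
```

The total column is the sum of the five group counts, which mirrors the `[A]+[B]+[C]+[D]+[E]` expression in the query.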

Equivalent of Excel Pivoting in Stata

I have been working with country-level survey data in Stata that I needed to reshape. I ended up exporting the .dta to a .csv and making a pivot table in Excel, but I am curious to know how to do this in Stata, as I couldn't figure it out.
Suppose we have the following data:
country response
A 1
A 1
A 2
A 2
A 1
B 1
B 2
B 2
B 1
B 1
A 2
A 2
A 1
I would like the data to be reformatted as such:
country sum_1 sum_2
A 4 4
B 3 2
First I tried a simple reshape wide command but got the error that "values of variable response not unique within country" before realizing reshape without additional steps wouldn't work anyway.
Then I tried generating new variables conditional on the value of response and trying to use reshape following that... the whole thing turned into kind of a mess so I just used Excel.
Just curious if there is a more intuitive way of doing that transformation.
If you just want a table, then just ask for one:
clear
input str1 country response
A 1
A 1
A 2
A 2
A 1
B 1
B 2
B 2
B 1
B 1
A 2
A 2
A 1
end
tabulate country response
           |       response
   country |         1          2 |     Total
-----------+----------------------+----------
         A |         4          4 |         8
         B |         3          2 |         5
-----------+----------------------+----------
     Total |         7          6 |        13
If you want the data to be changed to this, reshape is part of the answer, but you should contract first. collapse is in several ways more versatile, but your "sum" is really a count or frequency, so contract is more direct.
contract country response, freq(sum_)
reshape wide sum_, i(country) j(response)
list
     +-------------------------+
     | country   sum_1   sum_2 |
     |-------------------------|
  1. |       A       4       4 |
  2. |       B       3       2 |
     +-------------------------+
In Stata 16 up, help frames introduces frames as a way to work with multiple datasets in the same session.
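For comparison, the contract-then-reshape steps can be sketched in Python. This is only an illustration of the two-step idea (count each country/response pair, then spread the responses into columns); the function name `contract_and_widen` is made up, and the sketch is not Stata code.

```python
from collections import Counter


def contract_and_widen(pairs):
    """Count (country, response) pairs, then spread responses into columns."""
    freq = Counter(pairs)  # step 1: frequencies, like contract ..., freq(sum_)
    responses = sorted({response for _, response in pairs})
    countries = sorted({country for country, _ in pairs})
    # Step 2: one column per response value, like reshape wide sum_.
    header = ["country"] + [f"sum_{response}" for response in responses]
    # A combination that never occurs shows up as 0 here, whereas
    # reshape wide after a plain contract would leave it missing.
    rows = [[c] + [freq[(c, r)] for r in responses] for c in countries]
    return header, rows


data = [
    ("A", 1), ("A", 1), ("A", 2), ("A", 2), ("A", 1),
    ("B", 1), ("B", 2), ("B", 2), ("B", 1), ("B", 1),
    ("A", 2), ("A", 2), ("A", 1),
]
header, rows = contract_and_widen(data)
print(header)
for row in rows:
    print(row)
```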

Clustering Coefficient using SQL Server/C#

I have two tables in SQL Server.
The first table is GraphNodes:
---------------------------------------------------------
id | Node_ID | Node             | Node_Label | Node_Type
---------------------------------------------------------
1  | 677     | Nuno Vasconcelos | Author     | 1
2  | 1359    | Peng Shi         | Author     | 1
3  | 6242    | Z. Q. Shi        | Author     | 1
4  | 8318    | Kiyoung Choi     | Author     | 1
5  | 12405   | Johan A. K.      | Author     | 1
6  | 26615   | Tzung-Pei Hong   | Author     | 1
7  | 30559   | Luca Benini      | Author     | 1
...
...
The second table is GraphEdges:
-----------------------------------------------------------------------------------------
id | Source_Node | Source_Node_Type | Target_Node | Target_Node_Type | Year | Edge_Type
-----------------------------------------------------------------------------------------
1  | 1           | 1                | 10965       | 2                | 2005 | 1
2  | 1           | 1                | 10179       | 2                | 2007 | 1
3  | 1           | 1                | 10965       | 2                | 2007 | 1
4  | 1           | 1                | 19741       | 2                | 2007 | 1
5  | 1           | 1                | 10965       | 2                | 2009 | 1
6  | 1           | 1                | 4816        | 2                | 2011 | 1
7  | 1           | 1                | 5155        | 2                | 2011 | 1
...
...
I also have two more tables. GraphNodeTypes:
-------------------------
id | Node     | Node_Type
-------------------------
1  | Author   | 1
2  | CoAuthor | 2
3  | Venue    | 3
4  | Paper    | 4
and GraphEdgeTypes:
-------------------------------
id | Edge           | Edge_Type
-------------------------------
1  | AuthorCoAuthor | 1
2  | CoAuthorVenue  | 2
3  | AuthorVenue    | 3
4  | PaperVenue     | 4
5  | AuthorPaper    | 5
6  | CoAuthorPaper  | 6
Now I want to calculate two kinds of clustering coefficient for this graph.
If N(V) is the number of links between the neighbours of node V and K(V) is the degree of node V, then
Local Clustering Coefficient(V) = 2 * N(V) / (K(V) * (K(V) - 1))
and
Global Clustering Coefficient = 3 * (# of triangles) / (# of connected triplets of vertices)
The question is: how can I calculate the degree of a node? Is that possible in SQL Server, or is C# programming required? Please also suggest hints for calculating the local and global CCs.
Thanks!
The degree of a node is not "calculated". It's simply the number of edges this node has.
While you can try to do this in SQL, the performance will likely be mediocre. This type of analysis is commonly done in specialized databases and, if possible, in memory.
Count the degree of each vertex as the number of edges connected to it. Using COUNT(Source_Node) with GROUP BY Source_Node will be helpful in this case.
To find N(V), you can join the edge table with itself and then take the intersection between the resulting table and edge table. From the result, for each vertex take the COUNT().
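As a rough in-memory illustration of the two formulas from the question, here is a Python sketch. It makes several assumptions that the question does not confirm: the graph is treated as undirected, repeated edges between the same pair (for example the same pair in different years) are collapsed into one link, and self-loops are ignored. The function names and the tiny edge list are made up for the example.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected, no duplicates)."""
    adjacency = {}
    for source, target in edges:
        if source == target:
            continue  # assumption: self-loops do not count
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def linked_neighbour_pairs(adjacency, node):
    """N(V): number of links between the neighbours of `node`."""
    return sum(1 for a, b in combinations(adjacency[node], 2) if b in adjacency[a])


def local_clustering(adjacency, node):
    """2 * N(V) / (K(V) * (K(V) - 1)), or 0.0 when the degree is below 2."""
    k = len(adjacency.get(node, ()))  # K(V): degree of the node
    if k < 2:
        return 0.0
    return 2 * linked_neighbour_pairs(adjacency, node) / (k * (k - 1))


def global_clustering(adjacency):
    """3 * triangles / connected triplets."""
    closed = 0    # every triangle is counted three times, once per corner
    triplets = 0  # paths of length two, counted at their centre node
    for node, neighbours in adjacency.items():
        k = len(neighbours)
        triplets += k * (k - 1) // 2
        closed += linked_neighbour_pairs(adjacency, node)
    return closed / triplets if triplets else 0.0


# Hypothetical toy graph: a triangle 1-2-3 plus a pendant node 4 attached to 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (1, 2)]  # the repeated edge is collapsed
graph = build_adjacency(edges)
degree_of_3 = len(graph[3])
local_3 = local_clustering(graph, 3)
global_cc = global_clustering(graph)
print(degree_of_3, local_3, global_cc)
```

Summing the linked neighbour pairs over all nodes already yields three times the triangle count, so the factor 3 from the formula is not applied again.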

Excel Formula: Skip zero values in column when repeating Indexed values N times in array?

What function can I add to my G2 formula to skip names with zero values (Col E)?
A B C D E G
1 John 0 SUE
2 Sue 2:00 3:00 2 SUE
3 Dan 0 JOE
4 Joe 1:00 1 -
5 -
Formula in G1: Uses index to find first name with non-zero value in Col E
{=INDEX(A1:A20,MATCH(TRUE,E1:E20<>0,0))}
Formula in G2: Lists names N times based on value in Col E (but stops listing names as soon as it encounters a zero)
{=IFERROR(IF(COUNTIF($G$1:G1,G1)=INDEX(E:E,MATCH(G1,A:A,0)),IF(AND(INDEX(A:A,MATCH(G1,A:A,0)+1)<>0,INDEX(E:E,MATCH(G1,A:A,0)+1)<>0),INDEX(A:A,MATCH(G1,A:A,0)+1),"-"),G1),"-")}
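The intended result of column G can be stated as a short Python sketch, which may help when checking any formula against it. This is not an Excel formula and only illustrates the target behaviour under assumptions: each name is repeated as many times as its Col E value, names with a zero or blank value are skipped, and the remaining cells are padded with "-". The function name `repeat_names` is made up.

```python
def repeat_names(names, counts, length):
    """Repeat each name `count` times, skipping zeros, padded with "-"."""
    output = []
    for name, count in zip(names, counts):
        if not count:
            continue  # zero or blank in Col E: skip this name entirely
        output.extend([name] * int(count))
    output.extend(["-"] * (length - len(output)))
    return output[:length]


# Col A and Col E from the question, filled into five result cells
column_g = repeat_names(["John", "Sue", "Dan", "Joe"], [0, 2, 0, 1], 5)
print(column_g)
```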

Resources