Google Sheets: Chart counting combinations - arrays

In the following Google spreadsheet
Description | No. of participants | Severity
------------+---------------------+---------
Problem 1   | 2                   | Minor
Problem 2   | 2                   | Major
Problem 3   | 1                   | Minor
Problem 4   | 1                   | Minor
Problem 5   | 2                   | Major
I would like to create a chart that has the No. of participants as x-axis, the Severity as y-axis and counts the number of combinations.
The example above would look like this:
      | 1 | 2
------+---+---
Minor | 2 | 1
Major | 0 | 2
Also, I would like to list all items in the column description according to their combination of No. of participants and severity like this:
      | 1         | 2
------+-----------+----------
Minor | Problem 3 | Problem 1
Minor | Problem 4 |
Major |           | Problem 2
Major |           | Problem 5
How can I get this chart extracted from the table in Google Sheets?
Thanks to @player0, the first part is answered. He also suggested something for the second part, which does not quite work yet:
To get the basic table:
=QUERY(A2:C6; "select C, max(A) where A is not null group by C pivot B order by C desc")
To display several items in one cell:
=MAP(B2:B6; C2:C6; LAMBDA(x; y; TEXTJOIN(CONCATENATE(CHAR(10);CHAR(10)); 1; FILTER(A2:A6; B2:B6=x; C2:C6=y))))
Unfortunately, combining both does not work out:
=QUERY((MAP(B2:B6; C2:C6; LAMBDA(x; y; TEXTJOIN(CONCATENATE(CHAR(10);CHAR(10)); 1; FILTER(A2:A6; B2:B6=x; C2:C6=y)))); B2:C6); "select C,max(A) where A is not null group by C pivot B order by C desc")
Nor does this one work:
=QUERY((MAP(B2:B6; C2:C6; LAMBDA(x; y; TEXTJOIN(CONCATENATE(CHAR(10);CHAR(10)); 1; FILTER(A2:A6; B2:B6=x; C2:C6=y)))); B2:C6); "select Col1,max(Col1) where Col1 is not null group by Col3 pivot Col2 order by Col3 desc")

use:
=QUERY(A2:C; "select C,count(A) where A is not null group by C pivot B order by C desc")
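As a cross-check of what the count pivot above produces, here is a minimal Python sketch (plain Python, not Sheets syntax) that builds the same severity-by-participants grid from the sample data, both as counts and as joined descriptions. The variable names are illustrative and not part of any formula in this thread.

```python
from collections import defaultdict

# Sample data from the question: (Description, No. of participants, Severity)
rows = [
    ("Problem 1", 2, "Minor"),
    ("Problem 2", 2, "Major"),
    ("Problem 3", 1, "Minor"),
    ("Problem 4", 1, "Minor"),
    ("Problem 5", 2, "Major"),
]

# Group descriptions by their (severity, participants) combination.
cells = defaultdict(list)
for description, participants, severity in rows:
    cells[(severity, participants)].append(description)

columns = sorted({p for _, p, _ in rows})                    # pivot columns
severities = sorted({s for _, _, s in rows}, reverse=True)   # "order by C desc"

# One row per severity, one entry per participants value.
counts = {s: [len(cells[(s, p)]) for p in columns] for s in severities}
joined = {s: [", ".join(cells[(s, p)]) for p in columns] for s in severities}

for s in severities:
    print(s, counts[s], joined[s])
```

An empty combination shows up as 0 in `counts` and as an empty string in `joined`.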
update:
=QUERY({MAP(B2:B, C2:C, LAMBDA(x, y, TEXTJOIN(", ", 1, FILTER(A2:A, B2:B=x, C2:C=y)))), B2:C},
"select Col3,max(Col1) where Col1 is not null group by Col3 pivot Col2 order by Col3 desc")

Related

How to combine group by, join, COUNT, SUM and subquery clauses in sql

I am not sure how to write the SQL query for the following problem:
There are two tables, Worker and Product (one worker can make many products), which I describe at this link: https://docs.google.com/spreadsheets/d/1Yk2vKKmUEyuN-QfgTEbmF4suHFtuDkkrsUf-wqvOoKQ/edit?fbclid=IwAR3ipjwNrfhGXg3fCyAri4tD1Q4WqWuKVAqagvbsZg9Sn1myDwkWbWcl_6E#gid=0
The calculation of the total salary of a worker at month x is as follows
totalSalary = salaryPerMonth + SUM(salaryPerProduct * COUNT(pid))
I want to use a JOIN (whether INNER, LEFT or RIGHT) combined with a GROUP BY clause to solve this problem, but my statements are wrong.
I am hoping for a specific SQL statement for this case.
I hope to be able to express my ideas in this photo
UPDATE: my picture quality is not good, so I will repost my picture at this link
@phi nguyễn quốc - Welcome to Stack Overflow. What you posted has the makings of a good question. It contains:
Brief summary of the issue
Table structure, sample data
Explanation of expected results
Code you've tried
It just needs a few modifications to conform to the guidelines and avoid being closed. A few tips on posting:
Help others to help you by including a Minimal, Reproducible Example. (With SQL questions, include table definitions and sample data.) That way, folks who want to help can spend their time answering your question instead of writing set-up code to replicate your tables, environment, etc.
Make it easy for others to be able to test your code. Always post code as text, not as an image.
Use collaborative tools like db<>fiddle for sharing
One example of how you might improve the question and avoid it being closed:
Issue:
I am trying to write a SQL query to calculate the total salary for workers for a given month X. There are two tables: [Worker] and [Product]. One worker can make many products.
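Worker:
wid | wname | salaryPerMonth | salaryPerProduct | phoneNumber
----+-------+----------------+------------------+------------
1   | Mr A  | 500            | 5                |
2   | Mr B  | 100            | 30               |
3   | Mr C  | 200            | 20               |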
Product:
pid | pname     | manufacturedDate | wid
----+-----------+------------------+----
1   | Product A | 2013-12-01       | 1
2   | Product B | 2013-12-09       | 1
3   | Product C | 2013-09-08       | 1
4   | Product D | 2013-01-30       | 2
5   | Product E | 2013-09-20       | 2
6   | Product F | 2013-12-23       | 3
The "Total Salary" of a worker for month X is calculated as follows:
SalaryPerMonth + (SalaryPerProduct * Number of Products for Month)
Expected Results: (December 2013)
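wid | wname | salaryPerMonth | salaryPerProduct | totalSalary | ** Formula
----+-------+----------------+------------------+-------------+---------------
1   | Mr A  | 500            | 5                | 510         | = 500 + (5*2)
2   | Mr B  | 100            | 30               | 100         | = 100 + (30*0)
3   | Mr C  | 200            | 20               | 220         | = 200 + (20*1)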
Actual Results
I've tried this query
SELECT W.wid, W.wname, W.phoneNumber, W.salaryPerMonth, W.salaryPerProduct, (W.salaryPerMonth - SUM(W.salaryPerMonth*COUNT(p.pid))) AS Total
FROM Worker W INNER JOIN Product P ON p.Wid = W.wid
WHERE MONTH(P.manufacturedDate) = 12
GROUP BY W.wid, W.wname, W.phoneNumber, W.salaryPerMonth, W.salaryPerProduct
.. but am getting the error below:
Msg 130 Level 15 State 1 Line 1
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
Here is my db<>fiddle
CREATE TABLE Product (
pid int
, pname varchar(40)
, manufacturedDate date
, wid int
);
CREATE TABLE Worker (
wid int
, wname varchar(40)
, salaryPerMonth int
, salaryPerProduct int
, phoneNumber varchar(20)
);
INSERT INTO Product(pid, pname, manufacturedDate, wid)
VALUES
(1,'Product A','2013-12-01',1)
,(2,'Product B','2013-12-09',1)
,(3,'Product C','2013-09-08',1)
,(4,'Product D','2013-01-30',2)
,(5,'Product E','2013-09-20',2)
,(6,'Product F','2013-12-23',3)
;
INSERT INTO Worker (wid, wname, salaryPerMonth,salaryPerProduct)
VALUES
(1,'Mr A', 500, 5)
,(2, 'Mr B', 100, 30)
,(3,'Mr C', 200, 20)
;
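The error above comes from nesting COUNT inside SUM. One way to get the expected December 2013 results without nesting aggregates is to count each worker's products through a LEFT JOIN whose month filter sits in the ON clause, so workers with no products that month still appear with a count of 0. The following is only a sketch under stated assumptions: it runs the set-up script above against SQLite through Python's `sqlite3` module rather than SQL Server, and it uses a date range instead of `MONTH()` so that only December 2013 is counted, not December of every year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Worker (wid int, wname varchar(40), salaryPerMonth int,
                     salaryPerProduct int, phoneNumber varchar(20));
CREATE TABLE Product (pid int, pname varchar(40), manufacturedDate date, wid int);
INSERT INTO Worker (wid, wname, salaryPerMonth, salaryPerProduct) VALUES
  (1,'Mr A',500,5), (2,'Mr B',100,30), (3,'Mr C',200,20);
INSERT INTO Product (pid, pname, manufacturedDate, wid) VALUES
  (1,'Product A','2013-12-01',1), (2,'Product B','2013-12-09',1),
  (3,'Product C','2013-09-08',1), (4,'Product D','2013-01-30',2),
  (5,'Product E','2013-09-20',2), (6,'Product F','2013-12-23',3);
""")

# COUNT(p.pid) ignores the NULLs produced by the LEFT JOIN, so a worker
# with no products in the month gets 0 and keeps the base salary.
query = """
SELECT w.wid, w.wname,
       w.salaryPerMonth + w.salaryPerProduct * COUNT(p.pid) AS totalSalary
FROM Worker AS w
LEFT JOIN Product AS p
       ON p.wid = w.wid
      AND p.manufacturedDate >= '2013-12-01'
      AND p.manufacturedDate <  '2014-01-01'
GROUP BY w.wid, w.wname, w.salaryPerMonth, w.salaryPerProduct
ORDER BY w.wid
"""
salary_rows = conn.execute(query).fetchall()
for row in salary_rows:
    print(row)
```

With an INNER JOIN and the month filter in WHERE, Mr B would drop out of the result entirely, which is why the sketch uses a LEFT JOIN.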

Alternative solutions to an array search in PostgreSQL

I am not sure whether my database design is good for this tricky case, and I would also like help with what the query for it could look like.
I plan a query with the following table:
search_array          | value | id
----------------------+-------+----
{XYa,YZb,WQb}         | b     | 1
{XYa,YZb,WQb,RSc,QZa} | a     | 2
{XYc,YZa}             | c     | 3
{XYb}                 | a     | 4
{RSa}                 | c     | 5
There are 5 main elements in the search_array (XY, YZ, WQ, RS, QZ) and 3 values (a, b, c), one of which is concatenated to each element.
Each row also has one value: a, b or c.
My aim is to find all rows that fit a specific row in this sense: first, it should be checked whether they have any main elements in common in their search_arrays.
For example:
Row id 4 and row id 5 wouldn't match because XY != RS.
Row ids 1, 2 and 3 would match two times because they all have XY and YZ.
Row ids 1 and 2 would even match three times because they also have WQ in common.
Second, if there is a main element match, it should be cross-checked whether the lowercase letter after the main element in each row equals the value of the other row.
For example: the only match for row id 1 in the table would be row id 4, because they both search for XY and the lowercase letter after XY in each row matches the value of the other row (row 1 searches for a and row 4 has value a; row 4 searches for b and row 1 has value b).
Another match would be row ids 2 and 5 with RS: row 2 searches for c and row 5 has value c; row 5 searches for a and row 2 has value a.
My idea was to cut the search_array elements into two parts in the query with the LEFT and RIGHT string functions, but I don't know how to combine the subqueries for this search.
Or would a completely different solution be faster, such as splitting the search array into another table with the columns 'foreign key' to the main table, 'main element' and 'searched_value'? I am not sure this is the best solution, because the program would constantly have to go back to the main table to find two rows out of 3 million and compare their searched_values to the values.
Thank you very much for your answers and your time!
You'll have to represent the data in a normalized fashion. I'll do it in a WITH clause, but it would be better to store the data in this fashion to begin with.
WITH unravel AS (
SELECT t.id, t.value,
substr(u.val, 1, 2) AS arr_main,
substr(u.val, 3, 1) AS arr_val
FROM mytable AS t
CROSS JOIN LATERAL unnest(t.search_array) AS u(val)
)
SELECT a.id AS first_id,
a.value AS first_value,
b.id AS second_id,
b.value AS second_value,
a.arr_main AS main_element
FROM unravel AS a
JOIN unravel AS b
ON a.arr_main = b.arr_main
AND a.arr_val = b.value
AND b.arr_val = a.value;
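To illustrate storing the data in normalized form from the start, here is a small sketch that loads the sample rows into two tables and runs the same kind of self-join. It is an assumption-laden illustration, not the answer's exact query: it uses SQLite through Python's `sqlite3` module instead of PostgreSQL, the table name `search_item` and its columns are hypothetical (borrowed from the question's wording), and it adds `a.row_id < b.row_id` so that each pair is reported once and a row is never matched with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE search_item (
    row_id INTEGER REFERENCES mytable(id),
    main_element TEXT,      -- e.g. 'XY'
    searched_value TEXT     -- e.g. 'a'
);
""")

# Sample rows from the question: id -> (value, search_array)
source = {
    1: ("b", ["XYa", "YZb", "WQb"]),
    2: ("a", ["XYa", "YZb", "WQb", "RSc", "QZa"]),
    3: ("c", ["XYc", "YZa"]),
    4: ("a", ["XYb"]),
    5: ("c", ["RSa"]),
}
for row_id, (value, search_array) in source.items():
    conn.execute("INSERT INTO mytable VALUES (?, ?)", (row_id, value))
    # Split each element into its two-letter main part and one-letter value.
    conn.executemany(
        "INSERT INTO search_item VALUES (?, ?, ?)",
        [(row_id, item[:2], item[2:]) for item in search_array],
    )

matches = conn.execute("""
SELECT a.row_id, b.row_id, a.main_element
FROM search_item AS a
JOIN mytable AS ta ON ta.id = a.row_id
JOIN search_item AS b
  ON b.main_element = a.main_element
 AND a.row_id < b.row_id
JOIN mytable AS tb ON tb.id = b.row_id
WHERE a.searched_value = tb.value   -- a searches for b's value
  AND b.searched_value = ta.value   -- b searches for a's value
ORDER BY a.row_id, b.row_id
""").fetchall()
print(matches)
```

On a large table, an index on `search_item(main_element, searched_value)` would be a natural thing to try, though whether it helps enough for 3 million rows would need measuring.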

Google Sheets - Query, select x or y and excluded 0 results

I keep getting #VALUE! or #ERROR on everything I try. I think I'm using the wrong function entirely, but I don't know the best alternative.
I've tried a few IF blank checks and so on, but no luck; I only started using Google Sheets a few days ago.
I adapted a query I had used on another sheet and had some success, but with my limited knowledge it didn't do everything I needed. It worked for a single sheet, but not when I tried to use it across all the sheets I needed. I searched for a solution but was just going in circles.
=QUERY({KME!$A$2:$D$157;'KME1'!$A$2:$D$157;'KME2'!$A$2:$D$157},"select * WHERE C < 300 OR D <10000",1)
=QUERY(KME!$A$2:$D$157,"select * WHERE C < 300 OR D <10000",1)
I have a set of data across 3 sheets. I need to filter the data, I want it to select all the data which I used select * for.
I want it to only show results from:
Column C if below 300
OR
Column D if below 10,000
I also want it to not show results that have empty cells in C and D
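The three requirements above amount to one condition: both C and D are non-empty, and either C is below 300 or D is below 10,000. Written out in Python (not Sheets syntax) with made-up sample rows, the grouping of the OR inside the AND becomes explicit. The sheet contents and the helper name `keep` are hypothetical and only serve the illustration.

```python
# Hypothetical rows (A, B, C, D) standing in for the three sheets;
# None represents an empty cell.
kme = [("a", "x", 100, 20000), ("b", "x", 500, 5000), ("c", "x", 500, 20000)]
kme1 = [("d", "x", None, 5000), ("e", "x", 100, None)]
kme2 = [("f", "x", 250, 9000)]

def keep(row):
    """Return True if the row satisfies the filter described in the question."""
    _, _, c, d = row
    if c is None or d is None:       # drop rows with an empty C or D
        return False
    return c < 300 or d < 10000      # C below 300 OR D below 10,000

# Stack the three sheets, like {KME!...; 'KME1'!...; 'KME2'!...} does.
combined = [row for sheet in (kme, kme1, kme2) for row in sheet]
filtered = [row for row in combined if keep(row)]
print([row[0] for row in filtered])
```

Rows "d" and "e" are dropped for having an empty cell even though their other column passes the threshold, and row "c" is dropped because neither threshold is met.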
try it like this:
=QUERY({KME!A2:D157; 'KME1'!A2:D157; 'KME2'!A2:D157},
"where (Col3 < 300
or Col4 < 10000)
and Col3 !=''
and Col4 !='' ", 0)
or like this:
=QUERY(QUERY({KME!A2:D157;'KME1'!A2:D157;'KME2'!A2:D157},
"where Col3 < 300 OR Col4 <10000", 0),
"where Col3 is not null and Col4 is not null")

Using multiple aggregate functions - sum and count

I've tried several of the solutions to my question on the site but could not find one that worked. Please help!
Other than taking some liberties with the report_names, the data is representative of what I am trying to accomplish. It is just a small portion of what I am up against: roughly 97K rows of data with the same kind of repetition of branch, file_count and report_name. The file numbers are unique and insignificant; they are included for informational purposes and explain why the amounts are unique (they are tied to the file_name).
I am looking for one report_name with the sum of the two amounts.
Here are the current results to my query:
branch    | file_count | file_volume | net_profit  | report_name      | file_number
----------+------------+-------------+-------------+------------------+------------
Northeast | 1          | $200,000.00 | $200,000.00 | bogart.hump.new  | 12345
Northeast | 1          | $195,000.00 | $197,837.00 | bogart.hump.new  | 23456
Northeast | 1          | $111,500.00 | $113,172.00 | bogart.hump.new  | 34567
Northwest | 1          | $66,000.00  | -$1,500.18  | jolie.angela.new | 45678
Northwest | 1          | $159,856.00 | -$2,745.58  | jolie.angela.new | 56789
Northwest | 1          | $140,998.00 | -$2,421.69  | jolie.angela.new | 67890
Southwest | 1          | $74,000.00  | $73,904.00  | Man.bat.net      | 78901
Southwest | 1          | $186,245.00 | -$4,231.25  | Man.bat.net      | 89012
Southwest | 1          | $72,375.00  | $73,641.00  | Man.bat.net      | 90123
Southeast | 1          | $79,575.00  | -$1,821.76  | zep.led.new      | 1234A
Southeast | 1          | $268,600.00 | $268,600.00 | zep.led.new      | 2345A
Southeast | 1          | $77,103.00  | -$1,751.68  | zep.led.new      | 3456A
This is what I am looking for:
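branch    | file_count | file_volume | net_profit  | report_name      | file_number
----------+------------+-------------+-------------+------------------+------------
Northeast | 3          | $506,500.00 | $511,009.00 | bogart.hump.new  |
Northwest | 3          | $366,854.00 | -$6,667.45  | jolie.angela.new |
Southwest | 3          | $332,620.00 | $143,313.75 | Man.bat.net      |
Southeast | 3          | $425,278.00 | $265,026.56 | zep.led.new      |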
My query:
SELECT
branch,
count(filenumber) AS file_count,
sum(fileAmount) AS file_amount,
sum(netprofit*-1) AS net_profit,
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
FROM user.summary u
inner join user.db1 d1 ON d1.loaname = u.loaname
inner join user.db2 d2 ON d2.cn = u.loaname
WHERE d2.filedate = '2015-09-01'
AND filenumber is not null
GROUP BY branch,concat(d2.lastname,'.',d2.firstname,'.','new')
The only issue I see with your current query is that you have a comma at the end of this line, which would give you a syntax error:
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
If you want the blank field file_number as shown in your desired result set though, you could leave the comma and follow it with the blank field by adding to it:
concat(d2.lastname,'.',d2.firstname,'.','new') AS report_name,
'' file_number
I figured it out, but could not have done it without airing it out in this forum. In my actual query I included the "file_name" column, so I had both the "count(file_name)" and "file_name" columns, whereas in my query example I only had the "count(file_name)" column. When I removed the "file_name" column from my actual query, it worked.
Side note: it was obvious that I excluded a key component of my query. In any future query questions, I will include the complete query but substitute actual column names with col1, col2, db1, db2, etc. Thanks very much for responding to my question.
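The effect described here, where selecting a unique per-file column next to the aggregates keeps every row separate, can be reproduced on a small scale. This is a sketch under assumptions: it uses SQLite through Python's `sqlite3` module, a single simplified table named `summary` with hypothetical column names instead of the three joined tables, and only the Northeast and Northwest sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE summary (
    branch TEXT, file_amount REAL, net_profit REAL,
    report_name TEXT, file_number TEXT)""")
conn.executemany("INSERT INTO summary VALUES (?, ?, ?, ?, ?)", [
    ("Northeast", 200000.00, 200000.00, "bogart.hump.new", "12345"),
    ("Northeast", 195000.00, 197837.00, "bogart.hump.new", "23456"),
    ("Northeast", 111500.00, 113172.00, "bogart.hump.new", "34567"),
    ("Northwest", 66000.00, -1500.18, "jolie.angela.new", "45678"),
    ("Northwest", 159856.00, -2745.58, "jolie.angela.new", "56789"),
    ("Northwest", 140998.00, -2421.69, "jolie.angela.new", "67890"),
])

# Grouping only by branch and report_name collapses the files into one row.
grouped = conn.execute("""
SELECT branch, COUNT(file_number), SUM(file_amount), SUM(net_profit), report_name
FROM summary
GROUP BY branch, report_name
ORDER BY branch
""").fetchall()

# Adding the unique file_number to the grouping keeps one row per file,
# so every count stays at 1 and nothing is summed.
per_file = conn.execute("""
SELECT branch, COUNT(file_number), SUM(file_amount), report_name, file_number
FROM summary
GROUP BY branch, report_name, file_number
ORDER BY file_number
""").fetchall()

print(grouped)
print(len(per_file))
```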

T-SQL Get Rows With Similar Company Name Using Levenshtein

I'm using this Levenshtein function for T-SQL which works well (I'm not worried about performance). Now I want to write a query that returns all rows where the Levenshtein distance is less than x (where x might be 5 for example) using the Company name field to do the comparison.
I've tried the following, but it returns thousands of duplicate rows.
SELECT * FROM Contacts c1, Contacts c2
WHERE dbo.ufnCompareString(c1.Company, c2.Company) < 5
AND c1.id <> c2.id
I would like it to show a list like this:
1 Apple Experts
20 Apple Experts Inc.
240 H&K Paving
21 H and K Paving
98 HK Paving
189 H.K. Paving
5 J.M. Lawn Care
105 JM Lawn Care
Is it possible to do something like this? What am I doing wrong?
EDIT
I ended up with a query that looks something like this. I found that there were some "invalid" entries causing the problems I was having:
SELECT c1.ContactId, c1.Company, c1.LastName, c1.FirstName,
c2.ContactId, c2.Company, c2.LastName, c2.FirstName
FROM Contacts c1, Contacts c2
WHERE Cast(c1.ContactId AS INT) < Cast(c2.ContactId AS INT)
AND c1.Company IS NOT NULL
AND Replace(c1.Company, ' ', '') <> ''
AND c2.Company IS NOT NULL
AND Replace(c2.Company, ' ', '') <> ''
AND Len(c1.Company) > 6
AND Len(c2.Company) > 6
AND dbo.ufnCompareString(c1.Company, c2.Company) < 5
Note that the query is pretty slow running (on about 12,000 records) and I also have a different query that is more effective. The goal was to find duplicate companies that had been entered using slightly different company names and this query returned too many false positives. As to the query I actually used, it's too complicated to show here and outside the scope of this question.
To reduce the duplicates, use this instead:
SELECT * FROM Contacts c1, Contacts c2
WHERE dbo.ufnCompareString(c1.Company, c2.Company) < 5
AND c1.id < c2.id
It returns all unique pairs of contacts whose distance is less than 5. With `<>`, every pair is returned twice, once as (c1, c2) and once as (c2, c1).
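The same idea can be sketched outside the database. The following Python example uses a textbook Levenshtein implementation, which may differ in detail from `dbo.ufnCompareString`, and a handful of company names taken from the desired output in the question. The function names `levenshtein` and `similar_pairs` are illustrative only.

```python
from itertools import combinations

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]

contacts = {
    1: "Apple Experts",
    20: "Apple Experts Inc.",
    240: "H&K Paving",
    98: "HK Paving",
    189: "H.K. Paving",
    5: "J.M. Lawn Care",
    105: "JM Lawn Care",
}

def similar_pairs(contacts, max_distance=5):
    """Return each (lower_id, higher_id) pair once, like c1.id < c2.id."""
    ids = sorted(contacts)
    return [
        (i, j)
        for i, j in combinations(ids, 2)
        if levenshtein(contacts[i], contacts[j]) < max_distance
    ]

print(similar_pairs(contacts))
```

Note that "Apple Experts" and "Apple Experts Inc." are exactly 5 edits apart under this implementation, so a strict `< 5` threshold would not pair them.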
The query you have there should work properly. If you are getting duplicates, look at the content of the Contacts table.
