Google BigQuery SQL: working with an array?

I'm a newbie at SQL and Google BigQuery.
I'm trying to run the following query to get a list of assignee names and counts, but I get an array error and don't know how to fix it. Any help is appreciated.
ERROR MESSAGE:
Cannot access field harmonized on a value with type ARRAY at [5:27]
#standardSQL
-- Applications_Per_Assignee
SELECT assignee_harmonized.name AS Assignee_Name, COUNT(*) AS Number_of_Patent_Apps
FROM (
SELECT ANY_VALUE(assignee.harmonized.name) AS Assignee_Name
FROM `patents-public-data.patents.publications` AS patentsdb
GROUP BY Number_of_Patent_Apps
)
GROUP BY assignee_harmonized.name
ORDER BY Number_of_Patent_Apps DESC;

Below is for BigQuery Standard SQL
#standardSQL
SELECT
ah.name AS Assignee_Name,
COUNT(*) AS Number_of_Patent_Apps
FROM `patents-public-data.patents.publications`,
UNNEST(assignee_harmonized) ah
GROUP BY Assignee_Name
HAVING Number_of_Patent_Apps > 1000
ORDER BY Number_of_Patent_Apps DESC
-- LIMIT 10
with output
Row Assignee_Name Number_of_Patent_Apps
1 SAMSUNG ELECTRONICS CO LTD 600678
2 CANON KK 579731
3 MATSUSHITA ELECTRIC IND CO LTD 560644
4 HITACHI LTD 531286
5 SIEMENS AG 486276
6 MITSUBISHI ELECTRIC CORP 461673
7 IBM 438822
8 SONY CORP 438039
9 FUJITSU LTD 384270
10 NEC CORP 357193

There are a few things wrong with your query:
assignee is a string; I think you want to look at assignee_harmonized.name.
assignee_harmonized is an array, so you need to UNNEST() it before you can access name.
ANY_VALUE() returns an arbitrary value from each group, which does not sound like what you want.
The GROUP BY in your inner select uses an alias that is only defined in the outer query, so it will not give you the results you want.
You don't really need a subquery for this type of query.
#standardSQL
SELECT ah.name AS Assignee_Name, COUNT(*) AS Number_of_Patent_Apps
FROM `patents-public-data.patents.publications` AS patentsdb
LEFT JOIN UNNEST(assignee_harmonized) ah
GROUP BY 1
ORDER BY 2 DESC
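To make the UNNEST step concrete, here is a minimal Python sketch of the same flatten-then-count logic. The sample rows and the function name count_apps_per_assignee are made up for illustration; only the field names assignee_harmonized and name come from the queries above. It mirrors the comma-join form, so a publication with an empty array contributes no rows.

```python
from collections import Counter

# Hypothetical sample rows: each publication carries a repeated field
# (a list) of assignee records, like assignee_harmonized in the table.
publications = [
    {"publication_number": "P1",
     "assignee_harmonized": [{"name": "IBM"}, {"name": "SONY CORP"}]},
    {"publication_number": "P2", "assignee_harmonized": [{"name": "IBM"}]},
    {"publication_number": "P3", "assignee_harmonized": []},
]


def count_apps_per_assignee(rows):
    """Flatten the repeated field (the UNNEST step), then count per name."""
    counts = Counter()
    for row in rows:
        # One "output row" per array element, as the comma join with UNNEST does.
        for assignee in row["assignee_harmonized"]:
            counts[assignee["name"]] += 1
    # Sorted by count, highest first (the ORDER BY ... DESC step).
    return counts.most_common()


result = count_apps_per_assignee(publications)
print(result)
```

With LEFT JOIN UNNEST instead, P3 would still produce one row with a NULL name, which is the main difference between the two answers.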

Related

Combine multiple similar columns into single column

I have a table like
Id RefNumber LotNum
---------------------------
1 Ref-1 10
2 Ref-1 11
Lotnumber:
Lot-Id Lot-Name
-------------------
10 Apple
11 Banana
I need my output to look like this:
Ref-1 Apple,Banana
Please help me - how can I achieve this?
On SQL Server 2017 and later, we can use STRING_AGG here (assuming the first table is named Refs):
SELECT
r.RefNumber,
STRING_AGG(l.[Lot-Name], ',') WITHIN GROUP (ORDER BY l.[Lot-Id]) AS LotNames
FROM Refs r
LEFT JOIN Lotnumber l
ON r.LotNum = l.[Lot-Id]
GROUP BY
r.RefNumber;
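If you want to try the idea without a SQL Server instance, here is a rough sketch using Python's built-in sqlite3 module. SQLite's group_concat stands in for STRING_AGG, the table name Refs is the same assumption as above, and the column names LotId and LotName are simplified spellings of Lot-Id and Lot-Name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Refs (Id INTEGER, RefNumber TEXT, LotNum INTEGER);
    CREATE TABLE Lotnumber (LotId INTEGER, LotName TEXT);
    INSERT INTO Refs VALUES (1, 'Ref-1', 10), (2, 'Ref-1', 11);
    INSERT INTO Lotnumber VALUES (10, 'Apple'), (11, 'Banana');
""")

# group_concat plays the role of STRING_AGG: one row per RefNumber,
# with the matching lot names joined by a comma.
lot_rows = conn.execute("""
    SELECT r.RefNumber, group_concat(l.LotName, ',') AS LotNames
    FROM Refs AS r
    LEFT JOIN Lotnumber AS l ON r.LotNum = l.LotId
    GROUP BY r.RefNumber
""").fetchall()

for ref_number, lot_names in lot_rows:
    print(ref_number, lot_names)
```

Note that plain group_concat does not promise any particular order of the joined names, which is what WITHIN GROUP (ORDER BY ...) takes care of in the SQL Server version.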

Query Most Recent Records in MS Access Based on Date Provided in Form Field

Let me start by noting I have spent a few days searching through S.O. and have not been able to find a solution. I apologize in advance if the solution is very simple, but I am still learning and appreciate any help I can get.
I have a MS Access 2010 Database, and I am trying to create a set of queries to inform other forms and queries. There are two tables: Borrower Contact Info (BC_Info) and Basic Financial Indicators (BF_Indicators). Each month, I review and track key performance metrics of each borrower. I would like to create a query that supplies the most recent record based on a textbox input (Forms![Portfolio_Review Menu]!Text47).
Two considerations have separated this from other posts I have seen in the 'greatest-n-per-group' tag:
Not every borrower will have data for every month.
I need to be able to see back in time, i.e. if it is January 1, 2019 and I want to see the metrics as of July 31, 2017, I want to make sure I am only seeing data from before July 31, 2017 but as close to this date as possible.
Fields are as follows:
BC_Info
- BorrowerName
- PartnerID
BF_Indicators
- Fin_ID
- DateUpdated
The tables are connected by BorrowerName -- which is a unique naming convention used for the primary key of BC_Info.
What I currently have is:
SELECT BCI.BorrowerName, BCI.PartnerID, BFI.Fin_ID, BFI.DateUpdated
FROM ((BC_Info AS BCI
INNER JOIN BF_Indicators AS BFI
ON BFI.BorrowerName = BCI.BorrowerName)
INNER JOIN
(
SELECT Fin_ID, MAX(DateUpdated) AS MAX_DATE
FROM BF_Indicators
WHERE (DateUpdated <= Forms![Portfolio_Review Menu]!Text47 OR
Forms![Portfolio_Review Menu]!Text47 IS NULL)
GROUP BY Fin_ID
) AS Last_BF ON BFI.Fin_ID = Last_BF.Fin_ID AND
BFI.DateUpdated = Last_BF.MAX_DATE);
This gives me the fields I need, and will keep records out that are past the date given in the textbox, but will give all records from before the textbox input -- not just the most recent.
Results (Date Entered is 12/31/2018; MEHN-45543 is only Borrower with information later than 09/30/2018):
BorrowerName PartnerID Fin_ID DateUpdated
MEHN-45543 19 9 12/31/2018
ARYS-7940 5 10 9/30/2018
FINS-21032 12 11 9/30/2018
ELET-00934 9 12 9/30/2018
MEHN-45543 19 18 9/30/2018
Expected Results (Date Entered is 12/31/2018; MEHN-45543 is only Borrower with information later than 09/30/2018):
BorrowerName PartnerID Fin_ID DateUpdated
MEHN-45543 19 9 12/31/2018
ARYS-7940 5 10 9/30/2018
FINS-21032 12 11 9/30/2018
ELET-00934 9 12 9/30/2018
As mentioned, I am planning to use the results of this Query to generate further queries that use aggregated information from the Financial Indicators to determine portfolio quality at the time.
Please let me know if there is any other information I can provide. And again, thank you in advance.
Try joining BC_Info to a query that aggregates BF_Indicators on BorrowerName, not Fin_ID. Tested with literal date value:
SELECT BC_Info.*, MaxDate
FROM BC_Info
INNER JOIN
(SELECT BorrowerName, Max(DateUpdated) AS MaxDate
FROM BF_Indicators WHERE DateUpdated <=#12/31/2018# GROUP BY BorrowerName) AS Q1
ON BC_Info.BorrowerName=Q1.BorrowerName;
If you need to include Fin_ID in the results, then:
SELECT BC_Info.*, Fin_ID, DateUpdated FROM BC_Info
INNER JOIN
(SELECT * FROM BF_Indicators WHERE Fin_ID IN
(SELECT TOP 1 Fin_ID FROM BF_Indicators AS Dupe
WHERE Dupe.BorrowerName=BF_Indicators.BorrowerName AND DateUpdated<=#12/31/2018#
ORDER BY Dupe.DateUpdated DESC)
) AS Q1
ON BC_Info.BorrowerName = Q1.BorrowerName;
If you don't like TOP N, adjust your original query:
SELECT BCI.BorrowerName, BCI.PartnerID, BFI.Fin_ID, BFI.DateUpdated
FROM ((BC_Info AS BCI
INNER JOIN BF_Indicators AS BFI
ON BFI.BorrowerName = BCI.BorrowerName)
INNER JOIN
(
SELECT BorrowerName, MAX(DateUpdated) AS MAX_DATE
FROM BF_Indicators
WHERE (DateUpdated <= #12/31/2018#)
GROUP BY BorrowerName
) AS Last_BF ON BFI.BorrowerName = Last_BF.BorrowerName AND
BFI.DateUpdated = Last_BF.MAX_DATE);
And 1 more to think about:
SELECT BC_Info.PartnerID, BC_Info.BorrowerName, BF_Indicators.Fin_ID, BF_Indicators.DateUpdated
FROM BC_Info RIGHT JOIN BF_Indicators ON BC_Info.BorrowerName = BF_Indicators.BorrowerName
WHERE (((BF_Indicators.DateUpdated)=DMax("DateUpdated","BF_Indicators","BorrowerName='" & [BC_Info].[BorrowerName] & "' AND DateUpdated<=#12/31/2018#")));
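For anyone who wants to check the "group by BorrowerName, not Fin_ID" logic outside Access, here is a small sketch using Python's sqlite3 module. It is only an approximation: the function name latest_as_of is made up, the cutoff is passed as a parameter instead of the form textbox, dates are stored as ISO strings so they compare correctly, and only three of the sample rows from the question are loaded.

```python
import sqlite3


def latest_as_of(conn, cutoff):
    """Return each borrower's most recent indicator row dated on or before cutoff."""
    return conn.execute("""
        SELECT bci.BorrowerName, bci.PartnerID, bfi.Fin_ID, bfi.DateUpdated
        FROM BC_Info AS bci
        JOIN BF_Indicators AS bfi ON bfi.BorrowerName = bci.BorrowerName
        JOIN (
            -- One row per borrower: the latest date not after the cutoff.
            SELECT BorrowerName, MAX(DateUpdated) AS MaxDate
            FROM BF_Indicators
            WHERE DateUpdated <= ?
            GROUP BY BorrowerName
        ) AS last_bf
          ON last_bf.BorrowerName = bfi.BorrowerName
         AND last_bf.MaxDate = bfi.DateUpdated
        ORDER BY bfi.Fin_ID
    """, (cutoff,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BC_Info (BorrowerName TEXT PRIMARY KEY, PartnerID INTEGER);
    CREATE TABLE BF_Indicators (
        Fin_ID INTEGER PRIMARY KEY, BorrowerName TEXT, DateUpdated TEXT);
    INSERT INTO BC_Info VALUES ('MEHN-45543', 19), ('ARYS-7940', 5);
    INSERT INTO BF_Indicators VALUES
        (9,  'MEHN-45543', '2018-12-31'),
        (10, 'ARYS-7940',  '2018-09-30'),
        (18, 'MEHN-45543', '2018-09-30');
""")

for row in latest_as_of(conn, "2018-12-31"):
    print(row)
```

With the cutoff moved back to a date between the two MEHN-45543 records, the same call returns Fin_ID 18 for that borrower, which is the "see back in time" behaviour the question asks for.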

SQL Server 2008 Perform a draw between 2 tables

I have 2 tables on SQL Server 2008. Each one has a single column, and both have the same number of rows:
USERS OPERATION
Name Operation
----------- -----------
John W383
William R823
Karen X933
Peter M954
Alex S744
Every week I need to perform a random draw between the 2 tables to get something like the following, and save it into a 3rd table:
DRAW_RESULT:
Name Operation_Assigned Week_Number
----------------------------------------------
Peter M954 2
William W383 2
John S744 2
Alex X933 2
Karen R823 2
Name Operation_Assigned Week_Number
----------------------------------------------
William R823 3
Alex M954 3
Karen X933 3
John S744 3
Peter W383 3
How can I do this using T-SQL?
If I understood correctly what you're doing, something like this should work:
select name, operation from (
select
row_number() over (order by (select null)) as RN,
name
from
users
) U join (
select
row_number() over (order by newid()) as RN,
operation
from
operation
) O on U.RN = O.RN
Edit: row_number with newid() works, so removed the extra derived table.
Here's also SQL Fiddle to test this.
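The query above does the pairing but does not store the result with a week number. Here is a rough sketch of that last step using Python's sqlite3 module, with SQLite's random() standing in for newid(). The function name run_draw is made up, and it assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3


def run_draw(conn, week_number):
    """Pair every user with a randomly ordered operation and store the result."""
    conn.execute("""
        INSERT INTO DRAW_RESULT (Name, Operation_Assigned, Week_Number)
        SELECT u.Name, o.Operation, ?
        FROM (SELECT Name,
                     ROW_NUMBER() OVER (ORDER BY Name) AS rn
              FROM USERS) AS u
        JOIN (SELECT Operation,
                     ROW_NUMBER() OVER (ORDER BY random()) AS rn
              FROM OPERATION) AS o
          ON u.rn = o.rn
    """, (week_number,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USERS (Name TEXT);
    CREATE TABLE OPERATION (Operation TEXT);
    CREATE TABLE DRAW_RESULT (Name TEXT, Operation_Assigned TEXT, Week_Number INTEGER);
    INSERT INTO USERS VALUES ('John'), ('William'), ('Karen'), ('Peter'), ('Alex');
    INSERT INTO OPERATION VALUES ('W383'), ('R823'), ('X933'), ('M954'), ('S744');
""")

run_draw(conn, 2)
draw = conn.execute(
    "SELECT Name, Operation_Assigned FROM DRAW_RESULT WHERE Week_Number = 2"
).fetchall()
for name, operation in draw:
    print(name, operation)
```

Because both sides are numbered 1 to N and joined on that number, every user gets exactly one operation and every operation is used exactly once.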

T-Sql syntax to RANK a field based on several criteria

I have an SQL query in SQL Server 2014 that outputs the following (extract shown, real output is around 45,000 records):
ResaID Agency Sales MTH Market Property
235 Smith 500 February 2015 UK RAV
451 John 1600 February 2015 France PLN
258 Alan 800 January 2015 UK BLS
I need an SQL Query that will RANK the agency column based on the following criteria: MTH, Market and Property and give me the following output (fictitious ranking shown below):
ResaId Rank
235 10
451 2
258 9
I will then use a JOIN based on ResaID to join the "Rank output" with my initial query.
In simpler terms, the ranking of the Agency will need to be done after grouping MTH, Market and Property.
Can this be achieved using T-SQL syntax?
Edit: I want the ranking to be done based on the Sales amount.
Yes, you can write something like this:
SELECT *, RANK() OVER (PARTITION BY MTH, Market, Property ORDER BY Sales DESC) AS AgencyRank
FROM [yourTable]
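Here is a small sketch of ranking within each MTH, Market and Property group by Sales, run through Python's sqlite3 module so it can be tried locally (it needs SQLite 3.25 or later for window functions). The table name Resa and the row with ResaID 236 are invented so that one group has more than one agency; the other rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Resa (
        ResaID INTEGER, Agency TEXT, Sales INTEGER,
        MTH TEXT, Market TEXT, Property TEXT);
    INSERT INTO Resa VALUES
        (235, 'Smith', 500,  'February 2015', 'UK',     'RAV'),
        (236, 'Jones', 900,  'February 2015', 'UK',     'RAV'),  -- hypothetical row
        (451, 'John',  1600, 'February 2015', 'France', 'PLN'),
        (258, 'Alan',  800,  'January 2015',  'UK',     'BLS');
""")

# Rank restarts at 1 for every (MTH, Market, Property) combination;
# the highest Sales value in each group gets rank 1.
ranks = dict(conn.execute("""
    SELECT ResaID,
           RANK() OVER (PARTITION BY MTH, Market, Property
                        ORDER BY Sales DESC) AS AgencyRank
    FROM Resa
""").fetchall())

print(ranks)
```

If an agency can have several rows in the same group, you would probably want to SUM the sales per agency first and rank the totals.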

SQL Server query to non normalized table

I have an unnormalized table of customer orders. I want to count how many products were sold and display the counts in a table by type.
OnlineSalesKey SalesOrderNumber ProductKey
--------------------------------------------
1 20121018 778
2 20121018 774
3 20121018 665
4 20121019 772
5 20121019 778
9 20121019 434
10 20121019 956
11 20121020 772
12 20121020 965
15 20121020 665
16 20121020 778
17 20121021 665
My query:
SELECT
s.ProductKey, COUNT (*) As Purchased
FROM
Sales s
GROUP BY
s.ProductKey
Question #1.
That query does the job. But now I want to display and take into account only those orders where more than one item was purchased. I'm not sure how to do that in one query. Any ideas?
Question #2
Is it possible to normalize the results and get the data back separated by semicolons?
20121018 | 778; 774; 665
Thanks!
You don't say which SQL database you're using, and there will be different, more-or-less efficient answers for each database. (Sorry, just noticed MSSQL is in the question title.)
Here's a solution that will work in all or most databases:
SELECT s.ProductKey, COUNT (*) As Purchased
FROM Sales s
WHERE SalesOrderNumber IN
(SELECT SalesOrderNumber FROM Sales GROUP BY SalesOrderNumber HAVING COUNT(*) > 1)
GROUP BY s.ProductKey
This is not the most efficient, but should work across the most products.
Also, please note that you're using the terms normalized and unnormalized in reverse. The table you have is normalized, the results you want are de-normalized.
There is no standard SQL statement to get the de-normalized results you want using SQL alone, but some databases (MySQL and SQLite) provide the group_concat function to do just this.
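Both ideas can be tried quickly with Python's sqlite3 module, since SQLite has group_concat. This is only a sketch: it loads the sample rows from the question, uses the column name SalesOrderNumber from the question's table, and the variable names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (OnlineSalesKey INTEGER, SalesOrderNumber INTEGER, ProductKey INTEGER)"
)
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", [
    (1, 20121018, 778), (2, 20121018, 774), (3, 20121018, 665),
    (4, 20121019, 772), (5, 20121019, 778), (9, 20121019, 434),
    (10, 20121019, 956), (11, 20121020, 772), (12, 20121020, 965),
    (15, 20121020, 665), (16, 20121020, 778), (17, 20121021, 665),
])

# Orders that contain more than one item.
multi_item = """
    SELECT SalesOrderNumber FROM Sales
    GROUP BY SalesOrderNumber HAVING COUNT(*) > 1
"""

# Question 1: product counts, restricted to multi-item orders.
purchased = dict(conn.execute(f"""
    SELECT ProductKey, COUNT(*) FROM Sales
    WHERE SalesOrderNumber IN ({multi_item})
    GROUP BY ProductKey
""").fetchall())

# Question 2: one row per order with its products joined by '; '.
products_per_order = dict(conn.execute(f"""
    SELECT SalesOrderNumber, group_concat(ProductKey, '; ')
    FROM Sales
    WHERE SalesOrderNumber IN ({multi_item})
    GROUP BY SalesOrderNumber
""").fetchall())

print(purchased)
print(products_per_order)
```

Order 20121021 has a single item, so its product 665 is not counted and the order does not appear in the second result.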
Q1: Look at HAVING clause
display and take into account only those orders...
SELECT s.SalesOrderNumber, COUNT (*) As Purchased
FROM Sales s
GROUP BY s.SalesOrderNumber
HAVING COUNT(*) > 1
So we group by the orders, apply the condition in HAVING, then display the SalesOrderNumber in the SELECT clause.
Q2: Look at several group concatenation techniques.
MySQL:
SELECT s.SalesOrderNumber, GROUP_CONCAT(DISTINCT ProductKey
ORDER BY ProductKey SEPARATOR '; ')
FROM Sales s
GROUP BY s.SalesOrderNumber
SQL Server: See this answer to a duplicate question. Basically, using FOR XML.
SELECT s.ProductKey, COUNT (*) As Purchased
FROM
Sales s
GROUP BY s.ProductKey
having count(*) > 1
EDIT -
Answer 1 - To display products for which orders had more than one product purchased -
SELECT s.ProductKey, COUNT (*) As Purchased
FROM Sales s
WHERE SalesOrderNumber in
(
select SalesOrderNumber from Sales
group by SalesOrderNumber having count(*) > 1
)
GROUP BY s.ProductKey
1) Try this one:
SELECT s.ProductKey, COUNT (*) As Purchased
FROM
Sales s
GROUP BY s.ProductKey
HAVING COUNT(*) > 1
