BigQuery union over nested fields

I'm trying to create a union of 3 tables. All 3 tables are subselects on the same table: each subselect contains only one field, with the same alias to the field on all subselects, so the resulting schema will be compatible and the union will succeed (following the example from Support UNION function in BigQuery SQL).
The resulting query yields an error:
Union results in ambiguous schema. [foo] is ambiguous and is aliasing multiple fields. Aliased fields: ...
The error may be related to the fact that the field I select is nested within multiple records and repeated fields.
Fictitious Query example:
select * from
(SELECT record.list1.list2.listA.foo as foo from sample),
(SELECT record.list1.list2.listB.foo as foo from sample),
(SELECT record.list1.list2.listC.foo as foo from sample)
See job job_eZm0F1cGA2leE37D8-N5NHNTTYU for a concrete example (this is a table that contains data I cannot share).

You can reproduce this with a public dataset:
SELECT x FROM
(SELECT phoneNumber.areaCode x
FROM [bigquery-samples:nested.persons_living] LIMIT 1),
(SELECT citiesLived.numberOfYears x
FROM [bigquery-samples:nested.persons_living] LIMIT 1);
Error: Union results in ambiguous schema.
[x] is ambiguous and is aliasing multiple fields.
Aliased fields: x,CitiesLived.x,
As noted, this only happens when the aliased fields come from different nested records and repeated fields. A quick fix is to FLATTEN() the repeated field before querying:
SELECT x FROM
(SELECT phoneNumber.areaCode x
FROM [bigquery-samples:nested.persons_living] LIMIT 1),
(SELECT citiesLived.numberOfYears x
FROM (FLATTEN([bigquery-samples:nested.persons_living], citiesLived)) LIMIT 1);
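For intuition about what flattening a repeated field means, here is a minimal Python sketch. It is not BigQuery code and not how BigQuery implements FLATTEN: the `person` record, its values and the `flatten` helper are all made up for illustration, and edge cases such as empty lists are not modelled. It only shows the general idea that each element of the repeated field becomes its own flat row.

```python
# Hypothetical sample record with one repeated field (citiesLived).
# All names and values here are invented for illustration.
person = {
    "fullName": "Example Person",
    "citiesLived": [
        {"place": "A", "numberOfYears": 5},
        {"place": "B", "numberOfYears": 2},
    ],
}


def flatten(record, repeated_field):
    """Return one flat row per element of record[repeated_field].

    Non-repeated top-level fields are copied onto every row, and each
    nested key is exposed under a dotted name such as 'citiesLived.place'.
    """
    parent = {k: v for k, v in record.items() if k != repeated_field}
    rows = []
    for element in record.get(repeated_field, []):
        row = dict(parent)  # copy the parent fields onto this row
        for key, value in element.items():
            row[f"{repeated_field}.{key}"] = value
        rows.append(row)
    return rows


flat_rows = flatten(person, "citiesLived")
for row in flat_rows:
    print(row)
```

After flattening, every row carries a single numberOfYears value at the top level of the row, which is roughly why the second subselect no longer produces a differently nested x.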

Related

Summarizing count of multiple tables in one row or column

I've designed a migration script and as the last sequence, I'm running the following two lines.
select count(*) from Origin
select count(*) from Destination
However, I'd like to present those numbers as cells in the same table. I haven't decided yet if it's most suitable to put them as separate rows in one column or adjacent columns on one row but I do want them in the same table.
How can I select stuff from those selects into vertical/horizontal line-up?
I've tried a select on them both, with and without parentheses, but it didn't work out (probably because of the missing from)...
This question is related to another one but differs in two respects. First, it's much more to the point and states the issue more clearly. Second, it asks about both horizontal and vertical line-up of the selected values, whereas the linked question only covers the former.
select
select count(*) from Origin,
select count(*) from Destination
select(
select count(*) from Origin,
select count(*) from Destination)
You need to nest the two select statements under a main (top) SELECT in order to get one row with the counts of both tables:
SELECT
(select count(*) from Origin) AS OriginCount,
(select count(*) from Destination) AS DestinationCount
SQLFiddle for the above query
I hope this is what you are looking for; the "same table" you mention is slightly confusing (I'm assuming you mean the same result set).
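If you want to try the one-row version without a server at hand, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database you are actually on, and the table contents are made up; the point is just that two scalar subqueries under a top-level SELECT give one row with two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Origin (id INTEGER);
    CREATE TABLE Destination (id INTEGER);
    INSERT INTO Origin VALUES (1), (2), (3);   -- made-up sample rows
    INSERT INTO Destination VALUES (1), (2);
""")

# Each parenthesised SELECT is a scalar subquery: it yields exactly one
# value, so it can be used as a column of the outer SELECT.
row = conn.execute("""
    SELECT
        (SELECT COUNT(*) FROM Origin)      AS OriginCount,
        (SELECT COUNT(*) FROM Destination) AS DestinationCount
""").fetchone()

print(row)  # (3, 2)
```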
Alternatively you can use UNION ALL to return two cells with the count of both tables.
SELECT COUNT(*), 'Origin' 'Table' FROM Origin
UNION ALL
SELECT COUNT(*), 'Destination' 'Table' FROM Destination
SQLFiddle with UNION ALL
SQLFiddle with UNION
I recommend adding the second text column so that you know the corresponding table for each number.
Unlike plain UNION, UNION ALL returns two rows every time. UNION removes duplicate rows, so if you select only the counts (without the label column) it returns a single row whenever both tables contain the same number of rows.
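The UNION versus UNION ALL difference is easy to check for yourself. Below is a hedged sketch using Python's sqlite3 module as a stand-in database (the sample rows and the `run` helper are made up for the demo); both tables deliberately hold the same number of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Origin (id INTEGER);
    CREATE TABLE Destination (id INTEGER);
    INSERT INTO Origin VALUES (1), (2);          -- two rows
    INSERT INTO Destination VALUES (10), (20);   -- also two rows
""")


def run(sql):
    """Hypothetical helper: execute a query and return all rows."""
    return conn.execute(sql).fetchall()


# UNION ALL keeps both rows even though the two counts are equal.
union_all = run(
    "SELECT COUNT(*) FROM Origin UNION ALL SELECT COUNT(*) FROM Destination"
)

# UNION removes duplicates, so the two identical counts collapse to one row.
union = run(
    "SELECT COUNT(*) FROM Origin UNION SELECT COUNT(*) FROM Destination"
)

# With a label column the rows differ, so even UNION keeps both of them.
labelled = run("""
    SELECT 'Origin' AS TableName, COUNT(*) AS Cnt FROM Origin
    UNION
    SELECT 'Destination', COUNT(*) FROM Destination
""")

print(len(union_all), len(union), len(labelled))  # 2 1 2
```

So the label column is not only cosmetic: it also protects you from losing a row when the counts happen to match.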
...or if you want vertical...
select 'OriginalCount' as Type, count(*)
from origin
union
select 'DestinationCount' as Type, count(*)
from destination

Conversion failed when converting date and/or time from character string. UNION CTE

I've been given a view to troubleshoot. It begins with a CTE that gets all values of a surrogate key used in several tables with UNIONs between each query. It then uses this CTE as an INNER JOIN to select data about those surrogate keys. The query is like the following code:
WITH emp AS (
SELECT DISTINCT EmployeeSk FROM view1
UNION
SELECT DISTINCT EmployeeSK FROM Table1
UNION
--then there are two views that if you include BOTH, the overall query fails
SELECT DISTINCT EmployeeSK FROM view2
UNION
SELECT DISTINCT EmployeeSK FROM View3 )
SELECT Col1, col2, col3
FROM Table2 t2
INNER JOIN emp
ON t2.EmployeeSK = emp.EmployeeSK
Whenever you use the CTE, users get the error:
Msg 241, Level 16, State 1, Line 22
Conversion failed when converting date and/or time from character string.
If you select each query separately all are returning INTs. In fact, I double checked each query by running the following query against each source:
SELECT DISTINCT SQL_VARIANT_PROPERTY(<tableSK>, 'BaseType') FROM <source Table or view>
All queries report that all columns returned are INT.
I have also created a #tempTable:
CREATE TABLE #tmp ( SurrogateKey INT)
Then inserted each query into the #tempTable and all inserts work. All values are INTs. The values range from -1 to 16435.
I can get the UNION to work as long as I include only one of these two queries. If I remove either one of them, the CTE and UNION statements work, but I cannot use both together. The values returned by these two queries are also INTs and also range from -1 to 16435.
I'm at a loss as to why I get this error. Other than re-writing the view to be a stored procedure, and putting the values into a temp table, what could I try?

Perform query and count rows on multiple identical tables

I have multiple tables, one created per date, to store information for that date.
For example History3108, History0109, etc. All of these tables share the same schema. Sometimes I need to query multiple tables and get the rows and the count of records. What is the fastest way of doing this in Oracle and SQL Server?
Currently I am doing it like this:
When I need the count across multiple tables: select count(*) for each table and add the results.
When I need the records of multiple tables: select * from table1, select * from table2 (basically a select * for each table).
Would it give better performance if we included all of the queries in one transaction?
With UNION you can combine records from multiple tables as long as the queries return the same number of columns and corresponding columns belong to the same data type group. For example, if you want to see all records from multiple tables:
(select * from history3108)
union all
(select * from history0109)
union all
(select * from history0209)
/* [...] and so on */
and if you want to count all records from these tables:
select count(*) from (
(select * from history3108)
union all
(select * from history0109)
union all
(select * from history0209)
/* [...] and so on */
) t; /* the alias is required by SQL Server and accepted by Oracle */
Oracle Docs - The UNION [ALL], INTERSECT, MINUS Operators
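To experiment with the UNION ALL approach locally, here is a rough sketch using Python's sqlite3 module. SQLite is only a stand-in for Oracle or SQL Server, the row contents are made up, and the individual selects are written without surrounding parentheses because this sketch targets SQLite. It builds the combined query from a list of table names and also shows a per-table breakdown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
tables = ["history3108", "history0109", "history0209"]

# Create the identically shaped tables with 1, 2 and 3 made-up rows.
for i, name in enumerate(tables, start=1):
    conn.execute(f"CREATE TABLE {name} (id INTEGER, note TEXT)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?)",
        [(n, name) for n in range(i)],
    )

# All records from all tables, then a single count over the combined set.
union_sql = " UNION ALL ".join(f"SELECT * FROM {name}" for name in tables)
total = conn.execute(f"SELECT COUNT(*) FROM ({union_sql}) AS t").fetchone()[0]

# Alternative: one count per table, labelled with the table it came from.
per_table_sql = " UNION ALL ".join(
    f"SELECT '{name}' AS source, COUNT(*) AS cnt FROM {name}" for name in tables
)
per_table = conn.execute(per_table_sql).fetchall()

print(total)      # 6
print(per_table)
```

The table names here come from a fixed list in the script. If they ever came from user input, they would need validating before being pasted into SQL text, since identifiers cannot be passed as bound parameters.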

Wrong case in subquery column name causes incorrect results, but no error

Using SQL Server Management Studio, I am getting some undesired results (looks like a bug to me..?)
If I use (FIELD rather than field for the other_table):
SELECT * FROM main_table WHERE field IN (SELECT FIELD FROM other_table)
I get all results from main_table.
Using the correct case:
SELECT * FROM main_table WHERE field IN (SELECT field FROM other_table)
I get the expected results where field appears in other_table.
Running the subquery on its own:
SELECT FIELD FROM other_table
I get an invalid column name error.
Surely I should get this error in the first case?
Is this related to collation?
The DB is binary collation.
The server is case insensitive however.
It seems to me like the server component is saying "this code is OK" and not allowing the DB to say the field is the wrong name..?
What are my options for a solution?
Let's illustrate what is happening using something that doesn't depend on case sensitivity:
USE tempdb;
GO
CREATE TABLE dbo.main_table(column1 INT);
CREATE TABLE dbo.other_table(column2 INT);
INSERT dbo.main_table SELECT 1 UNION ALL SELECT 2;
INSERT dbo.other_table SELECT 1 UNION ALL SELECT 3;
SELECT column1 FROM dbo.main_table
WHERE column1 IN (SELECT column1 FROM dbo.other_table);
Results:
column1
-------
1
2
Why doesn't that raise an error? SQL Server is looking at your queries and seeing that the column1 inside can't possibly be in other_table, so it is extrapolating and "using" the column1 that exists in the outer referenced table (just like you could reference a column that only exists in the outer table without a table reference). Think about this variation:
SELECT [column1] FROM dbo.main_table
WHERE EXISTS (SELECT [column1] FROM dbo.other_table WHERE [column2] = [column1]);
Results:
column1
-------
1
Again SQL Server knows that column1 in the where clause also doesn't exist in the locally referenced table, but it tries to find it in the outer scope. So in an imaginary world you might consider the query to actually be saying:
SELECT m.[column1] FROM dbo.main_table AS m
WHERE EXISTS (SELECT m.[column1] FROM dbo.other_table AS o WHERE o.[column2] = m.[column1]);
(Which is not how I typed it, but if I do type it that way, it still works.)
It doesn't make logical sense in some of the cases but this is the way the query engine does it and the rule has to be applied consistently. In your case (no pun intended), you have an extra complication: case sensitivity. SQL Server didn't find FIELD in your subquery, but it did find it in the outer query. So a couple of lessons:
Always prefix your column references with the table name or alias (and always prefix your table references with the schema).
Always create and reference your tables, columns and other entities using consistent case. Especially when using a binary or case-sensitive collation.
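The outer-scope lookup can be reproduced outside SQL Server too. The sketch below uses Python's sqlite3 module, which applies the same kind of name resolution for this particular query; treat it as an illustration under that assumption rather than a statement about SQL Server internals. It also shows how qualifying the column turns the silent wrong answer into an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_table (column1 INTEGER);
    CREATE TABLE other_table (column2 INTEGER);
    INSERT INTO main_table VALUES (1), (2);
    INSERT INTO other_table VALUES (1), (3);
""")

# Unqualified: column1 does not exist in other_table, so it is resolved
# against the outer query and every row of main_table matches itself.
unqualified = conn.execute("""
    SELECT column1 FROM main_table
    WHERE column1 IN (SELECT column1 FROM other_table)
    ORDER BY column1
""").fetchall()

# Qualified with the inner alias: the mistake is now reported as an error.
try:
    conn.execute("""
        SELECT m.column1 FROM main_table AS m
        WHERE m.column1 IN (SELECT o.column1 FROM other_table AS o)
    """)
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

# Qualified and pointing at the column that really exists.
fixed = conn.execute("""
    SELECT m.column1 FROM main_table AS m
    WHERE m.column1 IN (SELECT o.column2 FROM other_table AS o)
""").fetchall()

print(unqualified)       # [(1,), (2,)]
print(qualified_error)
print(fixed)             # [(1,)]
```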
Very interesting find. The unspoken mandate is that you should always alias the tables in your subqueries and use those aliases to be explicit about which table each column comes from. Subqueries allow you to reference a field from the outer query, which is the cause of your issue. In your scenario I would agree that the name should either resolve against the inner query's field list by default or raise a column ambiguity error. Regardless, the method below is always preferable:
select * from main_table a where a.field in
(select x.field from other_table x)

Possible to test for null records in SQL only?

I am trying to help a co-worker with a peculiar problem, and she's limited to MS SQL QUERY code only. The object is to insert a dummy record (into a surrounding union) IF no records are returned from a query.
I am having a hard time going back and forth from PL/SQL to MS SQL, and I am appealing for help (I'm not particularly appealing, but I am appealing to the StackOverflow audience).
Basically, we need a single, testable value from the target Select ... statement.
In theory, it would do this:
(other records from unions)
Union
Select 'These' as fld1, 'are' as fld2, 'Dummy' as fld3, 'Fields' as fld4
where NOT (Matching Logic)
Union
Select fld1, fld2, fld3, fld4 -- Regular records exist
From tested_table
Where (Matching Logic)
Forcing an individual dummy record, with no conditions, works.
Is there a way to get a single, testable result from a Select?
Can't do it in code (not allowed), but can feed SQL
Anybody? Anybody? Bueller?
You could put the unions in a WITH clause (a common table expression), then add another union branch that returns a null row only when the big union is empty:
; with BigUnion as
(
select *
from table1
union all
select *
from table2
)
select *
from BigUnion
union all
select null -- supply one value (null or a dummy) per column of BigUnion
where not exists (select * from BigUnion)
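Here is a hedged, runnable sketch of the same pattern using Python's sqlite3 module as a stand-in for SQL Server. The two-column tables, the 'Dummy'/'Row' values and the `run_with` helper are all made up for the demo. It runs the query once against empty tables and once against populated ones, so you can see the dummy row appear only in the empty case.

```python
import sqlite3

QUERY = """
WITH BigUnion AS (
    SELECT fld1, fld2 FROM table1
    UNION ALL
    SELECT fld1, fld2 FROM table2
)
SELECT fld1, fld2 FROM BigUnion
UNION ALL
-- The dummy branch supplies one value per column of BigUnion and only
-- produces a row when BigUnion itself is empty.
SELECT 'Dummy', 'Row'
WHERE NOT EXISTS (SELECT 1 FROM BigUnion)
"""


def run_with(rows1, rows2):
    """Hypothetical helper: load the sample rows and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (fld1 TEXT, fld2 TEXT)")
    conn.execute("CREATE TABLE table2 (fld1 TEXT, fld2 TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows1)
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", rows2)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


empty_case = run_with([], [])
filled_case = run_with([("a", "b")], [("c", "d")])

print(empty_case)   # [('Dummy', 'Row')]
print(filled_case)
```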
