I'm struggling with joining tables when using REPEATED RECORD fields in the ON clause. The error I get is:
No matching signature for operator = for argument types: ARRAY<STRUCT<experiment INT64>>, INT64. Supported signature: ANY = ANY at [6:5]
My REPEATED RECORD is called ab_test and it has 4 fields inside (experiment, group, name, state).
My Query:
SELECT be.type, be.group, be.user.id, be.uid,
ARRAY(SELECT STRUCT(ab_test.experiment as experiment , ab_test.group as group, ab_test.name as name, ab_test.state, uid_allocation_timestamp) FROM UNNEST(ab_test) AS ab_test) as ab_test
FROM fiverr-bigquery.dwh.bi_events be
JOIN staging_tables.ab_tests_uid_allocation_history uid_alloc
ON be.uid = uid_alloc.uid
AND ARRAY(SELECT STRUCT(ab_test.experiment) FROM UNNEST(ab_test) AS ab_test ) = uid_alloc.test_id
WHERE be._PARTITIONTIME = '2017-04-24 00:00:00'
AND DATE(created_at) = DATE('2017-04-24')
AND ARRAY(SELECT STRUCT(ab_test.experiment) FROM UNNEST(ab_test) AS ab_test ) IS NOT NULL
AND type = 'order.success'
I also tried replacing the second ON condition with:
CAST((SELECT experiment FROM UNNEST(ab_test) as experiment ) AS INT64) = uid_alloc.test_id
But with no luck (the error I get: Invalid cast from STRUCT<experiment INT64, group INT64, name STRING, ...> to INT64 at [40:10])
Any ideas?
I also tried replacing ... But with no luck ... Any ideas ?
Below is an attempt to mimic your use case, at least the part of it that is responsible for the error you see.
If you run the query below (BigQuery Standard SQL), you will get exactly the same error as in your case:
#standardSQL
WITH data AS (
SELECT 1 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
(911, 2, 'a'), (2, 2, 'b'), (3, 2, 'c')] AS ab_test UNION ALL
SELECT 2 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
(11, 3, 'a'), (12, 3, 'b'), (13, 3, 'c')] AS ab_test UNION ALL
SELECT 3 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
(21, 4, 'a'), (911, 4, 'b'), (23, 4, 'c')] AS ab_test
)
SELECT id
FROM data
WHERE CAST((SELECT experiment FROM UNNEST(ab_test) AS experiment ) AS INT64) = 911
The error will be
Error: Invalid cast from STRUCT<experiment INT64, grp INT64, name STRING> to INT64 at [12:12]
To resolve this, use the approach below:
#standardSQL
WITH data AS (
SELECT 1 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
(911, 2, 'a'), (2, 2, 'b'), (3, 2, 'c')] AS ab_test UNION ALL
SELECT 2 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
(11, 3, 'a'), (12, 3, 'b'), (13, 3, 'c')] AS ab_test UNION ALL
SELECT 3 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
(21, 4, 'a'), (911, 4, 'b'), (23, 4, 'c')] AS ab_test
)
SELECT id
FROM data
WHERE (SELECT COUNT(1)
FROM UNNEST(ab_test) AS ab_test
WHERE ab_test.experiment = 911
) > 0
No errors now and output will be
id
1
3
because those rows have elements of ab_test with experiment = 911
Finally, below is an example with the test values coming from a joined table, as in your question:
#standardSQL
WITH data AS (
SELECT 1 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
(911, 2, 'a'), (2, 2, 'b'), (3, 2, 'c')] AS ab_test UNION ALL
SELECT 2 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
(11, 3, 'a'), (12, 3, 'b'), (13, 3, 'c')] AS ab_test UNION ALL
SELECT 3 AS id, [ STRUCT<experiment INT64, grp INT64, name STRING>
(21, 4, 'a'), (911, 4, 'b'), (23, 4, 'c')] AS ab_test
),
tests AS (
SELECT 911 AS test_id UNION ALL
SELECT 912 AS test_id
)
SELECT data.id
FROM data
CROSS JOIN tests
WHERE (SELECT COUNT(1)
FROM UNNEST(ab_test) AS ab_test
WHERE ab_test.experiment = tests.test_id
) > 0
Hope you can apply the above to your specific case.
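To make the semantics of that correlated subquery concrete, here is a small plain-Python model of the same sample data (this is not BigQuery code; the dict layout and variable names are assumptions made for illustration). A (data, tests) pair survives when at least one element of the row's ab_test array carries a matching experiment:

```python
# Plain-Python model of the CROSS JOIN + correlated COUNT(1) > 0 filter above.
# Each tuple stands in for one STRUCT<experiment, grp, name> array element.
data = [
    {"id": 1, "ab_test": [(911, 2, "a"), (2, 2, "b"), (3, 2, "c")]},
    {"id": 2, "ab_test": [(11, 3, "a"), (12, 3, "b"), (13, 3, "c")]},
    {"id": 3, "ab_test": [(21, 4, "a"), (911, 4, "b"), (23, 4, "c")]},
]
tests = [911, 912]

matches = [
    (row["id"], test_id)
    for row in data
    for test_id in tests
    # any(...) plays the role of (SELECT COUNT(1) ... WHERE experiment = test_id) > 0
    if any(experiment == test_id for experiment, _grp, _name in row["ab_test"])
]
print(matches)  # [(1, 911), (3, 911)]
```

The point is that the comparison happens per array element, never between the whole array (or a whole STRUCT) and a scalar, which is what both error messages complain about.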
When joining two tables, if the column is not qualified with the table name, it will return a STRUCT data type.
To solve this, can you try:
select table.column
Problem & expected result
Suppose I have the following table t:
id f_id col1
---------- ---------- ----------
1 1 B
2 1 C
3 2 A
4 2 C
5 2 D
6 2 E
7 3 A
8 3 D
9 3 E
10 4 A
11 4 B
12 5 C
13 5 D
I would like to select all distinct f_id such that col1 contains one of the following combinations of values:
A and C
A and D
A and C and D
The expected result would therefore be:
f_id
----------
2
3
Own attempt
Based on a previous question I tried the following query
SELECT f_id
FROM t
WHERE (col1 IN ('A', 'C')) or (col1 in ('A', 'D'))
GROUP BY f_id
HAVING COUNT(distinct col1) >= 2;
This query however also matches with groups which contain C and D, but not A. I do not want this because A is important. The above query results in the following:
f_id
----------
2
3
5
How do I obtain the desired result?
Original script
For convenience, here is the code to generate the original table:
drop table if exists t;
CREATE TABLE t (id INTEGER, f_id INTEGER, col1 VARCHAR(1));
INSERT INTO t (id, f_id, col1) VALUES
(1, 1, 'B'),
(2, 1, 'C'),
(3, 2, 'A'),
(4, 2, 'C'),
(5, 2, 'D'),
(6, 2, 'E'),
(7, 3, 'A'),
(8, 3, 'D'),
(9, 3, 'E'),
(10, 4, 'A'),
(11, 4, 'B'),
(12, 5, 'C'),
(13, 5, 'D')
;
First filter the rows of the table so that only rows containing 'A' or 'C' or 'D' in col1 are returned and group by f_id.
Finally set the conditions in the HAVING clause, so that you get only f_ids that contain at least 1 'A' and any of the other 2:
SELECT f_id
FROM t
WHERE col1 IN ('A', 'C', 'D')
GROUP BY f_id
HAVING SUM(col1 = 'A') > 0
AND COUNT(DISTINCT col1) > 1
If there are no duplicates in col1 for each f_id, you may replace COUNT(DISTINCT col1) > 1 with COUNT(*) > 1.
Or, with EXISTS:
SELECT t1.f_id
FROM t t1
WHERE t1.col1 = 'A'
AND EXISTS (
SELECT 1
FROM t t2
WHERE t2.f_id = t1.f_id AND t2.col1 IN ('C', 'D')
)
See the demo.
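As a quick check, both queries can be run against the sample table with Python's built-in sqlite3 module. This is only a sketch: the in-memory database, the variable names, and the added ORDER BY / DISTINCT (used to make the two result lists comparable) are my own assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, f_id INTEGER, col1 VARCHAR(1))")
conn.executemany(
    "INSERT INTO t (id, f_id, col1) VALUES (?, ?, ?)",
    [
        (1, 1, "B"), (2, 1, "C"),
        (3, 2, "A"), (4, 2, "C"), (5, 2, "D"), (6, 2, "E"),
        (7, 3, "A"), (8, 3, "D"), (9, 3, "E"),
        (10, 4, "A"), (11, 4, "B"),
        (12, 5, "C"), (13, 5, "D"),
    ],
)

# Conditional aggregation: keep groups with at least one 'A' plus another of C/D.
having_query = """
SELECT f_id
FROM t
WHERE col1 IN ('A', 'C', 'D')
GROUP BY f_id
HAVING SUM(col1 = 'A') > 0
   AND COUNT(DISTINCT col1) > 1
ORDER BY f_id
"""

# EXISTS variant: start from the 'A' rows and look for a C or D in the same group.
exists_query = """
SELECT DISTINCT t1.f_id
FROM t t1
WHERE t1.col1 = 'A'
  AND EXISTS (
      SELECT 1 FROM t t2
      WHERE t2.f_id = t1.f_id AND t2.col1 IN ('C', 'D')
  )
ORDER BY t1.f_id
"""

having_ids = [row[0] for row in conn.execute(having_query)]
exists_ids = [row[0] for row in conn.execute(exists_query)]
print(having_ids, exists_ids)  # [2, 3] [2, 3]
```

Note that SUM(col1 = 'A') relies on the comparison evaluating to 0 or 1, which holds in SQLite and MySQL; other engines may need SUM(CASE WHEN col1 = 'A' THEN 1 ELSE 0 END) instead.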
Assuming that the order of the combinations is not important:
select f_id, group_concat(col1, '') agg
from t
GROUP BY f_id
HAVING (agg LIKE '%A%' AND agg LIKE '%C%')
OR (agg LIKE '%A%' AND agg LIKE '%D%');
There might be a better way to compare (with regex for example).
I am trying to write a statement that fetches the previous and next rows of a selected row.
Declare @OderDetail table
(
Id int primary key,
OrderId int,
ItemId int,
OrderDate DateTime2,
Lookup varchar(15)
)
INSERT INTO @OderDetail
VALUES
(1, 10, 1, '2018-06-11', 'A'),
(2, 10, 2, '2018-06-11', 'BE'), --this
(3, 2, 1, '2018-06-04', 'DR'),
(4, 2, 2, '2018-06-04', 'D'), --this
(5, 3, 2, '2018-06-14', 'DD'), --this
(6, 4, 2, '2018-06-14', 'R');
DECLARE
@ItemId int = 2,
@orderid int = 10
Required output:
The input for the procedure is order id = 10 and item id = 2. I need to check whether item 2 appears in any other order, i.e. return only the previous and next rows for that item relative to the matched record/order, by order date.
Is this what you're after? (Updated to reflect the [OrderDate] edit to the question)
Declare @OderDetail table
(
Id int primary key,
OrderId int,
ItemId int,
OrderDate DateTime2,
Lookup varchar(15)
)
INSERT INTO @OderDetail
VALUES
(1, 10, 1, '2018-06-11', 'A'),
(2, 10, 2, '2018-06-11', 'BE'), --this
(3, 2, 1, '2018-06-04', 'DR'),
(4, 2, 2, '2018-06-04', 'D'), --this
(5, 3, 2, '2018-06-14', 'DD'), --this
(6, 4, 2, '2018-06-14', 'R');
declare @ItemId int=2 , @orderid int = 10;
Query
With cte As
(
Select ROW_NUMBER() OVER(ORDER BY OrderDate) AS RecN,
*
From @OderDetail Where ItemId=@ItemId
)
Select Id, OrderId, ItemId, [Lookup] From cte Where
RecN Between ((Select Top 1 RecN From cte Where OrderId = @orderid) -1) And
((Select Top 1 RecN From cte Where OrderId = @orderid) +1)
Order by id
Result:
Id OrderId ItemId Lookup
2 10 2 BE
4 2 2 D
5 3 2 DD
Another possible approach is to use the LAG() and LEAD() functions, which return data from a previous and a subsequent row in the same result set.
-- Table
DECLARE @OrderDetail TABLE (
Id int primary key,
OrderId int,
ItemId int,
OrderDate DateTime2,
Lookup varchar(15)
)
INSERT INTO @OrderDetail
VALUES
(1, 10, 1, '2018-06-11', 'A'),
(2, 10, 2, '2018-06-11', 'BE'), --this
(3, 2, 1, '2018-06-04', 'DR'),
(4, 2, 2, '2018-06-04', 'D'), --this
(5, 3, 2, '2018-06-14', 'DD'), --this
(6, 4, 2, '2018-06-14', 'R');
-- Item and order
DECLARE
@ItemId int = 2,
@orderid int = 10
-- Statement
-- Get previous and next ID for every order, grouped by ItemId, ordered by OrderDate
;WITH cte AS (
SELECT
Id,
LAG(Id, 1) OVER (PARTITION BY ItemId ORDER BY OrderDate) previousId,
LEAD(Id, 1) OVER (PARTITION BY ItemId ORDER BY OrderDate) nextId,
ItemId,
OrderId,
Lookup
FROM @OrderDetail
)
-- Select current, previous and next order
SELECT od.*
FROM cte
CROSS APPLY (SELECT * FROM @OrderDetail WHERE Id = cte.Id) od
WHERE (cte.OrderId = @orderId) AND (cte.ItemId = @ItemId)
UNION ALL
SELECT od.*
FROM cte
CROSS APPLY (SELECT * FROM @OrderDetail WHERE Id = cte.previousId) od
WHERE (cte.OrderId = @orderId) AND (cte.ItemId = @ItemId)
UNION ALL
SELECT od.*
FROM cte
CROSS APPLY (SELECT * FROM @OrderDetail WHERE Id = cte.nextId) od
WHERE (cte.OrderId = @orderId) AND (cte.ItemId = @ItemId)
Output:
Id OrderId ItemId OrderDate Lookup
2 10 2 11/06/2018 00:00:00 BE
4 2 2 04/06/2018 00:00:00 D
5 3 2 14/06/2018 00:00:00 DD
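For readers without a SQL Server instance, the LAG()/LEAD() idea can be tried in SQLite (3.25 or later, which has window functions) through Python's sqlite3 module. This is a sketch under assumptions rather than a port of the answer: the permanent table name OrderDetail, the text dates, the join on IN (...) instead of CROSS APPLY, and the extra Id tie-breaker in the window ordering (two rows share 2018-06-14) are all my own choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderDetail (Id INTEGER PRIMARY KEY, OrderId INT, ItemId INT,"
    " OrderDate TEXT, Lookup TEXT)"
)
conn.executemany(
    "INSERT INTO OrderDetail VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 1, "2018-06-11", "A"),
        (2, 10, 2, "2018-06-11", "BE"),
        (3, 2, 1, "2018-06-04", "DR"),
        (4, 2, 2, "2018-06-04", "D"),
        (5, 3, 2, "2018-06-14", "DD"),
        (6, 4, 2, "2018-06-14", "R"),
    ],
)

query = """
WITH neighbours AS (
    -- For every row of the item, remember the Id just before and just after it.
    SELECT Id, OrderId,
           LAG(Id)  OVER (PARTITION BY ItemId ORDER BY OrderDate, Id) AS previousId,
           LEAD(Id) OVER (PARTITION BY ItemId ORDER BY OrderDate, Id) AS nextId
    FROM OrderDetail
    WHERE ItemId = :item_id
)
SELECT od.Id, od.OrderId, od.ItemId, od.Lookup
FROM neighbours AS n
JOIN OrderDetail AS od ON od.Id IN (n.Id, n.previousId, n.nextId)
WHERE n.OrderId = :order_id
ORDER BY od.Id
"""
rows = conn.execute(query, {"item_id": 2, "order_id": 10}).fetchall()
print(rows)  # [(2, 10, 2, 'BE'), (4, 2, 2, 'D'), (5, 3, 2, 'DD')]
```

When the selected row is the first or last one for its item, previousId or nextId is NULL and simply matches nothing, so fewer rows come back.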
Update for the given data set: I see where you are going with this. Note that in some cases there is no row before the given one, so the query returns only 2 rows, not 3. Here I updated the CTE version. Un-comment the commented-out row to see 3 rows instead of 2, since there is then a row before the selected one with that ItemId.
I added a variable to demonstrate why this is better: it lets you get 1 row before and after, or 2 before/after if you change that number (i.e. pass a parameter). If fewer rows, or none, exist before or after, it returns as many as it can within that constraint.
Data setup for all versions:
Declare @OderDetail table
(
Id int primary key,
OrderId int,
ItemId int,
OrderDate DateTime2,
Lookup varchar(15)
)
INSERT INTO @OderDetail
VALUES
(1, 10, 1, '2018-06-11', 'A'),
(2, 10, 2, '2018-06-11', 'BE'), --this
(3, 2, 1, '2018-06-04', 'DR'),
(4, 2, 2, '2018-06-04', 'D'), --this
(5, 3, 2, '2018-06-14', 'DD'), --this
(9, 4, 2, '2018-06-14', 'DD'),
(6, 4, 2, '2018-06-14', 'R'),
--(10, 10, 2, '2018-06-02', 'BE'), -- un-comment to see one before
(23, 4, 2, '2018-06-14', 'R');
DECLARE
@ItemId int = 2,
@orderid int = 2;
CTE updated version:
DECLARE @rowsBeforeAndAfter INT = 1;
;WITH cte AS (
SELECT
Id,
OrderId,
ItemId,
OrderDate,
[Lookup],
ROW_NUMBER() OVER (ORDER BY OrderDate,Id) AS RowNumber
FROM @OderDetail
WHERE
ItemId = @itemId -- all matches of this
),
myrow AS (
SELECT TOP 1
Id,
OrderId,
ItemId,
OrderDate,
[Lookup],
RowNumber
FROM cte
WHERE
ItemId = @itemId
AND OrderId = @orderid
)
SELECT
cte.Id,
cte.OrderId,
cte.ItemId,
cte.OrderDate,
cte.[Lookup],
cte.RowNumber
FROM cte
INNER JOIN myrow
ON ABS(cte.RowNumber - myrow.RowNumber) <= @rowsBeforeAndAfter
ORDER BY OrderDate, OrderId;
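The row-number window can be exercised outside SQL Server too. Below is a hedged sketch using Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER); the table name OrderDetail, the helper neighbour_ids, and the LIMIT 1 standing in for TOP 1 are assumptions for illustration. It uses the updated data set with the commented-out row left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderDetail (Id INTEGER PRIMARY KEY, OrderId INT, ItemId INT,"
    " OrderDate TEXT, Lookup TEXT)"
)
conn.executemany(
    "INSERT INTO OrderDetail VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 1, "2018-06-11", "A"),
        (2, 10, 2, "2018-06-11", "BE"),
        (3, 2, 1, "2018-06-04", "DR"),
        (4, 2, 2, "2018-06-04", "D"),
        (5, 3, 2, "2018-06-14", "DD"),
        (9, 4, 2, "2018-06-14", "DD"),
        (6, 4, 2, "2018-06-14", "R"),
        (23, 4, 2, "2018-06-14", "R"),
    ],
)

QUERY = """
WITH cte AS (
    -- Number all rows of the item in date order (Id breaks ties).
    SELECT Id, OrderId,
           ROW_NUMBER() OVER (ORDER BY OrderDate, Id) AS RowNumber
    FROM OrderDetail
    WHERE ItemId = :item_id
),
myrow AS (
    -- The selected row; LIMIT 1 mirrors TOP 1 in the T-SQL version.
    SELECT RowNumber FROM cte
    WHERE OrderId = :order_id
    ORDER BY RowNumber
    LIMIT 1
)
SELECT cte.Id
FROM cte
JOIN myrow ON ABS(cte.RowNumber - myrow.RowNumber) <= :n
ORDER BY cte.RowNumber
"""

def neighbour_ids(item_id, order_id, n):
    """Ids of the selected row plus up to n rows before and after it."""
    params = {"item_id": item_id, "order_id": order_id, "n": n}
    return [row[0] for row in conn.execute(QUERY, params)]

print(neighbour_ids(2, 2, 1))  # [4, 2]
print(neighbour_ids(2, 2, 2))  # [4, 2, 5]
```

With order 2 the selected row is the earliest one for item 2, so a window of 1 yields only two rows, which matches the "2 not 3" remark above.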
You probably want the CTE method (see the original at the end of this), however:
Just to point out: this returns the expected rows here, but it is probably not what you are striving for, since it depends on the row order and the item id, not on the actual row with those two values:
SELECT TOP 3
a.Id,
a.OrderId,
a.ItemId,
a.Lookup
FROM @OderDetail AS a
WHERE
a.ItemId = @ItemId
To fix that, you can use an ORDER BY and TOP 1 with a UNION, kind of ugly. (UPDATED with date sort and != on the id.)
SELECT
u.Id,
u.OrderId,
u.OrderDate,
u.ItemId,
u.Lookup
FROM (
SELECT
a.Id,
a.OrderId,
a.OrderDate,
a.ItemId,
a.Lookup
FROM @OderDetail AS a
WHERE
a.ItemId = @ItemId
AND a.OrderId = @orderid
UNION
-- TOP 1 ... ORDER BY has to sit in its own derived table inside a UNION
SELECT p.Id, p.OrderId, p.OrderDate, p.ItemId, p.Lookup
FROM (
SELECT top 1
b.Id,
b.OrderId,
b.OrderDate,
b.ItemId,
b.Lookup
FROM @OderDetail AS b
WHERE
b.ItemId = @ItemId
AND b.OrderId != @orderid
ORDER BY b.OrderDate desc, b.OrderId
) AS p
UNION
SELECT n.Id, n.OrderId, n.OrderDate, n.ItemId, n.Lookup
FROM (
SELECT top 1
b.Id,
b.OrderId,
b.OrderDate,
b.ItemId,
b.Lookup
FROM @OderDetail AS b
WHERE
b.ItemId = @ItemId
AND b.OrderId != @orderid
ORDER BY b.OrderDate asc, b.OrderId
) AS n
) AS u
ORDER BY u.OrderDate asc, u.OrderId
I think it's simple: you can check with MIN(Id) and MAX(Id), using a left outer join or OUTER APPLY,
like
Declare @ItemID int = 2
Select * From @OderDetail A
Outer Apply (
Select MIN(A2.Id) minID, MAX(A2.Id) maxID From @OderDetail A2
Where A2.ItemId =@ItemID
) I05
Outer Apply(
Select * From @OderDetail Where Id=minID-1
Union All
Select * From @OderDetail Where Id=maxID+1
) I052
Where A.ItemId =@ItemID Order By A.Id
Let me know if this helps you or you face any problem with it...
I have an example on sql fiddle. What I am trying to do is divide the overall COUNT(DISTINCT ID) by the weekly COUNT(DISTINCT ID). For example if I have the following conceptual setup of what the result should be.
year week id_set overall_distinct week_distinct result
2016 1 A,A,A,B,B,C 0 3 0
2016 2 A,B,C,C,D 1 4 .25
2016 3 A,B,C,E,F 2 5 .4
The table linked to on sql fiddle has the following schema. Also, in reality I do have multiple values for 'year'.
CREATE TABLE all_ids
([year] int, [week] int, [id] varchar(57))
;
INSERT INTO all_ids
([year], [week], [id])
VALUES
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 1, 'B'),
(2016, 1, 'B'),
(2016, 1, 'C'),
(2016, 2, 'A'),
(2016, 2, 'B'),
(2016, 2, 'C'),
(2016, 2, 'C'),
(2016, 2, 'D'),
(2016, 3, 'A'),
(2016, 3, 'B'),
(2016, 3, 'C'),
(2016, 3, 'E'),
(2016, 3, 'F')
;
Edit
I apologize for the confusion. The above table was just a conceptual example of the result. The actual result only needs to look like the following.
year week overall_distinct week_distinct result
2016 1 0 3 0
2016 2 1 4 .25
2016 3 2 5 .4
there is no need to include id_set
I used dense_rank and max() over () to simulate count (distinct ...) with window functions. You could try to do it with another subquery
select
year, week
, id_set = stuff((
select
',' + a.id
from
all_ids a
where
a.year = t.year
and a.week = t.week
order by a.id
for xml path('')
), 1, 1, '')
, overall_distinct = count(case when cnt = 1 then 1 end)
, week_distinct = count(distinct id)
, result = cast(count(case when cnt = 1 then 1 end) * 1.0 / count(distinct id) as decimal(10, 2))
from (
select
year, week, id, cnt = max(dr) over (partition by id)
from (
select
*, dr = dense_rank() over (partition by id order by year, week)
From
all_ids
) t
) t
group by year, week
Output
year week id_set overall_distinct week_distinct result
--------------------------------------------------------------------------
2016 1 A,A,A,B,B,C 0 3 0.00
2016 2 A,B,C,C,D 1 4 0.25
2016 3 A,B,C,E,F 2 5 0.40
This would be one way, probably not the best one:
;with weekly as
(
select year, week, count(distinct id) nr
from all_ids
group by year, week
),
overall as
(
select a.week, count(distinct a.id) nr
from all_ids a
where a.id not in (select id from all_ids where week <> a.week and id = a.id )
group by week
)
select distinct a.year
, a.week
, stuff((select ', ' + id
from all_ids
where year = a.year and week = a.week
for xml path('')), 1, 2, '') ids
, w.Nr weeklyDistinct
, isnull(t.Nr, 0) overallDistinct
from all_ids a join weekly w on a.year = w.year and a.week = w.week
left join overall t on t.week = a.week
One statement count only
declare @t table (y int, w int, id varchar(57));
INSERT @t (y, w, id)
VALUES
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 1, 'A'),
(2016, 1, 'B'),
(2016, 1, 'B'),
(2016, 1, 'C'),
(2016, 2, 'A'),
(2016, 2, 'B'),
(2016, 2, 'C'),
(2016, 2, 'C'),
(2016, 2, 'D'),
(2016, 3, 'A'),
(2016, 3, 'B'),
(2016, 3, 'C'),
(2016, 3, 'E'),
(2016, 3, 'F');
select t1.w, count(distinct t1.id) as wk
, (count(distinct t1.id) - count(distinct t2.id)) as [all]
, (cast(1 as smallmoney) - cast(count(distinct t2.id) as smallmoney) / count(distinct t1.id)) as [frac]
from @t t1
left join @t t2
on t2.id = t1.id
and t2.w <> t1.w
group by t1.w
order by t1.w;
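The self-join counting trick can be checked against the expected table from the question with Python's sqlite3 module. A sketch under assumptions: the join also compares year (my own extension, since the question mentions several years), the column names follow the question's expected output, and ROUND(..., 2) is only there for tidy printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_ids (year INT, week INT, id TEXT)")
# One character per row, matching the id_set column of the question.
weeks = {1: "AAABBC", 2: "ABCCD", 3: "ABCEF"}
conn.executemany(
    "INSERT INTO all_ids VALUES (2016, ?, ?)",
    [(week, id_) for week, ids in weeks.items() for id_ in ids],
)

query = """
SELECT t1.year, t1.week,
       -- ids of this week minus ids that also occur in some other week
       COUNT(DISTINCT t1.id) - COUNT(DISTINCT t2.id) AS overall_distinct,
       COUNT(DISTINCT t1.id) AS week_distinct,
       ROUND(1.0 * (COUNT(DISTINCT t1.id) - COUNT(DISTINCT t2.id))
             / COUNT(DISTINCT t1.id), 2) AS result
FROM all_ids AS t1
LEFT JOIN all_ids AS t2
       ON t2.id = t1.id
      AND (t2.year <> t1.year OR t2.week <> t1.week)
GROUP BY t1.year, t1.week
ORDER BY t1.year, t1.week
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (2016, 1, 0, 3, 0.0)
# (2016, 2, 1, 4, 0.25)
# (2016, 3, 2, 5, 0.4)
```

The LEFT JOIN leaves t2.id NULL for ids seen in no other week, and COUNT(DISTINCT ...) ignores NULLs, which is what makes the subtraction work.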
I want to get same output:
using the following sample data
create table x
(
id int,
date datetime,
stat int
)
insert into x
values (1, '2017-01-01', 100), (1, '2017-01-03', 100), (1, '2017-01-05', 100),
(1, '2017-01-07', 150), (1, '2017-01-09', 150), (1, '2017-02-01', 150),
(1, '2017-02-02', 100), (1, '2017-02-12', 100), (1, '2017-02-15', 100),
(1, '2017-02-17', 150), (1, '2017-03-09', 150), (1, '2017-03-11', 150),
(2, '2017-01-01', 100), (2, '2017-01-03', 100), (2, '2017-01-05', 100),
(2, '2017-01-07', 150), (2, '2017-01-09', 150), (2, '2017-02-01', 150),
(2, '2017-02-02', 100), (2, '2017-02-12', 100), (2, '2017-02-15', 100),
(2, '2017-02-17', 150), (2, '2017-03-09', 150), (2, '2017-03-11', 150)
I tried to use something like this
with a as
(
select
id, date,
ROW_NUMBER() over (partition by date order by id) as rowNum
from
x
), b as
(
select
id, date,
ROW_NUMBER() over (partition by id, stat order by date) as rowNum
from
x
)
select min(b.date)
from a
join b on b.id = a.id
having max(a.date) > max(b.date)
What you are looking for is a gaps-and-islands scenario where you only have islands. In this scenario, what defines the start of an island is a change in the stat value within an id, when evaluating the dataset in date order.
The lag window function is used below to compare values across rows and decide whether a row needs to be included in the output.
select b.id
, b.stat
, b.date
from (
select a.id
, a.date
, a.stat
, case lag(a.stat,1,NULL) over (partition by a.id order by a.date asc) when a.stat then 0 else 1 end as include_flag
from x as a
) as b
where b.include_flag = 1
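To see which rows this keeps, the same idea can be run on the sample data with Python's sqlite3 module (SQLite 3.25+ for LAG). This is a sketch, not the answer's exact query: the CASE flag is replaced by an explicit comparison against the previous value, and the names prev_stat and island_starts are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INT, date TEXT, stat INT)")
series = [
    ("2017-01-01", 100), ("2017-01-03", 100), ("2017-01-05", 100),
    ("2017-01-07", 150), ("2017-01-09", 150), ("2017-02-01", 150),
    ("2017-02-02", 100), ("2017-02-12", 100), ("2017-02-15", 100),
    ("2017-02-17", 150), ("2017-03-09", 150), ("2017-03-11", 150),
]
# Both ids carry the same series, as in the question's sample data.
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [(id_, date, stat) for id_ in (1, 2) for date, stat in series],
)

query = """
SELECT id, stat, date
FROM (
    SELECT id, date, stat,
           LAG(stat) OVER (PARTITION BY id ORDER BY date) AS prev_stat
    FROM x
) AS b
-- An island starts on the first row of an id or whenever stat changes.
WHERE prev_stat IS NULL OR prev_stat <> stat
ORDER BY id, date
"""
island_starts = conn.execute(query).fetchall()
for row in island_starts:
    print(row)
# (1, 100, '2017-01-01')
# (1, 150, '2017-01-07')
# (1, 100, '2017-02-02')
# (1, 150, '2017-02-17')
# ...and the same four dates for id 2
```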
It's a SQL query.
I want to find duplicate data.
Sample:
Table 1
Col1, col2, col3, col4
1, A, AA, AAA
2, A, BB, AAA
3, A, BB, AAA
4, B, AA, AAA
5, B, AA, BBB
6, B, AA, CCC
7, B, BB, AAA
8, B, CC, AAA
the result should be :
2, A, BB, AAA
3, A, BB, AAA
Or
A, BB, AAA
So I can find where my duplicates are.
Thank you.
You could also do it like this:
Test data
DECLARE @T TABLE(Col1 int, col2 VARCHAR(100), col3 VARCHAR(100),
col4 VARCHAR(100))
INSERT INTO @T
VALUES
(1, 'A', 'AA', 'AAA'),
(2, 'A', 'BB', 'AAA'),
(3, 'A', 'BB', 'AAA'),
(4, 'B', 'AA', 'AAA'),
(5, 'B', 'AA', 'BBB'),
(6, 'B', 'AA', 'CCC'),
(7, 'B', 'BB', 'AAA'),
(8, 'B', 'CC', 'AAA')
Query1
;WITH CTE
AS
(
SELECT
COUNT(Col1) OVER(PARTITION BY col2,col3,col4) AS Counts,
T.*
FROM
@T AS T
)
SELECT
*
FROM
CTE
WHERE
Counts>1
Result
2 2 A BB AAA
2 3 A BB AAA
Query2
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY col2,col3,col4 ORDER BY col1) AS RowNbr,
T.*
FROM
@T AS T
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr>1
Result
2 3 A BB AAA
You can group by the columns and check whether there is more than 1 record for each group:
select
col2, col3, col4
from
MyTable
group by
col2, col3, col4
having
count(*) > 1
Demo: http://www.sqlfiddle.com/#!3/5c3a7/2
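Both output shapes asked for in the question (the duplicated combination, or the individual duplicate rows) can be tried locally with Python's sqlite3 module. A minimal sketch, assuming the table is called MyTable as in the query above; the extra copies column and the variable names are illustrative additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (col1 INT, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (1, "A", "AA", "AAA"), (2, "A", "BB", "AAA"), (3, "A", "BB", "AAA"),
        (4, "B", "AA", "AAA"), (5, "B", "AA", "BBB"), (6, "B", "AA", "CCC"),
        (7, "B", "BB", "AAA"), (8, "B", "CC", "AAA"),
    ],
)

# One line per duplicated combination, with how many rows share it.
duplicate_groups = conn.execute("""
    SELECT col2, col3, col4, COUNT(*) AS copies
    FROM MyTable
    GROUP BY col2, col3, col4
    HAVING COUNT(*) > 1
""").fetchall()

# Every row that belongs to a duplicated combination (needs SQLite 3.25+).
duplicate_rows = conn.execute("""
    SELECT col1, col2, col3, col4
    FROM (
        SELECT *, COUNT(*) OVER (PARTITION BY col2, col3, col4) AS copies
        FROM MyTable
    )
    WHERE copies > 1
    ORDER BY col1
""").fetchall()

print(duplicate_groups)  # [('A', 'BB', 'AAA', 2)]
print(duplicate_rows)    # [(2, 'A', 'BB', 'AAA'), (3, 'A', 'BB', 'AAA')]
```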