Unnecessary DS_BCAST_INNER after GROUP BY in Redshift

Unnecessary DS_BCAST_INNER after GROUP BY in Redshift - database

I have a problem and would like to know if there is a solution to this.
I am having absolutely unnecessarily table broadcast(DS_BCAST_INNER) in my query.
Imagine you have Table1 and Table2 both having the same distkey MediaId.
When I join both tables directly there is no redistribution which is good. But when I try to do something similar to:
WITH t1
AS
(
SELECT MediaId, ... FROM Table1 ...predicates... GROUP BY MediaId, ...
),
t2 AS
(
SELECT MediaId, ... FROM Table2 ...predicates... GROUP BY MediaId, ...
)
Select ... FROM t1 JOIN t2 ON t.MediaId = t2.MediaId ....
I see DS_BCAST_INNER in execution plan shown by explain command while it is obviously useless.
How can I avoid it?

Run an EXPLAIN on this and look at the underlying data types of your tables (before the group by).
I've seen this recently where Table1 was a char(36) and Table2 has a varchar(36); this caused a cast and a broadcast, since the hashing of a char and a varchar is (probably) different. (The varchar will have a length prefix that is probably being included in the hash... :-( )
The data types on the join must be EXACTLY the same, not nearly. E.g. an INT to a BIGINT will probably have the same issue.
(Haven't checked this, but possibly even nullability?)

Related

Using special character to JOIN

I have a table A which loaded from a external stage, one column col1 has special char for example 'Español'. I need to join tableA with tableB
select * from tableA
join tableB
on tableA.col1 = tableB.col1
I know tableB.col1 has exactly same value 'Español', but this join couldn't catch it. Anyone knows why and how to get it joined?
Thanks.

If the join is failing it is because the values in col1 are not equivalent even though they might look the same when displayed. I wonder if using hex_encode(col1) on each table might surface the difference?

Can you set multiple column names as a macro in SQL to query against?

Can you set multiple column names from a SQL table as a macro in SQL to query against?
For example I have multiple columns I am hitting against multiple times, can I use a macro or some type of reference to identify them ONCE to avoid displaying them repetitively and cluttering up the code?
The current code works, I am just looking for a cleaner/streamlined option.
Current Code:
WHERE ('ABC') IN
([CODE1],[CODE2],[CODE3],[CODE4],[CODE5],[CODE6],[CODE7],[CODE8]
,[CODE9],[CODE10],[CODE11],[CODE12],[CODE13],[CODE14],[CODE15]
,[CODE16],[CODE17],[CODE18],[CODE19],[CODE20],[CODE21],[CODE22]
,[CODE23],[CODE24],[CODE25]
AND ('CFS') IN
([CODE1],[CODE2],[CODE3],[CODE4],[CODE5],[CODE6],[CODE7],[CODE8]
,[CODE9],[CODE10],[CODE11],[CODE12],[CODE13],[CODE14],[CODE15]
,[CODE16],[CODE17],[CODE18],[CODE19],[CODE20],[CODE21],[CODE22]
,[CODE23],[CODE24],[CODE25]
ect...(20 more times)
Goal:
WHERE 'ABC' IN (&columnsmentionedabove)
OR 'FGS' in (&columnsmentionedabove)
OR 'g6s' in (&columnsmentionedabove)
etc.....
This is inherited code and just seems very clunky.
Thank you

Numbered columns like this are almost always a sign you should have an additional table. So if your existing table structure is like this:
Table1
Table1ID, OtherFields, Code1, Code2, Code3.... Code25
You really want something more like this:
Table1
Table1ID, OtherFields
Table1Codes
Table1ID, Code
Where each entry in Table1 will have many entries in Table1Codes. Then you write JOIN statements to show the two sets side-by-side when needed.
FROM Table1 t
INNER JOIN Table1Codes tc1 ON tc.Table1ID = t.Table1ID AND tc.Code = 'ABC'
INNER JOIN Table1Codes tc2 ON tc.Table1ID = t.Table1ID AND tc.Code = 'CFS'
Or
FROM Table1 t
INNER JOIN Table1Codes tc1 ON tc.Table1ID = t.Table1ID AND tc.Code IN ('ABC','FGS','g6s')

If you can't change the table's schema, as in often the case, you can UNPIVOT it. For example, assuming CODE1...CODE25 come from MyTable, wrap the UNPIVOT operation inside a CTE:
;WITH
cte AS
(
SELECT upvt.*
FROM MyTable
UNPIVOT (
CodeValue FOR CodeLabel IN ([CODE1], [CODE2], ..., [CODE25])
) upvt
)
SELECT *
FROM cte
WHERE CodeValue IN ('ABC', 'DEF', ...)
The unpivot operation is not free. Make sure you filter as much as possible from MyTable before unpivoting the it.

Alternatives to FULL OUTER JOIN

We have a query that looks for discrepancies between two tables in two different databases (one SQL Server, one Oracle) that in theory should always be in sync. The query pulls the data from both tables into table variables and then does a FULL OUTER JOIN to find the discrepancies. We suspect that the FULL OUT JOIN is partially to blame for the performance issues.
Would it make sense to rely on two LEFT OUTER JOINs and look for records that don't exist on the right side of the join?
We're also thinking about using temp tables to further help with performance.

One option is to do a inner join and store results in a temp table. Then do a select from TableA where not exists in tempTableWithCommonRecords
and another select from TableB where not exists in tempTableWithCommonRecords
Cant say if this will perform better as don't have enough info for that. Its just another option.

You can try the EXCEPT operator which handles complex joins, and it should work in both PL-SQL and T-SQL. It will return any values in the left table that are not a perfect match on the right table:
SELECT [Field1], [Field2], [Field3]
FROM Table1
EXCEPT
SELECT [Field1], [Field2], [Field3]
FROM Table2
UNION
SELECT [Field1], [Field2], [Field3]
FROM Table2
EXCEPT
SELECT [Field1], [Field2], [Field3]
FROM Table1

How does t-sql update work without a join

I think my head is muddy or something. I'm trying to figure out how a t-sql update works without a join when updating one table from another. I've always used joins in the past but came across a stored proc where someone else created one without a join. This update is being used in SQL 2008R2 and it works.
Update table1
SET col1 = (SELECT TOP 1 colX FROM table2 WHERE colZ = colY),
col2 = (SELECT TOP 1 colE FROM table2 WHERE colZ = colY)
Obviously, colY is a field in table1. To get the same results in a select statement (not update), a join is required. I guess I don't understand how an update works behind the scenes but it must be doing some kind of join?

SQL Server translates those subqueries into joins. You can look at this by getting the query plan. You can write an equivalent query with UPDATE ... FROM ... JOIN syntax and observe the query plan to be essentially the same.
The sample code shown is unusual, hard to understand, redundant and inflexible. I recommend against using this style.

No it's doing a sub query, well two in this case. Be damn painful if you have another 98 col fields.
You can do something similar for select
select *,
(SELECT TOP 1 colX FROM table2 WHERE colZ = colY) as col1
From table1
A left join would simply be more efficient
Your example unless the dbms optimises it it running the subquery(ies) for each row in table.
Got to say whoever wrote it is less than competent.

These subqueries are what is called correlated subqueries. If you were to write the same query as a SELECT rather than an UPDATE it would look like this.
SELECT col1 = (SELECT TOP 1 table2.colX FROM table2 WHERE table2.colZ = table1.colY),
col2 = (SELECT TOP 1 table2.colE FROM table2 WHERE table2.colZ = table1.colY)
FROM table1
The JOIN is in the fact that you are referencing a column from an outside table on the inside of the subquery. Table1 is referenced in the UPDATE command. You can include a FROM clause but it isn't required for a setup like this.

You can use the same syntax in a SELECT with no join, but you need to alias the table if colY also exists in table2
SELECT (SELECT TOP 1 colX FROM table2 WHERE colZ = T.colY)
, (SELECT TOP 1 colE FROM table2 WHERE colZ = T.colY)
FROM table1 AS T
I only ever use this sort of thing when building up an ad hoc query just for my own infomation. If it's going to be put into any sort of permanent code I'll convert it to a join as it's easier to read and more maintainable.

Postgresql Select from values in array

i am converting multiple rows in to a array using array_agg() function,
and i need to give that array to a select statements where condition.
My query is,
SELECT * FROM table WHERE id =
ALL(SELECT array_agg(id) FROM table WHERE some_condition)
but it gives error, how can i over come it..

the error has been cleared by type casting the array, using my query like this
SELECT * FROM table WHERE id =
ALL((SELECT array_agg(id) FROM table WHERE some_condition)::bigint[])
reference link

It seems like you are over-complicating things. As far as I can tell, your query should be equivalent to simple:
SELECT * FROM table WHERE some_condition
Or, if you are selecting from 2 different tables, use join:
SELECT table1.*
FROM table1
JOIN table2 ON table1.id = table2.id
WHERE some_condition
Not only this is simpler, it is also faster than fiddling with arrays.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Unnecessary DS_BCAST_INNER after GROUP BY in Redshift - database

Related

Using special character to JOIN

Can you set multiple column names as a macro in SQL to query against?

Alternatives to FULL OUTER JOIN

How does t-sql update work without a join

Postgresql Select from values in array

Categories

Resources